Dec 29, 2022

Directed Graph Cycle Detection, Part I: Depth-First Search Algorithm

Given a directed graph, how can we determine whether there exists a cycle in the graph? There are several methods that can be used to answer this question. In this post, we are going to explore a depth-first-search approach to solving such a problem.

Icebreaker: can you spot where the cycle is found in the above graph?

Background

One scenario for cycle detection in a directed graph is a course schedule sequence with prerequisites. This scenario is presented in LeetCode's Course Schedule II problem, which consists in determining whether it is possible to finish all courses in a course schedule with prerequisites and, if so, return a valid sequence for the courses. Let's illustrate the problem with an example: say we have a course schedule with four courses: A, B, C, and D, where A is prerequisite of B and C, and C and B are prerequisites of D. We can represent the course sequence and the prerequisites with the following directed graph:

As seen in the graph, all courses can be completed, starting from course A (the valid sequences are [A, C, B, D] and [A, B, C, D]). Now let's add one more prerequisite: D is prerequisite of A (that is, to take course A you need to complete course D first). The graph will now look as follows:

If we start at A, running the path through either C or B will takes us back to A, so there is no way of completing all courses because there is a cycle where course D requires course A (indirectly), and course A requires course D (directly).

In the above graph, it is easy to see that there is at least one cycle; however, when we have a graph with hundreds of vertices, it would be hard to determine whether there is a cycle visually. So we are going to use a depth-first search (dfs) approach for cycle detection in directed graphs. Let's explain the approach using the first graph above as an example:

We are going to mark all the vertices of the graph (A, B, C, and D) as unvisited initially. Then we are going to iterate over the source vertices one at a time using a dfs algorithm. At the start of the dfs, we are going to mark the corresponding vertex as visiting, and when the dfs for the given vertex is over, that is, when the vertex's neighbors have been processed or the vertex does not have neighbors, we are going to mark the vertex as done. If, while the dfs algorithm is running for a given vertex, there exists a path through the vertex's neighbor(s) that leads back to the vertex, then the graph contains a cycle.

Let's now apply the previous approach to the first graph:
All vertices are initially marked as unvisited. Starting a dfs algorithm at vertex A, vertex A is marked as visiting; then dfs for vertex C runs, and it is marked as visiting; then dfs for vertex D runs, and D is marked as visiting; since D does not have neighbors, dfs for D concludes, and it is marked as done; since C does not have more neighbors, dfs for C concludes, and it's status is changed from visiting to done; now dfs for the second neighbor of A, namely, B, runs, and B is marked as visiting; since the only neighbor of B, namely, D, is done, dfs for B concludes and it is marked as done; finally, A is marked as done. If we were to collect the courses at the end of the dfs calls we would get the sequence [D, C, B, A]. Note that this sequence is in reverse order from the true order when taking the courses (this fact will be later used in the coded solution).

If we carry out a similar analysis for the second graph above, we get:

A visiting -> then B visiting -> then D visiting -> then A visiting

Notice that we were still processing (visiting) A, and we encountered A again, which means the graph contains a cycle closing at vertex A.

For the code representation of the above approach, we are going to use codes to represent the states of a vertex: 0 for unvisited, 1 for visiting, and 2 for done. Also, the givens for the problem will be the total number of courses (numCourses) starting at course 0 until numCourses - 1, and an array of prerequisites, where prerequisites[i] = [x,y], where y is a prerequisite of x, or y leads to x (y -> x).

Solution: Cycle Detection plus Ordering

Before writing the solution, let's note that, given n courses with no prerequisites, means that the courses can be taken in any order, and the answer will be an array containing all the courses. When provided with prerequisites, we'll use the prerequisites to build a directed graph and implement a dfs algorithm on the graph to find whether there is a cycle in the graph. This means that given n courses, there can be less prerequisites than courses, but we need to return all the courses with the prerequisites included in a valid order (sequence).

Let's dive into the code:

const UNVISITED = 0;
const VISITING = 1;
const DONE = 2;

function findOrder(numCourses, prerequisites) {
  let graph = new Map();
  let order = [];
  let cycle = false;
  const status = new Array(numCourses).fill(UNVISITED);
 
  for(let [dest, source] of prerequisites) {
    graph.has(source) 
      ? graph.get(source).push(dest) 
      : graph.set(source, [dest]);
  }

  for(let i = 0; i < numCourses; i++) {
    if(status[i] === UNVISITED) {
      dfs(i);
    }
  }

  let final = [];
  if(!cycle) {
    for(let i=0; i < numCourses; i++){     
      final[i] = order[numCourses - i - 1];
    }
  }

  return final;

  function dfs(course) {
    if(cycle) return;
    status[course] = VISITING;
    for(let neighbor of (graph.get(course) || [])) {
      if(status[neighbor] === UNVISITED) {
        dfs(neighbor);
      } else if(status[neighbor] === VISITING) {
        cycle = true;
      }
    }
    status[course] = DONE;
    order.push(course);
  }
}

On line 9 we initialize a status array containing all courses (the array's indices represent the courses), each with a status of 0 (or unvisited) initially. We then build a graph using the prerequisites array (line 11). Since we need to return all courses (if no cycle found), we iterate over all the courses (line 17), and run a dfs algorithm for unvisited courses (line 19). The dfs algorithm immediately pushes the passed-in course (after changing the course status) to the order array if the course does not belong to the graph or if the course belongs to the graph and does not have neighbors (line 43). Otherwise, the course's neighbors are processed (line 35). If a course is in visiting state and a neighbor's path leads back to such course, a cycle has been found.

If no cycle is found after iterating over all the courses, we iterate over the order array (line 25) to collect the courses in the true order since the dfs algorithm collects the courses in reverse order (this was discussed in background section). Otherwise if a cycle is found, we return an empty array.

Solution: Cycle Detection Only

If we did not care about the order of the courses, and we only wanted to find whether the prerequisites array contains a cycle (LeetCode's Course Schedule problem), we can then use the following approach to solve:

Build a graph out of the prerequisites array, and run a dfs algorithm on every source vertex (node) of the graph. We'll initialize two sets, visited and exploring, to keep track of the courses fully processed, and the courses being processed, respectively. The dfs algorithm returns a boolean: true if a cycle is found, and false otherwise. At the start of the dfs call, we'll check whether the current course is already being explored from a previous dfs call, if so, we'll return true, as a cycle is closed at that course; otherwise, we'll check whether the current course has already been visited; if so, we'll return false to exit the dfs call and continue exploring with the next course; if the current course has not been visited, we'll start exploring the course and its neighbors. When done exploring, we'll remove the course from the exploring set, then add it to the visited set, and return false as no cycle was found during the dfs call.

Let's write the code implementation:

function canFinish(numCourses, prerequisites) {
  let graph = new Map();
  let visited = new Set();
  let exploring = new Set();
 
  for(let [dest, src] of prerequisites) {
    graph.has(src) 
      ? graph.get(src).push(dest) 
      : graph.set(src, [dest]);
  }

  for(let course of [...graph.keys()]) {
    if(dfs(course)) return false;
  }

  return true;

  function dfs(crs) {
    if(exploring.has(crs)) return true;
    if(visited.has(crs)) return false;
    exploring.add(crs);
    for(let neighbor of (graph.get(crs) || [])) {
      if(dfs(neighbor)) return true;
    }
    exploring.delete(crs);
    visited.add(crs);
    return false;
  }
}

The key point in the solution is that the dfs algorithm returns a boolean.

Let's now proceed to analyze the time and space complexity for the solutions.

Time Complexity

For the first solution, a worst-case scenario in the problem would be that all the courses except for one have one or more prerequisites. This would make all the courses to be included in the prerequisites array. Each ordered pair in the prerequisites array represents a unique edge for the subsequent directed graph. Hence, building the directed graph out of the prerequisites array takes O(E) time, where E is the total number of edges of the graph (or, equivalently, the length of the prerequisites array) as we loop over all the edges, and the set and push operations on the edges take O(1) time.

The initialization operation for setting the status of the vertices of the graph, that is, marking the vertices as unvisited initially, takes O(V) time, where V is the number of vertices of the graph (or, equivalently, the total number of courses), as we loop over all the courses.

For time complexity purposes, the most relevant operation in the dfs function in the solution is the for loop. For a given vertex, the for loop visits each of the vertex's neighbors or, equivalently, the edges for the vertex. Since the dfs function will be called for every vertex in the graph (at most once), the for loop will visit every edge on the graph, hence the time complexity for the dfs function is O(E), where E is the total number of edges in the graph.

Combining the time complexity for the vertices initialization and the time complexity for the dfs function we get an overall time complexity of O(V + E) for the solution algorithm.

Space Complexity

For the space complexity, building the adjacency list graph with a hashmap takes O(E) space, where E is the number of edges. Since we are using recursion to solve, in the worst case scenario of the prerequisites being linearly chained, the call stack for the dfs function would grow to size V, the number of courses, hence, taking O(V) space.

Also, the status array used to keep track of the status of the vertices takes O(V) space.

As a result, the overall space complexity for the algorithm is O(V + E).

Since the second solution also involves an adjacency list directed graph with dfs, the time and space complexity for the solution algorithm are both O(V + E).

Optimization

In the first solution the algorithm continues to loop through the remainder courses even after finding a cycle. We can improve the time complexity of the algorithm a bit by having it return immediately after finding a cycle. For this, we need to refactor the dfs function to return a boolean instead of not returning anything. Also, rather than keeping three states for the courses (unvisited, visiting, and done), we'll only keep two states (processing, and visited), and we'll keep track of these states in two separate hashmaps or sets, respectively. The visited state will be stored globally in the algorithm, while the processing state will be reinitialized at the end of the dfs calls. Introducing these changes yields a solution very similar to the second solution presented previously:

function findOrder(numCourses, prerequisites) {
  let graph = new Map();
  let order = [];
  const [visit, cycle] = [new Set(), new Set()];

  for(let [dest, src] of prerequisites) {
    graph.has(src) 
      ? graph.get(src).push(dest) 
      : graph.set(src, [dest]);
  }

  for(let i = 0; i < numCourses; i++) {
    if(dfs(i)) return [];
  }

  let final = [];
  for(let i = 0; i < numCourses; i++) {
    final[i] = order[numCourses - i - 1];
  }

  return final;

  function dfs(course) {
    if(cycle.has(course)) return true;
    if(visit.has(course)) return false;
    cycle.add(course);
    for(let neighbor of graph.get(course) || []) {
      if(dfs(neighbor)) return true;
    }
    cycle.delete(course);
    visit.add(course);
    order.push(course);
    return false;
  }
};

In the dfs function, the current course is marked as processing at the start of the function by adding it to the cycle set (line 26), and later marked as visited at the end of the function by adding it to the visit set (line 31); since by this time the course has been processed, the course is removed from the cycle set (line 30). Compared to the previous solution we get extra speed at the tradeoff of extra space with the addition of the cycle set.

Summary

One way of detecting cycles in a directed graph is by using a depth-first search algorithm. If we also want to return a valid order of traversing the graph (also know as topological sort), we need to reverse the order of the vertices collected during the dfs process, as vertices to be visited last are collected first during dfs.

Answer to the icebreaker: the cycle is found among vertices [D, G, F, J].