In the previous post of this series we discussed how to use depth-first search (DFS) for detecting cycles in a directed graph, as well as returning a topological sort for the graph when no cycles are found. In this post we'll discuss a breadth-first search (BFS) approach to cycle detection in directed graphs.
Icebreaker: the graph in the image contains a cycle; can you tell where the cycle is found?
Similar to the part I post, Directed Graph Cycle Detection: Depth-First Search Algorithm, we'll have a directed graph represented by a course schedule with prerequisites (as in LeetCode's Course Schedule II problem). We want to know whether it is possible to complete all the courses in the schedule and, if so, return a valid order to take the courses in. To solve this problem we'll use what is known as Khan's algorithm, an algorithm that utilizes breadth-first search (bfs) to determine the topological sort in a directed graph, and detect cycles (if an ordering is not possible).
Let's consider the below directed graph:
Consider the number of dependencies for each of the vertices in the graph or, equivalently, the indegree value for the vertices: vertex A has 0 dependencies (no arrows coming in to A; indegree 0); vertices C and B each have 1 dependency (indegree 1), and vertex D has 2 dependencies (indegree 2). It turns out that the indegree value of a vertex is an indication of the order in which the vertex should be visited: vertices with lower indegree values are visited first, then vertices with higher indegree values. If the previous graph represented a course schedule, we could easily deduct the two possible valid orders in which to take the courses just by looking at the dependencies of the courses: [A, C, B, D], and [A, B, C, D].
This leads us to the following approach to Khan's algorithm: Given an array of course prerequisites representing the edges of a directed graph, and given a total of numCourses
, where courses run from 0
to numCourses - 1
, we'll initialize an indegree
array containing all the courses, each course initially with 0 dependencies (indegree 0). Then we'll iterate over the prerequisites
array to build a directed graph and, at the same time, increment the indegree value for the destination courses in the indegree
array using the graph edges. We'll then push the courses with indegree 0 to a queue
to be used by a breadth-first search (bfs) algorithm in iterative form. In the bfs algorithm we'll dequeue the first value and iterate over the course's neighbors to decrement the indegree value of the neighbors by 1, as the current course has been dequeued and will not be seen again. If, in this process, a neighbor reaches indegree 0, we'll push it to the queue
for processing. This way, courses will be visited in a valid order. At the end of the bsf iteration, we'll push the course to an output array. If, at the end of the bfs process (when the queue
is empty), the ouput array is empty or does not contain all the courses, it means there was no course with indegree 0 initially (because of cycles) or a cycle was found at some point later in the graph traversal which prevented collecting the courses that formed part of a cycle.
Let's apply the previous approach to the below graph:
From the graph, vertex A has indegree 0, vertex G has indegree 2, and the remainder vertices have indegree 1 each. We can take course A (it is the only course with indegree 0 initially), which will make the indegree value of vertex G to decrease to 1, but then, the remainder courses cannot be taken because course G never reaches indegree 0 due to a cycle closing at G: Since not all the courses can be taken, the graph contains a cycle.
Let's put into code the previous approach to Khan's algorithm:
function findOrder(numCourses, prerequisites) {
const indegree = [...Array(numCourses).fill(0)];
const graph = new Map();
const queue = [];
const order = [];
for(let [crs, pre] of prerequisites) {
if(graph.has(pre)) {
graph.get(pre).push(crs);
} else {
graph.set(pre, [crs]);
}
indegree[crs]++;
}
for(let i = 0; i < numCourses; i++) {
if(indegree[i] === 0) queue.push(i);
}
// breadth-first search, iterative
while(queue.length) {
const current = queue.shift();
for(let neighbor of (graph.get(current) || [])) {
indegree[neighbor]--;
if(indegree[neighbor] === 0) queue.push(neighbor);
}
order.push(current);
}
return numCourses === order.length ? order : [];
};
In the above algorithm, the indices of the indegree
array (line 2) represent the courses, running from 0
to numCourses - 1
. The graph is built on line 7, courses with indegree 0 are enqueued on line 15; neighbor courses that reach indegree 0 during the bfs process are enqueued on line 25, and the topological sort collection to be returned by the algorithm occurs on line 27.
If we were only interested in determining whether the graph contains a cycle, and leave out the topological sort (ordering) part, we would then need to design an algorithm that returns a boolean instead an array. It turns out that we can use the previous section's algorithm as is, and only need to change one line of code, the return
line (line 30), to make the algorithm work in cycle detection mode only:
return numCourses === order.length ? true : false;
and that suffices the requirement.
Let's move on to time and space complexity analysis.
Building the graph out of the prerequisites
array takes O(E) time, as we iterate over all the edges of the graph in the prerequisites
array, where E is the number the number of prerequisites.
Assigning indegree values to the courses takes O(V) time, as we iterate over all the courses, where V is the number of courses.
A worst-case scenarios for the while loop (line 21) in the solution algorithm would be that all courses except for one course have one or more dependencies (prerequisites), and that there are no cycles. Hence the loop will iterate over all the courses at most once, thus taking O(V) time and, additionally, for each course, the course's neighbors or, equivalently, the edges are processed for indegree decrement (in the for loop on line 23), hence processing all edges and thus taking O(E) time. As a result, the algorithm's overall time complexity is O(V + E).
Building the graph out of the prerequisites array takes O(E) space as we iterave over all the prerequisites, while building the indegree
array takes O(V) space. Hence the overall space complexity for the solution algorithm is O(V + E).
Two common methods for cycle detection in directed graphs are depth-first search (dfs), and breadth-first search (bfs). The indegree value of a vertex is an indicator of the relative order in which the vertex should be visited, and Khan's bfs algorithm leverages this fact to collect the vertices in a valid order and, thus, return a topological sort for the graph, or detect cycles if not all vertices can be collected.
Answer to icebreaker: the cycle is found among vertices [5, 8, 7]
.