Graph Traversal Algorithms: A Comprehensive Guide

Introduction to Graph Traversal Algorithms

Graph traversal is the process of visiting each vertex in a graph in a systematic way. In other words, it means exploring the graph’s nodes and edges to discover all reachable vertices. Traversal algorithms ensure every vertex is visited exactly once (in some order), which is fundamental for analyzing graph structures. This matters because many higher-level graph problems (like searching for specific nodes, checking connectivity, or finding paths) rely on efficient traversal as a building block.

Graphs come in different types, and the traversal methods apply to all of them with slight considerations. A graph can be undirected (edges have no direction) or directed (edges have a direction, like one-way links). In directed graphs, a traversal from a given start node will only follow edges along their direction, which can restrict reachability compared to an undirected graph. Graphs may also be unweighted (all edges treated equally) or weighted (each edge has a numerical weight or cost). Traversal algorithms like DFS and BFS typically ignore weights (they just explore connectivity), whereas pathfinding algorithms take weights into account (e.g. Dijkstra’s algorithm for shortest paths in weighted graphs). In this article we focus on unweighted graph traversal; weighted pathfinding is covered separately.

Which Graph Traversal Strategy You Should Choose

The two core strategies for graph traversal are Depth-First Search (DFS) and Breadth-First Search (BFS). These form the basis for many graph algorithms. We will also discuss related algorithms built on traversal concepts: topological sorting (ordering nodes in directed acyclic graphs) and finding connected components (identifying isolated subgraphs, including strongly connected components in directed graphs). Each section includes how the algorithm works, real-world use cases, and pseudocode or step outlines where appropriate. The tone is tutorial-like but technically precise, similar in completeness and formality to our earlier sorting and searching articles.

The Algorithms Guide

This article is part of a comprehensive guide about algorithms. To access the main article and read about other types of algorithms, from sorting to machine learning, follow the following link:

A Friendly Guide to Computer Science Algorithms by Category.

Depth-First Search (DFS)

Depth-First Search (DFS) is a traversal algorithm that explores as far down one branch of the graph as possible before backtracking. In DFS, when you visit a vertex, you immediately continue to an unvisited neighbor of that vertex, then to an unvisited neighbor of that neighbor, and so on. You only backtrack when you reach a vertex that has no unvisited neighbors. This depth-first strategy means child vertices are visited before sibling vertices. Conceptually, it’s like walking through a maze by always taking the next unexplored corridor until you hit a dead end, then retracing your steps.

Traversal algorithms. — Figure: One possible Depth-First Search order on a sample graph. Starting at node A (1), DFS goes deep to C (2), then to D (3). It backtracks to explore C’s next neighbor E (4) before finally backtracking and exploring B (5). This illustrates how DFS dives down one path completely before visiting siblings.

How it works: DFS can be implemented recursively or iteratively. The recursive version naturally uses the call stack to remember the path, whereas the iterative version uses an explicit stack data structure to simulate recursion. In both cases, the algorithm marks each vertex as visited when first encountered, to avoid processing it multiple times (which is especially important if the graph has cycles). Starting from a chosen root vertex, DFS proceeds as follows:

Visit the start vertex and mark it visited.
For each unvisited neighbor, go deeper by recursively (or iteratively) visiting that neighbor, then its neighbors, etc., until no further unvisited vertices can be reached.
Backtrack: When a vertex has no unvisited adjacent vertices, the algorithm returns to its parent (the vertex from which it was discovered) to continue exploring other branches.
Repeat this process until all vertices reachable from the start have been visited. If any vertices remain unvisited, choose another unvisited vertex and repeat (ensuring all components are covered).

Using a stack (either implicit via recursion or explicit) ensures that the most recently discovered vertex that still has unexplored neighbors is processed next. The time complexity of DFS is O(V + E) (where V is the number of vertices and E is the number of edges) because in the worst case it will explore every vertex and every edge once.

Applications of DFS: DFS is a versatile algorithm used in many scenarios:

Path Finding: Although DFS does not guarantee the shortest path, it can be used to find a path between two nodes in a graph. For example, in puzzle solving or maze traversal, a DFS will explore possible paths deeply; it might find a solution quickly if lucky (going in the right direction), but it could also get lost down a long path that isn’t fruitful.
Cycle Detection: DFS is useful for detecting cycles in graphs. In a directed graph, a cycle exists if during DFS we find an edge that leads to an ancestor in the recursion stack (a back edge). By keeping track of nodes currently in the recursion stack (or marked with a “temporary mark”), DFS can detect cycles and is often used to determine if a directed graph is acyclic. In an undirected graph, a cycle is detected if an already visited neighbor is found that is not the direct parent of the current vertex.
Topological Sorting: DFS is the basis for one method of topological sort (discussed later). By recording the order in which vertices finish (are fully explored) in DFS, you can produce a valid topological ordering of a DAG. Essentially, vertices get added to the sort in reverse order of their DFS finish times.
Connected Components: Running a DFS from an unvisited vertex will mark all vertices in that vertex’s connected component as visited. By iterating this process (starting a new DFS whenever you find an unvisited vertex), you can identify all the connected components of an undirected graph. Similarly, DFS is part of algorithms for finding strongly connected components in directed graphs (covered later).
Puzzle and Game Algorithms: Many backtracking algorithms utilize DFS. For example, solving Sudoku or performing a exhaustive search in a game state space can be done via DFS (trying a move, then recursively trying subsequent moves, and backtracking when a move leads to an invalid state).

Real-world use cases of DFS include analyzing networks (e.g. finding clusters of connected users in a social network), performing searches in AI (like brute-force searches in problem-solving), and even tasks like web crawling (where one might follow links deeply before backtracking). Its memory footprint is typically lower than BFS for wide graphs, since it stores only a path at a time (O(V) in the worst case depth). However, DFS can get stuck going deep in one part of a graph; if the graph is infinite or extremely large in one branch, a naive DFS might not find solutions that lie in other branches. To mitigate that, variants like iterative deepening DFS impose depth limits.

Searching Graphs

Understanding Graph based search is an invaluable skill when dealing with Graph-based algorithms. Learn more about these techniques and algorithms following this link: Graph-Based Search Algorithms.

Breadth-First Search (BFS)

Breadth-First Search (BFS) is a traversal algorithm that explores the graph level by level. It starts at a given source vertex and first visits all neighbors of that vertex, then all the neighbors of those neighbors, and so on, expanding outward in waves. This approach ensures that vertices are visited in order of increasing distance from the start node (in terms of number of edges). In other words, BFS visits sibling vertices before moving to the next level of children.

How it works: BFS uses a queue to manage the frontier of exploration. The algorithm can be outlined as follows:

Enqueue the start vertex and mark it as visited.
While the queue is not empty, dequeue the front vertex v. Visit v (process it) and then for each unvisited neighbor of v, mark it visited and enqueue it.
The loop continues, level by level: all neighbors of the start (distance 1) will be visited before any vertices at distance 2, and so on. If the graph is not connected, once the queue empties, you pick another unvisited vertex and repeat to cover the next component.

Because BFS visits nodes in increasing order of distance from the source, it naturally finds the shortest path in an unweighted graph from the start node to any other reachable node. The first time you encounter a node in BFS, you have used the fewest edges possible to reach it. This makes BFS ideal for certain routing and distance problems. Like DFS, BFS runs in O(V + E) time with an adjacency list representation, since every vertex and every edge is processed once. However, BFS’s memory usage can be higher in breadth (it may store an entire level of nodes in the queue at once, which in a worst-case dense graph could be O(V) nodes).

Applications of BFS:

Shortest Path in Unweighted Graphs: If you need the minimum number of hops or edges between two nodes, BFS is the go-to method. For example, in a social network graph, BFS can find the degree of separation between two people (the smallest number of friend connections linking them). In routing, if all roads are considered equal length, BFS finds the route with fewest intersections.
Bipartite Graph Check: BFS can determine if a graph is bipartite (can be colored with two colors such that no edge connects same-colored vertices). The algorithm tries to color the graph during traversal: start by coloring the source one color, then all its neighbors the opposite color, then their neighbors the first color, and so on. If at any point a conflict is found (an edge connects two vertices of the same color), the graph is not bipartite. This technique uses BFS layering: one can think of it as alternating colors at each BFS level.
Flood Fill and Distance Mapping: In image processing or puzzle games (like finding areas in a grid, or the reachable territories in a board game), BFS can be used to “flood-fill” outward from a starting region, marking distances. For example, given a map with passable and impassable cells, BFS from a start cell will mark the distance (in steps) to every reachable cell, effectively computing shortest paths on a grid.
Web Crawling and Networking: A web crawler might use BFS to ensure it finds all pages reachable from a start site by following links level by level (which naturally finds pages by increasing link distance). In networking, BFS can be used in broadcasting algorithms – e.g. finding all nodes within K hops in a peer-to-peer network.
Finding Connected Components: Similar to DFS, BFS can be repeated to find all connected components of an undirected graph. You perform BFS from an unvisited node, mark the entire component, then repeat from another unvisited node until all vertices are covered. This will group the graph into its isolated clusters.

One key property of BFS is that it is complete (if there is a path from the start node to a goal node, BFS will definitely find it, and find the shortest such path). This is in contrast to DFS, which could potentially go down an infinite path and fail to find a reachable goal if not carefully managed. Because of this, BFS is often used in scenarios where finding the optimal solution (minimal steps) is more important than memory usage, whereas DFS might be chosen for scenarios where the search space is large but solutions are dense and one just needs a solution (not necessarily the shortest) with less overhead.

Topological Sort

Topological sorting is an ordering of the vertices of a directed graph such that every directed edge (u → v) goes from an earlier vertex u to a later vertex v in the order. In other words, it’s a linear sequence of the graph’s nodes where each node comes after all of its prerequisites. Topological sort only applies to directed acyclic graphs (DAGs) – if there is a cycle, no valid topological order exists (you can’t order tasks if there’s a circular dependency). A classic example is a set of tasks with dependencies: if task X depends on task Y, then Y must appear before X in the ordering. A topological sort provides a schedule that respects all such constraints. Every DAG has at least one valid topological ordering, and in fact many DAG-related problems (like scheduling or determining compilation order) rely on computing this order.

There are two common algorithms to perform topological sorting: one based on DFS and another using Kahn’s algorithm (which uses a form of BFS). Both algorithms run in linear time O(V + E). We will describe each approach:

DFS-Based Method

One way to generate a topological sort is to leverage Depth-First Search. The idea is to do a DFS traversal and take advantage of the natural backtracking order of DFS. When DFS explores a node, it recursively explores all that node’s descendants (outgoing edges) before marking the node as finished. If we record nodes in the reverse order of when they finish (i.e. push each node onto a stack when its DFS call is complete), we obtain a topological ordering.

A DFS-based topological sort algorithm typically works like this:

Initialize all vertices as unvisited. Also maintain an empty list (or stack) to record the topological order.
For each vertex that is still unvisited, perform a DFS from that vertex. During the DFS, when you finish exploring a vertex (meaning all its outgoing edges lead to already visited nodes), add that vertex to the front of the output list (or push it onto a stack).
Continue until all vertices have been processed. The list now represents a topological sort.

During this DFS, one must detect cycles: if a back edge is encountered (meaning we found an edge to a vertex that is in the current recursion stack, i.e. temporarily marked), then the graph has a cycle and a topological sort is impossible. In implementation terms, you might use three states for each node: unvisited, visiting (in recursion stack), and visited. If you ever try to visit a “visiting” node, you detected a cycle. Otherwise, the order in which nodes are added to the output list (a node is added after exploring its neighbors) ensures that each node comes before any nodes that depend on it. This approach is described in algorithms texts (such as the one by Cormen, Leiserson, Rivest, and Stein) and was originally published by Tarjan in 1976.

Use cases: Topological sorting via DFS is useful when you want to both detect cycles and produce an order. For instance, if you need to determine a valid build order of software components and also warn if there’s a cyclic dependency, a DFS-based solution can do both in one pass. In coding, one might implement this with recursion and a stack easily. However, care must be taken with recursion depth if the graph is large (to avoid stack overflow in languages without tail recursion optimization).

Kahn’s Algorithm (BFS-Based)

Kahn’s algorithm is an alternative that uses a breadth-first approach to produce a topological order. This algorithm explicitly computes in-degrees (the number of incoming edges) of vertices and repeatedly removes nodes with zero in-degree. The procedure for Kahn’s algorithm is:

Compute in-degrees: Calculate the in-degree of each vertex (how many edges come into it).
Initialize queue: Collect all vertices with in-degree 0 into a queue (or set) – these have no prerequisites and can come first.
Process queue: While the queue is not empty, remove a vertex n from the queue and add n to the topological order list. For each neighbor m of n (for each edge n → m), decrement the in-degree of m (because we are effectively “removing” the edge n → m from the graph). If any neighbor’s in-degree becomes 0, add that neighbor to the queue.
Repeat until the queue is empty. If all vertices were output, you have a valid topological sort. If there are remaining vertices with nonzero in-degree, that means the graph had a cycle (the cycle members were never freed of incoming edges).

Kahn’s algorithm will produce a topological order (not necessarily unique) for the DAG or report that none exists if a cycle is detected. It essentially peels the graph layer by layer: first all nodes with no incoming edges, then those whose prerequisites are all satisfied, and so on.

Use cases: Kahn’s method is intuitive for scenarios like scheduling. For example, consider course prerequisites: initially, take all courses with no prerequisites (in-degree 0). Those can be taken first. Once those are “removed,” some courses that had those as prerequisites now have one fewer prerequisite remaining; when some course’s prerequisites count drops to zero, it becomes available. This is exactly how Kahn’s algorithm would schedule courses. The algorithm cleanly separates the concern of cycle detection (if something’s left with in-degree > 0 at the end, you have a cycle in prerequisites). In practice, this approach is often used in systems that explicitly manage dependency counts – for instance, build systems (Make, etc.) or task schedulers might use a variant of this. One advantage of Kahn’s algorithm is that it doesn’t use recursion, so it can be easier to implement in an iterative environment and potentially can be adapted to handle dynamically added nodes (though in that case you’re really dealing with online scheduling).

Both DFS-based and Kahn’s algorithm achieve the same result: an ordering of nodes that respects all directed edges. It’s worth noting that if the graph has multiple valid topological orders, different algorithms (or different choices within the algorithms, like which zero in-degree node to pick first) may produce different orders. That’s fine as long as the partial order constraints are satisfied.

Connected Components

In many graphs, especially undirected ones, the graph may not be fully connected – it can consist of multiple separate clusters of vertices with no paths between them. Each such cluster is called a connected component. In an undirected graph, a connected component is a set of nodes where each node is reachable from any other node in that set (by some path). If an undirected graph has multiple connected components, no traversal starting from one component will ever reach another, so it’s often necessary to identify all components by restarting a traversal on unvisited nodes as needed. In directed graphs, we distinguish between general connectivity and strong connectivity. A strongly connected component (SCC) is a subset of vertices in a directed graph such that every vertex in the subset can reach every other vertex in the subset (each node is mutually reachable). This is a stricter condition – for instance, in a directed graph of web pages, an SCC would be a set of pages where each page has a path to every other (a strongly interconnected cluster of sites). Identifying connected components (undirected) or strongly connected components (directed) is a common task, useful for understanding the structure of networks and simplifying graphs by treating components as single units.

Undirected Graphs

For an undirected graph, finding connected components is straightforward with either DFS or BFS. You can start a DFS/BFS from an unvisited node, mark all nodes reachable from it, that set of nodes forms one connected component. Then you pick another unvisited node (if any) and repeat, which yields another component, and so on. The algorithm ensures that each vertex is assigned to exactly one component. The result is typically a partition of the vertex set into disjoint components.

Example: Suppose you have a friendship network where each node is a person and edges represent friendship. The connected components of this graph would correspond to isolated friend groups (nobody in one group knows anyone in another group). By running a DFS or BFS from each person who hasn’t been placed in a group yet, you can discover all such friend circles. This has real-world use in social network analysis for finding communities.

The complexity is O(V + E) for finding all components, since essentially it is performing a full traversal of the graph (visiting each edge and vertex once). Many graph algorithms begin by finding connected components as a preprocessing step – for example, if you want to check some property in each independent section of a graph or run a heavy computation on each component separately.

Directed Graphs (Strongly Connected Components)

In directed graphs, connectedness is not symmetric; one can have two vertices A and B where A can reach B but B cannot reach A. For strong connectivity, we require mutual reachability. The strongly connected components (SCCs) of a directed graph form a partition of the graph’s vertices (each vertex belongs to exactly one SCC), and if we contract each SCC into a single “super-node,” the resulting graph (called the component graph or condensation) is a DAG. Finding SCCs is important in many applications – for example, in dependency graphs, an SCC might indicate a set of interdependent tasks that form a cycle; in program analysis, SCCs of a call graph can identify groups of functions with mutual recursion.

There are two classic linear-time algorithms for finding all SCCs in a directed graph: Kosaraju’s algorithm and Tarjan’s algorithm. Both run in O(V + E) time.

Kosaraju’s Algorithm: This method uses two passes of DFS:

Perform a DFS traversal of the original directed graph, keeping track of the finish times of each vertex (the time when a vertex’s DFS call is completed). One way is to push vertices onto a stack in the order of completion (same idea as the DFS topological sort approach, but here the graph may not be acyclic).
Compute the transpose of the graph (also called the reverse graph) by reversing the direction of every edge.
Perform DFS on the transposed graph, but in the order of decreasing finish times from the first pass. In practice, you pop vertices from the stack that was filled in step 1 and start a DFS from each popped vertex (if not already visited in this second pass). Each DFS tree in this second pass yields one strongly connected component: all nodes reached in that DFS are in one SCC. Mark them as such and continue with the next popped vertex that hasn’t been visited in the second pass.

The rationale is that in the original DFS, we get a sort of ordering of influence (finish time order). Reversing edges and exploring in that order ensures that we jump into the “heads” of each SCC first. Nodes that are strongly connected will end up adjacent in the finish time ordering and will be grouped together in the second pass. Kosaraju’s algorithm is straightforward to understand and implement (essentially two DFS runs), though it does the work of DFS twice.

Tarjan’s Algorithm: Tarjan’s algorithm finds SCCs in one DFS pass. It was invented by Robert Tarjan and is a bit more intricate, using a single DFS with a stack and some bookkeeping:

As you perform DFS, each node is given a discovery time index (an incrementing counter as in typical DFS) and a low-link value. The low-link of a node is the smallest discovery index of any node reachable from it including itself via the DFS (this includes the possibility of going forward through tree edges and backward through back edges). Essentially, the low-link value tells you if there’s a path from the current node that goes back to some earlier node in the DFS.
All nodes are pushed onto a stack as they are visited (and remain on stack until they are assigned to an SCC). If a node finds that it can reach no earlier node than itself (i.e., its low-link value equals its own discovery index), then it must be the root of a strongly connected component. When this condition is found for a node u, it means u and all nodes on top of u in the stack up to itself form one SCC. The algorithm will pop the stack until it has removed u and all nodes above it, which yields the full component. Those nodes are then marked as having their SCC found, and the DFS continues.
The algorithm uses recursion or an explicit stack and maintains the stack of active nodes (those in the current DFS path) along with their low-link values. By updating low-link values across tree edges (DFS calls) and back edges (edges to ancestors on the stack), Tarjan’s method identifies components in a single sweep.

Tarjan’s algorithm is efficient and slightly more complex to implement, but it has the advantage of not needing a second pass or an explicit graph transpose. It directly identifies components on the fly.

Applications of SCCs: Finding SCCs in directed graphs has numerous applications. In compilers or dependency management, SCCs represent cyclical dependencies that might need special handling (e.g., mutually recursive modules). In websites or sociology, SCCs can indicate clusters of strongly connected entities (for example, a set of websites that all link to each other, or a tight-knit group of people in a directed friendship/follow graph). Once SCCs are identified, one can analyze the higher-level structure by looking at the DAG of components – for instance, one might collapse each SCC and then perform a topological sort on the component graph to understand overall dependencies between groups. SCC detection is also used in graph algorithms like solving 2-SAT problems (each implication graph of a 2-SAT formula is a directed graph, and the formula is satisfiable if and only if no variable and its negation are in the same SCC).

In summary, graph traversal algorithms like DFS and BFS are fundamental tools that enable us to explore graph data structures. They form the basis for solving more complex problems such as finding shortest paths, detecting cycles, ordering tasks via topological sort, and identifying connected or strongly connected components. Mastering DFS and BFS provides a strong foundation for understanding and implementing a wide range of graph algorithms and for reasoning about graph-structured data in real-world applications.