Graph-Based Search Algorithms

In contrast to searching within linear structures or hierarchical trees, graph search algorithms deal with traversing a general graph (which could be thought of as a network of connected nodes). Two fundamental graph search strategies are Breadth-First Search (BFS) and Depth-First Search (DFS). These algorithms systematically explore nodes and edges of a graph to find a target or to traverse all reachable nodes from a starting point. While they are often used for “traversal” (visiting all nodes) rather than direct key-based lookup, they are essential whenever the data forms a graph or network.

This article is all about searching inside Data Structures. For a more general article about Searching Algorithms, go to the main article: Search Algorithms: A Comprehensive Guide.

Depth-First Search (DFS)

Depth-First Search explores a graph by going as deep as possible along one path before backtracking. It uses a stack implicitly or explicitly: it picks a starting node, explores one neighbor, then a neighbor of that neighbor, and so on, diving deeper until it can’t continue, at which point it backtracks and explores alternative branches.

How it works: There are two common implementations of DFS – recursive (which uses the call stack) or iterative (using an explicit stack data structure):

Start at the source node (the node where the search begins).
Mark it as visited (to avoid revisiting nodes and possibly looping infinitely on cycles).
For each adjacent neighbor of this node, if it has not been visited, go deeper into that neighbor (recurse or push it onto a stack to process next).
Continue this process: from that neighbor, visit an unvisited neighbor, and so forth.
If you reach a node that has no unvisited neighbors, backtrack to the last node that had unexplored neighbors and continue with the next neighbor.
Repeat until all reachable nodes are visited or the target of the search is found.

In essence, DFS follows one path until it dead-ends, then backtracks and tries another path. For example, if you are solving a maze by hand, a DFS approach is like always taking a turn and following a corridor as far as it goes, only backtracking when you hit a dead end.

Use in searching for a target: If you are looking for a particular node with some property, DFS can be used to find it. However, DFS does not guarantee the shortest path to the target, but it will find a path if one exists (assuming the graph is finite and connected). You may need to traverse a lot of nodes before luckily hitting the target, depending on the structure and where the target is.

Complexity: DFS runs in O(V + E) time, where V is the number of vertices (nodes) and E is the number of edges in the graph. This is because in the worst case DFS will traverse every edge and vertex once. The space complexity is O(V) in the worst case (because of the recursion stack or explicit stack usage). DFS can handle very large graphs as long as it has enough memory for recursion or stack.

Use cases: DFS is a general technique used in many graph algorithms and scenarios:

Pathfinding and puzzle solving: While DFS might not always give the shortest path, it is used in scenarios like puzzle solvers (e.g., depth-first backtracking algorithms for Sudoku or mazes) where you need to exhaustively search through configurations.
Topological sorting: In directed acyclic graphs (DAGs), DFS is used to perform a topological sort by finishing times of recursion (Kahn’s algorithm or DFS-based algorithm).
Detecting cycles: By doing a DFS, you can detect back edges which indicate cycles in a graph. For example, DFS is used to detect cycles in dependency graphs or to find strongly connected components (Tarjan’s and Kosaraju’s algorithms leverage DFS).
Tree traversal: DFS is essentially the family of tree traversal algorithms (preorder, inorder, postorder for binary trees are DFS variants).
Connectivity and component identification: In an undirected graph, performing DFS from a node will visit its entire connected component. If you want to find all connected components, you can run DFS from each unvisited node and mark the component.

In summary, DFS is a fundamental tool for exploring graphs. It’s memory-efficient (only needs to store the stack of nodes on the current path) and straightforward to implement. However, if you need the shortest path in an unweighted graph or any notion of minimal steps, DFS might not be the best choice (you’d use BFS for that, described next). DFS, by virtue of going deep, can get “trapped” going down a long path that doesn’t lead to the goal before exploring shorter alternatives.

Breadth-First Search (BFS)

Breadth-First Search explores a graph in layers. It starts at a source node and first visits all nodes at distance 1 (all immediate neighbors), then all nodes at distance 2, and so on. BFS uses a queue to manage the frontier of exploration, ensuring that nodes are explored in increasing order of their distance from the start.

How it works:

Start at the source node and mark it as visited. Enqueue it.
While the queue is not empty, dequeue the front node u.
For each neighbor v of u that has not been visited, mark v as visited and enqueue v.
Continue until the queue is empty or until you find the target node (if doing a search for a specific target).

By using a queue, BFS ensures that you first handle the starting node, then all its neighbors, then the neighbors’ neighbors, and so on. This results in a layered exploration:

First layer: start node.
Second layer: all nodes one edge away from start.
Third layer: all nodes two edges away, etc.

If you visualize a graph as a ripple or wave emanating from the start node, BFS is like a wavefront that moves outward uniformly.

Key property – shortest paths: In an unweighted graph (where each edge has equal cost), BFS naturally finds the shortest path from the start node to any other node. The first time you encounter a node in BFS, you have reached it via the shortest possible route (fewest edges) from the start. This makes BFS the algorithm of choice for shortest-path problems in unweighted networks. For example, in a social network graph, BFS from a person will find all friends at distance 1 (direct friends), then distance 2 (friends of friends), etc. If you are looking for the degree of separation between two people, BFS will find it in the minimum number of hops.

Complexity: Like DFS, BFS runs in O(V + E) time, as it will end up exploring each vertex and each edge in the worst case. Space complexity is O(V) due to the queue and visited set (or boolean array). BFS can also handle large graphs, but if the graph is very wide (each node has many neighbors), the queue can grow quite large at peak (in the worst case, the last layer might have O(V) nodes in the queue).

Use cases: BFS is used whenever layer-by-layer exploration is needed or when shortest paths in terms of edge count are desired:

Shortest path in unweighted graphs: As mentioned, BFS will give you the minimum number of steps to reach a target. For example, solving a puzzle where you can make moves and each move has equal cost – BFS can find the solution in the fewest moves. A classic application is finding the shortest path in a maze (each step to an adjacent cell is one move).
Networking and routing: In network routing (when all links are considered equal for simplicity), BFS can be used to find the smallest number of hops between two routers. Also, concepts like “broadcast storm” in networking simulate a BFS-like propagation when a message is flooded.
Level order traversal of trees: Printing a binary tree level by level is exactly BFS on the tree (the tree is a special case of a graph).
Finding connected components in undirected graphs: If you run BFS from an unvisited node and mark everything reachable, you identify one connected component. Repeating this will label all components.
Web crawling: A web crawler that tries to stay shallow (collect all pages within one link from a start, then two links away, etc.) could use BFS to avoid going deep into one site before exploring others.

In summary, BFS and DFS are complementary strategies. BFS is great for shortest paths and layer-wise processing, while DFS is great for exhaustive search and depth-based tasks. Both ensure that every node is eventually explored (if the graph is connected or in each connected component separately), but the order of exploration differs, which gives them different practical applications. When using these algorithms, it’s important to handle marking of visited nodes properly, especially in graphs with cycles, to avoid infinite loops. Typically, a boolean visited structure or marking mechanism is used.

Path traversal and Pathfinding Graphs

Searching through graphs is directly and intrinsically related to traversing and pathfinding through graphs. You can find more about these techniques and algorithms in the following articles: