United Kingdom: +44 (0)208 088 8978

FP-style breadth-first search

Interested in using functional programming but wondering how to translate your favourite algorithms to work with immutable data? Matt shows how to write a functional breadth-first search in this week's blog post.

We're hiring Software Developers

Click here to find out more

I've enjoyed tackling Advent of Code problems last year and this year (I'm hoping to save Christmas 2024 in early 2025 😅). One thing that you notice after completing many of them is that variants of the same algorithms are often used multiple times in one year. One such algorithm is breadth-first search.

Suppose that you have a graph, a node in that graph (the "start" node) and a property that some nodes satisfy. Breadth-first search is an algorithm that you can use to find the closest node to the start node that satisfies the property. If there are multiple at the same distance from the start node, one is chosen arbitrarily. The basic idea is that you start at the start node, then check all of its neighbours, then all of their neighbours that you haven't already checked, etc. Starting at the entrance to a maze and trying to find the shortest path to the exit using breadth-first search, the algorithm proceeds as illustrated below.

Animation of breadth-first search being used to find the shortest path from a maze entrance to exit

Animation sourced from Wikimedia Commons, created by NeptoTech, licensed under the attribution-sharealike 4.0 international Creative Commons license.

As I already mentioned, variants of the basic idea are useful. For example, you could use a variant of breadth-first search to find the distance between the start node and all nodes it is connected to: start with the start node and keep checking nodes at the next distance until you run out of nodes.

If you're learning about algorithms such as breadth-first search for the first time, chances are that the implementation that you first encounter will be imperative and use mutable data structures. For example, it's common to use a mutable queue, as in the following Python code (thanks Claude 3):

from collections import deque

def bfs(graph, start_node):
    visited = set()
    queue = deque([(start_node, 0)])  # (node, distance)
    visited.add(start_node)

    distances = {start_node: 0}

    while queue:
        node, distance = queue.popleft()

        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
                distances[neighbor] = distance + 1

    return distances
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['F'],
    'F': []
}

distances = bfs(graph, 'A')
print(distances)  # Output: {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2, 'F': 2}

This queue-based algorithm is illustrated in the following animation (using a different graph than in the Python code):

Animation showing nodes being iteratively queued and checked during depth-first search

Animation sourced from Wikimedia Commons, created by Blake Matheny, licensed under the attribution-sharealike 3.0 unported Creative Commons license.

However, if you're reading this post, you're probably interested in functional programming and know that FP users tend to prefer to use immutable data where possible. You could of course use a recursive function that passes an immutable set and immutable queue to the next invocation. That's perfectly viable, but the concept of an immutable queue feels somewhat weird. Let's look at an alternative approach.

let bfs graph startNode =
    (set [ startNode ], set [ startNode ])
    |> Array.unfold (fun (nodesAtPrevDistance, visitedNodes) ->
        if Set.isEmpty nodesAtPrevDistance then
            None
        else
            let nextNodes =
                nodesAtPrevDistance
                |> Seq.map (fun n -> graph |> Map.find n)
                |> Set.unionMany
                |> fun ns -> Set.difference ns visitedNodes

            let visitedNow = Set.union visitedNodes nextNodes
            Some(nodesAtPrevDistance, (nextNodes, visitedNow)))

The implementation uses unfold, which takes the previous state and calls the given function until None is returned. Check out Joost's previous post for a deep dive.

The state that we use consists of two sets: the nodes at the previous distance and the already visited nodes. To start with, they both only contain the start node. In each iteration, we find those at the next distance (neighbours of those at the previous distance that haven't already been visited), add the nodes at the previous distance to the array that we're building up, and pass through the nodes at the next distance and those that we've now visited as the next state. We stop if there weren't any nodes at the previous distance.

As you can see, it returns the correct data:

let graph =
    Map.empty
    |> Map.add 'A' (set [ 'B'; 'C' ])
    |> Map.add 'B' (set [ 'D'; 'E' ])
    |> Map.add 'C' (set [ 'F' ])
    |> Map.add 'D' (set [])
    |> Map.add 'E' (set [ 'F' ])
    |> Map.add 'F' (set [])

bfs graph 'A' // Returns [| set [ 'A' ]; set [ 'B'; 'C' ]; set [ 'D'; 'E'; 'F' ] |]

The main benefits of the imperative code are beginner-friendliness and performance. There are some performance optimisations that you can make to the functional code (for example using a read-only dictionary rather than a map), but I think it's fair to say that it's easier to get your head round a queue than Array.unfold.

The main thing that the functional code has going for it is that it better matches the conceptual description of the algorithm—check the nodes at distance 1, then those at distance 2, etc. Working at this higher level of abstraction can make it easier to work out how to adapt the algorithm to certain problems. For example, maybe you need to find all nodes that are the same distance from the start node as another particular node and it takes too long to keep calculating further distances. Adapting the functional code to make that change would be easy, but it would take a bit more thought and care with the imperative code.

To some extent it's a matter of taste; F# is a pragmatic language that lets you write imperative code, and sometimes it makes sense to do so. But learning how to do things without mutation can be a fun challenge, and using functional approaches can make some problems easier to tackle.