United Kingdom: +44 (0)208 088 8978

Quickly trapping guards using Set

The performance of data types is not something you are often aware of in a high-level programming language like F#. In this blog post, we have a look at how changing a single data type boosted the performance of an algorithm by an order of magnitude.

We're hiring Software Developers

Click here to find out more

I hate to admit, but I don't often think about the performance of the data types that I use when writing code. In a lot of the things we build, especially when it comes to front end applications, the data sizes we work with are not big enough for it to make a noticeable difference. Not having this on the top of my mind meant that I was slightly surprised by a suggestion my colleague made to improve my solution to one of the Advent of Code solutions. I already discussed part 1 of this puzzle in an earlier post.

A quick recap of the problem at hand:

the puzzle tracking the whereabouts of a guard navigating a lab. The puzzle input is something like this:

........#..
.#.....#...
#....#.....
..#...#....
#....^....#
.#...#.....

The guard is indicated by the ^, while the # are obstructions. The guard initially starts moving north, and will rotate 90 degrees clockwise every time he encounters an obstacle.

The initial problem is: how many squares will the guard cover before leaving the grid?

You'll find how we get to the solution of that problem in the previous blog post. In this post we deal with part 2: You want to trap the guard in a loop. In how many positions can you place an extra obstruction, so that the guard will end up walking in circles?

Creating loops

We add a loops fuction, that takes a map and a guard position, and returns whether the map contains a loop given that start position; It uses our tryMove function, that returns the new guard position based on the behaviour described above, or None when the guard leaves the map.

The loops function is a recursive function that keeps track of the previous states (Position and direction) the guard has been in. There's 3 branches here:

  1. The guard leaves the map: obviously no loops here; we return false
  2. The guard moves into a state he has already been in: we are looping! We return true
  3. The guard moves into a new position: add that position to our list, and recursively check what happens next
let loops map =

    let rec implementation previousStates state =
        match tryMove map state with
        | None -> false
        | Some nextState ->
            if (previousStates |> List.contains nextState) then
                true
            else
                implementation (state :: previousStates) nextState

    implementation []

To solve the solution, we'll create a list of new maps, each with a new block in a different position. We only need to add new blocks to the positions the guard visited in part 1; if we block another position, he'll simply walk the same route he initially did. We then filter that map based on our loops function:

If the beginning of this code block does not make sense to you, refer back to the previous post about this topic

let positions =
    List.unfold (Guard.tryMove map >> Option.map (fun s -> (s.Position, s))) guardState

let uniquePositions = positions |> List.distinct

let mapsWithLoops =
    uniquePositions
    // Generate new maps, with an obstruction (represented by true) in the given positions.
    |> List.map (fun (col, row) -> Array.updateAt row (Array.updateAt col true map[col]) map)
    |> List.filter (fun map -> Guard.loops map guardState)

printfn $"There are {mapsWithLoops.Length} maps with loops"

Performance issues!

If you try to run the full version of this code, you'll see that it solves the problem... very slowly. Solving full input given by the Advent of Code site takes around 90 seconds on my device.

If you think about it, it makes a bit of sense: the number of positions of visited in the original puzzle is in the thousands. For each of those thousands of positions, we will run the simulation again, with a new block in that position, meaning that we have to run the simulation (of possibly a few thousand steps), thousands of times. That's millions of steps; Performance matters now!

Jaz came to the rescue with a simple suggestion: replace the List you use to keep track of the positions visited with a Set. Our solution becomes:

let loops map =

    let rec implementation previousStates state =
        match tryMove map state with
        | None -> false
        | Some nextState ->
            if (previousStates |> Set.contains nextState) then
                true
            else
                implementation (Set.add state previousStates) nextState

    implementation Set.empty

All of a sudden, the solution takes only 10 seconds to run! What makes this solution an order of magnitude faster?

Time Complexity

Apart from having different properties (like a set only containing unique values), F# data types also have different storage implementation. For example, the F# List is a linked list; a chain of elements, each referring to the next. It's quite easy to implement yourself:

type MyList<'a> = {
    Value: 'a
    Next: MyList<'a> option
}

let exampleList =
    {
        Value = 12
        Next = Some {
            Value = 9
            Next = Some {
                Value = 6
                Next = None
            }
        }
    }

These different storage strategies influence how easy it is to do all kinds of things with the collection. Adding a new value to the front of a linked list is easy:

let add value list =
    {
        Value = value
        Next = Some list
    }

Adding to the back is more complex: We will have to build up the list from the beginning up:

    let rec addLast value list =
        match list.Next with
        | Some innerList ->
            { list with
                Next = Some(addLast value innerList) }
        | None ->
            { list with
                Next = Some { Next = None; Value = value } }

The time an operation on a collection takes is called its time complexity. We express this as a function of N, the number of items in the collection.

The complexity of adding at the back of the linked list is O(N): for every element in the list, adding a new one to the back becomes a bit slower, by a constant amount.

Adding to the beginning of a linked list is O(1): No matter how long the list is, if we add to the beginning of it will take the same amount of time

Let's compare the time complexity of a Set and List on the operations we used in our loops function

Operation List Set
Add O(1) O(log(N))
Contains O(N) O(log(N))

As you can see, when it comes to adding elements, a List has an advantage: No matter how long the list is, it's just as quick to add a new item. On the other hand, adding to a Set is logarithmic: the performance degrades quickly to begin with, but as the set gets larger, the time it takes to add a new element doesn't increase that much between successive calls.

Contains is more interesting: List.contains has a time complexity of O(N), making the speed very strongly dependent on the collection size. Set.contains is still O(log(N)), making it better suited for larger collections.

As you can see, a List has an advantage when adding items to a large collection. When it comes to the Contains operation on larger collections, there is a much larger difference, and a Set is a much better option.

When working with large amounts of data, you'll probably want to take a bit longer to think about the data types you want to use than you would for small amounts. On Microsoft learn, you'll find an overview of all F# data types and their properties, including a table of their operations and the time complexity.