I hate to admit, but I don't often think about the performance of the data types that I use when writing code. In a lot of the things we build, especially when it comes to front end applications, the data sizes we work with are not big enough for it to make a noticeable difference. Not having this on the top of my mind meant that I was slightly surprised by a suggestion my colleague made to improve my solution to one of the Advent of Code solutions. I already discussed part 1 of this puzzle in an earlier post.
A quick recap of the problem at hand:
the puzzle tracking the whereabouts of a guard navigating a lab. The puzzle input is something like this:
........#.. .#.....#... #....#..... ..#...#.... #....^....# .#...#.....
The guard is indicated by the ^, while the # are obstructions. The guard initially starts moving north, and will rotate 90 degrees clockwise every time he encounters an obstacle.
The initial problem is: how many squares will the guard cover before leaving the grid?
You'll find how we get to the solution of that problem in the previous blog post. In this post we deal with part 2: You want to trap the guard in a loop. In how many positions can you place an extra obstruction, so that the guard will end up walking in circles?
Creating loops
We add a loops
fuction, that takes a map and a guard position, and returns whether the map contains a loop given that start position; It uses our tryMove function, that returns the new guard position based on the behaviour described above, or None when the guard leaves the map.
The loops function is a recursive function that keeps track of the previous states (Position and direction) the guard has been in. There's 3 branches here:
- The guard leaves the map: obviously no loops here; we return false
- The guard moves into a state he has already been in: we are looping! We return true
- The guard moves into a new position: add that position to our list, and recursively check what happens next
let loops map =
let rec implementation previousStates state =
match tryMove map state with
| None -> false
| Some nextState ->
if (previousStates |> List.contains nextState) then
true
else
implementation (state :: previousStates) nextState
implementation []
To solve the solution, we'll create a list of new maps, each with a new block in a different position. We only need to add new blocks to the positions the guard visited in part 1; if we block another position, he'll simply walk the same route he initially did. We then filter that map based on our loops
function:
If the beginning of this code block does not make sense to you, refer back to the previous post about this topic
let positions =
List.unfold (Guard.tryMove map >> Option.map (fun s -> (s.Position, s))) guardState
let uniquePositions = positions |> List.distinct
let mapsWithLoops =
uniquePositions
// Generate new maps, with an obstruction (represented by true) in the given positions.
|> List.map (fun (col, row) -> Array.updateAt row (Array.updateAt col true map[col]) map)
|> List.filter (fun map -> Guard.loops map guardState)
printfn $"There are {mapsWithLoops.Length} maps with loops"
Performance issues!
If you try to run the full version of this code, you'll see that it solves the problem... very slowly. Solving full input given by the Advent of Code site takes around 90 seconds on my device.
If you think about it, it makes a bit of sense: the number of positions of visited in the original puzzle is in the thousands. For each of those thousands of positions, we will run the simulation again, with a new block in that position, meaning that we have to run the simulation (of possibly a few thousand steps), thousands of times. That's millions of steps; Performance matters now!
Jaz came to the rescue with a simple suggestion: replace the List
you use to keep track of the positions visited with a Set
. Our solution becomes:
let loops map =
let rec implementation previousStates state =
match tryMove map state with
| None -> false
| Some nextState ->
if (previousStates |> Set.contains nextState) then
true
else
implementation (Set.add state previousStates) nextState
implementation Set.empty
All of a sudden, the solution takes only 10 seconds to run! What makes this solution an order of magnitude faster?
Time Complexity
Apart from having different properties (like a set only containing unique values), F# data types also have different storage implementation. For example, the F# List
is a linked list; a chain of elements, each referring to the next. It's quite easy to implement yourself:
type MyList<'a> = {
Value: 'a
Next: MyList<'a> option
}
let exampleList =
{
Value = 12
Next = Some {
Value = 9
Next = Some {
Value = 6
Next = None
}
}
}
These different storage strategies influence how easy it is to do all kinds of things with the collection. Adding a new value to the front of a linked list is easy:
let add value list =
{
Value = value
Next = Some list
}
Adding to the back is more complex: We will have to build up the list from the beginning up:
let rec addLast value list =
match list.Next with
| Some innerList ->
{ list with
Next = Some(addLast value innerList) }
| None ->
{ list with
Next = Some { Next = None; Value = value } }
The time an operation on a collection takes is called its time complexity. We express this as a function of N, the number of items in the collection.
The complexity of adding at the back of the linked list is O(N): for every element in the list, adding a new one to the back becomes a bit slower, by a constant amount.
Adding to the beginning of a linked list is O(1): No matter how long the list is, if we add to the beginning of it will take the same amount of time
Let's compare the time complexity of a Set
and List
on the operations we used in our loops
function
Operation | List | Set |
---|---|---|
Add |
O(1) | O(log(N)) |
Contains |
O(N) | O(log(N)) |
As you can see, when it comes to adding elements, a List
has an advantage: No matter how long the list is, it's just as quick to add a new item. On the other hand, adding to a Set
is logarithmic: the performance degrades quickly to begin with, but as the set gets larger, the time it takes to add a new element doesn't increase that much between successive calls.
Contains
is more interesting: List.contains
has a time complexity of O(N), making the speed very strongly dependent on the collection size. Set.contains
is still O(log(N)), making it better suited for larger collections.
As you can see, a List
has an advantage when adding items to a large collection. When it comes to the Contains
operation on larger collections, there is a much larger difference, and a Set
is a much better option.
When working with large amounts of data, you'll probably want to take a bit longer to think about the data types you want to use than you would for small amounts. On Microsoft learn, you'll find an overview of all F# data types and their properties, including a table of their operations and the time complexity.