What are immutable data structures?
We use data structures like lists, arrays, sets etc. to store and organise our data. Immutable data structures are ones that cannot be changed once they are created, meaning that the values inside them can't be added, removed or changed. Instead of changing the data structure we make a new version of the data structure which is a separate value. This approach combines well with the functional programming style. They can also be referred to as purely functional or persistent data structures.
Performance and structural sharing
Doesn't it take a long time to make a whole new version of a data structure every time you need a change? Not necessarily.
Immutable data structures are usually designed so that common operations can be performed quite quickly. When a change is made, a new version of the data structure is created by referring to the old version, or part of it, and adding information about what changed. Most of the previous structure is used in the new structure. This is known as structural sharing. This also means that all of the values aren't physically copied into the new version, so this is much more memory efficient then creating a full copy.
Note that structural sharing only works because the structures are immutable. If you could actually change the first version of a structure, it would have a knock-on effect on all of the later versions and it would be very hard to tell exactly what you were changing and in which data structures.
Lists in F#
An example of an immutable data structure in F# is the list, a linear collection like an array. Each item in the list also has a reference to the next item, unless it is the last item. Because it's immutable, you can't change any part of it after creating it. We can draw the list
[ 2; 3; 4 ] with arrows showing the direction that we can travel as we read the list:
A 2 -> 3 -> 4
This list, which I'll call
A, is a reference to the first item in the list (
2), which in turn provides a reference to the next item and so on, until the end. Now let's add
1 to the start of
A and call the new list
B A 1 -> 2 -> 3 -> 4
Here, we created
B by creating a single item that points to the list
A as the next item, which is one step and therefore very quick. With that we now have two lists,
A = [ 2; 3; 4 ] and
B = [ 1; 2; 3; 4 ], they appear to be two completely separate lists in our code that we can use independently, but privately they actually share most of their structure.
What if we want to add
5 to the end of the list
A? Because we can't actually change
A, we can't add a new pointer from the last item. So now we have to create a whole new list with no structural sharing:
A 2 -> 3 -> 4 C 2 -> 3 -> 4 -> 5
This process is slow and uses more memory because it has to go through each element in
A and copy it into a new list. This is a case where an immutable data structure is optimised for certain operations, but not all of them. When using very large lists in F#, you should be aware of which operations are fast and which are slow.
F# lists use the simplest possible method of structural sharing. Other immutable data structures like maps and sets also use structural sharing in a much more complicated way, but the general principle is the same: use what you can from the old structure and refer to it from the new structure.
Advantages of immutable data structures
Let's consider two advantages of working with immutable data structures.
The most important benefit of immutable data structures comes from the general benefit of avoiding mutation: it's possible to understand smaller pieces of code separately from the context that they run in. If mutable data structures are passed around your functions, then the behaviour of one function may affect the behaviour of another function elsewhere in the code, and you need to consider them both together to understand the flow of data. With immutable data structures you only need to look at what goes into a function and what comes out. You can consider it completely on its own.
Safety and efficiency in parallel code
Since immutable data structures can't be changed, there is no risk of one thread accessing a value at the same time as it is changed, and there is no need to synchronize reads and writes to prevent conflicts. This means that references to them can be safely passed between threads running in parallel and each thread can read them directly. As a result, if you're using immutable data structures by default you can usually safely parallelise code where it might help with performance. In F# there are functions that allow you to quickly parallelise existing functions. You could replace
Array.Parallel.map so that instead of processing each item in sequence, you can process many at once and use all of your CPU cores to complete the work faster:
data |> Array.Parallel.map crunchSomeNumbers