
Testing for breaking changes

We can use property-based testing libraries to prove or disprove breaking changes in code. Isaac shows how we used FsCheck to help prove a breaking change in LINQ that was introduced in the move from .NET Framework to .NET Core.

I wrote last summer about using the excellent FsCheck tool to give us extra confidence when refactoring, by ensuring that external behaviour remains consistent. Well, it turns out that in the move to .NET Core, a number of breaking changes were accidentally introduced in LINQ. I spent some time using FsCheck to "prove" the breaking changes, but never wrote up my findings. So, here they are!

Breaking Changes in LINQ

Whilst LINQ is essentially a set of features, baked into C#, whose roots are firmly in functional programming, this doesn't prevent people from using it in ways that perhaps weren't expected. One reason is that .NET doesn't guarantee purity of functions (this is true of both F# and C#), which means that it's perfectly possible to write code which, in the process of execution, performs some side-effect, such as writing to a database table.

In F#, there's a dedicated "side-effectful" version of map called iter, which is designed explicitly for "dead end" operations over collections that don't return anything. This doesn't prevent the case above, but it does at least try to support separating pure and impure collection operations.
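As a minimal sketch of that distinction (the list and the printed message are purely illustrative), map returns a new collection, whilst iter runs an action for each item and returns unit:

```fsharp
let numbers = [ 1; 2; 3 ]

// map is for transformations: it returns a new list.
let doubled = numbers |> List.map (fun n -> n * 2)

// iter is for side effects: it returns unit, so there is no result to use by mistake.
let iterResult = numbers |> List.iter (fun n -> printfn "Saving %d" n)
```

Nothing stops you putting a side effect inside the function passed to map, but reaching for iter at least signals the intent.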

Unfortunately, in the move from .NET Framework to .NET Core, one of the many optimisations in the base class library changed the number of items that LINQ internally iterates over when composing the OrderBy and FirstOrDefault methods. In other words (this is taken directly from the GitHub issue):

"We have a bit of code to reserve an account from an available pool, which looks like this:

var account = accounts
        .OrderBy(x => x.UsageCount)
        .FirstOrDefault(x => x.TryReserve(token));

After porting our code from .NET Framework to .NET Core, this now invokes the predicate method for every item in the list. In practise, this code now reserves ALL accounts and then returns the first."

Whoops! Imagine if this was a destructive change e.g. delete the first order in database that meets some condition. Sorry - now you've deleted all your orders.
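To see which behaviour your own runtime has, a small sketch along these lines counts how often the predicate runs. The Account record and the tryReserve function are hypothetical stand-ins for the types in the issue, and the count printed will depend on the .NET version you run it on, so no expected output is shown:

```fsharp
open System.Linq

// Hypothetical stand-in for the accounts in the quoted issue.
type Account = { Name: string; UsageCount: int }

let accounts =
    [ { Name = "a"; UsageCount = 3 }
      { Name = "b"; UsageCount = 1 }
      { Name = "c"; UsageCount = 2 } ]

// A ref cell so that the function below can update the count.
let predicateCalls = ref 0

// Hypothetical TryReserve: always succeeds, and records that it was invoked.
let tryReserve (account: Account) =
    predicateCalls.Value <- predicateCalls.Value + 1
    true

let reserved =
    accounts
        .OrderBy(fun a -> a.UsageCount)
        .FirstOrDefault(fun a -> tryReserve a)

printfn "Reserved %s after %d predicate call(s)" reserved.Name predicateCalls.Value
```

The account returned is the same either way; only the number of predicate calls reveals whether every item was visited.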

Property-based testing to the rescue

Obviously this is not a great place to be, but I wanted to try to create a set of exhaustive tests to see if (a) we could prove this issue, and (b) if there were any other methods that had been affected by similar optimisations.

At a high level, this means comparing the behaviour of the .NET Framework and .NET Core LINQ implementations. In terms of behaviour, I was interested in two things:

  1. The result of calling both methods with the same input i.e. do they both give the same outputs?
  2. The number of calls to any higher order functions supplied to both methods i.e. do they make the same number of calls, or has this been changed?

Testing through building blocks

We'll start with a basic helper function that we can use later on:

/// A helper function that will track calls to any higher order function passed into another
/// function.
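let trackCalls func higherOrderFunc data =
    let key = obj()
    // A ref cell rather than `let mutable`: F# closures cannot capture mutable locals.
    let count = ref 0
    let higherOrderFunc input =
        lock key (fun () -> count.Value <- count.Value + 1)
        higherOrderFunc input
    // Run the function first, so the count is only read once every call has been made.
    let result = func(data, higherOrderFunc)
    {| Result = result; CallCount = count.Value |}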

This function takes in some function, a higher order function that is used by the function, and some data that the function operates on. For example:

[1 .. 5].Select(fun n -> n * 2)

In this case:

  • [1 .. 5] is the data
  • Select is the function
  • fun n -> n * 2 is the higher order function that Select will call on every item

trackCalls silently decorates the higher order function with a counter, to monitor how many times the higher order function has been called. It then returns back out the result of the function, and the number of calls:

(Functions.trackCalls (Enumerable.Select >> Seq.toArray) (fun n -> n * 2) [ 1 .. 5 ])

//  { CallCount = 5
//    Result = [|2; 4; 6; 8; 10|] }

It's important to include the toArray call - this forces LINQ to fully evaluate the call across all data.
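A short sketch of why this matters (the calls counter and doubleIt function are illustrative names, not part of the original code): building a Select query runs nothing, and the higher order function is only invoked once the sequence is materialised.

```fsharp
open System.Linq

// A ref cell so that the function below can update the count.
let calls = ref 0

let doubleIt n =
    calls.Value <- calls.Value + 1
    n * 2

// Building the query runs nothing yet: Select is lazily evaluated.
let query = [ 1 .. 5 ].Select(fun n -> doubleIt n)
let callsBefore = calls.Value

// Materialising the sequence forces Select to visit every item.
let results = Seq.toArray query
let callsAfter = calls.Value
```

Without the toArray step, the count is read before any calls are made, so it comes back as zero whichever implementation is under test.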

Creating a test function

With this helper, we can now create a generic "test" function:

let testTwoFuncs firstFunc secondFunc higherOrderFunc inputData =
    let actual = Functions.trackCalls firstFunc higherOrderFunc inputData
    let expected = Functions.trackCalls secondFunc higherOrderFunc inputData
    actual = expected

In other words, given two functions (in our case, a netcore and netfx implementation of some code), some higher order function and some input data, check that both functions return the same result and make the same number of calls to the higher order function.

Interestingly, we could rename higherOrderFunc and inputData as simply argOne and argTwo - because F# automatically genericises everything for us, this would work for any function that simply takes in two arguments in tupled form.
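To illustrate that the comparison is sensitive to call counts as well as results, here is a self-contained sketch that uses condensed copies of the two helpers against a pair of hypothetical functions (shortCircuiting and exhaustive are made up for this example and are not the LINQ implementations). Both return the same value, but one tests every item first:

```fsharp
// Condensed copies of the helpers above (locking omitted for brevity).
let trackCalls func higherOrderFunc data =
    let count = ref 0
    let counted input =
        count.Value <- count.Value + 1
        higherOrderFunc input
    let result = func (data, counted)
    {| Result = result; CallCount = count.Value |}

let testTwoFuncs firstFunc secondFunc higherOrderFunc inputData =
    trackCalls firstFunc higherOrderFunc inputData = trackCalls secondFunc higherOrderFunc inputData

// Hypothetical implementation that stops at the first match.
let shortCircuiting (data: int list, predicate: int -> bool) =
    List.tryFind predicate data

// Hypothetical implementation that tests every item before picking the first match.
let exhaustive (data: int list, predicate: int -> bool) =
    data |> List.filter predicate |> List.tryHead

// Same function on both sides: results and call counts match.
let sameBehaviour = testTwoFuncs shortCircuiting shortCircuiting (fun n -> n > 2) [ 1 .. 5 ]

// Same result (Some 3), but three predicate calls versus five.
let differentCalls = testTwoFuncs shortCircuiting exhaustive (fun n -> n > 2) [ 1 .. 5 ]
```

Here sameBehaviour is true and differentCalls is false, even though both functions find the same item.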

Now that we've done this, we can test out both implementations of a basic LINQ function - in this case Select:

testTwoFuncs
    (Enumerable.Select >> Seq.toArray) // net core implementation
    (OldEnumerable.Select >> Seq.toArray) // net fx implementation
    (fun x -> x * 2) // some arbitrary higher order function to use in Select
    [| 1 .. 10 |] // input data

This call will return true - for an input dataset of 1 to 10 with a higher order function that doubles the numbers, the NetCore and NetFx versions of Select both return the same result set and make the same number of calls.

OldEnumerable is a module I have created which is a port of a subset of the original netfx LINQ implementation.

Introducing FsCheck

Of course, now that we've done this, we can generalise this test by omitting the final two parameters (the higher order function and the input data set) and letting FsCheck test against other datasets (and against other random higher order functions!):

Check.Quick(
    testTwoFuncs
        (Enumerable.Select >> Seq.toArray)
        (OldEnumerable.Select >> Seq.toArray))

//Ok, passed 100 tests.

In other words, FsCheck has tried 100 combinations of random higher order functions (yes, FsCheck has generated functions for us!) and input data, and confirmed that the behaviour of both versions of Select is always the same.
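To give a feel for what is happening under the hood, here is a hand-rolled stand-in that does not use FsCheck at all. It assumes a fixed-seed System.Random in place of FsCheck's generators, only generates simple linear functions, and compares LINQ's Select against F#'s Array.map rather than against the netfx port. All of the names are illustrative:

```fsharp
open System
open System.Linq

// Counts calls to the higher order function, in the spirit of trackCalls.
let countCalls run (f: int -> int) (data: int[]) =
    let count = ref 0
    let counted n =
        count.Value <- count.Value + 1
        f n
    let result: int[] = run data counted
    result, count.Value

// Two implementations to compare: LINQ's Select and F#'s Array.map.
let viaLinq (data: int[]) (f: int -> int) = data.Select(fun n -> f n).ToArray()
let viaArrayMap (data: int[]) (f: int -> int) = Array.map f data

let random = Random(42)

let allAgree =
    Seq.init 100 (fun _ ->
        // Random input data and a random linear function stand in for FsCheck's generators.
        let data = Array.init (random.Next 20) (fun _ -> random.Next(-100, 100))
        let a, b = random.Next(-5, 5), random.Next(-5, 5)
        let f n = a * n + b
        countCalls viaLinq f data = countCalls viaArrayMap f data)
    |> Seq.forall id
```

FsCheck does considerably more than this sketch, such as generating arbitrary functions and shrinking failing inputs down to a minimal case, which is why it is worth using the real thing.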


In this post I explained the breaking change that was introduced in .NET Core in LINQ. I then showed how we can write a simple function to decorate arbitrary functions for the purposes of call counting, before looking at how to compose this together with standard LINQ methods for the purposes of comparison.

We've not actually looked at the bug though - that'll come in my next post in this series!

Until then, have (fun -> _).