Don’t persist GetHashCode() in .NET!

Isaac explains the risks associated with persisting a hash code in .NET.

The GetHashCode() method is one of the core methods defined on System.Object, which means every object in .NET can generate a hash code for itself.

let x = "Customer 123"
let xHash = x.GetHashCode()

Let's imagine a situation where you have a calculation against some object - let's say a customer's credit rating. This is an expensive calculation, so you decide it makes sense to cache it in a persistent store e.g. Azure Tables, Redis or SQL. You then decide to use the hash code of the object as the key for the lookup:

/// Naive implementation to check cache before calculating credit score.
/// (Cache represents whichever persistent store you chose.)
let tryGetCustomerRating customer calculateRating =
    // Key is the hash code; box lets us call GetHashCode() on an unannotated argument
    let key = (box customer).GetHashCode()
    match Cache.tryGet key with
    | Some rating -> rating
    | None ->
        let rating = calculateRating customer
        Cache.put key rating
        rating

This looks totally reasonable. Yet the problem is one that you won't see initially: GetHashCode() is deliberately not deterministic across app domains. That is, each time you run your application, GetHashCode() can (and, for strings as shown below, will) generate a different value for the same object.

> let x = "test";;
val x: string = "test"

> let xHash = x.GetHashCode();;
val xHash: int = -154387121

> let xHash = x.GetHashCode();;
val xHash: int = -154387121 // consistent in same FSI session

> exit 0;; // restart FSI

> let x = "test";;
val x: string = "test"

> let xHash = x.GetHashCode();;
val xHash: int = 143179770 // oh look, new value!

In .NET Framework, this behaviour was slightly different: GetHashCode() appeared to be deterministic, and would generally give back the same value across app domains for the same .NET Framework version and CPU architecture. Even so, I recall still having issues with this on one project I worked on.

In .NET / .NET Core, this behaviour was changed so that every app domain will give back different hash codes. Hopefully this means people will notice the problem much more quickly (e.g. whilst in development rather than after releasing into production!), which can only be a good thing.

The official documentation on GetHashCode() makes this very clear:

.NET does not guarantee the default implementation of the GetHashCode method, and the value this method returns may differ between .NET implementations, such as different versions of .NET Framework and .NET Core, and platforms, such as 32-bit and 64-bit platforms.

You should never persist or use a hash code outside the application domain in which it was created, because the same object may hash [differently] across application domains, processes, and platforms.

A hash code is intended for efficient insertion and lookup in collections that are based on a hash table. A hash code is not a permanent value. For this reason:

  • Do not serialize hash code values or store them in databases.
  • Do not use the hash code as the key to retrieve an object from a keyed collection.
  • Do not send hash codes across application domains or processes. In some cases, hash codes may be computed on a per-process or per-application domain basis.
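If you really do need a key derived from an object's content rather than an ID you already own, one option is a cryptographic digest instead of GetHashCode(). This isn't something the documentation above prescribes; it's just a minimal sketch, assuming the content can be represented as a string. The stableKey function is a hypothetical helper of my own naming, and SHA-256 over the UTF-8 bytes is one reasonable choice among several.

```fsharp
open System.Security.Cryptography
open System.Text

/// Hypothetical helper: derives a key from the UTF-8 bytes of some text using SHA-256.
/// The result depends only on the input bytes, not on the process that computed it.
let stableKey (text: string) : string =
    let bytes = Encoding.UTF8.GetBytes(text)
    use sha = SHA256.Create()
    sha.ComputeHash(bytes)
    |> Array.map (fun b -> b.ToString("x2")) // two lowercase hex digits per byte
    |> String.concat ""

// Produces the same 64-character hex string on every run.
printfn "%s" (stableKey "Customer 123")
```

The trade-off is a longer key (64 hex characters rather than a single int) and a slightly more expensive calculation, which is usually a fair price for something you can safely persist.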

Summary

If you're trying to create a persistent key/value lookup, do not use GetHashCode() for the key, as it is not guaranteed to be deterministic or stable between runs. Instead, use a stable key that you own and can safely persist across application domains.
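To make that concrete, here is a sketch of the earlier cache lookup reworked around a stable key. Everything here is an assumption for illustration: the Customer record and its Id field are hypothetical, the "credit-rating:" key prefix is arbitrary, and the Cache module is an in-memory stand-in for whichever persistent store you actually use.

```fsharp
open System.Collections.Generic

// Hypothetical customer type: the Id is a value we own and that never changes.
type Customer = { Id: string; Name: string }

// In-memory stand-in for a persistent store such as Azure Tables, Redis or SQL.
module Cache =
    let private store = Dictionary<string, int>()

    let tryGet (key: string) =
        match store.TryGetValue(key) with
        | true, rating -> Some rating
        | false, _ -> None

    let put (key: string) (rating: int) = store.[key] <- rating

/// Checks the cache before calculating the credit score, keyed on the stable Id.
let getCustomerRating (customer: Customer) (calculateRating: Customer -> int) =
    let key = "credit-rating:" + customer.Id // Key is built from the stable Id
    match Cache.tryGet key with
    | Some rating -> rating
    | None ->
        let rating = calculateRating customer
        Cache.put key rating
        rating

// Usage: the expensive calculation only runs on a cache miss.
let customer = { Id = "123"; Name = "Customer 123" }
let rating = getCustomerRating customer (fun _ -> 750)
```

Because the key is built from data you control, it is identical on every run and on every machine, so a value written by one process can be read back by another.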