Easy document searching with Typesense and F#

Isaac makes use of F# scripts to take Typesense for a quick spin.

Searching for data on a specific value can be done in code easily enough. For one-off searches, you can simply iterate over the collection of data using standard predicates:

customers
|> List.filter (fun row -> row.Name = "Fred") // Customer list

If you're searching on the same field repeatedly, you can also build a lookup keyed on that field for much faster searches:

let lookup =
    customers
    |> List.groupBy _.Name
    |> Map

lookup.TryFind "Fred" // Customer list option

Instead of Map, you can also use readOnlyDict, which builds a high-performance, hash-based IReadOnlyDictionary. Map, on the other hand, offers a more F#-friendly API, including copy-and-update operations.
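As a rough illustration of that trade-off, the sketch below builds both kinds of lookup from the same grouped data. The Customer record and its sample values are made up for the example.

```fsharp
// Hypothetical customer type and data, for illustration only.
type Customer = { Name: string; City: string }

let customers =
    [ { Name = "Fred"; City = "London" }
      { Name = "Fred"; City = "Leeds" }
      { Name = "Wilma"; City = "Bath" } ]

let grouped = customers |> List.groupBy _.Name

// readOnlyDict: a read-only wrapper over a hash-based dictionary.
let fastLookup = readOnlyDict grouped

// Map: an immutable map with F#-friendly module functions.
let lookup = Map grouped

// "Copy-and-update": Map.add returns a new map and leaves the original untouched.
let withBarney =
    lookup |> Map.add "Barney" [ { Name = "Barney"; City = "York" } ]

// IReadOnlyDictionary exposes the usual .NET TryGetValue pattern.
let fredsInDict =
    match fastLookup.TryGetValue("Fred") with
    | true, matches -> matches
    | false, _ -> []

printfn "%d %d %d" fredsInDict.Length lookup.Count withBarney.Count
```

Note that adding "Barney" produces a second map; `lookup` itself still only contains the original keys.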

Once you get into anything more complicated, though, things can quickly get out of hand:

  • Searching for data against multiple fields
  • Searching for data with inexact (fuzzy) matching logic such as:
    • Typos
    • Different case
    • Substring matching
  • Larger datasets which do not make sense to store entirely in-memory
  • Getting multiple search results based on closeness
  • Geo-location searching
  • Search suggestions
  • Faceted search
  • Vector-based searching (common when integrating with machine learning solutions)
  • LLM support

All of these require either external libraries or extra code (or both).
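To give a sense of what that "extra code" looks like, here is a minimal hand-rolled sketch covering just three of the items above: multiple fields, case-insensitive substring matching, and typo tolerance via Levenshtein edit distance. The Place record, the sample data and the function names are all hypothetical.

```fsharp
open System

// Hypothetical record and data, for illustration only.
type Place = { City: string; State: string }

let places =
    [ { City = "NEW YORK"; State = "NY" }
      { City = "NEWTON"; State = "NC" }
      { City = "PANAMA CITY"; State = "FL" } ]

// Case-insensitive substring match against several fields.
let containsIgnoreCase (term: string) (place: Place) =
    [ place.City; place.State ]
    |> List.exists (fun field -> field.Contains(term, StringComparison.OrdinalIgnoreCase))

// Levenshtein edit distance: the number of single-character insertions,
// deletions or substitutions needed to turn one string into the other.
let editDistance (a: string) (b: string) =
    let a, b = a.ToUpperInvariant(), b.ToUpperInvariant()
    let mutable previous = [| 0 .. b.Length |]
    for i in 1 .. a.Length do
        let current: int[] = Array.zeroCreate (b.Length + 1)
        current[0] <- i
        for j in 1 .. b.Length do
            let cost = if a[i - 1] = b[j - 1] then 0 else 1
            current[j] <- min (min (current[j - 1] + 1) (previous[j] + 1)) (previous[j - 1] + cost)
        previous <- current
    previous[b.Length]

// Tolerate up to maxTypos edits against the city name.
let fuzzyByCity (maxTypos: int) (term: string) =
    places |> List.filter (fun place -> editDistance term place.City <= maxTypos)

let substringHits = places |> List.filter (containsIgnoreCase "new")
let typoHits = fuzzyByCity 2 "new yokr"
```

Even this small sketch scans every record on every query and has no notion of ranking, which is where a dedicated index starts to pay for itself.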

Dedicated Search Indexes

There are a number of dedicated libraries and services designed for this kind of thing.

One service that I've come across recently is Typesense, a free and open-source alternative to Elasticsearch. Let's give it a quick run-through with some F#!

Typesense and FSharp

Typesense runs as a dedicated service; as such, the simplest way to run it locally is in a Docker container. There's a ready-made image on Docker Hub for you to use:

docker pull typesense/typesense:27.0
docker run -p 8108:8108 --name search -v"$(pwd)"/typesense-data:/data typesense/typesense:27.0 --data-dir /data --api-key=PASSWORD_GOES_HERE --enable-cors

Notice the use of the directory typesense-data, which is a folder on your local filesystem (not within the container) that persists the Typesense database itself.

Although Typesense is built on top of a standard HTTP API, there's also a community-maintained .NET client library. To be honest, I'm not especially fond of its design, which appears to me to be heavily coupled to DI abstractions such as IOptions and ServiceCollection, to the point that even constructing the basic Typesense client isn't straightforward. It generally seems to work fine, although I did see a couple of serialization exceptions thrown by the library.

#r "nuget:Typesense"
#r "nuget:FSharp.Data"

open Microsoft.Extensions.DependencyInjection
open Typesense
open Typesense.Setup

let typesense =
    let provider =
        ServiceCollection()
            .AddTypesenseClient(fun config ->
                config.ApiKey <- "PASSWORD_GOES_HERE"
                config.Nodes <- [ new Node("localhost", "8108", "http") ])
            .BuildServiceProvider()

    provider.GetService<ITypesenseClient>()
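Since the client library is a wrapper over the HTTP API, it can help to see what a raw request looks like. The sketch below only constructs a request and does not send it; it assumes the container above is listening on localhost:8108, and relies on Typesense's X-TYPESENSE-API-KEY header and /collections endpoint. The helper name typesenseGet is made up for this example.

```fsharp
open System
open System.Net.Http

// Assumes the Docker container from earlier is exposed on this port.
let baseAddress = Uri "http://localhost:8108"

// Hypothetical helper: build an authenticated GET request against the Typesense HTTP API.
let typesenseGet (apiKey: string) (path: string) =
    let request = new HttpRequestMessage(HttpMethod.Get, Uri(baseAddress, path))
    request.Headers.Add("X-TYPESENSE-API-KEY", apiKey)
    request

let listCollections = typesenseGet "PASSWORD_GOES_HERE" "/collections"

// To send it (requires the container to be running):
// let http = new HttpClient()
// http.SendAsync(listCollections).Result.Content.ReadAsStringAsync().Result |> printfn "%s"
```

This is handy for poking at the service from a script when you want to rule the client library out of a problem.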

Once you have created a local client, you can insert data into collections. Each collection has a name and a schema, which tells Typesense what the fields are and what their types are. In the example below, we're importing a dataset that contains City, State and geolocation information into Typesense:

let schema =
    Schema(
        "location",
        [
            Field("state", FieldType.String)
            Field("city", FieldType.String)
            Field("location", FieldType.GeoPoint)
        ]
    )

// Convert data into the schema above
let parsed =
    locations
    |> Seq.map (fun line -> {|
        State = line.State
        City = line.City
        Location = [| float line.Latitude; float line.Longitude |]
    |})

typesense.CreateCollection(schema).Wait() // create the collection, waiting until it exists before importing
typesense.ImportDocuments(schema.Name, parsed).Wait() // import the data into the collection, waiting for the call to complete

These datasets are automatically indexed in the background, ready for you to query.

Notice that the schema and actual data are provided separately - this means that it's possible for you to supply a dataset that does not match the schema; an alternative would be to provide the data in a form that the API can use to generate the schema automatically.
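Because of that separation, a cheap sanity check before importing can catch mismatches early. The sketch below does not use the Typesense library at all: it takes the schema's field names as plain strings and compares them against the property names of a sample document. It assumes, as the example above implies, that property names such as State are sent in camelCase as state. The helper names are hypothetical.

```fsharp
open System

// Assumption: documents are serialized with camelCase property names,
// so "State" is sent as "state".
let camelCase (name: string) =
    if String.IsNullOrEmpty name then name
    else string (Char.ToLowerInvariant(name[0])) + name.Substring(1)

// Hypothetical helper: schema fields with no matching property on the document type.
// The sample value is only used to infer the document type.
let missingFields (schemaFields: string list) (sample: 'T) =
    let propertyNames =
        typeof<'T>.GetProperties()
        |> Array.map (fun property -> camelCase property.Name)
        |> Set.ofArray
    schemaFields |> List.filter (fun field -> not (propertyNames.Contains field))

let schemaFields = [ "state"; "city"; "location" ]

let good = {| State = "NY"; City = "NEW YORK"; Location = [| 40.7127753; -74.0059728 |] |}
let bad = {| State = "NY"; Town = "NEW YORK" |}

let missingFromGood = missingFields schemaFields good
let missingFromBad = missingFields schemaFields bad
```

This only checks names, not types, so a string supplied where a geopoint is expected would still slip through.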

Searching documents

Searching against a collection is fairly simple:

// search for the word "new" against all fields.
typesense.Search(schema.Name, SearchParameters("new", "*"))

This will return a list of "hits" - scored documents that matched the search criteria, along with details of the nature of the match, such as which field was matched against which token in the search criteria.

[|
    {| Score = 578730123365187697; Location = ("NY", "NEW HYDE PARK") |}
    {| Score = 578730123365187697; Location = ("NY", "NEW YORK") |}
    {| Score = 578730089005449329; Location = ("NC", "NEWTON") |}
|]

The example here searches for a single word, but you can easily include multiple tokens in a search. For example, searching for pan fl returns results including Panama City, FL as well as Florence, KY. Searches are also automatically "fuzzy" in nature: searching for YoKr will still match New York, albeit with a significantly lower "score" than an exact match. The score can help you decide how to handle search results; for example, you might wish to omit low-scoring results from a user interface.
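One simple way to act on scores is to keep only the best-scoring tier, or everything above a cutoff you choose. The sketch below works on plain records shaped like the sample results above rather than on the client library's own hit type; the ScoredHit type and function names are made up for illustration.

```fsharp
// Hypothetical shape for a scored result; not the client library's hit type.
type ScoredHit = { Score: int64; City: string; State: string }

// Scores copied from the sample output above.
let hits =
    [ { Score = 578730123365187697L; City = "NEW HYDE PARK"; State = "NY" }
      { Score = 578730123365187697L; City = "NEW YORK"; State = "NY" }
      { Score = 578730089005449329L; City = "NEWTON"; State = "NC" } ]

// Keep only the hits that share the best score in the result set.
let bestTier (hits: ScoredHit list) =
    match hits with
    | [] -> []
    | _ ->
        let best = hits |> List.map _.Score |> List.max
        hits |> List.filter (fun hit -> hit.Score = best)

// Keep hits at or above an explicit cutoff chosen by the caller.
let atLeast (cutoff: int64) (hits: ScoredHit list) =
    hits |> List.filter (fun hit -> hit.Score >= cutoff)

let topOnly = bestTier hits
```

Which cutoff makes sense depends on your data, so it is worth inspecting real scores for a few representative queries before hard-coding one.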

You can also execute more advanced searches - for example, searching against geolocations:

// Search for any locations within 50km of New York City, sorting based on closeness to that location.
let parameters =
    SearchParameters(
        "*",
        "*",
        FilterBy = "location:(40.7127753, -74.0059728, 50 km)",
        SortBy = "location(40.7127753, -74.0059728):asc"
    )

typesense.Search("location", parameters).Result
|> _.Hits
|> Seq.map (fun (hit: Hit<_>) -> hit.GeoDistanceMeters["location"], hit.Document.City, hit.Document.State)
|> Seq.toArray

(*
[|
    (0, "NEW YORK", "NY")
    (24336, "KENILWORTH", "NJ")
    (26912, "NEW HYDE PARK", "NY")
|]
*)
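The filter and sort expressions are just strings, so if the coordinates come from user input it may be worth building them with small helpers. The sketch below produces strings in the same shape as the example above; the helper names are hypothetical, and the invariant culture is used so that the decimal separator is always a dot.

```fsharp
open System.Globalization

// Format a coordinate with '.' as the decimal separator, regardless of machine culture.
let coord (value: float) = value.ToString(CultureInfo.InvariantCulture)

// Hypothetical helper: a filter expression matching documents within radiusKm of a point.
let withinKm (field: string) (lat: float) (lon: float) (radiusKm: int) =
    $"{field}:({coord lat}, {coord lon}, {radiusKm} km)"

// Hypothetical helper: a sort expression ordering results by distance from a point, nearest first.
let nearestFirst (field: string) (lat: float) (lon: float) =
    $"{field}({coord lat}, {coord lon}):asc"

let filterBy = withinKm "location" 40.7127753 (-74.0059728) 50
let sortBy = nearestFirst "location" 40.7127753 (-74.0059728)
```

The resulting values can be passed as the FilterBy and SortBy arguments in the search shown earlier.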

Summary

Exact-match searches are easy to achieve in F#. For more complex types of search, however, you should consider a dedicated search service such as Typesense, which can simplify much of this work. We've also seen how easily you can consume a C# .NET library from F# and interact with a service through a simple script, which makes learning an API much easier and more interactive than a console test harness or automated unit tests.

As always, the code for this blog post is freely available here.