Searching for data on a specific value can be done in code easily enough. For one-off searches, you can simply iterate over the collection of data using standard predicates:
customers
|> List.filter (fun row -> row.Name = "Fred") // Customer list
If you're searching on the same field repeatedly, you can also build a dictionary up front for much faster lookups:
let lookup =
    customers
    |> List.groupBy _.Name
    |> Map
lookup.TryFind "Fred" // Customer list option
Instead of Map, you can also use readOnlyDict, which is a high-performance implementation of IReadOnlyDictionary. However, Map contains a more F#-friendly API including copy-and-update operations.
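To make the difference between the two concrete, here is a minimal sketch. The Customer record and its sample rows are hypothetical stand-ins for whatever your own data looks like; only Map and readOnlyDict come from FSharp.Core.

```fsharp
type Customer = { Name: string; City: string }

// Hypothetical sample data, for illustration only.
let customers =
    [ { Name = "Fred"; City = "London" }
      { Name = "Fred"; City = "Leeds" }
      { Name = "Wilma"; City = "Bristol" } ]

let grouped = customers |> List.groupBy _.Name

// Immutable F# map: lookups return an option.
let lookup = Map grouped
let freds = lookup.TryFind "Fred"

// Adding a key returns a new map and leaves the original untouched.
let extended = lookup.Add("Barney", [ { Name = "Barney"; City = "York" } ])

// Read-only dictionary: lookups use the TryGetValue pattern instead.
let roDict = readOnlyDict grouped
let found, wilmas = roDict.TryGetValue "Wilma"
```

Which one to pick probably depends on whether you need to derive modified copies of the lookup (Map) or just read from it as fast as possible (readOnlyDict).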
However, once you get into anything more complicated, things can quickly get out of hand:
- Searching for data against multiple fields
- Searching for data with inexact (fuzzy) matching logic such as:
  - Typos
  - Different case
  - Substring matching
- Larger datasets which do not make sense to store entirely in-memory
- Getting multiple search results based on closeness
- Geo-location searching
- Search suggestions
- Faceted search
- Vector-based searching (common when integrating with machine learning solutions)
- LLM support
All of these require either external libraries or extra code (or both).
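As a rough illustration of how quickly the "extra code" grows, here is a sketch that handles only three of the items above: multiple fields, case differences and simple typos. The Customer record, the matches function and the maxTypos parameter are all hypothetical names of my own, and the typo handling is a plain Levenshtein edit distance rather than anything a real search engine would use as-is.

```fsharp
open System

type Customer = { Name: string; City: string }

/// Classic dynamic-programming Levenshtein edit distance.
let editDistance (a: string) (b: string) =
    // First row and column hold the cost of building a prefix from nothing.
    let d =
        Array2D.init (a.Length + 1) (b.Length + 1) (fun i j ->
            if i = 0 then j
            elif j = 0 then i
            else 0)
    for i in 1 .. a.Length do
        for j in 1 .. b.Length do
            let cost = if a[i - 1] = b[j - 1] then 0 else 1
            d[i, j] <- min (min (d[i - 1, j] + 1) (d[i, j - 1] + 1)) (d[i - 1, j - 1] + cost)
    d[a.Length, b.Length]

/// True when any field contains the term (ignoring case),
/// or is within maxTypos edits of it.
let matches maxTypos (term: string) (customer: Customer) =
    [ customer.Name; customer.City ]
    |> List.exists (fun field ->
        field.Contains(term, StringComparison.OrdinalIgnoreCase)
        || editDistance (field.ToLowerInvariant()) (term.ToLowerInvariant()) <= maxTypos)
```

Something like customers |> List.filter (matches 1 "fread") would then tolerate a single typo, but there is still no ranking, no index, and every search scans the whole list.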
Dedicated Search Indexes
There are a number of dedicated libraries and / or services that are designed for this kind of thing, including:
Another service that I've seen recently is Typesense, a free and open-source alternative to Elasticsearch. Let's give it a quick run-through with some F#!
Typesense and FSharp
Typesense runs as a dedicated service; as such, running it locally should be done through a Docker container. There's a ready-made container on Docker Hub for you to use:
docker pull typesense/typesense:27.0
docker run -p 8108:8108 --name search -v"$(pwd)"/typesense-data:/data typesense/typesense:27.0 --data-dir /data --api-key=PASSWORD_GOES_HERE --enable-cors
Notice the use of the directory typesense-data, which is a folder on your local filesystem (not within the container) that persists the Typesense database itself.
Although Typesense is built on top of a standard HTTP API, there's also a (community-maintained) .NET API. To be honest, I am not especially fond of the design decisions of the .NET API, which appears to me to be heavily coupled to DI abstractions such as IOptions and ServiceCollection, to the point that even calling the constructor on the basic Typesense client isn't straightforward. That said, it seems to generally work fine (although I did see a couple of serialization exceptions getting thrown by the API).
#r "nuget:Typesense"
#r "nuget:FSharp.Data"
open Microsoft.Extensions.DependencyInjection
open Typesense
open Typesense.Setup
let typesense =
    let provider =
        ServiceCollection()
            .AddTypesenseClient(fun config ->
                config.ApiKey <- "PASSWORD_GOES_HERE"
                config.Nodes <- [ new Node("localhost", "8108", "http") ])
            .BuildServiceProvider()
    provider.GetService<ITypesenseClient>()
Once you have created a local client, you can insert data into collections. Each collection has a name and a schema, which tells Typesense what the fields are and what their type is. In the example below, we're importing a dataset that contains City, State and geolocation information into Typesense:
let schema =
    Schema(
        "location",
        [
            Field("state", FieldType.String)
            Field("city", FieldType.String)
            Field("location", FieldType.GeoPoint)
        ]
    )
// Convert data into the schema above
let parsed =
    locations
    |> Seq.map (fun line -> {|
        State = line.State
        City = line.City
        Location = [| float line.Latitude; float line.Longitude |]
    |})
typesense.CreateCollection(schema).Wait() // create the collection, blocking until it exists
typesense.ImportDocuments(schema.Name, parsed) // asynchronously import the data into the collection
These datasets are automatically indexed in the background, ready for you to query.
Notice that the schema and actual data are provided separately - this means that it's possible for you to supply a dataset that does not match the schema; an alternative would be to provide the data in a form that the API can use to generate the schema automatically.
Searching documents
Searching against a collection is fairly simple:
// search for the word "new" against all fields.
typesense.Search(schema.Name, SearchParameters("new", "*"))
This will return a list of "hits": scored documents that matched the search criteria, along with details of the nature of the match, such as which field was matched against which token in the search criteria.
[|
{| Score = 578730123365187697; Location = ("NY", "NEW HYDE PARK") |}
{| Score = 578730123365187697; Location = ("NY", "NEW YORK") |}
{| Score = 578730089005449329; Location = ("NC", "NEWTON") |}
|]
The example here searches for a single word, but you can easily include multiple tokens in a search. For example, searching for pan fl would return results including Panama City, FL as well as Florence, KY. Searches are also automatically "fuzzy" in nature: searching for YoKr will still match against New York, albeit with a significantly lower "score" than an exact match. This can be useful to help decide how to handle search results. For example, you might wish to omit low-scoring results from a user interface.
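One possible way to act on those scores is sketched below. The ScoredHit record is a hypothetical simplification of my own (it is not the Typesense client's hit type), the sample rows reuse the scores from the output shown earlier, and "keep only the hits that tie for the best score" is just one policy you might choose.

```fsharp
// Hypothetical, simplified shape of a search hit; not the Typesense client's own type.
type ScoredHit = { Score: int64; State: string; City: string }

let hits =
    [ { Score = 578730123365187697L; State = "NY"; City = "NEW HYDE PARK" }
      { Score = 578730123365187697L; State = "NY"; City = "NEW YORK" }
      { Score = 578730089005449329L; State = "NC"; City = "NEWTON" } ]

/// Keep only the hits that tie with the best score in the result set.
let bestMatches (hits: ScoredHit list) =
    match hits with
    | [] -> []
    | _ ->
        let best = hits |> List.map _.Score |> List.max
        hits |> List.filter (fun hit -> hit.Score = best)

let topCities = bestMatches hits |> List.map _.City
```

A fixed cut-off would work just as well; the right threshold will depend on your data and on how the scores are distributed.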
You can also execute more advanced searches - for example, searching against geolocations:
// Search for any locations within 50km of New York City, sorting based on closeness to that location.
let parameters =
SearchParameters(
"*",
"*",
FilterBy = "location:(40.7127753, -74.0059728, 50 km)",
SortBy = "location(40.7127753, -74.0059728):asc"
)
typesense.Search("location", parameters).Result
|> _.Hits
|> Seq.map (fun (hit: Hit<_>) -> hit.GeoDistanceMeters["location"], hit.Document.City, hit.Document.State)
|> Seq.toArray
(*
[|
    (0, "NEW YORK", "NY")
    (24336, "KENILWORTH", "NJ")
    (26912, "NEW HYDE PARK", "NY")
|]
*)
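For comparison, this is roughly what a "within N km" check would involve if you did it yourself in memory. It's a sketch using the haversine great-circle formula with a mean Earth radius of 6,371 km; I'm not claiming this is how Typesense calculates its distances, and haversineMeters and withinKm are hypothetical helper names.

```fsharp
open System

/// Great-circle distance in metres between two (latitude, longitude) points,
/// using the haversine formula.
let haversineMeters (lat1: float, lon1: float) (lat2: float, lon2: float) =
    let earthRadius = 6371000.0 // mean Earth radius in metres (assumed)
    let toRadians degrees = degrees * Math.PI / 180.0
    let dLat = toRadians (lat2 - lat1)
    let dLon = toRadians (lon2 - lon1)
    let a =
        sin (dLat / 2.0) ** 2.0
        + cos (toRadians lat1) * cos (toRadians lat2) * sin (dLon / 2.0) ** 2.0
    2.0 * earthRadius * asin (sqrt a)

/// True when the point lies within the given radius (in km) of the centre.
let withinKm radiusKm centre point =
    haversineMeters centre point <= radiusKm * 1000.0
```

Filtering a list with withinKm 50.0 (40.7127753, -74.0059728) would mirror the filter above, but it scans every row on every search, which is exactly the kind of work a geo-aware index avoids.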
Summary
Exact-match searches can be easy to achieve in F#. However, for more complex types of searches, you should consider using a dedicated search service, which can simplify much of this work. Typesense is one such service. In addition, we've also seen how you can easily consume a C# .NET library from F# and interact with a service through a simple script, which makes learning about the API much easier and more interactive than a console test harness or automated unit tests.
As always, the code for this blog post is freely available here.