United Kingdom: +44 (0)208 088 8978

String Parsing with Active Patterns

Isaac shows how Active Patterns can be utilised to simplify string parsing using standard pattern matching logic.

We're hiring Software Developers

Click here to find out more

Active Patterns (APs) are a powerful yet oft-overlooked feature of F# which allow you to expand the standard in-built pattern matching patterns with your own patterns. Active Patterns can also be parameterised - in effect, you can make active patterns over functions. The compiler will still - where possible - give you exhaustive checks. And even better still, active patterns can be combined with built-in patterns recursively, giving you the ability to create succinct, highly readable code.

String Parsing with Active Patterns

In the recent Advent of Code challenges, string parsing is inevitably required for virtually every day. I want to show how you can utilise APs to quickly parse patterns without resorting to regexes (although ironically there is a sample on the official docs for using Regexs as an AP!).

The challenge

Here's an example string representing the winning numbers for a lottery, and a purchased ticket for that lottery, that we would like to parse:

Card 1: 41 48 83 86 17 | 83 86  6 31 17  9 48 53

into a record structure like so:

type LotteryAttempt = {
    CardId : int // could be a CardId single-case discriminated union etc.
    WinningNumbers : int list
    PurchasedTicket : int list
}

We can clearly see how this string would be separated out:

  • The first section contains the Card Id.
  • The second section contains the Winning Numbers.
  • The third section contains the Purchased Ticket.

You can think of this as first splitting on :

(Section 1) : (Section 2)

and then splitting Section Two on |:

(Section 1) : (Section 2.1) | (Section 2.2)

Once you've done that, you could then split them out further - for example, splitting out Section 1 on spaces to get "Card" and "1":

(Section 1.1) (Section 1.2) : Section (2.1) | (Section 2.2)

and so on.

In imperative .NET code, this would typically be processed using String.Split:

let text = "Card 1: 41 48 83 86 17 | 83 86  6 31 17  9 48 53"
let elements = text.Split ':'
let section1 = elements[0].Split ' '
let section2 = elements[1].Split '|'
let cardId = int section1[1] // convert "1" to 1

and so on. However, it's not especially clear what's happening at a glance, and we're also using an unsafe int conversion here (shorthand for Int32.Parse). We're also using array indexing - if the array had less items, this would lead to an exception. If it had more, we'd be ok, but this really shouldn't happen and probably should be caught as well.

Active Patterns to the rescue

Active Patterns can "democratise" all of this:

let (|Split|) (on: char) (s: string) =
    s.Split(on, StringSplitOptions.RemoveEmptyEntries ||| StringSplitOptions.TrimEntries)
    |> Array.toList

This looks like a normal function that takes in two arguments (on and s) and splits the string on the character provided, before converting the resultant array into an F# List. The strange bit is the function name definition, (|Split|) - this is how you define an active pattern.

With the pattern, we can start to match on the original string as follows:

match text with
| Split ':' elements -> ...

This pattern splits the original text on : and binds the output into the symbol elements so that we can use it on the right hand side of the ->. In this case, elements would be a list of two elements:

[
    "Card 1"
    "41 48 83 86 17 | 83 86  6 31 17  9 48 53
]

F# supports recursive (or nested) patterns, which means that you can indefinitely "expand" symbols in a pattern match with more patterns:

match s with
| Split ':' [ a; b ] -> ...
| _ -> failwith "bad input"

F# lists are denoted with [ elem; elem; elem ] syntax

Compare this with the previous match and you'll see we've replaced elements with [ a ; b ] - a standard F# list pattern that says "this list must be two items in length; bind those elements to the symbols a and b":

let a = "Card 1"
let b = "41 48 83 86 17 | 83 86  6 31 17  9 48 53

If the list has more elements, the pattern would not match, leaving the new "catch all" _ handler to throw an exception.

We can now split a on space to extract both Card and `1":

match s with
| Split ':' [ Split ' ' [ "Card"; n ]; b ] -> ...
| _ -> failwith "bad input"

Notice here I'm actually specifying several things here:

  • The list must have two elements.
  • The first element must be the word Card
  • Bind the second element to n.

With the addition of a second pattern that tries to convert a string to an integer, we can expand further and safely get to the Card Id:

// Active pattern to safely convert string to int
let (|Int|_|) (s: string) =
    match Int32.TryParse s with
    | true, n -> Some(Int n)
    | false, _ -> None

match s with
| Split ':' [ Split ' ' [ "Card"; Int n ]; b ] ->
    {
        CardId = n
        ....
    }
| _ -> failwith "bad input"

Putting it all together

Armed with these two active patterns, plus F#'s existing list and constant patterns, we can parse the entire thing thus:

match s with
| Split ':' [ Split ' ' [ "Card"; Int n ]; Split '|' [ Split ' ' winning; Split ' ' ticket ] ] ->
    {
        CardId = n
        WinningNumbers = winning |> List.map int
        PurchasedTicket = ticket |> List.map int
    }
| _ -> failwith "bad input"

There are several nested matches going on here! I've put the nested match on multiple lines here so you can see the individual matches with indentent visualisations:

Split ':' [                // [ "Card 1"; "41 48 83 86 17 | 83 86  6 31 17  9 48 53" ]
    Split ' ' [                // [ "Card"; "1" ]
        "Card"
        Int n                  // 1
    ]
    Split '|' [                // [ "41 48 83 86 17"; "83 86  6 31 17  9 48 53" ]
        Split ' ' winning          // [ "41"; "48"; "83"; "86"; "17" ]
        Split ' ' ticket           // [ "83"; "86"; "6"; "31"; "17"; "9"; "48"; "53" ]
    ]
 ]

A fully-working gist for the above is available here, which also contains an "imperative" solution without using pattern matching.

Summary

Active Patterns are awesome! Combined with F#'s built-in pattern matching features such as list matching and recursive matching, you can express business rules for parsing with relative ease that can be highly readable as well as typesafe. You won't need active patterns for every-day use, but whenever you have complex pattern matching code or parsing, Active Patterns can be a very useful tool to have in your arsenal.