FParsec is a powerful parsing library for F#, which works using parser combinators. In this post we'll look at writing a parser for Portable Game Notation (PGN) for chess.
Here's an example of a chess game in PGN:
[Event "F/S Return Match"]
[Site "Belgrade, Serbia JUG"]
[Date "1992.11.04"]
[Round "29"]
[White "Fischer, Robert J."]
[Black "Spassky, Boris V."]
[Result "1/2-1/2"]
1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 {This opening is called the Ruy Lopez.}
4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8 10. d4 Nbd7
11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15. Nb1 h6 16. Bh4 c5 17. dxe5
Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21. Nc4 Nxc4 22. Bxc4 Nb6
23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7 27. Qe3 Qg5 28. Qxg5
hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33. f3 Bc8 34. Kf2 Bf5
35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5 40. Rd6 Kc5 41. Ra6
Nf2 42. g4 Bd3 43. Re6 1/2-1/2
The first section contains tags in square brackets, e.g. [Event "F/S Return Match"]
. These contain metadata about the game. In this post we'll skip over those.
The second section contains a list of rounds. A round contains two moves, one for white and one for black. A move consists of an optional piece character, and then a square on the board. If there is no piece specified, then it is assumed to be a pawn.
So if the round text is 3. d8 Rf1
it means that on round 3, a white pawn moved to D8 and a black rook moved to F1.
An example parser
To learn more about FParsec, we at CIT worked together to create a parser for this file format. It supports some of the main elements, e.g. the parsing of basic moves in a round mentioned above, but at the time of writing there are still several rules it does not support.
The chess PGN parser we created:
We created a domain with some F# types that we wanted to parse the game into. Here's a sample of some of those types:
type Piece =
| King
| Queen
| Rook
| Bishop
| Knight
| Pawn
type File =
| A
| B // ... goes up to H
type Rank =
| One
| Two // ... goes up to 8
type Square = File * Rank
type Move = {
Piece : Piece
Square : Square
}
Now let's look at some details of how the parser works, starting from small elements and building up to large ones.
Parsing a piece
Here's the complete function for parsing a piece. Note that we have done open FParsec
above so we are using FParsec functions and operators.
let piece =
choice [
pchar 'N' >>% Knight
pchar 'B' >>% Bishop
pchar 'R' >>% Rook
pchar 'Q' >>% Queen
pchar 'K' >>% King
] <|>% Pawn
Each pchar
function returns a parser that expects a certain character. And then the >>%
operator creates another parser which maps the result into one of our domain types.
So here we say that if we have the character 'N'
, then return a Knight
from our domain. And if we have 'B'
, then return a Bishop
, and so on. These parsers are placed in a list and passed to the choice
function to create yet another parser, which tries each of the parsers in order and returns the first one that succeeds.
Then the <|>% Pawn
at the end of the function means that if there is no character, that means a pawn.
Parsing a square
Here are the functions used to parse a square:
let file =
choice [
pchar 'a' >>% A
pchar 'b' >>% B
pchar 'c' >>% C
pchar 'd' >>% D
pchar 'e' >>% E
pchar 'f' >>% F
pchar 'g' >>% G
pchar 'h' >>% H
]
let rank =
choice [
pchar '1' >>% One
pchar '2' >>% Two
pchar '3' >>% Three
pchar '4' >>% Four
pchar '5' >>% Five
pchar '6' >>% Six
pchar '7' >>% Seven
pchar '8' >>% Eight
]
let square = file .>>. rank
The file
and rank
parsers work in a similar way to the piece
function above, except they are not optional.
The square
parser uses those two parsers and combines them with the .>>.
operator, which says that we expect the left parser to succeed, and then the right one. And the fact that there is a .
on each side of the >>
means we should keep the parsed the result from both sides and return them in a tuple, so the parser result type is File * Rank
. This is fine because that is the exact type alias of Square
in our domain: type Square = File * Rank
Parsing a move
Here is the move parsing function:
let move = piece .>>. square >>= (fun (piece, square) -> preturn { Piece = piece; Square = square })
It combines two parsers with .>>.
in a very similar way to the square
function, but in addition it also uses the >>=
operator to map the result from a tuple into our domain record Move
.
Parsing a round
The function to parse a round number:
let roundNumber = pint32 .>> skipChar '.'
Here we parse the round number with pint32
, a built-in FParsec parser which parses one or more characters into an int32
, and skipChar '.'
which says to expect a '.'
here but ignore it in the parser output (return unit). They are combined with .>>
, which means keep the result of the parser on the left and discard the result (in this case unit) of the parser on the right. So this parser just returns the type int32
.
And finally, the round parser, which combines all the elements above:
let round =
tuple3 (roundNumber .>> spaces1) (move .>> spaces1) move
>>= (fun (round, move1, move2) -> preturn { RoundNumber = round; WhiteMove = move1; BlackMove = move2 })
Note that spaces1
is a built-in parser which matches one or more whitespace characters.
roundNumber .>> spaces1
says to expect a round number and then at least one whitespace character, keeping only the round number. We then expect a move, followed by whitespace, followed by another move. This part consists of three parsers.
These parsers are then combined with the tuple3
parser so that the result comes out as a flat 3-tuple instead of a nested 2-tuple, which you would get if you used the .>>.
operator.
Finally, we use >>=
again to map the tuple into our domain type.
Summary
Hopefully, this gave you a sense of how FParsec works, including its parser combinator style, which makes heavy use of composition of the parser functions. It's a powerful approach that ends up with very succinct code. In a way this makes it quite readable, but also it can be quite hard to understand, especially for those who aren't familiar with the approach or even those who don't use it regularly. So please use it with care and moderation!