Today we've posted a medium-length video on performing rapid data analysis in F#. The example involves merging and parsing a number of CSV files, before reshaping them and ultimately visualising the results in a chart. If you've not watched it yet, go do that, and then come back here!
The video does not explain much of the F# language itself - indeed, if you're new to the language, you may well have some questions (please do post them in the video comments or email them to us!). What I would like to emphasise here instead are some of the key takeaways from the video and the associated code repository, both in terms of the overall process and of the different tools that came together for it.
Visual Studio Code
Visual Studio Code is a modern and flexible code editor that I use these days primarily for all exploratory workloads as well as mainstream development. It has a huge library of high-quality extensions, and is regularly updated with improvements. For the sample covered in the video, I used the following extensions:
- Rainbow CSV: CSV viewer with colour highlighting.
- Excel Viewer: Excel and CSV viewer with sorting and filtering.
- Ionide: Rich support for F#.
- Paket: Integration for Paket dependency management.
- Path Intellisense: Intellisense for file paths.
Scripts and the REPL
Scripts are a powerful tool in our arsenal. They allow us to explore data, rapidly get feedback and explore different things before transitioning into a larger application. The video shows how we can get started by working from a single F# script file. No need for projects, web applications or even console test rigs.
Scripts should be the de facto way that you start to interact with and understand data when working in code; I'm looking forward to the experience being improved further in the future within VS Code through the potential use of inline chart visualisations.
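To give a flavour of what this looks like in practice, here is a minimal sketch of an exploratory script. It is not the code from the video: the file contents, column names and the parse function are all made up for illustration, and the "files" are in-memory strings so that the script runs on its own in F# Interactive.

```fsharp
// explore.fsx - a hypothetical exploratory script, not the code from the video.
open System
open System.Globalization

// Two in-memory "files" standing in for CSV files on disk (made-up data).
let january = "Date,Amount\n2019-01-05,10.5\n2019-01-20,4.0"
let february = "Date,Amount\n2019-02-03,7.25"

// Parse one CSV document into (date, amount) pairs, skipping the header row.
let parse (csv: string) =
    csv.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.skip 1
    |> Array.map (fun line ->
        let cells = line.Split(',')
        let date = DateTime.ParseExact(cells.[0], "yyyy-MM-dd", CultureInfo.InvariantCulture)
        let amount = Decimal.Parse(cells.[1], CultureInfo.InvariantCulture)
        date, amount)

// Merge both documents, then reshape into a total per month number.
let totalsByMonth =
    [ january; february ]
    |> Seq.collect parse
    |> Seq.groupBy (fun (date, _) -> date.Month)
    |> Seq.map (fun (month, rows) -> month, rows |> Seq.sumBy snd)
    |> Seq.toList

printfn "%A" totalsByMonth
```

Each binding can be sent to the REPL on its own, so you can inspect the intermediate results as you go rather than running the whole file at once.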
Paket
Paket is a package management tool for .NET. When working in an exploratory mode of development, I find it far superior to the standard NuGet client tooling. The reasons should be self-evident from the video, but primarily:
- Excellent integration with VS Code
- Excellent support for working with scripts
- No need for msbuild or project files
Using Paket, I was able to get hold of a NuGet package and start working with it from a script within just a few seconds. This kind of rapid access to packages within F# and scripts is incredibly powerful. In the future, we'll also have #r support for NuGet packages, but for now, Paket fills this gap in standard .NET tooling.
Type Providers
Type Providers have had an interesting history in F#. To my mind, they've never quite delivered the benefits that they promised when first released. Despite some shortcomings, though, they still present an excellent way to bridge the gap between the static type system in F# and external data sources, either for exploratory analysis or as a way to rapidly parse raw datasets before moving the data into a more suitable shape in F#. Whilst some type providers haven't yet been ported across to .NET Core, FSharp.Data works just fine and comes with excellent support for CSV, JSON, XML and HTML files.
Alternative query syntaxes
F# has a number of different ways to query and manipulate data including collection modules, query expressions and comprehensions. Use the right one for the job. For example, now that F# has rendered the yield keyword largely unnecessary (though not the yield! keyword!), comprehensions have become a more attractive proposition. And whilst query expressions are rarely used in everyday F#, for certain types of queries they can provide succinct and highly readable solutions.
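As a rough illustration of the three styles, the sketch below answers the same question ("which cities have a reading above 10 degrees?") in each of them. The readings list is made-up sample data rather than the dataset from the video, and the comprehension relies on implicit yields, which I'm assuming are available (F# 4.7 or later).

```fsharp
// Hypothetical sample data: (city, temperature) readings.
let readings = [ "London", 12.0; "Paris", 15.5; "London", 14.0; "Berlin", 9.5 ]

// 1. Collection module functions with pipelining.
let viaModule =
    readings
    |> List.filter (fun (_, temp) -> temp > 10.0)
    |> List.map fst
    |> List.distinct

// 2. A query expression over the same list.
let viaQuery =
    query {
        for reading in readings do
        where (snd reading > 10.0)
        select (fst reading)
        distinct
    }
    |> Seq.toList

// 3. A list comprehension; no yield keyword is needed for single values.
let viaComprehension =
    [ for (city, temp) in readings do
        if temp > 10.0 then city ]
    |> List.distinct

// yield! is still required to splice a whole collection into another one.
let combined = [ yield! viaModule; yield! viaQuery ]

printfn "%A" viaModule
```

All three produce the same list of cities here; which one reads best tends to depend on the shape of the query rather than on any hard rule.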
Simple data visualisation
Libraries such as XPlot provide both Google Charts and Plotly integration - both popular web-based charting libraries in their own right that allow you to quickly and easily visualise data. The APIs are well designed in that they support common F# idioms such as pipelining and don't require you to massage data into custom types - instead, they work with simple primitive collections such as lists of tuples, where each pair represents the X and Y values. This approach makes it incredibly easy to go from raw data to visualisation.
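The sketch below shows the kind of reshaping I mean, using only the core library. The Sale record and its values are hypothetical stand-ins for parsed CSV rows. The chart call is left as a comment because it needs the XPlot.GoogleCharts package to be referenced first, so treat that line as an assumption about your setup rather than as part of the runnable example.

```fsharp
// Hypothetical records standing in for parsed CSV rows.
type Sale = { Month: string; Amount: float }

let sales =
    [ { Month = "Jan"; Amount = 10.0 }
      { Month = "Feb"; Amount = 12.5 }
      { Month = "Jan"; Amount = 5.0 } ]

// Reshape into a plain list of (x, y) tuples: one point per month.
let points =
    sales
    |> List.groupBy (fun sale -> sale.Month)
    |> List.map (fun (month, rows) -> month, rows |> List.sumBy (fun row -> row.Amount))

printfn "%A" points

// Assuming XPlot.GoogleCharts is referenced and opened, the same list
// could be piped straight into a chart, along these lines:
// points |> Chart.Column |> Chart.WithTitle "Sales by month" |> Chart.Show
```

No chart-specific types are involved until the very last step, which is what keeps the jump from data to visualisation so short.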
A small step away from a web application
One of F#'s biggest strengths is the ability for you to rapidly pivot away from working in an "exploratory" mode to an "application development" mode. Many programming languages and runtimes that excel at data exploration - such as R or even Python - do not fare quite as well once you move to production web applications. Here, F# provides a smooth transition, thanks especially now to some excellent F# libraries that sit on top of the highly performant and reliable .NET Core and, for example, ASP.NET Core.
Summary
I hope that this post, and the associated video, were useful in showing some of F#'s strengths, particularly when combined with the flexible tools shown above, which make the overall outcome more than the sum of its parts. We're looking forward to seeing any suggestions or improvements that you may have on our code repository, and we plan on doing subsequent videos showing how to transform this dataset into a set of queries that can be hosted in a web application.
Have (fun _ -> ()) with F# and data!
Isaac