In many ways, the problems that we solve and the solutions we provide as modern software engineers are very similar to those of forty or even fifty years ago, albeit with a few new toys and more transistors to play with.
One thing I think it is safe to say has changed, however, and continues to change at a blistering pace, is the amount of data available for us to use.
So much so, in fact, that data processing has become a whole new type of problem, one which requires new types of solution.
We can no longer expect a person, or any number of people, to manually analyse the data.
We are increasingly finding new, innovative ways to allow computers to take over the job, crunching through massive data sets and identifying key features which allow them to perform tasks such as classifying information and making predictions about the future.
It is a fascinating time, in particular because we are seeing people come together from the worlds of computer science and data science, not just at the academic level but in the service of real-world, practical application development.
Both groups are having to learn each other's skills to some degree, and this can be challenging.
F# to the rescue!
I have been keen to start exploring machine learning, but I haven't really known where to start.
The book I ended up working through really appealed to me because I am a .NET developer, and in fact after the first chapter it quickly moves to using F# almost exclusively.
Without the cognitive overhead of an unfamiliar language, and with the material coming from a fellow F# developer, I find the explanations clear and easy to follow and the examples intuitive.
One challenge I did face, however, is that, as is inevitable with printed code examples, some parts are out of date and reference tools or libraries which are no longer supported. In particular, the book uses .NET Framework FSI rather than .NET Core / .NET 5.
It didn't take very long for me to get things up to date, and I thought I would share the process here to help others do the same.
I have also taken the opportunity to have a play with another new technology that has been on my list for a while, .NET Notebooks.
These are basically the same idea as Jupyter notebooks and similar technologies, for those familiar with them, and they are enabled in VS Code by .NET Interactive.
These notebooks allow you to intersperse blocks of markdown, blocks of code and rendered output from that code in a single document.
They are particularly suited to data exploration, much in the same way we have traditionally used FSI.
I started using a notebook at chapter 4, 'Of Bikes and Men', which explores how to predict bicycle hire activity based on historical data.
As mentioned earlier, I had a few things to work around to begin with.
The chapter required two things:
- A charting library
- Loading a sample data set using the CSV type provider
The charting library used in the book is FSharp.Charting, which displays its output using Windows-specific libraries.
A better fit today is Plotly.NET, which renders its charts as HTML and so is not tied to Windows.
Handily, the API is almost identical to that of FSharp.Charting, so you can switch them very easily.
Also, because I am using a notebook, the charts are rendered in-line in the document, rather than launched in a separate browser every time I execute the code as would be the case with a normal F# script.
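To give a feel for how close the two APIs are, here is a minimal sketch of a line chart in a notebook cell. The `days` and `rentals` values are made up purely for illustration and are not taken from the book's data set. I am assuming the same `2.0.0-beta9` package versions as the rest of this post, and later Plotly.NET releases may differ slightly.

```fsharp
#r "nuget: Plotly.NET, 2.0.0-beta9"
#r "nuget: Plotly.NET.Interactive, 2.0.0-beta9"

open Plotly.NET

// Illustrative values only (hypothetical, not from the book's data set)
let days = [ 1; 2; 3; 4; 5 ]
let rentals = [ 120; 95; 140; 160; 150 ]

// Plotly.NET takes the x and y values; the FSharp.Charting call looks much the same
let chart = Chart.Line(days, rentals)

// In a notebook, ending the cell with the chart value renders it in-line.
// In a plain .fsx script you would use `chart |> Chart.Show` instead,
// which opens the chart in a browser.
chart
```

The Plotly.NET.Interactive package is what lets the notebook render the chart; a plain script only needs Plotly.NET itself.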
I did have some issues loading Plotly.NET to start with, and then found in its guide that you may need to add a custom package feed as a source whilst .NET notebooks are in preview.
This fixed it, but soon afterwards I found I didn't need it any more, presumably because the official NuGet feed had been updated.
I couldn't get the CSV provider to load the data using either the instructions in the book or the __SOURCE_DIRECTORY__ magic variable. I did a bit of digging, and I believe it is related to the compiled script being executed in a different directory from the one it originated in.
Regardless, I found that I had to hard-code the path of the CSV file. This is not ideal for sharing, and maybe someone can find a way around it; I didn't really spend much time trying.
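If you want to check the directory theory on your own machine, a small diagnostic cell along these lines might help. It is only a sketch for investigating the problem, not a fix: it prints the two directories a relative path could be resolved against and checks whether the CSV sits next to the script. The file name `day.csv` matches the one I use, and `candidate` is just a name I picked for the example.

```fsharp
open System
open System.IO

// The directory F# believes the script or cell lives in
printfn "Source directory:  %s" __SOURCE_DIRECTORY__

// The working directory of the process actually running the code
printfn "Current directory: %s" Environment.CurrentDirectory

// Where the CSV would be if it were resolved relative to the source directory
let candidate = Path.Combine(__SOURCE_DIRECTORY__, "day.csv")
printfn "%s exists: %b" candidate (File.Exists candidate)
```

If the two directories differ, or the file is reported as missing, that would be consistent with what I saw.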
This is the first cell in my notebook (an F# code cell).
    // #i "nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet5/nuget/v3/index.json"
    // #i "nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-tools/nuget/v3/index.json"
    #r "nuget: FSharp.Data"
    #r "nuget: Plotly.NET, 2.0.0-beta9"
    #r "nuget: Plotly.NET.Interactive, 2.0.0-beta9"

    open FSharp.Data
    open Plotly.NET

    [<Literal>]
    let Path = @"C:\ML Notebook\day.csv" // Have to update this to hard-coded local path for CSV provider

    type Data = CsvProvider<Path>
    let dataset = Data.GetSample ()
    let allData = dataset.Rows
I have a repo which contains the notebook I am working on; feel free to check it out as an example. It might even help you get through the chapter if you understand things in a similar way to me!
Disclaimer - these are my own notes, and I may have misunderstood something or phrased it badly - apologies to Mathias in that case and use them at your own risk! Better yet just go buy the book, it's ace 😉
You can find it on our GitHub.