Apache Spark is a general-purpose distributed processing engine for analytics over large data sets, typically terabytes or petabytes of data. It can be used for batch processing, real-time streams, machine learning, and ad-hoc queries. Processing tasks are distributed over a cluster of nodes, and data is cached in memory to reduce computation time.
Now, with the release of Spark .NET, developed and fully supported by Microsoft, F# developers can take advantage of the high-performance, scalable Spark platform, running either on-premises or in the cloud on any industry-standard platform such as Microsoft Azure.
Spark .NET is .NET Core ready, meaning it can run F# equally well on Linux, macOS, or Windows, and can take advantage of the large ecosystem of libraries available on .NET.
Apache Spark is a flexible and powerful data platform that offers many different features.
Apache Spark also allows working within the F# object model or through data frames, and can work with a wide variety of query engines including Spark SQL, as well as with disparate data sources including flat files such as JSON or CSV.
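As a rough illustration of the data frame and Spark SQL styles mentioned above, here is a minimal F# sketch. It assumes the `Microsoft.Spark` NuGet package is referenced and that the program is launched through `spark-submit` with the matching .NET worker set up. The file name `people.csv` and the columns `name`, `age`, and `city` are hypothetical and exist only for this example.

```fsharp
open Microsoft.Spark.Sql

[<EntryPoint>]
let main _ =
    // Create (or reuse) a Spark session; the application name is arbitrary.
    let spark =
        SparkSession.Builder()
            .AppName("fsharp-spark-sketch")
            .GetOrCreate()

    // Read a flat file into a data frame.
    // "people.csv" and its columns are assumptions made for this sketch.
    let people =
        spark.Read()
            .Option("header", "true")
            .Option("inferSchema", "true")
            .Csv("people.csv")

    // Data frame style: filter rows, then count them per city.
    people
        .Filter(people.["age"].Gt(21))
        .GroupBy("city")
        .Count()
        .Show()

    // Spark SQL style: register a temporary view and query it with SQL.
    people.CreateOrReplaceTempView("people")
    spark.Sql("SELECT name, age FROM people WHERE age > 21").Show()

    spark.Stop()
    0
```

Both queries apply the same filter to the same data. Which style to use is largely a matter of preference: the data frame calls compose naturally with other F# code, while the SQL form may be more familiar to analysts.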
Spark .NET already exceeds the performance of Python's Spark integration tier, with further performance improvements actively being worked on.
F# is a great fit for working with Spark .NET. With its outstanding data exploration features, powerful type system and unique capabilities including type providers, F# is the ideal way to develop Spark .NET workloads, whether for ad-hoc analysis or as part of a larger application.
Spark .NET provides a full .NET API that gives you access to the Spark data engine, which you can use seamlessly with F#, whilst also allowing you to use the parts of .NET that you are already familiar with.
Use Microsoft Azure to provide the hosting environment for compute.
Spark .NET lets you scale your application quickly and effectively, but without the backing of the cloud you're limited to what your operations team can manage. Embrace the scale of Azure to develop Spark .NET applications using one of the ready-made platform services such as HDInsight or Azure Databricks, which are backed by Azure's rock-solid compute and data storage technologies to ensure your data is stored securely, reliably and at a low cost with no upfront costs.