Spark Summit Europe 2016 – speech summaries

Spark Summit Europe 2016 took place in Brussels between the 25th and 27th of October. I have watched some of the speeches on the Spark Summit website, and this post is a short summary with notes that are useful in my work. I hope someone else finds them useful too.
The speeches are in no particular order.

Lambda Architecture with Spark in the IoT

by Bas Geerdink from ING

Link to the video.

The speaker presents how they have used the Lambda architecture proposed by Nathan Marz (formerly of BackType and Twitter, where he created Storm). Marz initially used HDFS and Storm in the Lambda architecture.

The use case is Smart Parking, which is about solving parking challenges in Amsterdam – IoT helps a driver find the best available parking place.

Stream process:

  • get car events
  • filter events according to the business rules
  • store events
  • get information from the car park in the neighborhood
  • predict score and update database
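
The steps above can be sketched in a few lines of plain Python. This is only an illustration of the flow, not the code from the talk: the event fields, the speed-based business rule, the search radius, and the scoring formula are all assumptions made up for the example, and the in-memory list and dict stand in for the real event store and database.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class CarEvent:
    car_id: str
    x: float          # position in arbitrary planar units (hypothetical)
    y: float
    speed_kmh: float


@dataclass(frozen=True)
class CarPark:
    name: str
    x: float
    y: float
    capacity: int
    occupied: int


def passes_business_rules(event, max_speed_kmh=30.0):
    # Hypothetical rule: only slow-moving cars are assumed to be looking for parking.
    return event.speed_kmh <= max_speed_kmh


def nearby_car_parks(event, car_parks, radius=2.0):
    # Car parks within a hypothetical radius of the car's position.
    return [p for p in car_parks if hypot(p.x - event.x, p.y - event.y) <= radius]


def predict_score(event, park):
    # Placeholder for the ML model: free-space ratio discounted by distance.
    if park.capacity <= 0:
        return 0.0
    free_ratio = (park.capacity - park.occupied) / park.capacity
    distance = hypot(park.x - event.x, park.y - event.y)
    return free_ratio / (1.0 + distance)


def process_stream(events, car_parks, event_store, score_db):
    for event in events:                      # get car events
        if not passes_business_rules(event):  # filter by business rules
            continue
        event_store.append(event)             # store events
        for park in nearby_car_parks(event, car_parks):  # car parks nearby
            # predict score and update the (in-memory) database
            score_db[(event.car_id, park.name)] = predict_score(event, park)
```

In the architecture from the talk, each of these functions would correspond to a stage of a Spark streaming job, with Kafka as the event source and Cassandra as the score store.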

Lambda Architecture

Capacity updates (information about car parks) arrive in batches and are stored in HDFS, while GPS updates from the cars arrive as a stream through the Kafka 0.10 message broker.
Spark is used in both the streaming and batch layers of the Lambda architecture.
Spark is also used for Machine Learning modelling, and Zeppelin is the graphical user interface the data scientists use for their work. This is presented graphically in the video at 22:40.
Cassandra stores the scores (results), and APIs on top of Cassandra make them available to the users.
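
The core idea of the Lambda architecture, a batch view recomputed from all stored data plus a real-time view covering whatever arrived since the last batch run, can be shown with a small plain-Python sketch. It is not taken from the talk: the event shape (timestamp, car park name), the counting logic, and all names below are assumptions chosen to keep the example short.

```python
from collections import Counter


def build_batch_view(master_dataset, cutoff):
    """Batch layer: recompute the view from scratch over all events up to cutoff."""
    return Counter(park for ts, park in master_dataset if ts <= cutoff)


class SpeedLayer:
    """Speed layer: incrementally counts only events newer than the batch cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.view = Counter()

    def on_event(self, ts, park):
        if ts > self.cutoff:
            self.view[park] += 1


def query(batch_view, realtime_view, park):
    # Serving layer: merge the batch and real-time views at query time.
    return batch_view[park] + realtime_view[park]
```

When the next batch run finishes, the cutoff moves forward and the speed layer's view can be discarded and restarted, which is what keeps the real-time part small.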

Event processing with Kafka is shown on one slide, and both streaming and batch processing are explained with code examples. The speaker mentions a GitHub account with the available code: fast-data.

Bas is very good at explaining the IoT architecture; too bad he did not have more time.

Nathan Marz’s book on Lambda architecture is very well written and explained.