This image accompanies the Spark course "Applying the Lambda Architecture with Spark, Kafka, and Cassandra" by Ahmad Alkilani on Pluralsight.com.
The course aims to look past the hype in the big data world and focus on what really works for building robust, highly scalable batch and real-time systems. We build on an architecture dubbed the Lambda Architecture. It addresses the complexity of building distributed applications that handle streaming data as easily as batch data, while maintaining system availability and robustness and accounting for human fault tolerance and system updates.

In this course we string together technologies that fit well together. They were designed by companies with some of the most demanding data requirements, such as Facebook, Twitter, and LinkedIn, and by those leading the design of data processing frameworks like Apache Spark. The ability to process data and learn from it has brought benefits ranging from technologies we use every day to advances in medical research. Spark and Spark Streaming play an integral role throughout the course. We also examine each individual component, including Apache Kafka, Cassandra, and Hadoop/HDFS, and work through the details of their architectures that make them good fits for a system based on the Lambda Architecture.

Over the course we build a full application from scratch. We start with a small application that simulates a stream of data and work up to global state, non-associative calculations, and application upgrades and restarts. Finally, we present real-time and batch views in Cassandra. The course comes with a VM built specifically so you can hit the ground running with these technologies. The VM includes Apache Zeppelin, which we use frequently to demonstrate much of the code we build.
Please visit the course page for more details: https://www.pluralsight.com/courses/spark-kafka-cassandra-applying-lambda-architecture
This image contains derivative work from sequenceiq/hadoop-docker.