We'll assume you're already familiar with Spark and SparkSQL; modules 1 and 2 in this series cover the basics.
Having problems? Check the errata for this course.
1 | Introduction and DStreams | 55m 53s
DStreams is an older API, but it is still in use, so we'll establish the basics of streaming with this API. We'll use a simple socket server to simulate a stream of data.
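
As a taste of what this module covers, here is a minimal sketch of the DStreams word-count pattern reading from a socket server. The host, port, and batch interval are illustrative placeholders, not the course's exact code.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SocketStreamExample {
      def main(args: Array[String]): Unit = {
        // Local StreamingContext with a 5-second batch interval (illustrative values)
        val conf = new SparkConf().setMaster("local[2]").setAppName("SocketStream")
        val ssc = new StreamingContext(conf, Seconds(5))

        // DStream of text lines read from a socket server on localhost:9999
        val lines = ssc.socketTextStream("localhost", 9999)

        // Classic word count over each micro-batch
        val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

While the job is running you can feed it data from a terminal with a simple socket server such as: nc -lk 9999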
2 | Integrating with Apache Kafka | 77m 52s
Apache Kafka is a highly performant, distributed event log and is a perfect fit for streaming applications. Here we use it as a repository holding a real-time stream of events, and we integrate it with Spark Streaming using the Kafka module.
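
A rough sketch of this integration using the spark-streaming-kafka-0-10 direct stream is shown below; the broker address, topic name ("events"), and group id are placeholders, not taken from the course materials.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaStreamExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("KafkaStream")
        val ssc = new StreamingContext(conf, Seconds(5))

        // Consumer configuration: bootstrap server and group id are placeholders
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "spark-streaming-demo",
          "auto.offset.reset" -> "latest"
        )

        // DStream backed by the Kafka direct API, subscribed to the "events" topic
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Print the message values from each micro-batch
        stream.map(record => record.value).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }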
3 | Structured Streaming | 66m 45s
This newer API builds on the SparkSQL/DataFrame API and is a much more elegant system. Throughout this module we rebuild our previous work and discover how it can be used to build a streaming pipeline.
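
To illustrate the shift in style, here is a sketch of the same Kafka-backed word count expressed with Structured Streaming; again, the broker address and topic name are placeholders rather than the course's exact code.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{explode, split}

    object StructuredStreamingExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .master("local[2]")
          .appName("StructuredStreaming")
          .getOrCreate()
        import spark.implicits._

        // Streaming DataFrame reading from Kafka ("events" topic is illustrative)
        val df = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()

        // Kafka values arrive as binary; cast to string and count words with DataFrame operations
        val counts = df.selectExpr("CAST(value AS STRING) AS line")
          .select(explode(split($"line", " ")).as("word"))
          .groupBy("word")
          .count()

        // Write the running counts to the console on every trigger
        val query = counts.writeStream
          .outputMode("complete")
          .format("console")
          .start()

        query.awaitTermination()
      }
    }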