Spark Module 3: Machine Learning (SparkML)

Machine Learning for Big Data

Pre-requisites

We assume you're already familiar with Spark Core from modules 1 and 2.

Having problems? Check the errata for this course.

1	Introduction	24m 2s
What is Machine Learning, Supervised vs Unsupervised Learning and the Model Building Process
2	Building a Linear Regression	30m 40s
Assembling vectors of features and Model Fitting
3	Training Data	26m 33s
Training vs Test and Holdout Data, Using data from Kaggle, RMSE and R2 tests
4	Model Fitting Parameters	25m 41s
Setting Linear Regression Parameters
5	Feature Selection	36m 23s
Correlation of features, Identifying duplicate features, data preparation
6	Non Numeric Data	25m 48s
Using OneHotEncoding and Vectors
7	Pipelines	19m 42s
How to build a pipeline in SparkML
8	Case Study	34m 51s
A full practical exercise
9	Logistic Regression	26m 12s
True and False Negatives and Positives, Coding a Logistic Regression Model
10	Decision Trees	46m 21s
Building a decision tree model, Interpreting a tree and Random Forests
11	Unsupervised Learning: K-Means Clustering	10m 49s
K-Means Clustering and how to implement it in SparkML
12	Recommender Systems	29m 7s
Matrix Factorisation and how to build a model in SparkML