Improve your coding skills from beginner to expert with the largest online Java e-learning platform

Spark Module 3: Machine Learning with SparkML

Machine Learning for Big Data
  • Learn the basics of Machine Learning and how to apply it to big data with SparkML
  • Supervised vs Unsupervised Learning
  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • K-Means Clustering
  • Random Forests
  • Recommender Systems

Prerequisites

We assume you're already familiar with Spark Core from modules 1 and 2.

Contents: the course takes on average 3 days to complete, including practical work.


Having problems? Check the errata for this course.

1. Introduction (24m 2s)
What is Machine Learning? Supervised vs Unsupervised Learning and the Model Building Process

2. Building a Linear Regression (30m 40s)
Assembling vectors of features and Model Fitting

3. Training Data (26m 33s)
Training vs Test and Holdout Data, Using data from Kaggle, RMSE and R² tests

4. Model Fitting Parameters (25m 41s)
Setting Linear Regression Parameters

5. Feature Selection (36m 23s)
Correlation of features, Identifying duplicate features, Data preparation

6. Non-Numeric Data (25m 48s)
Using One-Hot Encoding and Vectors

7. Pipelines (19m 42s)
How to build a pipeline in SparkML

8. Case Study (34m 51s)
A full practical exercise

9. Logistic Regression (26m 12s)
True and False Positives and Negatives, Coding a Logistic Regression Model

10. Decision Trees (46m 21s)
Building a decision tree model, Interpreting a tree and Random Forests

11. Unsupervised Learning: K-Means Clustering (10m 49s)
K-Means Clustering and how to implement it in SparkML

12. Recommender Systems (29m 7s)
Matrix Factorisation and how to build a model in SparkML

