Professional Documents
Culture Documents
Learning with
Spark
Jesús Montes
jesus.montes@upm.es
Nov. 2022
Machine Learning with Spark
What is Machine Learning? (just in case…)
● Machine Learning
○ Related to/part of the AI field and Data Science
○ Providing machines with new, advanced capabilities
● Examples:
○ Database analysis
○ Autonomous driving
○ Handwriting recognition
○ Natural language processing
○ Product recommendation
● Study and analysis of the human learning process
Values
● At the end of the learning process, we ● Training set: Used to train the model.
obtain a performance measurement for ● Test set: Used to evaluate the model.
the data used during the training
(training dataset). The test set is only used at the end, and the
resulting value of the performance measure is
WARNING: overfitting the final performance of our model.
● The model created could only be able to The test set must never be used for training.
correctly predict the values it has
already “seen” (the training dataset).
So far, we have learned how Spark can be used to load and process data, but
Spark can be used for many steps of a Data Science project:
● Data load
● Processing/cleaning
● Model training
● Model evaluation
The Spark component that provides Machine Learning tools is called MLlib.
println(s"Coefficients: ${lrModel.coefficients}")
println(s"Intercept: ${lrModel.intercept}")