Professional Documents
Culture Documents
Supervised Learning: Andreas Müller
Supervised Learning: Andreas Müller
S U P E R V I S E D L E A R N I N G W I T H S C I K I T- L E A R N
Andreas Müller
Core developer, scikit-learn
What is machine learning?
The art and science of:
Giving computers the ability to learn to make decisions from
data
Examples:
Learning to predict whether an email is spam or not
Example:
Grouping customers into distinct categories (Clustering)
Applications
Economics
Genetics
Game playing
Other libraries
TensorFlow
keras
Hugo Bowne-Anderson
Data Scientist, DataCamp
The Iris dataset
Features:
Petal length
Petal width
Sepal length
Sepal width
Versicolor
Virginica
Setosa
sklearn.datasets.base.Bunch
print(iris.keys())
(numpy.ndarray, numpy.ndarray)
iris.data.shape
(150, 4)
iris.target_names
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
Hugo Bowne-Anderson
Data Scientist, DataCamp
k-Nearest Neighbors
Basic idea: Predict the label of a data point by
Looking at the ‘k’ closest labeled data points
KNeighborsClassifier(algorithm='auto', leaf_size=30,
metric='minkowski',metric_params=None, n_jobs=1,
n_neighbors=6, p=2,weights='uniform')
iris['data'].shape
(150, 4)
iris['target'].shape
(150,)
X_new.shape
(3, 4)
print('Prediction: {}’.format(prediction))
Prediction: [1 1 0]
Hugo Bowne-Anderson
Data Scientist, DataCamp
Measuring model performance
In classi cation, accuracy is a commonly used metric
knn.score(X_test, y_test)
0.9555555555555556
1 Source: Andreas Müller & Sarah Guido, Introduction to Machine Learning with
Python