Professional Documents
Culture Documents
The term "machine learning" is sometimes thrown around as if it is some kind of magic pill: apply machine
learning to your data, and all your problems will be solved! As you might expect, the reality is rarely this
simple.
• To introduce the fundamental vocabulary and concepts of machine
learning.
• To introduce the Scikit-Learn API and show some examples of its use.
• To take a deeper dive into the details of several of the most important
machine learning approaches, and develop an intuition into how they
work and when and where they are applicable.
What Is Machine Learning?
• Machine learning is often categorized as a subfield of artificial
intelligence, but I find that categorization can often be misleading at
first brush.
Target array:
In addition to the feature matrix X, we also generally work with a label or target
array, which by convention we will usually call y. The target array is usually one
dimensional, with length n_samples, and is generally contained in a NumPy array
or Pandas Series. The target array may have continuous numerical values, or
discrete classes/labels
Data Representation in Scikit-Learn
Finally, let's visualize the results by plotting first the raw data, and then this
model fit:
Supervised learning example:
Iris classification Naive Bayes Classification
Given a model trained on a portion of the Iris data, how well can we predict the remaining labels?
With the data arranged, we can follow our recipe to predict the labels:
As before, we will add the cluster label to the Iris DataFrame and use Seaborn to plot the results:
Unsupervised learning:
Dimensionality reduction Isomap
We see that the projected data is now two-dimensional. Let's plot this data to see if we can learn
anything from its structure:
• We have covered the essential features of the Scikit-Learn data
representation, and the estimator API.
• Regardless of the type of estimator, the same
import/instantiate/fit/predict pattern holds.
• Armed with this information about the estimator API.
https://scikit-learn.org/
Hyperparameters and Model Validation
Model validation the wrong way
Choose a model and hyperparameters and use a k-neighbors classifier with n_neighbors=1
Train the model, and use it to predict labels for data we already know:
Asia University
Health|Care|Innovation|Excellence