Week 9
This class
• Discuss the main categories of machine learning algorithms.
• Perform simple and multiple linear regression on rectangular (tabular) datasets.
• Interpret the results of a linear regression.
Core Branches of Artificial Intelligence
• Supervised machine learning
o Refers to models trained on labeled datasets, which allows them to become more
accurate over time. For example, an algorithm could be trained on pictures of dogs and
other objects, all labeled by humans, and the machine would learn to identify pictures of dogs
on its own. Supervised machine learning is the most common type in use today.
• Unsupervised machine learning
o Refers to models that look for patterns in unlabeled data. Unsupervised machine learning can find
patterns or trends that people aren’t explicitly looking for. For example, an unsupervised machine
learning program could look through online sales data and identify different types of clients making
purchases.
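As a hedged illustration of the unsupervised case described above, a clustering algorithm such as k-means can group unlabeled points on its own. The library call (scikit-learn's KMeans) is real, but the data and cluster count below are invented for demonstration:

```python
# Minimal sketch of unsupervised learning (assumes scikit-learn is installed):
# KMeans groups unlabeled points into clusters with no human-provided labels.
import numpy as np
from sklearn.cluster import KMeans

# Two visually obvious groups of 2-D points; note there are no labels.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

# Points in the same group receive the same cluster label.
print(labels)
```

The algorithm discovers the two groups itself, much like the sales-data example: it finds structure nobody explicitly asked for.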
Supervised machine learning contd.
Supervised learning can be separated into two types of problems when data
mining—classification and regression:
Classification uses an algorithm to assign test data to specific categories. It
recognizes specific entities within the dataset and attempts to draw conclusions about how
those entities should be labeled or defined. Common classification algorithms include linear
classifiers, support vector machines (SVM), decision trees, k-nearest neighbors, and random
forests.
The process of training a model on data where the outcome is known, for
subsequent application to data where the outcome is not known, is termed
supervised learning.
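As a small sketch of classification, one of the algorithms listed above (a decision tree) can be trained on labeled data and then applied to data where the outcome is unknown. The feature (animal weight) and labels below are made-up assumptions for illustration only:

```python
# Hedged classification sketch (assumes scikit-learn): train on labeled
# examples, then predict labels for unseen cases.
from sklearn.tree import DecisionTreeClassifier

# Toy labeled dataset: feature = [weight in kg], label = 0 (cat) or 1 (dog).
X_train = [[3.0], [4.0], [5.0], [20.0], [25.0], [30.0]]
y_train = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)          # training on data where the outcome is known

# Applying the model to data where the outcome is not known.
print(clf.predict([[4.5], [28.0]]))
```

This mirrors the definition above: the model is trained where the outcome is known, then applied where it is not.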
Regression is the other main type of supervised learning problem.
Simple linear regression provides a model of the relationship between the
magnitude of one variable and that of a second—for example, as X increases, Y
also increases. Or as X increases, Y decreases.
Key Terms for Simple Linear Regression
Response: The variable we are trying to predict; also called the dependent variable, Y
variable, target, or outcome.
Record: The vector of predictor and outcome values for a specific individual or case.
Synonyms: row, case, instance, example.
Intercept: The intercept of the regression line; that is, the predicted value of Y when X =
0.
Predicted values: The estimates Ŷi obtained from the regression line; also called fitted
values.
Residuals: The differences between the observed values and the fitted values; also
called errors.
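The key terms above can be made concrete with a short NumPy sketch. The data are invented for illustration; the fit uses NumPy's degree-1 polynomial least-squares routine:

```python
# Numeric illustration of the key terms: intercept, predicted (fitted)
# values, and residuals. Data are made up for demonstration.
import numpy as np

# Records: predictor X and observed outcome Y for five cases.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the line Y = b0 + b1*X by least squares (np.polyfit returns
# coefficients highest degree first: slope, then intercept).
b1, b0 = np.polyfit(X, Y, 1)

predicted = b0 + b1 * X     # predicted values, Y-hat
residuals = Y - predicted   # observed minus fitted ("errors")

print(round(b0, 3), round(b1, 3))
```

A useful check: with an intercept in the model, the residuals from a least-squares fit always sum to (numerically) zero.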
Key Regression Terms continued
• Simple linear regression: A regression analysis for which any one unit change in
the independent variable, x, is assumed to result in the same change in the
dependent variable, y.
• Multiple linear regression: A regression analysis involving two or more
independent variables.
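As an illustrative sketch of multiple linear regression with two independent variables, NumPy's least-squares solver can recover the coefficients. The data below are fabricated so the true coefficients are known in advance:

```python
# Multiple linear regression sketch: two predictors plus an intercept,
# solved with NumPy's least-squares routine. Data are invented.
import numpy as np

# Outcome generated exactly as y = 1 + 2*x1 + 3*x2.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix: a column of ones (for the intercept), then x1 and x2.
A = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(coef, 3))  # intercept, coefficient on x1, coefficient on x2
```

Because the data were generated from the model exactly, the fit recovers the intercept 1 and coefficients 2 and 3.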
Regression continued
• Simple linear regression estimates how much Y will change when X changes
by a certain amount. With the correlation coefficient, the variables X and Y
are interchangeable. With regression, we are trying to predict the Y variable
from X using a linear relationship (i.e., a line):
Y = b0 + b1*X
Fitting the Regression Line: Least Squares
• When there is a clear relationship, you could imagine fitting the line by hand.
In practice, the regression line is the estimate that minimizes the sum of
squared residual values, also called the residual sum of squares or
RSS:
RSS = Σ (Yi − Ŷi)^2
• The method of minimizing the sum of the squared residuals is termed least
squares regression, or ordinary least squares (OLS) regression.
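The least-squares idea can be sketched directly in code: for simple regression, the slope and intercept that minimize the RSS have well-known closed forms. The data below are made up for illustration:

```python
# Ordinary least squares by hand. The minimizing coefficients are:
#   b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
#   b0 = y_bar - b1 * x_bar
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# The quantity being minimized: the residual sum of squares.
rss = np.sum((y - (b0 + b1 * x)) ** 2)
print(round(b0, 3), round(b1, 3), round(rss, 6))
```

Any other choice of b0 and b1 on these data would give a larger RSS; that is what "least squares" means.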
Prediction Versus Explanation (Profiling)
Explanation
• Historically, a primary use of regression was to illuminate a supposed linear
relationship between predictor variables and an outcome variable. The goal
has been to understand a relationship and explain it using the data that the
regression was fit to, i.e., focusing primarily on the model coefficients.
• Economists want to know the relationship between consumer spending and
GDP growth. Public health officials might want to understand whether a
public information campaign is effective. In such cases, the focus is not on
predicting individual cases but rather on understanding the overall
relationship among variables.
Prediction Versus Explanation (Profiling) cont.
Prediction
• With the advent of big data, regression is widely used to form a model to
predict individual outcomes for new data (i.e., a predictive model) rather than
to explain data in hand. In this instance, the main items of interest are the fitted
values Ŷ.
• In marketing, regression can be used to predict the change in revenue in
response to the size of an ad campaign. Universities use regression to
predict students’ GPA based on their SAT scores.
Linear Regression in Python
Scikit-Learn
• In this section we will look at how to use the scikit-learn library for performing machine
learning analysis in Python.
Most commonly, the steps in using the Scikit-Learn estimator API are as follows (we will step through a
handful of detailed examples in the sections that follow, using the example of a linear regression model):
1. Choose a class of model by importing the appropriate estimator class from Scikit-Learn.
2. Choose model hyperparameters by instantiating this class with the desired values. The hyperparameters
may differ depending on the type of model chosen.
3. Fit the model to your data by calling the fit() method of the model instance. All model fitting takes place
at this step.
Sklearn cont.
4. Apply the model to new data:
1. For supervised learning, we often predict labels for unknown data using the
predict() method. Here, for a new scenario, the model predicted a value of approximately
$11.
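The four steps above can be put together in one end-to-end sketch. The data here are invented (an ad-spend vs. revenue framing chosen as an assumption so the prediction comes out near $11, echoing the figure above):

```python
# End-to-end Scikit-Learn estimator workflow (assumes scikit-learn installed).
import numpy as np
from sklearn.linear_model import LinearRegression  # 1. choose a model class

# Made-up training data: ad spend (thousands of $) vs. revenue (thousands of $),
# generated exactly from y = 1 + 2x.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()     # 2. instantiate with desired hyperparameters
model.fit(X_train, y_train)    # 3. fit the model to the data

# 4. apply the model to new data with predict().
prediction = model.predict(np.array([[5.0]]))
print(prediction)
```

On these made-up data the fitted line is y = 1 + 2x, so the predicted revenue at x = 5 is 11 (i.e., about $11 thousand).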