
# This Python 3 environment comes with many helpful analytics libraries installed

# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python

# For example, here's several helpful packages to load in

import numpy as np # linear algebra

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.

# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

from subprocess import check_output

print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.

ex1data2.txt

Machine Learning #1: Linear Regression in the beginning

This tutorial series is for absolute beginners in machine learning, and for those who want to review the fundamentals and practice building the algorithms from scratch.

What is Machine Learning?

"Machine Learning is the science (and art) of programming computers so they can learn from data "

Aurelion Geron, 2017

Types of Machine Learning Systems

reference 1: Hands-On Machine Learning with Scikit-Learn and TensorFlow

reference 2: Machine Learning, Stanford University by Andrew Ng


There are many different types of Machine Learning systems, so it is usually best to classify them into broad categories based on:

* Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)

* Whether or not they can learn incrementally on the fly (online versus batch learning)

* Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning)

Let's look at the very first criterion a bit more closely...

Supervised/Unsupervised Learning

Machine Learning systems are usually classified according to the amount and type of supervision
they get during training. There are four major categories: supervised learning, unsupervised learning,
semisupervised learning, and Reinforcement Learning.

Let us tackle Supervised Learning for now

Supervised Learning

In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.

A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.

Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression. To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).

Let's try Regression!


Linear Regression: Univariate

Let's start with a very simple linear regression task using a sample dataset called Portland Housing Prices, wherein we are given some features of a house (e.g. area, no. of rooms) and must predict the target price.

To make things much simpler, let us use only one feature, or in this case one variable; this is also known as univariate linear regression. That is, we are only going to use the 'Area' of a given house to train a linear model.

Let's get the data and examine it!

#importing dependencies

import numpy as np #python library for scientific computing

import pandas as pd #python library for data analysis and dataframes

data = pd.read_csv('../input/ex1data2.txt', header=None)

data.head()

      0  1       2
0  2104  3  399900
1  1600  3  329900
2  2400  3  369000
3  1416  2  232000
4  3000  4  539900

The data itself does not contain feature names or labels, so let's set that up first. According to the source, the first column is the size of the house in sq. ft., followed by the no. of bedrooms, and lastly the price.

data.columns = ['Size', 'Bedroom', 'Price']

data.head()

   Size  Bedroom   Price
0  2104        3  399900
1  1600        3  329900
2  2400        3  369000
3  1416        2  232000
4  3000        4  539900

Let us remove the 'Bedroom' feature since we are doing univariate linear regression.

data.drop('Bedroom', axis=1, inplace=True)

data.head()

   Size   Price
0  2104  399900
1  1600  329900
2  2400  369000
3  1416  232000
4  3000  539900

#data = data.sample(frac=1)

#data.head()

Now that looks much simpler! Let's plot our data to get a sense of how well a linear model could fit it.

# necessary dependencies for plotting

import matplotlib.pyplot as plt #python library for plot and graphs

%matplotlib inline

plt.plot(data.Size, data.Price, 'r.')

plt.show()

From the plot we can see that there is a high correlation between housing area and housing price (obviously), and therefore we could use a line (a linear model) to fit this data.

# another way to test the correlation

data.corr()

          Size     Price
Size   1.000000  0.854988
Price  0.854988  1.000000

Linear Model

The idea of linear regression is to fit a line to a set of points. So let's use the line function given by:

$f(x) = y = mx + b$

where $m$ is the slope and $b$ is our y-intercept, or, in a more general form (multiple variables),

$h(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

such that for a single variable, where $n = 1$,

$h(x) = \theta_0 + \theta_1 x_1$

when $x_0 = 1$,

where the $\theta$ values are our parameters (intercept and slope) and $h(x)$ is our hypothesis, or predicted value.
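To make the hypothesis concrete before we wrap it in a class, here is a minimal sketch in plain NumPy; the θ values below are made up purely for illustration:

import numpy as np

def hypothesis(theta, x1):
    # h(x) = theta_0 * x_0 + theta_1 * x_1, with x_0 = 1
    return theta[0] + theta[1] * x1

theta = np.array([2.0, 3.0])  # hypothetical intercept and slope
print(hypothesis(theta, 2))   # 2 + 3*2 = 8.0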

class LinearModel():
    def __init__(self, features, target):
        self.X = features
        self.y = target

    def GradDesc(self, parameters, learningRate, cost):
        self.a = learningRate
        self.c = cost
        self.p = parameters
        return self.a, self.Cost(self.c), self.p

    def Cost(self, c):
        # placeholder returns for now; the real cost functions come later
        if c == 'RMSE':
            return self.y
        elif c == 'MSE':
            return self.X

X = 1
y = 0

a = LinearModel(5, 4)
print(a.GradDesc(2, 0.01, 'MSE'))
print(a.Cost('RMSE'))

(0.01, 5, 2)
4
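Note that the Cost method above is only a stub so far. As a hedged sketch of what an MSE cost could eventually compute, consider the function below; the prediction and target arrays are hypothetical, not taken from the housing data:

import numpy as np

def mse(predictions, targets):
    # mean squared error: the average of the squared residuals
    return np.mean((predictions - targets) ** 2)

print(mse(np.array([8.0, 11.0, 14.0]), np.array([7.0, 12.0, 14.0])))  # 0.666...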

Matrix Math

As it turns out, using matrices and vectors is actually very convenient for these types of problems (stating the obvious). To demonstrate that, let's work through an example:

# given a matrix A (3x2) and a matrix B (2x1)

A = np.array([[1, 2],
              [1, 3],
              [1, 4]])

B = np.array([[2], [3]])

print('A =')

print(A,'\nsize =',A.shape)

print('\nB =')

print(B,'\nsize =',B.shape)

A =
[[1 2]
 [1 3]
 [1 4]]
size = (3, 2)

B =
[[2]
 [3]]
size = (2, 1)
Suppose A is our feature matrix X and B is our parameter matrix θ, that is,

$X = \begin{bmatrix} 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad \theta = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$

Remember that we have our linear model

$h(x) = \theta_0 x_0 + \theta_1 x_1$

We know that

$X_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad X_1 = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}, \quad \theta = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$

then we can actually use the matrix dot product to do the multiplication and the addition at the same time (and faster):

$H = \begin{bmatrix} \theta_0 X_{00} + \theta_1 X_{01} \\ \theta_0 X_{10} + \theta_1 X_{11} \\ \theta_0 X_{20} + \theta_1 X_{21} \end{bmatrix} = \begin{bmatrix} \theta_0 + \theta_1 X_{01} \\ \theta_0 + \theta_1 X_{11} \\ \theta_0 + \theta_1 X_{21} \end{bmatrix} = \begin{bmatrix} 2 + 3(2) \\ 2 + 3(3) \\ 2 + 3(4) \end{bmatrix} = \begin{bmatrix} 8 \\ 11 \\ 14 \end{bmatrix}$

can be as simple as

$H = X \cdot \theta$

Yes, that is the power of Matrices!
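We can verify the worked example directly with NumPy; X and theta below are simply the matrices A and B from the earlier cell, renamed to match the math:

import numpy as np

X = np.array([[1, 2],
              [1, 3],
              [1, 4]])   # feature matrix; the first column is x0 = 1
theta = np.array([[2],
                  [3]])  # parameter vector

H = X.dot(theta)
print(H)
# [[ 8]
#  [11]
#  [14]]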
