ML 1,2 - Merged
Unit 1: Introduction to ML
Lec.-1: Dissemination of Institute & department vision-mission, PEO, POs, PSO, COs & POs
mapping
2
Lecture 1
Dissemination of
Institute & department
vision-mission, PEO, POs,
PSO,
COs & POs mapping
Institute vision
To foster and permeate higher and quality education with value added
engineering, technology programs, providing all facilities in terms of technology
and platforms for all round development with social awareness and nurture the
youth with international competencies and exemplary level of employability
even under highly competitive environment so that they are innovative,
adaptable and capable of handling problems faced by our country and world
at large.
Source: dypatil.edu/engineering/vision-mission-goal.php
4
Institute vision
RAIT’s firm belief in a new form of engineering education that lays equal stress on
academics and leadership building extracurricular skills has been a major
contribution to the success of RAIT as one of the most reputed institutions of higher
learning. The challenges faced by our country and the world in the 21st century needs
a whole new range of thoughts and action leaders, which a conventional educational
system in engineering disciplines are ill equipped to produce. Our reputation in
providing good engineering education with additional life skills ensures that high
grade and highly motivated students join us. Our laboratories and practical
sessions reflect the latest that is being followed in the industry. The project
works and summer internships make our students adept at handling the real-life
problems and be industry ready. Our students are well placed in the industry and
their performance make reputed companies visit us with renewed demands and vigor.
Source: dypatil.edu/engineering/vision-mission-goal.php
5
Institute mission
The Institution is committed to mobilize the resources and equip itself with men
and materials of excellence, thereby ensuring that the Institution becomes a
pivotal center of service to Industry, Academy, and society with the latest
technology. RAIT engages different platforms such as technology enhancing Student
Technical Societies, Cultural platforms, Sports excellence centers,
Entrepreneurial Development Centers and a Societal Interaction Cell. To develop
the college to become an autonomous institution & deemed university at the
earliest, we provide facilities for advanced research and development programs on
par with international standards. We also seek to invite international and reputed
national Institutions and Universities to collaborate with our institution on the
issues of common interest of teaching and learning sophistication.
Source: dypatil.edu/engineering/vision-mission-goal.php
6
Institute mission
The Institute is working closely with all stakeholders, like industry and
academia, to foster knowledge generation, acquisition, and dissemination using
the best available resources to address the great challenges being faced by our
country and World. RAIT is fully dedicated to provide its students skills that
make them leaders and solution providers and are industry ready when
they graduate from the Institution.
Source: dypatil.edu/engineering/vision-mission-goal.php
7
Department of Computer Engineering vision
Source: dypatil.edu/engineering/vision-of-the-dept.php
8
Department of Computer Engineering mission
• To mobilize the resources and equip the institution with men and materials
of excellence to provide knowledge and develop technologies in the thrust
areas of computer science and Engineering.
• To collaborate with IITs, reputed universities and industries for the technical
and overall upliftment of students for continuing learning and entrepreneurship.
Source: dypatil.edu/engineering/mission-of-the-department.php
9
Program Outcomes (POs)
Source: dypatil.edu/engineering/pdf/po-co-anyalsis.pdf
10
POs
Source: dypatil.edu/engineering/pdf/po-co-anyalsis.pdf
11
POs
Source: dypatil.edu/engineering/pdf/po-co-anyalsis.pdf
12
Course Outcomes (COs)
14
Course scheme
Teaching Scheme (Hrs.): Theory 03, Practical –, Tutorial –; Credits: Theory 03, TW/Pract. –, Total 03

Examination Scheme (Theory):
Sub code | Subject Name     | IA Test1 (out of 20) | IA Test2 (out of 20) | IA Avg. | Mid Sem | End Sem | Exam Duration | Pract. and Oral | Oral | Total
CEC601   | Machine Learning | 20                   | 20                   | 20      | 20      | 60      | 2 Hr.         | –               | –    | 100
15
Course Lab scheme
Teaching Scheme (Hrs.): Theory –, Practical 02, Tutorial –; Credits: TW/Pract. 01, Total 01

Examination Scheme (Lab):
Sub code | Subject Name         | Test1 | Test2 | Avg. | Mid Sem | End Sem | TW | Pract. and Oral | Oral | Total
CEL601   | Machine Learning Lab | –     | –     | –    | –       | –       | 25 | 25              | –    | 50
16
Syllabus
17
Syllabus
18
Text books
19
Lecture 2
Introduction to
Machine Learning (ML)
What is Machine Learning?
21 Lec-3: Introduction to ML
Machine Learning
What is Machine Learning
25
A machine has the ability to learn if it can improve its
performance by gaining more data.
26
How does Machine Learning work
27
Features of Machine Learning:
28
Need for Machine Learning
29
• The importance of machine learning can be easily understood by its use
cases.
• Currently, machine learning is used in:
–Self-driving cars
– Cyber fraud detection
– Face recognition, and
– Friend suggestions by Facebook
30
Key points which show the importance of Machine Learning:
31
Machine Learning Basics
32 Lec-3: Introduction to ML
Categories of Algorithms
Machine Learning
Supervised Unsupervised Reinforcement
Dimension
Regression Classification Clustering
reduction
33 Lec-3: Introduction to ML
34
Supervised Learning
35
Steps Involved in Supervised Learning:
36
Regression
37
Classification
38
Advantages and Disadvantages of Supervised learning:
Advantages of supervised learning:
• With the help of supervised learning, the model can predict the
output on the basis of prior experience.
• In supervised learning, we can have an exact idea about the classes
of objects.
• Supervised learning models help us to solve various real-world
problems such as fraud detection, spam filtering, etc.
Disadvantages of supervised learning:
• Supervised learning models are not suitable for handling
complex tasks.
• Supervised learning cannot predict the correct output if the test data
is different from the training dataset.
• Training requires a lot of computation time.
• In supervised learning, we need enough knowledge about the classes
of objects.
39
Applications of Supervised Learning
• Image segmentation
• Medical Diagnosis
• Fraud Detection
• Spam detection
• Speech Recognition
40
Unsupervised Machine Learning
41
Unsupervised Machine Learning
42
Why use Unsupervised Learning?
43
Types of Unsupervised Learning Algorithm:
• Clustering:
– Clustering is a method of grouping objects into clusters
(cluster analysis).
– It finds the commonalities between the data points.
• Association:
– An association rule finds relationships between variables in a
large database.
– It determines the sets of items that occur together in the dataset.
44
Advantages and Disadvantage of Unsupervised Learning
45
Reinforcement
46
Supervised vs Unsupervised vs Reinforcement Learning
47 Lec-3: Introduction to ML
Classification vs Regression
48 Lec-3: Introduction to ML
Classification vs Regression
49 Lec-3: Introduction to ML
Lecture 3
Machine
Learning tasks, Issues,
50
Machine learning Task
1. Regression
2. Classification
3. Clustering
4. Transcription
5. Machine translation
6. Anomaly detection
7. Synthesis & sampling
8. Estimation of probability density and probability mass function
9. Similarity matching
10.Co-occurrence grouping
11.Causal modeling
12. Link profiling
Issues in Machine Learning
1 Inadequate Training Data
52
2. Poor quality of data
53
Overfitting and Underfitting
54
• Getting bad recommendations
• Lack of skilled resources
• Customer Segmentation
• Process Complexity of Machine Learning
• Data Bias
• Lack of Explainability
• Slow implementations and results
• Irrelevant features
55
Lecture 4
56
Key Technologies
Gathering data
Data preparation
Choosing a model
Evaluation
Hyper-parameter tuning
Prediction
Gathering data
• The more data we have, the more accurate the prediction will be.
58 Lec-3: Introduction to ML
Data preparation
• Put all data together, and then randomize the ordering of data.
• Missing Values
• Duplicate data
• Invalid data
• Noise
59 Lec-3: Introduction to ML
Choosing a model
• Building models
60 Lec-3: Introduction to ML
Training
61 Lec-3: Introduction to ML
Training
62 Lec-3: Introduction to ML
Evaluation
63 Lec-3: Introduction to ML
Hyper-parameter tuning
• Data is used to find the optimal rules and (hyper) parameters of the
trained model.
• Primary focus is to increase the model efficiency.
64 Lec-3: Introduction to ML
Prediction
• Use testing data and trained model to check for the efficiency as per
the requirement of project or problem.
65 Lec-3: Introduction to ML
Thank You
Machine Learning
Unit 2: Data Preprocessing
Lecture -1: Need of data preprocessing, creating training and test sets
2
Lecture 1
4
Remember…
5
Data Preprocessing in Machine learning
6
Need of Data Pre-processing
7
Data pre-processing
• The process of preparing raw data and making it suitable for a machine
learning model.
• It is the first and crucial step when creating a machine learning project.
We do not always come across clean, formatted data, and before doing any
operation with data it must be cleaned and put into a formatted way; data
pre-processing serves this purpose.
• Real-world data generally contains noise and missing values, and may be in
an unusable format which cannot be directly used for machine learning
models.
• Data pre-processing cleans the data and makes it suitable for a machine
learning model, which also increases the accuracy and efficiency of the
model.
8
Why Preprocess Data?
9
Why Preprocess Data?
10
Data Preprocessing in Machine learning
11
Major Tasks in Data Preparation
• Data discretization
• Part of data reduction but with particular importance, especially for
numerical data
• Data cleaning
• Fill in missing values, smooth noisy data, identify or remove
outliers, and resolve inconsistencies
• Data integration
• Integration of multiple databases, data cubes, or files
• Data transformation
• Normalization and aggregation
• Data reduction
• Obtains reduced representation in volume but produces the same or
similar analytical results
12
TYPES OF DATA
13
Types of Data
• Types of Data: Broader categories
• Discrete
• Continuous
• Types of Measurements:
• Nominal scale
• Categorical scale
• Ordinal scale        } Qualitative
• Interval scale
• Ratio scale          } Quantitative
(Information content increases down the list.)
14
Discrete or Continuous
Types of Measurements: Examples
• Nominal:
15
Data Conversion
• Examples:
• US State Code (50 values)
• Profession Code (7,000 values, but only few frequent)
• Ignore ID-like fields whose values are unique for each record
17
OUTLIERS
18
Outliers
• Approaches:
• Do nothing / treat separately
• Impute / enforce upper and lower bounds
• Delete / let binning (discarding) handle the problem
19
Outlier detection
• Univariate
• Compute mean and std. deviation.
20
MISSING DATA
21
Missing Data
22
How to Handle Missing Data?
• Ignore the tuple: usually done when the class label is missing, as most
prediction methods do not handle missing data well
• Not effective when the percentage of missing values per attribute varies
considerably, as it can lead to insufficient and/or biased sample sizes
• Use only features (attributes) with all values (may leave out important
features)
• Fill in missing values manually: tedious + infeasible?
23
How to Handle Missing Data?
• Use the attribute mean for all samples belonging to the same class to
fill in the missing value
24
How to Handle Missing Data?
• Nearest-Neighbour estimator
• Finding the k neighbours nearest to the point and fill in the most frequent value or
the average value
• Finding neighbours in a large dataset may be slow
25
Missing Values Treatment
How to deal:
o Deletion
o Mean/ Mode/ Median Imputation
o Prediction Model
26
Handling Missing data practically with python
• This strategy is useful for features which have numeric data, such as age,
salary, year, etc. Here, we will use this approach.
• We calculate the mean of the column (or row) that contains missing values
and put that mean in place of each missing value.
• # handling missing data (replacing missing data with the mean value)
• from sklearn.impute import SimpleImputer  # the older Imputer class was removed from scikit-learn
• imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
• # Fitting the imputer object to the independent variables x
• imputer = imputer.fit(x[:, 1:3])
• # Replacing missing data with the calculated mean value
• x[:, 1:3] = imputer.transform(x[:, 1:3])
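As a runnable illustration of the snippet above (the array x and its values are hypothetical), the same mean-imputation can be done end to end with scikit-learn's current SimpleImputer API:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix: columns are [country code, age, salary]; NaN marks missing values.
x = np.array([[0, 44.0, 72000.0],
              [1, 27.0, 48000.0],
              [2, 30.0, 54000.0],
              [1, 38.0, 61000.0],
              [2, np.nan, 52000.0],
              [0, 35.0, np.nan]])

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(x[:, 1:3])          # learn the column means of age and salary
x[:, 1:3] = imputer.transform(x[:, 1:3])  # replace each NaN with its column mean
```

SimpleImputer replaced the older Imputer class (and its axis argument) in newer scikit-learn versions.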
27
Summary
• Every real-world data set needs some kind of data pre-processing
28
Splitting the Dataset into the Training set and Test set
29
Training and Testing Set
30
Links for Useful videos
• https://www.google.com/search?q=data+preprocessing+in+ML+youtube&rlz=1C1C
HZN_enIN1030IN1030&biw=1366&bih=600&tbm=vid&ei=WNPOY-
nQCt66seMPu9S5gAw&ved=0ahUKEwjp3PDCqt78AhVeXWwGHTtqDsAQ4dUDC
A0&uact=5&oq=data+preprocessing+in+ML+youtube&gs_lcp=Cg1nd3Mtd2l6LXZp
ZGVvEAMyBQghEKABOgUIABCiBDoICCEQFhAeEB06BwghEKABEApQtAZYthdg
wBloAHAAeACAAc4BiAHJC5IBBTAuNy4ymAEAoAEBwAEB&sclient=gws-wiz-
video#fpstate=ive&vld=cid:8146a305,vid:4i9aiTjjxHY
• https://www.google.com/search?q=data+preprocessing+in+ML+youtube&rlz=1C1C
HZN_enIN1030IN1030&biw=1366&bih=600&tbm=vid&ei=WNPOY-
nQCt66seMPu9S5gAw&ved=0ahUKEwjp3PDCqt78AhVeXWwGHTtqDsAQ4dUDC
A0&uact=5&oq=data+preprocessing+in+ML+youtube&gs_lcp=Cg1nd3Mtd2l6LXZp
ZGVvEAMyBQghEKABOgUIABCiBDoICCEQFhAeEB06BwghEKABEApQtAZYthdg
wBloAHAAeACAAc4BiAHJC5IBBTAuNy4ymAEAoAEBwAEB&sclient=gws-wiz-
video#fpstate=ive&vld=cid:5f6afa8c,vid:9uvIazKs2uI
31
Lecture 2
32
How Do You Handle Missing Values
• Syntax: data.drop(['Cabin'], axis=1, inplace=True)
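A short sketch of this in standard pandas casing (the DataFrame below is a hypothetical Titanic-style sample):

```python
import numpy as np
import pandas as pd

# Hypothetical sample where 'Cabin' is mostly missing.
data = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Cabin': ['C85', np.nan, np.nan, np.nan],
    'Age': [22, 38, 26, 35],
})

# Drop the column in place when too many values are missing to impute sensibly.
data.drop(['Cabin'], axis=1, inplace=True)
```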
33
How Do You Handle Missing Values
34
Example
• Consider a data set where we collected the ages of the people who attend a yoga class. After
asking all 25 of the people in the class, the data obtained about their ages is the following:
21, 16, 34, 33, 57, 18, 44, 41, 63, 72, 54, 44, 39, 30, 45, 45, 61, 18, 29, 27, 55, 48, 59, 66, 70.
• With this data in mind, let us look at the calculation and definition of mean, median, mode
and range:
35
Example
Mean:
• The mean of a data set is the sum of the values divided by the number of
values in the data set. Notice that, given this definition, the mean is the same
as the arithmetic average of a set of numbers; thus the terms mean and average
are usually used as synonyms.
•
The mean of a data set tries to find the central value of a set by comparing all of
the values in the set and producing the average of them; if all of the values in the
set were to be equal, the mean of this set would be equal to all of them too.
36
How to find the mean:
37
Replacing With Mean / Median
• This method calculates the mean, median, or mode of the feature and
replaces the missing values with it. This approach can be applied to a
feature which has numeric data, like the age of a person or the ticket
fare. Some loss of detail is the trade-off, but it gives better results
compared to deleting rows and columns.
38
How Do You Handle Missing Values
39
Replacing Missing Data With The Most Frequent Values
• Syntax: data['Cabin'].fillna('Unknown')[:10]
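A small hypothetical example of both fills, the sentinel label from the syntax above and the most frequent value (the mode):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Cabin': ['C85', np.nan, 'C85', np.nan, 'E46']})

# Fill with a sentinel label.
filled = data['Cabin'].fillna('Unknown')

# Fill with the most frequent value (the mode) instead.
mode_filled = data['Cabin'].fillna(data['Cabin'].mode()[0])
```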
40
Replacing Missing Data With The Most Frequent Values
41
Handling Categorical Data In Machine Learning
42
Handling Categorical Data In Machine Learning
Ordinal Data :
43
Lecture 3
44
What is data scaling and normalization in machine learning?
• Scaling technique brings data points that are far from each other closer
in order to increase the algorithm effectiveness and speed up the
Machine Learning processing.
• Goal: well-prepared data enables the model to learn and understand the problem.
45
The difference is that:
46
Feature Scaling
47
Feature Scaling
The shape of the data doesn't change; instead of ranging from 0
to 8, it now ranges from 0 to 1.
48
Normalization
• min-max normalization
• z-score normalization
• normalization by decimal scaling
49
Normalization
• min-max normalization:
  v' = ((v − min_v) / (max_v − min_v)) × (new_max_v − new_min_v) + new_min_v
• z-score normalization:
  v' = (v − mean_v) / std_v
50
Age min‐max (0‐1) z‐score dec. scaling
44 0.421 0.450 0.44
35 0.184 ‐0.450 0.35
34 0.158 ‐0.550 0.34
34 0.158 ‐0.550 0.34
39 0.289 ‐0.050 0.39
41 0.342 0.150 0.41
42 0.368 0.250 0.42
31 0.079 ‐0.849 0.31
28 0.000 ‐1.149 0.28
30 0.053 ‐0.949 0.3
38 0.263 ‐0.150 0.38
36 0.211 ‐0.350 0.36
42 0.368 0.250 0.42
35 0.184 ‐0.450 0.35
33 0.132 ‐0.649 0.33
45 0.447 0.550 0.45
34 0.158 ‐0.550 0.34
65 0.974 2.548 0.65
66 1.000 2.648 0.66
38 0.263 ‐0.150 0.38
28 minimum
66 maximum
39.50 average
10.01 standard deviation
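The three columns of the table above can be reproduced with a few lines of NumPy (using the sample standard deviation, matching the slide's 10.01):

```python
import numpy as np

ages = np.array([44, 35, 34, 34, 39, 41, 42, 31, 28, 30,
                 38, 36, 42, 35, 33, 45, 34, 65, 66, 38])

minmax = (ages - ages.min()) / (ages.max() - ages.min())    # min-max (0-1)
zscore = (ages - ages.mean()) / ages.std(ddof=1)            # z-score, sample std
decimal = ages / 10 ** int(np.ceil(np.log10(ages.max())))   # decimal scaling: divide by 10^j
```

For age 44 this gives 0.421, 0.450, and 0.44, the first row of the table.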
Normalization
52
Normalization
53
Some Common Methods To Perform Feature Scaling..
1. Standardization:
54
2. Min-Max Normalization:
The formula for rescaling a range between an arbitrary set of values [a, b] is given
as:
55
3. Unit Vector:
56
Normalization Techniques
57
Summary of normalization techniques
58
Scaling to a range
Scaling means converting floating-point feature values from their natural range
(for example, 100 to 900) into a standard range, usually 0 to 1 (or sometimes
−1 to +1).
Scaling to a range is a good choice when both of the following conditions are
met:
• You know the approximate upper and lower bounds on your data, with few or
no outliers.
• Your data is approximately uniformly distributed across that range.
• A good example is age. Most age values fall between 0 and 90, and every
part of the range has a substantial number of people.
• In contrast, you would not use scaling on income, because only a few people
have very high incomes. The upper bound of the linear scale for income
would be very high, and most people would be squeezed into a small part of
the scale.
59
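A sketch of range scaling with scikit-learn (the values are made up, spanning the 100 to 900 natural range mentioned above):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([[100.0], [250.0], [500.0], [900.0]])   # natural range ~100-900

scaler = MinMaxScaler(feature_range=(0, 1))               # target range [0, 1]
scaled = scaler.fit_transform(values)                     # (v - min) / (max - min)
```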
Feature Clipping
If your data set contains extreme outliers, you might try feature clipping,
which caps all feature values above (or below) a certain value at a fixed value.
For example, you could clip all temperature values above 40 to be exactly 40.
You may apply feature clipping before or after other normalizations.
Formula: set min/max values to avoid outliers.
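The temperature example above, clipping everything over 40 (made-up readings):

```python
import numpy as np

temps = np.array([12.0, 25.0, 38.0, 41.5, 47.0])

# Cap all values above 40 at exactly 40; a_min=None leaves the low end untouched.
clipped = np.clip(temps, a_min=None, a_max=40.0)
```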
60
Comparing a raw distribution and its clipped version.
61
Log Scaling
• Log scaling computes the log of your values to compress a wide range to a
narrow range.
• Log scaling is helpful when a handful of your values have many points, while
most other values have few points. This data distribution is known as
the power law distribution. Movie ratings are a good example.
• In the chart below, most movies have very few ratings (the data in the tail),
while a few have lots of ratings (the data in the head).
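A quick sketch with hypothetical rating counts; log1p (log of 1 + x) is used so that zero counts stay finite:

```python
import numpy as np

# Power-law-like rating counts: most items have few ratings, a few have many.
counts = np.array([0, 1, 3, 10, 100, 10000])
log_scaled = np.log1p(counts)   # compresses the wide range into a narrow one
```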
62
Comparing a raw distribution to its log..
63
Z-Score
• You would use z-score to ensure your feature distributions have mean =
0 and std = 1. It’s useful when there are a few outliers, but not so
extreme that you need clipping.
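A minimal z-score sketch with scikit-learn (toy values; StandardScaler uses the population standard deviation):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
z = StandardScaler().fit_transform(x)   # (x - mean) / std, giving mean 0 and std 1
```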
64
Comparing a raw distribution to its z-score distribution.
z-score squeezes raw values that have a range of ~40000 down into a range from
roughly -1 to +4.
Suppose you're not sure whether the outliers truly are extreme. In this case, start
with z-score unless you have feature values that you don't want the model to learn;
for example, the values are the result of measurement error or a quirk.
65
Summary
66
Lecture 4
67
Feature Selection
68
Feature Selection
69
Benefits of Feature Selection
70
Feature Selection Techniques
• There are mainly two types of Feature Selection techniques, which are:
71
Feature Selection Techniques
72
Lecture 5
Dimension Reduction
73
Dimensionality reduction
74
Dimensionality reduction
75
Advantages of Dimensionality Reduction
76
Disadvantages of Dimensionality Reduction
77
Lecture 6
78
Principal Components Analysis ( PCA)
79
It involves the following steps:
80
Principal Components Analysis ( PCA)
– Face recognition
– Image compression
– Gene expression analysis
81
Principal Components Analysis Ideas ( PCA)
82
Principal Components Analysis Ideas
[Figure: scatter plot of data in the (X1, X2) plane with rotated axes Y1 and Y2.
Note: Y1 is the first eigenvector, Y2 is the second.
Key observation: the variance along Y1 is largest; Y2 is ignorable.]
83
Principal Component Analysis: one attribute first
• Question: how much spread is in the data along the axis (distance to the mean)?
• Variance = (standard deviation)²
  s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
Temperature sample: 42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30
84
Now consider two dimensions
• Covariance measures the correlation between X and Y:
  cov(X, Y) = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)
• cov(X, Y) = 0: independent
• cov(X, Y) > 0: move in the same direction
• cov(X, Y) < 0: move in opposite directions

X = Temperature, Y = Humidity:
(40, 90), (40, 90), (40, 90), (30, 90), (15, 70), (15, 70), (15, 70),
(30, 90), (15, 70), (30, 70), (30, 70), (30, 90), (40, 70), (30, 90)
85
More than two attributes: covariance matrix
86
Eigenvalues & eigenvectors
Example:  [2 3; 2 1] · [3; 2] = [12; 8] = 4 · [3; 2],
so [3; 2] is an eigenvector of the matrix with eigenvalue λ = 4.
87
Eigenvalues & Eigenvectors
• Ax = λx  ⇔  (A − λI)x = 0
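The 2×2 example above can be checked numerically with NumPy:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x = np.array([3.0, 2.0])

# A x = [12, 8] = 4 x, so x is an eigenvector of A with eigenvalue 4.
assert np.allclose(A @ x, 4 * x)

eigvals, eigvecs = np.linalg.eig(A)   # all eigenpairs; here the eigenvalues are 4 and -1
```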
88
Principal components
89
Steps of PCA
90
Eigenvalues
Vⱼ = 100 · λⱼ / Σₓ₌₁ⁿ λₓ
(percentage of total variance explained by the j-th principal component)
91
Principal components - Variance
[Figure: scree plot of variance (%) explained by each principal component
PC1 through PC10, decreasing from about 25% for PC1.]
92
Transformed Data
y_{i1} = e₁ᵀ(x_i − x̄)
y_{i2} = e₂ᵀ(x_i − x̄)
...
y_{ip} = e_pᵀ(x_i − x̄)
93
An Example
Mean1 = 24.1, Mean2 = 53.8

X1   X2   X1' = X1 − Mean1   X2' = X2 − Mean2
19   63        −5.1               9.25
39   74        14.9              20.25
30   87         5.9              33.25
30   23         5.9             −30.75
15   35        −9.1             −18.75
30   73         5.9              19.25

[Figure: scatter plots of the original data (Series1) and the mean-centered data.]
94
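The steps of PCA on the example's six rows can be sketched with NumPy (centering with the slide's stated means, 24.1 and 53.8):

```python
import numpy as np

X = np.array([[19, 63], [39, 74], [30, 87],
              [30, 23], [15, 35], [30, 73]], dtype=float)
mean = np.array([24.1, 53.8])            # means given on the slide

Xc = X - mean                            # 1. center the data
C = np.cov(Xc, rowvar=False)             # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # 3. eigenvalues/eigenvectors (ascending)
order = np.argsort(eigvals)[::-1]        # 4. sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
Y = Xc @ eigvecs                         # 5. project onto the principal components
```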
Example
95
PCA –> Original Data
96
Principal components
97
Applications – Gene expression analysis
98
References
• ‘Data Mining: Concepts and Techniques’, Jiawei Han and Micheline Kamber, 2000
• ‘Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations’, Ian H. Witten
and Eibe Frank, 1999
• ‘Data Mining: Practical Machine Learning Tools and Techniques second edition’, Ian H. Witten and Eibe
Frank, 2005
• DM: Introduction: Machine Learning and Data Mining, Gregory Piatetsky-Shapiro and Gary Parker
• (http://www.kdnuggets.com/data_mining_course/dm1-introduction-ml-data-mining.ppt)
99
Thank You
Subject Name: MACHINE LEARNING
Unit 3: Learning with Regression
Lecture No: 13
Linear Regression
Supervised
Learning Tasks
Week4:
Week 2 Data Science with Machine Learning
What is Regression?
5
Regression….
6
Broad categories of Regression
Regression can be broadly classified into two major types.
• Linear Regression.
o The simplest case of linear regression is to find a
relationship using a linear model (i.e. line) between an
input independent variable (input single feature) and an
output dependent variable.
o This is also called Bivariate Linear Regression.
7
Broad categories of Regression….
• Logistic Regression:
o It is used when the output is categorical. It is more like a
classification problem. The output can be Success / Failure,
Yes / No, True/ False or 0/1. There is no need for a linear
relationship between the dependent output variable and
independent input variables.
8
Scatter Plot
9
Scatter Plot : Example
x 1 2 3 4 5 6 7 8 9 10 11 12
y 16 35 45 64 86 96 106 124 134 156 164 182
10
Linear Regression
11
Regression Line
12
Error in prediction
• The black diagonal line in Figure is the regression line and consists
of the predicted score on Y for each possible value of X. The
vertical lines from the points to the regression line represent the
errors of prediction.
• The error of prediction for a point is the value of the point minus the
predicted value.
13
Error in prediction
Objective: minimize the difference between the observation and
its prediction according to the line.
  y_i = β₀ + β₁x_i + ε_i,   for i = 1, 2, …, n
  ε_i = y_i − ŷ_i = y_i − (β̂₀ + β̂₁x_i)
Week 2
Method of Least Squares
We want the line which is best for all points. This is done by
finding the values of β₀ and β₁ which minimize some sum of
errors. There are a number of ways of doing this. Consider
these two:
  min over (β₀, β₁) of  Σᵢ₌₁ⁿ ε_i²    and    min over (β₀, β₁) of  Σᵢ₌₁ⁿ |ε_i|
Method of Least Squares
'Best fit' means the differences between actual Y values and
predicted Y values are a minimum. But positive differences
offset negative ones, so we square the errors:
  E(β₀, β₁) = Σᵢ₌₁ⁿ ε_i² = Σᵢ₌₁ⁿ (y_i − β₀ − β₁x_i)²
Setting both partial derivatives to zero:
  ∂E/∂β₀ = 0,   ∂E/∂β₁ = 0
Least Squares Graphically
LS minimizes Σᵢ₌₁ⁿ ε_i² = ε₁² + ε₂² + ε₃² + ε₄²
  ŷ_i = β̂₀ + β̂₁x_i
Setting ∂E/∂β₁ = 0 and substituting β₀ = ȳ − β₁x̄:
  Σ 2x_i(y_i − β₀ − β₁x_i) = 0
  Σ x_i(y_i − (ȳ − β₁x̄) − β₁x_i) = 0
  β₁ Σ x_i(x_i − x̄) = Σ x_i(y_i − ȳ)
  β₁ Σ (x_i − x̄)(x_i − x̄) = Σ (x_i − x̄)(y_i − ȳ)
  β̂₁ = SS_xy / SS_xx
SS_xx = Σᵢ₌₁ⁿ (x_i − x̄)² = (x₁ − x̄)² + (x₂ − x̄)² + … + (x_n − x̄)²   (sum of squares of x)
      = Σᵢ₌₁ⁿ x_i² − (Σᵢ₌₁ⁿ x_i)² / n
S_yy  = Σᵢ₌₁ⁿ (y_i − ȳ)² = (y₁ − ȳ)² + (y₂ − ȳ)² + … + (y_n − ȳ)²   (sum of squares of y)
      = Σᵢ₌₁ⁿ y_i² − (Σᵢ₌₁ⁿ y_i)² / n
  β̂₁ = S_XY / S_XX
  β̂₀ = ȳ − β̂₁x̄
y_i = β₀ + β₁x_i + ε_i,   for i = 1, 2, …, n
Where,
  y – dependent variable
  x – independent (explanatory) variable
  β₀ – intercept
  β₁ – slope
  ε – residual (error)
21
Linear Regression : In Simpler Form
• Regression model :
22
Linear Regression : Example
1. The following data pertain to number of computer jobs per day and the
central processing unit (CPU) time required.
23
Linear Regression : Example
Answer: ŷ = 2x
Exercise
2. The following table shows the midterm and final exam grades obtained for
students in a database course. Use the method of Least squares using
regression to predict the final exam grade of a student who received 80 on
the midterm exam. Midterm Exam (X) Final Exam (Y)
72 84
50 63
81 77
74 78
94 90
86 75
59 49
83 79
65 77
33 52
88 74
81 90
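Exercise 2 can be checked directly from the least-squares formulas (β̂₁ = SS_xy/SS_xx, β̂₀ = ȳ − β̂₁x̄):

```python
import numpy as np

x = np.array([72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81], dtype=float)  # midterm
y = np.array([84, 63, 77, 78, 90, 75, 49, 79, 77, 52, 74, 90], dtype=float)  # final

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
pred_80 = b0 + b1 * 80   # predicted final grade for a midterm of 80
```

This gives a slope of roughly 0.58 and an intercept near 32, so the predicted final grade is about 78.6.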
25
Exercise
3. A clinical trial gave the following data about the BMI and cholesterol level
of 10 patients. Predict the likely value of cholesterol level for a patient who
has a BMI of 27.
BMI Cholesterol
17 140
21 189
24 210
28 240
14 130
16 100
19 135
22 166
15 130
18 170
26
Exercise
27
Let’s revise through a small video
https://www.youtube.com/watch?v=zPG4NjIkCjc
28
Multivariable/Multivariate Regression
30
Steps for Multivariate Regression
31
Multivariable regression
Solved Problem:
https://www.statology.org/multiple-linear-regression-by-hand/
32
Applications of Regression
• Forecasting continuous outcomes like house prices, stock prices, or
sales.
• Predicting the success of future retail sales or marketing campaigns to
ensure resources are used effectively.
• Predicting customer or user trends, such as on streaming services or
ecommerce websites.
• Analysing datasets to establish the relationships between variables and an
output.
• Predicting interest rates or stock prices from a variety of factors.
• Creating time series visualisations.
33
Real Life Examples on Linear Regression
1. Businesses often use linear regression to understand the relationship
between advertising spending and revenue.
• The regression model would take the following form:
revenue = β0 + β1(ad spending)
• The coefficient β0 would represent total expected revenue when ad
spending is zero.
• The coefficient β1 would represent the average change in total revenue
when ad spending is increased by one unit (e.g. one dollar).
• If β1 is negative, it would mean that more ad spending is associated with
less revenue.
• If β1 is close to zero, it would mean that ad spending has little effect on
revenue.
• And if β1 is positive, it would mean more ad spending is associated with
more revenue.
• Depending on the value of β1, a company may decide to either
decrease or increase their ad spending.
34
Real Life Examples on Linear Regression
2. Medical researchers often use linear regression to understand the
relationship between drug dosage and blood pressure of patients.
• The regression model would take the following form:
blood pressure = β0 + β1(dosage)
• The coefficient β0 would represent the expected blood pressure when
dosage is zero.
• The coefficient β1 would represent the average change in blood
pressure when dosage is increased by one unit.
• If β1 is negative, it would mean that an increase in dosage is associated
with a decrease in blood pressure.
• If β1 is close to zero, it would mean that an increase in dosage is
associated with no change in blood pressure.
• If β1 is positive, it would mean that an increase in dosage is associated
with an increase in blood pressure.
• Depending on the value of β1, researchers may decide to change the
dosage given to a patient.
35
Real Life Examples on Linear Regression
3. Agricultural scientists often use linear regression to measure the effect of
fertilizer and water on crop yields.
• The regression model would take the following form:
crop yield = β0 + β1(amount of fertilizer) + β2(amount of water)
• The coefficient β0 would represent the expected crop yield with no
fertilizer or water.
• The coefficient β1 would represent the average change in crop yield
when fertilizer is increased by one unit, assuming the amount of water
remains unchanged.
• The coefficient β2 would represent the average change in crop yield
when water is increased by one unit, assuming the amount of fertilizer
remains unchanged.
• Depending on the values of β1 and β2, the scientists may change the
amount of fertilizer and water used to maximize the crop yield.
36
Real Life Examples on Linear Regression
4. Data scientists for professional sports teams often use linear regression to
measure the effect that different training regimens have on player
performance.
• For example, data scientists in the NBA might analyze how different
amounts of weekly yoga sessions and weightlifting sessions affect the
number of points a player scores.
• The regression model would take the following form:
points scored = β0 + β1(yoga sessions) + β2(weightlifting
sessions)
• The coefficient β0 would represent the expected points scored for a
player who participates in zero yoga sessions and zero weightlifting
sessions.
37
Unit No: 3  Unit Name: Learning with Regression
Lecture No: 14
Logistic Regression
Logistic regression
introduction
• Logistic regression models a relationship between predictor
variables and a categorical response variable.
Week 2
Why not Linear Regression?
Week 2
Why not Linear
Regression?
1. We cannot use any of the well-established routines for statistical
inference with least squares (e.g., confidence intervals, etc.),
because these are based on a model in which the outcome is
continuously distributed. At an even more basic level, it is hard to
precisely interpret β
Week 2
Why not Linear
Regression?
Week 2
Logistic regression
The y is usually a yes/no type of response.
Week 2
Logistic regression
For a more general case, involving multiple independent variables, x, there is:
logit = 𝑏0 + 𝑏1 𝑥1 + 𝑏2 𝑥2 +…+ 𝑏𝑛 𝑥𝑛
The logit is the logarithm of the odds of the response, y, expressed as a function of
independent or predictor variables, x, and a constant term.
Week 2
Logistic regression
The problem here is that the range is restricted and we don’t want a restricted range
because if we do so then our correlation will decrease.
It is difficult to model a variable that has a restricted range. To control this we take
the log of odds which has a range from (-∞,+∞).
Logit function :
This formulation is also useful for interpreting the model, since the logit can be
interpreted as the log odds of a success
Exponentiating both sides and solving for P gives the sigmoid form:
P = 1 / (1 + e^-(b0 + b1x1 + ... + bnxn))
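A minimal sketch of the logit and its inverse, the sigmoid; the probability 0.8 is just an illustrative value:

```python
import math

def logit(p):
    # log odds of a probability p
    return math.log(p / (1 - p))

def sigmoid(z):
    # inverse of the logit: solving logit(p) = z for p
    return 1 / (1 + math.exp(-z))

# sigmoid undoes logit, so a probability round-trips through the log-odds scale
p = 0.8
z = logit(p)
print(z, sigmoid(z))
```

Note that logit(0.5) = 0, i.e. a probability of one half corresponds to even odds.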
Linear Vs Logistic regression
Logistic regression- Example
Logistic regression – Exercise
The dataset of amount of savings and loan non-defaulter status is given in the
table below. Find the sigmoid function values for logistic regression.
Log odds = -4.0778 + 1.5046 * (amount of savings)
Calculate the probability of loan non-default for an amount of savings of 2.5.

X (Amount of savings)   Y (Loan Non Defaulter)
0.5                     0
1.0                     0
2.0                     1
2.5                     0
4.0                     1
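A short sketch that evaluates the given log-odds model on the table above and computes the requested probability at a savings amount of 2.5:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Coefficients of the fitted log-odds model from the exercise
b0, b1 = -4.0778, 1.5046

savings = [0.5, 1.0, 2.0, 2.5, 4.0]
for x in savings:
    z = b0 + b1 * x          # log odds at this savings amount
    p = sigmoid(z)           # probability of loan non-default
    print(f"savings={x}: p={p:.4f}")

# Requested value: probability of non-default at savings = 2.5
p_25 = sigmoid(b0 + b1 * 2.5)
```

At 2.5 the log odds are slightly negative, so the probability comes out a little below one half.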
Unit No: 3 Unit Name : Learning with Regression
Lecture No: 15
Evaluation Metrics for
Regression
Model Evaluation
• Model evaluation helps you to understand the
performance of your model and makes it easy to
present your model to others.
• There are 3 main metrics for model evaluation in
regression:
1. R Square / Adjusted R Square
2. Mean Square Error (MSE) / Root Mean Square Error (RMSE)
3. Mean Absolute Error (MAE)
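A minimal sketch of MSE, RMSE, and MAE on made-up actual and predicted values (the numbers are purely illustrative):

```python
import math

y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical actual values
y_pred = [2.5, 5.5, 6.5, 9.5]   # hypothetical model predictions

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

mse = sum(e ** 2 for e in errors) / n   # mean of squared errors
rmse = math.sqrt(mse)                   # back on the scale of y
mae = sum(abs(e) for e in errors) / n   # mean of absolute errors

print(mse, rmse, mae)
```

Here every error is ±0.5, so MSE = 0.25 while RMSE and MAE both equal 0.5; RMSE is preferred when large errors should be penalized more.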
R Square / Adjusted R Square
• R Square measures how much variability in dependent
variable can be explained by the model.
• It is the square of the Correlation Coefficient(R) and that is
why it is called R Square.
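R Square can equivalently be computed as 1 − SSres/SStot (for a least-squares fit with an intercept this equals the squared correlation coefficient). A sketch on made-up actual and predicted values:

```python
y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical actual values
y_pred = [2.5, 5.5, 6.5, 9.5]   # hypothetical model predictions

mean_y = sum(y_true) / len(y_true)
# residual sum of squares: variability the model fails to explain
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
# total sum of squares: variability around the mean
ss_tot = sum((t - mean_y) ** 2 for t in y_true)

r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

Here SSres = 1 and SStot = 20, so R² = 0.95: the model explains 95% of the variability in y.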
Thank You
Subject Name: MACHINE LEARNING
Unit No: 3 Classification
Faculty Name: Ms. Rajashree Shedge
Index
Lecture 16 – Introduction to NN
Lecture No: 16
Introduction to NN
Human Brain
Human brain:
one hundred billion (100,000,000,000) neurons
each with about 1000 synaptic connections
INTERCONNECTIONS IN BRAIN
Biological Neural Network (Visualization)
Biological neuron
Artificial Neural Networks
• ANN models are typically around 85-90% accurate on the tasks they are trained for.
Definition of Neural Networks
[Figure: a single artificial neuron with weighted inputs (w1, w2, ...) whose output is computed from the weighted sum of its inputs]
Components of Neural Networks
Basic models of ANN are specified by three entities:
o Interconnections
o Learning rules
o Activation function
Basic Models of ANN
The arrangement of neurons to form layers and the connection pattern formed
within and between layers is called the network architecture.
Five types:
o Single-layer feed-forward network
o Multilayer feed-forward network
o Single node with its own feedback
o Single-layer recurrent network
o Multilayer recurrent network
To make the network work efficiently and produce the exact output, some
driving force, or activation, is applied.
Accordingly, an activation function is applied over the net input to calculate
the output of an ANN.
Information processing of a processing element has two major parts: input and
output.
An integration function (f) is associated with the input of a processing element.
Several activation functions are in common use:
1. Identity function:
It is a linear function, defined as
f(x) = x for all x
The output is the same as the input.
2. Binary step function
It is defined as
f(x) = 1 if x ≥ θ, and f(x) = 0 if x < θ,
where θ represents the threshold value.
5. Ramp function
It is defined as
f(x) = 1 if x > 1, f(x) = x if 0 ≤ x ≤ 1, and f(x) = 0 if x < 0.
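The activation functions above can be sketched directly; the threshold θ is a parameter of the binary step (0.5 below is just an illustrative value):

```python
def identity(x):
    # f(x) = x: output equals input
    return x

def binary_step(x, theta=0.0):
    # fires (outputs 1) only when the input reaches the threshold theta
    return 1 if x >= theta else 0

def ramp(x):
    # 0 below 0, linear on [0, 1], saturates at 1 above
    if x > 1:
        return 1.0
    if x < 0:
        return 0.0
    return x

print(identity(0.3), binary_step(0.3, theta=0.5), ramp(1.7))
```

The ramp behaves like the identity on [0, 1] and like a step outside it, which is why it is sometimes described as a compromise between the two.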
Lecture No: 17
McCulloch-Pitt’s Neuron
McCulloch-Pitts neuron
Proposed by Warren McCulloch and Walter Pitts in 1943.
Usually called as M-P neuron.
M-P neurons are connected by directed weighted paths.
Activation of M-P neurons is binary (i.e) at any time step the neuron may fire
or may not fire.
Weights associated with communication links may be excitatory(weights are
positive)/inhibitory(weights are negative).
Threshold plays major role here. There is a fixed threshold for each neuron
and if the net input to the neuron is greater than the threshold then the
neuron fires.
They are widely used in logic functions.
Continued...
For inhibition to be absolute, the threshold of the activation function should
satisfy the following condition:
θ > nw - p
The output will fire if it receives k or more excitatory inputs but no inhibitory
inputs, where
kw ≥ θ > (k - 1)w
The M-P neuron has no particular training algorithm.
An analysis is performed to determine the weights and the threshold.
It is used as a building block where any function or phenomenon is modeled
based on a logic function.
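As an illustration of modelling a logic function, an M-P neuron with excitatory weights w = 1 and threshold θ = 2 realizes the two-input AND function; the weights and threshold are fixed by analysis, not by training, as noted above:

```python
def mp_neuron(inputs, weights, theta):
    # McCulloch-Pitts neuron: fires (1) iff the net input reaches the threshold
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= theta else 0

# AND gate: both inputs must be active for the net input to reach theta = 2
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], [1, 1], theta=2))
```

Lowering the threshold to 1 with the same weights would instead realize the OR function, which shows how the analysis, not learning, selects the behaviour.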
Practice Problem
Reference for problem solving
https://pg.its.edu.in/sites/default/files/MCAKCA032-
PRINCIPALES%20OF%20SOFT%20COMPUTING-SN%20SIVNANDAM%20AND%20DEEPA%20SN.pdf
Lecture No: 18
NN Case Study
NN for Regression
The purpose of using Artificial Neural Networks for regression instead of
linear regression is that linear regression can learn only the linear
relationship between the features and the target, and therefore cannot capture
complex non-linear relationships.
To learn the complex non-linear relationship between the features and the
target, we need other techniques; one of them is Artificial Neural Networks.
Artificial Neural Networks are able to learn such complex relationships because
of the activation function present in each layer.
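A minimal sketch (not a production implementation) of a one-hidden-layer network fit by plain gradient descent to a hypothetical non-linear target y = x²; the tanh activation supplies the non-linearity that a linear model lacks. The layer size, learning rate, and step count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical non-linear target: y = x^2, which a straight line cannot fit
X = np.linspace(-1, 1, 20).reshape(-1, 1)
y = X ** 2

# One hidden layer of 8 units; tanh provides the non-linearity
W1 = rng.normal(scale=0.5, size=(1, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)        # hidden activations
    return h, h @ W2 + b2           # network output

_, pred = forward(X)
loss_before = float(np.mean((pred - y) ** 2))

lr = 0.05
for _ in range(3000):
    h, pred = forward(X)
    err = (pred - y) / len(X)         # gradient of MSE w.r.t. output (up to a constant)
    gW2 = h.T @ err                   # output-layer weight gradient
    gb2 = err.sum(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)  # backpropagate through tanh
    gW1 = X.T @ dh
    gb1 = dh.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
loss_after = float(np.mean((pred - y) ** 2))
print(loss_before, loss_after)
```

After training, the mean squared error should be far below its initial value, showing that the hidden layer has captured the curvature of y = x².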
NN Case Study on Regression
Thank You