
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

CSN-373: Probability Theory for Computer Engineers

Lecture 28: Correlation and Regression

Dr. Sudip Roy (a.k.a., SR)


Department of Computer Science & Engineering
Extra Discussions:

● Scatter Plots and Correlation
● Regression
● Coefficient of Determination and Standard Error of the Estimate
● Multiple Regression

Regression:

• When r is not significantly different from 0, the best predictor of y is the mean of
the data values of y. For valid predictions, the value of the correlation coefficient
must be significant. Also, two other assumptions must be met.
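
The decision rule above can be sketched in Python. This is a minimal illustration, assuming the usual t test for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with d.f. = n - 2. The critical value t_crit is an assumed input (for example, read from a t table for the chosen significance level), the function names pearson_r and predict_y are hypothetical, and the two additional assumptions mentioned on the slide are not checked here.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def predict_y(xs, ys, x_new, t_crit):
    """Predict y at x_new, using the regression line only if r is significant.

    t_crit: two-tailed critical t value for d.f. = n - 2 (assumed to be supplied).
    """
    n = len(xs)
    r = pearson_r(xs, ys)
    mean_y = sum(ys) / n
    if abs(r) >= 1.0:
        significant = True          # perfect linear fit; t would be infinite
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
        significant = abs(t) > t_crit
    if not significant:
        return mean_y               # r not significant: best predictor is the mean of y
    mean_x = sum(xs) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a + b * x_new            # significant r: use y' = a + b*x
```

For example, with x = 1, 2, 3, 4, 5 and y = 3, 1, 4, 1, 5, r is about 0.35 and t is well below the tabled value 3.182 (d.f. = 3, two-tailed, 0.05), so the function falls back to the mean of y, 2.8.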

Regression:

• Extrapolation, or making predictions beyond the bounds of the data, must be interpreted cautiously.
• Remember that when predictions are made, they are based on present
conditions or on the premise that present trends will continue. This assumption
may or may not prove true in the future.
• The steps for finding the value of the correlation coefficient and the regression
line equation are summarized in the following Procedure Table:
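
The same computation can be sketched in Python, assuming the standard computational formulas for r and for the intercept a and slope b of y' = a + bx. These use the column sums of a table with columns x, y, xy, x^2 and y^2. The function name regression_by_sums is hypothetical.

```python
import math

def regression_by_sums(xs, ys):
    """Return (r, a, b) for y' = a + b*x using the column-sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Correlation coefficient r
    num = n * sum_xy - sum_x * sum_y
    den_x = n * sum_x2 - sum_x ** 2
    den_y = n * sum_y2 - sum_y ** 2
    r = num / math.sqrt(den_x * den_y)

    # Intercept a and slope b of the regression line
    a = (sum_y * sum_x2 - sum_x * sum_xy) / den_x
    b = num / den_x
    return r, a, b
```

For x = 1, 2, 3, 4 and y = 2, 3, 5, 6 the sums give r ≈ 0.990, a = 0.5 and b = 1.4, so y' = 0.5 + 1.4x. As stated earlier, the line should only be used for prediction once r has been found significant.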

Regression:

• A scatter plot should be checked for outliers. An outlier is a point that seems out
of place when compared with the other points. Some of these points can affect
the equation of the regression line. When this happens, the points are called
influential points or influential observations.
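
One simple way to check whether a point is influential is to refit the line without it and see how much the slope moves. The sketch below does this for every point (a leave-one-out check). The function names are hypothetical, and how large a change counts as "influential" is left as a judgment call. Each refit needs at least two distinct x values among the remaining points.

```python
def slope_intercept(xs, ys):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def slope_changes(xs, ys):
    """For each point i, the absolute change in slope when point i is left out."""
    _, b_all = slope_intercept(xs, ys)
    changes = []
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]      # data without point i
        ys_i = ys[:i] + ys[i + 1:]
        _, b_i = slope_intercept(xs_i, ys_i)
        changes.append(abs(b_all - b_i))
    return changes
```

With x = 1, 2, 3, 4, 5, 15 and y = 2, 4, 6, 8, 10, 3, the point (15, 3) produces by far the largest change: dropping it moves the slope from about -0.08 to 2.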

Coefficient of Determination and Standard Error of the Estimate:
• The previous sections stated that if the correlation coefficient is significant, the
equation of the regression line can be determined. Also, for various values of the
independent variable x, the corresponding values of the dependent variable y can
be predicted.
• Several other measures are associated with the correlation and regression
techniques. They include the coefficient of determination, the standard error of
the estimate, and the prediction interval. But before these concepts can be
explained, the different types of variation associated with the regression model
must be defined.

Coefficient of Determination and Standard Error of the Estimate:
• Types of Variation for the Regression Model
• Consider the following hypothetical regression model.
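
The three types of variation can be computed directly. This is a minimal sketch, assuming the usual definitions for a least-squares line y' = a + bx: total variation sum((y - ȳ)^2), explained variation sum((y' - ȳ)^2) and unexplained variation sum((y - y')^2), with total = explained + unexplained. The function name variations is hypothetical.

```python
def variations(xs, ys):
    """Return (total, explained, unexplained) variation for the least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    y_pred = [a + b * x for x in xs]                                # predicted values y'
    total = sum((y - my) ** 2 for y in ys)                          # sum of (y - ybar)^2
    explained = sum((yp - my) ** 2 for yp in y_pred)                # sum of (y' - ybar)^2
    unexplained = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))  # sum of (y - y')^2
    return total, explained, unexplained
```

For x = 1, 2, 3, 4 and y = 2, 3, 5, 6 (line y' = 0.5 + 1.4x), this gives a total variation of 10, an explained variation of 9.8 and an unexplained variation of 0.2.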

Coefficient of Determination:
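
This is a minimal sketch, assuming the standard definition of the coefficient of determination as the ratio of explained to total variation, r^2 = sum((y' - ȳ)^2) / sum((y - ȳ)^2). For a simple least-squares line this equals the square of the correlation coefficient r. The function name is hypothetical.

```python
def coefficient_of_determination(xs, ys):
    """r^2 = explained variation / total variation for the least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    explained = sum((a + b * x - my) ** 2 for x in xs)   # uses predicted y' = a + b*x
    total = sum((y - my) ** 2 for y in ys)
    return explained / total
```

For x = 1, 2, 3, 4 and y = 2, 3, 5, 6 it returns 0.98, which matches the square of r ≈ 0.990. This means that about 98% of the variation in y is explained by the regression line.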

Standard Error of the Estimate:
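
This is a minimal sketch, assuming the standard formula s_est = sqrt(sum((y - y')^2) / (n - 2)), which is the square root of the unexplained variation divided by n - 2. The function name is hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys):
    """s_est = sqrt(sum((y - y')**2) / (n - 2)) for the least-squares line."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points (n - 2 degrees of freedom)")
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(unexplained / (n - 2))
```

For x = 1, 2, 3, 4 and y = 2, 3, 5, 6 the unexplained variation is 0.2, so s_est = sqrt(0.2 / 2) ≈ 0.316.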

Example:

Lecture 29: Sampling and Simulation

Common Sampling Techniques:

• Random Sampling
• A random sample is obtained by using methods such as random numbers, which
can be generated from calculators, computers, or tables. In random sampling, the
basic requirement is that, for a sample of size n, all possible samples of this size
have an equal chance of being selected from the population.

• Systematic Sampling
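
A minimal Python sketch of the two techniques follows. The random sample uses random.sample, which draws without replacement so that every subset of size n is equally likely. The slide only names systematic sampling, so the sketch assumes the usual form: number the population, pick a random start among the first k members, and then take every k-th member (k = N // n). The function names are hypothetical.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Every subset of size n is equally likely (sampling without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng=random):
    """Random start among the first k items, then every k-th item (k = N // n)."""
    k = len(population) // n            # sampling interval; assumes n <= N
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]
```

Passing a seeded random.Random instance as rng makes a draw reproducible, for example systematic_sample(list(range(100)), 10, random.Random(1)).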

Common Sampling Techniques:

• Cluster Sampling
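
The slide only names cluster sampling. The sketch below assumes the usual definition: the population is divided into intact groups (clusters), some clusters are chosen at random, and every member of each chosen cluster is included. The function name is hypothetical.

```python
import random

def cluster_sample(clusters, m, rng=random):
    """Randomly choose m whole clusters and return every member of those clusters."""
    chosen = rng.sample(list(clusters), m)
    return [member for cluster in chosen for member in cluster]
```

For instance, with 8 clusters of 5 members each, cluster_sample(groups, 3) returns all 15 members of 3 randomly chosen clusters.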

Example:

Simulation Techniques and the Monte Carlo Method:
• Many real-life problems can be solved by employing simulation techniques.

• Mathematical simulation techniques use probability and random numbers to
create conditions similar to those of real-life problems. Computers have played
an important role in simulation techniques, since they can generate random
numbers, perform experiments, tally the outcomes, and compute the probabilities
much faster than human beings. The basic simulation technique is called the
Monte Carlo method. This topic is discussed next.
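
The following is a small illustration of the Monte Carlo idea. An experiment is simulated many times with random numbers, the outcomes are tallied, and the relative frequency is used as the probability estimate. The die-rolling problem (probability of at least one six in four rolls) is an illustrative choice, not one taken from the lecture. Its exact answer, 1 - (5/6)^4, lets the estimate be checked. The function names are hypothetical.

```python
import random

def at_least_one_six(rng, rolls=4):
    """One trial: roll a fair die `rolls` times; True if any roll is a six."""
    return any(rng.randint(1, 6) == 6 for _ in range(rolls))

def monte_carlo_probability(trials=100_000, seed=None):
    """Estimate the probability as (successful trials) / (total trials)."""
    rng = random.Random(seed)
    hits = sum(at_least_one_six(rng) for _ in range(trials))
    return hits / trials

estimate = monte_carlo_probability(seed=42)
exact = 1 - (5 / 6) ** 4          # about 0.5177
print(f"estimate={estimate:.4f}  exact={exact:.4f}")
```

As the number of trials grows, the estimate tends to settle closer to the exact value. This is the same behaviour that makes computer simulation practical for problems with no easy closed-form answer.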

Next Class…
