Uploaded by RosalieTilos Lorico TindocOrito

Regression 2

Lesson 2: Going Beyond Correlation with Simple Linear Regression

TIME FRAME: 3 hour session

OVERVIEW OF LESSON

In this lesson, students will continue exploring the data they examined in Lesson 7-01, this time making use of simple linear regression. They will be asked to predict the value of the dependent variable given the value of the independent variable. After each student has interpreted their regression results, i.e., both regression coefficients, they pool their findings as a class to explore the variability in the regression coefficients that they estimated. As a class, they construct approximations to the sampling distributions of the regression coefficients and use the sampling distributions to make assertions about the values of the population parameters.

LEARNING COMPETENCIES: At the end of the lesson, the learner should be able to:

calculate the slope and y-intercept of the regression line;

interpret the calculated slope and y-intercept of the regression line;

predict the value of the dependent variable given the value of the independent variable;

solve problems involving regression analysis.

LESSON OUTLINE:

1. Motivation / Introduction

2. Preliminary Lesson : Simple Linear Regression Line

3. Main Lesson : Obtaining the Simple Linear Regression Line and Explaining the Regression Coefficients

4. Enrichment: Sampling Distribution of Regression Coefficients

DEVELOPMENT OF THE LESSON

(A) Introduction

Inform students that when examining the relationship between two variables x and y, we can consider one variable as an input variable within an input-output framework. We plot this variable along the horizontal (also called the x) axis in the scatterplot. The output variable is the variable along the vertical (also called the y) axis. The input or x variable is typically called an independent variable; it is also called a covariate or an exogenous, explanatory, regressor, or control variable. The output or y variable is called the dependent variable; it is also called the regressand or the endogenous, explained, or response variable.

In Lesson 7-01, Karl Pearson's data on the heights of fathers and of their respective first-born sons were presented. While taller-than-average fathers tend to have taller-than-average sons, the sons are not quite as tall as the fathers. Likewise, shorter-than-average fathers tend to have shorter-than-average sons, but the sons are not quite as short as the fathers. There is a regression toward the average height, hence the term regression analysis.

(B) Preliminary Lesson : Simple Linear Regression Line

When we visualize the points in a scatterplot generally clustering about a line, we may be interested in obtaining an estimate of such a line to help us estimate the expected level of a variable Y for a known specific value x of the variable X (say, daily allowance). For instance, for the worked example in the previous lesson, we may want to determine how many text messages a student usually sends if his/her daily allowance is 150 pesos. In Lesson 7-01, it was mentioned that we could consider the line that passes through the point of averages and whose slope is the ratio of the standard deviations as one possible line. Inform students that this line ignores information about the magnitude of the association between the two variables. If the correlation coefficient is zero, then we should not expect a change in one variable to be accompanied by any systematic increase or decrease in the other.

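An alternative to this SD line that incorporates the information provided by the correlation coefficient, the means, and the standard deviations is the regression line: the line that contains the point of averages and whose slope is the product of the correlation coefficient and the ratio of the standard deviation of y to the standard deviation of x.

Recall the point-slope form of a line,

$$y - y_1 = m\,(x - x_1),$$

where $m$ is the slope and $(x_1, y_1)$ is a point on the line. In the given case, the point is the point of averages $(\bar{x}, \bar{y})$ and the slope is $r\,\sigma_y/\sigma_x$, where $r$ is the correlation coefficient and $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y. The regression line is therefore

$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}),$$

or, in slope-intercept form,

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x + \left(\bar{y} - r\,\frac{\sigma_y}{\sigma_x}\,\bar{x}\right).$$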

The term in parentheses in this expression is the y-intercept of the regression line. It can be

interpreted as what we expect y to be when the value of x is zero.

Explain to students that the regression line relates how much change in the y-value is

associated with a unit increase in the x-value. It estimates the expected value for the Y

variable corresponding to a particular level x of the variable X. On average, it associates with

each increase of one standard deviation in the x-units, r standard deviations in the y-units

(where r is the correlation coefficient).

Note that when we consider the notion of regression, we assume a functional dependence of Y on X. Thus, we consider Y as a dependent, response, or output variable, while X is an independent, explanatory, or input variable. The magnitude of the output variable Y is dependent on the magnitude of the input variable X. A person's blood pressure, for instance, functionally depends on the person's age. This does not, however, suggest that age is the only factor that is responsible for blood pressure, but that it is one possible determinant of blood pressure.

On the other hand, arm length and leg length are correlated but not functionally dependent.

Increasing arm length would not have an effect on leg length although these variables are

correlated. In such instances, correlation can be calculated but obtaining a regression line

may not be of practical utility.

(C) Main Lesson : Obtaining the Simple Linear Regression Line and Explaining the Regression Coefficients

Consider the worked example in Lesson 7-01 pertaining to information from the database generated in Lesson 1-01. Students were asked in Lesson 7-01 to generate a random sample of 30 students from the database.

Worked Example: We have generated the following summary measures in the worked

example for students with complete information on their daily allowance and the usual

number of text messages they send in a day:

| Summary Measure | Daily Allowance in School | Usual Number of Text Messages Sent in a Day |
|---|---|---|
| Mean | 90.37037 | 33.2963 |
| (Population) Standard Deviation | 120.9984 | 43.11124 |
| Correlation | 0.780283 | |

The regression line of Usual Number of Text Messages Sent in a Day on Daily Allowance in School is then estimated as:

(Expected Usual Number of Text Messages Sent in a Day - 33.2963) = (0.780283) (43.11124 / 120.9984) (Daily Allowance in School - 90.37037)

or simply

Expected Usual Number of Text Messages Sent in a Day = 0.278011805 (Daily Allowance in School) + 8.172270273

The earlier representation of the estimated regression line clearly indicates that students with an average daily allowance are also expected to have an average number of text messages. That is, the point of averages is a point on the estimated regression line. The latter representation of the regression line is in the typical slope-intercept form of an equation. In particular, the slope is interpreted as follows: for each increase of 1 peso in total daily allowance, we expect a corresponding increase of 0.28 text messages sent in a day, or equivalently, every 4-peso increase in allowance is expected to correspond to an increase of about 1 text message sent by a student in a day.
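The arithmetic behind the worked example can be checked with a few lines of Python. This is only a minimal sketch: the numbers are the summary measures reported above, while the variable names are chosen here for illustration.

```python
# Summary measures copied from the worked example (population SDs).
mean_x, sd_x = 90.37037, 120.9984   # daily allowance in school (pesos)
mean_y, sd_y = 33.2963, 43.11124    # usual number of text messages per day
r = 0.780283                        # correlation coefficient

# Slope: correlation times the ratio of the SD of y to the SD of x.
slope = r * sd_y / sd_x

# Intercept: chosen so that the line passes through the point of averages.
intercept = mean_y - slope * mean_x

print(f"slope     = {slope:.6f}")
print(f"intercept = {intercept:.6f}")
```

Because the summary measures are rounded, the printed values may differ from the coefficients quoted in the lesson in the last few decimal places.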

Explaining the Regression Coefficients

Since the slope of a line is the rise over run, the slope of the regression line represents the rise in Y over the run in X, i.e., how much we expect Y to change per unit increase in X.

Ask the students what a positive slope means. They should say that when the slope is

positive, Y increases as X increases. In this case, we say that Y is directly or positively

related to X. Ask students also what a negative slope means. They should say that when the

slope is negative, Y decreases as X increases. Here, we say that Y is inversely or negatively

related to X. Ask the students what happens with a zero slope. They should say that when

the slope is zero, Y is a constant and is equal to the y-intercept. Here, there is no change in Y

whatever X will be, i.e., the fit is a horizontal line. In the next lesson, we consider how to

make valid statistical inferences about the slope of the regression line.

Remind students that in an equation of a line, the y-intercept is the value of Y when X is zero. For the worked example, the intercept may be interpreted as the usual number of text messages sent daily by a student who has zero daily allowance. Students may have zero daily allowance when the family decides not to give one, either because the family is poor or because the student is deemed not to need an allowance since everything is being provided for the student. However, in other situations, such an interpretation may not be valid, as we may be extending the regression line far outside the usual range of X values. Consider, for instance, relating the monetary value of a house (Y) to the area of the dwelling in square meters (X). Here, a house must always have nonzero area, and thus the data on area do not include X = 0.

Using the Regression Line for Predictions

The utility of the estimated regression line is not merely for explaining relationships between

X and Y but also for making predictions about Y given a certain value of X. Suppose, we

wish to randomly pick one of the students who gave information for Lesson 1-01, and we

wish to guess his or her usual number of text messages per day. In the absence of any

information, the best guess would naturally be the average usual number of text messages

sent by the students per day. However, we may be given some specific level of daily

allowance of the student that can be utilized to improve the prediction.

Suppose that for the worked example, we are provided information about the level of daily

allowance of a student, say 150 pesos. According to our estimated regression line,

Expected Usual Number of Text Messages Sent in a Day =

0.278011805 Daily Allowance in School + 8.172270273

a student with a daily allowance of 150 pesos is expected to usually have the following total

number of text messages sent per day

Expected Usual Number of Text Messages Sent in a Day = 0.278011805 (150) + 8.172270273 = 49.87404 ≈ 50

which is more than the average usual number of text messages sent by students per day.
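The prediction step can be sketched as a small Python function. The coefficients are those of the estimated regression line in the worked example; the function name predict_texts is hypothetical and used only for this illustration.

```python
def predict_texts(daily_allowance):
    """Expected usual number of text messages per day for a given allowance (pesos)."""
    slope = 0.278011805       # text messages per peso, from the estimated line
    intercept = 8.172270273   # expected text messages at zero allowance
    return slope * daily_allowance + intercept

# Prediction for a student with a daily allowance of 150 pesos.
prediction = predict_texts(150)
print(round(prediction, 5))

# A student with the average allowance is predicted to send the average number of texts.
print(round(predict_texts(90.37037), 4))
```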

In many cases, obtaining a regression fit gives a sensible way of estimating the y-value. If, however, there are nonlinearities in the relationship between the variables, one may have to transform the variables, say, by first taking the square root or logarithm of the X and/or Y variables, and then fit a regression model to the transformed variables. In this case, tell students that one will eventually have to re-express the resulting analyses in terms of the original units rather than the transformed units.
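To illustrate the idea of transforming and then re-expressing in the original units, here is a minimal sketch. The data are hypothetical (generated to follow y = 3 * sqrt(x) exactly, and not taken from the lesson), and the helper names least_squares and predict are assumptions made for this example.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data following y = 3 * sqrt(x) exactly (not from the lesson).
xs = [1, 4, 9, 16, 25, 36]
ys = [3 * math.sqrt(x) for x in xs]

# Fit a straight line to the logarithms of both variables.
b, log_a = least_squares([math.log(x) for x in xs], [math.log(y) for y in ys])

def predict(x):
    """Back-transform the fitted log-log line to the original units of y."""
    return math.exp(log_a + b * math.log(x))

print(round(b, 3), round(math.exp(log_a), 3), round(predict(49), 3))
```

The fitted line lives on the logarithmic scale, so the prediction is exponentiated before it is reported, which is the re-expression step mentioned above.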

If you have extra time, you can ask students to individually compute the regression coefficients based on the data they sampled in the last lesson and to share their results with the class. Instruct the students to return to the groups of five that they formed in the previous lesson. Using the data that they used in constructing the scatterplot in the previous session, ask them to compute the regression coefficients, i.e., the slope and the intercept. Then, on the scatterplot that they constructed in the previous session, instruct them to plot the regression line. Ask them to describe the position of the line in light of the different points on the scatterplot. Are any of the points on the line? Are all the points on the line? Is it necessary to have as many points as possible on the line? Should you want to extend this further, have them draw vertical lines from the regression line to the individual points. Ask them how they think these vertical lines are related to the position of the regression line.

The regression line ought to be viewed as a sample regression line since we are only

working with sample data. This line is the best fitting line for predicting Y for any value of

X, in the sense of minimizing the distance between the data and the fitted line. By distance

here, we mean the sum of the squares of the vertical distances of the points to the line. Thus,

the resulting coefficients, slope and intercept, in the sample regression line are typically

called the least squares estimates (of the population regression line) or the least squares

regression coefficients.
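The least squares property can be checked numerically. The sketch below uses five hypothetical (x, y) pairs, not data from the lesson. It computes the least squares coefficients directly, confirms that the slope agrees with the product of the correlation coefficient and the ratio of the standard deviations, and shows that nudging the line increases the sum of squared vertical distances. The function name sse is chosen here for illustration.

```python
import math

# Hypothetical (x, y) pairs used only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least squares estimates of the slope and intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# The same slope written as r times (SD of y) / (SD of x), with population SDs.
r = sxy / math.sqrt(sxx * syy)
sd_x, sd_y = math.sqrt(sxx / n), math.sqrt(syy / n)
slope_from_r = r * sd_y / sd_x

def sse(b, a):
    """Sum of squared vertical distances of the points from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

print(slope, intercept, slope_from_r)
print(sse(slope, intercept), sse(slope + 0.1, intercept), sse(slope, intercept - 0.1))
```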

In the next lesson, we will state the assumptions that underlie the fitting of a regression line

and the generation of these least squares estimates. Such assumptions will enable us to

proceed to making statistical inferences, i.e. hypothesis tests and confidence intervals, on the

regression coefficients. This will also be discussed in more detail in the next lesson.

KEY POINTS

The regression model suggests that for every increase of one unit in an independent variable x, we expect a change of $r\,\sigma_y/\sigma_x$ units in a dependent variable y, where $\sigma_y$ is the standard deviation of the y-values (with the data treated as a population), $\sigma_x$ is the standard deviation of the x-values (with the data treated as a population), and $r$ is the correlation coefficient.

The regression line may be used to make predictions. Given the value x for an independent variable X, we expect or predict Y to take the value

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x + \left(\bar{y} - r\,\frac{\sigma_y}{\sigma_x}\,\bar{x}\right),$$

where $\bar{x}$ and $\bar{y}$ are the means of the x-values and the y-values.

REFERENCES

Much of the material here is adapted from:

Text Messaging is Time Consuming! What Gives? by Jeanie Gibson, Mary McNelis, and Anna Bargagliotti, STatistics Education Web (STEW). Available on the Internet at https://www.amstat.org/education/stew/pdfs/TextMessagingisTimeConsumingWhatGives.doc

See also:

Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo Patungan, Nelia Marquez). Rex Bookstore.

De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2006). Intro Stats. Pearson Ed. Inc.

Freedman, D., Pisani, R., and Purves, R. (2007). Statistics. Fourth Edition. W. W. Norton & Company, New York.

Workbooks in Statistics 1: 11th Edition. Institute of Statistics, UP Los Banos, College, Laguna 4031.

Make use of the data set you used for Activity Sheet 6-01, pertaining to a random sample of 30

observations from the database collected at the beginning of the Statistics and Probability course.

Individually carry out the following steps:

1. Compute the sample regression line:

Y = ___________ X + _____________

2. Provide an interpretation of the estimated slope of the sample regression line.

4. Illustrate how to use the sample regression line you generated to predict Y for a given

level of X. (Make sure to agree with group mates what X is)

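5. Collect the regression coefficients and predictions found by each person in the class into a table:

| Student | Slope | Intercept | Prediction for Y given X = ___ |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
| 6 | | | |
| 7 | | | |
| 8 | | | |
| 9 | | | |
| 10 | | | |
| 11 | | | |
| 12 | | | |
| 13 | | | |
| 14 | | | |
| 15 | | | |
| 16 | | | |
| 17 | | | |
| 18 | | | |
| 19 | | | |
| 20 | | | |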

6. Create a dot plot for each of the regression coefficients (slope and intercept) and for the prediction for Y given X = ____ (note that three dot plots will be created).

7. Look at the dot plot for the slope. This dot plot represents an approximation to the

sampling distribution of the estimated slopes. What do you notice about the dot plot?

What is the range of the estimated slopes? What seems to be the most common slope? If

you had to guess what the slope of the regression line was for the entire population, what

would you guess? Explain why.
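For teachers who would like to preview what the class dot plot of slopes might look like, the activity can be simulated. This is only a sketch under stated assumptions: the population of 500 students is hypothetical and is generated around an assumed line (texts = 8 + 0.28 * allowance, plus random noise), and the function name least_squares_slope is chosen here for illustration. The class's actual database would take the place of the simulated population.

```python
import random

def least_squares_slope(pairs):
    """Least squares slope for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

random.seed(1)  # fixed seed so that reruns draw the same samples

# Hypothetical population: allowance in pesos and daily texts (not real data).
population = []
for _ in range(500):
    allowance = random.uniform(0, 300)
    texts = 8 + 0.28 * allowance + random.gauss(0, 10)
    population.append((allowance, texts))

# Each of 20 "students" draws a random sample of 30 and computes a slope.
slopes = [least_squares_slope(random.sample(population, 30)) for _ in range(20)]

# Text dot plot: one row per slope value rounded to two decimals.
counts = {}
for s in slopes:
    key = round(s, 2)
    counts[key] = counts.get(key, 0) + 1
for key in sorted(counts):
    print(f"{key:5.2f} | {'.' * counts[key]}")
```

The spread of the dots approximates the sampling distribution of the estimated slope, and the dots should cluster near the slope of the assumed population line.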

ASSESSMENT 6-02

1. In a regression line, the Y-intercept represents the

a) predicted value of Y when X = 0.

b) change in estimated average Y per unit change in X.

c) predicted value of Y.

d) variation around the sample regression line.

ANSWER: a

2. In a regression line, the slope represents the

a) predicted value of Y when X = 0.

b) the estimated average change in Y per unit change in X.

c) the predicted value of Y.

d) variation around the line of regression.

ANSWER: b

Case 1 (For items 3 - 5 ) : A candy bar manufacturer is interested in trying to estimate how sales are

influenced by the price of their product. To do this, the company randomly chooses 6 cities and offers

the candy bar at different prices. Using candy bar sales as the dependent variable, the company will

conduct a simple linear regression on the data below:

| City | Price (PHP) | Sales |
|---|---|---|
| Los Banos | 39 | 100 |
| Legazpi | 48 | 90 |
| Cagayan de Oro | 54 | 90 |
| Davao | 60 | 40 |
| Cebu | 72 | 38 |
| Makati | 87 | 32 |

3. Referring to Case 1, what is the estimated average change in the sales of the candy bar if price

goes up by 1 peso?

a) 161.386

b) 0.784

c) 3.810

d) -1.606426

ANSWER: d

4. Referring to Case 1, what is the correlation coefficient between price and sales?

a) -0.8854

b) -0.7839

c) 0.7839

d) 0.8854

ANSWER: a

5. Referring to Case 1, if the price of the candy bar is set at 60 pesos, the estimated average sales

will be

a) 30

b) 65

c) 90

d) 100

ANSWER: b
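The Case 1 computations can be verified with a short script. This is a sketch for checking the answer key, using the six price and sales pairs from the table; the variable names are chosen here for illustration.

```python
import math

# Case 1 data: price in pesos and candy bar sales for the six cities.
prices = [39, 48, 54, 60, 72, 87]
sales = [100, 90, 90, 40, 38, 32]

n = len(prices)
mean_x = sum(prices) / n
mean_y = sum(sales) / n
sxx = sum((x - mean_x) ** 2 for x in prices)
syy = sum((y - mean_y) ** 2 for y in sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, sales))

slope = sxy / sxx                       # estimated change in sales per 1 peso
intercept = mean_y - slope * mean_x     # predicted sales at a price of zero
r = sxy / math.sqrt(sxx * syy)          # correlation between price and sales
predicted_sales = slope * 60 + intercept  # estimated average sales at 60 pesos

print(round(slope, 6), round(intercept, 3), round(r, 4), round(predicted_sales, 1))
```

Since 60 pesos happens to be the mean price, the prediction at that price equals the mean of the sales figures.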

II. A study was done to investigate the relationship between the amount of protix (a new protein-vitamin-mineral supplement) in fortified-vitamin rice, known as FVR, and the gain in weight of children. Ten randomly chosen sections of grade one pupils were fed FVR containing protix; different amounts X of protix were used for the 10 sections. The increase in the weight of each child was measured after a given period. The average gain Y in weight for each section with a prescribed protix level X is as follows:

| Section | Protix | Gain |
|---|---|---|
| 1 | 50 | 92.6 |
| 2 | 60 | 97.5 |
| 3 | 70 | 96.5 |
| 4 | 80 | 102.3 |
| 5 | 90 | 105.8 |
| 6 | 100 | 106.2 |
| 7 | 110 | 108.9 |
| 8 | 120 | 108.4 |
| 9 | 130 | 110.2 |
| 10 | 140 | 110.8 |

a. Obtain the sample regression line to predict the average gain in weight given the protix level.

ANSWER: Estimated Average weight gain = 0.2014546 (Protix) + 84.78182

b. What would you predict the average gain in weight to be at a protix level of 125?

ANSWER: Using the regression line at Protix = 125, the estimated Average weight gain is 0.2014546 (125) + 84.78182 = 109.96 ≈ 110
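As a check, the coefficients can be recomputed directly from the ten tabulated pairs. This sketch assumes that the gains are listed in the same section order as the protix levels (50 through 140); the variable names are chosen here for illustration.

```python
# Protix level and average weight gain for Sections 1 to 10
# (pairing by section order is an assumption of this sketch).
protix = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
gain = [92.6, 97.5, 96.5, 102.3, 105.8, 106.2, 108.9, 108.4, 110.2, 110.8]

n = len(protix)
mean_x = sum(protix) / n
mean_y = sum(gain) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(protix, gain))
sxx = sum((x - mean_x) ** 2 for x in protix)

slope = sxy / sxx                        # gain per unit of protix
intercept = mean_y - slope * mean_x      # line passes through the point of averages
predicted_gain = slope * 125 + intercept # predicted average gain at protix = 125

print(f"slope = {slope:.7f}, intercept = {intercept:.5f}")
print(f"predicted gain at 125 = {predicted_gain:.4f}")
```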

III. At a large local high school, the principal wanted to ensure that her students would perform well on this year's standardized tests. As such, the principal came up with a list of factors that may negatively or positively impact test scores and aimed to demonstrate this to the students by giving a practice test out of 100 points. A month before the practice test, the principal asked students to fill out a survey asking them how many hours per week they hung out with their friends and how many hours per week they spent in study hall. Because the high school was very large, the principal only surveyed a sample of the students. The following two scatterplots show the results of the survey versus the students' scores on the practice exam.

[Figure: scatter plot (Collection 1) of practice test score against Hours_With_Friends]

[Figure: scatter plot (Collection 1) of practice test score against Hours_in_Study_Hall]

1. Is there a positive or negative relationship between the hours a student spends with their

friends and their test scores? Hours spent in study hall and their test scores?

2. On average, what would a student score if they spent zero hours per week hanging out

with friends? In study hall?

3. On average, how many points on the test would a student increase/decrease if they spent

1 extra hour in study hall? Hanging out with friends?

When the students heard the results of the study, they asked the principal to look at different

samples of students in the high school. To satisfy the students, the principal decided to randomly

sample groups of 20 students at a time 15 more times. The following dot plots provide the

summary of the results.

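[Figure: dot plot of the sampled slopes for hours in study hall, on an axis from about 1.5 to 4.5]

[Figure: dot plot of the sampled slopes for hours with friends, on an axis from about -4.0 to -1.5]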

4. Should the students believe that the principal's decision to mandate an extra hour of study hall every week will increase their scores on the test? Explain.

5. Should the students try to decrease the number of hours they spent hanging out with

friends before the test? Explain.

Answers

1. There appears to be a negative linear relationship between the amount of time a student

spends hanging out with their friends and their test scores. There does not seem to be a

clear positive or negative relationship between the number of hours spent in study hall

and the test scores.

2. On average, a student would score 122.87 on the test if they spent zero hours per week

hanging out with friends. This y-intercept does not have a practical interpretation since

there is no way to score more than 100 on the test. Also note that 0 is not within the

range of the collected data values for hours spent with friends. On average, a student

would score 76.183 on the test if they spent zero hours per week in study hall.

3. On average, a student's score will change by -2.69 points for every hour they spend hanging out with friends. On average, a student's score will increase by 2.85 points for every hour they spend in study hall.

4. The dot plot illustrates that all the sampled slopes are positive. This means that for every one of the 50 samples of 20 subjects sampled, the slope of the regression line was positive, showing that as the number of hours of study hall increases, the scores on the test increase. In particular, the dot plot shows that the slopes tend to be, for the most part, between 2.6 and 3.6, meaning that on average scores would be raised by between 2.6 and 3.6 points for every extra hour spent in study hall.

5. The dot plot illustrates that all the sampled slopes are negative. This means that for every one of the 50 samples of 20 subjects sampled, the slope of the regression line was negative, showing that as the number of hours spent with friends increases, the scores on the test decrease. In particular, the dot plot shows that the slopes tend to be centered around -2.5, meaning that on average scores would change by about -2.5 points for every extra hour spent hanging out with friends.
