
Fish 224

Assignment

Prepared and Submitted by:


Maria Liza T. Farquerabao
MS-Fisheries (Fisheries Biology Student)
1. Correlation example with two variables
 Correlation of Gestational Age and Birth Weight
A small study is conducted involving 17 infants to investigate the
association between gestational age at birth, measured in weeks, and birth weight,
measured in grams.

http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Correlation-Regression/BS704_Correlation-Regression_print.html
We wish to estimate the association between gestational age and infant
birth weight. In this example, birth weight is the dependent variable and gestational
age is the independent variable. Thus y=birth weight and x=gestational age. The
data are displayed in a scatter diagram in the figure below.

 Computing the Correlation Coefficient

 The formula for the sample correlation coefficient is:

$$ r = \frac{\mathrm{Cov}(x,y)}{\sqrt{s_x^2 \, s_y^2}} $$

where Cov(x,y) is the covariance of x and y, defined as

$$ \mathrm{Cov}(x,y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} $$

and s_x^2 and s_y^2 are the sample variances of x and y, defined as follows:

$$ s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1} $$

 The variances of x and y measure the variability of the x scores and y
scores around their respective sample means, considered separately.
 The covariance measures the variability of the (x,y) pairs around the mean
of x and mean of y, considered simultaneously.
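
The definitions above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are made up for this sketch, and the five data points are hypothetical values, not the 17 infants from the study.

```python
import math

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_covariance(x, y):
    """Sample covariance: sum of cross-products of deviations, over n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """r = Cov(x, y) / sqrt(s_x^2 * s_y^2)."""
    return sample_covariance(x, y) / math.sqrt(
        sample_variance(x) * sample_variance(y)
    )

# Hypothetical gestational ages (weeks) and birth weights (grams).
weeks = [34.0, 36.0, 38.0, 40.0, 41.0]
grams = [1900.0, 2400.0, 2900.0, 3300.0, 3200.0]

r = sample_correlation(weeks, grams)
print(round(r, 3))
```

Because the variances and the covariance share the same n - 1 divisor, it cancels in the ratio, so r depends only on the sums of squared deviations and cross-products.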


 To compute the sample correlation coefficient, we need to compute the
variance of gestational age, the variance of birth weight, and also the
covariance of gestational age and birth weight.

 We first summarize the gestational age data.

 The mean gestational age is:

 To compute the variance of gestational age, we need to sum the squared
deviations (or differences) between each observed gestational age and the
mean gestational age. The computations are summarized below.


 The variance of gestational age is:

 Next, we summarize the birth weight data.

 The mean birth weight is:

 The variance of birth weight is computed just as we did for gestational age
as shown in the table below.


 The variance of birth weight is:

 Next we compute the covariance:

 To compute the covariance of gestational age and birth weight, we need to
multiply the deviation from the mean gestational age by the deviation from
the mean birth weight for each participant, that is:

 The computations are summarized below. Notice that we simply copy the
deviations from the mean gestational age and birth weight from the two
tables above into the table below and multiply.


 The covariance of gestational age and birth weight is:

 Finally, we can now compute the sample correlation coefficient:

 Not surprisingly, the sample correlation coefficient indicates a strong positive
correlation.

1. Correlation example with two variables using SPSS

 Same answer as the manual computation.

 The sample correlation coefficient indicates a strong positive correlation.
2. Correlation example with three/more variables

Suppose that we are given the number of class periods missed by the 12
students taking the chemistry course. The data are recorded in the table below.

Compute and interpret the coefficient of multiple determination for the sample
above.

Walpole, R.E. 1982. Introduction to Statistics Third Edition.



Solution:

we find


and hence

The result indicates that the fitted regression plane explains 77.4% of the
variation of Y.
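
For the case of exactly two predictors, the coefficient of multiple determination can also be obtained from the three pairwise correlations. The sketch below uses that identity rather than the sums of squares that the textbook solution presumably works from; the function name and the three correlation values are hypothetical and are not taken from the chemistry data.

```python
def multiple_r_squared(r_y1, r_y2, r_12):
    """R^2 of Y on X1 and X2 from pairwise Pearson correlations.

    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    Valid only for two predictors that are not perfectly correlated.
    """
    denominator = 1.0 - r_12 ** 2
    if denominator <= 0.0:
        raise ValueError("predictors are perfectly correlated")
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12
    return numerator / denominator

# Hypothetical correlations: Y with X1, Y with X2, and X1 with X2.
r2 = multiple_r_squared(0.86, 0.24, 0.35)
print(f"{100 * r2:.1f}% of the variation in Y is explained")
```

When the two predictors are uncorrelated (r_12 = 0), the formula reduces to the sum of the two squared simple correlations.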



2. Correlation example with three/more variables using SPSS

The table above shows that the students' chemistry grades have a significant
relationship with their test scores. However, the number of classes missed is
not significantly related to either test scores or chemistry grades.
3. Simple linear regression example
 Problem:
 A college bookstore must order books two months before each semester
starts. They believe that the number of books that will ultimately be sold for any
particular course is related to the number of students registered for the course
when the books are ordered. They would like to develop a linear regression
equation to help plan how many books to order. From past records, the
bookstore obtains the number of students registered, X, and the number of
books actually sold for a course, Y, for 12 different semesters. These data are
below.

https://www.nku.edu/~statistics/Simple_Linear_Regression.htm

A. Obtain a scatter plot of the number of books sold versus the number of registered
students.

B. At a .01 level of significance is there sufficient evidence to conclude that the number of
books sold is related to the number of registered students in a straight-line manner?

C. Carefully explain what the p-value found in part B means.

D. Fully interpret the strength of the straight-line relationship.

E. Give the regression equation, and interpret the coefficients in terms of this problem.

F. If appropriate, predict the number of books that would be sold in a semester when 30
students have registered. Use 95% confidence.

G. If appropriate, estimate the average number of books that would be sold in a semester
for all courses with 30 students registered. Use 95% confidence.

H. If appropriate, predict the number of books that would be sold in a semester when 5
students have registered. Use 95% confidence.

 Solutions:

A. The following scatterplot with the fitted line was obtained.

 As the number of students registered for the course increases, the number of
books sold by the bookstore appears to increase in a straight-line manner.


B. Ho: The number of students registered and the number of books sold are not
correlated.

Ha: The number of students registered and the number of books sold are
correlated.

 Decision Rule: Reject Ho in favor of Ha if the calculated p-value < .01.

 Test Statistic: r = the Pearson coefficient of correlation

 Calculations from StatCrunch:

r = 0.8997, p-value < 0.0001 < .01 ---> Reject Ho in favor of Ha

 Interpretation:
 At the .01 level of significance I conclude that as the number of
students registered increases, the number of books sold
increases in a straight-line manner.
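
The test in part B can be checked by hand with the usual t statistic for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below plugs in the reported r = 0.8997 and n = 12. The critical value 3.169 is the two-sided .01 entry for 10 degrees of freedom from a standard t table, and the function name is an assumption of this sketch.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing Ho: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

r = 0.8997  # reported Pearson correlation
n = 12      # number of semesters in the sample

t_stat = correlation_t_statistic(r, n)

# Two-sided critical value for alpha = .01 and 10 degrees of freedom,
# taken from a standard t table (an input to this sketch, not computed here).
T_CRITICAL_01_DF10 = 3.169

reject_h0 = abs(t_stat) > T_CRITICAL_01_DF10
print(round(t_stat, 2), reject_h0)
```

In simple linear regression the square of this t statistic equals the F statistic of the regression ANOVA, which gives a quick consistency check against software output.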


C. Carefully explain what the p-value found in part B means.

 Since the p-value is less than 0.0001, this indicates that if the number of
students registered and the number of books sold are not correlated (if the null
hypothesis is true), then there is virtually no chance that the observed points in
the scatterplot would exhibit such an obvious straight-line pattern.

D. Fully interpret the strength of the straight-line relationship.

 r² = .809 (80.9%).

 80.9% of the variability in the number of books sold is explained by the
straight-line relationship with the number of registered students.

 19.1% of this variability is unexplained, and due to error.

 This relationship is quite strong.


E. The regression equation is y = 9.30 + 0.673x

 When no students have registered for a course, the predicted number of books sold
is 9.30 (or about 9).

 This is the starting point of the straight-line when x = 0.

 It is not particularly meaningful in this problem since all the classes sampled had
more than 25 students registered.

 For each additional student registered for a course, the predicted number of books
sold increases by 0.673.
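
For reference, the intercept and slope of a least-squares line come from slope = Sxy / Sxx and intercept = ȳ − slope·x̄. The sketch below shows that calculation on a small hypothetical data set; the bookstore's 12 observations are not reproduced here, and the function and variable names are assumptions of this sketch.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical registrations and book sales, for illustration only.
registered = [26, 30, 34, 38, 42]
sold = [27, 29, 33, 34, 38]

intercept, slope = fit_line(registered, sold)
print(f"y = {intercept:.2f} + {slope:.3f}x")
```

The slope is read as the change in predicted sales per additional registered student, and the intercept as the predicted sales at zero registrations.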


F. If appropriate, predict the number of books that would be sold in a semester
when 30 students have registered. Use 95% confidence.

 Since 30 students is within the range of the sampled number of students, it is
appropriate to make this prediction.

 From Minitab the calculated prediction interval is (25.865078, 33.09856).

 I am 95% confident that for a course that has 30 students registered the bookstore
will sell between 25.9 and 33.1 books.


G. If appropriate, estimate the average number of books that would be sold in a
semester for all courses with 30 students registered. Use 95% confidence.

 Since 30 students is within the range of the sampled number of students, it is
appropriate to make this estimation.

 From Minitab the calculated confidence interval is (28.279491, 30.684145).

 I am 95% confident that for all courses that have 30 students registered the
bookstore will sell an average of between 28.3 and 30.7 books per semester.
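
The difference between the prediction interval in part F and the confidence interval in part G comes from one extra term under the square root: the interval for a single new course includes the residual variance itself, so it is always wider. The sketch below illustrates this on hypothetical data. The critical t value is passed in by the caller (3.182 is the two-sided 95% table value for 3 degrees of freedom, matching the five hypothetical points), and all names are assumptions of this sketch.

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Return (confidence interval for the mean, prediction interval) at x0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    y_hat = intercept + slope * x0
    leverage = 1.0 / n + (x0 - mean_x) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(leverage)        # mean response
    pi_half = t_crit * s * math.sqrt(1.0 + leverage)  # single new observation
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Hypothetical data; 3.182 is the two-sided 95% t value for n - 2 = 3 df.
registered = [26, 30, 34, 38, 42]
sold = [27, 29, 33, 34, 38]
ci, pi = intervals_at(registered, sold, x0=30, t_crit=3.182)
print("mean response:", ci)
print("new observation:", pi)
```

Both intervals are centered on the same fitted value, which matches the pattern in the reported intervals for 30 students.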


H. If appropriate, predict the number of books that would be sold in a semester
when 5 students have registered. Use 95% confidence.

 Since 5 students is not within the range of the sampled number of students, it is not
appropriate to use the regression equation to make this prediction.

 We do not know if the straight-line model would fit data at this point, and we should
not extrapolate.
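
The rule against extrapolation can be built into a small prediction helper. The sketch below uses the coefficients 9.30 and 0.673 from the regression equation; the sampled range of 26 to 50 students is a hypothetical placeholder, since the actual minimum and maximum are not listed here, and the function name is an assumption of this sketch.

```python
INTERCEPT = 9.30  # from the fitted regression equation
SLOPE = 0.673     # from the fitted regression equation

def predict_books(students, min_sampled, max_sampled):
    """Predict books sold, refusing to extrapolate beyond the sampled x range."""
    if not (min_sampled <= students <= max_sampled):
        raise ValueError(
            f"{students} students is outside the sampled range "
            f"[{min_sampled}, {max_sampled}]; do not extrapolate"
        )
    return INTERCEPT + SLOPE * students

# Hypothetical bounds of the sampled registrations (assumed for illustration).
SAMPLED_MIN, SAMPLED_MAX = 26, 50

print(round(predict_books(30, SAMPLED_MIN, SAMPLED_MAX), 2))

try:
    predict_books(5, SAMPLED_MIN, SAMPLED_MAX)
except ValueError as err:
    print(err)
```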

3. Simple linear regression example using SPSS

Used to predict

 About 79% of the variance in books sold can be explained by the number of
students enrolled in a semester. This figure is consistent with the adjusted R²;
the unadjusted r² is .809.

Lets us know if the model is significant or not.

 Since the sig-value is less than alpha (0.01 and 0.05), with
F(1, 10) = 42.483 and p < 0.001 (displayed as 0.000), we say the model is
significant.
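
The F statistic and its degrees of freedom are enough to recover R² and the adjusted R², which may help reconcile the 80.9% and 79% figures quoted for this model. The sketch below uses the reported F(1, 10) = 42.483; the function names are assumptions of this sketch.

```python
def r_squared_from_f(f_stat, df_model, df_error):
    """R^2 implied by an overall regression F statistic."""
    return f_stat * df_model / (f_stat * df_model + df_error)

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

f_stat, df_model, df_error = 42.483, 1, 10
n = df_model + df_error + 1  # 12 observations

r2 = r_squared_from_f(f_stat, df_model, df_error)
adj_r2 = adjusted_r_squared(r2, n=n, k=df_model)
print(round(r2, 3), round(adj_r2, 3))
```

Under these inputs R² comes out near .809 and the adjusted value near .790, so the two percentages appear to describe the same fit with and without the degrees-of-freedom adjustment.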

y = 9.300 + 0.673x
4. Multiple linear regression example

Suppose that we are given the number of class periods missed by the 12
students taking the chemistry course. The data are recorded in the table below.

Walpole, R.E. 1982. Introduction to Statistics Third Edition.


Solution:

From the given data, we find that


Inserting the values in the equation, we obtain
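
For a regression plane with two predictors, the coefficients are the solution of three normal equations built from the sums, sums of squares, and sums of cross-products of the data. The sketch below sets up and solves that system with plain Gaussian elimination. The data are hypothetical and are not the textbook's chemistry table, and the function and variable names are assumptions of this sketch.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def fit_plane(x1, x2, y):
    """Return [b0, b1, b2] for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))
    n = len(y)
    a = [
        [n, sum(x1), sum(x2)],
        [sum(x1), dot(x1, x1), dot(x1, x2)],
        [sum(x2), dot(x1, x2), dot(x2, x2)],
    ]
    b = [sum(y), dot(x1, y), dot(x2, y)]
    return solve_linear_system(a, b)

# Hypothetical test scores, classes missed, and chemistry grades.
test_score = [50, 55, 60, 65, 70, 75]
classes_missed = [6, 2, 5, 1, 4, 3]
grade = [70, 78, 80, 88, 86, 92]

b0, b1, b2 = fit_plane(test_score, classes_missed, grade)
print(f"y = {b0:.3f} + {b1:.3f}*x1 + {b2:.3f}*x2")
```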



4. Multiple linear regression example using SPSS
End!
