
DATAX111 Tutorial 10

Attempt Question 1 BEFORE the tutorial.


QUESTION 1

In relation to simple linear regression, describe in words how the method of least squares is used to calculate the
‘line of best fit’. The line of best fit is the line which has the smallest sum of squared vertical distances
(residuals) between the line and the given data points.
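The idea can be sketched in Python. This is a minimal illustration, not part of the tutorial: the function names and the toy values are assumptions made up for the example, and the toy values are not the iris.xlsx data.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimises the
    sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative values, NOT the iris.xlsx data.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(toy_x, toy_y)
best = sum_squared_residuals(toy_x, toy_y, slope, intercept)
# A line with a slightly different slope gives a larger sum.
worse = sum_squared_residuals(toy_x, toy_y, slope + 0.1, intercept)
print(slope, intercept, best, worse)
```

Changing the slope or intercept away from the least squares values can only increase the sum of squared residuals, which is what ‘best fit’ means here.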

QUESTION 2

Researchers were interested in whether there was a relationship between sepal length and petal length for irises.

This data can be found on Moodle in the Excel spreadsheet iris.xlsx.

(a) Produce a (well labelled) scatterplot with petal length on the y-axis and sepal length on the x-axis.

(b) Calculate the correlation between sepal length and petal length. 0.864225
The correlation is a single number that summarises the strength and direction of the linear relationship between two
continuous variables. It is always between -1 and 1.
In Excel, use the CORREL function.
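For comparison with CORREL, the same calculation can be sketched by hand in Python. The function name and the toy values below are assumptions for illustration only. They are not the iris.xlsx data, so the result will not match 0.864225.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, NOT the iris.xlsx data.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]

r = pearson_correlation(toy_x, toy_y)
print(r)  # a value between -1 and 1
```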

(c) Comment on the strength and direction of the relationship between sepal length and petal length. There is a
strong positive correlation between sepal length and petal length.

(d) Using Excel, fit a simple linear regression model with petal length as the response (y) variable and sepal
length as the explanatory (x) variable. What is the fitted equation?
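y = 0.7501x + 0.6105
R² = 0.7469
The predicted petal length is 0.7501 times the sepal length, plus 0.6105. In other words, each one-unit increase in
sepal length is associated with an increase of 0.7501 in the predicted petal length.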
In the form y = mx + c:
m = gradient (slope), here 0.7501
c = constant (intercept), here 0.6105

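(e) In words, describe what the R² value tells you. R² is the proportion of the variation in the response (petal
length) that is explained by the regression line. Here R² = 0.7469, so about 74.7% of the variation in petal length
is explained by sepal length. In simple linear regression R² is the square of the correlation, so a higher R² value
corresponds to a stronger linear relationship.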
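A short Python sketch of how R² can be computed from its definition, 1 - SSE/SST. The function name and toy values are assumptions for illustration and are not the iris.xlsx data. The last lines check that squaring the correlation reported in part (b) reproduces the R² reported in part (d).

```python
def r_squared(xs, ys, slope, intercept):
    """Proportion of the variation in ys explained by the line."""
    mean_y = sum(ys) / len(ys)
    # SSE: variation left over around the fitted line.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # SST: total variation of ys around their mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Made-up illustrative values, NOT the iris.xlsx data.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
# 0.6 and 2.2 are the least squares slope and intercept for the toy values.
toy_r2 = r_squared(toy_x, toy_y, 0.6, 2.2)
print(toy_r2)

# Worksheet values: the correlation from part (b), squared.
worksheet_r2 = 0.864225 ** 2
print(round(worksheet_r2, 4))  # 0.7469, matching part (d)
```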

(f) Based upon your equation, what do you predict the petal length would be for an iris with sepal length of 7?
5.8612
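The arithmetic behind this prediction can be checked with a few lines of Python. The slope and intercept are the values from part (d). The function name is an assumption for illustration.

```python
SLOPE = 0.7501      # fitted gradient from part (d)
INTERCEPT = 0.6105  # fitted constant from part (d)


def predict_petal_length(sepal_length):
    """Predicted petal length from the fitted line y = 0.7501x + 0.6105."""
    return SLOPE * sepal_length + INTERCEPT


prediction = predict_petal_length(7)
print(round(prediction, 4))  # 5.8612
```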

(g) Jeff wants to use the fitted regression equation to predict the petal length for an iris with sepal length of 10.
Explain why this would be inappropriate.

(h) Calculate (from Excel) a confidence interval for the (slope) coefficient for sepal length.

(i) In words, interpret what the confidence interval tells you.

(j) Perform a hypothesis test (from Excel) to test whether the (slope) coefficient for sepal length is equal to 0 or
not. Clearly state the null and alternative hypotheses, the test statistic, the p-value and the conclusion to
your hypothesis test.
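The quantities behind parts (h) and (j) can be sketched in Python as a cross-check on the Excel regression output. This is a minimal illustration under stated assumptions: the function name is made up, the toy values are not the iris.xlsx data, and the critical value is passed in by hand (in Excel it could be obtained with T.INV.2T(0.05, n - 2)). The p-value is not computed here because it needs the t distribution.

```python
import math


def slope_inference(xs, ys, t_crit):
    """Return (slope, standard error, t statistic, confidence interval)
    for the slope of a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    # Test statistic for H0: slope = 0 against H1: slope != 0.
    t_stat = slope / se_slope
    ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)
    return slope, se_slope, t_stat, ci


# Made-up illustrative values, NOT the iris.xlsx data.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]

# 3.182 is the two-sided 5% critical value of t with 5 - 2 = 3 degrees of freedom.
slope, se_slope, t_stat, ci = slope_inference(toy_x, toy_y, t_crit=3.182)
print(slope, se_slope, t_stat, ci)
```

For the toy values the interval contains 0 and the t statistic is smaller than the critical value, so both approaches agree that there is no evidence of a non-zero slope. The iris data may well lead to a different conclusion.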

(k) Based upon your answers to parts (h) and (j), conclude whether there is a (linear) relationship between sepal
length and petal length.

Submission:

Submit your answers to Moodle.

Your submission is due at 5pm on Friday October 6.
