
Unit #2

Objective:

Reference:

STATISTICAL INVESTIGATION involving Bivariate Data


Carry out investigations of phenomena, using the statistical enquiry cycle:
o finding, using, and assessing appropriate models (including linear regression for bivariate data), seeking explanations, and
making predictions
o using informed contextual knowledge, exploratory data analysis, and statistical inference
o communicating findings and evaluating all stages of the cycle.
Sigma (3rd ed):
Chapter 13-14
MWS: Ch 13, 14
Nulake Workbook: Section 5

Homework: for each hour in class, about an hour more is needed to effectively practice and process the work from that lesson, in readiness for the next one. If
you are having difficulty with something related to a lesson, DO SOMETHING about it straight away. For internal assessments in particular, don't wait!

Detail

Lesson

1-4
plus
homework
time

Data display - write about what can be seen in the graph


Bi-variate Data - each data point represents two variables, e.g. a person's height & weight. It is
important to understand the variables being used and the units applied to them. Research them.
Data Display: scatter diagram (points are NOT joined). Make notes from: scatterplot powerpoint.
Put the independent (or explanatory) variable on the horizontal axis based on the question asked.
The dependent (or response) variable goes on the vertical axis.
Look for interesting patterns in the data display:
Shape or general trend: is there a recognisable pattern e.g. linear / non-linear?
Is there a positive or negative association? i.e. does the response variable tend to increase or
decrease as the explanatory variable increases?
Scatter: is it a strong / weak relationship? i.e. are the data points tightly packed or not?
Is the scatter constant / not? i.e. does the relative spread of the data points change?
Unusual features: Are there clusters of data?
Are there any outliers? These are well away from the others but may still be near the best-fit-line.
Read more about outliers on pages 35-36 in the workbook.
Regression line is the line of best fit for the plotted data (trendline in Excel and INZight)
Excel, Graphics Calculators and INZight can give the equation
Correlation Coefficient, r, measures how well the (linear) regression line fits the data: -1 ≤ r ≤ 1
(values near -1 or 1 indicate a strong linear relationship; values near 0 indicate a weak one).
It measures the strength of the mathematical relationship between the two variables; there may
or may not be a causal relationship between the variables as well.
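
The course itself uses Excel, a graphics calculator or iNZight to find r. Purely as an illustration of what those tools calculate, the short Python sketch below works out r by hand. The function name `correlation` and the height/weight figures are made up for this example and do not come from the workbook.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired (x, y) data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data: heights (cm) and weights (kg) of six people
heights = [150, 160, 165, 170, 180, 185]
weights = [52, 58, 63, 66, 75, 80]

r = correlation(heights, weights)
print(f"r = {r:.3f}")  # close to 1: a strong, positive linear relationship
```

A value of r close to 1 only describes how tightly the points follow a rising straight line; it says nothing about whether one variable causes the other.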

Sigma

Workbook

Always do
Achieved
questions first!

Ex 13.01
Ex 13.02

Classifying variables: #1-14
Inspecting a scatterplot: #15-16 (INZight)
Describing scatter: #30-37

Learn the basics of Bivariate Data investigation by following example #1 and working through
#2-4 on the Yellow Resource sheets and associated handouts (p 2-23)

Practice #1 Kiwibirds
Practice #2 BMI and Bodyfat
Practice #3 Open Road Fuel Efficiency and Engine Size
Practice #4 Diamond Price and Size
To continue your learning and your preparation for the internal assessment:
Continue with the workbook questions as follows:
- print off the material you need from iNZight (at home or at school) for several questions
- answer the questions in the spaces provided in your workbook
- check your answers with those at the back of the workbook
- learn to use appropriate statistical language in your answers
Finding the Correlation Coefficient, r [and the Coefficient of determination, R²]
Excel and the Graphics Calculator give the Coefficient of determination, called R².
r is found by taking the positive or negative square root of R², depending on the observed slope of the
regression line: r = ±√(R²).
[This is outside of the requirements of this course! R² measures how much of the variability of the
response variable is explained by the model (the equation of the regression line). The closer R² is
to 1, the more of that variation is explained by the model. The closer R² is to 0, the less of that
variation is explained by the model.
Multiplying R² by 100 gives the percentage of the variation in the y-values explained by the model.]

Correlation coefficient: #20-29, and re-look at #30-37
Using INZight: #38 Income
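
As a rough sketch of the step from R² back to r described above, the Python below takes the square root of R² and attaches the sign of the slope. The function name `r_from_r_squared` and the numbers are hypothetical, chosen only to illustrate the idea.

```python
from math import sqrt, copysign

def r_from_r_squared(r_squared, slope):
    """Recover r from R^2, taking its sign from the slope of the regression line."""
    return copysign(sqrt(r_squared), slope)

# Hypothetical values, as might be read off a trendline
r_squared = 0.81
slope = -2.5  # the line slopes downward, so r must be negative

r = r_from_r_squared(r_squared, slope)
print(f"r = {r:.2f}")
print(f"{r_squared * 100:.0f}% of the variation in y is explained by the model")
```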

7
plus
homework
time

Comprehensively inspecting and describing scatterplots using INZight


Use INZight to draw graphs and best-fit-lines.
LOOK for any of the relevant features, mentioned above.
WRITE about what you see.
RESEARCH contexts so you can offer sensible explanations for the phenomena you see in the graphs.
STUDY the answers at the back of the workbook, to LEARN more about what (and how) to write!
INZight is on K-drive; the workbook downloads are also on K-drive. There are INZight tutorials on
censusatschool.org.nz

#39 Red wine


#40 Life expectancy
#41 Electricity
demand

Lesson

Detail

Workbook

Regression Analysis - linear regression


Simple linear regression is used when it seems 2 variables are connected by a linear relationship.
The regression line is used to predict values of the response variable.
It is called a best fit line - it minimises the residuals (strictly, the sum of the squared residuals).
For each data point: residual = observed - predicted
Read more about residuals on pages 40-43 in the workbook.

Using INZight
# 45 Alcohol problems
#46 Abalone

The regression line equation is of the form: y = m x + c
where c is the y-intercept
and m is the gradient, or rise / run,
i.e. how variable y tends to change for each unit of variable x.


7

The purpose of the regression line is to predict y, given x. It can be used to predict y, for x-values
from within the data set. To make a prediction, substitute the given x into the regression line.
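
iNZight or Excel will give the regression equation directly. The Python below is only a sketch of the usual least-squares calculation behind it, followed by a prediction made by substituting x into y = m x + c. The function name `fit_line` and the engine-size data are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares gradient m and y-intercept c for the line y = m x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx             # change in y for each unit increase in x
    c = mean_y - m * mean_x   # the line passes through (mean_x, mean_y)
    return m, c

# Made-up data: engine size (litres) and fuel use (litres per 100 km)
engine_size = [1.0, 1.5, 2.0, 2.5, 3.0]
fuel_use = [5.1, 6.0, 7.2, 7.9, 9.0]

m, c = fit_line(engine_size, fuel_use)
print(f"y = {m:.2f} x + {c:.2f}")

# Predict y for x = 2.2, which lies within the observed x-values (1.0 to 3.0)
predicted = m * 2.2 + c
print(f"Predicted fuel use for a 2.2 L engine: {predicted:.2f} L/100 km")
```

In a written answer the gradient would be interpreted in context, e.g. "for each extra litre of engine size, fuel use tends to increase by about m litres per 100 km".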

#47 Wheat kernels


#48 Investigating the
effect of outliers
# 49 Blood alcohol
concentration

For a linear relationship, a graph of residuals vs explanatory variable should NOT show a pattern;
it should look random.
Some more difficult concepts that could arise

#50 Income and hours worked

Residual = difference between an actual data value and its predicted (model) value
A graph of residuals vs x-values shows the appropriateness/otherwise of the use of a linear model.
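
To see how residuals can reveal a poor linear fit, the sketch below fits a straight line to data that actually follow a curve and lists the residuals. The data are made up (y is simply x squared), and the helper names `fit_line` and `residuals` are assumptions for this example only.

```python
def fit_line(xs, ys):
    """Least-squares gradient m and y-intercept c for the line y = m x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def residuals(xs, ys):
    """Residual = observed value minus the value predicted by the fitted line."""
    m, c = fit_line(xs, ys)
    return [y - (m * x + c) for x, y in zip(xs, ys)]

# Made-up curved data: y = x squared
xs = [1, 2, 3, 4, 5]
curved = [1, 4, 9, 16, 25]

res = residuals(xs, curved)
for x, e in zip(xs, res):
    print(f"x = {x}: residual = {e:+.1f}")
```

Here the residuals are positive at both ends and negative in the middle. That U-shaped pattern, rather than a random scatter about zero, suggests a straight line is not an appropriate model for these data.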

# 51 Vehicle economy
# 52 Abalone
# 53 What model?

Correlation (mathematical relationship): we SEE that as x increases, y tends to increase/decrease
Causality (physical relationship): e.g. an increase in x CAUSES an increase/decrease in y
Non-linear regression
In practice, the linear model may not be a good fit at, for example, one or both of the extremities.
This means that interpolation (making estimates about variables within the observed range) usually gives
sensible values in context, but extrapolation (making estimates about variables outside the
observed range) may not.
Where a single straight line does not represent the relationship very well, an improvement might be
a piece-wise linear model or a curved model. (Curved models not expected in this standard)
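
One way to keep the interpolation / extrapolation distinction in mind when making predictions is to check the x-value against the observed range first. The sketch below does this. The function name `predict`, the line y = 1.94 x + 3.16 and the range 1.0 to 3.0 are all hypothetical values used for illustration.

```python
def predict(x, m, c, x_min, x_max):
    """Return (predicted y, kind), where kind says whether x is inside the observed range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return m * x + c, kind

# Hypothetical regression line and observed range of x-values
m, c = 1.94, 3.16
x_min, x_max = 1.0, 3.0

for x in (2.2, 6.0):
    y, kind = predict(x, m, c, x_min, x_max)
    print(f"x = {x}: predicted y = {y:.2f} ({kind})")
```

A prediction flagged as extrapolation is not automatically wrong, but it should be treated with caution and justified using knowledge of the context.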

mostly
homework
time

#54 Metabolic rate and lifespan
# 55 Bacterial counts

More Practice tasks: Investigate bi-variate measurement data

Sport Science

Research the context

Subspecies

Write a purpose statement and formulate your question.


Select appropriate data displays
Discuss features of the display(s)
Find an appropriate model
Use the model to make a prediction
Communicate findings in a conclusion

Abalone

Do practices, hand them in for marking, and make use of the feedback to prepare for the final assessment task.

If you have not done practice tasks, and had them marked, you will not be sitting the
assessment task.
Internal Assessment: the context for the assessment will be given out on Monday, Term 3, week 3.
Research the context in preparation for doing the assessment at the end of week 4 / start of week 5. (TBC)
