BIVARIATE DATA
• 1.01 Variables & Summary Data
• 1.02 Two Way Frequency Table
• 1.03 Relationships / Associations
• 1.04 Displaying Relationship: Scatter Plot
• 1.05 Analysing Relationships: Correlation Coefficient
• 1.06 Fitting Least Squares Regression Lines
• 1.07 Reliability of Regression Model
Objective:
What is bivariate data?
1.01 VARIABLES & SUMMARY DATA
Height Waistline
Weight Exam Scores
Pollution level Distance from Earth
Gaming hours Velocity
Number of people Amount of fertilizer
Population Sales
TYPES OF DATA
Calculating Measures of Centre
1) Mean – is the arithmetic average of the data values. Because the mean is pulled
towards extreme values, it is not an appropriate measure of centre if there are outliers in the data set.
2) Median – is the middle score when the data values are ordered from smallest to
largest. The median is much less affected by outliers than the mean.
3) Mode – the most common data value, or the data value with the highest frequency.
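As a minimal sketch, Python's standard statistics module can compute all three measures of centre. The exam scores below are hypothetical (not from the slides), and one extreme low score is added to show how differently the mean and the median respond to an outlier.

```python
from statistics import mean, median, mode

# Hypothetical exam scores (illustrative only)
scores = [62, 70, 70, 75, 81, 84, 90]

# The same data with one extreme low score (an outlier) added
with_outlier = scores + [5]

print(mean(scores), median(scores), mode(scores))                    # mean = 76, median = 75, mode = 70
print(mean(with_outlier), median(with_outlier), mode(with_outlier))  # mean = 67.125, median = 72.5, mode = 70
```

The single outlier drags the mean down by almost 9 marks, while the median moves by only 2.5.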
Calculating Measures of Spread
1) Range = highest value – lowest value. Because the range is calculated from the two
most extreme values only, it is not an appropriate measure of spread when outliers are
present.
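A short sketch of the range formula, using the same hypothetical scores as before. The helper name data_range is illustrative only; adding one extreme value roughly triples the range.

```python
def data_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

# Hypothetical scores; the second list adds one extreme low score
scores = [62, 70, 70, 75, 81, 84, 90]
with_outlier = scores + [5]

print(data_range(scores))        # 28
print(data_range(with_outlier))  # 85
```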
USING CAS CALCULATOR
1.02 Two Way Frequency Table
• Data can be represented in a two-way frequency table.
• E.g. suppose that the numbers of male and female Members of Parliament (MPs)
from two parties are tabulated.
• The table below shows the totals in the form of a two-way frequency table.
• From the table, we can express each count as a proportion of the overall total, or
as a percentage.
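The original table is not reproduced in this text, so the counts below are invented purely for illustration. This sketch stores a two-way table as a nested dictionary (rows = sex, columns = party) and expresses each cell as a percentage of the grand total.

```python
# Hypothetical counts of MPs by sex (rows) and party (columns)
table = {
    "Male":   {"Party A": 30, "Party B": 25},
    "Female": {"Party A": 20, "Party B": 25},
}

# Grand total of all cells in the table
grand_total = sum(sum(row.values()) for row in table.values())

# Each cell as a percentage of the grand total
percent_of_total = {
    (sex, party): 100 * count / grand_total
    for sex, row in table.items()
    for party, count in row.items()
}

for (sex, party), pct in percent_of_total.items():
    print(f"{sex:<6} {party}: {pct:.1f}%")
```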
Suppose that a survey asked a number of people their age and which main course they
most preferred at a restaurant (beef, chicken, fish or other), giving rise to the
two-way table shown below.
Row Percentages (rows: Male, Female)
Display – Stacked 100% Column Graph:
Column Percentages (columns: Party A, Party B; rows: Male, Female, Total)
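Row and column percentages can be sketched the same way. Using invented MP counts again (not the slides' real data), each row of the row-percentage table sums to 100%, and each column of the column-percentage table sums to 100%. The function names are illustrative.

```python
# Hypothetical counts of MPs by sex (rows) and party (columns)
table = {
    "Male":   {"Party A": 30, "Party B": 25},
    "Female": {"Party A": 20, "Party B": 25},
}

def row_percentages(t):
    """Each cell as a percentage of its row total, so each row sums to 100%."""
    result = {}
    for r, row in t.items():
        row_total = sum(row.values())
        result[r] = {c: 100 * n / row_total for c, n in row.items()}
    return result

def column_percentages(t):
    """Each cell as a percentage of its column total, so each column sums to 100%."""
    cols = list(next(iter(t.values())))
    col_totals = {c: sum(row[c] for row in t.values()) for c in cols}
    return {r: {c: 100 * row[c] / col_totals[c] for c in cols} for r, row in t.items()}

row_pct = row_percentages(table)
col_pct = column_percentages(table)

for party in ("Party A", "Party B"):
    print(party, {sex: round(col_pct[sex][party], 1) for sex in table})
```

With these invented numbers, 60% of Party A's MPs are male compared with 50% of Party B's. Because the percentages differ between the parties, this would suggest an association between party and sex; roughly equal column percentages would suggest little or no association.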
(c)
Comment on association:
Describing association:
1.03 RELATIONSHIPS / ASSOCIATIONS
• Identifying the explanatory and response variables is important, because it affects how
you represent the data and conduct your analysis.
The response variable (RV) depends on the explanatory variable (EV).
For the following sets of variables, state which is the explanatory variable and
which is the response variable.
1. Temperature and number of ice creams sold
2. Exam scores and time spent studying
3. Time travelling and distance travelled
4. Working hours and wage
5. Caffeine consumption and heart rate
6. Time spent dating and couple happiness level
1.04 Displaying Relationship: Scatter Plot
1. FORM
2. DIRECTION
3. STRENGTH
1.05 Analysing Relationships: Correlation Coefficient
• The correlation coefficient, r, is a single value between –1 and +1. Its sign gives the
direction of a linear relationship and its size gives the strength of that relationship.
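A minimal sketch of Pearson's correlation coefficient, computed from the definition r = Sxy / √(Sxx × Syy). The hours-studied and exam-score values are hypothetical. The sign of r gives the direction; the closer |r| is to 1, the stronger the linear relationship.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired data x, y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: time spent studying (EV) and exam score (RV)
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 74]

r = pearson_r(hours, score)
direction = "positive" if r > 0 else "negative" if r < 0 else "no"
print(f"r = {r:.3f} ({direction} direction)")  # r = 0.981 (positive direction)
```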
CAUSALITY
• Causality is the relationship between cause and effect.
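• Causality can only be concluded when changes in the explanatory variable directly
and solely cause changes in the response variable. A correlation, however strong, does
not on its own show that one variable causes the other.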
1. A negative correlation exists between the number of ice-creams sold and the
number of flu cases reported. Does ice cream prevent the flu? Comment.
2. There exists a strong positive correlation between the number of televisions and
the life expectancy for the world’s nations. Does having multiple televisions
increase life expectancy? Comment.
OUTLIER
• The correlation coefficient is also affected by outliers.
1.06 Fitting Least Squares Regression Lines
MAKING PREDICTIONS
Once a regression line has been found, its equation can be used to make predictions.
Data were collected from people aged between 7 and 19 years, and a linear
regression line was found with the equation
height (cm) = 100 + 2.5 × age (years)
What is the predicted height for an 8-year-old? For a 21-year-old?
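The prediction can be sketched as a small function. Because the data came from people aged 7 to 19, the sketch also flags whether each prediction is an interpolation (within the range of the data) or an extrapolation (outside it); predictions made by extrapolation are generally treated as unreliable. The names AGE_MIN, AGE_MAX and predict_height are illustrative.

```python
AGE_MIN, AGE_MAX = 7, 19  # range of ages in the original data

def predict_height(age):
    """Height (cm) predicted by the regression line height = 100 + 2.5 x age."""
    return 100 + 2.5 * age

for age in (8, 21):
    height = predict_height(age)
    if AGE_MIN <= age <= AGE_MAX:
        note = "interpolation"
    else:
        note = "extrapolation, likely unreliable"
    print(f"age {age}: {height} cm ({note})")
```

This gives 120 cm for an 8-year-old and 152.5 cm for a 21-year-old, but the second value lies outside the 7 to 19 age range of the data.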
1.07 Reliability of Regression Model
2. Coefficient of Determination, r²
• The coefficient of determination measures how well the regression line represents
the set of data. For a least squares regression line it equals the square of the
correlation coefficient r.
• It takes values from 0 to 1.
• When interpreting r², the following general sentence can be used:
r² × 100% of the variation in the response (dependent) variable can be explained
by the variation in the explanatory (independent) variable.
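A minimal sketch that fits a least squares line to hypothetical hours-studied and exam-score data, computes r² as 1 minus (unexplained variation ÷ total variation), and prints the interpretation sentence. The helper name fit_and_r_squared is illustrative.

```python
def fit_and_r_squared(x, y):
    """Fit the least squares line y = a + b*x and return (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                    # total variation
    return a, b, 1 - ss_res / ss_tot

# Hypothetical data: time spent studying (EV) and exam score (RV)
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 74]

a, b, r2 = fit_and_r_squared(hours, score)
print(f"score = {a:.1f} + {b:.1f} x hours, r^2 = {r2:.3f}")
print(f"{r2:.1%} of the variation in exam score can be explained by "
      "the variation in time spent studying.")
```

Here r² is about 0.962, so about 96.2% of the variation in exam score is explained by the variation in time spent studying, matching the square of r ≈ 0.981 for the same data.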
4. Existence of Outliers
• Outliers appear as extreme data points on the scatter plot.
• Outliers can change the strength of the correlation coefficient (often weakening it)
and can distort the regression line equation.
5. Shape of the Residual Plot
• From the residual plot, we can determine whether the relationship shown in the
scatter plot is linear.
• A residual plot with randomly scattered points (no clear pattern) suggests that the
relationship is linear. This suggests that the linear regression model is suitable for
the data.
• A residual plot with a clear pattern or shape (for example, a curve) suggests that the
relationship is non-linear. This suggests that the linear regression model is not
suitable for the data.