Correlation Between A Person's Height and Shoe Size Main

Surname 1
Mathematics Internal Assessment
Applications and Interpretations
Standard Level, SL
Correlation Between a Person's Height and Shoe Size
Session:
Number of Pages: 12 pages

Introduction
As a biology student with an interest in both human physiology and anatomy, I have long been
fascinated by the variation and unique characteristics found among human beings. In particular,
I have noted that people differ in traits such as height, shoe size, and the ability to roll the
tongue. Through my own reading of online sources, including ReachMD (1) and Cleveland Clinic (2),
I have also learned that tall persons tend to have larger feet than short individuals. Alongside
this, I have had the opportunity to learn about various statistical techniques in standard level
mathematics, including correlation tests, regression analysis, and chi-square tests, and how they
can be used to examine relationships between two or more variables. To better understand how
these concepts apply in real-life situations, and how shoe size relates to a person’s height, I
decided to design this investigative study. Thus, this correlation-based investigation is not only
an academic exercise, but also an attempt to explore how two patterns of growth are related: the
size of the feet (measured by shoe size) and the height of a person.
Aims of the Investigation
This investigation is designed to find both the direction and the strength of the correlation
between a person’s shoe size and height. The shoe sizes and heights will be measured from a
randomly selected sample of 50 individuals. Once collected, the data will be used to run a
two-sample t-test to find out whether the difference between the mean heights of the male and
female subjects is significant. The data sets will then be used to create box and whisker plots
in the SPSS statistical software to identify and remove any outliers. Thereafter, the means of
the heights and of the shoe sizes will be computed for use in later sections of the
investigation. The chi-square test of independence will then be conducted to find out whether the
two variables are associated, paving the way for determining the nature of the relationship. This
will involve creating a scatterplot and analyzing the trend of the data points (observations). If
the relationship between the two variables is deemed to be non-linear, Spearman’s rank approach
will be used to determine the correlation. Otherwise, Pearson’s correlation coefficient will be
used. If Pearson’s method is used and the correlation coefficient is found to be strong enough
(magnitude greater than 0.5), a regression equation will be calculated and used to create a line
of best fit, which can be used to predict shoe sizes for known heights, and vice versa. Finally,
a conclusion and evaluation section will be developed.
Hypothesis
In conducting this investigative study, it is hypothesized that shoe size will vary linearly with
a person’s height. This hypothesis is in line with the observation in the introductory section
that tall persons tend to have larger feet than short individuals.
Data Collection
This study adopted a primary method of data collection. A sample of 50 IB students was used,
comprising randomly selected female and male subjects. The students were briefed on the activity
they were to take part in, and their consent was sought. Their shoe sizes were checked, and their
heights were measured using a meter rule. Each pair of measurements was then recorded in the data
table presented in the appendix section of this investigative study. A sample of 50 students was
considered large enough to support the statistical tests used in this investigation.
Definition of Variables
In order to ensure clarity in the process of finding out the direction, and strength of the
correlation between a person’s shoe size, and height, the pair of variables had to be defined, such
that:
• Independent variable (x): Person’s height, measured to the nearest centimeter (cm)
• Dependent variable (y): Shoe size, measured in US sizes
Two Sample t-test
It is a statistical test used to determine whether the means of two populations or groups are
statistically different or not (Zach 1). Thus, this test is suitable for determining whether the
difference between the mean heights of the male and female subjects is significant before the
data are used in the test of correlation. For a two-sample t-test, the data have to meet the
following assumptions, in line with Zach (1):
The observations in the samples are independent of each other
The data are normally distributed
The samples have approximately the same variance
The data in either sample have been collected using a random sampling approach
The t-test operates under a pair of defined hypotheses, such that:
Null hypothesis, H0: There is no significant mean difference
Alternative hypothesis, H1: There is a significant mean difference
The test statistic is computed using the following general formula:
$$t_{\text{stat}} = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
Where:
$\bar{x}_1$ and $\bar{x}_2$ = sample means of the two groups
$n_1$ and $n_2$ = sizes of the two samples
$s_1$ and $s_2$ = standard deviations of the two samples
$S_p$ = pooled standard deviation, given by
$$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

The test statistic is compared with a critical value read from the t-distribution table, based
on a predefined significance level and the degrees of freedom, determined through the following
formula:
$$df = (n_1 + n_2) - 2$$
If the critical value is greater than the absolute value of the test statistic, the null
hypothesis is deemed to hold. Otherwise, the null hypothesis is rejected in favour of the
alternative hypothesis.
To run the t-test using the heights of persons, the following seven steps were used:
Step 1: The data set on person’s height was sorted into two groups, one for the male subjects and
one for the female subjects, as shown in Table 1 below.
Step 2: Definition of hypotheses:
Null hypothesis, H0: No significant mean difference between heights of males, and females
Alternative hypothesis, H1: Significant mean difference between heights of males, and females
Step 3: The sample mean of the female subjects, and the corresponding standard deviation:
$$\bar{x}_1 = \frac{\sum x}{n} = \frac{3454}{22} = 157 \text{ cm}$$
$$s_1 = \sqrt{\frac{(156-157)^2 + (157-157)^2 + \dots + (150-157)^2}{22 - 1}} = 8.435 \text{ cm}$$
Table 1: Distribution Table of Heights of Persons Based on Gender
Count, n    Person’s Height, x (cm)
            Females    Males
1           156        145
2           157        155
3           151        158
4           154        157
5           151        152
6           149        153
7           170        150
8           164        152
9           155        152
10          162        163
11          153        163
12          148        174
13          156        153
14          174        157
15          181        156
16          155        152
17          150        150
18          152        147
19          154        156
20          156        146
21          156        154
22          150        165
23          -          160
24          -          162
25          -          149
26          -          164
27          -          173
28          -          167
∑x          3454       4385
Step 4: The sample mean of the male subjects, and the corresponding standard deviation:
$$\bar{x}_2 = \frac{\sum x}{n} = \frac{4385}{28} = 156.607 \text{ cm}$$
$$s_2 = \sqrt{\frac{(145-156.607)^2 + (155-156.607)^2 + \dots + (167-156.607)^2}{28 - 1}} = 7.539 \text{ cm}$$
Step 5: The pooled standard deviation was then computed:
$$S_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{(22-1)\,8.435^2 + (28-1)\,7.539^2}{22+28-2}} = 7.943 \text{ cm}$$
Step 6: Calculation of test statistic
$$t_{\text{stat}} = \frac{157 - 156.607}{7.943\sqrt{\dfrac{1}{22} + \dfrac{1}{28}}} = 0.174$$
Step 7: Critical value, and decision:
$$df = (22 + 28) - 2 = 48$$
Using this degree of freedom and a 0.05 significance level, the two-tailed critical value from
the t-distribution table was approximately 2.011.
Since the critical value is greater than the test statistic, the null hypothesis is deemed to
hold, indicating that there was no significant mean difference between the heights of males and
females. Thus, the two data sets would be used jointly in the rest of the statistical analysis.
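The steps of the t-test can be cross-checked with a short script. The following is a minimal sketch rather than part of the SPSS workflow used in this investigation. It assumes the pooled form of the test, with sample standard deviations (divisor n − 1) and a pooled variance divided by n1 + n2 − 2. The function names `sample_sd` and `pooled_t` are illustrative only.

```python
import math

# Heights (cm) as listed in Table 1.
females = [156, 157, 151, 154, 151, 149, 170, 164, 155, 162, 153,
           148, 156, 174, 181, 155, 150, 152, 154, 156, 156, 150]
males = [145, 155, 158, 157, 152, 153, 150, 152, 152, 163, 163, 174, 153, 157,
         156, 152, 150, 147, 156, 146, 154, 165, 160, 162, 149, 164, 173, 167]


def sample_sd(values):
    """Sample standard deviation (assumption: divisor n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def pooled_t(a, b):
    """Return (pooled SD, t statistic, degrees of freedom) for two samples."""
    n1, n2 = len(a), len(b)
    s1, s2 = sample_sd(a), sample_sd(b)
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    t = (sum(a) / n1 - sum(b) / n2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, df


sp, t_stat, df = pooled_t(females, males)
print(f"S_p = {sp:.3f} cm, t = {t_stat:.3f}, df = {df}")
```

Because the script works from unrounded intermediate values, its output can differ in the last decimal place from a hand calculation that rounds at each step.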
Box and Whisker Plots for the Identification of Outliers
The data sets (the heights in Table 1 and the shoe sizes in Table 6 of the appendix) were copied
into the graphing application of SPSS and used to create the box and whisker plots shown below.
Figure 1: Box, and Whisker Plot for the Person's Height

Figure 2: Box, and Whisker Plot for the Shoe Size
A closer examination of the box and whisker plot in Figure 1 above reveals that the whisker on
the upper side of the central blue box is longer than the whisker on the lower side, an
indication of positive skewness in the height data set. In addition, data point 33 beyond the
whisker indicates that the corresponding height of subject 33, 181 cm, is an outlier that had to
be removed from the distribution in the subsequent statistical analysis. On the other hand, no
outlier was identified in the shoe size data set in the box and whisker plot presented in
Figure 2 above, even though the whisker on the upper side of the central blue box is longer than
the whisker on the lower side, again an indication of positive skewness in the shoe size data set.
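The outlier check can also be sketched in code. The example below is an illustration only: it applies the conventional 1.5 × IQR fence rule to the 50 heights from Table 6, and it assumes the quartile method of Python's `statistics.quantiles`, which may differ slightly from the hinges SPSS uses when drawing box plots. The function name `iqr_outliers` is hypothetical.

```python
import statistics

# Heights (cm) of all 50 subjects, in the order listed in Table 6.
heights = [145, 155, 156, 157, 158, 157, 151, 152, 153, 154,
           151, 150, 149, 152, 152, 170, 164, 163, 163, 155,
           162, 174, 153, 153, 157, 156, 148, 152, 150, 156,
           174, 147, 181, 156, 146, 155, 154, 150, 165, 160,
           162, 152, 154, 156, 149, 164, 156, 150, 173, 167]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(iqr_outliers(heights))
```

Under these assumptions, the only height flagged is the 181 cm value of subject 33.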
Determination of Means
The means of the two variables were calculated using the same approach adopted in the two-sample
t-test section, after removing the outlier identified in the preceding section, leaving 49 data
points, such that:
$$\bar{x} = \frac{\sum x_i}{N} = \frac{145 + 155 + 156 + \dots + 167}{49} = 156.286 \text{ cm}$$
$$\bar{y} = \frac{\sum y_i}{N} = \frac{5.0 + 5.4 + 5.0 + \dots + 6.5}{49} = 5.659 \text{ US}$$
From the two computations, the mean shoe size and the mean height of the persons considered in
the study were 5.659 US size and 156.286 cm, respectively. These values are used in the
subsequent sections of this investigation.
Chi-Square Test for Independence
It is a statistical test used to determine whether two categorical or measurable variables are
related or not (Biswal 1). The chi-square statistic is calculated using the following general
formula:
$$\chi^2_c = \sum \frac{(O_i - E_i)^2}{E_i}$$ (Biswal 2)
Where:
$c$ = the degrees of freedom
$O$ = observed value
$E$ = expected value
The calculation of the degrees of freedom (df) varies with the type of statistical test. For the
chi-square test, df is computed from the number of rows and columns of the distribution table of
observed or expected values, such that:
$$c = (\text{No. of rows} - 1) \times (\text{No. of columns} - 1)$$
This degree of freedom is used to determine the critical value of chi-square at a specified
significance level. Once the critical value has been determined, it is compared with the test
value. If the critical value is greater than the test value, the null hypothesis is supported,
indicating that there is no significant association between the pair of variables being studied.
Otherwise, the null hypothesis is rejected, indicating that the variables are related. In finding
out whether a person’s height and shoe size are related, five steps were adopted.
Step 1: Formulation of hypothesis:
Null hypothesis, H0: There is no significant relationship between a person’s height and shoe size
Alternative hypothesis, H1: There is a significant relationship between a person’s height and shoe size
Step 2: Creation of a distribution table for the observed values of shoe size based on defined
ranges of person’s heights. A table of 3 rows by 3 columns was created, as illustrated below.
Table 2: Frequency Distribution for the Observed Values of Shoe Sizes
Person’s Height        Shoe Size, y
                       4.0 < y ≤ 5.5    5.5 < y ≤ 7.0    7.0 < y ≤ 8.5    Total
135.0 < x ≤ 150.0      48.5             0.0              0.0              48.5
150.0 < x ≤ 165.0      110.9            82.4             0.0              193.3
165.0 < x ≤ 180.0      0.0              21.0             22.5             43.5
Total                  159.4            103.4            22.5             285.3
Step 3: Creation of the distribution table for the expected values of shoe sizes, computed from
the row and column totals of Table 2 above:
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Total of all Observations}}$$
In a sample computation, using the totals of row 2 and column 1:
$$E = \frac{193.3 \times 159.4}{285.3} = 108.0$$
Table 3: Frequency Distribution for the Expected Values of Shoe Sizes
Person’s Height        Shoe Size, y
                       4.0 < y ≤ 5.5    5.5 < y ≤ 7.0    7.0 < y ≤ 8.5    Total
135.0 < x ≤ 150.0      27.1             17.6             3.8              48.5
150.0 < x ≤ 165.0      108.0            70.1             15.3             193.3
165.0 < x ≤ 180.0      24.3             15.7             3.4              43.5
Total                  159.4            103.4            22.5             285.3
Step 4: Computation of the chi-square test statistic. The values from Tables 2 and 3 were used in
this computation:
$$\chi^2_c = \frac{(48.5-27.1)^2}{27.1} + \frac{(0.0-17.6)^2}{17.6} + \dots + \frac{(22.5-3.4)^2}{3.4} = 189.22$$
Step 5: Determination of critical value, and decision
$$c = (3-1) \times (3-1) = 4$$
With this df and a 0.05 significance level, the critical value read from the distribution table
was 9.49.
Since the critical value is less than the test value, the null hypothesis is rejected, indicating
that there was a significant relationship between a person’s height and shoe size.
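The arithmetic of the chi-square steps can be sketched as follows. This is an illustrative check only: it takes the cell values exactly as tabulated in Table 2 and computes the expected values and the test statistic without rounding. The function name `chi_square` is hypothetical.

```python
# Observed cell values exactly as tabulated in Table 2
# (rows: height ranges, columns: shoe size ranges).
observed = [
    [48.5, 0.0, 0.0],
    [110.9, 82.4, 0.0],
    [0.0, 21.0, 22.5],
]


def chi_square(table):
    """Return (test statistic, degrees of freedom, expected values)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected value = row total x column total / total of all observations.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


stat, df, expected = chi_square(observed)
print(f"chi-square = {stat:.2f}, df = {df}")
```

Since the script keeps the expected values unrounded, its statistic is close to, but not identical with, a value obtained from expected values rounded to one decimal place. Either way it lies far above the critical value of 9.49.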
Graphical Representation: Scatterplot Graph
A scatterplot is used in statistical analysis to determine the nature of the relationship between
two quantitative variables (Lumen Learning 1). The relationship between such variables could be
linear, non-linear, or absent:
Linear relationship: When the observations on the scatterplot follow a straight-line pattern
with a clear direction.
Non-linear relationship: When the observations on the scatterplot follow a specified pattern but
not a single clear direction (for example, a curve).
No relationship: When the observations on the scatterplot do not follow any specified pattern or
direction.
The data set presented in Table 6 (minus the outlier) was copied into the graphing application of
SPSS and used to create the scatterplot shown below.

Figure 3: A Scatterplot of Shoe Size against Person's Height

The scatterplot shown above reveals that the observations follow a straight-line pattern with a
clear upward direction, hence a linear relationship between shoe size and a person’s height.
Based on this observation, Pearson’s method was preferred to Spearman’s rank correlation for
determining the correlation coefficient and analyzing its strength and direction.
Linear Correlation: Pearson’s Correlation Coefficient
In statistical analysis, correlation is considered to be the measure of association between any
pair of variables (Kiernan 2). A correlation is deemed to exist between such variables if they
are related to each other. As observed in the preceding section, the determination of correlation
starts from the scatterplot, which reveals the nature of the relationship. For a linear
relationship, Pearson’s Product Moment Correlation Coefficient (PPMCC) is calculated to determine
the direction and strength of the correlation between the variables. The PPMCC is calculated
through the following general formula, in line with Kiernan (3):
$$r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}}$$
Where:
$S_{xy}$ = sum of the products of the deviations of x (independent) and y (dependent) from their means
$S_{xx}$ = sum of the squared deviations of x (independent) from its mean
$S_{yy}$ = sum of the squared deviations of y (dependent) from its mean
The properties of correlation coefficient, r include:
It takes any value between -1, and +1
It is a unitless quantity
When positive, it indicates a direct relationship, and when negative, it indicates an inverse
relationship.
The correlation coefficient assumes different interpretations:
Table 4: Analysis of Strength of Correlation Coefficient
Strength of Correlation Coefficient    Value of Correlation Coefficient (r)
                                       Positive           Negative
No correlation                         0                  0
Weak correlation                       0.1 up to 0.3      -0.1 up to -0.3
Fairly strong correlation              0.3 up to 0.5      -0.3 up to -0.5
Strong correlation                     0.5 up to 0.75     -0.5 up to -0.75
Very strong                            0.75 up to 0.99    -0.75 up to -0.99
Perfectly strong correlation           1.0                -1.0
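The bands in Table 4 can be expressed as a small lookup function. This is a sketch only, and it makes two assumptions that the table leaves open: a value that sits exactly on a boundary is placed in the stronger band, and magnitudes below 0.1 are treated as no correlation. The function name `correlation_strength` is illustrative.

```python
def correlation_strength(r):
    """Classify r using the bands of Table 4 (boundary handling is assumed)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 1.0:
        label = "Perfectly strong correlation"
    elif size >= 0.75:
        label = "Very strong"
    elif size >= 0.5:
        label = "Strong correlation"
    elif size >= 0.3:
        label = "Fairly strong correlation"
    elif size >= 0.1:
        label = "Weak correlation"
    else:
        label = "No correlation"  # assumption: magnitudes below 0.1
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return label, direction


print(correlation_strength(0.6))
```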

Building on this foundation, the data set presented in Table 6 (minus the outlier) was used to
create a Pearson’s distribution table, as illustrated below.
Table 5: PPMCC Distribution Table for the Person's Height (x), and the Shoe Size (y)
x y xy x² y²
145 5.0 725.00 21025.00 25.00
155 5.4 837.00 24025.00 29.16
156 5.0 780.00 24336.00 25.00
157 6.5 1020.50 24649.00 42.25
158 6.4 1011.20 24964.00 40.96
157 7.0 1099.00 24649.00 49.00
151 4.5 679.50 22801.00 20.25
152 5.0 760.00 23104.00 25.00
153 5.0 765.00 23409.00 25.00
154 6.0 924.00 23716.00 36.00
151 5.5 830.50 22801.00 30.25
150 5.0 750.00 22500.00 25.00
149 5.0 745.00 22201.00 25.00
152 5.5 836.00 23104.00 30.25
152 5.5 836.00 23104.00 30.25
170 7.0 1190.00 28900.00 49.00
164 6.5 1066.00 26896.00 42.25
163 6.0 978.00 26569.00 36.00
163 6.0 978.00 26569.00 36.00
155 5.5 852.50 24025.00 30.25
162 6.5 1053.00 26244.00 42.25
174 7.5 1305.00 30276.00 56.25
153 5.0 765.00 23409.00 25.00
153 5.0 765.00 23409.00 25.00
157 5.5 863.50 24649.00 30.25
156 6.0 936.00 24336.00 36.00
148 4.5 666.00 21904.00 20.25
152 5.0 760.00 23104.00 25.00
150 5.0 750.00 22500.00 25.00
156 5.5 858.00 24336.00 30.25
174 7.5 1305.00 30276.00 56.25
147 4.5 661.50 21609.00 20.25
156 5.5 858.00 24336.00 30.25
146 4.5 657.00 21316.00 20.25
155 5.5 852.50 24025.00 30.25
154 5.5 847.00 23716.00 30.25
150 5.0 750.00 22500.00 25.00
165 6.5 1072.50 27225.00 42.25
160 6.0 960.00 25600.00 36.00
162 6.0 972.00 26244.00 36.00
152 5.0 760.00 23104.00 25.00
154 5.5 847.00 23716.00 30.25
156 5.5 858.00 24336.00 30.25
149 5.0 745.00 22201.00 25.00
164 6.5 1066.00 26896.00 42.25
156 5.5 858.00 24336.00 30.25
150 5.0 750.00 22500.00 25.00
173 7.5 1297.50 29929.00 56.25
167 6.5 1085.50 27889.00 42.25
∑x = 7658   ∑y = 277.3   ∑xy = 43587.2   ∑x² = 1199268   ∑y² = 1600.37
Determination of correlation coefficient using the data sets obtained above involved use of four
different steps:
Step 1: Computation of $S_{xx}$, the sum of squared deviations of x:
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1199268 - \frac{(7658)^2}{49} = 2432$$
Step 2: Computation of $S_{yy}$, the sum of squared deviations of y:
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1600.37 - \frac{(277.3)^2}{49} = 31.078$$
Step 3: Computation of $S_{xy}$, the combined sum of products of deviations of x and y:
$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 43587.2 - \frac{(7658)(277.3)}{49} = 249.171$$
Step 4: Computation of correlation coefficient, and interpretation:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}} = \frac{249.171}{\sqrt{2432 \times 31.078}} = 0.9063$$
Hence, there is a very strong, and positive correlation between the person’s height, and the shoe
size.
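The four steps above can be reproduced from the raw pairs in Table 5. The script below is a minimal sketch, not the SPSS procedure; the function name `pearson_r` is illustrative, and the data are the 49 (height, shoe size) pairs that remain after the outlier is removed.

```python
import math

# (height in cm, US shoe size) pairs from Table 5: 49 subjects, outlier removed.
data = [
    (145, 5.0), (155, 5.4), (156, 5.0), (157, 6.5), (158, 6.4), (157, 7.0), (151, 4.5),
    (152, 5.0), (153, 5.0), (154, 6.0), (151, 5.5), (150, 5.0), (149, 5.0), (152, 5.5),
    (152, 5.5), (170, 7.0), (164, 6.5), (163, 6.0), (163, 6.0), (155, 5.5), (162, 6.5),
    (174, 7.5), (153, 5.0), (153, 5.0), (157, 5.5), (156, 6.0), (148, 4.5), (152, 5.0),
    (150, 5.0), (156, 5.5), (174, 7.5), (147, 4.5), (156, 5.5), (146, 4.5), (155, 5.5),
    (154, 5.5), (150, 5.0), (165, 6.5), (160, 6.0), (162, 6.0), (152, 5.0), (154, 5.5),
    (156, 5.5), (149, 5.0), (164, 6.5), (156, 5.5), (150, 5.0), (173, 7.5), (167, 6.5),
]


def pearson_r(pairs):
    """Return (S_xx, S_yy, S_xy, r) using the sums-based formulas of Steps 1-4."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    s_xx = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
    s_yy = sum(y * y for _, y in pairs) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in pairs) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy, s_xy / math.sqrt(s_xx * s_yy)


s_xx, s_yy, s_xy, r = pearson_r(data)
print(f"S_xx = {s_xx:.0f}, S_yy = {s_yy:.3f}, S_xy = {s_xy:.3f}, r = {r:.4f}")
```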
Line of Regression, and Estimation of Shoe Sizes
When the correlation between the two variables has been ascertained to be strong enough, the
relationship between them can be modeled through a linear regression equation. The dependent
variable can then be determined as a function of the explanatory or predictor variable. According
to Kiernan (5), the regression equation is defined by the following formula:
$$y = bx + b_0$$
Where:
$$b = \frac{S_{xy}}{S_{xx}}$$
$$b_0 = \bar{y} - b\bar{x}$$
The regression equation used to model or predict the shoe size of an individual from their height
in this investigation was calculated in three steps:
Step 1: Determination of constant, b:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{249.171}{2432} = 0.102$$
Step 2: Determination of constant, $b_0$:
$$b_0 = 5.659 - 0.102 \times 156.286 = -10.282$$
Step 3: Determination of regression equation:
The regression equation can now be expressed with the values obtained in steps 1 and 2 above:
$$y = 0.102x - 10.282$$
For instance, in a sample calculation, when the height of a person is 145 cm, the predicted shoe
size would be:
$$y = 0.102(145) - 10.282 = 4.508$$
Similar computations were made by substituting the values of x from Table 6 in the appendix, and
the results were used to create the linear regression graph, presented as a linear fit in
Figure 4 below.
Figure 4: A Linear Regression Plot of Shoe Size against Person's Height
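The regression steps can be sketched in code using the summary values reported in the text (S_xy = 249.171, S_xx = 2432, mean height 156.286 cm, mean shoe size 5.659). This is an illustration only; the function `fit_line` and its optional `slope_decimals` argument are hypothetical and exist to show the effect of rounding the slope before the intercept is computed.

```python
def fit_line(s_xy, s_xx, mean_x, mean_y, slope_decimals=None):
    """Return (b, b0) for y = b*x + b0; optionally round b first."""
    b = s_xy / s_xx
    if slope_decimals is not None:
        b = round(b, slope_decimals)
    b0 = mean_y - b * mean_x
    return b, b0


def predict(b, b0, height_cm):
    """Predicted US shoe size for a given height in cm."""
    return b * height_cm + b0


# Slope rounded to three decimals, as in the hand calculation.
b, b0 = fit_line(249.171, 2432, 156.286, 5.659, slope_decimals=3)
print(f"rounded slope:   y = {b:.3f}x + ({b0:.3f}), y(145) = {predict(b, b0, 145):.3f}")

# Unrounded slope, for comparison.
b_full, b0_full = fit_line(249.171, 2432, 156.286, 5.659)
print(f"unrounded slope: y = {b_full:.4f}x + ({b0_full:.3f}), "
      f"y(145) = {predict(b_full, b0_full, 145):.3f}")
```

Keeping the slope unrounded shifts the intercept slightly, but the predicted shoe size at 145 cm stays close to 4.5 in both cases.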
Conclusion, and Evaluation
In sum, this investigation has achieved its aim of finding both the direction and strength of the
correlation between a person’s shoe size and height. The shoe sizes and heights were measured
from a randomly selected sample of 50 individuals. It had been hypothesized that shoe size would
vary linearly with a person’s height. Upon analyzing the data sets through several statistical
computations, the correlation coefficient was found to be 0.9063, indicative of a very strong,
positive correlation between shoe size and a person’s height. Hence, the hypothesis of the
investigation was supported. The strengths of this investigation included the use of several
statistical tests and the use of SPSS software in analyzing and presenting the data. The main
limitation was that only one measurement was taken for each subject; repeated measurements would
have improved the accuracy and reliability of the data.
Works Cited
Biswal, Avijeet. “What Is a Chi-Square Test? Formula, Examples & Uses | Simplilearn.” Simplilearn.com, 17 Feb. 2023, www.simplilearn.com/tutorials/statistics-tutorial/chi-square-test.
Cleveland Clinic. “Shoes Getting Tight? Why Your Feet Change Size over Time.” Cleveland Clinic, 27 Jan. 2020, health.clevelandclinic.org/shoes-getting-tight-feet-change-size-time/.
Kiernan, Diane. “Chapter 7: Correlation and Simple Linear Regression.” Milnepublishing.geneseo.edu, Open SUNY Textbooks, 16 Jan. 2018, milnepublishing.geneseo.edu/natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/.
Lumen Learning. “Chapter 7: Correlation and Simple Linear Regression | Natural Resources Biometrics.” Courses.lumenlearning.com, 2019, courses.lumenlearning.com/suny-natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/.
ReachMD. “What Factors Influence a Person’s Height?” Reachmd.com, 27 Jan. 2020, reachmd.com/news/what-factors-influence-a-persons-height/1632279/.
Zach. “Two Sample T-Test: Definition, Formula, and Example.” Statology, 23 Apr. 2020, www.statology.org/two-sample-t-test/.
Appendix
Table 6: Raw Data on the Shoe Size, and the Heights of Persons
No. of People, n Gender Height (cm) Shoe Size (U.S)

1 Male 145 5.0
2 Male 155 5.4
3 Female 156 5.0
4 Female 157 6.5
5 Male 158 6.4
6 Male 157 7.0
7 Female 151 4.5
8 Male 152 5.0
9 Male 153 5.0
10 Female 154 6.0
11 Female 151 5.5
12 Male 150 5.0
13 Female 149 5.0
14 Male 152 5.5
15 Male 152 5.5
16 Female 170 7.0
17 Female 164 6.5
18 Male 163 6.0
19 Male 163 6.0
20 Female 155 5.5
21 Female 162 6.5
22 Male 174 7.5
23 Male 153 5.0
24 Female 153 5.0
25 Male 157 5.5
26 Male 156 6.0
27 Female 148 4.5
28 Male 152 5.0
29 Male 150 5.0
30 Female 156 5.5
31 Female 174 7.5
32 Male 147 4.5
33 Female 181 8.0
34 Male 156 5.5
35 Male 146 4.5
36 Female 155 5.5
37 Male 154 5.5
38 Female 150 5.0
39 Male 165 6.5
40 Male 160 6.0
41 Male 162 6.0
42 Female 152 5.0
43 Female 154 5.5
44 Female 156 5.5
45 Male 149 5.0
46 Male 164 6.5
47 Female 156 5.5
48 Female 150 5.0
49 Male 173 7.5
50 Male 167 6.5
