Standard Level, SL
Session:
Introduction
As a biology student with an interest in both human physiology and anatomy, I have long been
fascinated by the variation and unique characteristics found among human beings. In particular,
I have noted that people differ in traits such as height, shoe size, and the ability to roll the
tongue. In addition, through personal research, I have read in online sources, including
ReachMD (1) and Cleveland Clinic (2), that tall persons tend to have larger feet than short
individuals. Against this backdrop, I have had the opportunity to learn about various statistical
methods in standard level mathematics, including correlation tests, regression analysis, and
chi-square tests, and how they can be used to examine relationships between two or more
variables. To gain a better understanding of how these concepts can be applied to real-life
situations, and of the relationship between shoe size and height, I decided to design this
investigative study. Thus, this correlation-based investigation will not only be an academic
exercise, but also a quest to unravel the existing mysteries between different patterns of growth,
such as the size of feet.
This investigation is designed to find out both the direction and the strength of the correlation
between a person’s shoe size and height. Shoe size and height will be measured for a randomly
selected sample of 50 individuals. After the data have been collected, a two-sample t-test will be
run to find out whether the difference between the mean heights of male and female subjects is
significant. The data sets will then be used to create box-and-whisker plots in the SPSS
statistical software to identify and strike out any outliers that could be present. Thereafter, the
means of the heights and of the shoe sizes will be computed for use in other sections of the
investigation. The chi-square test of independence will then be conducted to find out whether the
two variables are associated or not, paving the way for the determination of the relationship. The
test of relationship will involve creating a scatterplot and analyzing the trend of the data points
(observations). If the relationship between the two variables is deemed to be non-linear,
Spearman’s rank approach will be used to determine the correlation. Otherwise, Pearson’s method
of working out the correlation coefficient will be adopted. If Pearson’s method is used and the
correlation coefficient is found to be strong enough (an absolute value greater than 0.5), then a
regression equation will be calculated and used to create a linear regression line, which can be
used to predict shoe sizes for known heights, and vice versa. Towards the end, a comprehensive
conclusion and evaluation section will be developed.
Surname 4
Hypothesis
In conducting this investigative study, it is hypothesized that shoe size will vary linearly with
a person’s height. This hypothesis is in line with the observation in the introductory section
that tall persons tend to have larger feet than short individuals.
Data Collection
This mathematical study adopted a primary method of data collection. A sample of 50 IB students,
comprising both female and male subjects, was randomly selected. The students were briefed on the
activity they were to take part in, and their consent was sought. Their shoe sizes were checked,
and their heights measured using a meter rule. Each pair of measurements was then recorded in the
data table presented in the appendix section of this investigative study. Notably, a sample of 50
students was considered large enough to support the validity of the findings.
Definition of Variables
In order to ensure clarity in the process of finding out the direction, and strength of the
correlation between a person’s shoe size, and height, the pair of variables had to be defined, such
that:
Independent variable (x): Person’s height, measured to the nearest centimeter (cm)
It is a statistical test used to determine whether the means of two populations or groups are
significantly different or not (Zach 1). Thus, this test would be suitable for determining whether
the difference between the mean heights of male and female subjects is significant before using
the data in the test of correlation. In conducting the two-sample t-test, the data involved have
to meet the following conditions:
The data in either sample have been collected using a random sampling approach.
The test statistic of the t-test is computed using the following general formula:
$$t_{stat} = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Where:
$$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
The test statistic is compared with a critical value read from the t-distribution table based on
a predefined significance level and the degrees of freedom, determined through the following
basic formula:
$$df = n_1 + n_2 - 2$$
In the event the critical value is found to be greater than the test statistic, then the null
hypothesis is deemed to hold.
To run the t-test using the heights of persons, the following seven steps were used:
Step 1: The data set on person’s height was sorted into two; for the male, and female subjects, as
Null hypothesis, H0: No significant mean difference between heights of males, and females
Alternative hypothesis, H1: Significant mean difference between heights of males, and females
Step 3: The sample mean of the female subjects, and the corresponding standard deviation:
Sample mean of person’s height for the female subjects:
$$\bar{x}_1 = \frac{\sum x}{n} = \frac{3454}{22} = 157 \text{ cm}$$
Standard deviation:
$$s_1 = \sqrt{\frac{(156 - 157)^2 + (156 - 157)^2 + \dots + (150 - 157)^2}{22}} = 8.435 \text{ cm}$$
Step 4: The sample mean of the male subjects, and the corresponding standard deviation:
Sample mean of person’s height for the male subjects:
$$\bar{x}_2 = \frac{\sum x}{n} = \frac{4385}{28} = 156.607 \text{ cm}$$
Standard deviation:
$$s_2 = \sqrt{\frac{(145 - 156.607)^2 + (155 - 156.607)^2 + \dots + (167 - 156.607)^2}{28}} = 7.397 \text{ cm}$$
$$S_p = \sqrt{\frac{(22 - 1)(8.435)^2 + (28 - 1)(7.397)^2}{22 + 28 - 2}} = 7.868 \text{ cm}$$
$$t_{stat} = \frac{157 - 156.607}{7.868 \sqrt{\frac{1}{22} + \frac{1}{28}}} = 0.175$$
$$df = (22 + 28) - 2 = 48$$
Using this degree of freedom, and 0.05 significance level, the critical value from distribution
Since the critical value is greater than the test statistic, the null hypothesis is deemed to
hold, indicating that there was no significant difference between the mean heights of males and
females. Thus, the data sets would be used jointly in the rest of the statistical analysis.
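The pooled two-sample t-test described above can be sketched in a few lines of Python. This is a minimal illustration rather than the SPSS procedure used in the study: the function name `pooled_t_test` is hypothetical, and the inputs are the summary statistics quoted in Steps 3 and 4 (females first, then males).

```python
import math

def pooled_t_test(mean1, s1, n1, mean2, s2, n2):
    """Return (pooled SD, t statistic, degrees of freedom) for a pooled two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    # Test statistic: difference in means divided by its standard error.
    t_stat = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return pooled_sd, t_stat, df

# Summary statistics taken from the text (heights in cm).
pooled_sd, t_stat, df = pooled_t_test(157.0, 8.435, 22, 156.607, 7.397, 28)
print(f"S_p = {pooled_sd:.3f} cm, t = {t_stat:.4f}, df = {df}")
```

Run on these inputs, the sketch gives a pooled standard deviation near 7.87 cm and a t value near 0.18, which is far below the critical value for 48 degrees of freedom, so the decision to retain the null hypothesis does not hinge on the exact figure.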
The data set presented in Table 1 was copied into the graphing application of SPSS, and used to
A closer examination of the box-and-whisker plot in Figure 1 above reveals that the whisker on
the upper side of the central blue box was longer than the whisker on the lower side, an
indication of positive skewness in the height dataset. In addition, data point 33 on the plot
indicates that the corresponding height of subject 33 (81 cm) is an outlier that had to be struck
out of the distribution in the subsequent statistical analysis. On the other hand, no outlier was
identified in the dataset of shoe sizes in the box-and-whisker plot presented in Figure 2 above,
even though the whisker on the upper side of the central blue box was longer than the whisker on
the lower side.
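The outlier screening performed with the box-and-whisker plots can be approximated in code. The sketch below assumes the common 1.5 × IQR fence rule, which is the usual convention behind box plots, although the exact quartile method used by SPSS may differ. The function name `iqr_outliers` and the list of heights are hypothetical and are not the study's data, apart from the 81 cm value discussed above.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical heights in cm, for illustration only.
heights_cm = [145, 155, 156, 81, 150, 152, 157, 158, 160, 162, 165, 167]
outliers = iqr_outliers(heights_cm)
print(outliers)
```

For this illustrative list, only the 81 cm reading falls outside the fences, mirroring the way a single point is flagged beyond the whisker in a box plot.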
Determination of Means
The means of the two statistical variables would be calculated using the same approach that had
been adopted in the two-sample t-test section, but having to strike the outlier identified in the
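Mean of person’s height:
$$\bar{x} = \frac{\sum x_i}{N} = \frac{145 + 155 + 156 + \dots + 167}{49} = 156.286 \text{ cm}$$
Mean of shoe size:
$$\bar{y} = \frac{\sum y_i}{N} = \frac{5.0 + 5.4 + 5.0 + \dots + 6.5}{49} = 5.659 \text{ US}$$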
From the two computations, the means of shoe sizes, and heights of persons considered in the
statistical study were 5.659 US size, and 156.286 cm, respectively. These values would find
It is a statistical test used to determine if two categorical or measurable variables are related
or not (Biswal 1). The chi-square test statistic is calculated using the following general formula:
$$\chi_c^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$ (Biswal 2)
Where:
O = Observed value
E = Expected value
The calculation of the degrees of freedom (df) varies with the type of statistical test under
study. For the chi-square test, df is computed as a function of the total number of rows and
columns of a distribution table comprising either observed values or expected values, such that:
$$df = (\text{rows} - 1)(\text{columns} - 1)$$
This degree of freedom is used to determine the critical value of the chi-square at a specified
significance level. Once the critical value has been determined, it is compared with the test
value. In the event the critical value is found to be more than the test value, the null
hypothesis is supported, indicating that there is no significant association or relationship
between the pair of variables being studied. Otherwise, the null hypothesis is rejected,
indicating that the variables are related. In finding out whether the person’s height, and
Null hypothesis, H0: There is no significant relationship between person’s height and shoe size
Alternative hypothesis, H1: There is a significant relationship between person’s height and shoe size
Step 2: Creation of a frequency distribution table for the observed values of shoe size based on
defined ranges of person’s heights. A 3-column by 3-row table was created as illustrated below.
                                     Shoe Size, y
Person’s Height        4.0 < y ≤ 5.5    5.5 < y ≤ 7.0    7.0 < y ≤ 8.5    Total
135.0 < x ≤ 150.0           48.5              0.0              0.0         48.5
150.0 < x ≤ 165.0          110.9             82.4              0.0        193.3
165.0 < x ≤ 180.0            0.0             21.0             22.5         43.5
Total                      159.4            103.4             22.5        285.3
Step 3: Creation of frequency distribution table for the expected values of shoe sizes, as a
Expected value:
$$E = \frac{193.3 \times 159.4}{285.3} = 108.0$$
                                     Shoe Size, y
Person’s Height        4.0 < y ≤ 5.5    5.5 < y ≤ 7.0    7.0 < y ≤ 8.5    Total
135.0 < x ≤ 150.0           27.1             17.6              3.8         48.5
150.0 < x ≤ 165.0          108.0             70.1             15.3        193.3
165.0 < x ≤ 180.0           24.3             15.7              3.4         43.5
Total                      159.4            103.4             22.5        285.3
Step 4: Computation of test statistic of chi-square. The values from Tables 2, and 3 were used in
this computation:
$$\chi_c^2 = 189.22$$
$$df = 4$$
With this df, and 0.05 significance level, the critical value read from the distribution table was
9.49.
Since the critical value is less than the test value, the null hypothesis is rejected, indicating
that there was a significant relationship between person’s height and shoe size.
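The chi-square steps above (expected values from row and column totals, the test statistic, and the degrees of freedom) can be sketched as follows. The function name `chi_square` is hypothetical, and the observed table is the one given in Step 2. Because the sketch does not round the expected values, its statistic may differ slightly from the hand-computed 189.22.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom, expected table) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected value of each cell: row total x column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected

# Observed values from the table in Step 2.
observed = [
    [48.5, 0.0, 0.0],
    [110.9, 82.4, 0.0],
    [0.0, 21.0, 22.5],
]
chi2, dof, expected = chi_square(observed)

CRITICAL_VALUE = 9.49  # critical value quoted in the text for df = 4 at 0.05
reject_null = chi2 > CRITICAL_VALUE
print(f"chi-square = {chi2:.2f}, df = {dof}, reject H0: {reject_null}")
```

The statistic is many times larger than the critical value, so the decision to reject the null hypothesis is not sensitive to the rounding of the expected values.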
A scatterplot is used in statistical analysis to determine the nature of the relationship
between any two numerical variables (Lumen Learning 1). The relationship between such
No relationship: When the observations on the scatterplot graph do not assume
The data set presented in Table 6 (minus the outlier) was copied into the graphing application of
specified pattern, and direction, hence a case of linear relationship between shoe sizes, and the
person’s height. Based on this revelation, the Pearson’s method would be preferred to
Spearman’s rank correlation in the determination of correlation coefficient for strength, and
direction analysis.
In statistical analysis, correlation is considered to be the measure of association between any pair
of categorical or numerical variables (Kiernan 2). A correlation is deemed to exist between such
variables if they are related to each other. As observed in the preceding section, correlation
determination stems from the scatterplot graph, which determines the nature of the relationship. For
to determine the direction and strength of the correlation between the variables. The PPMCC is
calculated through the following general formula, in line with the study of Kiernan (3):
$$\text{PPMCC}, \; r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}}$$
Where:
$S_{xy}$ = Covariance of the combined variables x (independent) and y (dependent)
It is a unitless quantity
When positive, it indicates direct relationship, and when negative, it indicates indirect
relationship.
Positive Negative
No correlation 0 0
Building on this foundation, the data set presented in Table 6 (minus the outlier) was used to
Table 5: PPMCC Distribution Table for the Person's Height (x), and the Shoe Size (y)
x        y        xy        x²        y²
Determination of correlation coefficient using the data sets obtained above involved use of four
different steps:
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1199268 - \frac{(7658)^2}{49} = 2432$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1600.37 - \frac{(277.3)^2}{49} = 31.078$$
$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 43587.2 - \frac{(7658)(277.3)}{49} = 249.171$$
$$\text{PPMCC}, \; r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}} = \frac{249.171}{\sqrt{2432 \times 31.078}} = 0.9063$$
Hence, there is a very strong, and positive correlation between the person’s height, and the shoe
size.
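The hand calculation of the PPMCC can be checked with a short script. The sketch below uses only the column sums quoted in the working above (n = 49 after removing the outlier), taking Σy = 277.3, the value that reproduces the reported S_xy. The variable names are illustrative.

```python
import math

# Column sums from the PPMCC distribution table, as quoted in the text.
n = 49
sum_x, sum_y = 7658, 277.3
sum_x2, sum_y2, sum_xy = 1199268, 1600.37, 43587.2

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Pearson product-moment correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"S_xx = {s_xx:.3f}, S_yy = {s_yy:.3f}, S_xy = {s_xy:.3f}, r = {r:.4f}")
```

Working from the sums rather than the raw table keeps the check independent of the appendix data, at the cost of inheriting any rounding already present in the sums.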
When the correlation between the two variables has been ascertained to be strong enough, the
relationship between them can be modeled through a linear regression equation. The dependent
variable can be determined as a function of the explanatory or predictor variable. According to
Kiernan (5), the regression equation is defined by the following formula:
$$y = bx + b_0$$
Where:
$$b = \frac{S_{xy}}{S_{xx}}$$
$$b_0 = \bar{y} - b\bar{x}$$
The regression equation that could be used to model or predict the shoe size of an individual
based on their heights for this investigation could be calculated using three major steps:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{249.171}{2432} = 0.102$$
$$b_0 = 5.659 - 0.102 \times 156.286 = -10.282$$
The regression equation could now be expressed with the values obtained in steps 1, and 2
above:
y=0.102 x−10.282
For instance, and in a sample calculation, when the height of a person is 145 cm, the shoe size
would be:
$$y = 0.102(145) - 10.282 = 4.508$$
Similar computations were made with substitution of values of x from Table 6 from the
appendix, and results used to create the linear regression graph as a linear fit presented in the
Figure below.
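The regression steps above can be reproduced in code. The sketch below is a minimal illustration using the values of S_xy, S_xx, and the two means reported earlier. The function name `predict_shoe_size` is hypothetical, and the slope is rounded to three decimal places only to mirror the hand calculation (an unrounded slope would give a slightly different intercept).

```python
# Values reported in the correlation and means sections of the text.
s_xy, s_xx = 249.171, 2432
mean_height_cm, mean_shoe_size = 156.286, 5.659

# Slope and intercept, rounded as in the hand calculation.
b = round(s_xy / s_xx, 3)
b0 = round(mean_shoe_size - b * mean_height_cm, 3)

def predict_shoe_size(height_cm):
    """Predict US shoe size from height in cm using the fitted line y = b*x + b0."""
    return b * height_cm + b0

print(f"y = {b}x + ({b0})")
print(f"Predicted shoe size at 145 cm: {predict_shoe_size(145):.3f}")
```

Predictions are only meaningful for heights within the range of the sampled data, since the line was fitted to those observations alone.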
In sum, this investigation has achieved its aim of finding both the direction and the strength of
the correlation between a person’s shoe size and height. Shoe size and height were measured for a
randomly selected sample of 50 individuals. It had been hypothesized that shoe size would vary
linearly with a person’s height. Upon analyzing the datasets through several statistical
computations, the correlation coefficient was found to be 0.9063, indicative of a strong, positive
correlation between shoe size and height. Hence, the hypothesis of the investigation was
supported. The strengths of this investigation included the use of several statistical tests and
the use of SPSS software in analyzing and presenting data. The only limitation identified was the
lack of repeated measurements for each subject, which would have improved the accuracy and
validity of the data.
Works Cited
Biswal, Avijeet. “What Is a Chi-Square Test? Formula, Examples & Uses | Simplilearn.”
square-test.
Cleveland Clinic. “Shoes Getting Tight? Why Your Feet Change Size over Time.” Cleveland
time/.
milnepublishing.geneseo.edu/natural-resources-biometrics/chapter/chapter-7-correlation-
and-simple-linear-regression/.
Lumen Learning. “Chapter 7: Correlation and Simple Linear Regression | Natural Resources
natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/.
reachmd.com/news/what-factors-influence-a-persons-height/1632279/.
Zach. “Two Sample T-Test: Definition, Formula, and Example.” Statology, 23 Apr. 2020,
www.statology.org/two-sample-t-test/.
Appendix
Table 6: Raw Data on the Shoe Size, and the Heights of Persons