
Surname 1

Mathematics Internal Assessment

Applications and Interpretations

Standard Level, SL

Correlation Between a Person's Height and Shoe Size

Session:

Number of Pages: 12 pages



Introduction

As a biology student with an interest in human physiology and anatomy, I have long been fascinated by the variation and unique characteristics found among human beings. In particular, I have noted that people differ in traits such as height, shoe size, and the ability to roll the tongue. Through my own reading of online sources, including ReachMD (1) and Cleveland Clinic (2), I have also learned that tall persons tend to have larger feet than short persons. Against this background, I have had the opportunity to learn about several statistical techniques in standard level mathematics, including correlation tests, regression analysis, and chi-square tests, and how they can be used to examine relationships between two or more variables. To gain a better understanding of how these concepts apply in real-life situations, and of the relationship between shoe size and height, I decided to design this investigative study. Thus, this correlation-based investigation is not only an academic exercise but also an attempt to understand how two patterns of growth are related: the size of the feet (measured by shoe size) and the height of a person.


Aims of the Investigation

This investigation is designed to find both the direction and the strength of the correlation between a person's shoe size and height. Shoe sizes and heights will be measured from a randomly selected sample of 50 individuals. Once collected, the data will be used to run a two-sample t-test to find out whether the difference between the mean heights of male and female subjects is significant. The data sets will then be used to create box and whisker plots in the SPSS statistical software to identify and remove any outliers. Thereafter, the means of the heights and of the shoe sizes will be computed for use in other sections of the investigation. A chi-square test of independence will then be conducted to find out whether the two variables are associated, paving the way for the determination of the relationship. The test of relationship will involve creating a scatterplot and analyzing the trend of the data points (observations). If the relationship between the two variables appears to be non-linear, Spearman's rank approach will be used to determine the correlation. Otherwise, Pearson's method of working out the correlation coefficient will be adopted. If Pearson's method is used and the correlation is found to be strong enough (|r| greater than 0.5), a regression equation will be calculated and used to create a linear regression line, which can be used to predict shoe sizes for known heights, and vice versa. Finally, a conclusion and evaluation section will be developed.

Hypothesis

In conducting this investigative study, it is hypothesized that shoe size will vary linearly with a person's height. This hypothesis is in line with the observation in the introduction that tall persons tend to have larger feet than short persons.

Data Collection

This study used a primary method of data collection. A randomly selected sample of 50 IB students, comprising both female and male subjects, was used. The students were briefed on the activity, and their consent was sought. Their shoe sizes were checked, and their heights were measured using a meter rule. Each pair of measurements was then recorded in the data table presented in the appendix of this investigative study. A sample of 50 students was considered large enough to support the statistical tests that follow.

Definition of Variables

To ensure clarity in finding the direction and strength of the correlation between a person's shoe size and height, the pair of variables was defined as follows:

• Independent variable (x): person's height, measured to the nearest centimeter (cm)

• Dependent variable (y): shoe size, in US sizes

Two Sample t-test

The two-sample t-test is a statistical test used to determine whether the means of two populations are significantly different (Zach 1). It is therefore suitable for determining whether the difference between the mean heights of male and female subjects is significant before the data are used in the test of correlation. For the two-sample t-test to be valid, the data have to meet the following assumptions, in line with Zach (1):

The observations in one sample are independent of those in the other

The data are approximately normally distributed

The two samples have approximately the same variance

The data in both samples were collected using a random sampling approach

The t-test operates under a pair of defined hypotheses:

Null hypothesis, H0: There is no significant mean difference

Alternative hypothesis, H1: There is a significant mean difference

The test statistic is computed using the following general formula:

$$t_{stat} = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Where:

$\bar{x}_1, \bar{x}_2$ = sample means of the two groups

$n_1, n_2$ = sizes of the two samples

$S_p$ = pooled standard deviation, $S_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$

$s_1, s_2$ = standard deviations of the two samples


The test statistic is compared with a critical value read from the t-distribution table, based on a predefined significance level and the degrees of freedom, which are determined through the following formula:

$$df = (n_1 + n_2) - 2$$

If the critical value is greater than the absolute value of the test statistic, the null hypothesis is retained. Otherwise, the alternative hypothesis is supported.

To run the t-test using the heights of persons, the following seven steps were used:

Step 1: The heights were sorted into two groups, one for the female subjects and one for the male subjects, as shown in Table 1 below.

Step 2: Definition of hypotheses:

Null hypothesis, H0: No significant difference between the mean heights of males and females

Alternative hypothesis, H1: Significant difference between the mean heights of males and females

Step 3: The sample mean of the female subjects, and the corresponding standard deviation:

Sample mean of person's height for the female subjects:

$$\bar{x}_1 = \frac{\sum x}{n} = \frac{3454}{22} = 157 \text{ cm}$$

Standard deviation:

$$s_1 = \sqrt{\frac{(156-157)^2 + (157-157)^2 + \dots + (150-157)^2}{22 - 1}} = 8.435 \text{ cm}$$
Table 1: Distribution Table of Heights of Persons Based on Gender

Person's Height, x (cm)

Count, n    Females    Males
1           156        145
2           157        155
3           151        158
4           154        157
5           151        152
6           149        153
7           170        150
8           164        152
9           155        152
10          162        163
11          153        163
12          148        174
13          156        153
14          174        157
15          181        156
16          155        152
17          150        150
18          152        147
19          154        156
20          156        146
21          156        154
22          150        165
23          -          160
24          -          162
25          -          149
26          -          164
27          -          173
28          -          167
∑x          3454       4385
Step 4: The sample mean of the male subjects, and the corresponding standard deviation:

Sample mean of person's height for the male subjects:

$$\bar{x}_2 = \frac{\sum x}{n} = \frac{4385}{28} = 156.607 \text{ cm}$$

Standard deviation:

$$s_2 = \sqrt{\frac{(145-156.607)^2 + (155-156.607)^2 + \dots + (167-156.607)^2}{28 - 1}} = 7.539 \text{ cm}$$

Step 5: The pooled standard deviation was then computed:

$$S_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(22-1)\,8.435^2 + (28-1)\,7.539^2}{22 + 28 - 2}} = 7.943 \text{ cm}$$

Step 6: Calculation of test statistic

$$t_{stat} = \frac{157 - 156.607}{7.943 \sqrt{\frac{1}{22} + \frac{1}{28}}} = 0.174$$

Step 7: Critical value, and decision:


$$df = (22 + 28) - 2 = 48$$

Using this degree of freedom and a 0.05 significance level (two-tailed), the critical value from the t-distribution table was 2.011.

Since the critical value is greater than the test statistic, the null hypothesis is retained, indicating that there was no significant difference between the mean heights of males and females. Thus, the two data sets were used jointly in the rest of the statistical analysis.
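The seven steps can be cross-checked with a short Python sketch. It is only an illustration, not the procedure used in the investigation: it assumes the sample (n − 1) standard deviation for both groups and the pooled standard deviation with divisor n1 + n2 − 2, and the function name `pooled_t_statistic` is hypothetical.

```python
import math
import statistics

# Heights (cm) copied from Table 1.
females = [156, 157, 151, 154, 151, 149, 170, 164, 155, 162, 153,
           148, 156, 174, 181, 155, 150, 152, 154, 156, 156, 150]
males = [145, 155, 158, 157, 152, 153, 150, 152, 152, 163, 163, 174, 153, 157,
         156, 152, 150, 147, 156, 146, 154, 165, 160, 162, 149, 164, 173, 167]


def pooled_t_statistic(sample_1, sample_2):
    """Return (t, df) for an equal-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    # statistics.variance uses the n - 1 (sample) denominator.
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * var_1 + (n2 - 1) * var_2) / df)
    t = (mean_1 - mean_2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, df


t_stat, df = pooled_t_statistic(females, males)
print(f"t = {t_stat:.3f}, df = {df}")
```

Because the sketch works from unrounded intermediate values, its result can differ in the last decimal place from a hand calculation that rounds at each step.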

Box and Whisker Plots for the Identification of Outliers

The data set presented in Table 6 of the appendix was copied into the graphing application of SPSS and used to create the box and whisker plots shown below.

Figure 1: Box, and Whisker Plot for the Person's Height



Figure 2: Box, and Whisker Plot for the Shoe Size

A closer examination of the box and whisker plot in Figure 1 reveals that the whisker on the upper side of the central blue box was longer than the whisker on the lower side, an indication of positive skewness in the height data. In addition, data point 33, marked on the plot, indicates that the height of subject 33 (181 cm) is an outlier that had to be removed from the distribution in the subsequent statistical analysis. On the other hand, no outlier was identified in the shoe size data in Figure 2, even though the whisker on the upper side of the central blue box was again longer than the whisker on the lower side, an indication of positive skewness in the shoe size data as well.
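The reading of the box plots can be approximated numerically. The sketch below assumes the common 1.5 × IQR fence rule and the default quartile method of Python's `statistics.quantiles`; the quartile method SPSS uses may differ slightly, and the helper name `iqr_outliers` is hypothetical. The data are the 50 raw pairs from Table 6.

```python
import statistics

# Raw data from Table 6, in subject order (subjects 1 to 50).
heights = [145, 155, 156, 157, 158, 157, 151, 152, 153, 154,
           151, 150, 149, 152, 152, 170, 164, 163, 163, 155,
           162, 174, 153, 153, 157, 156, 148, 152, 150, 156,
           174, 147, 181, 156, 146, 155, 154, 150, 165, 160,
           162, 152, 154, 156, 149, 164, 156, 150, 173, 167]
shoe_sizes = [5.0, 5.4, 5.0, 6.5, 6.4, 7.0, 4.5, 5.0, 5.0, 6.0,
              5.5, 5.0, 5.0, 5.5, 5.5, 7.0, 6.5, 6.0, 6.0, 5.5,
              6.5, 7.5, 5.0, 5.0, 5.5, 6.0, 4.5, 5.0, 5.0, 5.5,
              7.5, 4.5, 8.0, 5.5, 4.5, 5.5, 5.5, 5.0, 6.5, 6.0,
              6.0, 5.0, 5.5, 5.5, 5.0, 6.5, 5.5, 5.0, 7.5, 6.5]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


print("Height outliers:", iqr_outliers(heights))
print("Shoe size outliers:", iqr_outliers(shoe_sizes))
```

Under these assumptions the only value flagged is the 181 cm height of subject 33, and no shoe size is flagged, which agrees with the box plots.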

Determination of Means

The means of the two variables were calculated using the same approach adopted in the two-sample t-test section, but with the outlier identified in the preceding section removed, leaving 49 data points:

Mean of person's height:

$$\bar{x} = \frac{\sum x_i}{N} = \frac{145 + 155 + 156 + \dots + 167}{49} = 156.286 \text{ cm}$$

Mean of shoe sizes:

$$\bar{y} = \frac{\sum y_i}{N} = \frac{5.0 + 5.4 + 5.0 + \dots + 6.5}{49} = 5.659 \text{ US}$$

From the two computations, the means of the shoe sizes and heights of the persons considered in the study were 5.659 (US size) and 156.286 cm, respectively. These values are used later in the investigation, in particular to compute the intercept of the regression equation.

Chi-Square Test for Independence

The chi-square test for independence is a statistical test used to determine whether two categorical (or grouped) variables are related or not (Biswal 1). The chi-square statistic is calculated using the following general formula:

$$\chi_c^2 = \sum \frac{(O_i - E_i)^2}{E_i} \quad \text{(Biswal 2)}$$

Where:

c = the degrees of freedom

O = observed value

E = expected value

The calculation of the degrees of freedom (df) varies with the type of statistical test. For the chi-square test, df is computed from the number of rows and columns of the table of observed (or expected) values:

$$c = (\text{No. of rows} - 1) \times (\text{No. of columns} - 1)$$

This degree of freedom is used to determine the critical value of chi-square at a specified significance level. Once the critical value has been determined, it is compared with the test value. If the critical value is greater than the test value, the null hypothesis is retained, indicating that there is no significant association between the pair of variables being studied. Otherwise, the alternative hypothesis is supported, indicating that the variables are related. To find out whether a person's height and shoe size are related, the following steps were adopted.

Step 1: Formulation of hypothesis:

Null hypothesis, H0: There is no significant relationship between a person's height and shoe size

Alternative hypothesis, H1: There is a significant relationship between a person's height and shoe size

Step 2: Creation of a frequency distribution table for the observed values of shoe size based on defined ranges of person's heights. A table of 3 rows by 3 columns was created, as illustrated below.

Table 2: Frequency Distribution for the Observed Values of Shoe Sizes

                         Shoe Size, y
Person's Height, x (cm)  4.0 < y ≤ 5.5   5.5 < y ≤ 7.0   7.0 < y ≤ 8.5   Total
135.0 < x ≤ 150.0        48.5            0.0             0.0             48.5
150.0 < x ≤ 165.0        110.9           82.4            0.0             193.3
165.0 < x ≤ 180.0        0.0             21.0            22.5            43.5
Total                    159.4           103.4           22.5            285.3

Step 3: Creation of a frequency distribution table for the expected values of shoe sizes, computed from the row and column totals in Table 2 above:

$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Total of all Observations}}$$

In a sample computation, using the totals of row 2 and column 1:

$$E = \frac{193.3 \times 159.4}{285.3} = 108.0$$

Table 3: Frequency Distribution for the Expected Values of Shoe Sizes

                         Shoe Size, y
Person's Height, x (cm)  4.0 < y ≤ 5.5   5.5 < y ≤ 7.0   7.0 < y ≤ 8.5   Total
135.0 < x ≤ 150.0        27.1            17.6            3.8             48.5
150.0 < x ≤ 165.0        108.0           70.1            15.3            193.3
165.0 < x ≤ 180.0        24.3            15.7            3.4             43.5
Total                    159.4           103.4           22.5            285.3

Step 4: Computation of the chi-square test statistic. The values from Tables 2 and 3 were used in this computation:

$$\chi_c^2 = \frac{(48.5-27.1)^2}{27.1} + \frac{(0.0-17.6)^2}{17.6} + \dots + \frac{(22.5-3.4)^2}{3.4} = 189.22$$

Step 5: Determination of critical value, and decision

$$c = (3-1) \times (3-1) = 4$$

With this df and a 0.05 significance level, the critical value read from the chi-square distribution table was 9.49.

Since the critical value is less than the test value, the null hypothesis is rejected, indicating that there was a significant relationship between a person's height and shoe size.
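The expected values and the test statistic can be reproduced with a few lines of Python. This is a minimal sketch that takes the entries of Table 2 exactly as tabulated; the function name `chi_square` is hypothetical, and because the expected values are not rounded to one decimal place, the statistic comes out slightly different from the hand-computed 189.22.

```python
# Observed values from Table 2 (rows: height ranges, columns: shoe size ranges).
observed = [
    [48.5, 0.0, 0.0],
    [110.9, 82.4, 0.0],
    [0.0, 21.0, 22.5],
]


def chi_square(table):
    """Return (statistic, df) for a chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected value = row total x column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


chi2, df = chi_square(observed)
critical_value = 9.49  # df = 4, 0.05 significance level, as read from the table
print(f"chi-square = {chi2:.2f}, df = {df}")
print("Reject H0" if chi2 > critical_value else "Retain H0")
```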

Graphical Representation: Scatterplot Graph

A scatterplot is used in statistical analysis to determine the nature of the relationship between two quantitative variables (Lumen Learning 1). The relationship between such variables could be linear, non-linear, or absent:

Linear relationship: the observations on the scatterplot follow a clear pattern along a straight line, in a definite direction.

Non-linear relationship: the observations on the scatterplot follow a clear pattern, but not along a straight line.

No relationship: the observations on the scatterplot follow no clear pattern or direction.

The data set presented in Table 6 (minus the outlier) was copied into the graphing application of SPSS and used to create the scatterplot shown below.



Figure 3: A Scatterplot of Shoe Size against Person's Height


The scatterplot shown above reveals that the observations follow a clear pattern and direction, hence a case of a linear relationship between shoe size and a person's height. On this basis, Pearson's method was preferred to Spearman's rank correlation for determining the correlation coefficient and analyzing its strength and direction.

Linear Correlation: Pearson’s Correlation Coefficient

In statistical analysis, correlation is a measure of association between a pair of numerical variables (Kiernan 2). A correlation is deemed to exist between such variables if they are related to each other. As observed in the preceding section, the determination of correlation starts from the scatterplot, which indicates the nature of the relationship. For a linear relationship, Pearson's Product Moment Correlation Coefficient (PPMCC) is calculated to determine the direction and strength of the correlation between the variables. The PPMCC is calculated through the following general formula, in line with Kiernan (3):

$$r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}}$$

Where:

$S_{xy}$ = sum of products of the deviations of x (independent) and y (dependent)

$S_{xx}$ = sum of squared deviations of variable x (independent)

$S_{yy}$ = sum of squared deviations of variable y (dependent)

The properties of correlation coefficient, r include:

It takes any value between -1 and +1

It is a unitless quantity

When positive, it indicates a direct relationship, and when negative, it indicates an inverse relationship.

The correlation coefficient assumes different interpretations:

Table 4: Analysis of Strength of Correlation Coefficient

Strength of Correlation Coefficient   Value of Correlation Coefficient (r)
                                      Positive          Negative
No correlation                        0                 0
Weak correlation                      0.1 up to 0.3     -0.1 up to -0.3
Fairly strong correlation             0.3 up to 0.5     -0.3 up to -0.5
Strong correlation                    0.5 up to 0.75    -0.5 up to -0.75
Very strong                           0.75 up to 0.99   -0.75 up to -0.99
Perfectly strong correlation          1.0               -1.0
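The bands in Table 4 can be applied mechanically, as in the sketch below. Two choices in it are assumptions rather than part of the table: a value that falls exactly on a shared boundary (for example 0.3) is placed in the stronger band, and magnitudes below 0.1 are treated as no correlation. The function name `correlation_strength` is hypothetical.

```python
def correlation_strength(r):
    """Classify a correlation coefficient using the bands of Table 4."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must lie between -1 and +1")
    if size == 1:
        strength = "perfectly strong"
    elif size >= 0.75:
        strength = "very strong"
    elif size >= 0.5:
        strength = "strong"
    elif size >= 0.3:
        strength = "fairly strong"
    elif size >= 0.1:
        strength = "weak"
    else:
        strength = "no correlation"  # assumption: below 0.1 counts as none
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


print(correlation_strength(0.62))
print(correlation_strength(-0.2))
```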



Building on this foundation, the data set presented in Table 6 (minus the outlier) was used to

create a Pearson’s distribution table, as illustrated below.

Table 5: PPMCC Distribution Table for the Person's Height (x), and the Shoe Size (y)

x    y    xy    x²    y²
145 5.0 725.00 21025.00 25.00


155 5.4 837.00 24025.00 29.16
156 5.0 780.00 24336.00 25.00
157 6.5 1020.50 24649.00 42.25
158 6.4 1011.20 24964.00 40.96
157 7.0 1099.00 24649.00 49.00
151 4.5 679.50 22801.00 20.25
152 5.0 760.00 23104.00 25.00
153 5.0 765.00 23409.00 25.00
154 6.0 924.00 23716.00 36.00
151 5.5 830.50 22801.00 30.25
150 5.0 750.00 22500.00 25.00
149 5.0 745.00 22201.00 25.00
152 5.5 836.00 23104.00 30.25
152 5.5 836.00 23104.00 30.25
170 7.0 1190.00 28900.00 49.00
164 6.5 1066.00 26896.00 42.25
163 6.0 978.00 26569.00 36.00
163 6.0 978.00 26569.00 36.00
155 5.5 852.50 24025.00 30.25
162 6.5 1053.00 26244.00 42.25
174 7.5 1305.00 30276.00 56.25
153 5.0 765.00 23409.00 25.00
153 5.0 765.00 23409.00 25.00
157 5.5 863.50 24649.00 30.25
156 6.0 936.00 24336.00 36.00
148 4.5 666.00 21904.00 20.25
152 5.0 760.00 23104.00 25.00
150 5.0 750.00 22500.00 25.00
156 5.5 858.00 24336.00 30.25
174 7.5 1305.00 30276.00 56.25
147 4.5 661.50 21609.00 20.25
156 5.5 858.00 24336.00 30.25


146 4.5 657.00 21316.00 20.25
155 5.5 852.50 24025.00 30.25
154 5.5 847.00 23716.00 30.25
150 5.0 750.00 22500.00 25.00
165 6.5 1072.50 27225.00 42.25
160 6.0 960.00 25600.00 36.00
162 6.0 972.00 26244.00 36.00
152 5.0 760.00 23104.00 25.00
154 5.5 847.00 23716.00 30.25
156 5.5 858.00 24336.00 30.25
149 5.0 745.00 22201.00 25.00
164 6.5 1066.00 26896.00 42.25
156 5.5 858.00 24336.00 30.25
150 5.0 750.00 22500.00 25.00
173 7.5 1297.50 29929.00 56.25
167 6.5 1085.50 27889.00 42.25
∑x = 7658    ∑y = 277.3    ∑xy = 43587.2    ∑x² = 1199268    ∑y² = 1600.37

Determination of the correlation coefficient from the sums obtained above involved four steps:

Step 1: Computation of the sum of squares of x:

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1199268 - \frac{(7658)^2}{49} = 2432$$

Step 2: Computation of the sum of squares of y:

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1600.37 - \frac{(277.3)^2}{49} = 31.078$$

Step 3: Computation of the sum of products, xy:

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 43587.2 - \frac{(7658)(277.3)}{49} = 249.171$$

Step 4: Computation of correlation coefficient, and interpretation:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}} = \frac{249.171}{\sqrt{2432 \times 31.078}} = 0.9063$$

Hence, there is a very strong, positive correlation between a person's height and shoe size.
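The four steps can be verified from the column totals of Table 5 alone. The following sketch is illustrative; the variable names are chosen here and are not part of the investigation.

```python
import math

# Column totals from Table 5 (49 data points, outlier removed).
n = 49
sum_x, sum_y = 7658, 277.3
sum_xy, sum_x2, sum_y2 = 43587.2, 1199268, 1600.37

s_xx = sum_x2 - sum_x ** 2 / n        # sum of squared deviations of x
s_yy = sum_y2 - sum_y ** 2 / n        # sum of squared deviations of y
s_xy = sum_xy - sum_x * sum_y / n     # sum of products of deviations

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"Sxx = {s_xx:.3f}, Syy = {s_yy:.3f}, Sxy = {s_xy:.3f}, r = {r:.4f}")
```

Working from the totals rather than the 49 individual pairs keeps the check short, at the cost of relying on the totals having been summed correctly.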

Line of Regression, and Estimation of Shoe Sizes

When the correlation between the two variables has been ascertained to be strong enough, the relationship between them can be modeled through a linear regression equation, in which the dependent variable is expressed as a function of the explanatory (predictor) variable. According to Kiernan (5), the regression equation is defined by the following formula:

$$y = bx + b_0$$

Where:

$$b = \frac{S_{xy}}{S_{xx}}$$

$$b_0 = \bar{y} - b\bar{x}$$

The regression equation used to predict the shoe size of an individual from their height was calculated in three steps:

Step 1: Determination of the slope, b:

$$b = \frac{S_{xy}}{S_{xx}} = \frac{249.171}{2432} = 0.102$$

Step 2: Determination of the intercept, b0:

$$b_0 = 5.659 - 0.102 \times 156.286 = -10.282$$

Step 3: Determination of regression equation:

The regression equation can now be expressed with the values obtained in steps 1 and 2 above:

$$y = 0.102x - 10.282$$

For instance, in a sample calculation, when the height of a person is 145 cm, the predicted shoe size would be:

$$y = 0.102(145) - 10.282 = 4.508$$
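The same prediction can be sketched in Python. The function names `fit_line` and `predict_shoe_size` are hypothetical. The sketch keeps the slope unrounded, so its intercept (about -10.35) and prediction differ slightly from the hand calculation above, which rounds the slope to 0.102 before computing the intercept.

```python
def fit_line(s_xy, s_xx, mean_x, mean_y):
    """Return (slope, intercept) of the least-squares line y = b*x + b0."""
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Values taken from the Pearson and means sections of this investigation.
slope, intercept = fit_line(s_xy=249.171, s_xx=2432, mean_x=156.286, mean_y=5.659)


def predict_shoe_size(height_cm):
    """Predict US shoe size from height in cm using the fitted line."""
    return slope * height_cm + intercept


print(f"y = {slope:.4f}x + ({intercept:.3f})")
print(f"Predicted shoe size at 145 cm: {predict_shoe_size(145):.2f}")
```

The line is fitted on heights between 145 cm and 174 cm, so predictions far outside that range should be treated with caution.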

Similar computations were made by substituting the values of x from Table 6 in the appendix, and the results were used to create the linear regression line presented in the figure below.

Figure 4: A Linear Regression Plot of Shoe Size against Person's Height

Conclusion, and Evaluation

In sum, this investigation has achieved its aim of finding both the direction and the strength of the correlation between a person's shoe size and height. Shoe sizes and heights were measured from a randomly selected sample of 50 individuals. It had been hypothesized that shoe size would vary linearly with a person's height. Upon analyzing the data through several statistical computations, the correlation coefficient was found to be 0.9063, indicative of a very strong, positive correlation between shoe size and height. Hence, the hypothesis of the investigation was supported. The strengths of this investigation included the use of several statistical tests and the use of SPSS software in analyzing and presenting the data. The main limitation was the lack of more than one trial of measurement for each subject, which would have improved the accuracy and validity of the data.

Works Cited

Biswal, Avijeet. “What Is a Chi-Square Test? Formula, Examples & Uses | Simplilearn.” Simplilearn.com, 17 Feb. 2023, www.simplilearn.com/tutorials/statistics-tutorial/chi-square-test.

Cleveland Clinic. “Shoes Getting Tight? Why Your Feet Change Size over Time.” Cleveland Clinic, 27 Jan. 2020, health.clevelandclinic.org/shoes-getting-tight-feet-change-size-time/.

Kiernan, Diane. “Chapter 7: Correlation and Simple Linear Regression.” Milnepublishing.geneseo.edu, Open SUNY Textbooks, 16 Jan. 2018, milnepublishing.geneseo.edu/natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/.

Lumen Learning. “Chapter 7: Correlation and Simple Linear Regression | Natural Resources Biometrics.” Courses.lumenlearning.com, 2019, courses.lumenlearning.com/suny-natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/.

ReachMD. “What Factors Influence a Person’s Height?” Reachmd.com, 27 Jan. 2020, reachmd.com/news/what-factors-influence-a-persons-height/1632279/.

Zach. “Two Sample T-Test: Definition, Formula, and Example.” Statology, 23 Apr. 2020, www.statology.org/two-sample-t-test/.

Appendix

Table 6: Raw Data on the Shoe Size, and the Heights of Persons

No. of People, n Gender Height (cm) Shoe Size (U.S)


1 Male 145 5.0
2 Male 155 5.4
3 Female 156 5.0
4 Female 157 6.5
5 Male 158 6.4
6 Male 157 7.0
7 Female 151 4.5
8 Male 152 5.0
9 Male 153 5.0
10 Female 154 6.0
11 Female 151 5.5
12 Male 150 5.0
13 Female 149 5.0
14 Male 152 5.5
15 Male 152 5.5
16 Female 170 7.0
17 Female 164 6.5
18 Male 163 6.0
19 Male 163 6.0
20 Female 155 5.5
21 Female 162 6.5
22 Male 174 7.5
23 Male 153 5.0
24 Female 153 5.0
25 Male 157 5.5
26 Male 156 6.0
27 Female 148 4.5
28 Male 152 5.0
29 Male 150 5.0
30 Female 156 5.5
31 Female 174 7.5
32 Male 147 4.5
33 Female 181 8.0
34 Male 156 5.5


35 Male 146 4.5
36 Female 155 5.5
37 Male 154 5.5
38 Female 150 5.0
39 Male 165 6.5
40 Male 160 6.0
41 Male 162 6.0
42 Female 152 5.0
43 Female 154 5.5
44 Female 156 5.5
45 Male 149 5.0
46 Male 164 6.5
47 Female 156 5.5
48 Female 150 5.0
49 Male 173 7.5
50 Male 167 6.5
