Statistics Final Project Work (Shivaharipathak)

Department of STEAM Education, KUSOED
EDMT 512: Statistics (3) Final Project - 2023
Course Facilitator: Netra Kumar Manandhar
Prepared by: Shiva Hari Pathak
In the field of mathematics education, I consider an example study that examines the
relationship between students' marks in mathematics and various factors such as gender, writing
hand, caste, occupation of parents, weight, distance of home from school, and marks in science.
Population
The population for this study could be all students in a particular grade level, such as
Grade 10 students in a specific school.
Sampling:
A sample of students would be selected from the population for the study. One possible
sampling method could be simple random sampling, where each student in the population has an
equal chance of being selected. For example, a random sample of 100 students could be chosen.
The main types of sampling are described below.
Simple Random Sampling

Simple random sampling is a method of selecting a sample from a population in which
every member of the population has an equal chance of being selected. This is the most basic
type of sampling method, and it is often used in mathematics education research. For example, a
researcher might randomly select a sample of students from a school district to participate in a
study on mathematics achievement.
Stratified Sampling
Stratified sampling is a method of sampling that divides the population into strata, or
groups, and then randomly selects a sample from each stratum. This method is often used in
mathematics education research when the researcher wants to ensure that the sample is
representative of the population in terms of certain characteristics, such as gender, ethnicity, or
socioeconomic status. For example, a researcher might stratify a sample of students by gender
and then randomly select a sample of students from each gender group.
Cluster Sampling
Cluster sampling is a method of sampling that divides the population into clusters, or
groups, and then randomly selects one or more whole clusters to participate in the study. This
method is often used in mathematics education research when the researcher wants to save time
and money. For example, a researcher might treat each school as a cluster and then randomly
select a sample of schools, including all of their students, for a study on mathematics instruction.
Systematic Sampling
Systematic sampling is a method of sampling that selects every kth member of the
population, where k is a fixed sampling interval and the starting point is chosen at random. This
method is often used in mathematics education research when an ordered list of the population,
such as a school register, is available and the researcher wants the sample spread evenly across
it. For example, a researcher might randomly select a starting number between 1 and 100 and
then select every 100th student from that point on in a school district list to participate in a
study on mathematics achievement.
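The four sampling methods above can be sketched in R. This is only an illustration under stated assumptions: the roster, its size, the school names, and the sampling fractions are all hypothetical and are not part of the project data.

```r
# Hypothetical roster: 1,000 students in 20 schools (illustrative only).
set.seed(512)
roster <- data.frame(
  id     = 1:1000,
  gender = sample(c("Male", "Female"), 1000, replace = TRUE),
  school = rep(paste0("School_", 1:20), each = 50)
)

# Simple random sampling: every student has the same chance of selection.
srs <- roster[sample(nrow(roster), 100), ]

# Stratified sampling: draw about 10% at random within each gender stratum.
strata <- split(roster, roster$gender)
stratified <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), round(0.10 * nrow(s))), ]
}))

# Cluster sampling: pick 4 whole schools at random and keep all their students.
chosen_schools <- sample(unique(roster$school), 4)
cluster <- roster[roster$school %in% chosen_schools, ]

# Systematic sampling: fixed interval k, random start between 1 and k.
k <- 10
start <- sample(k, 1)
systematic <- roster[seq(start, nrow(roster), by = k), ]

# Sample sizes produced by each method.
sapply(list(srs = srs, stratified = stratified,
            cluster = cluster, systematic = systematic), nrow)
```

In the systematic sample the interval k stays fixed and only the starting position is random, which is the distinction drawn in the description above.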
Variables
Binary Variable:
- Gender (male or female)
- Whether a student passed or failed a mathematics test
Categorical Variable:
- Grade level (elementary, middle, high school)
- Race/ethnicity (Asian, Black, Hispanic, White)
- Type of mathematics instruction (traditional, problem-based learning, inquiry-based learning)
Continuous Variable:
- Mathematics achievement score
- Number of hours a student studies mathematics per week
- Time it takes a student to solve a mathematics problem

Control Variable:
- Socioeconomic status (SES)
- Prior mathematics achievement
- Gender
Dependent Variable:
- Mathematics achievement
- Mathematics anxiety
- Mathematics self-efficacy
Independent Variable:
- Type of mathematics instruction
- Use of technology in mathematics instruction
- Teacher experience
Dichotomous Variable:
- Synonym for binary variable
Discrete Variable:
- Number of correct answers on a mathematics test
- Number of times a student raises their hand in class
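Dummy Variables:
- Gender (male = 1, female = 0)
- Grade level coded as separate 0/1 indicators (middle = 1 if middle school, otherwise 0; high
school = 1 if high school, otherwise 0; elementary is the reference category)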
Endogenous Variable:
- Mathematics achievement (a variable explained by other variables in the model)
Exogenous Variable:
- SES
- Gender
Interval Variable:
- Synonym for continuous variable
Intervening Variable:
- Self-regulation
- Mathematical problem-solving skills
- Motivation
Latent Variable:
- Mathematical ability
- Mathematical aptitude
- Mathematical talent
Mediating Variable:
- Synonym for intervening variable
Manifest Variable:
- Mathematics achievement test score
- Number of correct answers on a mathematics test
- Time it takes a student to solve a mathematics problem

Manipulated Variable:
- Synonymous with independent variable
Moderating Variable:
- SES
- Gender
Nominal Variable:
- Synonymous with categorical variable
Ordinal Variable:
- Mathematics grade (A, B, C, D, E)
- Level of mathematics anxiety (low, medium, high)
Outcome Variable:
- Synonymous with dependent variable
Polychotomous Variables:
- Grade level (elementary, middle, high school)
- Race/ethnicity (Asian, Black, Hispanic, White)
- Type of mathematics instruction (traditional, problem-based learning, inquiry-based learning)
Predictor Variable:
- SAT score
- Gender
Treatment Variable:
- Synonymous with independent variable
Measurement Scales:
- Gender and Writing Hand are nominal variables.
- Caste and Occupation of Parents are also nominal variables.
- Weight and Distance of Home from School are ratio scale variables.
- Marks in Mathematics and Marks in Science are interval scale variables.
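As a rough sketch of how the variable types and measurement scales above might be stored in R, the example below uses unordered factors for nominal variables, an ordered factor for an ordinal variable, and 0/1 dummy columns. The five records and the column names are hypothetical and only serve the illustration.

```r
# Hypothetical records for five students (values are illustrative only).
students <- data.frame(
  gender = c("Male", "Female", "Female", "Male", "Female"),
  caste  = c("Brahmin", "Vaisya", "Other", "Brahmin", "Vaisya"),
  grade  = c("B", "A", "C", "A", "B"),
  weight = c(41.5, 38.0, 44.2, 36.7, 40.1)
)

# Nominal variables: unordered factors.
students$gender <- factor(students$gender, levels = c("Female", "Male"))
students$caste  <- factor(students$caste, levels = c("Brahmin", "Vaisya", "Other"))

# Ordinal variable: ordered factor, from lowest to highest grade.
students$grade <- factor(students$grade, levels = c("C", "B", "A"), ordered = TRUE)

# Dummy coding: one 0/1 column for gender and two 0/1 columns for the
# three-level caste variable (Brahmin is the reference category).
dummies <- model.matrix(~ gender + caste, data = students)[, -1]
dummies
```

A variable with three categories needs two dummy columns, not a single column coded 1, 2, 3, because the numeric codes would otherwise be treated as a scale.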
Correlation
Correlation could be used to determine the relationship between variables. For example,
the study might analyze the correlation between students' marks in mathematics and their marks
in science.
Regression
Regression analysis could be used to predict students' marks in mathematics based on
other variables. For instance, the study might examine how well gender, writing hand, caste,
occupation of parents, weight, and distance from school predict a student's marks in
mathematics.
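A minimal sketch of the correlation and regression ideas above, assuming simulated data: the vectors `maths`, `science`, and `distance` are generated here for illustration and are not the project data set.

```r
# Simulated (hypothetical) data for 100 students; not the project data set.
set.seed(1)
n <- 100
science  <- round(rnorm(n, mean = 65, sd = 5))
distance <- round(runif(n, 0, 10), 1)
maths    <- round(20 + 0.7 * science + rnorm(n, sd = 4))

# Pearson correlation between two sets of marks, with a significance test.
ct <- cor.test(maths, science)
ct$estimate
ct$p.value

# Multiple linear regression: marks in mathematics predicted by two variables.
fit <- lm(maths ~ science + distance)
summary(fit)$coefficients
summary(fit)$r.squared
```

With a single predictor, the R-squared of the regression equals the square of the Pearson correlation, which gives a quick consistency check between the two analyses.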
Normal Distribution
The study could examine whether students' marks in mathematics or other variables
follow a normal distribution, which would have implications for statistical analyses.
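One possible way to examine normality in R is sketched below, assuming a simulated vector `marks` (hypothetical values). The skewness and kurtosis here use simple moment formulas, so they can differ slightly from the bias-corrected values that SPSS reports.

```r
# Hypothetical marks for 200 students (simulated, illustrative only).
set.seed(2)
marks <- rnorm(200, mean = 69, sd = 6)

# Moment-based skewness and excess kurtosis (both are near 0 for normal data).
z <- (marks - mean(marks)) / sqrt(mean((marks - mean(marks))^2))
skewness <- mean(z^3)
excess_kurtosis <- mean(z^4) - 3

# Shapiro-Wilk test: a small p-value suggests a departure from normality.
sw <- shapiro.test(marks)

c(skewness = skewness, excess_kurtosis = excess_kurtosis, p = sw$p.value)
```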
t-Test
A t-test could be used to compare the mean marks in mathematics between two groups, such as
male and female students.
One-Sample t-Test
 A teacher wants to know if her students who use a new math software program are performing
better than the average student in the country. She could use a one-sample t-test to compare the
average scores of her students to the national average on a standardized math test.
Independent Samples t-Test
 A researcher wants to know if there is a difference in the average math scores of students who
are taught using a traditional method and students who are taught using a constructivist method.
She could use an independent samples t-test to compare the average scores of the two groups on
a standardized math test.
Paired t-Test
 A teacher wants to know if her students' math scores improve after they receive tutoring. She
could use a paired t-test to compare the average scores of her students on a standardized math
test before and after they receive tutoring.
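The three t-test designs above could be run in R roughly as follows. All scores are simulated, and the national mean of 70 is an assumed value chosen only for the example.

```r
# Hypothetical scores (simulated, illustrative only).
set.seed(3)
class_scores   <- rnorm(30, mean = 72, sd = 8)    # one class
traditional    <- rnorm(40, mean = 68, sd = 8)    # group taught traditionally
constructivist <- rnorm(40, mean = 71, sd = 8)    # group taught constructively
before <- rnorm(25, mean = 60, sd = 7)            # same students, before tutoring
after  <- before + rnorm(25, mean = 5, sd = 4)    # same students, after tutoring

# One-sample test against an assumed national mean of 70.
one_sample <- t.test(class_scores, mu = 70)

# Independent-samples test (R uses the Welch version by default).
independent <- t.test(traditional, constructivist)

# Paired test on the within-student change.
paired <- t.test(after, before, paired = TRUE)

c(one_sample = one_sample$p.value,
  independent = independent$p.value,
  paired = paired$p.value)
```

The paired test works on the difference for each student, so its degrees of freedom depend on the number of pairs, not on the total number of scores.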
ANOVA
ANOVA could be used to compare the mean marks in mathematics among three or more
groups, such as different caste categories or parental occupation groups.
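A short sketch of a one-way ANOVA with a post hoc test in R, assuming simulated marks and three hypothetical parental-occupation groups:

```r
# Hypothetical marks for three parental-occupation groups (simulated).
set.seed(4)
occupation <- factor(rep(c("Government", "Private", "Physical worker"),
                         times = c(45, 35, 20)))
marks <- rnorm(100, mean = 69, sd = 6)

fit <- aov(marks ~ occupation)
summary(fit)     # F statistic and p-value for the overall comparison
TukeyHSD(fit)    # pairwise comparisons, usually read only if F is significant
```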
Mann Whitney U Test
The Mann-Whitney U test could be employed to compare marks in mathematics between two
groups when the data do not meet the assumptions of a t-test, for example between left-handed
and right-handed students.
By employing these statistical techniques and analyzing the data collected from the sample, I
expect to gain insights into the relationships between variables and to make inferences about the
larger population of students in mathematics education.
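For the Mann-Whitney U test mentioned above, a minimal R sketch is given here. It assumes two simulated, skewed samples (hypothetical values). In R the test is run with `wilcox.test()` on two independent samples.

```r
# Hypothetical skewed marks for two writing-hand groups (simulated).
set.seed(5)
right_handed <- 40 + rexp(30, rate = 1 / 10)
left_handed  <- 40 + rexp(30, rate = 1 / 10)

# wilcox.test() on two independent samples is the Mann-Whitney U test.
# The statistic W counts the pairs in which a right-handed value exceeds
# a left-handed value.
mw <- wilcox.test(right_handed, left_handed)
mw$statistic
mw$p.value
```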
Task – II
Choose a specific area of research related to Mathematics education and learning. Create at least
seven variables (must be nominal, ordinal, and scale (interval & ratio)). Enter at least 200 data
(hypothetical) on SPSS and use SPSS to do quantitative analysis on the following topics (must
be in APA 7th Edition).
Descriptive Statistics
Table: 1
Descriptive Analysis of Marks in Mathematics
Marks in Mathematics
N Valid 200
Missing 0
Mean 54.51
Median 54.00
Mode 52
Std. Deviation 6.058
Variance 36.693
Skewness .160
Std. Error of Skewness .172
Kurtosis .106
Std. Error of Kurtosis .342
Range 32
Minimum 40
Maximum 72
Sum 10902
Percentiles 25 50.00
50 54.00
75 58.75
The above table shows the descriptive statistics of marks in mathematics. For the 200 valid
responses, the mean mark is 54.51, the median is 54, and the mode is 52. The standard deviation
is 6.058 and the variance is 36.693. Skewness is 0.160 (standard error 0.172) and kurtosis is
0.106 (standard error 0.342). The range, i.e., the difference between the highest and lowest
marks, is 32, with a minimum of 40 and a maximum of 72, and the sum of all marks is 10902.
The first, second, and third quartiles are 50, 54, and 58.75 respectively.
Figure:1
Pie-Chart of Occupation of Parents

The figure shows the occupation of the respondents' parents: 45 percent are government
job holders, 35 percent are private job holders, and 20 percent are physical workers. Government
job holders form the largest group and physical workers the smallest.
Figure:2
Pie-Chart of caste of student

Figure 3
Simple Bar Chart of Caste of Student

The figure shows the caste of the respondents: 45 percent are Brahmin, 35 percent are
Vaisya, and 20 percent belong to other castes. Brahmin is the largest group and Other is the
smallest.

Figure 4
Bar Chart of Occupation of Parents
The above bar graph shows the count for each parental occupation: 90 parents are
government job holders, 70 are private job holders, and 40 are physical workers. Government job
holders are more than double the number of physical workers.
Figure:5
Simple Boxplot of Weight of student
The figure shows the box-and-whisker plot of the weight of students. The median weight is
40, and the variable has no outliers.

Figure: 6
Simple Boxplot of marks in mathematics
The figure shows the box-and-whisker plot of the marks in mathematics. The median of
the dataset is 65. There is one outlier (case 7), which is markedly lower than the rest of the data.
This outlier should be reviewed, and excluded if appropriate, before the data are analyzed.
Figure:7
Clustered Boxplot of weight of student by caste of student by occupation of parent
The figure above represents the clustered boxplot of weight of student by caste of
student. The median weight is 39 for Brahmin students, 41 for Vaisya students, and 40 for
students of other castes, so the weights are similar across castes. The data have no outliers.

Figure: 8
Simple Histogram of Distance of Home from School
The histogram displayed above represents the distribution of the distance of home from
school of the participants. Most of the data fall under the normal curve, with only a few data
points outside it. Excluding those outlying points would give a more accurate representation of
the distribution of distance of home from school.

Figure: 9
Simple Histogram of Marks in mathematics
The histogram represents the distribution of the marks in mathematics of the participants.
Most of the data fall under the normal curve, with only a few data points outside it. Excluding
those outlying points would give a more accurate representation of the distribution of marks in
mathematics.

Perform normal distributions of scale data (at least 2)
Table:2
Normal Distribution Scale Data of Distance of Home from School
Distance of Home from School

N Valid 200
Missing 0
Mean 5.02
Std. Error of Mean .128
Median 5.00
Mode 5
Variance 3.296
Skewness .016
Kurtosis -.155
Range 10
Minimum 0
Maximum 10
Sum 1004
Percentiles 10 3.00
20 4.00
25 4.00
30 4.00
40 5.00
50 5.00
60 5.00
70 6.00
75 6.00
80 7.00
90 7.00
The above table shows that the mean and median of distance of home from school are very
close, i.e., 5.02 and 5 respectively, which suggests that the distribution is roughly symmetric.
The kurtosis is -0.155, which lies between -2 and +2, and the skewness is 0.016, which is very
close to zero. Together these values are consistent with the distance of home from school being
approximately normally distributed. Let's draw a normal curve of distance of home from school.
Figure: 10
Normal Distribution of Distance of Home from School

This normal distribution curve presents the distance of home from school. The mean and
standard deviation are 5.02 and 1.816 respectively. Approximately 68% of the data are expected
within ±1 standard deviation of the mean, i.e., 5.02 ± 1.816 ⇒ 3.204 to 6.836. Similarly, about
95% of the data are expected within μ ± 2σ and about 99.7% within μ ± 3σ. Another indication
of a normal distribution is that the curve looks symmetrical about the mean.
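As a quick check of the empirical rule, the intervals can be computed from the mean (5.02) and standard deviation (1.816) reported for distance of home from school. The variable names below are chosen only for this sketch.

```r
mean_distance <- 5.02    # mean reported in Table 2
sd_distance   <- 1.816   # standard deviation reported with Figure 10

# Intervals mean ± k standard deviations for k = 1, 2, 3.
k <- 1:3
intervals <- data.frame(
  k     = k,
  lower = mean_distance - k * sd_distance,
  upper = mean_distance + k * sd_distance,
  # Share of a normal distribution expected inside each interval.
  expected_share = pnorm(k) - pnorm(-k)
)
intervals
```

The ±1 SD interval runs from 3.204 to 6.836, and the expected shares are about 68%, 95%, and 99.7%.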
Table:3
Normal Distribution Scale Data of Marks in Mathematics of Participants
marks in Mathematics
N Valid 200
Missing 0
Mean 69.30
Std. Error of Mean .418
Median 69.50
Mode 71
Variance 34.943
Skewness .022
Kurtosis -.272
Range 31
Minimum 55
Maximum 86
Sum 13859
Percentiles 10 62.00
20 64.00
25 65.00
30 66.00
40 68.00
50 69.50
60 71.00
70 72.00
75 74.00
80 75.00
90 77.00
The above table shows that the mean and median of marks in mathematics are very close,
i.e., 69.30 and 69.5 respectively, which suggests that the distribution is roughly symmetric. The
kurtosis is -0.272, which lies between -2 and +2, and the skewness is 0.022, which is close to
zero. These values are consistent with the marks in mathematics being approximately normally
distributed. Let's draw a normal curve of marks in mathematics of respondents.
Figure: 11
Simple Histogram of Marks in Mathematics

This normal distribution curve presents the marks in mathematics. The mean and
standard deviation are 69.30 and 5.91 respectively. Approximately 68% of the data are expected
within ±1 standard deviation of the mean, i.e., 69.30 ± 5.91 ⇒ 63.39 to 75.21. Similarly, about
95% of the data are expected within μ ± 2σ and about 99.7% within μ ± 3σ. Another indication
of a normal distribution is that the curve looks symmetrical about the mean.
Do some custom tables and cross-tabulation (At least 2)
Table:4
Custom Tables of Writing hand*Caste of student

                             Writing Hand: Right handed      Writing Hand: Left hand
                             Caste of student (Count)        Caste of student (Count)
                             Brahmin   Vaisya   Other        Brahmin   Vaisya   Other
Writing Hand  Right handed   60        20       20           0         0        0
              Left hand      0         0        0            30        50       20
The above custom table shows writing hand by caste of student. Among the 100
right-handed students, 60 are Brahmin, 20 are Vaisya, and 20 are from other castes. Among the
100 left-handed students, 30 are Brahmin, 50 are Vaisya, and 20 are from other castes. Brahmin
students are therefore the largest right-handed group and Vaisya students the largest left-handed
group, while students from other castes are split evenly (20 and 20). The table shows a clear
difference in the distribution of writing hand by caste: Brahmin students are more likely to be
right-handed, while Vaisya students are more likely to be left-handed. This difference may be
due to a number of factors, including cultural or genetic differences.

Table: 5
Cross-tabulation Mode Writing Hand*caste of student

                             Caste of student               Total
                             Brahmin   Vaisya   Other
Writing Hand  Right handed   60        20       20          100
              Left hand      30        50       20          100
Total                        90        70       40          200
The table shows the cross-tabulation of caste of student and writing hand. The highest
number of right-handed students are Brahmin, with 60 students, while Vaisya and Other have
the lowest number, with 20 students each. The highest number of left-handed students are
Vaisya, with 50 students. The lowest number of left-handed students are from other castes, with
20 students.
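The cross-tabulation can be rebuilt in R from the six cell counts (60, 20, 20 for right-handed and 30, 50, 20 for left-handed students). This sketch only re-enters the reported counts. The object name `counts` is chosen for the illustration.

```r
# Cell counts copied from the cross-tabulation of writing hand by caste.
counts <- matrix(c(60, 20, 20,
                   30, 50, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(
                   "Writing hand" = c("Right handed", "Left hand"),
                   "Caste"        = c("Brahmin", "Vaisya", "Other")))

addmargins(counts)                         # adds row and column totals
round(prop.table(counts, margin = 1), 2)   # share of each caste within a writing hand
```

The row proportions make the comparison easier to read: 60% of right-handed students are Brahmin, while 50% of left-handed students are Vaisya.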
Figure: 11
Simple Scatter with Fit Line of Marks in Mathematics by Weight of Student
The above scatter plot presents the relationship between the weight of student and marks in
mathematics. The points are scattered across the graph without a clear trend or pattern, which
indicates that there is no linear relationship between the variables. There are some clusters in
the lower middle part of the figure, representing a group with varying values on both variables.
The main message of the figure is that there is no apparent relationship between marks in
mathematics and weight of participants.
Figure: 12
Grouped Scatter of Marks in Science and Weight of student by Occupation of Parent
The above scatter plot presents the relationship between marks in science and weight
of student, grouped by occupation of parents. The points are scattered across the graph without
a clear trend or pattern, which indicates that there is no linear relationship between the
variables. There are some clusters in the lower middle part of the figure, representing a group
with varying values on both variables. The main message of the figure is that there is no
apparent relationship between marks in science and weight of student in any parental
occupation group.

Table: 6
Correlation between Marks in Science and Marks in Mathematics.
                                            Marks in Science   Marks in Mathematics
Marks in Science       Pearson Correlation  1                  -0.054
                       Sig. (2-tailed)                         .444
                       N                    200                200
Marks in Mathematics   Pearson Correlation  -0.054             1
                       N                    200                200
The table shows the degree of correlation between marks in science and marks in
mathematics. The Pearson r-value is -0.054, which lies between 0 and -0.25 and is not
statistically significant (p = .444). This means there is a very weak correlation between the
variables, i.e., marks in science are not linearly related to marks in mathematics among the
respondents.

Table: 7
Correlation between Distance of Home from School and Weight of Students.
                                                    Distance of home   Weight of
                                                    from school        Participant
Distance of home from school   Pearson Correlation  1                  -0.008
                               Sig. (2-tailed)                         .912
                               N                    200                200
Weight of Participant          Pearson Correlation  -0.008             1
                               N                    200                200
The table shows the degree of correlation between distance of home from school and
weight of participants. The Pearson r-value is -0.008, which lies between 0 and -0.25 and is not
statistically significant (p = .912). This means there is a very weak correlation between the
variables, i.e., distance of home from school is not linearly related to weight of participants.

Table:8
Regression of Marks in Science on Marks in Mathematics
Model                     Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                          B         Std. Error          Beta
1  (Constant)             68.137    4.162                                           16.370   .000
   Marks in Mathematics   -0.046    .060                -0.054                      -0.767   .444
a. Dependent Variable: Marks in Science
The table above provides a regression model of marks in science on marks in
mathematics. The model is not statistically significant, F(1, 198) = 0.589, p = .444 > 0.05, with
R2 = 0.003 and coefficient = -0.046. The R-square value indicates that only about 0.3% of the
variability of marks in science can be explained by the variability of marks in mathematics.
We have the regression equation:
Y = a + bX
Score in Science = 68.137 - 0.046*(Score in Mathematics)
The slope of this regression model is negative: if the score in mathematics increases by 1,
the predicted score in science changes by -0.046. Because the coefficient is not significantly
different from zero (p = .444), the model gives no evidence that marks in mathematics predict
marks in science.
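With a single predictor, the regression statistics can be cross-checked against the correlation: R-squared equals r squared and F equals the square of the slope's t value. The sketch below uses the r value from the R output in Task III and the t value from the regression table. The variable names are illustrative.

```r
r <- -0.05445055    # Pearson r between the two sets of marks (Task III output)
t_slope <- -0.767   # t statistic of the slope in the regression table
n <- 200

r_squared <- r^2        # share of variance explained
f_value   <- t_slope^2  # with one predictor, F = t^2
p_value   <- pf(f_value, df1 = 1, df2 = n - 2, lower.tail = FALSE)

round(c(r_squared = r_squared, F = f_value, p = p_value), 3)
```

The check gives an R-squared of about 0.003, i.e., roughly 0.3% of the variance, and a p-value close to the .444 shown in the table.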

Table:9
Regression of Distance of Home from School on Weight of Participants
Model                      Unstandardized Coefficients   Standardized Coefficients   t       Sig.
                           B         Std. Error          Beta
1  (Constant)              5.133     1.561                                           9.397   .000
   Weight of Participant   -0.02     .041                .067                        .940    .912
a. Dependent Variable: Distance of home from school
The table above provides a regression model of distance of home from school on weight
of participants. The model is not statistically significant, F(1, 198) = 0.012, p = .912 > 0.05,
with adjusted R2 = -0.005 and coefficient = -0.02. The R-square value indicates that weight of
participants explains essentially none of the variability of distance of home from school.
We have the regression equation:
Y = a + bX
Distance of home from school = 5.133 - 0.02*(Weight of participant)

The slope of this regression model is negative but not significantly different from zero
(p = .912), so the model gives no evidence that the weight of participants predicts the distance
of home from school.
Inferential Statistics
 Perform two one sample t-test analysis of any applicable variables.
One-sampled t-Test of weight of student
Null Hypothesis (H0): The mean weight of students is 40 ( μ= 40).
Alternative Hypothesis (H1): The mean weight of student is not 40 ( μ ≠ 40).
Table
One sampled t-Test of weight of student
t df Sig. (2-tailed) Mean Difference
Weight of student -0.937 199 0.350 -.405
From the table, we have t(199) = -0.937, p-value = 0.350 > α-value = 0.05. In this case,
we fail to reject the null hypothesis and conclude that the mean weight of the students does not
differ significantly from 40. The actual mean weight of the students is 39.6.
One-sampled t-Test of distance of home from school
Null Hypothesis (H0): The mean distance of home from school is 5 ( μ= 5).
Alternative Hypothesis (H1): The mean distance of home from school is not 5 ( μ ≠ 5).
Table
One sampled t-Test of Distance of Home from school
                               t       df    Sig. (2-tailed)   Mean Difference
Distance of home from school   0.156   199   0.876             0.02
From the table, we have t(199) = 0.156, p-value = 0.876 > α-value = 0.05. In this case, we
fail to reject the null hypothesis and conclude that the mean distance of home from school does
not differ significantly from 5 km. The actual mean distance is 5.02.
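The one-sample t statistic can be reproduced by hand from summary values. The sketch below uses the mean and standard deviation of weight from the R output in Task III (39.595 and 6.110455) with the hypothesised mean of 40. The variable names are chosen for the illustration.

```r
# Summary values reported in this document for weight of student.
n <- 200
sample_mean <- 39.595     # mean(Weight) from the R output
sample_sd   <- 6.110455   # sd(Weight) from the R output
mu0 <- 40                 # hypothesised mean

se <- sample_sd / sqrt(n)                      # standard error of the mean
t_value <- (sample_mean - mu0) / se
p_value <- 2 * pt(-abs(t_value), df = n - 1)   # two-tailed p-value

round(c(t = t_value, df = n - 1, p = p_value), 3)
```

The result agrees with the SPSS table: t is about -0.937 with 199 degrees of freedom, and the p-value is well above 0.05.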
 Perform two independent sample t-test analysis of any applicable variables.
Independent Sample t-Test of Marks in Mathematics based writing Hand.
Null Hypothesis (H0): There is no significant difference between the mean scores of respondents
in mathematics based on writing hand.
Alternative Hypothesis (H1): There is significant difference between the mean scores of
respondents in mathematics based on writing hand.
Table 4
Independent Sample t-Test of Marks in Mathematics Based on Writing Hand
                                                     Sig. (Levene's test)   t      df        Sig. (2-tailed)   Mean Difference
Marks in Mathematics   Equal variances assumed       .141                   .825   198       .411              .690
                       Equal variances not assumed                          .825   192.350   .411              .690
The results of the t-test showed that t(198) = 0.825, p-value = 0.411 > α-value = 0.05. In
this case, we fail to reject the null hypothesis. The conclusion is that there is no significant
difference between the mean marks of respondents in mathematics based on their writing hand.
The mean mark in mathematics is 69.64 for right-handed respondents and 68.95 for left-handed
respondents. These values are almost equal.
Independent Sample t-Test of weight of student based on gender
Null Hypothesis (H0): There is no significant difference in the mean weight of respondents based
on their gender.
Alternative Hypothesis (H1): There is a significant difference in the mean weight of respondents
based on their gender.
Table 4
Independent Sample t-Test of weight of student Based on Gender
                                                  Sig. (Levene's test)   t      df        Sig. (2-tailed)   Mean Difference
Weight of student   Equal variances assumed       .925                   .658   198       .511              .574
                    Equal variances not assumed                          .659   185.575   .511              .574
The results of the t-test showed that t(198) = 0.658, p-value = 0.511 > α-value = 0.05. In
this case, we fail to reject the null hypothesis. The conclusion is that there is no significant
difference between the mean weight of respondents based on their gender. The mean weight is
39.92 for male respondents and 39.35 for female respondents. These values are almost equal.

 Insert 50/50 data for pre-test and posttest by creating a new file in SPSS and do Paired
sampled t-test analysis.
Null Hypothesis (Ho): There is no difference between the mean score of pretest and posttest.
Alternative Hypothesis (Ha): There is difference between the mean scores of pretests and
posttest.
Table 1
Mean Difference Tests in Pretest and Posttest.
Parameter                                t        df   Sig. (Two-tailed)
Pair 1  Pretest Marks - Posttest Marks   -4.993   49   0.00
From the table, we have t(49) = -4.993, p-value < 0.001 < α-value = 0.05. We therefore
reject the null hypothesis and conclude that there is a significant difference between the mean
scores in the pretest and posttest. From the descriptive table, the mean scores in the pretest and
posttest are 60.94 and 71.42 respectively. The mean posttest score is 10.48 higher than the mean
pretest score.
 Perform F-test (ANOVA) analysis of any two applicable variables.
Table 1
ANOVA of Distance of Home from School Based on Occupation of Parents
Parameters df F Sig.
Between Groups 2 0.164 0.849
Within Groups 197
Total 199
From the table we have F(2, 197) = 0.164, p-value = 0.849 > α-value = 0.05. In this case,
we fail to reject the null hypothesis and conclude that there is no significant difference in
distance of home from school based on the occupation of parents, i.e., the occupation of parents
has no detectable effect on the distance of home from school. We can see the descriptive values
in the following table.
                  N     Mean   Std. Deviation
Government        90    5.02   1.767
Private           70    4.94   1.948
Physical worker   40    5.15   1.718
Total             200   5.02   1.816
From the table we have the descriptive values: Government (N = 90, M = 5.02, SD = 1.767),
Private (N = 70, M = 4.94, SD = 1.948), and Physical worker (N = 40, M = 5.15, SD = 1.718).
The data show that each category has almost the same mean distance of home from school. This
supports the conclusion that the occupation of parents has no effect on the distance of home from
school.
 Perform Goodness of fit test analysis of one variable.
Ho: There is no significant difference between the observed and expected gender of respondents
Ha: There is significant difference between the observed and expected gender of respondents
Table 1
Observation of Gender of Student

               Gender of student
Chi-Square     3.380a
df             1
Asymp. Sig.    .066
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell
frequency is 100.
From the above table, we have χ2(1) = 3.380, p-value = 0.066 > α-value = 0.05. In this case,
we retain the null hypothesis and conclude that there is no significant difference between the
observed and expected frequencies of male and female respondents. The observed frequencies
of male and female respondents are 87 and 113 respectively.
Table 2
Frequencies Table of Gender of Respondent
         Observed N   Expected N   Residual
Male     87           100.0        -13.0
Female   113          100.0        13.0
Total    200
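The goodness-of-fit statistic can be checked in R from the counts of male and female respondents given in the gender by writing hand frequency table (87 and 113), against equal expected frequencies. This is a sketch using base R's `chisq.test()`. The object names are illustrative.

```r
# Counts of male and female respondents from the gender by writing hand table.
observed <- c(Male = 87, Female = 113)

# Goodness-of-fit test against equal expected proportions.
gof <- chisq.test(observed, p = c(0.5, 0.5))

gof$expected                      # expected frequencies
round(unname(gof$statistic), 3)   # chi-square statistic
round(gof$p.value, 3)             # p-value
```

These counts give residuals of -13 and 13, a chi-square of 3.38 with 1 degree of freedom, and a p-value of about .066.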
 Perform Chi-square independence test analysis (one variable).
Null Hypothesis (Ho): There is no significant association between the gender and Writing hand
of respondent
Alternative Hypothesis (Ha): There is significant association between the gender and Writing
hand of respondent
Table 1
Test of Independence of Association between Gender and Writing hand of respondent
                     Value    df   Asymptotic Significance (2-sided)
Pearson Chi-Square   0.183a   1    0.669
N of Valid Cases     200
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 43.5.
b. Computed only for a 2x2 table
From the above table, we have χ2(1) = 0.183, p-value = 0.669 > α-value = 0.05. Therefore, we
fail to reject the null hypothesis and conclude that there is no association between the
variables. That is, the observed frequencies are close enough to the expected frequencies that
we cannot say there is a significant difference between them. The data do not support
the claim that there is a relationship between gender and writing hand.
Table 2
Frequency Table of Gender*Writing hand
                                              Writing hand               Total
                                              Right handed   Left hand
Gender of Respondent   Male     Count            45             42          87
                                Expected Count   43.5           43.5        87.0
                       Female   Count            55             58          113
                                Expected Count   56.5           56.5        113.0
Total                           Count            100            100         200
                                Expected Count   100.0          100.0       200.0
From the above table, among male respondents the right-handed count is 45 (expected =
43.5) and the left-handed count is 42 (expected = 43.5). Similarly, among female respondents
the right-handed count is 55 (expected = 56.5) and the left-handed count is 58 (expected = 56.5).
The data show that there is no association between the variables.
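The test of independence can be reproduced in R from the four observed counts in the frequency table. The sketch assumes the plain Pearson chi-square without continuity correction, which is why `correct = FALSE` is set. The object names are illustrative.

```r
# Observed counts from the gender by writing hand frequency table.
observed <- matrix(c(45, 42,
                     55, 58),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Male", "Female"),
                                   Hand   = c("Right handed", "Left hand")))

# correct = FALSE gives the plain Pearson chi-square (no Yates correction).
ind <- chisq.test(observed, correct = FALSE)

ind$expected                      # expected counts under independence
round(unname(ind$statistic), 3)   # Pearson chi-square
round(ind$p.value, 3)             # p-value
```

The expected counts match the table (43.5 and 56.5), and the statistic comes out at about 0.183 with a p-value of about .669.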
 Perform correlation analysis (Two variable).
Null Hypothesis (Ho): There is no significant correlation between marks in mathematics and
marks in science.
Alternative Hypothesis (H1): There is a significant correlation between marks in mathematics
and marks in science.
Table 1
Correlation analysis of marks in mathematics and science.

                                            Marks in Mathematics   Marks in science
Marks in Mathematics   Pearson Correlation  1                      -.054
                       N                    200                    200
Marks in science       Pearson Correlation  -.054                  1
                       N                    200                    200
The p-value for the correlation coefficient is 0.444, which is greater than the
significance level of 0.05. Therefore, we fail to reject the null hypothesis and conclude that there
is no significant correlation between marks in mathematics and marks in science.
 Perform Mann Whitney U Test of two variables if you have data which is not normally
distributed (Optional)
All the scale data are approximately normally distributed, so the Mann-Whitney U test is not needed.
Task – III
• By using the data created in task-II, do the following tasks on R-Studio.
 Descriptive Statistics
• Central Tendencies and Dispersions
Central Tendencies
Gender Writting_Hand Caste Occupation_OF_Parrent
Length:200 Length:200 Length:200 Length:200
Class :character Class :character Class :character Class :character

Mode :character Mode :character Mode :character Mode :character
Weight Distance_of_home_from_school Marks_in_Mathematics Marks_in_science
Min. :23.00 Min. : 0.00 Min. :55.0 Min. :53.00
1st Qu.:35.00 1st Qu.: 4.00 1st Qu.:65.0 1st Qu.:61.00
Median :40.00 Median : 5.00 Median :69.5 Median :65.00
Mean :39.59 Mean : 5.02 Mean :69.3 Mean :64.95
3rd Qu.:44.00 3rd Qu.: 6.00 3rd Qu.:74.0 3rd Qu.:68.00
Max. :57.00 Max. :10.00 Max. :86.0 Max. :82.00
Correlation
#correlation
> cor(Marks_in_Mathematics,Marks_in_science)
[1] -0.05445055
> cor(Weight,Distance_of_home_from_school)
[1] -0.00787271
> cor(Distance_of_home_from_school,Marks_in_Mathematics)
[1] 0.0640648
> cor(Distance_of_home_from_school,Marks_in_science)
[1] 0.01897551
> cor(Weight,Marks_in_Mathematics)
[1] -0.03757749
> cor(Weight,Marks_in_science)
[1] -0.008848629
Measure Of Dispersion
Range
> range(Marks_in_Mathematics)
[1] 55 86
> range(Marks_in_science)
[1] 53 82
> range(Weight)
[1] 23 57
> range(Distance_of_home_from_school)
[1] 0 10
Variance
> var(Weight)
[1] 37.33766
> var(Distance_of_home_from_school)
[1] 3.29608
> var(Marks_in_Mathematics)
[1] 34.94269
> var(Marks_in_science)
[1] 24.85726
Median Absolute Deviation
> mad(Weight)
[1] 5.9304
> mad(Marks_in_Mathematics)
[1] 6.6717
> mad(Marks_in_science)
[1] 5.9304
> mad(Distance_of_home_from_school)
[1] 1.4826
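R's `mad()` returns the median absolute deviation, scaled by the constant 1.4826 by default, and not the mean deviation. The small sketch below shows the difference on an illustrative vector that is not part of the project data.

```r
x <- c(2, 4, 4, 5, 7, 9)     # small illustrative vector, not project data

mad(x)                       # median absolute deviation, scaled by 1.4826
median(abs(x - median(x)))   # unscaled median absolute deviation
mean(abs(x - mean(x)))       # mean (average) absolute deviation
```

For a mean deviation of the project variables, the last expression would be the one to use, for example with `Weight` in place of `x`.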
Standard Deviation
> sd(Weight)
[1] 6.110455
> sd(Marks_in_Mathematics)
[1] 5.911234
> sd(Marks_in_science)
[1] 4.985706
> sd(Distance_of_home_from_school)
[1] 1.815511
• Plots (1/1) (Scatter, Bar Plot, Violin Plot, Histogram, Line Plot, Box Plot, etc.)
Scatter Plot
Figure 1
Scatter plot of marks in science

Figure 2
Barplot Of Marks in Mathematics

Figure 3
Histogram of marks In mathematics

Figure 4
Line graph
Figure 5
Box Plot of Marks in Science

Figure 6
Cluster Boxplot of Marks in Mathematics * Caste

Figure 7
Pie Chart of Caste of student
Figure: 8
Line Plot of Weight and Distance of Home from School

Violin Plot
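The plotting commands behind these figures are not shown. The following base R sketch indicates how plots of this kind could be produced; the data frame `students` and its simulated values are hypothetical, and the titles are only illustrative. A violin plot is omitted because it needs an add-on package.

```r
# Hypothetical stand-in data: simulated values, NOT the project's data set.
set.seed(7)
students <- data.frame(
  Caste = factor(sample(c("Bhamin", "vaisya", "other"), 200, replace = TRUE)),
  Marks_in_Mathematics = round(rnorm(200, mean = 69, sd = 6)),
  Marks_in_science = round(rnorm(200, mean = 65, sd = 5))
)

pdf(NULL)  # null device so the sketch also runs without a display

# Scatter plot of the two marks variables.
plot(students$Marks_in_Mathematics, students$Marks_in_science,
     xlab = "Marks in Mathematics", ylab = "Marks in Science",
     main = "Scatter Plot of Marks")

# Histogram; the returned object holds the bin counts.
h <- hist(students$Marks_in_Mathematics,
          xlab = "Marks in Mathematics",
          main = "Histogram of Marks in Mathematics")

# Box plot of marks grouped by caste.
boxplot(Marks_in_Mathematics ~ Caste, data = students,
        main = "Marks in Mathematics by Caste")

# Pie chart of the caste frequencies.
pie(table(students$Caste), main = "Caste of Students")

invisible(dev.off())
```

Removing the pdf(NULL) and dev.off() lines sends the plots to the usual graphics window in an interactive session.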
• Correlation Analysis
Pearson's product-moment correlation
data: Marks_in_Mathematics and Marks_in_science
t = -0.76733, df = 198, p-value = 0.4438
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.19174303 0.08493209
sample estimates:
cor
-0.05445055

Pearson's product-moment correlation
data: Marks_in_Mathematics and Weight
t = -0.52914, df = 198, p-value = 0.5973
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1754040 0.1016937
sample estimates:
cor
-0.03757749

Pearson's product-moment correlation
data: Marks_in_science and Weight
t = -0.12452, df = 198, p-value = 0.901
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1474087 0.1300520
sample estimates:
cor
-0.008848629
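The test statistic, p-value and confidence interval in the first output can be reproduced from the correlation coefficient and the degrees of freedom alone. The sketch below uses only the printed values; the variable names are illustrative.

```r
# Values taken from the first correlation test output above.
r <- -0.05445055
df <- 198  # n - 2, with n = 200 students

# t statistic for H0: true correlation equals 0.
t_stat <- r * sqrt(df / (1 - r^2))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation,
# the approach cor.test() uses for Pearson's correlation.
z <- atanh(r)
se <- 1 / sqrt(df - 1)  # equals 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]))
```

Because every interval contains 0 and every p-value is well above 0.05, none of the three correlations is statistically significant at the 5% level.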
• Inferential Statistics (t-Test, ANOVA, ANOVA with Post Hoc Test)
t-test
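> t.test(Marks_in_Mathematics, mu = 65, conf.level = 0.95, alt = "greater")
One Sample t-test
data: Marks_in_Mathematics
t = 10.275, df = 199, p-value < 2.2e-16
alternative hypothesis: true mean is greater than 65
95 percent confidence interval:
68.60426 Inf
sample estimates:
mean of x
69.295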
> t.test(Weight, mu = 55, conf.level = 0.95, alt = "greater")
One Sample t-test
data: Weight
t = -35.654, df = 199, p-value = 1
alternative hypothesis: true mean is greater than 55
95 percent confidence interval:
38.88098 Inf
sample estimates:
mean of x
39.595
> t.test(Distance_of_home_from_school, mu = 4, conf.level = 0.95, alt = "less")
One Sample t-test
data: Distance_of_home_from_school
t = 7.9454, df = 199, p-value = 1
alternative hypothesis: true mean is less than 4
95 percent confidence interval:
-Inf 5.232147
sample estimates:
mean of x
5.02
> t.test(Marks_in_Mathematics, mu = 55, conf.level = 0.95, alt = "less")
One Sample t-test
data: Marks_in_Mathematics
t = 34.2, df = 199, p-value = 1
alternative hypothesis: true mean is less than 55
95 percent confidence interval:
-Inf 69.98574
sample estimates:
mean of x
69.295
> t.test(Marks_in_Mathematics, mu = 55, conf.level = 0.95, alt = "two.sided")
One Sample t-test
data: Marks_in_Mathematics
t = 34.2, df = 199, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 55
95 percent confidence interval:
68.47075 70.11925
sample estimates:
mean of x
69.295
> t.test(Marks_in_science, mu = 55, conf.level = 0.95, alt = "two.sided")
One Sample t-test
data: Marks_in_science
t = 28.238, df = 199, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 55
95 percent confidence interval:
64.2598 65.6502
sample estimates:
mean of x
64.955
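The one-sample results can be checked by hand from the summary statistics already reported. The sketch below reproduces the first test (Marks_in_Mathematics against a hypothesised mean of 65) using the printed mean, the standard deviation from the Standard Deviation output and n = 200 (df + 1); the variable names are illustrative.

```r
# Summary statistics taken from the outputs above.
x_bar <- 69.295    # sample mean of Marks_in_Mathematics
s <- 5.911234      # sample standard deviation
n <- 200           # sample size (df = 199)
mu0 <- 65          # hypothesised mean

# Standard error of the mean and the t statistic.
se <- s / sqrt(n)
t_stat <- (x_bar - mu0) / se

# One-sided p-value for the alternative "greater".
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Lower bound of the one-sided 95% confidence interval.
lower_bound <- x_bar - qt(0.95, df = n - 1) * se

print(c(t = t_stat, p = p_greater, lower = lower_bound))
```

The same three lines, with the tail and the hypothesised mean changed, apply to the other one-sample tests. A p-value of 1 in the "less" and "greater" outputs means the data point in the opposite direction to the stated alternative.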
Independent sample t-test
Welch Two Sample t-test
data: Marks_in_Mathematics by Writting_Hand
t = -0.82472, df = 192.35, p-value = 0.4106
alternative hypothesis: true difference in means between group Left hand and group Right handed is not equal to 0
99 percent confidence interval:
-2.866647 1.486647
sample estimates:
mean in group Left hand mean in group Right handed
68.95 69.64

Welch Two Sample t-test
data: Distance_of_home_from_school by Writting_Hand
t = 1.0911, df = 196.8, p-value = 0.2766
alternative hypothesis: true difference in means between group Left hand and group Right handed is not equal to 0
99 percent confidence interval:
-0.3875033 0.9475033
sample estimates:
mean in group Left hand mean in group Right handed
5.16 4.88
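The confidence level used for the Welch intervals can be recovered from the printed numbers. The sketch below does this for the Marks_in_Mathematics comparison; the result suggests the interval was computed with conf.level = 0.99, which is an inference from the arithmetic and not something the output states.

```r
# Values taken from the first Welch output above.
mean_left <- 68.95
mean_right <- 69.64
t_stat <- -0.82472
df <- 192.35
ci <- c(-2.866647, 1.486647)

# Standard error of the difference, recovered from t = difference / se.
se <- (mean_left - mean_right) / t_stat

# Half-width of the interval expressed in standard errors.
half_width <- diff(ci) / 2
multiplier <- half_width / se

# Confidence level implied by that multiplier.
implied_level <- 1 - 2 * pt(-multiplier, df)
print(round(implied_level, 3))
```

Either way the interval contains 0, which agrees with the non-significant p-value of 0.4106.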
ANOVA
> aov(Distance_of_home_from_school~Caste)
Call:
aov(formula = Distance_of_home_from_school ~ Caste)
Terms:
                  Caste Residuals
Sum of Squares    1.093   654.827
Deg. of Freedom       2       197
Residual standard error: 1.823183
Estimated effects may be unbalanced
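Printing an aov object shows only the sums of squares and degrees of freedom. The F statistic and p-value (which summary() would report) can be derived from those figures, as in this sketch based on the Distance_of_home_from_school ~ Caste output above; the variable names are illustrative.

```r
# Values taken from the aov output above.
ss_caste <- 1.093
ss_resid <- 654.827
df_caste <- 2
df_resid <- 197

# Mean squares are sums of squares divided by their degrees of freedom.
ms_caste <- ss_caste / df_caste
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail p-value.
f_stat <- ms_caste / ms_resid
p_value <- pf(f_stat, df_caste, df_resid, lower.tail = FALSE)

# The residual standard error is the square root of the residual mean square.
resid_se <- sqrt(ms_resid)

print(c(F = f_stat, p = p_value, resid_se = resid_se))
```

An F statistic below 1 means the variation between caste groups is smaller than the variation within them, so there is no evidence that mean distance differs by caste.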
aov(formula = Marks_in_Mathematics ~ Occupation_OF_Parrent)

Terms:
Occupation_OF_Parrent Residuals
ANOVA with Post Hoc Test
#One-way ANOVA
> aov(Marks_in_Mathematics~Caste)
Call:
aov(formula = Marks_in_Mathematics ~ Caste)
Terms:
Caste Residuals

> n = aov(Marks_in_Mathematics~Caste)
> TukeyHSD(n)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Marks_in_Mathematics ~ Caste)
$Caste
                    diff       lwr      upr     p adj
other-Bhamin  -0.8222222 -3.484778 1.840334 0.7464294
vaisya-Bhamin -0.1793651 -2.412259 2.053529 0.9803586
vaisya-other   0.6428571 -2.134270 3.419984 0.8483037
> plot(TukeyHSD(n))
> aov(Weight~Caste)
Call:
aov(formula = Weight ~ Caste)
Terms:
Caste Residuals
Sum of Squares 187.855 7242.340
> n = aov(Weight~Caste)
> TukeyHSD(n)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Weight ~ Caste)
$Caste
                   diff         lwr      upr     p adj
other-Bhamin  0.9305556 -1.79043288 3.651544 0.6987443
vaisya-Bhamin 2.1841270 -0.09776975 4.466024 0.0639137
vaisya-other  1.2535714 -1.58450180 4.091645 0.5506261
> plot(TukeyHSD(n))
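The full one-way ANOVA and post hoc workflow can be sketched end to end as follows. The data frame `students` and its simulated values are hypothetical stand-ins for the project's data, so the numbers will not match the outputs above; only the sequence of calls is the point.

```r
# Hypothetical stand-in data: simulated values, NOT the project's data set.
set.seed(42)
students <- data.frame(
  Caste = factor(sample(c("Bhamin", "vaisya", "other"), 200, replace = TRUE)),
  Weight = rnorm(200, mean = 40, sd = 6)
)

# Fit the one-way ANOVA; summary() adds the F statistic and p-value
# that printing the aov object alone does not show.
fit <- aov(Weight ~ Caste, data = students)
print(summary(fit))

# Tukey's HSD compares every pair of group means while holding
# the family-wise confidence level at 95%.
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)
```

In the Tukey tables, a pair of groups differs significantly only when its interval (lwr, upr) excludes 0, equivalently when p adj is below 0.05. None of the comparisons reported above meets that criterion; the closest is vaisya-Bhamin for Weight with p adj = 0.0639.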

