
Department of STEAM Education, KUSOED

EDMT 512: Statistics (3) Final Project - 2023

Course Facilitator: Netra Kumar Manandhar

Prepared by: Shiva Hari Pathak

In the field of mathematics education, I have considered an example

study that examines the relationship between students' marks in mathematics and various factors

such as gender, writing hand, caste, occupation of parents, weight, distance of home from school,

and marks in science.

Population

The population for this study could be all students in a particular grade level, such as

Grade 10 students in a specific school.

Sampling:

A sample of students would be selected from the population for the study. One possible

sampling method could be simple random sampling, where each student in the population has an

equal chance of being selected. For example, a random sample of 100 students could be chosen.

The types of sampling are described below.

Simple Random Sampling


Simple random sampling is a method of selecting a sample from a population in which

every member of the population has an equal chance of being selected. This is the most basic

type of sampling method, and it is often used in mathematics education research. For example, a

researcher might randomly select a sample of students from a school district to participate in a

study on mathematics achievement.

Stratified Sampling

Stratified sampling is a method of sampling that divides the population into strata, or

groups, and then randomly selects a sample from each stratum. This method is often used in

mathematics education research when the researcher wants to ensure that the sample is

representative of the population in terms of certain characteristics, such as gender, ethnicity, or

socioeconomic status. For example, a researcher might stratify a sample of students by gender

and then randomly select a sample of students from each gender group.

Cluster Sampling

Cluster sampling is a method of sampling that divides the population into clusters, or
groups, and then randomly selects one or more whole clusters to participate in the study. This
method is often used in mathematics education research when the researcher wants to save time
and money. For example, a researcher might treat each school as a cluster and then randomly
select a sample of schools to participate in a study on mathematics instruction.

Systematic Sampling
Systematic sampling is a method of sampling that selects every kth member of the
population, where k is a fixed sampling interval (the population size divided by the desired
sample size) and the starting point is chosen at random from the first k members. This method is
often used in mathematics education research when an ordered list of the population, such as a
school register, is available. For example, a researcher might randomly select a starting number
between 1 and 100 and then select every 100th student from that point on in a school district
list to participate in a study on mathematics achievement.
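The four sampling methods can be sketched in R. This is only a minimal illustration on a made-up population of 1,000 students; the data frame `population`, its columns (`id`, `gender`, `school`) and the sample sizes are assumptions for the example and are not data from this study.

```r
# Hypothetical population of 1,000 students in 20 schools (illustrative only)
set.seed(123)
population <- data.frame(
  id     = 1:1000,
  gender = rep(c("Male", "Female"), each = 500),
  school = rep(1:20, times = 50)
)

# Simple random sampling: every student has the same chance of selection
srs <- population[sample(nrow(population), 100), ]

# Stratified sampling: draw 50 students at random within each gender stratum
strata <- split(population, population$gender)
stratified <- do.call(rbind, lapply(strata, function(s) s[sample(nrow(s), 50), ]))

# Cluster sampling: pick 2 schools at random and keep all of their students
chosen_schools <- sample(unique(population$school), 2)
cluster <- population[population$school %in% chosen_schools, ]

# Systematic sampling: fixed interval k, random start between 1 and k
k <- nrow(population) / 100          # sampling interval, here 10
start <- sample(k, 1)
systematic <- population[seq(start, nrow(population), by = k), ]
```

Each of the four objects holds 100 students here, but they are reached in different ways: by chance alone, by chance within each stratum, by taking whole schools, and by stepping through the list at a fixed interval.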

Variables

Binary variable:
- Gender (male or female)
- Whether a student passed or failed a mathematics test

Categorical Variable:
- Grade level (elementary, middle, high school)
- Race/ethnicity (Asian, Black, Hispanic, White)
- Type of mathematics instruction (traditional, problem-based learning, inquiry-based learning)

Continuous Variable:
- Mathematics achievement score
- Number of hours a student studies mathematics per week
- Time it takes a student to solve a mathematics problem

Control Variable:
- Socioeconomic status (SES)
- Prior mathematics achievement
- Gender

Dependent Variable:
- Mathematics achievement
- Mathematics anxiety
- Mathematics self-efficacy

Independent Variable:
- Type of mathematics instruction
- Use of technology in mathematics instruction
- Teacher experience

Dichotomous Variable:
- Synonym for binary variable

Discrete Variable:
- Number of correct answers on a mathematics test
- Number of times a student raises their hand in class

Dummy variables:
- Gender (male = 1, female = 0)
- Grade level, coded as two 0/1 indicators (middle = 1 or 0, high school = 1 or 0) with elementary as the reference category
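A dummy variable takes only the values 0 and 1, so a category with three levels is usually represented by two indicator columns rather than by the codes 1, 2 and 3. A minimal sketch in R, using a made-up vector `grade` (an assumption for illustration, not study data):

```r
# Hypothetical grade levels for six students (illustrative only)
grade <- factor(
  c("elementary", "middle", "high school", "middle", "elementary", "high school"),
  levels = c("elementary", "middle", "high school")
)

# model.matrix() expands the factor into 0/1 indicator columns;
# the first level (elementary) is the reference category, and
# dropping column 1 removes the intercept column
dummies <- model.matrix(~ grade)[, -1]
print(dummies)
```

An elementary student has 0 in both columns, a middle-school student has 1 only in the `grademiddle` column, and a high-school student has 1 only in the `gradehigh school` column.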

Endogenous Variable:
- Mathematics achievement
- Mathematics anxiety
- Mathematics self-efficacy

Exogenous Variable:
- SES
- Prior mathematics achievement
- Gender

Interval Variable:
- A variable measured on a scale with equal intervals but no true zero; often treated as a continuous variable

Intervening Variable:
- Self-regulation
- Mathematical problem-solving skills
- Motivation

Latent Variable:
- Mathematical ability
- Mathematical aptitude
- Mathematical talent

Mediating variable:
- Synonym for intervening variable

Manifest Variable:
- Mathematics achievement test score
- Number of correct answers on a mathematics test
- Time it takes a student to solve a mathematics problem

Manipulated Variable:
- Synonymous with independent variable

Moderating Variable:
- SES
- Prior mathematics achievement
- Gender

Nominal Variable:
- A categorical variable whose categories have no natural order, such as caste or writing hand

Ordinal Variable:
- Mathematics grade (A, B, C, D, E)
- Level of mathematics anxiety (low, medium, high)

Outcome Variable:
- Mathematics achievement
- Mathematics anxiety
- Mathematics self-efficacy

Polychotomous Variables:
- Grade level (elementary, middle, high school)
- Race/ethnicity (Asian, Black, Hispanic, White)
- Type of mathematics instruction (traditional, problem-based learning, inquiry-based learning)

Predictor Variable:
- SAT score
- Prior mathematics achievement
- Gender

Treatment Variable:
- Synonymous with independent variable

Measurement Scales:

- Gender and Writing Hand are nominal variables.

- Caste and Occupation of Parents are also nominal variables.

- Weight and Distance of Home from School are ratio scale variables.

- Marks in Mathematics and Marks in Science are interval scale variables.

Correlation
Correlation could be used to determine the relationship between variables. For example,

the study might analyze the correlation between students' marks in mathematics and their marks

in science.

Regression

Regression analysis could be used to predict students' marks in mathematics based on

other variables. For instance, the study might examine how well gender, writing hand, caste,

occupation of parents, weight, and distance from school predict a student's marks in

mathematics.

Normal Distribution

The study could examine whether students' marks in mathematics or other variables

follow a normal distribution, which would have implications for statistical analyses.

t-Test

A t-test could be used to compare the mean marks in mathematics between two groups, such as

male and female students.

One-Sample t-Test
A teacher wants to know if her students who use a new math software program are performing

better than the average student in the country. She could use a one-sample t-test to compare the

average scores of her students to the national average on a standardized math test.

Independent Samples t-Test

A researcher wants to know if there is a difference in the average math scores of students who

are taught using a traditional method and students who are taught using a constructivist method.

She could use an independent samples t-test to compare the average scores of the two groups on

a standardized math test.

Paired t-Test

A teacher wants to know if her students' math scores improve after they receive tutoring. She

could use a paired t-test to compare the average scores of her students on a standardized math

test before and after they receive tutoring.

ANOVA

ANOVA could be used to compare the mean marks in mathematics among three or more

groups, such as different caste categories or parental occupation groups.

Mann Whitney U Test

The Mann Whitney U test could be employed to compare the marks in mathematics between
two groups when the data do not meet the assumptions of a t-test, such as left-handed and
right-handed students.

By employing these statistical techniques and analyzing the data collected from the sample, I
can gain insights into the relationships between variables and make inferences about the larger
population of students in mathematics education.

Task – II

Choose a specific area of research related to Mathematics education and learning. Create at least

seven variables (must be nominal, ordinal, and scale (interval & ratio)). Enter at least 200 data

(hypothetical) on SPSS and use SPSS to do quantitative analysis on the following topics (must

be in APA 7th Edition).

Descriptive Statistics
Table: 1

Descriptive Analysis of Marks in Mathematics

Marks in Mathematics

N Valid 200

Missing 0

Mean 54.51

Median 54.00

Mode 52

Std. Deviation 6.058

Variance 36.693

Skewness .160

Std. Error of Skewness .172

Kurtosis .106

Std. Error of Kurtosis .342

Range 32

Minimum 40

Maximum 72

Sum 10902

Percentiles 25 50.00

50 54.00

75 58.75
The above table shows the statistical measures of marks in mathematics. Out of 200 valid
responses, the average mark in mathematics is 54.51, the median is 54 and the mode is 52. The
standard deviation is 6.058, the variance is 36.693, the skewness is 0.160 (standard error 0.172)
and the kurtosis is 0.106 (standard error 0.342). The range, i.e., the difference between the
highest and lowest marks, is 32, with a minimum of 40 and a maximum of 72, and the sum of the
marks is 10902. The first, second and third quartiles are 50, 54 and 58.75 respectively.

Figure: 1

Pie Chart of Occupation of Parents


The figure shows the occupation of the respondents' parents: 45 percent are government
job holders, 35 percent are private job holders, and 20 percent are physical workers. Government
job holders form the largest group and physical workers the smallest.

Figure:2

Pie-Chart of caste of student


Figure 3

Simple Bar Chart of Caste of Student


The figure shows the caste of the respondents: 45 percent are Brahmin, 35 percent are
Vaisya, and 20 percent belong to other castes. Brahmin is the largest group and other the
smallest.


Figure 4

Bar Chart of Occupation of Parents

The above bar graph shows the number of parents in each occupation: 90 parents are
government job holders, 70 are private job holders and 40 are physical workers. Government
job holders are more than double the number of physical workers.
Figure:5

Simple Boxplot of Weight of student

The figure shows the box-and-whisker plot of the weight of the students. The median of the
dataset is 40. The variable has no outliers.


Figure: 6

Simple Boxplot of marks in mathematics

The figure shows the box-and-whisker plot of the marks in mathematics. The median of
the dataset is 65. The variable has one outlier (labelled 7), which is considerably lower than the
rest of the data. The outlier should be excluded from the dataset before the data are analyzed.
Figure:7

Clustered Boxplot of weight of student by caste of student by occupation of parent

The figure above shows the clustered boxplot of the weight of students by caste. The
median weight is 39 for Brahmin students, 41 for Vaisya students and 40 for students of other
castes, so the weights are similar across castes. The data have no outliers.


Figure: 8

Simple Histogram of Distance of Home from School

The histogram displayed above represents the distribution of the distance of home from
school of the participants. Most of the data fall under the normal curve, with only a few data
points outside it. Excluding these outlying points would give a more accurate representation of
the distribution of distance of home from school.


Figure: 9

Simple Histogram of Marks in mathematics

The histogram displayed above represents the distribution of the marks in mathematics of the
participants. Most of the data fall under the normal curve, with only a few data points outside
it. Excluding these outlying points would give a more accurate representation of the
distribution of marks in mathematics.


Perform normal distributions of scale data (at least 2)

Table:2

Normal Distribution Scale Data of Distance of Home from School

Distance of Home from School


N Valid 200
Missing 0
Mean 5.02
Std. Error of Mean .128
Median 5.00
Mode 5
Std. Deviation 1.816
Variance 3.296
Skewness .016
Std. Error of Skewness .172
Kurtosis -.155
Std. Error of Kurtosis .342
Range 10
Minimum 0
Maximum 10
Sum 1004
Percentiles 10 3.00
20 4.00
25 4.00
30 4.00
40 5.00
50 5.00
60 5.00
70 6.00
75 6.00
80 7.00
90 7.00

The above table shows that the mean and median of distance of home from school are very
similar, i.e., 5.02 and 5 respectively, which suggests that the data are approximately normally
distributed. The kurtosis is -0.155, which lies between -2 and +2, and the skewness is 0.016,
which is very close to zero. Both values support the conclusion that distance of home from
school is approximately normally distributed. Let's draw a normal curve of the distance of home
from school.

Figure: 10

Normal Distribution of Distance of Home from School


This normal distribution curve presents the distance of home from school. The mean and
standard deviation are 5.02 and 1.816 respectively. For normally distributed data, approximately
68% of the values lie within ±1 standard deviation of the mean, i.e., within 5.02 ± 1.816, or
between 3.20 and 6.84. Similarly, about 95% of the values should lie within μ ± 2σ and about
99.7% within μ ± 3σ. Another indication of a normal distribution is that the curve is
symmetrical about the mean.
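The 68-95-99.7 intervals can be worked out directly from the mean and standard deviation reported in Table 2 (5.02 and 1.816). The short R sketch below is only an arithmetic check; the object names `m`, `s`, `intervals` and `coverage` are chosen here for illustration.

```r
# Mean and standard deviation taken from Table 2
m <- 5.02
s <- 1.816

# Intervals expected to hold about 68%, 95% and 99.7% of a normal variable
intervals <- data.frame(
  k     = 1:3,
  lower = m - (1:3) * s,
  upper = m + (1:3) * s
)
print(intervals)

# Theoretical coverage of each interval under a normal distribution
coverage <- pnorm(1:3) - pnorm(-(1:3))
print(round(coverage, 4))   # 0.6827 0.9545 0.9973
```

The first row gives 3.204 to 6.836, which is the ±1 standard deviation band for distance of home from school.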

Table:3

Normal Distribution Scale Data of Marks in Mathematics of Participants

marks in Mathematics
N Valid 200
Missing 0
Mean 69.30
Std. Error of Mean .418
Median 69.50
Mode 71
Std. Deviation 5.911
Variance 34.943
Skewness .022
Std. Error of Skewness .172
Kurtosis -.272
Std. Error of Kurtosis .342
Range 31
Minimum 55
Maximum 86
Sum 13859
Percentiles 10 62.00
20 64.00
25 65.00
30 66.00
40 68.00
50 69.50
60 71.00
70 72.00
75 74.00
80 75.00
90 77.00

The above table shows that the mean and median of marks in mathematics are very similar, i.e.,
69.30 and 69.5 respectively, which suggests that the data are approximately normally distributed.
The kurtosis is -0.272, which lies between -2 and +2, and the skewness is 0.022, which is close to
zero. Both values support the conclusion that the marks in mathematics are approximately
normally distributed. Let's draw a normal curve of the marks in mathematics of the respondents.

Figure: 11

Simple Histogram of Marks in Mathematics


This normal distribution curve presents the marks in mathematics. The mean and
standard deviation are 69.30 and 5.911 respectively. For normally distributed data,
approximately 68% of the values lie within ±1 standard deviation of the mean, i.e., within
69.30 ± 5.911, or between 63.39 and 75.21. Similarly, about 95% of the values should lie within
μ ± 2σ and about 99.7% within μ ± 3σ. Another indication of a normal distribution is that the
curve is symmetrical about the mean.

Do some custom tables and cross-tabulation (At least 2)

Table:4

Custom Table of Writing Hand * Caste of Student

                       Caste of student (Count)
Writing Hand        Brahmin    Vaisya    Other
Right handed           60        20        20
Left handed            30        50        20

The above table shows the custom table of writing hand and caste of students. The highest
number of right-handed students are Brahmins, with 60 students, while the Vaisya and "other"
castes have the fewest right-handed students, with 20 each. The highest number of left-handed
students are Vaisyas, with 50 students, and the lowest number are from the "other" caste, with
20 students. Brahmins: 60 students are right-handed and 30 are left-handed. Vaisyas: 20
students are right-handed and 50 are left-handed. Other: 20 students are right-handed and 20
are left-handed. The table shows a clear difference in the distribution of writing hand by caste:
Brahmins are more likely to be right-handed, while Vaisyas are more likely to be left-handed.
This difference may be due to a number of factors, including cultural or genetic differences.


Table: 5

Cross-tabulation of Writing Hand * Caste of Student

                     Caste of student
Writing Hand      Brahmin   Vaisya   Other   Total
Right handed         60       20       20     100
Left handed          30       50       20     100
Total                90       70       40     200
The table shows the cross-tabulation of caste of student and writing hand. The highest
number of right-handed students are Brahmin, with 60 students. The lowest number of
right-handed students are Vaisya and other castes, with 20 students each. The highest number of
left-handed students are Vaisya, with 50 students. The lowest number of left-handed students are
from other castes, with 20 students.
Figure: 12

Simple Scatter with Fit Line of Marks in Mathematics by Weight of Student

The above scatter plot presents the relationship between the weight of students and their marks
in mathematics. The points are spread across the whole graph without a clear trend or pattern,
which shows that there is no linear relationship between the variables. There are some clusters
in the lower middle part of the figure, representing a group with varying values of both
variables. The key feature of this figure is that marks in mathematics and weight of participants
are unrelated.

Figure: 13

Grouped Scatter of Marks in Science and Weight of student by Occupation of Parent

The above scatter plot presents the relationship between marks in science and weight of
students, grouped by occupation of parents. The points are spread across the whole graph
without a clear trend or pattern, which shows that there is no linear relationship between the
variables. There are some clusters in the lower middle part of the figure, representing a group
with varying values of both variables. The key feature of this figure is that marks in science and
weight of students are unrelated in every occupation group.


Table: 6

Correlation between Marks in Science and Marks in Mathematics.

                                              Marks in Science   Marks in Mathematics
Marks in Science       Pearson Correlation           1                 -0.054
                       Sig. (2-tailed)                                  .444
                       N                            200                 200
Marks in Mathematics   Pearson Correlation        -0.054                  1
                       Sig. (2-tailed)             .444
                       N                            200                 200

The table shows the degree of correlation between the marks in science and mathematics.
The Pearson r-value is -0.054, which lies between 0 and -0.25, and the p-value is 0.444. This
means there is a very weak, statistically non-significant correlation between the variables; that
is, marks in science are not meaningfully related to the marks in mathematics of the
respondents.


Table: 7

Correlation between Distance of Home from School and Weight of Students.

                                                    Distance of home    Weight of
                                                      from school       Participant
Distance of home from school   Pearson Correlation         1              -0.008
                               Sig. (2-tailed)                             .912
                               N                          200              200
Weight of Participant          Pearson Correlation      -0.008               1
                               Sig. (2-tailed)           .912
                               N                          200              200

The table shows the degree of correlation between distance of home from school and
weight of participants. The Pearson r-value is -0.008, which lies between 0 and -0.25, and the
p-value is 0.912. This means there is practically no correlation between the variables; that is,
distance of home from school is not related to the weight of participants.


Table:8

Regression of Marks in Science on Marks in Mathematics

Model                      Unstandardized Coefficients   Standardized Coefficients      t       Sig.
                               B         Std. Error               Beta
1  (Constant)                68.137        4.162                                      16.370    .000
   Marks in Mathematics      -0.046         .060                 -0.054               -0.767    .444

a. Dependent Variable: Marks in Science

The table above provides a regression model of marks in science on marks in mathematics.
The model is not statistically significant, F(1, 198) = 0.589, p = 0.444 > 0.05, with R2 = 0.003
and coefficient = -0.046. The R-square value indicates that only 0.3% of the variability of marks
in science can be explained by the variability of marks in mathematics.

We have regression equation:

Y = a + bX

Score in Science = 68.137 - 0.046*(Score in Mathematics)

This regression model shows a very small negative, and statistically non-significant,
relationship between the independent and dependent variables. That is, if the score in
mathematics increases by 1 mark, the predicted score in science changes by -0.046 marks, or if
the score in mathematics increases by 10 marks, the predicted score in science decreases by
0.46 marks.
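The regression equation can be used to compute predicted science marks. The R sketch below plugs in the coefficients reported in Table 8 and checks R-square against the Pearson correlation from the R output in Task III; the three mathematics marks in `maths` are hypothetical inputs chosen for illustration.

```r
# Coefficients reported in Table 8 (dependent variable: marks in science)
intercept <- 68.137
slope     <- -0.046

# Predicted science marks for a few hypothetical mathematics marks
maths <- c(55, 69.3, 86)
predicted_science <- intercept + slope * maths
print(round(predicted_science, 2))

# In simple linear regression, R-squared equals the squared Pearson correlation
r <- -0.05445055            # value reported by cor() in Task III
r_squared <- r^2
print(round(r_squared, 3))  # 0.003
```

Across the whole observed range of mathematics marks (55 to 86), the predicted science mark moves by less than 1.5 marks, which is consistent with an R-square of about 0.003.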


Table:9

Regression of Distance of Home from School on Weight of Participants.

Model                      Unstandardized Coefficients   Standardized Coefficients      t       Sig.
                               B         Std. Error               Beta
1  (Constant)                 5.133        1.561                                       9.397    .000
   Weight of Participant     -0.002         .041                  .067                  .940    .912

a. Dependent Variable: Distance of home from school

The table above provides a regression model of distance of home from school on weight of
participants. The model is not statistically significant, F(1, 198) = 0.012, p = 0.912 > 0.05, with
adjusted R2 = -0.005 and coefficient = -0.002. The R-square value is close to zero, so
practically none of the variability of distance of home from school can be explained by the
variability of weight of participants.

We have regression equation:

Y = a + bX

Distance of home from school = 5.133 - 0.002*(Weight of Participants)

This regression model shows a negligible negative, and statistically non-significant,
relationship between the independent and dependent variables. That is, if the weight of a
participant increases by 1 kg, the predicted distance of home from school changes by -0.002, or
if the weight increases by 10 kg, the predicted distance decreases by 0.02.

Inferential Statistics

• Perform two one sample t-test analysis of any applicable variables.

One-sampled t-Test of weight of student

Null Hypothesis (H0): The mean weight of students is 40 ( μ= 40).

Alternative Hypothesis (H1): The mean weight of student is not 40 ( μ ≠ 40).

Table

One sampled t-Test of weight of student

t df Sig. (2-tailed) Mean Difference

Weight of student -0.937 199 0.350 -.405

From the table, we have t(199) = -0.937, p-value = 0.350 > α-value = 0.05. In this case,
we fail to reject the null hypothesis and conclude that the mean weight of the students is not
significantly different from 40. The actual mean weight of the students is 39.6.
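The t statistic of a one-sample t-test can be reproduced from summary statistics alone. The R sketch below uses the test value (40) and mean difference (-0.405) from the table together with the standard deviation of weight from the R output in Task III (6.110455); the object names are illustrative.

```r
# Summary statistics for weight of student
n      <- 200
mu0    <- 40              # hypothesised mean (test value)
mean_w <- mu0 - 0.405     # sample mean implied by the reported mean difference
sd_w   <- 6.110455        # sd(Weight) from the R output in Task III

# t = (sample mean - test value) / standard error of the mean
t_stat  <- (mean_w - mu0) / (sd_w / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

print(round(t_stat, 3))    # -0.937
print(round(p_value, 2))   # 0.35
```

Because the p-value is above 0.05, the sample mean of about 39.6 is not significantly different from 40.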

One-sampled t-Test of distance of home from school

Null Hypothesis (H0): The mean distance of home from school is 5 ( μ= 5).

Alternative Hypothesis (H1): The mean distance of home from school is not 5 ( μ ≠ 5).

Table

One sampled t-Test of Distance of Home from school

                                    t       df    Sig. (2-tailed)   Mean Difference
Distance of home from school      0.156    199         0.876              0.02

From the table, we have t(199) = 0.156, p-value = 0.876 > α-value = 0.05. In this case, we
fail to reject the null hypothesis and conclude that the mean distance of home from school is not
significantly different from 5 km. The actual mean distance is 5.02.

• Perform two independent sample t-test analysis of any applicable variables.

Independent Sample t-Test of Marks in Mathematics based writing Hand.

Null Hypothesis (H0): There is no significant difference between the mean scores of respondents

in mathematics based on writing hand.

Alternative Hypothesis (H1): There is significant difference between the mean scores of

respondents in mathematics based on writing hand.

Table 4

Independent Sample t-Test of Score in Mathematics Based on Writing Hand

                                                     Sig.     t        df      Sig. (2-tailed)   Mean Difference
marks in Mathematics   Equal variances assumed       .141    .825    198           .411               .690
                       Equal variances not assumed           .825    192.350       .411               .690

The results of the t-test showed that t(198) = 0.825, p-value = 0.411 > α-value = 0.05. In

this case, we must fail to reject the null hypothesis. The conclusion is that there is no significant
difference between the mean marks of respondents in mathematics based on their writing hand.

The mean mark of respondents in mathematics who are right-handed is 69.64 and that of

respondents who are left-handed is 68.95. These values are almost equal.

Independent Sample t-Test of weight of student based on gender

Null Hypothesis (H0): There is no significant difference in the mean weight of respondents based

on their gender.

Alternative Hypothesis (H1): There is a significant difference in the mean weight of respondents

based on their gender.

Table 4

Independent Sample t-Test of Weight of Student Based on Gender

                                                  Sig.     t        df      Sig. (2-tailed)   Mean Difference
Weight of student   Equal variances assumed       .925    .658    198           .511               .574
                    Equal variances not assumed           .659    185.575       .511               .574

The results of the t-test showed that t(198) = 0.658, p-value = 0.511 > α-value = 0.05. In

this case, we must fail to reject the null hypothesis. The conclusion is that there is no significant

difference between the mean weight of respondents based on their gender. The mean weight of

respondents who are male is 39.92 and that of respondents who are female is 39.35. These values

are almost equal.


• Insert 50/50 data for pre-test and posttest by creating a new file in SPSS and do Paired

sampled t-test analysis.

Null Hypothesis (Ho): There is no difference between the mean score of pretest and posttest.

Alternative Hypothesis (Ha): There is difference between the mean scores of pretests and

posttest.

Table 1

Mean Difference Tests in Pretest and Posttest.

Parameter                                  t       df    Sig. (Two-tailed)
Pair 1   Pretest Marks - Posttest Marks   -4.993   49         0.00

From the table, we have t(49) = -4.993, p < 0.001, which is below the α-value of 0.05. We
therefore reject the null hypothesis, accept the alternative hypothesis, and conclude that there is
a significant difference between the mean scores in the pretest and posttest. From the
descriptive table, the mean scores in the pretest and posttest are 60.94 and 71.42 respectively, so
the mean posttest score is 10.48 higher than the mean pretest score.
• Perform F-test (ANOVA) analysis of any two applicable variables.

Table 1

ANOVA of Distance of Home from School Based on Occupation of Parents

Parameters         df      F       Sig.
Between Groups      2    0.164    0.849
Within Groups     197
Total             199

From the table we have F(2, 197) = 0.164, p-value = 0.849 > α-value = 0.05. In this case,
we fail to reject the null hypothesis and conclude that there are no significant differences in
distance of home from school based on the occupation of parents, i.e., the occupation of parents
has no effect on the distance of home from school. We can see the descriptive values in the
following table.

                     N     Mean    Std. Deviation
Government          90     5.02        1.767
Private             70     4.94        1.948
Physical worker     40     5.15        1.718
Total              200     5.02        1.816

From the table we have the descriptive values: Government job (N = 90, M = 5.02,
SD = 1.767), Private (N = 70, M = 4.94, SD = 1.948), and Physical worker (N = 40, M = 5.15,
SD = 1.718). The data show that each category has an almost equal mean distance of home
from school. This leads to the conclusion that the occupation of parents has no effect on the
distance of home from school.
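The F statistic of a one-way ANOVA can be approximated from the group sizes, means and standard deviations in the descriptive table. The R sketch below is only a rough check: because the group means are rounded to two decimals, its result is close to, but not exactly, the SPSS value of 0.164. The object names are illustrative.

```r
# Group summaries from the descriptive table (rounded values)
n <- c(Government = 90, Private = 70, Physical = 40)
m <- c(5.02, 4.94, 5.15)      # group means
s <- c(1.767, 1.948, 1.718)   # group standard deviations

grand_mean <- sum(n * m) / sum(n)

# Between-groups and within-groups sums of squares
ss_between <- sum(n * (m - grand_mean)^2)
ss_within  <- sum((n - 1) * s^2)

df_between <- length(n) - 1
df_within  <- sum(n) - length(n)

# F is the ratio of the two mean squares
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

print(round(f_stat, 2))
print(round(p_value, 2))
```

The between-groups mean square is much smaller than the within-groups mean square, which is why F is far below 1 and the p-value is large.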

• Perform Goodness of fit test analysis of one variable.

Ho: There is no significant difference between the observed and expected gender of respondents

Ha: There is significant difference between the observed and expected gender of respondents

Table 1

Observation of Gender of Student

                Gender of student
Chi-Square           3.380a
df                   1
Asymp. Sig.          .066
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell
frequency is 100.

From the above table, we have χ2(1) = 3.380, p-value = 0.066 > α-value = 0.05. In this case,
we retain the null hypothesis and conclude that there is no significant difference between the
observed and expected frequencies of male and female respondents. The observed frequencies
of male and female respondents are 87 and 113 respectively.

Table 2

Frequencies Table of Gender of respondent

           Observed N   Expected N   Residual
Male            87         100.0       -13.0
Female         113         100.0        13.0
Total          200
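The goodness of fit statistic can be recomputed in R. The sketch below assumes observed counts of 87 males and 113 females, which is what residuals of -13 and 13 around an expected count of 100 imply; the object names are illustrative.

```r
# Observed gender counts implied by the residuals in the frequency table
observed <- c(Male = 87, Female = 113)

# With a vector of counts, chisq.test() tests against equal expected proportions
gof <- chisq.test(observed)

print(gof$expected)              # 100 100
print(round(gof$statistic, 2))   # 3.38
print(round(gof$p.value, 3))     # 0.066
```

The statistic is the sum of squared residuals divided by the expected counts, here 169/100 + 169/100 = 3.38.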

• Perform Chi-square independence test analysis (one variable).

Null Hypothesis (Ho): There is no significant association between the gender and Writing hand

of respondent

Alternative Hypothesis (Ha): There is significant association between the gender and Writing

hand of respondent

Table 1
Test of Independence of Association between Gender and Writing hand of respondent

                       Value     df    Asymptotic Significance (2-sided)
Pearson Chi-Square     0.183a     1              0.669
N of Valid Cases       200

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 43.5.

b. Computed only for a 2x2 table

From the above table, we have χ2(1) = 0.183, p-value = 0.669 > α-value = 0.05. Therefore, we
fail to reject the null hypothesis and conclude that there is no association between the
variables. That is, the observed frequencies are close enough to the expected frequencies that
we cannot say there is a significant difference between them. The data do not support the claim
that there is a relationship between gender and writing hand.

Table 2

Frequency Table of Gender*Writing hand

                                              Writing hand
                                        Right handed   Left hand    Total
Gender of Respondent  Male    Count          45            42          87
                              Expected Count 43.5          43.5        87.0
                      Female  Count          55            58         113
                              Expected Count 56.5          56.5       113.0
Total                         Count         100           100         200
                              Expected Count 100.0        100.0       200.0

From the above table, male respondents who are right-handed (Expected = 43.5,
Observed = 45) and left-handed (Expected = 43.5, Observed = 42), and female respondents who
are right-handed (Expected = 56.5, Observed = 55) and left-handed (Expected = 56.5,
Observed = 58), all have observed counts close to the expected counts. The data show that
there is no association between the variables.
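The Pearson chi-square for independence can be recomputed from the observed counts in the frequency table. In the R sketch below, `correct = FALSE` switches off Yates' continuity correction so that the result is the plain Pearson statistic; the object names are illustrative.

```r
# Observed counts from the Gender * Writing hand frequency table
counts <- matrix(
  c(45, 42,
    55, 58),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Male", "Female"),
                  Hand   = c("Right", "Left"))
)

# Pearson chi-square test of independence without continuity correction
indep <- chisq.test(counts, correct = FALSE)

print(indep$expected)              # 43.5 for males, 56.5 for females
print(round(indep$statistic, 3))   # 0.183
print(round(indep$p.value, 3))     # 0.669
```

Every observed count is within 1.5 of its expected count, so the statistic is very small and the p-value is far above 0.05.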

• Perform correlation analysis (Two variable).

Null Hypothesis (Ho): There is no significant correlation between marks in mathematics and
marks in science.

Alternative Hypothesis (H1): There is a significant correlation between marks in mathematics
and marks in science.

Table 1

Correlation analysis of marks in mathematics and science.


marks in Marks in
Mathematics science
marks in Mathematics Pearson Correlation 1 -.054
Sig. (2-tailed) .444
N 200 200
Marks in science Pearson Correlation -.054 1
Sig. (2-tailed) .444
N 200 200

From the table, the p-value for the correlation coefficient is 0.444, which is greater than the

significance level of 0.05. Therefore, we fail to reject the null hypothesis and conclude that there

is no significant correlation between marks in mathematics and marks in science.
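The p-value of a Pearson correlation comes from a t statistic with n - 2 degrees of freedom. The R sketch below recomputes it from the correlation and sample size alone, using the full-precision value of r printed by `cor()` in Task III; the object names are illustrative.

```r
# Pearson correlation and sample size
r <- -0.05445055   # full-precision value from cor() in Task III
n <- 200

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(t_stat, 3))    # -0.767
print(round(p_value, 3))   # 0.444
```

These values agree with the Sig. (2-tailed) entry of .444 in the correlation table.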

• Perform Mann Whitney U Test of two variables if you have data which is not normally
distributed (Optional)

All the scale variables are approximately normally distributed, so there is no need for the Mann
Whitney U test.

Task – III

• By using the data created in task-II, do the following tasks on R-Studio.

• Descriptive Statistics

• Central Tendencies and Dispersions

Central Tendencies

Gender Writting_Hand Caste Occupation_OF_Parrent

Length:200 Length:200 Length:200 Length:200

Class :character Class :character Class :character Class :character


Mode :character Mode :character Mode :character Mode :character

Weight Distance_of_home_from_school Marks_in_Mathematics Marks_in_science

Min. :23.00 Min. : 0.00 Min. :55.0 Min. :53.00

1st Qu.:35.00 1st Qu.: 4.00 1st Qu.:65.0 1st Qu.:61.00

Median :40.00 Median : 5.00 Median :69.5 Median :65.00

Mean :39.59 Mean : 5.02 Mean :69.3 Mean :64.95

3rd Qu.:44.00 3rd Qu.: 6.00 3rd Qu.:74.0 3rd Qu.:68.00

Max. :57.00 Max. :10.00 Max. :86.0 Max. :82.00

Correlation

#correlation

> cor(Marks_in_Mathematics,Marks_in_science)

[1] -0.05445055

> cor(Weight,Distance_of_home_from_school)

[1] -0.00787271

> cor(Distance_of_home_from_school,Marks_in_Mathematics)

[1] 0.0640648

> cor(Distance_of_home_from_school,Marks_in_science)

[1] 0.01897551
> cor(Weight,Marks_in_Mathematics)

[1] -0.03757749

> cor(Weight,Marks_in_science)

[1] -0.008848629

Measure Of Dispersion

Range

> range(Marks_in_Mathematics)

[1] 55 86

> range(Marks_in_science)

[1] 53 82

> range(Weight)

[1] 23 57

> range(Distance_of_home_from_school)

[1] 0 10

Variance

> var(Weight)

[1] 37.33766

> var(Distance_of_home_from_school)

[1] 3.29608

> var(Marks_in_Mathematics)

[1] 34.94269
> var(Marks_in_science)

[1] 24.85726


Median Absolute Deviation

> mad(Weight)

[1] 5.9304

> mad(Marks_in_Mathematics)

[1] 6.6717

> mad(Marks_in_science)

[1] 5.9304

> mad(Distance_of_home_from_school)

[1] 1.4826
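
One caution when reading these values: R's `mad()` returns the median absolute deviation (multiplied by the constant 1.4826 by default), not the mean deviation about the mean. The sketch below uses a small made-up vector `x` (an assumption for illustration, not the project data) to show how the two statistics differ.

```r
# Made-up illustrative values; NOT the project data
x <- c(2, 4, 4, 5, 10)

# mad(): median of |x - median(x)|, multiplied by 1.4826 by default
median_abs_dev <- mad(x)

# Same statistic without the scaling constant
median_abs_dev_raw <- mad(x, constant = 1)

# Mean (absolute) deviation about the mean, computed by hand
mean_dev <- mean(abs(x - mean(x)))

print(c(mad = median_abs_dev, mad_raw = median_abs_dev_raw, mean_dev = mean_dev))
```

For the project variables, the mean deviation could be obtained in the same way, for example with `mean(abs(Weight - mean(Weight)))`.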

Standard Deviation

> sd(Weight)

[1] 6.110455

> sd(Marks_in_Mathematics)

[1] 5.911234

> sd(Marks_in_science)

[1] 4.985706

> sd(Distance_of_home_from_school)

[1] 1.815511
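
As a quick consistency check, each standard deviation should equal the square root of the corresponding variance. The sketch below uses only the values printed in this section; the vector names are chosen here for illustration.

```r
# Variances and standard deviations as reported in the output of this section
reported_var <- c(Weight = 37.33766, Distance = 3.29608,
                  Mathematics = 34.94269, Science = 24.85726)
reported_sd  <- c(Weight = 6.110455, Distance = 1.815511,
                  Mathematics = 5.911234, Science = 4.985706)

# The standard deviation is the square root of the variance
recomputed_sd <- sqrt(reported_var)
print(round(recomputed_sd, 5))

# Largest disagreement between recomputed and reported values (rounding only)
max_gap <- max(abs(recomputed_sd - reported_sd))
print(max_gap)
```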
• Plots (1/1) (Scatter, Bar Plot, Violin Plot, Histogram, Line Plot, Box Plot, etc.)

Scatter Plot

Figure 1

Scatter Plot of Marks in Science

Figure 2

Bar Plot of Marks in Mathematics

Figure 3

Histogram of Marks in Mathematics

Figure 4

Line Graph

Figure 5

Box Plot of Marks in Science

Figure 6

Clustered Box Plot of Marks in Mathematics by Caste

Figure 7

Pie Chart of Caste of Students

Figure 8

Line Plot of Weight and Distance of Home from School

Violin Plot
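
The commands behind these figures are not shown. As a rough guide, plots of these kinds can be drawn with base R graphics as sketched below. The data frame `demo` is simulated stand-in data and the group labels are hypothetical (both are assumptions, not the project data). A violin plot is left out because base R graphics has no violin function; it needs an add-on package.

```r
set.seed(1)
# Simulated stand-in data; NOT the project's data set
demo <- data.frame(
  Marks_in_Mathematics = round(rnorm(200, mean = 69, sd = 6)),
  Marks_in_science     = round(rnorm(200, mean = 65, sd = 5)),
  Caste                = sample(c("Group A", "Group B", "Group C"), 200, replace = TRUE)
)

# Scatter plot of two numeric variables
plot(demo$Marks_in_Mathematics, demo$Marks_in_science,
     xlab = "Marks in Mathematics", ylab = "Marks in Science",
     main = "Scatter Plot")

# Histogram of one numeric variable
h <- hist(demo$Marks_in_Mathematics,
          xlab = "Marks in Mathematics", main = "Histogram")

# Box plot of a numeric variable split by a grouping variable
b <- boxplot(Marks_in_Mathematics ~ Caste, data = demo,
             main = "Box Plot by Group")

# Bar plot and pie chart of group frequencies
caste_counts <- table(demo$Caste)
barplot(caste_counts, main = "Bar Plot of Group Counts")
pie(caste_counts, main = "Pie Chart of Groups")
```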
• Correlation Analysis

Pearson's product-moment correlation

data: Marks_in_Mathematics and Marks_in_science

t = -0.76733, df = 198, p-value = 0.4438

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

-0.19174303 0.08493209

sample estimates:

cor

-0.05445055
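
The t statistic and p-value in this output follow directly from the sample correlation and the sample size. The sketch below recomputes them from the printed values, assuming the usual test of a zero population correlation with n - 2 degrees of freedom; the variable names are chosen here for illustration.

```r
# Values taken from the cor.test() output above
r <- -0.05445055   # sample correlation
n <- 200           # number of students

df     <- n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)     # test statistic for H0: correlation = 0
p_val  <- 2 * pt(-abs(t_stat), df = df)    # two-sided p-value

print(c(t = t_stat, df = df, p = p_val))
```

Because the p-value is well above 0.05 and the confidence interval contains 0, the output gives no evidence of a linear relationship between the two sets of marks.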

Pearson's product-moment correlation

data: Marks_in_Mathematics and Weight

t = -0.52914, df = 198, p-value = 0.5973

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

-0.1754040 0.1016937

sample estimates:

cor

-0.03757749

Pearson's product-moment correlation


data: Marks_in_science and Weight

t = -0.12452, df = 198, p-value = 0.901

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

-0.1474087 0.1300520

sample estimates:

cor

-0.008848629

➢ Inferential Statistics (t-Test, ANOVA, ANOVA with Post Hoc Test)

t-test

One Sample t-test

data: Marks_in_Mathematics

t = 10.275, df = 199, p-value < 2.2e-16

alternative hypothesis: true mean is greater than 65

95 percent confidence interval:

68.60426 Inf

sample estimates:

mean of x

69.295
> t.test(Weight, mu = 55, conf.level = 0.95, alt = "greater")

One Sample t-test

data: Weight

t = -35.654, df = 199, p-value = 1

alternative hypothesis: true mean is greater than 55

95 percent confidence interval:

38.88098 Inf

sample estimates:

mean of x

39.595

> t.test(Distance_of_home_from_school, mu = 4, conf.level = 0.95, alt = "less")

One Sample t-test

data: Distance_of_home_from_school

t = 7.9454, df = 199, p-value = 1

alternative hypothesis: true mean is less than 4

95 percent confidence interval:

-Inf 5.232147
sample estimates:

mean of x

5.02

> t.test(Marks_in_Mathematics, mu = 55, conf.level = 0.95, alt = "less")

One Sample t-test

data: Marks_in_Mathematics

t = 34.2, df = 199, p-value = 1

alternative hypothesis: true mean is less than 55

95 percent confidence interval:

-Inf 69.98574

sample estimates:

mean of x

69.295

> t.test(Marks_in_Mathematics, mu = 55, conf.level = 0.95, alt = "two.sided")

One Sample t-test

data: Marks_in_Mathematics

t = 34.2, df = 199, p-value < 2.2e-16


alternative hypothesis: true mean is not equal to 55

95 percent confidence interval:

68.47075 70.11925

sample estimates:

mean of x

69.295

> t.test(Marks_in_science, mu = 55, conf.level = 0.95, alt = "two.sided")

One Sample t-test

data: Marks_in_science

t = 28.238, df = 199, p-value < 2.2e-16

alternative hypothesis: true mean is not equal to 55

95 percent confidence interval:

64.2598 65.6502

sample estimates:

mean of x

64.955
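
To see where these t statistics and p-values come from, the sketch below rebuilds two of the one-sided tests from the means and standard deviations reported earlier. The helper `one_sample_t` is a hypothetical name introduced here, not part of R. A printed p-value of 1 is a rounded value: it arises when the sample mean lies far on the opposite side of `mu` from the direction stated in the alternative hypothesis.

```r
# Hypothetical helper: one-sample t-test rebuilt from summary statistics
one_sample_t <- function(mean_x, sd_x, n, mu,
                         alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  se     <- sd_x / sqrt(n)            # standard error of the mean
  t_stat <- (mean_x - mu) / se
  df     <- n - 1
  p_val  <- switch(alternative,
    greater   = pt(t_stat, df, lower.tail = FALSE),
    less      = pt(t_stat, df),
    two.sided = 2 * pt(-abs(t_stat), df)
  )
  c(t = t_stat, df = df, p = p_val)
}

# Means and standard deviations as reported earlier in this section
maths_greater  <- one_sample_t(69.295, 5.911234, 200, mu = 65, alternative = "greater")
weight_greater <- one_sample_t(39.595, 6.110455, 200, mu = 55, alternative = "greater")

print(round(maths_greater, 3))
print(round(weight_greater, 3))
```

In the first case the mean mark in mathematics is significantly greater than 65; in the second the mean weight is far below 55, so the one-sided "greater" alternative is not supported.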

Independent sample t-test

Welch Two Sample t-test


data: Marks_in_Mathematics by Writting_Hand

t = -0.82472, df = 192.35, p-value = 0.4106

alternative hypothesis: true difference in means between group Left hand and group Right handed is not equal to 0

99 percent confidence interval:

-2.866647 1.486647

sample estimates:

mean in group Left hand mean in group Right handed

68.95 69.64

Welch Two Sample t-test

data: Distance_of_home_from_school by Writting_Hand

t = 1.0911, df = 196.8, p-value = 0.2766

alternative hypothesis: true difference in means between group Left hand and group Right handed is not equal to 0

99 percent confidence interval:

-0.3875033 0.9475033

sample estimates:

mean in group Left hand mean in group Right handed


5.16 4.88
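
The commands for these two tests are not shown. The outputs report a 99 percent confidence interval, so presumably `conf.level = 0.99` was passed. The sketch below shows such a call on simulated stand-in data (the data frame `demo` and its column names are assumptions, not the project data). The p-value does not depend on `conf.level`, so it can still be compared with 0.05.

```r
set.seed(42)
# Simulated stand-in data; NOT the project's data set
demo <- data.frame(
  marks = c(rnorm(100, mean = 69, sd = 6), rnorm(100, mean = 70, sd = 6)),
  hand  = rep(c("Left", "Right"), each = 100)
)

# var.equal = FALSE (the default) gives the Welch test;
# conf.level only changes the width of the reported interval
res <- t.test(marks ~ hand, data = demo, conf.level = 0.99)

print(res$method)
print(res$conf.int)
print(res$p.value)
```

In both outputs above the p-value exceeds 0.05 and the interval contains 0, so the difference between left-handed and right-handed students is not statistically significant.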


ANOVA

> aov(Distance_of_home_from_school~Caste)

Call:

aov(formula = Distance_of_home_from_school ~ Caste)

Terms:

Caste Residuals

Sum of Squares 1.093 654.827

Deg. of Freedom 2 197

Residual standard error: 1.823183

Estimated effects may be unbalanced

aov(formula = Marks_in_Mathematics ~ Occupation_OF_Parrent)


Terms:

Occupation_OF_Parrent Residuals

Sum of Squares 18.968 6934.627

Deg. of Freedom 2 197

Residual standard error: 5.933056

Estimated effects may be unbalanced
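
Printing an `aov` object shows only the sums of squares and degrees of freedom; the F ratio and p-value are normally obtained by calling `summary()` on the fitted object. As a sketch, they can also be recomputed from the printed numbers. The helper `anova_f` is a hypothetical name introduced here.

```r
# Hypothetical helper: F ratio and p-value from sums of squares
anova_f <- function(ss_between, df_between, ss_within, df_within) {
  ms_between <- ss_between / df_between   # mean square between groups
  ms_within  <- ss_within / df_within     # mean square within groups (residual)
  f_ratio    <- ms_between / ms_within
  p_val      <- pf(f_ratio, df_between, df_within, lower.tail = FALSE)
  c(F = f_ratio, p = p_val)
}

# Sums of squares and degrees of freedom copied from the two outputs above
distance_by_caste   <- anova_f(1.093, 2, 654.827, 197)
marks_by_occupation <- anova_f(18.968, 2, 6934.627, 197)

print(round(distance_by_caste, 4))
print(round(marks_by_occupation, 4))
```

Both F ratios are well below 1, so neither analysis shows a significant difference between group means at the 0.05 level.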

ANOVA with Post Hoc Test

#One-way ANOVA

> aov(Marks_in_Mathematics~Caste)

Call:

aov(formula = Marks_in_Mathematics ~ Caste)

Terms:

Caste Residuals

Sum of Squares 18.968 6934.627

Deg. of Freedom 2 197

Residual standard error: 5.933056


Estimated effects may be unbalanced

> n = aov(Marks_in_Mathematics~Caste)

> TukeyHSD(n)

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = Marks_in_Mathematics ~ Caste)

$Caste

diff lwr upr p adj

other-Bhamin -0.8222222 -3.484778 1.840334 0.7464294

vaisya-Bhamin -0.1793651 -2.412259 2.053529 0.9803586

vaisya-other 0.6428571 -2.134270 3.419984 0.8483037

> plot(TukeyHSD(n))

> aov(Weight~Caste)

Call:

aov(formula = Weight ~ Caste)

Terms:
Caste Residuals

Sum of Squares 187.855 7242.340

Deg. of Freedom 2 197

Residual standard error: 6.063262

Estimated effects may be unbalanced

> n = aov(Weight~Caste)

> TukeyHSD(n)

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = Weight ~ Caste)

$Caste

diff lwr upr p adj

other-Bhamin 0.9305556 -1.79043288 3.651544 0.6987443

vaisya-Bhamin 2.1841270 -0.09776975 4.466024 0.0639137

vaisya-other 1.2535714 -1.58450180 4.091645 0.5506261

> plot(TukeyHSD(n))
