
Allama Iqbal Open University

Semester: Spring, 2022

Assignment # 2

Course: Educational Statistics

Code: 8614

Level: B.Ed

Registration No 21sks00051

Student Name: Saqib Khalid

Father Name: Muhammad Khalid


Q. 1 How is mode calculated? Also discuss its merits and demerits.

The mode is the most frequently occurring score in a distribution.

Consider the following data set: 25, 43, 39, 25, 82, 77, 25, 47.

The score 25 occurs most often, so it is the mode. Sometimes there may be no particular mode, if no single value occurs more often than any other. There might be one mode (unimodal), two modes (bimodal), three modes (trimodal), or more than three modes (multimodal). The mode is most useful when scores reflect a nominal scale of measurement, but alongside the mean and median it can also be used for ordinal, interval, or ratio data. It can be located graphically by drawing a histogram.

You can calculate the mode in several different ways, including:

By hand

To find the mode by hand, place the numbers in ascending or descending order, then count how many times each number appears. The number that appears most often is the mode.

Example: 55, 4, 28, 44, 32, 55, 32, 45, 48, 6, 44, 28, 14, 23, 12, 32, 44

In order, these numbers are: 4, 6, 12, 14, 23, 28, 28, 32, 32, 32, 44, 44, 44, 45, 48, 55, 55

It’s now easy to see which numbers appear most often. In this case, the data set is bimodal and has two modes: 32 and 44. The number 55 appears twice, but 32 and 44 each appear three times.
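The same result can be checked programmatically. As a minimal sketch (assuming Python 3.8 or later, whose standard statistics module provides multimode), the following code finds all modes of the example data set:

import statistics

data = [55, 4, 28, 44, 32, 55, 32, 45, 48, 6, 44, 28, 14, 23, 12, 32, 44]

# multimode() returns every value tied for the highest frequency,
# so bimodal or multimodal data sets are handled correctly.
modes = statistics.multimode(data)
print(modes)  # [44, 32] -- both 32 and 44 appear three times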

Using a calculator

An online calculator with a mode function can also be used:

Do an online search for "mode calculator."

Input all the numbers in your data set, following the calculator's instructions. You may need to add a comma between numbers or put each number on a separate line.

Select "Calculate."

With Excel

Below are the steps to calculate the mode using Excel:

Enter each number of your data set into a separate cell in a single column.

Highlight the column.

Click the "Sort" button on the toolbar.

Select either "Sort A to Z" to sort your numbers from smallest to largest or "Sort Z to A" to sort from largest to smallest. You can determine the mode manually at this stage.

Select the cell where you want the mode to display. This cannot be a cell that contains a number from your data set.

In the function box, type "=MODE(first cell:last cell)" using your specific beginning and ending cell labels, for example =MODE(B2:B32).

Press Enter. The mode should appear in the selected cell.

Merits of Mode

i) It is easy to understand and easy to calculate.

ii) It is not affected by extreme values.

iii) Even if the extreme values are not known mode can be calculated.

iv) It can be located just by inspection in many cases.

v) It can be located graphically.

vi) It is always present in the data.

vii) It is applicable for both quantitative and qualitative data.

viii) It is useful for methodological forecasts.

The mode or modal value of a distribution is that value of the variable for which the frequency is maximum; in other words, the number that is repeated the greatest number of times is the mode.

1) It is readily comprehensible and easy to compute. In some cases it can be computed merely by inspection.

2) It is not affected by extreme values. It can be obtained even if the extreme values are not

known.

3) Mode can be determined in distributions with open classes.

4) Mode can be located on the graph also.

5) If all the data are not given then also mode can be calculated.

6) It is easy to understand.

7) It is the most observed data point.

8) Sometimes we get more than one mode and sometimes there is no mode; this itself tells us something about the characteristics of the data.

Demerits of Mode

i) It is not rigidly defined.

ii) It is not based upon all values of the given data.

iii) It is not capable of further mathematical calculation.

iv) There will be no mode if there is no common value in the data.

v) It cannot be used for further methodological processing.

1) It is ill-defined. It is not always possible to find a clearly defined mode. In some cases, we may come across distributions with two modes; such distributions are called bimodal. If a distribution has more than two modes, it is said to be multimodal.

2) It is not based upon all the observations.

3) The mode can be calculated by various formulae, and the value may differ from one formula to another. Therefore, it is not rigidly defined.

4) It is affected to a greater extent by fluctuations of sampling.

Uses

1) The mode is used by manufacturers of ready-made garments, shoes, and other accessories in common use. Garment manufacturers produce more of the sizes that are worn by most people than of the other sizes. Similarly, shoe makers produce the size worn by the majority of people in the largest quantity and the other sizes in smaller quantities. In general, the mode is used when the data are categorized.

Reference

Anderson, T. W., & Sclove, S. L. (1974). Introductory Statistical Analysis. Boston: Houghton Mifflin Company.

Q.2 Discuss t-test and its application in educational research?

T-Test

A t-test is a statistical technique used for comparing the mean values of two data sets obtained from two groups. The comparison tells us whether these data sets are different from each other, how significant the differences are, and whether these differences could have happened by chance. The statistical significance of the t-test indicates whether or not the difference between the means of the two groups most likely reflects a real difference in the population from which the groups were selected. t-tests are used when there are two groups (e.g., male and female) or two sets of data (e.g., before and after), and the researcher wishes to compare the mean score on some continuous variable.

T-test applications

i) The t-test is used to compare the means of two samples, dependent or independent.

ii) It can also be used to determine whether a sample mean is different from an assumed population mean.

iii) The t-test also has an application in determining the confidence interval for a sample mean.

A flowchart can be used to determine which t-test to use based on the characteristics of the sample sets. The key items to consider include the similarity of the sample records (whether the samples are related or independent), the number of data records in each sample set, and the variance of each sample set.
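As a brief illustration (a hedged sketch only: the group labels and scores below are invented for demonstration, and the scipy package is assumed to be installed), an independent-samples t-test can be run in Python as follows:

from scipy import stats

# Hypothetical achievement scores for two independent groups
group_a = [72, 68, 75, 80, 66, 71, 78, 74]
group_b = [70, 77, 82, 79, 73, 81, 76, 84]

# Independent-samples t-test; equal_var=False gives Welch's t-test,
# which does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_stat, p_value)  # a p-value below 0.05 would suggest a real mean difference

For dependent (before-and-after) data, stats.ttest_rel would be used instead.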

Reference
Gravetter, F. J., & Wallnau, L. B. (2002). Essentials of Statistics for the Behavioral Sciences (4th ed.). California, USA: Wadsworth.

Q.3 Why do we use regression analysis? Where is it applied?

Regression Analysis

Regression analysis is used when you want to predict a continuous dependent variable from a

number of independent variables. If the dependent variable is dichotomous, then logistic

regression should be used. (If the split between the two levels of the dependent variable is

close to 50-50, then both logistic and linear regression will end up giving you similar results.)

The independent variables used in regression can be either continuous or dichotomous.

Independent variables with more than two levels can also be used in regression analyses, but

they first must be converted into variables that have only two levels. This is called dummy

coding and will be discussed later. Usually, regression analysis is used with naturally-

occurring variables, as opposed to experimentally manipulated variables, although you can

use regression with experimentally manipulated variables. One point to keep in mind with

regression analysis is that causal relationships among the variables cannot be determined.

While the terminology is such that we say that X "predicts" Y, we cannot say that X "causes"

Y.
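To make the idea of prediction concrete, here is a minimal sketch of a simple linear regression in Python (the hours-of-study and exam-score values are invented purely for illustration, and the statsmodels package is assumed to be available):

import numpy as np
import statsmodels.api as sm

# Invented example: hours of study (IV) predicting exam score (DV)
hours = np.array([2, 4, 5, 7, 8, 10, 12, 14])
score = np.array([52, 58, 60, 65, 70, 74, 80, 85])

X = sm.add_constant(hours)        # adds the intercept term
model = sm.OLS(score, X).fit()    # ordinary least squares regression
print(model.params)               # intercept and slope
print(model.rsquared)             # proportion of variance in score explained by hours

Even here, a strong fit would only show that hours of study predict scores, not that they cause them.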

Application of Regression

Number of cases

When doing regression, the cases-to-Independent Variables (IVs) ratio should ideally be

20:1; that is 20 cases for every IV in the model. The lowest your ratio should be is 5:1 (i.e., 5

cases for every IV in the model).


Accuracy of data

If you have entered the data (rather than using an established dataset), it is a good idea to

check the accuracy of the data entry. If you don't want to re-check each data point, you

should at least check the minimum and maximum value for each variable to ensure that all

values for each variable are "valid." For example, a variable that is measured using a 1 to 5

scale should not have a value of 8.

Missing data

You also want to look for missing data. If specific variables have a lot of missing values, you

may decide not to include those variables in your analyses. If only a few cases have any

missing values, then you might want to delete those cases. If there are missing values for

several cases on different variables, then you probably don't want to delete those cases

(because a lot of your data will be lost). If there is not too much missing data, and there does

not seem to be any pattern in terms of what is missing, then you don't really need to worry.

Just run your regression, and any cases that do not have values for the variables used in that

regression will not be included. Although tempting, do not assume that there is no pattern;

check for this. To do this, separate the dataset into two groups: those cases missing values for

a certain variable, and those not missing a value for that variable. Using t-tests, you can

determine if the two groups differ on other variables included in the sample. For example,

you might find that the cases that are missing values for the "salary" variable are younger

than those cases that have values for salary. You would want to do t-tests for each variable

with a lot of missing values. If there is a systematic difference between the two groups (i.e.,

the group missing values vs. the group not missing values), then you would need to keep this

in mind when interpreting your findings and not overgeneralize.

After examining your data, you may decide that you want to replace the missing values with

some other value. The easiest thing to use as the replacement value is the mean of this
variable. Some statistics programs have an option within regression where you can replace

the missing value with the mean. Alternatively, you may want to substitute a group mean

(e.g., the mean for females) rather than the overall mean.

The default option of statistics packages is to exclude cases that are missing values for any

variable that is included in regression. (But that case could be included in another regression,

as long as it was not missing values on any of the variables included in that analysis.) You

can change this option so that your regression analysis does not exclude cases that are

missing data for any variable included in the regression, but then you might have a different

number of cases for each variable.
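A minimal pandas sketch of these data-screening steps might look like the following (the variable names and values are hypothetical):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [23, 25, np.nan, 31, 29, 40],
    "salary": [30000, np.nan, 42000, np.nan, 38000, 50000],
})

# Check value ranges and count missing values for each variable
print(df.describe())        # min and max help spot invalid values
print(df.isna().sum())      # number of missing values in each column

# One simple option discussed above: replace missing values with the variable mean
df_imputed = df.fillna(df.mean())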

Outliers

You also need to check your data for outliers (i.e., an extreme value on a particular item). An

outlier is often operationally defined as a value that is at least 3 standard deviations above or

below the mean. If you feel that the cases that produced the outliers are not part of the same

"population" as the other cases, then you might just want to delete those cases. Alternatively,

you might want to count those extreme values as "missing," but retain the case for other

variables. Alternatively, you could retain the outlier, but reduce how extreme it is.

Specifically, you might want to recode the value so that it is the highest (or lowest) non-

outlier value.
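A rough sketch of the "3 standard deviations" rule described above is given below (the scores are randomly generated for illustration, with one extreme value added):

import numpy as np

rng = np.random.default_rng(0)
scores = np.append(rng.normal(loc=50, scale=10, size=100), 120)  # 120 is the extreme value

z = (scores - scores.mean()) / scores.std()
outlier_mask = np.abs(z) > 3           # at least 3 standard deviations from the mean
print(scores[outlier_mask])            # should flag the value of 120

# One option mentioned above: recode outliers to the highest non-outlier value
highest_non_outlier = scores[~outlier_mask].max()
capped = np.where(outlier_mask, highest_non_outlier, scores)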

Normality

You also want to check that your data is normally distributed. To do this, you can construct

histograms and "look" at the data to see its distribution. Often the histogram will include a

line that depicts what the shape would look like if the distribution were truly normal (and you

can "eyeball" how much the actual distribution deviates from this line).

You can also construct a normal probability plot. In this plot, the actual scores are ranked and

sorted, and an expected normal value is computed and compared with an actual normal value
for each case. The expected normal value is the position a case with that rank holds in a

normal distribution. The normal value is the position it holds in the actual distribution.

Basically, you would like to see your actual values lining up along the diagonal that goes

from lower left to upper right.

You can also test for normality within the regression analysis by looking at a plot of the

"residuals." Residuals are the difference between obtained and predicted DV scores.

(Residuals will be explained in more detail in a later section.) If the data are normally

distributed, then residuals should be normally distributed around each predicted DV score. If

the data (and the residuals) are normally distributed, the residuals scatterplot will show the

majority of residuals at the center of the plot for each value of the predicted score, with some

residuals trailing off symmetrically from the center. You might want to do the residual plot

before graphing each variable separately because if this residuals plot looks good, then you

don't need to do the separate plots. An example would be a residual plot of a regression where age of patient and time (in months since diagnosis) are used to predict breast tumor size.
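Using the invented regression data from the earlier sketch, the residual plot and normal probability plot described here could be produced as follows (matplotlib, scipy, and statsmodels are assumed to be available):

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

# Same invented data as the earlier regression sketch
hours = np.array([2, 4, 5, 7, 8, 10, 12, 14])
score = np.array([52, 58, 60, 65, 70, 74, 80, 85])
model = sm.OLS(score, sm.add_constant(hours)).fit()
residuals = model.resid

# Residuals vs. predicted values: look for a symmetric band centred on zero
plt.scatter(model.fittedvalues, residuals)
plt.axhline(0, color="grey")
plt.xlabel("Predicted score")
plt.ylabel("Residual")

# Normal probability (Q-Q) plot: points should fall roughly along the diagonal
stats.probplot(residuals, plot=plt.figure().gca())
plt.show()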

Reference

Argyrous, G. (2012). Statistics for Research, with a guide to SPSS. India: SAGE

Publications.

Q.4 Explain assumptions of applying One-way ANOVA and its procedure?

ANOVA

The t-tests have one very serious limitation: they are restricted to tests of the significance of the difference between only two groups. There are many times when we would like to see whether there are significant differences among three, four, or even more groups. For example, we may want to investigate which of three teaching methods is best for teaching ninth-class algebra. In such cases, we cannot use the t-test because more than two groups are involved. To deal with such cases, one of the most useful techniques in statistics is the analysis of variance (abbreviated as ANOVA). This technique was developed by the British statistician Ronald A. Fisher.

Analysis of Variance (ANOVA) is a hypothesis-testing procedure that is used to evaluate mean differences between two or more treatments (or populations). Like all other inferential procedures, ANOVA uses sample data as a basis for drawing general conclusions about populations. Sometimes it may appear that ANOVA and the t-test are two different ways of doing exactly the same thing: testing for mean differences. In some cases this is true; both tests use sample data to test hypotheses about population means. However, ANOVA has clear advantages over the t-test. t-tests are used when we have to compare only two groups or variables (one independent and one dependent). ANOVA, on the other hand, is used when we have two or more groups or treatment conditions to compare. Suppose we want to study the effects of three different models of teaching on the achievement of students. In this case we have three different samples to be treated using three different treatments, so ANOVA is the suitable technique to evaluate the difference.

Suppose we have data from an independent-measures experiment comparing learning performance under three temperature conditions. The scores are variable, and we want to measure the amount of variability (i.e., the size of the differences) to explain where it comes from. To measure the total variability, we combine all the scores from all the separate samples into one group and then obtain one general measure of variability for the complete experiment. Once we have measured the total variability, we can begin to break it into separate components. The word analysis means breaking into smaller parts. Because we are going to analyze the variability, the process is called analysis of variance (ANOVA). This analysis process divides the total variability into two basic components:

i) Between-Treatment Variance

Variance simply means difference, and calculating the variance is a process of measuring how big the differences are for a set of numbers. The between-treatment variance measures how much difference exists between the treatment conditions. In addition to measuring the differences between treatments, the overall goal of ANOVA is to evaluate those differences. Specifically, the purpose of the analysis is to distinguish between two alternative explanations.

a) The differences between the treatments have been caused by the treatment effects.

b) The differences between the treatments are simply due to chance.

Thus, there are always two possible explanations for the variance (difference) that exists between treatments:

1) Treatment Effect:

The differences are caused by the treatments. For example, the scores in sample 1 are obtained at a room temperature of 50° and those of sample 2 at 70°. It is possible that the difference between the samples is caused by the difference in room temperature.

2) Chance:

The differences are simply due to chance. If there is no treatment effect, even then we can

expect some difference between samples. The chance differences are unplanned and

unpredictable differences that are not caused or explained by any action of the researcher.

Researchers commonly identify two primary sources for chance differences.

Individual Differences

Each participant in the study has his or her own individual characteristics. Although it is reasonable

to expect that different subjects will produce different scores, it is impossible to predict

exactly what the difference will be.

Experimental Error

In any measurement there is a chance of some degree of error. Thus, if a researcher measures the same individuals twice under the same conditions, there is still a good chance of obtaining two
different measurements. Often these differences are unplanned and unpredictable, so they are

considered to be by chance.

Thus, when we calculate the between-treatment variance, we are measuring differences that could be due either to a treatment effect or simply to chance. In order to demonstrate that the difference is really a treatment effect, we must establish that the differences between treatments are bigger than would be expected by chance alone. To accomplish this goal, we determine how big the differences are when there is no treatment effect involved; that is, we measure how much difference (variance) occurs by chance. To measure these chance differences, we compute the variance within treatments.
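A rough numeric sketch of this partitioning (the three samples below are invented scores for three treatment conditions) shows how the total variability splits into between-treatment and within-treatment parts, and how the two are compared in an F-ratio:

import numpy as np

# Hypothetical learning scores under three treatment (temperature) conditions
groups = [np.array([1, 2, 0, 3, 4]),
          np.array([4, 6, 5, 3, 7]),
          np.array([1, 0, 2, 1, 1])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Between-treatment variability: how far each treatment mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-treatment variability: chance differences inside each treatment
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f_ratio)   # a large F suggests differences bigger than chance alone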

Assumptions Underlying the One Way ANOVA

There are three main assumptions

i) Assumption of Independence

According to this assumption the observations are random and independent samples from the

populations. The null hypothesis actually states that the samples come from populations that

have the same mean. The samples must be random and independent if they are to be

representative of the populations. The value of one observation is not related to any other

observation. In other words, one individual’s score should not provide any clue as to how any

of the other individual should score. That is, one event does not depend on another.

A violation of the assumption of independence leads to the most serious consequences. If this assumption is violated, one-way ANOVA is an inappropriate statistic.

ii) Assumption of Normality

The distributions of the population from which the samples are selected are normal. This

assumption implies that the dependent variable is normally distributed in each of the groups.

One-way ANOVA is considered a robust test against the assumption of normality and tolerates violations of this assumption. As regards the normality of grouped data, one-way ANOVA can tolerate data that is non-normal (skewed or kurtotic distributions) with only a small effect on the Type I error rate. However, platykurtosis can have a profound effect when group sizes are small. This leaves a researcher with two options:

i) Transform the data using various algorithms so that the shape of the distribution becomes normal, or

ii) Choose the nonparametric Kruskal-Wallis H Test, which does not require the assumption of normality. (This test is available in SPSS.)

iii) Assumptions of Homogeneity of Variance

The variances of the distributions in the populations are equal. This assumption provides that the distributions in the population have the same shapes, means, and variances; that is, they are the same populations. In other words, the variances on the dependent variable are equal across the groups. If the assumption of homogeneity of variances has been violated, then two possible tests can be run.

i) Welch test, or

ii) Brown and Forsythe test

Alternatively, Kruskal-Wallis H Test can also be used. All these tests are available in SPSS.
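For readers without SPSS, a minimal Python sketch of the standard one-way ANOVA and the Kruskal-Wallis alternative is given below (scipy is assumed; the groups reuse the invented scores from the earlier sketch, and Levene's test is included as one common check of the homogeneity-of-variance assumption):

from scipy import stats

group1 = [1, 2, 0, 3, 4]
group2 = [4, 6, 5, 3, 7]
group3 = [1, 0, 2, 1, 1]

# One-way ANOVA (assumes normality and homogeneity of variance)
print(stats.f_oneway(group1, group2, group3))

# Kruskal-Wallis H test: nonparametric alternative when normality cannot be assumed
print(stats.kruskal(group1, group2, group3))

# Levene's test: checks the homogeneity-of-variance assumption
print(stats.levene(group1, group2, group3))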

Reference

Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to Design and Evaluate Research in Education (8th ed.). New York: McGraw-Hill.

Pallant, J. (2005). SPSS Survival Manual – A Step by Step Guide to Data Analysis Using

SPSS for Windows (Version 12). Australia: Allen & Unwin.

Q.5 Explain rationale and procedure of Chi-Square Goodness-of-Fit Test?

Chi-Square (χ2) Goodness-of-Fit Test


The chi-square (χ2) goodness of fit test (commonly referred to as one-sample chi-square) is

the most commonly used goodness of fit test. It explores the proportion of cases that fall

into the various categories of a single variable, and compares these with hypothesized

values. In simpler words, we can say that it is used to find out whether the observed value of a given phenomenon is significantly different from the expected value. We can also say that it is used to test whether sample data fit a distribution from a certain population; in other words, the chi-square goodness of fit test tells us whether the sample data represent the data we would expect to find in the actual population. It tells us whether sample data are consistent with a hypothesized distribution. This is a variation of the more general chi-square test. The setting for this test is a single categorical variable that can have many levels.

In the chi-square goodness of fit test, the sample data is divided into intervals (categories). Then, the numbers of points that fall into the intervals are compared with the expected numbers of points in each interval. The null hypothesis for the chi-square goodness of fit test is that the data come from the specified distribution; the alternative hypothesis is that the data do not come from the specified distribution. The formula for the chi-square goodness of fit test is:

χ² = Σ (O − E)² / E

where O is the observed frequency and E is the expected frequency in each category.

Procedure for Chi-Square (χ2) Goodness of Fit Test

To use the chi-square (χ2) goodness of fit test, we first have to set up the null and alternative hypotheses. The null hypothesis assumes that there is no significant difference between the observed and the expected values. The alternative hypothesis then becomes that there is a significant difference between the observed and the expected values. Next, compute the value of the chi-square goodness of fit test using the formula χ² = Σ (O − E)² / E.


Two potential disadvantages of chi-square are:

a) The chi-square test can only be used on data that has been put into classes (categories). If there is data that has not been put into classes, then it is necessary to make a frequency table or histogram before performing the test.

b) It requires a sufficient sample size in order for the chi-square approximation to be valid.

A. Null hypothesis: In Chi-Square goodness of fit test, the null hypothesis assumes that there

is no significant

difference between the observed and the expected value.

B. Alternative hypothesis: In Chi-Square goodness of fit test, the alternative hypothesis

assumes that there is a

significant difference between the observed and the expected value.

Compute the value of the Chi-Square goodness of fit test using the following formula:

χ² = Σ (O − E)² / E

where χ² = Chi-Square goodness of fit test statistic, O = observed value, and E = expected value.

Degrees of freedom: In the Chi-Square goodness of fit test, the degrees of freedom depend on the number of categories in the distribution; they are typically the number of categories minus one.

Hypothesis testing: Hypothesis testing in Chi-Square goodness of fit test is the same as in

other tests, like the t-test, ANOVA, etc. The calculated value of the Chi-Square goodness of fit test

is compared with the table value. If the calculated value of Chi-Square goodness of fit test is

greater than the table value, we will reject the null hypothesis and conclude that there is a

significant difference between the observed and the expected frequency.

If the calculated value of Chi-Square goodness of fit test is less than the table value, we will

accept the null hypothesis and conclude that there is no significant difference between the

observed and expected value.


The Chi-Square goodness of fit test is a non-parametric test that is used to find out whether the observed value of a given phenomenon is significantly different from the expected value.
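As a small worked example (the observed counts are hypothetical), the test can be computed in Python with scipy.stats.chisquare:

from scipy import stats

# Hypothetical: 120 students choosing among four elective subjects,
# with equal preference expected under the null hypothesis
observed = [40, 30, 28, 22]
expected = [30, 30, 30, 30]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p_value)   # degrees of freedom = 4 categories - 1 = 3

If the calculated χ² exceeded the table (critical) value for 3 degrees of freedom, the null hypothesis of no difference between observed and expected frequencies would be rejected.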

Reference

Gay, L. R., Mills, G. E., & Airasian, P. W. (2010). Educational Research: Competencies for Analysis and Application (10th ed.). New York, USA: Pearson.
