
Name Jahangir Khan

Father Name Soorat Khan

Roll No. CB643097

Semester Autumn 2021

Tutor Name Din Muhammad

Course Name Educational Statistics

Course Code 8614

Assignment No. 1

Contact No 03138263473

Date Of Submission 23/02/2022


Question No.1:

What are the functions of Statistics? Discuss in detail with reference to research in education.

Definition of Statistics:

Statisticians have defined the term in different ways.

Some of the definitions are given below:

Longman Dictionary:

Statistics is a collection of numbers which represent facts or measurement.

Webster:

“Statistics are the classified facts representing the conditions of the people in a state, especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.”

A.L. Bowley:

“Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.”

H. Secrist:

“By statistics we mean aggregate of facts affected to a marked extent by multiplicity of causes,
numerically expressed, enumerated or estimated according to reasonable standard of accuracy,
collected in a systematic manner for a predetermined purpose and placed in relation to each other.”

From the above definitions it can be said that statistics is:

a. Numerical facts which can be measured, enumerated and estimated.


b. Facts are homogeneous and related to each other.
c. Facts must be accurate.
d. It must be collected systematically.
Lovitt:

“Statistics is that which deals with the collection, classification and tabulation of numerical facts as the
basis for explanation, description and comparison of phenomena.”

Statistics performs numerous functions.

The following points explain the functions of statistics in summary:

1. It helps in collecting and presenting the data in a systematic manner.


2. It helps to understand unwieldy and complex data by simplifying it.
3. It helps to classify the data.
4. It provides basis and techniques for making comparison.
5. It helps to study the relationship between different phenomena.
6. It helps to indicate the trend of behaviour.
7. It helps to formulate the hypothesis and test it.
8. It helps to draw rational conclusions.
Statistics in Education:

Measurement and evaluation are essential parts of the teaching-learning process. In this process we obtain scores and then interpret these scores in order to take decisions. Statistics enables us to study these scores objectively and makes the teaching-learning process more efficient.

The knowledge of statistics helps the teacher in the following way:

1. It helps the teacher to provide the most exact type of description:


When we want to know about a pupil, we administer a test or observe the child. Then, from the result, we describe the pupil's performance or traits.

Statistics helps the teacher to give an accurate description of the data.

2. It makes the teacher definite and exact in procedures and thinking:


Sometimes, due to a lack of technical knowledge, teachers become vague in describing a pupil's performance. Statistics enables them to describe the performance using proper language and symbols, which makes the interpretation definite and exact.

3. It enables the teacher to summarize the results in a meaningful and convenient form:
Statistics gives order to the data. It helps the teacher to make the data precise and meaningful and to
express it in an understandable and interpretable manner.

4. It enables the teacher to draw general conclusions:


Statistics helps to draw general conclusions as well as to extract inferences. Statistical methods also tell us how much faith should be placed in any conclusion and how far we may extend our generalization.

5. It helps the teacher to predict the future performance of the pupils:


Statistics enables the teacher to predict how much of a thing will happen under conditions we know and have measured. For example, the teacher can predict the probable score of a student in the final examination from his entrance test score. The prediction may, however, be erroneous due to different factors, and statistical methods tell us what margin of error to allow in making predictions.

6. Statistics enables the teacher to analyse some of the causal factors underlying complex and otherwise bewildering events:

It is a common fact that a behavioural outcome is the result of numerous causal factors. The reasons why a particular student performs poorly in a particular subject are varied and many. So, with appropriate statistical methods, we can keep these extraneous variables constant and observe the cause of the pupil's failure in a particular subject.

Important Concepts in Statistics:

Data:

Data may be defined as information obtained from a survey, an experiment or an investigation.

Score:

Score is the numerical evaluation of the performance of an individual on a test.


Continuous Series:

Continuous series is a series of observations in which the various possible values of the variable may differ by infinitesimal amounts. In such a series, any intermediate value within the range of the series can occur.

Discrete Series:

Discrete series is a series in which the values of a variable are arranged according to magnitude or to some ordered principle. In such a series, intermediate values within the range cannot occur. Examples include merit rankings, the number of persons, or census data.

Variable:

Any trait or quality which has the ability to vary or has at least two points of measurement. It is the trait
that changes from one case or condition to another.

Variability:

The spread of scores, usually indicated by quartile deviations, standard deviations, range etc.

Frequency:

Frequency may be defined as the number of occurrences of any given value or set of values. For example, if 8 students have scored 65, then the score 65 has a frequency of 8.

Frequency Distribution:

It is a tabulation showing the frequencies of the values of a variable when these values are arranged in
order of magnitude.

Correlation:

Correlation means the interdependence between two or more random variables. It may be stated as the tendency for corresponding observations in two or more series to vary together from the averages of their respective series, that is, to have similar relative positions.

If corresponding observations tend to have similar relative positions in their respective series, the
correlation is positive; if the corresponding values tend to be divergent in position in their respective
series, the correlation is negative; absence of any systematic tendency for the corresponding
observations to be either similar or dissimilar in their relative positions indicates zero correlation.

Coefficient:

It is a statistical constant that is independent of the unit of measurement.

Coefficient of correlation:

It is a pure number, limited by the values +1.00 and -1.00, that expresses the degree of relationship between two continuous variables.
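To make the ideas of frequency and the coefficient of correlation concrete, here is a minimal Python sketch; the scores and the paired series are hypothetical, and it assumes Python 3.10 or later for statistics.correlation:

from collections import Counter
from statistics import correlation

# Hypothetical test scores for a class
scores = [65, 70, 65, 80, 65, 70, 90, 80, 65, 70]

# Frequency distribution: how often each value occurs, in order of magnitude
freq = Counter(scores)
for value in sorted(freq):
    print(value, freq[value])          # e.g. the score 65 has a frequency of 4

# Coefficient of correlation between two paired (hypothetical) series,
# e.g. entrance-test scores and final-examination scores
entrance = [40, 55, 60, 70, 85]
final = [45, 50, 65, 72, 90]
print(round(correlation(entrance, final), 2))   # Pearson r, between -1.00 and +1.00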

Functions of Statistics:

The various functions performed by statistics in modern times are discussed below:
1. Simplification of Complex Facts:
The foremost purpose of statistics is to simplify huge collections of numerical data. It is beyond the reach of the human mind to remember and recall vast quantities of facts and figures. Statistical methods make it possible to understand the whole in a short span of time and in a better way.

2. Comparison:
Comparison of data is yet another function of statistics. After simplifying the data, it can be correlated or compared using certain mathematical quantities such as averages, ratios and coefficients.

In this regard, Boddington opined that the object of statistics is to enable comparisons to be made between past and present results, with a view to ascertaining the reasons for the changes which have taken place and the effect of such changes in the future.

3. Relationship between Facts:


Statistical methods are used to investigate the cause and effect relationship between two or more facts.
The relationships between demand and supply, or between supply and price level, can best be understood with the help of statistical methods.

4. Formulation and Testing of Hypothesis:


The most theoretical function of statistics is to test various types of hypotheses and to discover new theories. For instance, by using appropriate statistical tools we can test whether a particular coin is fair, whether Indian consumers are brand loyal, and so on.
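As an illustration of the coin example, the following is a minimal sketch, assuming SciPy (version 1.7 or later, for scipy.stats.binomtest) is available; the observed counts are hypothetical:

from scipy.stats import binomtest

heads, tosses = 62, 100                    # hypothetical observed data
result = binomtest(heads, tosses, p=0.5)   # H0: the coin is fair (p = 0.5)

print(result.pvalue)
if result.pvalue < 0.05:
    print("Reject H0: the coin does not appear to be fair.")
else:
    print("Fail to reject H0: no evidence that the coin is unfair.")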

5. Forecasting:
Statistical methods are of great use in predicting the future course of a phenomenon. It is only on the basis of statistical techniques that planners in India prepare future estimates for production, consumption, investment, etc.
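A minimal forecasting sketch, assuming NumPy is available and using hypothetical production figures, shows the idea of fitting a trend to past data and extrapolating it:

import numpy as np

years = np.array([2017, 2018, 2019, 2020, 2021])
production = np.array([100.0, 108.0, 115.0, 121.0, 130.0])   # hypothetical units

# Fit a straight-line (least-squares) trend and extrapolate one year ahead
slope, intercept = np.polyfit(years, production, deg=1)
forecast_2022 = slope * 2022 + intercept
print(round(forecast_2022, 1))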

6. Enlarges Individual Knowledge:


Statistical methods sharpen the faculty of rational thinking and reasoning of an individual. It is a master-
key that solves problems of mankind in every sphere of life.

Thus, Whipple has rightly opined that statistics enables one to enlarge his horizon.

7. To Indicate Trend Behaviour:


Statistics helps to indicate the trend of behaviour in certain fields of enquiry. Statistical techniques such as time-series analysis and extrapolation are widely used to determine the trend of behaviour of the enquiry in question.

8. Classification of Data:
Classification refers to a process of splitting up the data into certain parts, which helps in the comparison and interpretation of the various features of the data. This is done through the various refined techniques of statistics.
9. To Measure Uncertainty:
In most social fields, including business, commerce and economics, it becomes necessary to take decisions in the face of uncertainty and to study the chance of occurrence of certain events and their effect on the policy adopted.

10. To Draw Rational Conclusion:


In various fields of uncertainty, like business and commerce, it is very much necessary to draw rational conclusions on the basis of the facts collected, and the mind of the decision maker should be free from any bias and prejudice.

Question No. 2:

Write a comprehensive essay on “Types of Variables”.

Essay on Types of Variables:

A “variable” in algebra really just means one thing: an unknown value. However, in statistics, you’ll come
across dozens of types of variables. In most cases, the word still means that you’re dealing with
something that’s unknown, but unlike in algebra that unknown isn’t always a number.

Some variable types are used more than others. For example, you’ll be much more likely to come across
continuous variables than you would dummy variables. The following lists are sorted into common types
of variables (like independent and dependent) and less common types (like covariate and concomitant).


Common Types of Variables

• Categorical variable: variables that can be put into categories. For example, the category “Toothpaste Brands” might contain the variables Colgate and Aquafresh.

• Confounding variable: extra variables that have a hidden effect on your experimental results.

• Continuous variable: a variable with an infinite number of values, like “time” or “weight”.

• Control variable: a factor in an experiment which must be held constant. For example, in an
experiment to determine whether light makes plants grow faster, you would have to control for
soil quality and water.

• Dependent variable: the outcome of an experiment. As you change the independent variable, you
watch what happens to the dependent variable.

• Discrete variable: a variable that can only take on a certain number of values. For example,
“number of cars in a parking lot” is discrete because a car park can only hold so many cars.

• Independent variable: a variable that is not affected by anything that you, the researcher, do.
Usually plotted on the x-axis.

• Lurking variable: a “hidden” variable that affects the relationship between the independent and dependent variables.

• Measurement variable: a variable that has a number associated with it. It’s an “amount” of something, or a “number” of something.

• Nominal variable: another name for categorical variable.

• Ordinal variable: similar to a categorical variable, but there is a clear order. For example, income
levels of low, middle, and high could be considered ordinal.

• Qualitative variable: a broad category for any variable that can’t be counted (i.e. has no numerical
value). Nominal and ordinal variables fall under this umbrella term.

• Quantitative variable: A broad category that includes any variable that can be counted, or has a
numerical value associated with it. Examples of variables that fall into this category include discrete
variables and ratio variables.

• Random variables are associated with random processes and give numbers to outcomes of random
events.

• Ranked variable: an ordinal variable; a variable where every data point can be put in order (1st, 2nd, 3rd, etc.).

• Ratio variables: similar to interval variables, but have a meaningful zero.

Less Common Types of Variables

• Active Variable: a variable that is manipulated by the researcher.

• Antecedent Variable: a variable that comes before the independent variable.

• Attribute variable: another name for a categorical variable (in statistical software) or a variable that
isn’t manipulated (in design of experiments).

• Binary variable: a variable that can only take on two values, usually 0/1. Could also be yes/no,
tall/short or some other two-variable combination.

• Collider Variable: a variable represented by a node on a causal graph that has paths pointing in as
well as out.

• Covariate variable: similar to an independent variable, it has an effect on the dependent variable
but is usually not the variable of interest. See also: concomitant variable.

• Criterion variable: another name for a dependent variable, when the variable is used in non-
experimental situations.

• Dichotomous variable: Another name for a binary variable.

• Dummy Variables: used in regression analysis when you want to assign relationships to unconnected categorical variables. For example, if you had the categories “has dogs” and “owns a car” you might assign a 1 to mean “has dogs” and 0 to mean “owns a car” (a minimal encoding sketch appears after this list).

• Endogenous variable: similar to dependent variables, they are affected by other variables in the
system. Used almost exclusively in econometrics.

• Exogenous variable: variables that affect others in the system.


• Explanatory Variable: a type of independent variable. When a variable is independent, it is not
affected at all by any other variables. When a variable isn’t independent for certain, it’s an
explanatory variable.

• Extraneous variables are any variables that you are not intentionally studying in your experiment
or test.

• A grouping variable (also called a coding variable, group variable or by variable) sorts data within
data files into categories or groups.

• Identifier Variables: variables used to uniquely identify situations.

• Indicator variable: another name for a dummy variable.

• Interval variable: a meaningful measurement between two variables. Also sometimes used as
another name for a continuous variable.

• Intervening variable: a variable that is used to explain the relationship between variables.

• Latent Variable: a hidden variable that can’t be measured or observed directly.

• Manifest variable: a variable that can be directly observed or measured.

• Manipulated variable: another name for independent variable.

• Mediating variable or intervening variable: variables that explain how the relationship between
variables happens. For example, it could explain the difference between the predictor and
criterion.

• Moderating variable: changes the strength of an effect between independent and dependent
variables. For example, psychotherapy may reduce stress levels for women more than men, so sex
moderates the effect between psychotherapy and stress levels.

• Nuisance Variable: an extraneous variable that increases variability overall.

• Observed Variable: a measured variable (usually used in SEM).

• Outcome variable: similar in meaning to a dependent variable, but used in a non-experimental study.

• Polychotomous variables: variables that can have more than two values.

• Predictor variable: similar in meaning to the independent variable, but used in regression and in
non-experimental studies.

• Responding variable: an informal term for dependent variable, usually used in science fairs.

• Scale Variable: basically, another name for a measurement variable.

• Study Variable (Research Variable): can mean any variable used in a study, but does have a more
formal definition when used in a clinical trial.

• Test Variable: another name for the Dependent Variable.

• Treatment variable: another name for independent variable.
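To show how dummy (indicator) coding looks in practice, here is a minimal sketch, assuming pandas is installed; the column name and data are hypothetical:

import pandas as pd

df = pd.DataFrame({"toothpaste_brand": ["Colgate", "Aquafresh", "Colgate", "Aquafresh"]})

# Each category becomes its own 0/1 indicator column, usable in regression analysis
dummies = pd.get_dummies(df["toothpaste_brand"], prefix="brand", dtype=int)
print(pd.concat([df, dummies], axis=1))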


Question No. 3:

Explain “Exploratory Data Analysis” in detail.

In data mining, Exploratory Data Analysis (EDA) is an approach to analyzing datasets to summarize their main characteristics, often with visual methods. EDA is used for seeing what the data can tell us before the modeling task. It is not easy to look at a column of numbers or a whole spreadsheet and determine important characteristics of the data. It may be tedious, boring, and/or overwhelming to derive insights by looking at plain numbers.

Exploratory data analysis techniques have been devised as an aid in this situation.

Exploratory data analysis is generally cross-classified in two ways. First, each method is either non-
graphical or graphical. And second, each method is either univariate or multivariate (usually just
bivariate).

Exploratory Data Analysis with Chartio:

We will perform exploratory data analysis on the iris dataset to familiarize ourselves with the EDA
process. Let’s look at a few sample data points:

The dataset contains four features – sepal length, sepal width, petal length, and petal width – for each of the different species (versicolor, virginica, setosa) of the iris flower. In the dataset there are 50 instances (rows of data) of each species, a total of 150 data points.
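The walkthrough below uses Chartio; as a hedged, code-based alternative, this sketch assumes seaborn and pandas are installed (seaborn can fetch the same iris example dataset, which may require network access on first use):

import seaborn as sns

iris = sns.load_dataset("iris")            # 150 rows: 50 per species

print(iris.head())                         # a few sample data points
print(iris["species"].value_counts())      # 50 instances of each species
print(iris.describe())                     # summary of the four numerical features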

Univariate Analysis:

Univariate analysis is the simplest form of data analysis, where the data being analyzed consists of only
one variable. Since it’s a single variable, it doesn’t deal with causes or relationships. The main purpose of
univariate analysis is to describe the data and find patterns that exist within it. Let us look at a few
visualizations used for performing univariate analysis.

Box Plots:

A box and whisker plot – also called a box plot – displays the five-number summary of a set of data. The
five-number summary is the minimum, first quartile, median, third quartile, and maximum.
The box plots created in Chartio provide us with a summary of the four numerical features in the dataset. We can observe that the distributions of petal length and width are more spread out, as exhibited by the bigger size of the boxes, whereas the sepal length and width are concentrated around their medians. Moreover, in the sepal width box plot we can observe a few outliers, shown as dots above and below the whiskers.
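A comparable box plot can be sketched in Python instead of Chartio, assuming seaborn and matplotlib are installed:

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

# One box per numerical feature; outliers appear as dots beyond the whiskers
sns.boxplot(data=iris[["sepal_length", "sepal_width", "petal_length", "petal_width"]])
plt.title("Five-number summary of the iris features")
plt.show()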

Histogram

A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a
set of continuous data. This allows the inspection of the data for its underlying distribution (e.g. normal
distribution), outliers, skewness, etc.

The above plots show the histogram of sepal and petal widths made in Chartio. From the charts it can be
observed that the sepal width follows a Gaussian distribution. However, petal width is more skewed
towards the right, and the majority of the flower samples have a petal width less than 0.4 cm.
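A similar pair of histograms can be sketched with matplotlib (seaborn is used here only to load the data; this stands in for the Chartio charts):

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(iris["sepal_width"], bins=20)     # roughly Gaussian shape
axes[0].set_title("Sepal width")
axes[1].hist(iris["petal_width"], bins=20)     # skewed shape
axes[1].set_title("Petal width")
plt.show()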
Multivariate analysis

Multivariate data analysis refers to any statistical technique used to analyze data that arises from more
than one variable. This models more realistic applications, where each situation, product, or decision
involves more than a single variable. Let us look at a few visualizations used for performing multivariate
analysis.

Scatter Plot

A scatter plot is a two-dimensional data visualization that uses dots to represent the values obtained for
two different variables – one plotted along the x-axis and the other plotted along the y-axis.

Above are examples of two scatter plots made using Chartio. We can observe that there is a linear relationship between petal length and width. However, with an increase in sepal length the sepal width does not increase proportionally; hence they do not have a linear relationship.

In a scatter plot, if the points are color-coded, an additional variable can be displayed. For example, let
us create the petal length vs width chart below by color coding each point based on the flower species.

We can observe that ‘setosa’ species has the lowest petal length and width,

‘virginica’ has the highest, and ‘versicolor’ lies between them. By plotting more dimensions, deeper
insights can be drawn from the data.
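A colour-coded scatter plot of this kind can be sketched as follows, assuming seaborn and matplotlib are installed (a stand-in for the Chartio chart):

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

# Petal length vs. petal width, with each point coloured by species
sns.scatterplot(data=iris, x="petal_length", y="petal_width", hue="species")
plt.show()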
Bar Chart

A bar chart represents categorical data, with rectangular bars having lengths proportional to the values
that they represent. For example, we can use the iris dataset to observe the average petal and sepal
lengths/widths of all the different species.

Observing the bar charts, we can conclude that ‘virginica’ has the highest petal length, petal width and
sepal length, followed by ‘versicolor’ and ‘setosa’. However, sepal width deviates from this trend where
‘setosa’ is highest followed by ‘virginica’ and ‘versicolor’.
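The same comparison of species averages can be sketched as a bar chart with pandas and matplotlib (again a stand-in for the Chartio charts):

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

# Average of each feature per species, shown as grouped bars
iris.groupby("species").mean().plot.bar(rot=0)
plt.ylabel("mean value (cm)")
plt.show()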

The exploratory data analysis we performed provides us with a good understanding of what the data
contains. Once this stage is complete, we can perform more complex modeling tasks such as clustering
and classification.

Apart from the charts shown in our EDA example, we can use various other charts depending on the
characteristics of our data:

1. Line charts to show changes over time


2. Pie charts to show the relationship of a part to a whole
3. Map charts to visualize location data
Conclusion

EDA is a crucial step to take before diving into machine learning or statistical modeling because it
provides the context needed to develop an appropriate model for the problem at hand and to correctly
interpret its results. EDA is valuable to the data scientist to make certain that the results they produce
are valid, correctly interpreted, and applicable to the desired business contexts.
Question No. 4:

Write down the basic purpose of measurement of central tendency. Give examples from your daily
life.

Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data by identifying the
central position within that set of data. As such, measures of central tendency are sometimes called
measures of central location. They are also classed as summary statistics. The mean (often called the
average) is most likely the measure of central tendency that you are most familiar with, but there are
others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under different conditions,
some measures of central tendency become more appropriate to use than others. In the following
sections, we will look at the mean, mode and median, and learn how to calculate them and under what
conditions they are most appropriate to be used.

Mean (Arithmetic)

The mean (or average) is the most popular and well known measure of central tendency. It can be used
with both discrete and continuous data, although its use is most often with continuous data. The mean
is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if
we have n values in a data set and they have values x₁, x₂, …, xₙ, the sample mean, usually denoted by x̄ (pronounced "x bar"), is:

x̄ = (x₁ + x₂ + ⋯ + xₙ) / n

This formula is usually written in a slightly different manner using the Greek capital letter ∑, pronounced "sigma", which means "sum of...":

x̄ = (Σx) / n

You may have noticed that the above formula refers to the sample mean. So, why have we called it a
sample mean? This is because, in statistics, samples and populations have very different meanings and
these differences are very important, even if, in the case of the mean, they are calculated in the same
way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower case letter "mu", denoted as μ:

μ = (Σx) / n

The mean is essentially a model of your data set. It is the value that is most common. You will notice,
however, that the mean is not often one of the actual values that you have observed in your data set.
However, one of its important properties is that it minimises error in the prediction of any one value in
your data set. That is, it is the value that produces the lowest amount of error from all other values in
the data set.

An important property of the mean is that it includes every value in your data set as part of the
calculation. In addition, the mean is the only measure of central tendency where the sum of the
deviations of each value from the mean is always zero.
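A tiny Python sketch, using hypothetical scores, verifies this property that the deviations from the mean sum to zero:

from statistics import mean

scores = [4, 8, 15, 16, 23, 42]             # hypothetical scores
x_bar = mean(scores)                        # 18.0

deviations = [x - x_bar for x in scores]
print(round(sum(deviations), 10))           # 0.0: deviations from the mean cancel out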
When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are
values that are unusual compared to the rest of the data set by being especially small or large in
numerical value. For example, consider the wages of staff at a factory below:

Staff 1 2 3 4 5 6 7 8 9 10

Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean
value might not be the best way to accurately reflect the typical salary of a worker, as most workers
have salaries in the $12k to 18k range. The mean is being skewed by the two large salaries. Therefore, in
this situation, we would like to have a better measure of central tendency. As we will find out later,
taking the median would be a better measure of central tendency in this situation.
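Running the same salary figures through Python's statistics module makes the contrast concrete:

from statistics import mean, median

salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]   # salaries in $1000s

print(mean(salaries_k))     # 30.7 -> pulled up by the two large salaries
print(median(salaries_k))   # 15.5 -> closer to the typical salary here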

Another time when we usually prefer the median over the mean (or mode) is when our data is skewed
(i.e., the frequency distribution for our data is skewed). If we consider the normal distribution - as this is
the most frequently assessed in statistics - when the data is perfectly normal, the mean, median and
mode are identical. Moreover, they all represent the most typical value in the data set. However, as the
data becomes skewed the mean loses its ability to provide the best central location for the data because
the skewed data is dragging it away from the typical value. However, the median best retains this
position and is not as strongly influenced by the skewed values. This is explained in more detail in the
skewed distribution section later in this guide.

Median

The median is the middle score for a set of data that has been arranged in order of magnitude. The
median is less affected by outliers and skewed data. In order to calculate the median, suppose we have
the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark - in this case, 56. It is the middle mark because there are 5 scores before it and 5 scores after it. This
works fine when you have an odd number of scores, but what happens when you have an even number
of scores? What if you had only 10 scores? Well, you simply have to take the middle two scores and
average the result. So, if we look at the example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Only now we have to take the 5th and 6th score in our data set and average them to get a median of
55.5.

Mode

The mode is the most frequent score in our data set. On a bar chart or histogram it represents the highest bar. You can, therefore, sometimes consider the mode as being the most popular
option. An example of a mode is presented below:
Normally, the mode is used for categorical data where we wish to know which is the most common
category, as illustrated below:

We can see above that the most common form of transport, in this particular data set, is the bus.

However, one of the problems with the mode is that it is not unique, so it leaves us with problems when
we have two or more values that share the highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data. This is particularly
problematic when we have continuous data because we are more likely not to have any one value that
is more frequent than another. For example, consider measuring 30 people's weights (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? The answer is that it is probably very unlikely; many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight, that is, to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.

Another problem with the mode is that it will not provide us with a very good measure of central
tendency when the most common mark is far away from the rest of the data in the data set, as depicted
in the diagram below:

In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not
representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the
mode to describe the central tendency of this data set would be misleading.
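Python's statistics module illustrates the non-uniqueness problem directly (the transport data below are hypothetical; statistics.multimode requires Python 3.8+):

from statistics import mode, multimode

transport = ["bus", "car", "bus", "walk", "car", "bike"]   # hypothetical responses

print(mode(transport))        # 'bus' -- only the first mode encountered is returned
print(multimode(transport))   # ['bus', 'car'] -- two values share the highest frequency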

Skewed Distributions and the Mean and Median

We often test whether our data is normally distributed because this is a common assumption underlying
many statistical tests. An example of a normally distributed set of data is presented below:
When you have a normally distributed sample you can legitimately use either the mean or the median as your measure of central tendency. In fact, in any symmetrical distribution the mean, median and mode
are equal. However, in this situation, the mean is widely preferred as the best measure of central
tendency because it is the measure that includes all the values in the data set for its calculation, and any
change in any of the scores will affect the value of the mean. This is not the case with the median or
mode.

However, when our data is skewed, for example, as with the right-skewed data set below:
We find that the mean is being dragged in the direction of the skew. In these situations, the median is generally considered to be the best representative of the central location of the data. The more skewed the distribution, the greater the difference between the median and mean, and the greater the emphasis that should be placed on using the median as opposed to the mean. A classic example of the above right-skewed distribution is income (salary), where higher earners provide a false representation of the typical income if expressed as a mean and not a median.

If dealing with a normal distribution, and tests of normality show that the data is non-normal, it is
customary to use the median instead of the mean. However, this is more a rule of thumb than a strict
guideline. Sometimes, researchers wish to report the mean of a skewed distribution if the median and
mean are not appreciably different (a subjective assessment), and if it allows easier comparisons to
previous research to be made.

Summary of when to use the mean, median and mode

Please use the following summary table to know what the best measure of central tendency is with
respect to the different types of variable.

Type of Variable Best measure of central tendency

Nominal Mode

Ordinal Median

Interval/Ratio (not skewed) Mean

Interval/Ratio (skewed) Median


Question No. 5:

Write down merits and demerits of Mean, Median and Mode.

MEAN

The arithmetic mean (or simply "mean") of a sample is the sum of the sampled values divided by the
number of items in the sample.

Advantages and Disadvantages of Arithmetic Mean

Main advantages and disadvantages of arithmetic mean can be highlighted as follows:

Advantages/Merits of Arithmetic Mean

1. Simplicity
The arithmetic mean is the most widely applied measure because it can be calculated very easily and can be understood without any complication.

2. Rigid Formula
The value of the arithmetic mean is always fixed, as it is defined by a rigid formula.

3. Helps Further Study


Arithmetic mean can be used for further study, especially for algebraic calculation and statistical analysis.

4. Minimum Fluctuation
Arithmetic mean is less affected by fluctuation in sampling.

5. No Need of Data Arrangement


Arithmetic mean does not require the arrangement or division of data like other measures of central tendency (i.e., median and mode).

6. Based On Observation
Arithmetic mean is completely based on observation, so it represents the data, not the position of terms.

Disadvantages/Demerits of Arithmetic Mean

1. Arithmetic mean cannot be determined if any item of observation is missing.


2. AM is too much affected by extreme values of the series.
3. AM is not applicable for qualitative data.
4. In the case of open-ended class intervals, the arithmetic mean cannot be calculated.
Median: Median may be defined as the size (actual or estimated) of that item which falls in the middle of a series arranged either in ascending or descending order of magnitude. It lies in the center of a series and divides the series into two equal parts. Median is also known as an average of position.
Merits of Median:

1. It is simple to understand and easy to calculate, particularly in individual and discrete series.
2. It is not affected by the extreme items in the series.
3. It can be determined graphically.
4. For open-ended classes, median can be calculated.
5. It can be located by inspection, after arranging the data in order of magnitude.

Demerits of Median:

1. It does not consider all variables because it is a positional average.


2. The value of the median is affected more by sampling fluctuations.
3. It is not capable of further algebraic treatment. Like mean, combined median cannot be calculated.
4. It cannot be computed precisely when it lies between two items.
Merits of median

(1) Simplicity: - It is a very simple measure of the central tendency of the series. In a simple statistical series, just a glance at the data is enough to locate the median value.

(2) Free from the effect of extreme values: - Unlike arithmetic mean, median value is not destroyed by
the extreme values of the series.

(3) Certainty: - Certainty is another merit of the median. The median value is always a certain specific value in the series.

(4) Real value: - The median value is a real value and is a better representative value of the series compared to the arithmetic mean, the value of which may not exist in the series at all.

(5) Graphic presentation: - Besides algebraic approach, the median value can be estimated also through
the graphic presentation of data.

(6) Possible even when data is incomplete: - Median can be estimated even in the case of certain
incomplete series. It is enough if one knows the number of items and the middle item of the series.

Demerits of median:

Following are the various demerits of median:

(1) Lack of representative character: - Median fails to be a representative measure in the case of series whose values are wide apart from each other. Also, the median is of limited representative character as it is not based on all the items in the series.

(2) Unrealistic: - When the median is located somewhere between the two middle values, it
remains only an approximate measure, not a precise value.
(3) Lack of algebraic treatment: - Arithmetic mean is capable of further algebraic treatment, but median is not. For example, multiplying the median by the number of items in the series will not give us the sum total of the values of the series. However, the median is quite a simple method of finding an average of a series. It is quite a commonly used measure for series related to qualitative observations, such as the health of students.

Mode:

The value of the variable which occurs most frequently in a distribution is called the mode.

Merits of mode:

Following are the various merits of mode:

(1) Simple and popular: - Mode is a very simple measure of central tendency. Sometimes, just a glance at the series is enough to locate the modal value. Because of its simplicity, it is a very popular measure of central tendency.

(2) Less effect of marginal values: - Compared to the mean, the mode is less affected by marginal values in the series. The mode is determined only by the value with the highest frequency.

(3) Graphic presentation: - Mode can be located graphically, with the help of histogram.

(4) Best representative: - Mode is that value which occurs most frequently in the series.
Accordingly, mode is the best representative value of the series.

(5) No need of knowing all the items or frequencies: - The calculation of mode does not require
knowledge of all the items and frequencies of a distribution. In simple series, it is enough if one knows
the items with highest frequencies in the distribution.

Demerits of mode:

Following are the various demerits of mode:

(1) Uncertain and vague: - Mode is an uncertain and vague measure of the central tendency.
(2) Not capable of algebraic treatment: - Unlike mean, mode is not capable of further algebraic
treatment.

(3) Difficult: - When the frequencies of all items are identical, it is difficult to identify the modal value.
(4) Complex procedure of grouping: - Calculation of mode involves a cumbersome procedure of grouping the data. If the extent of grouping changes, there will be a change in the modal value.

(5) Ignores extreme marginal frequencies: - It ignores extreme marginal frequencies. To that extent the modal value is not a representative value of all the items in a series. Besides, one can question the representative character of the modal value as its calculation does not involve all items of the series.
