
The word ‘statistics’ seems to have been derived from the Latin word
‘status’, which means a political state. Originally, statistics was simply
the collection of numerical data on some aspects of the life of the people,
useful to the government. Today, statistics involves the collection,
classification, presentation, and analysis of numerical facts or data.


When facts, observations or statements are taken on a particular subject,
they are collectively known as data.

The information collected by the investigator himself or herself with a
definite purpose in mind is called primary data.

The information gathered from a source which already had the information
stored is called secondary data.

A particular value of a variable is called a variate or an observation.

The numerical data recorded in its original form, as it is collected by the
investigator or received from some source, is called raw data. Consider the
following data collected on the number of siblings each student in class 9
of a particular school has:

2 1 0 0 1 3 1 1 2 0 1 0 0 3 2 2 2 3 0 1
0 0 1 1 2 1 1 0 2 3 1 1

Each digit above represents one observation. Data in this form is called
raw data.

A quantity which is being measured in an experiment is called a variable.

Continuous variables are variables which can take any value between two
given values.

Discontinuous (also called discrete) variables are variables which cannot
take all possible values between two given values.

Range tells how far apart the greatest and least numbers in a set are. It is
the difference between the largest and smallest numbers.
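
As a small illustration, the range of the sibling data given earlier can be worked out as follows. This is only a sketch; the names `raw`, `siblings` and `data_range` are chosen here for illustration and do not come from the original notes.

```python
# Sibling counts from the raw data above, one digit per student.
raw = "21001311201003222301" "001121102311"
siblings = [int(digit) for digit in raw]

# Range = largest observation - smallest observation.
data_range = max(siblings) - min(siblings)
print(data_range)  # 3
```

The largest observation is 3 and the smallest is 0, so the range is 3.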

Now, in the above given data many students have 0 siblings while some
have 3 siblings. The number of times an observation occurs in a set of
data is known as its frequency of occurrence or simply frequency.
The tabular representation of the frequency of all the observations is
known as a frequency distribution and the table itself is known as a
frequency distribution table.


Suppose we have to construct a frequency distribution for the above data
on siblings. We first arrange the data in ascending or descending order.
In ascending order:

0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2 3 3 3 3


Such a presentation of data gives more information. Data arranged in
such a form (ascending or descending order) is called arrayed data.

To make it easily understandable, we now create a frequency distribution
table. To prepare the table, the observations are first listed in a column.
In the next column, each occurrence of an observation is marked with a
tally mark | . After 4 tally marks against an observation, the fifth
occurrence is marked with a long diagonal stroke that crosses out the
first four marks (shown here as ||||/). The number of tally marks against
a given observation is its frequency, which is written in the column next
to the tally marks.

Number of siblings    Tally marks        Frequency
0                     ||||/ ||||         9
1                     ||||/ ||||/ ||     12
2                     ||||/ ||           7
3                     ||||               4
Total                                    32

The above table is a simple or ungrouped frequency distribution table.
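
The counting behind an ungrouped frequency distribution can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed method: the helper `tally` and the text form `||||/` for a completed group of five are assumptions made here for illustration.

```python
from collections import Counter

# Sibling counts from the raw data, one digit per student.
raw = "21001311201003222301" "001121102311"
siblings = [int(digit) for digit in raw]

# Counter maps each observation to the number of times it occurs.
frequency = Counter(siblings)


def tally(count):
    """Render a count as tally marks; '||||/' stands for a group of five."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)


# One row per distinct observation, in ascending order.
for value in sorted(frequency):
    print(value, tally(frequency[value]), frequency[value])

# The frequencies must add up to the number of students.
print(sum(frequency.values()))  # 32
```

Checking that the frequencies add up to the total number of observations is a quick way to catch a miscount.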


Mean is the average of all the values. It is the sum of the
observations divided by the number of observations. If the
observations are $x_1, x_2, x_3, \dots, x_n$, then the average of the
n terms, or mean, is

$$\text{Mean} = \frac{\text{sum of observations}}{\text{number of observations}} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

(The Greek letter ∑, pronounced ‘sigma’, indicates summation.)
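
As a worked illustration, the mean of the sibling data given earlier can be computed directly from the definition. This is a sketch only; the variable names are chosen here for illustration.

```python
# Sibling counts from the raw data, one digit per student.
raw = "21001311201003222301" "001121102311"
siblings = [int(digit) for digit in raw]

total = sum(siblings)   # sum of observations
n = len(siblings)       # number of observations
mean = total / n

print(total, n)  # 38 32
print(mean)      # 1.1875
```

So the students in this class have, on average, a little over one sibling each.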

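Median is the middle value, dividing the data into 2 halves. Thus, if
there are n observations $x_1, x_2, x_3, \dots, x_n$ arranged in
ascending order, then

If n is odd, Median = $\left(\frac{n+1}{2}\right)$th observation

If n is even, Median = $\frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2}+1\right)\text{th observation}\right]$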
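
The two cases of the median rule can be sketched as a small Python function. The function name `median` and the sample lists are assumptions for illustration. Note that the observations are sorted first, and that Python lists are indexed from 0, so the k-th observation is `ordered[k - 1]`.

```python
def median(values):
    """Median by the odd/even rule; the data is sorted before counting."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of no observations is undefined")
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th observation.
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Sibling counts from the raw data (n = 32, so the even case applies).
raw = "21001311201003222301" "001121102311"
siblings = [int(digit) for digit in raw]

print(median(siblings))      # 1.0
print(median([7, 3, 5]))     # 5
print(median([4, 1, 3, 2]))  # 2.5
```

For the sibling data the 16th and 17th observations in ascending order are both 1, so the median is 1.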


While constructing the simple frequency distribution, the data
was such that each observation could be listed individually.

Now, consider the following data on the marks 32 students got
in an English exam:

78 84 53 62 71 86 43 66 36 77 48 59 76 81 92 58 68 74 79 85
65 49 81 75 57 78 84 65 73 42 87 74

Arranging the above data in an array, we have

36 42 43 48 49 53 57 58 59 62 65 65 66 68 71 73 74 74 75 76
77 78 78 79 81 81 84 84 85 86 87 92

We find that only 5 observations (65, 74, 78, 81 and 84) have a
frequency of 2, and that there are a total of (32 - 5 =) 27 distinct
observations. Listing all the observations in a table would not
only be tedious, but would also not help us make any
meaningful deductions.

In such cases, the data is grouped into classes. As 36 is the
lowest observation and 92 is the highest observation, the entire
data can be grouped into just 7 classes, viz., 30-40, 40-50, 50-60,
60-70, 70-80, 80-90, 90-100. In the class 30-40, 30 is the
lower limit and 40 is the upper limit. In this example, the
classes are non-overlapping but continuous. Such a frequency
distribution is called a continuous distribution. In this distribution,
the upper limit of a class coincides with the lower limit of
the next class.
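
The grouping of the marks into these 7 classes can be sketched as follows. One assumption is made here that the notes do not state: a mark equal to a class boundary would be counted in the class whose lower limit it is (so 40 would go into 40-50). None of the 32 marks falls on a boundary, so this choice does not affect the result. The names `marks`, `width` and `grouped` are illustrative.

```python
# Marks of the 32 students in the English exam.
marks = [
    78, 84, 53, 62, 71, 86, 43, 66, 36, 77, 48, 59, 76, 81, 92, 58,
    68, 74, 79, 85, 65, 49, 81, 75, 57, 78, 84, 65, 73, 42, 87, 74,
]

width = 10
# Classes 30-40, 40-50, ..., 90-100, each starting with frequency 0.
grouped = {(lower, lower + width): 0 for lower in range(30, 100, width)}

for mark in marks:
    # Assumed convention: lower limit included, upper limit excluded.
    lower = (mark // width) * width
    grouped[(lower, lower + width)] += 1

for (lower, upper), frequency in grouped.items():
    print(f"{lower}-{upper}: {frequency}")
```

With this data the class frequencies come out as 1, 4, 4, 5, 10, 7 and 1, which add up to 32.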


The difference between the two class limits is known as the
class interval, class width or class size.

The midpoint of a class interval is known as the class mark. It
is the average of the lower limit and the upper limit.
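
For the classes 30-40 up to 90-100, the class width and class mark can be worked out as in this short sketch. The variable names are illustrative only.

```python
# The 7 classes as (lower limit, upper limit) pairs.
classes = [(lower, lower + 10) for lower in range(30, 100, 10)]

# Class width = upper limit - lower limit.
class_widths = [upper - lower for lower, upper in classes]

# Class mark = average of the lower limit and the upper limit.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

for (lower, upper), width, mark in zip(classes, class_widths, class_marks):
    print(f"{lower}-{upper}: width {width}, class mark {mark}")
```

Every class here has width 10, and the class marks run from 35.0 for 30-40 up to 95.0 for 90-100.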
Statistics is a branch of mathematics dealing with the collection,
classification, analysis, interpretation, and presentation of masses of
numerical data, for drawing inferences on the basis of their quantifiable
likelihood (probability). Statistics can interpret aggregates of data too
large to be intelligible by ordinary observation because such data (unlike
individual quantities) tend to behave in a regular, predictable manner.

Types of quantitative data

A Bar Graph (also called Bar Chart) is a graphical display of data using bars of
different heights.
Histogram: a graphical display of data using bars of different heights.

It is similar to a Bar Chart, but a histogram groups numbers into ranges.

The height of each bar shows how many observations fall into each range.

You decide which ranges to use.

Frequency is the number of times a particular value occurs in a set of data.

Usually we would record the frequency of data in a frequency table.

The mode of a set of observations is the value that occurs most frequently in
the set. A set of observations may have no mode, one mode or more than
one mode.
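
The three possibilities (no mode, one mode, more than one mode) can be sketched in Python. The function name `modes` is illustrative, and one convention is assumed here that the notes do not spell out: when there are several distinct values and all of them occur equally often, the set is reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return the list of modes (possibly empty) of the observations."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: several distinct values, all equally frequent,
    # means no value stands out, so there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


# Sibling counts from the raw data, one digit per student.
raw = "21001311201003222301" "001121102311"
siblings = [int(digit) for digit in raw]

print(modes(siblings))         # [1]
print(modes([1, 2, 3]))        # []
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```

In the sibling data the value 1 occurs 12 times, more often than any other value, so it is the only mode.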

Mode is the most common value among the given observations. For example,
a person who sells ice creams might want to know which flavour is the most
popular.

The median divides the data so that 50% of the observations are below the
median and 50% of the observations are above the median.

The mean is used when an average value is wanted. For example, a teacher may
want to know the average marks of a test in his or her class.

A histogram is a vertical bar chart in which the frequency corresponding to a
class is represented by the area of a bar (or rectangle) whose base is the
class width.
