
Data as they stand at the time of their collection, unclassified and unorganised, are called raw data.
Organisation of data refers to the systematic arrangement of collected figures (raw data), so that
the data becomes easy to understand and more convenient for further statistical treatment.
Classification is the process of arranging data into sequences and groups according to their
common characteristics or separating them into different but related parts.

METHODS OF CLASSIFICATION:
• Geographical classification (Spatial classification)
• Chronological classification (Temporal classification)
• Qualitative classification
• Quantitative classification
When the data is classified according to geographical location or region, it is known as geographical classification.
When data is classified with respect to different periods of time, the type of classification is known as
chronological classification.
In qualitative classification, data is classified on the basis of descriptive characteristics or on the basis of
attributes like sex, literacy, religion, caste, education etc, which cannot be quantified.
In quantitative classification, data is classified on the basis of characteristics which can be measured, such
as height, weight, income, expenditure, production or sales.
OBJECTIVES OF CLASSIFICATION
o To simplify and condense the mass of data into an easily comprehensible form.
o To explain similarity and dissimilarity of data.
o To facilitate meaningful comparisons, draw inferences and locate facts.
o To study cause and effect relationships based on some criteria between the data. (For example, the characteristics of income and education can be related after classifying the mass of data.)
o To prepare the data for tabulation.
o To present a mental picture of a situation.
CHARACTERISTICS OF A GOOD CLASSIFICATION
• The classification should conform to the object of enquiry. (For example, if an investigation is conducted to enquire into the economic conditions of workers, then it will be of no use to classify them on the basis of their religion.)
• The classification should not lead to any confusion.
• Classification should be so exhaustive that every unit of the series finds a place in one group or another.
• It should be capable of being adjusted according to changed situations and conditions.
• The classes must not overlap, so that an observed value belongs to one and only one of the classes.
• The principle of classification, once decided, should remain the same throughout the analysis.
• All units belonging to a group should exhibit similar characteristics.
ADVANTAGES OF CLASSIFICATION
• Classified data condenses the raw data into a form suitable for statistical analysis.
• Complexities are removed and features are highlighted.
• It facilitates comparisons and drawing inferences from the data.
• Mutual relationships among elements of the data set can be observed.
• Data can be separated into homogeneous groups, which helps in statistical analysis.
CONCEPT OF VARIABLE
A variable refers to a quantity or attribute whose value varies from one investigation to another. Example:
“Price” is a variable, as the prices of different commodities are different.

Variables are of two kinds:

(i) Discrete variable – Variables which can take only exact (whole) values and not any fractional value
are termed discrete variables. Example: The number of workers or the number of students in a class are
discrete variables, as they cannot be in fractions.

(ii) Continuous variable – Variables which can take all possible values (integral as well as fractional) in a given specified range are termed continuous variables. Example: The height or weight of individuals can be of any value within the limits.
Discrete variable | Continuous variable
----------------- | -------------------
A discrete variable is capable of taking only exact values and not any fractional value. | A continuous variable can take all possible values (integral as well as fractional) in a given specified range.
These variables increase in complete numbers. | These variables can increase in fractions as well as in complete numbers.
In case of a discrete variable, data is obtained by counting. | In case of a continuous variable, data is obtained by measurement.
Number of workers or number of students in a class are discrete variables, as they cannot be in fractions. | Height or weight of individuals are continuous variables, as they can be in fractions.
The arrangement of classified data in some logical order, for example according to size, according to the
time of occurrence or according to some other measurable or non-measurable characteristic, is
known as a Statistical Series.
Example: If the data pertaining to the marks of 35 students in a class are put in a systematic way, it can
be called a statistical series.

KINDS OF STATISTICAL SERIES


• (On the basis of Construction)
  o Individual series – Individual series refers to that series in which items are listed singly, i.e. each item is given a separate value of measurement. Example: If marks of 10 students in class XI are given individually, it will form an individual series.

  o Discrete series – A discrete series is that series where individual values differ from each other by a definite amount.

  o Continuous series – A continuous series is that series which represents continuous variables, showing the range of values of different items of the series.
Individual series refers to that series in which items are listed singly, i.e. each item is given a
separate value of measurement.

Individual series are of two types:

1. Unorganized individual series – This is an unarranged mass of data (raw data).


For example, marks obtained by 10 students in a class are as follows:

35 40 38 17 25
45 36 29 42 22

2. Organized individual series – This is an orderly arrangement of raw data.
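
As a small sketch, the ten marks listed for the unorganised series can be turned into an organised individual series simply by sorting them. The variable names are illustrative only.

```python
# Raw marks of the 10 students, as listed in the unorganised series.
raw_marks = [35, 40, 38, 17, 25, 45, 36, 29, 42, 22]

# Organised individual series: the same items arranged in order.
ascending = sorted(raw_marks)
descending = sorted(raw_marks, reverse=True)

print(ascending)   # [17, 22, 25, 29, 35, 36, 38, 40, 42, 45]
print(descending)  # [45, 42, 40, 38, 36, 35, 29, 25, 22, 17]
```

Either order gives an organised series; the choice depends on what the analysis needs.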


• (On the basis of Characteristics)
When the data is arranged on the basis of qualitative characteristics, statistical
series are of three kinds:

Time series – If the different values that a variable has taken in a period of time are arranged in
chronological order, the series so obtained is called a time series. Here data is presented with
regard to time unit (day, week, month or year).
For example – Population of Delhi (1951 – 2011)

Spatial series – The data arranged according to location or geographical considerations form a
spatial series. In this, time factor remains constant, whereas places change.
For example – Population of 5 states of India (As per census 2011)

Condition series – In this series, data is classified according to the changes occurring under certain
conditions.
For Example – Students of a certain class arranged according to their age, heights, weights, marks, etc.
For a discrete variable, the classification of its data is known as a Frequency Array. Since a
discrete variable takes only integral values and no intermediate fractional values between two integral
values, we have frequencies that correspond to each of its integral values.

For example, the variable “size of the household” is a discrete variable that only takes
integral values. Since it does not take any fractional value between two adjacent integral values,
there are no classes in its frequency array, and therefore no class intervals.
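
A frequency array can be sketched in a few lines. The household sizes below are invented for illustration and are not taken from the source; only the idea of one frequency per integral value comes from the text.

```python
from collections import Counter

# Hypothetical household sizes for 12 households (made-up illustration data).
household_sizes = [3, 4, 2, 5, 4, 3, 4, 6, 2, 4, 3, 5]

# Frequency array: one frequency for each integral value, no classes.
frequency_array = dict(sorted(Counter(household_sizes).items()))

for size, frequency in frequency_array.items():
    print(size, frequency)
```

Each distinct household size gets its own row, which is why no class intervals are needed.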
CLASS – Class means a group of numbers in which items are placed, such as 0-10, 10-20, 20-30, etc.

CLASS LIMITS – The lowest and highest values of the variable within a class are called the ‘class limits’.

CLASS INTERVAL – The difference between the upper limit (l2) and the lower limit (l1) of a class, i.e. l2 - l1, is known as the class-interval.

WIDTH OF CLASS-INTERVALS – The size (or width) of each class-interval can be determined by the
following formula:
\[
\text{width of class-interval} = \frac{\text{largest observation} - \text{smallest observation}}{\text{number of classes desired}}
\]
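
The width formula can be checked with a short sketch. It reuses the ten marks from the individual series example; the choice of 4 classes and the function name `class_width` are assumptions made for illustration.

```python
def class_width(observations, classes_desired):
    """Width = (largest observation - smallest observation) / number of classes desired."""
    return (max(observations) - min(observations)) / classes_desired

# Marks from the individual series example.
marks = [35, 40, 38, 17, 25, 45, 36, 29, 42, 22]

# 4 classes is an assumed choice, not a value given in the text.
width = class_width(marks, 4)
print(width)  # 7.0
```

In practice the result is usually rounded up to a convenient figure such as 10 so that class limits are easy to read.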
RANGE – The range of a frequency distribution can be defined as the difference between the lower limit of the
first class-interval and the upper limit of the last class-interval. For example: If the classes are 0-10,
10-20, ... up to 70-80, then the range is 80 - 0 = 80.
FREQUENCY
Frequency refers to the number of times a given value appears in a distribution.
Example: Suppose there are 20 students in a class and out of them:
9 students have got 70 marks
5 students have got 92 marks
Then the frequencies of 70 marks and 92 marks are 9 and 5 respectively.

FREQUENCY DISTRIBUTION
A table in which the frequencies and the associated values of a variable are written side by side, is known as a frequency distribution.
A frequency distribution can be ‘discrete’ or ‘continuous’ depending upon whether the variable is discrete or
continuous.

CLASS FREQUENCY
The number of observations corresponding to a particular class is known as class frequency or the frequency of that class.
MID-POINT OR MID-VALUE – The mid-point is the central point of a class-interval. It is calculated by
adding the lower and upper limits of the class and dividing the total by 2, i.e. (l1 + l2) / 2.
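
A minimal sketch of the mid-point calculation, using the classes 0-10, 10-20 and 20-30 mentioned earlier; the function name `mid_point` is illustrative.

```python
def mid_point(lower_limit, upper_limit):
    """Mid-point = (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2

classes = [(0, 10), (10, 20), (20, 30)]
mid_values = [mid_point(l1, l2) for l1, l2 in classes]
print(mid_values)  # [5.0, 15.0, 25.0]
```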

TYPES OF CONTINUOUS SERIES

• Exclusive series (classes of type 10-20, 20-30, etc.)

• Inclusive series (classes of type 10-19, 20-29, etc.)

• Open-end distribution (the lower limit of the first class, the upper limit of the last class, or both are not given)

• Cumulative frequency series (less than and more than series)

• Equal class-interval series (classes are of the same interval)

• Unequal class-interval series (class-intervals are not equal)

• Mid-value series (middle values of the class-intervals are given)


The classes of the type 10-20, 20-30, 30-40, etc., wherein the upper limit of one class-interval
becomes the lower limit of the next class, are known as exclusive series. A value equal to the upper limit of a class is excluded from that class and counted in the next one.
The classes of the type 10-19, 20-29, 30-39, etc., wherein all observations with magnitude greater
than or equal to the lower limit and less than or equal to the upper limit of a class are included in it, are
known as inclusive series. Thus, under this series, overlapping of intervals is avoided.
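
The difference between exclusive and inclusive classes can be sketched as two counting rules. The marks are the ten values from the individual series example; the function names and the chosen class limits are assumptions for illustration.

```python
def exclusive_frequencies(values, limits):
    """Exclusive classes: lower limit included, upper limit excluded."""
    return {(l1, l2): sum(1 for v in values if l1 <= v < l2) for l1, l2 in limits}

def inclusive_frequencies(values, limits):
    """Inclusive classes: both lower and upper limits included."""
    return {(l1, l2): sum(1 for v in values if l1 <= v <= l2) for l1, l2 in limits}

marks = [35, 40, 38, 17, 25, 45, 36, 29, 42, 22]

exclusive = exclusive_frequencies(marks, [(10, 20), (20, 30), (30, 40), (40, 50)])
inclusive = inclusive_frequencies(marks, [(10, 19), (20, 29), (30, 39), (40, 49)])

print(exclusive)
print(inclusive)
```

Note that the mark 40 is counted in the class 40-50 of the exclusive series, not in 30-40, because the upper limit is excluded.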
In a frequency distribution, if the lower limit of the first class, the upper limit of the last class, or both are not
given, it is known as an open-end distribution.
Cumulative frequency series is a modification of the simple frequency distribution. It is obtained by
successively adding the frequencies of the values of the classes according to a certain law.

The frequencies so obtained are called the ‘cumulative frequencies’ abbreviated as c.f.

Types of cumulative frequency distribution:

• ‘Less than’ cumulative frequency distribution

• ‘More than’ cumulative frequency distribution.


In a ‘less than’ cumulative frequency distribution, the frequencies of the class-intervals are
added successively from top to bottom, i.e. from the lowest class to the highest class.

In a ‘more than’ cumulative frequency distribution, the cumulative frequencies of the class-intervals
are obtained by finding the cumulative totals of frequencies starting from the highest value
of the variable (class) to the lowest value (class).
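
Both kinds of cumulative frequency can be sketched with running totals. The classes and frequencies below are illustrative values, not data from the source.

```python
from itertools import accumulate

# Illustrative exclusive classes and their simple frequencies.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [1, 3, 3, 3]

# 'Less than' c.f.: running total from the lowest class to the highest.
less_than = list(accumulate(frequencies))

# 'More than' c.f.: running total from the highest class down to the lowest.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for (l1, l2), lt, mt in zip(classes, less_than, more_than):
    print(f"less than {l2}: {lt}    more than {l1}: {mt}")
```

The last ‘less than’ value and the first ‘more than’ value both equal the total number of observations.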
When the classes in a series are of the same interval, it is called an equal class-interval series.

When the class-intervals are not equal, it is called an unequal class-interval series.
Mid-value or mid-point is the middle value of a class-interval. When such mid-values are given, it is
called a mid-value series.
The frequency distribution summarizes the raw data by making it concise and comprehensible.
However, it does not show the details that are found in raw data and leads to loss of information.
When the raw data is grouped into classes, an individual observation has no significance in further
statistical calculations.

For example: Suppose class 10-20 contains 6 values: 12, 15, 16, 18, 14, 19. When
such data is grouped as a class 10-20, the individual values have no significance;
only the frequency, i.e. 6, is recorded and not the actual values. All values in this class are
assumed to be equal to the middle value of the class-interval, or class mark (here 15). Statistical
calculations are based only on the class mark instead of the actual values. As a
result, grouping leads to considerable loss of information.
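
The loss of information can be made concrete by comparing the mean of the six actual values with the value that the grouped data assumes for them. This is only a sketch; the variable names are illustrative.

```python
# The six values from the example, all falling in the class 10-20.
values = [12, 15, 16, 18, 14, 19]

# Class mark (mid-value) of the class 10-20.
class_mark = (10 + 20) / 2

# Mean computed from the actual values.
actual_mean = sum(values) / len(values)

# After grouping, every value is assumed equal to the class mark.
grouped_mean = class_mark

print(round(actual_mean, 2))  # 15.67
print(grouped_mean)           # 15.0
```

The gap between the two means is the information lost by replacing individual values with the class mark.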
• Univariate frequency distribution – A frequency distribution involving only one variable is called a univariate frequency distribution.
• Bivariate frequency distribution – A frequency distribution involving two variables simultaneously is called a bivariate frequency distribution.

Univariate frequency distribution | Bivariate frequency distribution
--------------------------------- | --------------------------------
When data is classified on the basis of a single variable, the distribution is known as a univariate frequency distribution. | When data is classified on the basis of two variables, the distribution is known as a bivariate frequency distribution.
It aims to describe the particular variable. | It aims to determine the empirical relationship between the two variables.
It is also known as a one-way frequency distribution. | It is also known as a two-way frequency distribution.
Example: Height of students in a class. | Example: Height and weight of students in a class.
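
A bivariate (two-way) frequency distribution can be sketched by counting pairs of classes. The heights and weights below are invented for illustration, and the helper `class_of`, the class widths and the starting limits are all assumptions.

```python
from collections import Counter

# Hypothetical (height in cm, weight in kg) pairs for 8 students; invented data.
students = [(152, 45), (158, 52), (163, 58), (155, 48),
            (168, 63), (161, 55), (157, 51), (166, 59)]

def class_of(value, lower, width):
    """Return the exclusive class (l1, l2) of the given width that contains value."""
    l1 = lower + ((value - lower) // width) * width
    return (l1, l1 + width)

# Two-way table: frequency of each (height class, weight class) combination.
bivariate = Counter(
    (class_of(h, 150, 10), class_of(w, 40, 10)) for h, w in students
)

for (height_class, weight_class), frequency in sorted(bivariate.items()):
    print(height_class, weight_class, frequency)
```

Counting only the height classes (or only the weight classes) from the same data would give the corresponding univariate distribution.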
