
Unit – II

Tabulation:
Tabulation may be defined as the systematic arrangement of data in columns
and rows.
It is designed to simplify the presentation of data for the purpose of analysis and
statistical inference.

A table must satisfy the following requirements:

1. The table must be simple and clear-cut.

2. The title of the table should explain in exact terms what the data represent.

3. The figures in the body of the table must be arranged in a logical order for
the point discussed in the text.

4. When several points are to be emphasized (given special importance) making use
of the same data, it may be preferable to prepare several small tables rather than one large table.

Major Objectives of Tabulation


1. To simplify the complex data
2. To facilitate comparison
3. To economize (reduce) the space
4. To draw valid inferences / conclusions
5. To help in further analysis

Preparing a Table:

The making of a compact table is itself an art. A table should contain all the information
needed within the smallest possible space. The purpose of tabulation and how the
tabulated information is to be used are the main points to be kept in mind
while preparing a statistical table. An ideal table should consist of the following
main parts.
1. Table number
2. Title of the table
3. Caption: Column headings
4. Stub: Row headings
5. Body: Contains the data
6. Head notes: Anything that is not explained in the title, captions
and stubs can be explained in head notes at the top of the table, below
the title.
7. Footnotes: The source of the data and any exceptions in the data can be given
in footnotes.

Definition of Classification
Classification is the process of arranging data into sequences and groups
according to their common characteristics or separating them into different but
related parts.

Objectives / purposes of classification


1. To simplify and condense large masses of data.
2. To present the facts in an easily understandable form.
3. To allow comparisons.
4. To help draw valid inferences.
5. To relate the variables among the data.
6. To help further analysis.
7. To eliminate unwanted data.
8. To prepare the data for tabulation.

Important types of classification
1. Geographical
2. Chronological
3. Qualitative
4. Quantitative (Numerical)

Geographical Classification
In geographical classification, the data are classified on the basis of
geographical area or region (i.e. region-wise).

Ex: Sales of the company, region-wise (in million rupees)

Region    Sales
North     285
South     300
East      485
West      535
Chronological Classification

If statistical data are classified according to the time of their occurrence,
the classification is called chronological classification.

Eg: Sales reported by a departmental store

Month       Sales (Rs. in lakhs)
January     22
February    26
March       32
April       45
May         67
June        80

Qualitative Classification

In qualitative classification, the data are classified according to the
presence or absence of attributes in the given units. Thus, the classification is based
on some quality characteristics / attributes.

Ex: Sex, Literacy, Education, Class grade etc.

Further, it may be classified as

a) Simple classification

b) Manifold classification

Simple classification:

If the classification is done into only two classes, the classification is
known as simple classification.

Ex: a) Population into Male / Female

b) Population into Educated / Uneducated

Manifold classification:

In manifold classification, two or more attributes are considered and the data
are divided into several classes.

Quantitative Classification:

In quantitative classification, the classification is based on quantitative
measurements of some characteristic, such as age, marks, income, production or sales.
The quantitative phenomenon under study is known as a variable, and hence this
classification is also called classification by variable.
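Classification by variable amounts to counting how many observations fall into each class interval. The sketch below assumes equal-width classes with inclusive integer limits (written like 10 - 14, 15 - 19 elsewhere in these notes); the marks and the helper name `frequency_table` are hypothetical and only illustrate the idea.

```python
# Hypothetical marks, used only to illustrate classification by variable.
marks = [12, 17, 23, 31, 8, 44, 27, 35, 19, 41, 29, 14]

def frequency_table(values, start, width, n_classes):
    """Count the values in each class [lower, upper] with inclusive integer
    limits: lower = start + k*width, upper = lower + width - 1."""
    table = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1
        count = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, count))
    return table

for lower, upper, count in frequency_table(marks, 0, 10, 5):
    print(f"{lower:>2} - {upper:>2}: {count}")
```

The resulting class frequencies (1, 4, 3, 2, 2) add up to the 12 observations, which is a quick check that every value landed in exactly one class.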


Measures of Central Tendency

• A measure of central tendency provides a very convenient way of describing a
set of scores with a single number that describes the performance of the
group.

• It is also defined as a single value that is used to describe the center of the
data.

Types of Central Tendency:

1. Mean

2. Median

3. Mode

4. Geometric Mean

5. Harmonic Mean

1. Arithmetic Mean (AM)
The AM of a group is the simple arithmetic average of the observations.

Arithmetic Mean (ungrouped data)

Formula: Mean = sum of elements / number of elements

= (a1 + a2 + a3 + ... + an)/n

Arithmetic Mean = ΣX/n

Where

X = Individual value

n = Total number of values

Example: To find the mean of 3, 5, and 7.

Step 1: Find the sum of the numbers.

3+5+7 = 15

Step 2: Count the numbers: there are 3 numbers.

Step 3: Find the mean.

15/3 = 5

Ans = 5
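The three steps above (sum, count, divide) can be sketched in Python as follows; the function name `arithmetic_mean` is illustrative only.

```python
def arithmetic_mean(values):
    """Arithmetic mean of ungrouped data: sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

print(arithmetic_mean([3, 5, 7]))  # 5.0
```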

Arithmetic Mean (grouped data):

Formula: Arithmetic Mean = ΣfX/Σf

Where X = Mid-value of the class interval

f = Frequency

Grouped Data Arithmetic Mean Example:

Class interval   f    X    fX
10 – 14          5    12   60
15 – 19          2    17   34
20 – 24          3    22   66
25 – 29          5    27   135
30 – 34          2    32   64
35 – 39          9    37   333
40 – 44          6    42   252
45 – 49          3    47   141
50 – 54          5    52   260
Total            Σf = 40   ΣfX = 1345

Arithmetic Mean = 1345/40 = 33.625 ≈ 33.63
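A short sketch of the grouped-data formula ΣfX/Σf, assuming each class is given as its (lower, upper) limits and X is taken as the class mid-value, as in the table above. The function name is illustrative.

```python
# Class intervals and frequencies (f) from the example table above.
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34),
           (35, 39), (40, 44), (45, 49), (50, 54)]
freqs = [5, 2, 3, 5, 2, 9, 6, 3, 5]

def grouped_mean(classes, freqs):
    """Arithmetic mean of grouped data: sum(f * X) / sum(f), X = class mid-value."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    total_f = sum(freqs)
    return sum(f * x for f, x in zip(freqs, mids)) / total_f

print(grouped_mean(classes, freqs))  # 33.625
```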

Median:

• The midpoint of the values after they have been ordered from the smallest to
the largest, or the largest to the smallest.

• There are as many values above the median as below it in the data array.

• For an even number of values, the median is the arithmetic average of the
two middle values.

• Example for finding the median in ungrouped data

1. The ages of a sample of five college students are: 21, 25, 19, 20 and 22.

Arranged in order: 19, 20, 21, 22, 25

Median = 21 (the 3rd value)

2. The marks of students are 12, 16, 10, 8, 18, 12, 14 and 15.

Arranged in order: 8, 10, 12, 12, 14, 15, 16, 18

Median = (12 + 14)/2 = 13
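The ordering rule above, including the even-count case, can be sketched as a small Python function (the name `median` is illustrative; Python's own `statistics.median` follows the same rule).

```python
def median(values):
    """Median of ungrouped data: middle value after sorting, or the mean of the
    two middle values when the number of observations is even."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([21, 25, 19, 20, 22]))              # 21
print(median([12, 16, 10, 8, 18, 12, 14, 15]))   # 13.0
```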

For Grouped Data

$$\text{Median} = L + \left( \frac{\frac{n}{2} - \text{c.f}}{f} \right) \times c$$

Where

L = lower limit of the median class.

n = number of observations (Σf).

c.f = cumulative frequency of the class preceding the median class.

f = frequency of the median class.

c = width of the median class interval.

EX:
Calculation of median for the following data

Protein intake (units per day)   No. of families (f)   Cumulative frequency
15 – 25                          30                    30
25 – 35                          20                    50
35 – 45                          40                    90
45 – 55                          60                    150
55 – 65                          100                   250
65 – 75                          130                   380
75 – 85                          120                   500

n/2 = Σf/2 = 500/2 = 250. The median class is the first class whose cumulative
frequency reaches 250, i.e. 55 – 65.

L = 55, c = 10, c.f = 150, f = 100

$$\text{Median} = 55 + \left( \frac{500/2 - 150}{100} \right) \times 10 = 55 + 10$$

Median = 65
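A minimal sketch of the grouped median formula, assuming equal class widths and classes given by their lower limits. It locates the median class exactly as in the worked example: the first class whose cumulative frequency reaches n/2.

```python
from itertools import accumulate

def grouped_median(lowers, width, freqs):
    """Median of grouped data: L + ((n/2 - c.f) / f) * c, where c.f is the
    cumulative frequency before the median class. Assumes equal class widths."""
    n = sum(freqs)
    half = n / 2
    cum = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency reaches n/2.
    k = next(i for i, cf in enumerate(cum) if cf >= half)
    cf_before = cum[k - 1] if k > 0 else 0
    return lowers[k] + (half - cf_before) / freqs[k] * width

lowers = [15, 25, 35, 45, 55, 65, 75]
families = [30, 20, 40, 60, 100, 130, 120]
print(grouped_median(lowers, 10, families))  # 65.0
```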

Mode: It is the value which occurs most frequently.

For ungrouped data of individual observations, the mode is often found by mere
inspection.

1. Find the mode for the following data

2, 7, 10, 15, 10, 17, 10, 8
Mode = 10

2. Find the mode for the following data

10, 15, 12, 3
There is no mode (each value occurs only once).

3. Find the mode for the following data

12, 13, 5, 6, 5, 12, 5, 18, 12
Mode = 5 and 12 (both occur three times, so the data are bimodal)
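Finding the mode "by inspection" amounts to counting occurrences. Python's `statistics.multimode` returns every value when all values are distinct, so this sketch wraps `collections.Counter` to follow the convention used above (no mode when every value occurs once). The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency, sorted. An empty list
    means there is no mode (every value occurs only once)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 7, 10, 15, 10, 17, 10, 8]))      # [10]
print(modes([10, 15, 12, 3]))                    # []
print(modes([12, 13, 5, 6, 5, 12, 5, 18, 12]))   # [5, 12]
```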

For Grouped data

The modal class is the class with the highest frequency.

$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$$

Where
L = Lower limit of the modal class.
d1 = |f1 – f0|.
d2 = |f1 – f2|.
f0 = Frequency of the class preceding the modal class.
f1 = Frequency of the modal class.
f2 = Frequency of the class following the modal class.
c = Width of the modal class interval.

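For Grouped data
Find the Mode for the following data

Weight of sorghum in g (X)   No. of ear heads (f)
60 – 80                      22
80 – 100                     38 (f0)
100 – 120                    45 (f1)
120 – 140                    35 (f2)
140 – 160                    20
Total                        160

The modal class is 100 – 120. L = 100, c = 20, d1 = 45 – 38 = 7, d2 = 45 – 35 = 10.

$$\text{Mode} = 100 + \frac{7}{7 + 10} \times 20$$

Mode = 108.24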
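A minimal sketch of the grouped mode formula, assuming equal class widths and a single clear modal class (if f0, f1 and f2 were all equal, d1 + d2 would be zero). Treating a missing neighbour class as frequency 0 is an assumption of this sketch, not something the notes specify.

```python
def grouped_mode(lowers, width, freqs):
    """Mode of grouped data: L + d1 / (d1 + d2) * c, with d1 = |f1 - f0| and
    d2 = |f1 - f2|. Assumes equal class widths and a single modal class;
    a missing neighbour class is treated as having frequency 0."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class index
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1, d2 = abs(f1 - f0), abs(f1 - f2)
    return lowers[k] + d1 / (d1 + d2) * width

lowers = [60, 80, 100, 120, 140]
ear_heads = [22, 38, 45, 35, 20]
print(round(grouped_mode(lowers, 20, ear_heads), 2))  # 108.24
```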

Measures of Dispersion:
In many ways, measures of central tendency are less useful in statistical analysis
than measures of the dispersion of values around the central tendency.
The dispersion of values within variables is especially important in social and
political research because dispersion, or "variation", in observations is what we
seek to explain.

Researchers want to know WHY some cases lie above average and others below
average for a given variable, for example:
a) TURNOUT in voting: why do some states show higher rates than others?
b) CRIMES in cities: why are there differences in crime rates?
c) CIVIL STRIFE among countries: what accounts for differing amounts?
Much of statistical explanation aims at explaining DIFFERENCES in observations,
also known as VARIATION or, in more technical terms, VARIANCE.

Types of dispersion

1. Range.
2. Interquartile Range.
3. Mean Deviation.
4. Standard Deviation.
5. Coefficient of variation.

Range: The difference between the highest and lowest values in a distribution.
It uses information on only the two extreme values and is highly unstable as a result.

Finding the range for ungrouped data

Range = Largest value – Smallest value

Example (For ungrouped data)

Find the Range for the following data.


21, 56, 76, 54, 78, 12

Range = 78 – 12 = 66

Example (for grouped data)

Find the range for the following data

X   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60
f   12        14        20        60        120

For grouped data, Range = upper limit of the highest class – lower limit of the lowest class.

Range = 60 – 10 = 50
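Both range calculations above can be sketched in a few lines of Python. The function is named `data_range` (an illustrative name) to avoid shadowing Python's built-in `range`; grouped classes are assumed to be given as (lower, upper) pairs.

```python
def data_range(values):
    """Range of ungrouped data: largest value minus smallest value."""
    return max(values) - min(values)

def grouped_range(classes):
    """Range of grouped data: upper limit of the highest class minus the
    lower limit of the lowest class. `classes` is a list of (lower, upper)."""
    return max(upper for _, upper in classes) - min(lower for lower, _ in classes)

print(data_range([21, 56, 76, 54, 78, 12]))                                # 66
print(grouped_range([(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]))  # 50
```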

Interquartile Range:

It measures the range of the middle 50% of the values only.

It is defined as the difference between the upper and lower quartiles.

The upper quartile (Q3) of a group is the value below which 75% of the observations
fall (i.e. above which 25% of the observations fall).

The lower quartile (Q1) of a group is the value below which 25% of the observations fall.

Interquartile Range = upper quartile – lower quartile

= Q3 – Q1

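Example:
Find the interquartile range for the following data
28, 12, 5, 54, 8, 42, 23

Solution: Arrange the data in ascending order

5, 8, 12, 23, 28, 42, 54

$$Q_1 = \text{size of } \left( \frac{N+1}{4} \right)^{\text{th}} \text{ term} = \text{size of } \left( \frac{7+1}{4} \right)^{\text{th}} \text{ term}$$

= size of 2nd term = 8

$$Q_3 = \text{size of } 3\left( \frac{N+1}{4} \right)^{\text{th}} \text{ term} = \text{size of } 3\left( \frac{7+1}{4} \right)^{\text{th}} \text{ term}$$

= size of 6th term = 42

Interquartile Range = Q3 – Q1

= 42 – 8 = 34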
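Python's standard library can reproduce the (N+1)/4 position rule: `statistics.quantiles` with `method="exclusive"` places the quartiles at positions (N+1)/4 and 3(N+1)/4 of the sorted data, interpolating linearly when a position is not a whole number (a case the example above does not need). The wrapper name `interquartile_range` is illustrative.

```python
import statistics

def interquartile_range(values):
    """IQR = Q3 - Q1, with quartiles at positions (N+1)/4 and 3(N+1)/4
    of the sorted data (statistics.quantiles, method="exclusive")."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

data = [28, 12, 5, 54, 8, 42, 23]
print(statistics.quantiles(data, n=4, method="exclusive"))  # [8.0, 23.0, 42.0]
print(interquartile_range(data))                            # 34.0
```

Other quartile conventions (for example `method="inclusive"`) can give slightly different values for small data sets, so the method should match the one taught.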

Mean Deviation
The arithmetic mean of the absolute differences of the values
from their mean.

Mean deviation formula for ungrouped data

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$

Mean deviation formula for grouped data

$$\text{Mean deviation} = \frac{\sum f\,|x - \bar{x}|}{\sum f}$$
Example of Mean deviation for ungrouped data

Following are the numbers of patients who visited a clinic on each of the last 10 days.
Find the mean deviation.

83 64 84 76 84 54 75 59 70 61

$$\text{Mean} = \frac{83 + 64 + 84 + 76 + 84 + 54 + 75 + 59 + 70 + 61}{10} = \frac{710}{10}$$

Mean = 71

X     X – X̄    |X – X̄|
83     12       12
64     -7        7
84     13       13
76      5        5
84     13       13
54    -17       17
75      4        4
59    -12       12
70     -1        1
61    -10       10
Total           94

$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n} = \frac{94}{10} = 9.4$$
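The mean deviation for ungrouped data can be sketched directly from its formula, Σ|x − x̄|/n; the function name is illustrative.

```python
def mean_deviation(values):
    """Mean deviation about the mean: sum(|x - mean|) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

patients = [83, 64, 84, 76, 84, 54, 75, 59, 70, 61]
print(mean_deviation(patients))  # 9.4
```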

Find the mean deviation for the following data (grouped data)

Class interval   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80
Frequency (f)    30        110       120       80        60        40        140

SOLUTION:

Class interval   f     X    Xf      X – X̄   |X – X̄|   f|X – X̄|
10 – 20          30    15   450     -32.2   32.2      966
20 – 30          110   25   2750    -22.2   22.2      2442
30 – 40          120   35   4200    -12.2   12.2      1464
40 – 50          80    45   3600    -2.2    2.2       176
50 – 60          60    55   3300    7.8     7.8       468
60 – 70          40    65   2600    17.8    17.8      712
70 – 80          140   75   10500   27.8    27.8      3892
Total            580        27400                     10120

$$\text{Mean} = \frac{27400}{580} \approx 47.24 \quad (\text{rounded to } 47.2 \text{ in the table})$$

$$\text{Mean deviation} = \frac{10120}{580} \approx 17.45$$
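A sketch of the grouped mean deviation Σf|X − X̄|/Σf, assuming X is the class mid-value. Because the code keeps the unrounded mean (about 47.24) rather than 47.2, its result is about 17.46 instead of the hand-calculated 17.45.

```python
def grouped_mean_deviation(classes, freqs):
    """Mean deviation of grouped data: sum(f * |X - mean|) / sum(f),
    with X the mid-value of each class and mean = sum(f * X) / sum(f)."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    return sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n

classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [30, 110, 120, 80, 60, 40, 140]
print(round(grouped_mean_deviation(classes, freqs), 2))  # 17.46
```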

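Standard deviation:
The most important and widely used measure of dispersion.
SD is the square root of the sum of squared deviations from the mean divided by the
number of observations (n). When the data are a sample, as in the examples below, the
divisor n – 1 is used instead of n. It is generally regarded as the best measure of dispersion.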

Example problem (ungrouped data)

One of the lab groups collected the following data for the heights (in cm) of their
Wisconsin Fast Plants:
5.4 7.2 4.9 9.3 7.2 8.1 8.5 5.4 7.8 10.2
Find the standard deviation.

Solution:

$$\text{Mean} = \frac{5.4 + 7.2 + 4.9 + 9.3 + 7.2 + 8.1 + 8.5 + 5.4 + 7.8 + 10.2}{10} = \frac{74}{10} = 7.4$$

Height (X)   X – mean   (X – mean)²
5.4          -2.0       4.00
7.2          -0.2       0.04
4.9          -2.5       6.25
9.3           1.9       3.61
7.2          -0.2       0.04
8.1           0.7       0.49
8.5           1.1       1.21
5.4          -2.0       4.00
7.8           0.4       0.16
10.2          2.8       7.84
Total                   27.64

$$\text{S.D} = \sqrt{\frac{\sum (x - \text{mean})^2}{n - 1}} = \sqrt{\frac{27.64}{9}} = \sqrt{3.07} \approx 1.75$$
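The sample standard deviation (divisor n − 1) can be computed by hand-coding the formula, and checked against Python's `statistics.stdev`, which also divides by n − 1. `statistics.pstdev` divides by n instead, which gives a smaller value. The name `sample_sd` is illustrative.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return math.sqrt(ss / (n - 1))

heights = [5.4, 7.2, 4.9, 9.3, 7.2, 8.1, 8.5, 5.4, 7.8, 10.2]
print(round(sample_sd(heights), 2))          # 1.75
print(round(statistics.stdev(heights), 2))   # 1.75 (library version, also n - 1)
print(round(statistics.pstdev(heights), 2))  # 1.66 (divides by n instead)
```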

Standard deviation for grouped data

Find the standard deviation for the following data

Class interval   15 – 25   25 – 35   35 – 45   45 – 55   55 – 65   65 – 75   75 – 85
Frequency        12        20        30        40        8         15        60

Solution:

Class interval   Frequency (f)   Mid value (x)   xf      x – x̄    (x – x̄)²   f(x – x̄)²
15 – 25          12              20              240     -36.05    1299.60    15595.2
25 – 35          20              30              600     -26.05    678.60     13572
35 – 45          30              40              1200    -16.05    257.60     7728
45 – 55          40              50              2000    -6.05     36.60      1464
55 – 65          8               60              480     3.95      15.60      124.8
65 – 75          15              70              1050    13.95     194.60     2919
75 – 85          60              80              4800    23.95     573.60     34416
Total            185                             10370                        75819

$$\text{Mean} = \frac{\sum xf}{\sum f} = \frac{10370}{185} = 56.05$$

$$\text{S.D} = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f - 1}} = \sqrt{\frac{75819}{185 - 1}} = 20.29$$
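A sketch of the grouped standard deviation with divisor Σf − 1, taking the class mid-values as x. It carries the unrounded mean; rounding (rather than truncating) the result gives 20.30, which Python prints as 20.3, whereas the 20.29 above truncates the same value.

```python
import math

def grouped_sd(mids, freqs):
    """Mean and sample standard deviation of grouped data:
    sqrt(sum(f * (x - mean)^2) / (sum(f) - 1)), with x the class mid-values."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids))
    return mean, math.sqrt(ss / (n - 1))

mids = [20, 30, 40, 50, 60, 70, 80]
freqs = [12, 20, 30, 40, 8, 15, 60]
mean, sd = grouped_sd(mids, freqs)
print(round(mean, 2), round(sd, 2))  # 56.05 20.3
```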
Coefficient of Variation (C.V):
The coefficient of variation, also known as "relative variability", equals the
standard deviation divided by the mean. It can be expressed either as a fraction or as a
percentage:

C.V = (S.D / Mean) × 100

EXAMPLE:

For the above problem, S.D = 20.29 and Mean = 56.05.

$$\text{C.V} = \frac{20.29}{56.05} \times 100 \approx 36.20\%$$
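The coefficient of variation is a one-line calculation; this sketch (with the illustrative name `coefficient_of_variation`) returns either the percentage or the plain fraction.

```python
def coefficient_of_variation(sd, mean, percent=True):
    """C.V = S.D / mean, optionally expressed as a percentage."""
    cv = sd / mean
    return cv * 100 if percent else cv

print(round(coefficient_of_variation(20.29, 56.05), 2))         # 36.2
print(round(coefficient_of_variation(20.29, 56.05, False), 4))  # 0.362
```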

The data collected can be presented graphically or pictorially for easy
understanding and quick interpretation. Diagrams and graphs give visual
indications of magnitudes, groupings, trends and patterns in the data.

Diagrams:
A diagram is a visual form for the presentation of statistical data. The term diagram refers
to various types of devices such as bars, circles, maps, pictorials and cartograms.
In order that these graphs and diagrams present ideas truthfully and emphasize
correct ideas, they must be drawn following certain basic rules, which are:
1. Dependent partly on conjunction (coincidence).
2. Dependent partly on mathematical considerations.
3. Dependent partly on personal preferences.

Histogram:
1. A histogram is a special kind of bar diagram used to present a
frequency distribution of a characteristic measured on a continuous scale.
2. Rectangles are erected over the class intervals to represent the frequencies of the
class intervals.
3. In a histogram, the area of each rectangle represents the frequency of the
corresponding class interval.
4. A histogram is constructed from a frequency distribution of grouped data.
5. When all class intervals have equal width, the height of each rectangle is
proportional to the respective frequency.
6. The width of each rectangle represents the class interval.
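Points 3 and 5 above can be made concrete: if area must equal frequency, each rectangle's height is frequency divided by class width. This sketch uses the protein-intake classes from the median example; the second call merges the last two classes into one 20-unit class, a hypothetical regrouping that shows why height alone would mislead when widths differ.

```python
def histogram_heights(classes, freqs):
    """Rectangle height for each class so that area = width * height = frequency,
    i.e. height = frequency / class width. With equal widths the heights are
    simply proportional to the frequencies."""
    return [f / (hi - lo) for (lo, hi), f in zip(classes, freqs)]

# Protein-intake classes from the median example (all 10 units wide).
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75), (75, 85)]
families = [30, 20, 40, 60, 100, 130, 120]
print(histogram_heights(classes, families))
# [3.0, 2.0, 4.0, 6.0, 10.0, 13.0, 12.0]

# Hypothetical regrouping: 65 - 85 as one class holding 130 + 120 = 250 families.
print(histogram_heights([(55, 65), (65, 85)], [100, 250]))  # [10.0, 12.5]
```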


Frequency polygon:
A frequency polygon is a variation of the histogram. Instead
of rectangles erected over the intervals, points are plotted at the midpoints of the tops
of the corresponding rectangles in a histogram, and the successive points are joined by
straight lines. A frequency polygon is often preferred when two frequency
distributions are to be compared.
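The vertices of a frequency polygon are just (mid-value, frequency) pairs. This sketch also adds a zero-frequency point one class width beyond each end so the polygon closes on the x-axis; that closing step is a common drawing convention assumed here, not something stated in the notes.

```python
def polygon_points(classes, freqs, close=True):
    """Vertices of a frequency polygon: (class mid-value, frequency) per class.
    With close=True, a zero-frequency point one class width beyond each end
    is added so the polygon meets the x-axis."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    points = list(zip(mids, freqs))
    if close:
        width_first = classes[0][1] - classes[0][0]
        width_last = classes[-1][1] - classes[-1][0]
        points = [(mids[0] - width_first, 0)] + points + [(mids[-1] + width_last, 0)]
    return points

# Protein-intake classes from the median example.
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75), (75, 85)]
families = [30, 20, 40, 60, 100, 130, 120]
print(polygon_points(classes, families))
```

Joining these points in order with straight lines (on paper or with any plotting tool) gives the polygon.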


Frequency curve:
When the total frequency is large and much narrower class intervals are adopted,
the frequency polygon will most often have a much smoother appearance.

If the total frequency is increased indefinitely, the frequency polygon will
approach a smooth curve. This limiting form is known as the frequency
curve.

Graphs:

The four purposes of graphs


1. Exploration: The data contain a message and we would like to find out
what it is.

2. Communication: We know something and we would like to tell others.

3. Calculation: Graphs can serve as visual algorithms that enable us to
determine at a glance (immediately upon looking) what might otherwise
be tedious to calculate.

4. Decoration: Graphs are pretty and can be used to enliven (brighten up) what
might otherwise be a dull presentation.

Types of Graphs or Charts


1. Bar Chart
2. Pie Chart
3. Line Chart

Bar Chart:
A bar diagram is commonly used to provide a visual comparison of
figures in a time series.

Example: Draw the bar chart for the following survey report

Response            Percentage   Frequency
Agree               5            12
Disagree            10           18
Strongly Agree      18           6
Strongly Disagree   19           29
Neutral             25           5

[Figure: bar chart of the survey responses, with Percentage on the x-axis and Frequency on the y-axis]

Types of Bar Chart
1. Simple Bar Chart
2. Multiple Bar Chart
3. Component Bar Chart

Example: Draw the appropriate chart for the following data:

Wards          A    B    C    D
Treatment 1:   20   5    25   18
Treatment 2:   16   27   30   35
Treatment 3:   32   20   28   16

Solution:

[Figure: multiple bar chart of Treatment 1, Treatment 2 and Treatment 3 for Wards A, B, C and D; y-axis from 0 to 40]

Pie chart: Like a bar chart, a pie chart is useful whenever the data have different
components. The area of each sector is proportional to the frequency of the component
it represents, and the total angle at the center (360°) represents the total frequency
of the data. When more than one series is shown, the area of each circle is made
proportional to the total frequency of its series.

[Figure: pie chart "Grades of a student" with sectors of 27%, 29%, 7% and 37%]
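Since the total angle at the center (360°) stands for the total frequency, each sector's angle is its share of the total times 360°. A minimal sketch using the percentages from the pie chart above:

```python
def sector_angles(values):
    """Angle of each pie sector in degrees: (value / total) * 360."""
    total = sum(values)
    return [v / total * 360 for v in values]

grades = [27, 29, 7, 37]  # percentages from the pie chart above
print([round(a, 1) for a in sector_angles(grades)])  # [97.2, 104.4, 25.2, 133.2]
```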

Line Chart
A line graph is preferred when the emphasis (importance) is on the
trend of the time series over the period rather than on the comparison of the relative
sizes of the different figures in the series.

Correlation

Measures of central tendency and dispersion describe the characteristics of a single
variable in the statistical data. Correlation is one of the methods of
studying the relationship between two variables.
In statistical analysis, we come across the study of two variables where a change
in the value of one variable is accompanied by a change in the value of the other variable.
In that case, we say that there is a correlation between the two variables.

For example, we might want to know whether reading scores are related to math
scores, i.e., whether students who have high reading scores also have high math
scores, and vice versa.

Range of Measure
1. The main result of a correlation analysis is the correlation coefficient (r).
2. The range of r is –1 to +1.
3. r is positive when the values increase together.
4. r is negative when one value decreases as the other increases.
5. If r is 0, there is no linear correlation.

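Example: For the following data, check whether there is any relation between height (X)
and weight (Y), given Σx = 188, Σy = 405, Σxy = 15706, Σx² = 7928, Σy² = 33461 and
n = 5.

Solution:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{5(15706) - (188)(405)}{\sqrt{\left[5(7928) - 188^2\right]\left[5(33461) - 405^2\right]}} = \frac{2390}{\sqrt{4296 \times 3280}} = \frac{2390}{3753.78}$$

= 0.64

We conclude that there is a moderate positive relationship between height and weight.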
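The formula for r above (Karl Pearson's product-moment coefficient) needs only the summary sums, so it can be sketched as a function of n, Σx, Σy, Σxy, Σx² and Σy². The function name is illustrative.

```python
import math

def pearson_r_from_sums(n, sx, sy, sxy, sx2, sy2):
    """Pearson's r from summary sums:
    (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den

r = pearson_r_from_sums(n=5, sx=188, sy=405, sxy=15706, sx2=7928, sy2=33461)
print(round(r, 2))  # 0.64
```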

Regression

Regression is a statistical method that attempts to determine the relationship
between one dependent variable (Y) and one or more independent variables (X).
When there is a single independent variable, the regression equation of Y on X is
y = mx + c
where the parameters m (slope) and c (intercept) are determined using the principle of
least squares. This regression equation is used to estimate the value of Y corresponding
to a known value of X.

On the other hand, if X is the dependent variable and Y is the independent variable, the
linear relationship expressing X in terms of Y is called the regression equation of X on
Y: x = my + c (here m and c take values different from those in the equation of Y on X).

Use of Regression
If the fitted model is found to be valid, it is quite useful: we can
estimate the value of an outcome variable for a given value of the predictor variable in
a new case.
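The notes do not spell out the least-squares formulas, so this sketch uses the standard solution m = (nΣxy − ΣxΣy)/(nΣx² − (Σx)²) and c = ȳ − m·x̄. Purely as an illustration, it applies them to the sums from the height/weight correlation example; the X value of 40 used for the estimate is hypothetical.

```python
def least_squares_line(n, sx, sy, sxy, sx2):
    """Regression line of Y on X, y = m*x + c, by least squares:
    m = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2),  c = mean(y) - m * mean(x)."""
    m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    c = sy / n - m * sx / n
    return m, c

# Summary sums from the height (X) / weight (Y) correlation example.
m, c = least_squares_line(n=5, sx=188, sy=405, sxy=15706, sx2=7928)
print(round(m, 4), round(c, 2))  # 0.5563 60.08
print(round(m * 40 + c, 2))      # 82.34, estimated Y for a hypothetical X = 40
```

The regression of X on Y would use the same formulas with the roles of x and y swapped, which is why its m and c differ.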
