Data Presentation
• Dr. Dharmendra Dubey
• Assistant professor
• Symbiosis Institute of health sciences
Statistics
Health statistics
• Statistics related to the vital events in life (birth, illness, death, marriage,
divorce, adoption, etc.), including their rates of occurrence, the causes of increase or
decrease in the vital rates, and the expectation of life at birth and at a given age,
are included in vital statistics.
Examples in Biostatistics
• Smoking causes cancer: “But not everyone who smokes gets cancer, and some
people who never smoke get cancer.” The cause and effect chain has an element of
uncertainty in it: studies investigating the effects of smoking need to appropriately
address this uncertainty.
• In designing health facilities, planners need to take into account demographic shifts,
changes in “standard of care,” economics, and new technology. The uncertainties
associated with these variables need to be considered.
Tabulation:
1. It simplifies complex data and the data presented are easily understood.
2. It facilitates comparison of related facts.
3. It facilitates computation of various statistical measures like averages,
dispersion, correlation etc.
4. Tabulated data are good for references and they make it easier to present the
information in the form of graphs and diagrams.
Preparing a Table:
An ideal table should consist of the following main parts:
1. Table number
2. Title of the table
3. Captions or column headings
4. Stubs or row designation
5. Body of the table
6. Footnotes
7. Sources of data
A model structure of a table is given below:
Type of Tables:
Tables may be classified as follows:
1. Simple or one-way table
2. Two-way table
3. Manifold table
Simple or one-way Table:
For example:
The blank table given below may be used to show the number of adults in
different occupations in a locality.
Two-way Table:
Example:
The caption may be further divided in respect of ‘sex’.
Manifold Table:
Example:
Table shown below shows three characteristics namely, occupation, sex and
marital status
FREQUENCY DISTRIBUTION
1. Class limits
2. Class Interval
3. Width or size of the class interval
4. Range
5. Mid-value or mid-point
6. Frequency
7. Number of class intervals
8. Size of the class interval
Class limits:
The class limits are the lowest and the highest values that can be
included in the class.
For example, take the class 30-40. The lowest value of the class is 30 and
the highest value is 40.
Class Interval
The class interval may be defined as the size of each grouping of data.
Range:
The difference between the largest and smallest values of the observations is called the
range and is denoted by ‘R’, i.e.
R = Largest value – Smallest value
R = L – S
Mid-value or mid-point:
The central point of a class interval is called the mid-value or mid-point. It is found
by adding the upper and lower limits of a class and dividing the sum by 2.
For the class 30-40, the mid-point is (30 + 40)/2 = 35.
Thus if the number of observation is 10, then the number of class intervals is
There are three methods of classifying the data according to class intervals
namely,
1. Exclusive method
2. Inclusive method
3. Open-end classes
Exclusive method:
When the class intervals are so fixed that the upper limit of one class is the
lower limit of the next class; it is known as the exclusive method of classification.
The following data are classified on this basis.
Inclusive method:
In this method, the overlapping of the class intervals is avoided. Both the lower and
upper limits are included in the class interval.
It cannot be used with fractional values like age, height, weight etc.
Open-end classes:
A class limit is missing either at the lower end of the first class interval or at
the upper end of the last class interval or both are not specified.
Types of class intervals:
Given that,
Number of college students, N = 50
Highest value, H = 64
Lowest value, L = 32
Range, R = H - L = 64 - 32 = 32
Solution:
Thus the number of class intervals is 7 and the required size of each class is 5.
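The working that leads to 7 classes of size 5 is not shown here. One common choice that reproduces these numbers is Sturges' rule, k = 1 + 3.322 log10(N), with the class size taken as the range divided by k and rounded up. Treating that as the method used is an assumption, and the function names below are hypothetical. A minimal sketch:

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule (assumed here), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_size(value_range, k):
    """Class size: the range divided by the number of classes, rounded up."""
    return math.ceil(value_range / k)

n, highest, lowest = 50, 64, 32       # values from the example
k = sturges_classes(n)                # 1 + 3.322 * log10(50) is about 6.64
size = class_size(highest - lowest, k)
print(k, size)
```

With N = 50 and R = 32 this gives 7 classes of size 5, matching the stated result.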
The required frequency distribution is prepared using tally marks as given below:
Percentage Frequency:
$$\text{Percentage frequency of a class} = \frac{\text{Frequency of a certain class}}{\text{Total frequency}} \times 100$$
Relative Frequency:
Relative frequency is the ratio of the frequency of a certain class to the
total frequency of the frequency distribution.
$$\text{Relative frequency} = \frac{\text{Frequency of a certain class}}{\text{Total frequency}}$$
Class interval   Frequency   Relative frequency
30-35            4           4/44 = 0.09
35-40            10          0.23
40-45            20          0.45
45-50            8           0.18
50-55            2           0.05
Total            44          1.00
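The relative and percentage frequencies for this table can be checked with a short script. This is only an illustrative sketch; the variable names are not from the source.

```python
# Class frequencies from the relative-frequency example
freqs = {"30-35": 4, "35-40": 10, "40-45": 20, "45-50": 8, "50-55": 2}

total = sum(freqs.values())  # total frequency

# Relative frequency = class frequency / total frequency
relative = {cls: f / total for cls, f in freqs.items()}

# Percentage frequency = relative frequency * 100
percentage = {cls: r * 100 for cls, r in relative.items()}

for cls in freqs:
    print(cls, freqs[cls], round(relative[cls], 2), round(percentage[cls], 1))
```

The relative frequencies always sum to 1 and the percentage frequencies to 100, which is a quick check on the arithmetic.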
Cumulative Frequency:
The total frequency of all values less than the upper class boundary of a
given class interval is called the cumulative frequency up to and including that class
interval.
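As a sketch of this definition, the running total of class frequencies gives the "less than" cumulative frequency at each upper class boundary. The frequencies below are reused from the relative-frequency example purely for illustration.

```python
from itertools import accumulate

# Upper class boundaries and frequencies (reused from the earlier example)
upper_boundaries = [35, 40, 45, 50, 55]
freqs = [4, 10, 20, 8, 2]

# Cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(freqs))

for upper, cf in zip(upper_boundaries, cumulative):
    print(f"less than {upper}: {cf}")
```

The last cumulative frequency equals the total frequency, here 44.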
Diagrams:
A diagram is a visual form for presentation of statistical data, highlighting their basic
facts and relationship.
Significance of Diagrams and Graphs:
Diagrams and graphs are extremely useful because of the following reasons:
1. They are attractive and impressive.
2. They make data simple and intelligible.
3. They make comparison possible
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.
Types of diagrams:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
One-dimensional diagrams:
These diagrams are in the form of bar or line charts and can be classified
as:
1. Line Diagram
2. Simple Bar Diagram
3. Multiple Bar Diagram
4. Sub-divided Bar Diagram
5. Percentage Bar Diagram
Line graphs or line diagrams
• Data having some order such as the year-wise immunization
coverage or age-wise incidence of a disease can be represented by
a line diagram.
Bar diagram
• In this diagram, we show the category of the variable on the X-axis and the
frequencies on the Y-axis on a graph paper.
• A bar for each category of the variable is erected and the height of the bar is
proportional to the frequency of that category.
• Since the data is of a qualitative nature (discrete), bars should not be next to each
other and there should be an equal gap between two successive bars.
Simple Bar Diagram:
Adjacent bar diagram
• Bar diagrams can be extended to compare two or more data sets (of a qualitative
nature) with regard to the same variable. The bars of each category of the variable
for the different data sets are drawn adjacent to each other.
• The principles of bar width and the gap between the bars will remain similar to
those in the simple bar diagram.
Multiple Bar Diagram:
Component bar diagram
• The adjacent bar diagram does not convey anything about the
similarity of the relative proportions of each type in the two
centers being compared.
• Before beginning to represent the data by a pie diagram, we should have the frequencies of
different categories at hand.
• For each category find the proportionate degrees of the total of 360° in the circle.
• For example, if a category has a 25% frequency of the total, it should be allotted 25% of 360° or
90°. So a sector of 90° represents this category of the variable.
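The conversion from frequencies to sector angles can be sketched as follows. The category names and counts are hypothetical, chosen so that one category holds 25% of the total.

```python
def sector_angles(frequencies):
    """Angle in degrees for each category: (frequency / total) * 360."""
    total = sum(frequencies.values())
    return {name: f / total * 360 for name, f in frequencies.items()}

# Hypothetical categories; "A" has 25% of the total frequency
example = {"A": 25, "B": 50, "C": 25}
angles = sector_angles(example)
print(angles)
```

The angles of all sectors should add up to 360°, which is worth checking before drawing the diagram.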
Pie Diagram
Draw a Pie diagram for the following data of production of sugar in quintals of
various countries.
Graphs:
4. Ogive
Histogram
• The most popular of all diagrams is the histogram, which is used to depict the
frequency distribution of a quantitative variable.
• The class intervals are shown on the horizontal axis (X-axis) and corresponding
frequencies in the form of vertical bars on the vertical axis (Y-axis).
• The width of each bar need not be the same as it depends on the width of the
class intervals.
Histogram:
Example: Draw a histogram for the following data.
Solution:
[Figure: overview of tabular methods and graphical methods of data presentation]
• The less similar the scores are to each other, the higher the
measure of dispersion will be
i. Range
ii. Mean or average deviation
iii. Standard deviation
iv. Quartile deviation
When the data is in different units, in such a situation we may use the relative
dispersion. A relative dispersion is independent of original units. Generally, relative
measures of dispersion are expressed in terms of ratio, percentage etc.
97 − 63 = 34, so 34 is the RANGE or spread of this set of data.
Let us consider a set of observations $x_1, x_2, x_3, \dots, x_n$, where $X_H$ is the
maximum and $X_L$ is the minimum.
Then Range $= X_H - X_L$.
Example:
Find the range of the set of observations
-7, -2, -4, 0, 8.
Solution:
Here, maximum value, $X_H = 8$ and
minimum value, $X_L = -7$.
Then Range $= X_H - X_L = 8 - (-7) = 15$.
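The same calculation for ungrouped data can be sketched in a few lines. The function name is illustrative only.

```python
def value_range(observations):
    """Range = largest value - smallest value."""
    return max(observations) - min(observations)

data = [-7, -2, -4, 0, 8]  # observations from the example
print(value_range(data))
```

Note that a negative minimum increases the range, since subtracting -7 adds 7.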
Range For Grouped data:
In this case, the range is the difference between the upper boundary of the highest
class and the lower boundary of the lowest class.
Then Range $= X_U - X_L$
Where,
$X_U$ = the upper boundary of the highest class.
$X_L$ = the lower boundary of the lowest class.
Example: Determine the range from the following frequency distribution.
Salary (TK.)     1700-1800  1800-1900  1900-2000  2000-2100  2100-2200
No. of workers   420        460        500        300        200
Solution:
From the given frequency distribution, we have:
The upper boundary of the highest class, $X_U$ = TK. 2200
The lower boundary of the lowest class, $X_L$ = TK. 1700
Then Range $= X_U - X_L$ = TK. 2200 − TK. 1700 = TK. 500
Advantages of Range:
1. It saves time and is widely used in industrial quality control and weather forecasting.
2. Variations in the stock exchange can be studied by the range.
Measures of Dispersion
Mean deviation
Mean deviation or Average deviation:
Definition of Mean deviation:
Mean deviation is the mean of absolute deviations of the items from an average
like mean, median or mode. Normally, we consider the arithmetic mean as the
average.
1. Mean deviation for ungrouped data:
If $x_1, x_2, x_3, \dots, x_n$ is a set of n observations or values, then the mean
deviation is defined as:
i. Mean deviation about the arithmetic mean, $M.D.(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$; $\bar{x}$ = arithmetic mean
ii. Mean deviation about the median, $M.D.(M_e) = \frac{\sum_{i=1}^{n} |x_i - M_e|}{n}$; $M_e$ = median
iii. Mean deviation about the mode, $M.D.(M_o) = \frac{\sum_{i=1}^{n} |x_i - M_o|}{n}$; $M_o$ = mode
The Mean Deviation (cont.)
Example:
The number of patients seen in the emergency room at Ibrahim Memorial Hospital for a
sample of 5 days last year were: 103, 97, 101, 106 and 103.
Determine the mean deviation and interpret.
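A minimal sketch of this calculation, taking the mean deviation about the arithmetic mean (the function name is illustrative):

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

patients = [103, 97, 101, 106, 103]  # patients seen on the 5 sampled days
mean = sum(patients) / len(patients)
md = mean_deviation(patients)
print(mean, md)
```

The mean is 102 and the absolute deviations are 1, 5, 1, 4 and 1, so the mean deviation is 12/5 = 2.4. One reading is that the daily number of patients differs from 102 by about 2.4 on average.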
Example:
Calculate the mean deviation from the following data.
Income (TK.)     0-10  10-20  20-30  30-40  40-50  50-60  60-70
No. of Persons   6     8      10     12     7      4      3
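Solution:
Table for calculation of mean deviation from the mean
Income (TK.)   Class mid-point (x)   No. of persons (f)   fx     |x − x̄|   f|x − x̄|
0-10           5                     6                    30     26        156
10-20          15                    8                    120    16        128
20-30          25                    10                   250    6         60
30-40          35                    12                   420    4         48
40-50          45                    7                    315    14        98
50-60          55                    4                    220    24        96
60-70          65                    3                    195    34        102
Total                                Σf = 50              Σfx = 1550       Σf|x − x̄| = 688
Mean, $\bar{x} = \frac{\sum f x}{\sum f} = \frac{1550}{50} = 31$
Mean deviation about the mean $= \frac{\sum f |x - \bar{x}|}{\sum f} = \frac{688}{50} = 13.76$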
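The grouped calculation in this table can be reproduced with a short sketch that uses the class mid-points as representative values. The function name is illustrative only.

```python
def grouped_mean_deviation(midpoints, freqs):
    """Return (mean, mean deviation about the mean) for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n
    return mean, md

midpoints = [5, 15, 25, 35, 45, 55, 65]  # class mid-points (x)
freqs = [6, 8, 10, 12, 7, 4, 3]          # number of persons (f)
mean, md = grouped_mean_deviation(midpoints, freqs)
print(mean, md)
```

This gives a mean of 1550/50 = 31 and a mean deviation of 688/50 = 13.76, in line with the column totals.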
Advantages of Mean Deviation:
90 - 100 3
100 – 110 5
110 – 120 7
120 – 130 10
130 – 140 15
140 – 150 11
150 – 160 9
160 – 170 6
170 – 180 2
Total 68
Measures of Dispersion
Standard deviation
Standard Deviation:
The standard deviation is the most important measure of dispersion.
Definition:
The standard deviation is the positive square root of the mean of the squared deviations
of a set of observations from their mean. It can be written as
$$\text{Standard deviation, } \sigma = \sqrt{\frac{\text{Sum of squares of deviations from the mean}}{\text{Total no. of observations}}}$$
Standard Deviation s
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
n=8 Mean = X = 16
1. It is rigidly defined.
2. It is less affected by sampling fluctuations.
3. It is useful for calculating the skewness, kurtosis,
coefficient of correlation, coefficient of variation and so on.
4. It measures the consistency of data.
Disadvantages of standard deviation
1. It is not so easy to compute.
2. It is affected by extreme values.
90 - 100 3
100 – 110 5
110 – 120 7
120 – 130 10
130 – 140 15
140 – 150 11
150 – 160 9
160 – 170 6
170 – 180 2
Total 68
Measures of Dispersion
Variance
The Variance
• Average (approximately) of squared deviations of values from the mean
• Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
Variance: The variance of a set of observations is the average of the
squares of the deviations of the observations from their mean. In
symbols, the variance of the n observations x1, x2,…xn is
• One of the definitions of the mean was that it always made the sum of the scores minus the mean equal
to 0
• Thus, the average of the deviates must be 0 since the sum of the deviates must equal 0
• To avoid this problem, statisticians square the deviate score prior to averaging them
• Squaring the deviate score makes all the squared scores positive
What Does the Variance Formula Mean?
• The larger the variance is, the more the scores deviate, on average, away from the mean
• The smaller the variance is, the less the scores deviate, on average, from the mean
Relative Measures of Dispersion
Example:
Calculate the coefficient of range from the following frequency distribution.
Age (Year)   10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency    4      7      15     18     16     12     8
Coefficient of Variation (CV):
Coefficient of variation is the most commonly used relative measure of
dispersion. It is 100 times the ratio of the standard deviation to the arithmetic mean. It is
denoted by C.V. and written as:
$$C.V. = \frac{\sigma}{\bar{x}} \times 100; \quad \bar{x} \neq 0$$
The Coefficient of Variation
• 100 times the coefficient of dispersion based upon the standard deviation is called the
coefficient of variation (C.V.).
• C.V. is the percentage variation in the mean, standard deviation being considered as
the total variation in the mean.
• For comparing the variability of two series, we calculate the C.V. for each series.
• The series having the greater C.V. is said to be more variable than the other, and the series
having the lesser C.V. is said to be more consistent (or homogeneous) than the other.
Example
Sum of squared deviations from the mean
= (1−27.875)² + (5−27.875)² + (6−27.875)² + (8−27.875)² + (10−27.875)² + (40−27.875)²
+ (65−27.875)² + (88−27.875)² = 7578.875
Variance:
σ² = 7578.875 / (8 − 1) = 1082.696
Standard deviation:
σ = √1082.696 = 32.904
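The figures in this example can be checked with Python's standard `statistics` module. The value 1082.696 corresponds to dividing the sum of squared deviations by n − 1 = 7, so the sample versions (`variance`, `stdev`) are used in this sketch.

```python
import statistics

data = [1, 5, 6, 8, 10, 40, 65, 88]  # the 8 values from the example

mean = statistics.mean(data)
sum_sq = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
sample_var = statistics.variance(data)        # divides by n - 1
sample_sd = statistics.stdev(data)            # square root of sample variance

print(mean, sum_sq, round(sample_var, 3), round(sample_sd, 3))
```

Dividing by n instead (`statistics.pvariance`) would give a smaller value, 7578.875/8, so it matters which definition a worked example is using.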
• Solution:
(i) Wage bill for section A = 40 x 450 = 18000
Wage bill for section B = 65 x 350 = 22750
Section B is larger in wage bill.
Summary Characteristics
The more the data are spread out, the greater the range,
variance, and standard deviation.
The less the data are spread out, the smaller the range,
variance, and standard deviation.
If the values are all the same (no variation), all these
measures will be zero.
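Cricketers-B 15 25 18 30 11 4 23 21 31 22
For cricketer A (scores x):
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{71328}{10} - \left(\frac{580}{10}\right)^2} = 61.39$$
For cricketer B (scores y):
$$\sigma = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{4626}{10} - \left(\frac{200}{10}\right)^2} = 7.91$$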
Comment:
From the above result, we see that C.V. (A) = 105.84% and C.V. (B) = 39.56%.
Since C.V. (A) > C.V. (B), cricketer B is the more consistent.
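As a sketch, the coefficient of variation for cricketer B can be recomputed from the listed scores. The population standard deviation (`statistics.pstdev`, dividing by n) is used because that matches the formula in this example; the scores for cricketer A are not listed here, so only B is checked.

```python
import statistics

def coefficient_of_variation(values):
    """C.V. = (population standard deviation / mean) * 100."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

scores_b = [15, 25, 18, 30, 11, 4, 23, 21, 31, 22]  # Cricketers-B
sd_b = statistics.pstdev(scores_b)
cv_b = coefficient_of_variation(scores_b)
print(round(sd_b, 2), round(cv_b, 2))
```

The result agrees with the stated values of 7.91 for the standard deviation and 39.56% for the C.V.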
Quartile Measures
Q1 Q2 Q3
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75%
are larger
Q2 is the same as the median (50% of the observations are smaller and 50% are larger)
Only 25% of the observations are greater than the third quartile
Quartile Measures: Locating Quartiles
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data
so use the value half way between the 2nd and 3rd values,
so Q1 = 12.5
Q1 and Q3 are measures of non-central location
Q2 = median, is a measure of central tendency
Example
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = (12+13)/2 = 12.5
• If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.) then
average the two corresponding data values.
The first quartile (Q1) is the first 25% of the data. The
second quartile (Q2) is between the 25th and 50th
percentage points in the data. The upper bound of Q2 is
the median. The third quartile (Q3) is the 25% of the data
lying between the median and the 75% cut point in the
data.
• Lower quartile, $Q_1 = L_{q1} + \frac{W}{f}\left(\frac{n}{4} - CF\right)$
• Where
• we calculate n/4 = 68/4 = 17
• Lower quartile class interval = 120 – 130 mmHg
• Lq1 = the lower limit of the lower quartile class = 120
• CF = the cumulative frequency up to the lower quartile class = 15
• f = the frequency of the lower quartile class = 10
• W = the width of the lower quartile class = 10
• Lower quartile Q1 = 120 + [(10/10) × (17 − 15)] = 120 + 2 = 122 mmHg
Solution
• Upper quartile, $Q_3 = L_{q3} + \frac{W}{f}\left(\frac{3n}{4} - CF\right)$
• Where
• we calculate 3n/4 = 3 × 68/4 = 3 × 17 = 51
• Upper quartile class interval = 140 – 150 mmHg
• Lq3 = the lower limit of the upper quartile class = 140
• CF = the cumulative frequency up to the upper quartile class = 40
• f = the frequency of the upper quartile class = 11
• W = the width of the upper quartile class = 10
• Upper quartile Q3 = 140 + [(10/11) × (51 − 40)] = 140 + 10 = 150 mmHg
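The two quartile calculations follow the same interpolation steps, which can be sketched as one helper applied to the blood-pressure frequency table (classes 90-100 to 170-180, total 68). The function name and its arguments are illustrative, not taken from the source.

```python
def grouped_quantile(lower_limits, width, freqs, position):
    """Interpolate the value at a cumulative position (e.g. n/4 or 3n/4)."""
    cumulative = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            # L + (position - CF) / f * W
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds the total frequency")

lower_limits = list(range(90, 180, 10))   # 90, 100, ..., 170
freqs = [3, 5, 7, 10, 15, 11, 9, 6, 2]
n = sum(freqs)

q1 = grouped_quantile(lower_limits, 10, freqs, n / 4)
q3 = grouped_quantile(lower_limits, 10, freqs, 3 * n / 4)
print(n, q1, q3, q3 - q1)
```

This gives Q1 = 122 mmHg and Q3 = 150 mmHg, so the interquartile range for these data is 28 mmHg.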
Deciles and Percentiles
• The IQR is a measure of variability that is not influenced by outliers or extreme values
• Measures like Q1, Q3, and IQR that are not influenced by outliers are called resistant
measures
Calculating The Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(25% of the observations lie in each of the four intervals between these values.)
Interquartile range
= 57 – 30 = 27
The Boxplot or Box and Whisker Diagram
• If data are symmetric around the median then the box and central line are
centered between the endpoints
The term $(Q_3 - Q_1)$ is known as the interquartile range, and the quartile deviation $\frac{Q_3 - Q_1}{2}$
is also known as the semi-interquartile range.
Example:
The Automobile Association checks the prices of gasoline before many holiday weekends.
Listed below are the self-service prices for a sample of 8 retail outlets during the May
2004 Memorial Day Weekend.
40, 22, 60, 30, 45, 66, 70, 55
Determine the quartile deviation, Interquartile range and semi-interquartile range.
Example:
Find the quartile deviation, interquartile range and semi-interquartile range from
the following frequency distribution:
Class       Less than 10  10-15  15-20  20-25  25-30  More than 30
Frequency   2             6      7      10     3      1
So, Quartile Deviation,
$$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{23.375 - 14.375}{2} = 4.5$$
Interquartile range,
IQR = $(Q_3 - Q_1)$
= 23.375 − 14.375 = 9
• Arithmetic Mean
• Median
• Mode
• Geometric Mean
• Harmonic Mean
Measures of Central Tendency
Summarizing Data
Give you one score or measure that represents, or is typical of, an entire group of
scores:
• The Mean
• The Median
• The Mode
Most scores tend to center toward
a point in the distribution.
[Figure: distribution of scores; horizontal axis: score, vertical axis: frequency]
Central Tendency
[Figure: a set of raw scores summarized in two ways: frequency tables and graphs (tabulating, graphing), and measures of central tendency (averaging): the mean, the median, the mode]
The Mean
Methods of Center Measurement
Center measurement is a summary measure of the overall level of a dataset
Mean: Sum all the observations and divide by the number of observations. The mean
of 20, 30, 40 is (20+30+40)/3 = 30.
Definition: For ungrouped data, the population mean is
the sum of all the population values divided by the total
number of population values. To compute the
population mean, use the following formula.
$$\mu = \frac{\sum X}{N}$$
where $\mu$ is the population mean, $\sum$ (sigma) denotes the sum over all population values X, and N is the population size.
THE SAMPLE MEAN
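Definition: For ungrouped data, the sample mean is
the sum of all the sample values divided by the
number of sample values. To compute the sample
mean, use the following formula.
$$\bar{X} = \frac{\sum X}{n}$$
where $\bar{X}$ is the sample mean, $\sum$ (sigma) denotes the sum over all sample values X, and n is the sample size.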
Measures of Central Tendency
as Inferential Statistics
• It is rigidly defined
Median
Median: The middle value in an ordered sequence of observations. That is, to find the median we need to
order the data set and then find the middle value.
In case of an even number of observations the average of the two middle most values is the median.
For example,
For example,
Then the median is the average of the two middle values from the sorted sequence,
• It is rigidly defined
• It is easily understood and is easy to calculate. In some cases it can be located
merely by inspection
• It is not at all affected by extreme values
• It can be calculated for distributions with open ended classes
Demerits of Median
• In the case of an even number of observations the median cannot be determined exactly. We merely estimate
it by taking the mean of the two middle terms.
• It is not based on all the observations, for example, the median of 10, 25, 50, 60 and 65 is 50. We
can replace the observations 10 and 25 by any two values which are smaller than 50 and the
observations 60 and 65 by any two values greater than 50 without affecting the value of median.
• For grouped data, Median $= L + \frac{\frac{N}{2} - C.F.}{f} \times h$ (N = total frequency)
• Where,
• L = Lower limit of the class interval where the median occurs
• f = Frequency of the class where median occurs
• h = Width of the median class
• C.F. = Cumulative frequency of the class preceding the median class
Mean or Median
• The median is less sensitive to outliers (extreme scores) than the mean and thus a better measure
than the mean for highly skewed distributions, e.g. family income.
• For example
If we have given the observations 20, 30, 40, and 990.
Mean?
Median?
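The two questions can be answered with Python's standard `statistics` module. This short sketch shows how the single extreme value 990 pulls the mean far above the typical observation while the median stays near the middle of the data.

```python
import statistics

observations = [20, 30, 40, 990]

mean = statistics.mean(observations)      # (20 + 30 + 40 + 990) / 4
median = statistics.median(observations)  # average of the two middle values

print(mean, median)
```

The mean is 270, larger than three of the four observations, whereas the median is (30 + 40)/2 = 35.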
63 73 84 86 88 95 97 97 100
Sounds like
MEDIUM
Think middle when you hear median.
A Hint for remembering the MODE…
90 - 100 11 95 1045 11
Total 200
25280
Solution
• Mean $= \frac{\sum f x}{\sum f}$
= 25280/200 = 126.4
Solution
• Median $= L + \frac{\frac{N}{2} - C.F.}{f} \times h$
• Where, N/2 = 200/2 = 100
• L = 120
• f = 38
• h = 10
• C.F. = 74
• Median = 120 + [(100 − 74)/38] × 10 = 120 + (260/38) = 120 + 6.84 = 126.84
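Solution
• Mode $= L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$, where L is the lower limit of the modal class, $f_1$ its frequency, $f_0$ and $f_2$ the frequencies of the preceding and following classes, and h the class width
= 130 + [(43 − 38)/(2 × 43 − 38 − 28)] × 10
= 130 + [5/20] × 10
= 130 + 2.5 = 132.5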
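The arithmetic of this mode calculation can be sketched as a small function. The parameter names are illustrative; the inputs are the values that appear in the solution (modal class starting at 130 with frequency 43, neighbouring frequencies 38 and 28, class width 10).

```python
def grouped_mode(lower, f_modal, f_prev, f_next, width):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for grouped data."""
    return lower + (f_modal - f_prev) / (2 * f_modal - f_prev - f_next) * width

mode = grouped_mode(lower=130, f_modal=43, f_prev=38, f_next=28, width=10)
print(mode)
```

The denominator is 2 × 43 − 38 − 28 = 20, so the mode is 130 + (5/20) × 10 = 132.5.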
Geometric Mean
• This measure of central tendency is called the geometric mean
(GM). It is defined as the antilog of the arithmetic mean of the values
taken on a log scale.
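A minimal sketch of the geometric mean computed through logarithms. The data values are hypothetical, chosen so the result is easy to check by hand; the values must all be positive for the logarithm to exist.

```python
import math

def geometric_mean(values):
    """Antilog of the arithmetic mean of the base-10 logarithms."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

titres = [1, 10, 100]        # hypothetical positive values
gm = geometric_mean(titres)
am = sum(titres) / len(titres)
print(gm, am)
```

The logs are 0, 1 and 2, whose mean is 1, so the GM is 10, while the arithmetic mean of the same values is 37. Python 3.8 and later also provide `statistics.geometric_mean` for the same purpose.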