Introduction:
DEFINITION
Statistics has been defined in different ways by different statisticians: some define it as
numerical data, whereas others define it as a body of statistical methods.
“The classified facts respecting the condition of the people in a state, especially
those facts which can be stated in numbers or in tables of numbers or in any
classified arrangement”. (Webster)
There are many definitions of statistics as applied to medicine and public health.
Statistical methods are useful for various aspects of data collection, such as
designing the pro forma, deciding on the sample size, and choosing a sampling
method that can be used effectively.
Statistical methods are useful for the best possible presentation of data; different
types of tables, frequency distribution tables, diagrams and graphs can be
used.
Statistics is also useful for the further analysis and interpretation of medical
data.
Calculation of averages, variation, correlation and regression and tests of
significance are some of the methods of interpretation.
Using hospital data, it is possible to calculate various hospital statistics that are
useful for the management, administration and planning of the hospital.
Statistics are also used for public health administration and health
planning. This is done by using various health data.
Uses of Statistics:
1) Statistics helps in providing a better understanding and exact description of
a phenomenon of nature.
7) Statistics also provides tools for prediction and forecasting using data and
statistical models.
The methods of data collection, the reliability and validity of data and the
types of variable used are important parameters for statistical studies. Data are
collected by direct observation or measurement and from responses to questions.
E.g.: the height, weight and blood pressure of children are directly
measured, and data on preference for coffee or tea are collected by asking
questions.
Reliability of data depends on the precision of the methods used for data
collection, so that all persons using the same procedure can expect to get
the same or approximately the same results. E.g.: the method of measuring the height and
weight of students.
Validity of data depends upon the use of appropriate methods for data
collection. For example, if a biased scale is used, repeated measurements will give the same
results, but they are not valid because the same error is repeated in every
measurement.
When data are collected first-hand by the investigator, for example in a personal experimental
study, they are called primary data. Primary statistical data can be collected by two
methods:
1) Census Method
2) Sampling Method.
1) CENSUS METHOD
In this method the data are collected from all the individual items that are connected
with the inquiry. E.g.: To study the level of income of the parents of all the 500
students studying in a school to determine their economic status, we have to make use
of all the 500 observations.
2) SAMPLING METHOD
Sampling is the selection of a few units from all the observational units in a
population, and a sample is a portion of the population selected to study the
characteristics of the whole population. E.g.: Taking the weight of 25 fishes out of 100
fishes in an aquarium. The weights of the 25 fishes constitute the sample, while the
weights of all 100 fishes are the population. Thus, a sample should be a true
representative of the whole population.
Types of Sample:
There are two types of sample:
1) Qualitative Samples: E.g. The grasses are taller in one grassland than the
other.
The total number of items which are used in the study to get significant results
is termed sample size. To select the proper sample size is very important. The
sample size should not be very large or very small because the conclusions are
directly affected by the sample size.
7) Surveys: The term ‘health survey’ is used for surveys relating to any
aspect of health: morbidity, mortality, nutritional status etc. It includes
health interviews, health examinations, study of health records and mailed
questionnaires.
TYPES OF DATA:
ii) Secondary Data: These are data obtained from an outside source.
If we are studying hospital records and wish to use census data,
then the census data become secondary data.
To avoid the problems mentioned above, we use the tool known as a rate.
Types of Rates:
1) Crude Rates
2) Specific Rates
3) Standardized Rates
4) Other Rates
1) Crude Rates: These rates are called crude because the denominator is the
entire mid-year population and no specific information is included in the
numerator.
e.g. Crude Birth Rate
2) Specific Rates: These are the actual observed rates due to specific causes,
age groups, time periods etc.
e.g. Specific Cancer Death Rate.
3) Standardized Rates: To make two populations and their rates comparable,
adjustment for age, sex, religion etc. can be made by either the direct or the
indirect method. Thus we can get age- or sex-standardized rates.
4) Other Rates: Where the time period is not a year, these rates can
be converted to annual rates by using an appropriate multiplier.
a) Quinquennial Rate:
\[ \text{Quinquennial rate} = \frac{\text{No. of vital events in the population during the period of five years}}{\text{Population estimated at the middle of the five years}} \times 10{,}000 \]
To convert to an annual rate, multiply by 1/5.
b) Decennial Rate:
\[ \text{Decennial rate} = \frac{\text{No. of vital events in the population during the period of one decade (10 years)}}{\text{Population estimated at the middle of the ten years}} \times 1000 \]
To convert to an annual rate, multiply by 1/10.
c) Weekly Rate:
\[ \text{Weekly rate} = \frac{\text{No. of vital events occurring in one week}}{\text{Total mid-year estimated population}} \times 100 \]
To convert to an annual rate, multiply by 52.
RATIO:
A ratio is the relation between two unrelated quantities, e.g. if in a basket there are
30 mangoes and 20 apples, we can say that the ratio of mangoes to apples is 3/2 or 3:2.
Similarly, we can get the ratio of WBCs to RBCs, serum cholesterol to serum calcium
etc. Here the numerator is not a part of the denominator. A ratio does not require a time
frame or a multiplier. Sex ratio is one example of a ratio. It is defined as the number of
females per 1000 males. In 2001, it was 933:1000.
Dependency Ratio:
\[ \text{Dependency ratio} = \frac{(\text{Population aged 0 to 14 yrs}) + (\text{Population aged 65 yrs and above})}{\text{Population aged 15 to 64 yrs}} \times 100 \]
Persons aged 65 years and above and persons below 15 years of age are considered to
be dependent on the economically active age group of 15 to 64 years.
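The two ratios described above can be computed as in the following minimal sketch. The function names and the population figures are hypothetical and chosen only for illustration; only the 933 females per 1000 males figure comes from the text.

```python
def dependency_ratio(pop_0_14, pop_65_plus, pop_15_64):
    """Dependents (0-14 and 65+) per 100 persons aged 15 to 64 years."""
    return (pop_0_14 + pop_65_plus) / pop_15_64 * 100


def sex_ratio(females, males):
    """Number of females per 1000 males."""
    return females * 1000 / males


# Hypothetical population: 30,000 children, 10,000 elderly, 60,000 aged 15-64
ratio = dependency_ratio(30_000, 10_000, 60_000)
print(round(ratio, 1))        # dependents per 100 persons of working age
print(sex_ratio(933, 1000))   # the sex ratio quoted in the text
```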
PROPORTION:
A proportion is a ratio in which the numerator is always included in the denominator. It is
usually expressed as a percentage.
E.g.:
\[ \text{Percentage of marks} = \frac{\text{Marks obtained}}{\text{Total marks}} \times 100 \]
Note:
CLASSIFICATION OF DATA
The process of arranging the primary data in a definite pattern and presenting
it in a systematic way is known as classification of data.
To simplify the complexities of raw data and make the data easily
accessible and easily understandable.
To make data attractive so as to leave a lasting impression.
To facilitate quick comparison and easy study.
To present the data in condensed form by the summation of items.
To ensure detection of errors and omission in data
To ensure proper use of collected data
To simplify the complexities of raw data and to draw statistical inference.
To help drafting the final report.
Tally Mark:
Tally marks are small vertical bars which are used in a frequency table to
represent the number of times a particular event has appeared in the collected data.
If a particular value has occurred four times in a class, we put four tally
marks (llll); for the fifth occurrence we put a cross stroke through the four marks to make a
block of 5. When the value occurs for the sixth time, we put another tally mark, leaving some
space after the first block of 5.
E.g. 22, 21, 23, 22, 21, 25, 26, 27, 25, 25, 26, 31, 33, 35, 31, 33, 44, 42, 44, 46,
49, 49, 50, 10, 12, 15, 16, 11, 14, 18, 55, 55, 58, 59, 60, 61, 66, 69, 62, 63,
70, 77, 71, 71, 78, 75, 74, 73, 80, 82, 83, 80, 82, 86, 87, 90, 91, 93, 96, 98.
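A frequency table for the 60 values above can be built as in the sketch below. The class width of 10 (classes 10-19, 20-29, and so on) is an assumption made for illustration, since the notes do not state the class intervals, and the tally is printed in blocks of five strokes rather than with a cross stroke.

```python
from collections import Counter

values = [
    22, 21, 23, 22, 21, 25, 26, 27, 25, 25, 26, 31, 33, 35, 31, 33, 44, 42, 44, 46,
    49, 49, 50, 10, 12, 15, 16, 11, 14, 18, 55, 55, 58, 59, 60, 61, 66, 69, 62, 63,
    70, 77, 71, 71, 78, 75, 74, 73, 80, 82, 83, 80, 82, 86, 87, 90, 91, 93, 96, 98,
]
CLASS_WIDTH = 10  # assumed class width, not given in the notes


def class_label(value):
    """Return the class interval (e.g. '20-29') that contains the value."""
    lower = (value // CLASS_WIDTH) * CLASS_WIDTH
    return f"{lower}-{lower + CLASS_WIDTH - 1}"


def tally(count):
    """Render a count as tally strokes grouped in blocks of five."""
    blocks, rest = divmod(count, 5)
    return " ".join(["|||||"] * blocks + (["|" * rest] if rest else []))


frequency = Counter(class_label(v) for v in values)
for label in sorted(frequency, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6}  {tally(frequency[label]):<18} {frequency[label]}")
```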
Frequency:
TYPES OF TABLE
a) Simple Table
b) Cross Table
c) Table with Multi variables
d) Tables prepared only for Statistical analysis
1) Simple Table
A simple table has a table no., a table title, a stub (heading) and a caption
(sub heading), with some variables and frequencies.
Taluk            G.E Cases
                 No. of cases      %
Villianur             39          43.2
Bhoor                 23          25.6
Karaikal              14          15.6
Mahe                   9          10.0
Yanam                  5           5.6
Total                 90         100.0
2) Cross Table
When 2 or more variables are recorded simultaneously, the data are recorded
in rows and columns according to the corresponding frequency in each category.
When more than two variables or attributes are analysed, the sets of characters
and their values are presented as a multi-way table.
Health checkup by age group
Age Group        Nil            Once in a month   Once in three months   Once in six months   Total
Below 3 months   31 (79.6%)     1 (2.5%)          7 (17.9%)              --                   39 (100%)
3-6 months       52 (91.3%)     --                5 (8.7%)               --                   57 (100%)
6-9 months       29 (85.3%)     1 (2.9%)          4 (11.8%)              --                   34 (100%)
9-12 months      47 (38.8%)     2 (1.7%)          72 (59.5%)             --                   121 (100%)
1-3 years        230 (45.8%)    --                270 (53.8%)            2 (0.4%)             502 (100%)
E.g.: If we want to find out the association of infants and toddlers with health
checkup, we can prepare a statistical table for the purpose of applying a statistical
test.
Children      Health Checkup
              - (Minus)    + (Plus)    Total
Infants          159           92        251
Toddlers         499          588       1087
Total            658          680       1338
The table prepared above is suitable for finding out the association between health checkup
and infants and toddlers. It is a 2x2 contingency table, specially prepared
for a statistical test, and it need not be displayed in the report. Stating the statistical
inference in the discussion, or as a footnote to the multi-variable table, does the job.
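The notes do not name the statistical test applied to this table. As one possible illustration, the sketch below assumes a chi-square test of association without continuity correction; the function name is hypothetical, and 3.841 is the standard 5% critical value of chi-square with 1 degree of freedom.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts from the table: infants (159 minus, 92 plus), toddlers (499 minus, 588 plus)
chi2 = chi_square_2x2(159, 92, 499, 588)
CRITICAL_5_PERCENT_1_DF = 3.841  # chi-square critical value, 1 d.f., 5% level

print(round(chi2, 2))
print("association at 5% level:", chi2 > CRITICAL_5_PERCENT_1_DF)
```

A statistic above the critical value would suggest that health checkup status is associated with the age group, under the assumptions of the chi-square test.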
GRAPHIC AND DIAGRAMMATIC PRESENTATION OF DATA
Introduction:
The columns and rows in a table cause eye strain, and there are chances of a poor
visual impression of data presented in tabular form. In such circumstances data can be
presented in the form of a graph, picture, diagram or figure, which will help in good
comparison through good visual impression. Hence graphs and diagrams are of
utmost importance in creating interest in the observational data.
The presentation of data by diagram proves a very considerable aid and has
much to commend it if certain basic principles are not forgotten. The main objective of a
diagram is to help the eye grasp a series of numbers and the meaning of a
series of data, and also to assist the intelligence.
Graphic presentation:
Definition: Graphic presentation is the method of presenting statistical data in
the form of curves on a graph paper. It gives a visual effect and is prepared from
frequency distribution table. It provides an immediate overview of the values of
different variables in a simple, clear and comprehensive manner.
1) It is time consuming.
2) Finer details may be lost.
3) It depicts only approximate values.
1) Line Diagram
2) Histogram
3) Frequency Polygon
4) Frequency Curve
5) Cumulative Frequency Curve
6) Scatter Diagram
Line Diagram:
This is a simple type of diagram which is useful in studying the changes in the
values of a variable with the passage of time. When such corresponding values
are joined by a line, it constitutes a line diagram.
In a line diagram, if more than one set of observations is plotted, the lines are drawn in
different styles to show a comparative picture.
[Figure: line diagram. X-axis: Time in Hours; Y-axis: No. of Cases]
Histogram:
The class intervals are given along the horizontal axis (abscissa) and the
frequencies along the vertical axis (ordinate). The area of each bar (block or
rectangle) is proportional to the frequency. The width of the bar represents the interval
of each category. It is an area diagram.
If class intervals are uniform then height of the rectangle will show the
frequency but if class intervals are not uniform, then area of the rectangle will
represent the frequency.
Systolic Blood Pressure    No. of Adult Males
101-110                    20
111-120                    30
121-130                    40
131-140                    30
141-150                    20
151-160                    10
[Figure: histogram, "Systolic Blood Pressure in Adult males". X-axis: Range of Systolic BP; Y-axis: No. of Adult Males]
Frequency Polygon:
[Figure: frequency polygon. X-axis: Range of Systolic BP; Y-axis: No. of Adult Males]
Frequency Curve:
As the number of observations becomes very large and the class intervals become very
small, the frequency polygon loses its angulations and gives rise to a smooth
curve known as a frequency curve. Such frequency curves are often encountered when
we study the distribution of most of the biological variables.
Here the relative frequency of the variable can be obtained from the curve.
[Figure: frequency curve, "Systolic Blood Pressure in Adult males". X-axis: Range of Systolic BP; Y-axis: No. of Adult Males]
E.g.: The cumulative frequency curve for the age groups of albino rats shows an important
feature, i.e. the curve is always in ascending order. Such curves are called ogives.
Frequency Distribution of Weight (in pounds) of individuals
Class Interval (Weight in pounds)    Frequency    Cumulative Frequency
151-155                                  8               8
156-160                                  7              15
161-165                                 15              30
166-170                                  9              39
171-175                                  9              48
176-180                                  2              50
Total                                   50
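The cumulative frequency column of the table above is a running total of the frequency column. A minimal sketch, using only the figures from the table (the variable names are illustrative):

```python
from itertools import accumulate

classes = ["151-155", "156-160", "161-165", "166-170", "171-175", "176-180"]
frequencies = [8, 7, 15, 9, 9, 2]

# Each cumulative frequency is the sum of all frequencies up to and including that class
cumulative = list(accumulate(frequencies))

for interval, f, cf in zip(classes, frequencies, cumulative):
    print(f"{interval}  {f:>3}  {cf:>3}")
```

Plotting the cumulative frequencies against the class limits gives the ogive; the last cumulative frequency always equals the total number of observations.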
[Figure: cumulative frequency curve (ogive), "Cumulative Frequency Distribution of Weights in Pounds". X-axis: Weight in Pounds]
Height of children    Weight of the children
100                   21
102                   26
104                   31
106                   38
108                   45
110                   52
112                   57
114                   61
[Figure: scatter diagram. X-axis: Height of Children; Y-axis: Weight of the children]
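The strength of the linear relationship seen in a scatter diagram can be summarised by a correlation coefficient. The sketch below assumes Pearson's product-moment coefficient, which the notes do not spell out at this point, and applies it to the height and weight values tabulated above.

```python
from math import sqrt

heights = [100, 102, 104, 106, 108, 110, 112, 114]
weights = [21, 26, 31, 38, 45, 52, 57, 61]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(heights, weights)
print(round(r, 3))  # close to +1: weight rises almost linearly with height
```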
Diagrammatic Presentation of Data:
Types of Diagram:
1) BAR DIAGRAM:
A bar diagram consists of equally spaced vertical rectangular bars of equal width
placed on a common horizontal base line. The heights of the rectangles are
proportional to the frequencies. The vertical bars substitute the straight lines. The bar
diagram is used with discrete or discontinuous qualitative variables. It provides a
visual comparison of figures. The vertical bars are used for time comparison.
[Figure: simple bar diagram. X-axis: Years; Y-axis: No. of Cases]
For example:
Old & New OPD cases, Special Clinic cases and In-Patients for the
years 2007, 2008 and 2009 are represented by side-by-side bars,
differentiated by shading, to show the number of cases in the respective categories.
e.g.:
[Figure: multiple bar diagram for 2007, 2008 and 2009. X-axis: Name of the Categories (Old & New OPD Cases, Special Clinics, In-Patient); Y-axis: No. of Cases]
The bars may be divided into two or more parts, each part representing a
certain item and proportional to the magnitude of that item. It is also
advisable to make one bar 100 percent and to give each subcategory its proportion
within the bar.
[Figure: component bar diagram showing In-patient and Out-patient by % for 1998, 1999 and 2000. X-axis: Years; Y-axis: Percentage of distribution]
Cause of Mortality in Old age in percentage
Non-Specified 35
CHD 22
Diabetes 23
Cancers 10
Accidents 10
[Figure: percentage component bar, "Cause of Mortality in Old age %", with segments for Accidents, Cancers, Diabetes, CHD and Non-Specified]
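Angle calculation:
\[ \text{Villupuram District: } \frac{169962}{407132} \times 360^\circ \approx 150^\circ \]
and so on.
Percentage calculation:
\[ \frac{73793}{407132} \times 100 = 18.125\% \]
and so on.
If percentage values are given:
\[ \frac{18.125}{100} \times 407132 = 73793 \]
(or), if the angle is given:
\[ \frac{65.25^\circ}{360^\circ} \times 100 = 18.125 \]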
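The conversions used for a pie diagram can be sketched as follows. The function names are illustrative, and the association of the figure 73,793 with Puducherry State is inferred from its 18% share in the chart; the counts themselves are the ones used in the calculations above.

```python
TOTAL_CASES = 407132


def pie_angle(count, total):
    """Angle of the sector, in degrees, for a category count."""
    return count / total * 360


def percentage(count, total):
    """Share of the category in the total, in percent."""
    return count / total * 100


villupuram_angle = pie_angle(169962, TOTAL_CASES)
puducherry_percent = percentage(73793, TOTAL_CASES)
puducherry_angle = pie_angle(73793, TOTAL_CASES)

print(round(villupuram_angle, 1), round(puducherry_percent, 3), round(puducherry_angle, 2))
```

The angles of all sectors must add up to 360 degrees, just as the percentages add up to 100.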
[Figure: pie diagram, "Geographical Distribution". Villupuram District 42%; Puducherry State 18%; Other Dist. (TN) 14%; Cuddalore District 13%; Thiruvannamalai District 12%; AP 1%; Kerala 0%; Karnataka 0%; Other State 0%]
Pictogram:
Pictograms are a popular method of presenting data to the layman and to
those who cannot understand orthodox charts. These are a form of bar diagram.
MEAN
1) Arithmetic Mean
2) Geometric Mean
3) Harmonic Mean
\[ \text{A.M.} = \bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum x}{n} \]
X1, X2, ... etc. are the different values of the variable X.
n is the number of observations of X.
The symbol ∑ is the Greek letter sigma. It denotes a sum, i.e. ∑x is the sum of all values of X.
This is also known as the Direct Method of calculating the arithmetic mean. It is useful only
when the number of items in the series is small and the values are small.
E.g.: Find the arithmetic mean of the marks obtained by 10 students of a class in
mathematics in certain examination. The marks obtained are:
25, 30, 21, 55, 47, 10, 15, 17, 45, 35.
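The direct method for this example amounts to adding the marks and dividing by the number of students, as in this short sketch:

```python
marks = [25, 30, 21, 55, 47, 10, 15, 17, 45, 35]

total = sum(marks)              # sum of all values, i.e. the sigma of x
n = len(marks)                  # number of observations
arithmetic_mean = total / n

print(total, n, arithmetic_mean)
```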
Formula:
\[ \bar{X} = A + \frac{\sum d}{n} \]
Where, $\bar{X}$ (x-bar) is the actual arithmetic mean
A is the assumed arithmetic mean
d is the deviation of items from the assumed mean, i.e. d = (X - A)
∑d is the sum of deviations from the assumed mean
n is the total number of observations.
E.g.1: The table given below shows the number of colonies of bacteria grown on ten
agar plates. Calculate the arithmetic mean by using short cut method.
Plate No.          1    2    3    4    5    6    7    8    9   10
No. of Colonies   60   70   80   95  100  110  115  130  140  160
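The short cut method for this example can be sketched as below. The assumed mean A = 100 is a choice made here for illustration; any convenient value gives the same final answer.

```python
colonies = [60, 70, 80, 95, 100, 110, 115, 130, 140, 160]
A = 100  # assumed mean, chosen for illustration

deviations = [x - A for x in colonies]   # d = X - A
sum_d = sum(deviations)
n = len(colonies)

shortcut_mean = A + sum_d / n            # x-bar = A + (sum of d) / n
direct_mean = sum(colonies) / n          # direct method, as a cross-check

print(sum_d, shortcut_mean, direct_mean)
```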
Grouped Data:
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{1}{N} \sum fx \]
Where,
f = Frequency
X = Value of each item
fx = Product of each value (X) and its corresponding frequency
∑fx = Sum of all these products
N = ∑f
Marks(x) 5 10 15 20 25 30 35 40
No. of Students (f) 5 7 9 10 8 6 3 2
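For the marks table above, the grouped-data formula can be applied as in this sketch (variable names are illustrative):

```python
marks = [5, 10, 15, 20, 25, 30, 35, 40]     # x
students = [5, 7, 9, 10, 8, 6, 3, 2]        # f

N = sum(students)                                        # total frequency
sum_fx = sum(f * x for f, x in zip(students, marks))     # sum of f times x
grouped_mean = sum_fx / N

print(N, sum_fx, round(grouped_mean, 2))
```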
A. M. Continuous Series Method:
\[ \bar{X} = \frac{\sum fm}{\sum f} = \frac{\sum fx}{N} \]
Where,
m (or) x = mid value of the various classes (or) x-value
f = the frequency of each class
∑fm = the sum of the mid values multiplied by their frequencies.
∑f (or) N = the total frequency.
Where, $\bar{X}$ (x-bar) is the actual arithmetic mean
A is the assumed arithmetic mean
d is the deviation of items from the assumed mean divided by C, i.e. d = (X - A)/C
∑d is the sum of deviations from the assumed mean
n is the total number of observations.
Merits:
1) It is rigidly defined or certain.
2) It is easy to calculate and simple to understand.
3) It is a relatively stable measure.
4) It is based on all the observation of the series.
5) It is capable of further algebraic treatment.
6) It is the best measure for comparing two or more series of data.
7) It balances the values on either side.
Demerits:
a) Effect of extreme values.
b) Problem in case of incomplete data.
c) Mean value may not figure in the series.
d) Misleading conclusions.
e) Absurd (unacceptable) results, e.g. a fractional number of persons.
f) It cannot be used for a small number of classes.
g) It cannot be used for qualitative data.
Geometric Mean (G.M.)
When the data contain a few extremely large (or) small values, and when the
values in the data are roughly in geometric progression, the geometric
mean (GM) is a suitable average. It is usually more suitable as a measure of central
tendency when the values change exponentially.
Definition:
GM of ‘n’ observations is defined as the nth root of the product of the ‘n’ observations.
Ungrouped data:
For ungrouped observations X1, X2, ..., Xn the Geometric Mean (GM) is
given by
\[ GM = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n} \]
(or)
\[ GM = \text{Antilog}\left[\frac{\sum \log x}{n}\right] \]
Remarks: GM cannot be used when any one of the observations is zero (or) negative.
e.g. 1) Salaries of five employees are Rs. 2300, 2400, 2200, 2350 and 4600.
Find the Geometric Mean.
Solution: Here, n = 5 and ∑ log x = 17.1182
GM = Antilog (17.1182/5)
= Antilog (3.42364)
= 2649 + 4
= 2653. Ans.
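The same calculation can be checked with logarithms to base 10, as in this sketch. A computer gives a value slightly below the table-based 2653 because antilog tables round at each step.

```python
from math import log10

salaries = [2300, 2400, 2200, 2350, 4600]

log_sum = sum(log10(x) for x in salaries)   # sum of log x
n = len(salaries)
geometric_mean = 10 ** (log_sum / n)        # antilog of the mean of the logs

print(round(log_sum, 4), round(geometric_mean, 1))
```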
Merits:
1) It is based on all observation.
2) It has bias for smaller observation.
3) It is not affected much by fluctuation of sampling.
4) It is useful in averaging ratios and percentage rates of increase and decrease
between two periods.
Demerits:
Ungrouped Data:
\[ H.M. = \frac{n}{\sum \frac{1}{x}} \]
Grouped Data:
\[ H.M. = \frac{N}{\sum \frac{f}{x}} \]
Example: The following table gives the weights of 31 persons in a sample enquiry.
Calculate the mean weight using G.M. and H.M.
Weight (X)           130  135  140  145  146  148  149  150  157
No. of Persons (f)     3    4    6    6    3    5    2    1    1
Steps :
1) Find the Log value of X
2) Find the Sum of LogX
3) Find Sum of frequency values
4) Multiply the frequency values with LogX
5) Find the 1/X values
6) Find the f/X values
7) Then apply the formulas for both G.M. and H.M.
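The steps above can be sketched for the weight table as follows. The grouped form of the geometric mean, Antilog(∑ f log x / N), is assumed here as the frequency-weighted version of the ungrouped formula; the arithmetic mean is added only for comparison.

```python
from math import log10

weights = [130, 135, 140, 145, 146, 148, 149, 150, 157]   # X
persons = [3, 4, 6, 6, 3, 5, 2, 1, 1]                     # f

N = sum(persons)                                                   # sum of frequencies
sum_f_log_x = sum(f * log10(x) for f, x in zip(persons, weights))  # sum of f * log X
sum_f_over_x = sum(f / x for f, x in zip(persons, weights))        # sum of f / X

gm = 10 ** (sum_f_log_x / N)   # geometric mean
hm = N / sum_f_over_x          # harmonic mean
am = sum(f * x for f, x in zip(persons, weights)) / N   # arithmetic mean, for comparison

print(N, round(am, 2), round(gm, 2), round(hm, 2))
```

For the same set of positive values the three averages come out in the order A.M., G.M., H.M. from largest to smallest.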
Merits:
1) It is based on all observation.
2) It is not affected much by fluctuations of sampling.
3) As reciprocal values are involved, it gives more weightage to smaller
observations.
Demerits:
b) It is difficult to understand and calculate for biologists.
c) Its value cannot be obtained if any one of the observation is zero.
A.M.>G.M.>H.M.
The weighted arithmetic mean is the sum of the products of the values with
their respective weights divided by the sum of the weights. ‘Weight’ here stands for
the importance of the different items. In certain circumstances all the observations
do not have equal weight.
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum wX}{\sum w} \]
II) MEDIAN (Me):
The value of the middle observation or the mean value of two middle
observations is called median. If the values of a variable are arranged in ascending or
descending order of magnitude, the median is that value which divides the whole data
into two equal parts, one part having all values smaller than median value and other
part having all the values greater than the median value.
Calculation of Median:
\[ \text{Median} = \text{value of the } \left(\frac{\text{Number of observations} + 1}{2}\right)^{\text{th}} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation} \]
E.g. 2:
X 75 97 100 120 150 175
Step 1: n/2 = 6/2 = 3, i.e. the 3rd observation
Step 2: n/2 + 1 = 6/2 + 1 = 4, i.e. the 4th observation
The median is the arithmetic mean (AM) of the 3rd and 4th observations = (100 + 120)/2 = 110.
Variable (X) 1 2 3 4 5 6 7
Frequency (f) 1 4 12 9 2 1 1
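Both median examples can be worked through as in the sketch below. For the frequency table, the median item is located through the cumulative frequencies; the helper names are illustrative.

```python
from itertools import accumulate
from statistics import median

# Ungrouped data with an even number of observations: mean of the two middle values
ungrouped = [75, 97, 100, 120, 150, 175]
ungrouped_median = median(ungrouped)

# Discrete frequency distribution
values = [1, 2, 3, 4, 5, 6, 7]
freqs = [1, 4, 12, 9, 2, 1, 1]
cumulative = list(accumulate(freqs))
n = cumulative[-1]


def value_at(position):
    """Value of the item at a 1-based position in the ordered series."""
    for value, cf in zip(values, cumulative):
        if position <= cf:
            return value


if n % 2:  # odd n: the single middle item
    discrete_median = value_at((n + 1) // 2)
else:      # even n: mean of the two middle items
    discrete_median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

print(ungrouped_median, n, discrete_median)
```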
\[ \text{Median} = L_1 + \frac{\frac{\sum f}{2} - F}{f_m} \times i \]
Where,
L1 = The lower limit of that class interval where the median falls.
N or ∑f = Total frequency
F = The cumulative frequency of the class just before that class interval where the
median falls.
fm = The frequency of that class interval where the median falls
i = The width of the class interval.
a. It is rigidly defined
b. Median is easy to understand and easy to calculate
c. Median is not affected by extreme observation
d. Median can be computed with a distribution of open end class.
e. Median is best for qualitative data.
Demerits:
1) Median cannot be determined exactly in the case of an even number of
observations; the mean of the two middle values is taken instead.
2) Median is relatively less stable than mean.
3) Median is a positional average.
4) It is not based on all observations.
5) It cannot be subjected to algebraic treatment.
MODE ( Mo):
Mode is the most frequently occurring value in a data set. This means that, for a
given data set, a mode may or may not exist.
E.g.1:
a) 10,10, 9, 8, 5,4,12,10 : one mode data 10
b) 10,10, 9,9, 12, 15, 5 : Two mode data 10and 9
c) 4, 6, 7, 15, 12, 13, 10 : No mode
By A.U. Tuttle.
Variable X 32 22 29 25 17 25 40
\[ \text{Mode } (Z) = L_1 + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times C \]
Where,
L1 = Lower limit of modal class
fm = Frequency of modal class or maximum frequency
f1 = Frequency of class just preceding the modal class
f2 = Frequency of class just succeeding the modal class
C = Width of the class interval
E.g. 4: In a class following is the distribution of marks of 85 students. Calculate the
modal class and mode of the following data:
Marks (Grouped data)    20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60
No. of Students (f)         5      7      8     18     25     12      7      5
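The modal class and the mode for this table can be found as in the sketch below. It uses the interpolation formula with 2fm - f1 - f2 in the denominator and assumes that the modal class is neither the first nor the last class.

```python
lower_limits = [20, 25, 30, 35, 40, 45, 50, 55]
freqs = [5, 7, 8, 18, 25, 12, 7, 5]
width = 5  # class width C

k = freqs.index(max(freqs))   # index of the modal class (highest frequency)
fm = freqs[k]                 # frequency of the modal class
f1 = freqs[k - 1]             # frequency of the preceding class
f2 = freqs[k + 1]             # frequency of the succeeding class

modal_class = f"{lower_limits[k]}-{lower_limits[k] + width}"
mode = lower_limits[k] + (fm - f1) * width / (2 * fm - f1 - f2)

print(modal_class, mode)
```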
Merits :
Demerits:
1) Mode is not rigidly defined.
2) As compared to mean, mode is affected to a great extent by the fluctuation of
sampling.
3) It is not suitable for algebraic treatment.
Introduction:
If we describe data using only the measures of central tendency like Mean, Median and Mode,
i.e. the average alone, our description will be incomplete.
Binomial Characteristics:
treatment needed, not needed
suffering , not suffering
enlarged , not enlarged
gained , not gained
healthy, sick
positive , negative
increased , decreased
recovered , not recovered
We take the help of normal values. Normality is a concept which depends upon the
distribution of attributes or variables in the population. Thus measures of variability
are essential for our understanding of the concept of normal values; with the help
of measures of variability we can give a complete picture of a set of health data.
“The mean defines the distribution in a concise manner, but does not fix the distribution.
Even with a common mean, two sets of data may vary very much in the values of their
observations. The measures of this spread of values are called measures of dispersion or
variability”.
“Centering constants are representative values of the series. They do not express
the range of normalness. Centering constants together with measures of
variation help us to understand the data better than centering constants alone”.
1) Range
2) Mean Deviation
3) Quartile Deviation
4) Standard Deviation
5) Coefficient of Variation
1) RANGE:
The range of a group of observations is the interval between the smallest and the biggest
observations. The value of the range is dependent only upon the two extreme
observations in the group and does not consider the other observations.
Merits:
1) Easy to calculate
2) Does not require mathematical calculation
Demerits:
1) Only extreme are considered
2) It is gross in expression
3) It is not ideal
2) QUARTILE DEVIATION
Quartile deviation is another distance measure of dispersion. In this
method, the series is divided into 4 equal parts or quarters. The quartiles are
represented as Q1, Q2 and Q3. The difference between the third quartile (Q3) and the
first quartile (Q1) is the inter-quartile distance, and half of this difference is the quartile deviation.
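\[ \text{3) } Q_3 = \text{size of the } \left\{\frac{3(N+1)}{4}\right\}^{\text{th}} \text{ item} \]
\[ QD = \frac{Q_3 - Q_1}{2} \]
For grouped data,
\[ Q_3 = L_1 + \frac{\frac{3N}{4} - c.f.}{f_q} \times i \]
(for Q1, N/4 is used in place of 3N/4).
\[ \text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]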
Merits:
It is better than the range.
It is the only measure of Dispersion
It is simple to calculate and understand.
Demerits :
It is not based on all observations.
It is affected by sampling fluctuations.
It is not useful when the data exhibit great variations.
2) MEAN DEVIATION:
The formula for the Mean Deviation for grouped data is:
\[ MD = \frac{\sum f\,|x - \bar{x}|}{\sum f} \]
\[ \text{Mean } (\bar{X}) = \frac{9280}{200} = 46.4 \]
\[ \text{Mean Deviation (MD)} = \frac{345.4 + 523.8 + \dots + 38.6}{200} = 13.56 \]
Co-efficient of Mean Deviation:
\[ \frac{13.56}{46.4} = 0.29 \]
The standard deviation (SD) is defined as the positive square root of the
arithmetic mean of the squares of the deviations taken from the arithmetic mean. It
is denoted by the symbol ‘σ’, a Greek letter read as ‘sigma’.
Thus,
\[ \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}} \]
where d = x - x̄. For a sample, the divisor n is replaced by n - 1.
For grouped data,
\[ \sigma \text{ (or) } s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{\frac{\sum f x^2}{n} - \left(\frac{\sum f x}{n}\right)^2} \]
where n = ∑f.
In these formulae the SD of the population is usually denoted by ‘σ’, and that
of the sample by ‘s’.
Thus the formula for the sample is
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
The square of Standard Deviation is called Variance, which can also be used as a
measure of Dispersion.
\[ \sigma = h \times \sqrt{\frac{\sum f d^2 - N\left(\frac{\sum f d}{N}\right)^2}{N}} \]
where N = ∑f; for a sample, the divisor N is replaced by N - 1.
Steps :
1) Decide the assumed mean ‘a’
2) Obtain the deviation values: x - a
3) Find the value of d = (x - a)/h, where h = class width
4) Find d²
5) Find fd
6) Find fd²
7) Apply the formula
e. g. Haemoglobin values:
11.8 , 11.6, 11.4 ,12.6,10.4, 13.3, 11.6,12.9, 10.8, 13.2,12.2, 14.2, 12.9, 13.5, 12.3,
13.0, 10.8, 13.8, 12.0, 12.2, 10.5, 11.2, 12.4, 11.7, 12.7,12.2.
∑x = 317.2
∑(x-x)² = 25.40
No. of Obs.= 26
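The figures quoted for the haemoglobin values can be reproduced, and the standard deviation and coefficient of variation derived from them, as in this sketch. Both the population form (divisor n) and the sample form (divisor n - 1) are shown, since the notes give both.

```python
from math import sqrt

hb = [
    11.8, 11.6, 11.4, 12.6, 10.4, 13.3, 11.6, 12.9, 10.8, 13.2, 12.2, 14.2, 12.9,
    13.5, 12.3, 13.0, 10.8, 13.8, 12.0, 12.2, 10.5, 11.2, 12.4, 11.7, 12.7, 12.2,
]

n = len(hb)
mean = sum(hb) / n
sum_sq_dev = sum((x - mean) ** 2 for x in hb)   # sum of squared deviations from the mean

sd_population = sqrt(sum_sq_dev / n)        # divisor n
sd_sample = sqrt(sum_sq_dev / (n - 1))      # divisor n - 1
cv_percent = sd_sample / mean * 100         # coefficient of variation

print(n, round(sum(hb), 1), round(mean, 2), round(sum_sq_dev, 2))
print(round(sd_population, 3), round(sd_sample, 3), round(cv_percent, 2))
```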
5) VARIANCE (V):
Definition:
• The variance is the arithmetic mean of the squares of the deviations
from the mean value of the data. It is also described as the square of the
Standard Deviation (SD).
\[ s^2 = V = \frac{\sum (x - \bar{x})^2}{n} \quad \text{(Ungrouped Data)} \]
\[ s^2 = V = \frac{\sum f (x - \bar{x})^2}{n} \quad \text{(Grouped Data)} \]
\[ \text{Co-efficient of Variation (CV)} = \frac{SD \times 100}{\text{Mean}} \]
SAMPLING
Definition: “The totality (or) aggregate of all individuals with the specified
characteristic is a population (or) universe, and a group of individuals that is
chosen from that population is a sample”. This process is called sampling.
Sampling, which is the selection of part of an aggregate to represent the whole,
is frequently used in everyday life in all kinds of investigations, surveys, etc. The
important purpose of sampling is to obtain information about population.
Needs of Sampling:-
Uses of Sampling:
1) It is used in descriptive surveys, where information regarding the characteristics of
the entire population is obtained.
2) It is used for analytical surveys to get information from various sub groups
3) It is used in industries to improve operational efficiency.
4) It is used in population census.
5) It is also used in experimental investigations
6) It provides estimates of population parameters from sample statistics.
METHODS OF SAMPLING:
Broadly speaking, the methods of sampling can be divided into two groups: A)
Non-random Sampling and B) Random or Probability Sampling
When the size of the sample depends upon the size of the population (heterogeneous
population), the method is called stratified sampling with proportional allocation. The
population is divided into a number of sections, subgroups or homogeneous
classes called strata. Depending upon its characteristics, the population is divided into
subgroups and a random sample is drawn independently from each subgroup.
e.g., To estimate the average weight of persons from a heterogeneous population
of three females and three males, stratified random sampling is used. Here males form
one stratum and females form another stratum in the population, and a sample is drawn
from each stratum so that the variability in each stratum is adequately represented in the
sample.
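Proportional allocation can be sketched as below. The strata, their sizes and the function name are hypothetical; the only idea taken from the text is that each stratum contributes to the sample in proportion to its size and is sampled at random, independently of the others.

```python
import random


def stratified_sample(strata, sample_size, seed=0):
    """Draw a simple random sample from each stratum, proportional to its size."""
    rng = random.Random(seed)   # fixed seed so the illustration is repeatable
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(sample_size * len(units) / total)   # proportional allocation
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical population: 60 males and 40 females
strata = {
    "male": [f"M{i}" for i in range(60)],
    "female": [f"F{i}" for i in range(40)],
}
sample = stratified_sample(strata, sample_size=10)
print({name: len(units) for name, units in sample.items()})
```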
3) Systematic Sampling:
This is a simple procedure and is used when a complete list of the population
from which a sample is to be drawn is available. It is more often applied to field
studies when the population is large, scattered and heterogeneous. Such samples are
drawn from old records, household surveys or patient clinics, where the total size of the
population is known but the particulars of the units are not known. A number is selected at
random and 100 is added to it repeatedly, i.e. every 100th observation is selected.
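The procedure of picking a random start and then taking every 100th unit can be sketched as follows. The list of 1000 record numbers and the function name are hypothetical.

```python
import random


def systematic_sample(population, interval, seed=0):
    """Pick a random start within the first interval, then every interval-th unit."""
    rng = random.Random(seed)   # fixed seed so the illustration is repeatable
    start = rng.randrange(interval)
    return population[start::interval]


records = list(range(1, 1001))   # hypothetical list of 1000 record numbers
sample = systematic_sample(records, interval=100)
print(sample)
```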
4) Cluster Sampling:
The units of the population are natural groups of elements. These groups are
called clusters. Each cluster includes only one type of element. A simple random
sample is taken from each cluster. Cluster sampling provides best results only when
the elements within the cluster are heterogeneous.
PROBABILITY THEORY
The probability or chance that an event will occur can be defined as the
proportion of times in which that event occurs in a very large number of trials. It may be
described as the ‘law of chance’. Probability is a numerical assessment of the
likelihood of an outcome or the number of occurrences of a random variable.
Probability is a population parameter.
Definition:
It is defined as “the ratio of number of times a particular event occurs to
the total number of trials during which the events have happened”.
Concepts:
a) Trial and event.
b) Exhaustive event.
c) Favourable events.
d) Mutually exclusive events.
e) Equally likely events.
f) Independent events.
g) Random experiment & sample space.
h) Mathematical or classical probability.
b) Exhaustive Event:
The total number of possible outcomes in any trial is known as the exhaustive
events (e.g. when throwing two dice together, the exhaustive events number 36).
c) Favourable Events:
The number of cases favourable to an event in a trial is the number of
outcomes which entail the happening of the event (e.g. when tossing a coin, the number of
cases favourable to getting a head is 1 and to getting a tail is 1).
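Exhaustive and favourable events can be counted by listing the sample space, as in this sketch for two dice. The event "the two dice add up to 7" is chosen here only as an illustration and is not taken from the notes.

```python
from fractions import Fraction
from itertools import product

# Exhaustive events: every ordered outcome of throwing two dice
outcomes = list(product(range(1, 7), repeat=2))

# Favourable events for the illustrative event "sum of the two dice is 7"
favourable = [o for o in outcomes if sum(o) == 7]

# Classical probability = favourable cases / exhaustive cases
probability = Fraction(len(favourable), len(outcomes))
print(len(outcomes), len(favourable), probability)
```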
f) Independent events:
Events are said to be independent if the happening (or non-happening) of an
event is not affected by supplementary knowledge concerning the occurrence of
any number of the remaining events. (e.g.: In a five-test cricket series between two
countries, the win or loss in the 5th test is independent of the results of the earlier
four test matches.)
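The ideas of exhaustive, favourable and independent events can be checked with a short Python sketch using the two-dice example. It assumes the classical definition (favourable cases divided by exhaustive cases, all equally likely); the function name classical_probability and the chosen events are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Exhaustive events: every ordered outcome of throwing two dice together.
outcomes = list(product(range(1, 7), repeat=2))


def classical_probability(is_favourable):
    """Favourable cases divided by exhaustive cases (equally likely outcomes)."""
    favourable = [o for o in outcomes if is_favourable(o)]
    return Fraction(len(favourable), len(outcomes))


p_sum_seven = classical_probability(lambda o: o[0] + o[1] == 7)
print(len(outcomes), p_sum_seven)  # 36 1/6

# Independence: the joint probability equals the product of the two probabilities.
p_first_six = classical_probability(lambda o: o[0] == 6)
p_second_six = classical_probability(lambda o: o[1] == 6)
p_both_six = classical_probability(lambda o: o == (6, 6))
print(p_both_six == p_first_six * p_second_six)  # True
```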
Introduction:
Health: It means a state of complete mental, physical and social well-being.
Term “Hospital”:
In A.D. 390, St. Jerome was the first to mention the word
“Hospital”, which is derived from the Latin “Hospitalis”, formed from “Hospes”,
meaning host or guest.
The WHO Expert Committee on Health Statistics, in its 13th report, categorically
identified the primary needs for Hospital Statistics:
1) To establish administrative control over functional activities.
2) To provide reports to the governing board, outside agencies, etc.
3) To provide a basis for preparing operating budgets.
4) To provide a basis for the distribution of expenses when computing the cost of
operation.
5) To provide a basis for the calculation of average income and cost per unit
of service rendered.
6) To provide the basic statistics needed for realistic planning for the future.
7) To assess the utilization of hospital facilities.
8) To provide data for health intelligence to public health authorities.
Merits:
By sources of health information
ii) They are helpful in further analysis.
iii) We can prepare overall statistics for a population.
iv) We can find the no. of persons suffering from a particular disease.
Demerit:
i) They may not be accurate and correct.
d) Miscellaneous:
1) Out-Patient
2) In-Patient
3) EMS (Emergency Medical Service)
4) Admissions
5) Discharges
6) Patient day
7) Patient Identification Data
8) In-patient Census
9) Health Facility Death
10) Length of Stay
11) Hospital bed
12) Bed Strength
13) Discharge Analysis
14) Daily, Monthly and Annual reports
15) Transfers of patients (outside and inside)
16) Turnover Intervals
17) Live Birth
18) Foetal Death (or) Still Birth
19) Total investigations
20) Operations
21) Special Investigation
22) Medico-Legal cases
23) Autopsies
IMPORTANT TERMS
1) Out-Patient: A patient who visits a hospital, is confined for only a few hours and
is not accommodated overnight is considered an out-patient.
3) Hospital Bed: A hospital bed is one regularly maintained in a hospital for the use
of patients.
7) Length of Stay: The number of days an in-patient has stayed in the hospital. It is
computed by subtracting the admission date from the discharge date. (The
admission day is counted but the discharge day is not counted.)
8) Total Length of Stay: It is the sum of the days of stay of any group of in-patients
discharged during a specific period of time, and it is necessary in computing the
average length of stay.
9) Average Length of Stay: It is the average number of days of hospitalization of the
in-patients discharged during the time under consideration. The average length of
stay for new-borns is reported separately.
        Total Length of Stay (Discharge days)
ALS = --------------------------------------------
                  Total Discharges
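The length of stay and ALS calculations can be sketched in Python as follows. The dates are made-up illustrative values, and counting a same-day admission and discharge as one day is an assumption (a common convention) that the notes above do not state.

```python
from datetime import date


def length_of_stay(admitted, discharged):
    """Days of stay: discharge date minus admission date."""
    days = (discharged - admitted).days
    if days < 0:
        raise ValueError("discharge date is earlier than admission date")
    # Assumption: admission and discharge on the same day count as 1 day.
    return max(days, 1)


# Hypothetical (admission, discharge) pairs for patients discharged in a period.
stays = [
    (date(2024, 3, 1), date(2024, 3, 5)),
    (date(2024, 3, 2), date(2024, 3, 2)),
    (date(2024, 3, 3), date(2024, 3, 10)),
]

total_length_of_stay = sum(length_of_stay(a, d) for a, d in stays)
average_length_of_stay = total_length_of_stay / len(stays)
print(total_length_of_stay, average_length_of_stay)  # 12 4.0
```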
10) Daily Discharge Analysis: The daily tabulation of data concerning patients
discharged from the hospital is called discharge service analysis or analysis of
hospital service.
11) Monthly Analysis: It provides a clear monthly report of the professional care
rendered to patients in the hospital. This report provides comparative figures, which
are of value to the medical staff in evaluating its own performance, and to the
governing board and the hospital administrator as a picture of the professional
performance of the hospital and medical staff.
13) In-patient day (or) In-patient service day (or) Census day (or) Bed-Occupancy
day:
A unit of measure denoting the services received by one in-patient in one 24-hour
period.
14) Total In-patient service days: The sum of all in-patient service days for each of
the days in the period under consideration.
15) Average daily In-patient Census (also called the average daily census, average
census or average daily no. of in-patients): The average no. of in-patients present
each day for a given period of time.
16) In-patient Census and In-patient service day: The number of patients remaining in
the hospital at the census-taking time on a specific day, plus the admissions during
the following day, minus the discharges (including deaths) during that day, equals
the number of patients remaining at the next census-taking time. This is called the
in-patient census for the day.
17) In-patient service day: Measures the services received by one in-patient in one
24-hour period. The 24-hour period is the time between the census-taking hours on two
successive days.
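The census arithmetic described above can be sketched in Python. The opening census of 120 and the daily admission and discharge figures are hypothetical, and the function names are illustrative. The sketch treats each day's census count as that day's in-patient service days, which assumes no patient is both admitted and discharged on the same day.

```python
def daily_census(previous_census, admissions, discharges_and_deaths):
    """Patients remaining at census time after one day's movements."""
    return previous_census + admissions - discharges_and_deaths


def average_daily_census(daily_counts):
    """Average number of in-patients present each day over the period."""
    return sum(daily_counts) / len(daily_counts)


census = 120  # hypothetical patients remaining at the last census-taking time
movements = [(10, 8), (5, 9), (7, 7)]  # (admissions, discharges incl. deaths) per day

counts = []
for admitted, discharged in movements:
    census = daily_census(census, admitted, discharged)
    counts.append(census)

print(counts)  # [122, 118, 118]
print(round(average_daily_census(counts), 2))  # 119.33
```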
19) Bed count (Bed complement): The no. of beds available, both occupied and
vacant.
The bed complement of a hospital is the total number of hospital beds normally
available for use by in-patients.
20) New-Born Bassinet count: The no. of bassinets available in the hospital for
new-borns, both occupied and vacant, on a given day.
21) Bed count day: A unit of measure denoting the presence of one in-patient bed,
either occupied or vacant, set up and staffed for use in one 24-hour period.
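In-patient service days and bed count days are the two quantities normally combined into a bed occupancy percentage. The sketch below uses the commonly used formula (in-patient service days divided by bed count days, times 100); this formula is not written out in the notes above, and the figures and function names are hypothetical.

```python
def bed_count_days(bed_count, days_in_period):
    """One bed available for one 24-hour period is one bed count day."""
    return bed_count * days_in_period


def bed_occupancy_percentage(inpatient_service_days, total_bed_count_days):
    """Share of the available bed count days that were actually occupied."""
    if total_bed_count_days <= 0:
        raise ValueError("bed count days must be positive")
    return 100 * inpatient_service_days / total_bed_count_days


# Hypothetical month: 50 beds for 30 days and 1200 in-patient service days.
available = bed_count_days(50, 30)
occupancy = bed_occupancy_percentage(1200, available)
print(available, occupancy)  # 1500 80.0
```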
22) Bed Turnover Interval: The time for which a bed remains empty between two
successive occupants or admissions.
23) Live Birth: It is the complete expulsion or extraction from its mother of a
product of conception, irrespective of the duration of the pregnancy, which after
such separation breathes or shows any other evidence of life, such as beating of the
heart, pulsation of the umbilical cord or definite movement of voluntary muscles,
whether or not the umbilical cord has been cut or the placenta is attached. Each
product of such a birth is considered a live birth.
24) Still Birth: A death prior to the complete expulsion or extraction from its mother
of a product of conception, irrespective of the duration of the pregnancy. The death
is indicated by the fact that after such separation the foetus does not breathe or
show any other evidence of life, such as beating of the heart, pulsation of the
umbilical cord or definite movement of voluntary muscles.
25) Cause of Death: The causes of death to be entered on the Medical Certificate of
Cause of Death are all those diseases, morbid conditions or injuries which either
resulted in or contributed to death, and the circumstances of the accident or violence
which produced any such injuries.
26) Hospital New Born (alive at Birth): This category includes only infants born in
the hospital. Infants who are born at home or born on the way to hospital are not
hospital new born inpatients.
27) Hospital Death (or) Health Facility Death: A death occurring after lodging a
patient in an in-patient bed. Detailed records should be maintained for deaths
occurring within or beyond 48 hours.
If a patient dies earlier than 48 hours after admission, the length of stay should
also be indicated in hours (for calculating the net death rate).
Deaths occurring before lodging (deaths in the emergency room, in the ambulance or
in the lift) are not classified as hospital deaths or health facility deaths.
28) Communicable disease: One whose causative agents may pass or be carried from
one person to another directly or indirectly.
31) Medico-legal case: Pertaining to a matter that involves both medicine and law.
32) Underlying cause of Death: The underlying cause of death is defined as (a) the
disease or injury which initiated the train of morbid events leading directly to death
or (b) the circumstances of the accident or violence which produced the fatal injury.
Vital Statistics
Introduction:
These vital events comprehensively include live births, deaths, foetal deaths,
marriages, divorces, adoptions, legitimations, recognitions, annulments
and legal separations.
The health/hospital information collected from these sources is compiled and used to
calculate averages and percentages, bed occupancy rate, death rate, birth rate, etc.
Out Patient:
1) Average daily No. of Out-patient attendance (ADOPA):
(The average can also be calculated for new daily out-patients
by giving the total no. of new out-patient attendances in the numerator.)
Death rates specific for age, sex, etc., can be calculated in a manner similar to the
crude death rate. Specific rates are given below:
The risk of dying from causes associated with childbirth is measured by the
maternal mortality rate. The numbers exposed to the risk of dying from
puerperal causes are the women who have been pregnant during the period. Their
number being unknown, the number of live births is used as the conventional
base for computing comparable mortality rates.
Many late foetal deaths and early neonatal deaths may be attributed to similar
underlying conditions, and it has been suggested that a single rate, called the
perinatal mortality rate, be calculated by combining the deaths in both
categories.
        Total No. of Late Foetal Deaths (above 28 weeks) +
        Early Neonatal Deaths (under 1 week)
PMR = --------------------------------------------------------------- X 1000
        Total No. of Live Births
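The PMR formula can be worked through in Python as a quick check. The counts below are invented for illustration, and the function name perinatal_mortality_rate is an assumption. The denominator follows the formula as given here (live births).

```python
def perinatal_mortality_rate(late_foetal_deaths, early_neonatal_deaths, live_births):
    """Perinatal deaths per 1000 live births."""
    if live_births <= 0:
        raise ValueError("live births must be positive")
    return 1000 * (late_foetal_deaths + early_neonatal_deaths) / live_births


# Hypothetical year: 12 late foetal deaths, 8 early neonatal deaths, 1000 live births.
pmr = perinatal_mortality_rate(12, 8, 1000)
print(pmr)  # 20.0
```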
For deaths occurring under 28 days of age, a rate called the neonatal mortality rate
is calculated.
Put simply, the infant mortality rate is approximately the sum of the
neonatal and the post-neonatal mortality rates.
(or)
        Annual No. of Deaths of Children aged 1-4 years
CDR = --------------------------------------------------------------- X 1000
        No. of Live Births in the year
HOSPITAL DEATH
The ratio of maternal deaths for a period to the total no. of patients discharged.
5) Hospital Neonatal Death Rate (or) Infant New Born Mortality Rate
(HNDR or INBMR):
The ratio of deaths of infants born in the hospital for a period. Foetal deaths are
not included since they are not new-born in-patients. Infants born outside
the hospital and admitted should be recorded as child in-patients, not as
new-born in-patients.
The ratio of all infections in clean surgical cases to the no. of operations.
Foetal deaths and autopsies performed on these cases are not included when
computing the autopsy rate.
The ratio during any given time period of all in-patient autopsies to all
in-patient deaths minus un-autopsied coroners’ or medical examiners’ cases.
13) Hospital Foetal Death Rate or Still Birth Rate (HFDR or SBR):
Perinatal death is a general term referring to both foetal deaths and infants
who die during the neonatal period (28 days).
Neonatal Period:
II - From the beginning of the 24th hour of life through 6 days, 23 hrs
and 59 mins.
III - From the beginning of the 7th day of life through the 27th day, 23 hrs
and 59 mins.
The General Fertility Rate is approximately 4 times the Crude Birth Rate.
It serves as an index of the relative speed at which additions are being made to the
population through childbirth.
It is the average number of days a bed remains vacant. Large turnover intervals
indicate inefficient use and scope for improvement.
This is the mean number of days that a bed is not occupied between two
admissions.
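As a hedged illustration, the turnover interval is commonly computed as the unoccupied bed days divided by the number of discharges (including deaths). That formula is not written out in the text above, so it is an assumption here, as are the function name and all the figures.

```python
def turnover_interval(bed_count, days_in_period, inpatient_days, discharges):
    """Mean number of days a bed stays vacant between two admissions."""
    if discharges <= 0:
        raise ValueError("discharges (including deaths) must be positive")
    # Bed days that were available but not occupied during the period.
    vacant_bed_days = bed_count * days_in_period - inpatient_days
    return vacant_bed_days / discharges


# Hypothetical month: 50 beds, 30 days, 1200 in-patient days, 150 discharges.
interval = turnover_interval(50, 30, 1200, 150)
print(interval)  # 2.0
```

A value close to zero means beds are refilled almost immediately, while a large value means beds stand empty for long periods between patients.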
Medical Records Personnel (Census) check the daily census reports for the 24
hours ending at midnight, receive the in-patient records of discharged patients from
all wards, and enter all the admissions, discharges and deaths, ward by ward, into the
electronic format for the preparation of in-patient statistics.
All the remaining in-patients, admissions, discharges and deaths are carefully
verified and correctly entered into the computer. After the completion of every month
we get the total monthly and yearly admissions, discharges, deaths and new-borns.
Discharge case sheets are entered into the computer for analyzing the hospital
service rendered. After all discharges have been entered we get the total number of
discharges, results, total hospital days, provisional and final diagnoses, operations,
consultations, etc.
After completion of this analysis every month, the following monthly-yearly statistics
are prepared.
1) Total Admissions
2) Total Discharges
3) Total Deaths (under 48 hours and over 48 hours)
4) Death percentage
5) Average hospital days
6) Total postmortem(Autopsies) percentage
7) Bed Occupancy percentage
8) Turn over interval
9) Geographical distribution of Admission
10) Operation wise cases
11) Service wise patients
12) Results
13) Consultation report at the end of the month
14) Disease-wise report
15) Cause of death wise report
16) Communicable disease wise report
Uses of HS:
5) Proper arrangement of the medical record case sheets for indexing, coding,
retrieval and comparison purposes.
Limitation of HS:
Personal Characteristics,
Severity of Disease
Associated conditions
Admission policies.
2) Hospital records are not designed for research; they may be
Incomplete,
Illegible or missing
Variable in diagnostic quality