
GENERAL STATISTICS

Introduction:

Statistics is the science of collecting, classifying and interpreting numerical facts
and data. It is used as a method of analysis in science, social science, business etc.,
and is concerned both with the description of actual events and with the prediction of
the likelihood of events occurring.

Statistic → Value (weight of one person)

Statistics → Data (weights of a group of persons).

“Statistics is the study of methods and procedures for the collection, classification,
analysis and interpretation of data to make scientific inferences from it. The term has
been derived from the Latin word ‘Status’, the Italian word ‘Statistica’, the French
word ‘Statistique’ or the German word ‘Statistik’.

These words mean a ‘political state’ or a ‘government’. It has been the traditional
function of government to keep records of population, births, deaths, taxes, military
strength and other kinds of activities”.

DEFINITION
Different statisticians have defined statistics in different ways: some define it as
numerical data, whereas others define it as statistical methods.

A) Numerical Data (Definitions):

“The classified facts respecting the condition of the people in a state, especially
those facts which can be stated in numbers or in tables of numbers or in any
classified arrangement”. > By Webster.

“Quantitative data affected to a marked extent by multiplicity of causes”
> By Yule and Kendall

“By statistics we mean aggregates of facts affected to a marked extent by
multiplicity of causes, numerically expressed, enumerated or estimated according to
reasonable standards of accuracy, collected in a systematic manner for a
pre-determined purpose and placed in relation to each other”.
> By Prof. Secrist.
This definition clearly points out all the important characteristics of numerical data.

1) Statistics are aggregates of facts.


2) They are affected to a marked extent by a multiplicity of causes.
3) They are numerically expressed.
4) They are enumerated or estimated according to reasonable standards
of accuracy.
5) They are collected in a systematic manner.
6) They are collected for a pre-determined purpose and,
7) They should be placed in relation to each other.

B) Statistical Method (Definitions):

It is a branch of science which provides various methods to deal with data.

“Statistics may be called the science of counting”


“Statistics may rightly be called the science of averages”
“Statistics is the science of the measurement of the social organism, regarded as a
whole, in all its manifestations”.
> By Prof. A. L. Bowley.

“Science of estimates and probabilities” > By Boddington.

“As the Collection, Presentation, Analysis and Interpretation of numerical


data”
> By Croxton and Cowden.

Adding organization to this definition, the methods of statistics are:

1) Collection of data
2) Organization of data
3) Presentation of data
4) Analysis of data
5) Interpretation of data.

There are many definitions of statistics as applied to medicine and public health.

Medical Statistics: Medical statistics deals with the application of statistical
methods to the study of disease, disability, efficacy of a vaccine, a new regimen etc.

Health Statistics: Health statistics deals with the application of statistical
methods to varied information of public health importance.

The branch of statistics applied to the collection, organization, presentation,
analysis and interpretation of hospital data is called “Health care / Hospital
statistics”. It includes all statistical information required for the administration
of a health agency, and so comprises not only vital statistics but also a good deal
of other numerical information.

Vital Statistics: Vital statistics is the ongoing collection by government agencies
of data relating to vital events such as births, deaths, marriages and divorces, and
to health- and disease-related states and events which are deemed reportable by local
authorities.

Bio-Statistics: It can be defined as the art and science of collection, compilation,
presentation, analysis and logical interpretation of biological data affected by a
multiplicity of factors. This appears to be a comprehensive definition encompassing
health, medical and vital statistics together with the statistical methods
encountered (e.g. Morbidity Rate, Mortality Rate).

Bio-Statistics → It means the application of statistical processes and methods to the
analysis of biological data.

Importance of Bio-Statistics in health field:

• Statistical methods are useful in various aspects of data collection: designing
the pro-forma, deciding the sample size and choosing the sampling method can all
be done effectively.
• Statistical methods are useful for the best possible data presentation; different
types of tables, frequency distribution tables, diagrams and graphs can be used.
• Statistics is also useful in medical data analysis and interpretation.
• Calculation of averages, variation, correlation and regression, and tests of
significance are some of the methods of interpretation.
• Using hospital data, it is possible to calculate various hospital statistics that
are useful for management, administration and planning of the hospital.
• Statistics is also used for public health administration and health planning.
This is done by using various health data.

Uses of Statistics:
1) Statistics helps in providing a better understanding and exact description of
a phenomenon of nature.

2) Statistics helps in proper and efficient planning of a statistical inquiry in


any field of study.

3) Statistics helps in collecting appropriate quantitative data.

4) Statistics helps in presenting complex data in a suitable tabular,


diagrammatic and graphic form for an easy and clear comprehension of the
data.
5) Statistics helps in understanding the nature and pattern of variability of a
phenomenon through quantitative observation.

6) Statistics helps in drawing valid inferences, along with a measure of their
reliability, about the population parameters from the sample data.

7) Statistics also provides tools for prediction and forecasting using data and
statistical models.

8) Statistics is applicable to a wide variety of academic disciplines, including
natural and social sciences, government, and business.

9) Statistical methods can be used to summarize or describe a collection of data.

10) Statistical methods are useful in research, when communicating the results of
experiments.

METHODS OF COLLECTION OF DATA

The methods of data collection, the reliability and validity of data and the
types of variables used are important parameters for statistical studies. Data are
collected by direct observation or measurement and from responses to questions.
E.g.: height, weight and blood pressure of children are directly measured, and
data on preference for coffee or tea are collected by asking questions.

Reliability and Validity of Data:

Reliability of data depends on the preciseness of the methods used for data
collection, so that all persons can use the same procedure and expect to get
the same or approximately the same results. E.g.: the method of measuring the
height and weight of students.

Validity of data depends upon the use of appropriate methods for data
collection. For example, if a biased scale is used, repeated measurements will give
the same results, but they are not valid because the same mistake or bias is present
in every measurement.

The methods described here apply to data collected from the investigator's own
experimental study, i.e. primary data. Primary statistical data can be collected by
two methods:

1) Census Method
2) Sampling Method.
1) CENSUS METHOD

In this method the data are collected from all the individual items that are connected
with the inquiry. E.g.: To study the level of income of the parents of all the 500
students studying in a school to determine their economic status, we have to make use
of all the 500 observations.

Advantages of Census Method:

a) The data has high degree of accuracy.


b) The data is more representative and true.
c) Results are more reliable.
d) Possibility of any type of bias is minimized.

Disadvantages of Census Method:

i) It is less economical as it consumes more time, more energy


and more expenditure.
ii) It requires organizational skills and a large number of
investigators.
iii) It cannot be applied to all situations.

2) SAMPLING METHOD

Sampling is the selection of a few units from all the observational units in a
population, and a sample is a portion of the population selected to study the
characteristics of the whole population. E.g.: weighing 25 of the 100 fishes in an
aquarium. The weights of the 25 fishes constitute the sample, while the weights of
all 100 fishes constitute the population. Thus, a sample should be a true
representative of the whole population.
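
The fish example can be sketched in code. This is only an illustration: the text does not say how the 25 fishes are chosen, so the sketch assumes simple random sampling without replacement, and the weights, the helper name `draw_sample` and the seeds are all hypothetical.

```python
import random
import statistics


def draw_sample(population, size, seed=None):
    """Simple random sample without replacement (an assumed sampling method)."""
    if size > len(population):
        raise ValueError("sample cannot be larger than the population")
    return random.Random(seed).sample(population, size)


# Hypothetical weights (grams) of the 100 fishes; the text gives no actual values.
_rng = random.Random(0)
fish_weights = [round(_rng.uniform(50, 150), 1) for _ in range(100)]

# The sample: 25 of the 100 fishes.
sample = draw_sample(fish_weights, 25, seed=1)

# A representative sample should have a mean close to the population mean.
print("population mean:", round(statistics.mean(fish_weights), 1))
print("sample mean:    ", round(statistics.mean(sample), 1))
```

Fixing the seed only makes the illustration repeatable; in a real inquiry the draw would not be seeded.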

Essentials of a Good Sample:

• Samples selected from the population should be homogeneous and be true
representatives of the population.
• Samples should have characteristics similar to the population from which
they are collected.
• The number of samples, or the number of observations in each sample, should
be large enough to make the results reliable.
• The individual items composing the sample should be independent of each
other.

Types of Sample:
There are two types of sample:
1) Qualitative Samples: E.g. The grasses are taller in one grassland than the
other.

2) Quantitative Samples: E.g.: The number of young and old trees in an
orchard.

Size of the Samples:

The total number of items used in the study to get significant results is termed
the sample size. Selecting the proper sample size is very important. The sample
size should be neither very large nor very small, because the conclusions are
directly affected by the sample size.

The representativeness of the sample depends on the adequacy of the sample
selected. The larger the sample, the better the degree of representation of the
various attributes of the population.

Advantages of Sampling Method:


1) Sampling is comparatively more economical as it consumes less energy,
time and expenditure.
2) It is the most scientific method of data collection.
3) It is most suited in the case of a large population.
4) It requires fewer investigators.
5) It is most suited to those places and situations.

Disadvantages of Sampling Method:

a) It requires the services of experts; otherwise incorrect or misleading results
will be obtained.
b) In this method selection of an appropriate method of sampling is necessary.
c) In case the units of the population are spread over a large area, this method
cannot be used.
d) In case the size of the sample is small, sampling does not provide a true
representation of the population.
e) A sample, if taken wrongly, may lead to wrong analysis and interpretation.

Sources for Collection of Data:

There are eight sources for the collection of data:


1) Census
2) Registration of Vital Events
3) Sample Registration System(SRS)
4) Notification of Diseases
5) Hospital Records
6) Epidemiological Surveillance
7) Surveys
8) Research Findings.
1) Census: In India a census is taken every 10 years. A census is defined as
“the total process of collecting, compiling and publishing demographic, economic
and social data pertaining to all persons in a country or delimited territory
at a specified time or times”.

2) Registration of Vital events: In India, registration of births, deaths and


marriages is mandatory by law.

3) Sample Registration System (SRS): It is a dual record system, consisting
of continuous enumeration of births and deaths by an enumerator and an
independent survey every 6 months by an investigator-supervisor. Due to
complete coverage of our country by SRS, we are able to get more reliable
information on birth and death rates, age-specific fertility, mortality rates
and infant mortality.

4) Notification of Disease: It is a valuable source of morbidity data, such as the
incidence and prevalence in the community of certain specified diseases which
are notifiable. E.g. cholera, plague and yellow fever are internationally
notifiable diseases.

5) Hospital Records: The limitation of hospital data is that it represents only
those individuals who seek medical care, and we do not know the denominator
due to the lack of precise boundaries of the catchment area of the hospital.
Still, it gives useful information regarding the time, place and person
distribution of various diseases.

6) Epidemiological Surveillance: Special surveillance activities are


conducted for diseases like malaria, AIDS in our country.

7) Surveys: The term ‘health survey’ is used for surveys relating to any
aspect of health: morbidity, mortality, nutritional status etc. It includes
health interviews, health examinations, study of health records and mailed
questionnaires.

8) Research Findings: In various departments of medical colleges and
hospitals, and also in biomedical institutions and pharmaceutical
industries, experiments are performed for investigation and research with
specific objectives.

TYPES OF DATA:

1) Qualitative and Quantitative Data:

i) Qualitative data is also called enumeration data. It represents a
particular quality or attribute. E.g. religion, sex, tall or short,
blood group, cured or not cured, grey or black hair.

ii) Quantitative data is also called measurement data. It can take
fractional values. E.g. height in cm, weight in kg,
haemoglobin (gm%), blood pressure (mm of Hg), and serum
bilirubin.

2) Discrete and Continuous Data:

i) Discrete Data: It always takes whole-number values and never
includes fractions. E.g. number of people dying in RTA
(Road Traffic Accidents), number of refrigerators
requiring repairs and number of vials of polio vaccine.

ii) Continuous Data: There is a possibility of getting fractions,
depending upon our requirement. It can take all possible values in
a certain range. E.g. height, weight and haemoglobin level.

3) Grouped and Ungrouped Data:

i) Grouped Data: These are presented in groups. E.g. Hb % level


of seven men can be presented as 12.3gm% (3men), 14.5 gm %
(2 men) & 14.1 gm% (2 men).

ii) Ungrouped Data: These are presented individually. E.g.: Hb %


level of 7 men can be presented as 12.3, 12.3, 12.3, 14.5, 14.5,
14.1, & 14.1.
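
The haemoglobin example can be reproduced with a short sketch that groups the seven ungrouped readings by value and then expands the groups back into individual readings. The variable names are illustrative only.

```python
from collections import Counter

# Ungrouped data: Hb level (gm%) of seven men, one entry per man.
ungrouped = [12.3, 12.3, 12.3, 14.5, 14.5, 14.1, 14.1]

# Grouped data: each distinct value with the number of men who have it.
grouped = Counter(ungrouped)
for value, men in grouped.items():
    print(f"{value} gm% ({men} men)")

# Expanding the groups recovers the individual readings (order aside).
restored = sorted(grouped.elements())
```

Grouping loses nothing here because every reading is kept exactly; with class intervals instead of exact values, the individual readings could no longer be recovered.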
4) Primary and Secondary Data:

i) Primary Data: These are data obtained directly from an individual.
They give precise information. E.g. the height, weight and disease of
an individual interviewed are primary data.

ii) Secondary Data: These are the data obtained from outside source.
If we are studying hospital records and wish to use census data,
then census data becomes secondary data.

5) Nominal and Ordinal Data:

i) Nominal Data: In nominal data, the information fits into one of
the categories, but the categories cannot be ordered one above
another. E.g. in a class room a student can belong to one of the
categories Hindu, Muslim, Christian or Buddhist.

ii) Ordinal Data: Ordinal data describes a limited number of
categories that can be ordered one above the other, but the
space between each category may not be the same. E.g. a
student getting less than 35% marks is “fail”, 35 to 50% is
“third class”, 51 to 59% is “second class” and 60% or more
is “first class”. Here there is a definite order in these
categories.
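
As a sketch of ordinal data, the marks example can be coded so that each percentage falls into exactly one ordered category. It assumes that exactly 35% counts as “third class”, that exactly 60% counts as “first class”, and that fractional marks between 50 and 51 stay in “third class”; the names `CLASSES`, `CUT_POINTS` and `classify` are hypothetical.

```python
import bisect

# Ordered categories, lowest to highest.
CLASSES = ["fail", "third class", "second class", "first class"]

# Lower limits of the second, third and fourth categories (assumed boundaries).
CUT_POINTS = [35, 51, 60]


def classify(percent):
    """Return the ordinal category for a percentage of marks."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    return CLASSES[bisect.bisect_right(CUT_POINTS, percent)]


def rank(category):
    """Position of a category in the order; only the order is meaningful."""
    return CLASSES.index(category)


print(classify(34), classify(42), classify(55), classify(72))
```

The ranks 0 to 3 can be compared with each other, but the categories cover 35, 16, 9 and 41 percentage points respectively, which is the unequal spacing the text describes.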
Rates

A rate expresses the relation between relative numbers or quantities.

Rates provide a measurement of the population by which the population behaviour of
one country or state can be compared with that of another.

Suppose that in school ‘A’ 27 students developed food poisoning and in school ‘B’ 93
students developed food poisoning. One might be tempted to comment that more
cases of food poisoning have occurred in school ‘B’ than in ‘A’. Now we are told that
in school ‘A’ the 27 cases occurred in the month of Jan. 1994 and there are 270
students in the school, and that in school ‘B’ the 93 cases occurred from Jan. to
June of 1994 and there are 1860 students in the school. One will now realize that in
school ‘A’ 10% (27/270) of students suffered from food poisoning in one month,
whereas in school ‘B’ only 5% (93/1860) of students suffered from food poisoning,
and that too over 6 months (5/6 = 0.83% per month).

To avoid the problems mentioned above we use the tool known as a rate.

A rate consists of a numerator (how many), a denominator (out of how many),
a time specification (period or time frame) and a multiplier.

An example of a typical rate is the birth rate:

              Number of births in 2001
Birth Rate = -------------------------- x 1000
              Mid-year population

The multiplier used is 1000; however, as per convenience or to avoid fractions,
one can use 100, 1,000 or 100,000 as the multiplier.
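
The four parts of a rate can be made concrete with a minimal sketch that reworks the food-poisoning example from the text. The function name `rate` and its parameters are illustrative, not a standard API; the multiplier 100 is chosen so that the result reads as a percentage.

```python
def rate(events, population, multiplier=1000):
    """Events per `multiplier` persons, for the period in which the events were counted."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * multiplier


# School A: 27 cases among 270 students, counted over 1 month.
school_a = rate(27, 270, multiplier=100)
# School B: 93 cases among 1860 students, counted over 6 months.
school_b = rate(93, 1860, multiplier=100)

# Bring both schools to the same time base (per month) before comparing.
school_a_per_month = school_a / 1
school_b_per_month = school_b / 6

print(f"School A: {school_a:.1f}% in 1 month,  {school_a_per_month:.2f}% per month")
print(f"School B: {school_b:.1f}% in 6 months, {school_b_per_month:.2f}% per month")
```

Comparing the raw counts (27 against 93) points the wrong way; the comparison only becomes fair once the denominator and the time period are both taken into account.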

Types of Rates:
1) Crude Rates
2) Specific Rates
3) Standardized Rates
4) Other Rates

1) Crude Rates: These rates are called crude because the denominator used is the
entire mid-year population and no specific information is included in the
numerator.
e.g. Crude Birth Rate

2) Specific Rates: These are the actual observed rates due to specific causes,
age groups, time periods etc.
e.g. Specific Cancer Death Rate.

3) Standardized Rates: To make two populations and their rates comparable,
adjustment for age, sex, religion etc. can be made, by either the direct or
the indirect method. Thus we can get age- or sex-standardized rates.

4) Other Rates: These are rates where the time period is not a year. Such rates
can be converted to annual rates by using an appropriate multiplier.

a) Quinquennial Rate:

     No. of vital events in the population during the period of five years
   = ----------------------------------------------------------------------- x 10,000
     Population estimated at the middle of the five years

   To convert to an annual rate, multiply by 1/5.

b) Decennial Rate:

     No. of vital events in the population during the period of one decade (10 years)
   = ---------------------------------------------------------------------------------- x 1000
     Population estimated at the middle of the ten years

   To convert to an annual rate, multiply by 1/10.

c) Weekly Rate:

     No. of vital events occurring in one week
   = ------------------------------------------- x 100
     Total mid-year estimated population

   To convert to an annual rate, multiply by 52.
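
The conversion to annual rates can be sketched as dividing a period rate by the length of its period in years, which gives the multipliers 1/5, 1/10 and 52 quoted above. All counts below are hypothetical, and the function names are illustrative only.

```python
def period_rate(events, mid_period_population, multiplier):
    """Rate for the whole period in which the events were counted."""
    return events / mid_period_population * multiplier


def to_annual(rate_value, period_in_years):
    """Convert a period rate to an annual rate (5 years -> x 1/5, 1 week -> x 52)."""
    return rate_value / period_in_years


# Hypothetical figures: 12,000 events in five years, mid-period population 400,000.
quinquennial = period_rate(12_000, 400_000, multiplier=10_000)
annual_from_five_years = to_annual(quinquennial, 5)

# Hypothetical figures: 50 events in one week, mid-year population 400,000.
weekly = period_rate(50, 400_000, multiplier=100)
annual_from_week = to_annual(weekly, 1 / 52)

print(f"{quinquennial:.0f} per 10,000 in five years = {annual_from_five_years:.0f} per 10,000 per year")
print(f"{weekly:.4f} per 100 in one week = {annual_from_week:.2f} per 100 per year")
```

The conversion assumes the events are spread evenly over the period, which matters most when a single week is scaled up to a year.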

RATIO:
A ratio expresses the relation between two quantities, neither of which is part of
the other, e.g. if in a basket there are 30 mangoes and 20 apples, we can say that the
ratio of mangoes to apples is 3/2 or 3:2. Similarly, we can get the ratio of WBCs to
RBCs, serum cholesterol to serum calcium etc. Here the numerator is not a part of the
denominator. A ratio does not require a time frame or a multiplier. Sex ratio is one
example of a ratio. It is defined as the number of females per 1000 males; in 2001
it was 933:1000.

Dependency Ratio:
     (Population aged 0 to 14 yrs) + (Population aged 65 yrs & above)
   = ------------------------------------------------------------------ x 100
     Population aged 15 to 64 yrs

Persons aged 65 years & above and persons below 15 years of age are considered to
be dependent on the economically productive age group of 15 to 64 years.

PROPORTION:
A proportion is a ratio in which the numerator is always included in the
denominator. It is usually expressed as a percentage.

E.g.:
Marks obtained
= ------------------ x 100
Total Marks

It is called proportion of marks out of total marks.

E.g.: The proportion of mortality due to AIDS out of total mortality,

     No. of Deaths due to AIDS
   = --------------------------------- x 100
     No. of Deaths due to all causes
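
The difference between a ratio and a proportion can be shown with a short sketch: in a ratio the numerator is not part of the denominator, in a proportion it is. The mango and apple counts come from the text; the population figures for the dependency ratio are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction


def ratio(a, b):
    """Ratio of a to b in lowest terms; a is not part of b."""
    return Fraction(a, b)


def proportion(part, whole, multiplier=100):
    """Share of the whole taken by a part; the part is included in the whole."""
    if not 0 <= part <= whole:
        raise ValueError("the part must be included in the whole")
    return part / whole * multiplier


def dependency_ratio(aged_0_14, aged_65_plus, aged_15_64):
    """Dependants per 100 persons aged 15 to 64."""
    return (aged_0_14 + aged_65_plus) / aged_15_64 * 100


mangoes, apples = 30, 20
mango_to_apple = ratio(mangoes, apples)             # 3:2
mango_share = proportion(mangoes, mangoes + apples)  # mangoes as % of all fruit

# Hypothetical population (thousands): 300 aged 0-14, 80 aged 65+, 620 aged 15-64.
dependency = dependency_ratio(300, 80, 620)

print(f"ratio {mango_to_apple.numerator}:{mango_to_apple.denominator}, share {mango_share:.0f}%")
print(f"dependency ratio {dependency:.1f} per 100")
```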

Note:

1) Numerator: The number of times an event has occurred in a
population during a specified period of time. The event may be a
birth, death, disability, disease, episode of sickness, marriage,
divorce, adoption etc.
2) Denominator: Related to the mid-year population. The mid-point of a
year is customarily taken as the first of July of the year. The
mid-year population is used because the population changes in size
and composition during the year as a result of births, deaths and
migrations, so a rate or ratio calculated in January will be quite
different from that calculated in December of the same year.

CLASSIFICATION OF DATA

The process of arranging the primary data in a definite pattern and presenting
it in a systematic way is known as classification of data.

Collected data are in a crude and unorganized form, known as ungrouped data. It is
very difficult to study and draw inferences from ungrouped data because they are
very large. Therefore, it is essential to classify and present data in such a way that
they become easy and convenient to use and handle.

Methods of Classification of Data:

Data may be grouped or classified as follows:

1) Classification by Space (Geographical Data): The data are classified by
location of occurrence, i.e. according to area or region. Organizing the
data in sets of categories in the order of their geographical location is
called classification by space or geographical classification.

2) Classification by Time (Chronological Data): The data are classified by
the time of occurrence of the observations or of an event. The
categories are arranged in chronological order.

3) Classification by Attribute (Qualitative Data): The data are collected and
classified on the basis of some qualitative characteristics termed attributes.
The qualitative attributes are defined in non-measurable terms, such as
sex, colour, health condition, intelligence, singing or drawing ability etc.
These data can be of two types: i) Simple or Dichotomous Data, ii) Manifold
Data.

4) Classification by Size (Quantitative Data): Data pertaining to
characteristics that are represented by quantitative measurement and
expressed by numerical values are called quantitative data. The
classification of quantitative characteristics is described as quantitative
classification or classification by variable. The variables can be of two
types: i) Continuous variable and ii) Discrete variable.

Objectives of Classification of Data:

• To simplify the complexities of raw data and make the data easily
accessible and easily understandable.
• To make data attractive so as to leave a lasting impression.
• To facilitate quick comparison and easy study.
• To present the data in condensed form by the summation of items.
• To ensure detection of errors and omissions in data.
• To ensure proper use of collected data.
• To help in drawing statistical inferences.
• To help in drafting the final report.

Differences between Classification and Tabulation:

The process of arranging and presenting the primary or raw data in a systematic
way is called classification of data. This is a prelude to tabulation.

Tabulation is the presentation of statistical data in the form of tables. A table is a
systematic organization of statistical data in columns and rows. A properly
constructed and adequately labeled table can be read and understood independently
without consulting the accompanying text. Therefore, tables are designed in such a
way that they enable the reader to grasp the information that they intend to convey.
Tables may be prepared for general purposes and for special purposes.

Tally Mark:

Tally marks are small vertical bars which are used in a frequency table to
represent the number of times a particular event has appeared in the collected data.
If a particular value has occurred four times against a class, we put four tally
marks (llll); for the fifth occurrence we strike a cross tally mark through the four,
giving a block of 5. When it occurs for the sixth time we put another tally mark,
leaving some space after the first block of 5.

Let us take an example to understand tabulation. The numbers of deaths due to
neonatal tetanus in 60 districts of India in one year are given below.

E.g. 22, 21, 23, 22, 21, 25, 26, 27, 25, 25, 26, 31, 33, 35, 31, 33, 44, 42, 44, 46,
49, 49, 50, 10, 12, 15, 16, 11, 14, 18, 55, 55, 58, 59, 60, 61, 66, 69, 62, 63,
70, 77, 71, 71, 78, 75, 74, 73, 80, 82, 83, 80, 82, 86, 87, 90, 91, 93, 96, 98.

Classes    Tally Mark    Total No. of Frequency    C.F.
10-20
20-30
30-40
40-50
50-60
60-70
70-80
80-90
90-100
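
The table can be filled in mechanically. The sketch below counts the 60 values into the nine classes, draws tally marks in blocks of five, and accumulates the cumulative frequency (C.F.). Because neighbouring classes share a limit (20 ends one class and starts the next), it assumes the exclusive method, in which a value equal to the upper limit goes to the next class; the text does not state which convention is intended. The function names are illustrative.

```python
from itertools import accumulate

deaths = [
    22, 21, 23, 22, 21, 25, 26, 27, 25, 25, 26, 31, 33, 35, 31, 33, 44, 42, 44, 46,
    49, 49, 50, 10, 12, 15, 16, 11, 14, 18, 55, 55, 58, 59, 60, 61, 66, 69, 62, 63,
    70, 77, 71, 71, 78, 75, 74, 73, 80, 82, 83, 80, 82, 86, 87, 90, 91, 93, 96, 98,
]


def tally(count):
    """Tally marks in blocks of five; '||||/' stands for four bars crossed by a fifth."""
    blocks, rest = divmod(count, 5)
    return " ".join(["||||/"] * blocks + (["|" * rest] if rest else []))


def frequency_table(values, start, stop, width):
    """Rows of (class label, tally marks, frequency, cumulative frequency)."""
    lowers = list(range(start, stop, width))
    # Exclusive method (assumed): lower limit included, upper limit excluded.
    freqs = [sum(lo <= v < lo + width for v in values) for lo in lowers]
    cumulative = list(accumulate(freqs))
    return [
        (f"{lo}-{lo + width}", tally(f), f, cf)
        for lo, f, cf in zip(lowers, freqs, cumulative)
    ]


table = frequency_table(deaths, start=10, stop=100, width=10)
for label, marks, freq, cum in table:
    print(f"{label:<8}{marks:<20}{freq:>4}{cum:>5}")
```

The last cumulative frequency equals the number of observations, which is a quick check that the classes are exhaustive and non-overlapping.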

Frequency:

Frequency, or frequency count, of an attribute is the number of times a
character or attribute has appeared in the data collected. Frequency data can be
represented in the form of a frequency table. The arrangement of the frequencies of
a variable and their presentation in defined groups is called a frequency table.

e.g. 1) Results from tossing a coin,
2) Frequency table from throwing a die.
PRESENTATION OF DATA

Various methods are used for presenting the organized data. The most frequently
used methods of data presentation are tabular, graphic and diagrammatic forms.

TABULAR PRESENTATION OF DATA

Tabulation is presentation of statistical data in the form of tables. A table is a


systematic organization of statistical data in columns and rows. A table can be simple
or complex depending upon the number of measurements of a single set or multiple
sets of items.

Preparation of Class Intervals Steps:

1) The data is arranged in an ascending order.


2) The identical values of the variable are grouped together.
3) The number of classes into which the data is to be grouped is decided on
the number of observations. The number should be between 5 and 20 (i.e.
not less than 5 and not more than 20).
4) The size or width of the classes is decided on the basis of data. It should be
uniform.
5) Class frequencies are then calculated and a table is prepared showing the
different classes, e.g. with a class interval of 5.

Rules and guidelines for Tabular Presentation:

1) Table must be numbered.


2) A brief and self-explanatory title must be given to each table.
3) The headings of columns and rows must be clear, sufficient, concise and
fully defined.
4) The data must be presented according to size or importance,
chronologically, alphabetically or geographically.
5) If data includes rate or proportion mention the denominator.
6) Full details of deliberate exclusions in collected series must be given.
7) Table should not be too large.
8) Figures needing comparison should be placed as close as possible.
9) Arrangement should be vertical as scanning of data is easier from top to
bottom than left to right.
10) Footnotes should be given whenever necessary, providing additional
information, source or explanatory notes.
11) The classes should be clearly defined and should not lead to any
ambiguity.
12) The classes should be exhaustive (should include all observation).
13) The classes should be mutually exclusive and non-overlapping.
14) The classes should be of equal width.
15) Open ended classes should be avoided as far as possible.
16) The number of classes should be neither too large nor too small.

TYPES OF TABLE

a) Simple Table
b) Cross Table
c) Table with Multi variables
d) Tables prepared only for Statistical analysis

1) Simple Table

A simple table has a table number, table title, stub (row headings) and caption
(column headings), with some variables and their frequencies.

The simple table below is a consolidation of 90 observations grouped under
5 taluks. It readily tells us the distribution of cases taluk-wise, and also the
proportion of occurrence taluk-wise.

E.g.: Distribution of Gastroenteritis (G.E.) cases taluk-wise, Puducherry,
June 2000.

Taluk        No. of Cases
Villianur    39
Bhoor        23
Karaikal     14
Mahe          9
Yanam         5
Total        90

E.g.: Distribution of Gastroenteritis (G.E.) cases taluk-wise in June 2000 in
Puducherry State, with percentages.

             G.E. Cases
Taluk        No. of cases    %
Villianur    39              43.2
Bhoor        23              25.6
Karaikal     14              15.6
Mahe          9              10.0
Yanam         5               5.6
Total        90             100.0
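
The percentage column of the Gastroenteritis table can be derived from the counts with a short sketch; the function name `percent_distribution` is illustrative. Rounded to one decimal, 39/90 comes out as 43.3 and the rounded shares add up to 100.1, which may be why the table prints 43.2 for Villianur so that the column totals 100.0; that reading is an inference, not something the text states.

```python
cases = {"Villianur": 39, "Bhoor": 23, "Karaikal": 14, "Mahe": 9, "Yanam": 5}


def percent_distribution(counts, digits=1):
    """Each count as a percentage of the total, rounded to `digits` decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, digits) for name, n in counts.items()}


percents = percent_distribution(cases)
for name, n in cases.items():
    print(f"{name:<12}{n:>4}{percents[name]:>8.1f}")
print(f"{'Total':<12}{sum(cases.values()):>4}")
```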

2) Cross Table

When 2 or more variables are recorded simultaneously, the data are recorded
in rows and columns according to the corresponding frequency in each category.

E.g.: Distribution of RTA cases in 2002

              MLC = 440                Non-MLC = 115
Month         No. of cases    %        No. of cases    %
Jan - Mar     183             41.6     41              35.7
Apr - Jun     153             34.8     38              33.0
Jul - Sep      75             17.0     31              27.0
Oct - Dec      29              6.6      5               4.3
Total         440            100.0    115             100.0

The cross table provides information on the occurrence of an event in relation
to another variable and is presented as a one-to-one cross table.
3) Table with Multi variable:

More than two variables or attributes are put to analysis for a set of characters,
and the sets of values are presented in a multi-way table.

E.g.: Distribution of children by age and health checkup periodicity

                                      Health checkup
Age Group        Nil           Once in      Once in three   Once in six   Total
                               month        months          months
Below 3 months   31 (79.6%)    1 (2.5%)     7 (17.9%)       --            39 (100%)
3-6 months       52 (91.3%)    --           5 (8.7%)        --            57 (100%)
6-9 months       29 (85.3%)    1 (2.9%)     4 (11.8%)       --            34 (100%)
9-12 months      47 (38.8%)    2 (1.7%)     72 (59.5%)      --            121 (100%)
1-3 years        230 (45.8%)   --           270 (53.8%)     2 (0.4%)      502 (100%)
3-6 years        269 (45.9%)   1 (0.2%)     314 (53.3%)     1 (0.2%)      585 (100%)
Total            658 (49.1%)   5 (0.4%)     672 (50.3%)     3 (0.2%)      1338 (100%)

4) Tables Prepared only for Statistical Analysis

When we want to verify or measure an association, we have to redraft the
tabulated data to suit the application of statistical tests.

E.g.: If we want to find out the association of infants and toddlers with health
checkup, we can prepare a statistical table for the purpose of applying a statistical
test.

              Health Checkup
Children      - (Minus)    + (Plus)    Total
Infants       159          92          251
Toddlers      499          588         1087
Total         658          680         1338

The table prepared above is suitable for finding out the association between health
checkup and infants and toddlers. It is a 2x2 contingency table, specially prepared
for a statistical test, and need not be displayed in the report. Giving only the
statistical inference in the discussion, or as a footnote to the multi-variable
table, does the job.
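
The redrafting step can be sketched as collapsing the age by checkup-periodicity table into the 2x2 table. The sketch assumes that "infants" means the four age groups below 12 months and "toddlers" the two groups from 1 to 6 years, which is the grouping that reproduces the printed totals; the names `ROWS`, `INFANT_GROUPS` and `collapse_to_2x2` are hypothetical.

```python
# (age group, nil, once in month, once in three months, once in six months)
ROWS = [
    ("Below 3 months", 31, 1, 7, 0),
    ("3-6 months", 52, 0, 5, 0),
    ("6-9 months", 29, 1, 4, 0),
    ("9-12 months", 47, 2, 72, 0),
    ("1-3 years", 230, 0, 270, 2),
    ("3-6 years", 269, 1, 314, 1),
]

# Assumed grouping: children below 12 months are infants, the rest toddlers.
INFANT_GROUPS = {"Below 3 months", "3-6 months", "6-9 months", "9-12 months"}


def collapse_to_2x2(rows, infant_groups):
    """Return {children: [no checkup (-), any checkup (+)]}."""
    table = {"infants": [0, 0], "toddlers": [0, 0]}
    for group, nil, *checked in rows:
        key = "infants" if group in infant_groups else "toddlers"
        table[key][0] += nil
        table[key][1] += sum(checked)
    return table


two_by_two = collapse_to_2x2(ROWS, INFANT_GROUPS)
for children, (minus, plus) in two_by_two.items():
    print(f"{children:<10}{minus:>6}{plus:>6}{minus + plus:>7}")
```

Collapsing the three checkup frequencies into a single "+" column trades detail for a table that a test of association on two categories can use directly.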
GRAPHIC AND DIAGRAMMATIC PRESENTATION OF DATA

Introduction:

The columns and rows of a table cause eye strain, and data presented in tabular form
may make a poor visual impression. In such circumstances data can be presented in the
form of a graph, picture, diagram or figure, which helps comparison through good
visual impression. Hence graphs and diagrams are of utmost importance in creating
interest in observational data.

The presentation of data by diagrams proves a very considerable aid and has
much to commend it, provided certain basic principles are not forgotten. The main
objective of a diagram is to help the eye grasp a series of numbers and the meaning
of a series of data, and so to assist the intelligence.

Graphic presentation:
Definition: Graphic presentation is the method of presenting statistical data in
the form of curves on graph paper. It gives a visual effect and is prepared from a
frequency distribution table. It provides an immediate overview of the values of
different variables in a simple, clear and comprehensive manner.

A graph is a visual portrayal of a collection of numerical or statistical data.
It presents dry and complex numerical data in the form of an attractive and appealing
pictorial presentation.

A graph consists of two lines:

1) Abscissa: It is the horizontal line called the ‘X’ axis. It represents the
magnitude of the independent variable.
2) Ordinate: It is the vertical line called the ‘Y’ axis. It represents the magnitude
of the dependent variable.

The meeting point or point of intersection of the X and Y axes is called the origin
and is represented by zero (0).

The graph can be divided into four parts termed quadrants.

Uses of Graphical Representation:

1) Easy for comparison


2) Trends in the observations can be noticed with respect to time.
3) Lay people can understand it.
4) Median, percentiles, quartiles etc. can be calculated.
5) When large fluctuations are seen, semi-log or double-log graphs can
give a good representation.
Advantages of Graphic Presentation:
1) It is easy to read
2) It can show the relationship between two or more sets of observations in one
look.
3) It is universally applicable.
4) It is attractive in representation.
5) It helps in proper estimation, evaluation and interpretation of the
characteristics of items and individuals.
6) It indicates trends and helps in forecasting.
7) It has high communication power.
8) It is economical.
9) It is more effective, especially in comparative analysis.
10) It has a more lasting effect on the brain.
11) It simplifies complex data.

Disadvantages of Graphic Presentation.

1) It is time consuming.
2) Finer details may be lost.
3) It depicts only approximate values.

Types of Graphical Presentation:

1) Line Diagram
2) Histogram
3) Frequency Polygon
4) Frequency Curve
5) Cumulative Frequency Curve
6) Scatter Diagram

Line Diagram:
This is a simple type of diagram which is useful in studying the changes in the
values of a variable with the passage of time. When such corresponding values
are joined by a line, it constitutes a line diagram.

In a line diagram, if more than one set of observations is plotted, the lines are drawn in
different styles to show a comparative picture.

Line Diagram Table

No. of Cases    Time in Hours
1               1
2               5
3               6.5
4               11.5
5               4.5
6               3.5

[Figure: Line diagram of the table above. Axis labels: No. of Cases, Time in Hours.]

It is simply a frequency distribution whose variations are presented by a line. The class interval
can be a week, a month, a year or 100 years. The scale used can change the shape of the
line diagram.

Histogram:

It is a pictorial diagram of a frequency distribution. It is a special form of bar
diagram which represents categories of continuous and ordered data. It consists of a
series of bars or blocks.

The class intervals are given along the horizontal axis (abscissa) and the
frequencies along the vertical axis (ordinate). The area of each bar, block or
rectangle is proportional to the frequency. The width of the bar represents the interval
of each category. It is an area diagram.

If the class intervals are uniform, the height of the rectangle shows the
frequency; if the class intervals are not uniform, the area of the rectangle
represents the frequency.

Systolic Blood Pressure    No. of Adult Males
101-110                    20
111-120                    30
121-130                    40
131-140                    30
141-150                    20
151-160                    10

[Figure: Histogram "Systolic Blood Pressure in Adult Males". X-axis: Range of Systolic BP; Y-axis: No. of Adult Males.]

Frequency Polygon:

It is the representation of the distribution of categories of continuous and
ordered data, similar to the histogram. It is an area diagram. The X-axis depicts the
categories of data and the Y-axis depicts the frequency of data in each category.

A frequency polygon can be obtained from a histogram by joining the midpoints of the
tops of the blocks or rectangles of the histogram. It can be more useful than the histogram
because several frequency distributions can be plotted easily on one graph.
[Figure: Frequency polygon "Systolic Blood Pressure in Adult Males". X-axis: Range of Systolic BP; Y-axis: No. of Adult Males.]

Frequency Curve:

As the number of observations becomes very large and the class intervals are very
much reduced, the frequency polygon loses its angulations and gives rise to a smooth
curve known as the frequency curve. Such frequency curves are often encountered when
we study the distribution of most biological variables.

Here the relative frequency of the variable can be obtained from the curve.
[Figure: Frequency curve "Systolic Blood Pressure in Adult Males". X-axis: Range of Systolic BP; Y-axis: No. of Adult Males.]

CUMULATIVE FREQUENCY CURVE OR OGIVE:


An ogive is the graphic representation of a cumulative frequency distribution.
On the vertical axis the cumulative frequencies are represented, and the horizontal
axis is marked off with the class boundaries. The cumulative frequency of each class is
marked by a point above the upper limit of the class. These points are joined to give
a ‘less than’ or ‘greater than’ ogive.

E.g.: The cumulative frequency curve for the age groups of albino rats shows one important
feature, i.e. the curve is always ascending. Such curves are called ogives.
Frequency Distribution of Weight (in pounds) of Individuals

Class Interval (Weight in pounds)    Frequency    Cumulative Frequency
151-155                              8            8
156-160                              7            15
161-165                              15           30
166-170                              9            39
171-175                              9            48
176-180                              2            50
Total                                50
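
The cumulative frequency column can be reproduced as a running total of the frequency column. The following is a minimal Python sketch of that step; the variable names are illustrative only.

```python
from itertools import accumulate

# Class intervals and frequencies copied from the weight table.
classes = ["151-155", "156-160", "161-165", "166-170", "171-175", "176-180"]
frequencies = [8, 7, 15, 9, 9, 2]

# A running total of the frequencies gives the 'less than' cumulative frequency.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label}: f = {f}, cumulative f = {cf}")
```

The last cumulative value equals the total frequency, which is a quick check that no class has been missed.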
[Figure: Ogive "Cumulative Frequency Distribution of Weights in Pounds". X-axis: Weight in Pounds; Y-axis: Cumulative Frequency of Individuals.]

SCATTER OR DOT DIAGRAM


It is prepared in cases in which the frequencies of at least two variables have been
cross-classified. Of these, one variable is independent and the other is dependent. The
independent variable is regarded as the cause of the changes in the dependent variable.

The height and weight of 8 groups of children are given below:

Height of the children    Weight of the children
100                       21
102                       26
104                       31
106                       38
108                       45
110                       52
112                       57
114                       61

[Figure: Scatter diagram "Height and Weight of the 8 Groups of Children". X-axis: Height of Children; Y-axis: Weight of the Children.]

Diagrammatic Presentation of Data:

Types of Diagram:

1. Bar Diagram (Vertical and Horizontal)
   a. Simple Bar Diagram
   b. Multiple Bar Diagram
   c. Component Bar Diagram
2. Pie Diagram
3. Pictogram
4. Map Diagram

1) BAR DIAGRAM:
A bar diagram consists of equally spaced vertical rectangular bars of equal width
placed on a common horizontal base line. The heights of the rectangles are
proportional to the frequencies. The vertical bars substitute for straight lines. The bar
diagram is used with discrete or discontinuous qualitative variables. It provides a
visual comparison of figures. Vertical bars are used for comparison over time.

The length of the bars indicates the magnitude of the frequency of the character to be
compared. Bars can be arranged in ascending or descending order of magnitude, though
this is not mandatory. Bars can be arranged serially or in any order, as the data categories
are discrete. The spacing between bars should be equal to or more than half of the
width of a bar. The height of the bars itself provides the comparison.

There are three types of Bar Diagrams:

1) Simple Bar Diagram
2) Multiple (or) Compound Bar Diagram
3) Component (or) Proportional Bar Diagram

Simple Bar Diagram


The Bars can be represented Vertically or Horizontally. A suitable scale must
be used to present the length of the bars.

e.g.: Old and New OPD attendance cases

Years    Old & New OPD Cases
2007     12000
2008     10000
2009     13000

Vertical Simple Bar Diagram

[Figure: Vertical bar diagram "Old & New OPD Cases". X-axis: Years; Y-axis: No. of Cases.]

Horizontal Simple Bar Diagram

[Figure: Horizontal bar diagram "Old & New OPD Cases". X-axis: No. of Cases; Y-axis: Years.]

Multiple (or) Compound Bar Diagram:

In a multiple bar diagram, sub-classifications of the data are studied
by using more than one bar; the bars are separately indexed to make the graph clearly
understandable.

For example:
Old & New OPD cases, Special Clinic cases and In-Patients are given for the
years 2007, 2008 and 2009. They are represented by side-by-side bars,
differentiated by shading, showing the number of cases in the respective categories.
e.g.:

Years    Old & New OPD Cases    Special Clinics    In-Patient
2007     12000                  5000               3000
2008     10000                  4000               2500
2009     13000                  6000               4500

[Figure: Multiple bar diagram "New, Old, Special Clinics and In-Patient". X-axis: Name of the Categories; Y-axis: No. of Cases; one bar per year (2007, 2008, 2009).]

Component or Proportional Bar diagrams

The bars may be divided into two or more parts, each part representing a
certain item and proportional to the magnitude of that item. It is also
advisable to take one bar as 100 percent and give each subcategory its proportion
within the bar.

Proportional Distribution of In-patients and Out-patients during the 1998-2000 period

Year    Out-patient by %    In-patient by %
1998    25                  75
1999    50                  50
2000    60                  40

[Figure: Component bar diagram "Proportional distribution of OP and IP". X-axis: Years; Y-axis: Percentage of distribution; components: Out-patient by %, In-patient by %.]
Cause of Mortality in Old Age (in percentage)

Non-Specified    35
CHD              22
Diabetes         23
Cancers          10
Accidents        10

[Figure: Component bar diagram "Cause of Mortality in Old Age %". Components: Non-Specified, CHD, Diabetes, Cancers, Accidents.]

Pie or Sector Diagram:

The pie diagram is another common diagram used to depict data. The whole circle is
divided into sectors proportional to the frequencies. Taking the total angle of the circle as
360 degrees, the angle for any sector is calculated by

$$\text{Angle of a sector} = \frac{\text{No. of observations in the sector}}{\text{Total no. of observations in the study}} \times 360^\circ$$

Geographical Distribution of Out-Patients for the year 2008

Puducherry State            73793
Cuddalore District          53724
Villupuram District         169962
Thiruvannamalai District    49011
Other Dist. (TN)            55894
AP                          2726
Kerala                      523
Karnataka                   831
Other State                 668
Total                       407132
Pie diagram showing the geographical distribution of out-patients for the year 2008

The angles of the pie diagram are calculated by taking the whole circle as
360°. The values are converted into degrees as follows, taking the total value as 360°.

e.g.:
Puducherry State: $\frac{73793}{407132} \times 360^\circ \approx 65^\circ$

Villupuram District: $\frac{169962}{407132} \times 360^\circ \approx 150^\circ$

and so on.

Percentage calculation:
$\frac{73793}{407132} \times 100 = 18.125\%$
and so on.

If percentage values are given, the number of observations is recovered as

$\frac{18.125}{100} \times 407132 \approx 73793$

(or) the percentage is obtained from the angle as

$\frac{65.25^\circ}{360^\circ} \times 100 = 18.125$
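
The same sector calculation can be applied to every row of the out-patient table at once. The sketch below is a minimal Python illustration of the sector-angle formula; the dictionary and variable names are assumptions made for this example.

```python
# Out-patient counts copied from the 2008 geographical distribution table.
outpatients = {
    "Puducherry State": 73793,
    "Cuddalore District": 53724,
    "Villupuram District": 169962,
    "Thiruvannamalai District": 49011,
    "Other Dist. (TN)": 55894,
    "AP": 2726,
    "Kerala": 523,
    "Karnataka": 831,
    "Other State": 668,
}

total = sum(outpatients.values())

# Angle of a sector = (observations in sector / total observations) x 360 degrees.
angles = {name: count / total * 360 for name, count in outpatients.items()}
# Percentage share = (observations in sector / total observations) x 100.
percentages = {name: count / total * 100 for name, count in outpatients.items()}

for name in outpatients:
    print(f"{name}: {angles[name]:.1f} degrees, {percentages[name]:.1f} %")
```

The angles should add up to 360° and the percentages to 100, which is a useful check before drawing the sectors.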
[Figure: Pie diagram "Geographical Distribution". Sector labels: Villupuram District 42%, Puducherry State 18%, Other Dist. (TN) 14%, Cuddalore District 13%, Thiruvannamalai District 12%, AP 1%, Kerala 0%, Karnataka 0%, Other State 0%.]


Pictogram:

Pictograms are a popular method of presenting data to the lay man and to
those who cannot understand orthodox charts. They are a form of bar diagram.

Here each picture indicates a constant number of happenings, such as deaths.


CARTOGRAM (OR) MAP DIAGRAM:
When numerical facts are shown in the form of maps they are termed
cartograms. Cartograms are most suitable for geographical data. The different values
on a map can be represented by different colours, by varying degrees of shading or
cross-hatching, by dots of similar size at different densities, or by dots of
proportional size, etc.
MEASURES OF CENTRAL TENDENCY

The data are condensed to a single value. Such a single-value
expression or presentation of data is called the central value. The values of a variable
tend to concentrate around the central value. Therefore, the central value is also
called the central tendency. The measures devised to calculate the central tendency
are known as measures of central tendency.

Measures of central tendency refer to all those methods of statistical analysis
which are used to estimate or calculate the average of a set of data. The three
common measures of central tendency are the mean, median and mode.

a) Characteristics of Central Tendency or Average

• The average should be properly defined.
• The value of the average should not be biased.
• It should be easy to compute.
• The average should be readily understandable.
• The average should be based on each and every item of the series.
• The average should not be affected by extreme values.
• The average should be least affected by the fluctuations of sampling.
• It should be easily subjected to further mathematical calculations.

MEAN

The mean is a measure of the centre of a set of values, i.e. of central tendency.
An average represented mathematically is termed the mathematical average or mean. It is
calculated by taking into account the values of all items of the series.

There are three different types of Mean:

1) Arithmetic Mean
2) Geometric Mean
3) Harmonic Mean

Arithmetic Mean (AM)


The common average of many individual values of observations obtained
arithmetically is referred to as the Arithmetic Mean. It is the number obtained by
dividing the sum of all the items in a series by the total number of items of that series.

(Ungrouped Data):
A.M. Direct method
Formula:

$$\text{A.M. (or) } \bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X}{n}$$

where $X_1, X_2, \dots$ are the different values of the variable X and n is the number of observations of X.

The symbol ∑ is the Greek letter sigma. It denotes a sum, i.e. ∑X is the sum of all values of X. This is
also known as the Direct Method of calculating the Arithmetic Mean. It is useful only
when the number of items in the series is small and the values are small.

E.g.: Find the arithmetic mean of the marks obtained by 10 students of a class in
mathematics in certain examination. The marks obtained are:

25, 30, 21, 55, 47, 10, 15, 17, 45, 35.
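
As an illustration of the direct method, the ten marks can be summed and divided by their number. This is a minimal Python sketch; the variable names are chosen for this example only.

```python
# Marks of the 10 students from the example.
marks = [25, 30, 21, 55, 47, 10, 15, 17, 45, 35]

# Direct method: sum of all items divided by the number of items.
arithmetic_mean = sum(marks) / len(marks)

print(arithmetic_mean)  # 30.0
```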

A.M. Short-cut Method:


The short-cut method for calculating the arithmetic mean is used when the number of
items in the series is very large.

Formula:

$$\bar{X} = A + \frac{\sum d}{n}$$

Where, $\bar{X}$ (x-bar) is the actual arithmetic mean
A is the assumed arithmetic mean
d is the deviation of items from the assumed mean, i.e. d = (X − A)
∑d is the sum of deviations from the assumed mean
n is the total number of observations.

When a data set has a large number of values, it is summarized as a frequency


table. The frequencies represent the number of times each value occurs.

E.g.1: The table given below shows the number of colonies of bacteria grown on ten
agar plates. Calculate the arithmetic mean by using the short-cut method.

Plate No.          1    2    3    4    5     6     7     8     9     10
No. of Colonies    60   70   80   95   100   110   115   130   140   160
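
The short-cut method can be sketched for the colony counts as follows. The assumed mean A = 100 is an arbitrary choice made here for illustration (any convenient value gives the same result), and the variable names are hypothetical.

```python
# Colony counts on the ten agar plates from the example.
colonies = [60, 70, 80, 95, 100, 110, 115, 130, 140, 160]

assumed_mean = 100  # A: assumption made for this sketch; any value works

# d = X - A for every observation.
deviations = [x - assumed_mean for x in colonies]

# Short-cut formula: mean = A + (sum of d) / n.
mean = assumed_mean + sum(deviations) / len(colonies)

print(sum(deviations), mean)
```

The result agrees with the direct method (sum of all counts divided by 10), which confirms that the choice of A does not affect the answer.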
Grouped Data:

A. M. Discrete Series Method:

In the case of a discrete series, the arithmetic mean of grouped data is calculated
by the following formula:

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{1}{N} \sum fX$$

Where,
f = Frequency
X = Value of each item
fX = Product of each X value and its corresponding frequency
∑fX = Sum of all these products
N = ∑f

E.g. 2: Find the mean from the following data:

Marks (x)              5    10   15   20   25   30   35   40
No. of Students (f)    5    7    9    10   8    6    3    2
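
For this discrete series, each mark is multiplied by its frequency and the products are summed. A minimal Python sketch of the formula, with illustrative variable names:

```python
# Marks (x) and number of students (f) from the example table.
marks = [5, 10, 15, 20, 25, 30, 35, 40]
students = [5, 7, 9, 10, 8, 6, 3, 2]

total_frequency = sum(students)                         # N = sum of f
sum_fx = sum(f * x for f, x in zip(students, marks))    # sum of f * x

mean = sum_fx / total_frequency

print(total_frequency, sum_fx, mean)
```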
A. M. Continuous Series Method:

When there is a continuous distribution with classes X0-X1, X1-X2,
X2-X3, … and corresponding frequencies f1, f2, f3, …, the arithmetic mean
is calculated by using the following formula:

$$\bar{X} = \frac{\sum fm}{\sum f} = \frac{\sum fx}{N}$$

Where,
m (or) x = mid value of each class
f = the frequency of each class
∑fm = the sum of the mid values multiplied by their frequencies
∑f (or) N = the total frequency

E.g.3: Values of fecundity (rate of reproduction) of 50 fishes of a species are
given in a frequency table. Calculate the mean value of fecundity from the grouped data
of the continuous series.

Class Interval    Frequency (f)
1-10              3
11-20             11
21-30             7
31-40             4
41-50             15
51-60             0
61-70             7
71-80             3
Assumed mean method:

$$\bar{X} = A + \frac{\sum fd}{\sum f} \times C$$

Where, $\bar{X}$ (x-bar) is the actual arithmetic mean
A is the assumed arithmetic mean
d is the step deviation of the mid values from the assumed mean, i.e. d = (X − A) / C
C is the class width
∑fd is the sum of the products of the frequencies and the step deviations
∑f is the total number of observations.
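
The fecundity example can be worked with both the mid-value formula and the assumed mean method. The sketch below is a minimal Python illustration; the assumed mean A = 35.5 (the mid value of class 31-40) is a choice made here for the example, not one stated in the text.

```python
# Class limits and frequencies from the fecundity table (E.g. 3).
class_limits = [(1, 10), (11, 20), (21, 30), (31, 40),
                (41, 50), (51, 60), (61, 70), (71, 80)]
frequencies = [3, 11, 7, 4, 15, 0, 7, 3]

# Mid value of each class: (lower limit + upper limit) / 2.
midpoints = [(low + high) / 2 for low, high in class_limits]
n = sum(frequencies)

# Mid-value formula: mean = sum(f * m) / sum(f).
direct_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Assumed mean method: mean = A + (sum(f * d) / sum(f)) * C, with d = (m - A) / C.
A = 35.5   # assumed mean (assumption for this sketch)
C = 10     # class width
step_deviations = [(m - A) / C for m in midpoints]
sum_fd = sum(f * d for f, d in zip(frequencies, step_deviations))
assumed_mean_result = A + sum_fd / n * C

print(direct_mean, assumed_mean_result)
```

Both routes give the same mean; the assumed mean method only keeps the intermediate numbers small for hand calculation.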

Merits & Demerits of Arithmetic Mean:

Merits:
1) It is rigidly defined or certain.
2) It is easy to calculate and simple to understand.
3) It is a relatively stable measure.
4) It is based on all the observation of the series.
5) It is capable of further algebraic treatment.
6) It is the best measure for comparing two or more series of data.
7) It balances the values on either side.

Demerits:
a) It is affected by extreme values.
b) It is problematic in the case of incomplete data.
c) The mean value may not figure in the series.
d) It may lead to misleading conclusions.
e) It may give absurd (unacceptable) results, such as a fractional number of persons.
f) It cannot be used for a small number of classes.
g) It cannot be used for qualitative data.

Geometric Mean (G.M.)

When the data contain a few extremely large (or) small values, and when the
values in the data are roughly in geometric progression, the geometric
mean (GM) is a suitable average. It is usually more suitable as a measure of central
tendency when the values change exponentially.

Definition:
The GM of ‘n’ observations is defined as the ‘n’th root of the product of the ‘n’ observations.

Ungrouped data:

For ungrouped observations X1, X2, ……Xn the Geometric Mean (GM) is
given by

$$GM = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n}$$

(or)

$$GM = \text{Antilog}\left[\frac{\sum \log X}{n}\right]$$

In the case of a frequency distribution, the GM is given by

$$GM = \text{Antilog}\left[\frac{\sum (f \cdot \log X)}{N}\right]$$

Where, N = ∑f

Remarks: GM cannot be used when any of the observations is zero (or) negative.

e.g. 1) The salaries of five employees are Rs. 2300, 2400, 2200, 2350 and 4600.
Find the Geometric Mean.

Solution: Here, n = 5

Salary (in Rs.)    log10(x)
2300               3.3617
2400               3.3802
2200               3.3424
2350               3.3711
4600               3.6628
Total              17.1182

GM = Antilog [∑(log x) / n]
   = Antilog (17.1182/5)
   = Antilog (3.42364)
   = 2649 + 4 (from the antilog table)
   = 2653. Ans.
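
The log-table calculation can be checked numerically. The following minimal Python sketch follows the same antilog route using the standard `math` module; it is an illustration, and because it does not round to four-figure logarithms the result can differ from the table-based answer by about one rupee.

```python
import math

# Salaries (in Rs.) of the five employees from the example.
salaries = [2300, 2400, 2200, 2350, 4600]

# Sum of log10(x), as in the table of logarithms.
log_sum = sum(math.log10(x) for x in salaries)

# GM = antilog(sum(log x) / n); the antilog of y is 10 ** y.
geometric_mean = 10 ** (log_sum / len(salaries))

print(round(log_sum, 4), round(geometric_mean, 1))
```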

Merits:
1) It is based on all observations.
2) It is biased towards smaller observations.
3) It is not affected much by fluctuations of sampling.
4) It is useful in averaging ratios, percentages and rates of increase and decrease
between two periods.

Demerits:

a) It is mathematical in character, so it is not easy to understand.
b) It becomes zero if any observation is zero, and may become imaginary if any
observation is negative.

Harmonic Mean (H.M.)

It is the reciprocal of the arithmetic mean of the reciprocals of the observations. It is used in
situations where the reciprocals of the actual values seem more useful for determining the
central tendency.

Ungrouped Data:

For ungrouped data X1, X2, …..Xn, the Harmonic Mean is given by the formula

$$H.M. = \frac{n}{\sum \frac{1}{X}}$$

Grouped Data:

For a frequency distribution it is defined as

$$H.M. = \frac{N}{\sum \frac{f}{X}}$$

Where, f = Frequency of each value
N = ∑f = Total frequency
X = Value of the variable

Example: The following table gives the weights of 31 persons in a sample enquiry.
Calculate the mean weight using G.M. and H.M.

Weight (X)            130   135   140   145   146   148   149   150   157
No. of Persons (f)    3     4     6     6     3     5     2     1     1

Steps:
1) Find the log value of X
2) Find the sum of log X
3) Find the sum of the frequency values
4) Multiply the frequency values by log X
5) Find the 1/X values
6) Find the f/X values
7) Then apply the formulas for both G.M. and H.M.
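
The steps above can be sketched in Python for the weight data. This is a minimal illustration of the grouped G.M. and H.M. formulas; the variable names are assumptions made for this example, and the arithmetic mean is included only for comparison.

```python
import math

# Weights (X) and number of persons (f) from the example table.
weights = [130, 135, 140, 145, 146, 148, 149, 150, 157]
persons = [3, 4, 6, 6, 3, 5, 2, 1, 1]

n = sum(persons)  # N = sum of f

# Steps 1-4: sum of f * log10(X), then GM = antilog(sum / N).
sum_f_log = sum(f * math.log10(x) for f, x in zip(persons, weights))
geometric_mean = 10 ** (sum_f_log / n)

# Steps 5-6: sum of f / X, then HM = N / sum(f / X).
sum_f_over_x = sum(f / x for f, x in zip(persons, weights))
harmonic_mean = n / sum_f_over_x

# Arithmetic mean for comparison: sum(f * X) / N.
arithmetic_mean = sum(f * x for f, x in zip(persons, weights)) / n

print(round(arithmetic_mean, 2), round(geometric_mean, 2), round(harmonic_mean, 2))
```

Since the weights are unequal and positive, the three averages come out in the order A.M. > G.M. > H.M.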
Merits:
1) It is based on all observations.
2) It is not affected much by fluctuations of sampling.
3) As reciprocal values are involved, it gives more weightage to smaller
observations.

Demerits:
a) It is difficult for biologists to understand and calculate.
b) Its value cannot be obtained if any one of the observations is zero.

Relation between Arithmetic Mean (A.M.), Geometric Mean (G.M.) and Harmonic Mean (H.M.)

1) When Observations are Equal.

If all the observations in a set are equal, then

$\bar{X}$ (A.M.) = G.M. = H.M.

2) When Observations are Unequal.

When the observations in a set vary in size (and are all positive), the
arithmetic mean (A.M.) is greater than the G.M. and the G.M. is greater than the H.M.

A.M. > G.M. > H.M.

Weighted Arithmetic Mean:

The weighted arithmetic mean is the sum of the products of the values and
their respective weights divided by the sum of the weights. ‘Weight’ here stands for
the relative importance of the different items. In certain circumstances all the observations
do not have equal weight.

$$\bar{X}_w = \frac{W_1 X_1 + W_2 X_2 + \dots + W_n X_n}{W_1 + W_2 + \dots + W_n} = \frac{\sum WX}{\sum W}$$

II) MEDIAN (Me):
The value of the middle observation or the mean value of two middle
observations is called median. If the values of a variable are arranged in ascending or
descending order of magnitude, the median is that value which divides the whole data
into two equal parts, one part having all values smaller than median value and other
part having all the values greater than the median value.

1) Median (Ungrouped Data):


The values of a variable or data are arranged in the order of magnitude either
in ascending or descending order. The middle-most value in this arrangement
represents the Median.

E.g. 1: To calculate the median of the following seven observations.

X:  100  97  110  200  75  120  150

Calculation of Median:

a. Calculation of Median when the Number of Observations (n) is Odd:

$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th observation}$$

For E.g. 1, arranged in ascending order the values are 75, 97, 100, 110, 120, 150, 200;
the (7+1)/2 = 4th observation, 110, is the median.

b. Calculation of Median when the Number of Observations (n) is Even:

When the number of observations is even, there is no unique middle item.
The median in such cases is located halfway between the two middle items.
Therefore, the median is taken as the arithmetic mean of the (n/2)th and (n/2 + 1)th
observations.

E.g. 2:
X:  75  97  100  120  150  175

Step 1: n/2 = 6/2 = 3, i.e. the 3rd observation (100)
Step 2: n/2 + 1 = 6/2 + 1 = 4, i.e. the 4th observation (120)
Step 3: Mean of the 3rd and 4th observations = (100 + 120)/2 = 220/2 = 110, which is the Median.
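
Both cases (odd and even n) can be handled by one small function. The following is a minimal Python sketch of the rule described above; the function name is illustrative only.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th observation (index `middle` when counting from 0).
        return ordered[middle]
    # Even n: mean of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_case = median([100, 97, 110, 200, 75, 120, 150])   # E.g. 1
even_case = median([75, 97, 100, 120, 150, 175])       # E.g. 2

print(odd_case, even_case)
```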

c) Calculation of Median for Simple Frequency Distribution:

For calculating the median of a frequency distribution of
ungrouped data, the cumulative frequency corresponding to each value of the variable is
calculated. The value of the variable corresponding to the cumulative frequency of
(N+1)/2 is the median, where N is the total frequency.
E.g.3: Calculate the median value of the variable on the basis of the
following simple frequency distribution.

Variable (X)     1   2   3    4   5   6   7
Frequency (f)    1   4   12   9   2   1   1
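
For E.g. 3 the median is the first value whose cumulative frequency reaches (N+1)/2. A minimal Python sketch of that lookup, with illustrative variable names:

```python
from itertools import accumulate

# Values of the variable and their frequencies from E.g. 3.
values = [1, 2, 3, 4, 5, 6, 7]
frequencies = [1, 4, 12, 9, 2, 1, 1]

cumulative = list(accumulate(frequencies))   # running total of f
total = cumulative[-1]                       # N
position = (total + 1) / 2                   # (N + 1) / 2-th item

# The median is the first value whose cumulative frequency covers that position.
median_value = next(x for x, cf in zip(values, cumulative) if cf >= position)

print(cumulative, position, median_value)
```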

2) Median (Grouped Data):

The median of grouped data is calculated by the following formula:

$$\text{Median} = L_1 + \frac{\frac{N}{2} - F}{f_m} \times i$$

Where,
L1 = The lower limit of the class interval in which the median falls.
N or ∑f = Total frequency
F = The cumulative frequency of the class just preceding the class interval in which the
median falls.
fm = The frequency of the class interval in which the median falls
i = The width of the class interval.

E.g.4: Let us find out the Median value of grouped data:

Age group         20-30   30-40   40-50   50-60   60-70
No. of persons    3       18      26      17      6
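
The grouped-median formula can be applied to E.g. 4 as sketched below. This is a minimal Python illustration; the function name and its arguments are hypothetical and assume equal class widths.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median = L1 + ((N/2 - F) / fm) * i for a continuous frequency distribution."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative_before = 0                      # F: cumulative frequency before the class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative_before + f >= n / 2:     # the median class has been reached
            return lower + (n / 2 - cumulative_before) / f * width
        cumulative_before += f

# Age groups 20-30, 30-40, 40-50, 50-60, 60-70 and their frequencies.
median_age = grouped_median([20, 30, 40, 50, 60], [3, 18, 26, 17, 6], width=10)

print(round(median_age, 2))
```

Here N/2 = 35 falls in the class 40-50, so L1 = 40, F = 21, fm = 26 and i = 10.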
Merits:

a. It is rigidly defined.
b. The median is easy to understand and easy to calculate.
c. The median is not affected by extreme observations.
d. The median can be computed for a distribution with open-end classes.
e. The median is the best average for qualitative data.

Demerits:
1) With an even number of observations the median cannot be determined exactly;
it is only estimated as the mean of the two middle values.
2) The median is relatively less stable than the mean.
3) The median is a positional average.
4) It is not based on all observations.
5) It cannot be subjected to algebraic treatment.

MODE ( Mo):
Mode is the most frequently occurring value in a set of data. This means that for a
given set of data, a mode may or may not exist.

E.g.1:
a) 10, 10, 9, 8, 5, 4, 12, 10 : One mode, 10
b) 10, 10, 9, 9, 12, 15, 5 : Two modes, 10 and 9
c) 4, 6, 7, 15, 12, 13, 10 : No mode

Definition of Mode and Modal Class:


‘Mode is that value in a sample or data which has the greatest or largest
frequency density in the frequency table.’ By A.U. Tuttle.

The class having the greatest frequency is called the Modal Class.

1) Calculation of MODE (Ungrouped data):

A. Calculation of mode by inspection: In this method the data are arranged in
increasing order. It is then observed how many times each value in the data is
repeated. The value which occurs most frequently represents the mode of the
data.

E.g. 2: Calculate the mode for the following data:

Variable X:  32  22  29  25  17  25  40
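
Counting how often each value occurs is easy to automate. The sketch below is a minimal Python illustration of the inspection method; the function name is hypothetical, and it follows the convention of E.g. 1 that data with no repeated value have no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)                 # how many times each value occurs
    highest = max(counts.values())
    if highest == 1:
        return []                            # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([32, 22, 29, 25, 17, 25, 40]))       # E.g. 2
print(modes([10, 10, 9, 8, 5, 4, 12, 10]))       # E.g. 1 a)
print(modes([10, 10, 9, 9, 12, 15, 5]))          # E.g. 1 b)
print(modes([4, 6, 7, 15, 12, 13, 10]))          # E.g. 1 c)
```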

2) Calculation of MODE (Grouped Data):

Determination of the mode in the case of a continuous frequency distribution is
complicated and involves the following two steps:

i) The modal class is ascertained either by the inspection method or by the grouping
method.
ii) After determining the modal class, the exact value of the mode is calculated by
using the following formula:

$$\text{Mode}(Z) = L_1 + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times c$$

Where,
L1 = Lower limit of the modal class
fm = Frequency of the modal class, i.e. the maximum frequency
f1 = Frequency of the class just preceding the modal class
f2 = Frequency of the class just succeeding the modal class
c = Width of the class interval
E.g. 4: In a class the following is the distribution of marks of 85 students. Calculate the
modal class and mode of the following data:

Marks (grouped data)    20-25   25-30   30-35   35-40   40-45   45-50   50-55   55-60
No. of Students (f)     5       7       8       18      25      12      7       5
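
The two steps (find the modal class by inspection, then apply the formula) can be sketched in Python for this table. The variable names are illustrative, and the sketch assumes a single modal class and equal class widths.

```python
# Lower class limits and frequencies from the marks table; class width is 5.
lower_limits = [20, 25, 30, 35, 40, 45, 50, 55]
frequencies = [5, 7, 8, 18, 25, 12, 7, 5]
width = 5

# Step i: the modal class is the class with the greatest frequency.
k = frequencies.index(max(frequencies))
fm = frequencies[k]
f1 = frequencies[k - 1] if k > 0 else 0                      # preceding class
f2 = frequencies[k + 1] if k < len(frequencies) - 1 else 0   # succeeding class

# Step ii: Mode = L1 + (fm - f1) / (2 * fm - f1 - f2) * c.
mode = lower_limits[k] + (fm - f1) / (2 * fm - f1 - f2) * width

print(f"modal class starts at {lower_limits[k]}, mode = {mode:.2f}")
```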

Mode is also calculated from the relationship:

Mode = Mean -3(Mean – Median)

Merits :

1) Mode is easy to calculate and understand.


2) It is not affected by extreme observations.
3) Mode can be calculated from a grouped frequency with open end classes.

Demerits:
1) Mode is not rigidly defined.
2) As compared to mean, mode is affected to a great extent by the fluctuation of
sampling.
3) It is not suitable for algebraic treatment.

Measures of Dispersion (or) Variability

Introduction:
If we describe data using only measures of central tendency such as the Mean, Median and
Mode, i.e. the average alone, our description will be incomplete.

Binomial Characteristics:
• treatment needed, not needed
• suffering, not suffering
• enlarged, not enlarged
• gained, not gained
• healthy, sick
• positive, negative
• increased, decreased
• recovered, not recovered

We take the help of normal values. Normality is a concept which depends upon the
distribution of attributes or variables in the population. Thus measures of variability
are essential for understanding the concept of normal values; with the help
of measures of variability we can give a complete picture of a set of health data.
“The mean defines the distribution in a concise manner, but does not fix the distribution.
Even with a common mean, two sets of data may vary very much in the values of their
observations. The measures of this scatter of values are called measures of dispersion or
variability”.

“Centering constants are representative values of the series. They do not express
the range of normalness. Centering constants together with measures of
variation help us to understand the data better than centering constants alone”.

Following measures of Dispersion are commonly used:

1) Range
2) Mean Deviation
3) Quartile Deviation
4) Standard Deviation
5) Coefficient of Variation

1) RANGE:
The range of a group of observations is the interval between the smallest and the largest
observations. The value of the range depends only upon the two extreme
observations in the group and does not consider the other observations.

There are four possibilities in the values of the range:

1) There may not be any variation at all.
2) There may be considerable variation from the lowest to the highest values (Table-B).
3) The variation may be too much between the extremes (Table-C).
4) None of the values may represent the mean (Table-D).

Merits:
1) Easy to calculate
2) Does not require mathematical calculation

Demerits:
1) Only the extreme values are considered
2) It is gross in expression
3) It is not ideal

2) QUARTILE DEVIATION
Quartile deviation is another distance-based measure of dispersion. In this
method, the series is divided into 4 equal parts or quarters. The quartiles are
denoted Q1, Q2 and Q3. The difference between the third quartile (Q3) and the
first quartile (Q1) is the quartile distance (inter-quartile range), and half of this
difference is the quartile deviation.

Quartile Deviation Formulas:

1) $Q_1$ = size of the $\left(\frac{N+1}{4}\right)$th item
2) $Q_2$ = size of the $\left(\frac{2(N+1)}{4}\right)$th item, i.e. the $\left(\frac{N+1}{2}\right)$th item
3) $Q_3$ = size of the $\left(\frac{3(N+1)}{4}\right)$th item

• $QD = \frac{Q_3 - Q_1}{2}$

Grouped data (Continuous series)

$$Q_1 = L_1 + \frac{\frac{N}{4} - c.f.}{f_q} \times i$$

$$Q_3 = L_1 + \frac{\frac{3N}{4} - c.f.}{f_q} \times i$$

$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Where,
L1 = Lower limit of the class interval in which Q1 or Q3 lies, respectively.
N = Total frequency.
c.f. = Cumulative frequency of the class just preceding that class interval.
The Q1 class is located at N/4 and the Q3 class at 3N/4.
fq = Frequency of the class interval in which Q1 or Q3 falls.
i = Class width or class interval.
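
As an illustration of the grouped-data formulas, the sketch below computes Q1, Q3, the quartile deviation and its coefficient. No worked example is given in this section, so the sketch reuses the age-group data from the median example (E.g. 4) as an assumption; the function name is hypothetical.

```python
def grouped_quartile(lower_limits, frequencies, width, fraction):
    """Q = L1 + ((fraction * N - c.f.) / fq) * i, with fraction 0.25 for Q1 and 0.75 for Q3."""
    n = sum(frequencies)
    target = fraction * n
    cumulative_before = 0                      # c.f. of the preceding class
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative_before + f >= target:
            return lower + (target - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("quartile class not found")

# Age groups 20-30, ..., 60-70 and frequencies, reused from the median example.
lower_limits = [20, 30, 40, 50, 60]
frequencies = [3, 18, 26, 17, 6]

q1 = grouped_quartile(lower_limits, frequencies, 10, 0.25)
q3 = grouped_quartile(lower_limits, frequencies, 10, 0.75)

quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2), round(coefficient_qd, 3))
```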

Merits:
It is better than the Range.
It is the only measure of dispersion that can be computed for distributions with open-end classes.
It is simple to calculate and understand.
Demerits:
It is not based on all observations.
It is affected by sampling fluctuations.
It is not suitable when the data exhibit great variation.

2) MEAN DEVIATION:

The mean deviation is the arithmetic average of the deviations of the
observations from the arithmetic mean, ignoring the sign of the deviations.

The mean deviation is based on all observations in the group. It is a further
description of the frequency distribution, measuring the degree of variability around
the mean.

The formula for the mean deviation for ungrouped data is

$$M.D. = \frac{\sum |x - \bar{x}|}{n}$$

Where, $|x - \bar{x}|$ indicates the difference between the value of the observation
and the arithmetic mean ignoring the sign of the difference, and n is the number of observations.

The formula for the mean deviation for grouped data is

$$M.D. = \frac{\sum f |x - \bar{x}|}{\sum f}$$

Where, ‘x’ is the midpoint of the class interval
‘f’ is the frequency
$\bar{x}$ = Mean (or) average

Class       Frequency    Mid-point of class
interval    (f)          interval (x)          f·x     |x − x̄|    f·|x − x̄|
10-20       11           15                    165     31.4       345.4
20-30       27           25                    675     21.4       577.8
30-40       36           35                    1260    11.4       410.4
40-50       38           45                    1710    1.4        53.2
50-60       43           55                    2365    8.6        369.8
60-70       28           65                    1820    18.6       520.8
70-80       16           75                    1200    28.6       457.6
80-90       1            85                    85      38.6       38.6
Total       ∑f = 200                           ∑fx = 9280    ∑|x − x̄| = 160    ∑f|x − x̄| = 2773.6

Mean ($\bar{x}$) = 9280 / 200 = 46.4

Mean Deviation (MD) = (345.4 + 577.8 + … + 38.6) / 200 = 2773.6 / 200 = 13.87

Co-efficient of Mean Deviation:

It is obtained by dividing the MD by the average:

Co-efficient of M.D. = Mean Deviation / Mean = 13.87 / 46.4 = 0.30
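
The grouped mean deviation can be recomputed directly from the mid-points and frequencies, which is a useful check on the hand-worked columns. A minimal Python sketch, with illustrative variable names:

```python
# Mid-points (x) and frequencies (f) of the classes 10-20, 20-30, ..., 80-90.
midpoints = [15, 25, 35, 45, 55, 65, 75, 85]
frequencies = [11, 27, 36, 38, 43, 28, 16, 1]

n = sum(frequencies)                                          # sum of f
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Mean deviation = sum(f * |x - mean|) / sum(f); abs() drops the sign.
mean_deviation = sum(f * abs(x - mean) for f, x in zip(frequencies, midpoints)) / n

# Coefficient of mean deviation = mean deviation / mean.
coefficient_md = mean_deviation / mean

print(n, mean, round(mean_deviation, 2), round(coefficient_md, 2))
```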

3) STANDARD DEVIATION (SD):


The standard deviation is the square root of the average of the squared
deviations of the observations from the arithmetic mean. In calculating the SD the
algebraic signs are eliminated by squaring the deviations from the arithmetic
mean instead of taking their absolute values.

The standard deviation (SD) is defined as the positive square root of the
arithmetic mean of the squares of the deviations taken from the arithmetic mean. It
is denoted by the symbol ‘σ’, a Greek letter read as ‘sigma’.

The formula for ungrouped data is

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}}$$

where $d = x - \bar{x}$.

For grouped data (frequency distribution):

$$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$$

where N = ∑f.

The SD of the population is usually denoted by ‘σ’, and that
of the sample by ‘s’.

To get an unbiased estimate of the population standard deviation from small
samples, divide by ‘n−1’ instead of n. This is denoted as ‘s’.

Thus the formula is

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}}$$

The square of Standard Deviation is called Variance, which can also be used as a
measure of Dispersion.

Step Deviation Method

$$\sigma = h \times \sqrt{\frac{\sum f d^2 - N \left(\frac{\sum f d}{N}\right)^2}{N}}$$

where N = ∑f (N − 1 is used in the denominator for a sample).

Steps:
1) Decide the assumed mean ‘a’
2) Obtain the deviation values x − a
3) Find the step deviation d = (x − a) / h, where h = class width
4) Find d²
5) Find fd
6) Find fd²
7) Apply the formula

• Coefficient of Standard Deviation

$$\text{Coefficient of S.D.} = \frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} = \frac{S.D.}{\bar{X}}$$
Merits of SD:
• It is rigidly defined.
• It summarizes in one figure the deviation of a large distribution from the mean.
• It is based on all observations.
• It does not ignore the algebraic signs of the deviations.
• It is capable of further mathematical treatment.
• It is not much affected by sampling fluctuations.
Demerits of SD:
• It is difficult to understand and calculate.
• It cannot be calculated for qualitative data.
• It is unduly affected by extreme deviations.

e.g. Haemoglobin values:
11.8, 11.6, 11.4, 12.6, 10.4, 13.3, 11.6, 12.9, 10.8, 13.2, 12.2, 14.2, 12.9, 13.5, 12.3,
13.0, 10.8, 13.8, 12.0, 12.2, 10.5, 11.2, 12.4, 11.7, 12.7, 12.2.

∑x = 317.2

∑(x − x̄)² = 25.40

No. of Obs. = 26
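
The sums quoted for the haemoglobin values can be reproduced, and the standard deviation obtained from them, with the following minimal Python sketch. It computes both the divisor-n and the divisor-(n − 1) forms; the variable names are illustrative.

```python
import math

# Haemoglobin values from the example.
hb = [11.8, 11.6, 11.4, 12.6, 10.4, 13.3, 11.6, 12.9, 10.8, 13.2, 12.2, 14.2, 12.9,
      13.5, 12.3, 13.0, 10.8, 13.8, 12.0, 12.2, 10.5, 11.2, 12.4, 11.7, 12.7, 12.2]

n = len(hb)
total = sum(hb)                                   # sum of x
mean = total / n
sum_sq_dev = sum((x - mean) ** 2 for x in hb)     # sum of (x - mean) squared

sd_population = math.sqrt(sum_sq_dev / n)         # divisor n
sd_sample = math.sqrt(sum_sq_dev / (n - 1))       # divisor n - 1

print(n, round(total, 1), round(sum_sq_dev, 2))
print(round(sd_population, 3), round(sd_sample, 3))
```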

5) VARIANCE (V):

Definition:
• The variance is the arithmetic mean of the squares of the deviations
from the mean value of the data. It is also described as the square of the
Standard Deviation (SD).

$$s^2 = V = \frac{\sum (x - \bar{x})^2}{n} \quad \text{(Ungrouped Data)}$$

$$s^2 = V = \frac{\sum f (x - \bar{x})^2}{n} \quad \text{(Grouped Data)}$$

CO-EFFICIENT OF VARIATION (C.V.):

The co-efficient of variation is the standard deviation expressed as a percentage of
the arithmetic mean.

When the variability of groups of observations depends on the average size of
the groups, and / or when the observations are in different units of measurement, this
measure of dispersion is adopted to facilitate comparison of the relative variability of
different groups (or) of different measurements.

                                              SD x 100
Co-efficient of Variation (CV)  =  -------------
                                                Mean
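
A minimal sketch of the calculation is given below. The haemoglobin figures (mean 12.2, SD about 1.008) follow from the totals in the earlier example; the weight figures are hypothetical and are included only to show how CV allows measurements in different units to be compared.

```python
def coefficient_of_variation(sd, mean):
    """Return the SD expressed as a percentage of the mean."""
    return sd * 100 / mean

# Haemoglobin example: mean 12.2, sample SD about 1.008
cv_hb = coefficient_of_variation(1.008, 12.2)

# Hypothetical weights in kg (not from the text): mean 60, SD 6
cv_weight = coefficient_of_variation(6.0, 60.0)

print(round(cv_hb, 2), round(cv_weight, 2))
```

On these figures the weights are relatively more variable than the haemoglobin values, even though the two sets are measured in different units.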

SAMPLING

Definition: “The totality (or) aggregate of all individuals with the specified
characteristic is a population (or) universe, and a group of individuals chosen
from that population is a sample.” The process of choosing it is called sampling.
Sampling, which is the selection of part of an aggregate to represent the whole,
is frequently used in everyday life in all kinds of investigations, surveys, etc. The
main purpose of sampling is to obtain information about the population.

Sample: A finite subset of statistical individuals in a population is called a sample.

Sample size: The total number of individuals in a sample is called the sample size.

Sampling is a technique of selecting a small proportion from the entire lot.
Selection is based on the probability of occurrence of events.

Needs of Sampling:

a) The sample must be well chosen.
b) The sample must be sufficiently large to minimize the sampling error.
c) There must be adequate coverage of the sample.

A sampling method is a scientific and objective procedure for selecting units
from a population. It provides a sample that is expected to be representative of the
population as a whole.

It may be necessary at times to use skilled manpower and specific equipment for
collecting information.

Uses of Sampling:
1) It is used for descriptive surveys, in which information regarding the characteristics
of the entire population is obtained.
2) It is used for analytical surveys, to get information from various sub groups.
3) It is used in industries to improve operational efficiency.
4) It is used in population census.
5) It is also used in experimental investigations.
6) It provides estimates of population parameters from sample statistics.

Parameters : Values obtained from a defined population, such as the mean (µ) and
variance (σ²).
Statistics : Values computed from the observations in a sample, such as the mean (x̄)
and variance (s²).

METHODS OF SAMPLING:

Broadly, the methods of sampling can be divided into two groups: A)
Non Random Sampling and B) Random or Probability Sampling.

A) Non Random Sampling:

In non-random sampling, the samples are drawn without following any
specific procedure or yardstick. The sample collected does not follow any specific
approach, nor can it be used to assess properly the accuracy of an estimator.
In this sampling procedure many investigator biases are likely to occur. Such
samples are termed ‘chunks’, ‘accidental’, ‘incidental’ or ‘samples of convenience’.
Non-random sampling is of three types: 1) Convenience Sampling, 2) Purposive
(or) Biased Sampling and 3) Quota Sampling.

B) Random (or) Probability Sampling:

There are four main types of probability sampling.

1) Simple Random Sampling.
2) Stratified Random Sampling.
3) Systematic Sampling.
4) Cluster Sampling.

1) Simple Random Sampling:

In this method each member (sampling unit) of the population has an
equal chance of being selected in the sample. Because the selection is made without
deliberate discrimination, it is called Simple Random Sampling. The procedure
is easy to apply when the population is small, homogeneous and readily available.
The randomness of the sample is ensured by any one of several procedures, such as:
1) Use of lottery:
Identification details of all the listed members of the population are
written down on small slips of paper of uniform size and mixed well in a container or
drum, and then the required number of slips is picked up blindfold.

2) Use of a table of random numbers:
These numbers have been prepared by using certain randomizing
machines and then arranged in rows and columns. These numbers can be used to
select single digit, double digit, three or four digit numbers. The randomness of the
sample is ensured by the proper use of this table.
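
On a computer, a pseudo-random number generator plays the role of the lottery drum or the random number table. The sketch below assumes a hypothetical list of 500 members numbered 1 to 500 and draws 20 of them without replacement; the seed is fixed only so that the same draw can be repeated.

```python
import random

# Hypothetical sampling frame: 500 listed members, identified by serial number
population = list(range(1, 501))

rng = random.Random(42)                  # fixed seed so the draw can be reproduced
sample = rng.sample(population, 20)      # 20 units, each member equally likely, no repeats

print(sorted(sample))
```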

2) Stratified Random Sampling:

A heterogeneous population is divided into a number of homogeneous sections,
subgroups or classes called strata, according to the characteristic under study, and a
random sample is drawn independently from each stratum. When the size of the sample
taken from each stratum depends on the size of that stratum in the population, the
method is called stratified sampling with proportional allocation.
e.g., To estimate the average weight of persons from a heterogeneous population
of three females and three males, stratified random sampling is used. Here males form
one stratum and females form another stratum in the population, and a sample is drawn
from each stratum so that the variability in each stratum is adequately represented in
the sample.
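
Proportional allocation can be sketched as below. The strata sizes (300 males, 200 females) and the total sample size of 50 are hypothetical, chosen only so that the allocation works out to whole numbers.

```python
import random

# Hypothetical population: stratum name -> list of unit identifiers
strata = {
    "male": [f"M{i}" for i in range(1, 301)],       # 300 units
    "female": [f"F{i}" for i in range(1, 201)],     # 200 units
}
total_sample_size = 50
population_size = sum(len(units) for units in strata.values())

rng = random.Random(7)
sample = {}
for name, units in strata.items():
    # Proportional allocation: stratum share of the sample = stratum share of the population
    k = round(total_sample_size * len(units) / population_size)
    sample[name] = rng.sample(units, k)             # independent random draw within the stratum

print({name: len(units) for name, units in sample.items()})
```

With these figures 30 males and 20 females are drawn, in the same 3:2 proportion as the population.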

3) Systematic Sampling:
This is a simple procedure, utilized when a complete list of the population
from which a sample is to be drawn is available. It is more often applied to field
studies when the population is large, scattered and heterogeneous. It is used for
samples drawn from old records, household surveys and patient clinics, where the total
size of the population is known but the particulars of the units are not. A starting
number is selected at random and a fixed interval, say 100, is added to it repeatedly,
i.e. every 100th observation is selected.

For e.g. 152, 252, 352, and so on.
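
The selection rule can be sketched as follows. The function name and the list of 1000 records are hypothetical; the first print simply continues the series in the example above, assuming a random start of 52 and an interval of 100.

```python
import random

def systematic_sample(population_size, interval, rng=random):
    """Pick a random start within the first interval, then take every interval-th unit."""
    start = rng.randint(1, interval)
    return list(range(start, population_size + 1, interval))

# The series in the text: a start of 52 with an interval of 100
print(list(range(52, 1001, 100)))

# A random start on a hypothetical list of 1000 records
print(systematic_sample(1000, 100, random.Random(1)))
```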

4) Cluster Sampling:
The units of the population are natural groups of elements. These groups are
called clusters. Each cluster is made up of the same kind of elements. A simple random
sample of clusters is taken, and the units within the selected clusters are studied.
Cluster sampling provides the best results only when the elements within each cluster
are heterogeneous.

A cluster can be regarded as a small scale version of the entire population. In
case each cluster is a true representative of the population, sampling of a small number
of clusters will provide good estimates of population parameters. The clusters may
consist of units such as villages, wards, blocks, factories, slums of a town, children of
a school, hospital wards etc.

E.g.: Cluster sampling to assess vaccination results.
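
A minimal sketch of the idea, assuming a hypothetical area of 30 villages with 10 children each: five villages (clusters) are chosen at random and every child in the chosen villages is surveyed.

```python
import random

# Hypothetical clusters: village name -> list of children living in that village
clusters = {f"village_{v}": [f"v{v}_child_{c}" for c in range(1, 11)] for v in range(1, 31)}

rng = random.Random(3)
chosen = rng.sample(sorted(clusters), 5)        # random sample of clusters, not of individuals
surveyed = [child for name in chosen for child in clusters[name]]   # all units in chosen clusters

print(chosen)
print(len(surveyed))
```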


PROBABILITY THEORY

The probability or chance that an event will occur can be defined as the
proportion of times in which that event occurs in a very large number of trials. It may
be described as the ‘law of chance’. Probability is a numerical assessment of the
likelihood of an outcome or of the number of occurrences of a random variable.
Probability is a population parameter.

Definition:
It is defined as “the ratio of the number of times a particular event occurs to
the total number of trials during which the event could have happened”.

Basic Laws of Probability:

The basic laws of probability include the following:

1. If the probability of occurrence of an event is 1, the event will certainly occur.
2. If the probability of occurrence of an event is 0, the event will never occur.
3. The probability of any event must assume a value between 0 and 1.
4. The sum of the probabilities of all the simple events in a sample space must
be equal to 1. It can also be said that the probability of the sample
space in any experiment is always one.

Important Terms and Concepts.

Terms: Sample Space and Sample Points:

The set of all possible outcomes of a random experiment is called the sample space.
The number of its components is denoted by n(S). The components of the sample space
are known as Sample Points.

Concepts:
a) Trial and event.
b) Exhaustive event.
c) Favourable events.
d) Mutually exclusive events.
e) Equally likely events.
f) Independent events.
g) Random experiment & sample space.
h) Mathematical or classical probability.

a) Trial & Events:

An experiment is called a Trial and the possible outcomes of a trial are called
Events. (e.g.: Tossing of a coin, and throwing of a die.)

b) Exhaustive Event:
The total number of possible outcomes in any trial is known as the exhaustive
events. (e.g.: In throwing two dice together, the exhaustive events are 36.)

c) Favourable Events:
The number of cases favourable to an event in a trial is the number of
outcomes which entail the happening of the event. (e.g.: In tossing a coin, the number
of cases favourable to getting a head is 1 and to getting a tail is 1.)
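
The two-dice example can be checked by listing the sample space. In the sketch below the event "the two dice total 7" is an illustrative choice, not taken from the text; the probability is the number of favourable cases divided by the number of exhaustive cases.

```python
from fractions import Fraction
from itertools import product

# Sample space for throwing two dice together: all ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))
n_s = len(sample_space)                                   # exhaustive events, n(S)

# Favourable cases for the illustrative event "the two dice total 7"
favourable = [pair for pair in sample_space if sum(pair) == 7]

probability = Fraction(len(favourable), n_s)              # favourable / exhaustive
print(n_s, len(favourable), probability)
```

This gives 36 exhaustive cases, 6 favourable cases and a probability of 1/6.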

d) Mutually exclusive events:

Events are said to be mutually exclusive or incompatible if the happening of
any one of them precludes the happening of all the others. (e.g.: In drawing a single
card from a pack, every card from the ace to the lowest card of each suit, i.e. spade,
diamond, heart or club, is unique. Once you get the queen of diamonds
you cannot get anything else.)

e) Equally likely events:

Outcomes of a trial are said to be equally likely if, taking into consideration all
the relevant evidence, there is no reason to expect one in preference to another. (E.g.
when we toss a coin, getting a head or a tail are equally likely events.)

f) Independent events:
Events are said to be independent if the happening (or non-happening) of an
event is not affected by supplementary knowledge concerning the occurrence of
any number of the remaining events. (e.g. In a five test cricket series between two
countries, the win or loss in the 5th test is independent of the results of the earlier
four test matches.)

g) Random experiment & Sample Space:

Each performance of a random experiment is called a trial; that is, all the trials
conducted under the same set of conditions form a random experiment. The result of
a trial is called an outcome, an elementary event or a sample point. The totality of
all possible outcomes is called the sample space.


HEALTH CARE STATISTICS / HOSPITAL STATISTICS

Introduction:
Health: A state of complete mental, physical and social well-being.

Hospital: An institution for giving treatment to the sick and injured.

Term “Hospital”:
In A.D. 390, St. Jerome was the first to mention the word ‘Hospital’, which is
derived from the Latin ‘Hospitalis’, formed from ‘Hospes’, meaning host or guest.

Hospital Objective: It is to help people attain and maintain health.

Health Statistics: Health statistics deals with the application of statistical methods to
varied information of public health importance.
Definition: “Hospital / Health care Statistics is the branch of statistics applied to the
collection, organization, presentation, analysis and interpretation of hospital data.”
It includes all statistical information required for the administration of a health
agency, and so comprises not only vital statistics but also a good deal of other
numerical information.

The WHO Expert Committee on Health Statistics, in its 13th report, identified the
primary needs for Hospital Statistics as follows:
1) To establish administrative control over functional activities.
2) To provide reports to the governing board, outside agencies etc.
3) To provide a basis for preparing operating budgets.
4) To provide a basis for the distribution of expenses when computing the cost of
operation.
5) To provide a basis for the calculation of average income and cost per unit
of service rendered.
6) To provide the basic statistics needed for realistic planning for the future.
7) To assess the utilization of hospital facilities.
8) To provide data for health intelligence to public health authorities.

Hospital management requires statistical data which will provide quantitative
information concerning the scope of activities within the hospital. It helps the
administration to ascertain the increase or decrease in the volume of work and
the effective utilization of men and material.

VARIOUS SOURCES OF HEALTH INFORMATION


1) Population Census,
2) Vital Registration
3) Hospital Registration
4) Disease Notification
5) Sample Registration
6) Health Surveys.

Merits:
By using these sources of health information,
i) Further analysis can be carried out.
ii) Overall statistics can be prepared for a population.
iii) The number of persons suffering from a particular disease can be found.

Demerit:
i) The information may not always be accurate and correct.

Uses of Health Information:

1) Medical and health institutions providing service to the community, such as
hospitals, clinics, dispensaries and maternal and child health care centres, all of which
provide continuous treatment to the community.
2) Surveys or investigations conducted in response to the need for more
detailed information on a narrower geographical basis, such as nutrition surveys and
epidemiological investigations of various diseases, and field trials to test the
therapeutic or prophylactic value of drugs.

3) Special sickness surveys covering the whole population or a sample of it.

4) Statistics on illness collected in connection with either life or sickness insurance.

5) National or local registers on cancer and TB, and Wassermann files.

6) Statistics of causes of death in relation to knowledge of the duration of illness,
if recorded.

SOURCES OF HEALTH / HOSPITAL STATISTICS

The various sources of information can be classified into four parts.

a) Systems organized on a national scale to obtain information continuously
from each household or institution:
> Registration and notification of disease,
> Nation-wide surveys on a sampling basis,
> Surveys that have given estimates of the prevalence of certain diseases of
long duration (e.g. tuberculosis, cancer and leprosy).

b) Medical and health institutions providing service to the community:
> Hospitals, clinics and dispensaries, maternal and child health centres,
local health units, diagnostic laboratories, mass immunization or
treatment programmes, social security schemes, medical care
programmes.

c) Surveys or investigations conducted in response to the need for more
detailed information on a narrower geographic basis:
> Nutrition surveys,
> Epidemiological investigations of various diseases,
> Field trials to test the therapeutic or prophylactic value of drugs,
vaccines, or procedures.

d) Miscellaneous:
> Physicians' case records,
> Police reports on accidents,
> Meteorological data,
> Information on social, economic or occupational factors affecting
health.

The Important Sources of Health / Hospital Statistics:

1. Notifiable disease records;
2. Hospital records of in-patients, together with records of attendance of out-patients
at hospitals, clinics and dispensaries;
3. Cases seen by the staff of infant welfare centres, clinics and school medical services,
by factory medical staff and by other medical staff responsible for special groups of
the population;
4. Medical practitioners’ records based on domiciliary visits and on patients attending
their surgery;
5. Records collected during domiciliary visits paid by other health and welfare staff;
6. Special sickness surveys covering the whole population or samples of it;
7. Statistics on illness collected in connection with life or sickness insurance;
8. Medical care statistics, i.e., statistics collected under community medical care
programmes;
9. National or local registers on cancer and tuberculosis, and Wassermann files;
10. Statistics of causes of death in relation to knowledge of the duration of illness, if
recorded;
11. Records from industrial sickness-benefit associations;
12. Records from recruitment to the armed forces, or jail records;
13. Mass diagnostic and screening surveys, e.g. for tuberculosis and venereal diseases;
14. Absenteeism and sickness records in educational institutions, civil service
examinations and industrial concerns.

Sources of Hospital Information:

1) Out-Patient
2) In-Patient
3) EMS (Emergency Medical Service)
4) Admissions
5) Discharges
6) Patient day
7) Patient Identification Data
8) In-patient Census
9) Health Facility Death
10) Length of Stay
11) Hospital bed
12) Bed Strength
13) Discharge Analysis
14) Daily , Monthly and Annual reports
15) Transfers of patients outside and inside the hospital
16) Turnover Intervals
17) Live Birth
18) Foetal Death (or) Still Birth
19) Total investigations
20) Operations
21) Special Investigation
22) Medico-Legal cases
23) Autopsies

Uses of Hospital Information

1) Calculation of Mortality Rates
2) Calculation of Infection Rates
3) Calculation of Morbidity Rates
4) Calculation of Utilization of Health Facility
5) Calculation of Average Length of Stay
6) Calculation of Birth Rates
7) Calculation of other kinds of Hospital Statistics

IMPORTANT TERMS

1) Out-Patient: A patient who visits a hospital, stays only a few hours and
is not accommodated overnight is considered an out-patient.

2) In-Patient: A patient occupying a bed in a hospital for the purpose of receiving
medical and paramedical treatment, i.e. an admitted patient. The services received by
one in-patient in one 24-hour period form a unit of measure; the 24-hour
period is the time between the census-taking hours on two successive days.

3) Hospital Bed: A hospital bed is one regularly maintained in a hospital for the use of
patients.

4) Admission: The formal acceptance by a hospital of a patient who is to be provided
with room, board and continuous nursing service in an area of the hospital where
patients generally stay at least overnight.

5) Discharge: The termination of a period of in-patient hospitalization through the
formal release of the in-patient by the hospital. In-patient discharge includes the end
of a hospitalization by order of the physician, against medical advice (AMA) or by death.
Unless otherwise specified, discharges include deaths.
6) Medical Record: A Medical Record is information about a patient regarding the
treatment aspects from admission up to the discharge of the patient. A good medical
record and health facility statistical system can contribute effectively towards improved
medical care. A medical record is a cumulative account of the history of a patient,
the treatment given, the final diagnosis and the continuing care following separation.

7) Length of Stay: The number of days an in-patient has stayed in the hospital. It is
computed by subtracting the admission date from the discharge date. (The
day of admission is counted but the day of discharge is not.)

8) Total Length of Stay: It is the sum of the days of stay of any group of in-patients
discharged during a specific period of time. It is necessary in computing the average
length of stay.
9) Average Length of Stay: It is the average duration of hospitalization of the in-patients
discharged during the time under consideration. The average length of stay for
new-borns is reported separately.
              Total Length of Stay (Discharge days)
ALS  =  --------------------------------------------
                      Total Discharges

10) Daily Discharge Analysis: The daily tabulation of data concerning patients
discharged from the hospital is called discharge service analysis or analysis of
hospital service.

11) Monthly Analysis: It provides a clear monthly report of the professional care
rendered to patients in the hospital. This report provides comparative figures, which are
of value to the medical staff in evaluating its own performance, and to the governing
board and the hospital administrator as a picture of the professional performance of the
hospital and medical staff.

12) Annual Report: It is a compilation of the 12 monthly reports. It is easily prepared
when the figures are cumulated each month, just as the monthly report is cumulated daily.

13) In-patient day (or) In-patient service day (or) Census day (or) Bed-Occupancy
day: A unit of measure denoting the services received by one in-patient in one 24-hour
period.

14) Total In-patient service days: The sum of all in-patient service days for each of the
days in the period under consideration.

15) Average daily In-patient Census (Average Daily Census, Average Census,
Average Daily No. of In-patients): The average number of in-patients present each day
for a given period of time.

16) In-patient Census: The patients remaining in the hospital at the census-taking time
on a specific day, plus the admissions during the following day, minus the discharges
(including deaths) during that day, equal the patients remaining at the next
census-taking time. This figure is called the in-patient census for the day.

17) In-patient service day: It measures the services received by one in-patient in one
24-hour period. The 24-hour period is the time between the census-taking hours on 2
successive days.

18) In-patient Bed Occupancy Ratio (Percentage of Occupancy, Occupancy Ratio):
It is the proportion of in-patient beds occupied. It is defined as the ratio of in-patient
service days to in-patient bed count days in the period under consideration. It is
generally expressed as a percentage.

19) Bed count (Bed complement): The number of beds available, both occupied and
vacant.
The bed complement of a hospital is the total number of hospital beds normally
available for use by in-patients.

20) New Born Bassinet count: The number of bassinets available in the hospital for
new-borns (NB), both occupied and vacant, on a given day.

21) Bed count day: A unit of measure denoting the presence of one in-patient bed,
either occupied or vacant, set up and staffed for use in one 24-hour period.

22) Bed turnover Interval: The period for which a bed remains empty between two
successive occupants or admissions.

23) Live Birth: It is the complete expulsion or extraction from its mother of a product of
conception, irrespective of the duration of the pregnancy, which after such separation
breathes or shows any other evidence of life, such as beating of the heart, pulsation of
the umbilical cord, (or) definite movement of voluntary muscles, whether or not the
umbilical cord has been cut or the placenta is attached. Each product of such a birth is
considered a Live Birth.

24) Still Birth: A death prior to the complete expulsion or extraction from its mother of a
product of conception, irrespective of the duration of the pregnancy. The death is
indicated by the fact that after such separation the foetus does not breathe or show any
other evidence of life, such as beating of the heart, pulsation of the umbilical cord or
definite movement of voluntary muscles.

25) Cause of Death: The causes of death to be entered on the Medical Certificate of
Cause of Death are all those diseases, morbid conditions or injuries which either
resulted in or contributed to death, and the circumstances of the accident or violence
which produced any such injuries.

26) Hospital New Born (alive at Birth): This category includes only infants born in
the hospital. Infants who are born at home or on the way to hospital are not
hospital new-born in-patients.

27) Hospital Death (or) Health Facility Death: A death occurring after lodging a patient
in an in-patient bed. A detailed record should be maintained for deaths occurring within
or beyond 48 hours.
If a patient dies earlier than 48 hours after admission, the length of stay should also
be indicated in hours (for calculating the net death rate).
Deaths occurring before lodging (deaths in the emergency room, in the ambulance or
on the lift) are not classified as hospital deaths or health facility deaths.

28) Communicable disease: One whose causative agents may pass or be carried from
one person to another directly or indirectly.

29) Out-Patient Record: Out-patient records are compiled in the out-patient department
and should conform in size and form to the records used for hospitalized patients.
Each patient seen in the out-patient clinics should have an initial history and physical
examination; on subsequent clinic visits, notations as to findings and treatment should
be made in the progress notes.
30) Discharge Summary (or) Case Summary: A summary written (or) dictated by a
doctor about the case, setting out the essential facts from the records (symptoms,
previous history, diagnosis, laboratory and X-ray findings, treatment given, operations
performed, information given to the patient and further treatment arranged) in the
prescribed form is called a Discharge Summary.

31) Medico-legal case: Pertaining to a matter that involves both medicine and law.

32) Underlying cause of Death: The underlying cause of death is defined as (a) the
disease or injury which initiated the train of morbid events leading directly to death or
(b) the circumstances of the accident or violence which produced the fatal injury.
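
Two of the terms defined above, length of stay (item 7) and in-patient census (item 16), are simple calculations and can be sketched as below. The function names, dates and ward figures are hypothetical, and the rule that a same-day admission and discharge counts as one day is an assumption that the text does not state.

```python
from datetime import date

def length_of_stay(admitted, discharged):
    """Discharge date minus admission date: admission day counted, discharge day not."""
    days = (discharged - admitted).days
    # Assumption (not stated in the text): admission and discharge on the same
    # date is counted as one day.
    return max(days, 1)

def inpatient_census(previous_census, admissions, discharges_including_deaths):
    """Patients remaining at the next census-taking time."""
    return previous_census + admissions - discharges_including_deaths

# Hypothetical patient admitted on 1 March and discharged on 6 March
print(length_of_stay(date(2024, 3, 1), date(2024, 3, 6)))     # 5

# Hypothetical ward: 180 patients at the last census, 25 admissions, 22 discharges
print(inpatient_census(180, 25, 22))                          # 183
```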

Vital Statistics

Introduction:

Definition of Vital Statistics: “It is the ongoing collection by government
agencies of data relating to vital events such as births, deaths, marriages, divorces and
health and disease related states and events which are deemed reportable by local
health authorities”.

These vital events comprehensively include live births, deaths, foetal deaths,
marriages, divorces, adoptions, legitimations, recognitions, annulments
and legal separations.

Vital statistics are very useful for all countries.

Uses of Vital Statistics :

1) Population estimation and forecasting.
2) Analysis of health trends.
3) Programme planning, monitoring and evaluation.
4) Operational and administrative decision making.
5) Providing indices for measuring health and disease in a country.
6) To provide tools and methods for measuring and comparing morbidity,
mortality, fertility etc. all over a country.
7) To estimate survival in a particular disease (like cancer, AIDS etc.) and
expectancy of life at birth or at any age.
Thus it will be evident that, for any country, the role of vital statistics in
promoting health and preventing disease by providing the data
required for programme planning and implementation is very important
and can never be underestimated.
Basic tools for measurement of vital statistics:

a) Rate, b) Ratio and c) Proportion.
CALCULATION OF DERIVATIVES

The data collected from the sources of health / hospital information are compiled and
used to calculate averages and percentages, Bed Occupancy rate, Death rate, Birth rate
etc.

AVERAGES & PERCENTAGES:

Out Patient:
1) Average daily No. of Out-patient attendance (ADOPA):

                    Total No. of out-patient attendances
                    in the hospital during a period
ADOPA  =  ------------------------------------------
                    No. of working days for the hospital
                    during that period

(The average can also be calculated for new daily out-patient attendances
by using the total no. of new out-patient attendances as the numerator.)

1. Average Length of Stay (ALS)

Total duration of stay of discharged patients divided by the number of
discharges during the period.

              Total Length of Stay (Discharge days)
ALS  =  --------------------------------------------------------
                     Total No. of Discharges
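
A minimal sketch of the calculation, with hypothetical monthly figures chosen for illustration only:

```python
def average_length_of_stay(total_discharge_days, total_discharges):
    """ALS = total length of stay of discharged patients / number of discharges."""
    return total_discharge_days / total_discharges

# Hypothetical month: 420 discharges (including deaths) with 2,310 discharge days in total
print(average_length_of_stay(2310, 420))     # 5.5 days
```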

2. The Average Daily In-Patient Census (ADIPC):

It is calculated by dividing the number of in-patient days during a
period by the number of days in the period.

                       Total No. of In-Patient days for the period
i) ADIPC  =  --------------------------------------------------------------
                       Total No. of days in the period

ii) ADIPC excluding NB:

         Total No. of In-Patient days for a period (excluding NB)
   =  ---------------------------------------------------------------------
         Total No. of days in the period

iii) ADNBIPC (Average Daily New-Born In-Patient Census):

         Total No. of (NB) In-Patient days for a period
   =  -----------------------------------------------------------------
         Total No. of days in the period

3. In-Patient Bed Occupancy Rate (% of Occupancy, Occupancy Ratio) BOR:

Ratio of the actual patient days to the maximum patient days (based on the
bed complement) during any given period of time.

              Total No. of In-patient Service days for a period
BOR  =  ---------------------------------------------------------  X 100
              In-patient bed count X No. of days in the period

(The denominator is the total number of in-patient bed count days for the period.)
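
The average daily census and the bed occupancy rate can be sketched together, since both start from the total in-patient service days. The 200-bed hospital and its 4,800 service days in a 30-day month are hypothetical figures.

```python
def average_daily_census(inpatient_service_days, days_in_period):
    """ADIPC = in-patient service days / days in the period."""
    return inpatient_service_days / days_in_period

def bed_occupancy_rate(inpatient_service_days, bed_count, days_in_period):
    """Percentage of the available bed count days that were occupied."""
    return inpatient_service_days * 100 / (bed_count * days_in_period)

# Hypothetical 30-day month in a 200-bed hospital with 4,800 in-patient service days
print(average_daily_census(4800, 30))        # 160.0 patients per day
print(bed_occupancy_rate(4800, 200, 30))     # 80.0 percent
```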

4. DEATH RATES (MORTALITY RATES):

1) Crude Death Rate (CDR):
To measure the decrease of population due to death, the rate commonly
used is the crude death rate. The Crude Death Rate is perhaps the most widely used
of any vital rate. This is because it is relatively easy to compute, requiring only total
deaths and total population, and it has value as an index to numerous demographic
and public health problems. It can be interpreted in terms of public health since the
rate of death is the first approximate measure of the health status of a population.

              Total No. of Deaths which occurred among the population
              of a given geographic area during the given year
CDR  =  ----------------------------------------------------------------------  X 1000
              Mid-Year Population of the given geographic area
              during the same year

(** This is around 8.7 / 1000 population in India.)
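
A minimal sketch of the calculation is given below. The deaths and population are hypothetical figures, chosen so that the result equals the rate of 8.7 per 1000 quoted above.

```python
def crude_death_rate(deaths, mid_year_population):
    """Deaths per 1000 mid-year population."""
    return deaths * 1000 / mid_year_population

# Hypothetical area: 435 deaths in the year, mid-year population 50,000
print(crude_death_rate(435, 50_000))     # 8.7 per 1000
```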

2) SPECIFIC DEATH RATES (SDR):

Death rates specific for age, sex, etc., can be calculated in a manner similar to the
crude death rate. The specific rates are given below:

a) Case Fatality Rate (CFR):

It is the ratio of the number of deaths from a disease to the number of cases of that
disease.

              Total No. of Deaths from a particular disease
CFR  =  --------------------------------------------------  X 100
              Total No. of cases of the same disease
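
For illustration, with hypothetical figures of 12 deaths among 400 cases of a disease:

```python
def case_fatality_rate(deaths_from_disease, cases_of_disease):
    """Deaths from a disease as a percentage of the cases of that disease."""
    return deaths_from_disease * 100 / cases_of_disease

# Hypothetical outbreak: 12 deaths among 400 cases
print(case_fatality_rate(12, 400))     # 3.0 percent
```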
b) Proportional Mortality Rate (PMR):

Proportional mortality is useful when we need an index of the proportion of all
deaths that are due to a particular disease or cause, or that occur in a given age
group, etc.

              Total No. of deaths from Ca. Tongue in a year
PMR  =  -----------------------------------------------------  X 100
              Total deaths from all causes in that year

              Total No. of deaths of children under five in a year
PMR  =  -----------------------------------------------------------  X 100
              Total No. of deaths at all ages in that year
c) Maternal Mortality Rate (MMR):

The risk of dying from causes associated with childbirth is measured by the
maternal mortality rate. The numbers exposed to the risk of dying from
puerperal causes are the women who have been pregnant during the period. Their
number being unknown, the number of live births is used as the conventional
base for computing comparable mortality rates.

              Total No. of female deaths due to complications of
              pregnancy or childbirth, or within 42 days of delivery
MMR  =  ----------------------------------------------------------------  X 1000
              Total No. of Live Births

(** This is 3-4 per 1000 live births in India.)

d) Foetal Death Rate (or) Still Birth Rate (FDR or SBR):

The Expert Committee on Health Statistics has recommended that the foetal death
ratio be computed on the basis of live births. As the registration of all foetal
deaths is incomplete, better figures are obtained if the calculation of the ratio is
limited to late foetal deaths (still births). This ratio relates the number of late
foetal deaths to the number of live births.

              Total No. of foetal deaths after 28 weeks gestation
              for a given period
FDR  =  -----------------------------------------------------------------  X 1000
              Total No. of Live Births + Still Births (or) Total Deliveries

(** This is around 8-9 / 1000 live births + still births.)

e) Perinatal Mortality Rate (PMR):

Many late foetal deaths and early neonatal deaths may be attributed to similar
underlying conditions, and it has been suggested that a single rate, called the
perinatal mortality rate, be used. It can be calculated by combining the deaths in both
categories.

              Total No. of late foetal deaths (above 28 weeks) +
              early postnatal (neonatal) deaths (under 1 week)
PMR  =  ---------------------------------------------------------------  X 1000
              Total No. of Live Births

(** This is around 40-46 / 1000 live births in India.)

f) Neonatal Mortality Rate (NMR):

For deaths occurring under 28 days of age, a rate called the neonatal mortality rate is
calculated.

              Total No. of deaths of children under 28 days
NMR  =  ------------------------------------------------------  X 1000
              Total No. of Live Births

(** This is around 45-50 / 1000 live births in India.)

g) Post Neonatal Mortality Rate (PNMR):

                Total No. of deaths of children (age 28 days to under 1 year)
PNMR  =  ----------------------------------------------------------------------  X 1000
                Total No. of Live Births

(** This is around 26-30 / 1000 live births in India.)

h) Infant Mortality Rate(IMR):

Mortality among infants is of special significance because it is regarded as
one of the most sensitive indexes of the health conditions of the general
population. It is a sensitive measure of health because a baby in its
extra-uterine life is suddenly exposed to a multitude of new environmental
factors, and its reactions to them are reflected in this rate.

In the calculation of this rate, the population of infants in the denominator
is therefore substituted by the number of live births.

        Total No. of Deaths below 1 year
IMR = ------------------------------------ X 1000
        Total No. of Live Births

Put simply, the infant mortality rate is approximately the sum of the
neonatal and the post-neonatal mortality rates.

IMR = NMR + PNMR

This is a universally accepted major mortality indicator, reflecting the
RCHS (Rural Child Health Service). It is around 69 / 1000 live births.
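
The relation IMR = NMR + PNMR holds because both component rates use the same denominator (live births) and their numerators together cover all deaths under one year. A minimal Python sketch, with purely hypothetical counts, shows this:

```python
# Hypothetical counts for one year (illustrative only).
live_births = 10000
neonatal_deaths = 450        # deaths under 28 days
post_neonatal_deaths = 240   # deaths from 28 days to under 1 year

nmr = neonatal_deaths / live_births * 1000
pnmr = post_neonatal_deaths / live_births * 1000
imr = (neonatal_deaths + post_neonatal_deaths) / live_births * 1000

print(round(nmr, 1), round(pnmr, 1), round(imr, 1))  # 45.0 24.0 69.0
```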

i) One to Four(1-4) Year Mortality rate(Child Death Rate):

             Total No. of Deaths of children aged 1-4 years
Child DR = ---------------------------------------------------------- X 1000
             Total children aged 1-4 years in the mid-year population

(or)

        Annual No. of Deaths of children aged 1-4 years
CDR = --------------------------------------------------- X 1000
        No. of Live Births in the year

** This is around 12-14 / 1000 children aged 1-4 years in India. It is estimated
at present at 10% of total deaths.

j) Under 5-years Mortality Rate (Child Mortality Rate):

           Total No. of Deaths of children below 5 years
U5 M.R. = ------------------------------------------------- X 1000
           Total No. of Live Births

** The under-five mortality rate in India is estimated at 76 to 90 / 1000 live
births.

k) Cause of Death Rate:

This is a requirement in the international certification of cause of death.

                 No. of Deaths from a specified cause
Cause of D.R. = ---------------------------------------- X 1,00,000
                 Mid-year Population
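
The cause of death rate differs from the rates above only in its multiplier of 1,00,000 (one hundred thousand). A short Python sketch follows; the function name and the counts are hypothetical.

```python
def cause_specific_death_rate(deaths_from_cause, mid_year_population):
    """Deaths from one specified cause per 100,000 mid-year population."""
    if mid_year_population <= 0:
        raise ValueError("mid-year population must be positive")
    return deaths_from_cause / mid_year_population * 100_000

# Hypothetical figures: 150 deaths from one cause in a population of 12 lakh.
rate = cause_specific_death_rate(150, 1_200_000)
print(round(rate, 2))  # 12.5 per 100,000 population
```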

HOSPITAL DEATH

1) Hospital Death Rate (or) Gross Death Rate (HDR or GDR):

The proportion of in-patient hospitalizations that end in death, usually
expressed as a percentage:

        Total No. of Deaths of In-Patients in a period
HDR = ---------------------------------------------------------------- X 100
        Total No. of Discharges (including deaths) in the same period

(** In Jipmer 2.93 / 100 discharged patients as per 2008 Statistics.)

2) Net Death Rate (Institution Death Rate):

The ratio of the total no. of deaths occurring in the hospital 48 hrs or
more after admission in a period to the total no. of discharges (including
deaths) minus deaths under 48 hrs in the same period.

        Total No. of Deaths 48 hrs or more after admission for a period
NDR = ------------------------------------------------------------------- X 100
        Total No. of Discharges (including deaths) - Deaths under 48 hrs
        for the period

(** In Jipmer 1.17 / 100 discharged patients as per 2008 Statistics)
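
The gross and net death rates can be compared side by side. In the sketch below, deaths under 48 hours are removed from both the numerator and the denominator of the net rate, as in the formula above. The function names and the counts are hypothetical.

```python
def gross_death_rate(total_deaths, total_discharges):
    """All in-patient deaths per 100 discharges (discharges include deaths)."""
    return total_deaths / total_discharges * 100

def net_death_rate(total_deaths, deaths_under_48h, total_discharges):
    """Deaths 48 hrs or more after admission per 100 discharges,
    with deaths under 48 hrs excluded from the denominator as well."""
    deaths_48h_or_more = total_deaths - deaths_under_48h
    return deaths_48h_or_more / (total_discharges - deaths_under_48h) * 100

# Hypothetical month: 2000 discharges, 60 deaths, 25 of them within 48 hrs.
gdr = gross_death_rate(60, 2000)
ndr = net_death_rate(60, 25, 2000)
print(round(gdr, 2), round(ndr, 2))  # 3.0 1.77
```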

3) Anesthesia Death Rate (ADR):


The ratio of deaths caused by anesthetic agents in a period to the no. of
anesthetics administered in the period. An anesthesia death is defined as a
death that takes place while the patient is under anesthesia, or which is caused
by anesthetic agents used by an anesthetist or anesthesiologist in the practice
of his profession.

        Total No. of Deaths caused by Anesthetic agents for a period
ADR = ---------------------------------------------------------------- X 100
        Total No. of Anesthetics Administered for the period

4) Hospital Maternal Death Rate (HMDR):

The ratio of maternal deaths in a period to the total no. of maternal
(obstetrical) patients discharged, including deaths.

Direct Maternal Death: Hospitals usually compute this rate by counting only
those patients whose death is a result of obstetrical complications of
pregnancy, labor or the puerperium. This is called Direct Maternal Death.

Indirect Maternal Death: The death of an obstetrical patient resulting from a
previously existing disease aggravated by the effects of pregnancy. It is called
Indirect Maternal Death.

A non-maternal death is an obstetric death resulting from accidental or
incidental causes not related to pregnancy or its management.

A woman who dies following an abortion is a maternal death, as is an
obstetrical patient who dies before delivery of a cause due to pregnancy.

         Total No. of Maternal Deaths for a period
HMDR = ----------------------------------------------------------------- X 100
         Total No. of Maternal (Obstetrical) Discharges (including deaths)
         for the period

5) Hospital Neonatal Death Rate (or) Infant New Born Mortality Rate
(HNDR or INBMR):

The ratio of deaths of infants born in the hospital to the no. of new born infant
discharges in a period. Foetal deaths are not included since they are not new
born in-patients. Infants born outside the hospital and then admitted should be
recorded as child in-patients, not as new born in-patients.

         Total No. of Deaths (below 28 days) for a period
HNDR = ------------------------------------------------------ X 100
         Total No. of New Born infant discharges for the period

6) Post Operative Infection Rate (POIR):

The ratio of all infections in clean surgical cases to the no. of operations.

         Total No. of Infections in clean surgical cases for a period
POIR = ---------------------------------------------------------------- X 100
         Total No. of surgical operations for the period

7) Gross Infection Rate (or) Hospital Infection Rate (GIR or HIR):

        Total No. of Infections recorded in the hospital
GIR = ---------------------------------------------------- X 100
        Total No. of Discharges

8) Net Infection Rate (NIR):

        Total No. of Infections attributed to the hospital
NIR = ------------------------------------------------------ X 100
        Total No. of Discharges
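
The gross and net infection rates share a denominator and differ only in which infections are counted. A small Python sketch, using hypothetical counts and a hypothetical function name:

```python
def infection_rate(infections, total_discharges):
    """Infections per 100 discharges."""
    if total_discharges <= 0:
        raise ValueError("total discharges must be positive")
    return infections / total_discharges * 100

# Hypothetical month: 30 infections recorded, 18 of them attributed to the hospital.
total_discharges = 1500
gir = infection_rate(30, total_discharges)   # gross: all recorded infections
nir = infection_rate(18, total_discharges)   # net: hospital-attributed only
print(round(gir, 2), round(nir, 2))  # 2.0 1.2
```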

9) Autopsy Rate: The ratio of Autopsies to deaths

       Total No. of Autopsies
AR = -------------------------- X 100
       Total No. of Deaths

10) Definition of Hospital Autopsy:

It means a post mortem examination performed by a hospital pathologist, or by
a physician of the medical staff to whom the responsibility has been delegated,
wherever performed, on the body of a person who has at some time been a
hospital patient.

Traditionally, only autopsies performed on in-patient deaths have been
considered when figuring a hospital’s autopsy rate.

11) Hospital Autopsy Rate (HAR):

The proportion of deaths of hospital patients following which the bodies of
the deceased persons are available for autopsy and hospital autopsies are
performed.

Foetal deaths and autopsies performed on these cases are not included when
computing the autopsy rate.

        Total No. of Hospital Autopsies
HAR = ----------------------------------------------------- X 100
        Total No. of hospital patients whose bodies are
        available for hospital autopsy

12) Net Autopsy Rate (NAR):

The ratio, during any given time period, of all in-patient autopsies to all
in-patient deaths minus un-autopsied coroners’ or medical examiners’ cases.

        Total No. of In-patient autopsies
NAR = --------------------------------------------------- X 100
        Total No. of In-patient deaths - un-autopsied
        coroners’ or medical examiners’ cases
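
The following Python sketch contrasts the plain autopsy rate with the net autopsy rate. It takes the net denominator to be in-patient deaths less the un-autopsied coroners' or medical examiners' cases, which is an interpretation of the formula above. The function names and counts are hypothetical.

```python
def autopsy_rate(autopsies, deaths):
    """Autopsies per 100 deaths."""
    return autopsies / deaths * 100

def net_autopsy_rate(inpatient_autopsies, inpatient_deaths, unautopsied_coroner_cases):
    """In-patient autopsies per 100 in-patient deaths, after removing
    un-autopsied coroners' or medical examiners' cases from the deaths
    (assumed reading of the denominator)."""
    available = inpatient_deaths - unautopsied_coroner_cases
    if available <= 0:
        raise ValueError("no deaths left in the denominator")
    return inpatient_autopsies / available * 100

# Hypothetical period: 50 in-patient deaths, 12 autopsies, 10 un-autopsied coroner cases.
ar = autopsy_rate(12, 50)
nar = net_autopsy_rate(12, 50, 10)
print(round(ar, 1), round(nar, 1))  # 24.0 30.0
```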

13) Hospital Foetal Death Rate or Still Birth Rate (HFDR or SBR):

The ratio of foetal deaths to total births in a given period. The percentage
for intermediate or late foetal deaths is requested most frequently.

         Total No. of intermediate / late foetal deaths (still births)
HFDR = ----------------------------------------------------------------- X 100
         Total No. of Live Births for the period

(** In Jipmer 4.33 / 100 Live births as per 2008 statistics)

14) Hospital Perinatal Mortality Rate (HPMR):

Perinatal death is a general term referring to both foetal deaths and deaths of
infants during the neonatal period (the first 28 days of life).

Neonatal Period:

I- From the hour of birth through 23 hrs and 59 mins.
II- From the beginning of the 24th hour of life through 6 days, 23 hrs and
59 mins.
III- From the beginning of the 7th day of life through the 27th day, 23 hrs
and 59 mins.

         Foetal Deaths + Neonatal Deaths (under 28 days)
HPMR = --------------------------------------------------- X 100
         Total No. of Live Births + Still Births
         (or) Total No. of Deliveries

15) Cesarean Section Rate (CSR):


The ratio of cesarean sections performed to deliveries. For statistical
purposes, a delivery is defined as the act of giving birth to either a living
child or a dead foetus. When a delivery results in a multiple birth, it is
counted as one delivery.

        Total No. of Cesarean Sections performed in a period
CSR = -------------------------------------------------------- X 100
        Total No. of Deliveries in the period
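
Because a multiple birth counts as a single delivery, the denominator of the cesarean section rate is the number of deliveries, not the number of babies. The Python sketch below makes the distinction explicit. The record layout and the data are hypothetical.

```python
# Each hypothetical delivery: (number of babies born, delivered by cesarean section?)
deliveries = [(1, False), (2, True), (1, True), (1, False), (3, False)]

total_deliveries = len(deliveries)            # a multiple birth is still one delivery
total_babies = sum(n for n, _ in deliveries)  # shown only for contrast
cesarean_sections = sum(1 for _, is_cs in deliveries if is_cs)

csr = cesarean_sections / total_deliveries * 100
print(total_deliveries, total_babies, round(csr, 1))  # 5 8 40.0
```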

STANDARDISED DEATH RATE (SDR) :


A crude or unadjusted death rate is simply the number of deaths divided by the
population at risk, often multiplied by some constant so that the result is not a
fraction. It is a good measure of the overall mortality in a population and is
useful for purposes such as planning the delivery of health care services.
Crude rates of two populations cannot be compared fairly, however, when their
age or sex structures differ. We can get age, sex, religion or race adjusted
(standardized) death rates by using the direct and indirect methods of
standardization. Multivariate analysis and regression techniques can also help
in standardization.
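
As a minimal sketch of the direct method, the age-specific death rates of the study population are applied to a chosen standard population, and the expected deaths are divided by the total standard population. The age groups, counts and the standard population below are all hypothetical.

```python
# Hypothetical study population: age group -> (population, deaths in the year).
town = {"0-14": (3000, 6), "15-59": (5000, 15), "60+": (2000, 80)}
# Hypothetical standard population for the same age groups.
standard = {"0-14": 4000, "15-59": 5000, "60+": 1000}

def crude_death_rate(groups):
    """Total deaths per 1000 total population."""
    population = sum(p for p, _ in groups.values())
    deaths = sum(d for _, d in groups.values())
    return deaths / population * 1000

def direct_standardised_rate(groups, standard_population):
    """Apply each age-specific rate to the standard population and
    express the expected deaths per 1000 standard population."""
    expected_deaths = sum(
        deaths / population * standard_population[age]
        for age, (population, deaths) in groups.items()
    )
    return expected_deaths / sum(standard_population.values()) * 1000

print(round(crude_death_rate(town), 1))                    # 10.1
print(round(direct_standardised_rate(town, standard), 1))  # 6.3
```

In this made-up example the standardized rate is lower than the crude rate because the standard population has a smaller share of people aged 60 and over.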

FERTILITY RATES / BIRTH RATES

FERTILITY RATE (General Fertility Rate):

The number of births in a population is largely dependent upon the proportion
of women of child-bearing age (married women). The fertility of a population is
therefore obtained if we relate live births to the total number of women in the
reproductive age period, viz. 15-44 years. The rate is known as the General
Fertility Rate (GFR) or age-limited live birth rate, calculated as follows:

        Number of live births in one year
GFR = ------------------------------------------------------------- X 1000
        Number of women in the age group 15-44 (or 49) years

The General Fertility Rate is approximately 4 times the Crude Birth Rate.
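
The factor of about 4 arises when women aged 15-44 make up roughly a quarter of the total population. The Python sketch below uses hypothetical counts chosen so that this share is exactly one quarter. The crude birth rate is taken as live births per 1000 mid-year population.

```python
# Hypothetical figures for one year (illustrative only).
live_births = 2500
women_15_44 = 25000           # women in the reproductive age group
mid_year_population = 100000  # total mid-year population

gfr = live_births / women_15_44 * 1000
cbr = live_births / mid_year_population * 1000

print(round(gfr, 1), round(cbr, 1), round(gfr / cbr, 1))  # 100.0 25.0 4.0
```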

a) Age-Specific Fertility Rates (ASFR) :

These take account of the age-sex composition of the population.

         Number of live births which occurred to mothers (or fathers)
         of a specified age group of the population in a given period
ASFR = ---------------------------------------------------------------- X 1000
         Mid-year female (or male) population of the specified
         age group in the same period
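
A short Python sketch of the age-specific fertility rate, computed separately for each age group of mothers. The age groups and counts are hypothetical.

```python
# Hypothetical data: age group -> (live births to mothers of that age,
#                                  mid-year female population of that age).
by_age = {
    "15-19": (300, 6000),
    "20-24": (900, 5000),
    "25-29": (700, 5000),
}

asfr = {
    age: round(births / women * 1000, 1)
    for age, (births, women) in by_age.items()
}
print(asfr)  # {'15-19': 50.0, '20-24': 180.0, '25-29': 140.0}
```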

b) Birth Rate (Crude Birth Rate):

The crude birth rate is an index of the relative speed at which additions are
being made to the population through child birth.

               Number of live births which occurred among the
               population in a given period
Annual CBR = -------------------------------------------------- X 1000
               Mid-year total population of the same period

                          No. of Live Births - No. of Deaths in a year
Natural Increase Rate = ------------------------------------------------ X 1000
                          Mid-year Population

c) Hospital Birth Rate (or) Gross Hospital Birth Rate:

                 Total No. of Live Births
HBR (or) GBR = ---------------------------- X 100
                 Total No. of Discharges

TURN OVER INTERVAL:

It is the average number of days a bed remains vacant, that is, the mean number
of days that a bed is not occupied between two admissions. A large turn over
interval indicates inefficient use and scope for improvement.

        Total No. of vacant bed days
TOI = --------------------------------
        Total No. of discharges

Average length of stay is the total duration of stay of discharged patients
divided by the no. of discharges during a period.
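
A minimal Python sketch of the turn over interval follows. It assumes that vacant bed days are obtained as available bed days (beds multiplied by days in the period) minus occupied bed days; that derivation, the function name and all figures are assumptions for illustration.

```python
def turn_over_interval(beds, days_in_period, occupied_bed_days, discharges):
    """Mean number of days a bed stays vacant between two admissions."""
    available_bed_days = beds * days_in_period
    vacant_bed_days = available_bed_days - occupied_bed_days  # assumed derivation
    return vacant_bed_days / discharges

# Hypothetical month: 100 beds, 30 days, 2400 occupied bed days, 300 discharges.
toi = turn_over_interval(beds=100, days_in_period=30,
                         occupied_bed_days=2400, discharges=300)
print(round(toi, 2))  # 2.0 days
```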

Analysis of Hospital Services and Discharges

Medical Records personnel (Census) check the daily census reports for the 24
hours ending at midnight, receive the in-patient records of discharged patients
from all wards, and enter all admissions, discharges and deaths, ward by ward,
into the electronic format for the preparation of in-patient statistics.

All remaining in-patients, admissions, discharges and deaths are carefully
verified and correctly entered into the computer. After the completion of every
month we get the total monthly and yearly admissions, discharges, deaths and
new borns.

Discharge case sheets are entered into the computer for analyzing the hospital
services rendered. After all discharges are entered we get the total number of
discharges, results, total hospital days, provisional and final diagnoses,
operations, consultations, etc.

After completion of this analysis every month, the following monthly and yearly
statistics are prepared.
1) Total admission
2) Total Discharges
3) Total Deaths (under 48 hours and 48 hours or more)
4) Death percentage
5) Average hospital days
6) Total postmortem(Autopsies) percentage
7) Bed Occupancy percentage
8) Turn over interval
9) Geographical distribution of Admission
10) Operation wise cases
11) Service wise patients
12) Results
13) Consultation report at the end of the month
14) Disease-wise report
15) Cause of death wise report
16) Communicable disease wise report
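
A few of the statistics listed above can be derived directly from the discharge entries. The Python sketch below is one possible way to do so. The record layout (`stay_days`, `died`, `hours_to_death`) and the data are hypothetical, not the format used by any particular hospital system.

```python
# Hypothetical discharge records for one month.
records = [
    {"stay_days": 4, "died": False, "hours_to_death": None},
    {"stay_days": 10, "died": True, "hours_to_death": 240},
    {"stay_days": 1, "died": True, "hours_to_death": 20},
    {"stay_days": 7, "died": False, "hours_to_death": None},
    {"stay_days": 3, "died": False, "hours_to_death": None},
]

def monthly_summary(records):
    """Summarise discharges, deaths and average hospital days.
    Expects at least one record; discharges include deaths."""
    discharges = len(records)
    deaths = [r for r in records if r["died"]]
    under_48 = sum(1 for r in deaths if r["hours_to_death"] < 48)
    total_days = sum(r["stay_days"] for r in records)
    return {
        "total_discharges": discharges,
        "total_deaths": len(deaths),
        "deaths_under_48h": under_48,
        "deaths_48h_or_more": len(deaths) - under_48,
        "death_percentage": round(len(deaths) / discharges * 100, 2),
        "average_hospital_days": round(total_days / discharges, 2),
    }

summary = monthly_summary(records)
print(summary["death_percentage"], summary["average_hospital_days"])  # 40.0 5.0
```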

USES AND LIMITATION OF HOSPITAL STATISTICS

Uses of HS:

1) Designing of various case sheets for registration of In-Patient / Out-Patient.

2) Compilation of data collection and Presentation of Hospital data.

3) Calculation of daily average patients (both In-patient and Out-patient) per-day.

4) Calculation of Bed Occupancy rate, length of stay.

5) Proper arrangement of the medical record case sheets for indexing, coding,
retrieval and comparison purposes.

6) Useful in hospital management, administration and planning.

7) To describe the level of patient care quality of a hospital.

Limitations of HS:

1) Hospital admissions are selective in relation to:

• Personal characteristics
• Severity of disease
• Associated conditions
• Admission policies

2) Hospital records are not designed for research; they may be:

• Incomplete
• Illegible or missing
• Variable in diagnostic quality

3) The population(s) at risk (denominator) is (are) generally not defined.
