
Module 1

Introduction to statistics

The term ‘Statistics’ seems to have been derived from the Latin word ‘status’, the
Italian word ‘statista’, the German word ‘statistik’ or the French word
‘statistique’. Each of these words means a ‘political state’. In the early years, a collection of facts
about the people in the state for administrative or political purposes was known as
statistics. For the proper administration of the state it is necessary to collect data
regarding income, expenditure, wealth, health, employment, birth, death, etc. of the
people belonging to the state. Thus the subject of statistics developed as a ‘science of
statecraft’.

Meaning of statistics

The term ‘statistics’ is used as a plural noun as well as a singular noun. In the plural sense,
statistics means numerical data. In the singular sense, statistics means the different techniques
and methods used for the collection, analysis and interpretation of numerical data.

Definition of statistics

According to Croxton and Cowden, “statistics is the collection, presentation, analysis and
interpretation of numerical data”.

According to Horace Secrist, “statistics are aggregates of facts affected to a marked extent
by multiplicity of causes, numerically expressed, enumerated or estimated according to
reasonable standards of accuracy, collected in a systematic manner for a pre-determined
purpose and placed in relation to each other”.

Empirical and quantitative analysis


● Empirical analysis is an evidence-based approach to the study and interpretation of
information.

● Quantitative analysis refers to analysis that aims to understand or predict
behaviour or events through the use of mathematical theories, measurement and
calculations.

Functions (objectives) of statistics

1. To simplify complexity
2. To present facts in a systematic manner
3. To facilitate easy comparison of data with respect to time and location
4. To help in formulating and testing hypotheses
5. To help in testing the laws of other sciences
6. To indicate trend behaviour
7. To measure uncertainty
8. To study the relationship between different phenomena
9. To formulate and implement policies in different fields

Importance of statistics
1. Simple presentation of data
2. Helps in comparison
3. Helps in decision making
4. Helps in measuring relationship
5. Helps in planning and policy formulation
Limitations of statistics
1. Statistics does not study qualitative phenomena
2. Statistics is incapable of revealing all aspects of a problem
3. Statistical laws lack perfect accuracy
4. Statistics does not deal with individual items
5. Statistics is liable to be misused
6. Data must be uniform
7. Too many methods to find a single result
8. Statistics are true only on an average

Distrust of statistics
Distrust of statistics refers to disbelief or lack of faith in statistics.
Distrust of statistics occurs due to the following reasons:
1. Incomplete knowledge of statistical methods
2. Unrealistic assumptions
3. Deliberate misuse of statistics
4. Ignoring limitations of statistics
5. Wrong application of statistical methods

To overcome the problems of distrust, the following precautions should be taken into
consideration:

1. Limitations of statistics should be kept in mind
2. Only experts should make use of statistics
3. Data should be used after a careful enquiry
4. Great care and caution should be exercised while using statistics
5. Free and frank discussion should be made while applying statistical methods

----------

Module 2 - Statistical Survey

The term survey means search for information, knowledge or truth. The
investigation is statistical when it is conducted by using statistical methods.

Stages of statistical survey

1. Planning and designing the enquiry : Proper planning of surveys is of great
importance, as the quality of survey results depends on the preparations
made before the survey is conducted. The following factors should be
considered during the planning stage:
● Purpose of the survey
● Scope of the survey
● Unit of data collection
● Sources of data
● Techniques of data collection
● The choice of a frame
● Degree of accuracy desired
● Miscellaneous considerations
2. Executing the survey : The work of executing the survey starts after a plan
for data collection has been prepared. The following are the various stages of executing
the survey:
● Setting up an administrative organisation
● Design of forms or instruments
● Selection, training and supervision of field investigators
● Control over the quality of fieldwork
● Follow-up of non-response
● Processing of data
● Preparation of the report

Collection of Data (Business Data Sources)


- Statistical data may be either primary or secondary data.
- Primary Data : data collected for the first time, original in character. They
are in the nature of raw materials from which investigators draw
conclusions.
- Secondary Data : data which have already been collected by some other
persons and have already passed through statistical analysis. They are in
the nature of finished products.

Difference between Primary and Secondary Data

1. Primary data is the original data collected by the investigator. Secondary data is
not original; the investigator makes use of data collected by other investigators or
agencies.

2. Primary data collection is time consuming and requires more energy and money.
Use of secondary data is relatively less expensive and less time consuming.

3. The suitability of primary data for the current investigation will be more. The
suitability of secondary data for the current investigation cannot be predicted; it
may or may not suit the objectives of the study.

4. Primary data is obtained as raw data. Secondary data is in the nature of finished
products.

5. Primary data can be used without much precaution because the data is collected
by the investigator himself. Secondary data should be used with greater care;
otherwise it may lead to wrong interpretations.

6. Investigations based on primary data will be more accurate, since they involve
the attention and personal interest of the investigator. The accuracy of an
investigation with secondary data will be comparatively less, since the investigator
depends on data collected by others.

7. The source of primary data may be the result of an experiment, a survey, etc.
The sources of secondary data are governmental and non-governmental
organisations, published reports, journals, books, etc.

8. The possibility of personal prejudice exists in primary data. Secondary data
carries a lesser degree of personal prejudice.

Methods of Primary Data Collection

1. Direct personal investigation


- The investigator himself personally goes to the spot of enquiry and collects
the information either through interview with the informants or through
observation.
- It is applicable when the enquiry is small and greater accuracy is required.
2. Indirect oral investigation
- In this method data is collected through indirect sources. The investigator
collects the data indirectly by interviewing third persons or witnesses
who are supposed to be in close touch with the original informants.
- This method is used when the enquiry is of a complex nature, the area is
vast, or the informants are reluctant to convey the information.
3. Mailed questionnaire method
- In this method a list of relevant questions relating to the problem under
investigation is prepared and sent to the various informants by post.
- A self-addressed envelope and a covering letter requesting them to furnish
the necessary information are also sent to the informants along with the
questionnaire.
- This is applicable where informants are literate and the area is vast.
4. Schedules sent through enumerators
- Under this method information is collected by sending schedules through
the enumerators or Interviewers.
- A schedule is a set of questions relating to the problem under study but are
asked and filled in a face to face situation between enumerators and
informants.
5. Through Local correspondence
- Under this method the investigator appoints local agents or correspondents in
different places to collect information.
- They collect and transmit the information to the central office where the data
are processed.
- This method is used by various government departments, newspaper
agencies etc.

Methods for collection of Secondary Data

1. Published sources
- Official publications of central, state and local governments, official
publications of foreign governments or international bodies, reports
submitted by university bureaus, economists, research scholars, etc.
2. Unpublished sources
- These include many enquiries of a private nature conducted by some
persons, which are not published since they are usually meant for private use.
- Examples: data relating to trade associations, chambers of commerce, etc.

Drafting the Questionnaire/ Essential Qualities of a Good Questionnaire

● It should be divided into two parts : The first part should contain the aims
and objectives of the enquiry and the reasons for issuing the questionnaire;
the second part is the main part and contains the questions.
● Questions should be clear, simple and easy to understand : Questionnaires
should use apt words at the apt places. Words with multiple meanings
should be avoided.
● The questionnaire should be brief : The number of questions should be
reduced to the minimum.
● Questions should be arranged in a logical order.
● Personal questions should be avoided : Avoid private, confidential or
personal questions which respondents will be reluctant to answer.
● Questions should be capable of objective answers : As far as possible the
questions should be answerable with yes or no.
● The questionnaire should look attractive.
● Questions requiring calculations should be avoided.
● The questionnaire should be pre-tested with a group before mailing it out : Pre-
testing helps to overcome the shortcomings of the questionnaire.
● Questions should permit cross-checking of the answers.
● The method of tabulation should be kept in mind while framing the questions.

Techniques of Data Collection


These techniques refer to the methods of selecting the units of the universe from
which the primary data are to be collected.
A) Census or Complete Enumeration Technique : The census method is a technique
of enquiry in which information is collected from all the units of the
universe.
B) Sample Technique : Sampling is a technique of inspecting or studying only a
selected, representative set of units of the population and drawing conclusions
for the entire universe on the basis of that study.

Differences between Census method and Sampling method

1. In the census method information is collected from all the units of the
population. In the sampling method information is collected from representative
units of the population.

2. All individual items are studied. Only the selected items are studied.

3. The census method is usually more time consuming. A sample survey is less
time consuming.

4. The cost involved is high. The cost involved is less.

5. A detailed study of all items is not possible. A detailed study of the selected
items is possible.

Methods or Techniques of Sampling

The various methods of sampling techniques can be grouped under two


categories:-
A) Probability Sampling or Random Sampling : This sampling is based on
the theory of probability. Each unit has a known chance of being selected. It is
also known as chance sampling.
B) Non-Probability Sampling or Non-Random Sampling : This is not based
on the theory of probability. It does not provide an equal chance of
selection to each population element.

Methods of probability or random sampling

A. Simple or unrestricted random sampling


Under this method of sampling, all the units in the population under study have
an equal and independent chance of being selected. This is the basic probability-
based sampling design. It is used for small, homogeneous populations.

B. Complex random sampling

- Probability sampling under a restricted sampling technique may result in
complex random sampling. The following are the popular types of
complex random sampling techniques:
1. Stratified random sampling : Under this method the whole universe or
group is divided into various strata or subgroups of items possessing similar
characteristics, and a sample is drawn from each stratum by using the simple
random sampling method. This method ensures greater accuracy, reduces
time and expenses, etc.
2. Cluster sampling : Under this method the whole universe is divided into
certain subgroups called clusters, and certain clusters are selected at random
to provide the sample. These clusters are known as the primary units and are
homogeneous in nature.
3. Systematic sampling or quasi-random sampling : This is the most practical
method of drawing samples, by selecting every kth item from a complete list.
As the interval between sample units is fixed, this method is also known as
the fixed interval method. It is also known as pseudo-random sampling.
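The three probability sampling methods above can be sketched with Python's standard random module. This is a minimal illustration: the 100-unit population, the two-way strata split, and the sample sizes are all invented for the example.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

population = list(range(1, 101))  # a hypothetical universe of 100 units

# Simple random sampling: every unit has an equal, independent chance.
simple_sample = random.sample(population, 10)

# Stratified random sampling: divide the universe into strata of similar
# items, then draw a simple random sample from each stratum.
strata = [population[:50], population[50:]]  # two illustrative strata
stratified_sample = [u for s in strata for u in random.sample(s, 5)]

# Systematic sampling: pick a random start, then take every k-th item,
# so the interval between sample units is fixed.
k = len(population) // 10
start = random.randrange(k)
systematic_sample = population[start::k]

print(len(simple_sample), len(stratified_sample), len(systematic_sample))
```

Note how the stratified draw guarantees five units from each stratum, while the systematic draw fixes the gap between chosen units at k.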

Methods of Non Probability Sampling

1. Convenience Sampling
- This is a non-probability sampling in which sampling units are selected according
to the convenience of the investigator.
- This method is also known as accidental sampling because the researcher
selects those respondents he meets accidentally.

2. Purposive or Judgement Sampling


- In this method the investigator himself makes the choice of sample from the
given universe according to his own skill and judgement.
- He selects those respondents whom he thinks to be representative of the
universe for the study.

3. Quota Sampling
- In this method the interviewer is instructed to interview a specified number
of persons from each quota.
- This quota is fixed in advance for collecting samples from each group
according to certain specific homogeneous characteristics.

4. Snowball Sampling
- In this sampling a set of respondents is selected initially and interviewed.
- After this, these respondents are asked to list the names of other people who, in
their opinion, form a part of the target sample.
- This technique thus creates a snowball effect, which keeps growing in size
as it rolls on.

Criteria for selecting Sampling Techniques

● Purpose of survey
● Measurability
● Degree of precision
● Information about population
● Nature of population
● Geographical area of the study and size of population
● Financial resources
● Time limitation
● Economy

Theories of sampling

Theories of sampling are the principles or laws on which the sampling
techniques of data collection are based. The following are the two important theories of
sampling:
1. Law of Statistical Regularity - This law states that if a sample of large size is
taken at random from a universe, it is likely to possess the same
characteristics as the universe.
2. Law of Inertia of Large Numbers - This law states that large aggregates are
more stable than small ones. It means that when a large number of items
are taken for study, the total change is likely to be very small.
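The Law of Statistical Regularity can be illustrated numerically: a large random sample drawn from a universe tends to reproduce the universe's mean. The universe below (10,000 normally distributed values) and the sample size are invented purely for demonstration.

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# A hypothetical universe of 10,000 values centred near 50.
universe = [random.gauss(50, 10) for _ in range(10_000)]
universe_mean = sum(universe) / len(universe)

# A large random sample tends to possess the same characteristics
# (here, the mean) as the universe it is drawn from.
sample = random.sample(universe, 2_000)
sample_mean = sum(sample) / len(sample)

gap = abs(sample_mean - universe_mean)
print(round(universe_mean, 2), round(sample_mean, 2), round(gap, 2))
```

Re-running with different seeds changes the exact figures, but the gap between the two means stays small whenever the sample is large, which is precisely what the law asserts.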
Editing - It is defined as “the process of examining the raw data to detect errors
and omissions and to correct them if possible, so as to ensure legibility,
completeness, consistency and accuracy”.
Editing refers to the techniques, procedures and methods used for checking the
collected data for omissions, errors, etc.

Coding - It is the process of organising and sorting the collected data. Coding is
done by assigning symbols, alphabetical or numerical or both, to the
collected data.

Tabulation - It is the systematic presentation of collected and classified numerical
data in columns and rows on the basis of their salient features or characteristics.
Tabulation makes the data more simple and presentable. It also helps in the final
analysis of the problem and in drawing conclusions.

Cross tabulation - It is a statistical tool used to examine the relationship between
two variables. A crosstab report shows the connection between two or more
questions asked in a survey.
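A cross tabulation can be sketched with the standard library alone. The gender/preference survey responses below are fabricated solely to show how the two questions are counted against each other.

```python
from collections import Counter

# Hypothetical survey responses: (gender, preference) answer pairs.
responses = [
    ("male", "yes"), ("male", "no"), ("female", "yes"),
    ("female", "yes"), ("male", "yes"), ("female", "no"),
]

# A crosstab counts how often each combination of the two answers occurs.
crosstab = Counter(responses)

# Print the table row by row: one row per gender, one column per answer.
rows = sorted({g for g, _ in responses})
cols = sorted({p for _, p in responses})
for g in rows:
    counts = {p: crosstab[(g, p)] for p in cols}
    print(g, counts)
```

Each cell of the resulting table is the frequency of one combination, which is exactly the connection between the two survey questions that a crosstab report displays.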

Classification - It is the process of arranging the data in groups or classes
according to resemblances and similarities, in order to make the data clear and
meaningful. Here the whole data is divided into a number of classes on the basis of
common characteristics.

Module 3 A
Univariate Data Analysis- 1
Measures of Central Tendency

The word ‘measures’ means methods or instruments, and ‘central tendency’
means the average value of a statistical series. Measures of central tendency are
the methods of finding out the central or average value of a statistical series.

Essential characteristics

1. It is a single figure expressed in some quantitative form.


2. It lies between the maximum and minimum value of a series.
3. It is a typical value that represents all the values in a series.
4. It is capable of giving a central idea about the series it represents.

Characteristics of an ideal average

1. It should be rigidly defined
An average should be rigidly defined so that there is no confusion regarding its
meaning.
2. Its definition should be in the form of a mathematical formula
With a mathematical formulation, anybody computing the average from a set of data
will arrive at the same figure; this means the average should have a definite value.
3. It should be simple to understand
The meaning and nature of an average should be such that even a layman can
easily understand it.
4. It should be simple to calculate
Calculation of the average value should be very simple and free from mathematical
intricacies.
5. It should be based on all the observations of the series
An average will truly be representative of the whole series only if it is computed
from all the observations.
6. It should not be affected much by a few extreme values
A few very small or very large observations should not unduly affect the value of
a good average.
7. It should be capable of further algebraic treatment
An average should enable an analyst to carry out further mathematical analysis on
the basis of its result.
8. It should be capable of being used in further statistical computation
An average should be one which helps in further statistical computation.
9. It should be capable of making relative study
An average should be least affected by sampling and should be capable of being
expressed in simple numerical terms.

Functions/objectives/advantages of an average

1. Representative : An average represents all the features of a group.
2. Short description : An average gives a simple and brief description of the main
features of the whole data.
3. Simplifies complexity : A very large mass of data may be condensed into a
single figure.
4. Helpful in comparison : The measures of central tendency reduce the data to a
single value, which is highly useful for making comparative studies.
5. Helpful in formulation of policies : It helps a firm to develop a business policy,
or helps a country to formulate a policy for the development of its economy.
6. Base of other statistical analysis : Other statistical devices such as mean
deviation, coefficient of variation, correlation, time series analysis and index
numbers are also based on averages.
7. Helpful in research work : Any research work requires the application of
averages and their interpretation. The average is an important quantitative
technique in research work.

Types of averages

1. Mathematical averages
* Arithmetic mean : a) simple arithmetic mean, b) weighted arithmetic mean
* Geometric mean
* Harmonic mean
2. Positional averages
* Median and partition values :
a) Quartiles b) Deciles c) Percentiles, etc.
* Mode
3. Miscellaneous averages
* Moving average
* Progressive average
* Quadratic average, etc.

Arithmetic average or mean

The arithmetic average is obtained by adding together all the items and dividing
the total by the number of items.

The arithmetic mean may be defined as “the quotient obtained by dividing the total of the
values of a variable by the total number of their observations”.
Calculation of Arithmetic Average

1. Individual series
2. Discrete series
3. Continuous series
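The three series types above can be sketched in plain Python. All the figures below are invented for illustration; for a continuous series, the usual textbook convention of taking class mid-points as the values is followed.

```python
# Individual series: mean = sum of the items / number of items.
items = [10, 20, 30, 40, 50]
individual_mean = sum(items) / len(items)

# Discrete series: mean = sum(f * x) / sum(f), where f is the frequency
# of each distinct value x.
values = [10, 20, 30]
frequencies = [2, 5, 3]
discrete_mean = sum(f * x for f, x in zip(frequencies, values)) / sum(frequencies)

# Continuous series: take class mid-points as x, then proceed as in the
# discrete case.
classes = [(0, 10), (10, 20), (20, 30)]
freqs = [3, 5, 2]
midpoints = [(lo + hi) / 2 for lo, hi in classes]  # 5, 15, 25
continuous_mean = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

print(individual_mean, discrete_mean, continuous_mean)
```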
Merits and Demerits of arithmetic average
Merits
1. It is rigidly defined and hence there is no scope for ambiguity or
misunderstanding about its meaning.
2. It is easy to understand and thus it is a popular average.
3. It is simple to calculate.
4. It is based on all the items of a series.
5. It is not very much affected by fluctuations of sampling.
6. It is capable of further algebraic treatment.
7. It is a calculated value, and not based on position in the series.
Demerits
1. It cannot be determined by inspection, nor can it be located graphically.
2. It cannot be used in the study of qualitative phenomena like intelligence,
beauty, etc.
3. It is affected by extreme values.
4. It is not suitable for averaging ratios and percentages.
5. If a single observation is missing, lost or illegible, the mean cannot be
calculated.
6. In a distribution with open-end classes the value of the mean cannot be
computed without making assumptions regarding the size of the class intervals
of the open-end classes.
Combined mean
“A combined mean is the mean of the whole series when there are two or more
component series.”
Correction in mean
From the total of the values, the incorrect values are first subtracted and then the
correct values are added. This corrected total is divided by the number of items to
get the correct value of the mean.
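Both rules translate directly into formulas. The series sizes, means and wrong/correct items below are invented for illustration.

```python
# Combined mean of two component series: (n1*m1 + n2*m2) / (n1 + n2).
n1, m1 = 40, 50.0  # size and mean of series 1 (illustrative figures)
n2, m2 = 60, 45.0  # size and mean of series 2
combined_mean = (n1 * m1 + n2 * m2) / (n1 + n2)

# Correction in mean: recover the total as n * wrong_mean, subtract the
# wrongly recorded item, add the correct one, and divide by n again.
n, wrong_mean = 10, 20.0
wrong_item, correct_item = 15, 25
corrected_mean = (n * wrong_mean - wrong_item + correct_item) / n

print(combined_mean, corrected_mean)
```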
Weighted average (weighted mean)

A weighted average may be defined as an average whose component items are
multiplied by certain values known as weights, the aggregate of the multiplied
results being divided by the total of the weights instead of the number of items.
Advantages of weighted average
1. Unequal importance to items.
2. Varying frequencies .
3. Wide change in values or frequencies
4. Comparison .
5. Calculation of average from different series .
6. Calculation of rates
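The definition above amounts to sum(w*x) / sum(w). A minimal sketch with invented marks and weights (e.g. credit hours attached to each paper), contrasted with the simple mean:

```python
# Weighted mean: each item is multiplied by its weight before totalling,
# and the total is divided by the sum of the weights.
marks = [80, 60, 90]
weights = [4, 2, 4]  # illustrative weights, e.g. credit hours per paper

weighted_mean = sum(w * x for w, x in zip(weights, marks)) / sum(weights)
simple_mean = sum(marks) / len(marks)

print(weighted_mean, simple_mean)
```

Because the low mark of 60 carries the smallest weight, the weighted mean (80.0) exceeds the simple mean, which treats every paper as equally important.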

Geometric mean
Geometric mean is defined as the nth root of the product of n items.
Steps for calculating the geometric mean
1. Find the logarithm of all the values
2. Add the logarithmic values
3. Divide the total by the number of items
4. Find the antilogarithm of the result
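Both routes, the direct nth root of the product and the four logarithm steps above, give the same result. The values below are chosen for illustration so the answer is a round number.

```python
import math

values = [2, 4, 8]  # illustrative items

# GM as the nth root of the product of the n items.
product = math.prod(values)
gm_direct = product ** (1 / len(values))

# The same result via the logarithm steps: mean of the logs, then the
# antilogarithm of that mean.
log_sum = sum(math.log10(x) for x in values)
gm_logs = 10 ** (log_sum / len(values))

print(round(gm_direct, 6), round(gm_logs, 6))
```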
Merits and demerits of geometric mean
Merits
1. Well defined.
2. Based on all the items.
3. Capable of further algebraic treatment.
4. Not much affected by fluctuations of sampling.
5. Gives weight according to the size of items.
6. Not affected by extreme values.
Demerits
1. It is difficult for a layman to understand.
2. It cannot be calculated if the number of negative values is odd.
3. It cannot be calculated if any value is zero.
4. At times it gives a value which may not be found in the series.
Harmonic mean
Harmonic mean is a mathematical average. It is defined as “the reciprocal
of the Arithmetic average of the reciprocals of the values of a variable”
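The definition translates directly to HM = n / sum(1/x). A minimal sketch with illustrative values:

```python
# HM: the reciprocal of the arithmetic average of the reciprocals, i.e.
# n divided by the sum of 1/x over the series.
values = [1, 2, 4]  # illustrative items

n = len(values)
harmonic_mean = n / sum(1 / x for x in values)

print(harmonic_mean)
```

Here the sum of reciprocals is 1 + 0.5 + 0.25 = 1.75, so HM = 3 / 1.75 ≈ 1.714, which is below both the arithmetic and geometric means of the same items, as expected.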
Merits and demerits of harmonic mean
Merits
1. Well defined
2. Based on all the items
3. Capable of further algebraic treatment
4. Not much affected by fluctuations of sampling
5. Measures relative changes
6. Can be calculated even when a series contains negative values

Demerits
1. It is not easy for a layman to understand
2. It is only a summary figure and may not be an actual item in the series
3. It is very difficult to calculate
4. It is not truly representative of the statistical series
5. Its algebraic treatment is very much limited
Module 3B
Uni-variate data analysis- 1
Positional averages and Partition values

A positional average is an average whose value is worked out on the basis of its
position in the series.

Median
The median is a positional average. It is the middle-most item of a series
when the values are arranged according to their magnitudes.

The median may be defined as “that value of the variable which divides the group into
two equal parts, one part comprising all values greater than, and the other all
values less than, the median”.
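The middle-item rule can be sketched for an ungrouped (individual) series. The two small series below are invented to show both the odd and the even case; for an even count, the median is taken as the mean of the two middle items, as noted under the demerits further on.

```python
# Median of an ungrouped series: sort by magnitude, then take the
# middle item (odd count) or the mean of the two middle items (even count).
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

odd_series = [7, 1, 5, 3, 9]   # 5 items: exact middle exists
even_series = [7, 1, 5, 3]     # 4 items: average the two middle items

print(median(odd_series), median(even_series))
```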

Merits of median
1. Rigidly defined
2. Easy to understand
3. Positional average
4. Good for open-end classes
5. Can be located by inspection
6. Can deal with qualitative data
7. Central location

Demerits of median
1. In the case of an even number of observations in ungrouped data, the median
cannot be determined exactly; it is taken as the arithmetic average of the two
middle items.
2. The median is not suitable for further mathematical treatment.
3. The median is relatively less stable than the mean, particularly for small samples.
4. The median, being a positional average, is not based on each and every item of
the distribution.
5. For calculating the median it is necessary to arrange the data in ascending or
descending order; other averages do not need any arrangement.
6. The value of the median is affected more by sampling fluctuations than the value
of the arithmetic mean.
7. At times it produces a value which is never found in the series.

Partition values
The values which break the series into a number of equal parts are called
partition values.
The median, quartiles, deciles, percentiles, etc. are the important partition values.

Quartiles
The values which divide the series into four equal parts are known as
quartiles. There are three quartiles, namely Q1, Q2 and Q3.

Deciles
The values which divide the series into 10 equal parts are known as deciles. There
are 9 deciles: D1, D2, D3, ..., D9.

Percentiles
The values which divide the series into 100 equal parts are known as
percentiles. There are 99 percentiles: P1, P2, P3, ..., P99.
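For an ordered individual series, the partition values can be sketched with the common textbook positioning rule, Qi at i(n+1)/4, Dj at j(n+1)/10 and Pk at k(n+1)/100, interpolating when the position is fractional. The rule and the data are illustrative; other conventions for locating quantiles exist.

```python
# Value at a (possibly fractional) 1-based position in the sorted series,
# interpolating linearly between adjacent items when needed.
def partition_value(values, position):
    s = sorted(values)
    lower = int(position) - 1          # index of the item just below
    frac = position - int(position)
    if frac == 0:
        return s[lower]
    return s[lower] + frac * (s[lower + 1] - s[lower])

data = list(range(10, 120, 10))        # 11 items: 10, 20, ..., 110
n = len(data)

q1 = partition_value(data, 1 * (n + 1) / 4)      # first quartile
q2 = partition_value(data, 2 * (n + 1) / 4)      # second quartile = median
d4 = partition_value(data, 4 * (n + 1) / 10)     # fourth decile
p90 = partition_value(data, 90 * (n + 1) / 100)  # ninetieth percentile

print(q1, q2, d4, p90)
```

Note that Q2, D5 and P50 all coincide with the median, since each splits the series into two equal halves.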
Module 3 C
Uni-variate data analysis - 1: Mode

Meaning and definition

The mode is the value around which the items are most heavily concentrated.
It is the most common item of the series, i.e. the item having the largest
frequency.

According to A. M. Tuttle, “mode is the value which has the greatest frequency
density in its immediate neighbourhood”.
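For an ungrouped series, the mode is simply the most frequent item, which a frequency count makes explicit. The series below is invented for illustration.

```python
from collections import Counter

# Mode of an ungrouped series: the item with the largest frequency.
series = [3, 7, 5, 7, 9, 7, 5]

counts = Counter(series)
mode_value, mode_freq = counts.most_common(1)[0]

print(mode_value, mode_freq)
```

Here 7 occurs three times, more than any other item, so the mode is 7. If two items tied for the largest frequency, the series would be bimodal and this simple count would return only one of them.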

Characteristics of mode

1. Most typical value


2. Not affected by the extreme values
3. Positional average

Merits
1. It gives the most representative value of a series.
2. It is not affected by the extreme values of a series.
3. It can be determined graphically.
4. It is considered a reliable average for studying the skewness of a distribution.
5. It is commonly understood and easily calculated.
6. Open-end classes do not pose any problem in the location of the mode.
7. For the calculation of the mode it is not necessary to know the value of all the
items of a series.
8. It is very useful in the field of business and commerce.

Demerits

1. It is not rigidly defined, and so in some cases different methods may come out
with different results.
2. It is not based on all the observations of a series but on the concentration of
frequencies of the items.
3. It is not capable of further algebraic treatment.
4. It is ill-defined, indeterminate and indefinite.
5. As compared with the mean, the mode is affected to a greater extent by sampling
fluctuations.
6. It cannot be determined from a series with unequal class intervals unless they are
equalised.
7. In many cases it may be impossible to get a definite value of the mode.

Module 4 B
Univariate data analysis - 2

Skewness, Moments and Kurtosis

Skewness
Skewness means asymmetry or lack of symmetry.

According to Morris Hamburg, “skewness refers to the asymmetry or lack of
symmetry in the shape of a frequency distribution”.

Characteristics
1. It refers to lack of symmetry.
2. It refers to the difference in the values of the mean, median and mode.
3. It refers to the difference in distance between the quartiles and the median.
4. It may be positive or negative.

Measures of skewness
1. Karl Pearson’s coefficient of skewness
2. Bowley’s coefficient of skewness
3. Kelly’s coefficient of skewness
4. Measures of skewness based on moments and kurtosis
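The first of these measures is easy to sketch: Karl Pearson's second coefficient is Sk = 3(mean - median) / standard deviation, positive when the tail stretches to the right. The data below are invented, with one large value (10) to produce a visible positive skew.

```python
import statistics

# Karl Pearson's coefficient: Sk = 3 * (mean - median) / standard deviation.
# An illustrative positively skewed series (the 10 drags the mean upward).
data = [1, 2, 2, 3, 3, 3, 4, 10]

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.pstdev(data)  # population standard deviation

sk = 3 * (mean - median) / sd
print(round(sk, 3))
```

Since the mean (3.5) exceeds the median (3), Sk comes out positive, confirming the right-hand tail; in a symmetrical distribution mean and median coincide and Sk is zero.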

Moments
Moments are defined as “the arithmetic average of a certain power of the
deviations of the items from their arithmetic mean”.

Uses of moments

1. Moments are used to describe the central tendency of a series.
2. They are used to describe the dispersion of the items of a series.
3. They help to describe the skewness of a series.
4. They help to describe the kurtosis of a series.
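The definition above gives the r-th central moment directly: mu_r is the mean of (x - mean)^r over the series. A sketch on an invented series, computing the first four moments:

```python
# Central moments: mu_r = mean of (x - mean)^r over the series.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

def central_moment(r):
    return sum((x - mean) ** r for x in data) / n

mu1 = central_moment(1)  # always 0 about the mean
mu2 = central_moment(2)  # the variance (dispersion)
mu3 = central_moment(3)  # used for skewness
mu4 = central_moment(4)  # used for kurtosis

print(mu1, mu2, mu3, mu4)
```

The four uses listed above map onto the four moments: mu1 vanishes about the mean, mu2 measures dispersion, mu3 feeds the moment measure of skewness, and mu4 feeds kurtosis.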
Kurtosis
The word kurtosis is derived from a Greek word meaning ‘bulginess’.
Kurtosis means the degree or extent of peakedness of a distribution compared to
a normal distribution.
According to Spiegel, “kurtosis is the degree of peakedness of a distribution,
usually taken relative to a normal distribution”.
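The standard moment measure of kurtosis is beta2 = mu4 / mu2^2, which equals 3 for a normal distribution; values below 3 indicate a flatter (platykurtic) curve and values above 3 a more peaked (leptokurtic) one. A sketch on the same illustrative series used for the moments:

```python
# Kurtosis via moments: beta2 = mu4 / mu2**2 (= 3 for a normal curve).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
mu2 = sum((x - mean) ** 2 for x in data) / n
mu4 = sum((x - mean) ** 4 for x in data) / n

beta2 = mu4 / mu2 ** 2
# beta2 < 3 here, so this series is flatter than the normal (platykurtic).
print(beta2)
```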

Module-5
Interpolation and Extrapolation

Interpolation
Interpolation is a statistical technique used for arriving at a missing value from the
normal values pertaining to a given phenomenon.

Interpolation is defined as “estimation of an unknown value between two known
values, or drawing conclusions about missing information from the available
information”.

Extrapolation
Extrapolation is a statistical technique used for arriving at an unknown projected
value from the known values pertaining to a given phenomenon.

Extrapolation is defined as “estimation or projection of an unknown value from the
given values”.

Importance/objectives/utility

1. To find missing values.
2. To estimate positional averages.
3. To project future values.

Methods of interpolation
1. Graphic method
2. Algebraic method
(a) Newton’s formula of advancing differences
(b) Binomial expansion method
(c) Lagrange's method
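Method (a), Newton's formula of advancing (forward) differences, can be sketched for equally spaced known values: build the difference table, then evaluate y0 + u*d1 + u(u-1)/2!*d2 + ... with u = (x - x0)/h. The x and y figures below are invented; the same formula interpolates a missing value (x between the known points) or extrapolates a projection (x beyond them).

```python
# Newton's forward (advancing) difference interpolation for equally
# spaced x values.
xs = [10, 20, 30, 40, 50]
ys = [12, 18, 27, 40, 58]  # illustrative known values of the phenomenon

def newton_forward(xs, ys, x):
    h = xs[1] - xs[0]          # the fixed interval between x values
    u = (x - xs[0]) / h
    # Build the forward-difference table column by column.
    diffs = [ys[:]]
    while len(diffs[-1]) > 1:
        prev = diffs[-1]
        diffs.append([b - a for a, b in zip(prev, prev[1:])])
    # y(x) = y0 + u*d1 + u(u-1)/2! * d2 + u(u-1)(u-2)/3! * d3 + ...
    estimate, term = ys[0], 1.0
    for r in range(1, len(xs)):
        term *= (u - (r - 1)) / r
        estimate += term * diffs[r][0]
    return estimate

interpolated = newton_forward(xs, ys, 25)  # a missing value between points
extrapolated = newton_forward(xs, ys, 60)  # a projection beyond the points
print(interpolated, extrapolated)
```

At a known point the formula reproduces the given value exactly (e.g. x = 30 returns 27), which is a quick check that the difference table was built correctly.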
