Definition:
Statistics may be defined as the science of the collection, presentation, analysis and interpretation of
numerical data.
Characteristics of Statistics:
Statistics
are an aggregate or population of facts
must be numerically expressed
must be comparable and homogeneous
are affected to a considerable extent by a multiplicity of causes
are to be enumerated with reasonable accuracy
should be collected in a systematic manner
Division of Statistics
Descriptive Statistics
Inferential Statistics
Importance of Statistics
Statistics is vital in policy formulation and in making sound decisions
Statistics helps in proper understanding of any problem affecting human welfare
Statistics is indispensable in social studies
Statistical methods also have applications in the natural sciences
Statistics is indispensable in government administration
Statistics is indispensable in economic analysis also
Statistics is indispensable in business and commerce
Statistics helps in the formulation of organizational policy and managerial planning.
Limitation
Laws of Statistics are true only in the long run. Statistical expressions are in terms of averages,
approximations, and probabilities
It can provide a group average without revealing the individual characteristics
Statistics is applicable only in quantitative study
Statistics may produce faulty decisions, either through deliberate manipulation or through
inappropriate use
Statistics has the chance of being misused
Statistics only provides the raw material and tools for making judgments and inferences; it does
not itself constitute the inferences of any study.
What is Data Collection?
Data collection is a systematic approach to gathering information from a variety of sources to get
a complete and accurate picture of an area of interest.
Stages of generating variables
Determining Problem
Setting Objectives: Major Objective (only one); Specific Objectives (one or more)
From specific objectives we generate variables.
Types of Variable
12.0 OVERVIEW OF DATA COLLECTION TECHNIQUES AND TOOLS
Data collection techniques allow us to systematically collect information about our objects of study (people,
objects, and phenomena) and about the settings in which they occur.
12.3 Interviewing
An interview is a data collection technique that involves oral questioning of respondents, either
individually or as a group.
12.7 Differentiation between Data Collection Techniques and Data Collection Tools
To avoid confusion in the use of terms, the following table points out the distinction between techniques
and tools applied in data collection.
Data Collection Techniques and Tools
14.0 DATA PRESENTATION TECHNIQUES
Table: Since a piece of paper is two-dimensional, the most effective layout is almost always one of
columns and rows. Such a layout is termed a table.
The construction of a table is in many ways a work of art. It is not enough just to have columns and rows; a badly
constructed table can be confusing in itself. It is difficult to lay down precise rules that will apply to all
cases. For this reason the reader should construct his tables as common sense guides him, and the
sounder his common sense, the better his tables will be:
Construct the table so that it achieves its object in the best manner possible.
Some of the possible reasons for which a table may be constructed are:
a) to present the original figures in an orderly manner;
b) to show a distinct pattern in the figures;
c) to summarize salient figures which other people may use in future statistical studies.
Curves: Any line on a graph that represents the data to be presented is called a curve, even if it is
a straight line.
Principles of graph construction
Diagram: A diagram can be defined as any two-dimensional form of representation in which only one
variable is depicted.
a. Pictorial presentation
i) Pictogram
ii) Statistical maps.
b. Bar charts
i) Simple bar charts
ii) Component bar charts
iii) Percentage component bar charts.
iv) Multiple bar charts.
c. Pie charts.
Pictograms: These are of two kinds:
a) those in which the same picture, always the same size, is shown repeatedly, the value of a
figure represented being indicated by the number of pictures shown;
b) those in which the pictures change in size, the value of a figure represented being indicated by the
size of the picture shown.
Statistical maps: These are simply maps shaded or marked in such a way as to convey
statistical information.
Bar Charts: Bar charts are diagrams in which figures are represented by the lengths of the
bars.
Simple Bar Charts: In simple bar charts the data are represented by a series of bars, the height of
each bar indicating the size of the figure represented.
Component bar chart: Component bar charts are ordinary bar charts except that the bars are
subdivided into component parts. This sort of chart is constructed when each
total figure is built up from two or more component figures.
Multiple bar chart: In a multiple bar chart the component figures are shown as separate bars
adjoining each other.
Pie charts: A pie chart is a circle divided by radial lines into sections so that the area of
each section is proportional to the size of the figure represented.
Use of a pie chart: A pie chart is particularly useful where it is desired to show the relative
proportions of the figures that go to make up a single overall total. Unlike bar
charts, its effectiveness is not limited to three or four component figures but
can extend up to seven or eight, though it tends to diminish after that. Pie
charts, however, cannot be used effectively where a time series of figures is
involved, as a number of different pie charts are not easy to compare.
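Because the area of a sector is proportional to its central angle, the pie chart definition above reduces to a small calculation: each figure's share of the total, multiplied by 360 degrees. The following is a minimal sketch of that arithmetic; the function name pie_angles and the sales figures are hypothetical and serve only as an illustration.

```python
def pie_angles(figures):
    """Return the central angle, in degrees, of each pie sector.

    figures: mapping of label -> component figure (illustrative input).
    A sector's area is proportional to its angle, so the angle is the
    figure's share of the overall total times 360.
    """
    total = sum(figures.values())
    if total <= 0:
        raise ValueError("figures must add up to a positive total")
    return {label: 360.0 * value / total for label, value in figures.items()}


# Hypothetical component figures that make up a single overall total.
sales = {"North": 50, "South": 30, "East": 20}
angles = pie_angles(sales)
```

The angles always add up to 360 degrees, which is a quick check that no component figure has been left out of the total.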
Histogram: A histogram is a graph of a frequency distribution. It is constructed on the
basis of the following principles:
i. The horizontal axis is a continuous scale running from one extreme end
of the distribution to the other. This means that this axis is exactly the
same as any ordinary axis on a graph. It should be labeled with the name
of the variable and the units of measurement.
ii. For each class in the distribution a vertical rectangle is drawn with:
a. its base on the horizontal axis extending from one class limit of the
class to the other class limit;
b. its area proportional to the frequency of the class, i.e. if one class has
a frequency twice that of another, then its rectangle will have twice the
area of the other.
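Principle ii(b) matters most when the classes have unequal widths: for the area to be proportional to the frequency, the height of each rectangle has to be the frequency divided by the class width. The sketch below illustrates this under the assumption that each class is given as a (lower limit, upper limit, frequency) triple; the function name and the distribution are made up for the example.

```python
def histogram_heights(classes):
    """Return the rectangle height for each class of a histogram.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    Height = frequency / class width, so that area (height * width)
    equals the frequency of the class.
    """
    heights = []
    for lower, upper, frequency in classes:
        width = upper - lower
        if width <= 0:
            raise ValueError("upper class limit must exceed lower class limit")
        heights.append(frequency / width)
    return heights


# Hypothetical distribution: the last class is twice as wide as the others.
distribution = [(0, 10, 20), (10, 20, 40), (20, 40, 40)]
heights = histogram_heights(distribution)

# Recover the areas to confirm they match the frequencies.
areas = [h * (upper - lower) for h, (lower, upper, _) in zip(heights, distribution)]
```

In this example the second and third classes have the same frequency, so their rectangles have the same area, but the wider class is drawn only half as tall.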
Data: Statistical observations are called data.
Raw data: Raw data can be defined as data recorded as it is observed or received.
Cross-section data: Information on the variables concerning individual agents (consumers or
producers) at a given point in time.
Time series data: Time series data give information about the numerical values of variables
from period to period. For example the data on gross national income in the
period 1950-65 forms a time series on the variable income.
Panel data: These are repeated surveys of a single (cross-section) sample in different
periods of time. They record the behavior of the same set of individual
micro-economic units over time.
Engineering data: These data give information about the technical requirements of the method
of production employed.
Array: The first obvious step to be taken in making the raw data more meaningful is
to relist the figures in order of size, i.e. rearrange them so that they run from
the lowest to the highest. Such a list of figures is called an array.
Frequency: In statistics the number of occurrences is called the frequency.
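The two definitions above, array and frequency, can be illustrated in a few lines. This is only a sketch: the raw figures are invented, and Python's built-in sorted function and collections.Counter are used as convenient stand-ins for relisting and tallying by hand.

```python
from collections import Counter

# Hypothetical raw data, recorded in the order it was observed.
raw_data = [7, 3, 5, 3, 9, 5, 3]

# Array: the same figures relisted from the lowest to the highest.
array = sorted(raw_data)

# Frequency: the number of occurrences of each value.
frequency = Counter(array)
```

Here the array is 3, 3, 3, 5, 5, 7, 9, and the value 3 has a frequency of 3.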
Data Collection: The following methods of data collection can be adopted for gathering data.
Direct observation
Interviewing
Abstraction from published statistics.
Postal questionnaire.
Design of a questionnaire:
If a questionnaire is to be used, either as a postal questionnaire or as a basis for interviewing, the
following points should be observed in its design.
13.0 SAMPLING
Sampling involves the selection of a number of study units from a defined study population. Samples are
used to estimate the true values, or parameters, of statistics in a population and to do so with a calculable
probability of error. A sample is a set of measurements taken from a process or series of experiments.
First of all, scientific samples are not needed in research in which the subject of inquiry is homogeneous.
But if we are trying to study a population of diverse elements, a scientific sample is definitely called for.
A study based on a representative sample of adequate size is often better than one based on a
larger sample or on the whole population. That is, sample data may have greater internal validity than data
from the whole population.
Probability sampling. With probability sampling, every element of the population has a
known probability of being included in the sample.
Non-probability sampling. With non-probability sampling, we cannot specify the
probability that each element will be included in the sample.
1. Probability samples
Simple Random Sample
Systematic Random Sampling
Stratified Random Sample
Cluster Sampling
Multistage Sampling
2. Non-probability Samples
Quota Sample
Purposive or Judgement Sampling
Snowball Sample
Haphazard or Convenience Sample.
Probability-based samples are representative of the larger population, and they increase external validity in
any study.
The general rule is this: Use representative, probability sampling whenever you can and use non-
probability sampling strategies as a last resort.
Representativeness
A representative Sample has all the important characteristics of the population from which it is drawn.
Probability Sampling Method
Probability Sampling involves random selection procedures to ensure that each unit of the sample is
chosen on the basis of chance. All units of the study population should have an equal, or at least a known,
chance of being included in the sample.
1. Simple Random Sampling
This is the simplest form of probability sampling. To select a simple random sample we need to:
Make a numbered list of all the units in the population from which we want to draw a sample;
Decide on the size of the sample;
Select the required number of sampling units using a ‘lottery’ method or a table of random
numbers.
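The three steps above can be sketched in code. This is a minimal illustration in which Python's pseudo-random generator stands in for the 'lottery' method or the table of random numbers; the function name, the population of 200 units and the sample size of 10 are all assumptions made for the example.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw a simple random sample of unit numbers, without replacement.

    Step 1: number the units 1..population_size.
    Step 2: the caller decides sample_size.
    Step 3: draw the units by chance (pseudo-random generator in place
            of a lottery or a table of random numbers).
    """
    rng = random.Random(seed)  # seed is optional, for a repeatable draw
    numbered_list = range(1, population_size + 1)
    return sorted(rng.sample(numbered_list, sample_size))


# Hypothetical study population of 200 units, sample of 10.
selected_units = simple_random_sample(200, 10, seed=1)
```

Every unit on the numbered list has the same chance of selection, and no unit can be drawn twice.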
2. Systematic Sampling
In Systematic Sampling individuals are chosen at regular intervals from the sampling frame. Ideally we
randomly select a number to tell us where to start selecting individuals from the list.
Sampling fraction = Sample size / Study population
The sampling interval is the inverse of this fraction.
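As a rough sketch of systematic sampling, the code below divides the study population by the sample size to get the sampling interval, picks a random starting point within the first interval, and then takes every unit at that interval. The function name and the figures (a frame of 1200 units and a sample of 100) are illustrative assumptions, not values from these notes.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Select units at a regular interval from a sampling frame.

    The interval is the study population divided by the sample size
    (rounded down); the starting position is chosen at random within
    the first interval.
    """
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and len(frame)")
    interval = len(frame) // sample_size
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start: 0 .. interval - 1
    return [frame[start + i * interval] for i in range(sample_size)]


# Hypothetical frame of 1200 numbered units, sample of 100 (interval 12).
frame = list(range(1, 1201))
sample = systematic_sample(frame, 100, seed=1)
```

With these figures the sampling fraction is 100/1200, so every twelfth unit on the list is selected after the random start.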
3. Stratified Sampling
If it is important that the sample includes representative groups of study units with specific
characteristics (for example, residents from urban and rural areas, or different age groups), then the
sampling frame must be divided into groups, or strata, according to these characteristics. Random or
systematic samples of a predetermined size will then have to be obtained from each group (stratum). This
is called Stratified Sampling.
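A minimal sketch of the procedure just described: the frame is first divided into strata, and a random sample of a predetermined size is then drawn from each stratum. The function name, the urban and rural units, and the stratum sample sizes are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, sizes, seed=None):
    """Draw a random sample of a predetermined size from each stratum.

    units:      the sampling frame (any list of study units)
    stratum_of: function returning the stratum a unit belongs to
    sizes:      mapping of stratum -> number of units to draw from it
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:  # divide the frame into strata
        strata[stratum_of(unit)].append(unit)
    sample = []
    for stratum, size in sizes.items():  # sample within each stratum
        sample.extend(rng.sample(strata[stratum], size))
    return sample


# Hypothetical frame: 60 urban and 40 rural residents.
residents = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
chosen = stratified_sample(
    residents, lambda unit: unit[0], {"urban": 6, "rural": 4}, seed=1
)
```

In this example the stratum sizes are proportional to the strata (10 percent of each), but the same function accepts any predetermined sizes.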
4. Cluster Sampling
The selection of groups of study units (clusters) instead of the selection of study units individually is called
Cluster Sampling.
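The difference from the earlier methods is that chance operates on the groups, not on the individual units. The sketch below illustrates this with hypothetical villages as clusters: whole villages are drawn at random and every household in a chosen village enters the sample. All names and figures are assumptions for the example.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include all of their units.

    clusters:   mapping of cluster name -> list of study units
    n_clusters: number of clusters to draw
    Returns the chosen cluster names and the units they contain.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical clusters: five villages with three households each.
villages = {
    f"village_{v}": [f"v{v}_household_{h}" for h in range(1, 4)]
    for v in range(1, 6)
}
chosen_villages, households = cluster_sample(villages, 2, seed=1)
```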
1. Convenience Sampling
Convenience Sampling is a method in which, for convenience's sake, the study units that happen to be
available at the time of data collection are selected in the sample.
2. Quota Sampling
Quota Sampling is a method that ensures that a certain number of sample units from different categories
with specific characteristics appear in the sample so that all these characteristics are represented.
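One way to picture quota sampling is as a set of counters, one per category, that are filled as study units become available. The sketch below assumes units arrive in some non-random order and are accepted until each category's quota is met; the function name, the respondents and the quotas are invented for illustration.

```python
def quota_sample(available_units, category_of, quotas):
    """Accept units in the order they are met until every quota is full.

    available_units: study units in the order they become available
    category_of:     function returning a unit's category
    quotas:          mapping of category -> number of units required
    Selection is not random, so this is a non-probability sample.
    """
    remaining = dict(quotas)
    sample = []
    for unit in available_units:
        category = category_of(unit)
        if remaining.get(category, 0) > 0:
            sample.append(unit)
            remaining[category] -= 1
        if not any(remaining.values()):  # all quotas filled
            break
    return sample


# Hypothetical respondents as (sex, id) pairs, in order of arrival.
respondents = [("F", 1), ("M", 2), ("M", 3), ("F", 4), ("M", 5), ("F", 6)]
quota = quota_sample(respondents, lambda unit: unit[0], {"F": 2, "M": 2})
```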
13.4 How big should a sample size be?
1. Improve the procedure by which the elements are selected, guaranteeing that every element has an
equal chance of winding up in the sample;
2. Increase the sample size.
The first way is by far the more important. If your selection procedure is biased, then increasing the sample
size only increases the bias.
The eventual sample size is usually a compromise between what is desirable and what is feasible.
$$\text{Sample Size} = \frac{\chi^2 N P (1-P)}{C^2 (N-1) + \chi^2 P (1-P)}$$
Where $\chi^2$ is the chi-square value for 1 degree of freedom at some desired probability level; N is the
population size (which gets more important as N gets smaller); P is the population parameter of a
variable; and C is the confidence interval you choose.
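The formula can be evaluated directly. The sketch below is an illustration under stated assumptions: the notes do not say which values were used, so the defaults here (a chi-square value of 3.841, which is the 1-degree-of-freedom value at the 95 percent level, P = 0.5 and C = 0.05) are assumptions. With those defaults the function returns 44 for a population of 50 and 384 for a population of 1,000,000.

```python
def sample_size(N, P=0.5, C=0.05, chi_square=3.841):
    """Evaluate the sample size formula for a population of size N.

    Assumed defaults (not stated in the notes):
      chi_square = 3.841  chi-square for 1 degree of freedom, 95% level
      P = 0.5             population parameter (gives the largest sample)
      C = 0.05            chosen confidence interval, as a proportion
    """
    numerator = chi_square * N * P * (1 - P)
    denominator = C ** 2 * (N - 1) + chi_square * P * (1 - P)
    return round(numerator / denominator)


# Sample sizes for a few illustrative population sizes.
sizes = {N: sample_size(N) for N in (50, 200, 1000, 10000, 1000000)}
```

The required sample grows very slowly once the population is large: moving from 10,000 to 1,000,000 units adds only 14 to the sample.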
Table 1
Population Size    Sample Size
50                 44
100                80
150                108
200                132
250                152
300                169
400                196
500                217
800                260
1000               278
1500               306
2000               322
3000               341
4000               351
5000               357
10000              370
50000              381
1000000            384