
Session

Definition:
Statistics may be defined as the science of collection, presentation, analysis and interpretation of
numerical data.

Statistics may be called the science of counting.


Statistics may be called the science of averages.
Statistics is the science of estimates and probabilities.
Statistics is the science of decision making in the field of uncertainty.

Characteristics of Statistics:
Statistics
 are aggregates (populations) of facts
 must be numerically expressed
 must be comparable and homogeneous
 are affected to a considerable extent by a multiplicity of causes
 are to be enumerated with reasonable accuracy
 should be collected in a systematic manner

Division of Statistics
 Descriptive Statistics
 Inferential Statistics

Importance of Statistics
 Statistics is vital in policy formulation and in making sound decisions
 Statistics helps in the proper understanding of any problem affecting human welfare
 Statistics is indispensable in social studies
 Statistical methods are also applied in the natural sciences
 Statistics is indispensable in government administration
 Statistics is indispensable in economic analysis
 Statistics is indispensable in business and commerce
 Statistics helps in the formulation of organizational policy and managerial planning

Limitations
 Laws of statistics are true only in the long run. Statistical expressions are in terms of averages,
approximations, and probabilities
 Statistics can provide a group average without revealing individual characteristics
 Statistics is applicable only in quantitative study
 Statistics may produce faulty decisions, either through deliberate manipulation or through
inappropriate use
 Statistics has the chance of being misused
 Statistics only provides the raw material and tools for making judgments and inferences; it does
not itself constitute the inferences of any study.

What is data collection?

Data collection is a systematic approach to gathering information from a variety of sources to get
a complete and accurate picture of an area of interest.

Stages of generating variables
1. Determine the problem.
2. Set objectives: one major objective and one or more specific objectives.
3. Generate variables from the specific objectives.

Types of Variable

 A variable is something that can take more than one value.
 A variable is a characteristic of a person, object or phenomenon that can take different values.
 The values of a variable can be expressed in numbers. These are called numerical variables.
 The values of a variable can also be expressed in categories. These are called categorical
variables.
 A variable is an indicator that can vary with the situation.
 A variable must be expressed in quantifiable form.
 Some variables can readily be expressed in quantifiable form, e.g. income, age, number of cars,
height etc.
 Some variables are qualitative variables.
 Dependent variable
 Independent variable
 There is no fundamental rule that some variables are dependent and some variables are
independent.
 A dependent variable cannot vary by itself.
 An independent variable can vary by itself.
 Exogenous variable/explicit (outside): values of the variable are determined outside the model, e.g.
taxes, money supply, interest rate etc.
 Endogenous variable/implicit (inside): values of the variable are determined within the model, e.g.
inflation rate, profit, income, sex, age.
 Random variable: the values of the variable cannot be determined until the event has occurred
(e.g. unemployment rate, rainfall, GDP).
 A random variable may be: 1. Discrete 2. Continuous.
 Qualitative variable, e.g. job satisfaction
 Directly observable variable
 Constructed variable
 Dichotomous variable
 Unidimensional variable, e.g. height
 Multidimensional variable, e.g. stress

12.0 OVERVIEW OF DATA COLLECTION TECHNIQUES AND TOOLS
Data collection techniques allow us to systematically collect information about our objects of study (people,
objects, and phenomena) and about the settings in which they occur.

12.1 Using Available Information


Usually there is a large body of data already collected by others, although it may not necessarily have
been analyzed or published. Locating sources and retrieving the information is a good starting point in
any data collection effort. The data can be quantitative (for example, data from statistics) or
qualitative in nature.
12.2 Observation
Observation is a technique that involves systematically selecting, watching, and recording behavior and
characteristics of living beings, objects, or phenomena.

12.3 Interviewing
An interview is a data collection technique that involves oral questioning of respondents, either
individually or as a group.

12.4 Self-Administered Questionnaires


A written questionnaire (also referred to as self-administered questionnaire) is a data collection tool in
which written questions are presented that are to be answered by the respondents in written form.

12.5 Focus Group Discussion (FGD)


When using focus group discussions as a research technique, the researcher is no longer the center of
the activity, but rather lets informants discuss with each other while providing only general guidance.
12.6 Projective Techniques
When a researcher uses projective techniques, (s)he asks an informant to react to some kind of visual or
verbal stimulus.

12.7 Differentiation between Data Collection Techniques and Data Collection Tools
To avoid confusion in the use of terms, the following table points out the distinction between techniques
and tools applied in data collection.
Data Collection Techniques and Tools

Data Collection Techniques              Data Collection Tools

Using available information             Checklist, data compilation form
Observing                               Eyes and ears, pen and paper, watch, tape
                                        or video recorder etc.
Interviewing                            Interview schedule, checklist, questionnaire,
                                        tape recorder
Administering written questionnaires    Questionnaire
Organizing focus group discussions      Discussion guide, tape recorder
Using projective techniques             Visual aids, sentence completion forms,
                                        hypothetical cases

14.0 DATA PRESENTATION TECHNIQUES

Table: Since a piece of paper is two-dimensional, the most effective layout is almost always one of
columns and rows. Such a layout is termed a table.

The prime requirement of table construction

The construction of a table is in many ways a work of art. It is not enough just to have columns and rows; a badly
constructed table can itself be confusing. It is difficult to lay down precise rules that will apply to all
cases. For this reason the reader should construct his tables as common sense guides him, and the
sounder his common sense, the better his tables will be.

Basic principle of table construction

Construct the table so that it achieves its object in the best manner possible.
Some of the possible reasons for which a table may be constructed are:
a) to present the original figures in an orderly manner;
b) to show a distinct pattern in the figures;
c) to summarize salient figures which other people may use in future statistical studies.

Other principles of table construction


a) The table should be simple.
b) The table must have a comprehensive, explanatory title.
c) The source must be stated.
d) Units must be clearly stated.
e) The headings to columns and rows should be unambiguous.
f) Double counting must be avoided.
g) Totals should be shown where appropriate.
h) Distinctive rulings should be used as appropriate.
i) Footnotes should be used to qualify or clarify the table.
Advantages of tabular layout

a) it enables any desired figures to be located more quickly;


b) it enables comparisons between different categories to be made more easily;
c) it reveals patterns within the figures which cannot be seen in the narrative form;
d) it takes up less space, or it is far less dense.

Graph: A graph is the representation of data by a continuous curve on ruled paper.

Curves: Any line on a graph that represents the data to be presented is called a curve, even if it is
a straight line.
Principles of graph construction

1. Correct impression must be given.


2. The graph must have a clear and comprehensive title.
3. The independent variable should always be placed on the horizontal axis.
4. The vertical scale should always start at zero.
5. A double vertical scale should be used where appropriate.
6. Axes should be clearly labeled.
7. Curves must be distinct.
8. The graph must not be overcrowded with curves.
9. The source of the data should always be given.

Diagram: A diagram can be defined as any two-dimensional form of representation in which only one
variable is depicted.

The main forms of diagrams are as follows:

a. Pictorial presentation
i) Pictogram
ii) Statistical maps.
b. Bar charts
i) Simple bar charts
ii) Component bar charts
iii) Percentage component bar charts.
iv) Multiple bar charts.
c. Pie charts.

Pictograms: There are two kinds of pictogram:

a) those in which the same picture, always the same size, is shown repeatedly, the value of a
figure represented being indicated by the number of pictures shown;
b) those in which the pictures change in size, the value of a figure represented being indicated by the
size of the picture shown.

Statistical maps: These are simply maps shaded or marked in such a way as to convey
statistical information.
Bar Charts: Bar charts are diagrams in which figures are presented by the lengths of the
bars.
Simple bar charts: In simple bar charts the data is represented by a series of bars, the height of
each bar indicating the size of the figure represented.
Component bar charts: Component bar charts are ordinary bar charts except that the bars are
subdivided into component parts. This sort of chart is constructed when each
total figure is built up from two or more component figures.
Multiple bar charts: In a multiple bar chart the component figures are shown as separate bars
adjoining each other.
Pie charts: A pie chart is a circle divided by radial lines into sections so that the area of
each section is proportional to the size of the figure represented.
Use of a pie chart: A pie chart is particularly useful where it is desired to show the relative
proportions of the figures that go to make up a single overall total. Unlike bar
charts, its effectiveness is not limited to three or four component figures but
can extend up to seven or eight, though it tends to diminish after that. Pie
charts, however, cannot be used effectively where a time series of figures is
involved, as a number of different pie charts are not easy to compare.
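Because the area of each section is proportional to the figure it represents, the angle of each section is simply that figure's share of 360 degrees. The following is a minimal sketch of that arithmetic; the function name `sector_angles` and the regional figures are hypothetical and chosen only for illustration.

```python
def sector_angles(figures):
    """Return the angle in degrees of each pie-chart section.

    figures maps a label to the component figure it represents.
    """
    total = sum(figures.values())
    if total <= 0:
        raise ValueError("figures must add up to a positive total")
    # Each section takes the same share of the circle as of the total.
    return {label: 360 * value / total for label, value in figures.items()}


# Hypothetical component figures making up a single overall total.
sales = {"North": 50, "South": 30, "East": 20}
angles = sector_angles(sales)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```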
Histogram: A histogram is a graph of a frequency distribution. It is constructed on the
basis of following principles:
i. The horizontal axis is a continuous scale running from one extreme end
of the distribution to the other. This means that this axis is exactly the
same as any ordinary axis on a graph. It should be labeled with the name
of the variable and the units of measurement.
ii. For each class in the distribution a vertical rectangle is drawn with:
a. its base on the horizontal axis extending from one class limit of the
class to the other class limit;
b. its area proportional to the frequency of the class, i.e. if one class has
a frequency twice that of another, then its rectangle will be twice the
area of the other.
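When classes have unequal widths, making the area proportional to the frequency means the height of each rectangle must be the frequency divided by the class width. The sketch below illustrates this under the assumption that each class is given as a (lower limit, upper limit, frequency) triple; the function name `bar_heights` and the distribution are hypothetical.

```python
def bar_heights(classes):
    """Return the rectangle height for each class of a histogram.

    classes is a list of (lower_limit, upper_limit, frequency) tuples.
    Height = frequency / class width, so that area = frequency.
    """
    heights = []
    for lower, upper, frequency in classes:
        width = upper - lower
        if width <= 0:
            raise ValueError("upper class limit must exceed lower class limit")
        heights.append(frequency / width)
    return heights


# Hypothetical frequency distribution; the last class is twice as wide.
classes = [(0, 10, 20), (10, 20, 40), (20, 40, 40)]
heights = bar_heights(classes)

# Recover the areas to confirm they equal the frequencies.
areas = [h * (upper - lower) for h, (lower, upper, _) in zip(heights, classes)]
print(heights)
print(areas)
```

Note that the second and third classes have the same frequency and therefore the same area, even though the third rectangle is only half as tall.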

Data: Statistical observations are called data.
Raw data: Raw data can be defined as data recorded as it is observed or received.
Cross-section data: Information on the variables concerning individual agents (consumers or
producers) at a given point of time.
Time series data: Time series data give information about the numerical values of variables
from period to period. For example, the data on gross national income in the
period 1950-65 form a time series on the variable income.
Panel data: These are repeated surveys of a single (cross-section) sample in different
periods of time. They record the behavior of the same set of individual micro-
economic units over time.
Engineering data: These data give information about the technical requirements of the method
of production employed.
Array: The first obvious step to be taken in making the raw data more meaningful is
to relist the figures in order of size, i.e. rearrange them so that they run from
the lowest to the highest. Such a list of figures is called an array.
Frequency: In statistics the number of occurrences is called the frequency.
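The two steps just described, arranging raw data into an array and counting frequencies, can be sketched in a few lines. The raw figures below are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical raw data, recorded in the order it was observed.
raw_data = [3, 1, 4, 1, 5, 2, 3, 1]

# Array: the same figures relisted from lowest to highest.
array = sorted(raw_data)

# Frequency: the number of occurrences of each value.
frequency = Counter(array)

print(array)
for value in sorted(frequency):
    print(value, frequency[value])
```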

Data Collection: The following methods of data collection can be adopted for gathering data.
 Direct observation
 Interviewing
 Abstraction from published statistics.
 Postal questionnaire.

Design of a questionnaire:
If a questionnaire is to be used, either as a postal questionnaire or as a basis for interviewing, the
following points should be observed in its design.

 Questions should be short and simple.
 Questions should be unambiguous.
 The best kinds of question are those which allow a pre-printed answer to be ticked.
 The questionnaire should be as short as possible.
 Questions should be neither irrelevant nor too personal.
 Leading questions should not be asked.
 The questionnaire should be designed so that the questions fall into a logical sequence.

13.0 SAMPLING

13.1 What is sampling?

Sampling involves the selection of a number of study units from a defined study population. Samples are
used to estimate the true values, or parameters, of statistics in a population and to do so with a calculable
probability of error. A sample is a set of measurements taken from a process or series of experiments.

Why are samples taken?
What kinds of samples are there?
How big should a sample be?

13.2 Why are samples taken?

First of all, scientific samples are not needed in research in which the subject of inquiry is homogeneous.
But if we are trying to study a population of diverse elements, a scientific sample is definitely called for.

A study based on a representative sample of adequate size, however, is often better than one based on a
larger sample or on the whole population. That is, sample data may have greater internal validity than data
from the whole population.

Further advantages of sampling

1. Practicability, 2. Flexibility, 3. Accuracy, 4. Speed, and 5. Reduced Cost.

Examples of a study population and study units are as follows:

Problem: High drop-out rates in primary schools in District.

Study Population: all primary schools in District.

Study Unit: One Primary School in District

13.3 What kinds of samples are there?


Statisticians distinguish between two broad categories of sampling:

 Probability sampling. With probability sampling, every element of the population has a
known probability of being included in the sample.

 Non-probability sampling. With non-probability sampling, we cannot specify the
probability that each element will be included in the sample.

There are two types of samples:

1. Probability samples
 Simple Random Sample
 Systematic Random Sampling
 Stratified Random Sample
 Cluster Sampling
 Multistage Sampling

2. Non-probability Samples
 Quota Sample
 Purposive or Judgement Sampling
 Snowball Sample
 Haphazard or Convenience Sample

Probability-based samples are representative of the larger population, and they increase external validity in
any study.

The general rule is this: Use representative, probability sampling whenever you can and use non-
probability sampling strategies as a last resort.
Representativeness
A representative sample has all the important characteristics of the population from which it is drawn.
Probability Sampling Method
Probability sampling involves random selection procedures to ensure that each unit of the sample is
chosen on the basis of chance. All units of the study population should have an equal or at least a known
chance of being included in the sample.

1. Simple Random Sampling

This is the simplest form of probability sampling. To select a simple random sample we need to:
 Make a numbered list of all the units in the population from which we want to draw a sample;
 Decide on the size of the sample;
 Select the required number of sampling units using a 'lottery' method or a table of random
numbers.
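The three steps above can be sketched as follows, with Python's pseudo-random number generator standing in for the lottery or random-number table. The function name `simple_random_sample`, the population of 1200 units, and the sample size of 100 are assumptions made purely for illustration.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw a simple random sample of unit numbers without replacement."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    # Step 1: a numbered list of all the units in the population.
    numbered_list = range(1, population_size + 1)
    # Steps 2 and 3: draw the decided number of units, each equally likely.
    return sorted(rng.sample(numbered_list, sample_size))


# Hypothetical study population of 1200 units and a sample of 100.
sample = simple_random_sample(population_size=1200, sample_size=100, seed=1)
print(sample[:10])
```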

2. Systematic Sampling

In systematic sampling individuals are chosen at regular intervals from the sampling frame. Ideally we
randomly select a number to tell us where to start selecting individuals from the list.

Sampling fraction = Sample size / Study population
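As a rough illustration of systematic sampling, the sketch below assumes the sampling interval is the study population divided by the sample size (rounded down) and picks a random starting point within the first interval. The function name `systematic_sample` and the figures of 1200 and 100 are hypothetical.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Select units at a regular interval from a sampling frame."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and len(frame)")
    # Assumed rule: interval = study population // sample size.
    interval = len(frame) // sample_size
    rng = random.Random(seed)
    # Random start within the first interval, then every interval-th unit.
    start = rng.randrange(interval)
    return [frame[start + i * interval] for i in range(sample_size)]


# Hypothetical sampling frame of 1200 numbered individuals, sample of 100.
frame = list(range(1, 1201))
sample = systematic_sample(frame, sample_size=100, seed=5)
print(sample[:5])
```

Here the sampling fraction is 100/1200, so every twelfth individual is selected.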

3. Stratified Sampling

If it is important that the sample includes representative groups of study units with specific
characteristics (for example, residents from urban and rural areas, or different age groups), then the
sampling frame must be divided into groups, or strata, according to these characteristics. Random or
systematic samples of a predetermined size will then have to be obtained from each group (stratum). This
is called stratified sampling.
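A minimal sketch of this procedure is given below: the frame is first divided into strata, and a random sample of predetermined size is then drawn from each stratum. The function name `stratified_sample`, the urban/rural split, and the per-stratum sizes are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, size_per_stratum, seed=None):
    """Draw a random sample of predetermined size from each stratum.

    stratum_of is a function returning the stratum of a unit;
    size_per_stratum maps each stratum name to its sample size.
    """
    rng = random.Random(seed)
    # Divide the sampling frame into strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Sample independently within every stratum.
    return {
        name: rng.sample(members, size_per_stratum[name])
        for name, members in strata.items()
    }


# Hypothetical frame: 300 residents, 200 urban and 100 rural.
residents = [(i, "urban" if i % 3 else "rural") for i in range(1, 301)]
sample = stratified_sample(
    residents,
    stratum_of=lambda resident: resident[1],
    size_per_stratum={"urban": 20, "rural": 10},
    seed=7,
)
print({name: len(chosen) for name, chosen in sample.items()})
```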

4. Cluster Sampling

The selection of groups of study units (clusters) instead of the selection of study units individually is called
cluster sampling.
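In code, the difference from the earlier methods is that whole groups are drawn rather than individual units. The sketch below assumes clusters are villages and study units are households; these names and the function `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; every unit in a chosen cluster is kept."""
    rng = random.Random(seed)
    # Sample cluster names, not individual study units.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical population: 10 villages (clusters) of 5 households each.
villages = {
    f"village_{v}": [f"household_{v}_{h}" for h in range(1, 6)]
    for v in range(1, 11)
}
selected = cluster_sample(villages, n_clusters=3, seed=3)
for name, households in selected.items():
    print(name, len(households))
```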

5. Multi Stage Sampling


A multi-stage sampling procedure is carried out in phases and usually involves more than one
sampling method.

Non-probability Sampling Method

1. Convenience Sampling
Convenience sampling is a method in which, for convenience's sake, the study units that happen to be
available at the time of data collection are selected in the sample.

2. Quota Sampling
Quota Sampling is a method that ensures that a certain number of sample units from different categories
with specific characteristics appear in the sample so that all these characteristics are represented.
13.4 How big should a sample size be?

There are two ways to make a sample more representative of the population:

1. Improve the procedure by which the elements are selected, guaranteeing that every element has an
equal chance of winding up in the sample;
2. Increase the sample size.

The first way is by far the more important. If your selection procedure is biased, then increasing sample
size only increases the bias.

The proper size of a sample depends on five things:

1. How much money and time you have;


2. How big the population is to which you want to generalize;
3. The heterogeneity of the population or chunk;
4. How many population subgroups you want to deal with simultaneously in your analysis;
5. How accurate you want your sample statistics (or parameter estimators) to be.

The eventual sample size is usually a compromise between what is desirable and what is feasible.

The feasible Sample size is determined by the availability of resources:

1. Time, 2. Manpower, 3. Transport, 4. Money.


Determining Sample size

\[
\text{Sample size} = \frac{\chi^2 N P (1 - P)}{C^2 (N - 1) + \chi^2 P (1 - P)}
\]

where \(\chi^2\) is the chi-square value for 1 degree of freedom at some desired probability level; N is the
population size (which gets more important as N gets smaller); P is the population parameter of a
variable; and C is the confidence interval you choose.

With \(\chi^2 = 3.841\), P = 0.5 and C = 0.05, this becomes:

\[
\text{Sample size} = \frac{(3.841)(N)(0.5)(0.5)}{(0.05)^2 (N - 1) + (3.841)(0.5)(0.5)}
\]

Table 1

Size of Sample Required for Various Population Sizes at 5% Confidence Interval

Population Size    Sample Size
50                 44
100                80
150                108
200                132
250                152
300                169
400                196
500                217
800                260
1000               278
1500               306
2000               322
3000               341
4000               351
5000               357
10000              370
50000              381
1000000            384

Source: Krejcie and Morgan (1970).
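The sample-size formula given before Table 1 is easy to evaluate directly. The sketch below uses the values stated in the text (chi-square of 3.841, P of 0.5, C of 0.05) as defaults and rounds the result to the nearest whole number; the function name `required_sample_size` and the choice of rounding are assumptions for illustration.

```python
def required_sample_size(population_size, chi_square=3.841, p=0.5, c=0.05):
    """Evaluate chi2*N*P*(1-P) / (C^2*(N-1) + chi2*P*(1-P)).

    Rounding to the nearest whole number is an assumption of this sketch.
    """
    numerator = chi_square * population_size * p * (1 - p)
    denominator = c ** 2 * (population_size - 1) + chi_square * p * (1 - p)
    return round(numerator / denominator)


# Check the formula against a few population sizes listed in Table 1.
for n in (50, 200, 1000, 1000000):
    print(n, required_sample_size(n))
```

The required sample size levels off as the population grows, which is why a population of one million needs only slightly more units than a population of ten thousand.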
