


Data Preparation

• The data collected from the respondents is generally not in a form that can be analyzed directly.

• After the responses are recorded or received, the next stage is data preparation,
i.e. making the data amenable to appropriate analysis.

• Data preparation includes editing, coding, and data entry and is the activity that ensures
the accuracy of the data and their conversion from raw form to reduced and classified
forms that are more appropriate for analysis.

Editing → Coding → Validation of data → Data entry → Classification → Tabulation

The customary first step in analysis is to edit the raw data. Editing detects errors
and omissions, corrects them when possible, and certifies that maximum data
quality standards are achieved. The editor's purpose is to guarantee that data are:
1. Accurate.
2. Consistent with the intent of the question and other information in the survey.
3. Uniformly entered.
4. Complete.
5. Arranged to simplify coding and tabulation.




Field Editing
• In large projects, field editing review is a responsibility of the field supervisor.
• It should be done soon after the data have been gathered.
• During the stress of data collection in a personal interview and paper-and-pencil
recording in an observation, the researcher often uses ad hoc abbreviations
and special symbols.
• Soon after the interview, experiment, or observation, the investigator should
review the reporting forms.

Central Editing

• It should take place when all forms or schedules have been completed and
returned to the office.
• This type of editing implies that all forms should get a thorough editing by a
single editor in a small study and by a team of editors in case of a large study.
• Editor(s) may correct the obvious errors such as an entry in the wrong place, an
entry recorded in months when it should have been recorded in weeks, and the like.
• In case of inappropriate or missing replies, the editor can sometimes
determine the proper answer by reviewing the other information in the form or schedule.

• Coding refers to the process of assigning numerals or other symbols to answers so
that responses can be put into a limited number of categories or classes.
• Numeric coding simplifies the researcher's task in converting a nominal variable, like
gender, to a "dummy variable."
• Statistical software also can use alphanumeric codes, as when we use M and F, or other
letters, in combination with numbers and symbols for gender.
• Coding involves assigning numbers or other symbols to answers so that the
responses can be grouped into a limited number of categories.
• In coding, categories are the partitions of a data set of a given variable (e.g., if the
variable is gender, the partitions are male and female).
• Both closed- and open-response questions must be coded.
Some examples of pre-coded questions:

Question                                      Answer                        Code

How often these days do you go to the         More than once a week         1
cinema?                                       Once a week                   2
                                              Once a fortnight              3
                                              Three or four times a year    4
                                              Less often                    5
                                              Never                         6

Which type(s) of wristwatch do you own?       Hand-wound                    1
                                              Automatic                     2
                                              Electronic                    3

Which battery-operated equipment do you       Torch                         1
have at home?                                 Transistor                    2
                                              Other (specify)               3
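
As an illustration of coding, the sketch below turns recorded answers into numeric codes and builds a dummy variable. The codebook reuses the cinema question from the pre-coded examples; the function names, the sample responses, and the choice of "F" as the category coded 1 are assumptions made for the example.

```python
# Codebook for the cinema question (answers and codes taken from the example table).
CINEMA_CODES = {
    "More than once a week": 1,
    "Once a week": 2,
    "Once a fortnight": 3,
    "Three or four times a year": 4,
    "Less often": 5,
    "Never": 6,
}


def code_answers(answers, codebook):
    """Replace each recorded answer with its numeric code (None if not in the codebook)."""
    return [codebook.get(answer) for answer in answers]


def dummy_variable(values, reference):
    """Code a two-category nominal variable as 1 (equals reference) or 0 (otherwise)."""
    return [1 if value == reference else 0 for value in values]


# Hypothetical responses from three respondents.
responses = ["Once a week", "Never", "Once a fortnight"]
coded = code_answers(responses, CINEMA_CODES)

# Alphanumeric gender codes converted to a dummy variable (1 = female, an arbitrary choice).
gender = ["F", "M", "F"]
is_female = dummy_variable(gender, "F")

print(coded)
print(is_female)
```

An answer that is missing from the codebook comes back as None, which makes uncoded responses easy to spot before data entry.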
• After the data is coded, it is validated for data entry errors.
• The data is then used for further analysis.
• The purpose of validating the data is to confirm that it has been collected as per the
specifications in the prescribed format or questionnaire.
• For example, if the respondent is asked to rate a particular aspect on a scale of 1 to 7,
then the only acceptable responses are 1, 2, ..., 7.
• Any other number entered is not considered valid.
• In validation of the data, the above variable will be restricted to the integers from 1 to 7.
• This minimizes the errors.
• Other validations include checking that age lies within a limit such as 100, and that dates
such as birth dates and joining dates are not in the future.
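
The validation rules above can be sketched as a small checking function. This is a minimal illustration, not a prescribed procedure: the field names (`rating`, `age`, `birth_date`), the lower age bound of 0, and the fixed reference date are assumptions made for the example.

```python
from datetime import date


def validate_record(record, today):
    """Return a list of validation problems found in one coded response."""
    problems = []

    rating = record.get("rating")
    if not (isinstance(rating, int) and 1 <= rating <= 7):
        problems.append("rating must be an integer from 1 to 7")

    age = record.get("age")
    if not (isinstance(age, int) and 0 <= age <= 100):
        problems.append("age must be between 0 and 100")

    birth_date = record.get("birth_date")
    if not isinstance(birth_date, date) or birth_date > today:
        problems.append("birth date must not be in the future")

    return problems


# A fixed reference date keeps the example reproducible.
today = date(2024, 1, 1)

good = {"rating": 5, "age": 34, "birth_date": date(1990, 5, 17)}
bad = {"rating": 9, "age": 130, "birth_date": date(2999, 1, 1)}

print(validate_record(good, today))
print(validate_record(bad, today))
```

Returning the list of problems, rather than a single True or False, lets the editor see every rule a record violates in one pass.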

• Data entry converts information gathered by secondary or primary methods to a
medium for viewing and manipulation.
• Keyboarding remains a backbone for researchers who need to create a data file
immediately and store it in a minimal space on a variety of media.
• However, researchers have profited from more efficient ways of speeding up the
research process, especially from bar coding and optical character and mark recognition.

• Data having a common characteristic are placed in one class and in this way the
entire data get divided into a number of groups or classes.
• Classification can be one of the following two types, depending upon the nature of the
phenomenon involved:
– Classification according to attributes: As stated above, data are classified on the basis
of common characteristics which can either be descriptive (such as literacy, sex,
honesty, etc.) or numerical (such as weight, height, income, etc.).
– Classification according to class-intervals: Data relating to income, production, age,
weight, etc. come under this category. Such data are known as statistics of variables and
are classified on the basis of class intervals. For instance, persons whose incomes, say,
are within Rs 201 to Rs 400 can form one group, those whose incomes are within Rs
401 to Rs 600 can form another group, and so on.
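
Classification by class intervals can be sketched as follows, using the Rs 201 to Rs 400 and Rs 401 to Rs 600 groups from the example. The function name, the assumption of whole-rupee incomes, and the sample incomes are illustrative only.

```python
from collections import Counter


def income_class(income, lower=201, width=200):
    """Return the (low, high) class interval containing a whole-rupee income."""
    if income < lower:
        raise ValueError("income is below the first class interval")
    index = (income - lower) // width
    low = lower + index * width
    return (low, low + width - 1)


# Hypothetical incomes in rupees.
incomes = [250, 400, 401, 599, 610]

# Count how many persons fall into each class interval.
class_counts = Counter(income_class(income) for income in incomes)

for (low, high), count in sorted(class_counts.items()):
    print(f"Rs {low} to Rs {high}: {count}")
```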

• Tabulation is the process of summarizing raw data and displaying the same
in compact form (i.e., in the form of statistical tables) for further analysis.
• In a broader sense, tabulation is an orderly arrangement of data in columns
and rows.
• Tabulation is essential because of the following reasons.
1. It conserves space and reduces explanatory and descriptive statement to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
Generally accepted principles of tabulation:
1. Every table should have a clear, concise and adequate title so as to make the table intelligible without
reference to the text and this title should always be placed just above the body of the table.
2. Every table should be given a distinct number to facilitate easy reference.
3. The column headings (captions) and the row headings (stubs) of the table should be clear and brief.
4. The units of measurement under each heading or sub-heading must always be indicated.
5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the table, along
with the reference symbols used in the table.
6.  Source or sources from where the data in the table have been obtained must be indicated just below
the table.
7. Usually the columns are separated from one another by lines which make the table more readable and
attractive.
8. The columns may be numbered to facilitate reference.
9. Decimal points and (+) or (-) signs should be in perfect alignment.
10. Abbreviations should be avoided to the extent possible and ditto marks should not be used in the
table.
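
To illustrate tabulation, the sketch below arranges coded responses into rows (stubs) and columns (captions) with a title and row totals. The records, the variable names, and the table title are hypothetical; the layout is only one possible way to follow the principles listed above.

```python
from collections import Counter

# Hypothetical coded responses: (gender, owns a wristwatch).
records = [("F", "Yes"), ("M", "No"), ("F", "Yes"), ("M", "Yes"), ("F", "No")]


def cross_tabulate(rows):
    """Count each (stub, caption) pair and return the table with its headings."""
    counts = Counter(rows)
    stubs = sorted({stub for stub, _ in rows})
    captions = sorted({caption for _, caption in rows})
    table = {
        stub: {caption: counts[(stub, caption)] for caption in captions}
        for stub in stubs
    }
    return table, stubs, captions


def format_table(title, table, stubs, captions):
    """Render the table as text: title on top, captions across, stubs down the side."""
    header = "Gender".ljust(8) + "".join(c.rjust(6) for c in captions) + "Total".rjust(7)
    lines = [title, header]
    for stub in stubs:
        row = table[stub]
        cells = "".join(str(row[c]).rjust(6) for c in captions)
        lines.append(stub.ljust(8) + cells + str(sum(row.values())).rjust(7))
    return "\n".join(lines)


table, stubs, captions = cross_tabulate(records)
print(format_table("Table 1. Wristwatch ownership by gender (number of respondents)",
                   table, stubs, captions))
```

The row totals make it easy to check the table against the number of records, which is the error-detection benefit mentioned among the reasons for tabulation.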

• Qualitative Data Analysis Techniques

• Quantitative Data Analysis Techniques
Quantitative Research
Measurability: Quantitative data is measurable. For example, size of the market, rate of product ...
Features:
1. Data collected is numerical in nature.
2. Data collection methods are:
a. Mail Questionnaire
b. Personal Interview
c. Telephonic Interview
3. Sample size used is very large.
4. Structured questionnaire is used for data collection.

Qualitative Research
Measurability: Not possible or difficult to measure.
Characteristics:
1. It is a kind of exploratory research.
2. Sample size used is usually small.
3. Unstructured questionnaire is used for data collection.

Basis of Difference: Qualitative Data Analysis vs. Quantitative Data Analysis

Focus
– Qualitative: Understand and interpret.
– Quantitative: Describe, explain and predict.

Sample Design
– Qualitative: Non-probability, purposive.
– Quantitative: Probability.

Interpretation
– Qualitative: It relies on interpretation and logic. Qualitative researchers present their analyses
using text and arguments.
– Quantitative: This analysis relies on statistics. Quantitative researchers use graphs and tables to
present their analysis.

Procedures and Rules
– Qualitative: Qualitative analysis has no set of rules, but rather guidelines are there to support the
analysis.
– Quantitative: Quantitative analysis follows agreed-upon standardised procedures and rules.

Occurrence
– Qualitative: This analysis occurs simultaneously with data collection.
– Quantitative: Quantitative analysis occurs after data collection is finished.

Methodology
– Qualitative: Qualitative analysis may vary methods depending on the situation.
– Quantitative: Methods of quantitative analysis are determined in advance as part of the study
design.

Summarizing Data

• A statistic is a number that summarizes a set of values.

– Simple or univariate statistics summarize values of one variable.
– Effect or outcome statistics summarize the relationship between values of two or
more variables.
Descriptive Statistics
• Describes the frequency and/or percentage distribution of a single variable
• Tells how many and what percent
• Example: 33% of the respondents are male and 67% are female

Tables and Graphs for Summarizing Data

Frequency Distribution:

• Frequency – number of occurrences

• Interval – a range of values used when grouping quantitative data (intervals are also called classes)
• A frequency distribution lists the possible values of a variable and how often it takes each of these values.
• The label or class is the category of the data.

Categorical Variables - Display

The distribution of a categorical variable lists the categories and gives the count
or percent or frequency of individuals who fall into each category.
– Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts
or percents for the categories.
– Bar graphs represent categories as bars whose heights show the category counts or percents.
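
The distribution of a categorical variable can be computed directly from the raw responses. The sketch below is a minimal example; the function name and the three sample responses are assumptions, chosen so that the result matches the 33% male and 67% female split mentioned earlier.

```python
from collections import Counter


def frequency_distribution(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (count, 100 * count / total)
        for category, count in sorted(counts.items())
    }


# Hypothetical responses to a gender question.
respondents = ["M", "F", "F"]
distribution = frequency_distribution(respondents)

for category, (count, percent) in distribution.items():
    print(f"{category}: {count} ({percent:.0f}%)")
```

The counts or percents produced this way are the values that size the slices of a pie chart or set the heights of the bars in a bar graph.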

[Figure: bar graph and pie chart showing the distribution of a categorical variable (Grade)]

Quantitative Variable: Histograms
• Histograms show the distribution of a quantitative variable by using bars. Remember to
always include the summary table.
• Procedure – discrete (small number of values)
1. Calculate the frequency distribution and/or relative frequency of each x value.
2. Mark the possible x values on the x-axis.
3. Above each value, draw a rectangle whose height is the frequency (or relative frequency) of that value.

Discrete Data
Discrete data cannot be split up into "bits". For example, the number of students in a class.

Continuous Data
Continuous data can be split up: you can have 1.2 metres, 9.87 seconds or be 10.5 years old.

Firstly, what does the group 0 ≤ h < 2 mean?

The group goes from 0 up to 2 hours, including 0 but not including 2.
The group contains any value from 0 up to 1.99999999… hours.
Histogram - Discrete
100 married couples between 30 and 40 years of age are studied to see how many
children each couple has. The table below is the frequency table of this data set.

Kids     # of Couples    Rel. Freq.

0        11              0.11
1        22              0.22
2        24              0.24
3        30              0.30
4        11              0.11
5        1               0.01
6        0               0.00
7        1               0.01
Total    100             1.00
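
The relative frequencies in the table can be reproduced from the counts, and a rough text histogram drawn from them. This is only a sketch: the variable names and the use of `#` characters as bars are choices made for the example.

```python
# Number of couples for each number of children (counts from the frequency table).
couples_by_kids = {0: 11, 1: 22, 2: 24, 3: 30, 4: 11, 5: 1, 6: 0, 7: 1}

total = sum(couples_by_kids.values())

# Relative frequency = frequency / total number of observations.
relative = {kids: count / total for kids, count in couples_by_kids.items()}


def text_histogram(frequencies):
    """Draw one bar per x value; the bar length is the frequency of that value."""
    return "\n".join(
        f"{value} | {'#' * count} {count}" for value, count in frequencies.items()
    )


print(text_histogram(couples_by_kids))
for kids, rel in relative.items():
    print(f"{kids}: {rel:.2f}")
```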

Quantitative Variable: Histograms - continuous

Procedure - continuous
1. Divide the x-axis into a number of (equal) class intervals or classes
such that each observation falls into exactly one interval.
2. Calculate the frequency or relative frequency for each interval.
3. Above each interval, draw a rectangle whose height is the frequency
(or relative frequency) of that interval.

Example Of How To Draw A Histogram: continuous

A survey has been conducted on how many hours of TV some children watched last week.
Draw a histogram for this data.
Hours (h) spent watching TV last week    Frequency

0 ≤ h < 2                                3
2 ≤ h < 5                                6
5 ≤ h < 10                               10
10 ≤ h < 20                              25
20 ≤ h < 40                              10
How To Draw A Histogram: continuous
Hours (h) spent watching TV last week    Frequency    Frequency Density (Frequency ÷ Group Width)

0 ≤ h < 2                                3            3 ÷ 2 = 1.5
2 ≤ h < 5                                6            6 ÷ 3 = 2
5 ≤ h < 10                               10           10 ÷ 5 = 2
10 ≤ h < 20                              25           25 ÷ 10 = 2.5
20 ≤ h < 40                              10           10 ÷ 20 = 0.5

Since the groups are all different widths we need to calculate the
frequency density by dividing the frequency by the group width.
Drawing the histogram: continuous

Things to notice:
• The widths of the bars are the group widths.
• We plot the frequency density not the frequency.

How could we calculate the frequency from the graph?

Frequency = Frequency Density × Group Width

Therefore the area of each bar is the frequency.
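
The frequency density calculation, and the fact that each bar's area recovers the frequency, can be checked with a short sketch. The group boundaries and frequencies come from the TV survey table; the function and variable names are assumptions.

```python
# (lower bound, upper bound, frequency) for each group of the TV survey.
groups = [(0, 2, 3), (2, 5, 6), (5, 10, 10), (10, 20, 25), (20, 40, 10)]


def frequency_density(low, high, frequency):
    """Frequency density = frequency / group width."""
    return frequency / (high - low)


densities = [frequency_density(low, high, freq) for low, high, freq in groups]

# Area of each bar = frequency density x group width, which gives back the frequency.
areas = [density * (high - low) for density, (low, high, _) in zip(densities, groups)]

for (low, high, freq), density, area in zip(groups, densities, areas):
    print(f"{low} <= h < {high}: frequency {freq}, density {density}, bar area {area}")
```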

Statistical Analysis

Why do we need to use statistical methods?

– To draw the strongest possible conclusions from limited amounts of data;
– To generalize from a particular set of data to a more general conclusion.

Descriptive Statistics
– Descriptive statistics are methods for organizing and summarizing data.
– For example, tables or graphs are used to organize data, and descriptive values such as the
average score are used to summarize data.
– A descriptive value for a population is called a parameter and a descriptive value for a
sample is called a statistic.
Inferential Statistics
• Inferential statistics are methods for using sample data to make general conclusions
(inferences) about populations.
• Because a sample is typically only a part of the whole population, sample data provide only
limited information about the population. As a result, sample statistics are generally imperfect
representatives of the corresponding population parameters.
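
The difference between a parameter and a statistic can be illustrated with a small simulation. Everything here is hypothetical: the population is simply the integers 1 to 100, the sample size is 10, and the random seed is fixed only so the run is repeatable.

```python
import random
import statistics

# Hypothetical population: the values 1 to 100.
population = list(range(1, 101))

# Parameter: a descriptive value computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same descriptive value computed from a sample only.
rng = random.Random(42)  # fixed seed, an arbitrary choice
sample = rng.sample(population, 10)
sample_mean = statistics.mean(sample)

print("population mean (parameter):", population_mean)
print("sample mean (statistic):", sample_mean)
print("difference:", sample_mean - population_mean)
```

The sample mean will usually differ somewhat from the population mean, which is the sense in which sample statistics are imperfect representatives of population parameters.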


[Figure: population and sample. Sampling leads from population to sample; inferential statistics
leads from sample back to population.]

• If the information about the population is completely known by means of its
parameters, the statistical test is called a parametric test.
• E.g., t-test, F-test, z-test, ANOVA

• If there is no knowledge about the population or its parameters, but a hypothesis
about the population still has to be tested, the test is called a non-parametric test.
• E.g., Mann-Whitney test, rank sum test, Kruskal-Wallis test
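
As a sketch of how a non-parametric test statistic is built, the function below computes the Mann-Whitney U statistics by pairwise comparison. It uses only the ordering of the observations, not any population parameter. The function name and sample data are assumptions, and the code stops at the statistic: judging significance would still need a table of critical values or a statistics library.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): how often values in one sample exceed values in the other.

    Each pair (a, b) with a from x and b from y adds 1 to U_x if a > b
    and 0.5 if a == b (a tie). U_x + U_y always equals len(x) * len(y).
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical ratings from two independent groups.
group_a = [3, 5, 8]
group_b = [1, 4, 5]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

Replacing 8 in the first group with 800 leaves both U values unchanged, because only the order of the observations matters.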
Difference between parametric and non-parametric tests

1. Information about the population
– Parametric: Information about the population is completely known.
– Non-parametric: No information about the population is available.

2. Assumptions
– Parametric: Specific assumptions are made regarding the population.
– Non-parametric: No assumptions are made regarding the population.

3. Null hypothesis
– Parametric: The null hypothesis is made on parameters of the population distribution.
– Non-parametric: The null hypothesis is free from parameters.

4. Test statistic
– Parametric: The test statistic is based on the distribution.
– Non-parametric: The test statistic is arbitrary.

5. Applicability
– Parametric: Parametric tests are applicable only to variables.
– Non-parametric: They are applied to both variables and attributes.

6. Scale of measurement
– Parametric: No parametric test exists for nominal scale data.
– Non-parametric: Non-parametric tests do exist for nominal and ordinal scale data.

7. Power
– Parametric: A parametric test is powerful, if it exists.
– Non-parametric: It is not as powerful as a parametric test.

Advantages of non parametric test

• Non-parametric tests are simple and easy to understand.

• They do not involve complicated sampling theory.
• No assumption is made regarding the parent population.
• They are the only methods available for nominal scale data.
• They are easily applicable to attribute data.
Disadvantages of non parametric test

• They can be applied only to nominal or ordinal scale data.

• For any problem, if a parametric test exists, it is
more powerful than the non-parametric alternative.
• Non-parametric methods are not as efficient as
parametric tests.
• No non-parametric test is available for testing the
interaction in an analysis of variance model.
