DATA ANALYSIS
Data Preparation
• The data collected from the respondents is generally not in the form to be analyzed directly.
• After the responses are recorded or received, the next stage is data preparation,
i.e. making the data amenable to appropriate analysis.
• Data preparation includes editing, coding, and data entry and is the activity that ensures
the accuracy of the data and their conversion from raw form to reduced and classified
forms that are more appropriate for analysis.
[Diagram: Data Preparation, with boxes labeled Editing, Coding, Data entry, Validation of data, Classification, Tabulation]
EDITING
The customary first step in analysis is to edit the raw data. Editing detects errors
and omissions, corrects them when possible, and certifies that maximum data
quality standards are achieved. The editor's purpose is to guarantee that data are:
1. Accurate.
2. Consistent with the intent of the question and other information in the survey.
3. Uniformly entered.
4. Complete.
5. Arranged to simplify coding and tabulation.
EDITING Cont..
[Diagram: Editing is divided into Field Editing and Central Editing]
Field Editing
• In large projects, field editing review is a responsibility of the field supervisor.
• It should be done soon after the data have been gathered.
• During the stress of data collection in a personal interview and paper-and-pencil
recording in an observation, the researcher often uses ad hoc abbreviations
and special symbols.
• Soon after the interview, experiment, or observation, the investigator should
review the reporting forms.
Central Editing
• It should take place when all forms or schedules have been completed and
returned to the office.
• This type of editing implies that all forms should get a thorough editing by a
single editor in a small study and by a team of editors in case of a large
inquiry.
• Editor(s) may correct the obvious errors such as an entry in the wrong place,
entry recorded in months when it should have been recorded in weeks, and the
like.
• In case of inappropriate or missing replies, the editor can sometimes
determine the proper answer by reviewing the other information in the
schedule.
CLASSIFICATION
• Data having a common characteristic are placed in one class and in this way the
entire data get divided into a number of groups or classes.
• Classification can be one of the following two types, depending upon the nature of the
phenomenon involved:
Classification according to attributes: As stated above, data are classified on the basis
of common characteristics which can either be descriptive (such as literacy, sex,
honesty, etc.) or numerical (such as weight, height, income, etc.).
Classification according to class-intervals: Data relating to income, production, age,
weight, etc. come under this category. Such data are known as statistics of variables and
are classified on the basis of class intervals. For instance, persons whose incomes, say,
are within Rs 201 to Rs 400 can form one group, those whose incomes are within Rs
401 to Rs 600 can form another group, and so on.
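Classification by class intervals can be sketched in a few lines of Python. This is only an illustration: the income values and the helper name `classify_by_interval` are hypothetical, and only the Rs 201 to 400 and Rs 401 to 600 intervals come from the text above.

```python
from collections import Counter

# Class intervals from the text (inclusive bounds, in Rs).
INTERVALS = [(201, 400), (401, 600)]

def classify_by_interval(income, intervals=INTERVALS):
    """Return the label of the interval containing income, or None if none fits."""
    for low, high in intervals:
        if low <= income <= high:
            return f"Rs {low}-{high}"
    return None

# Hypothetical incomes, for illustration only.
incomes = [250, 399, 401, 520, 600, 310]

# Each person lands in exactly one class; count the members of each class.
class_counts = Counter(classify_by_interval(x) for x in incomes)
for label, count in sorted(class_counts.items()):
    print(label, count)
```

Values outside every interval return `None` here, which makes gaps in the class scheme visible instead of silently dropping observations.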
TABULATION
• Tabulation is the process of summarizing raw data and displaying the same
in compact form (i.e., in the form of statistical tables) for further analysis.
• In a broader sense, tabulation is an orderly arrangement of data in columns
and rows.
• Tabulation is essential because of the following reasons.
1. It conserves space and reduces explanatory and descriptive statements to a
minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
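As a rough illustration of tabulation, the sketch below arranges raw records into rows and columns with row totals. The records, the attribute values and the helper name `build_table` are hypothetical and chosen only to show the idea.

```python
from collections import Counter

# Hypothetical raw records: one (sex, literacy) pair per respondent.
records = [
    ("female", "literate"), ("male", "literate"), ("female", "illiterate"),
    ("male", "literate"), ("female", "literate"), ("male", "illiterate"),
]

# Count how many respondents fall into each (row, column) cell.
cell_counts = Counter(records)
rows = sorted({sex for sex, _ in records})
cols = sorted({literacy for _, literacy in records})

def build_table(rows, cols, counts):
    """Return the table as text lines: a caption row, then one line per stub."""
    lines = [f"{'':<10}" + "".join(f"{c:>12}" for c in cols) + f"{'Total':>8}"]
    for r in rows:
        values = [counts[(r, c)] for c in cols]
        lines.append(
            f"{r:<10}" + "".join(f"{v:>12}" for v in values) + f"{sum(values):>8}"
        )
    return lines

table = build_table(rows, cols, cell_counts)
print("\n".join(table))
```

The row totals make the summation of items immediate, and a total that disagrees with the number of records would point to an error or omission.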
Generally accepted principles of tabulation:
1. Every table should have a clear, concise and adequate title so as to make the table intelligible without
reference to the text and this title should always be placed just above the body of the table.
2. Every table should be given a distinct number to facilitate easy reference.
3. The column headings (captions) and the row headings (stubs) of the table should be clear and brief.
4. The units of measurement under each heading or sub-heading must always be indicated.
5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the table, along
with the reference symbols used in the table.
6. Source or sources from where the data in the table have been obtained must be indicated just below
the table.
7. Usually the columns are separated from one another by lines which make the table more readable and
attractive.
8. The columns may be numbered to facilitate reference.
9. Decimal points and (+) or (-) signs should be in perfect alignment.
10. Abbreviations should be avoided to the extent possible and ditto marks should not be used in the
table.
TYPES OF DATA ANALYSIS
• Qualitative Data Analysis Techniques
• Quantitative Data Analysis Techniques
Quantitative Research
Measurability: Quantitative data is measurable, for example the size of the market or the rate of product usage.
Features:
1. Data collected is numerical in nature.
2. Data collection methods are:
a. Mail Questionnaire
b. Personal Interview
c. Telephonic Interview
Characteristic:
3. Sample size used is very large.
4. Structured questionnaire is used for data collection.

Qualitative Research
Measurability: Not possible or difficult to measure.
Features:
1. It is a kind of exploratory research.
Characteristic:
2. Sample size used is usually small.
3. Unstructured questionnaire is used for data collection.
Basis of Difference: Qualitative Data Analysis vs Quantitative Data Analysis
• Focus
Qualitative: Understand and interpret.
Quantitative: Describe, explain and predict.
• Occurrence
Qualitative: This analysis occurs simultaneously with data collection.
Quantitative: Quantitative analysis occurs after data collection is finished.
• Methodology
Qualitative: Qualitative analysis may vary methods depending on the situation.
Quantitative: Methods of quantitative analysis are determined in advance as part of the study design.
Summarizing Data
Tables and Graphs for Summarizing Data
Frequency Distribution:
Categorical Variables - Display
The distribution of a categorical variable lists the categories and gives the count
or percent or frequency of individuals who fall into each category.
– Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts
or percents for the categories.
– Bar graphs represent categories as bars whose heights show the category counts or percents.
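A minimal sketch of such a distribution, using only the Python standard library. The 20 grades below are hypothetical; they are chosen so that the percentages reproduce the labels in the slide's grade chart (A 20%, B 10%, C 40%, D 25%, F 5%).

```python
from collections import Counter

# Hypothetical class of 20 students.
grades = ["A"] * 4 + ["B"] * 2 + ["C"] * 8 + ["D"] * 5 + ["F"] * 1

counts = Counter(grades)          # count per category
total = sum(counts.values())      # number of individuals
percents = {g: 100 * counts[g] / total for g in sorted(counts)}

# Text version of a bar graph: one bar per category, length = count.
for grade, percent in percents.items():
    bar = "#" * counts[grade]
    print(f"{grade} {counts[grade]:>2} {percent:>5.1f}% {bar}")
```

The same counts drive both displays: a bar graph uses them as bar heights, and a pie chart uses the percents as slice sizes.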
[Figure: bar graph (x-axis: Grade, y-axis: Percent) and pie chart of the same grade distribution: A 20%, B 10%, C 40%, D 25%, F 5%]
Quantitative Variable: Histograms
• Histograms show the distribution of a quantitative variable by using bars. Remember to
always include the summary table.
• Procedure – discrete (small number of values)
1. Calculate the frequency distribution and/or relative frequency of each x value.
2. Mark the possible x values on the x-axis.
3. Above each value, draw a rectangle whose height is the frequency (or relative frequency) of that value.
Quantitative Variable: Histograms - continuous
Procedure - continuous
1. Divide the x-axis into a number of (equal) class intervals or classes
such that each observation falls into exactly one interval.
2. Calculate the frequency or relative frequency for each interval.
3. Above each interval, draw a rectangle whose height is the frequency
(or relative frequency) of that interval.
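Steps 1 and 2 of this procedure can be sketched as follows for equal class intervals. The function name `equal_width_frequencies` and the data are hypothetical; intervals are taken as half-open, [a, b), so that each observation falls into exactly one interval.

```python
def equal_width_frequencies(data, low, high, n_classes):
    """Count observations in n_classes equal-width intervals [a, b) over [low, high)."""
    width = (high - low) / n_classes
    freqs = [0] * n_classes
    for x in data:
        if not low <= x < high:
            raise ValueError(f"{x} is outside [{low}, {high})")
        # Interval index; min() guards against floating-point rounding at the top edge.
        index = min(int((x - low) // width), n_classes - 1)
        freqs[index] += 1
    return freqs

# Hypothetical observations, for illustration only.
hours = [1, 3, 7, 12, 18, 25, 33, 39, 0, 15]
freqs = equal_width_frequencies(hours, low=0, high=40, n_classes=4)
rel_freqs = [f / len(hours) for f in freqs]
print(freqs)
print(rel_freqs)
```

Because all intervals have the same width here, the frequency (or relative frequency) can be used directly as the bar height.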
Example Of How To Draw A Histogram: continuous
A survey has been conducted on how many hours of TV some children watched last week.
Draw a histogram for this data.
Hours (h) spent watching TV last week    Frequency
0 ≤ h < 2                                3
2 ≤ h < 5                                6
5 ≤ h < 10                               10
10 ≤ h < 20                              25
20 ≤ h < 40                              10
How To Draw A Histogram: continuous
Hours (h) spent watching TV last week    Frequency    Frequency Density (Frequency ÷ Group Width)
0 ≤ h < 2                                3            3 ÷ 2 = 1.5
2 ≤ h < 5                                6            6 ÷ 3 = 2
5 ≤ h < 10                               10           10 ÷ 5 = 2
10 ≤ h < 20                              25           25 ÷ 10 = 2.5
20 ≤ h < 40                              10           10 ÷ 20 = 0.5
Since the groups are all different widths we need to calculate the
frequency density by dividing the frequency by the group width.
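The frequency density calculation can be checked with a short Python sketch. The class bounds and frequencies are those of the TV survey table; the variable names are only illustrative.

```python
# (lower bound, upper bound, frequency) for each group in the TV survey.
classes = [(0, 2, 3), (2, 5, 6), (5, 10, 10), (10, 20, 25), (20, 40, 10)]

# Frequency density = frequency / group width.
densities = [freq / (high - low) for low, high, freq in classes]

# The area of each bar (density x width) recovers the original frequency.
areas = [d * (high - low) for d, (low, high, _) in zip(densities, classes)]

for (low, high, freq), density in zip(classes, densities):
    print(f"{low} <= h < {high}: frequency {freq}, density {density}")
```

Plotting the density rather than the frequency keeps wide groups from looking larger than they are: the area of each bar, not its height, represents the frequency.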
Drawing the histogram: continuous
Things to notice:
• The widths of the bars are the group widths.
• We plot the frequency density not the frequency.
Descriptive Statistics
– Descriptive statistics are methods for organizing and summarizing data.
– For example, tables or graphs are used to organize data, and descriptive values such as the
average score are used to summarize data.
– A descriptive value for a population is called a parameter and a descriptive value for a
sample is called a statistic.
Inferential Statistics
• Inferential statistics are methods for using sample data to make general conclusions
(inferences) about populations.
• Because a sample is typically only a part of the whole population, sample data provide only
limited information about the population. As a result, sample statistics are generally imperfect
representatives of the corresponding population parameters.
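The gap between a sample statistic and the population parameter can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the population is artificial (10,000 random scores), the sample size is 50, and the fixed seed only makes the run repeatable.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Artificial population of 10,000 scores between 0 and 100.
population = [rng.randint(0, 100) for _ in range(10_000)]
parameter = mean(population)           # descriptive value for the population

# Draw a simple random sample without replacement.
sample = rng.sample(population, 50)
statistic = mean(sample)               # descriptive value for the sample

sampling_error = statistic - parameter
print(f"parameter = {parameter:.2f}")
print(f"statistic = {statistic:.2f}")
print(f"sampling error = {sampling_error:.2f}")
```

Rerunning with a different seed gives a different sample and usually a different statistic, while the parameter of a given population stays fixed.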
[Diagram labels: Sampling; Inferential statistics]
PARAMETRIC AND NONPARAMETRIC TESTS
Parametric
Nonparametric
If nothing is known about the population or its parameters, but a hypothesis
about the population must still be tested, the test used is called a
non-parametric test.
E.g.: Mann-Whitney test, rank sum test, Kruskal-Wallis test
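To show how such a test works from ranks alone, here is a minimal sketch of the Mann-Whitney U statistic. It computes only the statistic, not a p-value, and the helper names and the two small samples are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) computed from the rank sum of sample_a."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b

# Hypothetical samples, for illustration only.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

Only the ordering of the observations enters the calculation, so no assumption about the shape or parameters of the population distribution is needed to compute U.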
Difference between parametric and non-parametric tests
1. Assumptions
Parametric: Specific assumptions are made regarding the population.
Non-parametric: No assumptions are made regarding the population.
2. Null hypothesis
Parametric: The null hypothesis is made on parameters of the population distribution.
Non-parametric: The null hypothesis is free from parameters.
3. Applicability
Parametric: Parametric tests are applicable only to variables.
Non-parametric: Non-parametric tests are applied to both variables and attributes.
4. Scale of data
Parametric: No parametric test exists for nominal scale data.
Non-parametric: Non-parametric tests do exist for nominal and ordinal scale data.