
Data Processing

• Data collected are processed in preparation for data analysis.
• Data processing involves a variety of operations, such as editing, coding, encoding, sorting and tabulation.
• The selection of a particular procedure may depend on such factors as the nature and amount of the data to be processed, the kind of end product desired, the time allowed to finish the work, and the availability of resources.
What is data processing?

Data processing is a set of procedures for categorizing, organizing, and presenting data in suitable forms that make the data meaningful to the researcher and suggest what statistical analysis needs to be done so that the data can be correctly interpreted.

When all the data needed to answer the study objectives have been collected, the data should be processed in preparation for analysis.
Why is data processing necessary?
• In studies involving collection and analysis of quantitative
information, a great amount of information needs to be processed in
preparation for collation and analysis.

• With the availability of computers, data processing has become simple, fast and accurate.

• Large amounts of information can be stored in data files, which can be easily retrieved and analyzed.
Through data processing:

• Data can be checked for completeness, consistency, and accuracy.
• Tables, graphs, charts and other forms of data presentation can be easily generated.
• Coded and organized data can be easily and safely stored.
• Statistical analysis and generation of statistical outputs can be done easily and quickly.
Before starting with data processing the researcher
must decide on:

1. Whether processing will be done manually or by using a computer;
2. What tables, graphs, charts, etc. need to be generated, and
3. What statistical manipulations will be performed.
Steps in data processing:
There are five steps in data processing, namely:

Editing
Coding
Encoding
Creation of data files
Tabulation
Step One: Editing

• Editing is done on accomplished questionnaires/interview schedules to discover:
  • omissions,
  • inconsistencies in responses, or
  • incomplete information.

• Errors or omissions should be remedied before data are coded.

• Editing is performed by the data collector and the field supervisor.


Tips in editing
1. Review the completed instruments immediately after the interview or administration.
Make sure all necessary questions have been asked and answered properly and legibly.
If there are omissions or inconsistencies in responses, the data collector should go back
to the respondent for clarification or additional information.

2. If the supervisor or office editor finds errors, omissions, or inconsistencies in the instrument, he/she should clarify these with the data collector.
If the data collector cannot provide or clarify the answer, the data collector should be
advised to go back to the data source to get the missing data or clarification of
ambiguous answers.
Illustration:

• In editing the completed questionnaire shown below, the editor notices two errors in the instrument.
• The first error is an inconsistency between the responses to questions A.4 and A.5: since the answer to A.4 is “Yes”, question A.5 should have an answer. The second error is the omission of the answer to question B.5. Every respondent should answer question B.5.
Research title: “Socioeconomic Characteristics of Graduate School Students in University A”

A. Respondents’ Personal Characteristics
1. Sex: 1 Male; 2 Female
2. How old were you on your last birthday? 30
3. What is your civil status? 1 Single; 2 Married; 3 Widowed; 4 Separated
4. Are you presently gainfully working? 0 No; 1 Yes
5. What is your major occupation? 0 Not working; 1 Professional; 2 Administration; 3 Business; 4 Farming/Fishing; 5 Service/Communication; 6 Others
7. On the average, approximately how much do you earn from your major occupation? Php 25,000.00

B. Educational Background
1. What undergraduate degree did you finish? 1 AB/BA; 2 Education; 3 Commerce; 4 Nursing; 5 Engineering; 6 Agriculture; 7 Others
2. What graduate program are you pursuing? 1 MAED; 2 MAN; 3 MBA; 4 MPA; 5 MEEng; 6 Others
3. Why did you enrol in graduate school? For further education
4. Do you like graduate school? 1 Yes; 2 No
5. Why?
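The two problems the editor found in the illustration above can be written as editing rules and applied the same way to every returned questionnaire. The sketch below is a minimal Python illustration, assuming each questionnaire has been typed into a dictionary keyed by item number; that layout and the `edit_questionnaire` function are hypothetical, not part of any package.

```python
def edit_questionnaire(responses: dict) -> list[str]:
    """Return the editing problems found in one completed questionnaire.

    `responses` maps item numbers such as "A.4" to the recorded answer.
    An item that is missing, None, or an empty string counts as unanswered.
    """

    def answered(item: str) -> bool:
        value = responses.get(item)
        return value is not None and str(value).strip() != ""

    problems = []
    # Consistency rule: if A.4 ("Are you presently gainfully working?") is
    # coded 1 (Yes), A.5 (major occupation) must have an answer.
    if responses.get("A.4") == 1 and not answered("A.5"):
        problems.append("Inconsistency: A.4 is 'Yes' but A.5 is unanswered")
    # Completeness rule: every respondent must answer B.5 ("Why?").
    if not answered("B.5"):
        problems.append("Omission: B.5 is unanswered")
    return problems


# Items of the sample questionnaire that matter for the two rules
sample = {"A.2": 30, "A.4": 1, "A.5": None, "A.7": 25000,
          "B.3": "For further education", "B.5": ""}
for problem in edit_questionnaire(sample):
    print(problem)
```

Questionnaires that come back with a non-empty list would then be returned to the data collector, as described in the editing tips.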
Step Two: Coding
• Coding is the process of assigning a unique numerical code to each possible response category of a question.

• The codes may be marked on the questionnaire/interview schedule or written on a specially prepared coding sheet. The codes are usually defined in a coding manual.

• A coding sheet is a form that contains columns and rows where the code
symbols are entered.

• The columns represent the variables or question items, while the rows represent the data sources, subjects, respondents, or instruments being coded.
Sample coding sheet

R No.  Sex  Age  Civ Stat  Work  Emp Stat  Occ.  Income  Educ  Course  Like Grad  Why No  Why Yes
1      1    18   1         1     1         2     10,000  1     1       1          2       1
2      2    17   1         2     1         1     12,300  1     1       1          2       2
3      2    16   1         1     1         2     10,500  2     2       2          3       3
4      1    17   1         2     0         0     NAP     2     2       1          3       2
A Coding Manual

• A coding manual is a form that defines the variables and gives the codes for the categories of responses to the questions or items in the research instrument.

• It specifies the variable number, variable name, item number in the instrument, variable definition or description, categories of responses, and codes for the categories.

• The coding manual is usually prepared in a tabular form.


Illustration: A coding manual for the questionnaire: Socioeconomic characteristics..
Var No.  Var Name      Item No.  Description                     Code  Categories
1        Sex           A.1       Sex of respondent               1     Male
                                                                 2     Female
2        Age           A.2       Age of respondent               As is As is
3        Civil Status  A.3       Civil status of respondent      1     Single
                                                                 2     Married
                                                                 3     Widowed
4        Work          A.4       Working or not                  1     Not working
                                                                 2     Working
5        Empstat       A.5       Employment status               1     Not working
                                                                 2     Full-time
                                                                 3     Part-time
6        Occup         A.6       Occupation of respondent        1     None/NAP
                                                                 2     Professional
                                                                 3     Administrator
                                                                 4     Office employee
                                                                 5     Farming/Fishing
                                                                 6     Service provider
                                                                 7     Others
7        Income        A.7       Monthly income of respondent    As is As is
8        Educ          B.1       Bachelor's degree completed     1     AB/BA
9        Course        B.2       Course pursued by student       1     MAED
                                                                 2     MAN
                                                                 3     MBA/MPA
                                                                 4     MEEng
                                                                 5     Others
10       Like Grd      B.3       Whether or not respondent likes 1     Yes
                                 graduate school                 2     No
11       Whyno         B.4       Reasons why respondents don’t   0     NAP
                                 like graduate school            1     Difficult
                                                                 2     No time to study
                                                                 3     Expensive
                                                                 4     Others
12       Whyyes        B.4       Reasons why respondents like    0     NAP
                                 graduate school                 1     Challenging
                                                                 2     Satisfying
                                                                 3     Exciting
                                                                 4     Others
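When coding is done by computer, part of the coding manual can be kept as a lookup table so that every coder applies the same codes. The sketch below assumes Python and stores three of the variables above in a dictionary; the lowercase variable names and the `code_response` helper are illustrative choices, not part of the manual.

```python
# Excerpt of the coding manual: variable name -> (item number, description,
# mapping from response category to numerical code).
CODING_MANUAL = {
    "sex": ("A.1", "Sex of respondent", {"Male": 1, "Female": 2}),
    "civstat": ("A.3", "Civil status of respondent",
                {"Single": 1, "Married": 2, "Widowed": 3}),
    "work": ("A.4", "Working or not", {"Not working": 1, "Working": 2}),
}


def code_response(variable: str, category: str) -> int:
    """Return the code the manual assigns to `category` of `variable`."""
    item, _description, codes = CODING_MANUAL[variable]
    if category not in codes:
        # An undefined category means the manual needs a new code,
        # or the response needs to be edited again.
        raise ValueError(f"{category!r} is not a defined category of {variable} (item {item})")
    return codes[category]


print(code_response("civstat", "Married"))  # 2
print(code_response("work", "Working"))     # 2
```

Raising an error for an unlisted category, rather than guessing a code, keeps gaps in the manual visible.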
How to prepare a coding manual
1. Identify the variables or items in the research instrument, for example, the civil status of the respondent.
2. Create a short name or label for each variable (8 characters or less) and give a brief description of the variable. For example, a possible short name for civil status is “civstat”. Its description is “civil status of respondent.”
3. Identify the categories of responses for each item/question and assign a unique code to each category. (In many instruments, questions are provided with precoded fixed-alternative responses.)
• For example, for variable “civstat,” the categories and their respective codes
are:
Variable Name  Description                  Code  Category
civstat        Civil status of respondent   1     Single
                                            2     Married
                                            3     Widowed
4. For open-ended questions, get a sample of accomplished instruments and from these, make a list of all responses to each question. Make sure that categories do not overlap. An example of overlapping categories is “psychological violence” and “use of insults, sarcasm, and offensive language.”
5. Group the answers to each item according to common characteristics and elements, give each group a name that captures their commonalities, and assign a unique code or symbol (number, letter or word) to each category.
Example: For the open-ended question “Why did you enrol in graduate school?”, the answers listed from a sample of completed questionnaires are shown in the table below.
The answers have been grouped into five categories, and each group is assigned a label that captures the meaning of all the answers in the group. A code is assigned to each new category. In case responses are found in the other questionnaires that cannot be assigned to any of the new labels, the label “Others, specify” is added, under which the additional answers can be assigned.
Responses                                         Code  Category
“for professional growth”                         1     For professional growth
“for promotion” and “for salary improvement”      2     For promotion
“for self-satisfaction” and “self-gratification”  3     For self-satisfaction
“to learn new things,” “to get new ideas,”        4     To learn new things and ideas
and “to be updated”
“to gain friends,” “to know more”                 5     To meet other people
Other answers that may be found later             6     Others, specify
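Once the categories for an open-ended question are fixed, coding the remaining questionnaires amounts to looking up each answer and falling back to “Others, specify” when nothing matches. This is a simplified Python sketch that assumes answers are matched by their exact wording after trimming spaces, punctuation and case; real answers vary more, so a coder still has to judge borderline cases. `RESPONSE_CODES` and `code_open_ended` are hypothetical names.

```python
# Verbatim answers from the sample questionnaires and their category codes.
RESPONSE_CODES = {
    "for professional growth": 1,
    "for promotion": 2,
    "for salary improvement": 2,
    "for self-satisfaction": 3,
    "self-gratification": 3,
    "to learn new things": 4,
    "to get new ideas": 4,
    "to be updated": 4,
    "to gain friends": 5,
    "to know more": 5,
}
OTHERS_CODE = 6  # "Others, specify"


def code_open_ended(answer: str) -> int:
    """Code an answer to "Why did you enrol in graduate school?"."""
    # Normalize: drop surrounding spaces, quotes and punctuation; ignore case.
    key = answer.strip().strip(".,;\"'“”").strip().lower()
    return RESPONSE_CODES.get(key, OTHERS_CODE)


print(code_open_ended("For promotion"))            # 2
print(code_open_ended("To be updated."))           # 4
print(code_open_ended("My employer required it"))  # 6
```

Answers that land in code 6 are the ones to review when deciding whether a new category is needed.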
Step Three: Encoding and Creating Data File
• After the raw data have been coded, data files are created.
• The coded data are entered and stored in a data sheet, and then on a computer disk, diskette or tape, to facilitate retrieval, processing, and statistical manipulation.
• There are many software packages that can be used for this purpose, such as SPSS, a widely available statistical package.
• The coding manual, and the coding sheet or the precoded questionnaires, serve as the main references for encoding or creating the data file.
• For manual processing, the accomplished coding sheet may already serve as the data sheet or file.
• For computer-processed data, the data file looks like a data sheet.
• The SPSS software provides instructions on how to create data files and generate tables.
A sample coding sheet/data file for the questionnaire

R No.  Sex  Age  Civ Stat  Work  Emp Stat  Occ.  Income  Educ  Course  Like Grad  Why No  Why Yes
1      1    18   1         1     1         2     10,000  1     1       1          2       1
2      2    17   1         2     1         1     12,300  1     1       1          2       2
3      2    16   1         1     1         2     10,500  2     2       2          3       3
4      1    17   1         2     0         0     NAP     2     2       1          3       2
5      1    18   2         2     0         0     NAP     3     3       1          3       2
6      1    17   2         1     2         2     12,000  1     1       1          4       3
7      1    18   1         1     2         1     14,050  2     4       2          2       4
8      1    19   2         2     0         0     NAP     4     3       1          3       2
9      2    17   1         1     1         2     13,000  3     5       1          3       1
10     1    18   2         1     0         0     NAP     4     1       2          3       4
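As an alternative to SPSS, the same data file can be sketched with Python's standard `csv` module: one header row with a short name for each variable, then one row of codes per respondent. This is a minimal illustration, not the SPSS procedure. The file name `graduate_students.csv` and the lowercase column names are assumptions, and “NAP” is kept as text so that it is not mistaken for an income value.

```python
import csv

COLUMNS = ["rno", "sex", "age", "civstat", "work", "empstat", "occ",
           "income", "educ", "course", "likegrad", "whyno", "whyyes"]

# Coded data for the 10 respondents in the sample data file above.
CODED_DATA = [
    [1, 1, 18, 1, 1, 1, 2, 10000, 1, 1, 1, 2, 1],
    [2, 2, 17, 1, 2, 1, 1, 12300, 1, 1, 1, 2, 2],
    [3, 2, 16, 1, 1, 1, 2, 10500, 2, 2, 2, 3, 3],
    [4, 1, 17, 1, 2, 0, 0, "NAP", 2, 2, 1, 3, 2],
    [5, 1, 18, 2, 2, 0, 0, "NAP", 3, 3, 1, 3, 2],
    [6, 1, 17, 2, 1, 2, 2, 12000, 1, 1, 1, 4, 3],
    [7, 1, 18, 1, 1, 2, 1, 14050, 2, 4, 2, 2, 4],
    [8, 1, 19, 2, 2, 0, 0, "NAP", 4, 3, 1, 3, 2],
    [9, 2, 17, 1, 1, 1, 2, 13000, 3, 5, 1, 3, 1],
    [10, 1, 18, 2, 1, 0, 0, "NAP", 4, 1, 2, 3, 4],
]


def write_data_file(path: str) -> None:
    """Create the data file: a header row, then one row per respondent."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(CODED_DATA)


def read_data_file(path: str) -> list[dict[str, str]]:
    """Retrieve the records; the csv module returns every value as a string."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


write_data_file("graduate_students.csv")
records = read_data_file("graduate_students.csv")
print(len(records), records[0]["sex"], records[3]["income"])  # 10 1 NAP
```

Because values come back as strings, codes would need converting with `int()` before any arithmetic.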
Step Four: Tabulation: Generating Data Summaries

• Before statistical computation and analysis are performed, initial descriptive tables for each variable must be generated.
• Tables allow the researcher to have a picture of the study population
in terms of the variables to be studied.
• The outputs of this process are single-variable tables, crosstabulations, graphs, charts, and other outputs that give the researcher a preliminary view of the findings.
• The preliminary view of the data will also allow the researcher to
identify and correct errors in coding and data entry.
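A single-variable table, such as the distribution by sex in the sample tabulated data, is a count of each code in one column of the data file, turned into percentages. The Python sketch below assumes the `sex` column has already been read from the data file into a list (here it is typed in from the sample data file); `frequency_table` is a hypothetical helper, not an SPSS command.

```python
from collections import Counter


def frequency_table(values, labels):
    """Return (label, number, percent) rows for each code in `labels`, plus a total row."""
    counts = Counter(values)
    n_cases = len(values)
    rows = [(label, counts[code], round(100 * counts[code] / n_cases, 1))
            for code, label in labels.items()]
    # Codes missing from `labels` are left out, so the total can fall short of
    # 100 percent; that signals a coding or data-entry error worth checking.
    listed = sum(number for _, number, _ in rows)
    rows.append(("Total", listed, round(100 * listed / n_cases, 1)))
    return rows


sex_codes = [1, 2, 2, 1, 1, 1, 1, 1, 2, 1]  # "sex" column, respondents 1-10
for label, number, percent in frequency_table(sex_codes, {1: "Male", 2: "Female"}):
    print(f"{label:<8}{number:>6}{percent:>9}")
```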
Sample Tabulated data for the Questionnaire
Table 1. Distribution of graduate students by sex

Sex       Number   Percent
Male           7        70
Female         3        30
Total         10       100

Table 2. Distribution of graduate students by work

Work status   Number   Percent
Working            6        60
Not working        4        40
Total             10       100
Table 3. Distribution of Graduate Students by Sex and Employment Status

Employment       Male        Female        Total
Status         No.    %     No.    %     No.    %
Working          4   57       2   67       6   60
Not working      3   43       1   33       4   40
Total            7  100       3  100      10  100
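Table 3 is a crosstabulation: it counts respondents for every combination of two codes and expresses each count as a percentage of its column total. The Python sketch below rebuilds those counts from the `sex` and `work` columns of the sample data file; the helper names are hypothetical. Its counts (4, 3, 2 and 1) match Table 3 only when work code 1 is read as “Working”, whereas the coding manual lists code 1 for Work as “Not working”, so the labels attached to the codes are worth checking against the manual before such a table is reported.

```python
from collections import Counter

# "sex" and "work" columns of the sample data file, respondents 1-10.
sex = [1, 2, 2, 1, 1, 1, 1, 1, 2, 1]
work = [1, 2, 1, 2, 2, 1, 1, 2, 1, 1]


def crosstab(row_var, col_var):
    """Count cases for each (row code, column code) pair."""
    return Counter(zip(row_var, col_var))


def column_percent(table, row_code, col_code):
    """Percent of the column's cases that fall in the given row."""
    col_total = sum(n for (_, c), n in table.items() if c == col_code)
    return 100 * table[(row_code, col_code)] / col_total


table = crosstab(work, sex)  # rows: work code, columns: sex code
for w in (1, 2):
    cells = [f"{table[(w, s)]:>3} {column_percent(table, w, s):5.1f}%" for s in (1, 2)]
    print(f"work={w}", *cells)
```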
Table 4. Mean Age and Mean Education of Respondents

Variable     N   Minimum   Maximum    Mean   Std. Deviation
Age         30        19        48   30.37            7.595
Education   30         0         8    2.43            1.547
Valid N     30
Data Analysis and Interpretation

• Data can be better appreciated and effectively used when they have been analyzed and interpreted.

• Analysis enables the researcher to interpret the results of a study and answer the research questions or study objectives.
What is Data Analysis
• Data analysis is a process of summarizing trends and patterns
observed in the data, determining major differentials or relationships
among variables used in the study and the application of appropriate
statistical tests on a set of data to answer the objectives of the study.

• The type of data analysis to use depends on:
  • the objective of the study, and
  • the scales of measurement of the data or variables being dealt with.
Scales of Measurement

• Understanding the measurement scales of data or variables helps determine the type of statistics that can be used in analysing data to answer the research objectives.

• There are four levels of measurement:
  • Nominal
  • Ordinal
  • Interval
  • Ratio
Nominal Scale
• The nominal scale has no mathematical value. It is also called a categorical
scale.

• Numbers are assigned to categories of nominal data/variables to facilitate data processing.

• A higher number assignment does not mean a bigger value or weight.

• For example, sex is a nominal variable. Its categories, “male” and “female,” do not have mathematical value. If the number “1” is used to represent “male” and “2” is used to represent “female”, this does not mean that the “female” category has a higher value than the “male” category. The numbers are assigned to categories only to facilitate processing.
Ordinal Scale

• An ordinal scale is a measure in which data or categories of a variable are ordered or ranked into two or more levels or degrees, such as from low to high or from least to most.

• The distance between the first and second ranks, however, is not necessarily the same as the distance between the second and third ranks or the distance between the third and fourth ranks.

• For example, three high school students who got the first, second and third honors in their class obtained general averages of 94, 89 and 88, respectively. Take note that while their ranks are consecutive, the differences in grades between ranks are not equal: 5 points separate the first and second ranks, but only 1 point separates the second and third.
The rank in class is an ordinal variable.
Interval Scale

• An interval scale has the characteristics of an ordinal scale, but in addition, the distances between points on the interval scale are equal.

• For example, body temperature is considered an interval scale. The distance between 30 and 40 degrees Fahrenheit is the same as the distance between 40 and 50 degrees.

• Body temperature does not have an absolute zero point: 0 degrees Fahrenheit does not mean that there is no heat.

Ratio Scale
• A ratio scale is almost like the interval scale, except that the ratio scale
has a real zero point.

• An example of a ratio scale is monthly income.

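• Income values have equal distances between each other. For instance, the distance between Php 1,000 and Php 3,000 is Php 2,000. Similarly, the distance between Php 5,000 and Php 10,000 is the same as the distance between Php 15,000 and Php 20,000, which is Php 5,000.

• Because zero income means no income at all, ratios are also meaningful: Php 10,000 is twice as much as Php 5,000.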
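The difference between the interval and ratio scales can be checked numerically. Changing the unit of an interval-scale measure also shifts its zero point, which keeps equal intervals equal but changes ratios; changing the unit of a ratio-scale measure only rescales it, so ratios survive. The short Python sketch below uses the Fahrenheit-to-Celsius conversion and pesos versus thousands of pesos as illustrations; the function name is hypothetical.

```python
def fahrenheit_to_celsius(f: float) -> float:
    # Interval-scale change of unit: shift the zero point, then rescale.
    return (f - 32) * 5 / 9


temps_f = [30, 40, 50]
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]

# Equal intervals stay equal: both 10-degree steps in °F become about 5.56 °C.
print(temps_c[1] - temps_c[0], temps_c[2] - temps_c[1])
# Ratios do not survive: 50/40 = 1.25 in °F, but about 2.25 in °C.
print(temps_f[2] / temps_f[1], temps_c[2] / temps_c[1])

# Income has a real zero, so a change of unit (pesos -> thousands of pesos)
# keeps ratios: Php 10,000 is twice Php 5,000 in either unit.
incomes_php = [5000, 10000]
incomes_thousands = [x / 1000 for x in incomes_php]
print(incomes_php[1] / incomes_php[0], incomes_thousands[1] / incomes_thousands[0])
```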
Descriptions and Examples of the Four Scales of Measurement

Scale     Description                                    Examples
Nominal   Categories do not have mathematical values.    Sex: male, female
          One is not higher or lower than the other.     Color: red, white, yellow
                                                         Civil status: single, married
Ordinal   Categories can be ranked. The difference       Degree of malnutrition: 1st degree,
          between the first and the second ranks is      2nd degree, 3rd degree
          not the same as the difference between the     Honor roll: 1st, 2nd, 3rd
          second and the third ranks.                    Level of anger: not angry, angry,
                                                         very angry
Interval  The data have numerical values. The distance   Body temperature in Fahrenheit:
          between two points is the same, but there is   30, 40, 45 degrees
          no zero point, or it may be arbitrary.         Business capital (Php): 1 M, 2 M, 3 M
Ratio     The same as interval data, but the zero point  No. of children: 0, 1, 2, 3, 4
          is fixed.                                      Hours spent studying: 0, 5, 10
Data Analysis
• Data analysis is the process of determining the distribution of cases or respondents under given categories of information/responses and summarizing the trends and patterns observed in the data.

• It may also involve the computation of differentials and indexes used in determining relationships between/among variables used in the study (Parrel, et al., 1979). Data analysis may be descriptive or inferential.

• Descriptive Analysis
Descriptive analysis is used to describe the nature and characteristics of an event or a population under investigation. It describes the characteristics of a variable or a set of data and/or the variance within the data.
• Inferential Analysis
Inferential analysis is a method of analysis used in testing hypotheses. It is used to test the significance of observed differences or relationships between or among variables. This method is used in analytical studies.

• Data/variables may be analyzed one at a time (univariate), two at a time (bivariate), or three or more at a time (multivariate).

• A. Describing the Characteristics of the Data (Univariate)
When a researcher wants to describe the characteristics of a sample population considering one variable at a time, the question commonly asked of the data is: What are the characteristics of the population (respondents, subjects, items, or events)?
• The two methods of analysing the data to answer this question are:
  1. Frequency distribution
  2. Measures of central tendency
     • mean
     • mode
     • median

Frequency distribution
The frequency distribution indicates the number and percentage of
responses for each category.
• The distribution is a useful measure for analysing nominal and ordinal data.

• The percentage is computed by dividing the number of responses per category by the total number of cases or respondents, and then multiplying the result by 100. The data may be presented in table or graphical form. Example:

Table 1. Distribution of Students According to Sex
____________________________
Sex        Number    Percent
____________________________
Male           45      39.13
Female         70      60.87
____________________________
Total         115     100.00
____________________________
Table 2. Distribution of Children by Nutritional Status
____________________________________________
Nutritional Status          Number   Percent
____________________________________________
Normal                          30      40.0
1st degree malnourished         20      26.7
2nd degree malnourished         15      20.0
3rd degree malnourished         10      13.3
____________________________________________
Total                           75     100.0
____________________________________________
Figure 1. Distribution of Respondents by Nutritional Status
[Bar chart: number of children (axis 0 to 30) in each nutritional status category: Normal, 1st degree malnourished, 2nd degree malnourished, 3rd degree malnourished]
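The percentage formula described above can be written as a small function. This Python sketch, with the hypothetical name `percentages`, reproduces the Percent columns of Tables 1 and 2 from their counts.

```python
def percentages(counts: dict[str, int], decimals: int = 2) -> dict[str, float]:
    """Percent of all cases in each category: (category count / total) * 100."""
    total = sum(counts.values())
    return {category: round(100 * n / total, decimals) for category, n in counts.items()}


print(percentages({"Male": 45, "Female": 70}))
print(percentages({"Normal": 30, "1st degree": 20, "2nd degree": 15, "3rd degree": 10},
                  decimals=1))
```

After rounding, the percentages may not add up to exactly 100; reports usually show the total as 100 regardless.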
Measures of Central Tendency

• Measures of central tendency, which are commonly called averages, enable the researcher to summarize the data in a single number.

• This number represents a typical score attained by a group of individuals or subjects on a certain variable or measure.

• The three measures of central tendency are:
  • mean
  • median
  • mode
• The mean
The mean is the average of all values. It is useful in analysing interval and ratio data. It is derived by adding all the values and dividing the sum by the total number of cases.
Example: Achievement can be measured by the score on a 100-item test. In the illustration below, the mean, 90.27, is the average of the scores obtained by 15 students.
Scores of 15 students in an achievement test:
82 83 85 87 87 88 90 91 93 93 94 95 95 95 96
$$\bar{x} = \frac{82 + 83 + 85 + \cdots + 96}{15} = \frac{1354}{15} \approx 90.27$$
• Median
The median is the midpoint of a group of interval measures arranged from highest to lowest (or from lowest to highest). When the number of measures is even, the median is the average of the two middle measures.
Example: In the 15 scores below, which are arranged from lowest to highest, the midpoint is the 8th score counting from either the lowest (82) or the highest (96). The median is therefore 91.
Scores: 82 83 85 87 87 88 90 91 93 93 94 95 95 95 96
• Mode
The mode is the most frequently occurring figure in a set of figures. In actual situations, modal scores are usually near the middle of the continuum of scores.
Example: In the 15 scores below, 90 is the mode because it occurs three times.
Scores: 82 83 85 87 87 88 90 90 90 91 93 93 96 97 97
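Python's standard `statistics` module computes all three averages directly, which makes it easy to check hand calculations such as the ones above. A minimal sketch using the two score lists from the examples:

```python
import statistics

scores = [82, 83, 85, 87, 87, 88, 90, 91, 93, 93, 94, 95, 95, 95, 96]
print(sum(scores), len(scores))       # 1354 15
print(statistics.mean(scores))        # 1354 / 15, about 90.27
print(statistics.median(scores))      # 8th of the 15 sorted scores: 91

scores_mode = [82, 83, 85, 87, 87, 88, 90, 90, 90, 91, 93, 93, 96, 97, 97]
print(statistics.mode(scores_mode))   # 90 occurs three times
```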
• B. Describing the Variance in the Data (Univariate)
The two commonly used measures of variation are the range and the standard deviation.

Range
The range is a simple measure of variation calculated as the highest value in a distribution minus the lowest value, plus 1.
Range = Highest value - Lowest value + 1
Example:
In the sample data below, the highest score is 97, while the lowest is 82. The range = 97 (the highest score) minus 82 (the lowest score) plus 1 = 15 + 1 = 16.
Scores: 82 83 85 87 87 88 90 90 90 91 93 93 96 97 97
• Standard Deviation (SD)
The standard deviation (SD) describes how far, on average, individual observations lie from the group mean. Precisely, it is the square root of the average squared deviation of each case from the mean.
The steps involved in calculating the standard deviation (SD) are:
1. Calculate the mean of the distribution ($\bar{x}$).
2. Subtract the mean from each score ($x - \bar{x}$).
3. Square each of these deviations ($(x - \bar{x})^2$).
4. Add the squared deviations and divide the sum by the number of scores (n). The result is called the variance.
5. Take the square root of the variance. The result is the SD.

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
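The range and the five SD steps above translate line by line into Python. In this sketch the function names are hypothetical; `statistics.pstdev` from the standard library divides by n as in the formula above and serves as a check. By contrast, `statistics.stdev` divides by n - 1 (the sample standard deviation), which is what many statistical packages report by default, so the two can differ slightly. The range follows this section's convention of adding 1; many texts define the range simply as highest minus lowest.

```python
import math
import statistics


def inclusive_range(values):
    # Range as defined above: highest value - lowest value + 1.
    return max(values) - min(values) + 1


def standard_deviation(values):
    n = len(values)
    mean = sum(values) / n                       # step 1: mean
    deviations = [x - mean for x in values]      # step 2: x - mean
    squared = [d ** 2 for d in deviations]       # step 3: squared deviations
    variance = sum(squared) / n                  # step 4: sum, then divide by n
    return math.sqrt(variance)                   # step 5: square root


scores = [82, 83, 85, 87, 87, 88, 90, 90, 90, 91, 93, 93, 96, 97, 97]
print(inclusive_range(scores))                                # 16
print(standard_deviation(scores), statistics.pstdev(scores))  # agree up to rounding
```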
• B. Analyzing Differences Within The Data
A researcher may want to know whether the difference between two groups is statistically significant or may have occurred by chance.

The comparison may be for a difference in proportions or a difference in means.

1. Difference in proportions
For example, a survey on the smoking practices of high school students found that 63 percent of the male students smoke, compared with only 20 percent of the female students. To determine whether the difference between the proportion of male smokers and the proportion of female smokers is statistically significant, the Z test for the difference in proportions can be applied. The result of the analysis, shown in Table 3, indicates a significant difference between the proportions (Z = 5.87, p = .01).
Table 3. Distribution of Students by Smoking Practices and Sex
_____________________________________________________
Smoking          Male          Female        Z-test
Practices     No.      %     No.      %    Value    p
_____________________________________________________
Smoking        47   63.0      15   20.0     5.87  .01
Not smoking    28   37.0      60   80.0     7.12  .01
Total          75  100.0      75  100.0
_____________________________________________________
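The Z test for a difference in proportions can be sketched in a few lines of Python. This version assumes the common pooled-proportion form of the test for H0: p1 = p2, with a two-sided p-value; the source does not say which form produced Table 3. With the counts in Table 3 (47 of 75 males, 15 of 75 females) the pooled form gives z of about 5.3, while the unpooled standard error gives about 5.9, close to the reported 5.87; either way p is well below .01. The function name is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Z test for the difference between two independent proportions.

    Uses the pooled proportion for the standard error and returns
    (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(47, 75, 15, 75)  # male vs female smokers
print(f"z = {z:.2f}, p = {p:.1e}")
```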
2. Difference in means
A study comparing the performance of male and female college students revealed that the 129 sampled male students obtained a mean grade of 82.34 (SD = 1.69), while the 178 sampled female students obtained a mean grade of 81.85 (SD = 1.34).

To determine whether the mean grade of the male students differs significantly from that of the female students, the Z test for the difference between means can be applied.

The result of the analysis, shown in Table 4, indicates that there is no significant difference between the two means (Z = 1.74, p = .350). This means that the male and the female students do not significantly differ in terms of their mean college grades. (For samples of 30 or fewer, the t-test is used.)

To analyze the differences among three or more means, analysis of variance (ANOVA) is used.
Table 4. Comparison of Mean College Performance of Students
_________________________________________________
Sex       No.   Mean Grade     SD   Z-test   Sig.
_________________________________________________
Male      129        82.34   1.69
Female    178        81.85   1.34     1.74   3.54
Total     307        82.05   1.49
_________________________________________________
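A matching Python sketch for the Z test between two means, assuming large independent samples and the usual unpooled standard error, sqrt(SD1²/n1 + SD2²/n2). The function name is hypothetical.

```python
import math


def two_mean_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample Z test for the difference between two independent means.

    Returns (z, two-sided p-value).
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_mean_z_test(82.34, 1.69, 129, 81.85, 1.34, 178)  # males vs females
print(f"z = {z:.2f}, p = {p:.3f}")
```

With the summary figures in Table 4, this sketch gives z of about 2.73 (two-sided p of about .006) rather than the reported Z = 1.74. The reported test result is therefore worth re-checking against the raw grades before concluding that the means do not differ.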
C. Describing Relationships Between Variables (Bivariate Analysis)
Many studies focus on determining association or relationship between two
variables. The simplest way of finding out whether there is an association
between two variables is by using crosstabulation.

A crosstabulation displays the distribution of one variable for each category of another variable. For example, Table 3 shows the distribution of young adults by voting behaviour and income.

The research question being addressed by this analysis is: “Is there a relationship between income and the decision to vote or not to vote?” The dependent variable is “voting behaviour,” while the independent variable is “monthly income.”
