SCIENCES
•Cleaning data
analysis
QUESTIONNAIRE CHECKING
A questionnaire returned from the field may be
unacceptable for several reasons:
• Parts of the questionnaire are incomplete.
• The pattern of responses indicates that the respondent did not
understand or follow the instructions.
• The responses show little variance.
• One or more pages are missing.
• The questionnaire was received after the pre-established cutoff date.
• The questionnaire was answered by someone who does not qualify for
participation.
DATA PREPARATION
Preparation of data file
Raw data must be converted into a form usable for analysis,
with coding where needed. In practice this means transferring
the information from the questionnaires into a computer database.
The analysis and its results depend on the quality of the
data.
Errors can arise when handling instruments and raw data,
transcribing, entering data, and assigning codes, values, and
value labels.
Data need to be cleaned so that they meet the conditions the
analysis requires.
CODING
In this lesson, first we will learn how to enter data, edit the
variable name, and assign labels and values. Next, we will
create a frequency table, produce descriptive statistics, and
plot a histogram.
Quantitative Data
Next, click the Data View tab to return to the data view
window.
E. Choose a statistical procedure.
a. From the main menus choose: Analyze / Descriptive
Statistics / Frequencies
[To know more about Procedure Frequencies, click on the
Help button. Close the help topic window when you are done.]
b. Select the variable ‘scores’ to be analyzed. By default, it will
display the frequency table.
c. Click the Statistics button and select the statistics you want
SPSS to compute as shown below.
The 25th percentile score is 14. That is, 25% of the students who
took the same test scored at or below a score of 14.
The 50th percentile score is 15. That is, 50% of the students who
took the same test scored at or below a score of 15.
The 75th percentile score is 16. That is, 75% of the students who
took the same test scored at or below a score of 16.
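The same quartiles can be checked outside SPSS. The sketch below uses Python's standard library on a small hypothetical list of scores (not the data set from this lesson). Other software may use a different interpolation rule and report slightly different values.

```python
import statistics

# Hypothetical scores for illustration only; NOT the data from the lesson.
scores = [12, 13, 14, 14, 15, 15, 15, 16, 16, 17, 18]

# With n=4, statistics.quantiles returns the three quartile cut points.
# The default "exclusive" method interpolates at position p * (N + 1),
# where N is the number of scores.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(f"25th percentile: {q1}")
print(f"50th percentile: {q2}")
print(f"75th percentile: {q3}")
```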
Frequency Table
There are only a few data values. The frequency table is shown below.
Example 2: Dr. Smith asked fifteen students in his class which day of
the week they were born on. The results are shown below.
A. Code Data
The variable `dayofwk` is a categorical variable. One way to
simplify data entry is to assign numbers or symbols to represent
responses.
Assign 1 for `Sunday`, 2 for `Monday`, 3 for `Tuesday`, 4 for
`Wednesday`, 5 for `Thursday`, 6 for `Friday`, 7 for `Saturday`, and 9
for `Missing`.
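The coding scheme above can be written as a simple lookup. This is a minimal sketch in Python; the function name `code_day` and the choice to ignore case and surrounding spaces are assumptions made for illustration, not part of the SPSS procedure.

```python
# Codes taken from the scheme above: 1 = Sunday ... 7 = Saturday, 9 = Missing.
DAY_CODES = {
    "Sunday": 1, "Monday": 2, "Tuesday": 3, "Wednesday": 4,
    "Thursday": 5, "Friday": 6, "Saturday": 7,
}
MISSING_CODE = 9


def code_day(response):
    """Return the numeric code for a day name (hypothetical helper).

    Blank, None, or unrecognized responses are coded as missing (9).
    """
    if not response:
        return MISSING_CODE
    key = response.strip().capitalize()
    return DAY_CODES.get(key, MISSING_CODE)


print(code_day("Monday"))
print(code_day(""))
```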
B. The most common method of representing frequency of
categorical membership is a bar chart. Our task is to produce a
bar chart.
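To see what the bar chart summarizes, the category frequencies can be counted first. The sketch below prints a rough text bar chart in Python; the fifteen coded responses are hypothetical and stand in for the results table, which is not reproduced here.

```python
from collections import Counter

# Value labels for the day-of-week codes described above.
LABELS = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday", 9: "Missing",
}

# Hypothetical coded responses from fifteen students (illustration only).
codes = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 7, 7, 1, 3, 9]

counts = Counter(codes)

# One row per category; the bar length equals the frequency.
# Counter returns 0 for any code that never occurs.
for code in sorted(LABELS):
    n = counts[code]
    print(f"{LABELS[code]:<10} {'#' * n} ({n})")
```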
SPSS for Windows
A. Open a new Data Editor window. From the menus choose: File /
New / Data.
a. Name: Define the variable name. Click the Variable View tab.
Double click on the textbox. Type in the variable name
`dayofwk` as shown below.
Double click on the textbox and a gray square will appear. Click
on the gray square as shown below.
b. New Value
Enter the new value: 1.
c. Click on Add. The numbers 0 through 11 will
now be recoded as `1`.
d. Specify the second old range (12 through 15)
and specify the second new value `2`. Click on
Add. This category covers those who completed
high school but not college.
(b) Click on Edit Text. `1` is highlighted in the list. Delete `1` in
the Label text box. Type `0 through 11` in the Label text box.
Click on Change.
(c) Highlight `2` from the scroll list. Type `12 through 15` in the
Label text box. Click on Change.
(d) Highlight `3` from the scroll list. Type `16 through 22` in the
Label text box. Click on Change.
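The recoding and labeling steps above amount to mapping ranges of old values to new category codes. A minimal Python sketch follows; the function name `recode_years` is hypothetical, and returning `None` for values outside 0 through 22 is an assumption, since the steps above do not say how such values are handled.

```python
# New value labels, as typed into the Label text box above.
VALUE_LABELS = {1: "0 through 11", 2: "12 through 15", 3: "16 through 22"}


def recode_years(years):
    """Recode an old value into category 1, 2, or 3 (hypothetical helper).

    Values outside 0-22 return None (an assumption; treat as unrecoded).
    """
    if 0 <= years <= 11:
        return 1
    if 12 <= years <= 15:
        return 2
    if 16 <= years <= 22:
        return 3
    return None


for old in (8, 12, 16):
    new = recode_years(old)
    print(old, "->", new, f"({VALUE_LABELS[new]})")
```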
Pearson Correlation
B. Enter values.
Correlation Value                      Interpretation
r = 1.0 to .90 or r = -1.0 to -.90     Very high correlation; very significant relationship
r = .89 to .70 or r = -.89 to -.70     High correlation; significant relationship
r = .69 to .40 or r = -.69 to -.40     Moderate correlation; average relationship
r = .39 to .20 or r = -.39 to -.20     Low correlation; small relationship
r = .19 and below                      Very low correlation; almost no relationship
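As a rough illustration of how r is computed and then read against the table above, here is a short Python sketch. The paired scores are hypothetical, and applying each cutoff to the absolute value of r (so that .895, for example, falls in the "high" band) is an assumption about how the gaps between bands should be handled.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interpret_r(r):
    """Map r to the strength labels in the table (cutoffs on |r|)."""
    size = abs(r)
    if size >= 0.90:
        return "Very high correlation"
    if size >= 0.70:
        return "High correlation"
    if size >= 0.40:
        return "Moderate correlation"
    if size >= 0.20:
        return "Low correlation"
    return "Very low correlation"


# Hypothetical paired scores, for illustration only.
math_skill = [10, 12, 15, 17, 20, 22]
stats_grade = [60, 66, 70, 78, 85, 88]

r = pearson_r(math_skill, stats_grade)
print(f"r = {r:.3f}: {interpret_r(r)}")
```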
Stage 1: Development
Since math skills were strongly correlated with grades in
statistics, the researcher decided to use math skills as a
predictor variable to predict grades in statistics.
Task: Develop a linear regression equation to predict the scores
on the final exam from the scores on the pretest.
From the menus choose: Analyze / Regression / Linear.
Dependent and Independent(s)
Move the variable `final` to the Dependent variable
list.
Move the variable `pretest` to the Independent
variable list.
To save predicted values, residuals, and prediction
intervals for individual predicted values, click on
Save. The Save dialog box will appear. Click
Continue, then OK.
A. Summary Statistics for the Equation
Find the linear regression equation for predicting the final exam
from the pretest.
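The equation SPSS reports has the form final = a + b × pretest, where b is the slope and a is the intercept. As a sketch of the underlying least-squares arithmetic, the Python below fits the line to hypothetical pretest and final scores (not the lesson's data); the function name `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical scores, for illustration only.
pretest = [10, 12, 14, 15, 18, 20]
final = [55, 60, 68, 70, 80, 86]

a, b = fit_line(pretest, final)
print(f"final = {a:.2f} + {b:.2f} * pretest")

# Predicted final score for a (hypothetical) pretest score of 16.
print(f"predicted final for pretest 16: {a + b * 16:.1f}")
```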
b. Stage 2: Estimation
1. Enter Data
2. Choose Analyze / Correlate / Partial.
Note that you may press and hold down the Control key (the
Ctrl key on your keyboard) while clicking on the two
variables. Then click the arrow button to move them at the
same time. You may also move the variables one by one.
T-test
T-tests are used when you have two groups (e.g., males and
females) or two sets of data (before and after) and you wish to
compare the mean scores on some continuous variable.
Example of question:
Is there a significant difference in the mean self-esteem
scores for females and males?
Research Question
Does watching the movie about killing baby seals
change viewers' attitudes toward the hunting and
killing of animals? Specifically, will the group
watching a movie about killing baby seals score lower
on the questionnaire?
Group Mean
Higher scores on the questionnaire represent
rationalization of the animal slaughter.
The group watching a movie about killing baby seals
scored lower (M1 = 28, SD = 3.24) on the questionnaire
than did the group viewing a movie about the migration
of caribou (M0 = 32.8, SD = 3.11).
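The independent-samples t statistic can be computed directly from group means and standard deviations. The sketch below uses the means and SDs reported above, but the group sizes are not given in this section, so `n1 = n2 = 5` is a hypothetical value chosen only to make the example run. It uses the pooled-variance form, which assumes equal population variances.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Means and SDs from the text; the group sizes are HYPOTHETICAL.
n1 = n2 = 5
t, df = pooled_t(28, 3.24, n1, 32.8, 3.11, n2)

# A negative t means the seal-movie group scored lower on average.
print(f"t({df}) = {t:.2f}")
```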
Equality of Variances
Most computer programs routinely check for equality of
variances for both groups before computing a t-test.
Levene's test is used to test the null hypothesis that the
two population variances are equal.
Three Variables:
Two categorical independent variables (e.g., gender:
males/females; age group: young, middle, old); and
One continuous dependent variable (e.g., total
optimism).
Summary:
For example, a two-way ANOVA allows you to test for:
Sex differences in optimism;
Differences in optimism for young, middle and old
subjects; and
The interaction of these two variables: is there a
difference in the effect of age on optimism for males
and females?
One-Way Independent Measures ANOVA
Data Set
Hypotheses
Are there significant differences among the four teaching
methods?
Set a significance level
Use a .05 significance level.
Data Input
The Analyze Menu
1. To obtain a one-way analysis of variance, from
the menus choose:
Analyze / Compare Means / One-Way ANOVA
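The F ratio that this procedure reports is the between-groups mean square divided by the within-groups mean square. A minimal Python sketch of that calculation follows; the scores for the four teaching methods are hypothetical, since the data set is not reproduced here, and the function name `one_way_anova` is illustrative.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for four teaching methods (illustration only).
methods = [
    [70, 72, 68, 75],
    [80, 78, 82, 85],
    [65, 60, 70, 66],
    [74, 77, 73, 79],
]

f_ratio, df_b, df_w = one_way_anova(methods)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The F ratio is then compared with the critical value for the chosen .05 significance level, or SPSS's reported p-value is read directly.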