
DATA PROCESSING, ANALYSIS AND INTERPRETATION

(SOCIAL SCIENCE RESEARCH)
Pablo E. Subong, Jr., Ed.D., Ph.D.

West Visayas State University

OBJECTIVES
• To develop skills in data processing, both manually and with the use of SPSS
• To be able to process hypothetical data
• To be able to analyze the data properly

INTRODUCTION
• SPSS for Windows is a computer package that performs a wide variety of statistical procedures.
• Data management and analysis can be handled well with SPSS.
• Using SPSS, we can manipulate data, make graphs, and perform statistical techniques ranging from means to regression.

WHAT IS SPSS?
• SPSS stands for "Statistical Package for the Social Sciences"
• The SPSS home page is: www.spss.com

WHAT CAN YOU DO WITH SPSS?
• Run Frequencies
• Calculate Descriptive Statistics
• Compare Means
• Conduct Cross-Tabulations
• Recode Data
• Create Graphs and Charts
• Do T-Tests
• Conduct ANOVAs
• Run Various Types of Regressions
• And Much More!

WHAT I WILL SHOW YOU TODAY!!
• Bringing your data into SPSS
• Recoding
• SPSS uses:
  - Survey
  - Experimental study
  - Social science research

SPSS WINDOWS PROCESS
• Data window
• Variable view window
• Output window
• Chart editor window

MANAGEMENT OF DATA AND FILES
• SPSS can read different types of data files.
• You can open not only SPSS files but also Excel and other files.
• You can create a new data set with SPSS.
• You can also edit, delete and view the contents of your data file.

HOW TO USE DIFFERENT FILE TYPES?
• Excel file
• csv file
• SPSS file

TYPES OF VARIABLES
• You can select the type of variable:
  - String
  - Numeric
• You can also select the level of measurement of the variable:
  - Categorical
  - Ordinal
  - Interval

CATEGORICAL (NOMINAL)
A categorical variable is one that has two or more categories, but there is no intrinsic ordering to the categories.
• Gender
• Hair color is also a categorical variable

ORDINAL VARIABLE
An ordinal variable is similar to a categorical variable. The difference between the two is that there is a clear ordering of the categories.
• SES (Socio Economic Status)
• Education
Even though we can order these from lowest to highest, the spacing between the values may not be the same across the levels of the variables.

INTERVAL VARIABLE
An interval variable is similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced.
• Annual income measured in Euros

WHY DOES IT MATTER?
• Statistical computations and analyses assume that the variables have specific levels of measurement.
• Can you compute the average of hair color?
• Does it make sense to compute the average of educational experience?
• An average requires a variable to be interval.
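The point about levels of measurement can be sketched in a few lines of Python. The values below are made up for illustration and are not from the presentation. The mode is meaningful for a categorical variable, the median for an ordinal one, and the mean only for an interval variable.

```python
from statistics import mean, median, mode

# Hypothetical values, for illustration only
hair_color = ["black", "brown", "black", "blond", "black"]   # categorical (nominal)
education = [1, 2, 2, 3, 1]   # ordinal codes: 1=elementary, 2=high school, 3=college
income_eur = [18000, 22000, 25000, 31000, 40000]             # interval

print(mode(hair_color))    # most frequent category
print(median(education))   # middle rank
print(mean(income_eur))    # arithmetic average
```

Asking for mean(hair_color) would raise an error, which mirrors the question on the slide.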

DATA ANALYSIS
• Data analysis embraces both the problem of finding an appropriate model, on the one hand, and model estimation and testing, on the other.
• In this context, the normality assumption becomes important.
• In the social sciences, it is hard to find the typical bell-shaped normal distribution.

NORMAL DISTRIBUTION

In general, the bell-shaped distribution has the following characteristics:
• The average is located in the center of the distribution.
• The greater the distance from the average, the lower the frequency.
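Both characteristics can be checked numerically. This is a minimal sketch using Python's standard library and a standard normal curve (mean 0, standard deviation 1), which is an assumption chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal curve, chosen only for illustration
curve = NormalDist(mu=0, sigma=1)

# Height of the curve at the mean and at 1, 2, and 3 standard deviations away
heights = [curve.pdf(x) for x in (0, 1, 2, 3)]
print([round(h, 4) for h in heights])

# Half of the distribution lies below the mean, so the mean sits in the center
print(curve.cdf(0))
```

The heights shrink as the distance from the mean grows, and the curve is equally high at the same distance on either side.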

Sample Coding Book
• Infant's sex = sex: Male=1, Female=2
• Family income ($) = fincome: 5,000-29,999=0; 30,000-59,999=1; 60,000-99,999=2
• Maternal age (years) = m_age
• Maternal smoking status = m_smk: Yes=1, No=0
• Birth weight (grams) = bwgt
• Maternal weight before pregnancy (pounds) = m_wgt
• Father's weight before the pregnancy = f_wgt

Sample Birth Weight Data

ID    sex   fincome   m_age   m_smk   bwgt   m_wgt   f_wgt
1     2     2         29      1       3770   122     167
2     1     2         25      1       3742   125     200
3     2     1         28      0       3175   160     210
4     2     0         28      1       2919   110     165
5     2     0         19      1       3288   105     160
6     2     0         35      0       3175   120     160
7     1     0         27      1       3883   125     180
...
99    2     2         24      0       4337   123     173
100   1     1         23      0       4110   115     140
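As a rough sketch of how the coding book and the data fit together, the nine cases visible on the slide can be held in Python and decoded with the value labels. The variable names follow the coding book. The Python structure itself is an assumption for illustration and is not part of SPSS.

```python
# Cases shown on the slide: (ID, sex, fincome, m_age, m_smk, bwgt, m_wgt, f_wgt)
ROWS = [
    (1, 2, 2, 29, 1, 3770, 122, 167),
    (2, 1, 2, 25, 1, 3742, 125, 200),
    (3, 2, 1, 28, 0, 3175, 160, 210),
    (4, 2, 0, 28, 1, 2919, 110, 165),
    (5, 2, 0, 19, 1, 3288, 105, 160),
    (6, 2, 0, 35, 0, 3175, 120, 160),
    (7, 1, 0, 27, 1, 3883, 125, 180),
    (99, 2, 2, 24, 0, 4337, 123, 173),
    (100, 1, 1, 23, 0, 4110, 115, 140),
]
COLUMNS = ("id", "sex", "fincome", "m_age", "m_smk", "bwgt", "m_wgt", "f_wgt")

# Value labels taken from the coding book
SEX = {1: "Male", 2: "Female"}
SMOKER = {1: "Yes", 0: "No"}

# One dictionary per case, keyed by variable name
cases = [dict(zip(COLUMNS, row)) for row in ROWS]

first = cases[0]
print(first["id"], SEX[first["sex"]], SMOKER[first["m_smk"]], first["bwgt"])
```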

- DATA PROCESSING
- SPSS DEMO

USING SPSS FOR WINDOWS ©
3 May 1999
• Introduction
• Data procedures
• Statistical procedures
• Syntax files
• Editing output

INTRODUCTION

STEPS FOR ANALYZING DATA
• Enter the data
• Select the procedure and options
• Select the variables
• Run the procedure
• Examine the output

COMMON OPERATIONS - MENU OPTIONS
• In the menu, click Statistics
• Choose Summarize
• Click Frequencies

COMMON OPERATIONS - VARIABLES DIALOG BOX
This type of dialog box is used for many procedures. Variables are selected from the list on the left. Click the arrow to move them to the appropriate box on the right.

USING SPSS FOR WINDOWS ©
DATA PROCEDURES
• Ways to Enter Data
• Entering Data Directly
  - Defining variables
  - Entering data
• Viewing Data
• Recoding Variables
• Computing New Variables
• Selecting Cases

WAYS TO ENTER THE DATA
• SPSS datafile
• Import data
  - Database file
  - Spreadsheet file
  - ASCII text file
• Enter data directly with Data Editor

ENTERING DATA DIRECTLY - DEFINE THE VARIABLES
• Name
• Type and size
• Labels
• Missing values

DEFINE THE VARIABLE - NAME
• Name the variable
  - No more than 8 characters
  - Each name unique
  - Must begin with a letter
  - Certain characters not allowed
  - Not case sensitive

DEFINE THE VARIABLE - TYPE
• Define the variable type.
• Define the variable width.
• Define the number of decimal places.

DEFINE THE VARIABLE - LABELS
• Labels will be displayed in the output.
• Variable Label can be more descriptive than variable name.

DEFINE THE VARIABLE - MISSING VALUES
• Missing values are used to define user-specified missing information.
  - No response
  - Refused to answer
  - Data entry mistakes

DEFINE THE VARIABLE - COLUMN FORMAT
Column Format is used to define column width and alignment in the Data Editor window.

ENTERING DATA DIRECTLY
• Each row is a case (e.g. survey form).
• Enter the value for each variable.
• Press <Tab> key or right arrow key to move to next variable.
• Leave blank or use user-defined missing value if no answer.
• Press <Enter> key to move to next case.
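The effect of a user-defined missing value can be sketched as follows. The code 9 and the answers are hypothetical and chosen only for illustration. The idea is that both blank cells and the user-defined code are left out before a statistic is computed.

```python
from statistics import mean

MISSING = 9   # hypothetical user-defined missing code (e.g. "refused to answer")

# Hypothetical answers on a 1-5 scale; None stands for a cell left blank
answers = [4, 5, MISSING, 3, None, 4]

# Keep only the valid answers before computing anything
valid = [a for a in answers if a is not None and a != MISSING]
print(len(valid), mean(valid))
```

If the code 9 were not declared as missing, it would be averaged in as if it were a real answer.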

CHANGE THE VIEW - VALUE LABELS
Data entered as numeric codes can be displayed as value labels.
• In the menu, click View
• Click Value Labels

RECODE PROCEDURE
Recode is used:
• to change the values of an existing variable
• to create a new variable based on the values of an existing variable

RECODE INTO NEW VARIABLE
• In the menu, click Transform.
• Select Recode.
• Click Into Different Variable(s).
• Select and move variable(s) over.
• Name and label new variable.
• Click Old and New Values.
• For each value of the existing variable:
  - Enter the new value
  - Repeat for each value or range of values
  - Click Continue
• Click Change.
• Click OK.

DEFINE LABELS FOR NEW VARIABLE
• In the Data menu, click Define Variable.
• Click Labels.
• Enter value labels for the new variable.
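The logic of recoding into a new variable can be sketched in Python. This is an illustration, not SPSS syntax. The income bands are the ones read from the sample coding book (fincome), while the function name and the raw incomes are hypothetical.

```python
def recode_income(dollars):
    """Return the fincome code for a family income in dollars, or None if out of range."""
    if 5000 <= dollars <= 29999:
        return 0
    if 30000 <= dollars <= 59999:
        return 1
    if 60000 <= dollars <= 99999:
        return 2
    return None   # outside the coded ranges; treated as missing

incomes = [12000, 45500, 60000, 150000]        # hypothetical raw incomes
fincome = [recode_income(x) for x in incomes]  # the new, recoded variable
print(fincome)
```

The original values are kept in incomes, which corresponds to choosing Into Different Variable(s) rather than overwriting the existing variable.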

COMPUTE PROCEDURE
Compute is used to create a new variable.
• In the menu, click Transform.
• Click Compute.
• Name the new variable.
• Click Type&Label to define the characteristics of the new variable.
• Label the new variable.
• Enter the variable type.
• Enter the numeric expression that will determine the values of the new variable.
• Click OK.
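A computed variable is just a numeric expression applied to every case. As a sketch, the maternal weights from the sample data (in pounds) can be turned into a new variable in kilograms. The new variable name m_wgt_kg is hypothetical.

```python
LB_TO_KG = 0.45359237   # kilograms per pound

# Maternal weight before pregnancy, in pounds (first seven cases of the sample data)
m_wgt = [122, 125, 160, 110, 105, 120, 125]

# The numeric expression is applied case by case to build the new variable
m_wgt_kg = [round(w * LB_TO_KG, 1) for w in m_wgt]
print(m_wgt_kg)
```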

SELECT CASES
For a subset of the datafile, use Select Cases.
• In the menu, click Data.
• Click Select Cases.

SELECT CASES - ALCOHOL DRINKERS ONLY
• To select only those cases which meet certain criteria, choose the If option.
• Enter the expression that will determine which cases will be selected.
• Click Continue.
• When you've finished specifying selection criteria, click OK.
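Selecting cases with an If condition amounts to filtering rows by an expression. The slides use alcohol drinkers as the example. The sketch below instead uses the smoking variable from the sample birth weight data, since that is the data shown in this presentation.

```python
# (ID, m_smk, bwgt) for the first seven cases of the sample birth weight data
cases = [
    (1, 1, 3770), (2, 1, 3742), (3, 0, 3175), (4, 1, 2919),
    (5, 1, 3288), (6, 0, 3175), (7, 1, 3883),
]

# Keep only the cases that satisfy the condition m_smk = 1 (mother smokes)
smokers = [c for c in cases if c[1] == 1]
print([c[0] for c in smokers])
```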

USING SPSS FOR WINDOWS ©
STATISTICAL PROCEDURES
• Summarizing Data
  - Frequencies
  - Crosstabs (Chi Square)
• Comparing Means
  - T-Tests
  - One-Way Analysis of Variance
• Nonparametric Tests
  - Wilcoxon Signed Ranks
  - Mann-Whitney U
  - Kruskal-Wallis

FREQUENCIES
• In the menu, click Statistics
• Choose Summarize
• Click Frequencies
• Select and move the variables.
• Click Statistics.
• Choose the appropriate statistics.
• Click Continue.

FREQUENCIES - CHARTS
• For histograms or other charts, click Charts.
• Choose the type of chart and click Continue.

FREQUENCIES
• To select the format of the table(s), click Format.
• Choose the format and click Continue.
• Click OK to run the Frequencies procedure.

FREQUENCIES - FORMAT OPTION: ORGANIZE OUTPUT BY VARIABLES

FREQUENCIES - FORMAT OPTION: COMPARE VARIABLES

FREQUENCIES - DISTRIBUTION TABLE

FREQUENCIES - HISTOGRAM
[Histogram of Apgar 1 minute score; x-axis: Apgar 1 minute score; y-axis: Frequency; Std. Dev = 1.83, Mean = 7.8, N = 424.00]
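What a frequency distribution table contains can be sketched with the fincome codes of the nine cases visible in the sample data. This is an illustration in Python, not SPSS output: for each value it prints the count, the percent, and the cumulative percent.

```python
from collections import Counter

# fincome codes of the nine cases visible in the sample birth weight data
fincome = [2, 2, 1, 0, 0, 0, 0, 2, 1]

counts = Counter(fincome)
n = len(fincome)

cumulative = 0.0
for value in sorted(counts):
    percent = 100 * counts[value] / n
    cumulative += percent
    # value, frequency, percent, cumulative percent
    print(f"{value}  {counts[value]}  {percent:5.1f}  {cumulative:5.1f}")
```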

CROSSTABS
• In the menu, click on Statistics
• Choose Summarize
• Click Crosstabs
• Move the outcome variable(s) to the Row(s) box.
• Move the predictor variable(s) to the Column(s) box.
• Click Statistics.
• Select the appropriate statistics.
• Click Continue.
• To select the counts, percentages, and residuals to be displayed in each cell, click Cells.
• Select the information to be displayed in each cell.
• Click Continue.
• To run the Crosstabs procedure, click OK.

CROSSTABS - OUTPUT
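The cell counts and the Pearson chi-square statistic behind a crosstab can be worked out by hand. The sketch below cross-tabulates infant's sex by maternal smoking for the nine cases visible in the sample data. With so few cases the chi-square approximation is not trustworthy, so this is only an illustration of the arithmetic.

```python
from collections import Counter

# Nine cases visible in the sample birth weight data
sex   = [2, 1, 2, 2, 2, 2, 1, 2, 1]   # 1=Male, 2=Female
m_smk = [1, 1, 0, 1, 1, 0, 1, 0, 0]   # 1=Yes, 0=No

observed = Counter(zip(sex, m_smk))   # cell counts
row_totals = Counter(sex)
col_totals = Counter(m_smk)
n = len(sex)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
chi2 = 0.0
for r in row_totals:
    for c in col_totals:
        expected = row_totals[r] * col_totals[c] / n
        chi2 += (observed[(r, c)] - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(dict(observed), round(chi2, 3), df)
```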

INDEPENDENT SAMPLES T-TEST In the menu. • Click Independent Samples T-Test. Choose Compare Means. 3 May 1999 72 . click Statistics.

INDEPENDENT SAMPLES T-TEST Select  and move Test Variable(s)  Grouping Variable Click Define Groups. 3 May 1999 73 .

INDEPENDENT SAMPLES T-TEST Enter the values for the groups. 3 May 1999 74 . Click Continue.

3 May 1999 75 .INDEPENDENT SAMPLES T-TEST Click OK to run the procedure.

e i g a e e o a F p d i t r r g i p f e e 7 2 6 2 0 4 3 4 3 a 6 0 4 4 1 2 1 a 3 May 1999 76 .INDEPENDENT SAMPLES T-TEST .OUTPUT i s u f a o n a l e r .
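The t-ratio that this procedure reports can be sketched by hand for the equal-variances case. The example compares birth weights of smoking and non-smoking mothers among the nine cases visible in the sample data. It is an illustration in Python and covers only the pooled-variance version of the test.

```python
from math import sqrt
from statistics import mean, variance

# Birth weights (grams) from the nine visible sample cases, grouped by m_smk
smokers     = [3770, 3742, 2919, 3288, 3883]   # m_smk = 1
non_smokers = [3175, 3175, 4337, 4110]         # m_smk = 0

n1, n2 = len(smokers), len(non_smokers)
df = n1 + n2 - 2

# Pooled variance weights each group's sample variance by its degrees of freedom
pooled_var = ((n1 - 1) * variance(smokers) + (n2 - 1) * variance(non_smokers)) / df
se = sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the mean difference
t = (mean(smokers) - mean(non_smokers)) / se
print(round(t, 3), df)
```

The t-ratio is then compared with the t distribution on df degrees of freedom to obtain the probability value.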

ONE-WAY ANALYSIS OF VARIANCE
• In the menu, click on Statistics.
• Choose Compare Means.
• Click One-Way Analysis of Variance.
• Move the dependent variable(s) to the Dependent List box.
• Move the grouping variable(s) to the Factor box.
• For comparison tests, click Post Hoc.
• Select the appropriate Post Hoc comparisons.
• Click Continue.
• Click Options for:
  - Descriptive statistics
  - Homogeneity of variance
  - Mean plots
  - Missing values options
• Make appropriate selections, then click Continue.
• To run the One-way ANOVA procedure, click OK.

ONE-WAY ANALYSIS OF VARIANCE - OUTPUT
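The quantities in a one-way ANOVA table can be sketched from raw data. The example groups birth weight by family income code for the nine cases visible in the sample data. It is only an illustration of how the sums of squares, degrees of freedom, and F-ratio are built.

```python
from statistics import mean

# Birth weights (grams) from the nine visible sample cases, grouped by fincome code
groups = {
    0: [2919, 3288, 3175, 3883],
    1: [3175, 4110],
    2: [3770, 3742, 4337],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Between groups: group means around the grand mean, weighted by group size
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within groups: each value around its own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(round(ss_between, 2), round(ss_within, 2), round(f_ratio, 3))
```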

WILCOXON SIGNED RANKS TEST
• In the menu, click Statistics
• Choose Nonparametric Tests
• Click 2 Related Samples
• Move selected variable pairs to the Test Pair(s) List box.
• Choose the statistical test(s).
• Click Options...
• Check Descriptives box for descriptive statistics.
• Click OK to run the procedure.

WILCOXON SIGNED RANKS TEST - OUTPUT
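The ranking step behind the Wilcoxon signed ranks test can be sketched as follows. The before and after scores are hypothetical. Zero differences are dropped, the absolute differences are ranked, and the ranks are summed separately for positive and negative differences.

```python
# Hypothetical paired scores, for illustration only
before = [10, 12, 9, 14, 11, 13]
after  = [12, 15, 9, 13, 15, 18]

# Differences for each pair; pairs with no change are dropped
diffs = [a - b for a, b in zip(after, before) if a != b]
abs_sorted = sorted(abs(d) for d in diffs)

def rank_of(value):
    """Average 1-based position of value among the sorted absolute differences."""
    positions = [i for i, v in enumerate(abs_sorted, start=1) if v == value]
    return sum(positions) / len(positions)

w_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)    # sum of positive ranks
w_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)   # sum of negative ranks
print(w_plus, w_minus)
```

The smaller of the two rank sums is the usual test statistic.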

MANN-WHITNEY U TEST
• In the menu, click Statistics
• Choose Nonparametric Tests
• Click 2 Independent Samples
• Select and move:
  - Test Variable(s)
  - Grouping Variable
• Click Define Groups.
• Enter the values for the groups.
• Click Continue.
• Click Options.
• After changing options, click Continue.
• Click OK to run the procedure.

MANN-WHITNEY U TEST - OUTPUT
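The U statistic can be computed by hand from the pooled ranks. The sketch below uses the birth weights of smoking and non-smoking mothers among the nine cases visible in the sample data. The helper function average_ranks is hypothetical and written here only for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

# Birth weights (grams) from the nine visible sample cases, grouped by m_smk
smokers     = [3770, 3742, 2919, 3288, 3883]
non_smokers = [3175, 3175, 4337, 4110]

ranks = average_ranks(smokers + non_smokers)   # ranks in the pooled sample
n1, n2 = len(smokers), len(non_smokers)

r1 = sum(ranks[:n1])                # rank sum of the first group
u1 = r1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1
print(min(u1, u2))
```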

KRUSKAL-WALLIS TEST
• In the menu, click Statistics
• Choose Nonparametric Tests
• Click K Independent Samples
• Move the dependent variable(s) to the Test Variable List box.
• Move the grouping variable(s) to the Grouping Variable box.
• Click Define Range.
• Enter the minimum and maximum values for the Grouping Variable.
• Click Continue.
• Check the box for the Kruskal-Wallis H.
• Click OK to run the procedure.

KRUSKAL-WALLIS TEST - OUTPUT
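The Kruskal-Wallis H statistic is built from the rank sums of each group. The sketch below groups birth weight by family income code for the nine cases visible in the sample data. It uses the basic formula without the correction for ties, and the helper average_ranks is hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

# Birth weights (grams) from the nine visible sample cases, by fincome code 0, 1, 2
groups = [[2919, 3288, 3175, 3883], [3175, 4110], [3770, 3742, 4337]]

pooled = [x for g in groups for x in g]
ranks = average_ranks(pooled)
n = len(pooled)

# H = 12 / (N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), with R_j the rank sum of group j
total = 0.0
start = 0
for g in groups:
    rank_sum = sum(ranks[start:start + len(g)])
    total += rank_sum ** 2 / len(g)
    start += len(g)
h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
print(round(h, 3))
```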

USING SPSS FOR WINDOWS ©
EDITING THE OUTPUT
• Pivot Tables
• Scatterplots
• Charts

SCATTERPLOT
• In the menu, click on Graphs.
• Choose Scatter….
• Choose the appropriate type of plot.
• Click Define.
• Select and move the variables for the X and Y axes to the appropriate box.
• Click OK to run the procedure.

SCATTERPLOT - OUTPUT
[Scatterplot; y-axis: BTWT; x-axis: BMI]
Regression line must be added.

EDIT THE SCATTERPLOT
In the Output Window:
• Click the chart object to select it.
• In the menu, click Edit.
• Choose SPSS Chart Object.
• Click Open.
The Chart Window will open.

EDIT THE SCATTERPLOT
In the Chart Window:
• In the menu, click Chart.
• Click Options.
• Check the Total box.
• Click OK.

SCATTERPLOT - OUTPUT
[Scatterplot; y-axis: BTWT; x-axis: BMI]
Regression line is added.
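The regression line drawn on a scatterplot is normally the least-squares line. As a sketch, its slope and intercept can be computed by hand. The BMI and birth weight pairs below are hypothetical, since the data behind the plot are not given in the slides.

```python
# Hypothetical (BMI, birth weight) pairs, for illustration only
bmi  = [18, 20, 22, 24, 26]
btwt = [2900, 3100, 3200, 3400, 3600]

n = len(bmi)
mean_x = sum(bmi) / n
mean_y = sum(btwt) / n

# Least squares: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bmi, btwt))
sxx = sum((x - mean_x) ** 2 for x in bmi)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
print(slope, intercept)
```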

EXERCISE DATASETS
• Coding and recoding
• Survey about smoking habit
• Test of Difference

STATISTICAL DATA ANALYSIS AND INTERPRETATION
Prepared By: PABLO E. SUBONG, JR., Ed.D., Ph.D.

TABLE 1: NMAT PERFORMANCE OF THE BS BIOLOGY STUDENTS

Category                Description
A. Entire Group         Average (Mean = 1.96, S.D. = 0.60)
B. Gender
   Male                 Average
   Female               Average
C. Type of School
   Private              Average
   Public               Average
D. SES
   High                 Average
   Average              Average
   Low                  Average
E. Mental Ability
   High                 Average
   Average              Average
   Low                  Average

Scale descriptions: High, Average, Low

NMAT PERFORMANCE OF THE BS BIOLOGY STUDENTS
• The NMAT performance of the BS Biology students is presented in Table 1. Generally, the NMAT performance of the BS Biology students is average (M=1.96, s.d.=0.60).
• When they are classified into their gender, type of school, socioeconomic status, and mental ability, the BS Biology students exhibited the same level of NMAT performance, which is average.

TABLE 2: T-TEST RESULTS FOR THE DIFFERENCES IN THE NMAT PERFORMANCE OF THE BS BIOLOGY STUDENTS

Compared Groups       d.f.   Mean    s.d.    t-ratio   t-Prob.
A. Gender             34                     1.782     0.084
   Male                      82.80   16.74
   Female                    71.63   20.92
B. Type of School     34                     0.496     0.623
   Private                   76.22   22.52
   Public                    79.44   15.87

p > 0.05 Not significant at 0.05 alpha

DIFFERENCES IN THE NMAT PERFORMANCE OF THE BS BIOLOGY STUDENTS

The differences in the NMAT performance of the BS Biology students are shown in Table 2. The t-test computations reveal no significant difference in the NMAT performance of the BS Biology students when they are classified by gender, t(34)=1.782, p=0.084. The null hypothesis of no significant difference in NMAT performance between male and female students was therefore not rejected. This shows that male and female BS Biology students performed comparably in the NMAT. Likewise, when they are classified by type of school, students coming from private and public schools performed comparably in the NMAT, t(34)=0.496, p=0.623. This similar performance might be attributed to the fact that public schools nowadays can compete with private schools in terms of the scholastic performance of their students.

TABLE 3-A: ANOVA RESULTS FOR THE DIFFERENCES IN THE NMAT PERFORMANCE OF THE BS BIOLOGY STUDENTS CLASSIFIED AS TO SOCIOECONOMIC STATUS

Sources of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   F-ratio   F-Prob.
Between Groups         2                    1143.17          571.58         1.591     0.219
Within Groups          33                   11855.83         359.27
Total                  35                   12999.00

p > 0.05 Not significant at 0.05 alpha

TABLE 3-B: ANOVA RESULTS FOR THE DIFFERENCES IN THE NMAT PERFORMANCE OF THE BS BIOLOGY STUDENTS CLASSIFIED AS TO THEIR MENTAL ABILITY

Sources of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   F-ratio   F-Prob.
Between Groups         2                    5346.50          2673.25        11.528    0.000
Within Groups          33                   7652.50          231.89
Total                  35                   12999.00

p < 0.05 Significant at 0.05 alpha

TABLE 3-C: POST HOC TEST FOR THE DIFFERENCES IN MEANS IN THE NMAT PERFORMANCE OF BS BIOLOGY STUDENTS CLASSIFIED AS TO MENTAL ABILITY

Mental Ability (compared groups)   Mean Difference   Significance
High vs. Average                   12.00             0.138
High vs. Low                       29.75             0.000
Average vs. Low                    17.75             0.034

p < 0.05 Significant at 0.05 alpha

• ANOVA results revealed no significant differences in the NMAT performance of the BS Biology students when they are classified as to their socioeconomic status, F(2,33)=1.591, p=0.219. This means that BS Biology students with high, average, and low socioeconomic status performed at a similar level in their NMAT.
• But when the BS Biology students are classified by their mental ability, ANOVA results revealed a significant difference in their NMAT performance, F(2,33)=11.528, p=0.000. The results are reflected in Table 3-B.

Pair-wise comparison using the Scheffe Test in Table 3-C showed that BS Biology students with high and average mental ability do not differ significantly in their NMAT performance, but students with high mental ability differ in their NMAT performance from students with low mental ability. Likewise, students with average mental ability differ in their NMAT performance from students with low mental ability.
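The F-ratios quoted above can be re-derived from the sums of squares and degrees of freedom reported in Tables 3-A and 3-B. This is a minimal sketch of that arithmetic. The function name f_ratio is hypothetical, and the within-groups sum of squares for Table 3-A is taken as the total minus the between-groups value.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F-ratio)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Table 3-A (socioeconomic status): within-groups SS = total SS - between-groups SS
ses = f_ratio(1143.17, 2, 12999.00 - 1143.17, 33)

# Table 3-B (mental ability)
ability = f_ratio(5346.50, 2, 7652.50, 33)

print([round(v, 3) for v in ses])
print([round(v, 3) for v in ability])
```

Each mean square is a sum of squares divided by its degrees of freedom, and F is the between-groups mean square divided by the within-groups mean square.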

THANK YOU