
Data Analysis Using SPSS

By

Dr.R.RAVANAN
Associate Professor, Department of Statistics, Presidency College, Chennai 600 005

E-mail: ravananstat@gmail.com Mobile: 98403 75672 / 94442 21627

What is SPSS?
Statistical Package for the Social Sciences: general-purpose statistical software
It consists of three components:
Data Window - data entry and the database (.sav)
Output Window - all output from an SPSS session (.lst)
Syntax Window - command lines (.sps)

Data Entry & Preparation


Data entry
New, or recalled from an existing file (SPSS or non-SPSS format)

Data Definition
Data Manipulation and Variable Development

Data Definition
Purpose:
Give meaning to the numbers so that the output is easy to read

Involves
Data format
Variable name
Value labels
Missing values
Command: Data > Data Definition

Data Manipulation
Recoding
To give new values to old values (especially for reversing negatively worded questions)
To form a nominal variable from continuous data
Variable Development
To form new variables as combinations or functions of old ones
Command: Transform > Recode / Compute
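The two recoding tasks and the "compute" step can be illustrated outside SPSS. This is a minimal Python sketch, not SPSS syntax: it assumes a 1-5 agreement scale, and the function names, cut-off values and category labels are hypothetical.

```python
def reverse_code(score: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Reverse a negatively worded item, e.g. 1<->5 and 2<->4 on a 1-5 scale."""
    return scale_min + scale_max - score


def recode_into_groups(value: float, upper_limits: list[float], labels: list[str]) -> str:
    """Turn a continuous value into a nominal category.

    upper_limits[i] is the inclusive upper bound of labels[i]; values above the
    last limit fall into the final label, so len(labels) == len(upper_limits) + 1.
    """
    for limit, label in zip(upper_limits, labels):
        if value <= limit:
            return label
    return labels[-1]


def compute_index(item_scores: list[float]) -> float:
    """Form a new variable as the mean of several (already recoded) items."""
    return sum(item_scores) / len(item_scores)


if __name__ == "__main__":
    # Hypothetical respondent: item 2 is negatively worded, so reverse it first.
    answers = [4, 2, 5]
    answers[1] = reverse_code(answers[1])
    print(answers, compute_index(answers))
    print(recode_into_groups(42.0, [30.0, 50.0], ["low", "middle", "high"]))
```

Reversing before averaging matters: otherwise a respondent who strongly agrees with positive items and strongly disagrees with a negative one would be pulled towards the middle of the scale.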

Data Analysis - Descriptive


Purpose: To describe each variable - what is the current level of the variable of interest?
Statistics: frequencies, mean, minimum, maximum, standard deviation, quartiles
Command: Analyze > Frequencies / Descriptives
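The same summary statistics can be computed with Python's standard `statistics` module as a quick check. A minimal sketch; `describe` is a hypothetical helper. The quartiles use `statistics.quantiles` with its default "exclusive" method, which places the p-th quantile at position (n + 1)p; I believe this matches the default percentile definition in SPSS Frequencies, but treat that as an assumption.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Mean, minimum, maximum, sample standard deviation and quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # (n + 1)p positions
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample SD, divisor n - 1
        "q1": q1,
        "median": median,
        "q3": q3,
    }


if __name__ == "__main__":
    print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```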



Frequencies for two or more nominal variables: Analyze > Summarize > Crosstabulation

Means of variables by subgroups defined by one or more nominal variables: Analyze > Compare Means > Means (use Layers for additional grouping variables)
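What Crosstabulation and Means compute can be sketched in a few lines of Python. This is an illustration only; the helper names and the sample categories are hypothetical.

```python
from collections import Counter, defaultdict


def crosstab(row_var: list[str], col_var: list[str]) -> Counter:
    """Count each (row category, column category) combination."""
    return Counter(zip(row_var, col_var))


def means_by_group(groups: list[str], values: list[float]) -> dict[str, float]:
    """Mean of a continuous variable within each category of a nominal variable."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


if __name__ == "__main__":
    gender = ["M", "F", "M", "F"]
    owns_car = ["Yes", "Yes", "No", "No"]
    income = [30.0, 42.0, 25.0, 38.0]
    print(crosstab(gender, owns_car))
    print(means_by_group(gender, income))
```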

Parametric Test of Differences


When
The dependent variable is continuous and we want to test differences across groups

Command
Analyze > Compare Means > Independent t-test / Paired t-test / One-way ANOVA

Non-Parametric Test of Differences


When
The dependent variable is ordinal, or the normality assumption is not met

Command
Analyze > Nonparametric Tests > 2 Independent Samples / 2 Related Samples / K Independent Samples / K Related Samples

Parametric Two-Way ANOVA


When
A continuous dependent variable and groups classified by two factors (e.g. varieties and blocks)

Command
Analyze > General Linear Model > Simple
Note: enter the factors as Fixed Factors

Bivariate Relationship
When
To measure the covariation between two variables

Correlation:
When both variables are continuous or ordinal

Command
Analyze > Correlate > Bivariate (choose the Spearman option if both variables are ordinal)
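Spearman's coefficient is simply Pearson's r computed on ranks, with tied values given the average of the ranks they occupy. A minimal Python sketch of that definition; the function names are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        i = j + 1
    return ranks


def pearson_r(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Because only the order matters, any monotonic relationship (for example y doubling with each step of x) gives rho = 1 even when Pearson's r on the raw values is below 1.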

Regression Analysis
When
To establish the relationship between one continuous dependent variable and a number of continuous independent variables

Command
Analyze > Regression > Linear (use the Statistics and Save options)

Issues:
Assumptions of regression: normality of errors; constant variance; independence of the independent variables (no multicollinearity); independence of error terms

Issues (cont.)
Outliers and leverage values
Choice of selection method for the independent variables: Enter, Backward, Forward, Stepwise
Dummy independent variables

Options
Residual analysis, influence statistics, collinearity diagnostics, normality plots

Interpretation
Goodness of fit of the model: R², adjusted R², F statistic, standard error of the estimate
Strength of influence of the independent variables: unstandardized (B) and standardized (Beta) coefficients
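The goodness-of-fit figures in the SPSS Model Summary and ANOVA tables follow from the residual and total sums of squares. A minimal Python sketch, assuming `fitted` holds least-squares fitted values from a model with an intercept and `k` predictors (the F formula below relies on that); the function name is hypothetical.

```python
import math


def regression_fit_summary(y: list[float], fitted: list[float], k: int) -> dict[str, float]:
    """R^2, adjusted R^2, overall F and standard error of the estimate."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))  # residual SS
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)                 # total SS
    r2 = 1 - ss_res / ss_tot
    df_res = n - k - 1
    return {
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * (n - 1) / df_res,
        "F": (r2 / k) / ((1 - r2) / df_res),   # on (k, n - k - 1) df
        "std_error": math.sqrt(ss_res / df_res),
    }
```

Adjusted R² penalizes extra predictors through the (n - 1)/(n - k - 1) factor, which is why it can fall when a weak variable is added even though R² never does.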

Reliability Analysis
When
Before forming a composite index for a variable from a number of items

Command
Analyze > Scale > Reliability Analysis (Statistics option: Descriptives for Item, Scale, and Scale if item deleted)

Interpretation
An alpha value greater than 0.7 is good and one greater than 0.5 is acceptable; delete some items if necessary

Measures of Reliability
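Internal consistency (of items in a scale):
1. Average inter-item correlation: if the average inter-item correlation is greater than 0.6, standardize the items and add them together as an index.
2. Cronbach's alpha, which measures "internal consistency of items in a scale" (Garson, G.D., 1999), and is given by $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ the variance of item $i$ and $s_T^2$ the variance of the total score.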
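Both reliability measures can be computed directly from the item scores. A minimal Python sketch using the usual (unstandardized) alpha formula, $\alpha = \frac{k}{k-1}(1 - \sum s_i^2 / s_T^2)$; the layout `items[i]` = all respondents' scores on item i and the function names are assumptions for illustration.

```python
import itertools
import math
import statistics


def cronbach_alpha(items: list[list[float]]) -> float:
    """items[i] holds every respondent's score on item i."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    sum_item_var = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / statistics.variance(totals))


def _pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def average_inter_item_r(items: list[list[float]]) -> float:
    """Mean of Pearson's r over every pair of items."""
    pairs = list(itertools.combinations(items, 2))
    return sum(_pearson(a, b) for a, b in pairs) / len(pairs)
```

For two items with equal variances and correlation r, alpha reduces to 2r / (1 + r); for example r = 0.5 gives alpha = 2/3.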

Factor Analysis
When
To reduce a large number of variables to a smaller set of underlying dimensions (factors)

Command
Analyze > Data Reduction > Factor (options: Rotation; save factor scores)

Issues
Assumptions: sufficient correlations between the variables (Bartlett's test of sphericity, anti-image correlation matrix, KMO measure of sampling adequacy)
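Bartlett's test checks whether the correlation matrix R differs from an identity matrix, using the standard statistic $\chi^2 = -\left[(n-1) - \frac{2p+5}{6}\right]\ln|R|$ on $p(p-1)/2$ degrees of freedom, where n is the sample size and p the number of variables. A minimal pure-Python sketch (KMO is not shown); the function names are hypothetical.

```python
import math


def determinant(matrix: list[list[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr: list[list[float]], n: int) -> tuple[float, int]:
    """Chi-square statistic and df; raises ValueError if corr is singular."""
    p = len(corr)
    chi2 = -((n - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    return chi2, p * (p - 1) // 2
```

A large chi-square (small significance) means the variables are correlated enough for factor analysis to be worthwhile; an identity correlation matrix gives a statistic of zero.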

Discriminant Analysis
When
The dependent variable is nominal and the purpose is to predict group membership on the basis of the independent variables

Command
Analyze > Classify > Discriminant (options: Classify > Summary table; Select - for holdout and analysis samples)

Issues
Similar to Regression

Interpretation
Goodness of analysis: hit ratio compared with the maximum chance criterion, the proportional chance criterion and Press's Q
Univariate results: to identify the discriminating variables
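The benchmarks for the hit ratio, as commonly given in multivariate texts, are: maximum chance = proportion of the largest group; proportional chance = sum of squared group proportions; Press's $Q = \frac{[N - nK]^2}{N(K-1)}$, with N cases, n correctly classified and K groups, compared with a chi-square critical value on 1 degree of freedom. A minimal Python sketch; the function name is hypothetical.

```python
def classification_criteria(group_sizes: list[int], n_correct: int) -> dict[str, float]:
    """Hit ratio and the usual chance benchmarks for a discriminant analysis."""
    n_total = sum(group_sizes)
    k = len(group_sizes)
    proportions = [size / n_total for size in group_sizes]
    return {
        "hit_ratio": n_correct / n_total,
        "max_chance": max(proportions),
        "proportional_chance": sum(p * p for p in proportions),
        "press_q": (n_total - n_correct * k) ** 2 / (n_total * (k - 1)),
    }


if __name__ == "__main__":
    # Hypothetical holdout sample: groups of 60 and 40, 70 cases classified correctly.
    print(classification_criteria([60, 40], 70))
```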

Exercise 1: t TEST FOR SINGLE MEAN


Problem:
The satisfaction levels of 12 employees in their current job are given below:
Emp No
Satisfaction level

10

11

12

HS

HS

HS

HS

HS

Test whether the level of satisfaction is above the average level at the 1% level of significance.

Solution:
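1. Null Hypothesis: The level of satisfaction of employees is equal to the average level.
2. Alternate Hypothesis: The level of satisfaction of employees is above the average level.
3. Test Statistic: The t test for a single mean is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $n - 1$ degrees of freedom, where $\bar{x}$ is the sample mean, $\mu_0$ the average level tested against, $s$ the sample standard deviation and $n$ the number of employees.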
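As a cross-check on the SPSS One-Sample T Test output, the statistic can be computed directly. A minimal Python sketch, assuming the satisfaction levels have been coded as numbers; `one_sample_t` is a hypothetical name. Because the question asks whether satisfaction is *above* average, compare t with the one-tailed critical value on n - 1 degrees of freedom; if only a two-tailed significance is reported, halve it when t has the hypothesized (positive) sign.

```python
import math
import statistics


def one_sample_t(values: list[float], mu0: float) -> tuple[float, int]:
    """t = (mean - mu0) / (s / sqrt(n)), returned with its n - 1 degrees of freedom."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return (statistics.mean(values) - mu0) / standard_error, n - 1
```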

Exercise 2: t-TEST FOR DIFFERENCE OF TWO MEANS (INDEPENDENT SAMPLES)
Problem: The marks obtained in a test by a group of 9 regular students and another group of 11 part-time students are given below:
Regular / Part-Time: 70 78 75 71 73 59 78 69 62 70 71 62 60 56 69 64 72 72 68 66

Examine whether the marks obtained by regular and part-time students differ significantly at the 5% level of significance.

Solution:
1. Null Hypothesis: There is no significant difference between the average marks obtained by regular and part-time students.
2. Alternate Hypothesis: There is a significant difference between the average marks obtained by regular and part-time students.

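3. Test Statistic: The t test for the difference of two means is $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, with $n_1 + n_2 - 2$ degrees of freedom.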
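A minimal Python sketch of the pooled (equal-variance) two-sample t statistic; `pooled_t` is a hypothetical name. SPSS's Independent-Samples T Test prints this as the "Equal variances assumed" row, alongside an "Equal variances not assumed" (Welch) row and Levene's test for choosing between them.

```python
import math
import statistics


def pooled_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Two-sample t with a pooled variance estimate and n1 + n2 - 2 df."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

Compare |t| with the two-tailed 5% critical value on n1 + n2 - 2 degrees of freedom (18 for this exercise).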

Exercise 3: PAIRED t-TEST FOR DIFFERENCE OF TWO MEANS (DEPENDENT SAMPLES)
Problem: A company arranged an intensive training course for its team of salesmen. A random sample of 10 salesmen was selected, and the value (in '000) of the sales each made in the weeks immediately before and after the course is shown in the following table:
Salesmen Sales Before Sales After 1 12 18 2 3 4 18 5 10 13 6 21 22 7 8 9 8 12 10 14 16

23 5

19 15 17 19

22 15 21

Test whether there is evidence of an increase in mean sales.

Solution:
1. Null Hypothesis: There is no significant difference in mean sales before and after the training course.
2. Alternate Hypothesis: Mean sales after the training course are significantly higher than before it.

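3. Test Statistic: The paired t test for the difference of two means is $t = \frac{\bar{d}}{s_d/\sqrt{n}}$ with $n - 1$ degrees of freedom, where $d$ = sales after - sales before for each salesman, $\bar{d}$ and $s_d$ are the mean and standard deviation of the differences, and $n$ is the number of salesmen.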
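A minimal Python sketch of the paired t statistic; `paired_t` is a hypothetical name. Differences are taken as after minus before, so a positive t points towards an increase in sales, which is the direction the question asks about (use the one-tailed critical value on n - 1 degrees of freedom).

```python
import math
import statistics


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """t = mean(d) / (s_d / sqrt(n)) with d = after - before, on n - 1 df."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1
```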

Exercise 4: F-TEST FOR EQUALITY OF TWO VARIANCES


Problem: The times taken by workers to perform a job by two methods are given below:
Method I: 20 16 26 27 23 22
Method II: 27 33 42 35 32 34 38

Test whether there is any significant difference between the variances of the two time distributions.

Solution:
1. Null Hypothesis: There is no significant difference between the variances of method I and method II with regard to time distribution.
2. Alternate Hypothesis: There is a significant difference between the variances of method I and method II with regard to time distribution.
3. Test Statistic: The F test for equality of variances is $F = \frac{s_1^2}{s_2^2}$, with the larger sample variance in the numerator, on $(n_1 - 1, n_2 - 1)$ degrees of freedom.
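A minimal Python sketch of the variance-ratio F statistic, putting the larger sample variance on top as in the textbook procedure; `variance_ratio` is a hypothetical name.

```python
import statistics


def variance_ratio(x: list[float], y: list[float]) -> tuple[float, tuple[int, int]]:
    """F = larger sample variance / smaller one, with (numerator df, denominator df)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    if vx >= vy:
        return vx / vy, (len(x) - 1, len(y) - 1)
    return vy / vx, (len(y) - 1, len(x) - 1)


if __name__ == "__main__":
    method_1 = [20, 16, 26, 27, 23, 22]
    method_2 = [27, 33, 42, 35, 32, 34, 38]
    print(variance_ratio(method_1, method_2))
```

For the Method I and Method II times, Method II has the larger variance (156/7, about 22.29, against 244/15, about 16.27), so F is about 1.37 on (6, 5) degrees of freedom.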

Exercise 5: ANOVA (ONE WAY CLASSIFICATION)


Problem:

The following table gives the yields of 15 sample plots under three varieties of seed:
Variety A   Variety B   Variety C
20          18          25
20          20          28
23          17          22
16          15          28
20          25          32

Test whether there is a significant difference in the average yields of the three varieties of seed.

1. Null Hypothesis: There is no significant difference between the average yields of the three varieties of seed.

2. Alternate Hypothesis: There is a significant difference between the average yields of the three varieties of seed.
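The one-way ANOVA F ratio compares the between-groups mean square with the within-groups mean square. A minimal Python sketch; `one_way_anova` is a hypothetical name and each inner list holds the yields for one variety.

```python
def one_way_anova(groups: list[list[float]]) -> tuple[float, tuple[int, int]]:
    """F = MS_between / MS_within with (k - 1, N - k) degrees of freedom."""
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, (df_between, df_within)
```

With three varieties and 15 plots the F ratio is referred to the F distribution on (2, 12) degrees of freedom.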

Exercise 6: ANOVA (TWO WAY CLASSIFICATION)


Problem: Perform a two-way ANOVA on the following data and test for differences between varieties as well as between blocks.

Blocks   Variety A   Variety B   Variety C
1        52          43          39
2        56          41          39
3        48          45          41
4        44          38          41

1. Null Hypothesis: There is no significant difference between the mean yields of the varieties, nor between those of the blocks.
2. Alternate Hypothesis: There is a significant difference between the mean yields of the varieties and/or between those of the blocks.
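With one observation per cell, the total sum of squares splits into rows (blocks), columns (varieties) and error, using the correction factor $G^2/N$. A minimal Python sketch; `two_way_anova` is a hypothetical name and the data are the block-by-variety yields of this exercise, with blocks as rows.

```python
def two_way_anova(table: list[list[float]]) -> dict[str, object]:
    """Two-way ANOVA with one observation per cell (rows = blocks, columns = varieties)."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n  # correction factor G^2 / N
    ss_total = sum(v * v for row in table for v in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction
    ss_cols = sum(sum(row[j] for row in table) ** 2 for j in range(c)) / r - correction
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_cols, df_error = r - 1, c - 1, (r - 1) * (c - 1)
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
        "df": (df_rows, df_cols, df_error),
    }


if __name__ == "__main__":
    blocks_by_variety = [
        [52, 43, 39],
        [56, 41, 39],
        [48, 45, 41],
        [44, 38, 41],
    ]
    print(two_way_anova(blocks_by_variety))
```

For this table it gives F of about 9.03 for varieties on (2, 6) degrees of freedom and about 0.92 for blocks on (3, 6).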

Exercise 7: CHI SQUARE TEST FOR GOODNESS OF FIT

Problem: A company keeps records of accidents. During a recent safety review, a random sample of 60 accidents was selected and classified by the day of the week on which they occurred.
Day:               Monday   Tuesday   Wednesday   Thursday   Friday
No. of accidents:  8        12        9           14         17

Test whether there is any evidence that accidents are more likely on some days than on others.

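Solution:
1. Null Hypothesis: Accidents are equally distributed over the days of the week.
2. Alternate Hypothesis: Accidents are not equally distributed over the days of the week.
3. Test Statistic: The chi-square test for goodness of fit is $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$ with $k - 1$ degrees of freedom, where $O_i$ and $E_i$ are the observed and expected numbers of accidents on day $i$ and $k$ is the number of days.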

Exercise 8: CHI SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES
Problem: The following table gives data on the condition of the child and the condition of the home. Test whether the two attributes are independent.

                      Condition of Home
Condition of Child    Clean    Dirty
Clean                 70       50
Fairly clean          80       20
Dirty                 35       45

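Solution:
1. Null Hypothesis: There is no association between the condition of the child and the condition of the home.
2. Alternate Hypothesis: There is an association between the condition of the child and the condition of the home.
3. Test Statistic: The chi-square test for independence of attributes is $\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ with $E_{ij} = \frac{R_i C_j}{N}$, on $(r - 1)(c - 1)$ degrees of freedom, where $R_i$ and $C_j$ are the row and column totals, $N$ the grand total, and $r$, $c$ the numbers of rows and columns.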
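A minimal Python sketch of the contingency-table chi-square, building each expected frequency from the row and column totals; `chi_square_independence` is a hypothetical name. Rows here would be the child's condition and columns the home's condition.

```python
def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Chi-square for an r x c table with E_ij = row_i * col_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df
```

SPSS Crosstabs reports the same value as "Pearson Chi-Square" when the Chi-square statistic is requested.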

Exercise 9: TEST FOR SIGNIFICANCE OF CORRELATION COEFFICIENT
Problem: Find the correlation coefficient between the income and expenditure of families for the following data. Also test whether the correlation coefficient is significant.
Income (in hundreds) / Expenditure (in hundreds): 60 55 58 50 45 40 65 60 56 62 38 45 70 63

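Solution: First find the coefficient of correlation using the formula $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$, where $x$ is income and $y$ is expenditure.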

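1. Null Hypothesis: There is no relationship between the income and expenditure of the family.

2. Alternate Hypothesis: There is a relationship between the income and expenditure of the family.
3. Test Statistic: The t test for the correlation coefficient is $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$ with $n - 2$ degrees of freedom, where $n$ is the number of families.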
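A minimal Python sketch that computes r and the t statistic for testing it; `correlation_t` is a hypothetical name and x, y are paired lists of income and expenditure.

```python
import math


def correlation_t(x: list[float], y: list[float]) -> tuple[float, float, int]:
    """Return (r, t, df) with t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)  # undefined when |r| = 1
    return r, t, n - 2
```

The significance SPSS prints under Analyze > Correlate > Bivariate comes from this same t statistic.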

Exercise 10: REGRESSION ANALYSIS
Problem:


The following table gives the food expenditure, annual income and family size of 10 families. Fit a multiple regression equation of food expenditure on annual family income and family size.

Family   Annual Food Expenditure ('000)   Annual Income ('000)   Family Size (number in family)
1        5.2                              28                     3
2        5.1                              26                     3
3        5.6                              32                     2
4        4.6                              24                     1
5        11.3                             54                     4
6        8.1                              29                     2
7        7.8                              44                     3
8        5.8                              30                     2
9        5.1                              40                     1
10       18.0                             82                     6

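The regression model is $Y = b_0 + b_1 X_1 + b_2 X_2$, where $Y$ is annual food expenditure, $X_1$ is annual income and $X_2$ is family size.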
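The least-squares coefficients solve the normal equations $(X^\top X)\,b = X^\top y$. A minimal pure-Python sketch for two predictors, using Gaussian elimination; the function names are hypothetical, and in practice SPSS (Analyze > Regression > Linear) does this for you.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_two_predictor_regression(y: list[float], x1: list[float], x2: list[float]) -> list[float]:
    """Return [b0, b1, b2] for y = b0 + b1*x1 + b2*x2 by least squares."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yy for r, yy in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


if __name__ == "__main__":
    food = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0]
    income = [28, 26, 32, 24, 54, 29, 44, 30, 40, 82]
    size = [3, 3, 2, 1, 4, 2, 3, 2, 1, 6]
    print(fit_two_predictor_regression(food, income, size))
```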

Non-Parametric Test
One sample test:
Binomial test
Chi-square test for goodness of fit
Kolmogorov-Smirnov one-sample test

Two independent samples:


Fisher exact test
Chi-square test for independence of attributes
Median test
Mann-Whitney U test
Kolmogorov-Smirnov two-sample test

Two dependent samples:
McNemar test
Sign test
Wilcoxon matched-pairs signed-rank test
Walsh test

More than two independent samples


Kruskal-Wallis one-way analysis of variance
Chi-square test for k independent samples
Extension of the median test

More than two dependent samples


Friedman two-way analysis of variance
Cochran Q test

Mann-Whitney U test
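The Mann-Whitney U statistic is $U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$

Where
$n_1$, $n_2$ = number of observations in the two samples
$R_1$ = sum of the ranks of the first sample, ranks being assigned over the combined sample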
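A minimal Python sketch of the U computation: rank the combined sample (ties get average ranks), sum the ranks of the first sample, and apply the formula. The function names are hypothetical. Since $U$ and $n_1 n_2 - U$ carry the same information, the sketch returns the smaller of the two, which is the value usually compared with tables of critical values.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(sample1: list[float], sample2: list[float]) -> float:
    """Smaller of U and n1*n2 - U, with U = n1*n2 + n1(n1+1)/2 - R1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank the combined sample
    r1 = sum(ranks[:n1])                                   # rank sum of sample 1
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u, n1 * n2 - u)
```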

Wilcoxon test
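The Wilcoxon matched-pairs signed-rank test (large-sample form) is $z = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$

Where

T = sum of ranks with the less frequent sign (in general, the smaller of the two sums of like-signed ranks)
n = number of pairs with a non-zero difference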
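A minimal Python sketch of the signed-rank procedure: drop zero differences, rank the absolute differences (ties get average ranks), take T as the smaller like-signed rank sum, then apply the normal approximation. The function names are hypothetical; no correction for ties is applied in the variance, so SPSS's z may differ slightly when ties are present.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before: list[float], after: list[float]) -> tuple[float, float]:
    """Return (T, z) for paired data, with differences taken as after - before."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    sum_positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    sum_negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(sum_positive, sum_negative)
    n = len(diffs)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t, (t - mean_t) / sd_t
```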

Kruskal-Wallis one-way analysis


The Kruskal-Wallis test statistic is $H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$

Where

$R_j$ = sum of the ranks in group $j$
N = total number of observations
$n_j$ = number of observations in group $j$
k = number of groups
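A minimal Python sketch of H: rank all observations together, sum the ranks within each group, and apply the formula. The function names are hypothetical. No correction for ties is applied; SPSS may apply one, so results can differ slightly when ties are present. H is referred to a chi-square distribution on k - 1 degrees of freedom.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: list[list[float]]) -> tuple[float, int]:
    """H = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), with k - 1 df."""
    combined = [v for g in groups for v in g]
    ranks = average_ranks(combined)
    n_total = len(combined)
    weighted_sum = 0.0
    start = 0
    for g in groups:
        r_j = sum(ranks[start:start + len(g)])  # rank sum of this group
        weighted_sum += r_j * r_j / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)
    return h, len(groups) - 1
```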

Friedman Two way analysis


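The Friedman test statistic is $\chi_r^2 = \frac{12}{Nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1)$

Where

$R_j$ = sum of the ranks of item (column) $j$, ranks being assigned within each row


N = number of rows (subjects or blocks)
k = number of items (columns)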
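A minimal Python sketch of the Friedman statistic: rank the k items within each of the N rows, sum the ranks per item, and apply the formula. The function names are hypothetical. No tie correction is applied, so SPSS's value may differ slightly when a row contains ties. The statistic is referred to a chi-square distribution on k - 1 degrees of freedom.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_chi2(table: list[list[float]]) -> tuple[float, int]:
    """table[i][j] = score of row (subject/block) i on item j."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):  # rank within the row
            rank_sums[j] += r
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return stat, k - 1
```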
