SPSS Def + Example - New - 1!1!2011

Data Analysis Using SPSS
By Dr. R. Ravanan
Associate Professor, Department of Statistics, Presidency College, Chennai 600 005
E-mail: ravananstat@gmail.com Mobile: 98403 75672 / 94442 21627
What is SPSS?
Statistical Package for the Social Sciences: a general-purpose statistical software package consisting of three components:
Data Window - data entry and the database (.sav)
Output Window - all output from an SPSS session (.lst)
Syntax Window - command lines (.sps)
Data Entry & Preparation

Data entry - new, or recalled from an existing file (SPSS or non-SPSS)
Data definition
Data manipulation and variable development
Data Definition
Purpose:
To give meaning to the numbers so that the output is easy to read
Involves:
Data format
Variable name
Value labels
Missing values
Command: Data > Data Definition
Data Manipulation
Recoding:
To give new values to old values (especially to reverse negatively worded questions)
To form a nominal variable from continuous data
Variable Development:
To form new variables as combinations or functions of existing ones
Command: Transform > Recode / Compute
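The two Transform operations above can be mimicked outside SPSS to see what they do. The minimal Python sketch below reverse-codes a negatively worded item, computes a composite score and then recodes that score into a nominal variable. It assumes a 1-to-5 Likert scale; the item names, responses and cut points are hypothetical.

```python
# Minimal sketch of Recode and Compute. The 1-5 scale, item names and
# cut points are assumptions for illustration only.
SCALE_MIN, SCALE_MAX = 1, 5

def reverse_code(value):
    """Reverse a response on the scale: 1 -> 5, 2 -> 4, 3 -> 3, ..."""
    return SCALE_MIN + SCALE_MAX - value

def composite_score(responses, negative_items):
    """Average of all items after reversing the negatively worded ones."""
    scores = [reverse_code(v) if item in negative_items else v
              for item, v in responses.items()]
    return sum(scores) / len(scores)

def to_group(score, low_cut=2.5, high_cut=3.5):
    """Recode a continuous score into a nominal Low / Medium / High variable."""
    if score < low_cut:
        return "Low"
    if score < high_cut:
        return "Medium"
    return "High"

# Hypothetical respondent; q2 is the negatively worded item
answers = {"q1": 4, "q2": 2, "q3": 5}
score = composite_score(answers, negative_items={"q2"})
print(score, to_group(score))
```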
Data Analysis - Descriptive

Purpose: To describe each variable - what is the current level of the variable of interest?
Statistics: frequencies, mean, minimum, maximum, standard deviation, quartiles
Command: Analyze > Descriptive Statistics > Frequencies / Descriptives
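As a cross-check on what Frequencies and Descriptives report, the statistics listed above can be computed with Python's standard `statistics` module. This is a sketch on hypothetical scores. Python's default "exclusive" quartile method uses the (n + 1)p position; that is assumed, not guaranteed, to match the quartile definition selected in SPSS.

```python
from collections import Counter
import statistics

def describe(values):
    """Summary statistics named on the slide, for one numeric variable."""
    q1, q2, q3 = statistics.quantiles(values, n=4)   # default 'exclusive' method
    return {
        "frequencies": Counter(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),              # sample SD, divisor n - 1
        "quartiles": (q1, q2, q3),
    }

# Hypothetical scores for one variable
summary = describe([12, 15, 11, 18, 14, 16, 13])
print(summary["mean"], summary["sd"], summary["quartiles"])
```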
Data Analysis - Descriptive

Frequencies for two or more nominal variables: Analyze > Descriptive Statistics > Crosstabs
Means of variables by subgroups defined by one or more nominal variables: Analyze > Compare Means > Means (use Layers for additional grouping variables)
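The two summaries above (cell counts for a crosstabulation and means by subgroup) can be sketched in plain Python to show what each cell contains. The records, variable names and salaries below are hypothetical; adding a second variable to the grouping key plays roughly the role of an extra layer in the Means dialog.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (gender, department, salary in '000)
records = [
    ("M", "Sales", 30), ("F", "Sales", 34), ("F", "HR", 28),
    ("M", "HR", 26), ("F", "Sales", 32), ("M", "Sales", 36),
]

# Crosstabulation: number of cases in each (gender, department) cell
crosstab = Counter((gender, dept) for gender, dept, _ in records)

def group_means(rows, key):
    """Mean salary for each subgroup defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in groups.items()}

by_gender = group_means(records, key=lambda r: r[0])
# Adding the department to the key acts like a second layer
by_gender_dept = group_means(records, key=lambda r: (r[0], r[1]))
print(crosstab)
print(by_gender_dept)
```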
Parametric Test of Differences

When
The dependent variable is continuous and we want to test for differences across groups
Command
Analyze > Compare Means > Independent-Samples T Test / Paired-Samples T Test / One-Way ANOVA
Non-Parametric Test of Differences

When
The dependent variable is ordinal, or the normality assumption is not met
Command
Analyze > Nonparametric Tests > 2 Independent Samples / 2 Related Samples / K Independent Samples / K Related Samples
Parametric Two-Way ANOVA

When
A continuous dependent variable and groups classified by two factors (for example, varieties and blocks)
Command
Analyze > General Linear Model > Univariate (enter both factors as Fixed Factors)
Bivariate Relationship
When
To measure covariation between two variables
Correlation:
When both variables are continuous or ordinal
Command
Analyze > Correlate > Bivariate (with the Spearman option if both are ordinal)
Regression Analysis
When
To establish the relationship between one continuous dependent variable and a number of continuous independent variables
Command
Analyze > Regression > Linear (use the Statistics and Save options)
Issues:
Assumptions of regression: normality of the error terms; constant variance; no strong correlation among the independent variables (no multicollinearity); independence of the error terms
Regression Analysis
Issues (cont.)
Outliers and leverage values
Choice of selection method for the independent variables: Enter, Backward, Forward, Stepwise
Dummy independent variables
Options
Residual Analysis; Influence Statistics, Collinearity Diagnostics, Normality Plots
Regression Analysis
Interpretation
Goodness of the model: R², adjusted R², F statistic, standard error of the estimate
Strength of influence of the independent variables: unstandardized coefficients (B) and standardized coefficients (Beta)
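Two of the quantities named above follow standard formulas: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for n cases and k independent variables, and Beta = B · s_x / s_y, which rescales a slope into standard-deviation units. A minimal Python sketch with hypothetical inputs:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n cases and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def standardized_beta(b, sd_x, sd_y):
    """Standardized coefficient (Beta) from an unstandardized slope B."""
    return b * sd_x / sd_y

# Hypothetical values: R^2 = 0.5 from n = 10 cases and k = 2 predictors
print(adjusted_r2(0.5, n=10, k=2))
# Hypothetical slope B = 2.0 with SD(x) = 3.0 and SD(y) = 12.0
print(standardized_beta(2.0, sd_x=3.0, sd_y=12.0))   # 0.5
```

Because Beta is unit-free, it is the column to compare when asking which independent variable has the stronger influence.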
Reliability Analysis
When
Before forming a composite index for a variable from a number of items
Command
Analyze > Scale > Reliability Analysis (Statistics: Descriptives for Item, Scale, and Scale if item deleted)
Interpretation
An alpha value greater than 0.7 is good and one greater than 0.5 is acceptable; if necessary, delete items to improve alpha
Measures of Reliability
Internal consistency (of the items in a scale):
1. Average inter-item correlation: if the average inter-item correlation is greater than 0.6, standardize the items and add them together as an index.
2. Cronbach's alpha, which measures "internal consistency of items in a scale" (Garson, G.D., 1999) and is
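The formula for alpha does not survive in this copy. A commonly used form is alpha = k/(k - 1) · (1 - Σ s²_i / s²_T), where k is the number of items, s²_i the variance of item i and s²_T the variance of the total score. The sketch below uses that form and also computes the average inter-item correlation from measure 1. The item scores are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cronbach_alpha(items):
    """items: k lists, each holding one item's scores for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def average_inter_item_r(items):
    """Mean of the correlations over all pairs of items."""
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 3-item scale answered by 5 respondents
items = [[3, 4, 5, 2, 4], [2, 4, 5, 3, 4], [3, 5, 4, 2, 5]]
print(cronbach_alpha(items), average_inter_item_r(items))
```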
Factor Analysis
When
To reduce the number of variables to underlying dimensions
Command
Analyze > Data Reduction > Factor (options: rotation, save factor scores)
Issues
Assumption: sufficient correlation among the variables (check with Bartlett's test of sphericity, the anti-image matrix and the KMO measure of sampling adequacy)
Discriminant Analysis
When
The dependent variable is nominal and the purpose is to predict group membership on the basis of the independent variables
Command
Analyze > Classify > Discriminant (options: Classify > Summary table; Select for holdout and analysis samples)
Issues
Similar to those in regression
Discriminant Analysis
Interpretation
Goodness of analysis: hit ratio compared with the maximum chance criterion, the proportional chance criterion and Press's Q
Univariate results: to identify the discriminating variables
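The goodness-of-analysis measures above can all be read off a classification table. Using the usual definitions (hit ratio = correct / N; maximum chance = largest group share; proportional chance = sum of squared group shares; Press's Q = (N - nK)² / (N(K - 1)) with n correct classifications and K groups), a minimal Python sketch is shown below. The confusion table is hypothetical.

```python
def classification_summary(confusion):
    """confusion[i][j] = cases from actual group i predicted into group j."""
    k = len(confusion)
    group_sizes = [sum(row) for row in confusion]
    n_total = sum(group_sizes)
    n_correct = sum(confusion[i][i] for i in range(k))
    shares = [size / n_total for size in group_sizes]
    return {
        "hit_ratio": n_correct / n_total,
        "max_chance": max(shares),
        "proportional_chance": sum(p * p for p in shares),
        "press_q": (n_total - n_correct * k) ** 2 / (n_total * (k - 1)),
    }

# Hypothetical two-group result: 85 of 100 cases correctly classified
print(classification_summary([[40, 10], [5, 45]]))
```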
Exercise 1: t TEST FOR SINGLE MEAN

Problem:
The satisfaction levels of 12 employees with their current job are given below:
[Table: Emp No (1 to 12) and Satisfaction level, with ratings such as HS]
Test whether the level of satisfaction is above the average level at the 1% level of significance.
Solution:
1. Null Hypothesis: The level of satisfaction of employees is equal to the average level.
2. Alternate Hypothesis: The level of satisfaction of employees is not equal to the average level.
3. Test Statistic: t test for single mean is
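The usual single-mean statistic is t = (x̄ - μ) / (s / √n) with n - 1 degrees of freedom, where s is the sample standard deviation (divisor n - 1); SPSS's Analyze > Compare Means > One-Sample T Test takes μ as its "Test Value". The employee ratings are not recoverable here, so the sketch below assumes hypothetical scores coded 1 to 5 and takes μ = 3 as the "average level"; both are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """t = (mean - mu) / (s / sqrt(n)), s with divisor n - 1; df = n - 1."""
    n = len(sample)
    t = (mean(sample) - mu) / (stdev(sample) / sqrt(n))
    return t, n - 1

# Hypothetical satisfaction scores (1 = very low ... 5 = high), test value 3
t, df = one_sample_t([4, 5, 3, 4, 5, 4], mu=3)
print(f"t = {t:.3f}, df = {df}")
```

The computed t is then compared with the tabulated t value for n - 1 degrees of freedom at the chosen level of significance.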
Exercise 2: t-TEST FOR DIFFERENCE OF TWO MEANS (INDEPENDENT SAMPLES)
Problem: The marks obtained by a group of 9 regular students and another group of 11 part-time course students in a test are given below:
Regular Part-Time 70 78 75 71 73 59 78 69 62 70 71 62 60 56 69 64 72 72 68 66
Examine whether the marks obtained by regular and part-time students differ significantly at 5% level of significance.
Solution:
1. Null Hypothesis: There is no significant difference between the average marks obtained by regular and part-time students.
2. Alternate Hypothesis: There is a significant difference between the average marks obtained by regular and part-time students.
3. Test Statistic: t test for difference of two means is
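With equal population variances assumed, the statistic is the pooled t: t = (x̄₁ - x̄₂) / √(s_p²(1/n₁ + 1/n₂)), where s_p² = ((n₁ - 1)s₁² + (n₂ - 1)s₂²)/(n₁ + n₂ - 2) and df = n₁ + n₂ - 2. This corresponds to the "Equal variances assumed" row of SPSS's Independent-Samples T Test output. The sketch uses small hypothetical samples rather than the marks above, whose column layout is not certain in this copy.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t with pooled variance; returns (t, degrees of freedom)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Hypothetical marks for two small independent groups
print(pooled_t([1, 2, 3], [4, 5, 6]))
```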
Exercise 3: PAIRED t TEST FOR DIFFERENCE OF TWO MEANS (DEPENDENT SAMPLES)
Problem: A company arranged an intensive training course for its team of salesmen. A random sample of 10 salesmen was selected, and the value (in '000) of the sales they made in the weeks immediately before and after the course is shown in the following table:
[Table: Salesman (1 to 10), Sales Before, Sales After]
Test whether there is evidence of an increase in mean sales.
Solution:
1. Null Hypothesis: There is no significant difference in mean sales before and after the training course.
2. Alternate Hypothesis: There is a significant difference in mean sales before and after the training course.
3. Test Statistic: Paired t test for difference of two means is
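The paired test works on the differences d = After - Before: t = d̄ / (s_d / √n) with n - 1 degrees of freedom, so a positive t points to an increase. A minimal sketch on hypothetical before/after figures (the salesmen's table is not fully legible here):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t on d = after - before; returns (t, degrees of freedom)."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1

# Hypothetical sales ('000) for four salesmen before and after training
print(paired_t([10, 12, 14, 11], [12, 13, 17, 12]))
```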
Exercise 4: F-TEST FOR EQUALITY OF TWO VARIANCES

Problem: The times taken by workers to perform a job by two methods are given below:
Method I: 20 16 26 27 23 22
Method II: 27 33 42 35 32 34 38
Test whether there is any significant difference between the variances of the two time distributions.
Solution:
1. Null Hypothesis: There is no significant difference between the variances of Method I and Method II with regard to time taken.
2. Alternate Hypothesis: There is a significant difference between the variances of Method I and Method II with regard to time taken.
3. Test Statistic: F test for equality of variances is
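A common convention is F = larger sample variance / smaller sample variance (each with divisor n - 1), referred to the F table with the matching degrees of freedom. The sketch applies this to the Method I and Method II times from the problem. Note that SPSS's Independent-Samples T Test reports Levene's test for equal variances rather than this ratio.

```python
from statistics import variance

method_1 = [20, 16, 26, 27, 23, 22]
method_2 = [27, 33, 42, 35, 32, 34, 38]

def f_ratio(x, y):
    """Larger sample variance over the smaller, with (numerator, denominator) df."""
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        return vx / vy, (len(x) - 1, len(y) - 1)
    return vy / vx, (len(y) - 1, len(x) - 1)

F, df = f_ratio(method_1, method_2)
print(f"F = {F:.2f} with df = {df}")   # F = 1.37 with df = (6, 5)
```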
Exercise 5: ANOVA (ONE WAY CLASSIFICATION)

Problem:
The following table gives the yields of 15 sample plots under three varieties of seed.
Variety A  Variety B  Variety C
20         18         25
20         20         28
23         17         22
16         15         28
20         25         32
Test whether there is a significant difference in the average yield of the three varieties of seed.
1. Null Hypothesis: There is no significant difference between the average yields of the three varieties of seed.
2. Alternate Hypothesis: There is a significant difference between the average yields of the three varieties of seed.
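The one-way ANOVA splits the total variation into between-variety and within-variety sums of squares, and F = (SSB / (k - 1)) / (SSW / (N - k)). The sketch reads the 15 yields in the problem row by row (so Variety A is 20, 20, 23, 16, 20); that reading is an assumption about the original table layout.

```python
from statistics import mean

groups = {
    "A": [20, 20, 23, 16, 20],
    "B": [18, 20, 17, 15, 25],
    "C": [25, 28, 22, 28, 32],
}

def one_way_anova(samples):
    """Return SSB, SSW, F and (df_between, df_within) for a list of samples."""
    all_values = [v for s in samples for v in s]
    grand = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    F = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, F, (df_between, df_within)

ssb, ssw, F, df = one_way_anova(list(groups.values()))
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {F:.2f}, df = {df}")
# SSB = 194.13, SSW = 138.80, F = 8.39, df = (2, 12)
```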
Exercise 6 : ANOVA (TWO WAY CLASSIFICATION)

Problem: Perform a two-way ANOVA and test for differences between varieties as well as between blocks for the following data.
Blocks  Variety A  Variety B  Variety C
1       52         43         39
2       56         41         39
3       48         45         41
4       44         38         41
1. Null Hypothesis: There is no significant difference between the mean yields of varieties, as well as of blocks.
2. Alternate Hypothesis: There is a significant difference between the mean yields of varieties, as well as of blocks.
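With one observation per cell, the two-way ANOVA splits the total sum of squares into rows (blocks), columns (varieties) and error, with (r - 1)(c - 1) error degrees of freedom. The sketch reads the data as four blocks (rows) by three varieties (columns), e.g. block 1 = 52, 43, 39.

```python
from statistics import mean

# rows = blocks 1-4, columns = varieties A, B, C
data = [
    [52, 43, 39],
    [56, 41, 39],
    [48, 45, 41],
    [44, 38, 41],
]

def two_way_anova(table):
    """Two-way ANOVA without replication; returns F(rows), F(columns), dfs."""
    r, c = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_rows = c * sum((mean(row) - grand) ** 2 for row in table)
    ss_cols = r * sum((mean(col) - grand) ** 2 for col in zip(*table))
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_cols, df_error = r - 1, c - 1, (r - 1) * (c - 1)
    ms_error = ss_error / df_error
    F_rows = (ss_rows / df_rows) / ms_error
    F_cols = (ss_cols / df_cols) / ms_error
    return F_rows, F_cols, (df_rows, df_cols, df_error)

F_blocks, F_varieties, dfs = two_way_anova(data)
print(f"F (blocks) = {F_blocks:.2f}, F (varieties) = {F_varieties:.2f}")
# F (blocks) = 0.92, F (varieties) = 9.03
```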
Exercise 7: CHI SQUARE TEST FOR GOODNESS OF FIT
Problem: A company keeps records of accidents. During a recent safety review, a random sample of 60 accidents was selected and classified by the day of the week on which they occurred.
Day              Monday  Tuesday  Wednesday  Thursday  Friday
No of accidents  8       12       9          14        17
Test whether there is any evidence that accidents are more likely on some days than others.
Solution:
1. Null Hypothesis: Accidents are equally distributed over the days of the week.
2. Alternate Hypothesis: Accidents are not equally distributed over the days of the week.
3. Test Statistic: Chi-square test for goodness of fit is
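Under the null hypothesis each of the five days expects 60 / 5 = 12 accidents, and chi² = Σ (O - E)² / E with k - 1 = 4 degrees of freedom. The sketch computes this for the observed counts above; the tabulated 5% value for 4 degrees of freedom is 9.488.

```python
observed = {"Monday": 8, "Tuesday": 12, "Wednesday": 9, "Thursday": 14, "Friday": 17}

def chi_square_gof(counts):
    """Chi-square against equal expected frequencies; returns (chi2, df)."""
    expected = sum(counts) / len(counts)     # equal distribution under H0
    chi2 = sum((o - expected) ** 2 / expected for o in counts)
    return chi2, len(counts) - 1

chi2, df = chi_square_gof(list(observed.values()))
print(f"chi-square = {chi2:.2f}, df = {df}")   # chi-square = 4.50, df = 4
```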
Exercise 8: CHI SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES
Problem: The following table gives data relating to the condition of the child and the condition of the home. Test whether the two attributes are independent.
                     Condition of Home
Condition of Child   Clean   Dirty
Clean                70      50
Fairly clean         80      20
Dirty                35      45
Solution:
1. Null Hypothesis: There is no association between condition of child and condition of home.
2. Alternate Hypothesis: There is an association between condition of child and condition of home.
3. Test Statistic: Chi-square test for independence of attributes is
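Each expected frequency is (row total × column total) / grand total, and chi² = Σ (O - E)² / E with (r - 1)(c - 1) degrees of freedom; SPSS reports the same Pearson chi-square under Crosstabs > Statistics > Chi-square. The sketch assumes the table reads Clean 70/50, Fairly clean 80/20, Dirty 35/45 (child condition by home condition).

```python
# rows: child Clean / Fairly clean / Dirty; columns: home Clean / Dirty
table = [
    [70, 50],
    [80, 20],
    [35, 45],
]

def chi_square_independence(obs):
    """Pearson chi-square for an r x c contingency table; returns (chi2, df)."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(obs):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n    # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(obs) - 1) * (len(obs[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence(table)
print(f"chi-square = {chi2:.2f}, df = {df}")   # chi-square = 25.65, df = 2
```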
Exercise 9: TEST FOR SIGNIFICANCE OF CORRELATION COEFFICIENT
Problem: Find the correlation coefficient between the income and expenditure of the family for the following data. Also test whether the correlation coefficient is significant.
Income (in hundreds)       60  58  45  65  56  38  70
Expenditure (in hundreds)  55  50  40  60  62  45  63
Solution: First find the coefficient of correlation by using the formula
1. Null Hypothesis: There is no relationship between income and expenditure of the family.
2. Alternate Hypothesis: There is a relationship between income and expenditure of the family.
3. Test Statistic: t test for coefficient of correlation is
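Pearson's r is Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² Σ(y - ȳ)²), and its significance is tested with t = r√(n - 2) / √(1 - r²) on n - 2 degrees of freedom. The sketch reads the 14 figures in the problem as seven (income, expenditure) pairs taken row by row, i.e. (60, 55), (58, 50), ..., (70, 63); that pairing is an assumption about the original table.

```python
from math import sqrt
from statistics import mean

income = [60, 58, 45, 65, 56, 38, 70]
expenditure = [55, 50, 40, 60, 62, 45, 63]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2) with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2), n - 2

r = pearson_r(income, expenditure)
t, df = t_for_r(r, len(income))
print(f"r = {r:.3f}, t = {t:.2f}, df = {df}")   # r = 0.830, t = 3.33, df = 5
```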
Exercise 10: REGRESSION ANALYSIS
Problem:

The following table gives the food expenditure, annual income and family size of 10 families. Fit a multiple regression equation of food expenditure on annual family income and family size.
Family  Annual Food Expenditure ('000)  Annual Income ('000)  Family Size (number in family)
1       5.2                             28                    3
2       5.1                             26                    3
3       5.6                             32                    2
4       4.6                             24                    1
5       11.3                            54                    4
6       8.1                             29                    2
7       7.8                             44                    3
8       5.8                             30                    2
9       5.1                             40                    1
10      18.0                            82                    6
The regression model is
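For two independent variables, the least-squares coefficients of Y = b0 + b1·X1 + b2·X2 can be obtained from the normal equations written in deviation sums, which is what the sketch below does in plain Python for the ten families' figures (family 1: 5.2, 28, 3 through family 10: 18.0, 82, 6). The B column of SPSS's Analyze > Regression > Linear output should agree with these coefficients up to rounding.

```python
from statistics import mean

food = [5.2, 5.1, 5.6, 4.6, 11.3, 8.1, 7.8, 5.8, 5.1, 18.0]
income = [28, 26, 32, 24, 54, 29, 44, 30, 40, 82]
size = [3, 3, 2, 1, 4, 2, 3, 2, 1, 6]

def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    ss_total = sum((c - my) ** 2 for c in y)
    r2 = (b1 * s1y + b2 * s2y) / ss_total      # regression SS / total SS
    return b0, b1, b2, r2

b0, b1, b2, r2 = fit_two_predictors(food, income, size)
print(f"Food = {b0:.3f} + {b1:.4f} Income + {b2:.4f} Size, R^2 = {r2:.3f}")
```

With these data the fit is roughly Food ≈ -0.956 + 0.161 × Income + 0.876 × Size, with R² ≈ 0.92.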
Non-Parametric Test
One sample test:
Binomial test
Chi-square test for goodness of fit
Kolmogorov-Smirnov one-sample test
Two independent samples:

Fisher exact test
Chi-square test for independence of attributes
Median test
Mann-Whitney U test
Kolmogorov-Smirnov two-sample test
Non-Parametric Test
Two dependent samples
McNemar test
Sign test
Wilcoxon matched-pairs signed-rank test
Walsh test
More than two independent samples

Kruskal-Wallis one-way analysis of variance
Chi-square test for k independent samples
Extension of the median test
More than two dependent samples

Friedman two-way analysis of variance
Cochran Q test
Mann-Whitney U test
Mann-Whitney U test is
Where
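The formula for U does not survive in this copy. A common form is U₁ = n₁n₂ + n₁(n₁ + 1)/2 - R₁, where R₁ is the rank sum of the first sample in the pooled ranking, with U taken as the smaller of U₁ and n₁n₂ - U₁. The sketch follows that form, giving tied values the average of their ranks; the sample values are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..N, with tied values given the mean of the ranks they share."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def mann_whitney_u(x, y):
    """U statistic for two independent samples (smaller of U1 and U2)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                    # rank sum of the first sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u1, n1 * n2 - u1)

# Hypothetical scores for two independent groups
print(mann_whitney_u([1, 4], [2, 3, 5]))   # 2.0
```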
Wilcoxon test
Wilcoxon test is
Where
T = sum of the ranks with the less frequent sign
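Following the definition of T above, the sketch takes the differences After - Before, drops zero differences, ranks the absolute differences (ties share the mean rank) and sums the ranks carrying the less frequent sign. When both signs occur equally often it returns the smaller rank sum; that tie-breaking rule is an assumption. The data are hypothetical.

```python
def wilcoxon_t(before, after):
    """T = sum of the ranks carrying the less frequent sign; returns (T, n used)."""
    d = [a - b for b, a in zip(before, after) if a != b]    # drop zero differences
    abs_sorted = sorted(abs(v) for v in d)
    # average ranks of |d|, tied values sharing the mean rank
    ranks = [abs_sorted.index(abs(v)) + 1 + (abs_sorted.count(abs(v)) - 1) / 2
             for v in d]
    pos = [r for r, v in zip(ranks, d) if v > 0]
    neg = [r for r, v in zip(ranks, d) if v < 0]
    if len(pos) == len(neg):
        # assumption: with equally frequent signs, use the smaller rank sum
        return min(sum(pos), sum(neg)), len(d)
    rarer = pos if len(pos) < len(neg) else neg
    return sum(rarer), len(d)

print(wilcoxon_t([10, 12, 14, 11, 9], [12, 13, 17, 10, 9]))   # (1.5, 4)
```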
Kruskal-Wallis one-way analysis

Kruskal-Wallis test is
Where
R = sum of the ranks in each group

N = total number of observations
n = number of observations in each group
k = number of groups
Friedman two-way analysis

Friedman test is
Where
R = sum of the ranks for each item

N = number of rows (respondents or blocks), each of which ranks the k items
k = number of items
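A standard form of Friedman's statistic is chi²_r = 12 / (N k (k + 1)) · Σ Rⱼ² - 3N(k + 1) with k - 1 degrees of freedom, where N is taken as the number of rows (respondents or blocks) and each row's k scores are ranked 1 to k. The sketch follows that form; tied scores within a row share the mean rank, and the example rows are hypothetical.

```python
def friedman_chi2(rows):
    """rows: one list per respondent/block holding the scores of the k items."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        ordered = sorted(row)
        for j, v in enumerate(row):
            # rank within the row; tied scores share the mean rank
            rank_sums[j] += ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
    chi2 = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return chi2, k - 1

# Hypothetical ratings of three items by four respondents
print(friedman_chi2([[7, 5, 3], [8, 6, 4], [6, 7, 2], [9, 5, 4]]))
```

When every respondent ranks the items in the same order, the statistic reaches its maximum N(k - 1).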
