Program

Data Analysis using SPSS

PRESENTER

MR VENKAT

SPSS

• Statistical

• Package for

• Social

• Sciences

VERSIONS OF SPSS

• SPSS Ver-1 to Ver-5 : DOS VERSIONS

• SPSS Ver-6 to Ver-15 : WINDOWS VERSIONS

• SPSS-X : For MAINFRAMES (on various operating system platforms)

• SPSS-LAN: For LANs

• Web site: http://www.spss.com

BASIC APPLICATIONS

• Creating data as a spreadsheet

• Generating reports as tables

• Statistical Analysis of Data

• Graphic Presentations

MAIN STEPS IN USING SPSS

• Creating data or Getting data

• Defining data

• Modifying data

• Processing data

– generating tables

– statistical analysis

– generating graphs

Structure of SPSS data file

• Variables (Fields) in columns

• Cases (Respondents) in rows

• A case contains several variables

Data Definition

• Variable Name

• Variable Type

• Field Width

• Decimal Positions

• Variable Label

• Value Labels

• Missing Values

• Column Width

• Alignment

• Scale

Variable Name

• Maximum 8 characters (up to Ver 10)

• First character must be a letter

• Arithmetic operators, special symbols and blank spaces are not permitted

• Two variables cannot have the same name in one data file

Variable Label

• It helps in reading outputs.

• No restriction on characters.

Variable Type

• Numeric (Floating point)

• String (Character / Text)

• Date

• Currency

Value Labels

• They help in reading tables and other outputs.

• For example, the variable “Marital Status” has five values (codes):

– value 1 means “Never Married”

– value 2 means “Currently Married”

– value 3 means “Widow/Widower”

– value 4 means “Divorced”

– value 5 means “Separated”
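
The coding scheme above can be mimicked outside SPSS with a plain lookup table. The following is a minimal Python sketch, not SPSS syntax; the names `MARITAL_STATUS_LABELS` and `label_for` and the sample responses are illustrative assumptions.

```python
# Value labels for the "Marital Status" variable, as listed above.
MARITAL_STATUS_LABELS = {
    1: "Never Married",
    2: "Currently Married",
    3: "Widow/Widower",
    4: "Divorced",
    5: "Separated",
}


def label_for(code):
    """Return the label for a stored code, or the code itself if it has no label."""
    return MARITAL_STATUS_LABELS.get(code, str(code))


# Hypothetical stored codes for four respondents.
responses = [2, 1, 2, 4]
labelled = [label_for(code) for code in responses]
print(labelled)
```

The data file keeps the compact numeric codes; the labels are only applied when output is read.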

Missing Values

• These are values indicating “No Response” or “Not Applicable” in any variable.

• Declaring missing values tells the SPSS package to ignore the cases containing these values during analysis.

• A blank in an Excel or dBase/FoxPro file is treated as a missing value.

• In an SPSS data file, blanks appear as dots (.) denoting that these are missing values.
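
The effect of declaring missing values can be sketched in Python. This is an illustration under stated assumptions, not SPSS behaviour reproduced exactly: the code 99 for “No Response”, the helper `valid_values` and the sample ages are all made up, and `None` stands in for a blank cell.

```python
from statistics import mean

# Assumption: 99 is the declared "No Response" code for the age variable;
# None stands for a blank cell (shown as "." in the SPSS data editor).
USER_MISSING = {99}


def valid_values(values, user_missing=USER_MISSING):
    """Keep only values that are neither blank nor declared missing."""
    return [v for v in values if v is not None and v not in user_missing]


ages = [25, 31, 99, None, 44]  # made-up sample data
valid = valid_values(ages)

print("valid cases:", len(valid))
print("mean age (missing values excluded):", mean(valid))
```

If the code 99 were not declared missing, it would be averaged as a real age and inflate the mean.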

Creating Data directly in SPSS

data” on the menu bar.

• On getting the “SPSS data editor” window, click on “variable view” (right bottom) and start defining the data file, i.e. variable name, variable type, variable label, value labels, missing values etc.

MANIPULATING FILES

• Insert variable

• Sort cases

• Transpose - Interchange rows and columns

• Merge Files - Add cases, Add variables

• Aggregate

• Select cases - Select with “if” condition

• Weight cases - for estimation / projection
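
Three of the operations above (sort cases, select cases, weight cases) can be sketched in plain Python. This is only an illustration of what the operations do; the variable names and the three sample cases are assumptions.

```python
# Made-up cases: one dict per respondent (row), one key per variable (column).
cases = [
    {"id": 1, "age": 34, "sex": "F", "weight": 1.5},
    {"id": 2, "age": 19, "sex": "M", "weight": 0.5},
    {"id": 3, "age": 52, "sex": "F", "weight": 1.0},
]

# Sort cases: ascending by age.
sorted_cases = sorted(cases, key=lambda c: c["age"])

# Select cases: keep only those satisfying an "if" condition (age >= 30).
selected = [c for c in cases if c["age"] >= 30]

# Weight cases: each case contributes in proportion to its weight.
total_weight = sum(c["weight"] for c in cases)
weighted_mean_age = sum(c["age"] * c["weight"] for c in cases) / total_weight

print([c["id"] for c in sorted_cases])
print([c["id"] for c in selected])
print(weighted_mean_age)
```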

VIEW

• Status Bar - process, selection, weight, n of cases

• Tool Bar - for data, syntax, chart, navigator (output)

• Fonts - type, size

• Grid Lines

• Value Labels

Data Modifications

• Compute - create a new variable in an existing data file through an arithmetic expression.

• Recode - reorganize values of a variable.

• Rank cases

• Auto recode

• Create Time Series

• Replace missing values
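
Compute and Recode can be illustrated with a short Python sketch. The variables (`height_cm`, `weight_kg`, `age`), the body mass index expression and the age-group cut points are assumptions chosen for the example.

```python
# Made-up cases with height in cm, weight in kg and age in years.
cases = [
    {"height_cm": 170, "weight_kg": 65, "age": 23},
    {"height_cm": 160, "weight_kg": 80, "age": 47},
    {"height_cm": 180, "weight_kg": 72, "age": 68},
]


def recode_age(age):
    """Recode age into three groups: 1 = under 30, 2 = 30 to 59, 3 = 60 and over."""
    if age < 30:
        return 1
    if age < 60:
        return 2
    return 3


for case in cases:
    # Compute: a new variable defined by an arithmetic expression.
    case["bmi"] = case["weight_kg"] / (case["height_cm"] / 100) ** 2
    # Recode: the values of an existing variable reorganized into a new one.
    case["age_group"] = recode_age(case["age"])

print([(round(c["bmi"], 2), c["age_group"]) for c in cases])
```

Writing the recoded values into a new variable keeps the original ages available for later analysis.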

STATISTICAL PROCEDURES

OLAP Cubes

• Online Analytical Processing Cubes

• Calculates univariate summary statistics within one or more categorical variables

DESCRIPTIVE STATISTICS

– Frequencies - one variable at a time with various univariate statistics.

– Descriptives - univariate statistics.

– Explore - studying behaviour of variables.

– Crosstabs - Two-way, Three-way

– Ratio Statistics
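
What Frequencies and a two-way Crosstabs count can be shown with Python's `collections.Counter`. This is a minimal sketch on made-up data, not the SPSS procedures themselves.

```python
from collections import Counter

# Made-up cases: (sex, marital status code) for six respondents.
cases = [("F", 1), ("M", 2), ("F", 2), ("M", 2), ("F", 1), ("M", 1)]

# Frequencies: counts for one variable at a time.
sex_freq = Counter(sex for sex, _ in cases)

# Crosstabs (two-way): counts for each combination of the two variables.
crosstab = Counter(cases)

print(dict(sex_freq))
for (sex, status), n in sorted(crosstab.items()):
    print(sex, status, n)
```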

MEANS

• Display mean & S.D. by groups.

• One sample t-test.

• Two independent sample t-test.

• Two related samples or paired samples t-test.

• One-way ANalysis Of VAriance (ANOVA) with post-hoc tests.
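
As an illustration of the two independent samples t-test, the t statistic can be computed by hand. The sketch below assumes equal variances in the two groups (pooled variance); the function name `independent_t` and the data are hypothetical, and the significance level would still have to be looked up in a t table or obtained from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom), assuming equal group variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


group_a = [12, 14, 16, 18]  # made-up scores
group_b = [8, 10, 12, 14]
t_stat, df = independent_t(group_a, group_b)
print(round(t_stat, 3), df)
```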

LINEAR REGRESSION

• Methods: Enter, Stepwise, Remove, Backward, Forward.

• Regression Coefficients: Estimate, Standard Error, Standardized coefficients, Significance.

• Residuals: Durbin-Watson test (for autocorrelation)

• Save: Predicted values, Residuals etc.

• Plot: Histogram, Normal Probability plot.

• Others: Multicollinearity diagnosis, partial correlation, R-square change etc.

CORRELATIONS

• Bivariate Correlations.

• Partial Correlations.

• Distances - Similarities and Dissimilarities

CLASSIFY

• K-means Cluster

• Hierarchical Cluster

• Discriminant Analysis

DATA REDUCTION

• Factor Analysis.

• Correspondence Analysis.

• Optimal Scaling - Homals, Princals, Overals.

FACTOR ANALYSIS

• Methods: Principal Components, Principal Axis factoring, Maximum Likelihood etc.

• Criteria: Minimum Eigenvalue, N of factors, Number of Iterations.

• Rotation: Varimax, Quartimax, Equamax, Promax, Oblimin.

• Display: Initial factor matrix, Rotated factor matrix.

• Plot: Scree plot.

SCALES

• Reliability Analysis - Alpha, Split-half, Guttman, Parallel.

• Multi Dimensional Scaling (MDS)

NON-PARAMETRIC TESTS

• Chi-square

• Binomial

• Runs test

• One sample K-S test

• Two independent samples tests

• Several independent samples tests

• Two related samples tests

• Several related samples tests

TIME SERIES ANALYSIS

• Exponential Smoothing.

• Autoregression.

• Auto Regressive Integrated Moving Averages (ARIMA).

• X11ARIMA.

• Seasonal Decomposition.

MULTIPLE RESPONSE ANALYSIS

• Defining sets.

• Frequencies

• Crosstabulation.

CHARTS

• Bar, Line, Area, Pie, Hi-Low

• Pareto Charts, Control Charts (X-bar, R, p, c)

• Box Plot, Error Bar

• Scatter Plot, Histogram, P-P Plot, Q-Q Plot, Sequence Charts

• ROC Curve (Receiver Operating Characteristic)

• Time Series: Autocorrelations, Spectral Plots, Cross-correlations

Types of Data

• Nominal: A variable can be treated as nominal when its values represent categories with no intrinsic ranking; for example, the department of the company in which an employee works.

• Examples of nominal variables include

– region

– zip code

– religious affiliation etc.

Ordinal Data

• A variable can be treated as ordinal when its values represent categories with some intrinsic ranking; for example, levels of service satisfaction from highly dissatisfied to highly satisfied. Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores.

Scale Data

• A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale variables include age in years and income in thousands of dollars.

Data Analysis

• Simple Tabulation and Cross Tabulation

• Univariate and Bivariate Analysis

• Dependent and Independent variables

• First Stage Analysis- Simple Tabulation

• Second Stage Analysis- Cross Tabulation

• The Chi-square test for cross tabulation
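
The chi-square statistic for a cross tabulation compares each observed cell count with the count expected if the two variables were independent (row total × column total / grand total). The following Python sketch shows the arithmetic; the function name `chi_square` and the 2 × 2 table are illustrative assumptions.

```python
def chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Made-up cross tabulation: rows = two groups, columns = two response categories.
observed = [[30, 10], [20, 40]]
stat, df = chi_square(observed)
print(round(stat, 3), df)
```

The statistic is then compared with the chi-square distribution on `df` degrees of freedom to judge significance.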

ANOVA and the Design of Experiments

• The analysis of variance technique is used when the independent variables are of nominal scale (categorical) and the dependent variable is metric.

• The independent variable could be different price levels, different pack sizes, or different product colors, and the dependent variable could be sales of the product.
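
For a single categorical factor, the F statistic of a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below works this out in Python; the function name `one_way_anova` and the sales figures for three price levels are made up for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up sales at three price levels (the categorical independent variable).
sales_by_price = [[10, 12, 14], [15, 17, 19], [20, 22, 24]]
f_stat, df_between, df_within = one_way_anova(sales_by_price)
print(f_stat, df_between, df_within)
```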

Experimental Designs

• Completely Randomized Design in a one-way ANOVA (single factor)

• Randomized Block Design (single blocking factor)

• Latin Square Design (two blocking factors)

• Factorial design with two or more factors.

Correlation and Regression

• Correlation Analysis - to measure the degree of association between two sets of quantitative data, e.g. how sales of product A are correlated with sales of product B.

• Regression Analysis - to explain the variation in one variable based on the variation in one or more other variables.
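
The degree of association between two quantitative variables is usually measured with Pearson's correlation coefficient r. A minimal Python sketch follows; the function name `pearson_r` and the two sales series are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Made-up sales of product A and product B in five periods.
sales_a = [1, 2, 3, 4, 5]
sales_b = [2, 4, 5, 4, 5]
r = pearson_r(sales_a, sales_b)
print(round(r, 4))
```

A value of r near +1 or -1 indicates a strong linear association; a value near 0 indicates little or none.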

Regression

• Basically two approaches:

• 1. Hit and trial approach (stepwise regression) - exploratory research

• 2. A preconceived approach

• The output consists of the beta coefficients for all the independent variables in the model. The output also gives the result of a t-test for significance of each variable in the model, and the result of an F-test for the model as a whole.

• The coefficient of determination R² is the proportion of the total variance in y explained by all independent variables in the regression equation.
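
For the simplest case of one independent variable, the regression coefficients and R² can be computed directly. The sketch below is an illustration only (the function name `simple_regression` and the data are assumptions); SPSS fits models with several independent variables in the same least-squares sense.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (unexplained variation / total variation in y).
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_residual / ss_total


# Made-up data: x = independent variable, y = dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope, r_squared = simple_regression(x, y)
print(round(intercept, 3), round(slope, 3), round(r_squared, 3))
```

Here R² = 0.6 means that 60% of the variation in y is explained by x.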

Problem

• A manufacturer and marketer of electric motors would like to build a regression model consisting of 5 or 6 independent variables, to predict sales. Past data has been collected for 15 sales territories, on sales and 6 independent variables. Build a regression model and recommend whether or not it should be used by the company.

Dependent Variable

Y = Sales in Rs. Lakh in the territory

Independent Variables

X1 = Market potential in the territory

X2 = No. of dealers of the company in the territory

X3 = No. of sales people in the territory

X4 = Index of competitor activity on a 5-point scale (1 = low, 5 = high)

X5 = No. of service people in the territory

X6 = No. of existing customers in the territory

Factor Analysis

• For Data reduction

• There are two stages in Factor Analysis:

– Factor extraction process

– Rotation of principal components

ANY QUESTIONS PLEASE?

THANK YOU

