
Management Development Program
Data Analysis using SPSS

PRESENTER
MR VENKAT
SPSS
• Statistical Package for the Social Sciences
VERSIONS OF SPSS
• SPSS Ver-1 to Ver-5: DOS versions
• SPSS Ver-6 to Ver-15: Windows versions
• SPSS-X: for mainframes (on various operating system platforms)
• SPSS-LAN: for LANs
• Web site: http://www.spss.com
BASIC APPLICATIONS
• Creating data as a spreadsheet
• Generating reports as tables
• Statistical Analysis of Data
• Graphic Presentations
MAIN STEPS IN USING SPSS
• Creating data or Getting data
• Defining data
• Modifying data
• Processing data
– generating tables
– statistical analysis
– generating graphs
Structure of SPSS data file
• Variables (Fields) in columns
• Cases (Respondents) in rows
• A case contains several variables
Data Definition
• Variable Name
• Variable Type
• Field Width
• Decimal Positions
• Variable Label
• Value Labels
• Missing Values
• Column Width
• Alignment
• Scale
Variable Name
• Maximum 8 characters (up to Ver 10)
• First character must be a letter
• Arithmetic operators, special symbols and blank spaces are not permitted
• Two variables cannot have the same name in one data file
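The naming rules above can be checked mechanically. The sketch below is a hypothetical helper, not part of SPSS; it assumes that only letters, digits and the underscore may follow the first letter (real SPSS versions accept a few more symbols and also reserve keywords such as ALL, AND, BY, TO and WITH), and it compares names case-insensitively.

```python
import re

# Assumption: a letter followed by up to 7 letters, digits or underscores.
# Real SPSS also allows a few other symbols and reserves certain keywords.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def is_valid_spss_name(name: str) -> bool:
    """Apply the slide's rules: max 8 characters, first character a letter,
    no arithmetic operators, special symbols or blanks."""
    return NAME_PATTERN.fullmatch(name) is not None


def find_duplicate_names(names: list[str]) -> set[str]:
    """Return names used more than once, compared case-insensitively."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for name in names:
        key = name.upper()
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


print(is_valid_spss_name("income"))    # True
print(is_valid_spss_name("1stvisit"))  # False: starts with a digit
print(is_valid_spss_name("net pay"))   # False: contains a blank
print(find_duplicate_names(["age", "AGE", "sex"]))  # {'AGE'}
```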
Variable Label
• It makes output easier to read.
• No restriction on characters.
Variable Type
• Numeric (Floating point)
• String (Character / Text)
• Date
• Currency
Value Labels
• It helps in reading tables and other output.
• For example, the variable “Marital Status” has five values (codes):
– value 1 means “Never Married”
– value 2 means “Currently Married”
– value 3 means “Widow/Widower”
– value 4 means “Divorced”
– value 5 means “Separated”
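To illustrate what value labels do, the sketch below stores the marital-status codes from the slide in a dictionary and reports counts by label instead of by bare code. The function name `labelled_counts` and the sample responses are hypothetical; SPSS applies value labels internally rather than through code like this.

```python
from collections import Counter

# Value labels for the "Marital Status" codes listed above.
MARITAL_LABELS = {
    1: "Never Married",
    2: "Currently Married",
    3: "Widow/Widower",
    4: "Divorced",
    5: "Separated",
}


def labelled_counts(codes, labels):
    """Count each code and key the result by its label (falls back to the
    code itself if no label has been defined)."""
    counts = Counter(codes)
    return {labels.get(code, str(code)): counts[code] for code in sorted(counts)}


# Hypothetical coded responses as stored in the data file.
responses = [2, 1, 2, 5, 3, 2, 1]
print(labelled_counts(responses, MARITAL_LABELS))
```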
Missing Values
• These are values indicating “No Response” or “Not Applicable” for a variable.
• Declaring missing values tells SPSS to ignore the cases containing these values during analysis.
• A blank cell in an Excel or dBase/FoxPro file is treated as a missing value.
• In an SPSS data file, blanks appear as dots (.), denoting missing values.
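The effect of declaring missing values can be sketched as a filter applied before any statistic is computed. This is a minimal illustration, assuming hypothetical codes 8 (“Not Applicable”) and 9 (“No Response”) and representing blanks as `None`; it is not how SPSS stores data internally.

```python
from statistics import mean

# Hypothetical declared missing codes: 8 = "Not Applicable", 9 = "No Response".
MISSING_CODES = {8, 9}


def valid_values(values, missing_codes=MISSING_CODES):
    """Drop blanks (None) and declared missing codes before analysis."""
    return [v for v in values if v is not None and v not in missing_codes]


# Hypothetical satisfaction ratings on a 1-5 scale.
ratings = [4, 5, 9, 3, None, 8, 4]
clean = valid_values(ratings)
print(clean)        # [4, 5, 3, 4]
print(mean(clean))  # the mean of the four valid ratings only
```

Without the filter, the codes 8 and 9 would be averaged in as if they were genuine ratings.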
Creating Data directly in SPSS

• After opening SPSS, click on “File - New - Data” on the menu bar.
• When the “SPSS Data Editor” window appears, click on the “Variable View” tab (bottom left) and start defining the data file, i.e. variable name, variable type, variable label, value labels, missing values etc.
MANIPULATING FILES
• Insert variable
• Sort cases
• Transpose - interchange rows and columns
• Merge Files - Add cases, Add variables
• Aggregate
• Select cases - select with an “if” condition
• Weight cases - for estimation / projection
VIEW
• Status Bar - process, selection, weight, n of cases
• Tool Bar - for data, syntax, chart, navigator (output)
• Fonts - type, size
• Grid Lines
• Value Labels
Data Modifications
• Compute - create a new variable in the existing data file from an arithmetic expression.
• Recode - reassign the values of a variable (into the same or a different variable).
• Rank cases
• Auto recode
• Create Time Series
• Replace missing values
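Compute and Recode can be pictured as row-by-row operations on the cases. The sketch below uses hypothetical variables (`income`, `hhsize`, `age`) and hypothetical age bands; it mimics what the two procedures do, not their SPSS syntax.

```python
# Hypothetical cases: each dict is one row (case) of the data file.
cases = [
    {"id": 1, "income": 42.0, "hhsize": 3, "age": 23},
    {"id": 2, "income": 65.0, "hhsize": 5, "age": 47},
    {"id": 3, "income": 18.0, "hhsize": 2, "age": 68},
]

# Compute: a new variable from an arithmetic expression (per-capita income).
for case in cases:
    case["pcincome"] = case["income"] / case["hhsize"]


def recode_age(age):
    """Recode into a different variable with hypothetical bands:
    1 = under 30, 2 = 30-59, 3 = 60 and over."""
    if age < 30:
        return 1
    if age < 60:
        return 2
    return 3


for case in cases:
    case["ageband"] = recode_age(case["age"])

for case in cases:
    print(case["id"], case["pcincome"], case["ageband"])
```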
STATISTICAL PROCEDURES
OLAP Cubes
• Online Analytical Processing cubes
• Calculates univariate summary statistics for scale variables within categories of one or more categorical grouping variables
DESCRIPTIVE STATISTICS
– Frequencies - one variable at a time, with various univariate statistics.
– Descriptives - univariate summary statistics for several variables.
– Explore - examining the distribution and behaviour of variables.
– Crosstabs - two-way and three-way tables.
– Ratio Statistics
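The core of Frequencies and Crosstabs is counting. The sketch below is a hypothetical illustration with made-up respondents (gender and purchase decision); the SPSS procedures add further statistics and formatted tables on top of these counts.

```python
from collections import Counter

# Hypothetical cases: (gender, purchase decision) for each respondent.
cases = [("M", "Yes"), ("F", "No"), ("F", "Yes"), ("M", "Yes"), ("F", "Yes")]


def frequencies(values):
    """Frequency and percent for each category of one variable."""
    counts = Counter(values)
    total = len(values)
    return {k: (n, 100 * n / total) for k, n in sorted(counts.items())}


def crosstab(rows, cols):
    """Two-way table of counts: {row category: {column category: count}}."""
    pairs = Counter(zip(rows, cols))
    col_cats = sorted(set(cols))
    return {r: {c: pairs[(r, c)] for c in col_cats} for r in sorted(set(rows))}


gender = [g for g, _ in cases]
bought = [b for _, b in cases]
print(frequencies(gender))       # {'F': (3, 60.0), 'M': (2, 40.0)}
print(crosstab(gender, bought))  # {'F': {'No': 1, 'Yes': 2}, 'M': {'No': 0, 'Yes': 2}}
```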
MEANS
• Display mean & S.D. by groups.
• One sample t-test.
• Two independent sample t-test.
• Two related samples (paired samples) t-test.
• One-way ANalysis Of VAriance (ANOVA) with post-hoc tests.
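For the two independent sample t-test, the test statistic under the equal-variances assumption uses a pooled variance. The sketch below computes only t and its degrees of freedom; the p-value needs the t distribution, which the Python standard library does not provide. The sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Independent-samples t statistic assuming equal variances.
    Returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2


# Hypothetical sales under two different promotions.
promo_a = [12, 14, 11, 15, 13]
promo_b = [10, 9, 11, 8, 12]
t, df = pooled_t(promo_a, promo_b)
print(round(t, 3), df)  # 3.0 8
```

SPSS also reports a version that does not assume equal variances; this sketch covers only the pooled case.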
LINEAR REGRESSION
• Methods: Enter, Stepwise, Remove, Backward, Forward.
• Regression Coefficients: Estimates, Standard Errors, Standardized coefficients, Significance.
• Residuals: Durbin-Watson test (for autocorrelation).
• Save: Predicted values, Residuals etc.
• Plots: Histogram, Normal Probability plot.
• Others: Multicollinearity diagnostics, partial correlation, R-square change etc.
CORRELATIONS
• Bivariate Correlations.
• Partial Correlations.
• Distances - Similarities and Dissimilarities
CLASSIFY
• K-means Cluster
• Hierarchical Cluster
• Discriminant Analysis
DATA REDUCTION
• Factor Analysis.
• Correspondence Analysis.
• Optimal Scaling - HOMALS, PRINCALS, OVERALS.
FACTOR ANALYSIS
• Methods: Principal Components, Principal Axis Factoring, Maximum Likelihood etc.
• Criteria: Minimum eigenvalue, number of factors, maximum number of iterations.
• Rotation: Varimax, Quartimax, Equamax, Promax, Oblimin.
• Display: Initial factor matrix, rotated factor matrix.
• Plot: Scree plot.
SCALES
• Reliability Analysis - Alpha, Split-half, Guttman, Parallel.
• Multi Dimensional Scaling (MDS)
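The “Alpha” model in Reliability Analysis is Cronbach's alpha, defined as

alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores)

for k items. The sketch below computes it for a small hypothetical three-item scale answered by four respondents.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of item columns, each a list of scores (one per respondent)."""
    k = len(items)
    # Total scale score for each respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical 5-point ratings: three items, four respondents.
item1 = [4, 3, 5, 2]
item2 = [5, 3, 5, 3]
item3 = [4, 2, 5, 3]
print(round(cronbach_alpha([item1, item2, item3]), 3))  # 0.947
```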
NON-PARAMETRIC TESTS
• Chi-square
• Binomial
• Runs test
• One sample K-S test
• Two independent samples tests
• Several independent samples tests
• Two related samples tests
• Several related samples tests
TIME SERIES ANALYSIS

• Exponential Smoothing.
• Autoregression.
• AutoRegressive Integrated Moving Average (ARIMA).
• X11ARIMA.
• Seasonal Decomposition.
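Simple exponential smoothing updates a level with the recurrence level_t = alpha × y_t + (1 - alpha) × level_(t-1), and the latest level serves as the forecast for the next period. The sketch below assumes the level is initialised with the first observation; SPSS offers other initialisations and trend/seasonal variants not shown here. The series is hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the smoothed level for each period.
    The level at period t serves as the forecast for period t + 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: initialise with the first observation
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


monthly_sales = [10, 12, 11, 15]  # hypothetical series
print(exponential_smoothing(monthly_sales, alpha=0.5))  # [10, 11.0, 11.0, 13.0]
```

A larger alpha makes the level react faster to recent observations; a smaller alpha smooths more heavily.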
MULTIPLE RESPONSE ANALYSIS
• Defining sets.
• Frequencies
• Crosstabulation.
CHARTS
• Bar, Line, Area, Pie, Hi-Low
• Pareto Charts, Control Charts (X-bar, R, p, c)
• Box Plot, Error Bar
• Scatter Plot, Histogram, P-P Plot, Q-Q Plot, Sequence Charts
• ROC Curve (Receiver Operating Characteristic)
• Time Series: Autocorrelations, Spectral Plots, Cross-correlations
Types of Data
• Nominal: A variable can be treated as
nominal when its values represent
categories with no intrinsic ranking; for
example, the department of the company
in which an employee works.
• Examples of nominal variables include:
– region
– zip code
– religious affiliation etc.
Ordinal Data
• A variable can be treated as ordinal when
its values represent categories with some
intrinsic ranking; for example, levels of
service satisfaction from highly dissatisfied
to highly satisfied. Examples of ordinal
variables include attitude scores
representing degree of satisfaction or
confidence and preference rating scores.
Scale Data
• A variable can be treated as scale when
its values represent ordered categories
with a meaningful metric, so that distance
comparisons between values are
appropriate. Examples of scale variables
include age in years and income in
thousands of dollars.
Data Analysis
• Simple Tabulation and Cross Tabulation
• Univariate and Bivariate Analysis
• Dependent and Independent variables
• First Stage Analysis: Simple Tabulation
• Second Stage Analysis: Cross Tabulation
• The Chi-square test for cross tabulation
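The Chi-square test for a cross tabulation compares each observed count with the count expected under independence, expected = row total × column total / grand total, and sums (observed - expected)² / expected over all cells, with (rows - 1) × (columns - 1) degrees of freedom. The sketch below computes the statistic for a hypothetical 2×2 table; turning it into a p-value needs the chi-square distribution, which the standard library lacks.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical table: rows = region (North, South), columns = bought (Yes, No).
observed = [[20, 30], [30, 20]]
print(chi_square(observed))  # (4.0, 1)
```

Here 4.0 exceeds 3.841, the 5% critical value for 1 degree of freedom, so independence would be rejected at that level.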
ANOVA and the Design of Experiments
• The analysis of variance technique is used
when the independent variables are of
nominal scale (categorical) and the
dependent variable is metric.
• The independent variable could be different price levels, different pack sizes or different product colors, and the dependent variable could be sales of the product.
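For the price example above, a one-way ANOVA splits the variation in sales into a between-groups part and a within-groups part; F = (SS_between / (k - 1)) / (SS_within / (N - k)) for k groups and N observations. The sketch below computes F for hypothetical sales at three price levels; the p-value would need the F distribution, which is not in the standard library.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    grand = mean(all_obs)
    k, n = len(groups), len(all_obs)
    # Between groups: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within groups: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Hypothetical unit sales at low, medium and high price levels.
low, medium, high = [20, 22, 24], [18, 20, 22], [12, 14, 16]
f, df_b, df_w = one_way_anova([low, medium, high])
print(round(f, 3), df_b, df_w)  # 13.0 2 6
```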
Experimental Designs
• Completely Randomized Design: one-way ANOVA (single factor)
• Randomized Block Design (single blocking factor)
• Latin Square Design (two blocking factors)
• Factorial design with two or more factors.
Correlation and Regression
• Correlation Analysis - to measure the degree of association between two sets of quantitative data, e.g. how sales of product A are correlated with sales of product B.
• Regression Analysis - to explain the variation in one variable based on the variation in one or more other variables.
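Bivariate (Pearson) correlation can be computed directly from deviations around the means: r = Sxy / sqrt(Sxx × Syy). The sketch below applies it to hypothetical monthly sales of two products.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical monthly sales of product A and product B.
sales_a = [10, 12, 14, 16, 18]
sales_b = [20, 23, 27, 30, 35]
print(round(pearson_r(sales_a, sales_b), 3))  # 0.996
```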
Regression
• Basically two approaches:
– 1. Trial-and-error approach (stepwise regression): exploratory research
– 2. A preconceived approach
• The output consists of the beta coefficients for all the independent variables in the model. The output also gives the result of a t-test for the significance of each variable in the model, and the result of an F-test for the model as a whole.
• The coefficient of determination R² is the proportion of the total variance in y explained by all the independent variables in the regression equation.
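The sketch below shows what sits behind the coefficients and R² of a regression with all variables entered at once (the “Enter” method, not stepwise). It is a minimal illustration that solves the normal equations by Gaussian elimination and computes R² = 1 - SS_residual / SS_total; the helper names are hypothetical, and it omits the t-tests, F-test and diagnostics that SPSS reports.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, y):
    """xs: one row of predictor values per case. Returns (coefficients, R^2),
    where coefficients[0] is the intercept."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical data: one predictor (advertising) and sales.
beta, r2 = ols([(1,), (2,), (3,), (4,)], [2, 4, 5, 8])
print([round(b, 3) for b in beta], round(r2, 3))
```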
Problem
• A manufacturer and marketer of electric motors would like to build a regression model with 5 or 6 independent variables to predict sales. Past data on sales and 6 independent variables has been collected for 15 sales territories. Build a regression model and recommend whether or not the company should use it.
Dependent Variable
Y = Sales in the territory (Rs. lakh)
Independent Variables
X1 = Market potential in the territory
X2 = No. of dealers of the company in the territory
X3 = No. of sales people in the territory
X4 = Index of competitor activity on a 5-point scale (1 = low, 5 = high)
X5 = No. of service people in the territory
X6 = No. of existing customers in the territory
Factor Analysis
• For data reduction
• There are two stages in factor analysis:
– Factor extraction
– Rotation of the extracted factors (principal components)
ANY QUESTIONS PLEASE?
THANK YOU