
Introduction to SPSS-Part 1

Vignes Gopal Krishna


Fast track PhD student, SLAI fellow, and
Research Assistant
University of Malaya

SPSS
Statistical Package/Product for the Social
Sciences (Economics, Sociology, Population
Studies, etc.). Subjects: People/Society
Statistical Package/Product for the
Sciences (SPS) (Health Sciences,
Neurosciences, Medical Sciences,
Economics, Sociology, etc.). Subjects:
People/Society/Patients/Animals/Neurons

SPSS = Rows X Columns X Cells (RCC)

Rows = Subjects, Columns = Variables, Cells =
Values/Statements
SPSS = Main Inputs (Data View and Variable View) X Outputs (Results)
Additional inputs (Scripts & Syntax)
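
The Rows X Columns X Cells idea can be sketched outside SPSS. The snippet below is only an illustration in Python, not SPSS syntax; the variable names and values are invented for the example.

```python
# Hypothetical data set: each row is a subject, each column a variable,
# and each cell holds one value (names and values are invented).
data_view = [
    {"id": 1, "gender": 0, "age": 34},
    {"id": 2, "gender": 1, "age": 29},
    {"id": 3, "gender": 1, "age": 41},
]

n_rows = len(data_view)          # number of subjects
n_columns = len(data_view[0])    # number of variables
n_cells = n_rows * n_columns     # rows x columns

print(n_rows, n_columns, n_cells)
```
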
Advantages
Deals with the process of quantifying qualitative data
Numerical presentation of qualitative data
(Descriptive and Inferential Statistics)
Deals with both parametric and non-parametric
approaches
Deals with Cross Sectional Data, Time Series Data,
and Panel Data

SPSS Layout

[Figure: the SPSS window, with labels marking the Menus, Icons, Rows, Columns, and Cells]

SPSS Multi-dimensional Matrix


Will you be able to find the number
of rows and columns?

Data View
Variable View

Disadvantages
Doesn't handle advanced modelling
and quantitative techniques (not possible through the
menus)
Doesn't handle advanced
data types (not possible through the menus)
Common measurement
(a) Categorical variable (CAV): Nominal & Ordinal
(b) Continuous variable (COV): Scale (Ratio & Interval)
(c) String: qualitative statements (not important in
SPSS; handled by NVivo, QDA Miner, Dedoose, ATLAS.ti,
etc.)

Classification variable = a partial element of the
categorical variable.
Classification variable: a variable that is used to
classify qualitative arguments/statements =
variable by categories (Categorical variable) +
variable by statements (Non-Categorical
variable)
Categorical variable
(a) Dichotomous variable (Binomial): 2 values,
NO/OR, Independent & Dependent samples
(b) Polychotomous variable (Multinomial): >2
values, NO/OR, Independent & Dependent
samples

Categorical variable
(a) Constant and fixed
(b) Separated by categories
(c) Gradual change = 0, static
(d) Nominal (no order) and Ordinal
(order/rank)
Continuous variables
(a) Not constant and fixed
(b) Separated by ratios and intervals
(c) Gradual change != 0, dynamic

Types of Variables
(a) Binary (bi + nary) variable = 2 groups coded 0 and 1. Examples: Gender (0=Male, 1=Female),
Case and Control (0=Healthy, 1=Disease), Fluctuations (0=Increase, 1=Decrease).
(b) Dichotomous variable = 2 groups (can be any 2 values).
Examples: Gender (2=Male, 3=Female), Case and Control (0=Before Treatment, 1=Present
Treatment)
(c) Independent variable = stand-alone variable; Cor(x1, x2, x3) = 0; Predictor/Regressor/Indicator
(d) Dependent variable = relies on factors; Cor(y, x1, x2) != 0; Predictand/Regressand/Outcome
(e) Confounding variable = distorts the effects of one variable on another; expanding matching
reduces the effects of confounding.
(f) Control variable = controls the effects of the IV on the DV.
(g) Controlled variable = another term for Independent Variable (IV)
(h) Instrumental variable = variable that has zero correlation with residuals/error terms but is
correlated with the explanatory variable it instruments (and, through it, with the dependent variable)
(i) Criterion variable = a variable that is the presumed effect; used in non-experimental research
(j) Discrete variable = a variable that takes distinct values
(k) Dummy variable = similar to a binary variable; a classification variable
(l) Endogenous variable = inside the system; influenced by variables that enter the
system.
(m) Exogenous variable = outside the system; enters the system and influences the endogenous
variable
(n) Interval variable = a form of scale variable
(o) Ratio variable = a form of scale variable
(p) Intervening variable = intervenes in the association between the main variables; moderating and
mediating variables
(q) Mediating variable = indirect effect on the association between the main variables
(r) Moderating variable = indirect effect through interaction effects between related variables

(s) Polychotomous variable = takes more than 2
values/groups
(t) Manifest variable = indicator variable that can
indicate the presence of a latent variable
(u) Latent variable = variable that cannot be measured
directly; it has to depend on manifest variables.
(v) Manipulated variable = similar to IV
(w) Outcome variable = similar to DV; presumed effect
(x) Predictor variable = similar to IV; presumed cause
(y) Nominal variable = takes any value; doesn't
follow orders/ranks
(z) Ordinal variable = takes values based on
orders/ranks.
* Treatment variable = similar to IV
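
The difference between a dichotomous coding (any 2 values) and a binary/dummy coding (0 and 1) can be sketched in a few lines. This is an illustrative Python snippet, not SPSS syntax; the list of responses is invented, and the codes follow the gender example (2=Male, 3=Female recoded to 0=Male, 1=Female).

```python
# Hypothetical dichotomous coding from the gender example: 2=Male, 3=Female.
gender_dichotomous = [2, 3, 3, 2, 3]

# Map the two arbitrary codes onto 0/1 to obtain a binary (dummy) variable.
recode = {2: 0, 3: 1}   # 0=Male, 1=Female
gender_binary = [recode[value] for value in gender_dichotomous]

# With 0/1 coding, the mean equals the proportion of cases coded 1.
share_female = sum(gender_binary) / len(gender_binary)

print(gender_binary, share_female)
```
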

Types of Quantitative Data


(a) Time Series Data: data that follow a series of time points; a
single
country/industry/activity/firm/organization/stock
market/society, etc., over multiple sampling periods
(b) Cross Sectional Data: data that cut across
various
subjects (countries/industries/activities/firms) at a single
point in time
(c) Panel Data: Time Series Data + Cross Sectional
Data with different characteristics
(d) Pooled Data: combined version of data with
similar characteristics
(e) Longitudinal Data: wider scope of data variation
over time
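
As a rough illustration of how time series, cross sectional, and panel data differ in shape, the sketch below stores observations as (unit, year, value) records. The firm names, years, and values are invented for the example.

```python
# Hypothetical observations stored as (unit, year, value) records.
records = [
    ("Firm A", 2019, 10.0), ("Firm A", 2020, 11.5), ("Firm A", 2021, 12.1),
    ("Firm B", 2019, 7.2), ("Firm B", 2020, 7.9), ("Firm B", 2021, 8.4),
]

# Time series: one unit followed over several periods.
time_series = [r for r in records if r[0] == "Firm A"]

# Cross section: several units observed at a single point in time.
cross_section = [r for r in records if r[1] == 2020]

# Panel: several units, each followed over several periods.
units = {r[0] for r in records}
periods = {r[1] for r in records}
is_panel = len(units) > 1 and len(periods) > 1

print(len(time_series), len(cross_section), is_panel)
```
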

Types of Qualitative Data


(a) Factual Data: Demographic
Data (Marital Status, Level of Education,
Age, Position, etc.) (Experimental
and Non-experimental Data); Yes/No
versus Yes/No/Don't know (which one is
preferable?);
True or False
(b) Positive and Normative Data: Actual
versus predicted, Agreement to
Disagreement, Likes to Dislikes
(c) Logical Arguments: True or False
(d) Boolean Statements: AND, OR, NOT

Likert Scale (LS) and Scale (S)

LS != S

In a normal case, does Scale refer to ratio or
interval?

For example: 5 levels of a Likert Scale


1=Strongly Agree
2=Agree
3=Neither Agree nor Disagree
4=Disagree
5=Strongly Disagree
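
One common reading of LS != S is that Likert codes are ordinal, so order-based summaries (median, mode) are the safer choice. The sketch below uses the five-level coding above; the seven responses are invented for illustration.

```python
import statistics

# Coding from the five-level example above.
likert_codes = {
    "Strongly Agree": 1, "Agree": 2, "Neither Agree nor Disagree": 3,
    "Disagree": 4, "Strongly Disagree": 5,
}

# Hypothetical responses from seven subjects.
responses = ["Agree", "Strongly Agree", "Agree", "Disagree",
             "Neither Agree nor Disagree", "Agree", "Strongly Disagree"]
coded = [likert_codes[r] for r in responses]

# Order-based summaries are safe for ordinal codes.
median_code = statistics.median(coded)
mode_code = statistics.mode(coded)

print(coded, median_code, mode_code)
```
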

Sample and Population


The association between Sample and
Population can be seen in the context of
Donut

RVRCNB
Approach

Which one is good?

Parameter and Statistics


Parameter = Population (Actual)
Statistics = Sample (Prediction)
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ (Parameter)
$PY = P\beta_0 + P\beta_1 X_1 + P\beta_2 X_2 + P\varepsilon$
(Statistics)
Statistics ~ Parameter (the actual
population is unknown, so this is an estimated
population)
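
The parameter/statistic distinction can be illustrated with a mean. In the sketch below the population is known only because it is made up (the integers 1 to 1000); the seed and the sample size of 100 are arbitrary choices for the example.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000 (in practice it is unknown).
population = list(range(1, 1001))
parameter_mean = statistics.mean(population)    # population parameter

# Draw a simple random sample and compute the matching sample statistic.
rng = random.Random(42)                         # fixed seed so the run is repeatable
sample = rng.sample(population, 100)
statistic_mean = statistics.mean(sample)        # estimate of the parameter

print(parameter_mean, statistic_mean)
```
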

Descriptive and Inferential Statistics
* For quantitative analysis with single or multiple purposes
* Descriptive = Describe + Narrative (describing subjects): Single Purpose (SP)
* Inferential = Investigation + Narrative (investigating subjects): Multi
Purposes (MP)
Descriptive Analysis: Quantitative research
(a) Descriptive Statistics (Continuous variables): [Mean, Median, Variance, Standard
deviation, Max, Min, Range, Skewness, Kurtosis, Standard error of mean,
Histogram with normal curve, Normal Q-Q plot, Normal P-P plot]; Uni-variate
(b) Frequency Distribution (Categorical variables): [Mode (similar to frequency),
Median, Variance and Standard Deviation, Max, Min, Range]; Uni-variate
Inferential Analysis: Quantitative research
(a) Normality tests: hypothesis testing in SPSS (Shapiro-Wilk and Kolmogorov-Smirnov)
(b) Non-normality tests: hypothesis testing in SPSS (One-Sample Kolmogorov-
Smirnov tests for uniform, Poisson, and exponential distributions); others are
possible through Scripts and Syntax
(c) Mean differences: Single mean test, One-sample t-test, Two samples
(Independent and Dependent sample tests)
(d) Association: Linear and Non-Linear modes of regression
(e) Correlation: Linear and Non-Linear modes of correlation
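
The descriptive measures listed above can be reproduced by hand for a small data set. The sketch below uses Python's standard `statistics` module rather than SPSS; the eight scores are invented for illustration.

```python
import statistics

# Hypothetical scores for eight subjects.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "variance": statistics.variance(scores),   # sample variance (n - 1)
    "std_dev": statistics.stdev(scores),       # sample standard deviation
    "min": min(scores),
    "max": max(scores),
    "range": max(scores) - min(scores),
}
# Standard error of the mean = sample standard deviation / sqrt(n).
summary["se_mean"] = summary["std_dev"] / len(scores) ** 0.5

for name, value in summary.items():
    print(name, value)
```
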

Types of Samplings

What type of research?

All research starts with a single purpose or multiple
purposes: Purposive Sampling
Additional types of samplings
(a) Simple random sampling: samples that have been
selected randomly, each with an equal chance of selection;
unbiased sampling
(b) Systematic sampling: samples that have been
selected at regular intervals from an ordered sampling frame
(c) Stratified sampling: the population is divided
into homogeneous subgroups, and samples are drawn from each
(d) Cluster sampling: the population is divided
into groups with similar
characteristics.
(e) Convenience sampling: easy sampling; choose
groups of interest.
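
Three of the sampling types above can be sketched with Python's `random` module. The sampling frame (subject IDs 1 to 100), the two strata, the sample sizes, and the seed are all assumptions made for the illustration.

```python
import random

rng = random.Random(0)           # fixed seed so the run is repeatable
frame = list(range(1, 101))      # hypothetical sampling frame: subject IDs 1..100

# Simple random sampling: every subject has the same chance of selection.
simple_random = rng.sample(frame, 10)

# Systematic sampling: every k-th subject from the ordered frame.
k = len(frame) // 10
start = rng.randrange(k)         # random starting point in the first interval
systematic = frame[start::k]

# Stratified sampling: draw separately from each homogeneous subgroup.
strata = {"low": frame[:50], "high": frame[50:]}   # hypothetical strata
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

print(simple_random)
print(systematic)
print(stratified)
```
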

Dependent and Independent Samples

Dependent Samples: same subjects at
different levels (very highly correlated)
Independent Samples: different
subjects at the same and different levels.
[Figure: Independent (low and moderate correlations) and Dependent samples; labels: Population 1, Sample 1, Sample 2, Sample 3, Sample 4]
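
Why the dependent/independent distinction matters can be seen by computing both t statistics on the same numbers. This is a sketch under stated assumptions: the measurements are invented, and the independent-samples version uses the Welch form (unequal variances), which is one of several possible choices.

```python
import math
import statistics

# Hypothetical measurements (values invented for illustration).
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

def paired_t(x, y):
    """t statistic for dependent samples: same subjects measured twice."""
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def welch_t(x, y):
    """t statistic for independent samples (Welch form, unequal variances)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se

t_dependent = paired_t(before, after)     # treats the lists as the same five subjects
t_independent = welch_t(before, after)    # treats the lists as two unrelated groups

print(round(t_dependent, 3), round(t_independent, 3))
```

Because the paired version removes between-subject variation, it gives a much larger t statistic here than the independent version does for the same values.
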

Parametric versus Nonparametric

Introduction
The terms parametric and nonparametric were coined by Jacob
Wolfowitz in 1942.
Parametric (distribution is known)
Non-parametric (distribution is unknown)
In my view, this is just
a general idea in statistics, and it
should be used as a benchmark or
baseline for developing further
thinking
about statistical tests.

Characteristics of parametric approach

(a) Data follow a probability distribution
(b) Tied to probability sampling (simple random
sampling, stratified random sampling, systematic random
sampling, random cluster, stratified random cluster, complex
multi-stage random, random mode of purposive sampling)
(c) Deals with statistical inferences on the distributions of
parameters
(d) Always linked with linearity of data (variables and
errors/residuals (uncertainty))
(e) Patterns of data (variables and errors/residuals) follow
homogeneity
(f) Follows strict assumptions (robust = if the assumptions
are fulfilled)
I would classify this approach as the classical approach because
it doesn't follow the evolutionary direction of momentum.

Assumptions of parametric approach
(a) Linearity of parameters
(b) Homogeneity of existing variables
and omitted variables (error terms/residuals); symmetrical
form of distribution.
(c) Dependent variables/residuals should be normally
distributed.
(d) Randomness among the selected samples should be
maintained (only where random sampling is involved)
(e) Expanded use of non-categorical variables (continuous
variables) in the statistical tests.
(f) Minimization of outliers
(g) Mean, mode, and median of the variables are
approximately the same (for the case of the normal
distribution); bell-shaped normal curve.
(h) Doesn't deal with the process of resampling (bootstrapping)

Identifying the statistical approach is
an important step that should be taken
before moving to the existing
statistical tests.
Distributional tests are needed to determine
the nature of the data (variables and residuals).
In a simple context,
Parametric follows the normal distribution
Non-parametric is distribution-free

Distribution tests of normality

Graphical approach
(a) Histogram with normal curve
(b) Box plot
(c) Normal Q-Q plot
(d) Normal P-P plot
(e) Leverage plot

Numerical approach
Uni-variate tests
(a) Jarque-Bera test
(b) Coefficient of variation
(c) Coefficient of skewness and kurtosis
(d) Kolmogorov-Smirnov test
(e) Shapiro-Wilk test
(f) Shapiro-Francia test
(g) Anderson-Darling test
Multi-variate tests
(a) Multivariate tests of normality
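
Skewness, kurtosis, and the Jarque-Bera statistic from the numerical list can be computed from central moments. The sketch below uses the plain (uncorrected) moment formulas; statistical packages often apply small-sample corrections or report excess kurtosis, so their figures may differ. The data are invented.

```python
def moments_summary(values):
    """Return (skewness, kurtosis, Jarque-Bera statistic) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2          # equals 3 for a normal distribution
    jarque_bera = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return skewness, kurtosis, jarque_bera

# Hypothetical, perfectly symmetric data.
skewness, kurtosis, jb = moments_summary([1, 2, 3, 4, 5])
print(skewness, kurtosis, round(jb, 4))
```
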

Parametric tests of correlation

(a) Pearson product-moment correlation coefficient (Bivariate
analysis)
(b) Stepwise mode of linear regression (Multivariate analysis)
(c) Auxiliary mode of linear regression (Multivariate analysis)
(d) Scatter plot/Scatterplot matrix with fit line (linear form)
(Bivariate analysis)
Non-parametric tests of correlation
(a) Spearman rank correlation (Bivariate analysis)
(b) Kendall's tau rank correlation (Bivariate analysis)
(c) Stepwise mode of Non-linear regression (Multivariate analysis)
(d) Auxiliary mode of Non-Linear regression (Multivariate analysis)
(e) Scatter plot/Scatterplot matrix with fit line (non-linear
form) (Bivariate analysis)
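
The contrast between Pearson and Spearman correlation shows up on data that rise steadily but not in a straight line. The sketch below implements both by hand; the data (y equal to x squared) are invented, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical monotone but non-linear relationship: y = x squared.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 4), round(r_spearman, 4))
```

Spearman reaches 1 because the ordering is perfectly preserved, while Pearson stays below 1 because the relationship is not linear.
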

Parametric tests of associations

(a) Linear regression (Bivariate and Multivariate)
(b) Stepwise mode of Linear regression (Bivariate and Multivariate)
(c) Auxiliary mode of Linear regression (Bivariate and Multivariate)
(d) Linear mode of co-integration tests
(e) Linear mode of causality tests
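
A bivariate linear regression can be worked by hand with ordinary least squares. The sketch below is illustrative only: the five data points are invented, and `fit_line` is a hypothetical helper name, not an SPSS procedure.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data (invented for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)
predicted = [intercept + slope * value for value in x]
residuals = [actual - fitted for actual, fitted in zip(y, predicted)]
print(round(intercept, 3), round(slope, 3))
```
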
Non-parametric tests of associations
(a) Non-Linear regression (Bivariate and Multivariate)
(b) Logistic regression (LR): DV is a categorical variable
* Ordered LR (Ordinal variable)
* Un-ordered LR (Nominal variable)
(c) Correspondence Analysis:
independent sample (Pearson Chi-Square, Contingency Coefficient
(Nominal), Phi and Cramer's V (Nominal), Lambda (Nominal))
Main features of SPSS

[Diagram: Inferential Statistics > Regression, with Parametric and Non-Parametric branches. Boxes: Linear Regression; Linear Curve Estimation; Linear Weight Estimation & different types of estimation; Linear mode of Scatter plot; Simultaneous regression; Non-Linear Regression; Non-Linear Curve Estimation; Non-Linear Weight Estimation & different types of estimation; Non-Linear mode of Scatter plot; Non-Linear mode of Simultaneous equation; Non-Parametric Regression; Probit Regression; Tobit Regression; Logit Regression]

[Diagram: Parametric correlation. Boxes: Pearson correlation; Linear mode of Stepwise regression; Linear mode of Auxiliary regression; VIF & Tolerance Value]
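
Tolerance and VIF follow directly from the R-squared of an auxiliary regression (one predictor regressed on the others): Tolerance = 1 - R^2 and VIF = 1 / Tolerance. The sketch below assumes just two predictors correlated at r = 0.9, a value chosen only for illustration; `tolerance_and_vif` is a hypothetical helper name.

```python
def tolerance_and_vif(r_squared):
    """Tolerance = 1 - R^2 and VIF = 1 / tolerance for one predictor."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance

# Hypothetical case: two predictors correlated at r = 0.9, so the auxiliary
# regression of one on the other has R^2 = r^2 = 0.81.
r = 0.9
tolerance, vif = tolerance_and_vif(r ** 2)
print(round(tolerance, 2), round(vif, 2))
```
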