
Mastering Data Analysis Tools and Techniques
Using SPSS, STATA and AMOS
by
Muhammad Qaiser Shahbaz and Nadeem Shafique Butt
Workshop Objectives:
• To develop Data Analysis Skills

• Use of appropriate Statistical Techniques

• Use of various Statistical Packages to perform Statistical Analysis

• Understanding and Interpretation of Results

Types of Analysis:
First we will learn to perform analyses using SPSS,
then we will replicate all these analyses using STATA,
and finally we will use AMOS for SEM.
Before we start analyzing data, let me introduce SPSS and its
basic structure.
Descriptive Analysis:

• Descriptive Analysis for Qualitative Variables

• Descriptive Analysis for Quantitative Variables


Descriptive Analysis of Qualitative Data

Qualitative Data (Categorical Data)
• Tables: One Way Table, Two Way Table, N-Way Table
• Graphs: Bar Chart, Pie Chart, Clustered Bar Chart
• Numbers: Percentages
Descriptive Analysis for Quantitative Data

Quantitative Data (Numerical Data)
• Tables: Frequency Distribution, Stem and Leaf
• Graphs: Histogram, Box-Plot
• Numbers:
  Center: Mean, Median, Mode, Geometric Mean, Harmonic Mean, Trimmed Mean
  Important Points: Median, Quartiles, Percentiles
  Variation: Range, Inter Quartile Range, Variance, Standard Deviation
  Distribution: Skewness, Kurtosis
Tabular Methods
Graphical Methods
Numerical Methods
Practice Session for Descriptive
Analysis
• Import Customer’s Databas.xls into SPSS
• Label data properly
• Make One-way tables for variables (Age, Sex,
OwnHome and Married). Also make pie chart and
bar chart for these variables
• Make Two-way tables (sex by OwnHome and
Married by OwnHome). Also make clustered bar
chart for each variable
• Produce detailed numerical descriptive statistics
for variable “Purchases” (Mean, Median, ………..).
Also make a histogram, a stem-and-leaf plot and a
box-plot for variable “Purchases”
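The practice tasks above are carried out in SPSS during the workshop. As a rough cross-check outside SPSS, the same kinds of summaries can be sketched in Python with the standard library. The data values, the variable names `sex` and `purchases`, and the helper names `one_way_table` and `describe` are illustrative assumptions, not contents of the workshop file.

```python
import statistics as st
from collections import Counter

# Hypothetical values for illustration only; not taken from the workshop file.
sex = ["F", "M", "F", "F", "M", "F", "M", "F"]
purchases = [120.0, 85.5, 230.0, 99.0, 150.0, 85.5, 310.0, 175.0]


def one_way_table(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


def describe(values):
    """Return common numerical summaries for a quantitative variable."""
    q1, _, q3 = st.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": st.mean(values),
        "median": st.median(values),
        "mode": st.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": st.variance(values),  # sample variance, divisor n - 1
        "std_dev": st.stdev(values),
    }


sex_table = one_way_table(sex)
summary = describe(purchases)

for cat, (count, pct) in sex_table.items():
    print(f"{cat}: {count} ({pct:.1f}%)")
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

Quartile conventions differ between packages, so the inter quartile range printed here may not match the SPSS output exactly.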
Inferential Analysis
Parametric & Non-Parametric Inference

[Diagram: Normality; Equal Variances; Un-Equal Variances]
Comparing One Group
• Kinds of Research Questions
For the one-sample situation, the prime concern in research is
examining a measure of central tendency (location) for the
population of interest. The best-known measures of location
are the mean and the median. For a one-sample situation, we
might want to know whether the average waiting time in a doctor's
office is greater than one hour, whether the average growth of
roses is 4 inches or more with a certain fertilizer, or whether
the annual return is 10.2% for the banks that exercised
comprehensive planning.
Comparing Two Groups
• Kinds of Research Questions

One of the most common tasks in research is to compare two
populations (groups). We might want to compare the income level
of two regions, the nitrogen content of two lakes, or the
effectiveness of two drugs.
The first question that arises is which aspects (parameters) of the
populations we should compare. We might consider comparing the
averages, the medians, the standard deviations, the distributional
shapes (histograms), or the maximum values. We base the choice of
comparison parameter on our particular problem.
Perhaps the simplest comparison that we can make is between the
means of the two populations.
Comparing more than two Groups
• Kinds of Research Questions

One of the most common tasks in research is to compare several
populations (groups). We might want to compare the income level
of three regions, the nitrogen content of four lakes, or the
effectiveness of four drugs.
The first question that arises concerns which aspects (parameters)
of the populations we should compare. We might consider
comparing the means, medians, standard deviations, distributional
shapes (histograms), or maximum values. We base the choice of
comparison parameter on our particular problem.
Perhaps the simplest comparison that we can make is to compare
the means of several populations.
One Sample t-test

• One Sample t-test is used to compare one group to a
given standard on the basis of Arithmetic Average
(Mean).
Assumptions of the One-sample t-test

• The data are continuous.

• The data follow the Normal distribution.

• The sample is a simple random sample from the
population.
Hypotheses and Formulas

$$H_0: \mu = \mu_0, \qquad H_A: \mu \neq \mu_0$$

$$t = \frac{\bar{X} - \mu_0}{\sqrt{s^2 / n}}$$

With
$$df = n - 1$$
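The one-sample t statistic and its degrees of freedom can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `one_sample_t` and the diameter values are hypothetical and are not the workshop data.

```python
import math
import statistics as st


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: mu = mu0 using the sample mean and variance."""
    n = len(sample)
    mean = st.mean(sample)
    s2 = st.variance(sample)            # sample variance, divisor n - 1
    t = (mean - mu0) / math.sqrt(s2 / n)
    return t, n - 1


# Hypothetical brake-disc diameters in millimeters (illustration only).
diameters = [322.1, 321.8, 322.4, 322.0, 321.9, 322.3]
t_stat, df = one_sample_t(diameters, 322.0)
print(f"t = {t_stat:.4f}, df = {df}")
```

The p-value would then be read from the t distribution with `df` degrees of freedom, which SPSS reports directly.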
Case Study
A manufacturer of high-performance automobiles
produces disc brakes that must measure 322 millimeters
in diameter. The quality control manager randomly selects
128 discs and measures their diameters.

We can use the One Sample T Test to determine whether or
not the mean diameter of the brakes in the sample
differs significantly from 322 millimeters.
SPSS Analytic Procedure
The Sign Test
The sign test is perhaps the oldest of all the nonparametric
procedures. This nonparametric test is based on the
binomial distribution. It assumes two mutually exclusive
outcomes, a constant or stable probability of success or
failure, and n independent trials.
The term "sign test" reinforces the point that the
data are converted to a series of plus and minus signs. The
test is based on the number of plus signs that occur. Zero
differences are thrown out, and the sample size is reduced
accordingly.
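The counting procedure just described can be sketched in Python. This is an illustrative exact binomial version, assuming a two-sided alternative and a probability of 0.5 for a plus sign under the null hypothesis; the function name `sign_test` and the salary values are hypothetical.

```python
from math import comb


def sign_test(sample, median0):
    """Return (plus_signs, n, two_sided_p) for H0: median = median0."""
    # Convert data to signs; zero differences are thrown out.
    diffs = [x - median0 for x in sample if x != median0]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    # Under H0 the number of plus signs is Binomial(n, 0.5).
    k = min(plus, n - plus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, n, min(1.0, 2 * tail)


# Hypothetical salaries in thousands (illustration only).
salaries = [52, 55, 48, 50, 60, 61, 47, 58]
plus, n, p_value = sign_test(salaries, 50)
print(plus, n, p_value)
```

For large samples a normal approximation to the binomial count is commonly used instead of the exact tail sum.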
Assumptions of the Sign Test

• The data are continuous

• The distribution of these data is symmetric.

• The measurement scale is at least interval.


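Hypotheses and Formulas

$$H_0: \eta = \eta_0$$
$$H_A: \eta \neq \eta_0, \quad \eta > \eta_0, \quad \eta < \eta_0$$

$$Z = \frac{w - \mu_w}{\sigma_w}$$

$$\mu_w = \frac{n(n+1)}{4}$$

$$\sigma_w = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

$$w = \sum R_{+}$$

Here $\eta$ denotes the population median.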
Case Study
A researcher believes that the median salary of HR
Managers is 50 thousand. To confirm this
hypothesis, he selects a random sample of 1207
HR Managers from different companies.

We can use the Sign Test to determine whether or
not the median salary is significantly different
from 50 thousand.
SPSS Analytic Procedure
Paired Samples t-test
• Kinds of Research Questions
In the paired case, we take two measurements on the same
individual at different times, or we have one measurement
on each individual of a pair.
Examples of the first case are two insurance-claim adjusters
assessing the damage for the same 15 cases, and the evaluation
of the improvement in aerobic fitness for 15 subjects where
measurements are made at the beginning of the fitness
program and at the end of it.
An example of the second paired situation is the testing of
the effectiveness of two drugs, A and B, on 20 pairs of
patients who have been matched on physiological and
psychological variables. One patient in the pair receives
drug A, and the other patient gets drug B.
Assumptions of the paired-sample t-test

• The data are continuous.

• The data, i.e., the differences for the matched-pairs,
follow a Normal distribution.

• The sample of pairs is a simple random sample from
its population.
Hypotheses and Formulas

$$H_0: \mu_d = 0, \qquad H_A: \mu_d \neq 0$$

$$t = \frac{\bar{X}_d - \mu_d}{\sqrt{s_d^2 / n}}$$

With
$$df = n - 1$$
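The paired-samples statistic works on the within-pair differences, which can be sketched in Python as below. The function name `paired_t` and the before/after values are hypothetical and chosen only to illustrate the arithmetic.

```python
import math
import statistics as st


def paired_t(before, after, mu_d=0.0):
    """Return (t, df) for H0: mean difference = mu_d, with d = after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = st.mean(diffs)
    s2_d = st.variance(diffs)           # sample variance of the differences
    t = (d_bar - mu_d) / math.sqrt(s2_d / n)
    return t, n - 1


# Hypothetical measurements on the same five subjects (illustration only).
before = [5, 6, 7, 8, 9]
after = [4, 4, 6, 5, 7]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.4f}, df = {df}")
```

A negative t here means the "after" values are lower on average than the "before" values.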
Case Study
A researcher in behavioral medicine believes that stress often
makes asthma symptoms worse for people who suffer from this
respiratory disorder. Therefore, the researcher decides to study the
effect of relaxation training on the severity of their symptoms.
A sample of 5 patients is selected. During the week before
treatment, the investigator records the severity of their symptoms
by measuring how many doses of medication are needed for
asthma attacks. Then the patients receive relaxation training. For
the week following the training the researcher once again records
the number of doses used by each patient.
Data from Gravetter and Wallnau (4th Ed.) p. 319.
SPSS Analytic Procedure
Wilcoxon Signed Rank test

• Wilcoxon Signed Rank test is used to test the
median difference of zero in case of non-normal
populations.
Assumptions of the Wilcoxon Signed Rank test
• The differences are continuous.

• The distribution of these differences is symmetric.

• The differences are mutually independent.

• The measurement scale is at least interval.


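Hypotheses and Formulas

$$H_0: \eta_1 = \eta_2$$
$$H_A: \eta_1 \neq \eta_2$$

$$Z = \frac{w - \mu_w}{\sigma_w}$$

$$\mu_w = \frac{n(n+1)}{4}$$

$$\sigma_w = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

$$w = \sum R_{+}$$

Here $\eta_1$ and $\eta_2$ denote the two population medians.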
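The large-sample Z statistic above can be sketched in Python as follows. The sketch assumes that w is the sum of the ranks of the positive differences, drops zero differences, assigns average ranks to ties, and applies no tie or continuity correction; the function name `signed_rank_z` and the scores are hypothetical.

```python
import math


def signed_rank_z(before, after):
    """Return (w, z): sum of positive ranks and its normal approximation."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu_w = n * (n + 1) / 4
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w, (w - mu_w) / sigma_w


# Hypothetical test scores before and after a new method (illustration only).
before = [10, 12, 9, 14, 11, 13]
after = [12, 15, 8, 18, 16, 19]
w, z = signed_rank_z(before, after)
print(f"w = {w}, z = {z:.4f}")
```

Statistical packages may apply tie and continuity corrections, so their Z values can differ slightly from this uncorrected version.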
Case Study
An educationist wants to see the effectiveness of a
new teaching method. For this, she selected 600
students and recorded their scores in a test of 150
marks. The scores are recorded before and after the
new teaching method.

The Wilcoxon Signed Rank test can be used to test
the effectiveness of the new teaching method.
SPSS Analytic Procedure
Independent Samples t-test
Equal Variances

• Independent samples t-test is used to compare two
groups on the basis of their averages.
Assumptions of the two-sample t-test

• The data are continuous.
• The data follow the Normal distribution.
• The variances of the two populations are equal.
• The two samples are independent.
• Both samples are simple random samples from their
respective populations.
Hypotheses and Formulas

$$H_0: \mu_1 = \mu_2, \qquad H_A: \mu_1 \neq \mu_2$$

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

With
$$df = n_1 + n_2 - 2$$
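The pooled-variance t statistic can be sketched in Python as below, taking the hypothesized difference under the null hypothesis to be zero. The function name `pooled_t` and the two samples are hypothetical.

```python
import math
import statistics as st


def pooled_t(x1, x2):
    """Return (t, df) for H0: mu1 = mu2, assuming equal population variances."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * st.variance(x1) + (n2 - 1) * st.variance(x2)) / (n1 + n2 - 2)
    t = (st.mean(x1) - st.mean(x2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical spending for two groups of cardholders (illustration only).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, df = pooled_t(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {df}")
```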
Case Study
• An analyst at a department store wants to evaluate a
recent credit card promotion. To this end, 500
cardholders were randomly selected. Half received
an ad promoting a reduced interest rate on
purchases made over the next three months, and
half received a standard seasonal ad.
• We can use Independent-Samples T Test to compare
the spending of the two groups.
SPSS Analytic Procedure
Independent Samples t-test
Unequal Variances

• Independent Samples t-test is used to compare two
independent groups on the basis of their averages. This
test does not require homogeneity of the variances.
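Hypotheses and Formulas

$$H_0: \mu_1 = \mu_2, \qquad H_A: \mu_1 \neq \mu_2$$

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

With
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$$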
Case Study
• A researcher wishes to compare the
expenditure behavior of students; one of
the research questions is whether
expenditures differ by gender.
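The unequal-variance t statistic and its approximate degrees of freedom can be sketched in Python as follows, with the hypothesized difference under the null hypothesis taken as zero. The function name `welch_t` and the expenditure values are hypothetical.

```python
import math
import statistics as st


def welch_t(x1, x2):
    """Return (t, df) for H0: mu1 = mu2 without assuming equal variances."""
    n1, n2 = len(x1), len(x2)
    v1 = st.variance(x1) / n1           # s1^2 / n1
    v2 = st.variance(x2) / n2           # s2^2 / n2
    t = (st.mean(x1) - st.mean(x2)) / math.sqrt(v1 + v2)
    # Approximate degrees of freedom (usually not an integer).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical expenditures for two groups of students (illustration only).
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 4, 6, 8, 10]
t_stat, df = welch_t(group_1, group_2)
print(f"t = {t_stat:.4f}, df = {df:.3f}")
```

With equal sample sizes the t value matches the pooled version, but the degrees of freedom are smaller when the sample variances differ.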
SPSS Analytic Procedure
Mann-Whitney Test
• Mann-Whitney Test is used to compare the
two independent groups on the basis of
medians. This test does not require the
assumption of normality.
Mann-Whitney U Test
Assumptions
• The variable of interest is continuous. The measurement scale
is at least ordinal.

• The probability distributions of the two populations are
identical, except for location.

• The two samples are independent.

• Both samples are simple random samples from their
respective populations.
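Hypotheses and Formulas

$$H_0: \eta_1 = \eta_2$$
$$H_A: \eta_1 \neq \eta_2$$

$$z = \frac{u - \mu_u}{\sigma_u}$$

$$\mu_u = \frac{n_1 n_2}{2}$$

$$\sigma_u = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

$$u = w - \frac{n_1 (n_1 + 1)}{2}$$

Here $\eta_1$ and $\eta_2$ denote the two population medians, and
W is the sum of ranks of the smaller sample.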
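The U statistic and its normal approximation can be sketched in Python as below. The sketch ranks the pooled data with average ranks for ties, takes W from the smaller sample as stated above, and applies no tie or continuity correction; the helper names `average_ranks` and `mann_whitney_z` and the data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_z(a, b):
    """Return (u, z) using the rank sum of the smaller sample."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    n1, n2 = len(small), len(large)
    ranks = average_ranks(list(small) + list(large))
    w = sum(ranks[:n1])                 # rank sum of the smaller sample
    u = w - n1 * (n1 + 1) / 2
    mu_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mu_u) / sigma_u


# Hypothetical measurements for two independent groups (illustration only).
u, z = mann_whitney_z([1, 2, 3], [4, 5, 6, 7])
print(f"u = {u}, z = {z:.4f}")
```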


Case Study
The data record the birth weight of infants born to mothers
with different levels of prenatal care, giving two independent
samples for univariate analysis. The test data for the
Mann-Whitney U-Test were obtained from Howell, David
D., Fundamental Statistics for the Behavioral Sciences,
3rd Edition, p. 385.
SPSS Analytic Procedure
One-Way Analysis of Variance
Equal Variances

• One Way Analysis of Variance is used to
compare more than two groups on the basis
of their averages.
One-Way Analysis of Variance
Assumptions
• The data are continuous.

• The data follow the Normal distribution; each group
is normally distributed.

• The variances of the populations are equal.

• The groups are independent.

• Each group is a simple random sample from its
population.
Hypotheses and Formulas
$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$
$H_A$: At least one pair is significantly different

$$F = \frac{MSG}{MSE}$$
MSG is the Mean Square of Group and MSE is the Mean Square Error
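The F ratio can be sketched in Python by computing the two mean squares from their sums of squares. This sketch assumes the usual definitions MSG = SSG / (k - 1) and MSE = SSE / (N - k), which the slide does not spell out; the function name `one_way_anova` and the group values are hypothetical.

```python
import statistics as st


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    big_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / big_n
    means = [st.mean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msg = ssg / (k - 1)
    mse = sse / (big_n - k)
    return msg / mse, k - 1, big_n - k


# Hypothetical popularity scores for three age groups (illustration only).
f_stat, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.4f}, df = ({df1}, {df2})")
```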
Example
• This is a hypothetical data file that concerns the
popularity of a TV channel. Using a prototype, the
marketing team has collected focus group data. One
of the questions of interest is whether the popularity
of the TV channel differs across age groups.
• This hypothesis can be tested using One Way
ANOVA.
SPSS Analytic Procedure
One-Way Analysis of Variance
Unequal Variances

• Welch ANOVA is used to compare more than two
groups on the basis of averages. This test does not
require the homogeneity of variances.
Welch Analysis of Variance
Assumptions
• The data are continuous.

• The data follow the Normal distribution; each group is
normally distributed.

• The groups are independent.

• Each group is a simple random sample from its population.


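$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$
$H_A$: At least one pair is significantly different

$$F = \frac{\dfrac{1}{k-1}\sum_{i=1}^{k}\dfrac{n_i}{s_i^2}\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2}{1 + \dfrac{2(k-2)}{k^2-1}\sum_{i=1}^{k}\dfrac{\left(1 - \dfrac{n_i / s_i^2}{\sum_{j=1}^{k} n_j / s_j^2}\right)^2}{n_i - 1}}$$

With
$$df = \frac{k^2 - 1}{3\sum_{i=1}^{k}\dfrac{\left(1 - \dfrac{n_i / s_i^2}{\sum_{j=1}^{k} n_j / s_j^2}\right)^2}{n_i - 1}}$$

Here $\bar{X}_{..}$ is the weighted grand mean,
$$\bar{X}_{..} = \frac{\sum_{i=1}^{k} (n_i / s_i^2)\,\bar{X}_{i.}}{\sum_{i=1}^{k} n_i / s_i^2}$$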
Case Study
• A sales manager evaluates two new training courses.
• Sixty employees, divided into three groups, all
receive standard training. In addition, group 2
receives technical training, and group 3 receives a
hands-on tutorial. Each employee was tested at the
end of the training course and their score recorded.
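The Welch F statistic for a design like this one can be sketched in Python as below. The sketch assumes weights n_i / s_i^2, a weighted grand mean, numerator degrees of freedom k - 1, and the approximate denominator degrees of freedom from the Welch formula; the function name `welch_anova` and the scores are hypothetical.

```python
import statistics as st


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's ANOVA on a list of groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [st.mean(g) for g in groups]
    weights = [n / st.variance(g) for n, g in zip(sizes, groups)]  # n_i / s_i^2
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    numerator = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    # Correction term shared by the denominator of F and by df2.
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, k - 1, df2


# Hypothetical test scores for three training groups (illustration only).
f_stat, df1, df2 = welch_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.4f}, df = ({df1}, {df2:.3f})")
```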
SPSS Analytic Procedure
Kruskal-Wallis Test

• Kruskal-Wallis H-test is used to compare more
than two groups on the basis of their medians.
Kruskal-Wallis Test
Assumptions
• The variable of interest is continuous; the
measurement scale is at least ordinal.

• The probability distributions of the populations are
identical, except for location.

• The groups are independent.

• All groups are simple random samples from their
respective populations.
Hypotheses and Formulas

$$H_0: \eta_1 = \eta_2 = \dots = \eta_k$$
$H_A$: At least one pair of medians is significantly different

$$H = \frac{12}{N(N+1)}\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(N+1)$$
Case Study
A health scientist wishes to compare the
survival experiences after breast cancer across
different Pathological Tumor Size categories.

We can use the Kruskal-Wallis H-Test to
determine whether or not the median
survival time of the patients differs significantly
across pathological tumor size categories.
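The H statistic can be sketched in Python by ranking the pooled data and summing the ranks within each group. The sketch assumes R_i is the rank sum of group i and N the total sample size, uses average ranks for ties, and applies no tie correction; the helper names `average_ranks` and `kruskal_wallis_h` and the data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, df) where df = k - 1 for the chi-square approximation."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    big_n = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])   # rank sum of this group
        total += r_i ** 2 / len(g)
        start += len(g)
    h = 12 / (big_n * (big_n + 1)) * total - 3 * (big_n + 1)
    return h, len(groups) - 1


# Hypothetical survival times for three tumor size categories (illustration only).
h_stat, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h_stat:.4f}, df = {df}")
```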
SPSS Analytic Procedure
Model Building Techniques
