Data Preparation
▪ Once you’ve opened a data file or entered data in the Data Editor, you
can start creating reports, charts, and analyses without any additional
preliminary work. However, there are some additional data preparation
features that you may find useful, including the ability to:
Assign variable properties that describe the data and determine how
certain values should be treated.
Identify cases that may contain duplicate information and exclude
those cases from analyses or delete them from the data file.
Create new variables with a few distinct categories that represent
ranges of values from variables with a large number of possible values.
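The last two features can be sketched outside SPSS as follows. This is only an illustrative Python sketch, not SPSS syntax; the field names (`id`, `income`), the category bounds, and the helper names `find_duplicates` and `bin_value` are all hypothetical.

```python
import bisect

def find_duplicates(cases, key_fields):
    """Return the indexes of cases whose key fields repeat an earlier case."""
    seen = set()
    duplicates = []
    for index, case in enumerate(cases):
        key = tuple(case[field] for field in key_fields)
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates

def bin_value(value, upper_bounds, labels):
    """Map a value to a category; each bound is the inclusive upper limit of a range."""
    return labels[bisect.bisect_left(upper_bounds, value)]

# Hypothetical data: income in thousands.
cases = [
    {"id": 1, "income": 20},
    {"id": 2, "income": 50},
    {"id": 1, "income": 20},   # repeats the first case
    {"id": 3, "income": 90},
]

duplicate_indexes = find_duplicates(cases, ["id", "income"])
categories = [bin_value(c["income"], [25, 50], ["low", "medium", "high"]) for c in cases]
print(duplicate_indexes)
print(categories)
```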
1. Value Label
2. Value
3. Frequency
4. Percent
5. Valid Percent
6. Cumulative Percent
➢ The first table in the output is called the Statistics table. It shows
us that there are no missing values for the variable GENDER.
➢ The second table is called the Frequency table and tells us how
frequently each of the responses occurs.
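How the Frequency, Percent, Valid Percent, and Cumulative Percent columns relate to each other can be shown with a short Python sketch. It assumes that Percent is based on all cases while Valid Percent excludes missing values; the sample data and the function name `frequency_table` are hypothetical, and this is not the SPSS implementation.

```python
from collections import Counter

def frequency_table(values, missing=(None,)):
    """Return rows of (value, frequency, percent, valid percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)                      # all cases, including missing
    valid_total = sum(n for v, n in counts.items() if v not in missing)
    rows = []
    cumulative = 0.0
    for value in sorted(v for v in counts if v not in missing):
        n = counts[value]
        percent = 100 * n / total            # share of all cases
        valid_percent = 100 * n / valid_total  # share of non-missing cases
        cumulative += valid_percent
        rows.append((value, n, round(percent, 2),
                     round(valid_percent, 2), round(cumulative, 2)))
    return rows

# Hypothetical responses; None marks a missing value.
gender = ["f", "m", "f", "f", "m", None, "m", "f"]
rows = frequency_table(gender)
for row in rows:
    print(row)
```

When there are no missing values, as in the GENDER output described above, Percent and Valid Percent are identical.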
Summary Statistics
➢ In addition to frequency, we may request optional
descriptive and summary statistics appropriate for
continuous and numeric variables. For instance, if we
are interested in determining the mean salary of the
respondents, we follow steps 1-5 in the preceding
section and these additional procedures:
6. Click on the Statistics button found at the bottom part of the
frequencies dialogue box.
(The statistics dialogue box will appear.)
7. Click on the box beside the Mean option.
(A checkmark should appear in the box.)
8. Click on Continue.
9. Click on OK.
**The output is as follows:
▪ The Statistics table contains the mean.
▪ The Frequency table shows the lowest as well as the highest
salary.
▪ In addition to the mean, there are other descriptive
statistics available from SPSS:
Median: The value above and below which half of the cases fall; the 50th
percentile. The median is not affected by outlying values, unlike
the mean, which can be affected by a few extremely high or low
values.
Mode: The most frequently occurring value. If several values share the
greatest frequency of occurrence, each of them is a mode. The
Frequencies procedure reports only the smallest of such multiple
modes.
Sum: The sum or total of the values across all cases with non-missing
values.
▪ Measures of Dispersion
Standard deviation: A measure of dispersion around the mean.
Variance: A measure of dispersion around the mean, equal to the sum of
the squared deviations from the mean divided by one less than
the number of cases.
Standard error of the mean: A measure of how much the value of the mean
may vary from sample to sample taken from the same distribution.
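The measures of central tendency and dispersion defined above can be reproduced with Python's standard `statistics` module. This is a minimal sketch on hypothetical salary values, not SPSS output; the standard error is computed here as the standard deviation divided by the square root of the number of cases.

```python
import math
import statistics

# Hypothetical salaries with two modes (24000 and 30000).
salaries = [21000, 24000, 24000, 27000, 30000, 30000, 57000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
modes = statistics.multimode(salaries)       # every value sharing the top frequency
smallest_mode = min(modes)                   # the one mode the slide says is reported
total = sum(salaries)

variance = statistics.variance(salaries)     # squared deviations / (n - 1)
std_dev = statistics.stdev(salaries)         # square root of the variance
std_error = std_dev / math.sqrt(len(salaries))

print("mean:", round(mean, 2))
print("median:", median)
print("modes:", modes, "smallest:", smallest_mode)
print("sum:", total)
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
print("standard error of the mean:", round(std_error, 2))
```

Note how the single high value (57000) pulls the mean above the median, which is the sensitivity to outlying values described in the median definition.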
▪ Measures of Distribution
Quartiles: Displays values corresponding to the 25th, 50th, and 75th
percentiles.
Percentiles: For each specified percentile, displays the value below which
that percentage of the cases fall. For example, the 50th percentile is
the same as the median, the value below which 50% of the
cases fall.
Cut points for equal groups: Values that divide the cases into some number
of equal-sized groups. Enter a value between 2 and 100 to specify the
number of equal-sized groups.
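Quartiles and cut points can be illustrated with `statistics.quantiles` from Python's standard library. This is a sketch on hypothetical data; several interpolation conventions exist for percentiles, so the values SPSS reports for the same data may differ slightly from the ones computed here.

```python
import statistics

# Hypothetical values for eleven cases.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 gives the three cut points for quartiles (25th, 50th, 75th percentiles).
quartiles = statistics.quantiles(ages, n=4)

# n=3 gives the two cut points that split the cases into three equal-sized groups.
cut_points = statistics.quantiles(ages, n=3)

print("quartiles:", quartiles)
print("median:", statistics.median(ages))
print("cut points for 3 groups:", cut_points)
```

The middle quartile equals the median, which matches the statement that the 50th percentile and the median are the same value.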
Graphical Presentation
Type: Function
Pie chart: Displays the contribution of parts to the whole. Each
slice of a pie chart corresponds to a group defined by a
single grouping variable. For pie charts, the scale axis
can be labeled by either frequency counts or percentages.
Bar chart: Displays the count for each distinct value or category as
a separate bar, allowing us to compare categories
visually. As with pie charts, the scale axis in bar
charts can be labeled by either frequency counts or
percentages.
Histogram: Also uses bars, but they are plotted along an equal
interval scale. The height of each bar is the count of
values of a quantitative variable falling within the interval.
It shows the shape, center, and spread of the distribution.
We have the option to superimpose a normal curve on
the histogram, which helps us judge whether the data
are normally distributed.
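The idea behind a histogram, counting how many values fall in each equal-width interval, can be sketched in a few lines of Python. The salary values, starting point, and interval width below are hypothetical, and `histogram_counts` is an illustrative helper rather than anything SPSS provides.

```python
def histogram_counts(values, start, width, bins):
    """Count the values falling in each of `bins` equal-width intervals."""
    counts = [0] * bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < bins:          # ignore values outside the plotted range
            counts[index] += 1
    return counts

# Hypothetical salaries, binned in intervals of 10,000 starting at 20,000.
salaries = [21000, 24000, 24000, 27000, 30000, 30000, 57000]
counts = histogram_counts(salaries, start=20000, width=10000, bins=4)

# Text rendering: one row per interval, bar length equals the count.
for i, count in enumerate(counts):
    low = 20000 + i * 10000
    print(f"{low:>6} to {low + 9999:>6} | {'#' * count}")
```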
To obtain a chart for salary,
6. Click on the Charts button found at the bottom part of the
frequencies dialogue box.
(The Frequencies: Charts dialogue box will appear.)
7. Tick the box beside “Histograms”.
9. Click on Continue.
▪ In addition to a frequency table, the following
histogram will be generated.
Customizing Charts
▪ To edit a chart you see in the chart carousel, click on the Edit
button. Choose Edit Content, and click “In Separate Window”.
Your chart will be pulled out of the carousel and placed in its own
chart window. There are many things you can do to customize your
chart. Only a few are covered here. Feel free to experiment with more.
▪ Title
▪ Bar Labels
▪ Color
Descriptives
▪ SPSS also allows us to display univariate summary
statistics for several variables in a single table. This
can be done using the DESCRIPTIVES procedure.
The following are the available summary statistics:
sample size, mean, minimum, maximum, standard
deviation, variance, range, sum, standard error of the
mean, and kurtosis and skewness with their standard
errors.
▪ In the summary table, variables can be ordered by the
size of their means (in ascending or descending order),
alphabetically, or by the order in which we select the
variables (the default).
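The kind of table this produces, one row of summary statistics per variable with a choice of row order, can be sketched in Python. The variable names and values are hypothetical, only some of the listed statistics are computed, and `describe` is an illustrative helper, not the DESCRIPTIVES procedure itself.

```python
import statistics

def describe(values):
    """Return a few of the summary statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "sum": sum(values),
        "std_dev": statistics.stdev(values),
    }

# Hypothetical variables, listed in the order they were "selected".
data = {
    "age": [25, 32, 47, 51],
    "income": [30, 45, 60, 85],
    "debt": [1.5, 2.0, 0.5, 4.0],
}

table = {name: describe(values) for name, values in data.items()}

selection_order = list(table)                                   # the default
by_ascending_mean = sorted(table, key=lambda name: table[name]["mean"])
alphabetical = sorted(table)

for name in by_ascending_mean:
    row = table[name]
    print(f"{name:<8} n={row['n']} mean={row['mean']:.2f} "
          f"min={row['min']} max={row['max']} sd={row['std_dev']:.2f}")
```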
1. Click on Analyze (Menu Bar).
2. Click on Descriptive Statistics.
3. Click on Descriptives. (The “Descriptives” dialogue
box will appear)
4. Select the variable(s) for which you want to
produce summary statistics by highlighting them.
5. Click on the right arrow button. (the selected
variable(s) will be transferred to the right hand list
box)
6. Click on OK.
HANDS ON EXERCISES
▪ Open the data on
customer_dbase.sav/client_dbase1.sav
I. Generate frequency tables and appropriate
summary statistics for the following variables. Round
off percentages to 2 decimal places. Interpret your
results.
a. geographic indicator (region)
b. age category
c. job category
d. level of education
e. job satisfaction
Based on the results,
1. Which age group has the most respondents? The fewest?
2. What percent had a college degree?
3. What job(s) are the majority of the respondents engaged in?
4. How many of the respondents are at least somewhat satisfied with
their jobs?
5. Which age group has the fewest respondents with some high school
education? The most post-undergraduate degree
holders?
6. In terms of level of education, who are the most satisfied with
their jobs? The most dissatisfied?
Computing Variables
Session 6
To Compute Variables
➢ From the menus, choose:
▪ Transform
▪ Compute Variable...
Result Treatment
▪ The following options are available for how the result is calculated:
▪ Truncate to integer. Any fractional portion of the result is ignored. For example,
subtracting 10/28/2006 from 10/21/2007 returns a result of 0 for years and 11 for months.
▪ Round to integer. The result is rounded to the closest integer. For example, subtracting
10/28/2006 from 10/21/2007 returns a result of 1 for years and 12 for months.
▪ Retain fractional part. The complete value is retained; no rounding or truncation is
applied. For example, subtracting 10/28/2006 from 10/21/2007 returns a result of 0.98 for years
and 11.76 for months.
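The three result treatments can be checked against the example dates with a short Python sketch. It assumes an average year of 365.25 days and an average month of one twelfth of that; this assumption reproduces the figures quoted above, but it is not confirmed here as the exact rule SPSS applies. The function name `date_difference` is hypothetical.

```python
import math
from datetime import date

DAYS_PER_YEAR = 365.25               # assumed average year length
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # assumed average month length

def date_difference(start, end, unit="years", treatment="retain"):
    """Difference between two dates in years or months, with a result treatment."""
    days = (end - start).days
    value = days / (DAYS_PER_YEAR if unit == "years" else DAYS_PER_MONTH)
    if treatment == "truncate":
        return math.trunc(value)     # drop the fractional part
    if treatment == "round":
        return round(value)          # closest integer
    return value                     # retain the fractional part

start, end = date(2006, 10, 28), date(2007, 10, 21)
for unit in ("years", "months"):
    print(unit,
          date_difference(start, end, unit, "truncate"),
          date_difference(start, end, unit, "round"),
          round(date_difference(start, end, unit, "retain"), 2))
```

The two dates are 358 days apart, just short of a full year, which is why truncation gives 0 years while rounding gives 1.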
Output
EX. Construct a frequency table with appropriate labels:
0 - did not finish college; 1 - finished college.
Round percentages to the nearest hundredth.
Example 3
HANDS ON EXERCISES
Recoding Variables
Session 7
Recoding Values
▪ You can modify data values by recoding them. This is
particularly useful for collapsing or combining
categories. You can recode the values within existing
variables, or you can create new variables based on
the recoded values of existing variables.
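Collapsing categories into a new variable can be sketched in Python as a simple lookup. The education codes and the mapping below are hypothetical, and `recode` is an illustrative helper; the original variable is left untouched, which mirrors recoding into a different variable.

```python
def recode(values, mapping, default=None):
    """Return a new list in which each old value is replaced by its recoded value."""
    return [mapping.get(value, default) for value in values]

# Hypothetical education codes: 1-3 = did not finish college, 4-5 = finished college.
education = [1, 2, 3, 4, 5, 3, 2]
collapse = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}

# New variable; `education` itself is not modified.
finished_college = recode(education, collapse)
print(finished_college)
```

Values with no entry in the mapping receive the `default`, which plays the role of a missing value in this sketch.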
EXAMPLE:
HANDS ON EXERCISE
Tests of Hypotheses
▪ T-test
▪ Analysis of Variance
▪ Chi-square test
▪ Correlations
Non-parametric statistics
▪ If a distribution deviates markedly from normality then you take the
risk that the statistic will be inaccurate. The safest thing to do is to
use an equivalent non-parametric statistic.
Parametric Assumptions
▪ The observations must be independent
▪ The observations must be drawn from normally
distributed populations
▪ These populations must have the same
variances
▪ The means of these normal populations must
be linear combinations of effects due to
columns and/or rows
Nonparametric Assumptions
Nonparametric Methods
Parametric tests and their nonparametric counterparts:
▪ Two samples – compare mean value for some variable of interest
Parametric: t-test for independent samples
Nonparametric: Mann-Whitney U test; Kolmogorov-Smirnov two sample test
▪ Multiple groups
Parametric: Analysis of variance (ANOVA/MANOVA)
Nonparametric: Kruskal-Wallis analysis of ranks; Median test
▪ Two variables of interest are categorical
Parametric: Correlation coefficient
Nonparametric: Spearman R; Kendall Tau; Coefficient Gamma; Chi square;
Phi coefficient; Kendall coefficient of concordance
The t-test
For numeric grouping variables, define the two groups for the t test by
specifying two values or a cutpoint:
Use specified values. Enter a value for Group 1 and another value
for Group 2. Cases with any other values are excluded from the analysis.
Numbers need not be integers (for example, 6.25 and 12.5 are valid).
Cutpoint. Enter a number that splits the values of the grouping
variable into two sets. All cases with values that are less than the cutpoint
form one group, and cases with values that are greater than or equal to
the cutpoint form the other group.
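The two ways of defining groups can be sketched in Python as follows. Each case is a hypothetical pair of (grouping value, outcome value), and the function names are illustrative only.

```python
def split_by_values(cases, group1, group2):
    """Use specified values: cases with any other grouping value are excluded."""
    first = [outcome for group, outcome in cases if group == group1]
    second = [outcome for group, outcome in cases if group == group2]
    return first, second

def split_by_cutpoint(cases, cutpoint):
    """Cutpoint: values below it form one group, values at or above it the other."""
    below = [outcome for group, outcome in cases if group < cutpoint]
    at_or_above = [outcome for group, outcome in cases if group >= cutpoint]
    return below, at_or_above

# Hypothetical cases: (grouping value, outcome value).
cases = [(1, 30.0), (2, 42.5), (3, 38.0), (1, 27.5), (2, 45.0), (4, 51.0)]

by_values = split_by_values(cases, 1, 2)    # cases coded 3 and 4 are dropped
by_cutpoint = split_by_cutpoint(cases, 3)   # every case lands in one of the groups
print(by_values)
print(by_cutpoint)
```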
Example
▪ Is there a significant difference in the current salary of
the respondents when they are grouped according to
gender?
▪ Considering normality
▪ Not considering normality
Output File
One-Way ANOVA
▪ The One-Way ANOVA procedure produces a one-way analysis of
variance for a quantitative dependent variable by a single factor
(independent) variable. Analysis of variance is used to test the hypothesis
that several means are equal. This technique is an extension of the two-
sample t test.
▪ In addition to determining that differences exist among the means, you
may want to know which means differ. There are two types of tests for
comparing means: a priori contrasts and post hoc tests. Contrasts are
tests set up before running the experiment, and post hoc tests are run
after the experiment has been conducted. You can also test for trends
across categories.
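The F statistic behind the test that several means are equal can be computed by hand, as in this Python sketch. The three groups are hypothetical, and the sketch stops at the F ratio and its degrees of freedom; obtaining the significance value requires the F distribution, which is not computed here.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.fmean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (statistics.fmean(group) - grand_mean) ** 2
                     for group in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((value - statistics.fmean(group)) ** 2
                    for group in groups for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical outcome values for three levels of a single factor.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_anova_f(groups)
print(round(f_ratio, 2), df_between, df_within)
```

A large F ratio means the group means differ by more than the spread within the groups would suggest.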
Output file
▪ The value for Sig. (the probability value) is less than 0.05; hence,
there is a significant difference.
Bivariate Correlations
Computing Correlations
**USE employee data.
Output file
Chi-Square Test
▪ The Chi-Square Test procedure tabulates a variable
into categories and computes a chi-square statistic
based on the differences between observed and
expected frequencies.
▪ This goodness-of-fit test compares the observed and
expected frequencies in each category to test either that
all categories contain the same proportion of values or
that each category contains a user-specified
proportion of values.
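The goodness-of-fit statistic can be computed directly from the observed counts, as in this Python sketch. The counts and proportions are hypothetical, and `chi_square_gof` is an illustrative helper; it returns the statistic and degrees of freedom only, not the significance value.

```python
def chi_square_gof(observed, expected_proportions=None):
    """Return (chi-square statistic, degrees of freedom) for observed category counts."""
    total = sum(observed)
    k = len(observed)
    if expected_proportions is None:
        # Default hypothesis: all categories contain the same proportion of values.
        expected_proportions = [1 / k] * k
    expected = [total * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, k - 1

# Hypothetical counts for four categories.
observed = [18, 22, 20, 40]

equal_stat, df = chi_square_gof(observed)
custom_stat, _ = chi_square_gof(observed, [0.2, 0.2, 0.2, 0.4])

# With 3 degrees of freedom, the 0.05 critical value is about 7.815.
print(round(equal_stat, 2), df)
print(round(custom_stat, 2))
```

With these counts, the equal-proportions hypothesis gives a statistic above the 0.05 critical value, while the user-specified proportions fit the data closely.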
Output file
▪ The value for Asymp. Sig. (the probability value) is less than 0.05;
hence, there is a significant relationship.
HANDS ON EXERCISE
▪ **USE data on customer_dbase
1. Is there a significant difference in the household income
of the respondents when grouped according to
a. gender?
b. level of education?
4. Is there a significant relationship between job category
and level of education?
5. Is there a significant correlation between household
income and credit card debt?
(Plonskey, 2001)
THANK YOU!