
Chapter Fifteen

Overview of the Data Analysis Procedure

Validation & Editing → Coding → Data Entry → Logical Cleaning of Data → Tabulation & Statistical Analysis

Key Terms & Definitions
Overview of the Data Analysis Procedure
Step One:
• Validation: Process of ascertaining that interviews actually were conducted as specified.
• Editing: Process of ascertaining that questionnaires were filled out properly and completely.

Step Two:
• Coding: Process of grouping and assigning numeric codes to the various responses to a question.

Step Three:
• Data Entry: Process of converting information to an electronic format.
• Intelligent Data Entry: Form of data entry in which the information being entered into the data
entry device is checked for internal logic.
• Scanning: Form of data entry in which responses on questionnaires are read in automatically by
the data entry device.

Step Four:
• Clean the Data: Check for data entry errors or data entry inconsistencies.
• Logical or Machine Cleaning: Final computerized error check of data.
• Error Check Routines: Computer programs that accept instructions from the user to
check for logical errors in the data.
• Marginal Report: Computer-generated table of the frequencies of the responses to each
question, used to monitor entry of valid codes and correct use of skip patterns.
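
A marginal report can be sketched in a few lines. The example below is only an illustration under stated assumptions: the question names, the valid code sets, and the records are hypothetical, and `None` is assumed to mark a question that was legitimately skipped.

```python
from collections import Counter

def marginal_report(records, valid_codes):
    """Count the responses to each question and list codes outside the valid set."""
    report = {}
    for question, allowed in valid_codes.items():
        counts = Counter(rec.get(question) for rec in records)
        invalid = {code: n for code, n in counts.items() if code not in allowed}
        report[question] = {"counts": dict(counts), "invalid": invalid}
    return report

# Hypothetical data: Q1 allows codes 1-2; Q2 allows codes 1-3 or a skip (None).
records = [
    {"Q1": 1, "Q2": 2},
    {"Q1": 2, "Q2": None},
    {"Q1": 7, "Q2": 3},  # 7 is a keying error
]
valid_codes = {"Q1": {1, 2}, "Q2": {1, 2, 3, None}}

report = marginal_report(records, valid_codes)
print(report["Q1"]["invalid"])
```

Any non-empty `invalid` entry points to a record that should be checked against the original questionnaire.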

Step Five:
• Data Analysis, e.g. Frequency Tables: Table showing the number of respondents choosing each answer to a survey question.
• Cross Tabulation Tables: Examination of the responses to one question relative to the responses to one or more other questions.
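
As a rough sketch of what a cross tabulation counts, the snippet below tallies every combination of two answers. The variable names and the four responses are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

def cross_tab(rows, row_var, col_var):
    """Return a Counter keyed by (row_value, col_value) pairs."""
    return Counter((r[row_var], r[col_var]) for r in rows)

# Hypothetical responses to two questions.
rows = [
    {"gender": "F", "attended": "yes"},
    {"gender": "F", "attended": "no"},
    {"gender": "M", "attended": "yes"},
    {"gender": "M", "attended": "yes"},
]

table = cross_tab(rows, "gender", "attended")
for cell, count in sorted(table.items()):
    print(cell, count)
```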

The Coding Process

1. List responses
2. Consolidate responses
3. Set codes
4. Enter codes
   a. Read responses to individual open-ended questions on questionnaires
   b. Match individual responses with consolidated list of response categories
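
The four steps above can be sketched as follows. This is a simplified illustration, not a prescribed method: the categories, keywords, numeric codes, and the "other" code 99 are all hypothetical, and simple keyword matching stands in for the judgment a human coder would apply.

```python
# Steps 1-2: listed responses consolidated into categories (hypothetical).
CATEGORY_KEYWORDS = {
    "price": ["cheap", "price", "expensive"],
    "location": ["close", "near", "location"],
}
# Step 3: one numeric code per category; 99 is a hypothetical "other" code.
CODES = {"price": 1, "location": 2}
OTHER = 99

def code_response(text):
    """Step 4: match one open-ended response to a category and return its code."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return CODES[category]
    return OTHER

responses = ["Tickets are cheap", "It is near my home", "I like the seats"]
coded = [code_response(r) for r in responses]
print(coded)
```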

Frequency Table

Among 500 respondents, 50 did not attend a movie theatre in the past year. This corresponds to 10% of the sample. The remaining 90% of the sample attended a movie theatre at least once in the past year.
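
The percentages above can be reproduced with a minimal frequency-table sketch. Only the 50 / 450 split comes from the slide; the answer labels are illustrative.

```python
from collections import Counter

def frequency_table(responses):
    """Return {answer: (count, percent of all responses)}."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: (n, 100.0 * n / total) for answer, n in counts.items()}

# 500 respondents: 50 never attended, 450 attended at least once.
responses = ["never"] * 50 + ["at least once"] * 450

table = frequency_table(responses)
for answer, (count, percent) in table.items():
    print(f"{answer}: {count} ({percent:.0f}%)")
```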
Bar Chart
Pie Chart
Cross Tabulation: Example 1

Cross Tabulation

Ingredients: two categorical variables

The expected value for each cell is (row total × column total) / overall total.
Null hypothesis: Assumes that there is no association between the two variables.
Alternative hypothesis: Assumes that there is an association between the two
variables.
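
To make the expected-value formula concrete, the sketch below computes the expected counts for a table of observed counts and then the chi-square statistic, the sum of (observed − expected)² / expected over all cells. The 2×2 table is hypothetical. In practice a statistics package would also report the p-value used to decide between the two hypotheses.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2x2 table of observed counts.
observed = [[30, 20], [20, 30]]

expected = expected_counts(observed)
stat = chi_square(observed)
print(expected, stat)
```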

We cannot reject the null hypothesis: there is no evidence of an association between gender and movie attendance.

When the expected frequency is less than 5 in more than 20% of the cells, the chi-square test is inappropriate.
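
This check is easy to automate. The sketch below is one possible way to do it, using a hypothetical 2×2 table chosen so that one cell in four has an expected count below 5.

```python
def share_of_small_expected(observed, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    expected = [r * c / total for r in row_totals for c in col_totals]
    return sum(e < threshold for e in expected) / len(expected)

# Hypothetical observed counts; expected counts are 3, 7, 12 and 28.
observed = [[3, 7], [12, 28]]

share = share_of_small_expected(observed)
chi_square_appropriate = share <= 0.20
print(share, chi_square_appropriate)
```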
Cross Tabulation: Example 2
We reject the null hypothesis because p < 0.05 and conclude that there is an association between gender and money spent on things at the theatre.

However, the chi-square test is not appropriate because 25% of the cells have an expected count less than 5.
Descriptive Statistics

Mean:
• The sum of the values for all observations of a variable divided by the number of observations.

Median:
• Value below which 50 percent of the observations fall.

Mode:
• The value that occurs most frequently.
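
These three measures are available in Python's standard `statistics` module. The values below are hypothetical and chosen so that the results are easy to verify by hand.

```python
import statistics

# Hypothetical observations of one variable.
values = [2, 5, 5, 8, 10]

mean = statistics.mean(values)      # (2 + 5 + 5 + 8 + 10) / 5
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)
```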

Measures of Dispersion
These measures indicate how spread out the data are:

Range:
• The maximum value for a variable minus the minimum value for that variable.

Variance:
• The sum of the squared deviations from the mean divided by the number of observations minus one.
• The same formula as the standard deviation, without the final square root.

Standard Deviation:
• Measure of dispersion calculated by subtracting the mean of the series from each value in a series, squaring each result, summing the results, dividing the sum by the number of items minus 1, and taking the square root of this value.
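
The definitions above translate almost word for word into code. The sketch below follows them step by step on a small hypothetical series; the function names are illustrative.

```python
import math

def value_range(values):
    """Maximum value minus minimum value."""
    return max(values) - min(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical series: mean 3, squared deviations 4 + 0 + 4 = 8, 8 / 2 = 4.
values = [1, 3, 5]
print(value_range(values), sample_variance(values), sample_std(values))
```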
Standard Deviation: Example

1) If you are told that the average starting salary for someone working at
Company Statistix is $70,000, you may think, “Wow! That’s great.” If the
standard deviation is $20,000, what would you think? How about $5,000?

2) Without the standard deviation, you can’t compare two data sets effectively.
Suppose two sets of data have the same average; does that mean that the
data sets must be exactly the same?
Not at all. For example, the data sets 199, 200, 201 and 0, 200, 400 both have the
same average (200) yet they have very different standard deviations. The first data
set has a very small standard deviation (s=1) compared to the second data set
(s=200).
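
The two data sets from this example can be checked with the standard `statistics` module, whose `stdev` function uses the same "number of items minus 1" denominator as the definition given earlier.

```python
import statistics

narrow = [199, 200, 201]
wide = [0, 200, 400]

# Both data sets share the same mean...
print(statistics.mean(narrow), statistics.mean(wide))

# ...but their sample standard deviations differ greatly.
s_narrow = statistics.stdev(narrow)
s_wide = statistics.stdev(wide)
print(s_narrow, s_wide)
```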
Normal Distribution
Descriptive Statistics
Descriptive statistics are the most efficient means of summarizing the characteristics of large sets of data. In a statistical analysis, the analyst calculates one number or a few numbers that reveal something about the characteristics of large sets of data.
Significant discrepancies between the “Mean” and the “Median” should cause you to
look further into the data.

Years in Business

Mean 22.4
Standard Error 2.6
Median 15.0
Mode 5.0
Standard Deviation 23.1
Sample Variance 534.5
Kurtosis 3.8
Skewness 2.1
Range 98.0
Minimum 2.0
Maximum 100.0
Sum 1770.5
Count 79.0
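
Several figures in this table can be cross-checked against one another. The sketch below assumes the usual relationships, which the slide itself does not state: mean = sum / count, and standard error of the mean = standard deviation / √count. Small differences are expected because the table shows rounded values.

```python
import math

# Values copied from the "Years in Business" table.
total = 1770.5
count = 79
std_dev = 23.1

mean = total / count                        # compare with the reported 22.4
standard_error = std_dev / math.sqrt(count)  # compare with the reported 2.6

print(round(mean, 1), round(standard_error, 1))
```

The mean (22.4) lies well above the median (15.0), which is consistent with the positive skewness the table reports.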

Descriptive Statistics: Example
Descriptive Statistics
Ingredient: Scale variable

Rule of thumb: If -1.96 < Skewness / std. err. < +1.96 and -1.96 < Kurtosis / std. err. < +1.96, the distribution can be treated as approximately normal.
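
The rule of thumb can be written as a small helper. This is only a sketch: the function name is illustrative, and the standard errors in the usage lines are hypothetical, since the slides do not report them.

```python
def looks_normal(skewness, se_skewness, kurtosis, se_kurtosis, limit=1.96):
    """Apply the rule of thumb: both ratios must lie strictly within +/- limit."""
    z_skew = skewness / se_skewness
    z_kurt = kurtosis / se_kurtosis
    return abs(z_skew) < limit and abs(z_kurt) < limit

# Hypothetical statistics and standard errors.
mild = looks_normal(0.30, 0.27, -0.50, 0.54)

# Skewness 2.1 and kurtosis 3.8 with the same hypothetical standard errors.
strong = looks_normal(2.1, 0.27, 3.8, 0.54)

print(mild, strong)
```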
Descriptive Statistics: Explore

Ingredients:
• 1 scale variable
• 1 categorical variable

It is the mean when 5% of the most extreme values (the outliers) are taken out.
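
A trimmed mean of this kind can be sketched as follows. Two assumptions are made here that the slide does not confirm: 5% of the cases are dropped from each end of the sorted data, and only whole cases are dropped. Statistical packages may handle fractional cases differently.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the cases from each tail (assumption)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # whole cases dropped per tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

# Hypothetical data: 19 ordinary values and one extreme outlier.
values = list(range(1, 20)) + [1000]

plain = sum(values) / len(values)
trimmed = trimmed_mean(values)
print(plain, trimmed)
```

With the outlier removed, the trimmed mean sits much closer to the bulk of the data than the plain mean does.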
