CHAPTER 12
TOPICS DISCUSSED
GETTING DATA READY FOR ANALYSIS
• Editing Data
• Handling Blank Responses
• Coding
• Categorizing
• Entering Data
DATA ANALYSIS
• Basic Objectives in Data Analysis
• Feel for the Data
• Testing Goodness of Data
• Hypothesis Testing
DATA ANALYSIS AND INTERPRETATION
• Use of Several Data-Analytic Techniques
• Descriptive Statistics
• Inferential Statistics
SOME SOFTWARE PACKAGES USEFUL FOR DATA ANALYSIS
USE OF EXPERT SYSTEMS IN SELECTING THE APPROPRIATE STATISTICAL TESTS
CHAPTER OBJECTIVES
After completing Chapter 12 you should be able to:
1. Edit questionnaire and interview responses.
2. Handle blank responses.
3. Set up the coding key for the data set and code the data.
4. Categorize data.
5. Create a data file.
6. Use SPSS, Excel, SAS, or another software program for data entry and data analysis.
7. Get a "feel" for the data.
8. Test the goodness of data.
9. Interpret the computer results of tests of various hypotheses.
After data have been collected from a representative sample of the population, the next step is to analyze them to test the research hypotheses. Data analysis is now routinely done with software programs such as SPSS, SAS, STATPAK, SYSTAT, Excel, and the like. All are user-friendly and interactive and have the capability to seamlessly interface with different databases. Excellent graphs and charts can also be produced through most of these software programs. Some of the charts generated from Excel's Chart Wizard may be seen in the next chapter.
However, before we start analyzing the data to test hypotheses, some preliminary steps need to be completed. These help to ensure that the data are reasonably good and of assured quality for further analysis. Figure 12.1 identifies the four steps in data analysis: (1) getting data ready for analysis, (2) getting a feel for the data, (3) testing the goodness of data, and (4) testing the hypotheses. We will now examine each of these steps.
Editing Data
Any editing of the data should be done with the use of a different color pencil or ink, so that the original information is still available in case of further doubts later.
Incoming mailed questionnaire data have to be checked for incompleteness and inconsistencies, if any, by designated members of the research staff. Inconsistencies that can be logically corrected should be rectified and edited at this stage. For instance, the respondent might have inadvertently not answered the question on a questionnaire asking whether or not she is married. Against the column asking for the number of years married, she might have responded 12 years; in the number of children column, she might have marked 2; and for ages of children, she might have answered 8 and 4. The latter three responses would indicate that the respondent is in all probability married. The unfilled response to the marital status question could then be edited by the researcher to read "yes." It is, however, possible that the respondent deliberately omitted responding to the item because she has lately been separated or widowed, or for some other reason. If such were the case, we would be introducing a bias in the data by editing the response to read "yes." Hence, whenever possible, it would be better to follow up with the respondent and get the correct data while editing. The example we gave is a clear case for editing, but some others may not be so simple, or omissions could go unnoticed and not be rectified. There may be other biases that could affect the goodness of the data, over which the researcher has no control. The validity and the replicability of the study could thus be impaired.
As indicated in Chapter 10 under "Data Collection Methods," much of the editing is automatically taken care of in the case of computer-assisted telephone interviews and electronically administered questionnaires, even as the respondent is answering the questions.
Coding
The next step is to code the responses. In Chapter 10, we discussed the
convenience of using scanner sheets for collecting questionnaire data;
such sheets facilitate the entry of the responses directly into the
computer without manual keying in of the data. However, if for
whatever reason this cannot be done, then it is perhaps better to use a
coding sheet first to transcribe the data from the questionnaire and then
key in the data. This method, in contrast to flipping through each
questionnaire for each item, avoids confusion, especially when there are
many questions and a large number of questionnaires as well. The
easiest way to illustrate a coding scheme is through an example. Let us
take the correct answer to Exercise 10.4 in Chapter 10—the
questionnaire design exercise to test the job involvement–job
satisfaction hypothesis in the Serakan Co. case—and see how it can be
coded.
Table 12.1
Coding of Serakan Co. Questionnaire
Here are some questions that ask you to tell us how you experience your
work life in general. Please circle the appropriate number on the scales
below.
To what extent would you agree with the following statements, on a
scale of 1 to 7, 1 denoting very low agreement, and 7 denoting very high
agreement?
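Although the chapter illustrates coding with SPSS in mind, the idea of a coding key can also be sketched in a general-purpose language such as Python. The variable names, column positions, and codes below are hypothetical illustrations, not the actual entries of Table 12.1:

# A hypothetical coding key: each item is mapped to a variable name,
# the column it occupies in the data file, and its legal codes.
coding_key = {
    "gender":     {"column": 1, "codes": {1: "male", 2: "female"}},
    "work_shift": {"column": 2, "codes": {1: "first", 2: "second", 3: "third"}},
    "involve_1":  {"column": 3, "codes": range(1, 8)},  # 7-point Likert item
}

def is_valid(item, value):
    # Check a keyed-in value against the legal codes for that item.
    return value in coding_key[item]["codes"]

print(is_valid("involve_1", 8))  # False: 8 is out of range on a 7-point scale

Keeping the key in one place makes it easy to check every keyed-in value against its legal range before analysis begins.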
Categorization
At this point it is useful to set up a scheme for categorizing the variables
such that the several items measuring a concept are all grouped
together. Responses to some of the negatively worded questions have
also to be reversed so that all answers are in the same direction. Note
that with respect to negatively worded questions, a response of 7 on a 7-point scale, with 7 denoting "strongly agree," really means "strongly disagree," which actually is a 1 on the 7-point scale. Thus the item has to be reversed so as to be in the same direction as the positively worded questions. This can be done on the computer through a Transform and RECODE statement. In the Serakan Co. data, items 16 to 21 will have to be recoded such that scores of 7 are read as 1; 6 as 2; 5 as 3; 3 as 5; 2 as 6; and 1 as 7.
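Because the reversed score on a 1-to-7 scale is simply 8 minus the original score, the RECODE step is a one-liner in a general-purpose language. Here is a minimal sketch in Python; the item names are assumed for illustration:

import pandas as pd

# Toy responses to two negatively worded items (names assumed).
df = pd.DataFrame({"item16": [7, 5, 1], "item17": [2, 6, 4]})

for col in ["item16", "item17"]:
    df[col] = 8 - df[col]  # 7 becomes 1, 6 becomes 2, ..., 1 becomes 7

print(df)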
If the questions measuring a concept are not contiguous but scattered over various parts of the questionnaire, care has to be taken to include all the items without any omission or wrong inclusion.
Entering Data
If questionnaire data are not collected on scanner answer sheets, which can be directly entered into the computer as a data file, the raw data will have to be manually keyed into the computer. Raw data can be entered through any software program. For instance, the SPSS Data Editor, which looks like a spreadsheet, can be used to enter, edit, and view the contents of the data file. Each row of the editor represents a case, and each column represents a variable. All missing values will appear with a period (dot) in the cell. It is possible to add, change, or delete values easily after the data have been entered.
It is also easy to compute the new variables that have been categorized
earlier, using the Compute dialog box, which opens when the Transform
icon is chosen. Once the missing values, the recodes, and the computing
of new variables are taken care of, the data are ready for analysis.
DATA ANALYSIS
In the rest of this chapter, we will elaborate on the various statistical tests and the interpretation of the results of the analyses, using the SPSS Version 11.0 for Windows—a menu-driven software program. In the Appendix to this chapter, we also show the results of data analysis using Excel. Use of these two programs is illustrated mainly because they are easily available in business settings. It should be noted that any other software program can be used as well; it would produce similar results, which would be interpreted in the same manner.
Basic Objectives in Data Analysis
In data analysis we have three objectives: getting a feel for the data, testing the goodness of the data, and testing the hypotheses developed for the research. The feel for the data will give preliminary ideas of how good the scales are, how well the coding and entering of data have been done, and so on. Suppose an item tapped on a 7-point scale has been improperly coded and/or entered as 8; this will be highlighted by the maximum value in the descriptive statistics and the error can be rectified. The second objective—testing the goodness of the data—can be accomplished by submitting the data for factor analysis, obtaining Cronbach's alpha or the split-half reliability of the measures, and so on. The third objective—hypothesis testing—is achieved by choosing the appropriate menus of the software programs to test each of the hypotheses with the relevant statistical test. The results of these tests will determine whether or not the hypotheses are substantiated. We will now discuss data analysis with respect to each of these three objectives in detail.
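The out-of-range check described above is easy to automate. A minimal sketch in Python, with an assumed item name and toy data:

import pandas as pd

df = pd.DataFrame({"item5": [3, 7, 8, 2]})  # toy data; the 8 is an entry error

print(df["item5"].describe())  # a maximum of 8 flags a miscoded 7-point item

# List the offending cases so the questionnaires can be pulled and rectified.
print(df[(df["item5"] < 1) | (df["item5"] > 7)])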
A frequency distribution of the nominal variables of interest should be obtained. Visual displays thereof, through histograms/bar charts and so on, can also be provided through programs that generate charts. In addition to the frequency distributions and the means and standard deviations, it is good to know how the dependent and independent variables in the study are related to each other. For this purpose, an intercorrelation matrix of these variables should also be obtained.
It is always prudent to obtain (1) the frequency distributions for the demographic variables, (2) the mean, standard deviation, range, and variance on the other dependent and independent variables, and (3) an intercorrelation matrix of the variables, irrespective of whether or not the hypotheses are directly related to these analyses. These statistics give a feel for the data. In other words, examination of the measures of central tendency, and of how clustered or dispersed the variables are, gives a good idea of how well the questions were framed for tapping the concept. The correlation matrix will give an indication of how closely related or unrelated the variables under investigation are. If the correlation between two variables happens to be high—say, over .75—we would start to wonder whether they are really two different concepts, or whether they are measuring the same concept. If two variables that are theoretically stated to be related do not seem to be significantly correlated to each other in our sample, we would begin to wonder whether we have measured the concepts validly and reliably. Recall our discussions on convergent and discriminant validity in Chapter 10.
Establishing the goodness of data lends credibility to all subsequent analyses and findings. Hence, getting a feel for the data becomes the necessary first step in all data analysis. Based on this initial feel, further detailed analyses may be done to test the goodness of the data.
Reliability
Cronbach's alpha is a reliability coefficient that indicates how well the items measuring a concept hang together as a set. The closer Cronbach's alpha is to 1, the higher the internal consistency reliability.
Another measure of consistency reliability used in specific situations is the split-half reliability coefficient. Since this reflects the correlation between two halves of a set of items, the coefficients obtained will vary depending on how the scale is split. Sometimes split-half reliability is obtained to test for consistency when more than one scale, dimension, or factor is assessed. The items across each of the dimensions or factors are split, based on some predetermined logic (Campbell, 1976). In almost every case, Cronbach's alpha is an adequate test of internal consistency reliability. You will see later in this chapter how Cronbach's alpha is obtained through computer analysis.
As discussed in Chapter 9, the stability of a measure can be assessed
through parallel form reliability and test–retest reliability. When a
high correlation between two similar forms of a measure (see Chapter
9) is obtained, parallel form reliability is established. Test–retest
reliability can be established by computing the correlation between the
same tests administered at two different time periods.
Validity
Factorial validity can be established by submitting the data for factor
analysis. The results of factor analysis (a multivariate technique) will
confirm whether or not the theorized dimensions emerge. Recall from
Chapter 8 that measures are developed by first delineating the
dimensions so as to operationalize the concept. Factor analysis would
reveal whether the dimensions are indeed tapped by the items in the
measure, as theorized. Criterion-related validity can be established by
testing for the power of the measure to differentiate individuals who are
known to be different (refer to discussions regarding concurrent and
predictive validity in Chapter 9). Convergent validity can be established when there is a high degree of correlation between two different sources responding to the same measure (e.g., both supervisors and subordinates respond similarly to a perceived reward system measure administered to them). Discriminant validity can be established when two distinctly different concepts are not correlated to each other (as, for example, courage and honesty; leadership and motivation; attitudes and behavior). Convergent and discriminant validity can be established through the multitrait multimethod matrix, a full discussion of which is beyond the scope of this book. The student interested in knowing more about factor analysis and the multitrait multimethod matrix can refer to books on those subjects. When well-validated measures are used, there is no need, of course, to establish their validity again for each study. The reliability of the items can, however, be tested.
Hypothesis Testing
Once the data are ready for analysis (i.e., out-of-range and missing responses are cleaned up and the goodness of the measures is established), the researcher is ready to test the hypotheses already developed for the study. In the Module at the end of the book, the statistical tests that would be appropriate for different hypotheses and for data obtained on different scales are discussed. We will now examine the results of analyses of data obtained from a company, and how they are interpreted.
The data collected from the 174 respondents were entered into the computer. Thereafter, the data were submitted for analysis to test the following hypotheses, which were formulated by the researchers:
1. Men will perceive less equity than women (or women will perceive
more equity than men).
2. The job satisfaction of individuals will vary depending on the shift
they work.
3. Employees' intention to leave (ITL) will vary according to their job title. In other words, there will be significant differences in the ITL of top managers, middle-level managers, supervisors, and the clerical and blue-collar employees.
4. There will be a relationship between the shifts that people work (first,
second, and third shift) and the part-time versus full-time status of
employees. In other words, these two factors will not be independent.
5. The four independent variables of job characteristics, distributive
justice, burnout, and job satisfaction will significantly explain the
variance in intention to leave.
It may be pertinent to point out here that the five hypotheses derived
from the theoretical framework are particularly relevant for finding
answers to the turnover issue in direct and indirect ways. For example,
if men perceived more inequity (as could be conjectured from the
interview data), it would be important to set right their
(mis)perceptions so that they are less inclined to leave (if indeed a
positive correlation between perceived inequities and ITL is found). If
work shift has an influence on job satisfaction (irrespective of its
influence on ITL), the matter will have to be further examined since job
satisfaction is also an important outcome variable for the organization.
If employees at particular levels have greater intentions of leaving,
further information has to be gathered as to what can be done for these
groups. If there is a pattern to the part-time/full-time employees
working for particular shifts, this might offer some suggestions for
further investigation, such as: "Do part-time employees in the night shift have some special needs that are not addressed currently?" The
results of testing the last hypothesis will certainly offer insights into
how much of the variance in ITL will be explained by the four
independent variables, and what corrective action, if any, needs to be
taken.
The researcher submitted the data for computer analysis using the SPSS Version 11.0 for Windows software program. We will now proceed to discuss the results of these analyses and their interpretation. In particular, we will examine the following:
1. The establishment of Cronbach's alpha for the measures.
2. The frequency distribution of the variables.
3. Descriptive statistics such as the mean and standard deviation.
4. The Pearson correlation matrix.
5. The results of hypotheses testing.
It is important to note that all the negatively worded items in the
questionnaire should first be reversed before the items are submitted
for reliability tests. Unless all the items measuring a variable are in the
same direction, the reliabilities obtained will be incorrect.
Output 12.1
Reliability Analysis
1. From the menus, choose: Analyze
Scale
Reliability Analysis…
2. Select the variables constituting the scale.
3. Choose Model: Alpha.
Reliability Output
Reliability Coefficients 6 items
Alpha = .8172 Standardized item alpha = .8168
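SPSS reports the coefficient directly, but it can also be computed from its standard formula, alpha = (k / (k − 1)) × (1 − Σ item variances / variance of the total score), where k is the number of items. A sketch of this computation in Python, with toy data:

import numpy as np

def cronbach_alpha(items):
    # items: 2-D array with rows = respondents, columns = items of one scale.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: four respondents answering a three-item scale.
print(cronbach_alpha([[5, 4, 5], [3, 3, 4], [6, 5, 6], [2, 3, 2]]))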
Output 12.2
Frequencies
From the menus, choose: Analyze
Descriptive Statistics
Frequencies…
(Select the relevant variables)
Choose as needed:
Statistics…
Charts…
Format… (for the order in which the results are to be displayed)
About 21% of the respondents had worked for the organization for less than a year, 20% for 1 to 3 years, 20% for 4 to 6 years, and the balance (39%) for over 6 years, including 8% who had worked for over 20 years.
We thus have a profile of the employees in this organization, which is useful for describing the sample in the Methods section of the written report (see next chapter). The frequencies can also be visually displayed as bar charts, histograms, or pie charts by clicking on Statistics in the menu, then Summarize, then Frequencies, and Charts in the Frequencies dialog box, and then selecting the needed chart.
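The same frequency distribution can be sketched in Python; the tenure categories below are toy data for illustration:

import pandas as pd

tenure = pd.Series(["<1 yr", "1-3 yrs", "4-6 yrs", ">6 yrs",
                    ">6 yrs", "1-3 yrs", "<1 yr", ">6 yrs"])

print(tenure.value_counts())                      # raw frequencies
print(tenure.value_counts(normalize=True) * 100)  # percentages
# tenure.value_counts().plot(kind="bar") would draw the bar chart,
# provided matplotlib is installed.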
Output 12.3
Descriptive Statistics: Central Tendencies and Dispersions
From the menus, choose: Analyze
Descriptive Statistics
Descriptives…
(Select the variables)
Options…
(Choose the relevant statistics needed)
The variance for burnout, job satisfaction, and the job characteristics is
not high. The variance for ITL and perceived equity (distributive justice)
is only slightly more, indicating that most respondents are very close to
the mean on all the variables.
In sum, perceived equity is rather low, not much burnout is experienced, the job is perceived to be fairly enriched, there is average job satisfaction, and there is neither a strong intention to stay with the organization nor to leave it.
Inferential Statistics: Pearson Correlation
The Pearson correlation matrix obtained for the five interval-scaled variables is shown in Output 12.4. From the results, we see that the intention to leave is, as would be expected, significantly negatively correlated to perceived distributive justice (equity), job satisfaction, and an enriched job. That is, the intention to leave is low if equitable treatment and job satisfaction are experienced and the job is enriched. However, when individuals experience burnout (physical and emotional exhaustion), their intention to leave also increases (positive correlation of .33). Job satisfaction is also positively correlated to perceived equity and an enriched job, and negatively correlated to burnout and ITL. The correlations are all in the expected direction.
The Pearson correlation coefficient is appropriate for interval- and ratio-scaled variables, and the Spearman rank correlation or Kendall's tau coefficients are appropriate when variables are measured on an ordinal scale. Any bivariate correlation can be obtained by clicking the relevant menu, identifying the variables, and seeking the appropriate parametric or nonparametric statistics.
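Both kinds of correlation can be sketched in Python; the variable names and values below are illustrative only:

import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "equity": [2, 3, 4, 2, 5],
    "jobsat": [3, 4, 5, 2, 5],
    "itl":    [5, 4, 2, 5, 1],
})

print(df.corr(method="pearson"))   # for interval- and ratio-scaled variables
print(df.corr(method="spearman"))  # rank correlation, for ordinal variables
print(df.corr(method="kendall"))   # Kendall's tau, also for ordinal data

r, p = stats.pearsonr(df["jobsat"], df["itl"])  # one bivariate test, with p-value
print(r, p)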
It is important to note that no correlation exceeded .59 for this sample.
If correlations were higher (say, .75 and above), we might have had to
suspect whether or not the correlated variables are two different and
distinct variables and would have doubted the validity of the measures.
Hypothesis Testing
Five hypotheses were generated for this study, as stated earlier. These call for the use of a t-test (for hypothesis 1), an ANOVA (for hypotheses 2 and 3), a chi-square test (for hypothesis 4), and a multiple regression analysis (for hypothesis 5). The results of these tests and their interpretation are discussed below.
Output 12.4
Pearson Correlations Matrix
From the menus, choose: Analyze
Correlate
Bivariate…
(Select the relevant variables)
Options…
Select:
a. Type of correlation coefficient: the relevant one (e.g., Pearson, Kendall's tau, Spearman)
b. Test of significance: two-tailed or one-tailed
H1A: Women will perceive more equity than men (or men will perceive
less equity than women).
Statistically expressed, H1A is: μW > μM
A t-test will indicate whether perceived equity differs significantly between women and men. The results of the t-test are shown in Output 12.5. As may be seen, the difference in the means of 2.43 and 2.34, with standard deviations of .75 and .76, for the women and men on perceived equity (or distributive justice) is not significant (see the table showing the t-test for Equality of Means). Thus, hypothesis 1 is not substantiated.
Output 12.5
t-Test for Differences between Two Groups
(Independent Samples Test) Choose:
Analyze
Compare Means
Independent-Samples t Test…
Select a single grouping variable and click Define Groups to specify the two codes to be compared.
Options…
(Specify Confidence level required – .05, .01, etc.)
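The same test can be sketched in Python with scipy; the scores below are invented toy values, not the study's data (whose group means were 2.43 and 2.34):

from scipy import stats

# Toy perceived-equity scores for the two groups.
women = [2.1, 3.0, 2.5, 2.2, 2.6]
men = [2.0, 2.8, 2.3, 2.1, 2.5]

t, p = stats.ttest_ind(women, men, equal_var=False)  # Welch's t-test
print(t, p)  # a p-value above .05 means the means do not differ significantly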
Output 12.6
ANOVA
Choose: Analyze
Compare Means
One-Way ANOVA…
(Select the dependent variable/s and one independent factor variable)
Since there are more than two groups (three different shifts) and job satisfaction is measured on an interval scale, ANOVA is appropriate to test this hypothesis. The results of the ANOVA testing this hypothesis are shown in Output 12.6.
The df in the third column refers to the degrees of freedom; each source of variation has associated degrees of freedom. For the between-groups variance, df = (K – 1), where K is the total number of groups or levels. Because there were three shifts, we have (3 – 1) = 2 df. The df for the within-groups sum of squares equals (N – K), where N is the total number of respondents and K is the total number of groups. If there were no missing responses, (N – K) would be (174 – 3) = 171. However, in this case, there were 12 missing responses, and hence the associated df is (162 – 3) = 159.
The mean square for each source of variation (column 5 of the results)
is derived by dividing the sum of squares by its associated df. Finally, the
F value itself equals the explained mean square divided by the residual
mean square.
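A one-way ANOVA of this kind can be sketched in Python; the job-satisfaction scores for the three shifts below are toy values. scipy returns the F value and its p-value, with the degrees of freedom following the (K − 1) and (N − K) rules just described:

from scipy import stats

shift1 = [4.0, 3.5, 4.2, 3.8]
shift2 = [3.0, 2.8, 3.2, 2.9]
shift3 = [3.9, 4.1, 3.7, 4.0]

f, p = stats.f_oneway(shift1, shift2, shift3)
# Here K = 3 groups and N = 12 cases, so df = 2 between and 9 within groups.
print(f, p)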
H3o: The ITL of members at the five different job levels will be the same.
Statistically expressed, H3o is: μ1 = μ2 = μ3 = μ4 = μ5
where the five μ's represent the five means on ITL of employees at the five different job levels.
H3A: The ITL of members at the five different job levels will not be the same.
Statistically expressed, H3A is: μ1 ≠ μ2 ≠ μ3 ≠ μ4 ≠ μ5
The results of this ANOVA test, shown in Output 12.7, do not indicate any significant differences in the intention to leave among the five groups (F = 1.25; p = .29). Thus, hypothesis 3 was not substantiated.
Hypothesis 4: Use of Chi-Square Test. Hypothesis 4 can be stated in
the null and alternate as follows:
H4o: Shifts worked and employment status (part-time vs. full-time) will be independent (i.e., will not be related).
H4A: There will be a relationship between the shifts that people work
and their part-time vs. full-time status.
Since both variables are nominal, a chi-square (χ2) test was done; the results are shown in Output 12.8.
Output 12.7
ANOVA with ITL as the Dependent Variable
One-way ANOVA Output
Output 12.8
Chi-square Test
Choose: Analyze
Descriptive Statistics
Crosstabs…
(Enter variables in the Rows and Columns boxes)
Statistics…
Select Chi-square
The cross-tabulation count indicates that, of the full-time employees, 103 work the first shift, 25 the second shift, and 18 the third shift. Of the part-time employees, 16 work the first shift, 8 the second shift, and 4 the third shift.
It may be seen that the χ2 value of 2.31, with two degrees of freedom, is not significant. In other words, the part-time/full-time status and the shifts worked are not related. Hence, hypothesis 4 has not been substantiated.
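Because the cross-tabulation counts are reported in full, the chi-square statistic can be reproduced from them. A sketch in Python:

from scipy import stats

# Rows: full-time, part-time; columns: first, second, third shift (Output 12.8).
observed = [[103, 25, 18],
            [16, 8, 4]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, dof, p)  # about 2.31 with 2 df, not significant at the .05 level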
Hypothesis 5: Use of Multiple Regression Analysis. The last
hypothesis can be stated in the null and alternate as follows:
H5o: The four independent variables will not significantly explain the variance in intention to leave.
H5A: The four independent variables will significantly explain the
variance in intention to leave.
(The standardized beta coefficients indicate the relative contribution of each independent variable, the highest beta denoting the most important.) If we look at the column Beta under Standardized Coefficients, we see that the highest beta is –.37 for job satisfaction, which is significant at the .0001 level. It may also be seen that this is the only independent variable that is significant. The negative beta weight indicates that if ITL is to be reduced, it is necessary to enhance the job satisfaction of employees.
Overall Interpretation and Recommendations to the President
Of the five hypotheses tested, two were substantiated and three were not. From the results of the multiple regression analysis, it is clear that job satisfaction is the most influential factor in explaining employees' intention to stay with the organization. Whatever is done to increase job satisfaction will therefore help employees to think less about leaving and induce them to stay.
It is also clear from the results that ITL does not differ with job level. That is, employees at all levels feel neither too strongly inclined to stay with the organization nor to leave it. Hence, if retention of employees is a top priority for the president, it is important to pay attention to employees at all levels and formulate policies and practices that help enhance the job satisfaction of all of them. Also, since job satisfaction is found to be significantly lower for employees working the evening shift, further interviews with them might shed some light on the factors that make them dissatisfied. Corrective action can then be taken.
It is informative to find that the perceived equity, though not significantly different for men and women as originally hypothesized, is nevertheless rather low for all (see Output 12.3). The Pearson correlation matrix (Output 12.4) indicates that perceived equity (or distributive justice) is positively correlated to job satisfaction and negatively correlated to ITL.
Output 12.9
Multiple Regression Analysis
Choose: Analyze
Regression
Linear…
(Enter dependent and independent variables)
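The regression can be sketched in Python using statsmodels; standardizing the variables first makes the coefficients the beta weights discussed above. The data below are random placeholders, not the study's data:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(50, 5)),
    columns=["job_chars", "equity", "burnout", "jobsat", "itl"],
)

z = (df - df.mean()) / df.std()  # standardize to obtain beta weights
X = sm.add_constant(z[["job_chars", "equity", "burnout", "jobsat"]])
model = sm.OLS(z["itl"], X).fit()

print(model.rsquared)  # share of the variance in ITL explained (R-square)
print(model.params)    # standardized coefficients (betas)
print(model.pvalues)   # significance of each predictor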
The president would therefore be well advised to rectify inequities in the system, if they do really exist, or to clear misperceptions of inequities, if that is actually the case. Increasing job satisfaction will no doubt help to reduce employees' intention to quit, but the fact that only 30% of the variance in intention to leave was significantly explained by the four independent variables considered in this study leaves 70% unexplained. In other words, there are additional variables that are important in explaining ITL which have not been considered in this study. Further research might be necessary to explain more of the variance in ITL, if the president desires to pursue the matter further.
We have now seen how different hypotheses can be tested by applying the appropriate statistical tests in data analysis. Based on the interpretation of the results, the research report is then written, making the necessary recommendations and discussing the pros and cons of each, together with a cost/benefit analysis. Limitations to the study are also specifically stated, so that the reader is made aware of the biases that might have crept into the study. This also gives a professional touch to the study, attesting to its scientific orientation.
The Statistical Navigator is a useful guide for those who are not well versed in statistics but want to ensure that they use the appropriate statistical techniques.
Incidentally, Expert Systems can also be used for making decisions with respect to various aspects of the research design—nature of study, time horizon, type of study, study setting, unit of analysis, sampling designs, data collection methods, and the like.
Other applications of Expert Systems for business decisions using available data include Auditor (for decisions on allowing for bad debts) and Tax Advisor (which helps audit firms advise clients on estate planning). As suggested by Luconi, Malone, and Morton (1986), Expert Systems can be used for making decisions with respect to operational control (accounts receivable, inventory control, cash management, production scheduling), management control (budget analysis, forecasting, variance analysis, budget preparation), and strategic planning (warehouse and factory location, mergers and acquisitions, new product planning). Thus, there is infinite scope for developing and using expert systems to aid managerial problem solving and decision making.
the four categories of clientele he should choose to continue to serve in the future.
What kind of analysis should be done in the above case and why?
4. Below are Tables 12A to 12D, summarizing the results of data analyses of research conducted in a sales organization that operates in 50 different cities of the country and employs a total sales force of about 500. The number of salespersons sampled for the study was 150. You are to:
a. Interpret the information contained in each of the tables in as much detail as possible.
b. Summarize the results for the CEO of the company.
c. Make recommendations based on your interpretation of the results.