CHAPTER 12
TOPICS DISCUSSED
GETTING DATA READY FOR ANALYSIS
• Editing Data
• Handling Blank Responses
• Coding
• Categorizing
• Entering Data
DATA ANALYSIS
• Basic Objectives in Data Analysis
• Feel for the Data
• Testing Goodness of Data
• Hypothesis Testing
DATA ANALYSIS AND INTERPRETATION
• Use of Several Data-Analytic Techniques
• Descriptive Statistics
• Inferential Statistics
SOME SOFTWARE PACKAGES USEFUL FOR DATA ANALYSIS
USE OF EXPERT SYSTEMS IN SELECTING THE APPROPRIATE STATISTICAL TESTS
CHAPTER OBJECTIVES
After completing Chapter 12 you should be able to:
1. Edit questionnaire and interview responses.
2. Handle blank responses.
3. Set up the coding key for the data set and code the data.
4. Categorize data.
5. Create a data file.
6. Use SPSS, Excel, SAS, or another software program for data entry and data analysis.
7. Get a "feel" for the data.
8. Test the goodness of data.
9. Interpret the computer results of tests of various hypotheses.
After data have been collected from a representative sample of the population, the next step is to analyze them to test the research hypotheses. Data analysis is now routinely done with software programs such as SPSS, SAS, STATPAK, SYSTAT, Excel, and the like. All are user-friendly and interactive and have the capability to seamlessly interface with different databases. Excellent graphs and charts can also be produced through most of these software programs. Some of the charts generated from Excel's Chart Wizard may be seen in the next chapter.
However, before we start analyzing the data to test hypotheses, some preliminary steps need to be completed. These help to ensure that the data are reasonably good and of assured quality for further analysis. Figure 12.1 identifies the four steps in data analysis: (1) getting data ready for analysis, (2) getting a feel for the data, (3) testing the goodness of data, and (4) testing the hypotheses. We will now examine each of these steps.
Editing Data
Any editing of the data should be done with the use of a different color pencil or ink, so that the original information is still available in case of further doubts later.
Incoming mailed questionnaire data have to be checked for incompleteness and inconsistencies, if any, by designated members of the research staff. Inconsistencies that can be logically corrected should be rectified and edited at this stage. For instance, the respondent might have inadvertently not answered the question on a questionnaire asking whether or not she is married. Against the column asking for the number of years married, she might have responded 12 years; in the number of children column, she might have marked 2; and for ages of children, she might have answered 8 and 4. The latter three responses would indicate that the respondent is in all probability married. The unfilled response to the marital status question could then be edited by the researcher to read "yes." It is, however, possible that the respondent deliberately omitted responding to the item because she has lately been separated or widowed, or for some other reason. If such were the case, we would be introducing a bias in the data by editing the response to read "yes." Hence, whenever possible, it would be better to follow up with the respondent and get the correct data while editing. The example we gave is a clear case for editing, but some others may not be so simple, or omissions could go unnoticed and not be rectified. There may be other biases that could affect the goodness of the data, over which the researcher has no control. The validity and the replicability of the study could thus be impaired.
As indicated in Chapter 10 under "Data Collection Methods," much of the editing is automatically taken care of in the case of computer-assisted telephone interviews and electronically administered questionnaires, even as the respondent is answering the questions.
Coding
The next step is to code the responses. In Chapter 10, we discussed the
convenience of using scanner sheets for collecting questionnaire data;
such sheets facilitate the entry of the responses directly into the
computer without manual keying in of the data. However, if for
whatever reason this cannot be done, then it is perhaps better to use a
coding sheet first to transcribe the data from the questionnaire and then
key in the data. This method, in contrast to flipping through each
questionnaire for each item, avoids confusion, especially when there are
many questions and a large number of questionnaires as well. The
easiest way to illustrate a coding scheme is through an example. Let us
take the correct answer to Exercise 10.4 in Chapter 10—the
questionnaire design exercise to test the job involvement–job
satisfaction hypothesis in the Serakan Co. case—and see how it can be
coded.
Table 12.1
Coding of Serakan Co. Questionnaire
Here are some questions that ask you to tell us how you experience your
work life in general. Please circle the appropriate number on the scales
below.
To what extent would you agree with the following statements, on a
scale of 1 to 7, 1 denoting very low agreement, and 7 denoting very high
agreement?
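Although the chapter illustrates coding with SPSS in mind, the idea of a coding key can also be sketched in a general-purpose language such as Python. The variable names, column positions, and codes below are hypothetical illustrations, not the actual entries of Table 12.1:

# A hypothetical coding key: each item is mapped to a variable name,
# the column it occupies in the data file, and its legal codes.
coding_key = {
    "gender":     {"column": 1, "codes": {1: "male", 2: "female"}},
    "work_shift": {"column": 2, "codes": {1: "first", 2: "second", 3: "third"}},
    "involve_1":  {"column": 3, "codes": range(1, 8)},  # 7-point Likert item
}

def is_valid(item, value):
    # Check a keyed-in value against the legal codes for that item.
    return value in coding_key[item]["codes"]

print(is_valid("involve_1", 8))  # False: 8 is out of range on a 7-point scale

Keeping the key in one place makes it easy to check every keyed-in value against its legal range before analysis begins.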
Categorization
At this point it is useful to set up a scheme for categorizing the variables
such that the several items measuring a concept are all grouped
together. Responses to some of the negatively worded questions have
also to be reversed so that all answers are in the same direction. Note
that with respect to negatively worded questions, a response of 7 on a 7-point scale, with 7 denoting "strongly agree," really means "strongly disagree," which actually is a 1 on the 7-point scale. Thus the item has to be reversed so as to be in the same direction as the positively worded questions. This can be done on the computer through a Transform and RECODE statement. In the Serakan Co. data, items 16 to 21 will have to be recoded such that scores of 7 are read as 1; 6 as 2; 5 as 3; 3 as 5; 2 as 6; and 1 as 7.
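Because the reversed score on a 1-to-7 scale is simply 8 minus the original score, the RECODE step is a one-liner in a general-purpose language. Here is a minimal sketch in Python; the item names are assumed for illustration:

import pandas as pd

# Toy responses to two negatively worded items (names assumed).
df = pd.DataFrame({"item16": [7, 5, 1], "item17": [2, 6, 4]})

for col in ["item16", "item17"]:
    df[col] = 8 - df[col]  # 7 becomes 1, 6 becomes 2, ..., 1 becomes 7

print(df)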
If the questions measuring a concept are not contiguous but scattered over various parts of the questionnaire, care has to be taken to include all the items without any omission or wrong inclusion.
Entering Data
If questionnaire data are not collected on scanner answer sheets, which can be directly entered into the computer as a data file, the raw data will have to be manually keyed into the computer. Raw data can be entered through any software program. For instance, the SPSS Data Editor, which looks like a spreadsheet, can be used to enter, edit, and view the contents of the data file. Each row of the editor represents a case, and each column represents a variable. All missing values will appear with a period (dot) in the cell. It is possible to add, change, or delete values easily after the data have been entered.
It is also easy to compute the new variables that have been categorized
earlier, using the Compute dialog box, which opens when the Transform
icon is chosen. Once the missing values, the recodes, and the computing
of new variables are taken care of, the data are ready for analysis.
DATA ANALYSIS
In the rest of this chapter, we will elaborate on the various statistical tests and the interpretation of the results of the analyses, using the SPSS Version 11.0 for Windows—a menu-driven software program. In the Appendix to this chapter, we also show the results of data analysis using Excel. Use of these two programs is illustrated mainly because they are easily available in business settings. It should be noted that any other software program can be used as well; it would produce similar results, which would be interpreted in the same manner.
Basic Objectives in Data Analysis
In data analysis we have three objectives: getting a feel for the data, testing the goodness of the data, and testing the hypotheses developed for the research. The feel for the data will give preliminary ideas of how good the scales are, how well the coding and entering of data have been done, and so on. Suppose an item tapped on a 7-point scale has been improperly coded and/or entered as 8; this will be highlighted by the maximum value in the descriptive statistics and the error can be rectified. The second objective—testing the goodness of the data—can be accomplished by submitting the data for factor analysis, obtaining Cronbach's alpha or the split-half reliability of the measures, and so on. The third objective—hypothesis testing—is achieved by choosing the appropriate menus of the software programs to test each of the hypotheses with the relevant statistical test. The results of these tests will determine whether or not the hypotheses are substantiated. We will now discuss data analysis with respect to each of these three objectives in detail.
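The out-of-range check described above is easy to automate. A minimal sketch in Python, with an assumed item name and toy data:

import pandas as pd

df = pd.DataFrame({"item5": [3, 7, 8, 2]})  # toy data; the 8 is an entry error

print(df["item5"].describe())  # a maximum of 8 flags a miscoded 7-point item

# List the offending cases so the questionnaires can be pulled and rectified.
print(df[(df["item5"] < 1) | (df["item5"] > 7)])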
A frequency distribution of the nominal variables of interest should be obtained. Visual displays thereof, through histograms/bar charts and so on, can also be provided through programs that generate charts. In addition to the frequency distributions and the means and standard deviations, it is good to know how the dependent and independent variables in the study are related to each other. For this purpose, an intercorrelation matrix of these variables should also be obtained.
It is always prudent to obtain (1) the frequency distributions for the demographic variables, (2) the mean, standard deviation, range, and variance on the other dependent and independent variables, and (3) an intercorrelation matrix of the variables, irrespective of whether or not the hypotheses are directly related to these analyses. These statistics give a feel for the data. In other words, examination of the measures of central tendency, and of how clustered or dispersed the variables are, gives a good idea of how well the questions were framed for tapping the concept. The correlation matrix will give an indication of how closely related or unrelated the variables under investigation are. If the correlation between two variables happens to be high—say, over .75—we would start to wonder whether they are really two different concepts, or whether they are measuring the same concept. If two variables that are theoretically stated to be related do not seem to be significantly correlated to each other in our sample, we would begin to wonder whether we have measured the concepts validly and reliably. Recall our discussions on convergent and discriminant validity in Chapter 10.
Establishing the goodness of data lends credibility to all subsequent analyses and findings. Hence, getting a feel for the data becomes the necessary first step in all data analysis. Based on this initial feel, further detailed analyses may be done to test the goodness of the data.
Reliability
Cronbach's alpha is a reliability coefficient that indicates how well the items measuring a concept hang together as a set. The closer Cronbach's alpha is to 1, the higher the internal consistency reliability.
Another measure of consistency reliability used in specific situations is the split-half reliability coefficient. Since this reflects the correlation between two halves of a set of items, the coefficients obtained will vary depending on how the scale is split. Sometimes split-half reliability is obtained to test for consistency when more than one scale, dimension, or factor is assessed. The items across each of the dimensions or factors are split, based on some predetermined logic (Campbell, 1976). In almost every case, Cronbach's alpha is an adequate test of internal consistency reliability. You will see later in this chapter how Cronbach's alpha is obtained through computer analysis.
As discussed in Chapter 9, the stability of a measure can be assessed
through parallel form reliability and test–retest reliability. When a
high correlation between two similar forms of a measure (see Chapter
9) is obtained, parallel form reliability is established. Test–retest
reliability can be established by computing the correlation between the
same tests administered at two different time periods.
Validity
Factorial validity can be established by submitting the data for factor
analysis. The results of factor analysis (a multivariate technique) will
confirm whether or not the theorized dimensions emerge. Recall from
Chapter 8 that measures are developed by first delineating the
dimensions so as to operationalize the concept. Factor analysis would
reveal whether the dimensions are indeed tapped by the items in the
measure, as theorized. Criterion-related validity can be established by
testing for the power of the measure to differentiate individuals who are
known to be different (refer to discussions regarding concurrent and
predictive validity in Chapter 9). Convergent validity can be established when there is a high degree of correlation between two different sources responding to the same measure (e.g., both supervisors and subordinates respond similarly to a perceived reward system measure administered to them). Discriminant validity can be established when two distinctly different concepts are not correlated to each other (as, for example, courage and honesty; leadership and motivation; attitudes and behavior). Convergent and discriminant validity can be established through the multitrait multimethod matrix, a full discussion of which is beyond the scope of this book. The student interested in knowing more about factor analysis and the multitrait multimethod matrix can refer to books on those subjects. When well-validated measures are used, there is no need, of course, to establish their validity again for each study. The reliability of the items can, however, be tested.
Hypothesis Testing
Once the data are ready for analysis (i.e., out-of-range and missing responses are cleaned up and the goodness of the measures is established), the researcher is ready to test the hypotheses already developed for the study. In the Module at the end of the book, the statistical tests that would be appropriate for different hypotheses and for data obtained on different scales are discussed. We will now examine the results of analyses of data obtained from a company, and how they are interpreted.
The data collected from the 174 respondents were entered into the computer. Thereafter, the data were submitted for analysis to test the following hypotheses, which were formulated by the researchers:
1. Men will perceive less equity than women (or women will perceive
more equity than men).
2. The job satisfaction of individuals will vary depending on the shift
they work.
3. Employees' intention to leave (ITL) will vary according to their job title. In other words, there will be significant differences in the ITL of top managers, middle-level managers, supervisors, and the clerical and blue-collar employees.
4. There will be a relationship between the shifts that people work (first,
second, and third shift) and the part-time versus full-time status of
employees. In other words, these two factors will not be independent.
5. The four independent variables of job characteristics, distributive
justice, burnout, and job satisfaction will significantly explain the
variance in intention to leave.
It may be pertinent to point out here that the five hypotheses derived
from the theoretical framework are particularly relevant for finding
answers to the turnover issue in direct and indirect ways. For example,
if men perceived more inequity (as could be conjectured from the
interview data), it would be important to set right their
(mis)perceptions so that they are less inclined to leave (if indeed a
positive correlation between perceived inequities and ITL is found). If
work shift has an influence on job satisfaction (irrespective of its
influence on ITL), the matter will have to be further examined since job
satisfaction is also an important outcome variable for the organization.
If employees at particular levels have greater intentions of leaving,
further information has to be gathered as to what can be done for these
groups. If there is a pattern to the part-time/full-time employees
working for particular shifts, this might offer some suggestions for
further investigation, such as: "Do part-time employees in the night shift have some special needs that are not addressed currently?" The
results of testing the last hypothesis will certainly offer insights into
how much of the variance in ITL will be explained by the four
independent variables, and what corrective action, if any, needs to be
taken.
The researcher submitted the data for computer analysis using the SPSS Version 11.0 for Windows software program. We will now proceed to discuss the results of these analyses and their interpretation. In particular, we will examine the following:
1. The establishment of Cronbach's alpha for the measures.
2. The frequency distribution of the variables.
3. Descriptive statistics such as the mean and standard deviation.
4. The Pearson correlation matrix.
5. The results of hypotheses testing.
It is important to note that all the negatively worded items in the
questionnaire should first be reversed before the items are submitted
for reliability tests. Unless all the items measuring a variable are in the
same direction, the reliabilities obtained will be incorrect.
Output 12.1
Reliability Analysis
1. From the menus, choose: Analyze
Scale
Reliability Analysis…
2. Select the variables constituting the scale.
3. Choose Model: Alpha.
Reliability Output
Reliability Coefficients 6 items
Alpha = .8172 Standardized item alpha = .8168
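SPSS reports the coefficient directly, but it can also be computed from its standard formula, alpha = (k / (k − 1)) × (1 − Σ item variances / variance of the total score), where k is the number of items. A sketch of this computation in Python, with toy data:

import numpy as np

def cronbach_alpha(items):
    # items: 2-D array with rows = respondents, columns = items of one scale.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: four respondents answering a three-item scale.
print(cronbach_alpha([[5, 4, 5], [3, 3, 4], [6, 5, 6], [2, 3, 2]]))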
Output 12.2
Frequencies
From the menus, choose: Analyze
Descriptive Statistics
Frequencies…
(Select the relevant variables)
Choose as needed:
Statistics…
Charts…
Format… (for the order in which the results are to be displayed)
About 21% of the respondents had worked for the organization for less than a year, 20% for 1 to 3 years, 20% for 4 to 6 years, and the balance (39%) for over 6 years, including 8% who had worked for over 20 years.
We thus have a profile of the employees in this organization, which is useful for describing the sample in the Methods section of the written report (see next chapter). The frequencies can also be visually displayed as bar charts, histograms, or pie charts by clicking on Statistics in the menu, then Summarize, then Frequencies, and Charts in the Frequencies dialog box, and then selecting the needed chart.
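The same frequency distribution can be sketched in Python; the tenure categories below are toy data for illustration:

import pandas as pd

tenure = pd.Series(["<1 yr", "1-3 yrs", "4-6 yrs", ">6 yrs",
                    ">6 yrs", "1-3 yrs", "<1 yr", ">6 yrs"])

print(tenure.value_counts())                      # raw frequencies
print(tenure.value_counts(normalize=True) * 100)  # percentages
# tenure.value_counts().plot(kind="bar") would draw the bar chart,
# provided matplotlib is installed.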
Output 12.3
Descriptive Statistics: Central Tendencies and Dispersions
From the menus, choose: Analyze
Descriptive Statistics
Descriptives…
(Select the variables)
Options…
(Choose the relevant statistics needed)
The variance for burnout, job satisfaction, and the job characteristics is
not high. The variance for ITL and perceived equity (distributive justice)
is only slightly more, indicating that most respondents are very close to
the mean on all the variables.
In sum, perceived equity is rather low, not much burnout is experienced, the job is perceived to be fairly enriched, there is average job satisfaction, and there is neither a strong intention to stay with the organization nor to leave it.
Inferential Statistics: Pearson Correlation
The Pearson correlation matrix obtained for the five interval-scaled variables is shown in Output 12.4. From the results, we see that the intention to leave is, as would be expected, significantly negatively correlated to perceived distributive justice (equity), job satisfaction, and an enriched job. That is, the intention to leave is low if equitable treatment and job satisfaction are experienced and the job is enriched. However, when individuals experience burnout (physical and emotional exhaustion), their intention to leave also increases (positive correlation of .33). Job satisfaction is also positively correlated to perceived equity and an enriched job, and negatively correlated to burnout and ITL. The correlations are all in the expected direction.
The Pearson correlation coefficient is appropriate for interval- and ratio-scaled variables, and the Spearman rank correlation or Kendall's tau coefficients are appropriate when variables are measured on an ordinal scale. Any bivariate correlation can be obtained by clicking the relevant menu, identifying the variables, and seeking the appropriate parametric or nonparametric statistics.
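Both kinds of correlation can be sketched in Python; the variable names and values below are illustrative only:

import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "equity": [2, 3, 4, 2, 5],
    "jobsat": [3, 4, 5, 2, 5],
    "itl":    [5, 4, 2, 5, 1],
})

print(df.corr(method="pearson"))   # for interval- and ratio-scaled variables
print(df.corr(method="spearman"))  # rank correlation, for ordinal variables
print(df.corr(method="kendall"))   # Kendall's tau, also for ordinal data

r, p = stats.pearsonr(df["jobsat"], df["itl"])  # one bivariate test, with p-value
print(r, p)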
It is important to note that no correlation exceeded .59 for this sample.
If correlations were higher (say, .75 and above), we might have had to
suspect whether or not the correlated variables are two different and
distinct variables and would have doubted the validity of the measures.
Hypothesis Testing
Five hypotheses were generated for this study, as stated earlier. These call for the use of a t-test (for hypothesis 1), an ANOVA (for hypotheses 2 and 3), a chi-square test (for hypothesis 4), and a multiple regression analysis (for hypothesis 5). The results of these tests and their interpretation are discussed below.
Output 12.4
Pearson Correlations Matrix
From the menus, choose: Analyze
Correlate
Bivariate…
(Select the relevant variables)
Options…
Select:
a. Type of correlation coefficient: the relevant one (e.g., Pearson, Kendall's tau, Spearman)
b. Test of significance: two-tailed or one-tailed
H1A: Women will perceive more equity than men (or men will perceive
less equity than women).
Statistically expressed, H1A is: μW > μM
A t-test will indicate whether perceived equity differs significantly between women and men. The results of the t-test are shown in Output 12.5. As may be seen, the difference in the means of 2.43 and 2.34, with standard deviations of .75 and .76, for the women and men on perceived equity (or distributive justice) is not significant (see the table showing the t-test for Equality of Means). Thus, hypothesis 1 is not substantiated.
Output 12.5
t-Test for Differences between Two Groups
(Independent Samples Test) Choose:
Analyze
Compare Means
Independent-Samples t Test…
Select a single grouping variable and click Define Groups to specify the two codes to be compared.
Options…
(Specify Confidence level required – .05, .01, etc.)
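The same test can be sketched in Python with scipy; the scores below are invented toy values, not the study's data (whose group means were 2.43 and 2.34):

from scipy import stats

# Toy perceived-equity scores for the two groups.
women = [2.1, 3.0, 2.5, 2.2, 2.6]
men = [2.0, 2.8, 2.3, 2.1, 2.5]

t, p = stats.ttest_ind(women, men, equal_var=False)  # Welch's t-test
print(t, p)  # a p-value above .05 means the means do not differ significantly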
Output 12.6
ANOVA
Choose: Analyze
Compare Means
One-Way ANOVA…
(Select the dependent variable/s and one independent factor variable)
Since there are more than two groups (three different shifts) and job satisfaction is measured on an interval scale, ANOVA is appropriate to test this hypothesis. The results of the ANOVA testing this hypothesis are shown in Output 12.6.
The df in the third column refers to the degrees of freedom; each source of variation has associated degrees of freedom. For the between-groups variance, df = (K – 1), where K is the total number of groups or levels. Because there were three shifts, we have (3 – 1) = 2 df. The df for the within-groups sum of squares equals (N – K), where N is the total number of respondents and K is the total number of groups. If there were no missing responses, (N – K) would be (174 – 3) = 171. However, in this case, there were 12 missing responses, and hence the associated df is (162 – 3) = 159.
The mean square for each source of variation (column 5 of the results)
is derived by dividing the sum of squares by its associated df. Finally, the
F value itself equals the explained mean square divided by the residual
mean square.
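A one-way ANOVA of this kind can be sketched in Python; the job-satisfaction scores for the three shifts below are toy values. scipy returns the F value and its p-value, with the degrees of freedom following the (K − 1) and (N − K) rules just described:

from scipy import stats

shift1 = [4.0, 3.5, 4.2, 3.8]
shift2 = [3.0, 2.8, 3.2, 2.9]
shift3 = [3.9, 4.1, 3.7, 4.0]

f, p = stats.f_oneway(shift1, shift2, shift3)
# Here K = 3 groups and N = 12 cases, so df = 2 between and 9 within groups.
print(f, p)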
H3o: The ITL of members at the five different job levels will be the same.
Statistically expressed, H3o is: μ1 = μ2 = μ3 = μ4 = μ5
where the five μ's represent the five means on ITL of employees at the five different job levels.
H3A: The ITL of members at the five different job levels will not be the same.
Statistically expressed, H3A is: μ1 ≠ μ2 ≠ μ3 ≠ μ4 ≠ μ5
The results of this ANOVA test, shown in Output 12.7, do not indicate any significant differences in the intention to leave among the five groups (F = 1.25; p = .29). Thus, hypothesis 3 was not substantiated.
Hypothesis 4: Use of Chi-Square Test. Hypothesis 4 can be stated in
the null and alternate as follows:
H4o: Shifts worked and employment status (part-time vs. full-time) will be independent (i.e., will not be related).
H4A: There will be a relationship between the shifts that people work
and their part-time vs. full-time status.
Since both variables are nominal, a chi-square (χ2) test was done; the results are shown in Output 12.8.
Output 12.7
ANOVA with ITL as the Dependent Variable
One-way ANOVA Output
Output 12.8
Chi-square Test
Choose: Analyze
Descriptive Statistics
Crosstabs…
(Enter variables in the Rows and Columns boxes)
Statistics…
Select Chi-square
The cross-tabulation count indicates that, of the full-time employees, 103 work the first shift, 25 the second shift, and 18 the third shift. Of the part-time employees, 16 work the first shift, 8 the second shift, and 4 the third shift.
It may be seen that the χ2 value of 2.31, with two degrees of freedom, is not significant. In other words, the part-time/full-time status and the shifts worked are not related. Hence, hypothesis 4 has not been substantiated.
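Because the cross-tabulation counts are reported in full, the chi-square statistic can be reproduced from them. A sketch in Python:

from scipy import stats

# Rows: full-time, part-time; columns: first, second, third shift (Output 12.8).
observed = [[103, 25, 18],
            [16, 8, 4]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, dof, p)  # about 2.31 with 2 df, not significant at the .05 level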
Hypothesis 5: Use of Multiple Regression Analysis. The last
hypothesis can be stated in the null and alternate as follows:
H5o: The four independent variables will not significantly explain the variance in intention to leave.
H5A: The four independent variables will significantly explain the
variance in intention to leave.
(The standardized beta coefficients indicate the relative contribution of each independent variable, the highest beta denoting the most important.) If we look at the column Beta under Standardized Coefficients, we see that the highest beta is –.37 for job satisfaction, which is significant at the .0001 level. It may also be seen that this is the only independent variable that is significant. The negative beta weight indicates that if ITL is to be reduced, it is necessary to enhance the job satisfaction of employees.
Overall Interpretation and Recommendations to the President
Of the five hypotheses tested, two were substantiated and three were not. From the results of the multiple regression analysis, it is clear that job satisfaction is the most influential factor in explaining employees' intention to stay with the organization. Whatever is done to increase job satisfaction will therefore help employees to think less about leaving and induce them to stay.
It is also clear from the results that ITL does not differ with job level. That is, employees at all levels feel neither too strongly inclined to stay with the organization nor to leave it. Hence, if retention of employees is a top priority for the president, it is important to pay attention to employees at all levels and formulate policies and practices that help enhance the job satisfaction of all of them. Also, since job satisfaction is found to be significantly lower for employees working the evening shift, further interviews with them might shed some light on the factors that make them dissatisfied. Corrective action can then be taken.
It is informative to find that the perceived equity, though not significantly different for men and women as originally hypothesized, is nevertheless rather low for all (see Output 12.3). The Pearson correlation matrix (Output 12.4) indicates that perceived equity (or distributive justice) is positively correlated to job satisfaction and negatively correlated to ITL.
Output 12.9
Multiple Regression Analysis
Choose: Analyze
Regression
Linear…
(Enter dependent and independent variables)
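The regression can be sketched in Python using statsmodels; standardizing the variables first makes the coefficients the beta weights discussed above. The data below are random placeholders, not the study's data:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(50, 5)),
    columns=["job_chars", "equity", "burnout", "jobsat", "itl"],
)

z = (df - df.mean()) / df.std()  # standardize to obtain beta weights
X = sm.add_constant(z[["job_chars", "equity", "burnout", "jobsat"]])
model = sm.OLS(z["itl"], X).fit()

print(model.rsquared)  # share of the variance in ITL explained (R-square)
print(model.params)    # standardized coefficients (betas)
print(model.pvalues)   # significance of each predictor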
The president would therefore be well advised to rectify inequities in the system, if they do really exist, or to clear misperceptions of inequities, if that is actually the case. Increasing job satisfaction will no doubt help to reduce employees' intention to quit, but the fact that only 30% of the variance in intention to leave was significantly explained by the four independent variables considered in this study leaves 70% unexplained. In other words, there are additional variables that are important in explaining ITL which have not been considered in this study. Further research might be necessary to explain more of the variance in ITL, if the president desires to pursue the matter further.
We have now seen how different hypotheses can be tested by applying the appropriate statistical tests in data analysis. Based on the interpretation of the results, the research report is then written, making the necessary recommendations and discussing the pros and cons of each, together with a cost/benefit analysis. Limitations to the study are also specifically stated, so that the reader is made aware of the biases that might have crept into the study. This also gives a professional touch to the study, attesting to its scientific orientation.
The Statistical Navigator is a useful guide for those who are not well versed in statistics but want to ensure that they use the appropriate statistical techniques.
Incidentally, Expert Systems can also be used for making decisions with respect to various aspects of the research design—nature of study, time horizon, type of study, study setting, unit of analysis, sampling designs, data collection methods, and the like.
Other applications of Expert Systems for business decisions using available data include Auditor (for decisions on allowing for bad debts) and Tax Advisor (which helps audit firms advise clients on estate planning). As suggested by Luconi, Malone, and Morton (1986), Expert Systems can be used for making decisions with respect to operational control (accounts receivable, inventory control, cash management, production scheduling), management control (budget analysis, forecasting, variance analysis, budget preparation), and strategic planning (warehouse and factory location, mergers and acquisitions, new product planning). Thus, there is infinite scope for developing and using expert systems to aid managerial problem solving and decision making.
the four categories of clientele he should choose to continue to serve in the future.
What kind of analysis should be done in the above case and why?
4. Below are Tables 12A to 12D, summarizing the results of data analyses of research conducted in a sales organization that operates in 50 different cities of the country and employs a total sales force of about 500. The number of salespersons sampled for the study was 150. You are to:
a. Interpret the information contained in each of the tables in as much detail as possible.
b. Summarize the results for the CEO of the company.
c. Make recommendations based on your interpretation of the results.