
KV INSTITUTE OF MANAGEMENT AND INFORMATION STUDIES

BA4205-BUSINESS RESEARCH METHODS


UNIT 4 - DATA PREPARATION AND ANALYSIS
Data Preparation – Editing – Coding – Data entry – Validity of data – Qualitative vs. Quantitative data
analyses – Applications of Bivariate and Multivariate statistical techniques – Factor analysis –
Discriminant analysis – Cluster analysis – Multiple regression and correlation – Multidimensional
scaling – Conjoint Analysis – Application of statistical software for data analysis.

TABLE OF CONTENTS

4.1 DATA PREPARATION

4.2 QUALITATIVE AND QUANTITATIVE DATA ANALYSIS

4.3 BIVARIATE CORRELATION ANALYSIS

4.4 MULTIVARIATE ANALYSIS

4.5 APPLICATION OF STATISTICAL SOFTWARE

UNIT IV
4.1 DATA PREPARATION
Data preparation includes editing, coding, and data entry and is the activity that ensures the accuracy
of the data and their conversion from raw form to reduced and classified forms that are more
appropriate for analysis. Preparing a descriptive statistical summary is another preliminary step leading
to an understanding of the collected data. It is during this step that data entry errors may be revealed
and corrected.
The data collected from respondents is generally not in a form that can be analyzed directly. After the
responses are recorded or received, the next stage is the preparation of data, i.e., making the data
amenable to appropriate analysis. The data, after collection, have to be processed and analyzed in
accordance with the outline laid down for the purpose in the research plan. This is essential for a
scientific study. Data preparation includes the following stages:

Editing – Coding – Data entry – Validation of data – Classification – Tabulation

EDITING
The customary first step in analysis is to edit the raw data. Editing detects errors and omissions,
corrects them when possible, and certifies that maximum data quality standards are achieved. The
editor's purpose is to guarantee that data are:
 Accurate.
 Consistent with the intent of the question and other information in the survey.
 Uniformly entered.
 Complete.
 Arranged to simplify coding and tabulation.

Editing is the process of examining the collected data for errors and omissions and making the
necessary corrections. The editor may be the researcher himself or any other representative
of the research team. The following examples indicate how editing can be helpful.

Field Editing
In large projects, the field editing review is a responsibility of the field supervisor. It should be done
soon after the data have been gathered. During the stress of data collection in a personal interview,
and of paper-and-pencil recording in an observation, the researcher often uses ad hoc abbreviations
and special symbols. Soon after the interview, experiment, or observation, the investigator should
review the reporting forms. It is difficult to complete what was abbreviated, written in shorthand,
or noted illegibly if the entry is not caught that day. When entry gaps are present from interviews, a
callback should be made rather than guessing what the respondent "probably would have said."
Self-interviewing has no place in quality research.
Field editing consists in the review of the reporting forms by the investigator, completing
(translating or rewriting) what was written in abbreviated and/or illegible form at the time of
recording the respondents' responses. This type of editing is necessary because individual writing
styles can often be difficult for others to decipher. This sort of editing should be done as soon as
possible after the interview, preferably on the very day or on the next day. While doing field editing,
the investigator must restrain himself and must not correct errors of omission by simply guessing
what the informant would have said if the question had been asked.

Central Editing
Central editing should take place when all forms or schedules have been completed and returned to
the office. This type of editing implies that all forms get a thorough editing by a single editor in a
small study, and by a team of editors in the case of a large inquiry. Editors may correct obvious
errors, such as an entry in the wrong place or an entry recorded in months when it should have been
recorded in weeks. In the case of inappropriate or missing replies, the editor can sometimes
determine the proper answer by reviewing the other information in the schedule. At times, the
respondent can be contacted for clarification. The editor must strike out an answer if it is
inappropriate and there is no basis for determining the correct response; in such a case an editing
entry of 'no answer' is called for. All obviously wrong replies must be dropped from the final
results, especially in the context of mail surveys.

Another problem that editing can detect is the faking of an interview that never took place. This
"armchair interviewing" is difficult to spot, but the editor is in the best position to do so. To uncover
it, the editor must analyze as a set the instruments used by each interviewer. Here are some useful
rules to guide editors in their work:

 Be familiar with instructions given to interviewers and coders.


 Do not destroy, erase, or make illegible the original entry by the interviewer; original entries
should remain legible.
 Make all editing entries on an instrument in some distinctive color and in a standardized
form.
 Initial all answers changed or supplied.
 Place initials and date of editing on each instrument completed.

Editors must keep in view several points while performing their work:
1. They should be familiar with instructions given to the interviewers and coders as well as
with the editing instructions supplied to them for the purpose.
2. While crossing out an original entry for one reason or another, they should just draw a single
line on it so that the same may remain legible.
3. They must make entries (if any) on the form in some distinctive colour and that too in a
standardized form.

4. They should initial all answers which they change or supply.
5. Editor’s initials and the date of editing should be placed on each completed form or schedule.

Editing Data
After getting the raw data, the first job is to edit the data in such a way that the data are presentable,
readable, and accurate. The basic purpose of editing is to impose some minimum quality standard on
the raw data. Editing involves the inspection and correction of each questionnaire or observation
form. Data editing is often done in two stages: the field edit and the central-office edit. The field edit
is a preliminary edit designed to detect obvious omissions and inaccuracies in the data. Among the
items checked in this edit are the completeness, legibility, comprehensibility, consistency, and
uniformity of the data.

The central-office edit, which follows the field edit, involves more thorough and rigorous scrutiny
and correction of the completed returns. Much of the time, questionnaires are pre-coded and entered
directly into the computer with little or no editing. Although this approach is economical, it
often produces less accurate data. While entering the data into the computer, it is very important to
decide what to do with unclear or wrong responses, missing data, or inconsistent responses. In
addition, after the data are entered into the computer, computer editing should be conducted.

The editor may also try to discern disinterest on the part of the respondent by careful scrutiny of the
markings on the questionnaire. Data editing thus involves taking care of missing data and
ambiguous answers, checking the accuracy and quality of the data, and, finally, computer editing.

Missing data
It is very common for a questionnaire to be returned with one or more specific questions
unanswered. This constitutes a 'no response'. It is very important to take a decision about what to do
about such missing data. Often it is advisable to use the data as they are; that is, unanswered
questions can be assigned a 'missing' code and entered into the computer. When multivariate
analyses are being conducted, it is generally necessary to completely exclude any respondent with
missing data on any variable in the analysis.

Ambiguous answers
Many questionnaires contain one or more responses whose meaning is not clear. Under these
circumstances, it is necessary to decide whether to 'guess' which answer is correct based on other
responses in the questionnaire, or to discard the entire questionnaire, or to treat both answers as
missing data, or to re-contact the respondent, or take other relevant action. Other kinds of
ambiguities that may occur are illegible responses and marks between response categories.

Accuracy/Quality
As the analyst reviews a number of questionnaires, he should note suspect responses. Respondents will
sometimes rush through questionnaires in an almost random manner. This tends to produce a number
of inconsistent responses. Questionnaires containing such inconsistencies should be examined
carefully and deleted from the database if it appears that the respondents were haphazard in completing
them. While editing, it is important to be alert for inconsistencies between responses obtained by
different interviewers, and for individual questions that are frequently left unanswered or that produce
ambiguous responses.

Computer editing
The computer can be instructed to examine each set of coded responses for values that lie outside the
permissible range, or for conflicting responses to similar questions. It can also be used to run
consistency checks and to detect variations in responses between interviewers.
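
A minimal sketch of such computer editing in Python; the record layout and the permissible range of
1 to 5 are illustrative assumptions, not prescribed by the text:

```python
# Computer editing sketch: flag codes that lie outside the permissible range.
# The record layout and the 1-5 range are illustrative assumptions.
responses = [
    {"id": 1, "q1": 4, "q2": 2},
    {"id": 2, "q1": 9, "q2": 3},   # 9 lies outside the permissible range
    {"id": 3, "q1": 5, "q2": 1},
]

PERMISSIBLE = range(1, 6)  # valid codes are 1..5

for record in responses:
    for question in ("q1", "q2"):
        if record[question] not in PERMISSIBLE:
            print(f"Record {record['id']}: {question}={record[question]} is out of range")
```

The same loop structure extends naturally to consistency checks, for example flagging records where
answers to two similar questions conflict.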

Cross tabulation
This is a method of tabulating relationships generally among two variables in which joint occurrences
of the variables will be reflected as the cells in the table.
Cross tabulation is an important technique for studying the relationships among variables. It
involves constructing a table so that one can see how respondents with a given value on one variable
responded to one or more other variables. Constructing a two-way cross tabulation involves the
following steps (a software equivalent is sketched after the list):

1. On the horizontal axis, list the value or name for each category of the first variable.
2. On the vertical axis, list the value or name for each category of the second variable.
3. For each respondent, locate the category on the horizontal axis that corresponds to his or her
response.
4. Then find the value on the vertical axis that corresponds to his or her response on the second
variable.
5. Record a 1 in the cell where the two values intersect.
6. Count the 1s in each cell.
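
These steps have a direct software equivalent. As a sketch, pandas' crosstab function produces the
same joint-occurrence table; the gender/attitude data below are illustrative, not taken from the text:

```python
import pandas as pd

# Illustrative survey responses: one row per respondent.
df = pd.DataFrame({
    "gender":   ["M", "F", "F", "M", "F", "M"],
    "attitude": ["Agree", "Agree", "Disagree", "Disagree", "Agree", "Agree"],
})

# crosstab counts joint occurrences of the two variables (steps 1-6 above).
table = pd.crosstab(df["gender"], df["attitude"])
print(table)
```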

CODING
Coding involves assigning numbers or other symbols to answers so that the responses can be grouped
into a limited number of categories. In coding, categories are the partitions of a data set of a given
variable (e.g., if the variable is gender, the partitions are male and female).
Categorization is the process of using rules to partition a body of data. Both closed- and open-
response questions must be coded.

Numeric coding simplifies the researcher's task in converting a nominal variable, like gender, to a
"dummy variable". Statistical software also can use alphanumeric codes, as when we use M and F, or
other letters, in combination with numbers and symbols for gender.
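
As a brief illustration of this point (a sketch, with made-up data), a nominal gender variable can be
recoded numerically in pandas:

```python
import pandas as pd

# Convert the nominal variable 'gender' into a 0/1 dummy variable.
df = pd.DataFrame({"gender": ["M", "F", "F", "M"]})

# The 0/1 assignment is an arbitrary coding convention, recorded in the codebook.
df["gender_code"] = df["gender"].map({"M": 0, "F": 1})
print(df)
```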

Coding refers to the process of assigning numerals or other symbols to answers so that responses can
be put into a limited number of categories or classes. Such classes should be appropriate to the
research problem under consideration. They must also possess the characteristic of exhaustiveness
(i.e., there must be a class for every data item) and that of mutual exclusivity, which means that a
specific answer can be placed in one and only one cell in a given category set. Another rule to be
observed is that of unidimensionality, by which is meant that every class is defined in terms of only
one concept.

Coding is necessary for efficient analysis; through it, the many replies are reduced to a small number
of classes that contain the critical information required for analysis. Coding decisions should usually
be taken at the designing stage of the questionnaire. This makes it possible to pre-code the
questionnaire choices, which in turn is helpful for computer tabulation, as one can key the data
straight from the original questionnaires. In the case of hand coding, some standard method may be
used. One such method is to code in the margin with a coloured pencil. Another is to transcribe the
data from the questionnaire to a coding sheet. Whatever method is adopted, one should see that coding
errors are altogether eliminated or reduced to the minimum level.
Coding is the process by which verbal data are converted into variables and categories of variables
through the creation of categories and concepts derived from the data. - Lockyer (2004)

Coding involves two steps

1. Specifying the different categories or classes into which the responses are to be classified and
2. Allocating individual answers to different categories

Guidelines for Coding Unstructured Questions

 Category codes should be mutually exclusive and collectively exhaustive.
 Only a few (10 percent or less) of the responses should fall into the 'other' category.
 Critical issues should be assigned category codes even if no respondent mentions them.
 Data should be coded so as to retain as much detail as possible.

Advantages of coding
 Allows the study to be repeated and validated.
 Makes methods transparent by recording analytical thinking used to devise codes.
 Facilitates comparison of data with other studies.

Codebook Construction
A codebook, or coding scheme, contains each variable in the study and specifies the application of
coding rules to the variable. It is used by the researcher or research staff to promote more accurate and
more efficient data entry. It is also the definitive source for locating the positions of variables in the
data file during analysis. In many statistical programs, the coding scheme is integral to the data file.
Most codebooks, computerized or not, contain the question number, the variable name, the location of
the variable's code on the input medium (e.g., spreadsheet or SPSS data file), descriptors for the
response options, and whether the variable is alphabetic or numeric.
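
A minimal sketch of one codebook entry, using the fields listed above; all concrete values are
illustrative assumptions:

```python
# One codebook entry for the variable 'gender' (illustrative values).
codebook = {
    "gender": {
        "question_number": 3,
        "location": "column C of the spreadsheet",   # position in the data file
        "descriptors": {1: "Male", 2: "Female"},     # response-option labels
        "type": "numeric",
    },
}

# During analysis, the codebook maps stored codes back to their labels.
print(codebook["gender"]["descriptors"][2])  # -> Female
```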
Coding Rules

Four rules guide the precoding and postcoding and the categorization of a data set. The categories
within a single variable should be:
 Appropriate to the research problem and purpose.
 Exhaustive.
 Mutually exclusive.
 Derived from one classification dimension.

Researchers address these issues when developing or choosing each specific measurement question.
One of the purposes of pilot testing of any measurement instrument is to identify and anticipate
categorization issues.
Appropriateness

Appropriateness is determined at two levels:

1. The best partitioning of the data for testing hypotheses and showing relationships and
2. The availability of comparison data. For example, when actual age is obtained (ratio scale), the
editor may decide to group data by age ranges to simplify pattern discovery within the data.
The number of age groups, the breadth of each range, and the endpoints of each range should be
determined by comparison data, for example, U.S. census age ranges, a customer database that
includes age ranges, or the age data available from Fox TV used for making an advertising media
buy.

Exhaustiveness
Researchers often add an "other" option to a measurement question because they know they cannot
anticipate all possible answers. A large number of "other" responses, however, suggests that the
measurement scale the researcher designed did not anticipate the full range of information. The editor
must determine whether "other" responses appropriately fit into established categories, whether new
categories must be added, whether "other" data will be ignored, or whether some combination of these
actions will be taken.

Although the exhaustiveness requirement for a single variable may be obvious, a second aspect is less
apparent. Does one set of categories, often determined before the data are collected, fully capture all
the information in the data? For example, responses to an open-ended question about family economic
prospects for the next year may originally be categorized only as "optimistic" or "pessimistic." It
may also be enlightening to classify responses in terms of other concepts, such as the precise focus of
these expectations (income or jobs) and variations in responses between the head of the family and
others in the family.

Mutual Exclusivity
Another important rule when adding or realigning categories is that the categories should be
mutually exclusive. This standard is met when a specific answer can be placed in one and only one cell
in a category set. For example, assume that in a survey you asked participants for their occupation.
One editor's categorization scheme might include (1) professional, (2) managerial, (3) sales, (4)
clerical, (5) crafts, (6) operatives, and (7) unemployed. As an editor, how would you code a
participant's answer that specified "salesperson at Gap and full-time student", or perhaps "unemployed
salesperson"?

Single Dimension
The problem of how to handle an occupation entry like "unemployed salesperson" brings up a fourth
rule of category design. The need for a category set to follow a single classificatory principle means
that every item in the category set is defined in terms of one concept or construct. Returning to the
occupation example, the person in the study might be both a salesperson and unemployed. The
"salesperson" label expresses the concept of occupation type; the response "unemployed" reflects
another dimension, current employment status, without regard to the respondent's normal occupation.
When a category set encompasses more than one dimension, the editor may choose to split the
dimensions and develop an additional variable: "occupation" then becomes two variables,
"occupation type" and "employment status."

Using Content Analysis for Open Questions


Content analysis measures the semantic content, or the "what" aspect, of a message. Its breadth makes
it a flexible and wide-ranging tool that may be used as a stand-alone methodology or as a
problem-specific technique.

Types of content
 Syntactical units can be words, phrases, sentences, or paragraphs; words are the smallest and
most reliable data units to analyze.
 Referential units are described by words, phrases, and sentences; they may be objects, events,
or persons.
 Propositional units are assertions about an object, event, person, and so on.
 Thematic units are topics contained within texts.

Missing Data
Missing data are information from a participant or case that is not available for one or more variables
of interest. In survey studies, missing data typically occur when participants accidentally skip, refuse
to answer, or do not know the answer to an item on the questionnaire. In longitudinal studies, missing
data may result from participants dropping out of the study, or being absent for one or more data
collection periods. Missing data also occur due to researcher error, corrupted data files, and changes in
the research or instrument design after data were collected from some participants, such as when
variables are dropped or added. The strategy for handling missing data is a two-step process: the
researcher first explores the pattern of missing data to determine the mechanism for missingness
(the probability that a value is missing rather than observed) and then selects a missing-data technique.
Mechanisms for Dealing with Missing Data

By knowing what caused the data to be missing, the researcher can select the appropriate missing-data
technique and thus avoid introducing bias in subsequent analysis. There are three basic types of
missing data (a short software sketch follows the list):

 Data missing completely at random (MCAR): the probability that a particular value is missing
does NOT depend on the variable itself and does NOT depend on any other variable in the data
set (e.g., a participant inadvertently skips a question).
 Data missing at random (MAR): the probability that a particular value is missing does NOT
depend on the variable itself but DOES depend on another variable in the data set (e.g., the
answer to the first question of a branched-question set might cause missing data in the second
question within the set).
 Data not missing at random (NMAR): the missing data are not predictable from other variables
in the data set.
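
As a sketch of this two-step strategy (not a procedure prescribed by the text), pandas can tabulate the
pattern of missingness and then apply listwise deletion, the exclusion approach mentioned earlier for
multivariate analysis; the variables are illustrative:

```python
import numpy as np
import pandas as pd

# Illustrative data; np.nan marks a 'no response'.
df = pd.DataFrame({
    "age":    [34, np.nan, 29, 41],
    "income": [52000, 61000, np.nan, 45000],
})

# Step 1: explore the pattern of missing data for each variable.
print(df.isna().sum())

# Step 2: one simple missing-data technique - listwise deletion.
complete_cases = df.dropna()
print(complete_cases)
```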

VALIDATION OF DATA

After the data are coded, they are validated for data-entry errors; the data are then used for further
analysis. The purpose of validating the data is to ensure that they have been collected as per the
specifications in the prescribed format or questionnaire.

For example, if the respondent is asked to rate a particular aspect on a scale of 1 to 7, then the only
valid responses are 1, 2, ..., or 7; any other inputted number is not valid. In validating the data, the
entries for this item are restricted to the integers between 1 and 7, which minimizes errors. Other
validations include checking that age falls below a plausible limit, such as 100, and that dates such as
birth dates and joining dates are not future dates.

Incidentally, while editing is done after the receipt of the responses, validation is done after the coded
responses have been entered. Validation is especially used to reduce data-entry errors.
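
A minimal sketch of these validation rules in Python; the record values and field names are
illustrative:

```python
from datetime import date

# Rules from the text: rating must be 1..7, age must lie below 100,
# and a joining date cannot be in the future. Values are illustrative.
record = {"rating": 8, "age": 43, "joining_date": date(2030, 1, 5)}

errors = []
if record["rating"] not in range(1, 8):
    errors.append("rating must be an integer between 1 and 7")
if not 0 < record["age"] < 100:
    errors.append("age must be below 100")
if record["joining_date"] > date.today():
    errors.append("joining date cannot be a future date")

print(errors)
```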

CLASSIFICATION

Most research studies result in a large volume of raw data, which must be reduced into homogeneous
groups if we are to get meaningful relationships. This fact necessitates classification of data, which is
the process of arranging data in groups or classes on the basis of common characteristics. Data having
a common characteristic are placed in one class, and in this way the entire data set gets divided into a
number of groups or classes. Classification can be of one of the following two types, depending upon
the nature of the phenomenon involved:

(a) Classification according to attributes: As stated above, data are classified on the basis of common
characteristics which can either be descriptive (such as literacy, sex, honesty, etc.) or numerical (such
as weight, height, income, etc.). Descriptive characteristics refer to qualitative phenomena, which
cannot be measured quantitatively; only their presence or absence in an individual item can be noticed.
Data obtained in this way on the basis of certain attributes are known as statistics of attributes,
and their classification is said to be classification according to attributes.

(b) Classification according to class-intervals:


Unlike descriptive characteristics, numerical characteristics refer to quantitative phenomena, which
can be measured through some statistical units. Data relating to income, production, age, weight, etc.
come under this category. Such data are known as statistics of variables and are classified
on the basis of class intervals. For instance, persons whose incomes, say, are within Rs 201 to Rs 400
can form one group, those whose incomes are within Rs 401 to Rs 600 can form another group, and so
on. In this way the entire data may be divided into a number of groups or classes, or what are usually
called 'class intervals.' Each class interval, thus, has an upper limit as well as a lower limit,
which are known as class limits. The difference between the two class limits is known as class
magnitude. We may have classes with equal class magnitudes or with unequal class magnitudes. The
number of items which fall in a given class is known as the frequency of the given class. All the
classes or groups, with their respective frequencies taken together and put in the form of a table, are
described as a grouped frequency distribution, or simply a frequency distribution. Classification
according to class intervals usually involves the following three main problems (a short binning
sketch follows the list):
1. How many classes should there be? What should be their magnitudes?
2. How to choose class limits?
3. How to determine the frequency of each class?
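
As a sketch of classification by class intervals, pandas.cut bins the illustrative incomes below into the
Rs 201-400 and Rs 401-600 groups from the example above and counts the frequency of each class:

```python
import pandas as pd

incomes = pd.Series([250, 380, 420, 575, 310, 590, 230])

# Class intervals with an equal class magnitude of 200, as in the example above.
bins = [200, 400, 600]
labels = ["Rs 201-400", "Rs 401-600"]

# value_counts() gives the frequency of each class.
frequency = pd.cut(incomes, bins=bins, labels=labels).value_counts().sort_index()
print(frequency)
```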

TABULATION
When a mass of data has been assembled, it becomes necessary for the researcher to arrange the same
in some kind of concise and logical order. This procedure is referred to as tabulation. Thus, tabulation
is the process of summarising raw data and displaying the same in compact form (i.e., in the form of

statistical tables) for further analysis. In a broader sense, tabulation is an orderly arrangement of data in
columns and rows.

Tabulation is essential because of the following reasons.


1. It conserves space and reduces explanatory and descriptive statements to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.

Tabulation can be done by hand or by mechanical or electronic devices. The choice depends on the
size and type of study, cost considerations, time pressures and the availability of tabulating machines
or computers. In relatively large inquiries, we may use mechanical or computer tabulation if other
factors are favourable and necessary facilities are available. Hand tabulation is usually preferred in
case of small inquiries where the number of questionnaires is small and they are of relatively short
length. Hand tabulation may be done using the direct tally, the list and tally or the card sort and count
methods. When there are simple codes, it is feasible to tally directly from the questionnaire. Under this
method, the codes are written on a sheet of paper, called tally sheet, and for each response a stroke is
marked against the code in which it falls. Usually after every four strokes against a particular code, the
fifth response is indicated by drawing a diagonal or horizontal line through the strokes. These groups
of five are easy to count and the data are sorted against each code conveniently.
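
The hand tally has a direct software analogue: collections.Counter performs the same stroke
counting. The coded responses below are illustrative:

```python
from collections import Counter

# Each element is the code recorded on one questionnaire (illustrative data).
responses = [1, 3, 2, 3, 3, 1, 2, 3, 1, 3]

# Counter does what the tally sheet does: one 'stroke' per response per code.
tally = Counter(responses)
for code, count in sorted(tally.items()):
    print(f"code {code}: {count}")
```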

In the listing method, the code responses may be transcribed onto a large work-sheet, allowing a line
for each questionnaire. This way a large number of questionnaires can be listed on one work sheet.
Tallies are then made for each question.

The card sorting method is the most flexible hand tabulation. In this method the data are recorded on
special cards of convenient size and shape with a series of holes. Each hole stands for a code and when
cards are stacked, a needle passes through particular hole representing a particular code. These cards
are then separated and counted. In this way frequencies of various codes can be found out by the
repetition of this technique. We can as well use mechanical devices or the computer facility for
tabulation purposes in case we want quick results, our budget permits their use, and we have a large
volume of straightforward tabulation involving a number of cross-breaks.

Tabulation may also be classified as simple and complex tabulation. The former type of tabulation
gives information about one or more groups of independent questions, whereas the latter type of
tabulation shows the division of data into two or more categories and as such is designed to give
information concerning one or more sets of inter-related questions. Simple tabulation generally results
in one-way tables, which supply answers to questions about only one characteristic of the data.

Generally accepted principles of tabulation: Such principles of tabulation, particularly of constructing


statistical tables, can be briefly stated as follows:
1. Every table should have a clear, concise and adequate title so as to make the table intelligible
without reference to the text and this title should always be placed just above the body of the
table.
2. Every table should be given a distinct number to facilitate easy reference.

3. The column headings (captions) and the row headings (stubs) of the table should be clear and
brief.
4. The units of measurement under each heading or sub-heading must always be indicated.
5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the table,
along with the reference symbols used in the table.
6. Source or sources from where the data in the table have been obtained must be indicated just
below the table.
7. Usually the columns are separated from one another by lines which make the table more
readable and attractive. Lines are always drawn at the top and bottom of the table and below
the captions.
8. There should be thick lines to separate the data under one class from the data under another
class and the lines separating the sub-divisions of the classes should be comparatively thin
lines.
9. The columns may be numbered to facilitate reference.
10. Those columns whose data are to be compared should be kept side by side. Similarly,
percentages and/or averages must also be kept close to the data.
11. It is generally considered better to approximate figures before tabulation as the same would
reduce unnecessary details in the table itself.
12. In order to emphasise the relative significance of certain categories, different kinds of type,
spacing and indentations may be used.
13. It is important that all column figures be properly aligned. Decimal points and (+) or (-) signs
should be in perfect alignment.
14. Abbreviations should be avoided to the extent possible and ditto marks should not be used in
the table.
15. Miscellaneous and exceptional items, if any, should be usually placed in the last row of the
table.
16. Tables should be made as logical, clear, accurate, and simple as possible. If the data happen to be
very large, they should not be crowded in a single table for that would make the table unwieldy
and inconvenient.
17. Total of rows should normally be placed in the extreme right column and that of columns
should be placed at the bottom.

DATA ENTRY
Data entry converts information gathered by secondary or primary methods to a medium for viewing
and manipulation. Keyboarding remains a mainstay for researchers who need to create a data file
immediately and store it in a minimal space on a variety of media. However, researchers have profited
from more efficient ways of speeding up the research process, especially from bar coding and optical
character and mark recognition.

Alternative Data Entry Formats

Keyboarding

A full-screen editor, with which an entire data file can be edited or browsed, is a viable means of data
entry for statistical packages like SPSS or SAS. SPSS offers several data entry products, including
Data Entry Builder, which enables the development of forms and surveys, and Data Entry Station,
which gives centralized entry staff, such as telephone interviewers or online participants, access to the
survey. Both SAS and SPSS offer software that effortlessly accesses data from databases,
spreadsheets, data warehouses, or data marts.

Database Development

For large projects, database programs serve as valuable data entry devices. A database is a collection
of data organized for computerized retrieval. Programs allow users to define data fields and link files
so that storage, retrieval, and updating are simplified. A company's orders serve as an example of a
database. Ordering information may be kept in several files: salesperson's customer files, customer
financial records, order production records, and order shipping documentation.
The data are separated so that authorized people can see only those parts pertinent to their needs.
However, the files may be linked so that when, say, a customer changes his or her shipping address,
the change is entered once and all relevant files are updated. Another database entry option is e-mail
data capture, which has become popular with those using e-mail-delivered surveys. The e-mail survey
can be delivered to a specific respondent whose e-mail address is known. Questions are completed on
the screen, returned via e-mail, and incorporated into a database. An intranet can also capture data.
When participants linked by a network take an online survey by completing a database form, the data
are captured in a database on a network server for later or real-time analysis. ID and password
requirements can keep unwanted participants from skewing the results of an online survey.

Spreadsheet
Spreadsheets are a specialized type of database for data that need organizing, tabulating, and simple
statistics. They also offer some database management, graphics, and presentation capabilities. Data
entry on a spreadsheet uses numbered rows and lettered columns, with a matrix of thousands of cells
into which an entry may be placed. Spreadsheets allow you to type numbers, formulas, and text into
appropriate cells. Many statistics programs for personal computers, as well as charting and graphics
applications, have data editors similar to the Excel spreadsheet. This is a convenient and flexible
means for entering and viewing data.

Optical Character Recognition


If you use a PC image scanner, you probably are familiar with Optical Character Recognition (OCR)
programs that transfer printed text into computer files in order to edit and use it without retyping.
There are other related applications.

Optical scanning of instruments, the choice of testing services, is efficient for researchers. Examinees
darken small circles, ellipses, or spaces between sets of parallel lines to indicate their answers. A more
flexible format, Optical Mark Recognition (OMR), uses a spreadsheet-style interface to read and
process user-created forms. Optical scanners process the mark-sensed questionnaires and store the
answers in a file. This method, most often associated with standardized and preprinted forms, has
been adopted by researchers for data entry and preprocessing because of its speed (10 times faster than
keyboarding), cost savings on data entry, convenience in charting and reporting data, and improved
accuracy. It reduces the number of times data are handled, thereby reducing the number of errors that
are introduced.

Other techniques include direct-response entry, of which the voting procedures used in several states
are an example. With a specially prepared punch card, citizens cast their votes by pressing a
pen-shaped instrument against the card next to the preferred candidate. This opens a small hole in a
specific column and row of the card. The cards are collected and placed directly into a card reader.

Voice Recognition
The increase in computerized random dialing has encouraged other data collection innovations. Voice
recognition and voice response systems are providing some interesting alternatives for the telephone
interviewer. Upon getting a voice response to a randomly dialed number, the computer branches into a
questionnaire routine. These systems are advancing quickly and will soon translate recorded voice
responses into data files.

Digital
Telephone keypad response, frequently used by restaurants and entertainment venues to evaluate
customer service, is another capability made possible by computers linked to telephone lines. Using
the telephone keypad (touch-tone), an invited participant answers questions by pressing the appropriate
number. The computer captures the data by decoding the tone's electrical signal and storing the
numeric or alphabetic answer in a data file. Although not originally designed for collecting survey
data, software components within Microsoft Windows 7 have advanced speech recognition
functionality, enabling people to enter and edit data by speaking into a microphone.

Field interviewers can use mobile computers or notebooks instead of clipboards and pencils. With a
built-in communications modem, wireless LAN (or local area network), or cellular link, their files can
be sent directly to another computer in the field or to a remote site. This lets supervisors inspect data
immediately or simplifies processing at a central facility.

Bar Code
Since adoption of the Universal Product Code (UPC) in 1973, the bar code has developed from a
technological curiosity to a business mainstay. Bar-code technology is used to simplify the
interviewer's role as a data recorder. When an interviewer passes a bar-code wand over the appropriate
codes, the data are recorded in a small, lightweight unit for translation later. Researchers studying
magazine readership can scan bar codes to denote a magazine cover that is recognized by an interview
participant.

The bar code is used in numerous applications: point-of-sale terminals, hospital patient ID bracelets,
inventory control, product and brand tracking, promotional technique evaluation, shipment tracking,
marathon runners, rental car locations (to speed the return of cars and generate invoices), and tracking
of insects' mating habits.
On the Horizon

Even with these time reductions between data collection and analysis, continuing innovations in
multimedia technology are being developed by the personal computer business. The capability to
integrate visual images, streaming video, audio, and data may soon replace video equipment as the
preferred
method for recording an experiment, interview, or focus group. A copy of the response data could be
extracted for data analysis, but the audio and visual images would remain intact for later evaluation.
Although technology will never replace researcher judgment, it can reduce data-handling errors,
decrease time between data collection and analysis, and help provide more usable information.

DATA MINING
Data mining is a set of approaches that can identify and extract useful patterns from data stored in
large real-world databases. There are two major approaches: a general approach, comprising models
and techniques that can be used for any kind of data, and a domain-specific approach for a particular
problem situation. Data mining techniques include statistical analyses and heuristic search procedures
involving learning.

Computers deliver a flood of data which, with the advent of IT, are generated at great speed. Sorting
out relevant information from this mass of data is becoming increasingly difficult; for example, earth
satellites generate terabytes of data every day. These data are useful in many ways, but manual
processing of such volumes is impractical and would lead to outdated results. This mind-boggling
volume of data may hold a wealth of information, which needs to be unearthed. This is the motivation
for data mining.

The purpose of data mining (DM) is knowledge discovery (KD). It extracts hidden information from
large databases and helps in decision-making. Most business houses/corporations have databases on
the computer giving minute details of the operations and transactions. Data mining can contribute
significantly in deriving valuable information from these databases for detailed decision-making.
Advances in storage and distribution have made centralized giant databases a reality. But it is the
computational and analytic techniques that can render this data into patterns, trends, and useful
information that DM provides. Data mining is an interdisciplinary field with contribution from
statistics, artificial intelligence, decision theory, and so on.

Data mining is done using data mining systems, which are computer software, as manual systems are
simply not practical with such massive databases.
Data mining requires human-initiated queries; the person must know clearly what he is looking for
and must have a substantial amount of domain knowledge. Data mining is important for the following
reasons.

It provides more qualified answers. For example, a simple search can return a list of those who buy
spirulina (a food supplement), but DM can give answers for specific categories, like "urban men and
women between the ages of 50 and 70," which is better for planning sales.

Clustering model
This is used to segment a database into several groups. A priori knowledge of the classes in the
database is not assumed (unlike in prediction models). It is a purely exploratory model used to identify
different classes in a data set (it is also called an automatic classification model). The goal is to find
classes that differ from one another and whose members have similarities within a class. Though
well-known clustering methods are available in statistics, some modifications are needed for use in
data mining. One widely used clustering technique is AutoClass, which automatically classifies a data
set. It has to be used iteratively to obtain a large number of detailed classes and their hierarchies.
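
AutoClass itself is a Bayesian classification system; as a more commonly available stand-in,
scikit-learn's k-means illustrates the same idea of segmenting a data set into groups without prior class
labels. The two-feature customer records below are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative customer records: [annual spend, visits per month].
data = np.array([
    [200, 1], [220, 2], [250, 1],   # a low-spend group
    [900, 8], [950, 9], [880, 7],   # a high-spend group
])

# Segment the data set into two groups; no prior class labels are used.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(model.labels_)  # cluster membership of each record
```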

Data mining in research: Data mining can be useful in the following ways.
 Collecting a specific class of data for hypothesis testing when a large database of mixed classes
is available or a reclassification is required.

 Data segmentation can be done to generate refined hypotheses. Some creative guesses need to
be made. The query (intense questioning) is still in the human domain, and the process is inductive,
where AI methodologies can be used.
 Patterns of events (stolen credit card frauds, retail purchases) can be obtained. Trends, as
required for regional target marketing, can also be obtained from past data and used for prediction.

CONTENT ANALYSIS

While carrying out ex post facto studies or field studies, it will often be necessary to analyse company
documents, published newsletters, internal reports, company annual meeting proceedings, and other
sources of information. A detailed and systematic way of getting at the content of communication in
these documents is content analysis. Content analysis is a systematic study of communication content
for the ideas, propositions, or symbols that are relevant to the subject of study. Usually the frequency
of appearance of a particular idea or symbol is obtained as a measure of content. It is, however,
possible to obtain more information, for example, whether the ideas appear with a favourable,
unfavourable, or neutral connotation. The main objective is to obtain the themes or recurrent ideas or
propositions that will be helpful for research.
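
As a crude sketch of this frequency measure, keyword occurrences can be counted directly; the
document text and the keywords below are invented for illustration:

```python
import re
from collections import Counter

document = "Quality drives growth. Growth requires quality people and quality processes."

# Count keyword occurrences as a simple measure of content.
words = re.findall(r"[a-z]+", document.lower())
counts = Counter(words)
for keyword in ("quality", "growth"):
    print(keyword, counts[keyword])
```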

The output of the content analysis serves as supplementary data in qualitative research situations
and where interpretations are made of the analytical investigations. The basic problem of content
analysis is selective reporting, which is a conscious or unconscious process of observing and noting
only facts that fit some preconceived idea. Some of the other problems of content analysis are
detailed below.

1. Sampling problems: Representativeness of the sample is very difficult to achieve in content


analysis situations. The various groups of documents in a sample need not be equally weighted:
more important documents must be given greater weightage, that is, their percentage in the sample
will be higher. Representing all documents is sometimes not even necessary.

Example: Let us say a company's corporate thrust over several years has to be obtained from
various documents, such as administrative reports; proceedings of the meetings of the board of
directors; memos sent to various heads of departments with respect to policies and procedures;
files maintained at the departments; and leaflets produced by the company. The chairman's reports
must carry the greatest weight in this situation. Next may be the leaflets and, third, the policy letters
to the heads of departments, whereas the last one may not be important at all for the purpose and
may be left out of the content analysis.

2. Analysis approach: Two approaches are possible. The first is an a priori approach, in which the
contents of documents and the method of analysis are laid down in advance, for example,
frequencies, correlations, trends, and classifications. The second is an a posteriori approach, in
which the contents are perused, significant ideas and themes are segregated, and the items are
classified using induction. Both methods are equally useful, but in the a priori approach the
researcher has to be sure of what he is looking for.

3. Quantification problems: The frequency of occurrence of the ideas and themes is in most cases
the only quantification that is attempted. Many social scientists feel that qualitative classification
with simple counts is adequate in most cases and that forced quantification should be avoided.

4. Reliability: Replication of results must be possible, and to enable this, a correct codification of
the ideas and themes must be carefully done. Communication from all fields is treated as social
relations data; the material can be organised in many ways, based on the structure of the ideas. Such
organisation requires a considerable amount of objectivity and the use of a coding system.

The steps involved in the technique are:

(i) Choosing the phenomenon.
(ii) Selecting the media from which the observations are made.
(iii) Derivation of the coding categorisation.
(iv) Development of a sampling strategy.
(v) Analysing the data.

Some Recent Developments


In recent developments of content analysis methods, data in matrix format are developed in order to
capture the complexity of large amounts of qualitative data. In the matrix format, one axis
represents constructs and the other occurrences (of patterns) of phenomena. A count of occurrences
(frequencies) may be entered into the matrix, or the occurrences may be indicated qualitatively
(as large or small, high or low, and so on). Further, the occurrences (variables) may be ordered
along the time dimension so that antecedents and consequences may be identified.
The other methods of content analysis are (i) conversation analysis and (ii) discourse analysis.

Conversation analysis:
Analysis of conversations in interviews, and in settings such as schools, examines how judgements
are made. Conversation analysis assumes that conversations are made up of stable patterns in the
context, are sequentially organised (in tune with the ongoing process), and are well grounded in
data (Silverman, 2000, may be referred to for details).

Discourse analysis:
This deals not only with conversation but also with the texts of documents such as newspapers,
advertisements, conference proceedings, speeches in company bulletins, and transcripts of recorded
discussions. It deals with the context of the discourse and, particularly, with aspects of power
relations and ideologies.

4.2 TYPES OF DATA ANALYSIS


Data analysis can be performed on qualitative or quantitative data, and the techniques are
accordingly known as:
1. Qualitative Data Analysis Techniques
2. Quantitative Data Analysis Techniques

Qualitative Data Analysis Techniques


Qualitative data analysis is an iterative and reflexive process that begins while data are being
collected, rather than after data collection has ceased. In qualitative data analysis the raw data to be
analysed are text rather than numbers.


The process of analysing qualitative data should be a public process, not a private, 'magical'
experience. The researchers should be able to demonstrate how conclusions were arrived at, and in
such a way that others could, if necessary, dispute them. It is possible to replicate a study, doing it
again to see if the same conclusions are reached. This may not be feasible in market research, but
seeing whether similar conclusions may be drawn across all groups of individuals in a study may
lend support to the findings.

Quantitative Data Analysis


Quantitative data arise when numbers result from a process of measurement. When one measures
something, a value is selected from a scale of values that corresponds to the observation made of
some object or situation, or to a response to a question addressed to an individual. The scales of
values may be metric or non-metric; these, in turn, may be subdivided into continuous and discrete,
and into nominal and ordinal scales.

Difference between qualitative analysis and quantitative analysis:

Interpretation: Qualitative analysis relies on interpretation and logic; qualitative researchers present
their analyses using text and arguments. Quantitative analysis relies on statistics; quantitative
researchers use graphs and tables to present their analysis.

Procedures and rules: Qualitative analysis has no set of rules; rather, guidelines are there to support
the analysis. Quantitative analysis follows agreed-upon, standardised procedures and rules.

Occurrence: Qualitative analysis occurs simultaneously with data collection. Quantitative analysis
occurs after data collection is finished.

Methodology: Qualitative analysis methods may vary depending on the situation. Methods of
quantitative analysis are determined in advance as part of the study design.

Reliability: Qualitative analyses have validity but are less reliable or consistent, and they have a
corresponding weakness in their ability to compare variables in different conditions. The reliability
of quantitative analyses is easy to establish, and they generally involve sophisticated comparisons of
variables in different conditions.

Questions: In qualitative work, open-ended questions and probing yield detailed information that
illuminates nuances and highlights diversity. In quantitative work, specific questions obtain
predetermined responses to standardized questions.

Information: Qualitative analysis provides more information on the application of the program in a
specific context to a specific population. Quantitative analysis more likely provides information on
the broad application of the program.

Suitability: Qualitative analysis is more suitable when time and resources are limited; quantitative
analysis relies on more extensive interviewing.

4.3 BIVARIATE CORRELATION ANALYSIS
Bivariate correlation analysis differs from nonparametric measures of association and from regression
analysis in two important ways. First, parametric correlation requires two continuous variables
measured on an interval or ratio scale. Second, the coefficient does not distinguish between
independent and dependent variables; it treats the variables symmetrically, since the coefficient rxy
has the same interpretation as ryx.

Bivariate Statistical Techniques


Bivariate analysis is the simultaneous analysis of two variables. It is usually undertaken to see if
one variable, such as gender, is related to another variable, perhaps attitudes toward male/female
equality. The bivariate statistical techniques include the following:


Pearson's Product Moment Coefficient r


The Pearson (product moment) correlation coefficient varies over a range of +1 through 0 to -1.
The designation r symbolizes the coefficient's estimate of linear association based on sampling
data. The coefficient ρ (rho) represents the population correlation.

Correlation coefficients reveal the magnitude and direction of relationships. The magnitude is the
degree to which variables move in unison or opposition. A correlation of +0.40 has the same
magnitude as one of -0.40; the sign says nothing about size, and in either case the degree of
correlation is moderate. The coefficient's sign signifies the direction of the relationship.
Direction tells us whether large values
on one variable are associated with large values on the other (and small values with small values).
When the values correspond in this way, the two variables have a positive relationship: as one
increases, the other also increases. Family income, for example, is positively related to household
food expenditures.

As income increases, food expenditures increase. Other variables are inversely related. Large
values on the first variable are associated with small values on the second (and vice versa).

The prices of products and services are inversely related to their scarcity. In general, as products
decrease in available quantity, their prices rise. The absence of a relationship is expressed by a
coefficient of approximately zero.
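
As a minimal illustrative sketch, the coefficient can be computed with Python's SciPy library. The income and food-expenditure figures below are hypothetical, invented only to mirror the example above:

from scipy import stats

income = [20, 30, 40, 50, 60]      # hypothetical family incomes (thousands)
food_spend = [8, 11, 15, 18, 22]   # hypothetical food expenditures

r, p_value = stats.pearsonr(income, food_spend)
print(f"r = {r:.3f}, p = {p_value:.4f}")  # r near +1 indicates a strong positive relationship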

According to Croxton and Cowden, “The appropriate statistical tool for discovering and measuring
the relationship of quantitative nature and expressing it in brief formula is known as correlation”.

Significance of measuring correlation


1. Study relationship between variables: Correlation is very useful to economists to study the
relationship between variables, like price and quantity demanded. To businessmen, it helps to
estimate costs, sales, price and other related variables.

2. Measuring degree of association and direction: Correlation analysis helps in measuring the
degree of association and the direction of such relationships. In economic theory we come across
several types of variables which show some kind of relationship. For example, there exists a
relationship between price, supply and quantity demanded; convenience, amenities and service
standards are related to customer retention; and the yield of a crop is related to the quantity of
fertilizer applied, the type of soil, the quantity of seeds, rainfall, and so on.

3. Verifying and testing relation between variables: The relation between variables can be verified
and tested for significance, with the help of the correlation analysis.

4. Comparing the Relationship between variables: The coefficient of correlation is a relative


measure and we can compare the relationship between variables, which are expressed in different
units.

5. Determining Validity and Reliability: Correlations are useful in the areas of healthcare such as
determining the validity and reliability of clinical measures.

LINEAR CORRELATION
The correlation between two variables is said to be linear if, corresponding to a unit change in the
value of one variable, there is a constant change in the value of the other variable; that is, in the
case of linear correlation the relation between the variables x and y is of the type
y = a + bx
If a = 0, the relation becomes y = bx.
In such cases the values of the variables are in a constant ratio.

Non - Linear (Curvilinear) Correlation:


The correlation between two variables is said to be non-linear (curvilinear) if, corresponding to a
unit change in the value of one variable, the other variable changes not at a constant rate but at a
fluctuating rate.

SIMPLE REGRESSION
The dictionary meaning of the term 'regression' is the act of returning or going back. The term
'regression' was first used by Sir Francis Galton in 1877 while studying the relationship between
the heights of fathers and sons.
Regression equations: The regression equations express the regression lines. As there are two
regression lines, so there are two regression equations.

Regression equation of Y on X (Y = a + bX), whose constants are found from the normal equations:
∑Y = Na + b∑X
∑XY = a∑X + b∑X²

Regression equation of X on Y (X = a + bY), whose constants are found from the normal equations:
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y²
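
As an illustrative sketch, the two normal equations for Y on X can be solved directly with NumPy; the x and y observations below are made up for demonstration:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
N = len(x)

# Normal equations:  N*a      + (sum x)*b   = sum y
#                    (sum x)*a + (sum x^2)*b = sum xy
A = np.array([[N, x.sum()], [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)
print(f"Y on X:  y = {a:.3f} + {b:.3f}x")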

TWO – WAY ANOVA
Two way ANOVA involves only two categorical variables or factors and examines the effect of
these two factors on the dependent variable. For example, the sales of Hyundai Verna car may be
attributed to different salesmen and different states. It examines the interaction between the
different levels of these two factors. Similarly, the production of a particular product in a factory
may be attributed to the different types of machines as well as the different grades of executives.
Procedure for Two – Way ANOVA

 Identify dependent and independent variables.


 Partition (decomposition) of total variation.
 Calculate variations
 Calculate degree of freedom
 Calculate mean square
 Calculate F statistic or F ratio
 Determine level of significance
 Interpret the results.
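
A hedged sketch of these steps in Python using the statsmodels library; the sales figures, salesman labels and state labels are invented purely to mirror the Hyundai Verna example above:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "sales":    [10, 12, 9, 14, 15, 11, 8, 13, 12, 10, 16, 9],
    "salesman": ["A", "A", "A", "B", "B", "B"] * 2,
    "state":    ["TN"] * 6 + ["KA"] * 6,
})

# Model with both factors and their interaction
model = ols("sales ~ C(salesman) * C(state)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # partitioned variation, F ratios, p-values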

4.4 MULTIVARIATE ANALYSIS


Multivariate analysis of variance, or MANOVA, is a commonly used multivariate technique.
MANOVA assesses the relationship between two or more continuous dependent variables and
categorical variables or factors. In business research, MANOVA can be used to test differences
among samples of employees, customers, manufactured products, or production parts.

One author defines multivariate analysis as "those statistical techniques which focus upon, and
bring out in bold relief, the structure of simultaneous relationships among three or more
phenomena."

Multivariate techniques may be classified as dependency and interdependency techniques.


Selecting an appropriate technique starts with an understanding of this distinction. If criterion and
predictor variables exist in the research question, then we have an assumption of dependence.

Multiple regression, multivariate analysis of variance (MANOVA), and discriminant analysis are
techniques in which criterion or dependent variables and predictor or independent variables are
present. Alternatively if the variables are interrelated without designating some as dependent and
others independent, then interdependence of the variables is assumed. Factor analysis, cluster
analysis, and multidimensional scaling are examples of interdependency techniques.
For example, respondents in a corporate-image study might be asked to rate companies on a set of issues; the following issues make up the list:

 Developing new products and services.


 Producing good-quality products and services.
 Making products that are safe to use.
 Hiring minorities.
 Providing jobs for people.
 Being good citizens of the communities in which they operate.
 Paying good salaries and benefits to employees.
 Charging reasonable prices for goods and services.
 Keeping profits at reasonable levels.

 Advertising honestly.
 Paying their fair share of taxes.
 Cleaning up their own air and water pollution.

Multidimensional scaling develops a perceptual map of the locations of some objects relative to
others. This map specifies how the objects differ. Cluster analysis identifies homogeneous
subgroups or clusters of individuals or objects based on a set of characteristics. Factor analysis
looks for patterns among the variables to discover if an underlying combination of the original
variables (a factor) can summarize the original set. Based on your research objective, you select
factor analysis.

DEPENDENCY TECHNIQUES
Multiple Regression
In statistics, regression analysis is a statistical process for estimating the relationships among
variables. It includes many techniques for modeling and analyzing several variables, when the
focus is on the relationship between a dependent variable and one or more independent variables.
More specifically, regression analysis helps one understand how the typical value of the dependent
variable (or 'Criterion Variable') changes when any one of the independent variables is varied,
while the other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the independent variables –
that is, the average value of the dependent variable when the independent variables are fixed.

Uses of Regression analysis


 Regression analysis is widely used for prediction and forecasting
 Regression analysis is also used to understand which among the independent variables are
related to the dependent variable, and to explore the forms of these relationships.
 In restricted circumstances, regression analysis can be used to infer causal relationships
between the independent and dependent variables.

Multiple regression analysis represents a logical extension of two-variable regression analysis.
Instead of a single independent variable, two or more independent variables are used to estimate
the values of the dependent variable.

In multiple regression analysis there are three or more variables, say X1, X2 and X3. We take
X1 as the dependent variable and try to find out its relative movement for movements in both X2
and X3, which are independent variables. Thus, in multiple regression analysis the effect of two or
more independent variables on one dependent variable is studied.

The main objective in using multiple regression is to predict the variability of the dependent
variable based on its covariance with all the independent variables. One can predict the level of
the dependent phenomenon through a multiple regression analysis model, given the levels of the
independent variables. Given a dependent variable, the linear multiple regression problem is to
estimate constants B1, B2, …, Bk and A such that the expression
Y = B1X1 + B2X2 + … + BkXk + A
provides a good estimate of an individual's Y score based on his X scores.
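
A minimal sketch of estimating A, B1 and B2 with the statsmodels library; the fifty simulated observations below are assumptions made purely for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=50)

X = sm.add_constant(np.column_stack([x1, x2]))  # prepends the constant term A
fit = sm.OLS(y, X).fit()
print(fit.params)     # estimates of A, B1, B2
print(fit.rsquared)   # share of the variability in Y explained by X1 and X2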

Assumptions of Multiple Regression Analysis
The principal assumptions of linear multiple regression analysis are:
1. The dependent variable is a random variable, whereas the independent variables need
not be random variables.
2. The relationship between the several independent variables and the one dependent
variable is linear and
3. The variances of the conditional distributions of the dependent variable, given various
combinations of values of the independent variables, are all equal. For interval
estimation, an additional assumption is that the conditional distributions of the
dependent variable follow the normal probability distribution.

Steps in Multiple Regressions


1. State the research hypothesis
2. State the null hypothesis
3. Gather the data
4. Assess each variable separately first (obtain measures of central tendency and
dispersion, frequency distributions, graphs): is the variable normally distributed?
5. Assess the relationship of each independent variable, one at a time, with the dependent
variable (calculate the correlation coefficient, obtain a scatter plot): are the two
variables linearly related?
6. Assess the relationships of all the independent variables with each other
(obtain a correlation coefficient matrix for all the independent variables).
7. Calculate the regression equation from the data.
8. Calculate and examine appropriate measures of association and tests of statistical
significance for each coefficient and for the equation as a whole.
9. Accept or reject the null hypothesis.
10. Reject or accept the research hypothesis
11. Explain the practical implications of the findings.

Multiple regression is used as a descriptive tool in four types of situations:

1. To develop a self-weighting estimating equation by which to predict values for a criterion
variable (DV) from the values of several predictor variables (IVs). Thus, we might try to predict
company sales on the basis of new product profiles, change in technology, annual disposable
income, and a time factor. Another prediction study might be one in which we estimate a student's
academic performance in college from the variables of rank in high school class, SAT verbal
scores, SAT quantitative scores, and a rating scale reflecting impressions from an interview.
2. A descriptive application of multiple regression calls for controlling for confounding
variables to better evaluate the contribution of other variables. For example, one might wish to
control for the brand of a product and the store in which it is bought in order to study the effects
of price as an indicator of product quality.

3. To test and explain causal theories. This approach, often referred to as path analysis,
is used to describe an entire structure of linkages that has been advanced from a
causal theory.
4. To test hypotheses and to estimate population values as an inference tool.

Discriminant Analysis
Researchers often wish to classify people or objects into two or more distinct and well-defined
groups. One might need to classify persons as either buyers or non-buyers, as good or bad credit
risks, or to classify products in some market as superior, average, or poor.

The objective of discriminant analysis is to establish a classification method, based on a set of
attributes, in order to correctly predict the group membership of subjects. With this objective,
it is easy to understand why discriminant analysis is frequently used in market segmentation
research.

Discriminant analysis is suitable when the dependent variable is nominal and the independent
variables are interval-scaled. It is a technique to analyze data when the criterion or dependent
variable is categorical and the predictor or independent variables are interval in nature
(Lachenbruch, 1975).
Example: satisfied versus unsatisfied consumers, or service seekers versus non-service-seeker consumers.

Discriminant analysis has widespread applications in the field of business research.


This analysis helps to find the linear combination of the independent variables that makes the mean
scores across the categories of the dependent variable maximally different. This linear combination
is called the Discriminant Function (DF) and is represented as follows:
DF = V1X1 + V2X2 + … + VmXm

The criterion used to decide when group means are maximally different is the familiar ANOVA
F-test for the difference among means; thus the V's are derived so that

F = (SS between / SS within) is maximized
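
A sketch of a two-group case (buyers versus non-buyers) with scikit-learn's LinearDiscriminantAnalysis; the age and income profiles below are hypothetical:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[25, 40], [30, 50], [35, 60], [20, 30], [45, 80], [50, 90]])  # age, income
group = np.array([0, 0, 1, 0, 1, 1])  # 0 = non-buyer, 1 = buyer

lda = LinearDiscriminantAnalysis().fit(X, group)
print(lda.coef_)                # the V weights of the discriminant function
print(lda.predict([[40, 70]]))  # assign a new individual to a group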

Objectives of Discriminant Analysis


The objectives of discriminant analysis can be listed as follows.

 To discover linear combinations of variables that facilitate discrimination between the categories of the dependent variable in a feasible manner.

 To establish the statistical significance of the discriminant function and examine whether significant differences exist among the groups, based on the predictor variables.

 To find out the independent variables that are relatively better at discriminating between the groups.


 To build a procedure for assigning new objects or individuals, whose profile (and not the group identity) is known, to one of the two groups.

 To assess the accuracy of classification.

Applications of Discriminant analysis in business markets


Discriminant analysis is widely used in business research. According to Gupta (2003), following
aspects can be studied by this type of analysis:

 Identification of new buyer group


 Consumer behavior toward new products or brands.
 Brand loyalty
 Relationship between variables
 Checklist of properties of new products.

Procedure for Discriminant Analysis


1. Formulate the problem
2. Estimate discriminant function coefficients
3. Determine significance of discriminant functions
4. Interpret the results
5. Assess the validity of the discriminant analysis.

CONJOINT ANALYSIS
The most common applications for conjoint analysis are market research and product
development. Consumers buying a MindWriter computer, for example, may evaluate a set of
attributes to choose the product that best meets their needs. They may consider brand, speed,
price, educational value, games, or capacity for work-related tasks. The attributes and their
features require that the buyer make trade-offs in the final decision making.
Method
Conjoint analysis typically uses input from nonmetric independent variables. Normally, we would
use cross-classification tables to handle such data, but even multiway tables quickly become
overwhelmed by the complexity. If there were three prices, three brands, three speeds, two levels
of educational values, two categories for games, and two categories for work assistance, the
model would have 216 decision levels (3 X 3 X 3 X 2 X 2 X 2).

A choice structure this size poses enormous difficulties for respondents and analysts. Conjoint
analysis solves this problem with various optimal scaling approaches, often with log-linear
models, to provide researchers with reliable answers that could not be obtained otherwise.

The objective of conjoint analysis is to secure utility scores (sometimes called part-worths)
that represent the importance of each aspect of a product or service in the subjects' overall
preference ratings. Utility scores are computed from the subjects' rankings or ratings of a set of

cards. Each card in the deck describes one possible configuration of combined product
attributes.

The first step in a conjoint study is to select the attributes most pertinent to the purchase
decision. This may require an exploratory study, such as a focus group, or it could be done by
an expert with thorough market knowledge. The attributes selected are the independent
variables, called factors. Possible values for an attribute are called factor levels.
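
One common estimation approach is dummy-coded regression on the profile ratings. The sketch below assumes two hypothetical factors (brand and price level) and invented ratings, so the resulting part-worths are purely illustrative:

import pandas as pd
import statsmodels.formula.api as smf

profiles = pd.DataFrame({
    "rating": [7, 5, 3, 8, 6, 2, 9, 4],  # respondent ratings of the cards
    "brand":  ["A", "A", "B", "B", "A", "B", "A", "B"],
    "price":  ["low", "high", "low", "low", "high", "high", "low", "high"],
})

# Each factor level becomes a dummy variable; the coefficients are part-worth utilities
fit = smf.ols("rating ~ C(brand) + C(price)", data=profiles).fit()
print(fit.params)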

FACTOR ANALYSIS
Factor analysis is a general term for several specific computational techniques used to examine
patterns of relationships (correlations) among a set of variables. The objective of these
techniques is data reduction: to reduce a large number of variables to a more manageable
number of factors, based on the nature and character of these relationships. For example,
one may have data on 100 employees with scores on six attitude items.

Methods of Factor Analysis

Most factor methods operate by extracting the eigenvalues and eigenvectors from a square matrix.
Many factor methods have been developed. The most widely used among these are the R-mode and
Q-mode techniques. The R-mode technique considers the interrelations between variables, and operates
by extracting the eigenvalues and eigenvectors from a covariance or correlation matrix, whereas Q-mode
analysis extracts the eigenvalues and eigenvectors from a matrix of similarities between all possible pairs
of objects. R-mode techniques are statistical procedures, but Q-mode procedures focus on the similarities
between individuals in the data set and are not usually amenable to statistical analysis. In any factor
analysis exercise, four steps are generally involved. They are:
1. Preparation of a relevant correlation matrix to be used as input for the software used
for extraction of factors
2. Extraction of initial factors
3. Rotation of initial set of factors to a terminal solution
4. Interpretation of the rotated factor solution

Many methods of extracting factors have been developed. Some of them are listed below:
(a) Principal component analysis
(b) Maximum likelihood method (or canonical factoring)
(c) Alpha factoring
(d) Image factoring
(e) Least squares method
Some restrictions are imposed while obtaining the initial factors. They are:
 There are p common factors,
 Underlying factors are orthogonal, and
 The first factor accounts for as much of the variation as possible, the second factor accounts
for as much of the residual variance left unexplained by the first factor, the third factor accounts
for as much of the residual variance left unexplained by the first two factors, and so on.

Method
Factor analysis begins with the task of reducing the number of variables in order to simplify subsequent
analyses. The data reduction process is based on the relationships or inter-correlations among the
variables within the correlation matrix. Although this can be done in a number of ways, the most
frequently used approach is principal components analysis. This method transforms a correlation or
covariance matrix into a set of orthogonal components equal in number to the original variables. The
new variables, Pi, called principal components, are linear combinations of the original variables with
estimated weights (also called factor loadings). These linear combinations, called factors, account for
the maximum total variance in the data as a whole. The first principal component, or first factor, is the
linear function of the original variables that maximizes the amount of total variance explained. The
second principal component is defined as the best linear combination of variables for explaining the
variance not accounted for by the first factor. In turn, there may be a third, fourth, and fifth
component, each being the best linear combination of variables not accounted for by the previous
factors.
The process continues until all the variance is accounted for, but as a practical matter, it is
usually stopped after a small number of factors have been extracted. It is important to note that
principal components, or factors, will always be produced in a factor analysis. However, the
quality and usefulness of the derived factors are dependent upon the types, number, and
conceptual basis of the variables selected for inclusion during the initial research design. The
output of a principal components analysis might look like the hypothetical data.

Eigenvalues are the sums of the variances of the factor values. When divided by the number of
variables, an eigenvalue gives an estimate of the proportion of total variance explained by the factor.
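
An illustrative sketch of principal components extraction with scikit-learn; the 100 × 6 matrix of employee attitude scores is simulated, echoing the example above:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
scores = rng.normal(size=(100, 6))  # 100 employees, six attitude items

pca = PCA().fit(scores)
print(pca.explained_variance_)        # eigenvalue of each extracted component
print(pca.explained_variance_ratio_)  # share of total variance each component explains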

Assumptions underlying factor analyses are as follows:


1. Postulate of factorial causation: This imposes a particular causal order on the data, namely
that observed variables are linear combinations of some underlying causal variables.
2. Postulate of parsimony: The principle of parsimony leads to a unique conclusion: where an
infinite number of factor models is possible, only one particular configuration of factor loadings
is consistent with the one-common-factor model.

CLUSTER ANALYSIS
Cluster analysis is a set of interdependence techniques for grouping similar objects or people.
Cluster analysis, also known as classification analysis or numerical classification, places the
variables or objects into relatively homogeneous groups called clusters.

Cluster analysis consists of methods of classifying variables into clusters. Technically, a cluster
consists of variables that correlate highly with one another and have comparatively low
correlations with variables in other clusters.

The basic objective of cluster analysis is to determine how many mutually exclusive and exhaustive
groups or clusters, based on the similarities of profiles among entities, really exist in the population,
and then to state the composition of such groups. The various groups to be determined in cluster
analysis are not predefined, as happens to be the case in discriminant analysis.

It can also be said that cluster analysis is an exploratory data analysis tool which aims at sorting
different objects into groups in such a way that the degree of association between two objects is
maximal if they belong to the same group and minimal otherwise. Cluster analysis forms the
clusters automatically, without the researcher needing to classify the clusters based on opinion.
According to Gupta (2003), the following factors characterize cluster analysis:
1. It forms subgroupings.
2. It takes as input a matrix of association between variables or objects.
3. It assumes that natural clusters exist within the data.
Cluster analysis reduces the number of observations or cases by grouping them into a smaller set
of clusters.

Its visibility in those fields and the availability of high-speed computers to carry out the extensive
calculations have sped its adoption in business. Understanding one's market very often involves
segmenting customers into homogeneous groups that have common buying characteristics or
behave in similar ways. Such segments frequently share similar psychological, demographic,
lifestyle, age, financial, or other characteristics.

Cluster analysis offers a means for segmentation research and other business problems, such as
understanding buyer behaviors, where the goal is to identify similar groups.

It differs from discriminant analysis in that discriminant analysis begins with well-defined
groups composed of two or more distinct sets of characteristics and searches for a set of variables
to separate them. In contrast, cluster analysis starts with an undifferentiated group of people,
events, or objects and attempts to reorganize them into homogeneous subgroups.

Objectives of cluster analysis


1. Reduction of data: Reducing the number of cases by enabling consideration of a few
types instead of numerous individual records.
2. Assign observations to groups: Cluster analysis assigns observations to groups
("clusters") so that observations within each group are similar to one another with
respect to variables or attributes of interest, and the groups themselves stand apart
from one another.
3. Discover composition of groups: Cluster analysis seeks to discover the number and
composition of the groups.

Steps for conducting cluster analysis


Five steps are basic to the application of most cluster studies:
1. Selection of the sample to be clustered (e.g., buyers, medical patients, inventory,
products, employees).
2. Definition of the variables on which to measure the objects, events, or people (e.g.,
market segment characteristics, product competition definitions, financial status,

political affiliation, symptom classes, productivity attributes).
3. Computation of similarities among the entities through correlation, Euclidean
distances, and other techniques.
4. Selection of mutually exclusive clusters (maximization of within-cluster similarity and
between- cluster differences) or hierarchically arranged clusters.
5. Cluster comparison and validation.
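
A sketch of steps 3 and 4 above using scikit-learn's KMeans; the two measured variables (working days and expenditure) echo the example that follows, and all the figures are invented:

import numpy as np
from sklearn.cluster import KMeans

# Each row: (number of working days, expenditure on working days)
data = np.array([[260, 20], [255, 22], [250, 18],
                 [200, 45], [195, 50], [205, 48],
                 [120, 80], [115, 85], [125, 78]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
print(km.labels_)           # cluster membership of each individual
print(km.cluster_centers_)  # average profile of each homogeneous subgroup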

Example of cluster analysis

The following example describes a two-dimensional perceptual map based on data relating to
18 individuals from east India, on the basis of

1. the number of working days, and

2. the expenditure on working days during a given year.

 The first cluster, consisting of six individuals, shows that although these individuals
work for more days, they do not spend much on their working days.

 The second cluster, comprising seven individuals, illustrates that they work fairly hard
and also spend reasonably.
 The third cluster, comprising five individuals, shows that they have relatively few
working days but spend significantly more on their working days.

Applications of Cluster Analysis in Business Research


Cluster analysis has widespread applications in business research, some of which are as follows.
 Market Segmentation: Cluster analysis can be used to cluster markets based on
various parameters. For example, Yankelovich (1964) segmented customers based on
what they look for in a watch, and classified individuals into those who are price
driven, those driven by durability and quality, and those driven by occasion-based
symbolism.
 Industrial Segmentation: One can easily segment products or sectors into clusters
and evaluate them further.
 Selecting test markets: By grouping states or regions into homogenous clusters, it can
easily be decided which market to test or whether to test at a particular point of time.


 Reducing data: One can reduce data and conduct further analysis based on the
simplified clusters.
 Identifying new product opportunities: Competitive sets in the markets can be
analyzed through cluster analysis.
 Educational data mining: Cluster analysis is used, for example, to identify groups of
schools or students with similar properties.
Other Applications
1. Social Network Analysis
2. Image Segmentation
3. Data Mining
4. Grouping of shopping items
5. Crime analysis

MULTIDIMENSIONAL SCALING
Multidimensional scaling (MDS) creates a spatial description of a respondent's perception
about a product, service, or other object of interest on a perceptual map. This often helps the
researcher to understand difficult-to-measure constructs such as product quality or desirability.

In contrast to variables that can be measured directly, many constructs are perceived and
cognitively mapped in different ways by individuals. With MDS, items that are perceived to
be similar will fall close together on the perceptual map, and items that are perceived to be
dissimilar will be further apart.

It consists of a group of analytical techniques which are used to study consumer attitudes
related to perceptions and preferences. The respondents are asked to place the various brands
into different groups, like similar, not similar, and so on. A goodness-of-fit measure is traded
off against a large number of attributes.

A lack-of-fit index is then calculated by a computer program; MDS is a computer-based
technique. The purpose is to find a reasonably small number of dimensions which will eliminate
most of the stress.

After the configuration for the consumer's preference has been developed, the next step is to
determine the preference with regard to the product under study. These techniques attempt to
identify the product attributes that are important to consumers and to measure their relative
importance.

Concepts of Multi – Dimensional Scaling (MDS)


Multi-Dimensional Scaling (MDS) is a class of procedures for representing the perceptions and
preferences of respondents spatially by means of a visual display. Perceived or psychological

relationships among stimuli are represented as geometric relationships among points in a
multidimensional space. These geometric representations are often called spatial maps.

The axes of the spatial map are assumed to denote the psychological bases or underlying
dimensions respondents use to form perceptions of and preferences for stimuli. MDS has been
used in marketing to identify:
1. The number and nature of dimensions consumers use to perceive different brands in
the market place.
2. The positioning of current brands on these dimensions.
3. The positioning of consumers' ideal brand on these dimensions.

This is a complicated scaling device, with which researchers can scale objects, individuals or
both with a minimum of information. The goal of the analysis is to detect meaningful underlying
dimensions that allow the researcher to explain observed similarities or dissimilarities (distances)
between the investigated objects.
MDS can be characterized as a set of procedures for portraying perceptual or affective
dimensions of substantive interest. It is used when all the variables (metric and non-metric) in
a study are to be analysed simultaneously and all such variables happen to be independent.

MDS is not so much an exact procedure as it is a way to 'rearrange' objects in an efficient
manner, so as to arrive at a configuration that best approximates the observed distances. It
actually moves objects around in the space defined by the requested number of dimensions,
and checks how well the distances between objects can be reproduced by the new
configuration.

In more technical terms, MDS uses a function minimization algorithm that evaluates different
configurations with the goal of maximizing the goodness of fit (or minimizing the lack of fit).

Multidimensional Scaling transforms consumer judgments / perceptions of similarity or


preferences in a multidimensional space (usually 2 or 3 dimensions). It is useful for designing
products and services. MDS is a set of procedures for drawing pictures of data so that the
researcher can:

 Visualize relationships described by the data more clearly.


 Offer clearer explanations of those relationships.
Thus, MDS reveals relationships that appear to be obscure when one examines only the
numbers resulting from a study. It attempts to find the structure in a set of distance measures
between objects.

Assumptions
 People perceive a set of objects as being more or less similar to one another on a
number of dimensions instead of one.
These techniques attempt to locate the points, given information about a set of interpoint distances,
in a space of one or more dimensions, so as to best summarize the information contained in the
interpoint distances.
Objectives of MDS
1. Develop techniques of full text analysis.
2. Provides visual presentation of similarities.
3. Help explain observed similarities.
4. Applied to any kind of distances
Advantages of MDS
1. Market Segmentation: This technique can be very useful in segmenting the markets
based on varied criteria. For instance, in above example market can be segmented
based on consumer perception.
2. Product Life Cycle: MDS can be used to predict the life cycle of a product. For
instance, the product life cycle stage of a floppy disk will be altogether different from
that of a CD or DVD.
3. Vendor evaluations: This technique can assist an organization in judging the vendor
responses and proceed in the market further based on those responses.
4. Advertising media selection: This technique can segment various available media
based on their relevance and effectiveness. As such, one can easily sort out which
media to choose for the advertisement and the budget to devote to that particular
media.
Process of MDS
The types of data required are dependent on the area of research. An ideal
multidimensional scaling experiment involves gathering four types of data:
a. Similarity judgments among all pairs of stimuli,
b. Ratings of stimuli on descriptors such as adjectives,
c. Objective measures (such as physicochemical parameters) relating to the sensory
properties of the stimuli, and
d. Information about the subjects.
Multidimensional scaling (MDS) represents the objects as points in a multidimensional space. The
number of dimensions used may range from 1 to (n-1), but typically 1 to 5 dimensions are used. The fit
between the distances among objects created in the multidimensional space and the rank-order
similarities is measured by an index known as stress. The objective of the process is to decide on the
fewest dimensions that yield a low stress value. As in factor analysis, the dimensions must be named
by the researcher.

MDS can be thought of as a process having a variety of procedures to plot stimulus objects as points in a
multidimensional geometric perceptual space, where the dimensions are interpreted as the attributes through
which stimulus objects are differentiated from each other by the respondents. Depending upon the nature of the
input and output data, MDS can be classified as doubly metric (both input and output metric), doubly non-metric
(both non-metric), or non-metric (non-metric input data, metric output data). The MDS process consists of three
steps:

1. Data collection
2. Data analysis, and
3. Data interpretation.

MDS is most often used to assess perceived similarities and differences among objects. Using MDS
allows the researcher to understand constructs that are not directly measurable. The process
provides a spatial map that shows similarities in terms of relative distances. It is best understood
when limited to two or three dimensions that can be graphically displayed.
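
A sketch of fitting a two-dimensional spatial map from a dissimilarity matrix with scikit-learn's MDS; the three-brand dissimilarity values are hypothetical:

import numpy as np
from sklearn.manifold import MDS

# Symmetric dissimilarity matrix for three brands (0 = identical)
dissim = np.array([[0, 2, 6],
                   [2, 0, 5],
                   [6, 5, 0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print(coords)       # brand positions on the perceptual map
print(mds.stress_)  # lack-of-fit (stress) value for this configuration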

Simple Linear Regression

In the previous section, we focused on relationships between variables. The product moment
correlation was found to represent an index of the magnitude of the relationship, the sign
governed the direction, and r² explained the common variance. Relationships also serve as a
basis for estimation and prediction.

When we take the observed values of X to estimate or predict corresponding Y values, the
process is called simple prediction. When more than one X variable is used, the outcome is a
function of multiple predictors. Simple and multiple predictions are made with a technique called
regression analysis.
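
A minimal sketch of simple prediction with SciPy's linregress; the x and y observations below are hypothetical:

from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

res = stats.linregress(x, y)
print(res.intercept, res.slope)       # fitted line: y = a + b*x
print(res.rvalue ** 2)                # r-squared: the common variance
print(res.intercept + res.slope * 6)  # predict Y for a new X value of 6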

4.5 APPLICATION OF STATISTICAL SOFTWARE


Statistics is the science of making effective use of numerical data relating to groups of individuals
or experiments. It deals with all aspects of this, including not only the collection, analysis and
interpretation of such data, but also the planning of the collection of data, in terms of the design
of surveys and experiments. The proliferation of computer technology in businesses and
universities has greatly facilitated tabulation and statistical analysis.

Commercial statistical packages eliminate the need to write a new program every time data must be
tabulated or analysed with the computer. Traditional manual methods take a lot of time, whereas
statistical software makes statistical analysis simpler and produces more accurate results. Some of the
more common software packages used for data analysis are as follows:

SPSS, SAS and STATA.


SPSS
SPSS stands for Statistical Package for the Social Sciences. It was developed in 1968 by three
young men from disparate professional backgrounds.

SPSS, or the Statistical Package for the Social Sciences, is basically a computer programme
which enables survey authoring and deployment, data mining, text analytics, statistical
analysis, and collaboration and deployment. Since its inception in 1968 it has been helping
users do various kinds of analysis and research in the social sciences. In SPSS research, an
integrated set of programs is used to analyze management and statistical data from
questionnaire surveys and other sources.

SPSS is a multipurpose data storage, graphical, and statistical system.


The primary use of SPSS in educational settings is for the analysis of data about people, their
attitudes and behaviours. SPSS is mainly used by Political Science, Communication,
Sociology, Social Work and Psychology departments.

Overall, SPSS is a good first statistical package for people wanting to perform quantitative
research in social science, because it is easy to use and because it can be a good starting point
from which to learn more advanced statistical packages.

Proficiency with statistical software packages is indispensable today for serious research in the sciences.
SPSS (Statistical Package for the Social Sciences) is one of the most widely available and powerful
statistical software packages. It covers a broad range of statistical procedures that allow you to summarise
data (e.g., compute means and standard deviations), determine whether there are significant differences
between groups (e.g., t-tests, analysis of variance), examine relationships among variables (e.g.,
correlation, multiple regression), and graph results (e.g., bar charts, line graphs).

Starting SPSS
Assuming that SPSS is already installed on the computer system, just choose it from the Windows
Start menu or double-click its icon to begin.
1. Type in Data: useful for analysing small data sets not available in
electronic form.
2. Open Existing Data Source: used for opening data files
created in SPSS.
3. Open Another Type of File: used for importing data stored in
files not created by SPSS.
After making your choice, click OK. Clicking Cancel instead of OK is the same as choosing
Type in Data. Use Exit from the File menu whenever you are ready to quit SPSS.

Significance of SPSS
SPSS is the statistical package most widely used by political scientists. There seem to be
several reasons why:

1. Of the major packages, it seems to be the easiest to use for the most widely used statistical
techniques;
2. One can use it either with a Windows point-and-click approach or through syntax (i.e.,
writing out SPSS commands). Each has its own advantages, and the user can switch
between the approaches;

3. Many of the widely used social science data sets come with an easy method to translate
them into SPSS; this significantly reduces the preliminary work needed to explore new
data.
SAS/STAT
This software is designed for both specialized and enterprise-wide analytical needs. SAS/STAT
software provides a complete, comprehensive set of tools that can meet the data analysis needs of
the entire organisation. The features of SAS/STAT are as follows:
1. Analysis of Variance: Balanced and unbalanced designs; multivariate analysis of
variance and repeated measurements; linear and non-linear mixed models.
2. Mixed Models
i) Linear mixed models.
ii) Non-linear mixed models.
iii) Generalized linear mixed models.
3. Regression
i. Least squares regression with nine model selection techniques, including stepwise
regression.
ii. Diagnostic measures.
iii. Robust regression; loess regression.
iv. Non-linear regression and quadratic response surface models.
v. Partial least squares.
4. Categorical Data Analysis
i. Contingency tables and measures of association.
ii. Logistic regression and log linear models; generalized linear models.
iii. Bioassay analysis.
iv. Generalized estimating equations.
v. Weighted least squares regression.
vi. Exact methods.
5. Bayesian Analysis
Bayesian modelling and inference for generalised linear models, accelerated life failure
models, Cox regression models and piece-wise exponential models. General procedure fits
Bayesian models with arbitrary priors and likelihood functions.


6. Multivariate Analysis: Factor analysis; principal components; canonical correlation and
discriminant analysis; path analysis; structural equations.
7. Survival Analysis: Comparison of survival distributions; accelerated failure time models;
proportional hazards models.
8. Psychometric Analysis: Multidimensional scaling; conjoint analysis with variable
transformations; correspondence analysis.
9. Cluster Analysis: Hierarchical clustering of multivariate data or distance data; disjoint
clustering of large data sets; non-parametric clustering with hypothesis tests for the
number of clusters.
10. Non-Parametric Analysis
i) Non-parametric analysis of variance; exact probabilities computed for many non-
parametric statistics.
ii) Kruskal-Wallis, Wilcoxon-Mann-Whitney and Friedman tests.
iii) Other rank tests for balanced or unbalanced one-way or two-way designs.
11. Survey Data Analysis: Sample selection; descriptive statistics and t-tests; linear and
logistic regression; frequency table analysis.
12. Study Planning: The power and sample size application provides an interface for the
computation of sample sizes and the characterization of power for t-tests, confidence
intervals, linear models, tests of proportions, and rank tests for survival analysis.

STATA
It is a complete, integrated statistical package that provides everything one needs for data
analysis, data management, and graphics. STATA is not sold in pieces, which means one gets
everything one needs in one package without annual license fees.

Fast, Accurate, and Easy to Use: With a point-and-click interface, an intuitive command
syntax, and online help, STATA is easy to use, fast, and accurate. All analyses can be
reproduced and documented for publication and review.

Broad Suite of Statistical Capabilities: STATA puts hundreds of statistical tools at your
fingertips, from advanced techniques, such as survival models with frailty, Dynamic Panel Data
(DPD) regressions, Generalized Estimating Equations (GEE), multilevel mixed models, models
with sample selection, multiple imputation, ARCH, and estimation with complex survey
samples; to standard methods, such as linear and Generalized
Linear Models (GLM), regressions with count or binary outcomes, ANOVA/MANOVA,
ARIMA, cluster analysis, standardization of rates, case-control analysis, and basic tabulations
and summary statistics.

Complete Data-Management Facilities: STATA's data-management commands give one
complete control of all types of data: one can combine and reshape datasets, manage variables,
and collect statistics across groups or replicates. One can work with byte, integer, long, float,
double, and string variables. STATA also has advanced tools for managing specialized data such
as survival/duration data, time-series data, panel/longitudinal data, categorical data, multiple-
imputation data, and survey data.

Publication-Quality Graphics: STATA makes it easy to generate publication-quality, distinctly
styled graphs, including regression fit graphs, distributional plots, time-series graphs, and
survival plots. With the integrated Graph Editor, one can click to change anything about the
graph or to add titles, notes, lines, arrows, and text.

Responsive and Extensible: STATA is so programmable that developers and users add new
features every day to respond to the growing demands of today's researchers. With STATA's
internet capabilities, new features and official updates can be installed over the internet with a
single click. Many new features and informative articles are published quarterly in the refereed
Stata Journal. Another great resource is Statalist, an independent list server where more
than 3,200 STATA users exchange over 1,000 postings and 50 programs each month.

Matrix Programming with MATA: Though one does not need to program to use STATA, it is
comforting to know that a fast and complete matrix programming language is an integral part of
STATA. MATA is both an interactive environment for manipulating matrices and a full
development environment that can produce compiled and optimized code. It includes special
features for processing panel data, performs operations on real or complex matrices, provides
complete support for object-oriented programming, and is fully integrated with every aspect of
STATA.

Cross-Platform Compatible: STATA is available for Windows, Mac, and Unix computers
(including Linux). STATA datasets, programs, and other data can be shared across platforms
without translation.
