DBA1657-Research Methods in Business
UNIT - 1
NOTES
Introduction to Research
Unit structure:
1.1 Introduction
1.2 Learning objectives
1.3 Definition of research
1.4 Importance of research
1.5 Hallmarks of scientific research
1.6 The Building Blocks of Science in research
1.7 Research process: An Overview
1.7.1. Defining the research problem
1.7.2. Establishing research objectives
1.7.3 Developing the research design
1.7.4 Preparing a research proposal
1.7.5 Data Collection
1.7.6 Data Analysis and Interpretation
1.7.7 Research report
1.8 Theoretical Framework
1.8.1 Types of Variables
1.8.2 Theoretical framework: The need and features
1.9 Hypotheses: Types and testing
1.10 Research Design
1.10.1 The need for research design
1.10.2 Classification of research designs
-------------------------------------------------------------------------------------
1.1 INTRODUCTION
Managers are mostly involved in studying and analyzing issues that lead to
decision making. They are involved in some form of research for making an appropriate
decision. Decision making today is complicated and complex. There is a vast flow of information, enabled by data mining and warehousing, which provides vital input for decision making. The success or failure of a business decision depends on the data associated with the decision. Decisions can be made in an objective or subjective manner. Objective decision making is rational and scientific. To arrive at objective decisions, business managers often involve themselves in some form of research.
Research is simply the process of finding solutions to a problem after a thorough
study and analysis of the situational and other related factors. Business research is a
systematic and organized effort to investigate a specific problem encountered in the
work setting that needs a solution. It comprises a series of steps designed and executed
with the goal of finding answers to the issues that are of concern to the manager.
This unit provides a basic understanding of research, the process involved and
the steps involved in the development and testing of hypotheses. Further, the need for and the major types of research design are dealt with in detail.
1.3 DEFINITION OF RESEARCH

Research is an organized, systematic investigation into a specific problem undertaken with the purpose of finding solutions to
it. Research provides the needed information that guides managers to make informed
decisions to successfully deal with problems.
1.4 IMPORTANCE OF RESEARCH

The business world today is complicated and complex. In this context, research enables the manager to face the competitive global market with greater confidence. Research enables managers to consider the available information in a sophisticated and creative way.
Managers need to understand, predict and control events that are dysfunctional to the organization. Research enables them to understand, predict and control the environment.

Research enables managers to sense, spot and deal with problems before they get out of hand.
Organizations may not be able to solve all the problems they encounter in-house, and consultants may be engaged for expert advice. The manager needs knowledge of research to interact with research consultants effectively and to get the maximum benefit from them.
Not all published research findings can be accepted as such. The soundness of findings should be evaluated before decisions are made on their basis. Managers need to know about research in order to evaluate and discriminate among research findings based on the soundness of the methodology.
Knowledge of research and research methods sensitizes managers to the various variables operating in a situation and reminds them of the multicausality and multifinality of situations, thereby helping them avoid inappropriate, simplistic notions of one variable causing another.
Knowledge about scientific investigation enables managers to eliminate or avoid making decisions in a subjective or biased manner.
Knowledge about research helps the manager understand the need to share pertinent information with the research consultants.
1.5 HALLMARKS OF SCIENTIFIC RESEARCH
1.5.1 Purposiveness
Research is conducted with a purpose; it has a focus. The purpose of the research should be stated clearly, in an understandable and unambiguous manner. The statement of the decision problem should include its scope, its limitations and the precise meaning of all words and terms significant to the research. Failure to state the purpose clearly will raise doubts in the minds of the stakeholders of the research as to whether the researcher has a sufficient understanding of the problem.
1.5.2. Rigor
Rigor means carefulness, scrupulousness and the degree of exactness in research
investigation. In order to make a meaningful and worthwhile contribution to the field of
knowledge, research must be carried out rigorously. Conducting rigorous research requires good theoretical knowledge and a clearly laid out methodology. This will eliminate bias and facilitate proper data collection and analysis, which in turn lead to sound and reliable research findings.
1.5.3. Testability
Research should be based on testable assumptions/hypotheses developed after
a careful study of the problems involved. Scientific research should enable the testing of logically developed hypotheses to see whether or not the data collected support the hypotheses developed.
Anna University Chennai
1.5.4. Replicability
Research findings command more faith and credence if the same results are obtained on different sets of data. The results of hypothesis testing should be supported again and again when the same type of research is repeated in other, similar circumstances. This ensures the scientific nature of the research conducted, and more confidence can be placed in the research findings. It also removes the doubt that the hypotheses were supported merely by chance and ensures that the findings reflect the true state of affairs.
1.5.6. Objectivity
Research findings should be factual, data-based and free from bias. The conclusions drawn should be based on the facts of the findings derived from the actual data, and not on subjective or emotional values. Business organizations will suffer greater damage if non-data-based or misleading conclusions drawn from research are implemented. A scientific approach ensures the objectivity of research.
1.5.7. Generalizability
It refers to the scope for applying the research findings of one organizational setting to other settings of an almost similar nature. Research is more useful if the solutions are applicable to a wider range of settings. The more generalizable the research, the greater its usefulness and value. However, it is not always possible to generalize research findings to all other settings, situations or organizations. To achieve generalizability, the sampling design has to be logically developed and the data collection method needs to be very sound. This may increase the cost of conducting the research. In most cases, though the research findings are based on scientific methods, they are applicable only to a particular organization, setting or situation.
1.5.8. Parsimony
Research needs to be conducted in a parsimonious, i.e., simple and economical, manner. Simplicity in explaining the problems and generalizing solutions for them is preferred to a complex research framework. Economy in research models can be achieved by considering a smaller number of variables that explain greater variance rather than a larger number of variables that explain less variance. A clear understanding of the problem and the factors influencing it will lead to parsimony in research activities. Sound understanding can be achieved through structured and unstructured interviews with the people concerned and by undertaking a study of the related literature in the problem area.
Scientific research in the management area cannot fulfil all the above hallmarks to the fullest extent. In management research it is not always possible to conduct investigations that are 100% scientific, as in the physical sciences, because it is difficult to collect and measure data regarding feelings, emotions, attitudes and perceptions. It is also difficult to obtain a representative sample. These aspects restrict the generalizability of the findings. Though it is not possible to meet all the above characteristics of scientific research, to the extent possible research activities should be pursued in a scientific manner.
Deduction

In deduction, the conclusions drawn must necessarily be based on reasons. The reasons are said to imply the conclusions and
represent a proof. The bond between the reasons and conclusions is much stronger
than in the case of induction. To be correct, a deduction should be both valid and true.
True in the sense that the reasons given for the conclusion must agree with the real world; valid in the sense that the conclusion must necessarily follow from the reasons.
Researchers often use deduction to reason out the implications of various acts and conditions. For example, in a survey a researcher may reason as follows:
Reason 1: Surveying households in an urban area is difficult and expensive.
Reason 2: The study involves interviews with households in an urban area.
Conclusion: The study will be difficult and expensive.
Induction
Induction is a process in which a certain phenomenon is observed and, on that basis, conclusions are arrived at. The conclusions are drawn from one or more facts or pieces of evidence. The conclusions of induction result in hypotheses. Induction leads to the establishment of a general proposition based on observed facts. For example, a researcher observes that production is the prime feature of factories. It is therefore concluded that factories exist for production purposes.
Research is based on both deduction and induction. It helps us to understand,
explain and predict business phenomena.
The building blocks of scientific inquiry include the following sequences:
1. Observing a phenomenon
2. Identifying a problem
3. Constructing a theory
4. Developing hypotheses
5. Developing research design
6. Collecting data
7. Analyzing data and
8. Interpreting results
Observation of a phenomenon may be casual or purposeful. A casual scanning of the environment may lead us to knowledge of interesting facts. This observation may lead to identifying a problem in the concerned area. Problem identification requires gathering primary data from the customers, employees or management concerned with the particular problem. Further insights may be obtained to refine the problem in a more specific manner. The next step is to build a conceptual
model or theoretical framework taking into consideration all the factors contributing to
the problem. The framework enables the researcher to integrate all the information collected in a meaningful manner. From this theoretical framework several hypotheses can be generated
and tested to support the concept. A research design provides the blue print of the
mechanism or insight regarding the methods of collecting data, analyzing the same and
interpreting them in order to solve the problem.
The building blocks of science discussed above provide the genesis for the hypothetico-deductive method of scientific inquiry. The steps are discussed below:
1. Observation
Observation is the first stage in scientific investigation. In this process, the
researcher takes into account the changes that are occurring in the environment. To
proceed further the changes observed in the environment should have important
consequences. The changes may take the form of a sudden drop in sales, an increase in employee turnover, a decrease in the number of customers and the like.
2. Preliminary information gathering
This involves seeking in depth information regarding the facts being observed.
The information may be gathered through formal questionnaires, interview schedules or through informal or casual talk with the people concerned. Desk research may also be
conducted to enrich the information gathered. The next step is to make sense out of the
factors identified in the information gathering stage by assembling them together in a
meaningful manner.
3. Formulation of theory
Theory formulation enables the researcher to integrate all the information in a logical manner so as to conceptualize and test the factors responsible for the problem. The critical
variables contributing to the problem are examined. The association or relationship
among the variables contributing to the problem is studied in order to formulate the
theory.
4. Developing Hypotheses
The next logical step is the framing of testable hypotheses. Hypothesis testing is called deductive research. Sometimes hypotheses that were not originally formulated are generated through the process of induction: after the collection of data, an insight may occur on the basis of which new hypotheses can be formulated. Thus hypothesis testing through deductive research and hypothesis generation through induction are both common.
5. Data Collection
After the hypotheses are developed, the data with respect to each variable in the hypotheses need to be obtained in a scientific manner so as to test them. Both primary and secondary sources can be explored in order to collect the data. Data should be collected on every variable in the theoretical framework from which the hypotheses were generated.
6. Data Analysis
The data gathered are to be statistically analyzed to validate the hypotheses postulated. Both qualitative and quantitative data need to be analyzed. Qualitative data refer to information gathered through interviews and observations. Through scaling techniques, qualitative data can be converted into quantifiable form and subjected to analysis. Appropriate statistical tools should be used to analyze the data.
7. Deduction
Deduction is the process of arriving at conclusions by interpreting the meaning of the results of the data analysis. Based on the deductions, recommendations can be made to solve the problem encountered.
There must be an individual or a group that faces some difficulty or problem.

There must be alternative means or courses of action for attaining the objectives.

There must be some doubt in the mind of the researcher with regard to the selection of alternatives.
The researcher should be familiar with the subject chosen for research and should have enough knowledge, qualification and training in the selected problem area. The resources needed to solve the problem in terms of time, money, effort and manpower should be taken into account before embarking on a problem.

The subject of research should be familiar and feasible, so that related research material or sources of research can be obtained easily.
Research problems trigger the research process. Defining the research problem
is a critical activity. A thorough understanding of research problem is a must for achieving
success in the research endeavor. Defining the research problem begins with identifying
the basic dilemma that prompts the research. It can be developed further by progressively breaking the original dilemma down into more specific, focused objectives. Five steps can be envisaged: (1) identifying the broad problem area, (2) literature review, (3) identifying the research question, (4) refining the research question and (5) developing investigative questions. They are discussed below:
1.7.1.1 Identifying the broad problem area
The process begins with specifying the problem at the most general level, e.g., declining sales, increased cost, increased employee turnover, etc. From this general specification of the problem, the next step is to move towards the question. The question restates the general problem, e.g., what is the reason for declining sales? The questions that can be raised can be grouped into three categories: (1) choice of purposes or objectives, where the question focuses on what objectives the researcher wishes to achieve by conducting the research; (2) generation and evaluation of solutions, where the question focuses on the alternatives available to solve the problem in hand; and (3) troubleshooting or control situations, where the query focuses on monitoring and diagnosing why an organization is not achieving its established goals.
The researcher can identify the problem through the following sources:
The above techniques enable the researcher to understand the problem better and to outline the possible variables that might exert an influence.
The nature of information needed by the researcher could be broadly classified under
three headings:
1. Background information on the organization for which the research is conducted, viz., the origin and history of the company, its assets, number of employees, location, etc. This information can be obtained from company records, published data, the Census of Business and Industry and the web.

2. Information regarding managerial philosophy, company policies and other structural aspects, which can be collected by asking direct questions of the management.

3. Information regarding the perceptions, attitudes and behavioural aspects of employees, which can be obtained by way of observations, interviews and questionnaires.
1.7.1.2 Literature survey
A literature survey is the review of published and unpublished work from secondary sources in the area of interest to the researcher. The purposes of conducting a literature survey at this stage are:

To ensure that no variable taken up in past related studies is ignored.

To avoid conducting a similar type of study, thereby stopping the researcher from investing resources of time and effort in a research question that has already been answered.

To provide a good framework and a solid foundation to proceed further in the investigation.

To enhance the testability and replicability of the findings of the current research.

To aid clear conceptualization.
exploration and question revision to refine the original question and generate the material
for constructing the investigative questions.
1.7.1.5 Developing investigative questions
Investigative questions are questions that the researcher must answer
satisfactorily to arrive at a conclusion about the research question. To formulate them,
the researcher should break down the research question into more specific questions
for which data are to be gathered. This fractioning process can be continued down to several levels of increasing specificity. The investigative questions guide the development of a suitable research design and are the foundation for creating the research data collection instrument. In developing the investigative questions, performance considerations, attitudinal issues and behavioural issues can be included, depending on the research problem.
The problems in defining research questions
There might be some problems in defining the research questions which are
discussed below:
Some problems are complex, value-laden and bound by constraints. These ill-defined questions have characteristics that are virtually the opposite of those of well-defined problems. Such problems require a thorough exploratory study before proceeding.
Secondary data are historical data previously collected and assembled for some other research problem. Secondary data can usually be gathered in a faster and more economical manner than primary data. However, the data may not fit the researcher's information needs. Secondary data can be obtained from libraries, websites, and published as well as unpublished documents.
1.7.3.3 Sampling methodology and procedure
Sampling refers to selecting a subgroup of people or objects from the overall membership pool of a defined target population. Sampling plans can be broadly classified into probability and non-probability sampling. In a probability sampling plan, each member of the defined target population has a known, nonzero chance of being drawn into the sample group. Probability sampling gives the researcher the opportunity to assess the sampling error. In the case of non-probability sampling, the research findings cannot be generalized and the sampling error cannot be assessed; the findings are limited to the sample that provided the original raw data. However, non-probability sampling may be the only choice in cases where the population cannot be ascertained. (Sampling is dealt with in more detail in Unit 3.)
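The simplest probability sampling plan, simple random sampling, can be sketched in a few lines using Python's standard library. The population of 1,000 customer IDs and the sample size of 100 below are made-up numbers for illustration only.

```python
import random

# Hypothetical target population: 1,000 customer IDs.
population = list(range(1, 1001))

random.seed(42)  # fixed seed so the draw is reproducible

# Simple random sampling: every member has the same known chance
# (100/1000 = 10%) of being drawn into the sample.
sample = random.sample(population, k=100)

print(len(sample))       # 100
print(len(set(sample)))  # 100 -- drawn without replacement
```

Because every member's selection probability is known, the sampling error of estimates computed from such a sample can be assessed, which is exactly what non-probability plans do not allow.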
1.7.3.4 The time schedule and the budget
The time schedule for completing the research, along with the breakup of time required for each task, has to be ascertained. Scheduling will enable completion of the project in time. A budget displays the sources and application of funds for the research. The budget may require less attention in the case of an in-house project or one which is financed by the researcher. However, a budget prepared for financial grants needs to be prepared very systematically, supported with proper documentation. The budget may be prepared on various bases, e.g., rule-of-thumb budgeting, where a fixed percentage is arrived at on some criterion such as a percentage of sales or the previous year's research budget. Task budgeting selects specific research projects to support on an ad hoc basis.
The data gathering phase begins with pilot testing. It is done to detect weaknesses in the research design and questionnaire/interview schedule, and it provides proxy data for the selection of a probability sample. The pilot testing should simulate the procedures and protocols designed for data collection. If the study is to be conducted by email, then the pilot questionnaire should be emailed. The size of the pilot group may range from 25 to 100 respondents, who need not be statistically selected. There are a number of variations of pilot testing, some of which may be restricted to data collection only. One form is pretesting, where responses are collected from colleagues, respondent surrogates or actual respondents for the main purpose of refining the questionnaire. Based on the pilot testing, the questionnaire may be redesigned, rephrased and improved. Pretesting may be repeated many times to refine questions or procedures.
Data are the facts presented to the researcher from the study environment. Data can be gathered from a single location or from all over the world, based on the research objectives and the resource allocation. Data collection methods range from observation and questionnaires to laboratory notes and other modern instruments and devices. Data can be characterized by their abstractness, verifiability, elusiveness and closeness to the phenomenon. As abstractions, data are more metaphorical than real. When sensory experiences consistently produce the same result, the data are said to be trustworthy because they can be verified. Data capture is elusive, complicated by the speed at which events occur and the time-bound nature of observation. Data reflect their truthfulness through their degree of closeness to the phenomenon. Secondary data have at least one level of interpretation inserted between the event and its recording; primary data are closest to the truth.
Collected data need to be edited to ensure consistency and to locate omissions. In the case of the survey method, editing reduces errors in recording, improves legibility and clarifies unclear and inappropriate responses. Edited data are then converted into analyzable form. Computers can be used to find missing data, validate data, edit and code, so that further analysis can be carried out in a valid manner.
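The editing step described above can be sketched programmatically. The records, field names and the 1-5 satisfaction scale below are hypothetical, chosen only to show how omissions and inconsistent responses might be flagged.

```python
# Hypothetical survey records; None marks an omitted answer and an
# out-of-range code illustrates an inconsistent entry.
records = [
    {"id": 1, "age": 34, "satisfaction": 4},
    {"id": 2, "age": None, "satisfaction": 5},  # omission
    {"id": 3, "age": 29, "satisfaction": 9},    # invalid: scale is 1-5
]

def edit_record(rec):
    """Flag missing values and responses outside the 1-5 scale."""
    problems = []
    if rec["age"] is None:
        problems.append("missing age")
    if rec["satisfaction"] is not None and not 1 <= rec["satisfaction"] <= 5:
        problems.append("satisfaction out of range")
    return problems

for rec in records:
    issues = edit_record(rec)
    if issues:
        print(rec["id"], issues)
```

Records flagged in this way can be queried back to the field staff or the respondent before the data are coded for analysis.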
Various statistical software packages are available to make the job of data analysis easier and more scientific. However, the interpretation needs to be made with expertise, as the recommendations are made on the basis of it.
A variable, as the name suggests, takes varied values. The values may differ at various times for the same object or person, or at the same time for different objects or persons. For example, age is a variable: it is different for different consumers, and for a single consumer it varies as time passes.
[Figure: Age (independent variable) → Choice of product (dependent variable)]
A moderating variable is one that has a strong contingent effect on the relationship between the independent and the dependent variable. The presence of such a third variable modifies the original relationship between the independent and dependent variables. In the example discussed above, the price of the product is a moderating variable: though age influences the choice of the product, price may moderate that choice.
[Figure: Age (independent variable) → Attitude (intervening variable) → Choice of product (dependent variable); Price acts as a moderating variable]
A theoretical framework elaborates the logic underlying the relations among the variables and describes the nature and direction of the relationships. The theoretical foundations provide the basis for developing testable hypotheses.
The variables influencing the research problem should be clearly identified, defined and discussed.

The discussion should also highlight the relationships between the variables so identified.

The reasons for assuming the type of relationship should be mentioned, drawing on the previous research studies identified through the literature review.

A model showing the relationships among the variables can be given so that the concepts can be visualized and understood clearly by the reader.
E.g., the education of the respondent does not have an influence on the importance given to the information source.

Non-directional hypotheses are formulated in cases where previous studies have not explored the direction of the relationship, or where there is no evidence on which to assume the direction of the relationship among the variables. Previous research studies may also give rise to conflicting findings, which is another reason for a non-directional hypothesis.
3. Null and alternative hypotheses

The null hypothesis states that there is no significant relationship between the variables. It also states that there is no difference between the population characteristics and the sample being studied. The null hypothesis implies that any difference between two sample groups, or any relationship between two variables based on our sample, is simply due to random sampling fluctuations and not due to true differences between the two population groups. The null hypotheses so formulated are tested for possible rejection. A null hypothesis may state, for example, that the population correlation between two variables is equal to zero, or that the difference between the means of two groups in the population is equal to zero.
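The idea that an observed difference may be due merely to random sampling fluctuations can be demonstrated with a small permutation sketch: if randomly relabelling the two groups rarely reproduces a difference as large as the one observed, the null hypothesis becomes implausible. The scores below are invented for illustration.

```python
import random

# Hypothetical scores for two sample groups of eight subjects each.
group_a = [12, 15, 14, 10, 13, 16, 11, 14]
group_b = [14, 17, 15, 16, 18, 15, 17, 16]

def mean(xs):
    return sum(xs) / len(xs)

# H0: the difference between the two group means is zero.
observed_diff = mean(group_b) - mean(group_a)

# Under H0 the group labels are interchangeable, so a difference at
# least this large should arise often under random relabelling.
random.seed(0)  # fixed seed for a reproducible result
pooled = group_a + group_b
trials, extreme = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    if abs(mean(pooled[:8]) - mean(pooled[8:])) >= abs(observed_diff):
        extreme += 1

p_value = extreme / trials
# A small p-value says the observed difference is unlikely to be a
# mere sampling fluctuation, so H0 is rejected.
print(observed_diff, p_value)
```

This permutation approach tests the same null hypothesis as the parametric tests discussed later in this unit, but without assuming a particular population distribution.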
Hypothesis generation and testing can be done through both induction and deduction. In deduction, the theoretical model is first developed, testable hypotheses are formulated on the basis of the theoretical framework, data are collected and then the hypotheses are tested. In the inductive process, new hypotheses are formulated on the basis of facts already collected, and these hypotheses are then subjected to testing. The findings add to knowledge and help to build a theoretical framework.
Bayesian statistics also use sampling data for decisions but go beyond them and consider all other available information. The additional information consists of subjective probability estimates stated in terms of degrees of belief. The subjective estimates are based on general experience rather than on specific data collected. They are expressed as a prior distribution that can be revised after sample information is gathered. The revised estimate, known as the posterior distribution, can be further revised by additional information, and so on. Various decision rules are established, cost and other estimates can be introduced, and the expected outcomes of the combinations of these elements are used to judge the decision alternatives.
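The prior-to-posterior revision described above can be sketched with the conjugate beta-binomial case, where the update is a simple parameter addition. The prior parameters and trial counts below are made-up numbers, chosen only to show the mechanics.

```python
# Prior belief about a success rate, expressed as a Beta(a, b)
# distribution; a=2, b=8 encodes a prior mean of 2/(2+8) = 0.2.
a, b = 2, 8
prior_mean = a / (a + b)

# Sample information: 30 trials, 12 successes (hypothetical data).
successes, trials = 12, 30

# Beta prior + binomial data -> Beta posterior (conjugate update):
# simply add successes to a and failures to b.
a_post = a + successes
b_post = b + (trials - successes)
posterior_mean = a_post / (a_post + b_post)

print(prior_mean, posterior_mean)

# The posterior can itself serve as the prior for the next batch of
# sample information, revising the estimate again, and so on.
```

The degrees-of-belief prior pulls the estimate away from the raw sample proportion (12/30 = 0.4) toward the prior mean, with the sample's weight growing as more data arrive.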
The observations must be independent, i.e., the selection of one case should not affect the chances of any other case being included in the sample.
Non-parametric tests have few assumptions. They are easy to understand and simple to use. They do not require a normally distributed population or homogeneity of variance. Some tests require independence of cases, while others are designed for related cases. Non-parametric tests are the only ones usable with nominal data, and they are the only technically correct tests to use with ordinal data. Non-parametric tests can also be used with interval and ratio data, although this wastes some of the information available. Non-parametric tests are nearly as efficient as parametric tests: a non-parametric test with a sample of 100 provides about the same statistical testing power as a parametric test with a sample of 95.
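As one concrete instance of a non-parametric test that is technically correct for ordinal data, a sign test can be sketched in a few lines. The before/after satisfaction ratings (1-5 scale) below are hypothetical.

```python
from math import comb

# Hypothetical before/after ratings of the same ten respondents
# on an ordinal 1-5 scale.
before = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3]
after  = [4, 3, 4, 4, 3, 2, 5, 3, 4, 4]

# Sign test: keep only the direction of each change; drop ties.
signs = [1 if a > b else -1 for b, a in zip(before, after) if a != b]
n = len(signs)
pluses = sum(1 for s in signs if s == 1)

# Under H0, + and - are equally likely, so the number of pluses
# follows Binomial(n, 0.5); double the tail for a two-sided test.
p_one_sided = sum(comb(n, k) for k in range(pluses, n + 1)) / 2 ** n
p_value = min(1.0, 2 * p_one_sided)

print(n, pluses, p_value)
```

Only the direction of each paired change is used, never the magnitude, which is why the test remains valid when the scale is merely ordinal.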
I. One-sample Tests
One-sample tests are used when a single sample is taken and a test is undertaken to determine whether the sample comes from a specified population.

Parametric tests

The parametric Z or t-test can be used to determine the statistical significance of the difference between a sample distribution mean and a population parameter. When sample sizes exceed 120, the t and z distributions are virtually identical.
Non-parametric tests
Different types of non-parametric tests may be used in the one-sample case, depending on the measurement scale used and other conditions. If the measurement scale is nominal, the binomial or chi-square test can be used. The binomial test is appropriate when the population is viewed as having only two classes, such as male and female, or buyers and non-buyers, and all observations fall into one of these categories. The binomial test is useful when the sample size is so small that the chi-square test cannot be used.
Chi-square (χ²) test

The chi-square test is the most widely used non-parametric test of significance. It is particularly useful in tests involving nominal data but can also be used with higher scales. Using this technique, the significance of the differences between the observed distribution of data among categories and the expected distribution is tested under the null hypothesis. The test can be used with one sample, two independent samples or k independent samples. It must be calculated with actual counts rather than percentages. The formula for the chi-square test is

χ² = Σᵢ₌₁ᵏ (Oᵢ − Eᵢ)² / Eᵢ

where
Oᵢ = observed number of cases in the ith category
Eᵢ = expected number of cases in the ith category under H0
k = the number of categories
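The formula can be applied directly to actual counts. The figures below are hypothetical: 90 respondents classified into three brand-preference categories, with H0 positing an even split of 30 per category.

```python
# Hypothetical observed counts across k = 3 categories, against an
# even-split expectation under H0.
observed = [45, 30, 15]
expected = [30, 30, 30]

# chi-square = sum over categories of (O_i - E_i)^2 / E_i
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_square)  # (15**2 + 0 + 15**2) / 30 = 15.0
```

With k − 1 = 2 degrees of freedom, the critical value at the 0.05 level is 5.99, so H0 of an even split would be rejected for these counts.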
II. Two-independent samples tests
The need for two-independent-samples tests is often encountered in research. One can, for example, compare the purchasing predispositions of samples of subscribers from two magazines to discover whether they are from the same population.

Parametric tests

The z and t-tests are frequently used parametric tests for independent samples; however, the F test can also be used.
The z test is used with sample sizes exceeding 30 for both independent samples, or with smaller samples when the data are normally distributed and the population variances are known. The formula for the z test is

z = [(X̄₁ − X̄₂) − (μ₁ − μ₂)₀] / √(S₁²/n₁ + S₂²/n₂)
In the case of small sample sizes, normally distributed populations and assuming equal population variances, the t-test is appropriate:

t = [(X̄₁ − X̄₂) − (μ₁ − μ₂)₀] / √[S_p² (1/n₁ + 1/n₂)]

where (μ₁ − μ₂) is the difference between the two population means and S_p² is the pooled variance estimate:

S_p² = [(n₁ − 1)S₁² + (n₂ − 1)S₂²] / (n₁ + n₂ − 2)
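The pooled t statistic can be computed directly from the formulas above. The two subscriber samples below are invented for illustration, and the hypothesized difference under H0 is taken as zero.

```python
from math import sqrt

def pooled_t(sample1, sample2):
    """Two-independent-samples t statistic with pooled variance,
    testing H0: mu1 - mu2 = 0."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    # Pooled variance estimate S_p^2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical purchase scores from two magazines' subscriber samples.
magazine_a = [7, 9, 6, 8, 7, 9]
magazine_b = [5, 6, 7, 5, 6, 5]

t = pooled_t(magazine_a, magazine_b)
print(t)  # compare with the t critical value at n1 + n2 - 2 = 10 df
```

A computed t larger in magnitude than the tabled critical value for 10 degrees of freedom would lead to rejecting H0 that the two subscriber populations have equal means.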
Non-parametric tests

The chi-square test is appropriate for situations in which a test for differences between samples is required. It is especially valuable for nominal data, but it can be used with ordinal measurements as well. The formula differs slightly from the earlier one:

    chi-square = sum over i, sum over j of (Oij - Eij)^2 / Eij

where Oij is the observed and Eij the expected number of cases in the cell of the ith row and jth column.
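For two independent samples the expected counts Eij come from the row and column totals under the null hypothesis of independence. A minimal sketch (the magazine-reader counts are invented for illustration):

```python
def chi_square_contingency(table):
    """Chi-square over an r x c table of observed counts; each expected count
    Eij = (row total * column total) / grand total under H0."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return sum((o - row_totals[i] * col_totals[j] / grand) ** 2
               / (row_totals[i] * col_totals[j] / grand)
               for i, row in enumerate(table) for j, o in enumerate(row))

# Illustrative: purchase predisposition (rows: yes/no) for readers of two magazines.
stat = chi_square_contingency([[30, 20], [20, 30]])
# degrees of freedom = (rows - 1) * (columns - 1) = 1
```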
III. Two-Related Samples Tests

Parametric Tests

For related (before-after or matched) samples, the t-test for paired observations is

    t = D-bar / (S_D / sqrt(n))

where
    D-bar = (sum of D) / n
    S_D = sqrt{ [sum of D^2 - (sum of D)^2 / n] / (n - 1) }

and D is the difference between each pair of observations.
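A pure-Python sketch of the paired-samples t statistic defined above (the before/after scores are invented for illustration):

```python
import math

def paired_t(before, after):
    """Paired-samples t: t = Dbar / (S_D / sqrt(n)), where D is the
    difference between the paired before/after observations."""
    n = len(before)
    d = [a - b for a, b in zip(after, before)]          # pairwise differences D
    d_bar = sum(d) / n                                   # mean difference
    s_d = math.sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))
    return d_bar / (s_d / math.sqrt(n))

t_stat = paired_t([10, 12, 14, 16], [12, 13, 15, 18])
```

The statistic is referred to a t table with n - 1 degrees of freedom.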
Nonparametric Tests
The McNemar test may be used with either nominal or ordinal data and is
especially useful with before-after measurement of the same subjects.
IV. K Independent Samples Tests
K independent samples tests are normally used in management and economic
research when three or more samples are involved. The test is concerned with whether
the samples might come from the same or identical population. When the data are
measured on an interval-ratio scale and the necessary assumptions are met then the
Analysis of Variance and the F test are used. If the assumptions cannot be met, or if the data are measured on an ordinal or nominal scale, then a non-parametric test can be selected. The samples are assumed to be independent.
Parametric Tests
Analysis of Variance (ANOVA) is a statistical method of testing the null hypothesis that the means of several populations are equal. To use ANOVA, certain conditions must be met:

1. The distance from one value to its group's mean should be independent of the distances of other values to that mean.
2. The samples must be drawn from normally distributed populations.
3. The populations should have equal variances.

In the ANOVA model each group has its own mean and values that deviate from that mean. Similarly, all the data points from all of the groups produce an overall grand mean. The total deviation is the sum of the squared differences between each data point and the overall grand mean.
The total deviation of any particular data point may be partitioned into between-groups variance and within-groups variance. The between-groups variance represents the effect of the treatment or factor. Differences between the group means imply that each group was treated differently, and the treatment will appear as deviations of the sample means from the grand mean. The within-groups variance describes the deviations of the data points within each group from the sample mean. This results from variability among subjects and from random variation, and is often called error. When the variability attributable to the treatment exceeds the variability arising from error and random fluctuations, the viability of the null hypothesis begins to diminish.
The test statistic for ANOVA is the F ratio. It compares the variance from two sources:

    F = Between-groups variance / Within-groups variance
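The F ratio can be computed from raw group data; a minimal one-way sketch in Python (the three small groups are illustrative):

```python
def f_ratio(*groups):
    """One-way ANOVA F: mean square between groups over mean square within."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    k, n = len(groups), len(values)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: deviations of group means from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: deviations of each value from its group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = f_ratio([1, 2, 3], [2, 3, 4], [3, 4, 5])
```

The statistic is compared with the F table value for (k - 1, n - k) degrees of freedom.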
In this method it is often necessary to measure subjects several times. These repeated measurements are called trials. The repeated-measures ANOVA is a special type of n-way analysis of variance.
The overall research design can be split into the following parts:
- The sampling design, which deals with the method of selecting the samples for the purpose of conducting the study
- The observational design, which deals with the conditions under which the observations are made
- The statistical design, which concerns how many items are to be observed and how the information gathered is to be analysed
- The operational design, which relates to the techniques by which the procedures specified in the sampling, statistical and observational designs can be carried out
Since the research design is the plan regarding the sampling procedure, data collection method and various other activities to be performed in the proposed research, it can be discussed with others and, based on the critical comments, the flaws and inadequacies can be tackled, leading to an effective research design.

The research design affects the reliability of the research findings and as such it constitutes the foundation of the entire research work.
Research designs may be classified as follows:

    Criteria                          Types
    Method of data collection         Monitoring; Interrogation/communication
    Researcher control of variables   Experimental; Ex post facto
    Purpose of study                  Causal; Descriptive
    Time dimension                    Cross-sectional; Longitudinal
    Scope                             Case; Statistical study
    Research environment              Field setting; Laboratory research; Simulation
    Participants' perception          Actual routine; Modified routine
    Type of investigation             Causal; Correlational
    Unit of analysis                  Single; Dyad; Group; Organization/Nation
    Extent of crystallization         Formal study; Exploratory study
or is responsible for changes in the other variables. There are three possible relationships that can occur between variables:

1. Symmetrical
2. Reciprocal
3. Asymmetrical

In the case of a symmetrical relationship, two variables fluctuate together but it is assumed that changes in neither variable are due to changes in the other. Symmetrical conditions are usually found when two variables are alternate indicators of another cause or independent variable.

A reciprocal relationship exists where two variables mutually influence or reinforce each other. An asymmetrical relationship exists where the changes in one variable, viz., the independent variable, are responsible for changes in another variable, viz., the dependent variable. The dependent and independent variables are identified on the basis of:

- The degree to which each variable may be altered. The variable which is relatively unalterable is called the independent variable.
- The time order between the variables. The independent variable precedes the dependent variable.
All factors except the independent variable must be held constant and not confounded with another variable that is not part of the study. This is called control.

Each person in the study must have an equal chance of exposure to each level of the independent variable. This is called random assignment of subjects to groups.
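Random assignment can be sketched in a few lines of Python; the helper below (its name and the subject IDs are illustrative) shuffles the subject pool and then deals it out to the groups:

```python
import random

def random_assign(subjects, n_groups=2, seed=None):
    """Shuffle the subjects so each has an equal chance of landing in any
    group, then deal them out to the groups round-robin."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    return [pool[i::n_groups] for i in range(n_groups)]

# Twenty subject IDs split into an experimental and a control group.
experimental, control = random_assign(range(20), n_groups=2, seed=42)
```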
8. Type of investigation

A research study can take the form of a causal or correlational investigation. A causal study is conducted to establish a definitive cause-and-effect relationship. In this case the objective of the research is to delineate one or more factors that are causing the problem. The intention of the researcher conducting a causal study is to be able to state that variable X causes variable Y. Thus, a study in which the researcher wants to delineate the cause of one or more problems is called a causal study.
In-depth interviewing
Participant observation
Case studies
Street ethnography
Document analysis
Experience survey
Focus groups
Two-stage design
i. Secondary data analysis
The researcher can explore the organization's archives for data. Reports of prior research studies would reveal the successful and unsuccessful methods adopted in previous research studies. Browsing through earlier research studies will also reveal the less-attempted problem areas which can be addressed in the present research.

The researcher can look into published documents, in the form of books and journals, by outside organizations. These can be a rich source of hypotheses. The e-resources and the library will provide the needed information.

The search of secondary sources will provide the background information about the research to be conducted and will also provide a fair idea about the areas to be pondered.
ii. Experience survey
An experience survey involves collecting information from people experienced or knowledgeable in the particular area of study. The data would be collected from their memories and experiences. The ideas on important issues and the subject matter can be explored. The investigative format is more flexible. The outcome of the interview would be a new hypothesis, the discarding of an old one, or information for doing the study in a better manner.
iii. Focus groups
A focus group is a panel of people who meet for about 90 minutes to 2 hours and discuss the subject matter, led by a trained moderator. The facilitator uses group dynamics to focus or guide the group in the exchange of ideas, feelings and experiences on a specific topic. The focus group is made up of 6 to 10 respondents. Too small or too large a group may not be effective in meeting the objective. The outcome of the focus group will be a list of ideas and behavioural observations, along with the recommendations made by the moderator. The qualitative data produced from the focus group can be used for enriching knowledge.

Depending on the topic, separate focus groups could be run for different subsets of the population. Homogeneity in the focus group will be more effective and produce maximum results. Focus groups can be conducted in a face-to-face manner, through telephones, the internet (e-groups) and through video conferencing.
iv. Two-stage design

In the exploratory stage, the researcher does not know much about the problem in hand but needs to know more before proceeding further in terms of time and resources. A two-stage design would be useful in this situation. With this approach, exploration becomes a separate first stage with limited objectives: (1) clearly defining the research question and (2) developing the research design. A limited exploration at a lesser cost carries little risk for the researcher and enables him or her to uncover information that reduces the total research cost.
SUMMARY:

This unit has examined some of the basic aspects of research. The importance of knowledge of research in a business setting was emphasized. The hallmarks of scientific research, viz., purposiveness, rigor, testability, replicability, precision and confidence, objectivity, generalization and parsimony, were described. The steps involved in hypothetico-deductive research were discussed. The various steps involved in undertaking research were dealt with in detail. The issues involved in the development of hypotheses and the parametric and non-parametric tests were examined.

With this background on research, the next unit deals with the issues concerning the research design and its types.

Discuss the need for a theoretical framework and highlight its features.
What are the basic research design issues? Discuss them in detail.

Is a single research design suitable for all research studies? If not, why?

Discuss the different types of research design. Cite a situation to which each design is applicable.
Unit -2
Experimental Design
Unit structure:
2.1 Introduction
2.2 Learning Objectives
2.3 The benefits and drawbacks
2.4 Activities involved in conducting an experiment
2.5 Validity in Experimentation
2.5.1 Factors affecting internal validity
2.5.2 Factors affecting external Validity
2.6 Experimental research designs
2.6.1. Pre-experimental designs
2.6.2 True experiments/ Lab experiments
2.6.3. Field experiments: Quasi or semi - experiments
2.7 Measurement
2.8 Measurement Scales
2.8.1 Selection of measurement scale
2.8.2 Methods of scaling
2.8.3 Construction of measurement scales
-------------------------------------------------------------------------------------
2.1 INTRODUCTION
Experimental design enables a researcher to alter systematically the variables
involved in the study. The experimental design involves intervention by the researcher.
The researcher intervenes by way of manipulating the variables in a setting and observes
the effect on the subjects studied. Under experimental design the independent variables
are manipulated and the effects of the same on the dependent variables are observed.
This unit deals with the activities involved in conducting an experiment, the factors affecting validity in experimentation and the various types of experimental designs. Measurement of variables is necessary for testing hypotheses. The nominal, ordinal, interval and ratio scales are dealt with in detail. The process involved in the selection and construction of measurement scales is discussed in detail.
Understand the meaning of scaling and the six critical decisions involved in
selecting an appropriate measurement scale
The researcher can manipulate the independent variable and thereby understand the effect on the dependent variable. This leads to an understanding of the existence and potency of the manipulation.

The experiment can be replicated with different subject groups and conditions, thereby enabling the researcher to understand the effect of independent variables across people, situations, conditions and time.

The researcher in some situations can use field experiments to reduce the subjects' perception of the research as an intervention or deviation in their everyday lives.

The research is undertaken in artificial settings and hence the subjects may not behave as they do under normal circumstances.
experimental treatment has been the source of change in the dependent variable. The factors listed below affect internal validity:
History
Maturation
Testing
Instrumentation
Selection
Statistical regression
Experimental mortality
Compensatory equalization
Compensatory rivalry
Local history
1. History
In experimental designs a control measurement (O1) of the dependent variable is taken before introducing the manipulation (X). After the manipulation, an after-measurement (O2) of the dependent variable is taken. The difference between O1 and O2 is then attributed to the manipulation. However, some events may occur during the course of the experimental study which will affect the relationship between the variables under study.
2. Maturation
The subjects considered for experimentation might change with the passage of
time and may not be due to the occurrence of any specific event. This happens particularly
when the study covers a long period of time.
3. Testing
The process of taking a test can affect the scores of further tests. The first test
would have created some awareness and learning experience which influences the results
of the subsequent tests.
4. Instrumentation
The threat to validity may arise due to the observer or the instrument. Using different observers or interviewers affects the validity of the study. If the same observer is used for a longer period of time, validity may be affected due to the observer's experience, boredom, fatigue and anticipation of results. Differences in the questions used for each measurement also affect the validity.
5. Selection
Differential selection of subjects for experimental and control groups affects
the validity. Validity considerations require the groups to be equivalent in every aspect.
The problem can be overcome by randomly assigning the subjects to experimental and
control groups. In addition, matching can be done. Matching is a control procedure to
ensure that experimental and control groups are equated on one or more variables
before the experiment. Matching the members of the groups on key factors also enhances
the equivalence of the groups.
6. Statistical Regression
This factor operates especially when members chosen for the experimental group have extreme scores on the dependent variable. For example, if a manager wants to test whether he can increase the salesmanship qualities of the sales personnel through a training program, he should not choose those with extremely low or extremely high abilities for the experiment. This is because those with very low scores, i.e., those with low current sales abilities, have a greater probability of showing improvement and scoring closer to the mean on the post-test after being exposed to the treatment. This phenomenon of low scorers tending to score closer to the mean is known as regressing towards the mean. Likewise, those with very high abilities would have a greater tendency to regress towards the mean; they will score lower on the post-test than on the pretest. Thus, those who are at either end of the continuum with respect to the variable would not truly reflect the cause-and-effect relationship. This phenomenon of statistical regression is a threat to internal validity.
7. Experimental mortality
This factor arises due to the changes in the composition of study groups during
the test. There may be drop outs in the study group leading to the changes in the
membership of the group. This problem does not arise for the control group as they are
not affected by the testing situation and they are less likely to withdraw.
All the above threat factors can be controlled to a certain extent by random assignment. However, the following factors affecting internal validity cannot be controlled by randomization. Both the control group and the experimental group are affected by the first three factors.
2. One-Group Pretest-Post-test Design

         O              X              O
     (Pretest)    (Manipulation)   (Post-test)

Here O denotes an observation or measurement of the dependent variable and X the manipulation. However, other aspects of internal validity such as history, maturation and the testing effect are not taken into account. Hence, it is still a weak design.
3. Static Group Comparison
This design uses two groups; one receives the experimental stimulus and the other serves as a control group and is not given the treatment. The dependent variable is measured in both groups after the treatment. For example, in a field setting an experiment is designed to study the effect of a natural disaster (experimental treatment) on psychological trauma (measured outcome). A pretest before the natural disaster, say a tsunami, is possible but not on a large scale. Moreover, the timing of the pretest would be problematic. The control group, receiving only the post-test, would consist of subjects whose property is safe. The design can be represented in the following manner:

    X    O1
    -    O2

The addition of a comparison group increases the validity over the previous two designs. However, there is no way to be certain that the two groups are equivalent.
Factors: The independent variables of an experiment are often called the factors of the experiment. Active factors are those the experimenter can manipulate by causing a subject to receive one level or another. A blocking factor is one where the experimenter can only identify and classify the subject on an existing level.

Treatment: This is another word used for condition. It also refers to the statistical test of the effect of various conditions of the experiment.

Test unit: The experimental subjects are referred to as test units. The test units may be people, organizations, machine types, materials and other entities.
    R    X    O1
    R         O2
The experimental effect is measured by the difference between O1 and O2. The design is simpler and more attractive. Internal validity threats from history, maturation, selection and statistical regression are adequately controlled by random assignment. Since the subjects are measured only once, the threats of testing and instrumentation are handled. The different mortality rates between experimental and control groups continue to be a problem. The design reduces the external validity problem of the testing interaction effect.
3. Extensions of True Experimental Design
The researcher normally uses an operational extension to the basic design.
These extensions differ from classical design in terms of
A Latin square may be illustrated with three price levels (rows) and three blocks (columns), with treatments X1, X2 and X3 assigned so that each treatment appears once in every row and every column:

    Price level    Block 1    Block 2    Block 3
    High           X1         X2         X3
    Medium         X2         X3         X1
    Low            X3         X1         X2

Treatments are assigned based on random number tables. From the above, the effects of the price reduction can be ascertained. The major limitation of the Latin square is that it is assumed that there are no interactions between the treatments and the blocking factors.
d. Factorial design
In the case of a factorial design, a researcher can deal with more than one factor simultaneously. This design is especially important in several economic and social phenomena where, usually, a large number of factors affect a particular problem. Factorial designs can be of two types:

(1) Simple factorial designs
(2) Complex factorial designs
(1) Simple factorial designs: When the effect of varying two factors on the dependent variable is studied, the design is called a simple factorial design. This design is also known as a two-factor factorial design. A simple factorial design may be a 2 x 2, 3 x 4, 5 x 3 or similar type of design. A 2 x 2 simple factorial design can be depicted as below:

                          Experimental variable
    Control variable      Treatment A    Treatment B
    Level I               Cell 1         Cell 3
    Level II              Cell 2         Cell 4
The results are more generalizable and so the external validity is greater in the case of field experiments. However, the internal validity is lower, as the extent to which variable X alone causes variable Y cannot be ascertained.

1. Nonequivalent Control Group Design

    O1    X    O2
    O3         O4
Two designs are possible, viz., the intact equivalent design and the self-selected experimental group design.

In the intact equivalent design, members of the experimental and control groups are naturally assembled. This design is useful when any type of individual selection process would be reactive.

In the self-selected experimental group design, volunteers are recruited to form the experimental group, while non-volunteer subjects are used for control.
Comparison of the pretest results (O1 - O3) is one indicator of the degree of equivalence between the test and control groups. If the pretest results are significantly different, there is a real question about the groups' comparability. On the other hand, if the pretest observations are similar between groups, there is more reason to believe the internal validity of the experiment is good.
2. Separate Pretest and Post-test Design

This design is most applicable in situations where the researcher cannot know when and to whom to introduce the treatment but can decide when and whom to measure. The basic design is

    R    O1    (X)
    R    (X)    O2

The bracketed treatment (X) indicates that the researcher cannot control the treatment. This is not a strong design because several threats to internal validity are not handled adequately. History can confound the results, but this can be overcome by repeated experiments. This design is considered to be superior to true experiments in external validity. Its strength is that samples are drawn from the general population, which extends the generalization of the study.
3. Group Time Series Design

This design introduces repeated observations before and after the treatment and allows subjects to act as their own controls. The single treatment group design has before and after measurements as the only controls. There is also a multiple design with two or more comparison groups as well as repeated measurements in each treatment group. This format is useful where regularly kept records are a natural part of the environment and are unlikely to be reactive. It is also a good way to study unplanned events in an ex post facto manner.
2.7 MEASUREMENT
In normal parlance, measurement refers to an attempt to fix quantitatively the
form or other features of a physical object. In research, measurement refers to assigning
numbers to empirical events in compliance with a set of rules. This definition brings out
the three steps involved in the process of measurement:
1. Selecting the observable empirical events.
2. Developing a set of mapping rules i.e. a scheme for assigning numbers or symbols
to represent aspects of the event being measured.
3. Applying the mapping rules to each observation of that event.
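These three steps can be illustrated with a small Python sketch; the five-point agreement rule below is a hypothetical example, not one prescribed by the text:

```python
# Step 2 - a mapping rule: a scheme assigning a number to each observable
# response category of the empirical event (here, a five-point agreement item).
mapping_rule = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
                "agree": 4, "strongly agree": 5}

def measure(responses, rule):
    """Step 3 - apply the mapping rule to each observation of the event."""
    return [rule[r] for r in responses]

# Step 1 - the observed empirical events are the respondents' answers.
scores = measure(["agree", "neutral", "strongly agree"], mapping_rule)
```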
The goal of measurement is to provide the highest-quality, lowest-error data for the purpose of testing the identified hypotheses and for other related analysis and interpretation. Variables dealt with in research studies can be classified as objects and properties. Objects include the things of ordinary experience, such as a laptop, chair or car. They also include things which are not concrete, such as attitudes, peer-group pressures and perceptions. Properties are the characteristics of the objects; they include the level of motivation, leadership skills, etc. Strictly speaking, researchers are not involved in measuring objects or properties; rather, they measure indicants of the properties or indicants of the properties of the objects.
2. Order: Numbers are arranged in some order, in such a way that one number is greater than, smaller than or equal to another number.

In the ordinal scale an object can be stated to be greater than, less than or equal to another without stating how much greater or less. Other descriptors may also be used, viz., superior to, happier than, poorer than, above. It is also possible to rank more than one property at a time. For example, the researcher can ask the respondent to rank various airlines on the basis of certain properties.

In ordinal scaling, the differences in the ranking of the objects, persons or events investigated are clearly known. However, ordinal data do not give any indication of the magnitude of the differences among the ranks.
3. Interval scale
Interval data have the power of nominal and ordinal data and, in addition, incorporate the concept of equality of interval. The interval scale allows one to measure the distance between any two points on the scale. It not only enables the grouping of individuals into certain categories and taps the order of the groups; it also measures the magnitude of the differences in the preferences among the individuals. The interval scale is thus more powerful than the nominal and ordinal scales. The measure of central tendency, the arithmetic mean, is applicable. Its measures of dispersion are the range, the standard deviation and the variance.
4. Ratio scale
Ratio data have the power of the nominal, ordinal and interval scales and, in addition, provide for an absolute zero or origin. The ratio scale overcomes the disadvantage of the arbitrary origin point of the interval scale, i.e., it has an absolute zero point. The ratio scale not only measures the magnitude of the differences between points on the scale but also the proportion of the differences. Multiplication or division preserves the ratios. It is the most powerful of the four scales because it has a unique zero origin and subsumes all the properties of the other three scales.
The measure of central tendency of the ratio scale could be either the arithmetic
or the geometric mean and the measure of dispersion could be either the standard
deviation or variance or the coefficient of variation. Some examples of ratio scales are
those pertaining to actual age, income and work experience in organizations.
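As an illustration of a statistic that is permissible only for ratio data, the coefficient of variation can be computed as follows (the work-experience figures are invented):

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / mean; meaningful only for ratio data,
    because dividing by the mean presupposes a true zero point."""
    return statistics.stdev(values) / statistics.mean(values)

# Illustrative work-experience data (in years) - a ratio-scaled variable.
cv = coefficient_of_variation([2, 4, 6, 8, 10])
```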
1. Content Validity

Content validity refers to the extent to which a measuring instrument provides adequate coverage of the investigative questions guiding the study. Content validity is good if the instrument contains a representative sample of the universe of the subject matter of interest. The determination of content validity is judgmental and can be approached in several ways. Generally, content validity is treated as higher if the scale items used represent, to a greater extent, the domain or universe of the concept being measured. The researcher may determine content validity through a careful definition of the topic of concern, the items to be scaled and the scales to be used. Another way is to use a panel of persons to judge whether the instrument meets the standards.

Face validity is considered a basic and very minimum index of content validity. It indicates that, on the face of it, the items look as if they measure the intended concept.
2. Criterion-Related Validity
Criterion related validity reflects the success of measures used for prediction or
estimation. Predictive validity refers to the extent to which an outcome could be predicted
and concurrent validity refers to the extent to which estimate of current behaviour or
condition could be made. The researcher must ensure that the validity criterion used is
itself valid. This can be judged in terms of four qualities viz., relevance, freedom from
bias, reliability and availability.
3. Construct Validity

This is the most complex and abstract feature. Construct validity testifies that the results obtained from the use of a measure fit the theories around which the test is designed. In other words, a measure has construct validity to the degree that it conforms to predicted correlations with other theoretical propositions. The researcher may wish to measure or infer the presence of abstract characteristics for which no empirical validation seems possible. Attitude, aptitude and personality scales generally fall in this category. Although it is difficult, assurance is still needed that the measurement has an acceptable degree of validity. This is assessed through convergent and discriminant validity. Convergent validity is established when the scores obtained with two different instruments measuring the same concept are highly correlated. Discriminant validity is established when, based on theory, two variables are predicted to be uncorrelated and this is also empirically proved. The validity can be proved through the use of correlational analysis, factor analysis, etc.
II. Reliability

Reliability refers to consistency, i.e., a measure is reliable to the degree that it supplies consistent results. Reliability is concerned with estimates of the degree to which a measurement is free from random or unstable error. Reliable instruments can be used with confidence that transient and situational factors are not interfering. Reliable instruments are robust and work well at different times under different conditions. The reliability of an instrument is measured on the basis of stability, equivalence and internal consistency.
1. Stability

Stability is securing consistent results with repeated measurements of the same person with the same instrument. An observation is said to be stable if it gives the same reading on a particular person when repeated one or more times. Stability measurement in survey situations is more difficult than in observational studies. Observation can be done repeatedly, but a resurvey can be conducted only once. Two tests of stability are test-retest reliability and parallel-form reliability.
(a) Test-Retest Reliability

The conduct of a resurvey is called a test-retest arrangement, which involves comparisons between the two tests to learn about the reliability. The reliability coefficient obtained with a repetition of the same measure on a second occasion is called test-retest reliability. When a questionnaire containing items that are supposed to measure a concept is administered to a set of respondents, and the same questionnaire is administered again after some time, the correlation between the scores obtained at the two different times from the same set of respondents is called the test-retest coefficient. The higher the coefficient, the better the reliability and stability.
The following difficulties can occur in test-retest methodology;
Topic sensitivity occurs when the respondents seek to learn more about the
topic or form new and different opinions before the retest.
(b) Parallel-Form Reliability

Parallel-form reliability occurs when two comparable sets of measures tapping the same construct are highly correlated. The forms have similar items and the same response format; only the wording and the order or sequence of the questions are changed. This is done in order to establish the error variability arising due to changes in the wording or ordering of questions. High correlation between the two forms ensures that the measures are reasonably reliable, with minimal error variance caused by wording, ordering or other factors.
2. Equivalence
Equivalence is concerned with how much error may be introduced by different
investigators or different sample of items being studied. Equivalence is concerned with
variations at one point of time among observers and samples of items. One way to test
for the equivalence of measurements by different observers is to compare their scoring
of the same event. One test for item sample equivalence is by using alternative or parallel
forms of the same test administered to the same person simultaneously. The results of
the two tests are then correlated.
The major interest with equivalence is typically not how respondents differ
from item to item but how well a given set of items will categorize individuals. There
may be differences in responses between two samples of items, but if a person is
classified the same way by each test, then the test has good equivalence.
3. Internal consistency
The internal consistency indicates the homogeneity of the items in the course of
measuring a construct. The items should hang together as a set and should also be
capable of independently measuring the same concept so that the respondents attach
the same overall meaning to each of the items. This can be ensured by examining if the
items and the subsets of items in the measuring instrument are correlated highly. The
internal consistency among the items can be measured by using the split-half reliability
and the inter-item consistency reliability test.
(a) Split-half reliability
Split-half reliability reflects the correlation between two halves of an instrument.
This technique can be used when the measuring tool has many similar questions or
statements. The instrument is administered and the items are separated into even and
odd numbers, or into randomly selected halves. If the correlation between the two halves
is high, the instrument is said to have high reliability as regards internal consistency.
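The split-half procedure can be sketched as follows. The odd/even split and the Spearman-Brown step-up (which estimates full-length reliability from the half-length correlation) are standard; the item scores themselves are hypothetical:

```python
# Split-half reliability: sum odd- and even-numbered items separately,
# correlate the two half-scores, then apply the Spearman-Brown correction.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

responses = [            # rows = respondents, columns = six similar items
    [4, 5, 4, 4, 5, 4],
    [2, 3, 2, 3, 2, 2],
    [5, 5, 4, 5, 5, 5],
    [3, 3, 3, 2, 3, 3],
    [1, 2, 1, 2, 1, 1],
]
odd_half = [sum(row[0::2]) for row in responses]    # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]   # items 2, 4, 6
r_half = pearson(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown corrected reliability
```

The correction is needed because each half contains only half the items, and reliability rises with test length.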
(b) Inter-item consistency reliability
It is a test of the consistency of respondents' answers to all the items in a
measure. If the items are independent measures of the same concept, they will be
correlated with one another. The most popular test of inter-item consistency reliability
is Cronbach's coefficient alpha, which is used for multipoint-scaled items. For
dichotomous items the Kuder-Richardson formulas are used. Higher coefficients
indicate a better measuring instrument.
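Cronbach's alpha can be computed directly from the item variances and the variance of the total scores. A minimal sketch with hypothetical 5-point ratings:

```python
# Cronbach's coefficient alpha for multipoint-scaled items:
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)   # population variance

scores = [           # rows = respondents, columns = items on a 5-point scale
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 2],
]
k = len(scores[0])                                          # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)    # closer to 1 is better
```

When the items hang together as a set, the variance of the totals dwarfs the sum of the individual item variances and alpha approaches 1.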
III Practicality
The operational requirements of the project require it to be practical. Practicality
is defined as economy, convenience and interpretability. Economy is concerned with
minimizing the cost of conducting the research project. The method of
data collection, the length of the instrument, etc. will have implications for the research
budget. Convenience refers to ease in administering the questionnaire. This can be
achieved by giving clear and complete instructions and by paying proper attention to
design and layout. The interpretability issue arises when persons other than
the test designers must interpret the results. To enable interpretation, the designer of the
data collection instrument should provide enough information regarding the scoring keys,
norms, guidelines for test use, etc.
attitude. Ranking scales enable comparison among two or more indicants
or objects. Categorization enables the subjects involved to be put into groups or
categories.
3. Degree of preference: Measurement scales may involve preference measurement
or non-preference evaluation. In preference measurement, respondents
are asked to choose the object they prefer. In non-preference evaluation,
the respondents are asked to make judgments without any personal preference
towards the objects or solutions.
4. Data properties: The data properties should also be considered when deciding
on measurement scales. The data can be classified as nominal, ordinal,
interval and ratio. The statistical applications depend on the assumptions
underlying each data type.
5. Number of dimensions: Measurement scales can be unidimensional or
multidimensional. In a unidimensional scale only one attribute of the
respondent is measured. Multidimensional scaling recognizes objects as
consisting of n dimensions.
6. Scale construction: Five construction approaches are available, viz., arbitrary,
consensus, item analysis, cumulative and factoring. The researcher should take
into consideration both the type of measurement and the scale construction approach
when selecting an appropriate scale.
Some of the rating scales often used by researchers are explained below:
The category scale uses multiple items to elicit a single response. The multiple-choice,
single-response scale is appropriate when there are multiple options
but only one answer is sought.
E.g., Age:
- 21 to 40 years
- 41 to 50 years
- Above 50 years
The check list or multiple-choice, multiple-response scale allows the
respondent to select one or several alternatives. E.g., in eliciting the source
through which information about a new product was obtained, a respondent
may select one or more of the choices given below:
Source of information - Advertisement
- Sales person
- Sales materials
- Showrooms
- Friends/ relatives/ Neighbours
- Other sources
The Likert scale is designed to examine how strongly the respondents agree
or disagree with statements relating to the attitude or object on a 5-point scale.
The scores on the individual items are summed to produce a total score for the
respondent, and hence it is also called a summated scale. A Likert scale usually
contains two parts, the item part and the evaluative part. The item part usually
contains a statement about a product, event or attitude. The evaluative part is a
list of response categories ranging from strongly agree to strongly disagree.
The item and evaluative parts are shown below:
Each statement is rated on the evaluative scale:
Strongly Disagree (1) - Disagree (2) - Neutral (3) - Agree (4) - Strongly Agree (5)
Item statements:
- I am satisfied with the working environment
- I am happy with the work assigned
The responses over a number of items or statements tapping a particular concept
or variable are summated for every respondent. It is assumed that all the statements
measure some aspect of a single common factor. This is an interval scale and the
differences in the responses between any two points on the scale remain the same.
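Summated scoring is just addition over the items tapping the construct. A tiny illustration (the respondent names and ratings are hypothetical):

```python
# Summated (Likert) scoring: each respondent's item responses, coded
# 1 = strongly disagree ... 5 = strongly agree, are added to a total score.

responses = {
    "respondent_1": [4, 5],   # e.g. ratings on the two workplace items above
    "respondent_2": [2, 3],
}
totals = {name: sum(items) for name, items in responses.items()}
print(totals)   # {'respondent_1': 9, 'respondent_2': 5}
```

Because the scale is treated as interval, these totals can be compared and averaged across respondents.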
The semantic differential scale is widely used to describe the set of
beliefs a person holds. Several bipolar attributes are identified at the extremes
of the scale and respondents are asked to indicate their attitudes, in semantic
space, toward a particular individual, object or event on each of the attributes.
The semantic space may consist of five- or seven-point rating scales bounded at
each end by polar adjectives or phrases. There may be as many as 15 to 25
semantic differential scales for each attitude or object. The procedure is also
insightful for comparing the images of competing brands, stores or services.
The semantic differential may also be analyzed as a summated rating scale.
Each scale is assigned a value from -3 to +3 or 1 to 7, and the scores
across all adjective pairs are summed for each respondent. Individuals can be
compared on the basis of their total scores. An example of a semantic differential
scale is given below:
Responsive  ____ ____ ____ ____ ____ ____ ____  Unresponsive
Beautiful   ____ ____ ____ ____ ____ ____ ____  Ugly
Courageous  ____ ____ ____ ____ ____ ____ ____  Timid
The numerical scale is similar to the semantic differential scale, with the
difference that numbers on a 5-point or 7-point scale are provided, with bipolar
adjectives at both ends. This is also an interval scale. The scale provides both
an absolute measure of importance and a relative measure of the various items
rated. The scale's linearity, simplicity and production of ordinal or interval data
make it very popular.
The itemized rating scale is a 5-point or 7-point scale with anchors provided
for each item; the respondent states the appropriate number beside
each item or circles the relevant number against each item. The responses to
the items are then summated. This uses an interval scale. An example is shown
below:
Indicate your response number on the line for each item.
1 - Very Unlikely
2 - Unlikely
3 - Neither unlikely nor likely
4 - Likely
5 - Very Likely
The itemized rating scale provides the flexibility to use as many points in the scale as
considered necessary (4, 5, 7, 9, etc.). It is also possible to use different
anchors. When a neutral point is provided, it is a balanced rating scale; when the neutral
point is missing, it is an unbalanced rating scale. The itemized rating scale is frequently
used in business research, as the number of points desired, as well as the nomenclature
of the anchors, can be accommodated to suit the needs of the researcher.
In the fixed or constant sum scale the respondents are asked to distribute a given
number of points across various items. It enables the researcher to discover
proportions and is more in the nature of an ordinal scale. A minimum of two categories
and a maximum of ten can be presented to the respondents. Presenting too many
stimuli taxes both the precision and the patience of the respondents, as well as their
ability to add. For example, in selecting a particular brand of computer, a respondent
may be asked to distribute 100 points across the following aspects:
Hardware configuration   -----
Freebies given           -----
Brand image              -----
Total points               100
The graphic rating scale is simple and commonly used in practice. In this
scale various points are marked along a line to form a continuum. The respondent
indicates his rating by simply making a mark at the appropriate point on a line that runs
from one extreme to the other. A brief description of the scale points is given to act as
a guide in locating the rating. Faces scales, depicting faces ranging from smiling to
sad, can be used on a rating scale to obtain responses regarding people's feelings with
respect to some aspect. A major limitation of this scale is that the respondent may mark
almost any position on the line, which poses difficulty in analysis.
also tested for validity and reliability. For example, the Thurstone Equal-Appearing
Interval Scale is a consensus scale: a panel of judges selects the statements which describe
the concept under study, and the scale is developed based on their consensus. Developing
this scale involves time and as such it is rarely used in the organizational context.
(For example, respondents may be asked to rank newspapers such as Business Line
and Indian Express.) If the number of stimuli to be ranked is 5 or less, it is a
comparatively easy task. Respondents tend to become careless if the items exceed 10.
Comparative scale
It involves a standard against which comparison is done. The comparative scale
provides a point of reference against which the current object under study is compared,
enabling benchmarking. However, it can be used only when the respondents have
knowledge of the standard against which the comparison is made. Researchers
can treat the data produced by comparative scales as interval data, since the scores
reveal the interval between the standard and the actual; the data can also be treated as
ordinal data, since the rank or position of the items is dealt with.
to the theme of study may be selected. Each item is scored from 1 to 5 depending on
the responses obtained. The results are then totaled. Arbitrary scales are easy to develop,
inexpensive and highly specific to the theme of the study. However, the major limitation
is that the design approach is subjective: there is no assurance, other than the researcher's
insight, that the items chosen are representative of the universe of content.
2. Consensus Scaling
In consensus scaling the items are selected by a panel of judges after evaluation
on the basis of criteria such as relevance to the topic area, the risk of ambiguity and
the level of attitude represented by the items. This approach is widely known as the
Thurstone Equal-Appearing Interval Scale. The procedure followed in constructing
the scale is described below:
(i) A large number of statements relevant to the attitude or concept under study
are gathered.
(ii) A panel of judges evaluates the statements. Each statement is written on a
separate card. The judges sort each card into one of 11 piles representing the
degree of favourableness the statement expresses.
(iii) The sorting yields a composite position for each of the items. In case of disagreement
between the judges, the item is discarded.
(iv) For the items that are retained, a median scale value between one and eleven is assigned.
(v) A final selection of statements is made on the basis of the median score. Of
the 11 piles, 3 are identified by the judges as favourable, unfavourable and neutral.
The eight intermediate piles are unlabelled.
The Thurstone method is widely used for developing differential scales to measure
attitudes, and the scale is most reliable for measuring a single attitude. However, this method
of construction involves considerable cost, time and people, which often makes it impractical,
and the values assigned to the items by the judges are subjective.
3. Item Analysis scaling
In item analysis scaling, an item is evaluated on the basis of how well it
discriminates between those persons whose total score is high and those whose total score
is low. It involves calculating the mean score for each scale item among the low scorers
and the high scorers. The item means of the high-score group and the low-score
group are then tested for significance by calculating t values. Finally, the items that have
the greatest t values are selected for inclusion in the final scale.
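The discrimination test described above can be sketched as an independent-samples t value between the high-total and low-total groups (equal-variance form; the item scores below are hypothetical):

```python
# Item analysis: compare an item's mean between high-total and low-total
# scorers with a pooled-variance independent-samples t value.

import math

def t_value(high, low):
    n1, n2 = len(high), len(low)
    m1, m2 = sum(high) / n1, sum(low) / n2
    s1 = sum((x - m1) ** 2 for x in high) / (n1 - 1)   # sample variances
    s2 = sum((x - m2) ** 2 for x in low) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

item_high = [5, 4, 5, 4]   # item scores among the top 25% of total scorers
item_low = [2, 3, 2, 1]    # item scores among the bottom 25%
print(round(t_value(item_high, item_low), 2))   # larger t => better discriminator
```

Items whose t values are largest separate the favourable and unfavourable groups most sharply and are retained for the final scale.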
(ii) A trial test can be conducted with a small group of respondents who form
part of the final study. The agreement or disagreement towards each
statement is obtained on a five-point scale.
(iii) The responses are scored in such a way that the response indicating the most
favourable attitude is given the highest score of 5 and the most unfavourable
attitude the lowest score of 1.
(iv) The total score of each respondent is obtained by adding the scores for
the individual statements.
(v) The next step is to array the total scores and find out those statements
which have a high discriminatory power. For this purpose the researcher
may select some part of the highest and the lowest total scores, e.g., the top
25 percent and the bottom 25 percent. These two extreme groups are
interpreted to represent the most favourable and the least favourable attitudes
and are used as criterion groups by which to evaluate individual statements.
Thus the statements which consistently correlate with low favourability and
with high favourability are identified.
(vi) The statements which correlate with the total test are retained in the final
instrument and all others are discarded.
The advantages of the Likert scale are that it is relatively easy to construct,
considered more reliable and less time consuming. A major limitation is that
the scale only examines whether respondents are more or less favourable
towards the subject under study; it cannot reveal how much more or less
they are. There is also no basis for believing that the five positions indicated on the
scale are equally spaced.
4. Cumulative scales
Cumulative scales consist of a series of statements to which a respondent expresses
agreement or disagreement. The special feature of this scale is that it forms a cumulative
series: the statements are related to one another in such a way that an individual who
replies favourably to item 3 also replies favourably to items 2 and 1. An individual
whose attitude is at a certain point on a cumulative scale will answer favourably all the
items on one side of this point and unfavourably all the items on the other side of
this point. The individual's score is arrived at by counting the number of
statements answered favourably. If the total score is known, it is easy to
estimate the respondent's answers to the individual statements constituting the cumulative
scale. A major scale of this type is Guttman's scalogram.
Scalogram analysis refers to the procedure for determining whether a set of
items forms a unidimensional scale. A scale is unidimensional if the responses fall into a
pattern in which endorsement of the item reflecting the extreme position also results in
endorsing all items which are less extreme. Under this technique, the respondents are
asked to indicate, in respect of each item, whether they agree or disagree with it. If the
items form a unidimensional scale, the response pattern will take the following form
(X = agree, - = disagree):

Respondent score   Item 1   Item 2   Item 3
       3              X        X        X
       2              X        X        -
       1              X        -        -
       0              -        -        -

A score of 3 means that the respondent agrees with all the statements, which
expresses the most favourable attitude. A score of 2 reveals that the respondent
does not agree with the third statement but agrees with all the other statements. In this
way, the scores can be interpreted.
The procedure for developing a scalogram is described below:
(i)
(ii)
(iii) The next step is to pretest the items to determine the scalability. The pretest
should include a minimum of 12 items and should be administered on at least
[Scalogram worksheet: respondents' agreement (X) or disagreement (-) with items
9, 6, 2, 5 and 7, arrayed by total score from 4 down to 0. Rows whose response
pattern exactly matches the total score are marked "perfect"; deviating rows are
marked "nonscale", and the errors per case are tallied.]
The total scores for the various opinions are obtained. The order is then
shifted so that it results in a reduced number of items. The above example
shows that five items (9, 6, 2, 5 and 7) are selected for the final scale. Perfect
scales are those in which the respondent's answers fit the pattern that would
be reproduced by using the person's total score. Non-scale types are those
in which the category pattern differs from that expected from the
respondent's total score, i.e., non-scale items have deviations from
unidimensionality, or errors. The selection of an item for the final unidimensional
scale is made on the basis of the coefficient of reproducibility. Guttman has
set 0.9 as the minimum level of reproducibility required to accept a scale.
The following formula is used for measuring the reproducibility:
Coefficient of reproducibility = 1 - e / (n × N)
where
e = number of errors
n = number of items
N = number of cases
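The coefficient is straightforward to compute; the error count in this sketch is hypothetical:

```python
# Guttman's coefficient of reproducibility for a scalogram:
# 1 - e / (n * N), where e = errors, n = items, N = cases.

def reproducibility(errors, n_items, n_cases):
    return 1 - errors / (n_items * n_cases)

coeff = reproducibility(errors=3, n_items=5, n_cases=10)
acceptable = coeff >= 0.9     # Guttman's minimum level to accept the scale
print(coeff, acceptable)
```

With 3 errors across 5 items and 10 cases the coefficient is 0.94, so the scale would be accepted under Guttman's 0.9 criterion.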
5. Factor scales
Factor scales include a variety of techniques that have been developed to address
two issues, viz., the problem of dealing with a universe of content that is multidimensional,
and the problem of uncovering underlying dimensions that have not been identified by
exploratory research. Factor scales are developed through factor analysis, or on the
basis of intercorrelations of items which indicate a common factor responsible for the
relationships between items. The techniques are designed to intercorrelate items so that
the degree of interdependence may be detected. Important factor scales based on
factor analysis are the semantic differential scale and multidimensional scales. They are
discussed below:
(a) Semantic Differential Scale
The semantic differential scale (S.D.) was developed by Osgood and his
associates to measure the psychological meaning of an object to an individual. The
scale is built on the presumption that the object under study can have different dimensions
of connotative meaning, which can be located in multidimensional property space, or, in
the context of the S.D. scale, in semantic space. The scaling consists of a set of bipolar
rating scales, usually of 7 points, on which the respondents rate each concept. An
example of the scale being used by a panel of corporate leaders to rate candidates for
a leadership position is shown below. Three contributing factors, viz., evaluation (E),
potency (P) and activity (A), are considered.
(E) Sociable    ____ ____ ____ ____ ____ ____ ____
(P) Strong      ____ ____ ____ ____ ____ ____ ____
(A) Active      ____ ____ ____ ____ ____ ____ ____
(P) Tenacious   ____ ____ ____ ____ ____ ____ ____
(A) Fast        ____ ____ ____ ____ ____ ____ ____
The nature of the problem determines the selection of dimensions and bipolar
pairs. The SD scale is adapted to each research problem. The construction of SD scale
involves the following steps:
SUMMARY:
This unit examined the meaning and the various types of experimental design.
The various activities involved in conducting an experiment were discussed, and the
factors affecting internal and external validity were dealt with. The pre-experiments,
true/lab experiments and field experiments were covered. The various rating and
ranking scales were discussed, and the construction of arbitrary, cumulative, consensus,
item analysis and factor scales was examined.
Equipped with the knowledge of research design and measurement scales,
the next unit presents the various data collection methods, sampling techniques and the
parametric and non-parametric tests available to test hypotheses.
A retail grocery chain wants to study the effects of the various levels of advertising
effort and price reduction on the sale of specific branded grocery products.
What type of experimental design would you recommend? Suggest in detail the
design for the study.
What are the essential differences among nominal, ordinal, interval and ratio
scales? How do these differences affect the application of statistical techniques?
What are the four sources of measurement error? Illustrate by example how
each of these might affect the measurement results in a face-to-face interview.
How is the interval scale more sophisticated than the nominal and ordinal scales?
Why is ratio scale considered to be the most powerful of the four scales?
Describe the difference between the rating scales and ranking scales and indicate
the application areas where they can be used
Unit - 3
3.1 INTRODUCTION
Once the problem is defined and the research design is finalized, the various sources
of data, and the ways in which data can be collected for the purpose of analysis, testing of
hypotheses and answering research questions, should be explored. Data collection can
be made through primary, secondary and tertiary sources, which are dealt with in detail.
This unit also highlights the sources of data and the methods of collecting the same. Data
cannot always be collected from the entire population, due to various reasons like
difficulty in estimating the population, cost constraints, time, etc. A sampling technique
has to be adopted by the researcher for collection of data. This unit provides a detailed
account of the probability and non-probability sampling techniques. The issues
regarding determination of sample sizes are also presented.
Know the difference between primary and secondary data and their sources
Understand the various data collection methods and the advantages and
disadvantages of each method
Focus groups
A focus group involves a formalized process of bringing a small group of people
together for an interactive and spontaneous discussion on one particular topic.
A focus group generally consists of 6 to 12 participants, with a moderator leading the
unstructured discussion, which generally lasts between 90 minutes and two hours. By
facilitating the discussion, the moderator elicits as many ideas, attitudes, feelings and
experiences as possible regarding the issue concerned. Participants are generally chosen
on the basis of their expertise in the topic on which information is sought.
The goal of conducting a focus group is to give researchers access to as
much information as possible regarding the product, service, concept or organization.
The focus group is not restricted to merely asking and answering questions. Its success
relies to a great extent on the group dynamics, the willingness of the
participants to engage in interactive dialogue and the ability of the moderator to keep
the discussion on track. The fundamental idea behind the focus group is that one
participant's remark or response may initiate comments and discussion from the other
participants, thus generating spontaneous and free interplay among all the participants.
Focus groups are relatively inexpensive and can provide access to dependable data
within a short period of time.
Focus group objectives
The objectives of forming focus groups are listed below:
Focus groups provide data for defining and refining problems. In situations
where it is difficult for the researcher to pinpoint the specific problem, the
focus group helps differentiate between symptoms and root-cause problems.
In certain situations, researchers may not be sure about the specific types of
data or information that should be investigated. In these situations, focus groups
reveal unexpected components of the problem and thus can help researchers
determine the specific data to be collected.
There are situations when quantitative research investigations lead to results
which are not understandable or explainable. In such situations the focus group
provides data for a better understanding of the results derived from
quantitative studies.
The general interactions and discussions among the focus group members
help generate new ideas, products or services, or innovative ways of
solving problems hitherto unexplored.
The focus group plays a critical role in the process of developing new constructs
and creating reliable and valid measurement scales. In the exploratory stage,
the focus group reveals additional insights into the underlying dimensions that
may or may not make up the construct. This insight can help researchers
develop scales that can later be tested and refined through larger survey research
designs.
The issue of sampling needs to be addressed carefully while planning the
focus group. Random sampling may eliminate bias and produce dependable conclusions.
However, it may not be possible or necessary in certain situations; for example, in
qualitative research, a more flexible research design can be followed.
The moderator plays a central role in the successful conduct of the focus group
discussions. The moderator should be able to stimulate and control the focus group
discussions over the predetermined topics in a skillful manner, and should be able to
draw the best and most innovative ideas from the participants regarding the topic or
problem under discussion. The moderator is responsible for creating positive group
dynamics and a comfort zone between himself and each group member, as well as
among the group members.
The moderator should have enough background knowledge regarding the topic
of discussion. Apart from the skills discussed above, moderating the session requires
objectivity, self-discipline, concentration and careful listening. The moderator should
be completely prepared with the questioning route, yet should allow flexibility depending
on the situation.
The actual conduct of the focus group discussion can be arranged in three
phases, viz., opening the session, the main session and closing the session. They are dealt
with below:
Opening the session
The moderator should warmly receive the participants and make them feel
comfortable. The participants should be instructed to write their names on the name
cards. A few minutes should be allowed for socializing before seating the participants,
so that a warm, friendly and congenial environment is set. The socializing session
can be used by the moderator to observe the participants and place them in groups.
The moderator should discuss the ground rules for the session (only one person should
talk at a time, each one should be given a chance, and so on) and brief the group about
the purpose of the session. The moderator can begin the discussion with an open question
designed to engage all participants. This breaks the ice and helps build
positive group dynamics and a comfort zone.
The main session
The topic area is introduced in the main session, and as the discussion starts the
moderator steers its direction, using probing techniques to get as many details as possible.
As there are no hard and fast rules regarding how long a discussion can be carried on, the
moderator should use his judgment in deciding when to close one topic and move to the
next. Critical questions should be given more time so that ideas, feelings
and thoughts can be elicited to the maximum.
Closing the session
After covering all the topics for which the focus group was formed, the session
can be wound up. In this process the moderator can summarize the conclusions arrived
at in the discussion and also invite closing comments from the participants regarding
further contributions or disagreements over certain ideas. If nothing else arises, the
moderator can close the discussion after thanking the participants and distributing the
promised incentives.
The spontaneous and unrestricted interaction among the participants gives rise
to new ideas, thoughts and feelings which cannot be elicited in a one-to-one
interview. The respondents provide creative and honest opinions, and the
conducive environment enhances creativity.
The underlying reasons behind attitudes, feelings, emotions, behaviour, etc. can be
explored in the focus group discussion.
The researcher has first-hand information and the opportunity to be
involved in the overall process, right from starting the focus group till closing it.
This gives an in-depth insight into the various dimensions of the problem which
are hitherto unexplored.
A focus group interview can cover a number of topics; the discussion can be
directed successfully over a number of issues.
The data structures developed from a focus group interview cannot as such be
applied to the target population; the generalization of the research findings is
questionable.
The researcher has only limited ways to substantiate the reliability of the data. Added
to this, the data collected from the participants may not be structured and
amenable to further statistical inferences.
The data collected from the focus group can be interpreted subjectively by the
researcher according to the researcher's preconceived views. This bias
reduces the credibility and trustworthiness of the data and the information derived.
The cost per participant, in terms of identifying, recruiting and compensating, is
relatively high.
The members of the panel are randomly chosen. They may be exposed to an
advertisement, or their attitude towards a particular brand may be recorded. After a few
days or months the panel may be exposed to a different set of advertisements, or their
attitude may be measured again, to identify changes in the behavioural pattern.
Thus the continuing set of members forms the sample base, or the platform, for assessing
the effects of change. Such members are called panels, and the research that uses them
is called a panel study.
The panels can be static or dynamic. In a static panel the same members
form part of the panel over an extended period of time. In a dynamic panel the
members change from time to time as the study progresses through successive phases. The
static panel offers a good and sensitive measure of change. However, due to continuous
interviews the panel members are over-exposed to the issues at hand and may not
reflect the view of the population. Members may also not continue to be part of the panel
for a long period of time; there may be dropouts. The major drawback of the dynamic
panel is that it deals with different people, which may give rise to different opinions, so
the changes cannot be tracked in an objective manner.
3.3.2 Secondary data sources
Secondary data refers to the information gathered from already existing sources.
Secondary data may be either published or unpublished data. The published data are
available in the following forms:
Publications of central, state and local governments
Public records and statistics, historical documents and other sources of published
information.
Collection of secondary data involves less time and cost. However, a researcher should
not rely solely on secondary data, for the following reasons: the data may have become
obsolete and may not provide current, updated information; and the data would have been
collected for some other purpose and hence may not meet the specific requirements
of the researcher.
The researcher before using secondary data should ensure the following:
The reliability of data should be ensured by way of finding out the type of
people involved in data collection, the sources from which the data is collected,
the methods used to collect the data, the time of data collection and the level of
accuracy associated.
The secondary data used by the researcher would have been collected for a
problem different from the one the researcher is presently attempting to solve.
Hence the researcher should ensure that the data collected is suitable for the
purpose of the present study.
The secondary data should be adequate for the conduct of the study. It should
be related to the area and should neither be narrower nor wider than the problem
attempted by the researcher.
Availability of resources
3.4.1 Interviews
In this method the respondents are interviewed for the purpose of obtaining
information on the issues pertaining to the research. The interview may be either
unstructured or structured, and it can be a personal interview or conducted through
telephone, mail, the internet or a combination of these.
Unstructured interviews
In unstructured interviews, the interviewer does not conduct the interview
with a planned sequence of questions. The aim of this interview is to highlight the
preliminary issues so that the researcher can determine the variables which need further
in-depth investigation. The researcher resorts to unstructured interviews when the
problem is not clearly formulated or when a clear understanding of the variables involved
is not present. In the attempt to obtain information, the researcher may adopt different
styles and sequencing of questions for different respondents. Some respondents may provide
information in response to open-ended questions, whereas others may require more direction.
Some may be defensive and unwilling to share information; some
may even be reluctant to undergo the interview and may refuse to respond. The
researcher has to employ various questioning techniques so as to bring the respondents'
defenses down and make them more amenable to revealing information. The researcher
should also know when to retreat or terminate the interview if the respondents
cannot be convinced to participate or impart the information.
The unstructured interview will direct the researcher to understand the variables
which need greater focus based on which a structured interview can be planned.
Structured interviews
Structured interviews are conducted when the interviewer knows the types
of questions to be asked of the respondents or when the information needs are clearly
known. The questions may focus on the issues that were highlighted during the
unstructured interviews and are considered relevant to the identified problem. The
interview may be conducted by the researcher himself or by a team of interviewers.
The researcher or interviewer should be very clear about the purpose of each question,
particularly when a team of interviewers conduct the survey. The same questions are
posed in the same sequence or manner to all the respondents and the responses are
noted down. Depending on the situation and the respondent's willingness and knowledge,
the researcher can also ask other relevant questions which may not be in the list so as
to gain more insight into the identified problem. The researcher may also include visual
aids, drawings, pictures and other materials in conducting the interviews. Visual aids
are particularly useful in situations where ideas cannot be clearly articulated with words
alone.
1. Personal interviews
Personal interviews or face to face communication is a two-way conversation
initiated by the interviewer to obtain information from the participants. The interviewer
and the participants may be strangers. The interviewer controls the topic and pattern of
discussion. The participant or the respondents may not gain anything out of their
participation in the interview.
The success of the personal interview rests, among other things, on the respondent's
ability to provide the information needed and to understand the importance
of the information provided. The researcher should take the necessary steps to motivate
the respondents to cooperate so as to ensure the successful conduct of the interview.
Increasing participation
The researcher can enhance the respondents' participation by explaining
the kind of answer sought, the terms in which it should be expressed, the depth and clarity of
information needed and so on. Coaching can be provided to the participants, but care should
be taken to avoid introducing bias. The interviewer can make the session an interesting
and enjoyable experience by administering adequate motivation techniques.
Some of the techniques for successful interviewing of the participants are listed below:
The interviewer should introduce himself by name and the organization to which
he is affiliated. The interviewer can establish his identity with introductory
letters or other information that confirms the legitimacy of the work. Enough
details regarding the work to be done should be given, and more information
may be provided wherever it is demanded. The interviewer should be able to kindle
the interest of the respondent.
In the process of gathering data the interviewer should ensure that the objective
of each question is achieved and the needed response is obtained. The
interviewer can resort to probing, but steps should be taken to avoid the bias.
ii. Non-response error
Non-response error arises due to the selection of samples through the probability sampling method. The problem
can be tackled by way of attempting to contact the respondent again. Another approach
is to treat all the remaining non-participants as a new sub-population after a few callbacks.
A random sample is drawn from the non-participant group and an attempt is made to
contact and complete this sample at a hundred percent success rate. Findings from this
non-participant sample can then be weighted into the total population estimate. The
researcher can also try to substitute the missing participant, but care should be taken
that the substitute possesses the significant characteristics of the replaced participant;
for example, the respondent should belong to the same occupation, educational
status, income level and so on.
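The weighting step described above can be sketched as follows. This is an illustrative Python sketch, not a procedure from the text: the function name and all figures (a hypothetical sample of 1,000 with 700 respondents) are invented for the example.

```python
# Sketch (hypothetical figures): combining respondent data with a follow-up
# subsample of non-respondents, weighting each group's mean by its share
# of the original sample.

def weighted_estimate(respondent_values, followup_values,
                      n_respondents, n_nonrespondents):
    """Population mean estimate after a callback subsample of non-respondents.

    Each group's mean is weighted by the proportion of the original
    sample that the group represents.
    """
    n_total = n_respondents + n_nonrespondents
    mean_resp = sum(respondent_values) / len(respondent_values)
    mean_nonresp = sum(followup_values) / len(followup_values)
    return ((n_respondents / n_total) * mean_resp
            + (n_nonrespondents / n_total) * mean_nonresp)

# 700 of 1000 sampled members responded; a random subsample of the 300
# non-respondents was then contacted until fully complete.
estimate = weighted_estimate([4, 5, 3, 4], [2, 3], 700, 300)
```

Because the callback group's mean enters with weight 300/1000, the estimate is pulled toward the non-respondents' answers instead of ignoring them.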
iii. Response error
Response error occurs when the data reported differ from the actual data. The
error can be caused by the respondent or the interviewer or during the preparation of
data for analysis. Participant initiated error occurs when the participant fails to answer
accurately either by choice or due to lack of knowledge. Interviewer error arises due to
the inability to conduct the interview in a controlled manner. This may take many forms
like the failure to secure cooperation, lack of consistent interview procedures, inability
to establish appropriate interview environment, bias due to physical presence, failure to
record answers correctly. These errors affect the quality of the data collected.
iv. Cost
To conduct a personal interview, the respondents should be met individually.
They might be scattered geographically, and the time and cost involved in administrative
and travel tasks are higher. Sometimes the respondents may not be available and repeated
contacts have to be made, which adds to the cost. In addition, the researcher may
employ interviewers who have to be paid. To reduce the cost, telephonic interviews and
self-administered surveys can be attempted.
Advantages and drawbacks
The major advantage of personal interviewing is the ability to secure in-depth
information and detail. The ability to harness information is greater in personal interviewing
than in telephone, mail or internet surveys. The researcher can adapt
the questioning technique to the respondent's ability to understand, and further
clarification can be made immediately by repeating or rephrasing the questions concerned.
The researcher can also gather information from the nonverbal cues exhibited through the
body language of the respondent.
However, personal interviewing involves cost in terms of both money and
time. Costs may escalate where the study covers a wide geographic area or
has a large sample. The chance of the outcome being affected by the interviewer's
bias is greater in personal interviews, and the respondents may feel uneasy about
the secrecy of their responses in a face-to-face interaction.
2 Telephonic interviews
Interviewing through the telephone offers the following advantages:
Conducting the interview through telephone reduces cost. The reduction
arises from lower travelling and administrative expenses involved
in training and supervision. Fewer interviewers need to be trained
since the interview is conducted over the telephone, and the coverage per
interviewer is greater than in face-to-face interviews.
Telephonic interviews enable the researcher to screen and cover a large population
spread over a wide geographical area, making a much more representative
sample possible.
Unlike face to face interview where the respondent may avoid contact with the
researcher, the contact rate is higher in telephonic interviews as the respondent
has to pick up the ringing phone. However, the use of caller identification facility
may reduce the contact rate.
The following drawbacks arise out of telephonic interviews:
The length or duration for which a telephonic interview can be conducted is
limited. A ten-minute interview is considered ideal, though sometimes the
interview may extend to more than an hour.
3.4.2 Observation
Observation is the most commonly used data collection method in many
studies relating to the behavioral sciences. Observation enables data to be collected without
asking questions of the respondents. The respondents can be observed in their natural
work environment or in a lab setting, and their activities and behaviors of interest can be
recorded.
In conducting research, casual examination without purpose cannot be called
observation. Observation becomes a scientific tool for data collection if it is conducted
specifically to answer a research question. It should be systematically planned and
executed using proper controls and should provide a reliable and valid account of what
has happened.
Types of observation
Observation can be grouped under the following categories:
1. Type of activity under observation
Observation includes monitoring both behavioral and non-behavioral activities
and conditions. Behavioral observation includes nonverbal analysis, linguistic analysis,
extralinguistic analysis and spatial analysis.
There are four dimensions to extralinguistic behaviour viz., (i) vocal which
includes pitch, loudness and timbre (ii) temporal which includes the rate of
speaking, duration of utterance and rhythm (iii) interaction which includes the
tendencies to interrupt, dominate or inhibit and (iv) verbal stylistic including
vocabulary and pronunciation, peculiarities, dialect and characteristic
expressions.
NOTES
The records include historical or current records and public or private records;
they may be written or printed.
3. Concealment
This categorization is based on whether the participant is aware of the observer's
presence. The presence of the observer may cause the participant to behave differently,
which might defeat the very purpose of the observation. If the activity in which the
participants are involved is highly absorbing, there is a high chance that the participant
may remain unaffected by the presence of the observer. However, the potential bias
due to the presence of the observer cannot be totally ruled out.
In order to rule out the bias in behaviour, observers may conceal themselves
from the object being observed using mechanical means, for example a one-way mirror,
camera or microphone. However, this has to be carefully evaluated on ethical
grounds.
Partial concealment is where the presence of the observer is not concealed but
his objectives or interest are not revealed. For example, in order to evaluate the
performance of a salesperson, a sales manager may be present when the salesman is
dealing with a customer; however, the purpose of the sales manager's presence may
be concealed and he may pretend to be involved in some other task.
4. Participation
The presence of the observer and his involvement in the research setting is
called participant observation. He plays the role of observer as well as the participant.
The participants may or may not know about the same. The observer should be more
efficient as he has to play a dual role.
Non-participant observation occurs when the observer collects the data without
becoming an integral part of the research setting. The observer merely observes the
activities, records them and tabulates them in a systematic manner. This type of
observation requires the observer to be physically present in the research setting for an
extended period of time, which makes it a time-consuming task.
5. Definiteness of structure
The observation can be grouped as structured and unstructured observation.
Clear definition of various aspects of observation viz., the units to be observed, method
of recording, extent of accuracy needed, conditions of observation and selection of
pertinent data of observation etc are the characteristic of structured observation.
Structured observation is appropriate in case of descriptive studies.
If the observation is conducted without the above characteristics defined in
advance, it is termed as unstructured observation. This method of observation is usually
followed in exploratory studies.
6. Extent of control
The observation can be carried out in controlled or uncontrolled settings.
Uncontrolled observation is carried out in a natural setting. No attempt is made to use
precision instruments. The main aim of this method is to get a spontaneous picture
of reality. It provides naturalness and completeness to the observation. However, it may
lead to subjective interpretation and to overconfidence that the observer knows more
about the observed phenomena than he actually does. It is usually used in exploratory research.
Controlled observation takes place according to a definite predetermined plan.
It involves experimental procedure and involves the use of precision instruments to
record the observation. The observation is usually carried out in a standardized and
accurate manner leading to certain assured degree of generalization. It is usually carried
out in the form of experiments in laboratory or under controlled conditions.
Decision involved in conducting the observational study
Observational studies involve the decision regarding the type of the study, content
to be observed, training requirement of the observer/researcher and the data collection.
1. Type of the study
Observation in various forms is practiced in different types of studies. In
exploratory studies, data collection is done through simple observation, which may not
be carried out in a structured manner. In studies other than those of an exploratory
nature, systematic observation employing standardized scientific procedures is
followed.
2. Content specification
In observational studies the variables to be observed and other variables that
may affect them should be specified. From the specified variables, the variables that are
to be observed should be selected. The variables should be operationally defined so as
to avoid confusion in the minds of observers.
3. Training the observers
The validity and reliability of the findings from observation depend on the
observer. If the observer is not trained properly, the data collected may not lead to
valid results. The observer is prone to fatigue, halo effects and observer's drift, which
affect the dependability of the data collected. Hence, certain guidelines should be
followed in the selection of observers. The observer should have the ability to function
amidst a lot of distractions, remember details of the activity observed, blend with the
setting being observed and extract the most from the observational study. The observer
should be given clear instructions regarding the outcome sought and the precise content
to be observed.
4. Data collection
Data collection plans deal with answers to questions like who, what, when,
how and where: the qualification of a participant to be observed, the characteristics of
the observation, the time of observation, the method of recording data by the observers
and the place where the observation is to be conducted.
3.4.3. Questionnaires
Most of the research studies carried out for solving business problems require
the researcher to depend on primary data. The researcher should collect data through
questionnaires/ interview schedules and process the same so as to provide solution to
the identified problem. A questionnaire is a formalized framework consisting of a set of
questions and scales designed to generate primary raw data. It is a preformulated written
set of questions to which the respondents record their answers. The answers are mostly
chosen by a respondent from within the closely defined alternatives. The questionnaires
can be administered personally, mailed to the respondents or electronically distributed.
A. Personally administered questionnaire
If the study is confined to a local area, the questionnaires can be administered
and collected personally. The main advantage is that the researcher can collect
all the completed responses within a short period of time. The researcher has an
opportunity to introduce the research topic and motivate the respondents to offer frank
answers, and any doubts that the respondents have about the questions can be clarified
on the spot.
Administering the questionnaire to a large number of respondents at a time
saves time and expense and ensures quick collection of data as against
personal interviewing. Hence, wherever possible, group administration of the questionnaire
should be opted for, depending on the sample framework. The major drawback is
the reluctance of organizations to give time to conduct a survey among a group of
employees.
B. Mail questionnaire
Where the respondents are scattered over a wide geographical area, the
researcher has to resort to mail questionnaires. The questionnaires are mailed to the
respondents, who can complete them at their convenience, in their homes, at their own
pace. The main advantage is that the anonymity of the respondents is maintained, which
leads to a free and frank disclosure of information. Respondents spread over a
wide geographical area can be reached, and they can take time and fill in the questionnaire
at their convenience. It can also be administered electronically.
However, the return rates of mail questionnaires are typically low. Doubts
about the questionnaire cannot be cleared as easily as in a personally administered
questionnaire, and the representativeness of the sample is questionable due to the low
return rates. The respondents can be motivated by sending follow-up letters, enclosing
small monetary amounts as incentives, providing respondents with self-addressed,
stamped return envelopes and keeping the questionnaire as brief as possible.
The question should have a proper scope and should cover the issue. The
questions asked should reveal all that is needed to know. Questions are
considered to be ineffective if they do not provide the right information that is
needed.
The question should ask precisely what is needed. For example, if the researcher
needs to know the family income of the respondent but the question simply asks
about income, the respondent may interpret it as his own income rather than the
family income. Unambiguous words should be used so that clarity
is ensured.
The question asked by the researcher may be contributing towards the theme
and may be precise but it may not be possible for the respondent to answer the
same adequately. The respondent may require time to think and answer certain
questions. Sometimes the respondent may not be able to give an accurate
answer due to his inability to recall things from memory.
probing questions. If correctly administered, open-ended questions can provide the
researcher with a rich array of information.
to the choice of a laptop will only reveal the factors considered but not their order of
importance. A ranking question leads the respondent to rank the most important
factor as 1, the next most important as 2, and so on.
6. Positively and negatively worded questions
The questionnaire should include both positively and negatively worded
questions. If all the questions are positively worded, the respondent will tend to
mechanically circle all the points toward one end of the scale; a respondent who is
interested in completing the questionnaire quickly will tend to circle all the questions toward
one end. The researcher can keep a respondent more alert by including both positively
and negatively worded questions. Double negatives and excessive use of
words such as 'not' and 'only' should be avoided in negatively worded questions,
as they tend to confuse the respondents.
7. Double-barreled questions
A question that leads to different possible responses to its sub-parts is called a
double-barreled question. Such questions should be avoided by breaking them
into two or more parts. For example, the question 'Do you like the flavour
and the taste of the soft drink?' may lead to an ambiguous reply; it should be
broken into two questions addressing flavour and taste separately so as to obtain an
unambiguous response.
The types of questions dealt with below should be carefully avoided or used with
caution by the researcher.
8. Ambiguous question
A question may not be double-barreled but may still lead to ambiguity. For
example, if a researcher studying job satisfaction asks the respondent
to rate the level of satisfaction, the respondent may be confused as to whether the
question addresses satisfaction related to the work environment, salary, team spirit or
overall satisfaction. The question should not give rise to ambiguous responses and bias.
9. Memory related questions
If the questions require respondents to recall experiences from the distant past
that are very hazy in their memory, the answers to such questions might be biased.
10. Leading / Loaded questions
Questions should not be asked in such a way that the respondents are forced
or directed to respond in a manner they would not have under normal situations
where all possible alternatives are given. Questions should not prompt the respondents
to answer in the way the researcher wants them answered. For example, 'Don't you think
that salary is the main reason for software employees to quit the job?'. Questions
which emotionally charge the respondents are called loaded questions. Such
questions lead to bias in response and should be avoided.
The vocabulary should be simple, direct and familiar to all respondents. If the
words or jargon used, or the language itself, are not understood by the respondent,
wrong or biased answers may result. The wording and language should
be selected keeping in mind the educational level of the respondents, the terms
used in their culture and their frames of reference.
The words used should not give rise to ambiguity or vagueness. This problem
arises when the respondent is not given an adequate frame of reference, in
time and space, for interpreting the question. Words such as 'often' and 'usually'
lack an appropriate time referent, leading the respondents to choose their own,
which makes the answers incomparable. Similarly, an appropriate space or
location is often not specified. For example, does the question 'Mention your place of
origin' refer to the district, the state or the country?
The instructions provided for answering the questions should not confuse the
respondent. The questions should be directed more towards measuring the
respondent's knowledge of or interest in the subject.
Simple, short questions should be asked instead of long ones. The researcher should
see that each question or statement in the questionnaire is worded as briefly
as possible.
Questions should not be asked in such a manner that they elicit socially desirable
responses. For example, to the question 'Do you think that physically challenged people
should be given more weightage in employment opportunities?', a socially desirable
answer would be provided irrespective of the true feeling of the respondent.
Well-sequenced questions and response alternatives will make things easier for the
respondents to answer. These aspects are explained below:
In the introduction section, the researcher can disclose his identity and
communicate the purpose of the research. It is also used to motivate the
respondents to answer the questions by conveying the importance of the research
work and by specifying the importance of contribution from the respondent.
The researcher should also ensure the confidentiality of the information provided.
The introduction section should end with a courteous note, thanking the
respondent for the time devoted to respond to the survey.
Questions relating to the personal profile of the respondents viz., name, gender,
age, education, income, marital status etc., can appear in the beginning or at the
end of the questionnaire. The questions should provide a range of response
options rather than seeking an exact figure. The personal profile related questions
asked at the end may have a greater chance of response because the respondent
would have gone through other questions which would have convinced him
about the legitimacy and genuineness of the questions framed. This would make
them more amenable to reveal the personal information. Some researchers feel
that asking personal data in the beginning would enable the respondent to
psychologically identify themselves with the questionnaire and enhance the
commitment to respond.
Open-ended questions should be placed at the end so that the respondent finds
it easy to comment on the various aspects.
The most important purpose of pretesting is to know whether the meaning of
the questions is interpreted in the manner intended. A problem may arise because
the respondent is not familiar with a word, which results in distortion of the
meaning of the question; the respondent is likely to modify a difficult question
in a way that makes it easier for him to respond.
Flow of the questionnaire should be tested to know whether the transition from
one topic to another is natural, logical and ensures a coherent flow.
Task difficulty should also be identified through the pretest. The respondent may
be confused if a question requires him to make connections or put
together information in an unfamiliar way. For example, a question on annual
income involves calculation by the respondent; instead, the researcher can
ask for the monthly income and calculate the annual income on his own.
The ability to capture and maintain the interest of the respondent throughout the
entire questionnaire is a major challenge. The extent to which this is achieved
should be pretested.
Testing the items for an acceptable level of variation in the target population is
one of the common goals of pretesting. The researcher should look out for
items showing greater variability. A very skewed distribution in a pretest can
serve as a warning signal that the question is not tapping the intended construct.
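The variability check described above can be sketched in Python. This is an illustrative example, not from the text: the function name and the response data are hypothetical. It computes each item's variance and a simple population skewness coefficient so that low-variation or heavily skewed items can be flagged.

```python
# Sketch (hypothetical function name and data): diagnosing pretest items
# for too little variation or a heavily skewed response distribution.
import statistics

def item_diagnostics(responses):
    """Return (variance, skewness) for one item's pretest responses."""
    mean = statistics.fmean(responses)
    sd = statistics.pstdev(responses)
    if sd == 0:
        return 0.0, 0.0  # every respondent gave the same answer
    # Population skewness: mean cubed deviation divided by sd cubed.
    skew = sum((x - mean) ** 3 for x in responses) / (len(responses) * sd ** 3)
    return sd ** 2, skew

# Item A spreads across a 5-point scale; item B is piled up at one end.
var_a, skew_a = item_diagnostics([1, 2, 3, 4, 5, 3, 2, 4])
var_b, skew_b = item_diagnostics([5, 5, 5, 5, 4, 5, 5, 5])
```

Items like B, with low variance and a strongly skewed distribution, are candidates for rewording before the final survey.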
The flaws identified in the questionnaire should be corrected. Finally, the pretest
analysis should return to the first step in the design process: each question should be
reviewed again for its contribution to the objectives of the study, leading to
the other steps. The last step in the process may be another pretest, if major changes are
needed.
be handed over to the respondents, and the researcher may help them in recording their
answers to the various questions in the schedules. The researcher can explain the aims and
objectives of the investigation and clear the doubts and difficulties the
respondents feel in understanding the implications of a particular question.
The success of this method depends on the selection of enumerators for filling
up the schedules or assisting the respondents to fill them up. The enumerators should be
trained to perform their job well, and the nature and scope of the investigation should be
clearly explained to them. The purpose of each question and the type of response
expected should be explained to them. Enumerators should possess the patience to tackle
the respondents and should be intelligent enough to cross-examine and find the truth. They
should be sincere, hardworking and persevering.
Collection of data using interview schedules and enumerators leads to fairly
reliable results and extensive inquiry. However, it is expensive and takes time.
Difference between questionnaire and interview schedule
Questionnaire and interview schedule are both used for data collection and
they resemble each other. However, the important points of difference are highlighted
below:
i. The questionnaire can be sent through mail with a covering letter and the same
does not require further assistance. The schedule is filled out by the researcher,
who interprets the questions whenever needed.
ii. Collecting the questionnaire requires less expense as it is filled by the respondent
himself. In the case of schedules, enumerators should be appointed. This involves
additional expenses in terms of payments made to them and training provided.
iii. The rate of non-response is usually higher in case of mailed questionnaire. In
case of schedules the non-response rate is lesser as the enumerator himself fills
the schedules and is personally present. However, the danger of bias and cheating
prevails.
iv. The identity of the respondent is not clear in the case of the questionnaire, but
in case of the schedules the identity is known.
v. The questionnaire method of data collection takes time, as it requires several
reminders in spite of which the questionnaire may not be returned. In the case of
schedules, direct personal contact is established and responses are elicited soon.
vi. The questionnaire method can be used only in the case of educated or literate
respondents, but interview schedules can be administered even to illiterate
persons.
vii. A wider and more representative population is possible in the questionnaire method
of data collection, but this remains a difficulty in the case of schedules, particularly
when the respondents are distributed over a wide geographical area.
conclude with an acknowledgement to the respondent for the time and effort spent in
completing the questionnaire. These aspects are detailed below:
Welcome → Registration/Login → Introduction → Screening test → Questionnaire questions → Additional information → Thank you
i. Welcome:
The site or domain name that brings the respondents to the survey page should
be easy to remember and should reflect the purpose of the questionnaire. Several domain
names could be used to attract the respondents. The welcome page should be designed
so that it loads quickly. The page should provide information regarding
the organization on whose behalf the questionnaire is administered. It should motivate
the respondent to take part in the survey and emphasize the ease of responding. The
procedure to start should also be made evident. For a questionnaire with password
restriction, this fact should be mentioned clearly on the welcome screen so that the
respondent does not waste time over it. Too many animations and gimmicks
should be avoided, as they take more time to download and may distract the
respondent's attention.
ii. Registration/login
The registration or login screen is needed if access to the questionnaire is
restricted to specific people. Password access should be provided to the
appropriate respondents so as to enable them to participate in the survey. While
processing the PIN number and password, it is better to accept dashes and hyphens as
part of the string of numbers. In order to alleviate the respondents' frustration, as soon as
the data is entered in the required fields, all correct data should be accepted and only
the fields that have been erroneously omitted or are completely incorrect should be
highlighted for re-entry with a proper explanation regarding the required data. Sufficient
time should be provided to read and complete the registration forms before automatic
time out.
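The accept-and-flag behaviour described above can be sketched in code. This is a hypothetical illustration: the field names, error messages and the e-mail pattern are all assumptions, not part of any particular survey package.

```python
import re

def validate_registration(form: dict) -> dict:
    """Return {field: explanation} only for fields needing re-entry.

    All correctly entered data is accepted; only missing or invalid
    fields are flagged back to the respondent with an explanation.
    """
    errors = {}
    if not form.get("name", "").strip():
        errors["name"] = "Please enter your full name."
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", form.get("email", "")):
        errors["email"] = "Please enter a valid e-mail address."
    # Accept dashes/hyphens as part of the PIN string, as recommended above.
    pin = form.get("pin", "").replace("-", "")
    if not pin.isdigit():
        errors["pin"] = "PIN should contain only digits (dashes are allowed)."
    return errors

# A fully correct submission produces no re-entry requests:
print(validate_registration({"name": "A. Kumar", "email": "a@ex.com",
                             "pin": "12-34-56"}))  # {}
```

A real deployment would run this on the server and re-render only the flagged fields, leaving accepted values in place.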
iii. Introduction
This section should provide a brief description of the survey, its purpose and
the importance of the response received. It should also outline all the security and
privacy practices associated with the survey so as to reassure the respondents.
Alternatively, this information can also be included in the registration/login page.
iv. Screening test
If the screening test is very simple, it can be located within the introduction
page. If it is more extensive, it should be dealt with on a separate page linked
to the preceding and succeeding pages. If a respondent fails a screening test, the
chance to participate in the survey should still not be denied, as this would offend the
respondent. However, the contribution can be discarded in the study.
v. Questions
The questions should follow all the basic guidelines of the offline questionnaire.
In addition the following should also be considered in the designing of the electronic
questions:
1. The total number of questions should be kept to a minimum.
2. Initial questions should be routine, easy-to-answer questions so as to ease the
respondent's mindset. The first question should be engaging and should attract
the attention and interest of the respondent.
3. Difficult, sensitive and most important questions should appear after the
respondent has completed at least one-third of the questionnaire, at a point when the
respondent would have settled down.
4. In order to ensure consistency of responses, repeated questions that are
worded in a different manner often form part of the questionnaire. Such
questions should be placed apart.
5. Open-ended questions should appear before close-ended on the same topic
so as to prevent influencing respondents with the fixed option choices of the
close-ended questions.
6. Frames can make pages difficult to read and print, increase load time and cause
problems, so the use of frames should be minimized or avoided.
7. Forms and fields are commonly used for data entry. The field labels should be
placed close to the associated fields. The submit button should be located
adjacent to the last field. The tab order for keyboard navigation around the fields in the
questionnaire should be logical and reflect the visual appearance as far as
possible. Fields should be stacked in a vertical column, and any instruction
pertaining to a given field should appear before, not after, the field.
3. Navigation
To enable easy navigation within the website presenting the questionnaire, online
questionnaires usually include buttons, links, site maps and scrolling. All mechanisms
for navigation should be clearly defined and placed in such a manner that they
can be easily identified and accessed by the respondents. The navigational aids are
typically located at the top right-hand corner of web pages and should appear
consistently in the same place on each page of a questionnaire's website. The guidelines
regarding the navigational elements are listed below:
1. Buttons enable a respondent to exit a questionnaire or return to the previous/
next section of a questionnaire. They should be placed consistently in the same
place on all pages and should be designed in an easily identifiable manner.
Graphical presentation can be used to name the button.
2. Links are commonly used in web pages. They should be designed in a simple
manner, used sparingly and placed in a clearly identifiable manner.
Bold, coloured or underlined text can be used. A link that has been visited by the
respondent should be indicated by a change in colour. Text-based links
should be used rather than image-based links. Clear distinctions should be
made between links to locations within the same page and links to a different page.
3. Site maps provide an overview of the entire website at a single glance. They
help users navigate through the website, saving time and frustration.
The pathway is usually linear, and therefore the orientation should
not be overly complex. Site maps should be scalable and should be
consistently placed. They should be downloadable in minimum time and used only
when there are a large number of pages in a questionnaire.
4. Scroll bars should be avoided, as some respondents find scrolling hard to use
and content may also be overlooked by them. The welcome page should fit into a
single screen and not require scrolling. If scrolling cannot be avoided, the
respondents should be informed of the need to scroll. Scrolling can be avoided
by using jump buttons which take the respondents to the next screenful of
information or questions.
4. Formatting
Formatting of a questionnaire includes several aspects i.e. text, colour, graphics,
flash, frames and tables, feedback and other miscellaneous factors. Guidelines pertaining
to each of these aspects are discussed below:
A. Text
i. The font used should be readable and the text should be presented
in standard sentence format. Capital letters should be used only for
emphasizing titles, captions etc.
ii. Sentences should contain as few words as possible and should be presented
with a minimum of characters per line. Paragraphs should be of minimum
size.
iii. Technical instructions should be written in such a way that non-technical people can understand them.
iv. Questions should be easily distinguishable in terms of formatting
from instructions and answers.
v. The relative position of questions and answers should be consistent
throughout the questionnaire. Where different types of questions
are to be included in the same questionnaire, each question type
should have a unique visual appearance.
vi. A minimum font size of 12 pt should be used. The font colour should
contrast significantly with the background colour.
vii. The text should be left justified and use of italics should be avoided.
B. Colour
Colour has a great impact on the respondents and their responses, so
it is important to use colours wisely. Consistent colour coding
should be used throughout the questionnaire to reinforce meaning or information
in an unambiguous fashion. A neutral background colour, excluding patterns, should
be used to make text easy to read. When using two colours, colours of high
contrast can be used to ensure maximum discernibility. The following
combinations of colours should be avoided, since visual vibrations and after-images
can occur: red and green, yellow and blue, blue and red, blue and green. While
using colours, the standard cultural colour associations should be kept in mind.
C. Graphics
In order to minimize the download time, the graphics should be kept to a minimum.
F. Feedback
It is important to get feedback on the online questionnaire so as to
understand whether a respondent will abandon completion or persevere
with it. With each new section/page, respondents should be given real-time
feedback on their degree of progress through the questionnaire. This may take the
form of a "30% completed" progress bar. Respondents' answers to the
questions should be made immediately visible to them in a clear and concise
manner to reinforce the effect of their actions.
G. Miscellaneous
The following guidelines relating to formatting do not fall into any of the
previously discussed categories. The total website content should remain below
60 KB of text and graphics. A version of the questionnaire, as well as all referenced
articles or documentation, should be provided in an alternative format that can
be printed fully. All introductory pages in the survey website should include a
date-last-modified notification as well as a copyright notice, if applicable.
5. Response Formats
Electronic equivalents to the various paper-based response styles have to be
selected to best meet the needs of the questionnaire and the target audience. Some guidelines
in this respect are discussed below:
A. Matrix questions:
If a question involves many response options, matrix formats can be used to
condense and simplify it. They should be used sparingly, as they require a lot of
work to be done on a single screen. It is also hard to predict how such questions will
appear on the respondent's web browser, and the size and format of such questions
demand a significant amount of screen space, which cannot be guaranteed on smaller-scale
technology.
B. Drop-down boxes:
A drop-down box appears in a one-line text format; when expanded, it
presents a list of response options from which a respondent can select one or more.
Drop-down boxes are fast to download and can be used when very long lists of response
options are required. They should be used sparingly, as they require very accurate mouse
clicks, and should be avoided when it would be faster to simply type the response. It is
important that the first option in the drop-down list box is not visible by default, as this can
lead respondents to select it.
C. Radio Buttons:
Radio buttons are small circles placed next to the response options of a
close-ended question. By default, only one radio button within any given group of radio
buttons can be selected at a time, so they can be used for mutually exclusive options.
They closely resemble paper-based questionnaire answer formats. They demand a
relatively high degree of mouse precision, and users with limited computer exposure
may find it frustrating to click the options or to change them.
D.Check boxes
Check boxes are typically small squares that contain a tick mark when checked,
and they allow multiple rather than exclusive options. They also require a high degree
of mouse precision. The advantage of using check boxes and radio buttons within the
same questionnaire is that their appearance is visibly different, so respondents are
given visual clues as to how to answer any question using either of the two response
formats.
6. General technical guidelines
In addition to the above technicalities of online questionnaire design, the
following additional details should also be taken into consideration in the design of an
online questionnaire:
i. Privacy and Protection
It is important to ensure that the respondents' privacy and perception of privacy
are protected. The survey data should be encrypted and the anonymity of the respondents
should be assured.
ii. Computer literacy
The questionnaire should be designed keeping in mind the less knowledgeable,
low-end computer user. Specific instructions should be provided without offending
the inexperienced respondents. Prior knowledge or preconceptions in terms of
technological know-how should not be assumed. The need for double-clicks should be
eliminated, since they can be difficult for an inexperienced user.
iii. Automation
Many aspects of the online questionnaire can be automated, unlike the paper-based
questionnaire, for example skip questions. Skip questions are used to
determine, on the basis of an individual respondent's answer, which question the
respondent should jump (or skip) to when the question path is response-directed. When
automation is used, it should be carefully designed in order to avoid disorientation or
confusion for the respondents.
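The response-directed skip logic described above can be sketched as a small routing table. The question ids and routes here are purely illustrative assumptions, not any standard survey format.

```python
# Hypothetical questionnaire: the answer to each question determines the
# next question shown, so irrelevant branches are skipped automatically.
QUESTIONS = {
    "q1": "Do you own a car? (yes/no)",
    "q2": "Which brand is it?",
    "q3": "Do you plan to buy one this year? (yes/no)",
    "q4": "Thank you - survey complete.",
}
# Routing table: (question, answer) -> next question.
ROUTES = {("q1", "yes"): "q2", ("q1", "no"): "q3"}
# Default flow when no answer-specific route applies.
DEFAULT_NEXT = {"q2": "q4", "q3": "q4"}

def next_question(current: str, answer: str) -> str:
    return ROUTES.get((current, answer), DEFAULT_NEXT.get(current, "q4"))

print(next_question("q1", "no"))   # q3 - the car-brand question is skipped
print(next_question("q3", "yes"))  # q4
```

Keeping the routing in one table makes the path easy to review, which helps avoid the disorientation the text warns about.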
Advantages
There is practically no cost involved once the setup has been completed.
Large samples do not cost more than smaller ones.
The researcher can use audio-visuals in the collection of the data. Some Web survey
software can also show video and play sound.
Web page questionnaires can use colours, fonts and other formatting options
not possible in most email surveys.
Some Web survey software can combine the survey answers with pre-existing information on the individuals taking a survey.
It is possible to link the online questionnaire to a database, so the
information received can be updated immediately without the further need for manual
data entry, as in the case of a paper-based questionnaire.
Disadvantages
Coverage error is prevalent in online surveys, i.e., not all members of the
population have an equal chance of being included in the survey. This is
particularly so in countries where internet access is very low and
computer illiteracy is high.
Non-response error is higher because respondents may not opt to fill in the online
questionnaire. Also, respondents can easily quit in the middle of a questionnaire.
They are not as likely to complete a long questionnaire on the Web as they
would be if talking with a good interviewer.
A survey on a web page cannot exercise control over who replies: anyone
from anywhere who is surfing may answer. The demographic
pattern of the respondents cannot be restricted.
There is often no control over people responding multiple times, which can bias the
results.
In the present context, online surveys should mainly be used when the target
population consists entirely or almost entirely of Internet users. Business-to-business
research and employee attitude surveys can often meet this requirement; surveys of the
general population usually will not. Web page surveys are appropriate when the researcher
needs to use audio or video, or both sound and graphics. A Web page survey may be the only
practical way to have many people view and react to a video.
The researcher should make sure that the software used for conducting online surveys
prevents people from completing more than one questionnaire. The access can also be
restricted by requiring a password or by putting the survey on a page that can only be
accessed directly i.e., there are no links to it from other pages.
product with a request to the consumer to fill in the card and post the same to the
dealer.
2. Store audits
Store audits are performed by distributors as well as manufacturers, through
their salesmen, at regular intervals. The information is used to estimate market size,
market share, seasonal purchasing patterns etc. The data is obtained mostly by the
observational method. Store audits are invariably a panel operation, since the derivation of
sales estimates and the compilation of sales trends form the basis of the calculation. They
provide an efficient way to evaluate the effect of various in-store promotions on sales.
3. Pantry audits
This is used to estimate the consumption of a basket of goods at the consumer
level. The investigator collects an inventory of the types, quantities and prices of the
commodities consumed. In pantry audits the data is recorded from an examination of the
consumer's pantry. The objective of a pantry audit is to identify the types of consumers
who buy certain products and certain brands. The basic assumption is that the contents
of the pantry accurately portray the consumer's preferences. Pantry audits are usually
supplemented by direct questioning relating to reasons for preferring a product.
4. Consumer panels
A consumer panel consists of a group of consumers who are interviewed on a
regular basis over a period of time. Consumer panels may be transitory or continuing.
A transitory panel is set up to measure the effect of a particular phenomenon and
is conducted on a before-and-after basis: an interview is conducted before the
phenomenon takes place and another after it has occurred, so
as to measure the changes in the attitude and behaviour of the consumers. A continuing
consumer panel is set up for an indefinite period with a view to collecting data on a particular
aspect of consumer behaviour over a period of time.
5. Mechanical devices
The use of mechanical devices enables data to be recorded accurately. The eye camera,
pupilometric camera, psychogalvanometer, motion picture camera and audiometer
are some of the devices used for data collection. Eye cameras are designed to record
the focus of a respondent's eyes on a specific portion of a sketch, diagram or
product package. Pupilometric cameras record dilation of the pupil as a result of
visual stimuli; the extent of dilation shows the degree of interest aroused by the stimuli.
A psychogalvanometer is used to measure the extent of bodily excitement as a result of the
visual stimulus. Motion pictures are used to record the movement of the buyer while
deciding to buy a consumer good. Audiometers are used with television to find out the
types of programmes as well as the channels preferred by viewers. A device fitted in the
television itself records the changes, which can be used to ascertain the market share.
6. Projective techniques
Certain ideas and thoughts cannot be easily verbalized, as they remain at the
unconscious level in the minds of respondents. These can be brought to the surface by
trained professionals who apply different probing techniques to bring the deep-rooted
ideas and thoughts to the surface. Some techniques are explained below:
i. Word association test
The test is used to extract information regarding the words that have maximum
association. Respondents are asked to quickly associate a word, say "happy", with the
first thing that comes to mind. This is often used to get at the true attitudes and feelings of the
respondent. The same idea is used in marketing research to find out the quality that is
most associated with a brand of product. This technique is quick and easy to use, and
yields reliable results when applied to words that are widely known and possess
essentially one meaning.
ii. Sentence completion tests
It is an extension of the word association test. The respondent is provided
with several half-completed statements regarding a subject. Analysis of the replies from the
respondent reveals his attitude towards the subject. This technique permits the testing of
not only words but ideas too. It is quick and easy to use; however, it leads to analytical
problems, as the responses are multidimensional.
iii. Story completion test
This test goes a step further: the researcher may contrive stories instead of
sentences and ask the respondent to complete them. The respondent is given just enough
of a story to focus attention on a given subject and is asked to provide a conclusion to
the story.
iv. Verbal projection tests
The respondent is asked to comment on or explain what other people do.
For example: why do people own a particular product? Answers may reveal the
respondent's own motivations.
v. Pictorial techniques
Several pictorial techniques are available. They are discussed below:
Sampling frame
After defining the target population, the researcher must assemble a list of all
eligible sampling units, referred to as a sampling frame. A common source of a
sampling frame for a study about customers is the customer list from credit card
companies.
Sample
A sample is a subset or subgroup of the population. It comprises some members
selected from the population; only some, and not all, elements of the population form the
sample. If 200 members are drawn from a population of 500 workers, these 200
members form the sample for the study. From the study of these 200 members, the researcher
would draw conclusions about the entire population.
Subject
A subject is a single member of the sample, just as an element is a single member
of the population. If 200 members from the total population of 500 workers form the
sample for the study, then each worker in the sample is a subject.
Lower cost: The cost of conducting a study based on a sample is much lower
than the cost of conducting a census study.
required data from the target population elements. The method of data collection guides
the researcher in identifying and securing the necessary sampling frame for conducting
the research.
Identify the sampling frames needed
The researcher should identify and assemble a list of eligible sampling units.
The list should contain enough information about each prospective sampling unit so as
to enable the researcher to contact them. Drawing an incomplete frame decreases the
likelihood of drawing a representative sample.
Select the appropriate sampling method
The researcher can choose between probability and non-probability sampling
methods. Using a probability sampling method will always yield better and more accurate
information about the target population's parameters than the non-probability sampling
methods. Seven factors should be considered in deciding the appropriateness of the
sampling method, viz., research objectives, degree of desired accuracy, availability of
resources, time frame, advance knowledge of the target population, scope of the
research and perceived statistical analysis needs.
Determine necessary sample sizes and overall contact rates
The sample size is decided based on the precision required from the sample
estimates, time and money available to collect the required data. While determining the
sample size due consideration should be given to the variability of the population
characteristic under investigation, the level of confidence desired in the estimates and
the degree of the precision desired in estimating the population characteristic. The number
of prospective units to be contacted to ensure that the estimated sample size is obtained
and the additional cost involved should be considered. The researcher should calculate
the reachable rates, overall incidence rate and expected completion rates associated
with the sampling situation.
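As an illustration of the three considerations above (variability, confidence and precision), the widely used textbook formula n = (z·σ/e)² for estimating a mean can be sketched as follows. The numeric values are illustrative assumptions, not taken from this text.

```python
import math

def sample_size(sigma: float, z: float, e: float) -> int:
    """Sample size for estimating a mean: n = (z * sigma / e) ** 2,
    rounded up, where sigma is population variability, z reflects the
    desired confidence level and e is the desired precision."""
    return math.ceil((z * sigma / e) ** 2)

# 95% confidence (z = 1.96), population std. dev. 12, precision +/- 2:
n = sample_size(sigma=12, z=1.96, e=2)
print(n)  # 139

# Inflate for an expected 70% completion rate to get the number of
# prospective units that must be contacted:
contacts = math.ceil(n / 0.70)
print(contacts)  # 199
```

The second step mirrors the text's point that the number of prospective units contacted must exceed the estimated sample size once completion rates are considered.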
Creating an operating plan for selecting sampling units
The actual procedure to be used in contacting each of the prospective
respondents selected to form the sample should be clearly laid out. The instructions
should be clearly written so that interviewers know exactly what should be done and
the procedure to be followed in case of problems encountered in contacting the
prospective respondents.
Executing the operational plan
The sample respondents are met and actual data collection activities are executed
in this stage. Consistency and control should be maintained at this stage.
The ultimate test of a good sample is how well it represents the
characteristics of the population. In terms of measurement, the sample
should be valid. The validity of the sample depends on two considerations, viz., accuracy
and precision.
Accuracy
Accuracy is determined by the extent to which bias is eliminated from the
sample. When the sample elements are drawn properly, some sample elements
underestimate the population values being studied and others overestimate them;
variations in these values offset each other. This counteraction results in a sample value
that is generally close to the population value. An accurate, i.e., unbiased, sample is one
in which the underestimators and the overestimators are balanced among the members
of the sample. There is no systematic variance with an accurate sample. Systematic
variance has been defined as the variation in measures due to some unknown influences
that cause the scores to lean in one direction more than another. Even a large sample
size cannot counteract systematic bias.
Precision
A second criterion of a good sample design is precision of estimate. No sample
will fully represent its population in all respects. The numerical descriptors that describe
samples may be expected to differ from those that describe the population because of
random fluctuations inherent in the sampling process. This is called sampling error.
Sampling error is what is left after all known sources of systematic variance have been
accounted for. In theory, sampling error consists of random fluctuations only, although
some unknown systematic variance may be included when too many or too few sample
elements possess a particular characteristic. Precision is measured by standard error of
estimate, a type of standard deviation measurement; the smaller the standard error of
estimate, the higher is the precision of the sample. The ideal sample design produces a
small standard error of estimate.
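A minimal sketch of computing the standard error of estimate for a sample mean (the data values below are illustrative):

```python
import math
import statistics

def standard_error(sample: list) -> float:
    """Standard error of the sample mean: s / sqrt(n).
    The smaller this value, the higher the precision of the sample."""
    s = statistics.stdev(sample)        # sample standard deviation
    return s / math.sqrt(len(sample))

data = [52, 48, 50, 51, 49, 50, 53, 47]
print(round(standard_error(data), 3))   # 0.707
```

Doubling the sample size divides the standard error by roughly √2, which is why precision improves with larger (unbiased) samples.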
Sampling designs can be classified by representation basis (probability versus
non-probability) and element selection (unrestricted versus restricted):
Unrestricted, probability: Simple random
Restricted, probability: Complex random (Systematic, Stratified, Cluster, Double)
Unrestricted, non-probability: Convenience
Restricted, non-probability: Purposive (Judgement, Quota), Snowball
Random numbers can also be generated through computer programs. Using the random
numbers, the sample can be selected.
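A minimal sketch of a computer-generated simple random draw; the frame of 500 workers echoes the earlier example, and the unit names are hypothetical:

```python
import random

# Sampling frame: a list of all 500 eligible units (names hypothetical).
frame = [f"worker_{i:03d}" for i in range(1, 501)]

random.seed(42)                       # fixed seed for a reproducible draw
sample = random.sample(frame, 200)    # 200 members, drawn without repeats

print(len(sample))        # 200
print(len(set(sample)))   # 200 - every unit appears at most once
```

Each unit in the frame has the same chance of selection, which is the defining property of simple random sampling.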
It is important that the natural order of the defined target population list be
unrelated to the characteristic being studied.
The skip interval should not correspond to a systematic change in the target
population.
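The skip-interval idea can be sketched as follows, assuming a frame of 500 units and a desired sample of 50, so that k = 500/50 = 10 (the values are illustrative):

```python
import random

def systematic_sample(frame, n, start=None):
    """Systematic sampling: skip interval k = N / n, a random start
    within the first k units, then every k-th unit thereafter."""
    k = len(frame) // n                    # skip interval
    if start is None:
        start = random.randrange(k)        # random start in [0, k)
    return frame[start::k][:n]

frame = list(range(1, 501))                # population of 500 units
sample = systematic_sample(frame, n=50, start=3)
print(len(sample), sample[:4])   # 50 [4, 14, 24, 34]
```

If the frame had a periodic pattern with period 10, this draw would be biased, which is exactly the warning in the two points above.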
have to be selected to estimate the true population parameter accurately for that
subgroup. This method is also opted for in situations where it is easier, simpler and less
expensive to collect data from one or more strata than from others.
Advantages and disadvantages
Stratified random sampling provides several advantages, viz., the assurance of
representativeness in the sample, the opportunity to study each stratum and make relative
comparisons between strata, and the ability to make estimates for the target population
with the expectation of greater precision or less error.
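A sketch of proportionate stratified random sampling, where each stratum contributes to the sample in proportion to its share of the population; the strata names and sizes are illustrative assumptions:

```python
import random

def stratified_sample(strata: dict, n: int) -> dict:
    """Proportionate stratified random sampling: each stratum's quota
    is its share of the population times the total sample size."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        quota = round(n * len(units) / total)   # proportional allocation
        sample[name] = random.sample(units, quota)
    return sample

random.seed(7)
strata = {"managers": list(range(50)),
          "clerks": list(range(200)),
          "workers": list(range(250))}
picked = stratified_sample(strata, n=100)
print({k: len(v) for k, v in picked.items()})
# {'managers': 10, 'clerks': 40, 'workers': 50}
```

A disproportionate design would simply replace the proportional quota with larger allocations for strata that need more precise estimates.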
iii. Cluster sampling
Cluster sampling is a probability sampling method in which the sampling units
are divided into mutually exclusive and collectively exhaustive subpopulations called
clusters. Each cluster is assumed to be representative of the heterogeneity of the
target population. Groups of elements that have heterogeneity among the members
within each group are chosen for study in cluster sampling. Several groups with intragroup
heterogeneity and intergroup homogeneity are identified; a random sample of the clusters
or groups is taken, and information is gathered from each of the members in the randomly
chosen clusters. Cluster sampling thus offers more heterogeneity within groups and more
homogeneity among the groups.
Single stage and Multistage cluster sampling
In single-stage cluster sampling, the population is divided into convenient clusters
and the required number of clusters is randomly chosen as sample subjects; each element
in each of the randomly chosen clusters is investigated in the study. Cluster sampling can
also be done in several stages, which is known as multistage cluster sampling. For example,
to study the banking behaviour of customers in a national survey, cluster sampling can
be used to select the urban, semi-urban and rural geographical locations of the study.
At the next stage, particular areas in each of the locations would be chosen. At the third
stage, the banks within each area would be chosen. Thus multistage sampling involves
a probability sampling of the primary sampling units; from each of the primary units, a
probability sample of the secondary sampling units is drawn; a third level of probability
sampling is done from each of these secondary units, and so on, until the final stage of
breakdown for the sample units is arrived at, where every member of the unit will be
a sample.
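The two-stage selection described above can be sketched as follows; the areas and banks are simulated placeholders, not real data:

```python
import random

# Simulated frame: 10 geographic clusters ("areas"), 20 banks in each.
random.seed(1)
clusters = {f"area_{i}": [f"a{i}_bank_{j}" for j in range(20)]
            for i in range(10)}

# Stage 1: randomly choose 3 clusters.
chosen_areas = random.sample(list(clusters), 3)

# Stage 2: randomly choose 5 elements within each chosen cluster.
# (Single-stage cluster sampling would instead take EVERY element
# of each chosen cluster.)
sample = [bank
          for area in chosen_areas
          for bank in random.sample(clusters[area], 5)]

print(len(sample))  # 15 banks drawn from 3 randomly chosen areas
```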
Area sampling
Area sampling is a form of cluster sampling in which the clusters are formed by
geographic designations, for example state, district, city or town; any geographic unit
with identifiable boundaries can be used. Area sampling is less expensive than most
other probability designs and does not depend on a population frame. A city map showing
the blocks of the city would be adequate information to allow a researcher to take a
sample of the blocks and obtain data from the residents therein.
In stratified sampling, the population is divided into a few subgroups, each with
many elements in it, and the subgroups are selected according to some criterion
that is related to the variables under study. In cluster sampling, the population
is divided into many subgroups, each with a few elements in it, and the subgroups
are selected according to some criterion of ease or availability in data collection.
The elements are chosen randomly within each subgroup in stratified sampling.
In cluster sampling, the subgroups are randomly chosen and each and every
element of the subgroup is studied in depth.
and later a sub-sample of this primary sample is used to examine the matter in more
detail. The process includes collecting data from a sample using a previously defined
technique. Based on this information, a sub-sample is selected for further study. It is
more convenient and economical to collect some information by sampling and then use
this information as the basis for selecting a sub-sample for further study.
B. Purposive sampling
A non-probability sample that conforms to certain criteria is called a purposive
sample. There are two major types of purposive sampling, viz., judgment sampling
and quota sampling.
i. Judgment sampling
Judgment sampling is a non-probability sampling method in which participants
are selected according to an experienced individual's belief that they will meet the
requirements of the study. The researcher selects sample members who conform to
some criterion. It is appropriate in the early stages of an exploratory study and involves
the choice of subjects who are most advantageously placed, or in the best position, to
provide the information required. It is used when a limited number or category of
people have the information that is being sought. The underlying assumption is
the researcher's belief that the opinions of a group of perceived experts on the topic of
interest are representative of the entire target population.
Advantages and disadvantages
If the judgment of the researcher or expert is correct then the sample generated
from the judgment sampling will be much better than one generated by convenience
sampling. However, as in the case of all non-probability sampling methods, the
representativeness of the sample cannot be measured. The raw data and information
collected through judgment sampling provides only a preliminary insight.
ii. Quota sampling
The quota sampling method involves the selection of prospective participants
according to prespecified quotas regarding either demographic characteristics
(gender, age, education, income, occupation etc.), specific attitudes (satisfied, neutral,
dissatisfied) or specific behaviours (regular, occasional or rare user of a product). The
purpose of quota sampling is to provide an assurance that prespecified subgroups of
the defined target population are represented on pertinent sampling factors that are
determined by the researcher. It ensures that certain groups are adequately represented
in the study through the assignment of quotas.
Advantages and disadvantages
The greatest advantage of quota sampling is that the sample generated contains
specific subgroups in the proportion desired by researchers. In those research projects
that require interviews the use of quotas ensures that the appropriate subgroups are
identified and included in the survey. The quota sampling method may eliminate or
reduce selection bias.
An inherent limitation of quota sampling is that the success of the study will be
dependent on subjective decisions made by the researchers. As a non-probability method,
it is incapable of measuring true representativeness of the sample or accuracy of the
estimate obtained. Therefore, attempts to generalize the data results beyond those
respondents who were sampled and interviewed become very questionable and may
misrepresent the given target population.
iii. Snowball Sampling
Snowball sampling is a non-probability sampling method in which a set of
respondents are chosen who help the researcher to identify additional respondents to
be included in the study. This method of sampling is also called referral sampling
because one respondent refers other potential respondents. Snowball sampling is
typically used in research situations where the defined target population is very small
and unique and compiling a complete list of sampling units is a nearly impossible task.
While the traditional probability and other non-probability sampling methods would
normally require an extreme search effort to qualify a sufficient number of prospective
respondents, the snowball method would yield better result at a much lower cost. The
researcher has to identify and interview one qualified respondent and then solicit his
help to identify other respondents with similar characteristics.
Advantages and disadvantages
Snowball sampling enables the researcher to identify and select prospective respondents
from a small, hard-to-reach and uniquely defined target population. It is most useful in
qualitative research practices. Reduced sample sizes and costs are the primary advantages
of this sampling method. The major drawback is that the chance of bias is higher. If
there is a significant difference between the people who are identified through snowball
sampling and those who are not, it may give rise to problems. The results cannot
be generalized to members of the larger defined target population.
3.7 DETERMINATION OF APPROPRIATE SAMPLING DESIGN
Determining an appropriate sampling design is a challenging issue and has important
implications for the application of the research findings. Apart from considering the
theoretical components, sampling issues, and the advantages and drawbacks of the different
sampling techniques, the decision should take into consideration the following factors:
1. Research objectives
A clear understanding of the statement of the problem and the objectives will
provide the initial guidelines for determining the appropriate sampling design. If the
research objectives include the need to generalize the findings of the research study,
then a probability sampling design should be preferred.
Since the sample data are used for drawing inferences regarding the population,
the inferences should be as accurate as possible, and it should also be possible
to estimate the error. An interval estimation should be made to ensure a relatively
accurate estimation of the population parameter. For this purpose, a statistic that has
the same distribution as the sampling distribution of the mean, usually a Z or t statistic, is used.
For example, suppose the problem at hand is to estimate the mean value of purchases
made by a customer from department stores. A sample of 64 customers is identified
through the systematic sampling method, and it is found that the sample mean X = 105 and
the sample standard deviation S = 10. X, the sample mean, is a point estimate of μ, the
population mean. A confidence interval could be constructed around X to estimate the
range within which μ would fall. The standard error S_x and the percentage or level of
confidence required determine the width of the interval, which is given by the formula

μ = X ± K·S_x

where

S_x = S / √n = 10 / √64 = 1.25
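The interval for this department-store example can be computed in a few lines. This sketch uses the Z value of 1.96 for a 95 percent confidence level; the figures are those of the example above.

```python
import math

# Data from the department-store example.
n, sample_mean, s = 64, 105.0, 10.0

standard_error = s / math.sqrt(n)          # S / sqrt(n) = 10 / 8 = 1.25
z = 1.96                                   # Z value for 95% confidence

lower = sample_mean - z * standard_error   # 105 - 1.96 * 1.25 = 102.55
upper = sample_mean + z * standard_error   # 105 + 1.96 * 1.25 = 107.45
```

The population mean is thus estimated to lie between 102.55 and 107.45 with 95 percent confidence.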
The width of the interval has increased and as such the precision in the estimation is
comparatively less though the confidence level in the estimation has increased. A larger
sample size is required if the precision and the confidence level have to be increased.
The sample size, n, is a function of:
1. The variability in the population
2. The precision or accuracy needed
3. The confidence level desired
4. The type of sampling plan used.
If the sample size cannot be increased, the only way to maintain the same level of
precision would be to lower the confidence level in the estimation; that is, the confidence
level or certainty of the estimate will be reduced. Researchers must consider
four aspects while making decisions regarding the sample size:
1. The precision level needed in estimating the population characteristics, i.e., the
allowable margin of error.
2. The level of confidence required, i.e., the percentage chance of error the researcher is
willing to take in the estimation of the population parameters.
3. The extent of variability in the population on the characteristics investigated.
4. The cost-benefit analysis of increasing the sample size.
3.8.2 Sample data and hypothesis testing
In addition to estimating the population parameters, the sample data can also
be used to test hypotheses about population values. For example, if we want to determine
whether a customer spent the same average amount in purchases at Department Store A
as at Department Store B, a null hypothesis can be formed. The null hypothesis proposes
that there is no significant difference in the amount spent by customers at the two different
stores. This would be expressed as:
H0 : μ_A − μ_B = 0
The alternate hypothesis can be stated as follows:
H1 : μ_A − μ_B ≠ 0
If a sample of 20 customers is taken from each of the two stores and it is found that the
mean value of purchases of customers in Store A is 105 with a standard deviation of 10,
and the corresponding figures for Store B are 100 and 15 respectively, then

X_A − X_B = 105 − 100 = 5
The null hypothesis states that there is no significant difference. The probability
of the two group means having a difference of 5 under the null hypothesis should
be determined. This can be done by converting the difference in the sample means to a
t statistic and identifying the probability of obtaining a t of that value; the t distribution has
known probabilities attached to it. The critical value of the t distribution for two samples of
20 each, with (n1 + n2) − 2 = 38 degrees of freedom, is 2.021. A two-tailed test is
used because the difference between Store A and Store B could be either positive or negative.
The t statistic for testing the hypothesis is calculated as follows:
t = [(X_A − X_B) − (μ_A − μ_B)] / S_(X_A − X_B)

where

S_(X_A − X_B) = √{[(n1·S1² + n2·S2²) / (n1 + n2 − 2)] × (1/n1 + 1/n2)}

= √{[(20 × 10² + 20 × 15²) / (20 + 20 − 2)] × (1/20 + 1/20)} = 4.136

With μ_A − μ_B = 0 (null hypothesis),

t = 5 / 4.136 = 1.209
The t value of 1.209 is much below the critical value of 2.021 at the 95 percent
confidence level; even the 90 percent confidence level requires a value of 1.684. Thus
the difference of 5 found between the two stores is not significant. The conclusion is
that there is no significant difference between the spending patterns of the customers
in Store A and in Store B. Thus the null hypothesis is accepted and the alternate
hypothesis is rejected.
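The t statistic above can be reproduced with a few lines, following the pooling formula used in the text (which weights each group's variance by n rather than n - 1):

```python
import math

# Store A: n1 = 20, mean 105, s.d. 10; Store B: n2 = 20, mean 100, s.d. 15.
n1, m1, s1 = 20, 105.0, 10.0
n2, m2, s2 = 20, 100.0, 15.0

# Standard error of the difference, per the text's pooling formula.
se_diff = math.sqrt(((n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2))
                    * (1 / n1 + 1 / n2))    # = 4.136

t = ((m1 - m2) - 0) / se_diff               # null hypothesis: mu_A - mu_B = 0
# t = 1.209, below the critical value of 2.021, so H0 is not rejected.
```

A ready-made equivalent exists in `scipy.stats` (with the usual n - 1 pooling), but the hand computation makes the textbook formula explicit.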
be taken to ensure that neither too small a sample is selected, which would enhance the
risk of sampling error, nor too many units are selected, which would increase the cost of
the study. It is necessary to make a trade-off between (i) increasing the sample size,
which would reduce the sampling error but increase the cost, and (ii) decreasing the
sample size, which might increase the sampling error while decreasing the cost.
Several factors should be considered before deciding the sample size. The first
and foremost is the size of the error that would be tolerable for the purposes of decision
making. The second is the degree of confidence required in the results of the study. If
100 percent confidence in the result is needed, the entire population must be studied;
however, this is impractical and costly. Normally, the confidence level is accepted at 99%,
95% or 90%. The confidence and precision aspects are discussed in detail under the
heading precision and confidence in sample size estimation dealt with earlier.
For determining the sample size, the following relationship is used: the standard error
of the mean, σ_x = σ/√n, can be calculated if we know the upper and lower confidence
limits. If these limits are assumed to be ±Y, then

Z·σ_x = Y

where Z is the value of the normal variate for a given confidence level.

The procedure for determining the sample size can be illustrated through an example.
A management consultancy is performing a survey to determine the annual salary of the
3000 managers of the textile concerns within a district. The sample size it should take
has to be ascertained in order to estimate the mean annual earnings within plus and
minus Rs.1000 at the 95 percent confidence level. The standard deviation of the annual
earnings of the entire population is known to be Rs.3000. The desired upper and lower
limit is Rs.1000, i.e., the estimate of annual earnings should be within plus and minus
Rs.1000. Then

Z·σ_x = 1000

σ_x = 1000 / 1.96 = 510.20

Since σ_x = σ/√n,

√n = σ / σ_x = 3000 / 510.20 = 5.88

n = 5.88² = 34.57
Therefore the desired sample size is approximately 35.
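The computation condenses into the standard formula n = (Zσ/E)². This sketch mirrors the salary example above.

```python
import math

sigma = 3000.0   # population standard deviation of annual earnings (Rs.)
error = 1000.0   # tolerable margin of error (Rs.)
z = 1.96         # Z value for the 95% confidence level

n = (z * sigma / error) ** 2   # (1.96 * 3000 / 1000)^2 = 34.57
sample_size = math.ceil(n)     # round up to 35
```

Rounding up rather than down guarantees that the desired precision is at least met.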
SUMMARY
This chapter dealt in detail with the various sources of data and the data collection
methods. The primary data sources, viz., the focus group and panels, were discussed in
detail. The data collection methods, viz., the interview, questionnaire, observation and
other methods, were examined. Sampling design is an important element of research as
it decides the validity and the reliability of the research findings. The various probability
and non-probability techniques were discussed in detail. The methods of determining
the sample size and the precision and confidence desired in estimating the population
parameters were explained.
With this background, the next unit provides a detailed discussion on the various
multivariate techniques used to analyze the data collected.
Discuss the different data sources, explaining their usefulness and disadvantages.
Discuss the types of error and the steps to avoid the same.
Discuss the issues concerned with precision and confidence in sampling design.
Unit-4
4.1 INTRODUCTION
Business problems today are more complex. The various functional areas of
management are confronted by multiple independent and/or dependent variables. This
requires the application of multivariate techniques to gain an insight into the problems or
to make decisions regarding the choices involved. The availability of computers with
fast processing speed and versatile software has enhanced the application of these
techniques, which involve complex mathematical calculations. Multivariate analysis can
be defined as those statistical techniques which focus upon, and bring out in bold
relief, the structure of simultaneous relationships among three or more phenomena.
Thus multivariate analysis refers to a group of statistical techniques used when there are
two or more measurements on each element and the variables are analyzed
simultaneously. It is concerned with the simultaneous relationship among two or more
phenomena. Multivariate techniques are largely empirical and deal with reality. The
basic objective underlying the use of multivariate techniques is to represent a collection
of massive data in a simplified manner. In other words, multivariate techniques transform
a mass of observations into smaller number of composite scores in such a way that they
reflect as much information as possible contained in the raw data obtained in a research
study.
This unit explains some of the multivariate techniques and the application of
statistical package to solve the same.
Know the use of cluster analysis techniques for grouping similar objects or
people
Figure: Classification of multivariate methods. The first criterion is the number of
dependent variables. Dependence methods (one or more dependent variables) are chosen
by the dependent variable's level of measurement: a nonmetric (nominal) dependent
variable calls for discriminant analysis or conjoint analysis; an ordinal dependent
variable for Spearman's rank correlation; a metric (interval or ratio) dependent variable
for multiple regression, ANOVA, MANOVA or conjoint analysis. Interdependence methods
(no dependent variable) include factor analysis, cluster analysis and perceptual mapping.
The data matrix contains the scores of N persons (objects) on k variables:

Persons (objects)    a     b     c    ...    k
1                    a1    b1    c1   ...    k1
2                    a2    b2    c2   ...    k2
3                    a3    b3    c3   ...    k3
...
N                    aN    bN    cN   ...    kN
The data matrix is standardized so that the mean of the scores in any column
is zero and the variance of the scores in any column is 1. A factor is any linear combination
of the variables in a data matrix and can be stated in a general manner as:

A = Wa·a + Wb·b + ... + Wk·k

The factor values are obtained and the factor loadings, i.e., the factor-variable
correlations, are calculated. Then the communality, symbolized as h², the eigenvalue
and the total sum of squares are obtained and the results are interpreted.
The technique of rotation is applied in order to obtain realistic results; the rotation
reveals different structures in the data. Finally, the factor scores are obtained, which
enable the researcher to explain the factors. After obtaining the factor scores, several
other multivariate analyses like cluster analysis, multiple regression and discriminant
analysis can be performed.
4.4.1 Statistics and terms associated with factor analysis
The statistics and some of the basic terms used in factor analysis are explained
below:
iv.
Communality (h²): A high value of communality means that not much of the
variable is left over after whatever the factors represent is taken into
consideration. It is worked out for each variable as the sum of the squared
factor loadings of that variable.
v.
Total sum of squares: When the eigenvalues of all factors are totalled, the
resulting value is termed the total sum of squares. This value, when divided
by the number of variables involved in the study, results in an index that
shows how well the particular factor solution accounts for what all the
variables taken together represent.
vii.
Factor scores: Factor scores are composite scores estimated for each
respondent on the derived factors. With the factor scores several other
multivariate analyses can be performed.
xi.
Scree plot: Scree plot is a plot of the eigen values against the number of
factors in order of extraction.
primary concern is to determine the minimum number of factors that will account for
maximum variance in the data for use in subsequent multivariate analysis. The factors
are called principal components. If the researcher is attempting to uncover underlying
dimensions surrounding the original variables, common factor analysis is used. Principal
component analysis is based on the total information in each variable, whereas common
factor analysis is concerned only with the variance shared among all the variables.
Principal component analysis
Factor analysis begins with the construction of a new set of variables based on
the relationships in the correlation matrix. Principal component analysis method transforms
a set of variables into a new set of composite variables or principal components that are
not correlated with each other. The linear combination of factors accounts for the variance
in the data as a whole. The best combination makes up the first principal component
and is the first factor. The second principal component is defined as the best linear
combination of variables for explaining the variance not accounted for by the first factor.
Likewise, there may be a third, fourth and kth component, each being the best linear
combination of variables not accounted for by the previous factors.
This process continues till all the variance is accounted for. However, extraction is
usually stopped after a small number of factors have been extracted. The output of the
principal component analysis might look like the data given below:
Extracted components    % of variance accounted for    Cumulative variance
Component no. 1         74%                            74%
Component no. 2         15%                            89%
Component no. 3         11%                            100%
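The cumulative column is simply a running total of the variance percentages, as this short sketch shows:

```python
from itertools import accumulate

# % of variance accounted for by each extracted component (from the table).
variance_pct = [74, 15, 11]

# Running total gives the cumulative variance column.
cumulative = list(accumulate(variance_pct))   # [74, 89, 100]
```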
Numerical results from a factor analysis are presented in a table like the following.
The values in the table are the correlation coefficients between the factors and the
variables.
Variable        Unrotated factors              Rotated factors
                I        II       h2           I        II
A               .70      -.40     .65          .79      .15
B               .60      -.50     .61          .75      .03
C               .60      -.35     .48          .68      .14
D               .50      .50      .50          .06      .70
E               .60      .50      .61          .13      .77
F               .60      .60      .72          .07      .85
Eigenvalue      2.18     1.39
% of variance   36.30    23.20
Cumulative %    36.30    59.50
In the above table, .70 is the correlation coefficient between variable A and
factor I. The correlation coefficients are called loadings. Eigenvalues are the sums of
the squared loadings of the factor. For factor I the eigenvalue is the sum
.70² + .60² + .60² + .50² + .60² + .60², which is 2.18. The eigenvalue 2.18, divided by
the number of variables, i.e., 6, yields an estimate of the amount of total variance
explained by the factor; in the example given above, factor I accounts for 36% of the
total variance. The column heading h2 gives the communalities, i.e., the estimates of the
variance in each variable that is explained by the two factors. From the above table it
can be seen that in the case of variable A the communality is .70² + (-.40)² = .65, which
indicates that 65 percent of the variance in variable A is statistically explained in terms
of factors I and II.

In the unrotated factor solution, a loading does not provide much information: it is not
possible to identify the variables with high loadings on factor I and factor II. Rotation
enables the researcher to identify the variables associated with each factor.
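The arithmetic for factor I's eigenvalue and variable A's communality can be verified directly from the loadings in the table:

```python
# Unrotated loadings on factor I for variables A-F (from the table above).
factor1_loadings = [0.70, 0.60, 0.60, 0.50, 0.60, 0.60]

# Eigenvalue of factor I: sum of its squared loadings.
eigenvalue = sum(l ** 2 for l in factor1_loadings)     # 2.18

# Share of total variance explained by factor I.
variance_share = eigenvalue / len(factor1_loadings)    # 2.18 / 6 = 0.363

# Communality of variable A: squared loadings on factors I and II.
h2_a = 0.70 ** 2 + (-0.40) ** 2                        # 0.65
```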
4. Determine the number of factors
It is possible to compute as many principal components as there are variables,
but doing so does not serve the purpose of conducting a factor analysis. In order to
summarize the information contained in the original variables, a smaller number of
factors should be extracted. The question then arises of how many factors to extract.
Several procedures for determining the number of factors are discussed below.
A Priori determination:
Because of prior knowledge, the researcher may know how many factors to extract
and can thus specify the number of factors to be extracted beforehand. The extraction
of factors is completed as soon as the desired number of factors is extracted.
Determination based on Eigenvalues:
In this approach only factors with eigenvalues greater than 1.0 are retained;
the other factors are not included in the model. An eigenvalue represents the amount of
variance associated with the factor; hence, factors with variance greater than 1.0 are
included. If the number of variables is less than 20, this approach will result in a
conservative number of factors.
Determination based on Scree plot:
A scree plot is a plot of the eigen values against the number of factors in order
of extraction. The shape of the plot is used to determine the number of factors. The plot
typically has a distinct break between the steep slope of the factors with large eigenvalues
and the gradual trailing off associated with the rest of the factors. This gradual trailing
off is referred to as the scree. The point at which the scree begins denotes the true
number of factors.
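Both rules can be sketched with plain lists. The eigenvalues below are hypothetical except for the first two, which match the earlier worked example:

```python
# Hypothetical eigenvalues in order of extraction (the first two are taken
# from the worked example earlier in this unit).
eigenvalues = [2.18, 1.39, 0.40, 0.30, 0.20]

# Eigenvalue criterion: retain factors with eigenvalues greater than 1.0.
n_factors = sum(1 for e in eigenvalues if e > 1.0)    # 2

# Scree heuristic: the largest drop between successive eigenvalues marks
# the start of the scree; the factors before it are retained.
drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]
elbow = drops.index(max(drops)) + 1                   # 2 factors before the scree
```

Here the two rules agree; with real data they may not, and judgment is required.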
factor scores. The variables can be selected by examining the factor matrix and selecting,
for each factor, the variable with the highest loading on that factor. That variable can
then be used as a surrogate variable for the associated factor. This process works well if
one factor loading for a variable is clearly higher than all other factor loadings. However,
if two or more variables have similar loadings, the choice becomes difficult; in such cases
the choice of variables should be made on the basis of theoretical and measurement
considerations. For example, theory may suggest that a variable with a slightly lower
loading is more important than one with a slightly higher loading. Likewise, if a variable
having a slightly lower loading has been measured more precisely, it should be selected
as the surrogate variable.
analysis in place of the original variables, with the knowledge that the meaningful
variation in the original data has not been lost. Likewise, a large number of
dependent variables also can be reduced through factor analysis.
ii. Factor analysis can be used in product research to determine the brand attributes
that influence the consumers' choice.
iii. In advertising studies, factor analysis can be used to understand the media
consumption habits of the consumers.
iv. In pricing studies, it can be used to identify the characteristics of price-sensitive
consumers.
Limitations:
ii.
The results of a single factor analysis are generally considered less reliable and
dependable, as factor analysis mostly starts with a set of imperfect data. The
factor analysis should therefore be done at least twice; if similar results are
obtained, confidence in the results will increase.
iii.
Factor analysis is a complicated decision tool that can be used only when one
has thorough knowledge and enough experience of handling this tool.
Select the variables you want to enter into the factor analysis by double-clicking on
them, or use the Shift or Control keys to select them, and click the right arrow key to
move the selected variables to the Variables list on the right. Then click Extraction.
Extracting factors and factor rotation:
There is no hard and fast rule to determine the number of factors. A commonly
used convention is to retain the factors with eigenvalues greater than 1; the
statistical package will select this number by default. The scree plot may also be used
to determine the number of factors.
The variance explained by the initial solution, extracted components, and rotated
components is displayed. This first section of the table shows the Initial Eigenvalues.
The Total column gives the eigenvalue, or amount of variance in the original variables
accounted for by each component. The % of Variance column gives the ratio of the
variance accounted for by each component to the total variance in all of the variables.
The Cumulative % column gives the percentage of variance accounted for by the first n
components. For example, the cumulative percentage for the second component is the
sum of the percentage of variance for the first and second components. For the initial
solution, there are as many components as variables.
The second section of the table shows the extracted components. They explain
nearly 88% of the variability in the original ten variables, so the complexity of the data
set can be considerably reduced by using these components, with only a 12% loss of
information.
components. The large changes in the individual totals suggest that the rotated component
matrix will be easier to interpret than the unrotated matrix.
The scree plot enables the analyst to determine the optimal number of components. The
eigenvalue of each component in the initial solution is plotted. Generally, the components
on the steep slope are extracted, while the components on the shallow slope contribute
little to the solution. The last big drop occurs between the third and fourth components,
so the first three components are selected.
Rotated Component Matrix

                                Component
                      1            2            3
Price in thousands    .935         -3.45E-03    4.136E-02
Horsepower            .933         .242         5.565E-02
Engine size           .753         .436         .292
Length                .155         .943         6.862E-02
Wheelbase             3.616E-02    .884         .314
Width                 .384         .759         .231
Vehicle type          -.101        9.478E-02    .954
Fuel efficiency       -.543        -.318        -.681
Fuel capacity         .398         .495         .676
Curb weight           .519         .533         .581
Figure: Scatter plot of objects on Variable 1 and Variable 2, showing natural groupings
(clusters).
ii. Cluster centroid: The cluster centroid is the mean values of the variables for
all the objects in a particular cluster.
iii. Cluster centers: The cluster centers are the initial starting points in non-hierarchical clustering. Clusters are built around these centers or seeds.
iv. Cluster membership: Cluster membership indicates the cluster to which each
object or case belongs.
v. Dendrogram: A dendrogram, or tree graph, is a graphical device for displaying
clustering results. Vertical lines represent clusters that are joined together, and the
position of a line on the scale indicates the distance at which the clusters were
joined. The dendrogram is read from left to right.
Anna University Chennai
vi. Distance between cluster centers: The distance indicates how separated the
individual pairs of clusters are. Clusters that are widely separated are distinct
and desirable.
i. The Euclidean distance is the most commonly used measure. It is the square
root of the sum of the squared differences in values for each variable.
ii. The city-block or Manhattan distance measures the distance between two
objects in terms of the sum of the absolute differences in values for each variable.
iii. The Chebychev distance between two objects is the maximum absolute
difference in values for any variable.
The variables involved in the study may be measured in different units, for example
on a Likert scale, or as frequencies, percentages, etc. In such cases, before
clustering the respondents, the data must be standardized by rescaling each variable
to have a mean of zero and a standard deviation of unity. The outliers, or cases
with non-conforming values, should also be eliminated.
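The three distance measures, and the zero-mean/unit-deviation rescaling mentioned above, can be sketched in a few lines (the sample points are hypothetical):

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    """Maximum absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))

def standardize(values):
    """Rescale one variable to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

a, b = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
# euclidean(a, b) = 5.0, manhattan(a, b) = 7.0, chebychev(a, b) = 4.0
```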
3. Select a Clustering Procedure
Clustering procedures may be broadly categorized as hierarchical or non-hierarchical.
Hierarchical clustering is characterized by the development of a hierarchy or
tree-like structure. Hierarchical methods can be of two types, viz., divisive or
agglomerative. Divisive clustering starts with all the objects grouped in a single cluster;
clusters are divided until each object is in a separate cluster. Agglomerative clustering
starts with each object in a separate cluster; clusters are formed by grouping objects
into bigger and bigger clusters, and this process continues until all objects are formed
into a single cluster. Agglomerative methods consist of (i) linkage methods, (ii) variance
methods and (iii) centroid methods.
Figure: Single linkage, based on the minimum distance between Cluster 1 and Cluster 2.
Single linkage method does not work well when the clusters are poorly defined.
The complete linkage method is similar to single linkage except that it is based on the
maximum distance or the furthest neighbour approach. The distance between two
clusters is calculated as the distance between their two furthest points.
Figure: Complete linkage (maximum distance between Cluster 1 and Cluster 2) and
average linkage (average distance between Cluster 1 and Cluster 2).
In the average linkage method the distance between two clusters is defined as
the average of the distances between all pairs of objects, where one member of the pair
is from each of the clusters. This method uses information on all pairs of distances, not
merely the minimum or maximum distances. Hence it is preferable to single and complete
linkage method.
The average linkage method and Ward's method perform better than the other
procedures.
Non-hierarchical clustering
The non-hierarchical clustering method is also known as k-means clustering.
This method includes sequential threshold, parallel threshold and optimizing partitioning.
iii.
The optimizing partitioning method differs from the other threshold methods
in that the objects can later be reassigned to clusters to optimize an overall
criterion, such as the average within-cluster distance for a given number of
clusters.
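A minimal one-dimensional k-means sketch illustrates the assign-then-recompute loop behind non-hierarchical clustering. The data points and starting seeds here are hypothetical:

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny k-means: assign each point to its nearest center, then
    recompute each center as the mean of its assigned points."""
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        centers = [sum(members) / len(members)
                   for members in clusters.values() if members]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 7.8, 8.0, 8.2]
centers = kmeans_1d(points, [0.0, 5.0])   # seed centers chosen arbitrarily
# centers converge to approximately [1.0, 8.0]
```

Real k-means works the same way with multidimensional points and Euclidean distance; statistical packages also let the analyst supply or optimize the seeds.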
the variables. The centroids enable us to describe each cluster by assigning it a name or
label. It will be more helpful to profile the clusters in terms of variables that are not used
for clustering. The demographic, psychographic, product usage, media usage or other
variables can be used for profiling. The variables that significantly differentiate between
clusters can be identified via discriminant analysis and one-way analysis of variance.
6. Assess Reliability and Validity
Several decisions are made on the basis of cluster analysis; hence, clustering
solutions should not be accepted without assessing their reliability and validity. The
following procedures can be followed to provide adequate checks on the quality of the
clustering results:
• Perform cluster analysis on the same data using different distance measures.
Compare the results across measures to determine the stability of the solutions.
• Use different methods of clustering and compare the results.
• Split the data randomly into halves, perform clustering separately on each half,
and compare the cluster centroids across the two sub-samples.
• Delete variables randomly, perform clustering based on the reduced set of
variables, and compare the results with those obtained by clustering based on the
entire set of variables.
• In non-hierarchical clustering, the solution may depend on the order of cases in
the data set. Perform multiple runs using different orders of cases until the
solution is stable.
Segmenting the market: The consumers may be clustered on the basis of the
benefits sought from the purchase of a product. Each cluster would consist of
consumers who are relatively homogeneous in terms of the benefits they seek.
This is called benefit segmentation.
firm can examine its current offerings compared to those of the competitors to
identify potential new product opportunities.
Select the variables on the basis of which clusters are to be formed. Also select the
case labeling variable.
Click Plots.
Select Dendrogram.
Select None in the Icicle group.
Click Continue.
Cases are listed along the left vertical axis. The horizontal axis shows the
distance between clusters when they are joined. Parsing the classification tree to
determine the number of clusters is a subjective process; generally, one looks for gaps
between joinings along the horizontal axis. Starting from the right, there is a gap
between 20 and 25, which splits the automobiles into two clusters. There is another
gap from approximately 4 to 15, which suggests 6 clusters.
The agglomeration schedule

The agglomeration schedule is a numerical summary of the cluster solution.
At the first stage, cases 8 and 11 are combined because they have the smallest
distance. The cluster created by their joining next appears in stage 7. In stage 7, the
clusters created in stages 1 and 3 are joined, and the resulting cluster next appears in
stage 8. When there are many cases the table becomes rather long, but it may be easier
to scan the coefficients column for large gaps than to scan the dendrogram. A good
cluster solution shows a sudden jump (gap) in the distance coefficient, and the solution
before the gap indicates the good solution. The largest gaps in the coefficients column
occur between stages 5 and 6, indicating a 6-cluster solution, and between stages 9 and
10, indicating a 2-cluster solution. These are the same as the findings from the
dendrogram.
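Scanning the coefficients column for the largest gap can be automated. The coefficient values below are hypothetical, not taken from the actual agglomeration schedule:

```python
# Hypothetical distance coefficients from an agglomeration schedule,
# one per joining stage, in order.
coefficients = [0.3, 0.5, 0.7, 0.9, 1.1, 3.0, 3.4, 3.8, 4.2, 9.5]

# Gap between each pair of successive stages.
gaps = [b - a for a, b in zip(coefficients, coefficients[1:])]

# The stage just before the largest jump suggests a good stopping point.
best_stage = gaps.index(max(gaps)) + 1    # stage number (1-based)
n_cases = len(coefficients) + 1           # an n-case solution has n - 1 stages
n_clusters = n_cases - best_stage         # clusters remaining at that stage
```

With these illustrative numbers the largest jump follows stage 9, giving a 2-cluster solution; the second-largest jump follows stage 5, giving 6 clusters, mirroring the pattern described above.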
Classification matrix:
This is also called the confusion matrix or prediction matrix. It contains the number
of correctly classified and misclassified cases. The correctly classified cases appear on the
diagonal because the predicted and actual groups are the same. The off-diagonal elements
represent cases that have been incorrectly classified. The sum of the diagonal elements
divided by the total number of cases gives the hit ratio.
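The hit-ratio calculation can be sketched directly; the counts in the matrix below are invented for illustration.

```python
# Sketch: computing the hit ratio from a classification (confusion) matrix.
# Rows are actual groups, columns predicted groups.

def hit_ratio(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal
    total = sum(sum(row) for row in matrix)
    return correct / total

matrix = [[50, 10],   # actual group 1: 50 classified correctly, 10 not
          [8, 32]]    # actual group 2: 8 misclassified, 32 correct
print(hit_ratio(matrix))  # -> 0.82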
Discriminant function coefficients:
The unstandardised discriminant function coefficients are the multipliers of
variables, when the variables are in the original units of measurement.
Discriminant scores:
The unstandardized coefficients are multiplied by the values of the variables.
These products are summed and added to the constant term to obtain the discriminant
scores.
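The score computation just described is a simple linear combination; the constant, coefficients and variable values below are hypothetical.

```python
# Sketch: a discriminant score is the constant plus the sum of each
# unstandardized coefficient times the value of its variable.

def discriminant_score(constant, coefficients, values):
    return constant + sum(b * x for b, x in zip(coefficients, values))

score = discriminant_score(constant=-1.2,
                           coefficients=[0.5, -0.3],
                           values=[4.0, 2.0])
print(score)  # -1.2 + 2.0 - 0.6, approximately 0.2
```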
Eigenvalue:
For each discriminant function, the eigenvalue is the ratio of between-group to
within-group sums of squares. Large eigenvalues imply superior functions.
F values and their significance:
These are calculated from a one-way ANOVA, with the grouping variable
serving as the categorical independent variable. Each predictor serves as metric dependent
variable in the ANOVA.
Group means and group standard deviation:
These are computed for each predictor for each group.
Pooled within-group correlation matrix:
The pooled within group correlation matrix is computed by averaging the
separate covariance matrices for all the groups.
Structure correlations:
Also referred to as discriminant loadings, the structure correlations
represent the simple correlations between the predictors and the discriminant function.
Total correlation matrix:
If the cases are treated as if they were from a single sample and the correlations
computed, a total correlation matrix is obtained.
Wilks' lambda:
Sometimes also called the U statistic, Wilks' lambda for each predictor is the ratio of
the within-group sum of squares to the total sum of squares. The values range between
0 and 1. Values near 1 indicate that the group means are not different;
values near 0 indicate that the group means are different.
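Both Wilks' lambda (within SS / total SS) and the between-to-within ratio used as the eigenvalue criterion can be computed from the same sums-of-squares decomposition; the two small groups below are invented.

```python
# Sketch: Wilks' lambda and the between/within ratio for one predictor
# measured in two hypothetical groups.

def ss(values, mean):
    return sum((v - mean) ** 2 for v in values)

groups = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]     # invented data
all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)

total_ss = ss(all_values, grand_mean)
within_ss = sum(ss(g, sum(g) / len(g)) for g in groups)
between_ss = total_ss - within_ss

wilks = within_ss / total_ss        # near 0 -> group means differ
ratio = between_ss / within_ss      # large -> superior function
print(round(wilks, 3), round(ratio, 1))  # 0.069 13.5
```

The small lambda and large ratio both say the same thing here: almost all the variation in this predictor is between the two groups.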
4.6.2 Steps in conducting two-group discriminant analysis
The steps in conducting two-group discriminant analysis are discussed below:
1. Formulate the problem
2. Research design issues
3. Assumptions
4. Estimation of the discriminant function and assessment of overall fit
5. Interpretation of discriminant functions
6. Validation of the results
When the dependent variable is interval or ratio scaled, it must first be converted into
categories. The predictor variables should be selected based on a theoretical model or
previous research; in the case of exploratory research, the experience of the researcher
should guide the selection.
2. Research design issues
Research design for discriminant analysis requires consideration of the following
issues: (1) selection of both dependent and independent variables, (2) deciding the sample
size needed for estimation of the discriminant function and (3) division of the sample for
validation purposes.
(i) Selection of dependent and independent variable
To apply discriminant analysis the researcher should specify the dependent and
the independent variables. The dependent variable should be categorical and the independent
variables metric. The number of dependent variable categories can be two or
more, but these groups must be mutually exclusive and exhaustive. Each observation
should be such that it can be placed into only one group. The dependent variable in
some cases may involve two groups, e.g., purchasers and non-purchasers. In some
cases it may also involve several groups such as heavy users, medium users, light users
and non-users of a product.
After the decision regarding the dependent variable, the researcher must decide about
the independent variables to be included in the analysis. Independent variables can be
selected in the following two ways.
The first approach is identifying the variables from previous research or from the
theoretical model that underlies the research question.
The second approach is intuition, i.e., utilizing the researcher's knowledge and
intuitively selecting variables for which previous research is not available.
In the direct method all the independent variables are entered simultaneously; it is
appropriate when the researcher has selected the variables for theoretical reasons and is
not interested in viewing intermediate results based only on the most discriminating
variables. In step-wise discriminant analysis the independent variables are entered one
at a time, based on their ability to discriminate among groups. The stepwise method is
useful when the researcher wants to consider a relatively large number of independent
variables for inclusion in the function.
Statistical significance
The researcher must assess the level of significance of the discriminant function
computed. It would not be meaningful to interpret the analysis if the discriminant functions
estimated were not statistically significant. Significance tests can be done on the basis of
a number of statistical criteria, viz., Wilks' lambda, Hotelling's trace and Pillai's criterion.
A significance criterion of .05 or beyond is often used. If higher levels of risk for
including non-significant results are acceptable, the significance level may be fixed
at .2 or .3.
If the number of groups is three or more, the researcher must decide not only
whether the discrimination between groups is significant but also whether each of the
estimated discriminant functions is statistically significant.
Assessing Overall Fit
Assessing overall fit of the selected discriminant function involves three tasks:
calculating discriminant Z scores for each observation, evaluating group differences on
the discriminant Z scores and assessing group membership prediction accuracy.
5. Interpretation of discriminant functions
Interpretation involves examining the discriminant functions to determine the
relative importance of each independent variable in discriminating between the groups.
Three methods are available to assess the relative importance of the independent
variables:
i. Standardized discriminant function coefficients: the sign and magnitude of each
coefficient indicate a variable's relative contribution to the function.
ii. Structure correlations (discriminant loadings): the simple correlations between
each predictor and the discriminant function.
iii. Partial F values: the sizes of the significant F values are examined and ranked.
Large F values indicate greater discriminatory power.
Discriminant analysis can help to distinguish between heavy, medium and light
users of a product in terms of consumption habits and lifestyles.
Click Continue
Click Classify in the Discriminant Analysis dialog box.
When there are lots of predictors, the stepwise method can be useful in automatically
selecting the best variables to use in the model. The stepwise method starts
with a model that doesn't include any of the predictors. At each step, the predictor with
the largest F-to-Enter value that exceeds the entry criterion (by default, 3.84) is added
to the model. The variables left out of the analysis have F-to-Enter values smaller
than 3.84, and so are not added.
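The entry rule alone can be sketched as below. Note this is a simplification: in SPSS the F-to-Enter values are recomputed after every entry, whereas here they are held fixed purely for illustration, and the predictor names and F values are invented.

```python
# Sketch of the stepwise entry rule: at each step the predictor with the
# largest F-to-Enter above the default criterion (3.84) is added.

F_TO_ENTER = 3.84

def stepwise_select(f_to_enter):
    remaining = dict(f_to_enter)
    selected = []
    while remaining:
        best = max(remaining, key=remaining.get)
        if remaining[best] <= F_TO_ENTER:
            break                      # nothing left that qualifies
        selected.append(best)
        del remaining[best]
    return selected

print(stepwise_select({"education": 60.0, "age": 12.5, "gender": 1.9}))
# -> ['education', 'age']  (gender's F is below 3.84, so it is left out)
```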
The following table displays statistics for the variables that are in the analysis at each
step.
Variables in Analysis
Eigenvalues
Three functions are fit automatically, but due to its minuscule eigenvalue, the
third function can be ignored.
Wilks' lambda shows that only the first two functions are useful.
Wilks' Lambda
Structure Matrix
The structure matrix enables one to identify the significant variables within each
function.
When there is more than one discriminant function, an asterisk (*) marks each
variable's largest absolute correlation with one of the canonical functions. Within each
function, these marked variables are then ordered by the size of the correlation. Level
of education is most strongly correlated with the first function, and it is the only variable
most strongly correlated with this function. Years with current employer, Age in years,
Household income in thousands, Years at current address, Retired, and Gender
are most strongly correlated with the second function, although Gender and Retired
are more weakly correlated than the others. The other variables mark this function as a
stability function. Number of people in household and Marital status are most
strongly correlated with the third discriminant function, but since this function is not
useful, these predictors are of little value.
The territorial map
The territorial map helps to study the relationships between the groups and the
discriminant functions. Combined with the structure matrix results, it gives a graphical
interpretation of the relationship between predictors and groups.
The first function indicates that group 4 customers are, in general, the most highly
educated. The second function separates groups 1 and 3. Since the third function was
found to be rather insignificant, only the first two discriminant functions are plotted.
From Wilks' lambda, it can be understood that the model is doing better than
guessing, but the classification results should be considered to determine how much
better the model is.
From the observed data in the above table it can be seen that the null
model (that is, one without predictors) would classify the maximum number of customers
into the modal group, Plus service. Thus, the null model would be correct about
28.1% of the time (roughly 112 of the 400 customers).
The discriminant model classifies 11.4% more, or 39.5%, of the customers correctly.
In particular, the model excels at identifying Total service customers. However, it does an
exceptionally poor job of classifying E-service customers.
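The comparison with the null model is simple arithmetic. The group counts below are illustrative only, chosen so the modal group holds about 28% of 400 cases; they are not the actual counts from the table.

```python
# Sketch: comparing the discriminant model's hit rate with the null model
# that assigns every case to the largest (modal) group.

group_counts = [112, 100, 98, 90]          # hypothetical group sizes
n = sum(group_counts)                      # 400 cases
null_accuracy = max(group_counts) / n      # classify everyone as modal group

model_accuracy = 0.395                     # hit ratio reported by the model
improvement = model_accuracy - null_accuracy

print(round(null_accuracy, 3), round(improvement, 3))  # 0.28 0.115
```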
3. Assumptions
In carrying out multiple regression analysis, several assumptions are made about the
dependent and independent variables and about the relationship as a whole.
Once the variate has been derived through multiple regression, it acts collectively in
predicting the dependent variable. The assumptions are made not only for the individual
variables but also for the variate itself. The variate and its relationship with the dependent
variable should also meet the assumptions of multiple regression. The assumptions are:
(i) Linearity of the relationship between the dependent and independent variables
(ii) Constant variance of the error terms (homoscedasticity)
(iii) Independence of the error terms
(iv) Normality of the error term distribution
The collinearity among the variables needs to be verified from the collinearity
diagnostics in the output. If the eigenvalues are close to 0, the predictors
are highly inter-correlated and small changes in the data values may lead to large
changes in the estimates of the coefficients. Condition index values greater than 15
indicate a possible problem with collinearity; greater than 30, a serious problem.
The following collinearity table shows that there are no eigenvalues close to 0,
and all of the condition indexes are much less than 15. The model built using the stepwise
method does not have problems with collinearity.
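The condition indexes come directly from the eigenvalues: each index is the square root of the largest eigenvalue divided by that dimension's eigenvalue. The eigenvalues below are invented to show one dimension crossing the "serious" threshold.

```python
import math

# Sketch: condition indexes from collinearity-diagnostics eigenvalues.
# Above 15 suggests a possible collinearity problem; above 30, serious.

def condition_indexes(eigenvalues):
    largest = max(eigenvalues)
    return [math.sqrt(largest / e) for e in eigenvalues]

eigs = [2.9, 0.07, 0.02, 0.0004]           # hypothetical eigenvalues
for e, ci in zip(eigs, condition_indexes(eigs)):
    flag = "serious" if ci > 30 else "possible" if ci > 15 else "ok"
    print(f"eigenvalue={e:<8} index={ci:6.1f}  {flag}")
```

With these numbers the last dimension's index is well above 30, flagging a serious collinearity problem.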
Anna University Chennai
The ability of the model to predict the dependent variable can be checked
through the model fit summary.
Model Summary
The adjusted R square value indicates the fitness of the model. A higher value is
preferable.
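The adjusted R square statistic can be sketched from its standard formula, which penalizes R square for the number of predictors; the R square, sample size and predictor count below are invented.

```python
# Sketch: adjusted R-square. n is the sample size, k the number of
# predictors; the penalty grows as k approaches n.

def adjusted_r_square(r_square, n, k):
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(round(adjusted_r_square(r_square=0.80, n=30, k=2), 3))  # -> 0.785
```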
Stepwise Co-efficients
The step-wise algorithm chooses price and vehicle size (wheelbase) as
predictors. Sales are negatively affected by price and positively affected by size. Hence
the conclusion is that cheaper, bigger cars sell well.
Pooled Rc² (pooled canonical correlation) is the sum of the squares of all the canonical
correlation coefficients, representing all the orthogonal dimensions in the solution by
which the two sets of variables are related. Pooled Rc² is used to assess the extent to
which one set of variables can be predicted or explained by the other set.
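The pooled statistic is a one-line sum; the three canonical correlations below are hypothetical.

```python
# Sketch: pooled Rc² as the sum of squared canonical correlation
# coefficients across all dimensions of the solution.

def pooled_rc2(canonical_correlations):
    return sum(r ** 2 for r in canonical_correlations)

print(pooled_rc2([0.8, 0.5, 0.2]))  # 0.64 + 0.25 + 0.04, approximately 0.93
```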
Eigenvalues: They reflect the proportion of variance in the canonical variate explained
by the canonical correlation relating the two sets of variables.
Canonical weight: This is also called the canonical function coefficient or the
canonical coefficient. The standardized canonical weights are used to assess the relative
importance of individual variables' contributions to a given canonical correlation.
Structure correlation coefficient: This is also called the canonical factor loading. A
structure correlation is the correlation of a canonical variable with an original variable in
its set. Structure correlations are used for the following purposes.
Interpreting the Canonical Variables: The magnitudes of the structure
correlations help in interpreting the meaning of the canonical variables with which they
are associated. Larger canonical factor loadings should be weighted more when assigning
an interpretive label to the given canonical correlation. A rule of thumb is for variables
with correlations of 0.3 or above to be interpreted as being part of the canonical variable,
and those below not to be considered part of the canonical variable.
Calculating Variance Explained in a Given Original Variable: The square
of the structure correlation is the percent of the variance in a given original variable
accounted for by a given canonical variable on a given canonical correlation.
Canonical communality coefficient: This is the sum of the squared structure coefficients
for a given variable. The canonical communality coefficient measures how much of a
given original variable's variance is reproducible from the canonical variables.
Redundancy coefficient: d, also called Rd, measures the percent of the variance of
the original variables of one set that may be predicted from a (usually the first) canonical
variable from the other set. High redundancy means high ability to predict.
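Both summary measures can be sketched from structure correlations. The redundancy formula below (average squared loading of a set times the squared canonical correlation) is one common formulation of the redundancy index; the loadings and Rc² values are hypothetical.

```python
# Sketch: communality and redundancy built from structure correlations
# (canonical loadings).

def communality(loadings_across_functions):
    # sum of squared structure coefficients for one variable
    return sum(l ** 2 for l in loadings_across_functions)

def redundancy(loadings_in_set, rc2):
    # average squared loading of a set times the squared canonical
    # correlation for that dimension
    mean_sq = sum(l ** 2 for l in loadings_in_set) / len(loadings_in_set)
    return mean_sq * rc2

print(round(communality([0.8, 0.3]), 2))           # 0.64 + 0.09 = 0.73
print(round(redundancy([0.8, 0.6, 0.4], 0.5), 3))  # approximately 0.193
```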
Two sets of variables - dependent and independent are identified in the canonical
correlation. Once the variables are identified, the canonical correlation can be performed
for the following purposes:
(i) Determining the magnitude of the relationship that may exist between the two
sets of variables.
(ii) Deriving a set of weights for each set of dependent and independent variables
so that the linear combinations of each set are maximally correlated.
(iii) Explaining the nature of whatever relationships exist between the sets of
variables, generally by measuring the relative contribution of each variable to
the canonical functions.
(a) Canonical weights can be used to interpret the canonical functions. This involves
examining the sign and magnitude of the canonical weight assigned to each
variable in its canonical variate. Variables with relatively larger weights contribute
more to the variates and vice versa.
(b) Canonical loadings can be used to interpret the functions. A loading measures the
simple linear correlation between an original observed variable in the dependent or
independent set and the set's canonical variate. The larger the coefficient, the
more important it is in deriving the canonical variate.
(c) Canonical cross-loadings can be used as an alternative to canonical loadings.
This involves correlating each of the original observed dependent variables directly
with the independent canonical variate and vice versa.
6. Validation and Diagnosis
Canonical correlation analysis should be subjected to validation methods to
ensure that the results are not specific only to the sample data and can be generalized
to the population. For the purpose of validation two sub samples can be created
and analyses can be performed on each sub-sample separately. Then the results are
compared for similarity of canonical functions, variate loadings, etc. If marked
differences are found, additional investigation should be performed.
Another approach is to assess the sensitivity of the results to the removal of a
dependent or independent variable. To ensure the stability of the canonical weights
and loadings, multiple canonical correlations can be performed, each time
removing a different independent or dependent variable.
4.4.3 Application of Statistical Package : Canonical correlation
Canonical correlation can be carried out in SPSS using syntax. There are two
ways to perform the same. One is to use the Canonical correlation.sps macro. The
other way is to use MANOVA with DISCRIM subcommand.
(1) Canonical correlation.sps macro
The macro is a part of the SPSS package and can be found in a subdirectory
where SPSS is installed. To use the canonical correlation macro, locate the file Canonical
correlation.sps on the computer. Suppose that it is in c:\Program Files\spss. In the
syntax window, type
INCLUDE FILE='c:\Program Files\spss\Canonical correlation.sps'.
(2) MANOVA
To use MANOVA, the following syntax should be typed in the syntax window:
SUMMARY
Selection of multivariate techniques to analyze the data is based on two criteria:
whether the variables can be divided into dependent and independent sets, and the type
of data, i.e., metric or non-metric. The various multivariate techniques, viz., factor
analysis, cluster analysis, multiple regression and correlation, discriminant analysis and
canonical correlation, were presented. The criteria for applying the statistical tests and
the steps involved in conducting them were explained in detail. Applications of these
statistical tests using the software package were also discussed. Once the data analysis
is done, a report has to be prepared to communicate the results to all concerned. The
next unit, on report writing, deals with the same.
• Explain the application of cluster analysis with an example. Elucidate the process
of performing the same in SPSS. How will you interpret the results?
• What are the uses of discriminant analysis? Explain the process of building a
discriminant model.
• What is multiple regression? Explain the steps involved in the application of the
same.
• When can you apply canonical correlation? Explain the steps involved in building
the model.
UNIT 5
REPORT WRITING
5.1 INTRODUCTION
Report writing is an integral part of a research process. Research reports are
written to communicate to the world at large the results of the research, field work, and
other activities. Research report is a concrete outcome of the research work undertaken.
The quality of the research is judged by the quality of the writing and how well the
importance of the findings is conveyed. Research carried out very scientifically, revealing
findings of great importance, may not be of value if it is not communicated
effectively. In the context of business, the report assumes importance as it is through the
reports the management gets information regarding the activities performed at various
levels of the organization. The management takes decisions and controls various activities
of the business on the basis of information provided through the business reports.
According to Louis L.N., a business report is an unbiased and arranged presentation of
facts by one or more persons for a definite and specified important business
purpose. Koontz and O'Donnell define a report as "a documentation in which, for the
purpose of providing information, a specified problem is researched and analyzed and
conclusions, thoughts and sometimes references are presented." In a nutshell, a business
report is any factual, objective document that serves a business purpose. This chapter
provides an insight into the basics of writing research reports in addition to the contents
and characteristic features of a good report. The contents of a research proposal and
the use of visual aids in preparing reports are dealt in detail.
A report may also be prepared to convince the reader or to sell an idea. The
report in this case would be more detailed and convincing as to how the proposed
idea could add to the organization's value, or the justification as to why it should
be adopted.
• Reports may be prepared to provide an insight into the problem and may also
provide a final solution to the same.
5. Function
The reports may be classified as informative and interpretative on the basis of
function performed. Informative reports present facts pertinent to the issue or situation.
Common types of informational reports include those for monitoring and controlling
operations, statements of policies and procedures, compliance reports and progress
reports. It may take the form of an operating or a periodic report. Operating reports
provide managers with detailed information regarding all activities like sales, inventory,
costs, etc. Periodic reports describe the activities in a department during a particular
period.
Interpretative reports, also known as analytical or investigative reports, analyse the facts
and present recommendations and conclusions. The report presents facts and persuades
the reader to accept a stated decision, action or the recommendations detailed throughout
the report. It may take the form of a problem-solving report providing the background
information and analysis about the various options. A troubleshooting report is a form
of problem-solving report which discusses the source of the problem, the extent of the
damage done and the solutions possible. A feasibility report is a problem-solving report that studies
proposed options to assess whether all or any one of them is sound.
6. Subject dealt
The reports may be categorized as problem-determining, fact-finding, performance
reports, technical reports, etc. The problem-determining report focuses on identifying the
underlying problem or ascertaining whether a problem actually exists. Technical reports
are concerned with presenting data on a specialized subject, with or without comments.
7. Legal reports
Reports may be prepared to meet government regulations. For example, a
compliance report explains what a company is doing to conform to government
regulations. It may be prepared on an annual basis, like income tax returns, the annual
shareholders' report, etc. Interim compliance reports can also be prepared to monitor
and control the licenses granted by the government.
5.5 The Concept of audience
Reports are written for the sake of the audience, i.e., the readers of the reports.
The goal of the report writer is to enable the audience to act, and hence the audience should
be taken into consideration in everything from word choice, planning, organizing and
deciding about the visual aids to sentence structure. A good report requires tuning to the
various aspects of the audience, viz., their knowledge level, their role in the given situation,
their place in the organization and their attitude.
The more powerful the reader, the less likely the report will give orders and the
more likely it is to make suggestions.
The report type that is appropriate should be selected. For analytical reports, the problem
should be defined before stating the purpose of the report.
Problem definition
The problem addressed by a report may be defined by the person who authorizes
the report or by the researcher himself. The readers of the report should be convinced
about the existence of the problem; this requires persuasive writing. The problem
definition can be made by answering the following issues:
• What needs to be ascertained?
• When did the problem start?
• What is the importance of the issue?
• Who is involved in the situation?
• Where is the trouble located?
Problem factoring can also be done which involves breaking down the perceived
problem into a series of logical, connected questions that try to identify the cause and
effect. Speculating the cause for a problem leads to forming a hypothesis. A hypothesis
is a potential explanation that needs to be tested. Dividing the problem and framing the
hypothesis based on the available evidence enables the researcher to tackle even the
most complex situation.
Developing the statement of purpose
The problem statement defines what is going to be investigated, whereas
the statement of purpose defines why the report is prepared. The purpose statement
can be started with an infinitive phrase, e.g., To analyse the reasons for the fall in the
share price. Using an infinitive phrase (to plus a verb) encourages the writer to take
control and decide where to start. The purpose statement should be highly
specific, and it should be checked with the person who has authorized the report.
The confirmed statement can be used as the basis for developing the preliminary outline
of the report.
Developing a preliminary outline
The preliminary outline establishes the framework for the report preparation. It
provides a visual diagram of the report to be prepared: the important points, the order in
which the discussion will take place and the details to be included. The preliminary
outline might look different from the final outline of the report; however, the outline
guides the research effort and acts as a foundation for organizing and composing the
report. Since the outline is only a working draft, it will be revised and modified in the further
steps. The two common outline formats used to guide the writing efforts are alphanumeric
and decimal. Grammatical parallelism should be ensured among the various items
presented at the same level. Parallelism ensures consistency by showing that the ideas are
related and that they are of similar importance.
• A consistent time perspective should be ensured in the report, i.e., the report
should be in the past or present tense. A chronological sequence should also be
adopted in presenting the events.
• The reader's perspective of the report might be different from the researcher's
perspective. Hence a preview or road map of the report structure should be
included. This will clarify for the reader the overall organization and flow
of the report.
III. Post-writing stage
A research report will undergo many drafts before finalization. The report is
revised many times to check the content, organization, style and tone, readability, clarity
and conciseness. The post-writing stage involves revision of the report, production and
proofreading.
(1) Revision
Revision takes place during and after preparation of the first draft. It is an
ongoing process that occurs throughout the writing process. Revision involves searching
for the best way of saying something, probing for the right words, rephrasing sentences,
reshaping, juggling elements, etc. Revision is a never-ending process; however, every
research report has a deadline, and hence schedules should be drawn and met. Revision
consists of three main activities, viz., (i) evaluating content, organization, style and tone,
(ii) reviewing for readability and scannability and (iii) editing for clarity and conciseness.
(i) Evaluating content, organization, style and tone
During the process of evaluating the content the following aspects should be
given due attention:
• Accuracy of the information presented
• Relevance of the facts presented to the concerned audience
• Completeness of the information provided to suit the audience's needs
• Balance between specific and general information
While reviewing the organization the following aspects should be considered:
• Logical order in presentation and coverage of all main points
• Assurance that the main theme is given more space and prominence
• Words ending with -ion, -tion, -ing, -ment, -ant, -ent, -ance and -ency should
be used with care, as they change verbs into nouns and adjectives. Verbs should
be used instead of noun phrases.
The writing process can be summarized as three stages:
Prewriting: analyzing, investigating, adaptation.
Writing: format and length, structure, order, composing.
Post-writing: revision, production, proofreading.
• The report should be free of technical or statistical jargon when it is addressed
to audiences who may not understand such terms.
Title page
Most organizations have their own form of title page for the research
report, and the same should be complied with. The title page generally has the
following information:
• Title of the report.
• The month and year of submission.
• For whom and by whom the report is submitted.
• If the report is submitted for the award of a degree, the degree for which the
dissertation is submitted should be listed.
The best practice is to centre the title of the report on the page in upper-case
letters. If the title is too long to be centered on one line, an inverted-pyramid arrangement
should be followed without splitting words or phrases.
Preface
The preface may include the writer's purpose in conducting the study, a brief
resume of the background, scope, purpose and general nature of the research for which
the report is prepared, and the acknowledgments. A preface can be prepared only after
the final form of the report is ready. In the case of a dissertation submitted for the award
of a degree, the preface is omitted and an acknowledgment is added instead.
Acknowledgment recognizes the persons to whom the writer is indebted for
guidance and assistance during the study. It also credits the institution for providing
funds to conduct the study and for granting permission to use the facilities. The researcher
should acknowledge the assistance provided by all concerned honestly in a simple and
tactful manner.
Executive summary
An executive summary is a brief account of the research study. It is a report in
miniature covering all aspects in the body of the report but in a brief manner. It provides
an overview of the research problem identified and highlights the important information
such as the sampling design, data collection method used, results of data analysis, findings
and recommendations. The length of the executive summary will normally be two to
three pages. The executive summary is usually written after the completion of the report.
Sometimes a synopsis or an abstract may be included instead of the executive
summary; however, they are not one and the same. Executive summaries are more
comprehensive than a synopsis. They include headings, visual aids and enough information
to help busy people make quick decisions. Although executive summaries are not
designed to replace the report, in some cases the summary may be the only part read
by the audience. By contrast, a synopsis is only a brief overview of the entire report and
may either highlight the main points as they appear in the report or simply inform the
reader as to the content of the report. The purpose of synopsis is to entice the audience
to read the report.
Table of contents
The table of contents includes the major divisions of the report. It indicates in
outline form the topics included in the report. The purpose of a table of contents is to
provide an analytical overview of the topics included in the report together with the
sequence of presentation. Depending on the length and complexity of the report, the
content page may show only the top two or three levels of headings or only the first-level
headings. Care should be exercised to see that the titles of chapters and captions
of subdivisions within chapters correspond exactly with those included in the body of
the report. Page numbers for each of the divisions are given. The relationship between
major divisions and minor subdivisions should be shown by using capital letters and
indentation or by using numeric system.
The table of contents is prepared after the other parts of the report have been
typed, so that the page numbers can be given. If there are fewer than four visual aids,
they may be listed in the table of contents, but if there are more than four, a
separate list of illustrations should be prepared. Some guidelines for writing the table of
contents are given below:
• The page is titled Table of Contents or Contents.
• The name of each section should be worded and formatted as it appears in the
text.
• Entries in the table of contents should not be underlined, as the lines may
overwhelm the words.
• Only the page number on which the section starts should be used.
• The margins should be set such that the page numbers align on the right.
• Not more than three levels of headings should be given.
• Leaders, a series of dots, can be used to connect the words to the page numbers.
List of Tables
The researcher should prepare a list of tables compiled under the heading LIST
OF TABLES, centered on a separate page by itself. Two spaces below the heading, the
columns Table number, Title and Page number should be given. The table number
should be aligned to the left, page number should be aligned at the right and the title
should be centered.
List of Illustrations
The list of figures should be prepared in the same form as the list of tables. The page is headed LIST OF FIGURES. The list includes the figure number, the title of the figure, and the page number. Normally, Arabic numerals are used for numbering.
B. The Text
The text is the most important part of a report, as it is in this section that the writer presents the facts. The researcher should devote the greater part of attention to the careful organization and presentation of the findings or arguments. The text may be organized into an introduction, a methodology section, and as many chapters as required for presenting the report.
Introduction
The introduction prepares the reader for the report by describing its parts: background, problem statement, and research objectives.
Background
The background information provides a prelude for the reader of the research report. It may draw on the preliminary results of exploration, the survey, or any other source. Secondary data from the literature review could also be highlighted. Previous research, theory, or situations that led to the research issue can be discussed. The literature should be organized, integrated, and presented in a logical manner. The background includes definitions, assumptions, etc. It provides the information needed to understand the remainder of the research report. It contains information pertinent to the management problem or the situation that led to the study. It may be placed before the problem statement.
Problem statement
The problem statement conveys the need for the research project. The problem is usually represented by a management question, which is followed by a more detailed set of objectives. The guidelines are given below:
• It gives basic facts about the problem.
• It specifies the causes or origin of the problem.
• It explains the significance of the problem.
Research objectives
The research objectives state the purpose of the research. The objectives may be research questions and associated investigative questions. In a correlational study, the hypothesis statements are included. Hypotheses are declarative statements describing the relationship between two or more variables. They state clearly the variables of concern, the relationships among them, and the target group being studied. Operational definitions of the variables should be included.
Methodology
The methodology contains the following sections:
• The type of the study, viz., descriptive or exploratory, should be mentioned.
• The sampling design explains the sampling method and sample size.
• The data collection method is described in the report.
• The tools used for the analysis of data should be explained.
Findings and Conclusions
The findings section is generally the longest section of the report. Its objective is to explain the data. Wherever needed, the data should be supplemented with charts and graphs. The conclusion serves the important function of tying together the whole thesis or assignment. The recommendations of the study are also presented in this section; they provide ideas about corrective actions. In academic research, the suggestions broaden the understanding of the subject area. In applied research, the recommendations include guidelines for further managerial action. Several alternatives may be provided with justifications. The conclusion should leave the reader with an impression of completeness and of positive gain.
C. Reference material
The reference material includes the bibliography, appendix, and index.
Bibliography
The bibliography follows the main body of the text and is a separate but integral part of a thesis, preceded by a division sheet or introduced by the centered, capitalized heading BIBLIOGRAPHY. A bibliography is a list of the secondary sources consulted while preparing the report. Strictly speaking, a bibliography differs from a reference list. A bibliography lists the works relevant to the main topic of research interest, arranged in alphabetical order of the authors' last names. A reference list is a subset of the bibliography: it includes details of all the citations used in the literature survey and elsewhere in the research report, likewise arranged in alphabetical order of the authors' last names. These citations are provided to credit the authors and to enable the reader to find the works cited.
Proper citation styles and formats should be followed in providing references. Various methods of referencing are available, viz., the Publication Manual of the American Psychological Association (APA), The Chicago Manual of Style, the Modern Language Association (MLA) system, and the American Chemical Society (ACS) system. Each of these manuals specifies, with examples, how books, journals, newspaper articles, dissertations, and so on should be referenced.
For books the order may be as under:
1. Name of the author, last name first
2. Title of the book in italics
3. Place of publication and the publisher
4. Year of publication
Example:
Peeru Mohamed et al., Customer Relationship Management, Delhi, Vikas Publishing House, 2002.
References for articles in journals could be cited as under:
1. Name of the author, last name first
2. Title of the article, in quotation marks
3. Name of the periodical, in italics
4. The volume, or volume and number
5. The date of the issue
6. The pagination
Example:
Chitra, K., "In Search of Green Consumer: A Perceptual Study", Journal of Services Research, Volume 7, No.1, April-September, 2007, pp.173-191.
The above examples are just samples for bibliography entries. There are many
other acceptable forms which can be used. However, a researcher should follow a
consistent style of reference throughout the report.
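One way to keep a reference style consistent throughout a report is to assemble every entry from the same ordered fields. The two helpers below sketch the book and journal-article orderings given above; the function names and field names are invented for illustration and do not correspond to any official style manual:

```python
def book_reference(author, title, place, publisher, year):
    # Order: author (last name first), title, place of publication
    # and publisher, year of publication.
    return f"{author}, {title}, {place}, {publisher}, {year}."

def article_reference(author, title, journal, volume, number, date, pages):
    # Order: author, article title in quotation marks, journal name,
    # volume and number, date of issue, pagination.
    return (f'{author}, "{title}", {journal}, '
            f"Volume {volume}, No.{number}, {date}, pp.{pages}.")

print(book_reference("Peeru Mohamed et al.", "Customer Relationship Management",
                     "Delhi", "Vikas Publishing House", "2002"))
print(article_reference("Chitra, K.", "In Search of Green Consumer: A Perceptual Study",
                        "Journal of Services Research", 7, 1,
                        "April-September, 2007", "173-191"))
```

Generating every entry from one template guarantees the same ordering and punctuation across the whole bibliography.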
Appendix
The appendix contains information of a subordinate, supplementary, or highly technical nature that the researcher does not want to place in the body of the report. Each appendix should be clearly separated from the others and should be listed in the table of contents. The guidelines for preparing an appendix are:
• Each appendix item should be referred to at the appropriate place in the body of the report.
• In short reports, the page numbers may be continued in sequence from the last page of the body.
unsolicited proposal. This section should be geared to convince the sponsor that their
needs will be met by the conduct of the study.
Research design
The design module describes the technical issues involved in conducting the study: what is going to be done, described in technical terms. It can be divided into several subsections, viz., type of study, sampling design, data collection method, tools for analysis, scope of the study, and limitations. The justification for the particular sampling or data collection method chosen should also be discussed.
Qualifications of the researcher
This section should provide the names of the principal investigator, the co-investigators, and the other individuals involved in the project. The professional research competence and experience of the researchers should be highlighted to assure the sponsor. The academic experience, research experience, and similar projects conducted for internal and external agencies should be listed. The researcher's membership in various associations and other relevant accomplishments can be mentioned. A profile of the researcher can be enclosed in the appendix of the report.
Budget
The budget should be prepared in the format required by the sponsoring agents. The details to be presented in the budget vary depending on the sponsor's requirements. It should not be more than one or two pages. All the expenses should be presented with a proper break-up.
Schedule
The schedule should indicate the major phases of the project, the time required for each phase, and the milestones that mark the completion of the project. For example, the major phases may be refining the problem based on interaction with management, tuning up the objectives, designing the questionnaire, conducting the pilot study, data collection, analysis and interpretation, and report writing. Each of the phases should be presented along with the time schedule and the resources, including the people assigned to complete the work.
Facilities and special resources
The special facilities or resources needed to complete the project should be described in detail, along with the justification for them. The proposal should carefully list the relevant facilities and resources that will be used. The costs for such facilities should also be detailed in the budget.
Apart from the above, a bibliography listing the books, journals, and websites referred to should be given in alphabetical order. The appendixes, including the glossary of terms, the questionnaire, the profile of the investigator, etc., should be prepared. For a detailed discussion of these sections, refer to the integral parts of the research report.
• The visual aids should be positioned in the report at logical and convenient places.
• The visuals should be revised to eliminate clutter in terms of unnecessary words, lines, three dimensions, etc.
• High-quality visuals should be created, with clarity in lines, words, numbers, and organization, as this is an important aspect determining the effectiveness of a report.
Various types of visuals are available to present data. Some types of visuals depict certain kinds of data better than others:
• Tables can be used to present detailed, exact values.
• Frequencies and percentages can be represented better with a pie chart, segmented bar chart, or area chart.
• A line chart or bar chart can be used to illustrate a trend over a time period.
• A bar chart is used to compare one item with another.
• A pie chart is used to compare one part with the whole.
• A line chart, bar chart, or scatter chart can be used to depict correlations.
• A map is used to show geographical relationships.
• A flowchart or diagram is used to illustrate a process or a procedure.
1. Tables
• The tables should be numbered consecutively throughout the report. The number and the title are given above the table.
• The table title should be informative and identify the main points of the table.
• Horizontal rules are used to separate the parts of the table; they are placed above and below the column heads and below the last row of the table. Vertical lines can also be used to separate the columns.
• Spanner heads should be used to characterize the column headings; spanners eliminate repetition in column headings.
• Commonly understood units should be used. All items in a column should be expressed in the same unit and rounded off for simplicity.
• Column or row totals should be provided wherever needed.
• Explanatory comments should be placed below the table, introduced by the word Note.
• The source of the data given in the table should be mentioned.
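Several of these rules, such as aligned columns, rules around the column heads, and a column total, can be sketched in plain text with a small Python helper. The headers and sales figures are invented for illustration:

```python
def format_table(headers, rows, total_label="Total"):
    """Render (label, value) rows as a plain-text table: labels
    left-aligned, numbers right-aligned, rules below the column
    heads and below the last row, followed by a column total."""
    label_w = max(len(headers[0]), len(total_label),
                  *(len(r[0]) for r in rows))
    total = sum(r[1] for r in rows)
    value_w = max(len(headers[1]), len(str(total)),
                  *(len(str(r[1])) for r in rows))
    rule = "-" * (label_w + 2 + value_w)
    lines = [f"{headers[0]:<{label_w}}  {headers[1]:>{value_w}}", rule]
    for label, value in rows:
        lines.append(f"{label:<{label_w}}  {value:>{value_w}}")
    lines.append(rule)
    lines.append(f"{total_label:<{label_w}}  {total:>{value_w}}")
    return "\n".join(lines)

print(format_table(("Region", "Sales"), [("North", 120), ("South", 80)]))
```

Right-aligning the numeric column keeps the digits of the same place value in the same position, which makes the values easy to compare and to total.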
2. Line graphs
A line graph depicts trends or relationships. It shows the relationship between two variables by a line connecting points on the X axis and Y axis. Line graphs usually show trends over time: the line connects the points, and its ups and downs illustrate the changes. Line graphs have conventional parts: a caption that contains the number and title, axis rules, axis labels, and a legend. Some guidelines for creating line graphs are given below:
• The figures should be numbered consecutively using Arabic numerals.
• A brief, clear title should be used to specify the content of the graph.
• The caption can be given either above or below the figure, but a consistent pattern should be followed throughout the report.
• The independent variable is recorded on the X axis and the dependent variable on the Y axis.
• Clear axis labels should be provided.
• If the graph has more than one line, the lines should be made visually distinct and identified with labels or in a legend.
A surface chart, also called an area chart, is a form of line chart with a cumulative effect: all the lines add up to the top line, which represents the total. This form of chart helps to illustrate changes in the composition of something over time. In preparing a surface chart, the most important segment should be placed at the baseline, and the number of strata should be restricted to four or five.
3. Bar graphs
A bar chart depicts numbers by the height or length of its rectangular bars. It makes numbers easy to read and understand. Bar charts are very useful to:
• Compare the sizes of several items at one time.
(The original figure here showed a 100 percent stacked bar chart, with percentages from 0% to 100% on the vertical axis and years on the horizontal axis.)
A bar chart can be created in many ways, depending on the need and creativity of the researcher. However, care should be exercised to see that the widths of all bars are uniform and that the bars are placed evenly in a logical order.
4. Pie charts
A pie chart is used to show the relative sizes of the parts of a whole. It uses segments of a circle to indicate percentages of a total: the whole circle represents 100 percent, and each segment represents an item's percentage of the total. Pie charts are effective ways to show percentages or to compare one segment with another. General guidelines are:
• A pie chart should not be divided into more than five segments, as the reader may have difficulty differentiating the sizes of small segments.
• Segments should be identified with legends or callouts.
• The segments should be arranged in sequence clockwise from largest to smallest.
• Different colors or patterns can be used to distinguish the various pieces.
• If percentages are used, all the segments put together should add up to 100 percent. Percentages can be placed inside the segments.
• A segment that needs greater attention can be exploded, i.e., pulled out from the rest of the segments.
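The percentage and ordering rules above can be checked with a small Python sketch. The brand names and counts are invented for illustration:

```python
def pie_segments(data):
    """Convert raw counts into percentage segments, ordered from
    largest to smallest as recommended for a clockwise layout."""
    total = sum(data.values())
    segments = [(label, round(100 * value / total, 1))
                for label, value in data.items()]
    # Largest segment first, so the slices follow clockwise in
    # decreasing size and the percentages sum to 100.
    return sorted(segments, key=lambda s: s[1], reverse=True)

shares = {"Brand A": 45, "Brand B": 30, "Brand C": 15, "Others": 10}
print(pie_segments(shares))
```

Computing the percentages from the raw counts, rather than entering them by hand, is a simple way to guarantee the segments add up to 100 percent.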
5. Pictograms
A chart that uses symbols instead of words or numbers to portray data is known as a pictogram. It is a novel way of presentation and conveys a more literal visual message. Pictograms enhance a report's value.
6. Flow charts
Flow charts are used to show a time sequence, a decision sequence, or conceptual relationships. Flowcharts are indispensable when illustrating processes, procedures, and sequential relationships. Arrows indicate the direction of the action, and symbols represent steps or particular points in the action. In computer programming, the symbols have special shapes for certain activities.
7. Organization charts
The organization chart illustrates the positions, units, or functions of an organization and the way they interrelate. Organization charts are used to depict the interrelationships among the parts of an organization. An organization's normal communication channels can be explained in detail with the help of organization charts.
8. Decision charts
A decision chart, or decision tree, is a flow chart that shows whether or not to perform a certain action in a certain situation. At each point, the reader must decide yes or no and then follow the appropriate path until the final goal is reached.
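The yes/no traversal described above can be sketched as a nested dictionary in Python. The questions and actions are invented purely for illustration:

```python
# A decision chart as a nested dictionary: each node asks a yes/no
# question, and each leaf (a plain string) is a final action.
decision_chart = {
    "question": "Is the defect covered by warranty?",
    "yes": {"question": "Is a replacement in stock?",
            "yes": "Replace the product",
            "no": "Repair the product"},
    "no": "Charge the customer for repair",
}

def follow_chart(node, answers):
    """Follow a sequence of 'yes'/'no' answers through the chart
    until a final action (a string leaf) is reached."""
    for answer in answers:
        node = node[answer]
        if isinstance(node, str):  # reached a leaf: the final action
            return node
    return node

print(follow_chart(decision_chart, ["yes", "no"]))
```

Each answer selects one branch, exactly as a reader follows one path through a drawn decision chart.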
9. Gantt Charts
A Gantt chart represents the schedule of a project. Units of time are shown along the horizontal axis, and the sub-processes are listed on the vertical axis. The bars indicate the starting and ending points of each sub-process.
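The layout just described can be sketched as a rough text-only Gantt chart in Python. The phase names and week numbers are invented for illustration:

```python
def gantt(tasks, total_weeks):
    """Render a rough text Gantt chart: one row per sub-process,
    with '=' marking the weeks from its start to its end point."""
    name_w = max(len(name) for name, _, _ in tasks)
    lines = []
    for name, start, end in tasks:  # start and end are 1-based weeks
        bar = " " * (start - 1) + "=" * (end - start + 1)
        lines.append(f"{name:<{name_w}} |{bar:<{total_weeks}}|")
    return "\n".join(lines)

phases = [("Refine problem", 1, 2), ("Design questionnaire", 2, 4),
          ("Data collection", 4, 8), ("Report writing", 8, 10)]
print(gantt(phases, 10))
```

Because each bar starts at its phase's first week, overlapping phases are visible at a glance, which is the main point of a Gantt chart.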
10. Maps
Maps are used to represent statistics by geographical area and to show location relationships; for example, maps can show regional differences in the sales of a company. Maps can be illustrated to suit the need, using dots, shading, colored lines, labels, numbers, and symbols. Software such as Excel and CorelDRAW has templates that make the production of maps easier.
11. Photographs
Photographs capture the exact appearance of an object and use visual appeal to catch the reader's attention. Advances in technology, such as digital cameras, have drastically reduced the cost of including photographs, and photographs can be modified to the requirement with the help of software. A photograph duplicates the item being discussed and also shows the relationships among its various parts. Photographs can be used to provide a general introduction that orients the reader towards the object.
12. Drawings and diagrams
Drawings and diagrams are often used to show how something looks or operates. Diagrams can be much clearer than words in explaining to readers a process or the use of an object. A variety of software programs can be used to add a decorative touch to the report. Drawings and diagrams make it possible to eliminate unnecessary details so that readers can focus on the important aspects. Two commonly used drawings are the exploded view and the detailed drawing. An exploded view shows the parts disconnected but arranged in the order in which they fit together; such views are used to show the internal parts of a small, intricate object or to explain how the parts are assembled, and manuals often use exploded drawings with named or numbered parts. Detailed drawings are renditions of particular parts or assemblies.
SUMMARY
The research report is prepared to communicate the research findings. This unit covered the different types of reports, and the importance of audience analysis was explained. The steps involved in the preparation of a report and the integral parts of the report were discussed. The contents of a research proposal were highlighted. In addition, the basic guidelines for using visual aids and the various types of visual aids were dealt with.
Prepare a research proposal for identifying the market potential of a new product launched by your concern.
What types of visual aids can be used for presenting a report on customer satisfaction with a new brand of laptop introduced by your concern in the market?