

UNIT 1 - INTRODUCTION
BUSINESS RESEARCH - DEFINITIONS AND SIGNIFICANCE - THE RESEARCH PROCESS -
TYPES OF RESEARCH - EXPLORATORY & CAUSAL RESEARCH - THEORETICAL &
EMPIRICAL RESEARCH - CROSS SECTIONAL & TIME SERIES RESEARCH - RESEARCH
QUESTIONS/PROBLEMS - RESEARCH OBJECTIVES - RESEARCH HYPOTHESIS

K.ANANDAKUMAR
LECTURER
DEPARTMENT OF
MANAGEMENT STUDIES
VELAMMAL INSTITUTE
OF TECHNOLOGY


1.1. Definition of Research:

 Research in common man’s language refers to ‘search for knowledge’.


 Research is an art of scientific investigation. It is also the systematic design,
collection, analysis and reporting of findings and solutions for the marketing
problems of a company.
 Research is also defined as careful or critical inquiry or examination in
selecting facts or principles; diligent investigation in order to ascertain
something.
 According to Redman and Mory, research is defined as ‘systematized effort to
gain new knowledge’.
 Research is defined as an organized, systematic, data based, critical, objective,
scientific inquiry or investigation into a specific problem undertaken with the
purpose of finding answers or solutions to it.

1.2: Need of the Research

 To identify and find solutions to the problem


 Example
 Why do business fluctuations take place once in three years?
 Why is the demand for a product falling?


 To help making decisions


 Example
 Should we maintain the same advertising budget as last year?
 To find alternative strategies
 Example
 Should we follow pull or push strategy to promote the product?
 To develop new concepts
 Example
 CRM – Customer Relationship Management
 Horizontal Marketing
 MLM – Multi-level Marketing.

1.3: Objectives of Research

 It develops focus
 Since the days of the steam engine, research has continued to come up with more
powerful locomotives which can be operated with alternative sources of energy
like diesel, electricity etc.
 It reveals characteristics
 In these days, before a criminal is sentenced, efforts are taken to study why he
had turned a criminal. This helps to develop an approach to create opportunities
for criminals to change themselves and join the mainstream of life.
 It determines frequency of occurrence
 It tests hypothesis
 Promotes better decision making
 Research is the basis for innovation
 Research identifies the problem areas.
 Helps in forecasting, which is very useful for managers
 Research helps in the development of new products or in modifying existing
products and in understanding the competitive environment.


 It helps in optimum utilization of resources.


 It helps in identifying marketing opportunities and constraints
 It helps in evaluating marketing plans.

1.4: Significance of Research

 All progress is born of inquiry. Doubt is often better than overconfidence, for it
leads to inquiry, and inquiry leads to invention.
 Research inculcates scientific and inductive thinking and it promotes the
development of logical habits of thinking and organization.
 The increasingly complex nature of business and government has focused
attention on the use of research in solving operational problems.
 Research, as an aid to economic policy, has gained added importance, both for
government and business.
 Research provides the basis for nearly all government policies in our economic
system.
 Research has its special significance in solving various operational and planning
problems of business and industry.
 Research is equally important for social scientists in studying social
relationships and in seeking answers to various social problems.
 Research is the fountain of knowledge for the sake of knowledge and an
important source of providing guidelines for solving different business,
governmental and social problems.
 It is a sort of formal training which enables one to understand the new
developments in one’s field in a better way.
 Research replaces intuitive business decisions by more logical and scientific
decisions.
 Increased amounts of research make progress possible.

1.5 Research Process


Research is a process. A process is a set of activities that are performed to
achieve a targeted outcome. So the research process refers to the various steps
and stages involved in research activity.

(1)Observation – Identification of broad Problem area:

 Identification of broad problem area through observation is the first step in a


research process.
 Broad Problem area refers to the entire situation where one sees a possible
need for research and problem solving.
 Examples
 The sales volume of a product is not picking up.
 Training programs are perhaps not as effective as anticipated.

(2)Preliminary Data Gathering:

 This could be done by


 Interviewing
 Extensive literature survey.
 Data gathered for research from the actual site of occurrence of events are
called primary data.
 Example
 Opinion of the customers by administering questionnaires to individuals
 Data gathered through existing sources are called secondary data.
 Example
 Company policies and procedures.

(3)Extensive Literature Survey:

 This is a very crucial stage in research. It is in this stage that the researcher makes
himself familiar with all the previous studies and their findings relevant to his
field of work.


 He learns the methodology and approach developed by these past studies.


 Literature survey is the documentation of a comprehensive review of the
published and unpublished work from secondary sources of data in the
areas of specific interest to the researcher.

(4)Problem Definition:

 After the interviews and the literature review, the researcher has to define the
issues of concern more clearly.
 Problem definition or problem statement is a clear, precise and succinct
statement of the question or issue that is to be investigated with the goal of
finding an answer or solution.
 Example
 To what extent has the new advertising campaign been successful in creating
the high-quality, customer-centred corporate image that it was intended to
produce?

(5) Theoretical Frame work:

 After conducting the interviews, completing a literature survey and defining the
problem, one is ready to develop a theoretical framework. A theoretical
framework is none other than the identification of the network of relationships
among the variables considered important to the study of any given
problem situation.

(6)Hypothesis Development:

 A hypothesis can be defined as a logically conjectured relationship between


two or more variables expressed in the form of a testable statement.
 By testing the hypothesis and confirming the conjectured relationships, it is
expected that solutions can be formed to correct the problem encountered.
 Examples:


 Employees who are healthier will take sick leave less frequently.
 Women are more motivated than men.
 There is a relationship between age and job satisfaction.

(7)Preparation of research design:

 This is a stage in which the researcher clearly spells out how he intends to carry
out his work.
 In other words, a research design is a description of conceptual structure
within which the research will be conducted.
 The researcher would indicate through the design whether he adopts
experimental design or formal design.
 He would also state the purpose of his research work viz., descriptive,
diagnostic, explorative or experimental.

(8)Determining the sample design:

 The researcher has to make a careful selection of a few elements from the
population and then study them intensely and reach conclusions.
 The researcher should determine
 The size of the sample,
 The method of sampling,
 The tests of sample etc.,

(9)Data Analysis:

 This involves
 Editing
 Tabulating
 Coding

Editing:


The data collected should be scanned to make sure that it is complete and that
all the instructions are followed. This process is called editing. Once these
forms have been edited, they must be coded.

Coding:

It means assigning numbers to each of the answers, so that they can be


analysed.

Tabulation:

The final step is called tabulation. It is the orderly arrangement of data in a tabular
form. Also, at the time of analysing the data, the statistical tests to be used must
be finalised, such as the Z-test, t-test, chi-square (χ²) test, ANOVA, correlation and
regression, often carried out with a package such as SPSS.
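As a brief illustration (not part of the original text), the sketch below shows how coded questionnaire answers might be tabulated and subjected to a chi-square test in Python; the variable names and data are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical coded responses: gender (1 = male, 2 = female),
# satisfaction (1 = satisfied, 2 = not satisfied)
responses = pd.DataFrame({
    "gender":       [1, 2, 1, 2, 2, 1, 1, 2, 2, 1],
    "satisfaction": [1, 1, 2, 1, 2, 1, 2, 1, 1, 2],
})

# Tabulation: arrange the coded data in a contingency table
table = pd.crosstab(responses["gender"], responses["satisfaction"])
print(table)

# Chi-square test of independence between the two coded variables
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")
```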

(10)Interpretation and report:

 After collecting and analysing the data the researcher has to accomplish the task
of drawing inferences followed by report writing.
 This has to be done very carefully, otherwise misleading conclusions may be
drawn and the whole purpose of doing research may get vitiated.
 It is only through interpretation that the researcher can expose relations and
processes that underlie his findings.

The flowchart of the research process

The Research Process (flowchart):

Stage 1: Clarifying the Research Question - Discover the Management Dilemma, Define the Management Question, Define the Research Questions, Refine the Research Questions (Exploration). Several research questions may be formulated at this stage. Each question is an alternative action that management might take to solve the management dilemma. Usually the most plausible action, or the one that offers the greatest gain using the fewest resources, is researched first.

Stage 2: Research Proposal; Research Design Strategy (type, purpose, time, scope, environment).

Stage 3: Data Collection Design and Sampling Design; Instrument Development & Pilot Testing.

Stage 4: Data Collection & Preparation.

Stage 5.
1.6. Types of research:

1.6.1. Based on application:


From the perspective of the application, research can be classified into two
broad categories:

 Pure research and
 Applied research

Pure research/desk research/basic research:

 Gathering knowledge for knowledge's sake is termed pure or basic
research. It is mainly concerned with generalization and with the formulation
of a theory.
 Pure research involves developing and testing theories and hypotheses that are
intellectually challenging to the researcher but may or may not have practical
application at the present time or in the future.


 Pure research is also concerned with the development, examination, verification


and refinement of research methods, procedures, techniques and tools that form
the body of research methodology.
 Examples of pure-research include developing a sampling technique that can be
applied to a particular situation.
 Research relating to pure mathematics or concerning some natural phenomenon
is an instance of fundamental research.
 Likewise, studies focusing on human behaviour also fall under the category of
fundamental research.

Applied Research:

 Research conducted in a particular setting with the specific objective of solving


an existing problem in the situation
 An attempt to find a solution to an immediate problem encountered by a firm,
an industry, a business organisation, or the society is known as Applied
Research.
 Examples:
Research to identify social, economic or political trends that may affect a
particular institution, copy research (research to find out whether certain
communications will be read and understood), and marketing research or
evaluation research are examples of applied research.

1.6.2. Based on Objectives:

From the perspective of the objective, research can be broadly classified into

 Descriptive,
 Correlational,
 Explanatory or
 Exploratory.


Descriptive:

 To describe what is prevalent regarding


 A group of people
 A community
 A phenomenon
 A situation
 A program
 An outcome
 The main purpose of descriptive research is the description of the state of affairs
as it exists at present.
 Its main characteristic is that the researcher has no control over the variables.
 Examples:
 Socio-economic characteristics of residents of a community
 Types of service provided by an agency
 Needs of a community
 Sale of a product
 Descriptive research, also known as statistical research, describes data and
characteristics about the population or phenomenon being studied.
 Descriptive studies answer the questions who, when, what, where and how.

Analytical Research:

 The researcher has to use facts or information already available and analyze
these to make a critical evaluation of the material.

Correlation Research:

The main emphasis in correlational research is to discover or establish the
existence of a relationship/association/interdependence between two or more
aspects of a situation.


 Examples:
 What is the impact of an advertising campaign on the sale of a product?
 What is the relation between technology and unemployment?
 Are smoking and cancer related?
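As a hypothetical sketch (not from the original text), a correlational question such as the advertising example above can be examined by computing a correlation coefficient; the figures below are invented.

```python
from scipy.stats import pearsonr

# Invented monthly figures, for illustration only
advertising_spend = [10, 12, 15, 11, 18, 20, 22, 19]          # e.g. in lakhs
product_sales     = [110, 118, 130, 115, 142, 150, 160, 148]  # e.g. in '000 units

# Pearson's r measures the strength of the linear association between the two
r, p_value = pearsonr(advertising_spend, product_sales)
print(f"correlation r = {r:.2f}, p-value = {p_value:.4f}")
```

A value of r close to +1 or -1 would suggest a strong association, while a value close to 0 would suggest no linear relationship.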

Explanatory:

 Attempts to clarify why and how there is a relationship between two aspects of
a situation or phenomenon.

Exploratory:

 A research study where very little knowledge or information is available on the


subject under investigation.
 Exploratory research is a type of research conducted because a problem has not
been clearly defined.
 The results of exploratory research are not usually useful for decision-making
by themselves, but they can provide significant insight into a given situation.

1.6.3. Based on inquiry mode:

Broadly there are two approaches to inquiry.

 The structured approach-quantitative research


 The unstructured approach-qualitative research.

(a) Qualitative:
 Research involving analysis of data/information that is descriptive in nature and
not readily quantifiable.
 The study is classified as qualitative if the purpose of the study is primarily to
describe a situation, phenomenon, problem or event.
 Examples :
 The description of an observed situation.


 The historical enumeration of events


 A description of the living conditions of a community.
(b)Quantitative:
 The study is classified as a quantitative study if you want to quantify the variation
in a phenomenon, situation, problem or issue.
 Examples of quantitative aspects of a research study are: How many people
have a particular problem? How many people hold a particular attitude?

1.6.4. Difference between quantitative and qualitative.

1. Focus of research: Qualitative - understand and interpret; Quantitative - describe, explain and predict.

2. Researcher involvement: Qualitative - high; the researcher is participant or catalyst; Quantitative - limited; controlled to prevent bias.

3. Research purpose: Qualitative - in-depth understanding, theory building; Quantitative - describe or predict, build and test theory.

4. Sample design: Qualitative - small; Quantitative - large.

5. Research design: Qualitative - may evolve or adjust during the course of the project; often uses multiple methods simultaneously or sequentially; consistency is not expected; involves a longitudinal approach. Quantitative - determined before commencing the project; uses a single method or mixed methods; consistency is critical; involves either a cross-sectional or a longitudinal approach.

6. Participant preparation: Qualitative - pretasking is common; Quantitative - no preparation desired, to avoid biasing the participant.

7. Data type and preparation: Qualitative - verbal or pictorial descriptions, reduced to verbal codes (sometimes with computer assistance); Quantitative - verbal descriptions, reduced to numerical codes for computerized analysis.

8. Data analysis: Qualitative - human analysis following computer or human coding; primarily non-quantitative; forces the researcher to see the contextual framework of the phenomenon being measured; the distinction between facts and judgements is less clear; always ongoing during the project. Quantitative - computerized analysis; statistical and mathematical methods dominate; analysis may be ongoing during the project; maintains a clear distinction between facts and judgements.

9. Insight and meaning: Qualitative - a deeper level of understanding is the norm, determined by the type and quantity of free-response questions; researcher participation in data collection allows insights to form and be tested during the process. Quantitative - limited by the opportunity to probe respondents and the quality of the original data collection instrument; insights follow data collection and data entry, with limited ability to reinterview participants.

10. Research sponsor involvement: Qualitative - may participate by observing research in real time or via taped interviews; Quantitative - rarely has either direct or indirect contact with participants.

11. Feedback turnaround: Qualitative - smaller sample sizes make data collection faster, for shorter possible turnaround; insights are developed as the research progresses, shortening data analysis. Quantitative - larger sample sizes lengthen data collection; internet methodologies are shortening turnaround but are inappropriate for many studies; insight development follows data collection and entry, lengthening the research process; interviewing software permits some tallying of responses as data collection progresses.

12. Data security: Qualitative - more absolute, given the use of restricted access facilities and smaller sample sizes; Quantitative - the act of research in progress is often known by competitors; insights may be gleaned by competitors for some visible, field-based studies.

1.6.5. Other types of research:

Causal Research:

 A research design in which the major emphasis is on determining a cause-and-


effect relationship


 In marketing, causal research is used for many types of research including
testing marketing scenarios, such as what might happen to product sales if
changes are made to a product’s design or if advertising is changed.
 It is the testing of a hypothesis on the cause and effect within a given market.
 For example, if a clothing company currently sells blue denim jeans, causal
research can measure the impact of the company changing the product design to
the colour white. Following the research, company bosses will be able to decide
whether changing the colour of the jeans to white would be profitable. To
summarise, causal research is a way of seeing how actions now will affect a
business in future.

Cross-sectional study:

 Cross-sectional analysis studies the relationship between different variables at
a point in time.
 Examples
 Data were collected from stock brokers between April and June of last year to
study their concerns in a turbulent stock market. Data with respect to this
particular research had not been collected before, nor will they be collected
again for this research.
 A drug company desirous of investing in research for a new obesity (reduction)
pill conducted a survey among obese people to see how many of them would be
interested in trying the new pill. This is a one-shot or cross-sectional study to
assess the likely demand for the new product.
 The purpose of both the studies in the two examples was to collect data that
would be pertinent to find the answer to a research question. Data collection at
one point in time is sufficient. Both were cross sectional designs.
 Cross –sectional research is a type of research method often used in
developmental psychology, but also utilized in many other areas including
social science, education and other branches of science. This type of study


utilizes different groups of people who differ in the variables of interest, but
share other characteristics such as socio-economic status, educational
background and ethnicity.
 A cross sectional research is an observational one. This means the researchers
record information about their subjects without manipulating the study
environment.
 For example, measuring the cholesterol levels of daily walkers and non-walkers,
along with any other characteristics that might be of interest. We would not
influence non-walkers to take up that activity, or advise daily walkers to modify
their behaviour. In short, we try not to interfere.
 Cross-sectional research takes a ‘slice’ of its target group and bases its overall
findings on the views or behaviours of those targeted, assuming them to be
typical of the whole group of interest to us.
 The defining feature of a cross-sectional study is that it can compare different
population groups at a single point in time.

Longitudinal study:

 A research study for which data are gathered at several points in time to
answer a research question is called longitudinal study.
 Like a cross-sectional study, a longitudinal study is an observational one.
 The benefits of longitudinal study are that researchers are able to detect
developments or changes in the characteristics of the target population at both
the group and the individual level.
 The key here is that longitudinal studies extend beyond a single moment in
time. As a result, they can establish sequences of events.
 For example
 We might choose to look at the change in cholesterol levels among women over
40 who walk daily for a period of 20 years. The Longitudinal study design


would account for cholesterol levels at the onset of a walking regime and the
walking behaviour continued over time.
 The researcher might want to study employees’ behaviour before and after a
change in the top management, so as to know what effects the change
accomplished.

1.6.6. Difference between cross sectional and longitudinal studies:

The fundamental difference between cross-sectional and longitudinal studies
is that cross-sectional studies take place at a single point in time, whereas a
longitudinal study involves a series of measurements taken over a period of
time.

Action Research:

 The process by which practitioners attempt to study their problems scientifically
in order to guide, correct and evaluate their decisions and actions is called
action research. Action research requires the person who faces the problem
to find a solution for it.
 Examples :
 A bus operator wants to study the profitability of the bus operation in the urban
and rural areas. For this purpose he selects a town route and mofussil route and
ascertains the cost of operation for a month and the collection for the buses. He
can compare the routes to find out the more profitable one. His findings will
reveal that a particular route is more profitable than the other. This is an action
research in the field of business.

Historical Research:

 Historical Research is nothing but objective location, evaluation and synthesis


of evidence in order to establish facts and draw conclusions concerning the past.
 Example


 A study of factors influencing the growth of location for cement plants in Tamil
Nadu is an historical research.

Cross-cultural:

 Studies done across two or more cultures to understand, describe, analyse or
predict phenomena.

Library Research:

 Library Research is conducted with the help of written materials mostly


located in large libraries.
 This research is concerned with the evolution of theories, study involving cause-
and-effect relationship and seeking out significant facts and interpretation of the
past data which are found in journals, reports and directories.

Motivational:

 A particular data gathering technique directed toward surfacing information,


ideas and thoughts that either are not easily verbalized or remain at the
unconscious level in the respondents.
 Motivation research investigates the reasons for human
behaviour.
 The main aim of this type of research is discovering the underlying motives and
desires of human beings by using in-depth interviews.

Conceptual:


 The research related to some abstract idea or theory is known as conceptual


research. Generally, philosophers and thinkers use it for developing new
concepts or for reinterpreting the existing ones.

Empirical:

 Empirical Research exclusively relies on the observation or experience with


hardly any regard for theory and system.
 It is also known as experimental type of research.
 Such research is data based, and it often comes up with conclusions that can be
verified through experiments or observation.

Clinical Research:

 These kinds of research follow case-study methods or in-depth approaches to
reach the basic causal relations.
 Such studies usually go deep into the causes of things or events that interest us,
using very small samples and very deep probing data gathering devices.

1.7. Research Questions/Problems:

 A research problem refers to some difficulty which a researcher experiences in


the context of either a theoretical or practical situation and wants to obtain a
solution for the same.

1.7.1. Sources of research problem:

Most research in the humanities revolves around four Ps:

 People
 Problems


 Programs
 Phenomena

Every research has two aspects;

1) The study population


2) The subject area

Aspects of a study, what they are about, and what is studied:

Study population - People: individuals, organisations, groups, communities. They provide you with the required information, or you collect information from or about them.

Subject area - Problems: issues, situations, associations, needs, population composition, profiles etc. Programs: contents, structure, outcomes, attributes, satisfaction, consumers, service providers etc. Phenomena: cause-and-effect relationships, the study of a phenomenon itself etc. This is the information that you need to collect to find answers to your research questions.

1.7.2. Importance of formulating a Research Problem:

 The formulation of a problem is often far more essential than its solution, which
may be merely a matter of mathematical or experimental skills.
 If one wants to solve a problem, one must generally know what the problem is.
It can be said that a large part of the problem lies in knowing what one is trying
to do.

1.7.3. Steps in the formulation of a research problem:

Step1:


 Identify a broad field or subject area of interest to you.


 Example

 If you are a social work student, inclined to work in the area of youth welfare,
refugees or domestic violence after graduation, you might take to research in
one of these areas. Or if you are studying marketing you might be interested in
researching consumer behaviour. Or, as a student of public health, intending to
work with patients who have HIV/AIDS, you might like to conduct research on
a subject area relating to HIV/AIDS. As far as the research journey goes these
are the broad research areas. It is imperative that you identify one that is of
interest to you before undertaking your research journey.

Step2:

 Dissect the broad area into subareas.


 Example

Step3:

 Select what is of most interest to you


 It is neither advisable nor feasible to study all subareas. Out of this list, the
researcher has to select the subareas about which he is passionate. Once you are
confident that you are passionate about what you have selected and can manage
it, you are ready to go to the next step.

Step4:


 Raise research questions


 At this step you ask yourself, “What is it that I want to find out about in this
subarea?” Within your chosen subarea, first list whatever questions you want to
find answers to.

Step5:

 Formulate objectives
 Formulate your objectives and sub objectives. Your objectives grow out of your
research questions. The main difference between objectives and research
questions is the way in which they are written. Research questions are obviously
that- questions. Objectives transform these questions into behavioural aims by
using action-oriented words such as to find out, to determine, to ascertain, and
to examine.

Step6:

 Assess your objectives


 Now examine your objectives to ascertain the feasibility of achieving them
through your research endeavour. Consider them in the light of the time,
resources (financial and human) and technical expertise at your disposal.

Step7:

 Double check
 Go back and give final consideration to whether or not you are sufficiently
interested in the study, and have adequate resources to undertake it. Ask
yourself, “Am I really enthusiastic about this study? And, “Do I have enough
resources to undertake it?” Answer these questions thoughtfully and
realistically. If your answer to one of them is ‘no’, re-assess your objectives.

1.8. Research Hypothesis:


1.8.1. Definition:

 A Hypothesis can be defined as a logically conjectured relationship between


two or more variables expressed in the form of a testable statement.
 By testing the hypotheses and confirming the conjectured relationships, it is
expected that solutions can be found to correct the problem encountered.
 Example: Employees who are healthier will take sick leave less frequently.

1.8.2. Characteristics of a hypothesis:

 A hypothesis should be simple, specific and conceptually clear.


There is no place for ambiguity in the construction of a hypothesis, as ambiguity
will make the verification of your hypothesis almost impossible. It should be
unidimensional – that is, it should test only one relationship or hunch at a time.
For example
“The average age of the male students in this class is higher than that of the
female students”.
The above hypothesis is clear, specific and easy to test. It tells you what you are
attempting to compare (average age of this class) , which population groups are
being compared ( female and male students), and what you want to establish
(higher average age of the male students).
 Should be capable of verification
Methods and techniques must be available for data collection and analysis.
There is no point in formulating a hypothesis if it cannot be subjected to
verification because there are no techniques to verify it. However, this does not
mean that you should not formulate a hypothesis for which there are no methods
of verification. You might, in the process of doing your research, develop new
techniques to verify it.
 Should be related to the existing body of knowledge.


It is important that your hypothesis emerges from the existing body of


knowledge, and that it adds to it, as this is an important function of research.
This can only be achieved if the hypothesis has its roots in the existing body of
knowledge.
 Should be operationalisable (can be measured)
This means that it can be expressed in terms that can be measured. If it cannot
be measured, it cannot be tested and, hence no conclusions can be drawn.

1.8.3. Functions of hypothesis:

 It guides the direction of the study.


 It identifies facts that are relevant and those that are not.
 It suggests which form of research design is likely to be most appropriate.
 It provides a framework for organizing the conclusions that result.

1.8.4. Types of hypothesis:

(a) Null:
 The null hypothesis is a proposition that states a definitive, exact relationship
between two variables. That is, it states that the population correlation between
two variables is equal to zero. In general, the null hypothesis is expressed as no
significant relationship between two variables or no significant difference
between two groups.
 Example: There is no relationship between age and job satisfaction.
(b) Alternate hypothesis:
 The negation of the null hypothesis is called the alternate hypothesis. (Or) The
complement of the null hypothesis is called the alternate hypothesis. (Or) The
conclusion we accept when the data fail to support the null hypothesis is called
the alternative hypothesis.
 Example: Women are more motivated than men.
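As a hypothetical illustration (not part of the original text), the null hypothesis "there is no difference in motivation between women and men" can be tested against the alternate hypothesis with a two-sample t-test; the scores below are invented.

```python
from scipy.stats import ttest_ind

# Invented motivation scores, for illustration only
women = [7.2, 6.8, 7.5, 8.0, 6.9, 7.7, 7.4]
men   = [6.5, 6.9, 6.2, 7.0, 6.4, 6.8, 6.6]

# H0 (null): mean motivation of women = mean motivation of men
# H1 (alternate): the means differ
t_stat, p_value = ttest_ind(women, men)

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject the null hypothesis")
```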

(c) Crude hypothesis:


 A crude hypothesis is formed to initiate the process of research. When the
researcher is commencing his research work, he needs some guiding focus.
 For this purpose he might develop a hypothesis based on the available
evidence or data. It does not lead to higher theoretical research in the nature of a
law or a theory. Such a hypothesis is called a crude hypothesis.
 For examples, in public sector transport undertakings, operation of
uneconomical routes is the reason for poor revenue realization. In this
hypothesis, it is believed that the operation of uneconomical routes affects revenue
collection. But in reality, it may be the operation of routes with poor passenger
response that might cause poor revenue. But the researcher needs a starting
point and so he develops a crude hypothesis.

d. Refined hypothesis;

 Refined hypothesis is one, which is more significant in research and the degree
of significance depends on the level of abstraction.
 The refined hypothesis may be hypothesis that state the existence of empirical
uniformities, hypothesis that are concerned with complex ideal types and
hypothesis that are concerned with relation of analytical variables.
 For example, a hypothesis relating “reduction of tax rates” and “extent of evasion”
would have been studied before being formulated.

e. Working hypothesis;

 Working hypothesis is usually formed in the process of verifying the


relationship among various variables included in research.
 It provides useful guidelines to the researcher in determining the nature of data to
be collected, the volume of data required, the sampling technique to be used, the
analytical tools to be selected, etc.

 Once the necessary data or facts are collected for the purpose of empirical
verification, this type of hypothesis becomes redundant.
 For example, ‘monetary incentives act as great motivators’ may be a
hypothesis formulated to facilitate focus on the collection of data regarding
monetary incentives and how this had improved the production or sales, etc., in
a specific environment.
 Once these data are collected, they could be analyzed and based on that, a
correct hypothesis may be formulated. In such a situation, the original
hypothesis becomes redundant.

f. Statistical hypothesis;

 Statistical hypotheses are those which are formulated based on sample data
or facts.
 They serve the usual purpose of testing any expected relationship among
variables.
 Once these hypotheses are tested or verified, the conclusion about the
population is drawn.
 For example, with sample data, when a tentative statement is made it is tested
for acceptance or rejection. Once it is accepted with the sample data, it is used
for making inference and drawing conclusions.
 For example, in a steel factory, a very large quantity of iron rods is cut to
specific size.
 A few samples are selected over a few days and measured for their accuracy in
size.
 Suppose the sample test reveals that there is no significant difference in the size
of iron rods cut.
 Then, on this basis it may be inferred that in the bulk of iron rods cut, there will
be no significant difference in size.


 Similarly, by studying the consumption behaviour of sample respondents,


inference regarding the consumption behaviour of people in a state could be
inferred.

1.8.5. Errors in testing a hypothesis:

(a) Type 1 error: Rejection of null hypothesis when it is true.


(b) Type 2 error: Acceptance of null hypothesis when it is false.
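A small simulation (hypothetical, not from the original text) makes the two errors concrete: repeatedly sampling when the null hypothesis is actually true shows how often it is wrongly rejected (Type 1), and sampling when it is actually false shows how often it is wrongly accepted (Type 2).

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha, trials = 0.05, 2000

# Type 1 error: the null hypothesis is true (both groups share the same mean),
# but the test rejects it anyway.
type1 = sum(ttest_ind(rng.normal(50, 10, 30), rng.normal(50, 10, 30)).pvalue < alpha
            for _ in range(trials)) / trials

# Type 2 error: the null hypothesis is false (the means really differ),
# but the test fails to reject it.
type2 = sum(ttest_ind(rng.normal(50, 10, 30), rng.normal(55, 10, 30)).pvalue >= alpha
            for _ in range(trials)) / trials

print(f"Type 1 error rate ~ {type1:.2f} (close to alpha = {alpha})")
print(f"Type 2 error rate ~ {type2:.2f}")
```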

1.8.6. What is a strong hypothesis?

A strong hypothesis should fulfil three conditions.

 Adequate for its purpose.


 Testable
 Better than its rivals.

Process of Hypothesis Testing (flowchart)

1.9. Characteristics of good research:


 Purpose clearly defined.
 Research process detailed.
 Research design thoroughly planned.
 High ethical standards applied.
 Limitations frankly revealed.
 Adequate analysis for decision maker’s needs.
 Findings presented unambiguously.
 Conclusions justified.
 Researcher’s experience reflected.

1.10. Hallmarks of research:

The hallmarks or main distinguishing characteristics of scientific research may


be listed as follows.

1. Purposiveness
2. Rigor
3. Testability
4. Replicability
5. Precision and confidence.
6. Objectivity
7. Generalizability
8. Parsimony

1.11. The role of theories in research

Theories play crucial roles in the development of research:

(1) Organizing knowledge and explaining laws


(2) Predicting new laws
(3) Guiding research


Organizing knowledge and explaining laws:

 First theories serve to organize knowledge and explain laws.


 In the absence of a theory we simply have a collection of description and some
laws.
 The theory pulls these together into a unified framework.
 The better the theory, the more events and laws it can explain.

Predicting new laws:

 The second role of theories is to predict new laws.


 A fruitful theory not only explains many different laws that were previously
unrelated but also suggests places to look for new laws.

Guiding research:

 Theories also serve to guide research.


 A good theory suggests new experiments and helps researchers choose
alternative ways of performing them.
 When researchers use a theory to predict a new law, they also use the theory to
suggest new experiments to perform in order to establish the new law.


Unit 2 - Research Design and Measurement

Research design - definition - types of research design - exploratory and
causal research design - descriptive and experimental design - different
types of experimental design - validity of findings - internal and external


2.1 Research design:

2.1.1 Definition:

 Research design constitutes the blueprint for the collection, measurement and
analysis of data.
 Research design aids the researcher in the allocation of limited resources by
posing crucial choices in methodology.
 Research design is the plan and structure of investigation conceived so as to obtain
answers to research questions. The plan is the overall scheme or program of the
research. It includes an outline of what the investigator will do, from writing
hypotheses and their operational implications to the final analysis of data.
 Research design expresses both the structure of the research problem - the framework,
organization or configuration of the relationships among variables of a study -
and the plan of investigation used to obtain empirical evidence on those
relationships.

From the definitions, it is found that the essentials of research design are:

 An activity and time based plan.
 A plan always based on the research question.
 A guide for selecting sources and types of information.
 A framework for specifying the relationship among the study’s variables.
 A procedural outline for every research activity.

2.2. Experimental designs:

 A set of procedures for devising an experiment such that a change in a
dependent variable may be attributed solely to the change in independent
variables.

2.2.1. Various notations used in experimental design:


 X represents the introduction of an experimental stimulus to a


group. The effect of this independent variable(s) is of major interest. The
manipulation or change of an independent variable.

 O identifies a measurement or observation activity.

 R indicates that the group members have been randomly assigned


to a group.

 E experimental effect: that is, the change in the dependent


variable due to the independent variable.

2.2.2. Illustrated example:

 A supermarket intended to determine the effect of change in packaging style


(independent variable) on sales of mangoes (dependent variable) through
experimentation.
 At the time of the decision, the store sold the produce in pre-weighed packs
containing two mangoes.
 After recording the sales of mangoes in this manner management changed
(manipulates the independent variable) the packaging system and started selling
the mangoes from open produce bins.
 The change yielded better sales figures.

Question: “Did the change from selling in packs of two to free selection from
produce bins cause this sales increase?”

In answering this question, the following questions need to be answered:

 Could there be other variables that could have affected mango sales?
 What would happen to the sales if the weather changed from rainy to fair?
 Did the change take place during a festive season?


 In this example, weather and the onset of the festive season etc. may be viewed
as extraneous variables, having an effect on the dependent variable. However,
these are not independent variables.

 This example clearly shows that isolating the effects of independent variables
on dependent variables without controlling for the effects of the extraneous
variables is very difficult.

Experimental designs help to accomplish this task.

Experimental Design: the mango example

• Divide the 16 supermarkets into two equivalent groups of 8 - one control


group, the other experimental group.

• In the shops in the control group, DO NOT CHANGE the packaging style;
in the experimental group, make the change.

• Measure the sales for both groups before the experiment date and after
the experiment date.

• Assume that the difference in the two groups are as below:

After Before Difference

Control group 30,720 units [O4] 27,980 [O3] 2,740 [O4 - O3]

Experimental group 31,688 [O2] 27,816 [O1] 3,872 [O2 - O1]

Sales increase due to new system 1,132


Change = (O2 - O1) - (O4 - O3)
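A small arithmetic sketch using the figures from the table above shows how the before-after change in the control group is subtracted from that of the experimental group to isolate the effect of the new packaging.

```python
# Observations from the mango experiment above
O1, O2 = 27_816, 31_688   # experimental group: before, after
O3, O4 = 27_980, 30_720   # control group: before, after

# Effect of the new packaging = experimental change minus control change
change = (O2 - O1) - (O4 - O3)
print(change)   # 1132 units, the 'sales increase due to new system'
```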

2.3. Classification of experimental design:

The various experimental designs are as follows:

 “Quasi-” and “true” experimental designs


 Purely post-design
 Before-after design
 Factorial design
 Latin square design
 Ex-post facto design

Quasi and True experimental design

 Quasi designs: designs which do not properly control for the


effects of extraneous variables.

 True designs: designs which properly control for the effects of


extraneous variables and isolate the effects of independent variables on the
dependent variables

Purely post-design

 In this design, the dependent variable is measured after exposing the test units to
the experimental variable.
 Example
 Assume M/s Hindustan Lever Ltd wants to conduct an experiment on the
“Impact of free samples on the sale of toilet soaps”. A sample of toilet soap is
mailed to selected customers in a locality. After one month, a coupon of 25
paise off on one cake of soap is mailed to each customer to whom free samples
were sent earlier. An equal number of these coupons are also mailed to people
in another locality in the neighbourhood. The coupons are coded to keep an
account of the number of coupons redeemed from each locality. Suppose, 400


coupons were redeemed from the experimental group and 250 coupons were
redeemed from the control group. The difference of 150 is supposed to be the
effect of free samples. In this method, the conclusion can be drawn only after
conducting the experiment.

Before – After Design

 In this method, measurements are made before as well as after exposure to the experimental treatment.
 Example:
 Let us say that, an experiment is conducted to test an advertisement which is
aimed at reducing alcoholism. Attitudes and perceptions towards consuming
liquor are measured before exposure to the advertisement. The group is exposed
to an advertisement, which tells them the consequences, and their attitudes are
again measured after several days. The difference, if any, shows the
effectiveness of that advertisement.

The above example of “Before-after” suffers from validity threat due to the
following.

 Before Measure Effect

It alerts the respondents to the fact that they are being studied. The respondents
may discuss the topics with friends and relatives and modify their
behaviour accordingly.

 Instrumentation Effect

This can be due to two different instruments being used – one before and one
after. A change in the interviewers before and after also results in the
instrumentation effect.

Factorial Design


 Factorial design permits the researcher to test two or more variables at the same
time. Factorial design helps to determine the effect of each of the variables and
measure the interacting effect of many variables.
 Example:
 A departmental store wants to study the impact of a price reduction on products.
Given that there is also point-of-purchase (POP) promotion being carried out in
the stores (a) near the entrance (b) at the usual place, at the same time. Now
assume that there are two price levels, namely regular price A1 and reduced price
A2. Let there be three types of POP, namely B1, B2, and B3. There are 3×2=6
combinations possible. The combinations possible are B1A1, B1A2, B2A1, B2A2,
B3A1, B3A2. Which of these combinations is best suited is what the researcher is
interested in. Suppose there are 60 departmental stores of the chain, divided into
groups of 10 stores each. Now, randomly assign the above combinations to each
of these groups as follows:

Combinations Sales
B1A1, S1
B1A2 S2
B2A1 S3
B2A2 S4
B3A1 S5
B3A2 S6

S1 to S6 represents the sales resulting from each variable. The data gathered will
provide details on product sales on account of two independent variables.
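As a small illustrative sketch (not part of the original text), the six treatment combinations of the 3×2 factorial example above can be enumerated and randomly assigned to groups of stores as follows.

```python
import random
from itertools import product

# Two price levels and three POP display types, as in the example above
prices   = ["A1", "A2"]          # regular price, reduced price
displays = ["B1", "B2", "B3"]    # three point-of-purchase display types

# 3 x 2 = 6 treatment combinations
combinations = [f"{b}{a}" for a, b in product(prices, displays)]
print(combinations)   # ['B1A1', 'B2A1', 'B3A1', 'B1A2', 'B2A2', 'B3A2']

# Randomly assign 60 stores to the 6 combinations, 10 stores per combination
stores = list(range(1, 61))
random.shuffle(stores)
assignment = {combo: stores[i * 10:(i + 1) * 10]
              for i, combo in enumerate(combinations)}
```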

The questions that will be answered are,

Is the reduced price more effective than regular price?

Is the display at the entrance more effective than regular price?

Is the display at the entrance more effective than the display at the usual
location?

Also, the research will tell us about the interaction effect of the two
variables.

Outcome of this experiment on sales is as follows:

1. Price reduction with display at the entrance.


2. Price reduction with display at the usual place.
3. No display and regular price applicable.
4. Display at the entrance with regular price applicable.

Latin Square Design

 The researcher chooses three shelf arrangements in three stores. He would like
to observe the sales generated in each of these stores at different periods. The
researcher must make sure that one type of shelf arrangement is used in each
store only once.
 In the Latin Square Design, only one variable is tested. As an example of Latin
Square design, assume that a supermarket chain is interested in the effect of in-
store promotion on sales. Suppose there are 3 promotions considered as follows:
1. No promotion
2. Free sample with demonstration
3. Window display.
Which of the three will be effective? The outcome may be affected by the size
of the stores and the time period. If we choose three stores and three time
periods, the total number of combinations is 3×3=9. The arrangement is as
follows:

Time period    Store 1    Store 2    Store 3
1              B          C          A
2              C          A          B
3              A          B          C
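A minimal sketch (illustrative only) builds the same 3×3 arrangement by rotating the first row, so that each promotion appears exactly once in every store and in every time period.

```python
# A = no promotion, B = free sample with demonstration, C = window display
treatments = ["B", "C", "A"]   # first row of the Latin square shown above

# Each subsequent time period rotates the assignment by one position
latin_square = [treatments[i:] + treatments[:i] for i in range(3)]
for period, row in enumerate(latin_square, start=1):
    print(f"Time period {period}: stores 1-3 get {row}")

# Latin square property: each treatment appears once per store (column)
assert all(len({row[col] for row in latin_square}) == 3 for col in range(3))
```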


Ex-post Facto Design

 This is a variation of the “after only design”. The groups, such as experimental and
control, are identified only after they are exposed to the experiment.
 Let us assume that a magazine publisher wants to ascertain the impact of
advertisement on knitting in Women’s Era periodical. The subscribers were
asked whether they have seen this advertisement on ‘knitting’. Those who have
read and not read were asked about the price, design, etc, of the product. The
difference indicates the effectiveness of the advertisement. In this design, the
experimental group is set to receive the treatment rather than exposing it to the
treatment by its choice.

2.4. Classification of designs

The following table classifies research designs using nine different descriptors.

Category: Options

The degree to which the research question has been crystallized: Exploratory study; Formal study.
The method of data collection: Monitoring; Communication study.
The power of the researcher to produce effects in the variables under study: Experimental; Ex post facto.
The purpose of the study: Causal; Descriptive.
Number of contacts: Cross-sectional; Longitudinal.
The topical scope (breadth and depth) of the study: Case; Statistical study.
The research environment: Field setting; Laboratory research; Simulation.
The participant's perceptions of research activity: Actual routine; Modified routine.
Based on reference period: Retrospective; Prospective; Retrospective-Prospective.

2.4.1. Exploratory studies:

Exploratory studies tend toward loose structures with the objective of


discovering future research tasks. The immediate purpose of exploration is
usually to develop hypothesis or questions for further research.

2.4.2. Formal study:

It begins where the exploration leaves off: it begins with a hypothesis or
research question and involves precise procedures and data source specifications.
The goal of formal research design is to test the hypothesis or answer the
research questions posed.

2.4.3. Monitoring:

The studies in which the researcher inspects the activities of a subject or the
nature of some material without attempting to elicit responses from anyone.

Ex: Traffic counts at an intersection, a search of the library collection, an
observation of the actions of a group of decision makers.

2.4.4. Communication study:

In Communication study, the researcher questions the subjects and collects


their responses by personal or impersonal means.

The collected data may result from

(1) Interview or telephone conversations.


(2) Self-administered or self-reported instruments sent through the mail.

2.4.5. Experiment:

 The researcher attempts to control and manipulate the variables in the study.
 Experimental design is appropriate when one wishes to discover whether certain
variables produce effects in other variables.

2.4.6. Ex post facto design:

 With an ex post _facto design, investigators have no control over the variables
in the sense of being able to manipulate them. They can only report what has
happened or what is happening.

2.4.7. Descriptive:

If the research is concerned with finding out who, what, where, when or how
much then the study is descriptive.

Ex: Research on `crime’ is descriptive when it measures the types of crimes


committed, how often, when, where and by whom.

2.4.8. Causal:


If the research is concerned with learning ‘why’ that is, how one variable
produces changes in another, it is causal.

Ex: Why is the crime rate higher in city A than in city B?

2.4.9. Cross-sectional:

Cross-sectional studies are carried out once and represent a snapshot of one
point in time.

2.4.10. Longitudinal studies:

These are the studies in which an event or occurrence is measured again and
again over a period of time. This is also known as Time- Series -Study.

2.4.11. Statistical Studies:

 They attempt to capture a population’s characteristics by making inferences


from a sample’s characteristics.
 Hypotheses are tested quantitatively.
 Generalizations about their findings are presented based on the
representativeness of the sample and the validity of the design.

2.4.12. Simulation:

 An alternative to lab and field experimentation currently being used in business


research is simulation.
 Simulation uses a model-building technique to determine the effects of changes,
and computer-based simulations are becoming popular in business research.
 A simulation can be thought of as an experiment conducted in a specially
created setting that very closely represents the natural environment in which
activities are usually carried on.


 In that sense, the simulation lies somewhere between a lab and a field
experiment, insofar as the environment is artificially created but not far different
from “reality”.
 For example, in the study by Koolstra and Beentijes (1999), elementary
students participated in different television-based treatments in vacant school
rooms similar to their actual classrooms.

Comparison between exploratory, descriptive and causal research design:

S.No. 1 - Objectives: Exploratory - gather background information, define terms, clarify problems and hypotheses, establish research priorities. Descriptive - describe and measure market phenomena, characteristics or functions of interest. Causal - establish causality, develop if-then statements.

S.No. 2 - Characteristics: Exploratory - relatively simple, versatile and flexible; often the first phase of a multiple research design; unstructured. Descriptive - prior formulation of specific hypotheses; pre-planned and structured design. Causal - manipulation of one or more independent variables; pre-planned and structured design; control of other mediating variables.

S.No. 3 - Methods: Exploratory - secondary data analysis, qualitative research, expert surveys, pilot surveys. Descriptive - secondary data analysis, surveys, panels, observational and other data. Causal - experiments: laboratory, field, test marketing.

S.No. 4 - Results/findings: Exploratory - tentative. Descriptive - conclusive. Causal - conclusive.

2.4.13. Retrospective study design

Retrospective study design investigates a phenomenon, situation, problem or


issue that has happened in the past. They are usually conducted either on the
basis of the data available for that period or on the basis of respondents’ recall
of the situation.

Examples:

 A historical analysis of migratory movements in Eastern Europe between 1915


and 1945.
 The relationship between levels of unemployment and street crime.
 The living conditions of Tamilians in Sri Lanka in the early 20th century.

2.4.14. The prospective study design:


The prospective studies refer to the likely prevalence of a phenomenon,
situation, problem, attitude or outcome in the future.

Examples:

 To determine, under field conditions, the impact of maternal and child health
services on the level of infant mortality.
 To establish the effects of a counselling service on the extent of marital
problems.
 To find out the effect of parental involvement on the level of academic
achievement of their children.

2.4.15. The retrospective- prospective study design:

 Studies focus on past trends in a phenomenon and study it into the future.
 In a retrospective-prospective study a part of the data is collected
retrospectively from the existing records before the intervention is introduced
and then the study population is followed to ascertain the impact of the
intervention
 Example:
 Trend studies.

2.4.16. Some other types of design:

a) Cohort studies:

 Cohort studies are based upon the existence of a common characteristic such as
year of birth, graduation or marriage, within a subgroup of a population.
 Example:
 Suppose you want to study the employment patterns of a batch of accountants
who graduated from a university in 1975 or to study the fertility behaviour of
women who were married in 1930.

b) Blind studies


 In a blind study, the study population does not know whether it is getting real or
fake treatment.
 The main objective of designing a blind study is to isolate the placebo effect.
 The placebo effect is the psychological effect on the recovery process of a
patient’s knowledge that he/she is receiving the treatment.

c) Double-Blind studies:

 In a double blind study neither the researcher nor the study participants know
who is receiving real and who is receiving fake treatment.
 Example:
 Pharmaceutical companies experimenting with the efficacy of newly developed
drugs in the prototype stage ensure that the subjects in the experimental and
control groups are kept unaware of who is given the drug, and who the placebo.
Such studies are called blind studies.

d) Solomon Four Group design:

 The experimental design that sets up two experimental groups and two control
groups, subjecting one experimental group and one control group to both the
pre-test and the post test, and the other experimental and control group to only
the post test.

Group Pre-test Treatment Post test


1. Experimental O1 X O2
2. Control O3 O4
3. Experimental X O5
4. Control O6
Treatment effect (E) could be judged by:

E = (O2 - O1)
E = (O2 - O4)
E = (O5 - O6)
E = (O5 - O3)
E = [(O2 - O1) - (O4 - O3)]

If all Es are similar, the cause-and-effect relationship is highly valid.
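A small numerical sketch (with invented observation values) shows how the five estimates of the treatment effect would be computed and compared.

```python
# Invented observations: O1, O3 are pre-tests; O2, O4, O5, O6 are post-tests
O1, O2 = 40, 55   # experimental group with pre-test
O3, O4 = 41, 42   # control group with pre-test
O5, O6 = 54, 41   # experimental and control groups without pre-test

estimates = {
    "O2 - O1":               O2 - O1,
    "O2 - O4":               O2 - O4,
    "O5 - O6":               O5 - O6,
    "O5 - O3":               O5 - O3,
    "(O2 - O1) - (O4 - O3)": (O2 - O1) - (O4 - O3),
}
for label, effect in estimates.items():
    print(f"E = {label} = {effect}")
# Broadly similar values across the estimates support a valid cause-and-effect relationship
```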

2.5. Validity of Findings

2.5.1. Internal validity:

 Refers to the confidence we place in the cause-and-effect relationship.


 In other words, it addresses the question, “To what extent does the research
design permit us to say that the independent variable A causes a change in the
dependent variable B?”

2.5.2 External validity:

 Refers to the extent of generalizability of the results of a causal study to


other settings, people or events

2.5.3 Factors affecting internal validity:

 History
 Maturation
 Testing
 Instrumentation
 Selection

 Statistical regression
 Experimental mortality

History:

 History refers to those events which are external to the experiment, but occur at
the same time as experiment is being conducted. This may affect the result.
 Example
 Let us suppose that a manufacturer makes a 20% cut in the price of a product and monitors sales in the coming weeks.
 The purpose of the research is to learn about the impact of price on sales.
 Meanwhile, if production of the product declines due to a shortage of raw materials, then sales will not increase.
 We therefore cannot conclude that the price cut had no influence on sales, because an external event (history) occurred during the period and we could not control it; such an event can only be identified.

Maturation

 Maturation refers to the changes occurring within the test units and not due to
the effect of the experiment.
 Maturation takes place due to passage of time.
 It refers to the effect of people growing older.
 Persons who use a particular product may discontinue using that product and
may switch over to an alternate product.
 Example 1:
 Pepsi is consumed when people are young. Due to passage of time, the
consumer might prefer to consume Diet Pepsi or even avoid it altogether.
 Example2:
 Assuming that a training programme is conducted for salesman, the company
wants to measure the impact of its sales programme. If the company finds that


the sales have improved, it may not be due to its training programme. It may be
because their salesmen have gained more experience now and know the
customer better. Better understanding between salesmen and customer may be
the reason for increased sales.
 The maturation effect is not limited to test units composed of people alone. Organisations also change: dealers grow, become more successful, diversify, and so on.

Testing

 The pre-testing effect occurs when the same respondents are measured more than once. Responses given during an earlier measurement have a direct bearing on the responses given at a later stage.
 Example:
 Consider a respondent, who is given an initial questionnaire, intended to
measure brand awareness.
 After examining him, if a second questionnaire similar to the initial
questionnaire is given to the respondent, he will respond quite differently,
because of the respondent’s familiarity with the earlier questionnaire.

Instrumentation (instrument variation):

 Instrument variation effect is a threat to internal validity when human respondents are involved.
 Example:
 Equipment such as a vacuum cleaner is left behind for the customer's use for two weeks. After two weeks, respondents are given a questionnaire to answer. The replies may be quite different from those given before the trial of the product. This may be because of two reasons:
1) Some of the questions have been changed.
2) The interviewers for the pre-testing and post-testing periods are different.


 The measurement in experiments will depend upon the instrument used for
measurements. Also, results may vary due to the application of instruments,
where there are several interviewers. Thus, it is very difficult to ensure that all
the interviewers will ask the same questions with the same tone and develop the
same rapport. There may be difference in response, because each interviewer
conducts the interview differently.

Experimental Mortality

 Some members may leave the original group and some new members may join
the old group. This is because some members might migrate to another
geographical area. This change in composition of the members will alter the
composition of the group itself.
 Example:
 Assume that a vacuum cleaner manufacturer wants to introduce a new version.
He interviews hundred respondents who are currently using the older version.
Let us assume that, these 100 respondents have rated the existing vacuum
cleaner on a 10 point scale (1 for lowest and 10 for highest). Let the mean rating
of the respondents be 7.
 Now the newer version is demonstrated to the same hundred respondents and
the equipment is left with them for two months. At the end of two months, only
80 participants respond, since the remaining 20 refused to answer. Now the
mean score of the 80 respondents is 8 on the same 10 point scale. From this, can we
conclude that the new vacuum cleaner is better?
 The answer to the above question depends on the composition of 20 respondents
who dropped out. Suppose the 20 respondents who dropped out displayed
negative reaction to the product, then the mean score would not have been 8. It
would have been even lower than 7. The difference in mean rating does not give
the true picture. It does not indicate that the new product is better than the old
one.


One might wonder why we should not simply drop the 20 respondents from the original group, calculate the mean rating of the remaining 80, and compare the two. But this method will not remove the mortality effect either. The mortality effect can occur in an experiment irrespective of whether human beings are involved or not.

2.6. The external validity

 The external validity refers to the degree to which the results of an experiment can be generalised beyond the experimental situation to other populations.

2.6.1 Threats to external validity

 "A threat to external validity is an explanation of how you might be wrong in


making a generalization."
 Generally, generalizability is limited when the cause (i.e. the independent
variable) depends on other factors; therefore, all threats to external validity
interact with the independent variable.

 Aptitude-Treatment-Interaction:

 The sample may have certain features that may interact with the independent
variable, limiting generalizability.
 For example, inferences based on comparative psychotherapy studies often
employ specific samples (e.g. volunteers, highly depressed, no comorbidity). If
psychotherapy is found effective for these sample patients, will it also be
effective for non-volunteers or the mildly depressed or patients with concurrent
other disorders?

 Situation:


All situational specifics (e.g. treatment conditions, time, location, lighting, noise, treatment administration, investigator, timing, scope and extent of measurement, etc.) of a study potentially limit generalizability.

 Pre-Test Effects:

If cause-effect relationships can only be found when pre-tests are carried out,
then this also limits the generality of the findings.

 Post-Test Effects:

If cause-effect relationships can only be found when post-tests are carried out,
then this also limits the generality of the findings.

 Reactivity (Placebo, Novelty, and Hawthorne Effects):


 If cause-effect relationships are found they might not be generalizable to
other settings or situations if the effects found only occurred as an effect of
studying the situation.

 Rosenthal Effects:

Inferences about cause-consequence relationships may not be generalizable to other investigators or researchers.

2.7. Variables in Research:

 A variable is anything that can take on differing or varying values.


 The values can differ at various times for the same object or person or at the
same time for different objects or persons.
 Examples
 Production Units
 Absenteeism


 Motivation.

2.7.1. Types:

From the view point of causation

1. The dependent variable (criterion)
2. The independent variable (predictor)
3. The moderating variable
4. The intervening variable

The dependent variable:

 The dependent variable is the variable of primary interest to the researcher.


 The researcher’s goal is to understand and describe the dependent variable or to
explain its variability or predict it.
 Example
 A manager is concerned that the sales of a new product, introduced after test marketing, do not meet his expectations.
 The dependent variable here is sales.


 Since the sales of the product can vary – can be low, medium or high – it is a
variable.
 Since sales is the main focus of interest to the manager, it is the dependent variable.

Independent variable

 An independent variable is one that influences the dependent variable in either a positive or negative way.
 That is, when the independent variable is present, the dependent variable is also
present and with each unit of increase in the independent variable, there is an
increase or decrease in the dependent variable also.
 Example
 Research studies indicate that successful new product development has an
influence on the stock market price of the company. Therefore the success of
the new product is the independent variable and stock market price the
dependent variable.

New product success → Stock market price

Moderating Variable: (Extraneous)

 The moderating variable is one that has a strong contingent effect on the
independent variable – dependent variable relationship. That is the presence of a


third variable (mv) modifies the original relationship between the independent
and dependent variables.
 (e.g.) A prevalent theory is that the diversity of the workforce (comprising people of different ethnic origins, races and nationalities) contributes more to organizational effectiveness, because each group brings its own special expertise and skills to the workplace. This synergy can be exploited, however, only if managers know how to harness the special talents of the diverse work group; otherwise they will remain untapped.

In the above scenario:

Organizational effectiveness – DV

Workforce diversity – IV

Managerial expertise – MV

Workforce diversity → Organisational effectiveness, moderated by managerial expertise.

Intervening Variable:

 An intervening variable is one that surfaces between the time the independent variables start operating to influence the dependent variable and the time their impact is felt on it.


From the view point of study design:

Active Variables:

 Those variables that can be manipulated, changed or controlled.

Attribute variables:

 Those variables that cannot be manipulated, changed or controlled, and that reflect the characteristics of the study population, e.g. age, gender, education and income.

From the view point of unit of measurement

Categorical variable:

 These are measured on nominal or ordinal measurement scales.

Categorical variables may be constant, dichotomous or polytomous.

Constant:

 When a variable can have only one value or category, for example taxi, tree or water, it is known as a constant variable.

Dichotomous:

 When the variable can have only two categories as in yes/no, good/bad it is
known as dichotomous variable.

Polytomous:


 When a variable can be divided into more than two categories, for example religion (Christian, Muslim, Hindu) or political party (Labour, Liberal, Democrat), it is a polytomous variable.

Continuous variable:

 These are measured on an interval or a ratio scale.

E.g. Age, income and attitude score

2.8. Measurement and scaling

2.8.1. Definition of measurement:

 Measurement in research consists of assigning numbers to empirical events, objects or properties, or activities in compliance with a set of rules. This definition implies that measurement is a three-part process (a brief illustration follows the three steps below):
(1) Selecting observable empirical events.
(2) Developing a set of mapping rules: a scheme for assigning numbers or symbols
to represent aspects of the event being measured.
(3) Applying the mapping rule(s) to each observation of that event.
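As a small illustration of these three steps (using a hypothetical mapping rule, not one prescribed in the text), consider coding the recorded responses to an attitude statement:

# Step 1: observable empirical events (recorded responses).
observed_responses = ["agree", "strongly agree", "disagree", "agree"]

# Step 2: a mapping rule assigning numbers to each possible response.
mapping_rule = {
    "strongly disagree": 1, "disagree": 2, "neutral": 3,
    "agree": 4, "strongly agree": 5,
}

# Step 3: apply the mapping rule to each observation.
scores = [mapping_rule[response] for response in observed_responses]
print(scores)   # [4, 5, 2, 4]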

2.8.2. Scaling

 Scaling is the assignment of numbers or symbols to an indicant of a property of objects in order to impart some of the characteristics of the numbers to the property; the numbers are assigned according to a value or magnitude.

2.8.3. Scale:

 A tool or mechanism by which individuals, events, or objects are distinguished on the variables of interest in some meaningful way.

2.8.4. Scaling techniques:

There are four kinds of scales namely:


(1) Nominal scale


(2) Ordinal scale
(3) Interval scale
(4) Ratio scale

Nominal scale:

 In this scale, numbers are used to identify the objects.

Example:

 University registration numbers assigned to students.
 Numbers on players' jerseys.
 Telephone numbers are another example of a nominal scale, where one number is assigned to one subscriber.

Ordinal scale:

 An ordinal scale not only categorizes the variables in such a way as to denote
the differences among the various categories, it also rank-orders the categories
in some meaningful way

Example:

Job characteristic                                   Ranking of importance
The opportunity provided by the job to:
  Interact with others                               ------
  Use a number of different skills                   ------
  Complete a whole task from beginning to end        ------
  Serve others                                       ------
  Work independently                                 ------
Interval scale:


 The interval scale is more powerful than the nominal and ordinal scales. The distances given on the scale represent equal distances on the property being measured. An interval scale may tell us "how far apart the objects are with respect to an attribute", and it allows us to perform certain mathematical operations on the data collected from the respondents.

Example:

Suppose we want to measure the rating of a refrigerator using an interval scale. It will appear as follows:

(1) Brand name            poor ------------------------------------------ good
(2) Price                 high ------------------------------------------ low
(3) Service after sales   poor ------------------------------------------ good
(4) Utility               poor ------------------------------------------ good

Ratio scale:

 Ratio scale is a special kind of interval scale that has a meaningful zero point.
With this scale length, weight or distance can be measured. In this scale, it is
possible to say, how many times greater or smaller one object is being
compared to the other.

Example:

Sales this year for product A are twice the sales of the same product last year.
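To make the distinction between the four scales concrete, the following Python sketch (hypothetical numbers only) shows the kind of summary each scale sensibly permits:

import statistics

jersey_numbers = [7, 10, 23, 7]    # nominal: numbers only identify; counting/mode is meaningful
ranks = [1, 2, 3, 4, 5]            # ordinal: order matters; the median is meaningful
ratings = [2, 4, 5, 3]             # interval: equal distances; the mean is meaningful
sales = [120, 240]                 # ratio: true zero point; ratios are meaningful

print(statistics.mode(jersey_numbers))   # most frequent category
print(statistics.median(ranks))          # middle rank
print(statistics.mean(ratings))          # average rating
print(sales[1] / sales[0])               # 2.0 -> "twice last year's sales"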

2.8.5. Rating Scales

The following rating scales are often used in organizational research:

 Dichotomous scale
 Category scale

 Likert scale
 Numerical scales
 Semantic differential scale
 Itemized rating scale
 Fixed or constant sum rating scale
 Stapel scale
 Graphic rating scale
 Consensus scale

Dichotomous Scale

 The dichotomous scale is used to elicit a Yes or No answer, as in the example below. Note that a nominal scale is used to elicit the response.

E.g.

Do you own a car? Yes No

Category Scale

 The category scale uses multiple items to elicit a single response as per the
following example. This also uses the nominal scale.

E.g.

Where in northern California do you reside?

1. North Bay
2. South Bay
3. East Bay
4. Peninsula


5. Other

Likert Scale

 The Likert scale is designed to examine how strongly subjects agree or disagree
with statements on a 5-point scale with the following anchors:

Strongly Disagree   Disagree   Neither Agree Nor Disagree   Agree   Strongly Agree
        1               2                  3                  4            5

The responses over a number of items tapping a particular concept or variable (as per the following example) are then summated for every respondent (a brief summation sketch follows the example below). This is an interval scale, and the differences in the responses between any two points on the scale remain the same.

Using the preceding Likert scale, state the extent to which you agree with
each of the following statements:

My work is very interesting 1 2 3 4 5


I am not engrossed in my work all day 1 2 3 4 5
Life without my work will be dull 1 2 3 4 5
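A small Python sketch (hypothetical responses for one respondent) of how such Likert items could be summated; the negatively worded item is reverse-scored before summation, a common practice when both positively and negatively worded items are used:

responses = {
    "My work is very interesting": 4,
    "I am not engrossed in my work all day": 2,   # negatively worded item
    "Life without my work will be dull": 5,
}
negatively_worded = {"I am not engrossed in my work all day"}

# Reverse-score negatively worded items on a 5-point scale, then summate.
total = sum(
    (6 - score) if item in negatively_worded else score
    for item, score in responses.items()
)
print(total)   # 4 + (6 - 2) + 5 = 13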

Semantic Differential Scale

Several bipolar attributes are identified at the extremes of the scale, and
respondents are asked to indicate their attitudes, on what may be called a
semantic space, toward a particular individual, object, or event on each of the
attributes. The bipolar adjectives used, for instance, would employ such terms
as Good-Bad; Strong-Weak; Hot-Cold. The semantic differential scale is used


to assess respondents’ attitudes toward a particular brand, advertisement, object,


or individual. The responses can be plotted to obtain a good idea of their
perceptions. This is treated as an interval scale. An example of the semantic
differential scale follows.

e.g.

Responsive - - - - - - Unresponsive
Beautiful - - - - - - Ugly
Courageous - - - - - - Timid

Numerical Scale

 The numerical scale is similar to the semantic differential scale, with the
difference that numbers on a 5 point or 7 point scale are provided, with bipolar
adjectives at both ends, as illustrated below. This is also an interval scale.

E.g.

How pleased are you with your new real estate agent?

Extremely Pleased 7 6 5 4 3 2 1 Extremely Displeased

Itemized Rating Scale

 A 5-point or 7-point scale with anchors, as needed, is provided for each item, and the respondent states the appropriate number beside each item, or circles the relevant number against each item, as per the example that follows. The responses to the items are then summated. This uses an interval scale.

Respond to each item using the scale below, and indicate your response number
on the line by each item.
1 2 3 4 5
Very unlikely Unlikely Neither Unlikely Nor Likely Likely Very Likely
1. I will be changing my job within the next 12 months.
2. I will take on new assignments in the near future.


3. It is possible that I will be out of this organization within the next 12 months.

Fixed or Constant Sum Scale


 The respondents are asked here to distribute a given number of points across various items, as per the example below (a brief consistency-check sketch follows the example). This is more in the nature of an ordinal scale.

E.g.
In choosing toilet soap, indicate the importance you attach to each of the
following five aspects by allotting points for each to total 100 in all.

 Fragrance -
 Color -
 Shape -
 Size -
 Texture of lather -
 Total points 100
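As an aside, a tiny Python sketch (hypothetical point allocation by one respondent) of how constant-sum responses can be checked for consistency and rank-ordered:

# Hypothetical allocation of 100 points across the five aspects.
allocation = {"Fragrance": 30, "Color": 10, "Shape": 15, "Size": 20, "Texture of lather": 25}

total = sum(allocation.values())
assert total == 100, f"Points must total 100, got {total}"   # simple consistency check

# Rank-order the aspects by the importance the respondent attached to them.
print(sorted(allocation.items(), key=lambda kv: kv[1], reverse=True))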

Stapel scale:
This scale simultaneously measures both the direction and intensity of the
attitude toward the items under study. The characteristic of interest to the study
is placed at the centre and a numerical scale ranging, say, from +3 to -3, on
either side of the item as illustrated below. This gives an idea of how close or
distant the individual response to the stimulus is, as shown in the example
below. Since this does not have an absolute zero point, this is an interval scale.


State how you would rate your supervisor’s abilities with respect to each of the characteristics mentioned
below, by circling the appropriate number.

          +3                         +3                      +3
          +2                         +2                      +2
          +1                         +1                      +1
Adopting Modern Technology    Product Innovation    Interpersonal Skills
          -1                         -1                      -1
          -2                         -2                      -2
          -3                         -3                      -3

Graphic Rating Scale


 A graphical representation helps the respondents to indicate on this scale their
answers to a particular question by placing a mark at the appropriate point on
the line, as in the following example. This is an ordinal scale, though the
following example might appear to make it look like an interval scale.

On a scale of 1 to 10, how would you rate your supervisor?
   10  Excellent
    5  All right
    1  Very bad

2.9. CRITERIA FOR GOOD MEASUREMENT OR GOODNESS OF MEASURES OR ESTABLISHING THE VALIDITY AND RELIABILITY OF A RESEARCH INSTRUMENT

There are three criteria for good measurement. They are

 Reliability
 Validity
 Practicality

2.9.1. Reliability:


 Reliability means the extent to which the measurement process is free from
errors.
 It is an indication of the stability and consistency with which the instrument
measures the concept and helps to assess the goodness of a measure.

2.9.2. Factors affecting the reliability of a research instrument:

The wording of questions:

A slight ambiguity in the wording of questions or statements can affect the reliability of a research instrument, as respondents may interpret the questions differently at different times, resulting in different responses.

The physical setting:

In case of an instrument being used in an interview, any change in the physical setting at the time of the repeat interview may affect the responses given by a respondent, which may affect reliability.

The respondents’ mood:

A change in a respondent's mood when responding to questions or writing answers in a questionnaire may affect the reliability of that instrument.

The nature of interaction:

In an interview situation, the interaction between the interviewer and the interviewee can affect responses significantly. During the repeat interview the responses given may be different due to a change in interaction, which could affect reliability.

The regression effect of an instrument:

When a research instrument is used to measure attitudes towards an issue, some respondents, after having expressed their opinion, may feel that they have been either too negative or too positive towards the issue. The second time they may express their opinion differently, thereby affecting reliability.

2.9.3. Methods of determining the reliability of an instrument:

There are a number of ways of determining the reliability of an instrument. The various procedures can be classified into two groups:

 External consistency procedures


 Internal consistency procedures

External consistency procedures compare findings from two independent processes of data collection with each other as a means of verifying the reliability of the measure. The two methods are as follows:

Test / Re-test reliability:

 This is a commonly used method for establishing the reliability of a research tool.
 In the test/re-test method, an instrument is administered once, and then again, under the same or similar conditions. The ratio between the test and re-test scores is an indication of the reliability of the instrument.
 The closer the value of the ratio is to 1, the higher the reliability of the instrument.


 As an equation:

(Test score) / (Re-test score) = 1

or

(Test score) - (Re-test score) = 0

 A ratio of 1 shows 100% reliability between the test score and the re-test score.
 Put another way, zero difference between the test and re-test scores is an indication of 100% reliability.
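A brief Python sketch (hypothetical scores) of the test/re-test idea: it computes the per-respondent test/re-test ratio described above, together with the correlation between the two administrations that is commonly reported in practice:

from statistics import correlation   # Python 3.10+

test   = [40, 55, 62, 48, 70]   # scores from the first administration
retest = [42, 54, 60, 50, 69]   # scores from the repeat administration

ratios = [t / r for t, r in zip(test, retest)]
print(ratios)                       # values close to 1 indicate high reliability
print(correlation(test, retest))    # close to +1 indicates stable measurement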

Parallel forms of the same test:

 In this procedure you construct two instruments that are intended to measure the
same phenomenon.
 The two instruments are then administered to two similar populations.
 The results obtained from one test are compared with those obtained from the
other.
 If they are similar, it is assumed that the instruments are reliable.

Internal consistency procedures:

The idea behind internal consistency procedures is that items measuring the
same phenomenon should produce similar results. The following method is
commonly used for measuring the reliability of an instrument.

The split – half technique:

 A test is given and divided into halves that are scored separately; the scores of one half of the test are then compared with the scores of the remaining half to assess reliability (Kaplan & Saccuzzo, 2001).


Why use Split-Half?

1st – Divide the test into halves. The most commonly used way to do this is to assign odd-numbered items to one half of the test and even-numbered items to the other; this is called odd-even reliability.

2nd – Find the correlation of scores between the two halves by using the Pearson r formula.

3rd – Adjust or re-evaluate the correlation using the Spearman-Brown formula, which estimates the reliability of the full-length test from the half-test correlation. The longer the test, the more reliable it is, so it is necessary to apply the Spearman-Brown formula to a test that has been shortened, as we do in split-half reliability (Kaplan & Saccuzzo, 2001).

 Split-half reliability is a useful measure when it is impractical or undesirable to assess reliability with two tests or to have two test administrations (because of limited time or money) (Cohen & Swerdlik, 2001).

How do I use Split-Half?

Spearman-Brown formula:

r_SB = 2r / (1 + r)

where r = the estimated correlation (Pearson r) between the two halves and r_SB = the estimated reliability of the full-length test (Kaplan & Saccuzzo, 2001).
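A minimal Python sketch (hypothetical item scores) of odd-even split-half reliability with the Spearman-Brown adjustment described above:

from statistics import correlation   # Python 3.10+

# Rows are respondents; columns are scores on six test items.
items = [
    [4, 5, 3, 4, 5, 4],
    [2, 3, 2, 3, 2, 3],
    [5, 5, 4, 5, 5, 4],
    [3, 2, 3, 3, 2, 2],
    [4, 4, 5, 4, 4, 5],
]

odd_half  = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in items]   # items 2, 4, 6

r_half = correlation(odd_half, even_half)       # Pearson r between the halves
r_full = (2 * r_half) / (1 + r_half)            # Spearman-Brown estimate for the full test
print(round(r_half, 3), round(r_full, 3))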

2.10. VALIDITY:


The test question "1 + 1 = _____" is certainly a valid basic addition question because it truly measures a student's ability to perform basic addition. It becomes less valid as a measurement of advanced addition because, although it addresses some knowledge required for addition, it does not represent all of the knowledge required for an advanced understanding of addition. On a test designed to measure knowledge of Indian History, this question becomes completely invalid. The ability to add two single digits has nothing to do with history.

2.10.1 Types of validity:

Content validity:

In general, validity is an indication of how sound your research is. It indicates how far the scores obtained actually reflect the concept being measured.

 Content validity draws an inference from test scores to a large domain of items similar to those on the test.



 Content validity is the degree to which the content of the items
adequately represents the universe of all relevant items under study.
 The more the scale items represent the domain or universe of the
concept being measured, the greater the content validity.
 To put it differently, content validity is a function of how well the
dimensions and elements of a concept have been delineated.

Content validity is concerned with sample-population representativeness, i.e. the knowledge and skills covered by the test items should be representative of the larger domain of knowledge and skills.

 For example, computer literacy includes skills in operating system, word processing, spreadsheet, database, graphics, internet, and many others. However, it is difficult, if not impossible, to administer a test covering all aspects of computing. Therefore, only several tasks are sampled from the population of computer skills.

Criterion validity:

 Criterion validity draws an inference from test scores to performance.
 A high score on a valid test indicates that the tester has met the performance criteria.

 Regression analysis can be applied to establish criterion validity.
 An independent variable could be used as the predictor variable and a dependent variable as the criterion variable.
 The correlation coefficient between them is called the validity coefficient.

For instance, scores on a driving test by simulation are the predictor variable while scores on the road test are the criterion variable. It is hypothesized that if the tester passes the simulation test, he/she should meet the criterion of being a safe driver. In other words, if the simulation test scores can predict the road test scores in a regression model, the simulation test is claimed to have a high degree of criterion validity.
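As an illustration, here is a small Python sketch (hypothetical scores, not real test data) of how a validity coefficient and a simple prediction equation could be computed for the driving example above:

from statistics import correlation, linear_regression   # Python 3.10+

simulation = [55, 62, 70, 48, 81, 66]   # predictor variable (simulation test)
road_test  = [58, 60, 74, 50, 79, 68]   # criterion variable (road test)

validity_coefficient = correlation(simulation, road_test)
slope, intercept = linear_regression(simulation, road_test)   # criterion regressed on predictor

print(round(validity_coefficient, 3))
print(round(slope, 3), round(intercept, 3))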

In short, criterion validity is about prediction rather than explanation. Prediction is concerned with non-causal or mathematical dependence, whereas explanation pertains to causal or logical dependence. For example, one can predict the weather based on the height of the mercury inside a thermometer. Thus, the height of the mercury could satisfy criterion validity as a predictor. However, one cannot explain why the weather changes by the change of the mercury height. Because of this limitation of criterion validity, an evaluator has to conduct construct validation.

Construct validity:

 Construct validity draws an inference from test scores to a psychological construct.



The degree to which a research instrument is able to provide evidence based on theory is called construct validity. Because it is concerned with abstract and theoretical constructs, construct validity is also known as theoretical validity.

 According to Hunter and Schmidt (1990), construct validity is a quantitative question rather than a qualitative distinction such as "valid" or "invalid"; it is a matter of degree.

 Construct validity can be measured by the correlation between the intended independent variable (construct) and the proxy independent variable (indicator, sign) that is actually used.

 For example, an evaluator wants to study the relationship between general cognitive ability and job performance. However, the evaluator may not be able to administer a cognitive test to every subject. In this case, he can use a proxy variable such as "amount of education" as an indirect indicator of cognitive ability. After he has administered a cognitive test to a portion of all subjects and found a strong correlation between general cognitive ability and amount of education, the latter can be used for the larger group because its construct validity is established.

Concurrent Validity:

 Concurrent validity refers to a measurement device's ability to vary directly with a measure of the same construct or indirectly with a measure of an opposite construct.


 It allows you to show that your test is valid by comparing it with an already
valid test.
 A new test of adult intelligence, for example, would have concurrent validity if
it had a high positive correlation with the Wechsler Adult Intelligence Scale
since the Wechsler is an accepted measure of the construct we call intelligence.
 An obvious concern relates to the validity of the test against which you are
comparing your test.
 Some assumptions must be made because there are many who argue the
Wechsler scales, for example, are not good measures of intelligence.

Predictive Validity:

 In order for a test to be a valid screening device for some future behaviour, it
must have predictive validity.
 The SAT is used by college screening committees as one way to predict college
grades.
 The GMAT is used to predict success in business school. And the LSAT is
used as a means to predict law school performance.
 The main concern with these and many other predictive measures is predictive
validity because without it, they would be worthless.
 We determine predictive validity by computing a correlation coefficient
comparing SAT scores, for example, and college grades. If they are directly
related, then we can make a prediction regarding college grades based on SAT
score.
 We can show that students who score high on the SAT tend to receive high
grades in college.

Difference between validity &reliability:

The figure below portrays the difference between reliability and validity.


If the purpose of the measurement is to hit the centre of the target, we see that reliability looks like a tight pattern regardless of where it hits, because reliability is a function of consistency. Validity, on the other hand, is a function of the shots being arranged around the bull's eye. In statistical terms, if the expected value is the bull's eye, the measure is valid; if the variations are small relative to the entire target, it is reliable.

Practicality:

Practicality has been defined as economy, convenience, and interpretability.

Economy:

 Economy considerations suggest that some trade-off is needed between the ideal
research project and that which the budget can afford.
 The length of measuring instrument is an important area where economic
pressures are quickly felt.
 The choice of data collection is also often dictated by economic factors.
 The rising cost of personal interviewing first led to an increased use of
telephone surveys and subsequently to the current rise in Internet surveys.
 In standardized tests, the cost of test materials alone can be such a significant
expense that it encourages multiple reuses.

Convenience:

 The convenience test suggests that the measuring instrument should be easy to administer.


 For this purpose one should give due attention to the proper layout of the
measuring instrument.
 For instance, a questionnaire, with clear instructions (illustrated by examples),
is certainly more effective and easier to complete than one which lacks these
features.

Interpretability:

 Interpretability considerations are especially important when persons other than the designers of the test are to interpret the results.
 The measuring instrument, in order to be interpretable, must be supplemented by
 Detailed instructions for administering the test
 Scoring keys
 Evidence about the reliability, and guides for using the test and for interpreting the results.


Unit 3- DATA COLLECTION


Types of data-primary vs secondary data-methods of
primary data collection-survey vs observation-
experiments-construction of questionnaire and


3.1 DIFFERENT METHODS OF DATA COLLECTION:

(Chart: different methods of data collection – primary data and secondary data, the latter drawn from sources such as government publications.)

3.2 MEANING OF PRIMARY DATA:



 The data directly collected by the researcher, with respect to the problem under
study, is known as primary data.
 Primary data is also the first hand data collected by the researcher for the
immediate purpose of the study.

3.2.1 METHODS OF COLLECTING PRIMARY DATA:


(Chart: methods of collecting primary data – observation, interview, questionnaire, etc.)

OBSERVATION METHOD:

The different types of observation methods include

 Structured or unstructured method
 Disguised or undisguised method
 Direct or indirect method
 Participant or non-participant method
 Controlled or non-controlled method


Structured or unstructured:
 If the observation is characterised by a careful definition of the units to be observed, the style of recording the observed information, standardised conditions of observation and the selection of pertinent data of observation, then it is called structured observation.
 When the observation takes place without these characteristics being thought of in advance, it is termed unstructured observation.
Example:
 A manager of a hotel wants to know “how many of his customers visit the hotel
with their families and how many come as single customers”. Here, the
observation is structured, since it is clear “what is to be observed”. He may
instruct his waiters to record this. This information is required to decide
requirements of the chairs and tables and also the ambience.
 Suppose the manager wants to know how single customers and those with families behave and what their attitudes are like. This study is vague, and it needs an unstructured observation.
 It is easier to record structured observation than unstructured observation.

Disguised or undisguised method:


 In disguised observation, the respondents do not know that they are being
observed.
 In non-disguised observation, the respondents are well aware that they are being observed.

Direct –indirect observation:

 When the observer is physically present and personally monitors and records the behaviour of the participants, it is called direct observation.
 When the recording of data is done by mechanical, photographic or electronic
means, it is called indirect observation.

Example:

 Suppose, a researcher is interested in knowing about the soft drink consumption


of a student in hostel room. He may like to observe empty soft drink bottles
dropped into the bin. Similarly, the observer may seek the permission of the
hotel owner to visit the kitchen or stores. He may carry out a kitchen/stores
audit, to find out the consumption of various brands of spice items being used
by the hotel. It may be noted that the success of an indirect observation largely
depends on “how best the observer is able to identify physical evidence of the
problem under study”.

Participant and Non –participant observation:

 If the observer observes by making himself, more or less, a member of the group he is observing, so that he can experience what the members of the group experience, the observation is called participant observation.
 When the observer observes as a detached emissary, without any attempt on his part to experience through participation what others feel, the observation is termed non-participant observation.

Controlled and non-controlled:

 If the observation takes place in the natural setting, it may be termed uncontrolled observation.
 But when the observation takes place according to definite pre-arranged plans, involving an experimental procedure, it is termed controlled observation.

3.2.2. THE INTERVIEW METHOD:

 The interview is a commonly used method of collecting information from people.


 Any person-to-person interaction between two or more individuals with a
specific purpose in mind is called an interview.
 On the one hand, interviewing can be very flexible, when the interviewer has
the freedom to formulate questions as they come to mind around the issue being
investigated.
 On the other hand, it can be inflexible, when the investigator has to keep strictly
to the questions decided beforehand.

Unstructured interview:
 The strength of unstructured interviews is the almost complete freedom they provide in terms of content and structure.
 You are free to order the questions in whatever sequence you wish.


 You also have complete freedom in terms of the wording you use and the way
you explain questions to your respondents.
 You may formulate questions and raise issues on the spur of the moment,
depending upon what occurs to you in the context of the discussion.
 There are several types of unstructured interviewing...
 In depth interviewing
 Focus group interviewing
 Narratives
 Oral histories

In depth interviews:

 In depth interviewing is “repeated face to face encounters between the


researcher and informants directed towards understanding informants’
perspectives on their lives, experiences or situations as expressed in their own
words”.

Focus group interview:

 Here a group of people jointly participate in an unstructured indirect interview


conducted by a moderator. The group usually consists of six to ten people. In
general the selected persons have similar backgrounds. The moderator attempts
to focus the discussion on the problem areas

Narratives:

 The person tells his/her story about an incident or situation and the researcher listens passively. Occasionally the researcher encourages the individual by using the techniques of active listening.

Example:


 Asking the sexually abused people to narrate their experience and how they
have been affected.

Oral histories:

 Oral histories are more commonly used for learning about a historical event or episode that took place in the past, or for gaining information about a culture, custom or story that has been passed from generation to generation.

Example:

 Suppose you want to find out about the life after World War II in some regional
town of Western Australia or about living conditions of Aboriginal and Torres
Strait Islander people in the 1930s. You would talk to persons who were alive
during that period and ask them about life at that time.

Structured interviews:

 In a structured interview the researcher asks a predetermined set of questions,


using the same wording and order of questions as specified in the interview
schedule.
 An interview schedule is a written list of questions, open-ended or closed ended,
prepared for use by an interviewer in a person-to-person interaction.

Other types:

Focussed interview:

 Focussed interview is meant to focus attention on the given experience of the


respondent and its effects.
 Under it the interviewer has the freedom to decide the manner and sequence in
which the questions would be asked and has also the freedom to explore reasons
and motives.

Clinical interview:

 The clinical interview is concerned with broad underlying feelings or


motivations or with the course of individual’s life experience.

3.3. EXPERIMENTAL RESEARCH

 Experimental research is commonly used in sciences such as sociology and


psychology, physics, chemistry, biology and medicine etc.

 It is a collection of research designs which use manipulation and controlled


testing to understand causal processes. Generally, one or more variables are
manipulated to determine their effect on a dependent variable.

The experimental method:


It is a systematic and scientific approach to research in which the researcher
manipulates one or more variables, and controls and measures any change in
other variables.

Experimental Research is often used where:

1. There is time priority in a causal relationship (cause precedes effect)


2. There is consistency in a causal relationship (a cause will always lead to
the same effect)
3. The magnitude of the correlation is great.

Aims of experimental research

Experiments are conducted to be able to predict phenomena. Typically, an experiment is constructed to be able to explain some kind of causation. Experimental research is important to society - it helps us to improve our everyday lives.


Identifying the research problem

 After deciding the topic of interest, the researcher tries to define the research

problem. This helps the researcher to focus on a more narrow research area to
be able to study it appropriately.

 The research problem is often operationalized, to define how to measure it. The results will depend on the exact measurements that the researcher chooses, and the problem may be operationalized differently in another study to test the main conclusions of the study.

 Defining the research problem helps you to formulate a research hypothesis,

which is tested against the null hypothesis.

 An ad hoc analysis is a hypothesis invented after testing is done, to try to explain contrary evidence. A poor ad hoc analysis may be seen as the researcher's inability to accept that his/her hypothesis is wrong, while a great ad hoc analysis may lead to more testing and possibly a significant discovery.

Constructing the experiment

There are various aspects to remember when constructing an experiment.


Planning ahead ensures that the experiment is carried out properly and that the
results reflect the real world, in the best possible way.

Sampling groups to study

 Deciding on the sample groups can be done using many different sampling techniques. Population samples may be chosen by a number of methods, such as randomization, "quasi-randomization" and pairing.


 Reducing sampling errors is vital for getting valid results from experiments. Researchers often adjust the sample size to minimize the chances of random errors. A brief sampling illustration follows the list of techniques below.

 Here are some common sampling techniques:

 probability sampling
 non-probability sampling
 Simple random sampling
 convenience sampling
 stratified sampling
 systematic sampling
 cluster sampling
 sequential sampling
 disproportional sampling
 judgmental sampling
 snowball sampling
 quota sampling
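As a brief illustration of two techniques from the list above (a sketch with a hypothetical population, not a recommendation of any one method), simple random sampling and proportionate stratified sampling could look like this in Python:

import random

# Hypothetical population of 300 units, each belonging to one of two strata.
population = [{"id": i, "stratum": "urban" if i % 3 else "rural"} for i in range(1, 301)]

# Simple random sampling: every unit has an equal chance of selection.
simple_sample = random.sample(population, k=30)

# Proportionate stratified sampling: sample each stratum in proportion to its size.
strata = {}
for unit in population:
    strata.setdefault(unit["stratum"], []).append(unit)

stratified_sample = []
for units in strata.values():
    k = round(30 * len(units) / len(population))
    stratified_sample.extend(random.sample(units, k))

print(len(simple_sample), len(stratified_sample))   # 30 30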

Creating the design

The research design is chosen based on a range of factors. Important factors when choosing the design are feasibility, time, cost, ethics, measurement problems and what you would like to test. The design of the experiment is critical for the validity of the results.

Typical designs and features in experimental design

 Pretest-Posttest Design
Check whether the groups are different before the manipulation starts and the
effect of the manipulation. Pretests sometimes influence the effect.


 Control Group
Control groups are designed to measure research bias and measurement effects,
such as the Hawthorne Effect or the Placebo Effect. A control group is a group
not receiving the same manipulation as the experimental group.
 Randomized Controlled Trials
Randomized Sampling, comparison between an Experimental Group and a
Control Group and strict control/randomization of all other variables
 Solomon Four-Group Design
With two control groups and two experimental groups. Half the groups have a pretest and half do not. This tests both the effect itself and the effect of the pretest.
 Between Subjects Design
Grouping Participants to Different Conditions
 Within Subject Design
Participants Take Part in the Different Conditions
 Counterbalanced Measures Design
Testing the effect of the order of treatments when no control group is
available/ethical
 Matched Subjects Design
Matching Participants to Create Similar Experimental- and Control-Groups
 Double-Blind Experiment
Neither the researcher, nor the participants, know which is the control group.
The results can be affected if the researcher or participants know this.
 Bayesian Probability
Using Bayesian probability to "interact" with participants is a more "advanced" experimental design. It can be used in settings where there are many variables which are hard to isolate. The researcher starts with a set of initial beliefs, and tries to adjust them according to how participants have responded.


Pilot study

 It may be wise to first conduct a pilot-study or two before you do the real
experiment. This ensures that the experiment measures what it should, and that
everything is set up right.

 Minor errors, which could potentially destroy the experiment, are often found
during this process. With a pilot study, you can get information about errors and
problems, and improve the design, before putting a lot of effort into the real
experiment.

 If the experiments involve humans, a common strategy is to first have a pilot


study with someone involved in the research, but not too closely, and then
arrange a pilot with a person who resembles the subject(s). Those two different
pilots are likely to give the researcher good information about any problems in
the experiment.

Conducting the experiment

 An experiment is typically carried out by manipulating a variable, called the

independent variable, affecting the experimental group. The effect that the
researcher is interested in, the dependent variable(s), is measured.

 Identifying and controlling non-experimental factors which the researcher does

not want to influence the effects, is crucial to drawing a valid conclusion. This
is often done by controlling variables, if possible, or randomizing variables to
minimize effects that can be traced back to third variables. Researchers only
want to measure the effect of the independent variable(s) when conducting an
experiment, allowing them to conclude that this was the reason for the effect.

Analysis and conclusions


 In quantitative research, the amount of data measured can be enormous. Data

not prepared to be analyzed is called "raw data". The raw data is often
summarized as something called "output data", which typically consists of one
line per subject (or item). A cell of the output data is, for example, an average of
an effect in many trials for a subject. The output data is used for statistical
analysis, e.g. significance tests, to see if there really is an effect.
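For instance, here is a minimal sketch (hypothetical "output data", one value per subject) of such a significance test, comparing an experimental group with a control group using an independent-samples t-test from the scipy library:

from scipy import stats

experimental = [7.1, 6.8, 7.4, 7.9, 6.5, 7.3]   # e.g. mean effect per subject
control      = [6.2, 6.0, 6.7, 6.4, 5.9, 6.3]

result = stats.ttest_ind(experimental, control)
print(result.statistic, result.pvalue)
# A small p-value (conventionally below 0.05) suggests the group difference is
# unlikely to be due to chance alone.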

 The aim of an analysis is to draw a conclusion, together with other observations. The researcher might generalize the results to a wider phenomenon, if there is no indication of confounding variables "polluting" the results.

 If the researcher suspects that the effect stems from a variable other than the independent variable, further investigation is needed to gauge the validity of the results. An experiment is often conducted because the scientist wants to know if the independent variable is having any effect upon the dependent variable. Correlation between variables is not, by itself, proof of causation.

 Experiments are more often of a quantitative nature than a qualitative nature, although qualitative experiments do occur.

Examples of experiments

Here are some examples of scientific experiments:

Social psychology

 Stanley Milgram Experiment - Will people obey orders, even if clearly dangerous?
 Asch Experiment - Will people conform to group behavior?
 Stanford Prison Experiment - How do people react to roles? Will you behave differently?


 Good Samaritan Experiment - Would You Help a Stranger? - Explaining Helping Behavior

Genetics

 Law Of Segregation - The Mendel Pea Plant Experiment


 Transforming Principle - Griffith's Experiment about Genetics

Physics

 Ben Franklin Kite Experiment - Struck by Lightning


 J J Thomson Cathode Ray Experiment

3.4 QUESTIONNAIRE METHOD:

 Data collection instrument used for gathering data;


 A formalized schedule comprising an assembly of carefully formulated questions;
 A questionnaire is a written list of questions, the answer to which is recorded by
respondents. In a questionnaire respondents read the questions, interpret what is
expected and then write down the answers.

3.4.1 Six important functions

• Converts research objectives into specific questions

• Standardizes the questions

• Keeps respondents motivated to complete the research


• Serve as a permanent record

• Speed-up the process of data analysis

• Reliability and validity purposes

3.4.2 Types:

 Structured and non –disguised


 Structured and disguised
 Non-structured and disguised
 Non-structured and non-disguised

1. Structured and non-disguised:
 Questions are structured so as to obtain the facts.
 The interviewer will ask the questions strictly in accordance with the pre-arranged order.
 Questions are presented with exactly the same wording and in the same order to all respondents.
 All respondents reply to the same questions.
 Widely used in market research.
 Example: "Subjects' attitude towards cyber laws and the need for government legislation to regulate it":
 Certainly not needed at present
 Certainly not needed
 I can't say
 Very urgently needed
 Not urgently needed

2. Structured and disguised:
 Used to know the respondents' attitude.
 In this type of questionnaire, what comes out is "what does the respondent know" rather than what he feels.
 The same question is posed to each respondent.
 Administering the questionnaire and post-administration work is simple (i.e. coding and tabulating are easy).
 Respondents' bias is minimized.
 Example:
 What do you think about the Babri Masjid demolition?
 Tell me your opinion about Mr. Ben's healing effect show conducted at Bangalore?

3. Non-structured and disguised:
 The main objective is to conceal the topic of enquiry by using a disguised stimulus.
 The assumption made here is that an individual's reaction is an indication of the respondent's basic perception.
 Commonly used for focus group discussions.
 No fixed set of questions.
 The inner self (the "why") of an individual is researched.
 Example:
 Motivation research,
 Projective techniques.

4. Non-structured and non-disguised:
 The purpose of the study is clear, but the responses to the question are open-ended.
 Example:
 How do you feel about the cyber law currently in practice and its need for further modification?
 The question asked by the interviewer varies from person to person. The major advantage is the freedom permitted to the interviewer.
 The main disadvantage is that it takes time and the respondents may not cooperate.
 Coding and tabulating are difficult.
 Not a very frequently used method.

3.4.3 Process of questionnaire designing:

Steps in a Questionnaire Development Process

Pre-design activities:
1. Determine survey objectives
2. Decide data collection methods
3. Question development

Design activities:
4. Question evaluation by researcher and by client
5. Pretest the questionnaire
6. Revise, finalize and duplicate

Post-design activities:
7. Gather data using the questionnaire
8. Tabulate and analyze data and finalize report

3.4.5 Types of questions:

Open Ended Questions Vs closed Ended Questions:

 Open-ended questions are those where respondents are free to answer in their own words.
 Example:
 State five things that are interesting and challenging in the job.
 What factors do you consider while buying a suit?
 A closed-ended question, in contrast, asks the respondents to make choices among a set of alternatives given by the researcher.
 Example:


 Please tell us your overall reaction to this commercial.


i. A great commercial would like to see again.
ii. Just so-so, like other commercials.
iii. Another bad commercial
iv. Pretty good commercial.

Advantages and disadvantages of Open-ended Questions

Advantages

 Since they do not restrict the respondent’s response, the widest scope of
response can be attained.

 Most appropriate where the range of possible responses is broad, or cannot be predetermined.

 Less subject to interviewer bias.

• Responses may often be used as direct quotes to bring realism and life to
the written report.

Disadvantages

 Inappropriate for a self-administered questionnaire, since people tend to write more briefly than they speak.

 The interviewer may only record a summary of the responses given by an interviewee and fail to capture the interviewee's own ideas.

 It is difficult to categorize and summarize the diverse responses of different respondents.

 May annoy a respondent and prompt him/her to terminate the interview, or ignore the mail questionnaire.

Advantages and Disadvantages of Closed-ended Questions


Advantages

 All respondents reply on a standard response set. This ensures comparability of responses, and facilitates coding, tabulating and interpreting the data.

 Easier to administer and most suited for self-administered questionnaire.

 If used in interviews, a less skilled interviewer may be engaged to do the job.

Disadvantages

 Preparing the list of responses is time-consuming.

 If the list of responses is long, the respondents may be confused.

 If the list of responses is not comprehensive, responses may often fail to represent the respondent's point of view.

Positively and negatively worded questions:

 Instead of phrasing all questions positively, it is advisable to include some negatively worded questions as well, so the tendency in respondents to mechanically circle the points toward one end of the scale is minimized.

Double –Barrelled questions:

 These are the questions, in which the respondents can agree with one part of the
question, but not agree with the other or cannot answer without making a
particular assumption.
 Example:
 Do you feel that firms today are employee oriented and customer oriented?


Ambiguous Questions:

 Questions that are not clearly worded and are likely to be interpreted by respondents in different ways.

Recall –dependent questions:

 Some questions might require respondents to recall experiences from the past
that are hazy in their memory. Answers to such questions might have bias.
 For example , If an employee who has had 30 years of services in the
organisation is asked to state when he first started working in a particular
department and for how long , he may not be able to give the correct answers
and may be off on his responses.

Leading questions:

 A leading question is one that suggests the answer to the respondent. The
question itself will influence the answer, when respondents get an idea that the
data is being collected by a company.
 Example:
 How do you like the programme on Radio Mirchy?
 Don’t you think that in these days of escalating costs of living, employees
should be given good pay raises?

Loaded questions:

 Questions that would elicit highly biased emotional responses from subjects.
 Example:
 Do you think the civic body is incompetent?
 To what extent do you think the management is likely to be vindictive if the union decides to go on strike?
 Here the words- incompetent, strike and vindictive are loaded.


Funneling technique:
The questioning technique that consists of initially asking general and broad questions, and gradually narrowing the focus thereafter onto more specific themes, is called the funneling technique.

3.4.6 Features of good questionnaire:

 Questionnaire should be printed/cyclostyled/photo-copied.


 The first part of the questionnaire should specify the object or purpose for
which the information is required.

 The information furnished should be kept confidential.
 It must be simple. The respondents should be able to understand the questions.
 Personal questions on wealth, habits etc. should be avoided.
 Questions should be in sequence.
 Questions should not require any referencing before replying.
 Questions should not force the respondent to recall anything from memory
in order to answer.
 Questions needing computation/consultation should be avoided.
 Questions on sentiments/belief/faith should be avoided.
 Repetition of questions should be eliminated.
 It should be well arranged, to facilitate analysis and interpretation.
 It must keep the respondent's interest throughout.
 If any diagram or map is used, then it should be printed clearly.

3.4.7 Merits of questionnaire:

 It involves lower cost, as the questionnaire can be sent by post over a wide area.


 It does not interfere with the respondent while answering the question.
 The influence of interviewer on the respondent is eliminated.
 Respondents are given sufficient time to fill up the questionnaire.
 This method is useful when the sample size is large.
 No need to train interviewers.
 Personal and sensitive questions are well answered.

3.4.8 Demerits:

 The response rate of a questionnaire is generally very poor compared to
that of schedules.
 Bias of the respondents cannot be determined easily.
 Respondents need to be educated.


 Follow-up on non-response or unfilled questionnaires only adds to cost and
time.
 Accuracy of response cannot be ensured.
 A lot of care is required to design and structure a questionnaire.
 When the researcher is interested in a spontaneous response, this method is
unsuitable.
 Any clarification required by the respondent regarding questions is not possible.

3.4.9 Difference between questionnaire & schedule

Basis of difference: Questionnaire vs. Schedule

1. Usage: In a questionnaire, the respondent himself records the answers; in a
schedule, the researcher/enumerator records the answers obtained.
2. Cost: A questionnaire is relatively cheaper, as it is sent by mail to the targeted
respondents; a schedule is costlier, as investigators have to be appointed, trained
and sent to meet every informant at the latter's place.
3. Coverage: Wide coverage is possible with a questionnaire, as it can be sent to
any place by post; coverage of a schedule is relatively limited, as the investigator
cannot be sent to every place.
4. Degree of response: Lower for a questionnaire, as all the respondents do not
respond; relatively good for a schedule, as the investigator is more focused and
obtains details personally.
5. Quality of response: Not good for a questionnaire, as the respondent answers
the questions the way he understands them; relatively better for a schedule, as the
investigator guides the respondents in understanding the questions in the right
context.
6. Identity of respondent: With a questionnaire it is not known with certainty who
actually answered it, and this in turn might affect the accuracy of the information
obtained; with a schedule it is clearly known, as the enumerator himself elicits the
information, so the accuracy of information is higher.
7. Time taken for reply: Cannot be fixed for a questionnaire, as the respondent may
reply at his convenience; with a schedule it is possible to plan the enquiry, depute
the investigators accordingly and collect the information within a targeted time.
8. Personal contact: Completely absent in a questionnaire, so there is no scope for
giving any clarification to respondents; fully possible with a schedule, so the
quality of response is better, as the investigator can help the respondent
understand the questions clearly.
9. Pre-condition for use: For a questionnaire, the respondent should be literate and
cooperative; for a schedule, the literacy of the respondent is not a limitation, as
the investigator can explain the questions and obtain the responses.
10. Sample coverage: A questionnaire can cover a wide range of sample elements,
as it is simply sent by post; this is not possible with a schedule, as the investigator
has to personally contact each respondent.
11. Accuracy of information: Not likely to be high for a questionnaire, as it depends
on the structure of the questionnaire itself; relatively better for a schedule, as the
investigator can check accuracy in the field and adopt appropriate methods to
ensure it.
12. Presentation requirements: A questionnaire should be designed properly and
made attractive to encourage the respondents to fill it; no such requirement is a
condition for a schedule.
13. Scope for applying other methods of data collection: Not possible with a
questionnaire, as it is filled by the respondent; with a schedule, there is ample
scope for the investigator to apply the observation or interview method along with
the schedule.
14. Field control and monitoring: Not possible with a questionnaire, as it is filled by
the respondent himself; with a schedule, there is good scope for controlling,
editing and monitoring the information in the field itself.
15. Bias in information: With a questionnaire there is no way to test the extent of
bias in the information given by the respondent; with a schedule, if the
investigator is trained and experienced, there is very little scope for bias in the
information collected.

3.5 Projective techniques:

Word or picture association:

 Participants are asked to match images, experiences, emotions, products and


services, even people and places, to whatever is being studied.

“Tell me what you think of when you think of Kellogg’s special K cereal”

Sentence completion:


 Participants are asked to complete a sentence.

“Complete the sentence: People who buy over the internet.....................”

Cartoons or empty balloons:

 Participants are asked to write the dialog for a cartoon like picture.

“What will the customer comment when she sees the salesperson approaching
her in the new-car showroom?”

Thematic Apperception Test :( TAT)

 Participants are confronted with a picture (usually photograph or drawing) and


asked to describe how the person in the picture feels and thinks.

Component sorts:

 Participants are presented with flash cards containing component features and
asked to create new combinations.

Sensory sorts:

 Participants are presented with scents, textures and sounds usually verbalized on
cards and asked to arrange them by one or more criteria.

Laddering or benefit chain:

 Participants are asked to link functional features to their physical and


psychological benefits, both real and ideal.

Personification:

 Participants are asked to imagine inanimate objects with the traits,


characteristics and features and personalities of humans.

“If brand X were a person, what type of person would brand X be?”


Semantic mapping:

 Participants are presented with a four quadrant map where different variables
anchor the two different axes; they then spatially place brands, product
components or organisations within four quadrants.

Brand mapping:

 Participants are presented with different brands and asked to talk about their
perceptions, usually in relation to several criteria. They may also be asked to
spatially place each brand on one or more semantic maps.

Ambiguities and paradoxes:

 Participants are asked to imagine a brand as something else (e.g., a Tide dog
food or Marlboro cereal), describing its attributes and position.

Holtzman Inkblot test (HIT):

 This test consists of 45 ink blot cards which are based on colour, movement,
shading and other factors involved in inkblot perception.
 Only one response per card is obtained from the respondent and the responses
of a respondent are interpreted at three levels of form appropriateness.
 Form responses are interpreted for knowing the accuracy or inaccuracy of
respondent’s percepts; shading and colour for ascertaining his affectional and
emotional needs; and movement responses for assessing the dynamic aspects of
his life.

3.6 Sampling:

3.6.1 Some basic terminologies:


Population:

 The entire set of items under study is called the population.
 Example :
 Total number of families living in a city, total number of employees in an
organisation etc.

Sample:

 A portion of population chosen for the direct examination or measurement is


called sample.
 Example:
 A small group of families or a group of workers chosen to calculate the average
income or to identify job satisfaction.

Sample size:

 The number of units in a sample is called sample size.


 Example:
 The number of students, families or electors from whom you obtain the required
information is called the sample size.

Sampling frame:

 The sampling frame is the list of elements from which the sample is actually
drawn; in effect, it is the complete, correct list of the population.
 Example:
 Telephone directory
 Yellow pages

Census:

 Census refers to complete inclusion of all the elements in the population.


 It is appropriate if the size of population is small.


 Example
 A researcher may be interested in contacting firms in iron and steel or
petroleum products industry. These industries are limited in number, so a census
will be suitable.
 100% enumeration of all elements in the population is called census.

3.6.2 When is sample appropriate?

 When the size of population is large.


 When time and cost are the main considerations in research.
 If the population is homogeneous.
 Also, there are circumstances when a census is not possible. Example:
Reactions to global advertising by a company.

3.6.3 Why sample?

There are several compelling reasons for sampling, including,

 Lower cost
 Greater accuracy of results
 Greater speed of data collection
 Availability of population elements

3.6.4 The different types of sampling.

 The technique of selecting a sample from a population usually depends on the


nature of the data and the type of enquiry. The procedure of sampling may be
broadly classified under the following heads:
 Probability sampling or random sampling
 Non-Probability sampling or non-random sampling

Probability Sampling:


 Probability sampling is a method of sampling that ensures that every unit in the
population has a known non –zero chance of being included in the sample.
 The different methods of random sampling are.

Simple random Sampling:

 It is a special case of probability sampling in which every unit in the population
has an equal chance of being included in the sample.
 Sampling may be done with or without replacement.
 If one wants to select n units from a population of size N without replacement,
then every possible selection of n units must have the same probability. There
are C(N, n) possible ways to pick n units from a population of size N, and
simple random sampling guarantees that every such sample of n units has the
same probability, 1/C(N, n), of being selected.
 Example:
o A bank wants to study savings bank account holders' perception of the
service quality rendered over a period of one year. The bank has to prepare a
complete list of savings bank account holders, called the sampling frame, say
500. Now the process involves selecting a sample of 50 out of the 500 and
interviewing them. This could be achieved in many ways. Two common ways
are:
 Lottery method
 Select 50 slips from a box containing 500 well-shuffled slips of account
numbers, without replacement. This method can be applied when the population
is small enough to handle.
 Random numbers method:
 When the population size is very large, the most practical and inexpensive
method of selecting a sample is by using the random number tables.


The procedure for selecting a simple random sample:


Step 1: Identify by a number all elements or sampling units in the population.
Step 2: Decide on the sample size (n).
Step 3: Select the n units using either the fishbowl draw, the table of random
numbers or a computer program.
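As an illustration of Step 3, the short sketch below draws a simple random sample
without replacement using a computer program; the frame of 500 account numbers
and the sample size of 50 follow the bank example above, and the variable names
are only illustrative.

import random

N = 500                              # size of the sampling frame (account numbers 1..500)
n = 50                               # desired sample size
frame = list(range(1, N + 1))        # the sampling frame: account numbers
sample = random.sample(frame, n)     # SRS without replacement: every subset of 50 is equally likely
print(sorted(sample))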

Stratified Random Sampling:

 Stratified random sampling involves dividing the population into a number of


groups called strata in such a manner that the units within a stratum are
homogeneous and the units between the strata are heterogeneous.
 The next step involves selecting a simple random sample of appropriate size
from each stratum.
 Example:
 A marketing manager in a consumer product company wants to study the
customers' attitude towards a new product in order to improve sales. Three
typical cities that will influence the sales are considered as three strata. The
customers within a city are similar, but across the cities they are vastly different.
The customers for the study are then selected from each city by simple random
sampling, so that a meaningful inference can be drawn about the whole population.
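A minimal sketch of proportional stratified sampling, assuming three city strata as
in the example above; the stratum sizes and the overall sample size are made-up
numbers used only to show the allocation step.

import random

strata = {"City A": 5000, "City B": 3000, "City C": 2000}   # assumed stratum sizes
total_sample = 100                                          # assumed overall sample size
N = sum(strata.values())

sample_plan = {}
for city, size in strata.items():
    # proportional allocation: the stratum sample size is in proportion to the stratum size
    n_h = round(total_sample * size / N)
    # draw a simple random sample of unit indices within the stratum
    sample_plan[city] = random.sample(range(size), n_h)

for city, units in sample_plan.items():
    print(city, "-> sample of", len(units), "customers")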

Systematic sampling:

 In this method, the units are selected from the population at a uniform interval.
To facilitate this we arrange the items in numerical, alphabetical, geographical
or any other order. When a complete list of the population is available, this
method is used.


 In systematic sampling, the N items in the population are partitioned into k
groups by dividing the size of the population by the desired sample size n, that
is,

k = N/n.

 Example:
 If you want to take a systematic sample of n = 40 from a population of
N = 600 employees, the population of 600 would be partitioned into 600/40 = 15
groups. For example, if the first number selected was 005, the next selections
would be 020, 035, 050, 065, and so on.

The procedure for selecting a systematic sample:


Step 1: Prepare a list of all the elements in the study population (N).
Step 2: Decide on the sample size (n).
Step 3: Determine the width of the interval, k = total population / sample size.
Step 4: Using SRS, select an element from the first interval (the nth-order element).
Step 5: Select the same-order element from each subsequent interval.
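A small sketch of the five steps above, using the N = 600 and n = 40 figures from
the example; the starting point is drawn at random within the first interval.

import random

N = 600                                 # population size (from the example)
n = 40                                  # sample size
k = N // n                              # width of the interval: 600 / 40 = 15
start = random.randint(1, k)            # random start within the first interval (Step 4)
sample = list(range(start, N + 1, k))   # every kth element thereafter (Step 5)
print(len(sample), sample[:5])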

Cluster Sampling:

 In cluster sampling, the population is divided into groups or clusters, such that
each cluster is a representative of the population
 If a study has to be done to find out the no. of children that each family in
Chennai has, then the city can be divided into several clusters and a few clusters
can be chosen at random. Every family in the chosen clusters can be a sample
unit.
 While using cluster sampling, the following points should be noted.


(i) For getting precise results, clusters should be as small as possible consistent
with the cost and limitations for the survey.
(ii) The no. of units in each cluster must be more or less equal.

Area sampling:

 It is a probability sampling design.


 Cluster sampling within a specified area or region is called Area sampling.
 Example:
 If someone wants to measure the sales of toffee in retail stores, one might
choose a few localities of a city and then audit toffee sales in the retail outlets in
those localities.
 You may like to choose shops which sell the brand-Cadbury Dairy milk.
 The main disadvantage of the area sampling is that it is expensive and time
consuming.

Double sampling:

 A probability sampling design that involves the process of collecting


information from a set of subjects twice-such as using a sample to collect
preliminary information, and later using a sub sample of the primary sample for
more information.
 In short, selecting a sample within a sample is called double sampling.
 It is nothing but a procedure for selecting a subsample from a sample.
 Example:
 The management of a newly opened club solicits new membership. During the
first round, all corporates were sent details so that those who are interested may
enrol. Having enrolled them, the second round concentrates on how many are
interested in enrolling for the various entertainment activities that the club offers,
such as billiards, indoor sports, swimming and gym. After obtaining this
information, you might stratify the interested respondents. This will also tell you
the reaction of new members to the various activities. This technique is
considered to be scientific, since there is no possibility of ignoring the
characteristics of the universe.

II) Non-Probability Sampling:

 In non-probability sampling, the selection of the sample units does not ensure a
known chance to the units being selected. In other words, the units are selected
without using the principle of probability. It is suitable for pilot studies and
exploratory research.
 The methods of non-random sampling are,

Purposive Sampling/Judgement sampling:

 In this sampling, the sample is selected with a definite purpose in view, and the
choice of the sampling units depends entirely on the discretion and judgment
of the investigator.
 Example:
 If an investigator wants to show that the standard of living has increased in
the city of Madurai, he may take individuals in the sample from the posh
localities and ignore the localities where low and middle income group
families live.

Quota Sampling:

 This is a restricted type of purposive sampling. It consists of specifying quotas
of samples to be drawn from different groups, and then drawing the required
samples from these groups by purposive sampling.
 Quota sampling is widely used in opinion and market research surveys.


Expert Sampling:

 Expert opinion sampling involves gathering a set of people who have
knowledge and expertise in certain key areas that are crucial to decision making.
 The advantage of this sampling is that it acts as a support mechanism for some
of our decisions in situations where virtually no data are available.
 The major disadvantage is that even experts can have prejudices, likes and
dislikes that might distort the results.

Convenience sampling:

A non-probability sampling method in which researchers use any readily
available individuals as participants is called convenience sampling.

Snowball sampling (or chain referral sampling)

 Snowball sampling is a non-probability sampling technique that is used by


researchers to identify potential subjects in studies where subjects are hard to
locate.

 Researchers use this sampling method if the sample for the study is very rare or
is limited to a very small subgroup of the population. This type of sampling
technique works like chain referral. After observing the initial subject, the
researcher asks for assistance from the subject to help identify people with a
similar trait of interest.

 The process of snowball sampling is much like asking your subjects to nominate
another person with the same trait as your next subject. The researcher then
observes the nominated subjects and continues in the same way until a
sufficient number of subjects is obtained.

 For example, in a study that wants to observe a rare disease, the researcher
may opt for snowball sampling, since it will be difficult to obtain subjects
otherwise. It is also possible that the patients with the same disease belong to a
support group; observing one of the members as the initial subject will then
lead to more subjects for the study.

TYPES OF SNOWBALL SAMPLING

 Linear Snowball Sampling

 Exponential Non-Discriminative Snowball Sampling

 Exponential Discriminative Snowball Sampling

Advantages of snowball sampling

 The chain referral process allows the researcher to reach populations that
are difficult to sample when using other sampling methods.
 The process is cheap, simple and cost-efficient.


 This sampling technique needs little planning and a smaller workforce
compared to other sampling techniques.

Disadvantages of snowball sampling

 The researcher has little control over the sampling method. The subjects
that the researcher can obtain rely mainly on the previous subjects that were
observed.
 Representativeness of the sample is not guaranteed. The researcher has no
idea of the true distribution of the population and of the sample.
 Sampling bias is also a fear of researchers when using this sampling
technique. Initial subjects tend to nominate people that they know well. Because
of this, it is highly possible that the subjects share the same traits and
characteristics, thus, it is possible that the sample that the researcher will obtain
is only a small subgroup of the entire population.

3.6.5 Sample size- determination:

(1) The number of elementary units in a sample is called sample size.


(2) The first factor that must be considered in estimating the sample size is the
permissible error (E).
(3) The greater the desired precision, the larger the sample size will be.
(4) The higher the confidence level in the estimate, the larger the sample must be.
There is a trade-off between the degree of confidence and the degree of precision
with a sample of fixed size.
(5) The greater the number of sub-groups of interest within the sample, the greater
the sample size must be.
(6) Cost is a factor that determines the size of the sample.
(7) The issue of response rate: the issue to be considered in deciding the necessary
sample size is the actual number of questionnaires that must be sent out.


We may send out questionnaires to the required number of people, but we may
not receive responses from all of them, so the number mailed must allow for
non-response.

Sample size for estimating a population mean:

n = z²σ² / E²

Sample size for estimating a population proportion:

n = z²pq / E²

n = sample size

z = standard normal value for the desired confidence level

σ = (estimated) population standard deviation

p = sample proportion

q = 1 − p

E = maximum permissible error/tolerable error

Golden rule:

The greater the sample size, more accurately your findings will reflect the ‘true
picture’
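A small sketch of the two sample-size formulas above; the 95% confidence level
(z ≈ 1.96), the standard deviation, the tolerable errors and the assumed proportion
are illustrative figures only.

import math

z = 1.96          # z value for a 95% confidence level (assumed)
E = 2.0           # maximum permissible error for the mean (assumed units)
sigma = 10.0      # estimated population standard deviation (assumed)
p = 0.5           # assumed sample proportion (0.5 gives the largest n)
q = 1 - p

n_mean = math.ceil((z ** 2 * sigma ** 2) / E ** 2)    # n = z^2 * sigma^2 / E^2
n_prop = math.ceil((z ** 2 * p * q) / 0.05 ** 2)      # n = z^2 * p * q / E^2, with E = 0.05

print("Sample size for estimating a mean:", n_mean)         # 97
print("Sample size for estimating a proportion:", n_prop)   # 385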


UNIT 4-DATA PREPARATION AND ANALYSIS

Introduction:

 Data preparation includes editing, coding and data entry and is the
activity that ensures the accuracy of the data and their conversion from raw
form to reduced and classified forms that are more appropriate for analysis.


 After collecting the data, the next task of the researcher is to analyze and
interpret the data.
 The purpose of analysis is to draw conclusions.
 There are two parts in processing the data:
 Data analysis
 Interpretation of data.
 Analysis of the data involves organizing the data in a particular manner.
 Interpretation of the data is a method for deriving conclusions from the
data analyzed.
 Analysis of data is not complete, unless it is interpreted.
Steps in processing of DATA:

 Preparing raw data.


 Editing
 Coding
 Tabulation
 Summarizing the data
 Usage of statistical tools.
Preparing raw data:

Raw data can be collected through

 Interviews
 Questionnaires
 Observation
 In-depth interviews
 Focus group interviews
 Secondary sources
Editing:


 Editing is the process of ensuring that the data are clean, that is, free
from inconsistencies and incompleteness.
 Editing detects errors and omissions, corrects them when possible, and certifies
that maximum data quality standards are achieved.
 The editor’s purpose is to guarantee that data are:
 Accurate
 Consistent with the intent of the question and other information in the survey.
 Uniformly entered
 Arranged to simplify coding and tabulation.
Coding rules:

Four rules guide the pre and post coding and categorization of a data set. They
are

(1) Appropriate to the research problem and purpose


(2) Exhaustive
(3) Mutually exclusive
(4) Derived from one classification principle
Appropriateness:

Appropriateness is determined at two levels:

1) The best partitioning of the data for testing hypotheses and showing
relationships and
2) The availability of comparison data.

Example:


Suppose the researcher is analyzing the inconvenience that a car owner is facing
with his present model. Therefore, the factor chosen for coding may be
inconvenience. Under this there could be 4 types

(1) Inconvenience in entering the backseat.


(2) Due to insufficient legroom
(3) With respect to interior
(4) In door locking and opening the dickey.
Now the researcher may classify these four answers based on internal
inconvenience and other inconveniences referring to the exterior. Each is
assigned a different number for the purpose of codification.

Mutually exclusive and exhaustiveness:

Researchers often add an “other “option to a measurement question because


they know that they cannot anticipate all possible answers.

Example:

 Sometimes the respondents might think that they belong to more than one
category. This is because sales personnel may be doing a sales job and therefore
should be placed under the sales category. Also, he may supervise the work of
other sales executives. In this case he is doing a managerial function. Viewed in
this context, he should be placed under the managerial category, which has a
different code. Therefore, he can only be put under one category, which is to be
decided. One way of deciding this could be to analyze “in which of the two
functions does he spend most time?”
 Another scenario assumes that there is a salesman who is currently unemployed.
Under the occupation column, he will tick sales, while under the current
employment column, he will mark unemployed. How does one codify this?
Under which category should he be placed? One of the solutions is to have a


classification, such as employed salesman, unemployed salesman to represent


the two categories.
Tabulation of Data
 The process of placing classified data into tabular form is known as tabulation.
 A table is a symmetric arrangement of statistical data in rows and columns.
 Rows are horizontal arrangements whereas columns are vertical arrangements.
 It may be simple, double or complex depending upon the type of classification.

Types of Tabulation:

Simple Tabulation or One-way Tabulation:


When the data are tabulated to one characteristic, it is said to be simple tabulation or
one-way tabulation.
For Example: Tabulation of data on population of world classified by one
characteristic like Religion is example of simple tabulation.

 There may be two types of univariate tabulation:


(1) Question with only one response.
(2) Multiple responses to question.
(1) Question with only one response:

If the question has only one answer, the tabulation may be of the following.

No. of children     No. of families     Percentage
0                   10                  5
1                   30                  15
2                   70                  35
3                   60                  30
4                   20                  10
More than 4         10                  5
Total               200                 100


(2) Questions with multiple responses:

Sometimes, respondents may give more than one answer to a given question. In this
case, there will be an overlap, and responses when tabulated, need not add to 100
percent.

Example:

What do you dislike about the car which you own at present?

Parameter No of respondents
Engine 10
Body 15
Mileage 15
Interior 06
Colour 18
Maintenance frequency 16
Inconvenience 20
Total 100

(2) Double Tabulation or Two-way Tabulation:


          When the data are tabulated according to two characteristics at a time. It is said
to be double tabulation or two-way tabulation.
For Example: Tabulation of data on population of world classified by two
characteristics like Religion and Sex is example of double tabulation.

Example:

Popularity of a healthy drink among families having different incomes. Suppose 500
families are contacted and data collected is as follows:

Income per month    Number of children per family                       No. of families
                    0      1      2      3      4      5      More than 5
<1000               5      0      8      9      11     15     25           73
1001-2000           10     5      8      10     13     18     27           91
2001-3000           20     10     12     14     20     22     32           130
3001-4000           12     3      6      7      13     20     30           91
4001-5000           6      2      6      5      10     15     20           64
>5000               6      1      4      5      7      10     18           51
Total               59     21     44     50     74     100    152          500
Note: The above table shows that consumption of a health drink not only depends on
income but also on the number of children per family.

(3) Complex Tabulation:


          When the data are tabulated according to many characteristics, it is said to be
complex tabulation.
For Example: Tabulation of data on population of world classified by two
characteristics like Religion, Sex and Literacy etc…is example of complex tabulation.
Construction of Statistical Table

A statistical table has at least four major parts and some other minor parts.
(1) The Title
(2) The Box Head (column captions)
(3) The Stub (row captions)
(4) The Body
(5) Prefatory Notes
(6) Foots Notes
(7) Source Notes
The general sketch of table indicating its necessary parts is shown below:

                         ---- THE TITLE ----
                         ---- Prefatory Notes ----

                               ---- Box Head ----
 ---- Stub ----                ---- Column Captions ----
 (Row Captions)
 ---- Stub Entries ----        ---- The Body ----

            Foot Notes…
            Source Notes…

 (1) The Title:

          A title is the main heading, written in capitals, shown at the top of the table. It
must explain the contents of the table and throw light on the table as a whole.
Different parts of the heading can be separated by commas; no full stop should be
used in the title.

(2) The Box Head (column captions):

          The vertical headings and subheadings of the columns are called column captions.
The space where these column headings are written is called the box head. Only the first
letter of the box head is capitalized; the remaining words must be written in
small letters.

(3) The Stub (row captions):

            The horizontal headings and subheadings of the rows are called row captions, and
the space where these row headings are written is called the stub.

(4) The Body:


            It is the main part of the table which contains the numerical information


classified with respect to row and column captions.

(5) Prefatory Notes:

            A statement given below the title and enclosed in brackets, usually describing the
units of measurement, is called a prefatory note.

(6) Foot Notes:

            These appear immediately below the body of the table, providing further
additional explanation.

(7) Source Notes:

            The source note is given at the end of the table, indicating the source from
which the information has been taken. It includes information about the compiling agency,
publication etc.

General Rules of Tabulation:

 A table should be simple and attractive. There should be no need for further
explanation (details).
 Proper and clear headings for columns and rows should be provided.
 Suitable approximation may be adopted and figures may be rounded off.
 The unit of measurement should be well defined.
 If the observations are large in number, they can be broken into two or three
tables.
 Thick lines should be used to separate the data under big classes and thin lines to
separate the sub-classes of data.

Summarizing the data:


Before taking up summarizing, the data should be classified into (1) relevant
data and (2) irrelevant data. During the field study, the researcher collects a lot of
data which he may think would be of use. Summarizing the data includes:

(1) Classification of data.


(2) Frequency distribution.
(3) Use of appropriate statistical tool.
Classification of data:

 The process of arranging data into homogeneous groups or classes according to
some common characteristics present in the data is called classification.
For Example: The process of sorting letters in a post office, the letters are
classified according to the cities and further arranged according to streets.

Bases of Classification:
There are four important bases of classification:
(1) Qualitative Base  

(2) Quantitative Base   

 (3) Geographical Base

(4) Chronological or Temporal Base

(1) Qualitative Base:


          When the data are classified according to some quality or attributes such
as sex, religion, literacy, intelligence etc…

(2) Quantitative Base:


          When the data are classified by quantitative characteristics like heights,
weights, ages, income etc…


(3) Geographical Base:


          When the data are classified by geographical regions or location, like
states, provinces, cities, countries etc… 
(4) Chronological or Temporal Base:
          When the data are classified or arranged by their time of occurrence, such
as years, months, weeks, days etc… For Example: Time series data.

Types of Classification:

 (1) One -way Classification:


          If we classify observed data keeping in view single characteristic, this
type of classification is known as one-way classification.
For Example: The population of world may be classified by religion as
Muslim, Christians etc…
 (2) Two -way Classification:
          If we consider two characteristics at a time in order to classify the
observed data then we are doing two way classifications.
For Example: The population of world may be classified by Religion and Sex.
 (3) Multi -way Classification:
          We may consider more than two characteristics at a time to classify given
data or observed data. In this way we deal in multi-way classification.
For Example: The population of world may be classified by Religion, Sex and
Literacy.

Difference between Classification and Tabulation

(1) First the data are classified and then they are presented in tables; classification
and tabulation in fact go together, so classification is the basis for tabulation.

(2) Tabulation is a mechanical function of classification, because in tabulation
the classified data are placed in rows and columns.

(3) Classification is a process of statistical analysis, whereas tabulation is a process of
presenting the data in a suitable form.

Frequency Distribution

 A frequency distribution is a tabular arrangement of data into classes according to
size or magnitude, along with the corresponding class frequencies (the number of values
that fall in each class).

Ungrouped Data or Raw Data:

          Data which have not been arranged in a systematic order are called ungrouped or raw
data.

Grouped Data:
          Data presented in the form of a frequency distribution are called grouped data.

Array:
          The numerical raw data is arranged in ascending or descending order is called an
array.

Example:
            Array the following data in ascending or descending order 6, 4, 13, 7, 10, 16, 19.
Solution:
            Array in ascending order is 4, 6, 7, 10, 13, 16, and 19
            Array in descending order is 19, 16, 13, 10, 7, 6, and 4

Class Limits:

 The variant values of the classes or groups are called the class limits.
 The smaller value of the class is called lower class limit and larger value of the class is
called upper class limit.
 Class limits are also called inclusive classes.

For Example: Let us take the class 10 – 19, the smaller value 10 is lower class limit and
larger value 19 is called upper class limit.

Class Boundaries:

The true values, which describe the actual class limits of a class, are called class
boundaries.
The smaller true value is called the lower class boundary and the larger true value is
called the upper class boundary of the class.
It is important to note that the upper class boundary of a class coincides with the lower

class boundary of the next class.


Class boundaries are also known as exclusive classes.

For Example:

Weights in Kg     No. of Students
60 – 65           8
65 – 70           12
70 – 75           5
Total             25

A student whose weight is between 60 kg and 64.5 kg would be included in the 60 – 65
class. A student whose weight is 65 kg would be included in the next class, 65 – 70.

Open-end Classes:
         

A class has either no lower class limit or no upper class limit in a frequency table is
called an open-end class. We do not like to use open-end classes in practice, because
they create problems in calculation.

For Example:

Weights (Pounds)     No. of Persons
Below 110            6
110 – 120            12
120 – 130            20
130 – 140            10
140 – Above          2

Class Mark or Mid Point:

The class marks or mid point is the mean of lower and upper class limits or boundaries.

So it divides the class into two equal parts.

It is obtained by dividing the sum of lower and upper class limit or class boundaries of a


class by 2.

For Example: The class mark or midpoint of the class 60 – 69 is 60+69/2 = 64.5

Size of Class Interval:

The difference between the upper and lower class boundaries (not between class limits)
of a class or the difference between two successive mid points is called size of class
interval.

Example Construction of Frequency Distribution

          Construct a frequency distribution with a suitable class interval size for the marks
obtained by 50 students of a class, given below:
23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15, 21, 51, 54, 72, 68,
36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48, 50, 41, 57, 65, 54, 43, 56, 44, 30, 46,
67, 53

Solution:
          Arrange the marks in ascending order as
12, 15, 21, 23, 26, 27, 30, 33, 34, 35, 36, 38, 39, 41, 42, 43, 43, 44, 46, 47, 47, 48, 48, 50,
50, 51, 51, 52, 52, 53, 54, 54, 55, 56, 56, 57, 58, 59, 59, 60, 62, 63, 64, 65, 65, 67, 68, 72,
75, 77
            Minimum Value = 12    Maximum Value = 77
            Range = Maximum Value – Minimum Value = 77 – 12 = 65
            Number of Classes = 1 + 3.322 log N
                                           = 1 + 3.322 log 50
                                           = 1 + 3.322 (1.699)
                                           = 6.64, or 7 approximately

            Class Interval Size (h) = Range / Number of Classes = 65/7 = 9.3, or 10

Marks             Class Boundaries     Number of Students
(Class Limits)    (C.B)                (Frequency)
10 – 19           9.5 – 19.5           2
20 – 29           19.5 – 29.5          4
30 – 39           29.5 – 39.5          7
40 – 49           39.5 – 49.5          10
50 – 59           49.5 – 59.5          16
60 – 69           59.5 – 69.5          8
70 – 79           69.5 – 79.5          3
Total                                  50

Note: For finding the class boundaries, we take half of the difference between the lower class
limit of the 2nd class and the upper class limit of the 1st class, (20 – 19)/2 = 0.5. This value is
subtracted from the lower class limit and added to the upper class limit to get the required class
boundaries.
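A short sketch that reproduces the frequency table above from the raw marks; the class
limits follow the worked example, and the counts it prints should match the frequencies
shown.

marks = [23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15, 21,
         51, 54, 72, 68, 36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48, 50, 41,
         57, 65, 54, 43, 56, 44, 30, 46, 67, 53]

classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]

# count how many marks fall within each inclusive class (class limits)
for lower, upper in classes:
    freq = sum(1 for m in marks if lower <= m <= upper)
    print(f"{lower} - {upper}: {freq}")
print("Total:", len(marks))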

Frequency Distribution by Exclusive Method

Class Boundaries (C.B)     Frequency
9.5 – 19.5                 2
19.5 – 29.5                4
29.5 – 39.5                7
39.5 – 49.5                10
49.5 – 59.5                16
59.5 – 69.5                8
69.5 – 79.5                3
Total                      50

Frequency Distribution of Discrete Data

            Discrete data are generated by counting; each and every observation is exact. When
an observation is repeated, it is counted; the number of times the observation is repeated
is called the frequency of that observation. The class limits in discrete data are true class
limits; there are no class boundaries in discrete data.

Example:

          The following are the number of female employees in different branches of
commercial banks. Make a frequency distribution.
2, 4, 6, 1, 3, 5, 3, 7, 8, 6, 4, 7, 4, 4, 2, 1, 3, 6, 4, 2, 5, 7, 9, 1, 2, 10, 1, 8, 9, 2, 3, 1, 2, 3, 4,
4, 4, 6, 6, 5, 5, 4, 5, 8, 5, 4, 3, 3, 2, 5, 0, 5, 9, 9, 8, 10, 0, 4, 10, 10, 1, 1, 2, 2, 1, 8, 6, 9, 10

Solution:

          The variable involved is “the number of female employees”, which is a discrete
variable. The largest and smallest values of the given data are 10 and 0 respectively.

Number of Employees (Classes)     Number of Branches (Frequency)
0                                 2
1                                 8
2                                 9
3                                 7
4                                 11
5                                 8
6                                 6
7                                 3
8                                 5
9                                 5
10                                5
Total                             69

Cumulative Frequency Distribution

          The total frequency of all classes less than the upper class boundary of a given
class is called the cumulative frequency of that class. “A table showing the cumulative
frequencies is called a cumulative frequency distribution”. There are two types of
cumulative frequency distributions.

Less than cumulative frequency distribution:

            It is obtained by adding successively the frequencies of all the previous classes,
including the class against which it is written. The cumulation starts from the lowest class
and proceeds to the highest.

More than cumulative frequency distribution:

            It is obtained by finding the cumulative total of frequencies starting from the highest
class and proceeding to the lowest. The less than and more than cumulative frequency
distributions for the frequency distribution of marks constructed above are:

Class Limits    C.B            Less than C.F               More than C.F
10 – 19         9.5 – 19.5     Less than 19.5: 2           9.5 or more: 50
20 – 29         19.5 – 29.5    Less than 29.5: 6           19.5 or more: 48
30 – 39         29.5 – 39.5    Less than 39.5: 13          29.5 or more: 44
40 – 49         39.5 – 49.5    Less than 49.5: 23          39.5 or more: 37
50 – 59         49.5 – 59.5    Less than 59.5: 39          49.5 or more: 27
60 – 69         59.5 – 69.5    Less than 69.5: 47          59.5 or more: 11
70 – 79         69.5 – 79.5    Less than 79.5: 50          69.5 or more: 3

Frequency distribution simply reports the number of responses that each
question receives. A frequency distribution organizes the data into classes or
groups and shows the number of observations that fall into each class.
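A minimal sketch of both cumulations, using the class frequencies of the marks
example; itertools.accumulate builds the less than column, and the more than column
is the total minus the cumulative sum of the preceding classes.

from itertools import accumulate

freq = [2, 4, 7, 10, 16, 8, 3]             # class frequencies of the marks example
total = sum(freq)

less_than = list(accumulate(freq))                        # 2, 6, 13, 23, 39, 47, 50
more_than = [total - c for c in [0] + less_than[:-1]]     # 50, 48, 44, 37, 27, 11, 3

print("Less than C.F:", less_than)
print("More than C.F:", more_than)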

Diagrams and Graphs of Statistical Data

We have discussed the techniques of classification and tabulation that help us in
organizing the collected data in a meaningful fashion. However, this way of
presenting statistical data does not always prove to be interesting to a layman. Too
many figures are often confusing and fail to convey the message effectively.
One of the most effective and interesting alternative ways in which statistical data may
be presented is through diagrams and graphs. There are several ways in which statistical
data may be displayed pictorially, such as different types of graphs and diagrams. The
commonly used diagrams and graphs, discussed in the subsequent paragraphs, are
given as under:

Types of Diagrams/Charts:

 Simple Bar Chart


 Multiple Bar Chart or Cluster Chart


 Stacked Bar Chart or Sub-Divided Bar Chart or Component Bar Chart


 Simple Component Bar Chart
 Percentage Component Bar Chart
 Sub-Divided Rectangular Bar Chart
 Pie Chart

Types of Graphs:

 Histogram
 Frequency Curve and Polygon
 Lorenz Curve
 Historigram  

Simple Bar Chart


A simple bar chart is used to represent data involving only one variable, classified on a
spatial, quantitative or temporal basis.
In a simple bar chart, we make bars of equal width but variable length, i.e.
the magnitude of a quantity is represented by the height or length of the bar.
The following steps are undertaken in drawing a simple bar diagram:
 Draw two perpendicular lines, one horizontal and the other vertical, at an
appropriate place on the paper.
 Take the basis of classification along the horizontal line (X-axis) and the observed
variable along the vertical line (Y-axis), or vice versa.
 Mark bars of equal breadth for each class and leave equal space, not less than half
the breadth, between two bars.
 Finally mark the values of the given variable to prepare the required bars.

Example:

            Draw a simple bar diagram to represent the profits (in million $) of a bank for 5 years.

[Simple bar chart showing the profit of a bank for 5 years.]

Multiple Bar Charts

In a multiple bar diagram, two or more sets of inter-related data are represented (a multiple
bar diagram facilitates comparison between more than one phenomenon).
The technique of the simple bar chart is used to draw this diagram, but the difference is that
we use different shades, colours or dots to distinguish between the different phenomena.
We draw multiple bar charts if the total of the different phenomena is meaningless.

Example:
           

Draw a multiple bar chart to represent the imports and exports of Canada (values
in $) for the years 1991 to 1995.

[Multiple bar chart showing the imports and exports of Canada from 1991 – 1995.]

Component Bar Chart

            A sub-divided or component bar chart is used to represent data in which the total
magnitude is divided into different parts or components.
In this diagram, we first make simple bars for each class, taking the total magnitude in that
class, and then divide these simple bars into parts in the ratio of the various components.
This type of diagram shows the variation in the different components within each class as
well as between different classes. A sub-divided bar diagram is also known as a
component bar chart or stacked chart.

Example:
            The table below shows the quantity in hundred kgs of Wheat, Barley and Oats
produced on a certain farm during the years 1991 to 1994. Construct a component bar
chart to illustrate these data.

Solution:
            To make the component bar chart, first of all we have to take the year-wise total
production and then divide each year's bar in the ratio of Wheat, Barley and Oats.

[Component bar chart of Wheat, Barley and Oats production, 1991 – 1994.]

Percentage Component Bar Chart

            A sub-divided bar chart may be drawn on a percentage basis. To draw a sub-divided bar
chart on a percentage basis, we express each component as a percentage of its respective
total. In drawing a percentage bar chart, bars of length equal to 100 for each class are drawn in the
first step and sub-divided in the proportion of the percentage of their components in the
second step. The diagram so obtained is called a percentage component bar chart or percentage
stacked bar chart. This type of chart is useful for comparing components while holding the
difference of totals constant.

Example:
            The table below shows the quantity in hundred kgs of Wheat, Barley and Oats
produced on a certain farm during the years 1991 to 1994. Construct a percentage
component bar chart to illustrate these data.

Solution:
            For each year, each crop's production is expressed as a percentage of that year's
total, and the cumulative percentages are then used to sub-divide a bar of length 100.

[Percentage component bar chart of Wheat, Barley and Oats production, 1991 – 1994.]

Pie Chart

 A pie chart can be used to compare the relation between the whole and its
components.
 A pie chart is a circular diagram; the areas of the sectors of a circle are
used to represent the components.
 Circles are drawn with radii proportional to the square root of the
quantities, because the area of a circle is πr².
 To construct a pie chart (sector diagram), we draw a circle with radius
proportional to the square root of the total.
 The total angle of the circle is 360°.
 The angle of each component is calculated by the formula:

              Angle of Sector = (Component Value / Total Value) × 360°

These angles are marked in the circle by means of a protractor to show the
different components. The arrangement of the sectors is usually anti-
clockwise.
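A small sketch of the sector-angle formula above, applied to a made-up family budget;
the item names and the amounts are illustrative only.

budget = {"Food": 190, "Clothing": 64, "House Rent": 100,
          "Fuel and Lighting": 46, "Miscellaneous": 80}      # assumed expenditures ($)

total = sum(budget.values())
for item, value in budget.items():
    angle = value / total * 360          # Angle of Sector = (component / total) x 360 degrees
    print(f"{item}: {angle:.1f} degrees")
print("Total expenditure:", total, "-> the sector angles sum to 360 degrees")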

Example:
           

The following table gives the details of the monthly budget of a family (items of
expenditure: Food, Clothing, House Rent, Fuel and Lighting, and Miscellaneous).
Represent these figures by a suitable diagram.

Solution:
            For each item, Angle of Sector = (Item Expenditure / Total Expenditure) × 360°,
and the cumulative angles are used to mark off the sectors.

[Pie chart of the family budget.]

Usage of statistical tools:

Measures of central tendency:

Some measures of central tendency are,


 Mean
 Median
 Mode
 Harmonic mean
 Geometric mean

Mean:

A measure of central tendency representing the arithmetic average of a set of
observations is called the mean.

Mean = sum of the observations / total number of observations

Formulae for calculating the A.M:

The arithmetic mean of x1, x2, ..., xn can be computed by any of the following methods.

Method's Name              Ungrouped Data                 Grouped Data
Direct Method              x̄ = Σx / n                     x̄ = Σfx / Σf
Indirect or
Short-Cut Method           x̄ = A + Σd / n                 x̄ = A + Σfd / Σf
Method of
Step-Deviation             x̄ = A + (Σu / n) × h           x̄ = A + (Σfu / Σf) × h

Where
             x  indicates the values of the variable.
             n  indicates the number of values of x.
             f  indicates the frequency of the different groups.
             A  indicates the assumed mean.
             d  indicates the deviation from A, i.e. d = x – A.
             u  indicates the step-deviation, u = (x – A)/h, where h is the common divisor,
                i.e. the size of the class or class interval in the case of grouped data.
             Σ  indicates summation or addition.
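A short sketch of the Direct Method for both ungrouped and grouped data; the
ungrouped values and the grouped mid points and frequencies are illustrative
numbers only.

values = [4, 6, 7, 10, 13, 16, 19]                   # ungrouped data (assumed)
mean_ungrouped = sum(values) / len(values)           # x_bar = sum(x) / n

mid_points = [14.5, 24.5, 34.5, 44.5, 54.5]          # grouped data: class mid points (assumed)
freq = [2, 5, 9, 6, 3]                               # corresponding class frequencies (assumed)
mean_grouped = sum(f * x for f, x in zip(freq, mid_points)) / sum(freq)   # x_bar = sum(f*x) / sum(f)

print("Ungrouped mean:", round(mean_ungrouped, 2))
print("Grouped mean:", round(mean_grouped, 2))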

Example (1):
          The one-sided train fares of five selected students are recorded. To find the
arithmetic mean of such ungrouped data, we use x̄ = Σx / n: the five fares are added
and the total is divided by 5.

Example (2):
          Given a frequency distribution of the ages of the first year students of a
particular college, the variable involved is age and the numbers of students are the
frequencies. The arithmetic mean is found as x̄ = Σfx / Σf years.

Example (3):
          Given a frequency distribution of the distance (in km) covered by a number of
persons to perform their routine jobs, the mid point of each distance class is taken as x
and the number of persons as f, and the arithmetic mean is found as x̄ = Σfx / Σf km.

Example (4):
          For the same distance data, the arithmetic mean can also be calculated by the
Step-Deviation Method, x̄ = A + (Σfu / Σf) × h.

Explanation:
            When each mid point is a multiple of the class size and there is a constant gap from
mid point to mid point equal to the class interval (h), we should prefer the Method of
Step-Deviation to the Direct Method, since it reduces the arithmetic to small whole
numbers.

Example (5):
          For a frequency distribution of the marks obtained by students in statistics at a
certain college, the arithmetic mean can be found by all three methods:

(1) Direct Method:            x̄ = Σfx / Σf
(2) Short-Cut Method:         x̄ = A + Σfd / Σf,         where d = x – A
(3) Step-Deviation Method:    x̄ = A + (Σfu / Σf) × h,   where u = (x – A)/h

All three methods give the same value of the mean (in marks).

Weighted Arithmetic Mean


 In the calculation of the arithmetic mean, the importance of all the items was
considered to be equal.
 However, there may be situations in which all the items under
consideration are not of equal importance.
 For example, we may want to find the average number of marks per subject for a
student who appeared in different subjects like Mathematics, Statistics, Physics and
Biology. These subjects do not have equal importance, so the simple arithmetic
mean of the marks will not be a representative average.
 Thus, the arithmetic mean computed by considering the relative importance of
each item is called the weighted arithmetic mean.
 To give due importance to each item under consideration, we assign a
number called a weight to each item in proportion to its relative
importance.
 The weighted arithmetic mean is computed by using the following formula:

            x̄w = Σwx / Σw

Where:
             x̄w stands for the weighted arithmetic mean,
             x  stands for the values of the items and
             w  stands for the weights of the items.

Example 1:
          A student obtained 40, 50, 60, 80, and 45 marks in the subjects of Math,
Statistics, Physics, Chemistry and Biology respectively. Assuming weights 5, 2,
4, 3, and 1 respectively for the above mentioned subjects, find the weighted
arithmetic mean per subject.

Solution:

Subjects       Marks Obtained (x)    Weight (w)    wx
Math           40                    5             200
Statistics     50                    2             100
Physics        60                    4             240
Chemistry      80                    3             240
Biology        45                    1             45
Total                                Σw = 15       Σwx = 825

Now we will find the weighted arithmetic mean as:

            x̄w = Σwx / Σw = 825 / 15 = 55 marks/subject.
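A minimal sketch of the weighted-mean computation above, using the marks and
weights from Example 1; it should reproduce the value of 55 marks per subject.

marks = [40, 50, 60, 80, 45]       # marks in Math, Statistics, Physics, Chemistry, Biology
weights = [5, 2, 4, 3, 1]          # weights assigned to the subjects

weighted_mean = sum(w * x for w, x in zip(weights, marks)) / sum(weights)   # sum(wx) / sum(w)
print(weighted_mean)               # 55.0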
Merits and Demerits of Arithmetic Mean
Merits:

 It is rigidly defined.
 It is easy to calculate and simple to follow.
 It is based on all the observations.
 It is determined for almost every kind of data.
 It is finite and not indefinite.
 It is readily put to algebraic treatment.
 It is least affected by fluctuations of sampling.

Demerits:

 The arithmetic mean is highly affected by extreme values.


 It cannot average the ratios and percentages properly.
 It is not an appropriate average for highly skewed distributions.
 It cannot be computed accurately if any item is missing.
 The mean sometimes does not coincide with any of the observed value. 

Geometric Mean

 It is another measure of central tendency based on a mathematical footing, like the
arithmetic mean.
 The geometric mean can be defined in the following terms:
 "The geometric mean is the nth positive root of the product of n positive given
values."
 Hence, the geometric mean of n values x1, x2, ..., xn is denoted by G.M and given as:

            G.M = (x1 · x2 · ... · xn)^(1/n)                  (For Ungrouped Data)

 If we have a series of k positive values with repeated values, such that
x1, x2, ..., xk are repeated f1, f2, ..., fk times respectively, then the geometric
mean becomes:

            G.M = (x1^f1 · x2^f2 · ... · xk^fk)^(1/n)         (For Grouped Data)

            where n = f1 + f2 + ... + fk = Σf.

Example 4:
            Find the Geometric Mean of the values 10, 5, 15, 8, 12.

Solution:

            Here n = 5 and the product of the values is 10 × 5 × 15 × 8 × 12 = 72000.

            G.M = (72000)^(1/5) = 9.36

Example 6:
            For grouped data, the geometric mean is found in the same way: each value is
raised to the power of its frequency, the results are multiplied together, and the nth root
is taken, where n = Σf, using the grouped-data formula above.

            The method explained above for the calculation of the geometric mean is useful
when the number of values in the given data is small and the facility of an
electronic calculator is available. When a set of data contains a large number of values,
we need an alternative way of computing the geometric mean. The modified or
alternative way of computing the geometric mean is given as under:

            G.M = antilog( Σ log x / n )                  (For Ungrouped Data)

            G.M = antilog( Σ f·log x / Σf )               (For Grouped Data)

Example 7:  Find the Geometric Mean of the values 10, 5, 15, 8, 12.

x          log x
10         1.0000
5          0.6990
15         1.1761
8          0.9031
12         1.0792
Total      Σ log x = 4.8574

            G.M = antilog( 4.8574 / 5 ) = antilog( 0.9715 ) = 9.36

Example 8:
            For a frequency distribution of students' marks, the mid point of each class is taken
as x and the number of students as f, and the geometric mean is found as
G.M = antilog( Σ f·log x / Σf ).
 
Harmonic Mean

 The harmonic mean is another measure of central tendency, also based on a
mathematical footing, like the arithmetic mean and the geometric mean.
 Like the arithmetic mean and geometric mean, the harmonic mean is also
useful for quantitative data.
 The harmonic mean is defined in the following terms:
The harmonic mean is the quotient of the "number of the given values" and the
"sum of the reciprocals of the given values".

The harmonic mean in mathematical terms is defined as follows:

            H.M = n / Σ(1/x)                  (For Ungrouped Data)

            H.M = Σf / Σ(f/x)                 (For Grouped Data)

Example 9:
            Calculate the harmonic mean of the numbers: 13.5, 14.5, 14.8, 15.2 and 16.1.
Solution:
            The harmonic mean is calculated as below:

x          1/x
13.5       0.0741
14.5       0.0690
14.8       0.0676
15.2       0.0658
16.1       0.0621
Total      Σ(1/x) = 0.3386

            H.M = n / Σ(1/x) = 5 / 0.3386 = 14.77

Example 10:
          Given a frequency distribution of the ages of the first year students of a particular
college, the variable involved is age and the numbers of students are the frequencies.
The harmonic mean is found as H.M = Σf / Σ(f/x) years.

Example 11:
            For a frequency distribution of marks, the mid point of each class is taken as x and
the corresponding frequency as f, and the harmonic mean is found as H.M = Σf / Σ(f/x).
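A short sketch of both the geometric and the harmonic mean for ungrouped data, using
the five values of Examples 4, 7 and 9; it should reproduce roughly 9.36 and 14.77.

import math

values_gm = [10, 5, 15, 8, 12]                     # values from Examples 4 and 7
n = len(values_gm)
gm = math.exp(sum(math.log(x) for x in values_gm) / n)   # nth root of the product, via logs
print("Geometric mean:", round(gm, 2))             # 9.36

values_hm = [13.5, 14.5, 14.8, 15.2, 16.1]         # values from Example 9
hm = len(values_hm) / sum(1 / x for x in values_hm)      # n / sum of reciprocals
print("Harmonic mean:", round(hm, 2))              # 14.77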

Mode:

 The mode is the value that occurs most often.


 In some fields, notably education, sample data are often called scores,
and the sample mode is known as the modal score.
 The mode is a way of capturing important information about a random
variable or a population in a single quantity.
 The mode of a data sample is the element that occurs most often in the
collection.

The following MATLAB code example computes the mode of a sample:

X = sort(x);                                  % x is assumed to be a column vector of sample values
indices = find(diff([X; realmax]) > 0);       % indices where runs of repeated values end
[modeL, i] = max(diff([0; indices]));         % modeL = length of the longest run of repeated values
modeValue = X(indices(i));                    % the most frequently occurring value (the mode)

Examples:

Example 1:   The following is the number of


problems that Ms. Matty assigned for
homework on 10 different days. What is
the mode?
  8,  11,  9,  14,  9,  15,  18,  6,  9,  10

Solution:   Ordering the data from least to greatest,


we get:
  6,  8,  9,  9,  9,  10,  11  14,  15,  18

Answer:   The mode is 9.


Example 2:   In a crash test, 11 cars were tested to


determine what impact speed was
required to obtain minimal bumper
damage. Find the mode of the speeds
given in miles per hour below.
  24,  15,  18,  20,  18,  22,  24,  26,  18, 
26,  24

Solution:   Ordering the data from least to greatest,


we get:
  15,  18,  18,  18,  20,  22,  24,  24,  24, 
26,  26

Answer:   Since both 18 and 24 occur three times, the modes are 18
and 24 miles per hour. This data set is bimodal.

Example 3:   A marathon race was completed by


5 participants. What is the mode of
these times given in hours?
  2.7 hr,  8.3 hr,  3.5 hr,  5.1 hr,  4.9
hr

Solution:   Ordering the data from least to


greatest, we get:
  2.7,  3.5,  4.9,  5.1,  8.3

Answer:   Since each value occurs only once in the data set, there is
no mode for this set of data.

Median

 The Median is the "middle number" (in a sorted list of numbers).

 Half the numbers in the list will be less, and half the numbers will be greater.

How to Find the Median Value

 To find the Median, place the numbers you are given in value order and find
the middle number.

Examples:

 Look at these numbers:

3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29

 If we put those numbers in order we have:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56

 There are fifteen numbers. Our middle number will be the eighth number:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56

The median value of this set of numbers is 23.

And you can see that "half the numbers in the list are less, and half the numbers
are greater."

(Note that it didn't matter if we had some numbers the same in the list)

Two Numbers in the Middle

 BUT, if there is an even number of values, things are slightly different.

 In that case we need to find the middle pair of numbers, and then find the value
that would be half way between them.

 This is easily done by adding them together and dividing by two.


An example will help:

3, 13, 7, 5, 21, 23, 23, 40, 23, 14, 12, 56, 23, 29

If we put those numbers in order we have:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, and 56

 There are now fourteen numbers and so we don't have just one middle number,
we have a pair of middle numbers:

3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56

 In this example the middle numbers are 21 and 23.


 To find the value half-way between them, add them together and divide by 2:

21 + 23 = 44
44 ÷ 2 = 22

And, so, the Median in this example is 22.
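The rule just described (take the middle value for an odd count, or half way between the middle pair for an even count) can be written as a short MATLAB sketch (illustrative only):

% Median of a list of numbers (illustrative sketch)
x = [3 13 7 5 21 23 23 40 23 14 12 56 23 29];   % the fourteen numbers above
X = sort(x);                                    % place the numbers in value order
n = numel(X);
if mod(n, 2) == 1
    med = X((n + 1) / 2);                       % odd count: the single middle number
else
    med = (X(n/2) + X(n/2 + 1)) / 2;            % even count: half way between the middle pair
end
fprintf('Median = %g\n', med);                  % prints 22 for this list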

Type              Description                                                                Example                 Result
Arithmetic mean   Total sum divided by quantity of integers                                  (1+2+2+3+4+7+9) / 7     4
Median            Middle value that separates the greater and lesser halves of a data set    1, 2, 2, 3, 4, 7, 9     3
Mode              Most frequent number in a data set                                         1, 2, 2, 3, 4, 7, 9     2

Measures of dispersion:

            A modern student of statistics is mainly interested in the study of variability and uncertainty. In this section we shall discuss variability and its measures; uncertainty will be discussed in probability. We live in a changing world. Changes are taking place in every sphere of life. A man of statistics does not show much interest in those things which are constant. The total area of the earth may not be very
important to a research minded person, but the area under different crops, the area covered by forests, and the area covered by residential and commercial buildings are figures of great importance, because these figures keep on changing from time to time and from place to place. A very large number of experts are engaged in the study of changing phenomena. Experts working in different countries of the world keep a watch on forces which are responsible for bringing changes in the fields of human interest. The agricultural, industrial and mineral production and their transportation from one part of the world to another are matters of great interest to economists, statisticians, and other experts. The changes in human population, the changes in the standard of living, the changes in literacy rate and the changes in prices attract the experts to make detailed studies about them and then to correlate these changes with human life. Thus variability or variation is something connected with human life, and its study is very important for mankind.

Dispersion:
            The word dispersion has a technical meaning in statistics. The average measures the centre of the data; that is one aspect of the observations. Another feature of the observations is how they are spread about the centre. The observations may be close to the centre or they may be spread away from it. If the observations are close to the centre (usually the arithmetic mean or median), we say that dispersion, scatter or variation is small. If the observations are spread away
from the centre, we say dispersion is large. Suppose we have three groups of students
who have obtained the following marks in a test. The arithmetic means of the three
groups are also given below:

Group A: 46, 48, 50, 52, 54        Mean = 50

Group B: 30, 40, 50, 60, 70        Mean = 50

Group C: 40, 50, 60, 70, 80        Mean = 60

In groups A and B the arithmetic means are equal, i.e. 50. But in group A the observations are concentrated about the centre. All students of group A have almost the same level of performance. We say that there is consistency in the observations in group A. In group B the mean is also 50, but the observations are not close to the centre.
One observation is as small as 30 and one observation is as large as 70. Thus there is
greater dispersion in group B. In group C the mean is 60 but the spread of the
observations with respect to the centre 60 is the same as the spread of the
observations in group B with respect to their own centre which is 50. Thus in group B
and C the means are different but their dispersion is the same. In group A and C the
means are different and their dispersions are also different. Dispersion is an important
feature of the observations and it is measured with the help of the measures of
dispersion, scatter or variation. The word variability is also used for this idea of
dispersion.
            The study of dispersion is very important in statistical data. If in a certain factory there is consistency in the wages of workers, the workers will be satisfied. But if some workers have high wages and some have low wages, there will be unrest among the low paid workers and they might go on strike and arrange demonstrations. If in a certain country some people are very poor and some are very rich, we say there is economic disparity. It means that dispersion is large. The idea of dispersion is important in the study of wages of workers, prices of commodities, standard of living of different people, distribution of wealth, distribution of land among farmers and various other fields of life. Some brief
definitions of dispersion are:

 The degree to which numerical data tend to spread about an average value is
called the dispersion or variation of the data.
 Dispersion or variation may be defined as a statistics signifying the extent of
the scatteredness of items around a measure of central tendency.
 Dispersion or variation is the measurement of the scatter of the size of the
items of a series about the average.

For the study of dispersion, we need some measures which show whether the
dispersion is small or large. There are two types of measure of dispersion which are:
(a) Absolute Measure of Dispersion
(b) Relative Measure of Dispersion

Absolute Measures of Dispersion:


            These measures give us an idea about the amount of dispersion in a set of
observations. They give the answers in the same units as the units of the original
observations. When the observations are in kilograms, the absolute measure is also in
kilograms. If we have two sets of observations, we cannot always use the absolute
measures to compare their dispersion. We shall explain later as to when the absolute
measures can be used for comparison of dispersion in two or more than two sets of
data. The absolute measures which are commonly used are:

 The Range


 The Quartile Deviation


 The Mean Deviation
 The Standard deviation and Variance

Relative Measure of Dispersion:


            These measures are calculated for the comparison of dispersion in two or
more than two sets of observations. These measures are free of the units in which the
original data is measured. If the original data is in dollar or kilometres, we do not use
these units with relative measure of dispersion. These measures are a sort of ratio and
are called coefficients. Each absolute measure of dispersion can be converted into its
relative measure. Thus the relative measures of dispersion are:

 Coefficient of Range or Coefficient of Dispersion.


 Coefficient of Quartile Deviation or Quartile Coefficient of Dispersion.
 Coefficient of Mean Deviation or Mean Deviation of Dispersion.
 Coefficient of Standard Deviation or Standard Coefficient of Dispersion.
 Coefficient of Variation (a special case of Standard Coefficient of Dispersion)  

Range and Coefficient of Range

The Range:
            Range is defined as the difference between the maximum and the minimum observation of the given data. If Xmax denotes the maximum observation and Xmin denotes the minimum observation, then the range is defined as:

            Range = Xmax − Xmin
            In case of grouped data, the range is the difference between the upper
boundary of the highest class and the lower boundary of the lowest class     . It is also
calculated by using the difference between the mid points of the highest class and the

lowest class. It is the simplest measure of dispersion. It gives a general idea about the
total spread of the observations. It does not enjoy any prominent place in statistical
theory. But it has its application and utility in quality control methods which are used
to maintain the quality of the products produced in factories. The quality of products
is to be kept within certain range of values.
            The range is based on the two extreme observations. It gives no weight to the
central values of the data. It is a poor measure of dispersion and does not give a good
picture of the overall spread of the observations with respect to the center of the
observations. Let us consider three groups of the data which have the same range:

Group A:   30, 40, 40, 40, 40, 40, 50


Group B:   30, 30, 30, 40, 50, 50, 50
Group C:   30, 35, 40, 40, 40, 45, 50

            In all the three groups the range is 50 – 30 = 20. In group A there is
concentration of observations in the centre. In group B the observations are concentrated at the extreme ends, and in group C the observations are almost equally
distributed in the interval from 30 to 50. The range fails to explain these differences
in the three groups of data. This defect in range cannot be removed even if we
calculate the coefficient of range which is a relative measure of dispersion. If we
calculate the range of a sample, we cannot draw any inferences about the range of the
population.

Coefficient of Range:
            It is relative measure of dispersion and is based on the value of range. It is
also called range coefficient of dispersion. It is defined as:

            Coefficient of Range = (Xmax − Xmin) / (Xmax + Xmin)

The range is thus standardized by dividing it by the total of the two extreme values.


            Let us take two sets of observations. Set A contains marks of five students in
Mathematics out of 25 marks and group B contains marks of the same student in
English out of 100 marks.

Set A:   10, 15, 18, 20, 20


Set B:   30, 35, 40, 45, 50
The values of range and coefficient of range are calculated as:

                              Range              Coefficient of Range

Set A: (Mathematics)          20 − 10 = 10       10 / 30 = 0.33

Set B: (English)              50 − 30 = 20       20 / 80 = 0.25

            In set A the range is 10 and in set B the range is 20. Apparently it seems as if
there is greater dispersion in set B. But this is not true. The range of 20 in set B is for
large observations and the range of 10 in set A is for small observations. Thus 20 and
10 cannot be compared directly. Their base is not the same. Marks in Mathematics
are out of 25 and marks of English are out of 100. Thus, it makes no sense to compare
10 with 20. When we convert these two values into coefficient of range, we see that
coefficient of range for set A is greater than that of set B. Thus there is greater
dispersion or variation in set A. The marks of students in English are more stable than
their marks in Mathematics.

Example 1:
            Following are the wages of 8 workers of a factory. Find the range and the
coefficient of range. Wages in ($) 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440.
Solution:

            Here Largest value = 1575 and Smallest value = 1380

            Range = 1575 − 1380 = 195

            Coefficient of Range = (1575 − 1380) / (1575 + 1380) = 195 / 2955 = 0.066
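The same arithmetic can be reproduced with a brief MATLAB sketch (illustrative only):

% Range and coefficient of range for the wages of Example 1 (illustrative sketch)
wages = [1400 1450 1520 1380 1485 1495 1575 1440];
xmax  = max(wages);                          % largest value, 1575
xmin  = min(wages);                          % smallest value, 1380
r     = xmax - xmin;                         % range = 195
coeff = (xmax - xmin) / (xmax + xmin);       % coefficient of range = 195 / 2955
fprintf('Range = %d, Coefficient of range = %.3f\n', r, coeff);   % 195 and 0.066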

Example 2:
            The following distribution gives the numbers of houses and the number of
persons per house.

Number of
Persons
Number of
Houses
Calculate the range and coefficient of range.
Solution:

            Here Largest value  and Smallest Value

            Range

            Coefficient of Range

Example 3:           

Find the range of the weight of the students of a university.

Weights (Kg)
Number of Students
Calculate the range and coefficient of range.
Solution:
           

Weights (Kg) Class Boundaries Mid Value No. of Students


Solution:
      Method 1:

            Here     Upper class boundary of the highest class

                         Lower class boundary of the lowest class

            Range  Kilogram

            Coefficient of Range

Method 2:

            Here    Mid value of the highest class

                        Mid value of the lowest class

            Range  Kilogram

            Coefficient of Range

Quartile Deviation and its Coefficient

Quartile Deviation:

          It is based on the lower quartile Q1 and the upper quartile Q3. The difference Q3 − Q1 is called the inter-quartile range. The difference Q3 − Q1 divided by 2 is called the semi-inter-quartile range or the quartile deviation. Thus

            Quartile Deviation (Q.D) = (Q3 − Q1) / 2


The quartile deviation is a slightly better measure of absolute dispersion than the
range. But it ignores the observations in the tails. If we take different samples from a population and calculate their quartile deviations, their values are quite likely to be sufficiently different. This is called sampling fluctuation. It is not a popular measure of
dispersion. The quartile deviation calculated from the sample data does not help us to
draw any conclusion (inference) about the quartile deviation in the population.

Coefficient of Quartile Deviation:


            A relative measure of dispersion based on the quartile deviation is called the coefficient of quartile deviation. It is defined as:

            Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1)
It is pure number free of any units of measurement. It can be used for comparing the
dispersion in two or more than two sets of data.
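A small MATLAB sketch can compute both measures for raw data. Note that textbooks use slightly different conventions for locating Q1 and Q3 in a sample; the sketch below (illustrative only) uses the (n + 1)/4 and 3(n + 1)/4 positions with linear interpolation, which is one common convention:

% Quartile deviation and its coefficient (illustrative sketch)
x  = [1120 1240 1320 1040 1080 1200 1440 1360 1680 1730 ...
      1785 1342 1960 1880 1755 1720 1600 1470 1750 1885];   % wheat production data of Example 1 below
xs = sort(x);
n  = numel(xs);
p1 = (n + 1) / 4;          p3 = 3 * (n + 1) / 4;             % positions of Q1 and Q3
q1 = xs(floor(p1)) + (p1 - floor(p1)) * (xs(ceil(p1)) - xs(floor(p1)));   % interpolated lower quartile
q3 = xs(floor(p3)) + (p3 - floor(p3)) * (xs(ceil(p3)) - xs(floor(p3)));   % interpolated upper quartile
qd    = (q3 - q1) / 2;                                       % quartile deviation
coeff = (q3 - q1) / (q3 + q1);                               % coefficient of quartile deviation
fprintf('Q.D = %.2f, Coefficient = %.3f\n', qd, coeff);

With a different quartile convention the numerical answers change slightly, but the two formulas themselves stay the same.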

Example 1:
            The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040,
1080, 1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470,
1750, and 1885. Find the quartile deviation and coefficient of quartile deviation.

Solution:
            After arranging the observations in ascending order, we get


1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730,
1750, 1755, 1785, 1880, 1885, 1960.

           

                 

                 

                 

           

           

                 

                  

                 

           

             Quartile Deviation (Q.D)

Coefficient of Quartile Deviation

Example 2:
            Calculate the quartile deviation and coefficient of quartile deviation from the


data given below:

Maximum Load
Number of Cables
(short-tons)

Solution:
            The necessary calculations are given below:

Maximum Load Number of Cables Class Cumulative


(short-tons) Boundaries Frequencies

           

             lies in the class


                   

            Where , , ,  and

                   

           

             lies in the class

                   

            Where , , ,  and

                   

            Quartile Deviation (Q.D)

            Coefficient of Quartile Deviation

                                                                 

Mean Deviation and its Coefficient

The Mean Deviation:


          The mean deviation or the average deviation is defined as the mean of the absolute deviations of observations from some suitable average, which may be the arithmetic mean, the median or the mode. The difference (x − x̄) is called a deviation, and when we ignore the negative sign this deviation is written as |x − x̄| and is read as "mod deviation". The mean of these mod or absolute deviations is called the mean deviation or the mean absolute deviation. Thus, for sample data in which the suitable average is the arithmetic mean x̄, the mean deviation (M.D) is given by the relation:

            M.D = Σ|x − x̄| / n

For a frequency distribution, the mean deviation is given by

            M.D = Σf|x − x̄| / Σf

            When the mean deviation is calculated about the median, the formula becomes

            M.D (about median) = Σf|x − median| / Σf

            The mean deviation about the mode is

            M.D (about mode) = Σf|x − mode| / Σf

            For population data the mean deviation about the population mean μ is

            M.D = Σ|x − μ| / N
            The mean deviation is a better measure of absolute dispersion than the range
and the quartile deviation.


            A drawback of the mean deviation is that we use the absolute deviations |x − x̄|, which does not seem logical. The reason for this is that Σ(x − x̄) is always equal to zero. Even if we use the median or the mode in place of x̄, the summation Σ(x − median) or Σ(x − mode) will be zero or approximately zero, so that a mean deviation computed from signed deviations would always be either zero or close to zero. Thus the very definition of the mean deviation is possible only with the absolute deviations.
            The mean deviation is based on all the observations, a property which is not
possessed by the range and the quartile deviation. The formula of the mean deviation
gives the mathematical impression that it is a better way of measuring the variation in the data. Any suitable average among the mean, median or mode can be used in its calculation, but the value of the mean deviation is minimum if the deviations are taken from the median. A serious drawback of the mean deviation is that it cannot be used in
statistical inference.
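The three versions of the mean deviation can be compared with a short MATLAB sketch (illustrative only), using the marks of Example 1 below:

% Mean deviation about the mean, the median and the mode (illustrative sketch)
x = [7 4 10 9 15 12 7 9 7];                      % marks of nine students
md_mean   = mean(abs(x - mean(x)));              % mean deviation from the arithmetic mean
md_median = mean(abs(x - median(x)));            % mean deviation from the median
md_mode   = mean(abs(x - mode(x)));              % mean deviation from the mode
fprintf('M.D about mean %.2f, median %.2f, mode %.2f\n', md_mean, md_median, md_mode);
% The value about the median (about 2.33) is the smallest of the three.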

Coefficient of the Mean Deviation:


          A relative measure of dispersion based on the mean deviation is called the coefficient of the mean deviation or the coefficient of dispersion. It is defined as the ratio of the mean deviation to the average used in the calculation of the mean deviation. Thus

            Coefficient of Mean Deviation = Mean Deviation / (Mean, Median or Mode used)


Example 1:
            Calculate the mean deviation form (1) arithmetic mean (2) median (3) mode in
respect of the marks obtained by nine students gives below and show that the mean
deviation from median is minimum.
            Marks (out of 25): 7, 4, 10, 9, 15, 12, 7, 9, 7
Solution:
            After arranging the observations in ascending order, we get
            Marks: 4, 7, 7, 7, 9, 9, 10, 12, 15

            Mean = (4 + 7 + 7 + 7 + 9 + 9 + 10 + 12 + 15) / 9 = 80 / 9 = 8.89

            Median = value of the (9 + 1)/2 th item = 5th item = 9

            Mode = 7            (Since 7 is repeated the maximum number of times)

            Marks (x)      |x − mean|      |x − median|      |x − mode|
            4              4.89            5                 3
            7              1.89            2                 0
            7              1.89            2                 0
            7              1.89            2                 0
            9              0.11            0                 2
            9              0.11            0                 2
            10             1.11            1                 3
            12             3.11            3                 5
            15             6.11            6                 8
            Total          21.11           21                23

            M.D from mean   = 21.11 / 9 = 2.35

            M.D from median = 21 / 9 = 2.33

            M.D from mode   = 23 / 9 = 2.56
            From the above calculations, it is clear that the mean deviation from the median
has the least value.

Example 2:
            Calculate the mean deviation from mean and its coefficients from the following
data.

Size of Items
Frequency

Solution:
            The necessary calculation is given below:

Size of
Items


Total    

           

           

           
Standard Deviation

          The standard deviation is defined as the positive square root of the mean of
the square deviations taken from arithmetic mean of the data.
            For sample data the standard deviation is denoted by s and is defined as:

            s = sqrt( Σ(x − x̄)² / n )

            For population data the standard deviation is denoted by σ (sigma) and is defined as:

            σ = sqrt( Σ(x − μ)² / N )

            For a frequency distribution the formulas become:

            s = sqrt( Σf(x − x̄)² / Σf )        or        σ = sqrt( Σf(x − μ)² / Σf )
            The standard deviation is in the same units as the units of the original
observations. If the original observations are in grams, the value of the standard
deviation will also be in grams.


            The standard deviation plays a dominating role in the study of variation in the data. It is a very widely used measure of dispersion. It stands like a tower among the measures of dispersion. As far as the important statistical tools are concerned, the first important tool is the mean x̄ and the second important tool is the standard deviation s. It is based on all the observations and is subject to mathematical treatment. It is of great importance for the analysis of data and for various statistical inferences.
            However, some alternative methods are also available to compute the standard deviation. The alternative methods simplify the computation. In discussing these methods we will confine ourselves to sample data, because a statistician mostly deals with sample data rather than the whole population.

Actual Mean Method:


            In applying this method, first of all we compute the arithmetic mean of the given data (either ungrouped or grouped). Then we take the deviations from the actual mean. This method is already defined above. The following formulas are applied:

            For Ungrouped Data:   s = sqrt( Σ(x − x̄)² / n )

            For Grouped Data:     s = sqrt( Σf(x − x̄)² / Σf )

            This method is also known as the direct method.

Assumed Mean Method:


            (a) We use the following formulas to calculate the standard deviation:

            For Ungrouped Data:   s = sqrt( ΣD²/n − (ΣD/n)² )

            For Grouped Data:     s = sqrt( ΣfD²/Σf − (ΣfD/Σf)² )

Where D = x − A and A is any assumed mean other than zero. This method is also known as the short-cut method.

            (b) If A is considered to be zero, then the above formulas reduce to the following:

            For Ungrouped Data:   s = sqrt( Σx²/n − (Σx/n)² )

            For Grouped Data:     s = sqrt( Σfx²/Σf − (Σfx/Σf)² )

            (c) If we are in a position to simplify the calculation by taking some common factor or divisor h from the given data, the formulas for computing the standard deviation are:

            For Ungrouped Data:   s = h × sqrt( Σu²/n − (Σu/n)² )

            For Grouped Data:     s = h × sqrt( Σfu²/Σf − (Σfu/Σf)² )

Where u = (x − A)/h and h is the Class Interval or Common Divisor. This method is also called the method of step-deviation.
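The direct method and the short-cut method can be checked against each other with a short MATLAB sketch (illustrative only; the divisor n is used, following the definition above, whereas MATLAB's built-in std uses n − 1 by default):

% Standard deviation by the direct and the short-cut (assumed mean) methods (illustrative sketch)
x = [2 4 8 6 10 12];                             % sample values of Example 1 below
n = numel(x);
s_direct = sqrt(sum((x - mean(x)).^2) / n);      % Method I: deviations from the actual mean
A = 6;                                           % an arbitrary assumed mean (any value works)
D = x - A;                                       % deviations from the assumed mean
s_shortcut = sqrt(sum(D.^2)/n - (sum(D)/n)^2);   % Method II: short-cut formula
fprintf('s (direct) = %.2f, s (short-cut) = %.2f\n', s_direct, s_shortcut);   % both about 3.42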

Examples of Standard Deviation


Example 1:
            Calculate the standard deviation for the following sample data using all methods: 2, 4, 8, 6, 10, and 12.


Solution:
            Method-I: Actual Mean Method

            x:          2     4     8     6     10    12        Σx = 42,   x̄ = 42/6 = 7
            x − x̄:     −5    −3     1    −1     3     5
            (x − x̄)²:  25     9     1     1     9    25        Σ(x − x̄)² = 70

            s = sqrt(70 / 6) = sqrt(11.67) = 3.42

Method-II: Taking Assumed Mean as
 

Total


 Method-III: Taking Assume Mean as Zero

Method-IV: Taking as common divisor or factor

Total

Example 2:


            Calculate standard deviation from the following distribution of marks by using all the
methods.

Marks No. of Students

Solution:
            Method-I: Actual Mean Method

Marks

Total    

           

             Marks

            Method-II: Taking assumed mean as  

Marks


Total    

           

             Marks

            Method-III: Using assumed mean as Zero

Marks

Total  

           

             Marks

            Method-IV: By taking  as the common divisor

Marks

Total    

           
 Mark


Coefficient of Standard Deviation and Variation

Coefficient of Standard Deviation:


          The standard deviation is the absolute measure of dispersion. Its relative
measure is called standard coefficient of dispersion or coefficient of standard
deviation. It is defined as:

            Coefficient of Standard Deviation = s / x̄

Coefficient of Variation:
          The most important of all the relative measure of dispersion is the coefficient of
variation. This word is variation not variance. There is no such thing as coefficient of

variance. The coefficient of variation is defined as:

            Coefficient of Variation (C.V) = (s / x̄) × 100

            Thus C.V is the value of the standard deviation when the mean is assumed equal to 100. It is a pure number and the unit of observations is not mentioned with its value. It is written in percentage form like 20% or 25%. When its value is 20%, it means that when the mean of the observations is assumed equal to 100, their standard deviation will be 20. The C.V is used to compare the dispersion in different sets of data, particularly data which differ in their means or differ in the units of measurement. The wages of workers may be in dollars and the consumption of meat in their families may be in kilograms. The standard deviation of wages in dollars cannot be compared with the standard deviation of amounts of meat in kilograms. Both standard deviations need to be converted into coefficients of variation for comparison. Suppose the value of C.V for wages is 10% and the value of C.V for kilograms of meat is 25%. This means that the wages of workers are consistent. Their wages are close to the overall


average of their wages. But the families consume meat in quite different quantities.
Some families use very small quantities of meat and some others use large quantities
of meat. We say that there is greater variation in their consumption of meat. The
observations about the quantity of meat are more dispersed or more variant.
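Because the coefficient of variation is simply the standard deviation expressed as a percentage of the mean, it takes one extra line once the mean and standard deviation are known. A minimal MATLAB sketch (illustrative only, using the divisor n as defined above):

% Coefficient of standard deviation and coefficient of variation (illustrative sketch)
x  = [2 4 8 6 10 12];                         % sample values of Example 1 below
n  = numel(x);
xb = mean(x);                                 % arithmetic mean
s  = sqrt(sum((x - xb).^2) / n);              % standard deviation
coeff_sd = s / xb;                            % coefficient of standard deviation
cv = 100 * s / xb;                            % coefficient of variation, in percent
fprintf('s/mean = %.2f, C.V = %.1f%%\n', coeff_sd, cv);   % about 0.49 and 48.8%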

Example 1:
            Calculate the coefficient of standard deviation and coefficient of variation for
the following sample data: 2, 4, 8, 6, 10, and 12.

Solution:
            From Example 1 of the standard deviation section, x̄ = 7 and s = 3.42 for these values.

            Coefficient of Standard Deviation = s / x̄ = 3.42 / 7 = 0.49

            Coefficient of Variation = (s / x̄) × 100 = 48.8%


Example 2:
            Calculate coefficient of standard deviation and coefficient of variation from
the following distribution of marks:

Marks No. of Students

Solution:

Marks

Total    

           

             Marks

            Coefficient of Standard Deviation  

            Coefficient of Variation


Variance:

Variance is another absolute measure of dispersion.


It is defined as the average of the squared difference between each of the
observations in a set of data and the mean.

For sample data the variance is denoted by s² and the population variance is denoted by σ² (sigma square).

The sample variance has the formula:

            s² = Σ(x − x̄)² / n

Where x̄ is the sample mean and n is the number of observations in the sample.

The population variance is defined as:

            σ² = Σ(x − μ)² / N

Where μ is the mean of the population and N is the number of observations in the data.

It may be remembered that the population variance is usually not calculated.

The sample variance is calculated and if need be, this is used to make inference
about the population variance.

The term (x − x̄)² is always positive, therefore the variance is always positive.


If the original observations are in centimeter, the value of the variance will be
(centimeter)2.

Thus the unit of the variance is the square of the unit of the original measurement.


For a frequency distribution the sample variance is defined as:

            s² = Σf(x − x̄)² / Σf

For a frequency distribution the population variance is defined as:

            σ² = Σf(x − μ)² / Σf

In simple words we can say that the variance is the square of the standard deviation:

            Variance = (Standard Deviation)²

Example 1:

      Calculate the variance for the following sample data: 2, 4, 8, 6, 10, and 12.

Solution:

            x̄ = (2 + 4 + 8 + 6 + 10 + 12) / 6 = 42 / 6 = 7,        Σ(x − x̄)² = 25 + 9 + 1 + 1 + 9 + 25 = 70

            s² = 70 / 6 = 11.67

Example 2:
            Calculate variance from the following distribution of marks:

No. of Students
Marks

Solution:

Marks

Total

           

           


Introduction to Regression and Correlation


          Statistical methods discussed so far are used to analyze the data involving only
one variable. Often an analysis of data concerning two or more variables is needed to
look for any statistical relationship or association between them. A few instances where the knowledge of an association or relationship between two variables would be vital for making decisions are:

Family income and expenditure on luxury items.


Sales revenue and expenses incurred on advertising.
Yield of a crop and quantity of fertilizer applied.

Following aspects are considered in examining the statistical relationship between two
or more variables.

Is there an association between two or more variables? If yes, what is the form
and degree of that relationship?
Is the relationship strong or significant enough to arrive at a desirable
conclusion?
Can the relationship be used for predictive purpose, that is, to predict the most
likely value of a dependent variable corresponding to the given value of independent
variable or variables?

There are two different techniques which are used for the study of two or more than
two variables. These are regression and correlation. Both study the behaviour of the
variables but they differ in their end results. Regression studies the relationship where
dependence is necessarily involved. One variable has the dependence on a certain


number of variables. Regression can be used for predicting the values of the variable
which depends upon other variables. The term regression was introduced by the
English biometrician, Sir Francis Galton (1822 - 1911). Correlation attempts to study
the strength of the mutual relationship between two variables. In correlation we assume
that the variables are random and dependence of any nature is not involved.

Linear Model
          Regression involves the study of equations. First we talk about some simple
equations or linear models. The simplest mathematical model or equation is the
equation of straight line.

Example: Suppose a shop keeper is selling pencils. He sells one pencil for 2 cents.
Table as shown gives the number of pencils sold and the sale price of the pencils.

Number of pencils sold


Sales Prices (Cents)
           
            Let us examine the two variables given in the table. For the sake of convenience, we can give some names to the variables given in the table. Let X denote the number of pencils sold and S (S for sale) denote the amount realized by selling X pencils.

            The information written above can be presented in some other forms as well. For example we can write an equation describing the above relation between X and S. It is very simple to write the equation. Since each pencil is sold for 2 cents, the algebraic equation connecting X and S is S = 2X.
            It is called a mathematical equation or mathematical model in which S depends upon X. Here X is called the independent variable and S is called the dependent variable. For a given number of pencils X, the sale S is determined exactly, neither less nor more. The above model is called a deterministic mathematical model because we can determine the value of S without any error by putting the value of X in the equation. The sale S is said to be a function of X. This statement in symbolic form is written as: S = f(X).


            It is read as “S is a function of X”. It means that S depends upon X and only X and no other element. The data in the table can be presented in the form of a graph as shown in the figure.

The main features of the graph in the figure are:

 The graph lies in the first quadrant because all the values of X and S are positive.
 It is an exact straight line. But all graphs are not in the form of a straight line. It
could be some curve also.
 All the points (pair of and ) lies on the straight line.
 The line passes through the origin.

 Take any point P on the line and draw a perpendicular from P to the X-axis. Let us find the ratio of this perpendicular to the corresponding base along the X-axis. For this line the ratio is 2, because the sale increases by 2 cents for every additional pencil sold.
 This ratio is called the slope of the line and in general it is denoted by “b”. The slope of the line is the same at all points on the line. The slope “b” is equal to the change in S for a unit change in X. The relation S = 2X is also called the linear equation between X and S.

Example: Suppose a carpenter wants to make some wooden toys for the small
children. He has purchased some wood and some other material for $ . The cost of
making each toy is $ . Table gives the information about the number of toys made
and cost of the toys.
 

Number of Toys
Cost of Toys
            Let X denote the number of toys and Y denote the cost of the toys. What is the algebraic relation between X and Y? When X = 0, Y equals the cost of the wood and other material; this is called the fixed or starting cost and it may be denoted by “a”. For each additional toy, the cost increases by the constant making cost per toy, which may be denoted by “b”. Thus X and Y are connected through the following equation:

            Y = a + bX

  It is called equation of straight line. It is also mathematical model of deterministic


nature. Let us make the graph of the data in given table. Figure as shown is the graph
of the data in table.


Let us note some important features of the graph obtained in figure.

1. The line does not pass through the origin. It passes through a point on the Y-axis. The distance between this point and the origin is called the intercept and is usually denoted by “a”.

2. Take any point P on the line and complete a right triangle as shown in the figure. Let us find the ratio between the perpendicular and the base of this triangle. This ratio is denoted by “b” in the equation of the straight line. Thus the equation of the straight line has intercept “a” and slope “b”. In general, when the values of the intercept and the slope are not known, we write the equation of the straight line as Y = a + bX. It is also called the linear equation between X and Y, and the relation between X and Y is called linear. The equation Y = a + bX may also be called the exact linear model between X and Y, or simply the linear model between X and Y. The value of Y can be determined completely when X is given. The relation Y = a + bX is therefore called the deterministic linear model between X and Y. In statistics, when we use the term “Linear Model”, we shall not mean a deterministic mathematical model as described above.
Non Linear Model

          Let us consider an equation

            By putting the values of  in this equation; we find the values of
 as given in the table below. The first and second differences are calculated in that
given table.

First differences Second differences

            The second differences are exactly constant. The general quadratic equation or non-linear (second degree) model is written as

                                                      Y = a + bX + cX²
            It is also called second degree parabola or second degree curve. The graph of
the data is shown in the figure given below:



This figure is not a straight line. It is a curve, so we say that the model Y = a + bX + cX² is non-linear.
 The readers are advised to remember that if in a certain observed data, the
second differences are constant or almost constants, we find the second degree
curve close to the observed data.
 We shall face this type of situation in time series.

Scatter Diagram

 Scatter diagram is a graphic picture of the sample data.


 Suppose a random sample of n pairs of observations has the values (x1, y1), (x2, y2), …, (xn, yn).
 These points are plotted on a rectangular co-ordinate system taking
independent variable on X-axis and the dependent variable on Y-axis.
 Whatever be the name of the independent variable, it is to be taken on X-
axis.
 Suppose the plotted points are as shown in figure (a).
 Such a diagram is called scatter diagram. In this figure, we see that when


X has a small value, Y is also small and when X takes a large value, Y also
takes a large value.
 This is called direct or positive relationship between X and Y.
 The plotted points cluster around a straight line.
 It appears that if a straight line is drawn passing through the points, the
line will be a good approximation for representing the original data.
 Suppose we draw a line AB to represent the scattered points.
 The line AB rises from left to the right and has positive slope.
 This line can be used to establish an approximate relation between the
random variable Y and the independent variable X.
 It is nonmathematical method in the sense that different persons may
draw different lines.
 This line is called the regression line obtained by inspection or judgment.

 Making a scatter diagram and drawing a line or curve is the primary


investigation to assess the type of relationship between the variables.


 The knowledge gained from the scatter diagram can be used for further
analysis of the data.
 In most of the cases the diagrams are not as simple as in figure (a).
 There are quite complicated diagrams and it is difficult to choose a
proper mathematical model for representing the original data.
 The scatter diagram gives an indication of the appropriate model which
should be used for further analysis with the help of method of least
squares.
 Figure (b) shows that the points in the scatter diagram are falling from the
top left corner to the right.
 This is a relation called inverse or indirect. The points are in the
neighbourhood of a certain line called the regression line.
  As long as the scattered points show closeness to a straight line of some
direction, we draw a straight line to represent the sample data.
 But when the points do not lie around a straight line, we do not draw the
regression line.
 Figure (c) shows that the plotted points have a tendency to fall from left
to right in the form of a curve.
 This is a relation called non-linear or curvilinear. Figure (d) shows the
points which apparently do not follow any pattern.
 If X takes a small value, Y may take a small or large value. There seems to
be no sympathy between X and Y. Such a diagram suggests that there is
no relationship between the two variables.

Correlation


 Correlation is a technique which measures the strength of association between


two variables.
 Both the variables X and Y may be random, or it may be that one variable is independent (non-random) and the other, to be correlated with it, is dependent (random).
 When the changes in one variable appear to be linked with the changes in the
other variable, the two variables are said to be correlated.
 When the two variables are meaningfully related and both increase or both
decrease simultaneously, then the correlation is termed as positive.
 If increase in any one variable is associated with decrease in the other variable,
the correlation is termed as negative or inverse.
 Suppose marks in Mathematics are denoted by X and marks in Statistics are denoted by Y.
 If small values of X appear with small values of Y and large values of X come
with large values of Y, then correlation is said to be positive.
 If X stands for marks in English and Y stands for marks in Mathematics, it is
possible that small values of X appear with large values of Y.
 It is a case of negative correlation.

Linear and Non Linear Correlation


 Linear Correlation:
            Correlation is said to be linear if the ratio of change is constant. The amount of
output in a factory is doubled by doubling the number of workers is the example of
linear correlation.
            In other words, it can be defined as follows: if all the points on the scatter diagram tend to lie near a line which looks like a straight line, the correlation is said to be linear, as shown in the figure.

Non Linear (Curvilinear) Correlation:


            Correlation is said to be non linear if the ratio of change is not constant. In


other words, it can be defined as follows: if all the points on the scatter diagram tend to lie near a smooth curve, the correlation is said to be non-linear (curvilinear), as shown in the figure.

Positive and Negative Correlation

Positive Correlation:
            Correlation in the same direction is called positive correlation. If one variable increases, the other also increases, and if one variable decreases, the other also decreases. For example, the length of an iron bar will increase as the temperature increases.

Negative Correlation:
            Correlation in the opposite direction is called negative correlation: if one variable increases, the other decreases, and vice versa. For example, the volume of a gas will decrease as the pressure increases, or the demand for a particular commodity will increase as the price of that commodity decreases.

No Correlation or Zero Correlation:



            If the value of one variable changes while the other variable remains unaffected, there is no relationship between the two variables; this is called no correlation or zero correlation.

Perfect Correlation

            If, for any change in the value of one variable, the value of the other variable changes in a fixed proportion, the correlation between them is said to be perfect correlation. It is indicated numerically as +1 or −1.

Perfect Positive Correlation:


            If the values of both variables move in the same direction in a fixed proportion, the correlation is called perfect positive correlation. It is indicated numerically as +1.

Perfect Negative Correlation:


            If the values of both variables move in opposite directions in a fixed proportion, the correlation is called perfect negative correlation. It is indicated numerically as −1.

Coefficient of Correlation
 The degree or level of correlation is measured with the help of correlation


coefficient or coefficient of correlation.


 For population data, the correlation coefficient is denoted by ρ (rho).
 The joint variation of X and Y is measured by the covariance of X and Y.
 The covariance of X and Y, denoted by Cov(X, Y), is defined as:

                                    Cov(X, Y) = Σ(X − μX)(Y − μY) / N

 The Cov(X, Y) may be positive, negative or zero.
 The covariance has the same units in which X and Y are measured.
 When Cov(X, Y) is divided by σX and σY, we get the correlation coefficient:

                                    ρ = Cov(X, Y) / (σX σY)

 Thus ρ is free of the units of measurement.
 It is a pure number and lies between −1 and +1. If ρ = +1, it is called perfect positive correlation.
 If ρ = −1, it is called perfect negative correlation.
 If X and Y are independent, there is no correlation between them and ρ = 0.
 For sample data the correlation coefficient, denoted by “r”, is a measure of the strength of the linear relation between the X and Y variables, where “r” is a pure number and lies between −1 and +1.
 Karl Pearson's coefficient of correlation for sample data is:

                                    r = Σ(X − X̄)(Y − Ȳ) / sqrt( Σ(X − X̄)² × Σ(Y − Ȳ)² )


Examples of Correlation

Examples 1:
            Calculate and analyze the correlation coefficient between the number of study
hours and the number of sleeping hours of different students.

Number of study hours:        2     4     6     8     10
Number of sleeping hours:    10     9     8     7      6

Solution:
           

The necessary calculation is given below:

X      Y      x = X − X̄      y = Y − Ȳ      xy      x²      y²

2      10     −4              +2             −8      16      4
4      9      −2              +1             −2       4      1
6      8       0               0              0       0      0
8      7      +2              −1             −2       4      1
10     6      +4              −2             −8      16      4

            X̄ = 30/5 = 6,   Ȳ = 40/5 = 8,   Σxy = −20,   Σx² = 40,   Σy² = 10

            r = Σxy / sqrt(Σx² × Σy²) = −20 / sqrt(40 × 10) = −20 / 20 = −1
            There is perfect negative correlation between the number of study hours and the


number of sleeping hours.
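The same value of r can be obtained with a short MATLAB sketch (illustrative only); MATLAB's built-in corrcoef function gives the same answer:

% Karl Pearson's coefficient of correlation for Example 1 (illustrative sketch)
X = [2 4 6 8 10];                   % number of study hours
Y = [10 9 8 7 6];                   % number of sleeping hours
x = X - mean(X);                    % deviations of X from its mean
y = Y - mean(Y);                    % deviations of Y from its mean
r = sum(x .* y) / sqrt(sum(x.^2) * sum(y.^2));   % correlation coefficient
fprintf('r = %.2f\n', r);           % prints -1.00: perfect negative correlation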

Example 2:
            From the following data, compute the coefficient of correlation between X and Y:

                                        X Series     Y Series
Number of Items                         15           15
Arithmetic Mean                         25           18
Sum of Squares of Deviations            136          138
Summation of products of deviations of X and Y series from their arithmetic means = 122.

Solution:

                        Here Σxy = 122, Σx² = 136, Σy² = 138, and hence

                        r = Σxy / sqrt(Σx² × Σy²) = 122 / sqrt(136 × 138) = 122 / 137 = 0.89

Curve Fitting and Method of Least Squares

Curve Fitting:
            Curve fitting is the process of introducing a mathematical relationship between dependent and independent variables, in the form of an equation, for a given set of data.

Method of Least Square:


            The method of least squares helps us to find the values of the unknowns ‘a’ and ‘b’ in such a way that the following two conditions are satisfied:

The sum of the residuals (deviations) of the observed values of Y from the corresponding expected (estimated) values Ŷ will be zero:  Σ(Y − Ŷ) = 0.

The sum of the squares of the residuals of the observed values of Y from the corresponding expected values Ŷ should be least:  Σ(Y − Ŷ)² is least.

 Fitting of a Straight Line:


            A straight line can be fitted to the given data by the method of least squares. The equation of the straight line or least square line is Ŷ = a + bX, where ‘a’ and ‘b’ are constants or unknowns.
            To compute the values of these constants we need as many equations as the number of constants in the equation; these equations are called normal equations. For a straight line there are two constants ‘a’ and ‘b’, so we require two normal equations.

Normal Equation for ‘a’:        ΣY = na + bΣX

Normal Equation for ‘b’:        ΣXY = aΣX + bΣX²

Direct formulas for finding ‘a’ and ‘b’ are:

            b = (nΣXY − ΣX ΣY) / (nΣX² − (ΣX)²),        a = Ȳ − bX̄


Example Method of Least Squares

            The following example explains how to find the equation of the straight line or least square line by using the method of least squares, which is very useful in statistics as well as in mathematics.

Example:
            Fit a least square line to the following data. Also find the trend values and show that Σ(Y − Ŷ) = 0.
X 1 2 3 4 5
Y 2 5 3 8 7
Solution:

X       Y       XY      X²      Ŷ (Trend Values)      Y − Ŷ
1       2        2       1      2.4                   −0.4
2       5       10       4      3.7                   +1.3
3       3        9       9      5.0                   −2.0
4       8       32      16      6.3                   +1.7
5       7       35      25      7.6                   −0.6
ΣX=15   ΣY=25   ΣXY=88  ΣX²=55                        Σ(Y − Ŷ) = 0

The equation of the least square line is Ŷ = a + bX.

Normal Equation for ‘a’:        ΣY = na + bΣX        →    25 = 5a + 15b      … (1)

Normal Equation for ‘b’:        ΣXY = aΣX + bΣX²     →    88 = 15a + 55b     … (2)

To eliminate ‘a’ from equations (1) and (2), multiply equation (1) by 3 and subtract it from equation (2); solving, we get the values of ‘a’ and ‘b’.

Here a = 1.1 and b = 1.3, so the equation of the least square line becomes Ŷ = 1.1 + 1.3X.
For the trend values, put the values of ‘X’ in the above equation; see the Ŷ column of the table above.
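The normal-equation arithmetic can be reproduced with a short MATLAB sketch (illustrative only), using the direct formulas for ‘a’ and ‘b’:

% Fitting the least square line Y = a + bX to the data of the example (illustrative sketch)
X = [1 2 3 4 5];
Y = [2 5 3 8 7];
n = numel(X);
b = (n*sum(X.*Y) - sum(X)*sum(Y)) / (n*sum(X.^2) - sum(X)^2);   % slope b = 1.3
a = mean(Y) - b*mean(X);                                        % intercept a = 1.1
Yhat = a + b*X;                                                 % trend (estimated) values
fprintf('a = %.1f, b = %.1f\n', a, b);
fprintf('Sum of residuals = %.4f\n', sum(Y - Yhat));            % prints 0.0000, as required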

Linear Regression
Regression:
            The word regression was used by Sir Francis Galton in 1885. It is defined as “the dependence of one variable upon another variable”. For example, weight depends upon height. The yield of wheat depends upon the amount of fertilizer.
In regression we can estimate the unknown values of one (dependent) variable from
known values of the other (independent) variable.

Linear Regression:
            When the dependence of the variable is represented by a straight line then it is
called linear regression, otherwise it is said to be non linear or curvilinear regression.
For example, if ‘X’ is the independent variable and ‘Y’ is the dependent variable, then the relation Y = a + bX is a linear regression.

Regression Line of Y on X:
            Regression lines study the average relationship between two variables. In
regression line Y on X, we estimate the average value of Y for a given value of X.
                                               Y = a + bX
            Where Y is the dependent and X the independent variable. The alternate form of the regression line Y on X is:

                                        Y − Ȳ = b_YX (X − X̄)

                                        where  b_YX = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²


Regression Line of X on Y:
       
    In regression line X on Y we estimate the average value of X for a given value of
Y.     

X = c + dY, where X is the dependent and Y the independent variable. The alternate form of the regression line X on Y is:

                                             X − X̄ = b_XY (Y − Ȳ)

                                             where  b_XY = Σ(X − X̄)(Y − Ȳ) / Σ(Y − Ȳ)²

Qualitative Data:

Qualitative data is information gathered in a nonnumeric form. Common


examples of such data are:

 Interview transcript
 Field notes (notes taken in the field being studied)
 Video
 Audio recordings
 Images
 Documents (reports, meeting minutes, e-mails)

Such data usually involve people and their activities, signs, symbols, artefacts
and other objects they imbue with meaning. The most common forms of
qualitative data are what people have said or done.

What is Qualitative Data Analysis?


 Qualitative Data Analysis (QDA) is the range of processes and procedures


whereby we move from the qualitative data that have been collected into some
form of explanation, understanding or interpretation of the people and situations
we are investigating.
 QDA is usually based on an interpretative philosophy.
 The idea is to examine the meaningful and symbolic content of qualitative data.
 For example, by analysing interview data the researcher may be attempting to
identify any or all of:
 Someone's interpretation of the world,
 Why they have that point of view,
 How they came to that view,
 What they have been doing,
 How they conveyed their view of their situation,
 How they identify or classify themselves and others in what they say,

Different types of qualitative data analysis:

 There are many different types of qualitative data analysis.

 The method you use will depend on your research topic, your personal
preferences and the time, equipment and finances available to you.

 Also, qualitative data analysis is a very personal process, with few rigid rules
and procedures.

Formats for analysis

 However, to be able to analyse your data you must first of all produce it in a
format that can be easily analysed.


 This might be a transcript from an interview or focus group, a series of written


answers on an open-ended questionnaire, or field notes or memos written by the
researcher.

 It is useful to write memos and notes as soon as you begin to collect data as
these help to focus your mind and alert you to significant points which may be
coming from the data.

 These memos and notes can be analysed along with your transcripts or
questionnaires.

 You can think of the different types of qualitative data analysis as positioned on
a continuum.

 At the one end are the highly qualitative, reflective types of analysis, whereas
on the other end are those which treat the qualitative data in a quantitative way,
by counting and coding data.

 For those at the highly qualitative end of the continuum, data analysis tends to
be an on-going process, taking place throughout the data collection process.

 The researcher thinks about and reflects upon the emerging themes, adapting
and changing the methods if required.

 For example, a researcher might conduct three interviews using an interview


schedule she has developed beforehand.

 However, during the three interviews she finds that the participants are raising
issues that she has not thought about previously.

 So she refines her interview schedule to include these issues for the next few
interviews. This is data analysis.

 She has thought about what has been said, analysed the words and refined her
schedule accordingly.

Thematic analysis


 When data is analysed by theme, it is called thematic analysis.

 This type of analysis is highly inductive, that is, the themes emerge from the
data and are not imposed upon it by the researcher.

 In this type of analysis, the data collection and analysis take place
simultaneously.

 Even background reading can form part of the analysis process, especially if it
can help to explain an emerging theme.

 Closely connected to thematic analysis is comparative analysis.

 Using this method, data from different people is compared and contrasted and
the process continues until the researcher is satisfied that no new issues are
arising.

 Comparative and thematic analyses are often used in the same project, with the
researcher moving backwards and forwards between transcripts, memos, notes
and the research literature.

Content analysis

 For those types of analyses at the other end of the qualitative data continuum,
the process is much more mechanical with the analysis being left until the data
has been collected.

 Perhaps the most common method of doing this is to code by content. This is
called content analysis.

 Using this method the researcher systematically works through each transcript
assigning codes, which may be numbers or words, to specific characteristics
within the text.

 The researcher may already have a list of categories or she may read through
each transcript and let the categories emerge from the data.


 Some researchers may adopt both approaches.

 This type of analysis can be used for open-ended questions which have been
added to questionnaires in large quantitative surveys, thus enabling the
researcher to quantify the answers.

Discourse analysis

 Falling in the middle of the qualitative analysis continuum is discourse analysis,


which some researchers have named conversational analysis, although others
would argue that the two are quite different.

 These methods look at patterns of speech, such as how people talk about a
particular subject, what metaphors they use, how they take turns in
conversation, and so on.

 These analysts see speech as a performance; it performs an action rather than


describes a specific state of affairs or specific state of mind.

 Much of this analysis is intuitive and reflective, but it may also involve some
form of counting, such as counting instances of turn-taking and their influence
on the conversation and the way in which people speak to others.

Noticing, Collecting and Thinking model

 Seidel (1998) developed a useful model to explain the basic process of


qualitative data analysis. The model consists of 3 parts: Noticing, Collecting,
and Thinking about interesting things. These parts are interlinked and cyclical.
For example while thinking about things you notice further things and collect
them. Seidel likens the process to solving a jigsaw puzzle. Noticing interesting
things in the data and assigning ‘codes’ to them, based on topic or theme,
potentially breaks the data into fragments. Codes which have been applied to
the data then act as sorting and collection devices.  


Figure The Data Analysis Process (Seidel, 1998)

Multidimensional scaling

 Multidimensional scaling transforms consumer judgments/perceptions of similarity or preference into points in a multidimensional space (usually 2 or 3 dimensions).
 It is useful for designing products and services.
 In fact, MDS is a set of procedures for drawing pictures of data so that the
researcher can:
 Visualize relationships described by the data more clearly.
 Offer clearer explanations of those relationships.
 These techniques help to identify the product attributes that are important to the
customers and to measure their relative importance.
 This scaling is used to describe similarity and preference of brands.
 The respondents were asked to indicate their perception, or similarity between
various objects (products, brands, etc.) and preference among the objects.
 This scaling is also known as perceptual mapping

Uses of MDS:

1) Illustrating market segments based on preference and judgments.


2) Determining which products are more competitive.


3) Deriving the criteria used by people while judging objects (products, brands,
advertisements etc.)

There are two ways of collecting the input data to plot perceptual mapping:

1) Non- attribute method


2) Attribute method

Non-attribute method:

Here, the researcher asks the respondents to make a judgment about the objects
directly. In this method, the criteria for comparing the objects are decided by the
respondent himself.

Attribute method:

In this method, instead of the respondents selecting the criteria, they are asked to compare the objects based on criteria specified by the researcher.

Example 1:

“To determine the perception of a consumer”

Assume there are five insurance companies to be evaluated on two attributes


namely

1) Convenient locality
2) Courteous personal service

Customers’ perceptions regarding the five insurance companies are as follows:


(Figure: a perceptual map with a vertical axis running from "convenient" to "inconvenient" location and a horizontal axis running from "courteous" to "not courteous" service, on which the five insurance companies A, B, C, D and E are plotted.)

According to the map, B and E are dissimilar companies.
C is located very conveniently.
A is less conveniently located than E.
D is less conveniently located than C.
E is less conveniently located than D.

Example 2:

A similar study could be conducted for a group of companies to assess investors' perceptions of the companies' attitudes towards the interests of their shareholders vis-à-vis the interests of their staff.

For example, from the following MDS graph, it is observed that company A is
perceived to be taking more interest in the welfare of the staff than company B.


(Figure: an MDS map with axes "interest of the shareholders" and "interest of staff", on which companies A and B are plotted.)

Example 3:

An all-India organization had six zonal offices, each headed by a zonal


manager. The top management of the organization wanted to have a detailed
assessment of all the zonal managers for selecting two of them for higher
positions in the Head office. They approached a consultant for helping them in
the selection. The management indicated that they would like to have
assessment on several parameters associated with the functioning of a zonal
manager. The management also briefed the consultant that they laid great emphasis on the staff, with a view to developing and retaining them.

The consultant collected a lot of relevant data, analyzed it and offered their
recommendations. In one of the presentations, they showed the following
diagram obtained through multi dimensional scaling technique. The diagram
shows the concerns of various zonal managers, indicated by letters A to F,
towards the organization and also towards the staff working under them.

(Figure: an MDS map with axes "concern for organization" and "concern for staff", on which the six zonal managers A to F are plotted.)

It is observed that two zonal managers, viz. B and E, exhibit high concern for both the organization and the staff. If these criteria are crucial to the
organization, then these two zonal managers could be the right candidates for
higher positions in the head office.

Multivariate Analysis

Definition:

 Multivariate analysis is defined as "those statistical techniques which focus upon, and bring out in bold relief, the structure of simultaneous relationships among three or more phenomena."
 Multivariate analysis is largely empirical and deals with reality.
 These techniques possess the ability to analyze complex data.
 Besides being tools for analyzing data, multivariate techniques also help in various kinds of decision making.


 For example, take the case of a college entrance exam, wherein a number of tests are administered to candidates, and the candidates scoring high aggregate marks across the subjects are admitted.
 This system, though apparently fair, may at times be biased in favour of subjects with larger standard deviations.
 If the researcher is interested in making probability statements on the basis of
sampled multiple measurements, then the best strategy of data analysis is to use
some suitable multivariate statistical technique.
 Multivariate techniques may be appropriately used in such situations for
developing norms as to who should be admitted in college.
 The objective underlying multivariate techniques is to represent a collection of
massive data in a simplified way.
 The main contribution of these techniques is in arranging a large amount of
complex information in the real data into a simplified visible form.

Multivariate procedure:

 In selecting a multivariate technique, two aspects should be considered


1) Whether the variables can be grouped as dependent and independent, i.e. whether a dependence technique or an interdependence technique is required.
2) Whether the data is metric or non-metric.
 Dependence method:
 A dependence method can be defined as one in which a variable is identified as
the dependent variable to be predicted or explained by other independent
variables.
 In dependence method, multivariate techniques are used to explain or predict
the dependent variable on the basis of two or more independent variables.
 Dependence techniques include
1) Multiple regression analysis
2) Discriminant analysis


3) MANOVA
4) Conjoint analysis
 Interdependence method:
 In interdependence method, no single variable or group of variables is defined
as being independent or dependent.
 The multivariate procedure here involves the analysis of all the variables in the
data set simultaneously.
 The goal of interdependence method is to group respondents or objects together.
 The most frequently used methods of interdependence techniques are
1) Cluster analysis
2) Factor analysis
3) Multidimensional scaling

 If the dependent variable is measured non-metrically, the appropriate methods are discriminant analysis and conjoint analysis.
 If the dependent variable is measured metrically, the appropriate methods are multiple regression, ANOVA, MANOVA and conjoint analysis.

Factor analysis:

 Factor analysis is a set of techniques which, by analyzing the correlations between variables, reduces their number to fewer factors which explain much of the original data more economically.
 It is a class of procedures primarily used for data reduction and summarization.
 The purpose of factor analysis is to simplify the data.
 Each factor accounts for one or more of the original components.
 Each factor is a combination of many variables.

Important terminologies:


Factor:

 A factor is an underlying dimension that accounts for several observed variables.
 A factor is a linear combination of the original variables; the coordinates of each observation or variable on the factor are measured to obtain the factor loadings.
 There can be one or more factors, depending upon the nature of the study and the variables involved in it.

Factor loadings:

 Factor loadings are those values which explain how closely the variables are
related to each one of the factors discovered.
 They are also known as factor-variable correlations.
 In fact, factor loadings work as key to understanding what the factors mean.
 It is the absolute size of the loadings that is important in the interpretation of a
factor.

Communality (h2):

 It shows how much of each variable is accounted for by the underlying factors taken together.
 A high value of communality means that not much of the variable is left over after whatever the factors represent is taken into consideration.
 It is worked out in respect of each variable as under:
 h² of the ith variable = (ith factor loading on factor A)² + (ith factor loading on factor B)² + …

Eigen value (latent root):

 When we take the sum of the squared factor loadings relating to a factor, that sum is referred to as the eigenvalue or latent root.
 The eigenvalue indicates the relative importance of each factor in accounting for the particular set of variables being analyzed.
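 A minimal sketch in Python showing how communalities and eigenvalues follow from a table of factor loadings (the loadings below are hypothetical):

import numpy as np

# Hypothetical factor loadings: 4 variables (rows) on 2 factors (columns)
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.9],
    [0.3, 0.6],
])

# Communality of a variable = sum of its squared loadings across the factors
communalities = (loadings ** 2).sum(axis=1)

# Eigenvalue of a factor = sum of the squared loadings of all variables on that factor
eigenvalues = (loadings ** 2).sum(axis=0)

print("Communalities:", communalities.round(2))
print("Eigenvalues:", eigenvalues.round(2))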


Factor scores:

 Factor scores represent the degree to which each respondent gets high scores on the group of items that load highly on each factor.
 Factor scores can help explain what the factors mean. With such scores, several
other multivariate analyses can be performed.

Steps in conducting factor analysis:

 The first step involved in conducting factor analysis is to define the problem
and identify the variables involved.
 A correlation matrix is to be constructed and a method of factor analysis to be
performed is to be selected.
 Decisions regarding the number of factors to be extracted and the method of factor analysis are made.
 The rotated factors are interpreted. Depending upon the objective the factor
scores are calculated or surrogate variables selected so as to represent the
factors in subsequent multivariate analysis.
 Finally the fit of the factor analysis model is determined.
1) Formulate the problem:
Problem formulation includes several tasks. The objectives of factor analysis
should be identified and the variables to be included in the factor analysis
should be specified based on the past research, theory and judgment of the
researcher. The variables should be appropriately measured in an interval or
ratio scale. An appropriate sample size should be identified. The sample size should be at least four to five times the number of variables identified for the study. For example, if the study includes 20 variables, then the sample size should be a minimum of 80 to 100. If the sample size is small and this ratio is not maintained, the results should be interpreted cautiously.

2) Construct the correlation matrix.


The variables identified for the study should be correlated in order to conduct factor analysis. If the correlations between the variables are small, factor analysis may not be appropriate. It can also be expected that variables that are highly correlated with each other would also correlate highly with the same factor or factors. Formal statistics are available for testing the appropriateness of the factor model. Bartlett's test of sphericity can be used to test the null hypothesis that the variables are uncorrelated in the population; if the null hypothesis cannot be rejected, the appropriateness of factor analysis should be questioned.

Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy can also be used.


The index compares the magnitudes of the observed correlation coefficients with the magnitudes of the partial correlation coefficients. Small values of the KMO statistic indicate that the correlations between pairs of variables cannot be explained by the other variables and that factor analysis may not be appropriate. Generally, a value greater than 0.5 is desirable.
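A minimal sketch in Python of both checks, assuming the third-party factor_analyzer package is installed and the survey responses are held in a pandas DataFrame (the data below is randomly generated and purely illustrative):

import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical responses: 100 respondents rating 6 items on a 1-10 scale
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(1, 11, size=(100, 6)),
                    columns=["A", "B", "C", "D", "E", "F"])

chi_square, p_value = calculate_bartlett_sphericity(data)  # Bartlett's test of sphericity
kmo_per_item, kmo_overall = calculate_kmo(data)            # KMO measure of sampling adequacy

print("Bartlett chi-square:", round(chi_square, 2), "p-value:", round(p_value, 4))
print("Overall KMO:", round(kmo_overall, 2))  # values above 0.5 are generally desirable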

3) Identify the method of factor analysis.


After determining the appropriateness of factor analysis for analyzing the data, a suitable method should be selected. The approach used to derive the weights or factor score coefficients differentiates the various methods of factor analysis. The two most commonly employed factor-analytic procedures are principal component analysis and common factor analysis; the procedure to be used is chosen based on the researcher's objective. Principal component analysis is used when the objective is to summarize the information in a larger set of variables into fewer factors. It is recommended if the primary concern is to determine the minimum number of factors that will account for the maximum variance in the data for use in subsequent multivariate analysis. The factors are called principal components. If the researcher is attempting to uncover the underlying dimensions surrounding the original variables, common factor analysis is used. Principal component analysis is based on the total information in each variable, whereas common


factor analysis is concerned only with the variance shared among all the
variables.

4) Determine the number of factors.


It is possible to compute as many principal components as there are variables,
but this does not serve the purpose of conducting a factor analysis. In order to summarize the information contained in the original variables, a smaller number of factors should be extracted. The question then arises of how many factors to extract. Several procedures for determining the number of factors are discussed below.

a) A priori determination

Due to prior knowledge, the researcher may know how many factors to extract and can thus specify the number of factors to be extracted beforehand. The extraction of factors is stopped as soon as the desired number of factors has been extracted.

b) Determination based on Eigen values.

In this approach, only factors with eigenvalues greater than 1.0 are retained; the other factors are not included in the model. An eigenvalue represents the amount of variance associated with the factor. Hence, factors with variance greater than 1.0 are included. If the number of variables is less than 20, this approach will result in a conservative number of factors.

c) Determination based Scree plot

A scree plot is a plot of the eigenvalues against the number of factors in order of extraction. The shape of the plot is used to determine the number of factors. The plot typically has a distinct break between the steep slope of the factors with large eigenvalues and the gradual trailing off associated with the rest of the factors. The gradual trailing off is referred to as the scree. The point at which the scree begins denotes the true number of factors.
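A minimal sketch in Python combining the eigenvalue-greater-than-1 rule with a scree plot (the data is randomly generated and purely illustrative):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 200 respondents, 8 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))

# Eigenvalues of the correlation matrix, largest first
eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
print("Factors with eigenvalue > 1:", int((eigenvalues > 1).sum()))

# Scree plot: the point where the curve levels off (the scree) suggests the number of factors
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1.0, linestyle="--")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()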

d) Determination based on the % of variance

In this approach the number of factors to be extracted is determined in such a


way that the cumulative percentage of variance extracted by the factors reaches a satisfactory level. The satisfactory level depends upon the problem at hand.
However, it is recommended that the factors extracted should account for at
least 60% of the variance.

e) Determination based on split-half reliability.


The sample is split in half and factor analysis is performed on each half. Only factors with high correspondence of factor loadings across the two subsamples are retained.

f) Determination based on significance tests.

It is possible to determine the statistical significance of the separate eigenvalues and retain only those factors that are statistically significant. A drawback is that with large samples (sizes greater than 200), many factors are likely to be statistically significant, although many of these may account for only a small proportion of the total variance.

Methods of factor analysis:

There are two commonly employed factor analysis procedures. They are
1) Principal component analysis
2) Common factor analysis

Principal component analysis:

When the objective is to summarize the information in a large set of variables into fewer factors, principal component analysis is used.

Example:

Purpose: customer feedback about a two-wheeler manufactured by a company.

Method:

The MR Manager prepares a questionnaire to study the customer feedback. The


researcher has identified six variables or factors for this purpose. They are as
follows:

1) Fuel efficiency (A)


2) Durability (life) (B)


3) Comfort (C)
4) Spare parts availability (D)
5) Breakdown frequency (E)
6) Price (F)

The questionnaire may be administered to 5000 respondents and the opinions of the customers gathered. Let us assign points from 1 to 10 to each of the variables/factors A to F, where 1 is the lowest and 10 is the highest. Assume that the application of factor analysis has led to grouping the variables as follows:

A, B, D and E go into factor 1
F goes into factor 2
C goes into factor 3

Factor 1 can be termed the technical factor.
Factor 2 can be termed the price factor.
Factor 3 can be termed the personal factor.

For future analysis, while conducting a study to obtain customers' opinions, the three factors mentioned above would be sufficient. One basic purpose of using factor analysis is to reduce the number of independent variables in the study.
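A minimal sketch in Python of how such a grouping might be explored with principal components, using sklearn (the ratings are randomly generated stand-ins for the 5000 responses):

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical ratings (1-10) from 5000 respondents on the six variables A-F
rng = np.random.default_rng(2)
ratings = pd.DataFrame(rng.integers(1, 11, size=(5000, 6)),
                       columns=["A", "B", "C", "D", "E", "F"])

# Standardize the ratings and extract three principal components
pca = PCA(n_components=3).fit(StandardScaler().fit_transform(ratings))

# The loadings show how strongly each variable is associated with each component;
# variables loading highly on the same component are grouped into one factor
loadings = pd.DataFrame(pca.components_.T,
                        index=ratings.columns,
                        columns=["Factor 1", "Factor 2", "Factor 3"])
print(loadings.round(2))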

Common factor analysis:

If the researcher wants to analyze the components of the main factor, common
factor analysis is used.

Example:

Common factor-Inconvenience inside a car. The components may be:

1) Leg room
2) Seat arrangement
3) Entering the rear seat
4) Inadequate dickey space
5) Door locking mechanism

Cluster analysis:

Cluster analysis, also called classification analysis or numerical taxonomy, is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters.

Objects within a cluster are similar to one another, while objects in different clusters are dissimilar.

Like factor analysis, cluster analysis examines an entire set of interdependent relationships.

Cluster analysis is used:

 To classify persons or objects into a small number of clusters or groups.
 To identify specific customer segments for the company's brand.

Where can cluster analysis be applied?



 The marketing applications of cluster analysis are in customer segmentation and estimation of segment sizes.
 Industries where this technique is useful include automobiles, retail stores, insurance, B-to-B, durables and packaged goods.
 Some of the well-known frameworks in consumer behaviour (like VALS) are based on values-based cluster analysis.
 An FMCG company wants to map the profile of its target audience in terms of
lifestyle, attitude and perceptions.
 A consumer durable company wants to know the features and services a
consumer takes into account, when purchasing through catalogues.
 A housing finance corporation wants to identify and cluster the basic
characteristics, lifestyles and mindset of persons who would be availing housing
loans. Clustering can be done based on parameters such as interest rates,
documentation, processing fee, number of installments etc.

Cluster analysis on three dimensions:

 The example below shows cluster analysis based on three dimensions: age, income and family size.
 Cluster analysis is used to segment the car-buying population in a metro.
 For example, cluster A might represent potential buyers of low-end cars, e.g. the Maruti 800 (for the common man). These are the people who are graduating from the two-wheeler market segment.
 Cluster B may represent the mid-market segment buying the Zen, Santro, Alto, etc.
 Cluster C represents car buyers who belong to the upper strata of society: buyers of the Lancer, Honda City, etc.
 Cluster D represents the super-rich cluster, i.e. buyers of Benz, BMW, etc.

(Figure: clusters A, B, C and D plotted on the three dimensions of income, age and family size.)

Steps involved in clustering procedure:

1) Formulate the problem:

 The most important aspect in formulating the problem is selecting the variables on the basis of which the clusters are to be formed.
 Including irrelevant variables will affect the clustering solution.
 The variables selected should describe the similarity between objects in terms of the problem being studied.
 The variables should be selected based on past research, theory, or in consideration of the hypotheses being tested.
 In exploratory research, the researcher should act based on judgment and intuition.

2) Select a distance measure:


 The objective of clustering is to group similar objects together.
 For this purpose some measure should be adopted to assess how similar or different the objects are.
 The most common approach is to measure similarity in terms of the distance between pairs of objects.
 Objects with smaller distances between them are more similar to each other than those at larger distances.
 The following methods are available to measure the distance between objects (see the sketch after this list):
 The Euclidean distance is the most commonly used measure. It is the square root of the sum of the squared differences in values for each variable.
 The city-block or Manhattan distance measures the distance between two objects in terms of the sum of the absolute differences in values for each variable.
 The Chebychev distance between two objects is the maximum absolute difference in values for any variable.
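 A minimal sketch in Python computing the three distance measures for two hypothetical objects, using scipy:

from scipy.spatial.distance import chebyshev, cityblock, euclidean

# Two hypothetical objects described by three variables (e.g. age, income, family size)
object_1 = [25, 40, 3]
object_2 = [30, 55, 4]

print("Euclidean distance:", round(euclidean(object_1, object_2), 2))
print("City-block (Manhattan) distance:", cityblock(object_1, object_2))
print("Chebychev distance:", chebyshev(object_1, object_2))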

3) Select a cluster procedure:

I) Hierarchical clustering:

It is characterized by the development of a hierarchy or tree-like structure.

a) Divisive clustering: starts with all the objects grouped in a single cluster. Clusters are divided until each object is in a separate cluster.

b) Agglomerative clustering: starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are in a single cluster.

a) Linkage methods:

The single linkage method:



It is based on minimum distance or the nearest neighbor rule. The first two
objects clustered are those that have the smallest distance between them. The
next shortest distance is identified and either the third object is clustered with
the first two, or a new two-object cluster is formed. At every stage, the distance
between two clusters is the distance between their two closest points.

Complete linkage method:


It is based on the maximum distance or the furthest-neighbour approach. The distance between two clusters is calculated as the distance between their two furthest points.


Average linkage method:


In this method, the distance between two clusters is defined as the average of
the distances between all pairs of objects, where one member of the pairs is
from each of the clusters. This method uses information on all pairs of
distances, not merely the minimum or maximum distances. Hence it is preferable to the single and complete linkage methods.

b) Variance methods:
The variance methods attempt to minimize the within-cluster variance. Ward's procedure is a commonly used variance method. For each cluster, the means of all the variables are computed; subsequently, for each object, the squared Euclidean distance to the cluster means is calculated. These distances are summed for all the objects. At each stage, the two clusters whose merger leads to the smallest increase in the overall within-cluster sum of squares are combined.


c) Centroid methods:
In the centroid methods, the distance between two clusters is the distance between their centroids, i.e. the means of all the variables. Every time objects are grouped, a new centroid is computed.
The average linkage method and Ward's method perform better than the other procedures.
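A minimal sketch of agglomerative hierarchical clustering in Python using scipy; changing the method argument switches between single, complete, average, centroid and Ward linkage (the data is hypothetical):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical observations: 10 customers described by age, income (lakhs) and family size
rng = np.random.default_rng(3)
X = rng.normal(loc=[35, 8, 4], scale=[10, 3, 1], size=(10, 3))

# Build the hierarchy with Ward's method (try "single", "complete", "average" or "centroid" too)
Z = linkage(X, method="ward")

# Cut the tree into 3 clusters and show each customer's cluster label
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# scipy.cluster.hierarchy.dendrogram(Z) can be plotted with matplotlib to inspect the tree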

II) Non-Hierarchical clustering:


 The non-hierarchical clustering method is also known as K-means clustering.
 This method includes sequential threshold, parallel threshold and optimizing partitioning.
 In the sequential threshold method, a cluster centre is selected and all objects within a prespecified threshold value from the centre are grouped together. Next, a new cluster centre or seed is selected and the process is repeated for the unclustered points. Once an object is clustered with a seed, it is no longer considered for clustering with subsequent seeds.
 The parallel threshold method operates similarly; however, several cluster centres are selected simultaneously and objects within the threshold are grouped with the nearest centre.
 The optimizing partitioning method differs from the other threshold methods in that objects can later be reassigned to clusters to optimize an overall criterion, such as the average within-cluster distance for a given number of clusters.

Non-hierarchical clustering is faster than the hierarchical methods and is preferable when the number of objects or observations is large. The major drawback of the non-hierarchical procedures is that the number of clusters must be prespecified and the selection of cluster centres is arbitrary. The clustering results depend on how the centres are selected. A short sketch follows.
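A minimal sketch of K-means clustering in Python using sklearn, including the within-cluster variance check that helps in deciding the number of clusters (the data is hypothetical):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical observations: 200 customers described by age, income (lakhs) and family size
rng = np.random.default_rng(4)
X = rng.normal(loc=[35, 8, 4], scale=[10, 3, 1], size=(200, 3))

# Fit K-means with a prespecified number of clusters
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", np.bincount(kmeans.labels_))
print("Cluster centres:", kmeans.cluster_centers_.round(2))

# Within-cluster sum of squares for different numbers of clusters;
# a sharp bend in these values suggests an appropriate number of clusters
for k in range(2, 8):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia, 1))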


4) Decide on the number of clusters

Some guidelines for deciding the number of clusters are:

1. Theoretical, conceptual or practical considerations may suggest the number of clusters.
2. In hierarchical clustering, the distances at which clusters are combined can be used as criteria. This information can be obtained from the agglomeration schedule or from the dendrogram.
3. In non-hierarchical clustering, the ratio of total within-group variance to between-group variance can be plotted against the number of clusters. The point at which a sharp bend occurs indicates an appropriate number of clusters; increasing the number of clusters beyond this point is not useful.
4. The relative sizes of the clusters should be meaningful, with each cluster having enough elements. It is not useful to have only one element in a cluster.

5) Interpret and Profile the clusters

Interpreting and profiling clusters involves examining the cluster centroids. The
centroids represent the mean values of the objects contained in the cluster on
each of the variables. The centroids enable us to describe each cluster by
assigning it a name or label. It will be more helpful to profile the clusters in
terms of variables that are not used for clustering. The demographic,
psychographic, product usage, media usage or other variables can be used for
profiling. The variables that significantly differentiate between clusters can be
identified via discriminant analysis and one-way analysis of variance.

6. Assess Reliability and Validity


Several decisions are made on the basis of cluster analysis; hence clustering
solutions should not be accepted without assessing the reliability and validity.
The following procedure can be followed to provide adequate checks on the
quality of clustering results.

1. Perform cluster analysis on the same data using different distance measures.
Compare the results across measures to determine the stability of the solutions.
2. Use different methods of clustering and compare the results.
3. Split the data randomly into halves, perform clustering separately on each half
and compare the cluster centroids across the two sub samples.
4. Delete variables randomly. Perform clustering based on the reduced set of
variables. Compare the results with those obtained by clustering based on the
entire set of variables. In non-hierarchical clustering, the solution may depend
on the order of cases in the data set. Multiple runs using different order of cases
can be performed until solutions are stabilized.


DISCRIMINANT ANALYSIS

 Discriminant analysis is a dependence multivariate technique.
 The purpose of a dependence technique is to predict a variable from a set of independent variables.
 It is also used for predicting group membership on the basis of two or more independent variables.
 Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature.
 For example, the dependent variable may be the choice of a brand and the independent variables may be ratings of attributes of soft drinks on a 5-point Likert scale.
Example: situations where discriminant analysis is used
1) Those who buy our brand and those who buy a competitor's brand
2) Good salesmen, medium salesmen and poor salesmen
3) Those who go to Food World to buy and those who buy in a kirana shop
4) Heavy users, medium users and light users of the product

Suppose the groups mentioned above are to be compared along with demographic and socio-economic factors; then discriminant analysis can be used.

The discriminant analysis model involves a linear combination of the following form:

D = b0 + b1x1 + b2x2 + b3x3 + … + bkxk

where
D = discriminant score
b1, …, bk = discriminant coefficients or weights
x1, …, xk = predictors or independent variables
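A minimal sketch in Python estimating and validating a two-group discriminant function with sklearn (the ratings and group labels are hypothetical):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Hypothetical predictors: ratings of 3 soft-drink attributes on a 5-point scale
rng = np.random.default_rng(5)
X = rng.integers(1, 6, size=(200, 3)).astype(float)
# Hypothetical group membership: 1 = buys our brand, 0 = buys a competitor's brand
y = (X[:, 0] + rng.normal(size=200) > 3).astype(int)

# Split into an analysis (estimation) sample and a holdout (validation) sample
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
print("Discriminant weights (b):", lda.coef_.round(2))
print("Holdout classification accuracy:", round(lda.score(X_test, y_test), 2))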

Basic objectives of Discriminant Analysis:

 To test whether any significant differences exist between the mean values (all predictor variables taken simultaneously) of two or more a priori defined groups.
 To find the linear combinations of the predictor variables that enable us to represent the groups by maximizing the ratio of the squared difference between group means to the variance within the groups.
 To establish procedures for assigning new observations to one of the groups, assuming a priori that they belong to one of the defined groups.

Assumptions underlying Discriminant Analysis:

(1) The objects (elements) of the population belong to two or more mutually
exclusive groups (the elementary units may be people, states or countries, the
economy at different points in time or of different regions, etc.)
(2) The discriminant function, a mathematical equation used for the purpose of classification of the objects, is a linear function. The equation combines the group characteristics in a way that allows one to identify the group with which an object is most closely associated.
(3) The Discriminant variables, or the characteristics used to distinguish among the
groups, must be measured at an interval or ratio scale, so that the means and
variances can be calculated and they can be used in the analysis.
(4) It is assumed that each group is drawn from a multivariate normal population.
This allows the precise computation of tests of significance and probabilities of
each group membership.

Application:


 A company manufacturing FMCG products introduces a sales contest among its


marketing executives to find out “How many distributors can be roped in to
handle the company’s product”.
 Assume that this contest runs for three months.
 Each marketing executive is given target regarding number of new distributors
and sales they can generate during the period.
 This target is fixed and based on the past sales achieved by them about which,
the data is available in the company.
 It is also announced that marketing executives who add 15 or more distributors
will be given a Maruti Omni van as prize.
 Those who generate between 5 and 10 distributors will be given a two-wheeler
as the prize.
 Those who generate less than 5 distributors will get nothing.
 Now assume that 5 marketing executives won a Maruti van and 4 won a two
wheeler.
 The company wants to find out, "Which activities of the marketing executives made the difference between winning a prize and not winning a prize?"
 One can proceed in a number of ways.
 The company could compare those who won the Maruti van against the others.
 Alternatively, the company might compare those who won, one of the two
prizes against those who won nothing.
 It might compare each group against each of the other two.
 Discriminant analysis will highlight the difference in activities performed by
each group members to get the prize. The activity might include:
1) More number of calls made to the distributors
2) More personal visits to the distributors with advance appointments.
3) Use of better convincing skills.


Steps in conducting Two group Discriminant analysis

1) Formulate the problem:


 The first step in discriminant analysis is to formulate the problem by identifying the objectives, the criterion (dependent) variable and the predictor (independent) variables.
 The criterion variable must consist of two or more mutually exclusive and collectively exhaustive categories.
 When the dependent variable is interval or ratio scaled, it must first be converted into categories.
 The predictor variables should be selected based on a theoretical model or previous research, or, in the case of exploratory research, the experience of the researcher should guide the selection.
2) Research design issues

Research design for discriminant analysis requires consideration of the


following issues

a. Selection of both dependent and independent variables,


b. Deciding the sample size needed for estimation of discriminant function and
c. Division of sample for validation purpose.
a) Selection of dependent and independent variable
 To apply discriminant analysis the researcher should specify the dependent and
the independent variables.
 The dependent variable should be categorical and the independent variables metric.
 The number of dependent variable categories can be two or more, but the groups must be mutually exclusive and collectively exhaustive.
 The dependent variable may in some cases involve two groups (e.g., purchasers and non-purchasers) or several groups (e.g., heavy, light and non-users of a product).
 After the decision regarding the dependent variable, the researcher must decide on the independent variables to be included in the analysis.
 Independent variables can be selected in the following two ways:
1) Identifying the variables from previous research or from the theoretical model underlying the research question.
2) Intuition, i.e. utilizing the researcher's knowledge and intuitively selecting variables for which previous research is not available.

b) Sample size
 The ratio of sample size to the number of predictor variables should be considered in discriminant analysis.
 Many studies suggest a ratio of 20 observations for each predictor variable. If an adequate sample is not maintained, the results become unstable.
 The minimum size recommended is five observations per independent variable.
 The ratio applies to all variables considered in the analysis, even if all of the variables considered are not entered into the discriminant function.


 In addition to the overall sample size, the researcher must also consider sample
size of each group.
 The smallest group size must exceed the number of independent variables.
 The practical guideline is that each group should have at least 20 observations.

c) Division of sample
 The sample should be divided into two groups: the estimation or analysis sample and the holdout or validation sample.
 The analysis sample is used for estimation of the discriminant function.
 The holdout or validation sample is reserved for validating the discriminant function.
 It is essential that each subsample be of adequate size to support conclusions from the results.
 If the sample is large enough, it can be split in half. One half serves as the analysis sample and the other is used for validation. The analysis sample is used to develop the discriminant function and the validation sample is used to test the discriminant function.
 This method of validating the function is referred to as the split-sample or cross-validation approach.
 The roles of the halves are then interchanged and the analysis is repeated. This is called double cross-validation.
 The distribution of the number of cases in the analysis and validation samples should follow the distribution in the total sample.
 For example, if the total sample contains 60 percent users and 40 percent non-users of the product, then the analysis and validation samples would each contain 60 percent users and 40 percent non-users.


3) Assumptions
The assumptions stated earlier apply here: the objects belong to two or more mutually exclusive groups, the discriminant function used for classification is a linear function, the discriminating variables are measured on an interval or ratio scale, and each group is drawn from a multivariate normal population (which allows precise computation of tests of significance and probabilities of group membership).

4) Estimating the Discriminant function:


Two computational methods are used to derive the Discriminant function. They
are
a) Simultaneous / Direct method: The direct method involves estimating the
Discriminant function so that all the predictors are included simultaneously
b) Step-wise method: In this, the independent variables are entered one at a time,
based on their ability to discriminate among groups. The stepwise method is
useful when the researcher wants to consider a relatively large number of
independent variables for inclusion in the function.
5) Assessing overall fit:

Assessing overall fit of the selected Discriminant function involves three tasks:

1) Calculating Discriminant scores for each observation


2) Evaluating group differences on the Discriminant Z scores and


3) Assessing group membership prediction accuracy.

6) Interpretation of Discriminant functions:

Interpretation involves examining the Discriminant functions to determine the


relative importance of each independent variable in discriminating between the
groups. Three methods are available to assess the importance of the
discriminating function.

1) The sign and magnitude of the standardized discriminant weights or discriminant coefficients assigned to each variable are taken into consideration. A small weight may indicate that the corresponding variable is irrelevant in determining the relationship.
2) Discriminant loadings, also referred to as structure correlations, measure the simple linear correlations between each independent variable and the discriminant function.
3) If the stepwise method is selected in deriving the discriminant functions, an additional means of interpreting the relative discriminating power of the independent variables is available through partial F values. The absolute sizes of the significant F values are examined and ranked. Large F values indicate greater discriminatory power.

7) Validation of the discrimination results:


 The final stage in Discriminant analysis involves validating the Discriminant
results to provide assurance that the results have external as well as internal
validity.


 The most frequently used procedure to validate the Discriminant function is to


divide the sample randomly into analysis and holdout samples.
 This involves developing a Discriminant function with the analysis sample and
applying the same to the holdout sample.

Conjoint analysis

 Conjoint analysis is concerned with the measurement of the joint effect of two
or more attributes that are important from the customer’s point of view.
 In a situation where the company would like to know the most desirable
attributes or their combination for a new product or service, the use of conjoint
analysis is most appropriate.
 Example:
 An airline would like to know, which is the most desirable combination of
attributes to a frequent traveller: (a) Punctuality (b) Air fare (c) Quality of food
served on the flight and (d) Hospitality and empathy shown.
 Conjoint analysis is a multivariate technique that captures the exact levels of
utility that an individual customer places on various attributes of the product
offering. Conjoint analysis enables direct comparisons.
 Example
 A comparison between the utility of a price level of Rs.400 versus Rs.500, a
delivery period of 1 week versus 2 weeks ,or an after-sales response of 24 hours
versus 48 hours.
 Once we know the utility levels for each attribute (and at individual levels as
well), we can combine these to find the best combination of attributes that gives
the customer the highest utility, the second best combination that gives the
second highest utility, and so on. This information is then used to design a
product or service offering.

Application


 Conjoint analysis is extremely versatile, and the range of applications extends to virtually any industry. New product or service design, including concepts at the pre-prototyping stage, can specifically benefit from conjoint analysis.

Some examples of other areas where this technique can be used are:

 Designing an automobile loan or insurance plan in the insurance industry,


 Designing a complex machine for business customers.

Process

 Design attributes for a product are first identified. For a shirt manufacturer, these could be the design (designer shirts versus plain shirts) and the price (Rs 400 versus Rs 800). The outlets can have exclusive distribution or mass distribution. All possible combinations of these attribute levels are then listed out. Each design combination is ranked by customers and used as input data for conjoint analysis. The utility of the products relative to price can then be measured.
 The output is a part-worth or utility for each level of each attribute. For example, the designer shirt may get a utility level of 5 and the plain shirt 7.5. Similarly, exclusive distribution may have a part utility of 2, and mass distribution 5.8. We then put together the part utilities and come up with a total utility for any product combination we want to offer, and compare that with the maximum-utility combination for this customer segment.
 This process clarifies for the marketer which attributes of the product or service to focus on in the design.
 If a retail store finds that the height of a shelf is an important attribute for selling at a particular level, a well-designed shelf may result from this knowledge. Similarly, a designer of clocks will benefit from knowing the utility attached by customers to the dial size, background colours and price range of the clocks.


Approach

From a discussion with the client, identify the design attributes to be studied and the levels at which they can be offered. Then build a list of product concepts to offer. These product concepts are then ranked by customers. Once this data is available, use conjoint analysis to derive the part utilities of each attribute level. This is then used to predict the best product design for the given customer segment. The SPSS conjoint procedure can be used to analyse the data.

There are three steps in conjoint analysis:

(a) Identification of relevant product or service attributes.
(b) Collection of data.
(c) Estimation of worths (part utilities) for the attributes chosen.

For attribute selection, the market researcher can conduct interviews with the customers directly.

Example of conjoint analysis for a laptop:

For a laptop, consider 3 attributes:

 Weight (3kg or 5 kg)


 Battery life (2 hours or 4 hours)
 Brand name (Lenovo or Dell)

Task: Rank order the following combinations of these characteristics:

1 = most preferred, 8 = least preferred

Combination                Rank
3 kg, 2 hours, Lenovo       4
5 kg, 4 hours, Dell         5
5 kg, 2 hours, Lenovo       8
3 kg, 4 hours, Lenovo       3
3 kg, 2 hours, Dell         2
5 kg, 4 hours, Lenovo       7
5 kg, 2 hours, Dell         6
3 kg, 4 hours, Dell         1

One combination, 3 kg, 4 hours, Dell, clearly dominates, and 5 kg, 2 hours, Lenovo is least preferred.

Let us now take the average rank for the 3 kg option = (4 + 3 + 2 + 1)/4 = 2.5

For the 5 kg option the average rank is (5 + 8 + 7 + 6)/4 = 6.5

For the 4-hour option: (5 + 3 + 7 + 1)/4 = 4

For the 2-hour option: (4 + 8 + 2 + 6)/4 = 5

For Dell: (5 + 6 + 1 + 2)/4 = 3.5

For Lenovo: (4 + 8 + 3 + 7)/4 = 5.5

Looking at the differences in average ranks, the most important characteristic to this respondent is weight (difference = 4), followed by brand name (difference = 2) and battery life (difference = 1). A short sketch of this calculation follows.
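A minimal sketch in Python reproducing the average-rank calculation with pandas:

import pandas as pd

# The ranked combinations from the table above (1 = most preferred, 8 = least preferred)
data = pd.DataFrame({
    "weight":  ["3kg", "5kg", "5kg", "3kg", "3kg", "5kg", "5kg", "3kg"],
    "battery": ["2h", "4h", "2h", "4h", "2h", "4h", "2h", "4h"],
    "brand":   ["Lenovo", "Dell", "Lenovo", "Lenovo", "Dell", "Lenovo", "Dell", "Dell"],
    "rank":    [4, 5, 8, 3, 2, 7, 6, 1],
})

# Average rank for each level of each attribute; the spread between the levels
# indicates how important the attribute is to this respondent
for attribute in ["weight", "battery", "brand"]:
    means = data.groupby(attribute)["rank"].mean()
    print(attribute, dict(means), "importance:", means.max() - means.min())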

Canonical correlation:

 It is defined as the correlation between a linear combination of the dependent variables and a linear combination of the independent variables.
 This is an appropriate technique when the researcher has two or more dependent variables and multiple independent variables.
 The difference between multiple regression and canonical correlation analysis is that in the former a linear relationship is established between a single dependent variable and multiple independent variables, whereas in the latter both sides involve sets of variables.

Application:

 This technique is used to determine the relationship between multiple dependent


and multiple independent variables.

Example:

To study the relationship between factors describing the characteristics of a firm and the market captured. X below represents characteristics of the firm, such as:

1) Technology
2) Trained manpower
3) High quality

X = a1p1 + a2p2 + a3p3

Y represents the subjects of interest, namely market share, sales volume, brand image, etc.

Y = b1q1 + b2q2 + b3q3

The correlation between X and Y is defined as the canonical correlation. For each variable, a weight is attached. A short sketch follows.
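A minimal sketch of canonical correlation in Python using sklearn's CCA (the firm characteristics and outcome measures below are randomly generated and purely illustrative):

import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical data for 100 firms
rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))  # p1, p2, p3: technology, trained manpower, quality
Y = rng.normal(size=(100, 3))  # q1, q2, q3: market share, sales volume, brand image

# Derive one pair of canonical variates (weighted combinations of the X set and of the Y set)
cca = CCA(n_components=1).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)

# The canonical correlation is the correlation between the two canonical variates
print("Canonical correlation:", round(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1], 2))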

Flowchart – Canonical Correlation Analysis

1) Formulating the objectives
2) Designing the analysis
3) Checking the assumptions
4) Deriving the canonical function and assessing overall fit
5) Interpreting the canonical variates
6) Validation and diagnosis


Multivariate analysis in a nutshell

1. Multivariate analysis of variance (MANOVA)
 Technique: It explores, simultaneously, the relationship between several non-metric independent variables (treatments, say fertilizers) and two or more metric dependent variables (say, yield and harvest time). If there is only one dependent variable, MANOVA is the same as ANOVA.
 Relevance and uses: Determines whether statistically significant differences in the means of several variables occur simultaneously between levels of a variable. For example, in assessing whether:
 A change in the compensation system has brought about changes in sales, profit and job satisfaction.
 Geographic region (North, South, East and West) has any impact on consumers' preferences, purchase intentions or attitudes towards specified products or services.
 A number of fertilizers have an equal impact on the yield of rice as also on the harvest time of the crop.

2. Factor analysis
 Technique: A statistical approach that is used to analyze inter-relationships among a large number of variables and to explain these variables in terms of a few dimensions (factors). It involves finding a way of condensing the information contained in a number of original variables into a smaller set of dimensions (factors), mostly one or two, with a minimum loss of information. It identifies the smallest number of common factors that best explain or account for most of the correlation among the indicators. For example, the intelligence quotient of a student might explain most of the marks obtained in Mathematics, Physics, Statistics, etc.; as yet another example, when two variables x and y are highly correlated, only one of them could be used to represent the entire data.
 Relevance and uses: Helps in assessing:
 The image of a company/enterprise.
 Attitudes of sales personnel and customers.
 Preference or priority for the characteristics of a product (like a television or mobile phone) or a service (like a TV programme or air travel).

3. Cluster analysis
 Technique: An analytical technique that is used to develop meaningful subgroups of entities which are homogeneous with respect to certain characteristics.
 Relevance and uses: It helps in classifying a given set of entities into a smaller set of distinct groups by analyzing similarities among the given set of entities. Some situations where the technique could be used are:
 A bank could classify its large network of branches into clusters (groups) of branches which are similar to each other with respect to specified parameters.
 An investment bank could identify groups of firms that are vulnerable to takeover.
 A marketing department could identify similar markets where products or services could be tested or used for target marketing.
 An insurance company could identify groups of auto insurance policy-holders with high claims.

4. Discriminant analysis
 Technique: A statistical technique for classification, determining a linear function (called the discriminant function) of the variables which helps in discriminating between two groups of entities or individuals.
 Relevance and uses: The basic objective of discriminant analysis is to perform a classification function. From the analysis of past data, it can classify a given group of entities or individuals into two categories: those which would turn out to be successful and those which would not. For example:
 It can predict whether a company or an individual would turn out to be a good borrower.
 With the help of financial parameters, a firm could be classified as worthy of extending credit or not.
 With the help of financial and personal parameters, an individual can be classified as eligible for a loan or not, or as a likely buyer of a particular product/service or not.
 Salesmen could be classified according to their age, health, sales aptitude score, communication ability score, etc.

5. Conjoint analysis
 Technique: Involves determining the contribution of variables (each at several levels) to the choice preference over combinations of variables that represent realistic choice sets (products, concepts, services, companies, etc.).
 Relevance and uses: Useful for analyzing consumer responses and using them for the design of products and services. Helps in determining the contributions of the predictor variables and their respective levels to the desirability of the combinations of variables. For example: how much does the quality of food contribute to the continued loyalty of a traveller to an airline? Which type of food is liked most?

6. Multidimensional scaling
 Technique: A set of procedures for drawing pictures of data so as to visualize and clarify the relationships described by the data more clearly. The requisite data is typically collected by having respondents give simple one-dimensional responses. It transforms consumer judgments/perceptions of similarity or preference into a (usually two-dimensional) space.
 Relevance and uses: Useful for the design of products and services. It helps in:
 Identifying market segments based on indicated preferences.
 Identifying the products and services that are more competitive in relation to the others.
 Understanding the criteria used by people while judging objects (products, services, companies, advertisements, etc.).

7. Canonical correlation analysis
 Technique: An extension of multiple regression analysis (MRA), which involves one dependent variable and several metric independent variables; canonical correlation is used for situations wherein there are several dependent variables and several independent variables. It involves developing a linear combination of each set of variables (both dependent and independent) and studying the relationship between the two sets. The weights in the linear combinations are derived based on the criterion of maximizing the correlation between the two sets of variables.
 Relevance and uses:
 Used in studying the relationship between the types of products purchased and consumer lifestyles and personal traits.
 Also, for assessing the impact of lifestyles and eating habits on health, as measured by a number of health-related parameters.
 Given the assets and liabilities of a set of banks/financial institutions, it helps in examining the interrelationship of variables on the asset and liability sides.
 An HRD department might like to study the relationship between a set of behavioural, technical and social skills of a salesman and a set of variables representing sales performance, discipline and cordial relations with staff.

Application of statistical software for data analysis:

Introduction:

 A statistical package is a suite (group) of computer programs that are


specialised for statistical analysis.
 It enables people to obtain the results of standard statistical procedures and
statistical significance tests, without requiring low-level numerical
programming.
 Most statistical packages also provide facilities for data management.

Statistical packages

 ADMB - a software suite for non-linear statistical modelling based on C++ which uses automatic differentiation.
 Apophenia - a library of statistical functions for C, on the same level of
abstraction as most stats packages.
 Bayesian Filtering Library
 DAP - A free replacement for SAS
 gretl - Gnu Regression, Econometrics and Time-series Library
 JAGS - Just another Gibbs sampler (JAGS) is a program for analysis of
Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC)
developed by Martyn Plummer. It is similar to WinBUGS.


 JHepWork - Java-based data analysis framework for scientists and


engineers. It includes an advanced IDE and Jython shell.
 JMulTi
 Octave - programming language (very similar to Matlab) with statistical
features
 OpenBUGS
 OpenEpi - A web-based, open-source, operating-system-independent series of programs for use in epidemiology and statistics based on JavaScript and HTML
 Ploticus - software for generating a variety of graphs from raw data
 PSPP - A free software replacement for SPSS
 R
 R Commander - GUI interface for R
 RapidMiner, a Machine Learning toolbox
 Shogun, an open source Large Scale Machine Learning toolbox that
provides several SVM (Support Vector Machine) implementations (like
libSVM, SVMlight) under a common framework and interfaces to Octave,
Matlab, Python, R
 Simfit - Simulation, curve fitting, statistics, and plotting
 SOCR
 SOFA Statistics - a desktop GUI program focused on ease of use, learn as
you go, and beautiful output.
 Statistical Lab - R-based and focusing on educational purposes
 STATPerl (Statistics with Perl) - a statistical software package based on Perl. Perl source code for various statistical analyses is given with it; users can add new analyses and edit existing ones. An inbuilt Perl IDE comes with it.
 ViSta - a program for exploratory data analysis based on Xlisp-stat
 Weka is also a suite of machine learning software written at the
University of Waikato.


 Xlisp-stat
 Yxilon

Public domain

 BrightStat
 CSPro
 Epi Info
 X-12-ARIMA
 MINUIT

Freeware

 BV4.1
 GeoDA
 WinBUGS - Bayesian analysis using Markov chain Monte Carlo methods
 Winpepi - package of statistical programs for epidemiologists
 WinIDAMS
 Zaitun Time Series

Proprietary

 ADAPA - batch and real-time scoring of statistical models


 Analysis Studio - An interactive statistical analysis and data mining.
 Analytica - visual modeling software
 Angoss
 ASReml - for restricted maximum likelihood analyses
 BMDP - general statistics package
 CalEst - general statistics and probability package with didactic tutorials
 CHARTrunner - for quality improvement charts


 CMA - Comprehensive Meta Analysis


 Data Applied - for building statistical models
 EViews - for econometric analysis
 FAME - a system for managing time series statistics and time series
databases
 GAUSS - programming language for statistics
 GenStat - general statistics package
 GLIM - early package for fitting generalized linear models
 GraphPad Prism - Biostatistics and nonlinear regression with clear
explanations
 GraphPad InStat - Very simple with lots of guidance and explanations
 IMSL Numerical Libraries - software library with statistical algorithms
 ioGAS for exploratory data analysis in the Geosciences
 JMP - general statistics package
 LISREL - statistics package used in structural equation modeling
 Longevitas - statistics package specialising in mortality and longevity of
holders of life-insurance policies
 Maple - programming language with statistical features
 MATLAB - programming language with statistical features
 Mathematica - programming language with statistical features
 MedCalc - for biomedical sciences
 Mentor - for market research
 Minitab - general statistics package
 MLwiN - multilevel models (free to UK academics)
 NCSS - general statistics package
 NMath Stats - statistical package for .NET Framework
 O-Matrix - programming language


 Partek - general statistics package with specific applications for genomic,


HTS, and QSAR data
 Primer-E_Primer - environmental and ecological specific.
 PV-WAVE - programming language comprehensive data analysis and
visualization with IMSL statistical package
 Quantum - part of the SPSS MR product line, mostly for data validation
and tabulation in Marketing and Opinion Research
 RATS - comprehensive econometric analysis package
 SAS - comprehensive statistical package
 SHAZAM - for econometric analysis
 SigmaStat - for group analysis
 Speakeasy - numerical computational environment and programming
language with many statistical and econometric analysis features
 SPSS - comprehensive statistics package
 Stata - comprehensive statistics package
 StatCrunch - comprehensive statistics package and statistical survey tool,
with community sharing of analysis
 Statgraphics - general statistics package
 StatsDirect - general statistics package mostly used in medical statistics
 STATISTICA - comprehensive statistics package
 StatXact - package for exact nonparametric and parametric statistics
 SOCR - online tools for teaching statistics and probability theory
 Systat - general statistics package
 S-PLUS - general statistics package
 Unistat - general statistics package that can also work as Excel add-in
 The Unscrambler (free-to-try commercial Multivariate analysis software
for Windows)


 WINKS - Windows KWIKSTAT from TexaSoft - a general statistics


package designed for scientific data analysis
 XploRe

Add-ons

 Analyse-it - add-on to Microsoft Excel for statistical analysis


 SigmaXL - add-on to Microsoft Excel for graphical and statistical
analysis
 SPC XL - add-on to Microsoft Excel for general statistics
 StatEL - add-on to Microsoft Excel (Windows and Mac OS X) for
general statistics, biomedical statistics and data analysis, developed in a didactic
way for users of statistics at all skill levels.
 SUDAAN - add-on to SAS and SPSS for statistical surveys
 Total Access Statistics - add-on to Microsoft Access for statistical
analysis

 XLfit - add-on to Microsoft Excel for curve fitting and statistical analysis

MATLAB

 MATLAB stands for "Matrix Laboratory" and is a numerical computing

environment and fourth-generation programming language.

 Developed by The MathWorks, MATLAB allows matrix manipulations,

plotting of functions and data, implementation of algorithms, creation of user


interfaces, and interfacing with programs written in other languages, including
C, C++, and Fortran.


 Although MATLAB is intended primarily for numerical computing, an optional

toolbox uses the MuPAD symbolic engine, allowing access to symbolic


computing capabilities.

 An additional package, Simulink, adds graphical multi-domain simulation and

Model-Based Design for dynamic and embedded systems.

 In 2004, MathWorks claimed that MATLAB was used by more than one

million people across industry and the academic world. [2] MATLAB users come
from various backgrounds of engineering, science, and economics.

MATLAB

Developer(s) The MathWorks

Stable release R2010a / March 5, 2010

Written in C, Java

Operating system Cross-platform[1]

Type Technical computing

License Proprietary

Website MATLAB product page

MINITAB

 Minitab is a statistics package. It was developed at the Pennsylvania State

University by researchers Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L.


Joiner in 1972. Minitab began as a light version of OMNITAB, a statistical
analysis program by NIST.


 Minitab is distributed by Minitab Inc., a privately owned company
headquartered in State College, Pennsylvania, with subsidiaries in Coventry,
England (Minitab Ltd.), Paris, France (Minitab SARL) and Sydney, Australia
(Minitab Pty.).

 Today, Minitab is often used in conjunction with the implementation of Six

Sigma, CMMI and other statistics-based process improvement methods.

 Minitab 15, the latest version of the software, is available in 6 different


languages (English, French, German, Korean, Simplified Chinese, & Spanish).

 Minitab produces two other products that complement Minitab 15: Quality
Trainer, an eLearning package that teaches statistical tools and concepts in the
context of quality improvement and integrates with Minitab 15 to
simultaneously develop the user's statistical knowledge and ability to use the
Minitab software; and Quality Companion 3, an integrated tool for managing
Six Sigma and Lean Manufacturing projects that allows Minitab 15 data to be
combined with project management and governance tools and documents.

Developer(s) Minitab, Inc

Stable release 15.1.3 / September, 2008

Operating system Windows

Type numerical analysis

License Proprietary software

Website http://www.minitab.com/

SAS (Statistical Analysis System)


SAS (pronounced "sass", originally Statistical Analysis System) is an integrated


system of software products provided by SAS Institute Inc. that enables the
programmer to perform:

 data entry, retrieval, management, and mining


 report writing and graphics
 statistical analysis
 business planning, forecasting, and decision support
 operations research and project management
 quality improvement
 applications development
 data warehousing (extract, transform, load)
 platform independent and remote computing

In addition, SAS has many business solutions that enable large scale software
solutions for areas such as IT management, human resource management,
financial management, business intelligence, customer relationship management
and more.

Features

 Read and write many different file formats.


 Process data in many different formats.
 SAS programming language is a 4th generation programming language.
SAS DATA steps are written in a 3rd-generation procedural language very
similar to PL/I; SAS PROCS, especially PROC SQL, are non-procedural and
therefore better fit the definition of a 4GL.
 SAS AF/SCL is a fifth-generation programming language that is similar in
syntax to Java.


 WHERE filtering available in DATA steps and PROCs; based on SQL


WHERE clauses, incl. operators like LIKE and BETWEEN/AND.
 Many built-in statistical and random number functions.
 Hundreds of built-in functions for manipulating character and numeric
variables. Version 9 includes Perl Regular Expression processing.
 System of formats and informats. These control representation and
categorization of data and may be used within DATA step programs in a wide
variety of ways. Users can create custom formats, either by direct specification
or via an input dataset.
 Comprehensive date- and time-handling functions; wide variety of
formats to represent date and time information without transformation of
underlying values.
 Interaction with database products through a subset of SQL (and ability to
use SQL internally to manipulate SAS data sets). Almost all SAS functions and
operators available in PROC SQL.
 SAS/ACCESS modules allow communication with databases (incl. via
ODBC); in most cases, database tables can be viewed as though they were
native SAS data sets. As a result, applications may combine data from many
platforms without the end-user needing to know details of or distinctions
between data sources.
 Direct output of reports to CSV, HTML, PCL, PDF, PostScript, RTF,
XML, and more using Output Delivery System. Templates, custom tagsets,
styles incl. CSS and other markup tools available and fully programmable.
 Interaction with the operating system (for example, pipelining on Unix
and Windows and DDE on Windows).
 Fast development time, particularly from the many built-in procedures,
functions, in/formats, the macro facility, etc.
 An integrated development environment.


 Dynamic data-driven code generation using the SAS Macro language.


 Can process files containing millions of rows and thousands of columns
of data.
 University research centers often offer SAS code for advanced statistical
techniques, especially in fields such as Political Science, Economics and
Business Administration.
 Large user community supported by SAS Institute. Users have a say in
future development, e.g., via the annual SASWare Ballot.

SPSS

 SPSS is a computer program used for statistical analysis. Between 2009 and
2010 the company's premier software was called PASW (Predictive Analytics
SoftWare) Statistics. [1] The company announced on July 28, 2009 that it was being
acquired by IBM for US$1.2 billion.[2] As of January 2010, it became "SPSS:
An IBM Company".

Developer(s) SPSS Inc.

Initial release 1968

Stable release 18.0 (Win / Mac / Linux) / 2009

Operating system Windows, Linux / UNIX & Mac

Platform Java

Type Statistical analysis


License Proprietary software

Website http://www.spss.com/

Statistics program

 SPSS (originally Statistical Package for the Social Sciences) was released in its
first version in 1968 after being developed by Norman H. Nie and C. Hadlai
Hull. Norman Nie was then a political science postgraduate at Stanford
University, and is now Research Professor in the Department of Political Science
at Stanford and Professor Emeritus of Political Science at the University of
Chicago.
 SPSS is among the most widely used programs for statistical analysis in social
science.
 It is used by market researchers, health researchers, survey companies,
government, education researchers, marketing organizations and others.
 The original SPSS manual (Nie, Bent & Hull, 1970) has been described as one
of "sociology's most influential books".[4] In addition to statistical analysis, data
management (case selection, file reshaping, creating derived data) and data
documentation (a metadata dictionary is stored in the data file) are features of
the base software.

Statistics included in the base software:

 Descriptive statistics: Cross tabulation, Frequencies, Descriptives,


Explore, Descriptive Ratio Statistics
 Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial,
distances), Nonparametric tests
 Prediction for numerical outcomes: Linear regression
 Prediction for identifying groups: Factor analysis, cluster analysis (two-
step, K-means, hierarchical), Discriminant
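
The base-module statistics listed above are standard techniques rather than anything unique to SPSS. As a rough illustration only (not SPSS syntax, and using made-up figures), a minimal Python sketch with NumPy and SciPy of the same kinds of analyses might look like this:

# Minimal sketch (hypothetical data) of the analyses listed above,
# using NumPy/SciPy rather than SPSS itself.
import numpy as np
from scipy import stats

# Hypothetical monthly sales (in lakhs) for two regions
region_a = np.array([12.1, 13.4, 11.8, 14.2, 12.9, 13.7])
region_b = np.array([10.5, 11.2, 12.0, 10.9, 11.6, 11.1])

# Descriptive statistics (cf. Frequencies / Descriptives)
print("Mean A:", region_a.mean(), "Std dev A:", region_a.std(ddof=1))

# Bivariate statistics: independent-samples t-test comparing the two regions
t_stat, p_value = stats.ttest_ind(region_a, region_b)
print("t =", t_stat, "p =", p_value)

# Correlation between advertising spend and sales (hypothetical figures)
ad_spend = np.array([2.0, 2.5, 1.8, 3.0, 2.2, 2.7])
r, p_corr = stats.pearsonr(ad_spend, region_a)
print("Pearson r =", r, "p =", p_corr)

# Prediction for numerical outcomes: simple linear regression of sales on ad spend
slope, intercept, r_value, p_reg, std_err = stats.linregress(ad_spend, region_a)
print("sales =", intercept, "+", slope, "* ad_spend")

The point of the sketch is only to show what the listed procedures compute; in SPSS the same analyses are run through menus or SPSS syntax rather than code like this.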

Add-on modules provide additional capabilities. The available modules are:

 SPSS Programmability Extension (added in version 14). Allows Python


programming control of SPSS.
 SPSS Data Validation (added in version 14). Allows programming of
logical checks and reporting of suspicious values.
 SPSS Regression Models - Logistic regression, ordinal regression,
multinomial logistic regression, and mixed models.
 SPSS Advanced Models - Multivariate GLM and repeated measures
ANOVA (removed from base system in version 14).
 SPSS Classification Trees. Creates classification and decision trees for
identifying groups and predicting behaviour.
 SPSS Tables. Allows user-defined control of output for reports.
 SPSS Exact Tests. Allows statistical testing on small samples.
 SPSS Categories
 SPSS Trends
 SPSS Conjoint
 SPSS Missing Value Analysis. Simple regression-based imputation.
 SPSS Map
 SPSS Complex Samples (added in Version 12). Adjusts for stratification
and clustering and other sample selection biases.

SPSS Server is a version of SPSS with client/server architecture. It has some


features not available in the desktop version, such as scoring functions.


XLSTAT

 XLSTAT is a data analysis and statistical add-in solution for Microsoft Excel.


 The XLSTAT statistical analysis add-in offers a wide variety of functions to


enhance the analytical capabilities of Excel, making it the ideal tool for your
everyday data analysis and statistics requirements.
 XLSTAT's statistical analysis software is compatible with all Excel versions
from version 97 to version 2007, and runs on Windows 9x through Windows 7
as well as on PowerPC- and Intel-based Mac systems.
 Because it is quick, reliable, easy to install and use, and well-priced,
XLSTAT has grown to be one of the most commonly used statistical software
packages on the market; it is used today by more than 25,000 customers in businesses
and universities, big and small, in over 100 countries throughout the world.

Statistical Analysis with SAS/STAT® Software

 From traditional statistical analysis of variance and predictive modelling to
exact methods and statistical visualization techniques, SAS/STAT software is
designed for both specialized and enterprise-wide analytical needs. SAS/STAT
software provides a complete, comprehensive set of tools that can meet the data
analysis needs of the entire organization.

Benefits

 Take advantage of all data in order to uncover new business opportunities


and increase revenue.
 Move the scientific discovery process forward by applying the latest
statistical techniques.
 Achieve corporate and governmental compliance.

Features

 Analysis of variance


 Mixed models
 Regression
 Categorical data analysis
 Bayesian analysis
 Multivariate analysis
 Survival analysis
 Psychometric analysis
 Cluster Analysis
 Nonparametric analysis
 Survey data analysis
 Multiple imputation for missing values

How SAS Is Different

 SAS has over 30 years of experience in developing and delivering advanced
statistical analysis software.
 SAS/STAT works in both specialized and general enterprise application
environments.
 Statistical procedures in SAS are constantly being updated to reflect the
latest advances in statistical methodology, thus enabling you to go beyond the
basics for more advanced statistical analyses.
 Technical support at SAS is provided by masters- and doctorate-level
statisticians who can help address almost any issue quickly.


Unit 5 - Report design, writing and ethics in business research

Research report - different types - contents of report - need of executive
summary - chapterization - contents of chapters - report writing - the role of
audience - readability


Research report:

 The document that describes the research project, its findings, analysis of the
findings, interpretations, conclusions and, sometimes, recommendations is
called a research report.

Types of report:

There are two types of reports:

 Oral report
 Written report

Oral report:

 This type of reporting is required when the researchers are asked to make an
oral presentation.
 Making an oral presentation is somewhat more difficult than preparing a written
report, because the reporter has to interact directly with the audience.
 The presenter may have to face a barrage of questions from the audience.
 In oral presentation, communication plays a big role.


Points to be remembered in oral presentation:

1) Language used must be simple and understandable.
2) Time management should be adhered to.
3) Use of charts, graphs, etc., will enhance understanding by the audience.
4) Vital data such as figures may be printed and circulated to the audience so that
their ability to comprehend increases, since they can refer to it while the
presentation is going on.
5) The presenter should know his target audience well in advance to prepare a
tailor-made presentation.
6) The presenter should know the purpose of the report, such as "Is it for making a
decision?" or "Is it for the sake of information?"

Written reports:

Types of report:

1) Short report
2) Long report
3) Formal report
4) Informal report


5) Government report

Short report:

 Short reports are produced when the problem is very well defined and the
scope is limited.
 It will run to about five pages.
 It may consist of a report about the progress made with respect to a particular
product in clearly specified geographical locations.
 E.g. a monthly sales report.

Long report:

 This could be either a technical report or a non-technical report.
 It will present the outcome of the research in detail.

Technical report:

This will include the sources of data, research procedure, sample design, tools
used for gathering data, data analysis methods used, appendix, conclusion and
detailed recommendations with respect to specific findings. If any journal, paper
or periodical is referred to, such references must be given for the benefit of the reader.

Non-technical report/management report:

 This report is meant for those who are not technically qualified.
 E.g. chief of the finance department.
 He may be interested in financial implications only, such as margins, volumes
etc.
 He may not be interested in the methodology.

Formal report:

Example: The report prepared by the marketing manager to be submitted to the
vice president (marketing) on quarterly performance, or reports on test marketing.

Informal report:

Example: the report prepared by a supervisor by way of filling in the shift log book,
to be used by his colleagues.

Government report:

These may be prepared by state governments or the central government on a


given issue.

Example: a programme announced for rural employment strategy as part of a

five-year plan, or a report on children's education.

Preparation of research report:

Format of a research report:

Format of the research report contains three sections.

I. Preliminary section
a) Title page
b) Certificate
c) Declaration
d) Acknowledgement
e) Preface
f) Foreword
g) Abstract
h) Table of contents
i) List of tables
j) List of figures
II. Main body of the report
1. Introduction


a) Statement of the problem


b) Significance of the study
c) Purpose
d) Definition of important terms
e) Objectives
f) Hypothesis
g) Methodology
h) Period of the study
i) The study area
j) The data structure
k) Chapterisation
2. Review of literature
a) Critical analysis of the previous research
b) Brief statement of the present study
3. Design of the study
a) Procedures
b) Methods of gathering data
c) Description of data
4. Presentation and analysis of data
a) Text
b) Preliminary tables
c) Supplementary tables
d) Figures
e) Statistical tools used for data analysis
f) Statistical computations and summary of results
5. Summary and conclusions
a) Brief restatement of the study
b) Main findings and conclusions
c) Recommendations for further research


III. Reference section


a) Bibliography / References
b) Appendix
c) Index

Structure of the report:

Structure your writing around the IMR&D framework and you will ensure a
beginning, middle and end to your report.

I   Introduction   Why did I do this research?                      (beginning)
M   Method         What did I do and how did I go about doing it?   (middle)
R   Results        What did I find?                                 (middle)
and
D   Discussion     What does it all mean?                           (end)

What do I put in the beginning part?

TITLE PAGE - Title of the project, sub-title (where appropriate), date, author,
organisation logo.
BACKGROUND - History (if any) behind the project.
ACKNOWLEDGEMENT - Author thanks the people and organisations who helped
during the project.
SUMMARY (SOMETIMES CALLED ABSTRACT OR SYNOPSIS) - A condensed
version of the report; outlines salient points, emphasizes main conclusions and
(where appropriate) the main recommendations.
LIST OF CONTENTS - An at-a-glance list that tells the reader what is in the report
and what page number(s) to find it on.
LIST OF TABLES - As above, specifically for tables.
LIST OF APPENDICES - As above, specifically for appendices.
INTRODUCTION - Author sets the scene and states his/her intentions.
AIMS AND OBJECTIVES - AIMS: general aim of the audit/project, a broad
statement of intent. OBJECTIVES: specific things the project is expected to
do/deliver (e.g. expected outcomes).

What do I put in the middle part?

METHOD - Work steps: what was done – how, by whom, when?
RESULTS/FINDINGS - Honest presentation of the findings, whether these were
as expected or not. Give the facts, including any inconsistencies or difficulties
encountered.

What do I put in the end part?

DISCUSSION - Explanation of the results (you might like to keep the SWOT
analysis in mind and think about your project's strengths, weaknesses,
opportunities and threats as you write).
CONCLUSIONS - The author links the results/findings with the points made in
the introduction and strives to reach clear, simply stated and unbiased
conclusions. Make sure they are fully supported by the evidence and arguments
of the main body of your audit/project.
RECOMMENDATIONS - The author states what specific actions should be
taken, by whom and why. They must always be linked to the future and should
always be realistic. Don't make them unless asked to.
REFERENCES - A section of the report which provides full details of
publications mentioned in the text, or from which extracts have been quoted.
APPENDIX - The purpose of an appendix is to supplement the information
contained in the main body of the report.

Bibliography:

A bibliography, also called a works cited page, provides source information;
it keeps all resources and references together and organized, and it lets readers
know where you got your sources. Most important, it shows that you did not
plagiarize any part of your paper. A bibliography or reference section is
necessary for any research paper. Unfortunately, there are as many ways to
write a bibliography as there are names for it. Here is a standard method.

Instructions:

1. Step 1


Create a page at the end of the paper. Call it either "Bibliography" or "Works
Cited."

2. Step 2

List, alphabetically by author's last name, all the sources used in writing the
paper.

3. Step 3

Write the last name of the author first, followed by a comma and his or her first
name, followed by a period.

4. Step 4

Write the name of the book in italics, followed by a period. You can also
underline the book title.

5. Step 5

Cite the name of an article, in quotation marks, in place of a book title. Then
write the name of the journal or magazine from which it came (italicize the
name or underline it). Include a volume number, if applicable.

6. Step 6

Write the name of the city in which the work was published, followed by a
colon.

7. Step 7

Include the name of the publisher, followed by a comma.

8. Step 8


Conclude the entry with the date of publication, followed by a period.

9. Step 9

Number your bibliography page when adding numbers to your paper.
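
Putting the steps together, a book entry and a journal-article entry would follow these patterns (placeholder names only, not real sources):

Lastname, Firstname. Title of Book. City of Publication: Publisher Name, Year of Publication.

Lastname, Firstname. "Title of Article." Name of Journal, Volume number. City of Publication: Publisher Name, Year of Publication.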


CHARACTERISTICS OF A GOOD REPORT


Context and Style
 Appropriate, informative title for the content of the report.
 Explicit statement of research auspices (i.e., funding sources).
 Crisp, specific, unbiased writing with minimal jargon.
 Adequate analysis of prior relevant research, putting this study in perspective
(with theoretical context if a theory-building study).
Questions/Hypotheses (and Construct Validity)

 Clearly stated questions/hypotheses (including X, Y, and whether the research is


causal or not).
 Thorough operational definitions of key concepts (with validation if
questionable), plus exact wording/measurement of key variables.
Research Procedures
 Full and clear description of the research design.
 Explanation of steps taken to obtain participants/subjects (e.g., sampling
system/frame, recruitment, informed consent).
 Demographic profile of the participants/subjects.
 Specific data-gathering procedures (inc., where/when conducted).
Issues for Causal Research (Internal Validity)
 Ideally, no (or minimal) threats to internal validity – thanks to a good control
group and avoiding mortality/attrition and intragroup history – or thanks to
adroit use of other techniques (like time series) to help rule out alternative
explanations. If other threats emerged (e.g., history, maturation,
practice/testing, instrumentation, regression to the mean, selection, reactivity), a
candid discussion of the extent to which those threats were controlled.


 Successfully addressing (implicitly or explicitly) all three elements of causal


inference (time order, correlation, nonspuriousness).
Data Analysis (Statistical Conclusion Validity)
 Appropriate inferential statistics for sample or experimental data and
appropriate use of descriptive statistics.
 Clear, reasonable interpretation of the statistical findings, accompanied by
effective tables and figures.
Summary (including External Validity)
 Fair assessment of the implications and limitations of the findings (i.e., its
external validity).
 Effective commentary on the overall implications of the findings for theory
and/or policy.


Ethics in research

Ethics-Definition:

 Ethics are norms or standards of behaviour that guide moral choices about
our behaviour and our relationship with others.
 The goal of ethics in research is to ensure that no one is harmed or suffers
adverse consequences from research activities.
 According to Collins Dictionary, ethical means “in accordance with principles
of conduct that are considered correct, especially those of a given
profession or group”.
 Ethics is nothing but the accepted code of conduct.
 Ethics in business research is very much required and relevant in today’s
industrial scenario.

Ethical issues concerning research participants/ethics in the treatment of


the respondent:

There are many ethical issues in relation to participants of a research activity.

Seeking consent:

 It is considered unethical to collect information without the knowledge of


participants, and their expressed willingness and informed consent.
 Informed consent implies that subjects are made adequately aware of the type of
information you want from them, why the information is being sought, what
purpose it will be put to, how they are expected to participate in the study, and
how it will directly or indirectly affect them.
 It is important that the consent should also be voluntary and given without
pressure of any kind.
 Schinke and Gilchrist write:


 'Competent', according to Schinke and Gilchrist, 'is concerned with the legal
and mental capacities of participants to give permission'.
 For example, people whose circumstances prevent them from making informed
decisions, people in crisis, people who cannot speak the language in which the
research is being carried out, people who are dependent upon you for a service,
and children are not considered to be competent.

Under standards set by the National Commission for the Protection of
Human Subjects, all informed-consent procedures must meet three
criteria:
1) Participants must be competent to give consent.
2) Sufficient information must be provided to allow for a reasoned
decision.
3) Consent must be voluntary and uncoerced.

Providing incentives:

 Some researchers provide incentives to participants for their participation in a
study, feeling this to be quite proper as participants are giving their time.
 Others think that the offering of inducements is unethical.
 In general,

Giving a small gift after having obtained your information, as a
token of appreciation, is not unethical; however, giving a present
before data collection is unethical.

Seeking sensitive information:


 Information sought can pose an ethical dilemma in research.


 Certain types of information can be regarded as sensitive or confidential by
some people and thus an invasion of privacy.


Asking for this information may upset or embarrass a respondent.


 For most people, questions on sexual behaviour, drug use and shoplifting are
intrusive.
 Even questions on marital status, income and age may be considered an
invasion of privacy by some. In collecting data you need to be careful about the
sensitivities of your respondents.

It is not unethical to ask such questions (sensitive and intrusive)
provided that you tell your respondents clearly and frankly the type
of information you are going to ask for, and give them sufficient
time to decide if they want to participate, without any major
inducement.

The possibility of causing harm to participants:

Harm includes not only hazardous medical experiments but also any social
research that might involve such things as discomfort, anxiety, harassment,
invasion of privacy, or demeaning or dehumanising procedures.


 When you collect data from respondents or involve subjects in an experiment,


you need to examine carefully whether their involvement is likely to harm them
in any way.
 If it is likely to, you must make sure that the risk is minimal.
 If the way information is sought creates anxiety or harassment, you need to take
steps to prevent this.

Maintaining confidentiality:

 Sharing information about a respondent with others


for purposes other than research is unethical.
 Sometimes you need to identify your study population
to put your findings into context.
 In such situations you need to make sure that at least
the information provided by respondents is kept
anonymous.

Protecting the Rights of the respondents:


Right to choose:

The customer must be allowed to choose what he wants. No force


should be exerted by sellers on the buyer.

Right to safety:

The researcher must not inflict psychological harm by putting the


respondents under pressure to answer.

Right to be informed:

The researcher must inform the customer in advance about the


purpose of gathering the information.

Right to privacy:

The researcher should convince the customer that the survey
does not involve unethical things and is being conducted for
mutual benefit.

E.g. Skin care products should not mislead the user.

Ethical issues relating to the researcher:

Avoiding bias:

 Bias on the part of the researcher is unethical.


 Bias is a deliberate attempt either to hide what you have found in your study or
to highlight something disproportionately to its true existence.

Using inappropriate research methodology:

 A researcher has an obligation to use appropriate research methodology in


conducting study.
 It is unethical to use a method or procedure you know to be inappropriate.
 Examples:
1) Selecting a highly biased sample
2) Using an invalid instrument
3) Drawing wrong conclusions

Incorrect reporting:

 To use an appropriate methodology, but to report the findings in a way that


changes or slants them to serve your own or someone else’s interest, is
unethical.

Selecting the bidders:

Sometimes firms, for the sake of formality, call for quotations from a number of
market research agencies, even though they have already decided to whom the
project should be given. This is an unethical practice in the matter of selection of
researchers.

Limited funds:

 Certain firms have limited funds allocated to carry out the research.
 For example, the firm may have a budget for research to be conducted
on a regional basis, but the firm does not make this clear to the researcher while
inviting proposals.


 It may happen that such ambiguity causes the researcher to prepare his
proposals for nationwide research, but upon bagging the project, the funds
released are sufficient only to conduct research on a regional basis.
 This may frustrate researchers; besides, it is an unethical practice.

Non-availability of data:

 Some firms give projects to their researcher, but do not provide him with
required sales and cost data.
 Since this may be the basis for carrying out the research, the researcher feels
frustrated at not receiving the promised basic data.
 This is unethical on the part of the client firm.

Pseudo-Pilot studies:

 Some clients ask the research agencies to conduct pilot studies and promise that
if the researcher does a good job during the pilot study stages, there will be an
additional major contract immediately.
 Most often, this comprehensive study never materialises and the research
agencies absorb a huge loss.
 This is not an ethical practice.

Political research:

 Political organisations hire research consultants to carry out research.

 In such cases, there is a likelihood that the consulting firm or organisation
will be politically pressurised to produce reports and forecasts in favour of the
party commissioning it.
 This is also a very unethical practice.

Ethical issues regarding the sponsoring organisation:


Restrictions imposed by the sponsoring organisation:

 Most research in the social sciences is carried out using funds provided by
sponsoring organisations for a specific purpose.
 The funds may be given to develop a program or evaluate it; to examine its
effectiveness and efficiency; to study the impact of a policy; to test a product; to
study the behaviour of a group or community; or to study a phenomenon, issue
or attitude.
 Sometimes there may be direct or indirect controls exercised by sponsoring
organisations.
 They may select the methodology, prohibit the publication of ‘what was found’
or impose other restrictions on the research that may stand in the way of
obtaining and disseminating accurate information.
 Both the imposition and acceptance of these controls and restrictions are
unethical, as they constitute interference and could amount to the sponsoring
organisation tailoring research findings to meet its vested interests.

The misuse of information:

 Sometimes sponsoring organisations use research as a pretext for advancing
management's agenda.
 It is unethical to let your research be used as a reason for justifying management
decisions.
