BRM Merged 1-13 PDF

1/7/2020
Business Research Method
Session-1
XLRI- Xavier School of Management, Jamshedpur
Course Outline: Brief

• Functional Knowledge
– Quizzes : 30%
– Class Participation/Assignment : 10%
– Group Project : 30%
– End Term : 30%
Note:
– Class Participation will be assessed by the instructor(s) based on class
preparation, meaningful participation, sincerity, discipline, regular
attendance, and general behavior in the class. Disruptive behavior and
course-related indiscipline will be penalized.
– Group Project: Groups of 6 members will each work on a topic. Project submission
guidelines & dates will be communicated later.
• Grading will be as per institute norms.
Class Conduct: Guidelines

• Discipline in the class will be governed by the institute's student manual.
• Mobile phones are not allowed in the class. Please respect others in
the class by turning off mobile phones & other electronic devices.
• Please refrain from disruptive activities (e.g., late arrival to class,
cross-talking, moving around during the class, or any other
disturbing activity).
• Please attend BRM sessions only with your own section (please don't
attend another section's BRM session).
– However, if your situation is compelling (other than medical), you may attend a BRM
session with another section, but only after obtaining permission by email, and 'NO attendance'
will be recorded for that class. Medical issues will be treated differently.
Discussion
Can you share any example of research?
“I don’t know whether we should change the packaging of Colgate toothpaste.”
Discussion
Why should we conduct research?
Research… provides information to guide decisions
Research… reduces risk in decision making
Research: Different Terms

• Business Research Method
• Social Research Method
• Market Research
• Social Science Research
Discussion
What is Research?
Defining Research…
• Systematic investigation into and study of materials & sources in order to establish facts & reach new conclusions (Oxford dictionary).
• A studious inquiry or examination (Merriam-Webster Online Dictionary).
• Systematic and objective process of gathering, recording, and analyzing data for aid in making business decisions (Zikmund, 2007).
• Systematic enquiry that provides information to guide managerial decisions (Cooper & Schindler, 2009).
Defining Research
• Research is the systematic & objective
– Identification (of information)
– Collection (of information)
– Analysis (of information)
– Dissemination (of information) &
– Use of information
• for improving decision making related to…
– Identification and Solution of problems & opportunities
in business
Defining Research
Research information is used to:
• identify & define opportunities and problems
• generate, refine, & evaluate (managerial) actions
• monitor performance (of a firm or any other entity)
• improve understanding of the process
Summary of pointers about Research

• Research is about finding information whose absence may distort our ability to make informed decisions.
• The ability to make an informed decision is generated through a systematic study conducted in several interrelated stages.
• All steps of the research process are information-centric.
• All steps in research are interrelated; no activity is launched independently, without considering the decisions made at previous stages.
Research Suppliers & Services
• Research Suppliers
  – Internal
  – External
    – Full Service: Syndicated Services; Customized Services; Internet Services
    – Limited Service: Field Services; Focus Groups & Qualitative Services; Technical & Analytical Services; Other Services
Research Classification
Discussion:
How can we classify Research?
A Classification of Research
Research
• Problem Identification Research: to help identify problems that are not necessarily apparent on the surface and yet exist or are likely to arise in the future.
• Problem Solving Research: to help solve specific problems.
Research Classification: Discussion

• Research on
– Market Potential Research; Market Share Research; Market
Characteristics Research; Sales Analysis Research; Forecasting
Research; Business Trends Research
• Should McDonald’s add Italian pasta dinners to its menu?
– To assess preference for Italian pasta dinners among the target group (TG).
• Should P&G add a high-priced, low-foam detergent powder kit to its product line?
– To identify/assess customer preference for low-foam detergent powder.
Research Classification: Discussion

• Why has SBI’s market share of education loans been decreasing in recent years, and what steps can be taken to improve it?
– To identify factors influencing the choice of an education loan
– To assess SBI’s performance on the criteria used in choosing an education loan
– To identify methods of improving performance on those parameters.
Steps involved in Research
Discussion
What can be broad steps of Research?
Steps of Research Process
Step 1: Defining the Problem
Step 2: Developing an Approach to the Problem
Step 3: Formulating a Research Design
Step 4: Doing Field Work or Collecting Data
Step 5: Preparing and Analyzing Data
Step 6: Preparing and Presenting the Report
Problem Definition
“The truly serious mistakes are made not as a result of wrong answers but because of asking wrong questions”
… Peter Drucker
Problem Definition
• The most important step in research
• Problem definition covers the purpose of the study, relevant background information, the information needed, and how it will be used in decision making
Why is it important to clearly define problem?

• Because problem definition sets the course of the entire project
• Because the client is paying for the research, so both client and researcher need to know what to expect
• The problem definition process provides guidelines on how to correctly define the research problem
• Because mistakes made at this level grow into larger, more expensive mistakes later.
• All the effort, time & money spent from this point on will be wasted if the problem is not properly defined.
Problem Definition: Genesis

• Drivers for problem formulation:
– Unanticipated change, typically in the environment of the focal firm
– Planned change (estimation, effects, outcome)
– Serendipity (random ideas or information)
• Situation Narration by management
Exercise before Problem Definition

1. Discussion with Decision maker
2. Discussion with Industry expert
3. Secondary Data
4. Qualitative Research
Environmental Context of the Problem
Past Information & Forecasts
Resources & Constraints
Objectives
Buyer Behavior
Legal Environment
Economic Environment
Marketing & Technological Skills
Management Decision Problem (MDP)

• A statement specifying the type of managerial action required to solve the problem.
– It asks what the decision maker needs to do.
– It is action oriented.
– It focuses on symptoms.
• The problem faced by the decision maker, for which research is intended to provide answers or information
Research Problem (RP)

• A statement specifying the type of information needed by the decision maker to help solve the management decision problem, and how the information can be obtained efficiently & effectively.
– It asks what information is needed & how it should be obtained.
– It is information oriented.
– It focuses on the underlying cause.
• A statement of the decision problem in research terms
MDP vs. RP: Illustration

MDP: Should a new product be introduced?
RP: To determine consumer preferences and purchase intentions for the proposed new product.
Problem Definition
• MDP asks what the decision maker needs to do, whereas
• RP asks what information is needed & how it can be obtained effectively & efficiently
Problem Definition: Steps Involved

• Understanding Genesis of Problem
• Conducting the four must-do exercises
• Considering Environmental Context of the problem
• Developing MDP & Developing relevant RP(s)
1/10/2020
Prof. Ravi Shekhar Kumar

ravishekhar@xlri.ac.in
Session-2
Problem Definition: Steps Involved

• Understanding Genesis of Problem
• Conducting the four must-do exercises
• Considering Environmental Context of the problem
• Developing MDP & Developing relevant RP(s)
– MDP asks what the decision maker needs to do, whereas
– RP asks what information is needed & how it can be obtained effectively & efficiently

MDP: Should a new product be introduced?
RP: To determine consumer preferences and purchase intentions for the proposed new product.

MDP: Should the advertising campaign be changed?
RP: To determine the effectiveness of the current advertising campaign.

MDP: Should the price of the brand be increased?
RP: To determine the price elasticity of demand and the impact on sales and profits of various levels of price changes.

MDP: Should management share an explicit career-development plan with new recruits?
RP: To identify the merits and demerits of sharing an explicit career-development plan with new recruits.
Problem Definition: Discussion

• Are these problems correct?
– Improve the company’s image
– Develop a suitable employee strategy for firm
– Improve the competitive position of the firm
– Develop a marketing strategy for the brand.
• Is the problem correct?
– How should the firm adjust its pricing given that a major competitor has initiated price changes?
• (How should the firm respond to the competitor’s price changes?)
• Is the problem correct?
– What are the drivers of employee engagement, and what can be done to improve employee engagement?
Developing an Approach to the Problem

• Focus:
– Developing more specific devices to address the components of the research problem(s) defined in the previous step
• Includes:
– Theory / Objective evidence
– Analytical model (verbal/graphical/mathematical)
– Research question (define it)
– Hypotheses (the end product of this step)
Theory/Objective evidence
• Theory
– Example for choice making related theories:
• Theory of rationality
• Bounded rationality Theory
• Objective Evidence
– Empirical observation (available in literature)
Model: Theory of Planned Behavior
(Ajzen, 1985)
Model: Technology Acceptance Model
(Davis, 1989)
Research Question
• MDP: What should be done to improve the patronage of the Big Bazar store?
RP: To determine the relative strengths and weaknesses of Big Bazar, vis-à-vis other major competitors, with respect to factors that influence store patronage.
The RP breaks down into: Specific Question 1, Specific Question 2, …, Specific Question n
• Research questions (RQs): refined questions or statements of the specific components of the (research) problem.
Possible Research Questions

• What criteria do households use when selecting department stores?
• How do households evaluate Big Bazar and competing stores in terms of
the choice criteria identified in above question?
• Which stores are patronized when shopping for specific product
categories?
• What is the market share of Big Bazar and its competitors for specific
product categories?
• What is the demographic and psychological profile of the customers of
Big Bazar? Does it differ from the profile of customers of competing
stores?
• Can store patronage and preference be explained in terms of store
evaluations and customer characteristics?
MDP, RP & RQ: Exercise

• Management Decision Problem (MDP):
– Should Amul launch packaged sweet products?
• Research Problem (RP):
– To determine customer preference for packaged sweets from Amul
• Research Question (RQ):
– Are milk-based sweets popular among target customers?
– What is the perceived quality of packaged sweets?
Development of Research Questions
Research Problem
→ informed by: Objective/Theoretical Framework; Analytical Model
→ RQ-1: Specific Question 1; RQ-2: Specific Question 2; …; RQ-n: Specific Question n
Research Hypothesis
• An unproven statement or proposition about a factor or
phenomenon that is of interest to the researcher.
– Often, a hypothesis is a possible answer to the research
question.
– It is mostly about the relationship between two
variables/ two phenomena.
– An empirically testable statement
Null vs Alternative Hypothesis

• Null Hypothesis (H0): a statement about a population that is assumed to be true.
– … is generally assumed to be true until evidence indicates otherwise.
– … is a statement of the status quo, one of no difference or no effect.
– If the null hypothesis is not rejected, no changes will be made.
• Alternative Hypothesis (H1): a statement that directly contradicts the null hypothesis by stating the contrary about the population.
– … is one in which some difference or effect is expected.
– … is the statement that is hoped or expected to be true instead of the null hypothesis.
– Accepting the alternative hypothesis (i.e., rejecting the null) will lead to changes in opinions or actions.
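The logic of H0 and H1 can be made concrete with a one-sample z-test for a proportion. This is a minimal sketch, not a procedure prescribed by the course. The counts (152 of 200 respondents), the status-quo share of 0.87 and alpha = 0.05 are hypothetical, and the normal approximation assumes a reasonably large random sample.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_proportion_z_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Left-tailed test of H0: p = p0 against H1: p < p0.

    Uses the normal approximation to the binomial, so it assumes a
    reasonably large random sample. Returns (z, p_value, reject_h0).
    """
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = normal_cdf(z)            # probability in the left tail
    return z, p_value, p_value < alpha


# Hypothetical survey: 152 of 200 respondents choose the brand; status-quo share is 0.87
z, p_value, reject = one_proportion_z_test(152, 200, 0.87)
print(f"z = {z:.2f}, p-value = {p_value:.2g}, reject H0: {reject}")
```

With 172 of 200 instead, z is about -0.42 and H0 would not be rejected. In the slide's terms, no change would be made. Not rejecting H0 does not prove that it is true.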
Department Store: Illustration

• RQ
– Do the customers of Big Bazar exhibit store loyalty, and what are their characteristics?
• Hypothesis
– H1: Customers who are store loyal are less knowledgeable about
the shopping environment.
– H2: Store-loyal customers are more risk-averse than are non-loyal
customers.
– H3: Customers of Big Bazar are loyal.
• Inappropriate hypothesis
Research Questions & Hypotheses: Illustration

RQ: What is the lifestyle of consumers who purchase athletic
footwear based on image?
• Hypotheses:
– H1: Consumers who purchase athletic footwear based on image are
not price sensitive.
(A lifestyle typically reflects an individual's attitudes, way of life, values, or world
view. It can denote the attitudes, interests, opinions, behaviors, and behavioral
orientations.)
RQ: What is the lifestyle of the typical Nike consumer?

• Hypotheses:
– H1: The typical Nike consumer is ‘young and independent’.
– H2: The typical Nike consumer watches sports on television.
1/15/2020

Session-3
Problem Definition Process
Problem Situation → Defining Genesis of Problem
→ Discussion with Decision Maker; Interview with Experts; Secondary Data Analysis; Qualitative Research
→ Environmental Context of Problem
→ Defining MDP
→ Defining RP-1, RP-2, RP-3
Developing an Approach to Problem
• Based on theoretical knowledge, and on a literature review and qualitative study
• Defining RP-1, RP-2, RP-3 → Defining RQ-1 to RQ-4 → Developing Hypothesis-1, Hypothesis-2, …
Situation
• Harley-Davidson made such a strong comeback in the early 2000s that there was a long waiting list to get a Harley-Davidson bike.
• In 2007, its market share in the heavyweight bike category was about 50%.
• Distributors were urging expansion.
• But the company was skeptical about investing in new production facilities.
Exercise before Problem Definition

• Discussion with Decision maker:
– Years of declining sales had taught management to be more risk averse than risk prone.
• Discussion with Industry expert:
– Brand loyalty was a major factor influencing sales and repeat sales of bikes.
• Secondary Data:
– A vast majority of bike owners also owned automobiles such as cars, SUVs and trucks.
• Qualitative Research:
– Focus groups with bike owners indicated that bikes were used not as a primary means of transportation but as a means of recreation. They also highlighted the importance of the brand.
Environmental context of Problem

• Forecasts called for an increase in consumer spending on recreation & entertainment in 2015
• Harley had the necessary resources to achieve its objective of being the dominant motorcycle brand globally
• Brand image & brand loyalty played a significant role in buyer behavior, with well-known brands continuing to command a premium.
• Harley had the necessary marketing & technological skills to achieve its objective.
Problem Definition
• MDP:
– Should Harley-Davidson invest to produce more bikes?
• RP:
– To determine whether customers would be loyal buyers of Harley-Davidson
Approach to Problem
RQ:
• Who are the customers? What are their demographic & psychographic characteristics?
• Can different types of customers be segmented? Is it possible to segment the market in a meaningful way?
• How do customers feel about their Harleys? Are all customers motivated by the same appeal?
• Are the customers loyal to Harley-Davidson? What is the extent of brand loyalty?
Approach to Problem: Hypothesis
• RQ:
– Can different types of customers be segmented based
on psychographic characteristics?
• Hypothesis:
– H1: There are distinct segments of bike buyers.
– H2: Each segment is motivated to own a Harley-Davidson for a different reason.
– H3: Brand loyalty is high among Harley-Davidson customers in all segments.
(Psychographics is the study of personality, values, opinions, attitudes, interests, and lifestyles.)
Research Design
Research Design: Definition

• Research design is a framework or blueprint for conducting a research project.
– …details the procedures necessary for obtaining the information needed to structure or solve the research problem(s).
– …lays the foundation for conducting the project.
• Involves:
– (Define the information needed)
– Design exploratory, descriptive, and/or causal phases of research
– Specify measurement & scaling procedures
– Construct & pretest a questionnaire or an appropriate form for data
collection
– Specify sampling process & sample size
– Develop a plan of data analysis
Classification of Research Designs
Research Design
• Exploratory Research Design: provision of insights into & comprehension of the problem situation confronting researchers
• Conclusive Research Design: assists in determining, evaluating & selecting the best course of action to take in a given situation
Exploratory vs Conclusive Research
• Objective
  – Exploratory: to provide insights & understanding
  – Conclusive: to test specific hypotheses and examine relationships
• Characteristics
  – Exploratory: information needed is defined only loosely; research process is flexible & unstructured; sample is small & non-representative; analysis of primary data is qualitative
  – Conclusive: information needed is clearly defined; research process is formal & structured; sample is large & representative; data analysis is quantitative
• Findings/Results
  – Exploratory: tentative
  – Conclusive: conclusive
• Outcome
  – Exploratory: generally followed by further exploratory or conclusive research
  – Conclusive: findings used as input into decision making
Classification of Research Designs
Research Design
• Exploratory Research Design: provision of insights into & comprehension of the problem situation confronting researchers
• Conclusive Research Design: assists in determining, evaluating & selecting the best course of action to take in a given situation
  – Descriptive Research: description of something, usually characteristics & functions
    – Cross-Sectional Design
    – Longitudinal Design
  – Causal Research: to obtain evidence regarding cause-and-effect relationships
A Comparison of Basic Research Designs
• Objective
  – Exploratory: discovery of ideas and insights
  – Descriptive: describe characteristics or functions
  – Causal: determine cause-and-effect relationships
• Characteristics
  – Exploratory: flexible, versatile; often the front end of the total research design
  – Descriptive: marked by the prior formulation of specific hypotheses; preplanned and structured design
  – Causal: manipulation of independent variables; measure effect on dependent variables; control mediating variables
Exploratory Research
• Can be conducted by qualitatively analyzing primary data and secondary data
Primary vs. Secondary Data
• Collection purpose – Primary: for the problem at hand; Secondary: for other problems
• Collection process – Primary: very involved; Secondary: rapid & easy
• Collection cost – Primary: high; Secondary: relatively low
• Collection time – Primary: long; Secondary: short
Secondary Data: Sources
Secondary Data
• Internal
  – Ready to Use
  – Requires Further Processing
• External
  – Published Materials
  – Computerized Databases
  – Syndicated Services
Secondary Data: Uses

• Identify the problem & Better define the problem
• Develop an approach to the problem
• Formulate an appropriate research design (for example, by
identifying key variables)
• Answer certain research questions & test some hypotheses
• Interpret primary data more insightfully
Exploratory Research: Uses

• Formulate a problem or define a problem more precisely
• Gain insights for developing an approach to problem
• Identify alternative courses of action & Establish priorities
for further research
• Isolate key variables & relationships for further examination
– Develop hypotheses
Exploratory Research: Methods

• Secondary data analyzed in a qualitative way
• Surveys of experts or surveys with open-ended questions
• Case Study
• Qualitative research (Interview, Focused Group Discussion, …)
Descriptive Research: Use

• To describe characteristics of relevant groups
• To estimate the percentage of units in a specified population exhibiting a certain behavior
• To determine the perceptions about something
• To determine degree to which variables are associated
• To make specific predictions
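The second use above, estimating what percentage of a population exhibits a behavior, is usually reported as a sample percentage with a margin of error. This is a minimal sketch under stated assumptions: the survey figures (120 of 400 households) are hypothetical, the sample is assumed to be a simple random sample, and the interval is the simple normal-approximation (Wald) interval rather than a method the course specifies.

```python
import math


def proportion_with_ci(count: int, n: int, z: float = 1.96) -> tuple:
    """Sample proportion with a normal-approximation (Wald) confidence interval.

    z = 1.96 gives an approximate 95% interval; the interval is clipped to [0, 1].
    """
    p_hat = count / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical survey: 120 of 400 sampled households shop online at least weekly
p_hat, low, high = proportion_with_ci(120, 400)
print(f"estimate = {p_hat:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```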
Descriptive Research: Methods

• Secondary data analysed in a quantitative way
• Surveys
• Observational data (from physical context or virtual context)
• Panels
– A sample of respondents who have agreed to provide information at
specified intervals over an extended period.
Longitudinal vs Cross-Sectional Design
• Cross-Sectional Design: a sample is surveyed at time T1.
• Longitudinal Design: a sample is surveyed at T1, and the same sample is also surveyed at T2.
Longitudinal vs Cross-Sectional Design
Evaluation criteria: Cross-Sectional Design / Longitudinal Design
• Detecting change: - / +
• Large amount of data collection: - / +
• Accuracy: - / +
• Representative sampling: + / -
• Response bias: + / -
Note: + indicates a relative advantage over the other procedure, - indicates a relative disadvantage.
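The "Detecting change" row can be illustrated with a small hypothetical two-wave panel: five made-up respondents choosing between brands A and B. The aggregate shares at each time point, which is all a cross-sectional survey reports, are identical. Following the same respondents, however, shows that two of them switched brands. The data and names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical two-wave panel: each respondent's preferred brand at (T1, T2)
panel = {
    "r1": ("A", "A"),
    "r2": ("A", "B"),
    "r3": ("B", "A"),
    "r4": ("B", "B"),
    "r5": ("A", "A"),
}


def shares(choices: list) -> dict:
    """Brand shares at a single point in time (what a cross-sectional study can report)."""
    counts = Counter(choices)
    total = len(choices)
    return {brand: counts[brand] / total for brand in sorted(counts)}


t1 = [before for before, _ in panel.values()]
t2 = [after for _, after in panel.values()]
# Only a longitudinal (same-sample) design lets us see who changed
switchers = [rid for rid, (before, after) in panel.items() if before != after]

print("Shares at T1:", shares(t1))
print("Shares at T2:", shares(t2))
print("Respondents who switched:", switchers)
```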
Causal Research: Uses

• To understand which variables are the cause (independent
variables) & which variables are the effect (dependent
variables) of a phenomenon
• To determine the nature of the relationship between the causal variables and the effect to be predicted
• METHOD: Experiments
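In a simple experiment, the basic comparison is between a treatment group, where the causal variable is manipulated, and a control group. This minimal sketch uses hypothetical weekly sales figures and assumes stores were randomly assigned. It computes the difference in group means and Welch's t statistic, which is a common choice but not something the slides prescribe. The causal reading comes from the experimental design (random assignment, control of extraneous variables), not from the arithmetic.

```python
import math
from statistics import mean, variance


def welch_t(treatment: list, control: list) -> tuple:
    """Difference in group means and Welch's t statistic (unequal variances)."""
    diff = mean(treatment) - mean(control)
    se = math.sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    return diff, diff / se


# Hypothetical experiment: weekly unit sales in stores randomly assigned to a new
# shelf display (treatment) or the existing display (control)
treatment = [23, 25, 28, 30, 27, 26]
control = [20, 22, 21, 24, 23, 22]

diff, t = welch_t(treatment, control)
print(f"difference in means = {diff:.2f} units, Welch t = {t:.2f}")
```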
Alternative Research Designs
(a) Secondary data analysis & focus groups → conclusive research (descriptive/causal)
(b) Conclusive research (descriptive/causal) alone
(c) Conclusive research (descriptive/causal) → secondary data analysis & focus groups
Term Project Guideline
Background of Study
Definition of MDP & RP
Developing RQ
Conducting Qualitative Research & Literature Review
Developing Hypothesis
Developing /Adopting Questionnaire
Data Collection
Testing Hypothesis
Drawing Managerial Implication & Conclusion
Limitation of Study
1/21/2020

Session-4
Why do we need qualitative research?
• It gives you an intimate understanding of people

– Helps understand people and their social & cultural
contexts
Qualitative Research Vs. Discussion forums

• Focus is on listening
• Attempted objectivity
• Non-competitive
• No stakes to prove
• Information is sought one-way
• Strangers/ peers enhance comfort in sharing information
So, what is qualitative research for?

“Centrally concerned with understanding things”
• Exploring, Explaining, Linking

…the evidence - associations, symbols, rituals, …
…with the interpretation - their meaning, value, …
…and
• Identifying
…the deep-rooted bonds/ strength - emotional pay-offs beyond the
rational, the relationships, …
…potential triggers of change, loyalty drivers…
• Develop
…hypotheses of likely future outcomes
And the key limitations…

• Is tentative and diagnostic, not evaluative
• Does not represent your population (or all your consumers)
• Artificial behavior, as respondents are invited
• Control issues
• Is highly researcher-dependent
Forms (Types) of Qualitative Study
Forms of Qualitative Research

• In-depth Interview & Expert Interview
• Focused Group Discussion & Online FGD
• Projective Technique
• Ethnographic Technique
• Netnographic Study
In-depth Interview
• One on one interviews
• Encourages an intimate dialogue
• Variations in interviews
– Depth Interviews – 45 minutes to 1 hour
– Intensive Depth Interviews – 2 to 3 hours
– Focused Interviews – 30 minutes (for advertising check)
• The interviewer's appearance must match that of the respondent
Focused Group Discussions

• Consists of 8-10 homogeneous participants
• Encourages discussion on a particular subject among
participants spontaneously
• Moderated by a researcher whose role is to guide
discussion
• Variations in group discussions
– Focus Group Discussions (FGD) – 1.5 to 2 hours
– Extended Group Discussions (EGD) – 3 hours
– Mini Group Discussions (MGD) – 4 to 6 respondents… for sensitive topics where a group format is still more comforting
– Conflict Discussions – contrasting behavior
Focus Groups Vs. Depth Interviews
Characteristic: Focus Groups / Depth Interviews
• Group synergy & dynamics: + / -
• Peer pressure/group influence: - / +
• Generation of innovative ideas: + / -
• In-depth probing of individuals: - / +
• Uncovering hidden motives: - / +
• Discussion of sensitive topics: - / +
Note: + indicates a relative advantage over the other procedure, - indicates a relative disadvantage.
Focus Groups Vs. Depth Interviews
Characteristic: Focus Groups / Depth Interviews
• Interviewing competitors: - / +
• Interviewing professional respondents: - / +
• Scheduling of respondents: - / +
• Amount of information: + / -
• Bias in moderation & interpretation: + / -
• Cost per respondent: + / -
• Time (interviewing & analysis): + / -
Note: + indicates a relative advantage over the other procedure, - indicates a relative disadvantage.
Choosing Appropriate Tool

• In-depth interview when
– Need depth on an individual’s practices and attitudes
– Need to understand practices and product interaction, mapping claimed vs. real behavior (e.g., Harpic usage, cleaning of exhaust fans)
– Sensitivity of the subject (e.g., body odor)
– Reality context (e.g., kind of houses, kind of bathrooms, kind of surroundings)
• Focus groups discussion when

– Need width of responses on practices, attitudes & beliefs
– Participant dynamics will spark new thoughts
– Exploring triggers & barriers
– Concept evaluation & development
– Contrasting user profiles
Defining Projective Techniques

• An unstructured, indirect form of questioning that
encourages respondents to project their underlying
motivations, beliefs, attitudes or feelings regarding issues of
concern.
• In projective techniques, respondents are asked to interpret the behaviour of others.
• In interpreting behaviour of others, respondents indirectly
project their own motivations, beliefs, attitudes, or feelings
into situation.
Projective Technique- Broad types
• Role play
• Guided fantasy…
• Drawing
• Thematic Apperception Test
• Third person…
• Sentences
• Conversations
• Bubbles…
• Grouping by preference
• Word
• Picture
• Brand personification…
Advantages of Projective Techniques

• … may elicit responses that subjects would be hesitant or unable to give if they knew the purpose of the study.
– But researchers must be aware of ethical issues (especially regarding refusal).
• Helpful when the issues to be addressed are personal, sensitive, or subject to strong social norms.
• Helpful when underlying motivations, beliefs, and attitudes are operating at a subconscious level.
Disadvantages of Projective Techniques

• Require highly-trained interviewers.
• Skilled interpreters are required to analyze responses.
• There is a serious risk of interpretation bias.
• They tend to be expensive.
• May require respondents to engage in unusual behaviour.
Guidelines for Using Projective Techniques

• … should be used when required information cannot be
accurately obtained by direct methods.
• … should be used to gain initial insights & understanding.
• Given their complexity, projective techniques should not be
used naively.
Ethnography
Nature of Observation
• Active
  – A researcher takes part in the process of the respondent performing their behavior.
  – More like a scene where the respondent demonstrates for you how they usually do it.
  – Periodic questioning or clarification is done on the spot.
• Passive
  – A researcher acts as an outsider while the respondent is performing their behavior.
  – Everything proceeds naturally and uninterrupted.
  – Questioning and clarification are done before or after the process.
• Examples: Shadowing; Accompanied shopping; Cooking observations; In-home visit; A day in the life; Mystery shopping
Ethnography: Importance
• One of the best ways to gain deeper customer insight.
• …to get to know customers, their culture, & the role certain products play in their lives.
• …shows consumer reality rather than consumer
reconstruction.
• …helps identify contradictions between what people say
they do & what they actually do.
• … enables us to identify their hidden needs- and this is
where real breakthroughs can occur.
Netnography
• Ethnography: Study of a community
• Netnography: Study of an online community
• Data Sources:
– Archival Netnographic Data
– Social Network Analysis
– Elicited Netnographic Data
Netnography: Good & Bad
• Advantages: Large sample possible quickly; Immediate analysis; Considerably cheaper; Sensitive topics accessible; Historic archives often available
• Disadvantages: Identity validation; Loss of non-verbal cues; Loss of intangibles; Reliability & integrity of information; Information overload
Reference
• Qualitative Research Discussion Guide: Textbook (pages 167-171)
• FGD Discussion Guide: Textbook (pages 140-142)
Issue
• In recent times, education loans from banks have grown, and SBI is one of the major players in this market.
• In 2010, the education loan market among students of X, a premier business school in India, was studied for SBI. Among the students of X Business School, SBI's market share of education loans was found to be 87%.
• However, in 2013 the market share dipped to 82%.
• Another study, conducted in 2016, found that SBI's market share of education loans among students of that business school had further slipped to 76%.
• It was also observed that the market share of CBI, another PSU bank, was constantly increasing: its share of education loans rose from 7% in 2010 to 18% in 2016.
• SBI is now worried about losing market share among students and has hired you to conduct research.
(Above Information is just an illustration)
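The illustrative shares in this case can be summarised as changes from the first to the last year, in both percentage points and relative terms. The sketch below is only a convenience; the function name and layout are hypothetical. SBI's loss (-11 points) matches CBI's gain (+11 points). However, the arithmetic alone cannot show that students moved from SBI to CBI, which is exactly what the research has to establish.

```python
# Market shares (%) of education loans at school X, from the illustration above
shares = {
    "SBI": {2010: 87, 2013: 82, 2016: 76},
    "CBI": {2010: 7, 2016: 18},
}


def share_change(series: dict) -> tuple:
    """Change from the first to the last year: (percentage points, relative %)."""
    years = sorted(series)
    start, end = series[years[0]], series[years[-1]]
    points = end - start
    return points, points / start * 100


for bank, series in shares.items():
    points, relative = share_change(series)
    print(f"{bank}: {points:+d} percentage points ({relative:+.1f}% relative)")
```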
Issue
• Design preference & perception for mobile phones
1/21/2020

Session-5
A Look at Research Data
Research Data
• Secondary Data
• Primary Data
  – Qualitative Data
  – Quantitative Data
    – Descriptive: Survey Data; Observational & Other Data
    – Causal: Experimental Data
Measurement and Scale
• If things exist to some extent, they ought to be measured.
• Measurement is the assignment of numbers to objects, events, or people according to some rules.
• To assign numbers, we need a scale.
Scale
• A scale is a system of classifying objects & persons

– in a series of steps or degrees
– according to a standard (i.e., relative size, rank, amount,
etc.).
• Measurements can be along four different scales:

– Nominal
– Ordinal
– Interval
– Ratio
Nominal Scale
• One uses names or labels according to certain characteristics.
– Variables assessed on a nominal scale are called categories.
• Charles Darwin used such categorical scales for species.
– e.g., Telephone numbers; Girls vs. Boys in this session.
• Basic operation: = or ≠
Ordinal Scale
• Ordinal measurements tell the rank order of items (but not how far apart they are)
– e.g., Class ranks; Hardness of minerals
• The scale may also use names with an order, such as:
– “Below average”, “Average”, and “Above average”; or
– “Very unsatisfied”, “Neutral”, and “Very satisfied.”
• Basic operation: < vs. >
Interval Scale
• In interval scales, the steps are considered to be equal.
• Equality of successive steps
– The difference between 2 & 1 is the same as the difference between 7 & 6.
• We can use numbers, steps, phrases, or distances to represent the successive intervals.
– e.g., Ratings; Fahrenheit/Celsius temperature
• Basic operation: + vs. - (differences are meaningful, but the zero point is arbitrary)
Ratio Scale
• Most measurement in the physical sciences & engineering is done on ratio scales.
• The distinguishing feature of a ratio scale is a natural (true) zero value.
– e.g., Mass, length, time, plane angle, electric charge, & GDP
• Basic operation: ratios are meaningful (e.g., 10 kg is twice 5 kg)
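The difference between interval and ratio scales can be checked numerically. The sketch below uses illustrative temperatures and durations, not course data: the ratio of two Celsius temperatures changes once they are converted to Fahrenheit, whereas the ratio of two durations stays the same whether they are measured in seconds or minutes.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Celsius and Fahrenheit are interval scales: the zero point is arbitrary."""
    return celsius * 9 / 5 + 32


# Interval scale: "20 degrees is twice as hot as 10 degrees" depends on the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratios of differences survive the change of unit ("equal intervals").
diff_ratio_celsius = (30 - 20) / (20 - 10)
diff_ratio_fahrenheit = (
    (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20))
    / (celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10))
)

# Ratio scale: time has a natural zero, so the ratio survives a change of unit.
ratio_seconds = 120 / 60
ratio_minutes = (120 / 60) / (60 / 60)

print(f"Celsius ratio: {ratio_celsius:.2f}, Fahrenheit ratio: {ratio_fahrenheit:.2f}")
print(f"Ratios of differences: {diff_ratio_celsius:.2f} vs {diff_ratio_fahrenheit:.2f}")
print(f"Time ratios: {ratio_seconds:.2f} vs {ratio_minutes:.2f}")
```

Only the time ratio is stable under a change of unit, which is why "twice as much" statements require a ratio scale.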
Scales of Measurement: Illustration
Scale | What is measured | Values
Nominal | Numbers assigned to runners | 4, 81, 9
Ordinal | Rank order of winners | Third place, Second place, First place
Interval | Performance rating on a 0 to 10 scale | 8.2, 9.1, 9.6
Ratio | Time to finish, in seconds | 15.2, 14.1, 13.4
Comparison of Scales: Characteristics
Scale | Label | Order | Distance | Origin
Nominal | Yes | No | No | No
Ordinal | Yes | Yes | No | No
Interval | Yes | Yes | Yes | No
Ratio | Yes | Yes | Yes | Yes
Interval Scale: Variants
• A Likert scale requires respondents to indicate a degree of agreement or disagreement with each statement about the stimulus object.
Statement | Strongly disagree | Disagree | Neither agree nor disagree | Agree | Strongly agree
Pantaloon sells high-quality merchandise. | 1 | 2 | 3 | 4 X | 5
• A semantic differential scale is a rating scale with end points associated with bipolar labels that have semantic meaning.
Pantaloon is:
Modern :--:--X:--:--:--:--: Old-fashioned
A Few Exercises on Scales
• Gender: 1. Male 2. Female
Nominal Scale
• I consider myself to be loyal to the Nike brand.
(Please rate the statement)
Strongly Disagree Strongly Agree
1 2 3 4 5
Interval Scale
• How many hours do you use the Internet? ...........Hrs
Ratio Scale
• Education: 1. Less than 10th Std 2. 10th Std 3. 12th Std 4. Graduate 5. Postgraduate
Nominal Scale or Ordinal Scale
• Why do you use the current brand? Please rank the following reasons in order of preference.
a. This is exactly the product I have always wanted to use.
b. This is the best available brand.
c. It is a force of habit.
d. There is really no choice.
Ordinal Scale
• What is your age? .....Years
Ratio Scale
• This supplier keeps promises it makes to our firm.
(Please rate the statement)
Strongly Disagree Strongly Agree
1 2 3 4 5
Interval Scale
• Please rank the following footwear brands in order of your preference.
A. Bata
B. Adidas
C. Nike
D. Reebok
E. Liberty
Ordinal Scale
Questionnaire
• A questionnaire is a formalized set of questions for obtaining information from respondents.
• Question types: Unstructured questions & Structured questions
• Determining the order of questions (opening questions, type of information & difficult questions)
• Pretesting of the questionnaire: testing the questionnaire on a small sample of respondents to identify & eliminate potential problems.
Individual Question Content: Are Several Questions Needed Instead of One?
“Do you think Coca-Cola is a tasty and refreshing soft drink?”
(Incorrect)
• Such a question is called a double-barreled question, because two or more questions are combined into one.
• To obtain the required information, two distinct questions should be asked:
“Do you think Coca-Cola is a tasty soft drink?” and
“Do you think Coca-Cola is a refreshing soft drink?”
(Correct)
• Sometimes, several questions are needed to obtain the
required information in an unambiguous manner.
Overcoming Unwillingness to Answer
• Please list all the departments from which you purchased merchandise on your most recent shopping trip to a department store.
(Incorrect)
• In the list that follows, please check all the departments from which you purchased merchandise on your most recent shopping trip to a department store.
1. Women's dresses ____
2. Men's apparel ____
3. Children's apparel ____
4. Cosmetics ____
.
.
.
16. Jewelry ____
17. Other (please specify) ____
(Correct)
Choosing Question Wording – Use Ordinary Words
• “Do you think the distribution of Thums Up is adequate?”
(Incorrect)
• “Do you think Thums Up is readily available (within 1 kilometer) when you want to buy it?”
(Correct)
Define the Issue
• Which brand of shampoo do you use?
(Incorrect)
• Define the issue in terms of who, what, when, where, why, and way
(the six Ws). Who, what, when, and where are particularly important.
• Which brand or brands of shampoo have you personally used at home during the last month? In case of more than one brand, please list all the brands that apply.
1. Clinic Plus ____
2. Head & Shoulders ____
3. Pantene ____
.
13. Dove ____
14. Other (please specify) ____
(Correct)
Use Unambiguous Words
• In a typical month, how often do you shop in department
stores?
_____ Never
_____ Occasionally
_____ Sometimes
_____ Often
_____ Regularly
(Incorrect)
• In a typical month, how often do you shop in department
stores?
_____ Less than once
_____ 1 or 2 times
_____ 3 or 4 times
_____ More than 4 times
(Correct)
Avoid Leading or Biasing Questions
• Do you think that patriotic Indians should buy imported automobiles when that would put Indian labor out of work?
_____ Yes
_____ No
_____ Don't know
(Incorrect)
• Do you think that Indians should buy imported automobiles?
_____ Yes
_____ No
_____ Don't know
(Correct)
• A leading question is one that clues the respondent to what
the answer should be.
Avoid Implicit Assumptions
• Are you in favor of a balanced budget?
(Incorrect)
• Questions should not be worded so that the answer is
dependent upon implicit assumptions about what will
happen as a consequence.
• Are you in favor of a balanced budget (government spending, e.g., on social security, matched by tax collection) if it would result in an increase in the personal income tax?
(Correct)
Avoid Generalizations and Estimates
• “What is the annual per capita expenditure on groceries in
your household?”
(Incorrect)
• “What is the monthly (or weekly) expenditure on groceries in your household?”
and
• “How many members are there in your household?”
(Correct)
– The researcher can then calculate the annual per capita figure from these two answers.
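Once the two simpler questions are answered, the researcher, not the respondent, does the arithmetic. A minimal sketch, assuming a hypothetical respondent who reports 6,000 per month (or 1,000 per week) for a 4-member household; the 52-weeks-per-year conversion is our own simplification.

```python
PERIODS_PER_YEAR = {"monthly": 12, "weekly": 52}


def annual_per_capita_expenditure(amount: float, household_size: int,
                                  period: str = "monthly") -> float:
    """Convert a monthly or weekly household figure into an annual per-capita figure."""
    if household_size <= 0:
        raise ValueError("household size must be a positive number of members")
    return amount * PERIODS_PER_YEAR[period] / household_size


print(annual_per_capita_expenditure(6000, 4))            # monthly answer
print(annual_per_capita_expenditure(1000, 4, "weekly"))  # weekly answer
```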
Sampling
• Census Vs. Sampling
– Sampling is the selection of a subset (a statistical sample) of
individuals from within a statistical population to estimate
characteristics of the whole population
– Why is proper sampling important?
• Target Population
• Sampling Frame
– A representation of the elements of the target population. It consists
of a list or set of directions for identifying the target population
• Sampling technique & Sample Size
31-Jan-20

Session-6
Classification of Sampling Techniques
Sampling Techniques
– Nonprobability Sampling: Convenience, Judgmental, Quota, Snowball
– Probability Sampling: Simple Random, Systematic, Stratified, Cluster
Convenience Sampling
• Convenience sampling attempts to obtain a sample of
convenient elements. Often, respondents are selected
because they happen to be in the right place at the right
time.
– use of students, and members of social organizations
– mall intercept interviews without qualifying the
respondents
– “people on the street” interviews
Judgmental Sampling
• Judgmental sampling is a form of convenience sampling in
which the population elements are selected based on the
judgment of the researcher.
– test markets
– purchase engineers selected in industrial marketing
research
– expert witnesses used in court
Quota Sampling
• Quota sampling may be viewed as two-stage restricted
judgmental sampling.
– The first stage consists of developing control categories, or quotas,
of population elements.
– In the second stage, sample elements are selected based on
convenience or judgment.
Control characteristic | Population composition (%) | Sample composition (%) | Sample number
Sex: Male | 48 | 48 | 480
Sex: Female | 52 | 52 | 520
Total | 100 | 100 | 1000
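Setting the quotas is simple arithmetic: each category's share of the sample equals its share of the population. The sketch below reproduces the 480/520 split. The largest-remainder rounding step is our own choice for cases where the shares do not divide evenly; it is not part of the slide.

```python
import math


def quota_allocation(population_pct: dict, sample_size: int) -> dict:
    """Allocate a sample across control categories in proportion to the population."""
    total = sum(population_pct.values())  # also rescales if shares do not sum to 100
    exact = {cat: sample_size * pct / total for cat, pct in population_pct.items()}
    quotas = {cat: math.floor(x) for cat, x in exact.items()}
    # Hand any leftover units to the categories with the largest fractional parts.
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda cat: exact[cat] - quotas[cat], reverse=True)
    for cat in by_remainder[:leftover]:
        quotas[cat] += 1
    return quotas


print(quota_allocation({"Male": 48, "Female": 52}, 1000))
```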
Snowball Sampling
• In snowball sampling, an initial group of respondents is
selected, usually at random.
• After being interviewed, these respondents are asked to
identify others who belong to the target population of
interest.
• Subsequent respondents are selected based on the
referrals.
Simple Random Sampling

• Each element in the population has a known & equal
probability of selection.
• Each possible sample of a given size (n) has a known and
equal probability of being the sample actually selected.
• This implies that every element is selected independently
of every other element.
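In code, simple random sampling amounts to drawing n elements from the sampling frame without replacement, with every element equally likely to be chosen. A minimal sketch using Python's standard `random` module; the frame of 500 customer IDs is hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(frame), n)  # sampling without replacement


frame = [f"CUST-{i:04d}" for i in range(1, 501)]  # hypothetical customer IDs
sample = simple_random_sample(frame, 25, seed=7)
print(sample[:5])
```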
Systematic Sampling
• The sample is chosen by selecting a random starting point
and then picking every ith element in succession from the
sampling frame.
– For example, there are 100,000 elements in the population and a
sample of 1,000 is desired. In this case the sampling interval, i, is
100. A random number between 1 and 100 is selected. If, for
example, this number is 23, the sample consists of elements 23, 123,
223, 323, 423, 523, and so on.
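The worked example translates directly into code. A minimal sketch, assuming the frame is a list in sampling-frame order and positions are counted from 1 as on the slide; passing `start=23` reproduces the illustration, while omitting it draws a random start between 1 and i.

```python
import random


def systematic_sample(frame, n, start=None, seed=None):
    """Pick every i-th element after a random start, where i = N // n."""
    population_size = len(frame)
    interval = population_size // n  # sampling interval i
    if start is None:
        start = random.Random(seed).randint(1, interval)  # random start in 1..i
    positions = range(start, population_size + 1, interval)  # 1-based positions
    return [frame[pos - 1] for pos in positions][:n]


frame = list(range(1, 100_001))  # element IDs 1..100,000
sample = systematic_sample(frame, 1000, start=23)
print(sample[:6])  # [23, 123, 223, 323, 423, 523]
```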
Stratified Sampling
• A two-step process in which the population is partitioned
into subpopulations, or strata.
– The strata should be mutually exclusive & collectively
exhaustive in that every population element should be assigned to
one and only one stratum and no population elements should be
omitted.
– Next, elements are selected from each stratum by a random
procedure, usually SRS.
• A major objective of stratified sampling is to increase
precision without increasing cost.
• The elements within a stratum should be as homogeneous
as possible, but the elements in different strata should be as
heterogeneous as possible.
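A minimal sketch of proportionate stratified sampling: the frame is split into mutually exclusive strata, and an SRS is drawn within each stratum in proportion to its size. The urban/rural split of 1,000 customers is hypothetical. The simple rounding of stratum sizes may need manual adjustment when the shares do not divide evenly.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n, seed=None):
    """Proportionate stratified sample with an SRS inside every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit goes to exactly one stratum
    total = len(units)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / total)  # proportionate allocation
        sample.extend(rng.sample(members, n_h))
    return sample


# Hypothetical frame: 300 urban and 700 rural customers.
customers = [("urban", i) for i in range(300)] + [("rural", i) for i in range(700)]
sample = stratified_sample(customers, stratum_of=lambda c: c[0], n=100, seed=1)
print(sum(1 for c in sample if c[0] == "urban"), sum(1 for c in sample if c[0] == "rural"))
```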
Cluster Sampling
• The target population is first divided into mutually exclusive
and collectively exhaustive subpopulations, or clusters.
• Then a random sample of clusters is selected, based on a
probability sampling technique such as SRS.
• For each selected cluster, either all the elements are
included in the sample (one-stage) or a sample of elements
is drawn probabilistically (two-stage).
• Elements within a cluster should be as heterogeneous as
possible, but clusters themselves should be as
homogeneous as possible. Ideally, each cluster should be a
small-scale representation of the population.
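A minimal sketch of cluster sampling, assuming the clusters are given as a mapping from cluster label to its elements (here, 10 hypothetical housing blocks of 20 households each). With `per_cluster=None`, every element of a chosen cluster is kept (one-stage). Otherwise, an SRS of that many elements is drawn within each chosen cluster (two-stage).

```python
import random


def cluster_sample(clusters, k, per_cluster=None, seed=None):
    """Select k clusters by SRS, then keep all (one-stage) or some (two-stage) elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)  # SRS of cluster labels
    result = {}
    for label in chosen:
        members = list(clusters[label])
        if per_cluster is not None:
            members = rng.sample(members, per_cluster)  # second-stage SRS
        result[label] = members
    return result


blocks = {f"Block-{b}": [f"HH-{b}-{h}" for h in range(20)] for b in range(10)}
one_stage = cluster_sample(blocks, k=3, seed=42)
two_stage = cluster_sample(blocks, k=3, per_cluster=5, seed=42)
print({label: len(hh) for label, hh in two_stage.items()})
```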
Technique | Strengths | Weaknesses
Nonprobability Sampling
Convenience sampling | Least expensive, least time-consuming, most convenient | Selection bias, sample not representative, not recommended for descriptive or causal research
Judgmental sampling | Low cost, convenient, not time-consuming | Does not allow generalization, subjective
Quota sampling | Sample can be controlled for certain characteristics | Selection bias, no assurance of representativeness
Snowball sampling | Can estimate rare characteristics | Time-consuming
Probability Sampling
Simple random sampling | Easily understood, results projectable | Difficult to construct sampling frame, expensive, lower precision, no assurance of representativeness
Systematic sampling | Can increase representativeness, easier to implement than SRS, sampling frame not necessary | Can decrease representativeness
Stratified sampling | Includes all important subpopulations, precision | Difficult to select relevant stratification variables, not feasible to stratify on many variables, expensive
Cluster sampling | Easy to implement, cost effective | Imprecise, difficult to compute and interpret results
Sampling Plan: Qualitative Study
Sampling for Qualitative Research

• Type of sampling: always purposive (non-probability: judgmental or convenience).
• Qualitative research needs to represent the ‘spectrum’ of all possible points of view on the given topic.
• Guideline on Sample Size
A Few Recruitment Do’s & Don’ts
• Don’t make the recruitment criteria so stringent that you end up with a very small niche compared to your universe.
• Mask the category you want to research.
– Otherwise the participant rehearses & comes in acting as a consultant.
• Attempt to get homogeneous groups unless the design includes a conflict group.
– Mixed-gender groups don’t work well, because of heightened socially desirable behaviour or impression-management efforts.
• Don’t put only 2 peer groups into a single group; this can result in group conflict.
– Never include more than 2 friends even in a peer group …unless the entire group is one group of friends …but then you need many such groups.
• For creative/developmental interactions, decide in advance whether you want creative/better-than-average consumers.
– Check for creativity.
Review of What We Have Covered So Far…
• Introduction: Research
• Problem Definition
• Approach to Problem
• Research Design
• Qualitative Research
• Measurement Scale
• Questionnaire Designing
• Sampling
Hypothesis Testing: Introduction
Null vs. Alternate Hypothesis

• Null Hypothesis (H0): a statement about a population that is assumed to be true.
– … is generally assumed to be true until evidence indicates otherwise.
– … is a statement of the status quo, one of no difference or no effect.
– If the null hypothesis is not rejected, no changes will be made.
• Alternate Hypothesis (H1): a statement that directly contradicts the null hypothesis by stating the contrary about the population.
– … is one in which some difference or effect is expected.
– … is a statement that is hoped or expected to be true instead of the null hypothesis.
– Accepting the alternate hypothesis will lead to changes in opinions or actions.
Hypothesis Formulation: Illustration-1

• A box is designed to hold 25 kg of apples. Farmers fill such boxes in the field & the boxes are then sold to retailers. Retailers complain that many boxes do not contain 25 kg of apples. Research is to be conducted to investigate the issue.
– Develop the null & alternate hypotheses for this issue:
• H0: μ ≥ 25
• H1: μ < 25
– Take action if H0 is rejected (H1 is accepted).
• One-tail study (lower tail).

• Bags of air passengers: the mean weight should be 20 kg. The airline wants to investigate whether the weight of bags is more than 20 kg.
• H0: μ ≤ 20
• H1: μ > 20
• One-tail study (upper tail).
• The mean mileage of a motorbike is expected to be 50 kmpl. The company wants to investigate whether the mileage is 50 kmpl or not.
• H0: μ = 50
• H1: μ ≠ 50
• Two-tail study
• RQ
– Do the customers of Big Bazar exhibit store loyalty, and what are their characteristics?
• Hypothesis (alternate)
customers.
• Hypothesis (Null vs. alternate)
– H0: Customers who are store loyal are at least as knowledgeable about the shopping environment as other customers.
– H0: Store-loyal customers are no more risk-averse than non-loyal customers.
– H0: Customers of Big Bazar are not loyal.

Types of Hypotheses
Null
– H0: μ = 50
– H0: μ ≤ 50
– H0: μ ≥ 50
Alternate
– HA: μ ≠ 50
– HA: μ > 50
– HA: μ < 50
One-tail Study
Two-tail Study
Hypothesis Testing: Introduction

• Hypothesis testing is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample.
– In other words, it is a systematic way to test claims or ideas about a group or population.
• The level of significance refers to a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis.
– The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true.
Hypothesis Testing: Calculation

• A test statistic is a mathematical formula that allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true. The value of the test statistic is used to make a decision regarding the null hypothesis.
• A p value is the probability of obtaining a sample outcome at least as extreme as the one observed, given that the value stated in the null hypothesis is true.
– The p value for a sample outcome is compared to the level of significance: if p is smaller than the level of significance, the null hypothesis is rejected.
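For a test statistic that follows the standard normal distribution (a Z test), the p value can be read from the normal CDF. A minimal sketch with Python's `statistics.NormalDist`; the z values and α = 0.05 are illustrative, and t or chi-square statistics would need their own distributions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def p_value_from_z(z: float, tail: str = "two") -> float:
    """p value for a z statistic: 'two', 'upper' (H1: >), or 'lower' (H1: <) tail."""
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    if tail == "upper":
        return 1 - STANDARD_NORMAL.cdf(z)
    if tail == "lower":
        return STANDARD_NORMAL.cdf(z)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p value with the level of significance."""
    return "reject H0" if p_value < alpha else "do not reject H0"


p = p_value_from_z(-2.0, tail="lower")  # e.g., a lower-tail test such as H1: mu < 25
print(round(p, 4), decide(p))
```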
Decision about Hypothesis

• Reject the null hypothesis
– The sample mean is associated with a low probability of
occurrence when the null hypothesis is true.
• Do not reject (retain) the null hypothesis
– The sample mean is associated with a high probability of occurrence when the null hypothesis is true.
• Note:
– A null hypothesis may be rejected, but it can never be accepted
based on a single test.
– In classical hypothesis testing, there is no way to determine whether
the null hypothesis is true.
Choose a Level of Significance

• Type I Error: false positives
– A Type I error occurs when the sample results lead to the rejection of the null hypothesis when it is in fact true.
– The probability of a Type I error (α) is also called the level of significance.
• Type II Error: false negatives
– A Type II error occurs when, based on the sample results, the null hypothesis is not rejected when it is in fact false.
– The probability of a Type II error is denoted by β.
A Broad Classification of Hypothesis Tests
Hypothesis Tests
– Tests of Association
– Tests of Differences: Distributions, Means, Proportions, Median/Rankings
Frequency Distribution
• In a frequency distribution, one variable is considered at a
time.
– A frequency distribution for a variable produces a table of
frequency counts, percentages, & cumulative percentages for all
values associated with that variable.
• Statistics Associated with Frequency Distribution

– Measures of Location
– Measures of Variability
– Measures of Shape
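As an illustration, the sketch below builds the frequency table (counts, percentages, cumulative percentages) for the Familiarity ratings in the Internet Usage Data used in Session-7. Respondent 10 has no familiarity rating, so it is stored as `None` and excluded, which leaves 29 valid cases.

```python
from collections import Counter

# Familiarity ratings (1-7) for respondents 1-30; respondent 10 is missing.
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]


def frequency_table(values):
    """Return (value, count, percent, cumulative percent) rows for valid values."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    n = len(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows


table = frequency_table(familiarity)
for value, count, percent, cumulative in table:
    print(f"{value}: {count:2d} {percent:5.1f}% {cumulative:6.1f}%")
```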
Measures of Location
• Mean
– Most commonly used measure of central tendency.
– Used when data are on an interval or ratio scale.
• Median
– Middle value when data are arranged in ascending or descending order. It is the 50th percentile.
– Used when data are on an ordinal scale, & also on interval or ratio scales.
• Mode
– The value that occurs most frequently & represents the highest
peak of the distribution.
– Mode is a good measure of location when the variable is inherently
categorical or has otherwise been grouped into categories.
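Applied to the Familiarity ratings from the Internet Usage Data (29 valid cases), Python's standard `statistics` module gives the three measures of location. The mean agrees with the 4.72 reported in the One-Sample Statistics output in Session-7.

```python
import statistics

# Familiarity ratings for respondents 1-30; respondent 10 is missing (None).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
valid = [v for v in familiarity if v is not None]

mean = statistics.mean(valid)      # needs interval or ratio data
median = statistics.median(valid)  # 50th percentile; also valid for ordinal data
mode = statistics.mode(valid)      # most frequent value; fine for categories

print(f"Mean = {mean:.2f}, Median = {median}, Mode = {mode}")
```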
Measures of Variability
• Variability is a measure of the dispersion or spread of scores in a distribution.
– Variability ranges from 0 to ∞.
• Range
• Interquartile Range
• Variance
– Mean squared deviation from the mean. The variance can never be
negative.
• Standard Deviation
– Square root of the variance.
• Coefficient of variation
– The ratio of the SD to the mean, expressed as a percentage; a unitless measure of relative variability.
– Can be used with ratio-scaled data only.
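A sketch of these measures with Python's `statistics` module. Range, IQR, variance, and SD use the Familiarity ratings from the Internet Usage Data. The coefficient of variation needs a ratio scale, so it is computed on Internet usage in hours per week instead. Quartile conventions differ between packages; `statistics.quantiles` uses its default "exclusive" method here, so the IQR may not match other software exactly.

```python
import statistics

familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
usage_hours = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
               3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

fam = [v for v in familiarity if v is not None]  # respondent 10 is missing

value_range = max(fam) - min(fam)
q1, _, q3 = statistics.quantiles(fam, n=4)  # default 'exclusive' method
iqr = q3 - q1
variance = statistics.variance(fam)         # sample variance (n - 1 denominator)
sd = statistics.stdev(fam)                  # square root of the variance

# Coefficient of variation needs a ratio scale: use hours of Internet usage.
cv = 100 * statistics.stdev(usage_hours) / statistics.mean(usage_hours)

print(f"Range = {value_range}, IQR = {iqr}, Variance = {variance:.3f}, SD = {sd:.3f}")
print(f"CV of Internet usage = {cv:.1f}%")
```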
Measures of Shape: Skewness
• Skewness: A skewed distribution is a distribution of scores that includes outliers, or scores that fall substantially above or below most other scores in a data set.
– Skewness is the tendency of deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other.
(Figure: in a symmetric distribution the mean, median, & mode coincide; in a skewed distribution they separate.)
Measures of Shape: Skewness of distribution
• A positively skewed distribution is a distribution of scores where a few outliers are substantially larger (toward the right tail in a graph) than most other scores.
• A negatively skewed distribution is a distribution of scores where a few outliers are substantially smaller (toward the left tail in a graph) than most other scores.
Measures of Shape: Kurtosis

• Kurtosis
– A measure of the relative peakedness or flatness of the curve defined by the frequency distribution.
• The (excess) kurtosis of a normal distribution is zero.
• If kurtosis is positive, the distribution is more peaked than a normal distribution.
• A negative value means that the distribution is flatter than a normal distribution.
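Skewness and kurtosis can be computed from the second, third, and fourth moments about the mean. The sketch below uses the bias-adjusted sample formulas (G1 for skewness, G2 for excess kurtosis) that packages such as SPSS and Excel report. We assume these match the software used in class. For the Familiarity ratings, the result is a slightly negative skew and a negative kurtosis, i.e., a distribution somewhat flatter than normal.

```python
import math

familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]


def skewness_kurtosis(values):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5     # moment coefficient of skewness
    g2 = m4 / m2 ** 2 - 3   # excess kurtosis: 0 for a normal distribution
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skewness, kurtosis


valid = [v for v in familiarity if v is not None]  # respondent 10 is missing
skewness, kurtosis = skewness_kurtosis(valid)
print(f"Skewness = {skewness:.3f}, Kurtosis = {kurtosis:.3f}")
```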
2/8/2020

Session-7
Cross-Tabulation
Cross-Tabulation
• While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously.
• The general rule is to compute percentages in the direction of the independent variable, across the dependent variable.
• The first table is more acceptable than the second.
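The sketch below applies this rule to the Internet Usage Data, with sex as the independent variable. The slides do not say how Internet usage is grouped, so we assume, for illustration only, a split into light users (5 hours per week or less) and heavy users (6 hours or more). Sex is coded 1 = male and 2 = female, which matches the group sizes in the Group Statistics output in Session-7. Percentages are computed within each sex, i.e., in the direction of the independent variable.

```python
from collections import Counter

sex = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
usage_hours = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
               3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

SEX_LABELS = {1: "Male", 2: "Female"}
LIGHT_MAX_HOURS = 5  # assumed cut-off: <= 5 hrs/week is "Light", otherwise "Heavy"


def usage_group(hours):
    return "Light" if hours <= LIGHT_MAX_HOURS else "Heavy"


counts = Counter((SEX_LABELS[s], usage_group(h)) for s, h in zip(sex, usage_hours))

crosstab, percentages = {}, {}
for sex_label in ("Male", "Female"):  # independent variable
    row = {group: counts[(sex_label, group)] for group in ("Light", "Heavy")}
    total = sum(row.values())
    crosstab[sex_label] = row
    percentages[sex_label] = {g: round(100 * c / total, 1) for g, c in row.items()}

for sex_label in crosstab:
    print(sex_label, crosstab[sex_label], percentages[sex_label])
```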
Statistics Associated with Cross-Tabulation

• Chi-square test for independence: a statistical procedure to determine whether the frequencies observed at the combinations of levels of two categorical variables are similar to the frequencies expected if the two variables were independent.
– The expected frequency of a cell is (row total x column total) / n.
– To determine whether a systematic association exists, the probability of obtaining a value of chi-square as large as or larger than the one calculated from the cross-tabulation is estimated.
– The null hypothesis (H0) of NO association between the two variables is rejected only when the calculated value of the test statistic is greater than the critical value of the chi-square distribution with the appropriate degrees of freedom.
– An important characteristic of the chi-square statistic is the number of degrees of freedom (df) associated with it: df = (r - 1) x (c - 1), where r and c are the numbers of rows and columns.
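A minimal sketch of the chi-square test of independence for any r x c table of observed counts. The example is a 2 x 2 table of sex by Internet usage from the Internet Usage Data. It assumes usage is split into light (5 hrs/week or less) and heavy (6 or more) users; this cut-off is our illustration, not part of the slides. The value 3.841 is the tabled critical value of chi-square for df = 1 at α = 0.05.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


observed = [[5, 10],   # Male:   light, heavy
            [10, 5]]   # Female: light, heavy
chi2, df = chi_square_independence(observed)
CRITICAL_VALUE = 3.841  # chi-square table, df = 1, alpha = 0.05
decision = "reject H0" if chi2 > CRITICAL_VALUE else "do not reject H0"
print(f"chi-square = {chi2:.3f}, df = {df}: {decision}")
```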
Strength of Association in Cross-Tabulation

• The phi coefficient (φ) is used as a measure of the strength of association in the special case of a table with two rows & two columns (a 2 x 2 table).
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
• While the phi coefficient is specific to a 2 x 2 table, the contingency coefficient (C) can be used to assess the strength of association in a table of any size. It can be applied to square tables.
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
• The contingency coefficient varies between 0 & 1.
• The maximum value of the contingency coefficient depends on the size of the table (number of rows & number of columns). For this reason, it should be used only to compare tables of the same size.

• Cramer's V is a modified version of the phi correlation coefficient & is used in tables larger than 2 x 2. It can be used for rectangular tables.
$$V = \sqrt{\frac{\phi^2}{\min(r-1,\, c-1)}} = \sqrt{\frac{\chi^2/n}{\min(r-1,\, c-1)}}$$
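The three strength-of-association measures follow directly from χ², the sample size n, and the table dimensions. A minimal sketch; the inputs χ² = 3.333 with n = 30 in a 2 x 2 table are illustrative. They correspond to a sex by light/heavy Internet-usage split of the Internet Usage Data, where the split itself is our own assumption.

```python
import math


def association_measures(chi2, n, rows, cols):
    """Phi, contingency coefficient C, and Cramer's V from a chi-square statistic."""
    phi = math.sqrt(chi2 / n)                   # intended for 2 x 2 tables
    contingency = math.sqrt(chi2 / (chi2 + n))  # any table size; maximum stays below 1
    cramers_v = math.sqrt((chi2 / n) / min(rows - 1, cols - 1))
    return phi, contingency, cramers_v


phi, c, v = association_measures(10 / 3, 30, 2, 2)
print(f"phi = {phi:.3f}, C = {c:.3f}, V = {v:.3f}")
```

For a 2 x 2 table, min(r - 1, c - 1) = 1, so Cramer's V equals phi.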
Exercise
Internet Usage Data

Respondent Number | Sex | Familiarity | Internet Usage (Hrs/Week) | Attitude Toward Internet | Attitude Toward Technology | Usage of Internet: Shopping | Usage of Internet: Banking
1 1.00 7.00 14.00 7.00 6.00 1.00 1.00
2 2.00 2.00 2.00 3.00 3.00 2.00 2.00
3 2.00 3.00 3.00 4.00 3.00 1.00 2.00
4 2.00 3.00 3.00 7.00 5.00 1.00 2.00
5 1.00 7.00 13.00 7.00 7.00 1.00 1.00
6 2.00 4.00 6.00 5.00 4.00 1.00 2.00
7 2.00 2.00 2.00 4.00 5.00 2.00 2.00
8 2.00 3.00 6.00 5.00 4.00 2.00 2.00
9 2.00 3.00 6.00 6.00 4.00 1.00 2.00
10 1.00 (missing) 15.00 7.00 6.00 1.00 2.00
11 2.00 4.00 3.00 4.00 3.00 2.00 2.00
12 2.00 5.00 4.00 6.00 4.00 2.00 2.00
13 1.00 6.00 9.00 6.00 5.00 2.00 1.00
14 1.00 6.00 8.00 3.00 2.00 2.00 2.00
15 1.00 6.00 5.00 5.00 4.00 1.00 2.00
16 2.00 4.00 3.00 4.00 3.00 2.00 2.00
17 1.00 6.00 9.00 5.00 3.00 1.00 1.00
18 1.00 4.00 4.00 5.00 4.00 1.00 2.00
19 1.00 7.00 14.00 6.00 6.00 1.00 1.00
20 2.00 6.00 6.00 6.00 4.00 2.00 2.00
21 1.00 6.00 9.00 4.00 2.00 2.00 2.00
22 1.00 5.00 5.00 5.00 4.00 2.00 1.00
23 2.00 3.00 2.00 4.00 2.00 2.00 2.00
24 1.00 7.00 15.00 6.00 6.00 1.00 1.00
25 2.00 6.00 6.00 5.00 3.00 1.00 2.00
26 1.00 6.00 13.00 6.00 6.00 1.00 1.00
27 2.00 5.00 4.00 5.00 5.00 1.00 1.00
28 2.00 4.00 2.00 3.00 2.00 2.00 2.00
29 1.00 4.00 4.00 5.00 3.00 1.00 2.00
30 1.00 3.00 3.00 7.00 5.00 1.00 2.00
Case Problem
• To find out frequency distribution of Familiarity with
Internet among sample.
• To find out
– Mean, Median & Mode;
– Standard deviation; &
– Skewness & Kurtosis of Familiarity rating with Internet
among sample.
Case Problem: Cross Tabulation

• To make a cross-table of gender and Internet usage
• To find out
– whether there is any association between these variables or not
Parametric Test
Hypothesis Testing Related to Differences

• Parametric tests assume that the variables of interest are measured on at least an interval scale.
• Nonparametric tests assume that the variables are measured on a nominal or ordinal scale.
• Tests can be further classified based on whether one, two, or more samples are involved.
• Samples are independent if they are drawn randomly from different populations.
• Samples are paired when the data for the two samples relate to the same group of respondents.
Snapshot of Hypothesis Testing for Difference
Hypothesis Tests
• Parametric Tests (Metric Tests)
– One Sample: t test, Z test
– Two or More Samples, Independent Samples: Two-Group t test, Z test
– Two or More Samples, Paired Samples: Paired t test
• Non-parametric Tests (Nonmetric Tests)
– One Sample: Chi-Square, K-S, Runs, Binomial
– Two or More Samples, Independent Samples: Chi-Square, Mann-Whitney
– Two or More Samples, Paired Samples: Sign, Wilcoxon
Parametric Test
• One Sample Test
• Two independent Sample test
• Paired Sample test
One Sample Test
Problem: One Sample Test

• To test whether familiarity with the Internet is high (> 4) or not.
• We can use a one-sample t-test or Z-test, depending on the situation.
• The result will show whether familiarity with the Internet is high or not for the sample.
Problem: One Sample Test

One-Sample Statistics
Variable | N | Mean | Std. Deviation | Std. Error Mean
Familiarity | 29 | 4.72 | 1.579 | .293
One-Sample Test (Test Value = 4)
Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference: Lower | Upper
Familiarity | 2.470 | 28 | .020 | .724 | .12 | 1.32
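The SPSS figures can be reproduced from the raw Familiarity ratings. A minimal sketch using only the standard library: t = (mean - 4) / (s / √n) with df = n - 1. The critical value 2.048 (two-tailed, α = 0.05, df = 28) is taken from a t table and used for the 95% confidence interval. For the one-tailed alternative H1: μ > 4, t would instead be compared with the one-tailed critical value 1.701.

```python
import math
import statistics

# Familiarity ratings for respondents 1-30; respondent 10 is missing (None).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
TEST_VALUE = 4
T_CRITICAL_TWO_TAILED = 2.048  # t table: alpha = 0.05 (two-tailed), df = 28

valid = [v for v in familiarity if v is not None]
n = len(valid)
mean = statistics.mean(valid)
std_error = statistics.stdev(valid) / math.sqrt(n)
t_statistic = (mean - TEST_VALUE) / std_error
df = n - 1

mean_difference = mean - TEST_VALUE
ci_lower = mean_difference - T_CRITICAL_TWO_TAILED * std_error
ci_upper = mean_difference + T_CRITICAL_TWO_TAILED * std_error

print(f"N = {n}, mean = {mean:.2f}, SE = {std_error:.3f}")
print(f"t = {t_statistic:.3f}, df = {df}, 95% CI of difference = ({ci_lower:.2f}, {ci_upper:.2f})")
```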
Two Independent Sample Test
Problem: Two Independent Sample Test

• To test whether the mean familiarity with the Internet is the same or different for males & females.
• We can use a two-independent-sample t-test or Z-test, depending on the situation.
• The result will show whether familiarity with the Internet is the same or different for males & females.

• In the case of means for two independent samples, the hypotheses take the following form:
H0: μ1 = μ2
H1: μ1 ≠ μ2
Two Independent Sample Test: Variance Test

• An F test of sample variances may be performed if it is not known whether the two populations have equal variances. In this case, the hypotheses are:
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
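A sketch of two ways to compare the variances of male and female Familiarity ratings, using the raw ratings grouped by sex (1 = male, 2 = female) from the Internet Usage Data. The classic F statistic is the ratio of the larger to the smaller sample variance. SPSS reports Levene's test instead, which is a one-way ANOVA on absolute deviations from each group mean. Our implementation of that textbook definition gives F ≈ 0.015, in line with the Levene's F of .015 reported by SPSS for these data. p values would still need an F table.

```python
import statistics

# Familiarity ratings by sex (respondent 10 excluded: rating missing).
male = [7, 7, 6, 6, 6, 6, 4, 7, 6, 5, 7, 6, 4, 3]
female = [2, 3, 3, 4, 2, 3, 3, 4, 5, 4, 6, 3, 6, 5, 4]


def variance_ratio(a, b):
    """Classic F statistic: larger sample variance over smaller sample variance."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def levene_f(*groups):
    """Levene's F: one-way ANOVA on |x - group mean| (mean-centred version)."""
    deviations = []
    for g in groups:
        centre = statistics.mean(g)
        deviations.append([abs(x - centre) for x in g])
    k = len(deviations)
    total_n = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / total_n
    group_means = [statistics.mean(d) for d in deviations]
    ss_between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means))
    ss_within = sum(sum((z - m) ** 2 for z in d) for d, m in zip(deviations, group_means))
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))


print(f"Variance ratio F = {variance_ratio(male, female):.3f}")
print(f"Levene F = {levene_f(male, female):.3f}")
```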
Group Statistics (Familiarity)
Sex | N | Mean | Std. Deviation | Std. Error Mean
Male | 14 | 5.71 | 1.267 | .339
Female | 15 | 3.80 | 1.265 | .327
Independent Samples Test (Familiarity)
Levene's Test for Equality of Variances: F = .015, Sig. = .902
t-test for Equality of Means:
Variances | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper
Equal variances assumed | 4.070 | 27 | .000 | 1.914 | .470 | .949 | 2.879
Equal variances not assumed | 4.070 | 26.857 | .000 | 1.914 | .470 | .949 | 2.880
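Both rows of the t-test output can be reproduced from the raw ratings. A minimal sketch: the first version pools the two sample variances (df = n1 + n2 - 2). The second (Welch's test) keeps them separate and uses the Welch-Satterthwaite approximation for df, which is where the non-integer 26.857 comes from.

```python
import math
import statistics

# Familiarity ratings by sex (respondent 10 excluded: rating missing).
male = [7, 7, 6, 6, 6, 6, 4, 7, 6, 5, 7, 6, 4, 3]
female = [2, 3, 3, 4, 2, 3, 3, 4, 5, 4, 6, 3, 6, 5, 4]


def two_sample_t(a, b):
    """Pooled (equal variances) and Welch (unequal variances) t statistics with df."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2
    # Equal variances not assumed: Welch t with Welch-Satterthwaite df.
    se1, se2 = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(se1 + se2)
    df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_pooled, df_pooled, t_welch, df_welch


t_pooled, df_pooled, t_welch, df_welch = two_sample_t(male, female)
print(f"Pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.3f}")
```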
2/10/2020

Session-8
Paired Sample Test
Problem: Paired Sample Test

• To test whether the mean attitude towards the Internet & the mean attitude towards technology are the same or not.
• We can use a paired-sample t-test.
• The result will show whether attitude towards the Internet & attitude towards technology are the same or different for the sample.
Paired Sample Test

• The difference in such cases is examined by a paired-samples t-test.
• To compute t for paired samples, a paired-difference variable, denoted by D, is formed and its mean & variance are calculated. Then the t statistic is computed:
$$t = \frac{\bar{D}}{s_D / \sqrt{n}}$$
• Degrees of freedom are n - 1, where n is the number of pairs.
Paired Sample Test
Variable | Number of Cases | Mean | Standard Deviation | Standard Error
Internet Attitude | 30 | 5.167 | 1.234 | 0.225
Technology Attitude | 30 | 4.100 | 1.398 | 0.255
Difference = Internet - Technology
Difference Mean | Standard Deviation | Standard Error | Correlation | 2-tail Prob. | t Value | Degrees of Freedom | 2-tail Probability
1.067 | 0.828 | 0.1511 | 0.809 | 0.000 | 7.059 | 29 | 0.000
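A sketch of the paired-sample computation on the raw attitude scores of the 30 respondents in the Internet Usage Data. It forms D = Internet attitude - technology attitude for each respondent, then divides the mean of D by its standard error. It reproduces the mean difference, standard deviation, standard error, and t value in the output.

```python
import math
import statistics

attitude_internet = [7, 3, 4, 7, 7, 5, 4, 5, 6, 7, 4, 6, 6, 3, 5,
                     4, 5, 5, 6, 6, 4, 5, 4, 6, 5, 6, 5, 3, 5, 7]
attitude_technology = [6, 3, 3, 5, 7, 4, 5, 4, 4, 6, 3, 4, 5, 2, 4,
                       3, 3, 4, 6, 4, 2, 4, 2, 6, 3, 6, 5, 2, 3, 5]

# Paired-difference variable D, one value per respondent.
differences = [a - b for a, b in zip(attitude_internet, attitude_technology)]
n_pairs = len(differences)
mean_d = statistics.mean(differences)
sd_d = statistics.stdev(differences)
se_d = sd_d / math.sqrt(n_pairs)
t_statistic = mean_d / se_d
df = n_pairs - 1  # degrees of freedom = number of pairs - 1

print(f"Mean D = {mean_d:.3f}, SD = {sd_d:.3f}, SE = {se_d:.4f}")
print(f"t = {t_statistic:.3f}, df = {df}")
```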
Non-Parametric Test
Non-Parametric Tests
• Nonparametric tests are used when the independent variables are nonmetric.
• Like parametric tests, nonparametric tests are available for testing variables from one sample, two independent samples, or two related samples.
Non-Parametric Test: One Sample test
Non-Parametric Test: One Sample

• Sometimes researcher wants to test whether observations
for a particular variable could reasonably have come from a
particular distribution.
– Kolmogorov-Smirnov (K-S)
– Chi-square test
– Binomial test
– Runs test
One Sample: Non-Parametric Test

• The Kolmogorov-Smirnov (K-S) one-sample test is one such goodness-of-fit test.
• The chi-square test can be performed on a single variable from one sample. In this context, chi-square serves as a goodness-of-fit test.
• The binomial test is also a goodness-of-fit test, for dichotomous variables.
• The runs test is a test of randomness for dichotomous variables (to determine whether the order or sequence in which observations are obtained is random).
K-S One-Sample Test

• To test whether one variable comes from a particular distribution (theoretical distribution vs. observed distribution)
• Hypothesis
H0: Internet usage is normally distributed
H1: Internet usage is NOT normally distributed
17
17
K-S One-Sample Test
Descriptive Statistics
                          N    Mean   Std. Deviation   Minimum   Maximum
Internet Usage Hrs/Week   30   6.60   4.296            2         15

One-Sample Kolmogorov-Smirnov Test
                                             Internet Usage Hrs/Week
N                                            30
Normal Parameters(a,b)     Mean              6.60
                           Std. Deviation    4.296
Most Extreme Differences   Absolute          .222
                           Positive          .222
                           Negative          -.142
Test Statistic                               .222
Asymp. Sig. (2-tailed)                       .001(c)
a. Test distribution is Normal.
b. Calculated from data.
c. Lilliefors Significance Correction.
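Here is a minimal Python sketch of the K-S statistic D for a normal test distribution, using only the standard library. `ks_normal` and `normal_cdf` are illustrative names. When the mean and standard deviation are estimated from the data (footnote b), the significance must come from the Lilliefors table rather than the ordinary K-S table, which is why SPSS applies the Lilliefors correction. The sketch returns only D, not a p-value. The raw internet-usage data are not reproduced here, so the usage line uses toy values.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def ks_normal(data, mu=None, sigma=None):
    """One-sample K-S statistic D = max |F_n(x) - F(x)| against a normal CDF.

    If mu / sigma are omitted they are estimated from the data (sample mean and
    n - 1 standard deviation), which is the 'Calculated from data' case.
    """
    x = sorted(data)
    n = len(x)
    if mu is None:
        mu = sum(x) / n
    if sigma is None:
        sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / (n - 1))
    cdf = [normal_cdf(v, mu, sigma) for v in x]
    # The empirical CDF jumps at each sorted point: compare just after (D+)
    # and just before (D-) every jump.
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf))
    d_minus = max(f - i / n for i, f in enumerate(cdf))
    return max(d_plus, d_minus)


# Toy check against a standard normal with known parameters.
print(round(ks_normal([-1.0, 1.0], mu=0.0, sigma=1.0), 4))  # 0.3413
```

In the output above, D = .222 and the Lilliefors-corrected significance is .001, which is below 0.05. H0 (internet usage is normally distributed) is therefore rejected.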
K-S One-Sample Test
K-S Table
Lilliefors Test Table
One-Sample Chi-Square Goodness-of-Fit Test
• The chi-square goodness-of-fit test is a statistical procedure used to determine whether the observed frequencies at each level of one categorical variable are similar to or different from the frequencies expected at each level of that variable.
One-Sample Chi-Square Goodness-of-Fit Test
• Observed frequency (f_o) vs. expected frequency (f_e):
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
• Uniform distribution
H0: The ratings of familiarity with the Internet are uniformly distributed.
H1: The ratings of familiarity with the Internet are not uniformly distributed.
• Expected distribution
H0: The observed distribution is the same as the expected distribution.
H1: The observed distribution is not the same as the expected distribution.
Exercise: One-Sample Chi-Square Test
Familiarity is measured on a 1-7 scale (only six categories are used, as one category has a count of 0).
1. Test whether the rating is uniformly distributed.
2. Test whether the rating follows the expected distribution of familiarity below (six categories):
– 10%
– 20%
– 20%
– 10%
– 25%
– 15%
• If the percentages for all categories do not sum to 100, they are rescaled so that they sum to 100%.
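Here is a minimal Python sketch of the goodness-of-fit statistic. It covers both the uniform case and an expected distribution given in percentages, which are rescaled to 100% as the exercise describes. `chi_square_gof` is an illustrative name. The familiarity counts in the usage lines are hypothetical, because the exercise's raw counts are not reproduced here.

```python
def chi_square_gof(observed, expected_pct=None):
    """One-sample chi-square goodness-of-fit statistic.

    observed: observed frequencies f_o, one per category.
    expected_pct: expected percentages per category; None means uniform.
        They are rescaled to sum to 100%, as in the exercise.
    Returns (chi-square, degrees of freedom = k - 1).
    """
    k = len(observed)
    n = sum(observed)
    if expected_pct is None:
        expected_pct = [1.0] * k  # equal share for every category
    total = sum(expected_pct)
    expected = [n * p / total for p in expected_pct]  # f_e per category
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi2, k - 1


# Hypothetical counts for the six familiarity categories (n = 30).
counts = [3, 6, 6, 3, 7, 5]
print(chi_square_gof(counts))                            # uniform H0
print(chi_square_gof(counts, [10, 20, 20, 10, 25, 15]))  # expected-distribution H0
```

Each statistic is then compared with the chi-square critical value for k - 1 = 5 degrees of freedom.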
Binomial Test
• Expected proportion (for testing a population proportion):
H0: p = 0.5
H1: p ≠ 0.5
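For H0: p = 0.5, the exact two-sided p-value can be computed directly from binomial probabilities. Here is a minimal Python sketch. `binomial_test_half` is an illustrative name, and it handles only p0 = 0.5. In that case the null distribution is symmetric, so doubling the smaller tail gives the two-sided value.

```python
from math import comb


def binomial_test_half(k, n):
    """Exact two-sided binomial test of H0: p = 0.5.

    k: number of cases in one category out of n dichotomous observations.
    Under p = 0.5 the binomial distribution is symmetric, so the two-sided
    p-value is twice the smaller tail probability (capped at 1).
    """
    lower = sum(comb(n, i) for i in range(k + 1)) / 2 ** n     # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


# Hypothetical example: 2 "yes" answers out of 10.
print(binomial_test_half(2, 10))  # 0.109375
```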
One-Sample Runs Test
• A "run" of a sequence is a maximal non-empty segment of the sequence consisting of adjacent equal elements.
– Ex: the 22-element sequence "++++−−−+++−−++++++−−−−" consists of 6 runs, 3 of which consist of "+" & the others of "−".
• The runs test is based on the null hypothesis that each element in the sequence is independently drawn from the same distribution.
• Test of randomness
H0: The observations in the sample are generated randomly.
H1: The observations in the sample are NOT generated randomly.
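The run count and the usual large-sample normal approximation for the runs test can be sketched in Python as follows. `runs_test` is an illustrative name. The approximation is rough for samples as small as the 22-element example, which is written here with ASCII '+' and '-'.

```python
import math


def runs_test(seq):
    """Runs test of randomness for a dichotomous sequence.

    Returns (observed runs, expected runs, z) using the large-sample normal
    approximation to the distribution of the number of runs under H0.
    """
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("the runs test needs exactly two distinct symbols")
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)  # new run at each change
    n1, n2 = seq.count(symbols[0]), seq.count(symbols[1])
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    return runs, mean, z


runs, mean, z = runs_test("++++---+++--++++++----")
print(runs, round(mean, 2), round(z, 2))  # 6 11.64 -2.55
```

Here 13 '+' and 9 '-' give 6 runs against about 11.6 expected. Since |z| is about 2.55 > 1.96, this approximation would lead to rejecting H0 of randomness at the 5% level.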
Non-parametric Test: Two Independent Sample test
Chi-Square Test for independence
Two Independent Sample: Nonparametric Test
                    Gender
Internet Usage      Male   Female   Row Total
Light (1)           5      10       15
Heavy (2)           10     5        15
Column Total        15     15       30
Number of males & females who use the Internet for shopping.
• Exercise:
Is the proportion of respondents using the Internet for shopping independent of gender (males and females)?
• Chi-square test for independence: a statistical procedure to determine whether the frequencies observed at the combinations of levels of two categorical variables are similar to the frequencies expected.
• The null hypothesis (H0) of NO association between the two variables will be rejected only when the calculated value of the test statistic is greater than the critical value of the chi-square distribution with the appropriate degrees of freedom.
– An important characteristic of the chi-square statistic is the degrees of freedom (df) associated with it: df = (r - 1) x (c - 1).
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad f_e = \frac{n_r\, n_c}{n}$$
where n_r = total number in the row
n_c = total number in the column
n = total sample size
Chi-Square Tests
                               Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             3.333(a)    1    .068
Continuity Correction(b)       2.133       1    .144
Likelihood Ratio               3.398       1    .065
Fisher's Exact Test                                           .143         .072
Linear-by-Linear Association   3.222       1    .073
N of Valid Cases               30
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 7.50.
b. Computed only for a 2x2 table
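The expected counts and the Pearson and continuity-corrected chi-square values in the output can be reproduced with a short Python sketch. The function name is illustrative, and the Yates option is meant only for 2 x 2 tables.

```python
def chi_square_independence(table, yates=False):
    """Pearson chi-square test of independence for an r x c table of counts.

    Expected counts are f_e = n_r * n_c / n. yates=True applies the
    continuity correction (|f_o - f_e| - 0.5), meant for 2 x 2 tables.
    Returns (chi-square, df, expected counts).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = [[nr * nc / n for nc in col_totals] for nr in row_totals]
    chi2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for fo, fe in zip(obs_row, exp_row):
            dev = abs(fo - fe) - 0.5 if yates else fo - fe
            chi2 += dev ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Internet usage (rows: light, heavy) by gender (columns: male, female).
usage_by_gender = [[5, 10], [10, 5]]
chi2, df, expected = chi_square_independence(usage_by_gender)
print(round(chi2, 3), df, expected)  # 3.333 1 [[7.5, 7.5], [7.5, 7.5]]
print(round(chi_square_independence(usage_by_gender, yates=True)[0], 3))  # 2.133
```

Since 3.333 is below 3.841, the critical chi-square value for 1 df at α = 0.05, H0 of no association is not rejected. This agrees with the asymptotic significance of .068.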

• The phi coefficient is used as a measure of the strength of association in the special case of a table with two rows & two columns (a 2 x 2 table):
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
• While the phi coefficient is specific to a 2 x 2 table, the contingency coefficient (C) can be used to assess the strength of association in a table of any size. It is usually applied to square tables.
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
• The contingency coefficient varies between 0 & 1, although it never actually reaches 1.
• The maximum value of the contingency coefficient depends on the size of the table (number of rows & number of columns). For this reason, it should be used only to compare tables of the same size.
• Cramér's V is a modified version of the phi correlation coefficient & is used in tables larger than 2 x 2. It can be used for rectangular tables.
$$V = \sqrt{\frac{\phi^2}{\min(r-1,\; c-1)}} = \sqrt{\frac{\chi^2/n}{\min(r-1,\; c-1)}}$$
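All three strength-of-association measures follow directly from a chi-square value. Here is a Python sketch that uses the 2 x 2 Internet-usage-by-gender result (χ² = 3.333, exactly 10/3, with n = 30). The function name is illustrative.

```python
import math


def association_measures(chi2, n, rows, cols):
    """Phi, contingency coefficient C and Cramer's V from a chi-square value."""
    phi = math.sqrt(chi2 / n)
    c = math.sqrt(chi2 / (chi2 + n))
    v = math.sqrt((chi2 / n) / min(rows - 1, cols - 1))
    return phi, c, v


phi, c, v = association_measures(10 / 3, 30, 2, 2)
print(round(phi, 3), round(c, 3), round(v, 3))  # 0.333 0.316 0.333
```

For a 2 x 2 table min(r - 1, c - 1) = 1, so Cramér's V coincides with phi.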
2/14/2020

Session-9a
Non-Parametric Test: One Sample test
Chi-Square Test for independence
Mann-Whitney U test

• When the difference in location of two populations is to be compared based on observations from two independent samples, & the variable is measured on an ordinal scale, the Mann-Whitney U test can be used.
• In the Mann-Whitney U test, the two samples are combined & the cases are ranked in order of increasing size.
– The ranking of the combined cases is then assessed.
• Hypotheses
H0: The two populations (male & female) are identical with respect to familiarity with the Internet (the mean ranks on familiarity are the same for the two populations).
H1: The two populations (male & female) are not identical with respect to familiarity with the Internet (the mean ranks on familiarity are not the same for the two populations).
Mann-Whitney U test: Ranking of Combined Cases
Familiarity   Rank   Gender     Familiarity   Rank   Gender
Rating               (Group)    Rating               (Group)
2             1.5    2          5             16     2
2             1.5    2          5             16     2
3             5.5    1          6             21.5   1
3             5.5    2          6             21.5   1
3             5.5    2          6             21.5   1
3             5.5    2          6             21.5   1
3             5.5    2          6             21.5   1
3             5.5    2          6             21.5   1
4             11.5   1          6             21.5   2
4             11.5   1          6             21.5   2
4             11.5   2          7             27.5   1
4             11.5   2          7             27.5   1
4             11.5   2          7             27.5   1
4             11.5   2          7             27.5   1
5             16     1
Mann-Whitney U: Internet Usage by Gender
Ranks
               Sex      N    Mean Rank   Sum of Ranks
Familiarity    Male     14   20.25       283.50
               Female   15   10.10       151.50
               Total    29

Test Statistics(a)
                                  Familiarity
Mann-Whitney U                    31.500
Wilcoxon W                        151.500
Z                                 -3.277
Asymp. Sig. (2-tailed)            .001
Exact Sig. [2*(1-tailed Sig.)]    .001(b)
a. Grouping Variable: Sex
b. Not corrected for ties.
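The output can be reproduced in Python from the ratings listed in the ranking table. Group 1 has 14 cases and a rank sum of 283.5, so it corresponds to Male in the output; group 2 has 15 cases and corresponds to Female. The sketch assigns average ranks to ties and uses the usual tie-corrected normal approximation for Z. The function names are illustrative.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..N in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i..j
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Mann-Whitney U with the tie-corrected normal approximation.

    Returns (U, rank sum of x, rank sum of y, Z).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, r1, r2, z


group1 = [3, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7]      # Male
group2 = [2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6]   # Female
u, r1, r2, z = mann_whitney(group1, group2)
print(u, r1, r2, round(z, 3))  # 31.5 283.5 151.5 -3.277
```

Wilcoxon W in the output (151.5) is the rank sum of the female group.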
Paired Sample: Nonparametric Test

• The Wilcoxon matched-pairs signed-ranks test analyzes the differences between paired observations, taking into account the magnitude of the differences.
– It computes the differences between pairs of variables & ranks the absolute differences.
• Hypotheses (Md = median of the paired differences)
H0: Md = 0
H1: Md ≠ 0
Wilcoxon matched-pairs signed-ranks: Attitude difference (Technology - Internet)
Respondent   Attitude toward   Attitude toward   Difference                Difference   Sign of
             Internet          Technology        (Technology - Internet)   Rank         Rank
2            3                 3                 0                         0
5            7                 7                 0                         0
19           6                 6                 0                         0
24           6                 6                 0                         0
26           6                 6                 0                         0
27           5                 5                 0                         0
7            4                 5                 1                         7.5          -
1            7                 6                 -1                        7.5          +
3            4                 3                 -1                        7.5          +
6            5                 4                 -1                        7.5          +
8            5                 4                 -1                        7.5          +
10           7                 6                 -1                        7.5          +
11           4                 3                 -1                        7.5          +
13           6                 5                 -1                        7.5          +
14           3                 2                 -1                        7.5          +
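Here is a minimal Python sketch of the signed-ranks computation. Zero differences are dropped, |d| is ranked with average ranks for ties, and Z uses the large-sample normal approximation. Unlike packages such as SPSS, the sketch applies no tie correction to the variance. The function name and the five respondents in the usage lines are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Wilcoxon matched-pairs signed-ranks test on d = y - x.

    Zero differences are dropped, |d| is ranked (ties share their average
    rank), and T+ / T- are the rank sums of positive / negative differences.
    Z uses the large-sample normal approximation with no tie correction.
    """
    d = [b - a for a, b in zip(x, y) if b != a]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    t_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    t_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(t_plus, t_minus) - mean_t) / sd_t
    return t_plus, t_minus, z


# Hypothetical attitudes (x = Internet, y = Technology) for five respondents.
internet = [4, 7, 4, 5, 3]
technology = [5, 6, 3, 5, 1]
print(wilcoxon_signed_rank(internet, technology))
```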

• The paired-sample sign test analyzes the differences between paired observations, taking into account only the signs of the differences (not their magnitudes).
• Hypotheses
H0: number of pluses = number of minuses
H1: number of pluses ≠ number of minuses

Paired sample Sign test
Frequencies
                                                           N
Attitude toward Technology -   Negative Differences(a)    23
Attitude toward Internet       Positive Differences(b)     1
                               Ties(c)                      6
                               Total                       30
a. Attitude toward Technology < Attitude toward Internet
b. Attitude toward Technology > Attitude toward Internet
c. Attitude toward Technology = Attitude toward Internet

Test Statistics(a)
                         Attitude toward Technology - Attitude toward Internet
Exact Sig. (2-tailed)    .000(b)
a. Sign Test
b. Binomial distribution used.
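The sign test reduces to a binomial test with p = 0.5 on the number of nonzero differences. Here is a Python sketch that reproduces the counts and the exact significance above. The function name is illustrative. Only the counts (23 negative, 1 positive, 6 ties) come from the output; the attitude scores are made up to produce them.

```python
from math import comb


def sign_test(x, y):
    """Paired-samples sign test on the signs of d = y - x (exact binomial).

    Zero differences (ties) are dropped. Under H0 pluses and minuses are equally
    likely, so the smaller count follows Binomial(n, 0.5).
    Returns (negative differences, positive differences, ties, two-sided p).
    """
    diffs = [b - a for a, b in zip(x, y)]
    neg = sum(d < 0 for d in diffs)
    pos = sum(d > 0 for d in diffs)
    ties = len(diffs) - neg - pos
    n = neg + pos
    k = min(neg, pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k)
    return neg, pos, ties, min(1.0, 2 * tail)


internet = [5] * 30                           # made-up scores
technology = [4] * 23 + [6] + [5] * 6         # made-up scores
neg, pos, ties, p = sign_test(internet, technology)
print(neg, pos, ties, p)  # p is about 3e-06, shown by SPSS as .000
```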
Session-9b
Causality
• Concept of Causality in Research
– X is only one of a number of possible causes of Y.
– The occurrence of X makes the occurrence of Y more probable (X is a probabilistic cause of Y).
• Conditions for Causality
– Concomitant variation is the extent to which a cause, X, & an effect, Y, occur together or vary together in the way predicted by the hypothesis under consideration.
– The time order of occurrence condition states that the causing event must occur either before or simultaneously with the effect; it cannot occur afterwards.
– Absence of other possible causal factors means that the factor or variable being investigated should be the only possible causal explanation.
Definitions of Terms
• Independent variables
– Variables or alternatives that are manipulated & whose effects are measured & compared, e.g., price levels.
• Test units
– Individuals, organizations, or other entities whose response to the independent variables or treatments is being examined, e.g., consumers or stores.
• Dependent variables
– Variables that measure the effect of the independent variables on the test units, e.g., sales, profits, market shares.
• Extraneous variables
– Variables other than the independent variables that affect the response of the test units, e.g., store size, store location, competitive effort.
Experiment & Experimental Design
• Experiment
– The process of manipulating one or more independent variables & measuring their effect on one or more dependent variables, while controlling for extraneous variables.
• Experimental design is a set of procedures specifying:
– the test units & how these units are to be divided into homogeneous subsamples
– what independent variables or treatments are to be manipulated
– what dependent variables are to be measured
– how extraneous variables are to be controlled
Illustration:
• Does humor have a positive effect on the purchase intention for products that are purchased impulsively?
Validity in Experiment
• Internal validity refers to whether the manipulation of the independent variables or treatments actually caused the observed effects on the dependent variables.
– Did the manipulation of the independent variable (e.g., humor) do what it was supposed to do?
– Control of extraneous variables is a necessary condition for establishing internal validity.
• External validity refers to whether the cause-and-effect relationships found in the experiment can be generalized.
– To what populations, settings, times, independent variables, & dependent variables can the results be projected?
Laboratory Experiment Vs. Field Experiment
Extraneous Variables: Sources
• History refers to specific events that are external to the experiment but occur at the same time as the experiment.
• Maturation refers to changes in the test units themselves that occur with the passage of time.
• Testing effects are caused by the process of experimentation: they are the effects on the experiment of taking a measure on the dependent variable before & after the presentation of the treatment.
• Instrumentation refers to changes in the measuring instrument, in the observers, or in the scores themselves.
• Selection bias refers to the improper assignment of test units to treatment conditions.
• Mortality refers to the loss of test units while the experiment is in progress.
Control of Extraneous Variables
• Randomization refers to the random assignment of test units to experimental groups by using random numbers. Treatment conditions are also randomly assigned to experimental groups.
• Matching involves comparing test units on a set of key background variables before assigning them to the treatment conditions.
• Design control involves the use of experiments designed to control specific extraneous variables.
• Statistical control involves measuring the extraneous variables & adjusting for their effects through statistical analysis.
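Randomization can be sketched in Python with a seeded shuffle in place of a printed table of random numbers. This is an illustration only: `randomize` and the store names are hypothetical.

```python
import random


def randomize(test_units, groups=("EG", "CG"), seed=None):
    """Randomly assign test units to treatment groups of (near-)equal size.

    A seeded pseudo-random shuffle stands in for a table of random numbers.
    """
    rng = random.Random(seed)
    units = list(test_units)
    rng.shuffle(units)
    # Deal the shuffled units out to the groups in turn.
    return {unit: groups[i % len(groups)] for i, unit in enumerate(units)}


# Hypothetical example: ten stores split between an experimental & a control group.
stores = [f"store{i}" for i in range(1, 11)]
for store, group in sorted(randomize(stores, seed=2020).items()):
    print(store, group)
```

Fixing the seed makes the assignment reproducible for the research record. Leaving it as None gives a fresh assignment each time.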
Limitations of Experiment
• Experiments can be time-consuming, particularly if the researcher is interested in measuring long-term effects.
• Experiments are often expensive. The requirements of an experimental group, a control group, & multiple measurements significantly add to the cost of research.
• Experiments can be difficult to administer. It may be impossible to control for the effects of extraneous variables, particularly in a field environment.
• Competitors may deliberately contaminate the results of a field experiment.
Experimental Design
One-Shot Case Study
X O1
• A single group of test units is exposed to a treatment X.
• A single measurement on the dependent variable is taken.
• There is no random assignment of test units.
• The one-shot case study is more appropriate for exploratory than for conclusive research.
Note:
X: Exposure to a treatment
O: Observation
One-Group Pretest-Posttest Design
O1 X O2
• A group of test units is measured twice.
• There is no control group.
• The treatment effect is computed as O2 - O1.
• The validity of this conclusion is questionable, since extraneous variables are largely uncontrolled.
Note:
O: Observation
Static Group Design
EG: X O1
CG:   O2
• A two-group experimental design.
• The EG is exposed to the treatment, & the CG is not.
• Measurements on both groups are made only after the treatment.
• Test units are not assigned at random.
• The treatment effect would be measured as O1 - O2.
Note:
EG: Experimental group
CG: Control group
Pretest-Posttest Control Group Design
EG: R O1 X O2
CG: R O3   O4
• Test units are randomly assigned to either the EG or the CG.
• A pretreatment measure is taken on each group.
• The treatment effect is measured as (O2 - O1) - (O4 - O3).
Note:
R: Randomization
Posttest-Only Control Group Design
EG: R X O1
CG: R   O2
• The treatment effect is obtained by TE = O1 - O2.
• Except for the pre-measurement, implementation of this design is very similar to that of the pretest-posttest control group design.
Note:
R: Randomization
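The treatment-effect formulas of these designs amount to simple arithmetic on the observations. Here is a Python sketch with hypothetical mean sales scores. The function names are illustrative.

```python
def one_group_pretest_posttest(o1, o2):
    """O1 X O2: treatment effect = O2 - O1 (no control group)."""
    return o2 - o1


def static_group(o1, o2):
    """EG: X O1 / CG: O2 (same formula as the posttest-only design): TE = O1 - O2."""
    return o1 - o2


def pretest_posttest_control(o1, o2, o3, o4):
    """EG: R O1 X O2 / CG: R O3 O4: TE = (O2 - O1) - (O4 - O3)."""
    return (o2 - o1) - (o4 - o3)


# Hypothetical mean scores: EG 50 -> 62, CG 51 -> 55.
print(one_group_pretest_posttest(50, 62))        # 12
print(pretest_posttest_control(50, 62, 51, 55))  # 8
print(static_group(62, 55))                      # 7
```

With these made-up numbers, the one-group design credits the treatment with 12 points. The control group shows that 4 of those points occurred anyway (e.g., through history or maturation), which leaves a treatment effect of 8.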
Randomized Block Design
• Test units are blocked, or grouped, on the basis of an external variable.
• By blocking, the researcher ensures that the various experimental & control groups are matched closely on the external variable.
• It is useful when there is only one major external variable, such as store size, that might influence the dependent variable.
Randomized Block Design
                                 Treatment Groups
Block Number   Store Patronage   Commercial A   Commercial B   Commercial C
1              Heavy             A              B              C
2              Medium            A              B              C
3              Low               A              B              C
4              None              A              B              C
Factorial Design
• It is used to measure the effects of two or more independent variables at various levels.
• A factorial design may also be conceptualized as a table.
• In a two-factor design, each level of one variable represents a row and each level of another variable represents a column.
Factorial Design
                              Amount of Humor
Amount of Store Information   No Humor   Medium Humor   High Humor
Low                           A          B              C
Medium                        D          E              F
High                          G          H              I
One way ANOVA
ANOVA: Introduction
• Analysis of variance (ANOVA) is used as a test of means for two or more populations.
– The null hypothesis is that all means are equal.
• ANOVA must have a dependent variable that is metric (measured using an interval or ratio scale).
• There must also be one or more independent variables that are all categorical (nonmetric).
ANOVA: Introduction
• Categorical independent variables are also called factors.
• A particular combination of factor levels, or categories, is called a treatment.
• One-way ANOVA involves only one categorical variable, or a single factor. In one-way ANOVA, a treatment is the same as a factor level.
• If two or more factors are involved, the analysis is termed n-way ANOVA.
• If the set of independent variables consists of both categorical & metric variables, the technique is called analysis of covariance (ANCOVA).
– In this case, the categorical independent variables are still referred to as factors, whereas the metric independent variables are referred to as covariates.
Statistics Associated with One-Way ANOVA
• SSbetween: also denoted SSx, this is the variation in Y related to the variation in the means of the categories of X. It represents the variation between the categories of X, or the portion of the sum of squares in Y related to X.
• SSwithin: also referred to as SSerror, this is the variation in Y due to the variation within each of the categories of X. This variation is not accounted for by X.
• SSy: the total variation in Y.
Decomposition of Total Variation
                  Independent Variable X
                  Categories                           Total Sample
                  X1     X2     X3     …     Xc
                  Y1     Y1     Y1     …     Y1        Y1
                  Y2     Y2     Y2     …     Y2        Y2
                  :      :      :            :         :
                  Yn     Yn     Yn     …     Yn        YN
Category Mean     Ȳ1     Ȳ2     Ȳ3     …     Ȳc        Ȳ
• Within-category variation = SSwithin; between-category variation = SSbetween; total variation = SSy.
$$SS_y = SS_{between} + SS_{within}$$
Statistics Associated with One-way ANOVA
• F statistic: the null hypothesis that the category means are equal in the population is tested by an F statistic based on the ratio of the mean square related to X & the mean square related to error.
• Mean square: a sum of squares divided by the appropriate degrees of freedom.
• eta² (η²): the strength of the effect of X on Y is measured by η², which varies between 0 & 1. It is calculated as η² = SSx/SSy.
Conducting One-Way ANOVA
• The null hypothesis may be tested by the F statistic based on the ratio between these two estimates:
$$F = \frac{SS_x/(c-1)}{SS_{error}/(N-c)} = \frac{MS_x}{MS_{error}}$$
• This statistic follows the F distribution, with (c - 1) and (N - c) degrees of freedom (df).
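The decomposition SSy = SSbetween + SSwithin and the F ratio can be computed directly from raw data. Here is a Python sketch with illustrative names. The raw sales data of the exercise are not reproduced here, so the first usage line uses toy data. The second line recomputes F from the sums of squares in the in-store promotion table.

```python
def one_way_anova(groups):
    """One-way ANOVA from raw data: SSy = SSbetween + SSwithin, F and eta^2.

    groups: one list of Y values per category of X.
    """
    all_y = [y for g in groups for y in g]
    n_total, c = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_y = sum((y - grand_mean) ** 2 for y in all_y)
    f = (ss_between / (c - 1)) / (ss_within / (n_total - c))
    return {"ss_between": ss_between, "ss_within": ss_within, "ss_y": ss_y,
            "df": (c - 1, n_total - c), "F": f, "eta_sq": ss_between / ss_y}


def f_from_ss(ss_x, ss_error, c, n):
    """F = [SSx / (c - 1)] / [SSerror / (N - c)]."""
    return (ss_x / (c - 1)) / (ss_error / (n - c))


print(one_way_anova([[1, 2, 3], [4, 5, 6]]))               # toy data
print(round(f_from_ss(106.067, 79.800, c=3, n=30), 2))     # 17.94
```

For the promotion table, η² = 106.067/185.867 ≈ 0.571.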
Interpret Results
• If the null hypothesis of equal category means is not rejected, then the independent variable does not have a significant effect on the dependent variable.
• On the other hand, if the null hypothesis is rejected, then the effect of the independent variable is significant.
• A comparison of the category mean values will indicate the nature of the effect of the independent variable.
One way ANOVA: Exercise
Effect of Promotion or Coupon on Sales
Illustrative Applications of One-way ANOVA
• A department store wants to determine the effect of in-store promotion (X) on sales (Y).
The null hypothesis is that the category means are equal:
H0: µ1 = µ2 = µ3
One-Way ANOVA: Effect of In-store Promotion on Store Sales
Source of Variation          Sum of Squares   df   Mean Square   F Ratio   F Prob.
Between groups (Promotion)   106.067          2    53.033        17.944    0.000
Within groups (Error)        79.800           27   2.956
TOTAL                        185.867          29   6.409

Cell Means
Level of Promotion   Count   Mean
High (1)             10      8.300
Medium (2)           10      6.200
Low (3)              10      3.700
TOTAL                30      6.067
Issues in Interpretation: Multiple Comparisons
• If the null hypothesis of equal means is rejected, we can only conclude that not all of the group means are equal. We may wish to examine differences among specific means.
• This can be done by specifying appropriate contrasts, or comparisons used to determine which of the means are statistically different.
Issues in Interpretation: Multiple Comparisons
• A posteriori contrasts are made after the analysis. These are generally multiple comparison tests.
• These tests, in order of decreasing power, include least significant difference, Duncan's multiple range test, Student-Newman-Keuls, Tukey's alternate procedure, honestly significant difference, modified least significant difference, and Scheffé's test.
• Of these tests, least significant difference is the most powerful & Scheffé's test is the most conservative.
Assumptions in ANOVA
• Ordinarily, the categories of the independent variable are assumed to be fixed. Inferences are made only to the specific categories considered. This is referred to as the fixed-effects model.
• The error term is normally distributed, with a zero mean & a constant variance.
• The error is NOT related to any of the categories of X.
• The error terms are uncorrelated. If the error terms are correlated (i.e., the observations are not independent), the F ratio can be seriously distorted.
22-Feb-20
Session-10
Two / n-way ANOVA
Two-way ANOVA
Source of Variation   Sum of Squares   df   Mean Square   F        Sig. of F   ω²
Main Effects
  Promotion           106.067          2    53.033        54.862   0.000       0.557
  Coupon              53.333           1    53.333        55.172   0.000       0.280
  Combined            159.400          3    53.133        54.966   0.000
Two-way interaction   3.267            2    1.633         1.690    0.226
Model                 162.667          5    32.533        33.655   0.000
Residual (error)      23.200           24   0.967
TOTAL                 185.867          29   6.409
Two-way ANOVA
Cell Means
Promotion   Coupon   Count   Mean
High        Yes      5       9.200
High        No       5       7.400
Medium      Yes      5       7.600
Medium      No       5       4.800
Low         Yes      5       5.400
Low         No       5       2.000
TOTAL                30

Factor Level Means
Promotion   Coupon   Count   Mean
High                 10      8.300
Medium               10      6.200
Low                  10      3.700
            Yes      15      7.400
            No       15      4.733
Grand Mean           30      6.067
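Every cell in the table above holds 5 stores, so the design is balanced. In that case the main-effect and interaction sums of squares can be recovered from the cell means alone. Here is a Python sketch under that assumption; the function name is illustrative. The residual sum of squares needs the raw data, so 23.200 is taken from the two-way ANOVA table.

```python
def two_way_ss_from_cell_means(cell_means, n_per_cell):
    """Main-effect and interaction sums of squares for a balanced two-way design.

    cell_means[i][j]: mean of the cell at level i of factor A (rows) and level j
    of factor B (columns); every cell holds n_per_cell observations.
    """
    a, b = len(cell_means), len(cell_means[0])
    row_means = [sum(r) / b for r in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_means) / a
    ss_a = n_per_cell * b * sum((m - grand) ** 2 for m in row_means)
    ss_b = n_per_cell * a * sum((m - grand) ** 2 for m in col_means)
    ss_ab = n_per_cell * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    return ss_a, ss_b, ss_ab


# Rows: promotion high / medium / low; columns: coupon yes / no.
cells = [[9.2, 7.4], [7.6, 4.8], [5.4, 2.0]]
ss_promo, ss_coupon, ss_inter = two_way_ss_from_cell_means(cells, n_per_cell=5)
print(round(ss_promo, 3), round(ss_coupon, 3), round(ss_inter, 3))
ms_error = 23.200 / 24  # residual SS / df from the table
print(round((ss_promo / 2) / ms_error, 3))  # F for promotion
```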
Issues in Interpretation
• Multiple comparisons
• Interaction effects
• Relative importance of factors
Issues in Interpretation: Interaction effects
• An interaction occurs when the effect of one independent variable is NOT the same at the different levels of another independent variable.
– The effect of one independent variable (on a dependent variable) depends on the level of another independent variable.
Example:
• Medicines A & B may have no effect when either is taken alone, but the two together may have an effect. "The whole is different from the sum of the parts."
• Good teachers & small classrooms might both encourage learning. A good teacher in a small classroom might be especially effective.
Patterns of Interaction
Figure (four panels): Case 1: No Interaction; Case 2: Interaction; Case 3: Interaction; Case 4: Interaction. Each panel plots Y against the levels X11, X12, X13 of one factor, with lines labelled X21 & X22 for the levels of the other factor.
Issues in Interpretation: Relative Importance
• Omega squared (ω²) indicates what proportion of the variation in the dependent variable is related to a particular independent variable or factor, & is calculated as follows:
$$\omega_x^2 = \frac{SS_x - (df_x \times MS_{error})}{SS_y + MS_{error}}$$
• Normally, ω² is interpreted only for statistically significant effects.
• For in-store promotion:
$$\omega_p^2 = \frac{106.067 - (2 \times 0.967)}{185.867 + 0.967} = 0.557$$
Issues in Interpretation: Relative Importance
• Likewise, ω² associated with couponing is:
$$\omega_c^2 = \frac{53.333 - (1 \times 0.967)}{185.867 + 0.967} = 0.280$$
• As a guide to interpretation, a large experimental effect produces an index of 0.15 or greater, a medium effect produces an index of around 0.06, and a small effect produces an index of 0.01.
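The ω² values in the two-way ANOVA table can be reproduced from its sums of squares and residual mean square. Here is a Python sketch; the function name is illustrative.

```python
def omega_squared(ss_x, df_x, ms_error, ss_y):
    """omega^2 = (SSx - df_x * MS_error) / (SSy + MS_error)."""
    return (ss_x - df_x * ms_error) / (ss_y + ms_error)


ms_error = 0.967   # residual mean square from the two-way ANOVA table
ss_y = 185.867     # total sum of squares
print(round(omega_squared(106.067, 2, ms_error, ss_y), 3))  # promotion: 0.557
print(round(omega_squared(53.333, 1, ms_error, ss_y), 3))   # coupon: 0.28
```

By the guide above, both effects count as large (0.15 or greater).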
ANCOVA
Analysis of Covariance (ANCOVA)
• When examining the differences in the mean values of the dependent variable related to the effect of controlled independent variables, it is often necessary to take into account the influence of uncontrolled independent variables.
ANCOVA: Examples
• In determining how different groups exposed to different commercials evaluate a brand, it may be necessary to control for prior knowledge.
• In determining how different price levels will affect a household's cereal consumption, it may be essential to take household size into account.
ANCOVA: Illustration
• Suppose we want to determine the effect of in-store promotion & couponing on sales while controlling for the effect of clientele.
Analysis of Covariance
Source of Variation    Sum of Squares   df   Mean Square   F        Sig. of F
Covariance
  Clientele            0.838            1    0.838         0.862    0.363
Main effects
  Promotion            106.067          2    53.033        54.546   0.000
  Coupon               53.333           1    53.333        54.855   0.000
  Combined             159.400          3    53.133        54.649   0.000
2-Way Interaction
  Promotion * Coupon   3.267            2    1.633         1.680    0.208
Model                  163.505          6    27.251        28.028   0.000
Residual (Error)       22.362           23   0.972
TOTAL                  185.867          29   6.409

Covariate   Raw Coefficient
Clientele   -0.078
MANOVA
Multivariate Analysis of Variance (MANOVA)
• MANOVA is similar to ANOVA, except that instead of one metric dependent variable, we have two or more.
• In MANOVA, the null hypothesis is that the means on multiple dependent variables are equal across groups.
• MANOVA is appropriate when there are two or more dependent variables that are correlated.
• If, however, there are multiple dependent variables that are uncorrelated or orthogonal, ANOVA on each of the dependent variables is more appropriate.
MANOVA: Example
• Suppose that four groups, each consisting of 100 randomly selected individuals, were exposed to four different commercials about Tide detergent.
• After seeing the commercial, each individual provided ratings on three dependent variables: preference for Tide, preference for P&G, & preference for the commercial itself.
Nonmetric Analysis of Variance
• Nonmetric analysis of variance examines the difference in the central tendencies of more than two groups when the dependent variable does not exhibit a normal distribution.
– Kruskal-Wallis one-way analysis of variance
– k-sample median test
Kruskal–Wallis one-way analysis of variance
• This is an extension of the Mann-Whitney test.
• This test examines the difference in medians (it is also sometimes called one-way ANOVA on ranks).
• All cases from the k groups are ordered in a single ranking. If the k populations are the same, the groups should be similar in terms of the ranks within each group. The rank sum R_j is calculated for each group j of size n_j, and from these the Kruskal-Wallis H statistic is computed (N = total number of cases). H has an approximate chi-square distribution with k - 1 df.
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$
Kruskal–Wallis one-way analysis of variance
Ranks
         PROMOTION   N    Mean Rank
SALES    1           10   23.50
         2           10   15.40
         3           10   7.60
         Total       30

Test Statistics(a,b)
              SALES
Chi-Square    16.529
df            2
Asymp. Sig.   .000
a. Kruskal Wallis Test
b. Grouping Variable: PROMOTION
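Here is a Python sketch of the H statistic with illustrative names. `kruskal_wallis` works from raw data and applies the usual correction for tied ranks. `h_from_rank_sums` uses the uncorrected formula. The rank sums implied by the output (10 × mean rank: 235, 154, 76) give H ≈ 16.31 without the tie correction. The reported 16.529 includes that correction, which needs the raw data.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..N in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def h_from_rank_sums(rank_sums, sizes):
    """H = 12 / (N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), without tie correction."""
    n = sum(sizes)
    total = sum(r ** 2 / m for r, m in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def kruskal_wallis(groups):
    """Kruskal-Wallis H from raw data, divided by the usual tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    n = len(pooled)
    h = h_from_rank_sums(rank_sums, [len(g) for g in groups])
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))        # toy data, no ties
print(round(h_from_rank_sums([235, 154, 76], [10, 10, 10]), 2))  # 16.31
```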
K-Sample Median test
• It is a nonparametric test of the null hypothesis that the medians of the populations from which two or more samples are drawn are identical.
• The data in each sample are assigned to two groups: one consisting of data whose values are higher than the median of all samples combined, and the other consisting of data whose values are at the median or below.
• A Pearson's chi-squared test is then used to determine whether the observed frequencies in each sample differ from the expected frequencies derived from a distribution combining all the samples.
K-Sample Median test
Frequencies
                         PROMOTION
                         1     2     3
SALES   > Median         9     4     1
        <= Median        1     6     9

Test Statistics(a)
              SALES
N             30
Median        6.00
Chi-Square    13.125(b)
df            2
Asymp. Sig.   .001
a. Grouping Variable: PROMOTION
b. 3 cells (50.0%) have expected frequencies less than 5. The minimum expected cell frequency is 4.7.
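Here is a Python sketch of the median test (the function name is illustrative). The sales values in the usage lines are hypothetical. They were chosen only so that the above/at-or-below counts match the frequencies table (9/1, 4/6, 1/9); they are not the study's data.

```python
import statistics


def median_test(groups):
    """k-sample median test.

    Each value is classified as above the pooled median or at/below it, and a
    Pearson chi-square is computed on the resulting 2 x k table.
    Returns (pooled median, 2 x k table, chi-square, df).
    """
    pooled = [v for g in groups for v in g]
    med = statistics.median(pooled)
    above = [sum(v > med for v in g) for g in groups]
    at_or_below = [len(g) - a for g, a in zip(groups, above)]
    table = [above, at_or_below]
    n = len(pooled)
    row_totals = [sum(row) for row in table]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * len(groups[j]) / n  # expected count
            chi2 += (fo - fe) ** 2 / fe
    return med, table, chi2, len(groups) - 1


promo_high = [6] + [8] * 9
promo_medium = [5, 5, 5, 5, 6, 6, 7, 7, 7, 7]
promo_low = [3] * 9 + [9]
med, table, chi2, df = median_test([promo_high, promo_medium, promo_low])
print(med, table, round(chi2, 3), df)  # 6.0 [[9, 4, 1], [1, 6, 9]] 13.125 2
```

Because the test uses only the counts in the 2 x k table, any data with the same counts give the reported chi-square of 13.125.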
• The Kruskal-Wallis test is more powerful than the k-sample median test, as it uses the rank value of each case, not merely its location relative to the median.
• However, if there are a large number of tied rankings in the data, the k-sample median test may be a better choice.
BUSINESS RESEARCH METHOD
Session: 11
CORRELATION
Product Moment Correlation
• The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X & Y.
– It is an index used to determine whether a linear or straight-line relationship exists between X and Y.
– Proposed by Karl Pearson; also known as the Pearson correlation coefficient.
– Also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.
• r varies between -1.0 & +1.0.
• r between two variables will be the same regardless of their underlying units of measurement.
A Nonlinear Relationship for Which r = 0
Figure: scatter plot of Y (axis 0 to 6) against X (axis -3 to 3).
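Here is a short Python sketch of r, computed from deviations about the means (the function name is illustrative). The usage lines check two claims made above. First, points lying exactly on Y = X² for X from -3 to 3 give r = 0 despite a perfect nonlinear relationship; this made-up set mimics the figure but is not its actual data. Second, a hypothetical change of units for X leaves r unchanged.

```python
import math


def pearson_r(x, y):
    """Product moment correlation from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(x, [v ** 2 for v in x]))  # 0.0: perfectly related, but not linearly
y = [1, 3, 2, 5, 4, 6, 8]
x_new_units = [2.54 * v + 10 for v in x]  # hypothetical change of units for X
print(abs(pearson_r(x, y) - pearson_r(x_new_units, y)) < 1e-9)  # True
```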
Partial Correlation
• The partial correlation coefficient measures the association between two variables after controlling for, or adjusting for, the effects of one or more additional variables:
$$r_{xy.z} = \frac{r_{xy} - (r_{xz})(r_{yz})}{\sqrt{1 - r_{xz}^2}\,\sqrt{1 - r_{yz}^2}}$$
• Partial correlations have an order associated with them. The order indicates how many variables are being adjusted or controlled.
Partial Correlation: Example
• The correlation between cereal consumption & income is 0.28;
– the correlation between income & household size is 0.48;
– the correlation between cereal consumption & household size is 0.56.
• The first-order partial correlation between cereal consumption & income, when the effect of household size is held constant, is 0.02.
• The special case in which a partial correlation is larger than its respective zero-order correlation involves a suppressor effect.
Part Correlation Coefficient
• The part correlation coefficient represents the correlation between Y & X when the linear effects of the other independent variables have been removed from X but not from Y. The part correlation coefficient, r_y(x.z), is calculated as:
$$r_{y(x.z)} = \frac{r_{xy} - r_{yz}\, r_{xz}}{\sqrt{1 - r_{xz}^2}}$$
• The partial correlation coefficient is generally viewed as more important than the part correlation coefficient.
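Both formulas can be checked against the cereal example with a short Python sketch (function names illustrative). It takes X = income, Y = cereal consumption and Z = household size.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z (Z removed from both X and Y)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def part_corr(r_xy, r_xz, r_yz):
    """Part correlation r_y(x.z): Z removed from X only."""
    return (r_xy - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)


r_xy, r_xz, r_yz = 0.28, 0.48, 0.56
print(round(partial_corr(r_xy, r_xz, r_yz), 2))  # 0.02
print(round(part_corr(r_xy, r_xz, r_yz), 3))
```

Household size accounts for almost all of the zero-order correlation of 0.28 between income and cereal consumption.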
Nonmetric Correlation
• Spearman's rho & Kendall's tau are two measures of nonmetric correlation.
– Both measures use rankings rather than the absolute values of the variables. Both vary from -1.0 to +1.0.
• When the data contain a large number of tied ranks, Kendall's τ seems more appropriate; otherwise Spearman's rho should be preferred.
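Spearman's rho equals Pearson's r computed on the ranks, with average ranks for ties. Here is a Python sketch under that definition; the names are illustrative, and Kendall's tau is not covered.

```python
import math


def average_ranks(values):
    """Rank values 1..N in increasing order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of X and of Y."""
    return pearson_r(average_ranks(x), average_ranks(y))


print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0: monotonic, though not linear
print(spearman_rho([1, 2, 3], [30, 20, 10]))      # -1.0
```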
REGRESSION
Regression
$$Y_i = \beta_0 + \beta_1 X_i + e_i$$
• For which type of research design is regression analysis used?
Regression
• Regression examines associative relationships between a metric dependent variable & one or more independent variables (it does not imply or assume any causality) in the following ways:
– Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
– Determine how much of the variation in the dependent variable can be explained by the independent variables: the strength of the relationship.
– Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables.
– Control for other independent variables when evaluating the contributions of a specific variable or set of variables.
– Predict the values of the dependent variable.
Also sometimes called a predictive technique.
Plot Scatter Diagram
• A scatter diagram, or scattergram, is a plot of the values of two variables for all cases or observations.
• The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure.
• In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum e_j^2$.
Plot of Attitude with Duration
Figure: scatter plot of Attitude against Duration of Residence.
Which Straight Line is Best?
Figure: four candidate straight lines (Line 1 to Line 4) drawn through the same scatter plot.
Bivariate Regression
Figure: fitted line Y = β0 + β1X through observations at X1 to X5; the error e_j is the vertical distance between an observed Y_j and the line.
Bivariate Regression
Multiple R       0.93608
R²               0.87624
Adjusted R²      0.86387
Standard Error   1.22329

ANALYSIS OF VARIANCE
             df   Sum of Squares   Mean Square
Regression   1    105.95222        105.95222
Residual     10   14.96444         1.49644
F = 70.80266     Significance of F = 0.0000

VARIABLES IN THE EQUATION
Variable     b         SEb       Beta (β)   T       Significance of T
Duration     0.58972   0.07008   0.93608    8.414   0.0000
(Constant)   1.07932   0.74335              1.452   0.1772
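Here is a Python sketch of the least-squares fit and the statistics in the bivariate output (names illustrative). The attitude and duration data are not reproduced here, so the first usage line uses toy data. The second line checks the reported F from R² with n = 12, inferred from the residual df of 10.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of Y = a + bX with R^2, t for b and standardized beta."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx                  # slope that minimizes the sum of squared errors
    a = my - b * mx                # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of b, n - 2 df
    return {
        "a": a,
        "b": b,
        "r2": 1 - ss_res / syy,
        "t": b / se_b,
        "beta": b * math.sqrt(sxx / syy),     # B = b * (Sx / Sy)
    }


def f_from_r2(r2, n, k):
    """Overall F test: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # toy data
print(round(f_from_r2(0.87624, 12, 1), 1))  # about 70.8, as reported
```

From the output itself, t = b/SEb = 0.58972/0.07008 ≈ 8.41. In the bivariate case, Beta equals Multiple R (0.93608).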
Strength & Significance of Association
• Another, equivalent test for examining the significance of the linear relationship between X & Y (significance of b) is the test for significance of the coefficient of determination. The hypotheses in this case are:
H0: R²pop = 0
H1: R²pop > 0
Test for Significance
• The statistical significance of the linear relationship between X & Y may be tested by examining the hypotheses:
H0: β1 = 0
H1: β1 ≠ 0
• A t statistic with n - 2 degrees of freedom can be used, where
$$t = \frac{b}{SE_b}$$
• SEb denotes the standard deviation of b & is called the standard error.
Standardized regression coefficient
• Standardization is the process by which raw data are transformed into new variables that have a mean of 0 & a variance of 1. When the data are standardized, the intercept assumes a value of 0.
• The term beta coefficient or beta weight is used to denote the standardized regression coefficient. In bivariate regression:
$B_{yx} = B_{xy} = r_{xy}$
• Relationship between the standardized & non-standardized regression coefficients:
$B_{yx} = b_{yx}\,(S_x / S_y)$
Assumptions of Regression
• The error term is normally distributed.
• The mean of the error term is 0.
• The variance of the error term is constant. This variance does not depend on the values assumed by X.
• The error terms are uncorrelated, i.e., the observations have been drawn independently.
Multiple Regression
The general form of the multiple regression model is as follows:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + e$
which is estimated by the following equation:
$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$
• As before, the coefficient a represents the intercept, but the b's are now partial regression coefficients.
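One way to estimate a and the partial coefficients b1, ..., bk is to solve the normal equations (X'X)w = X'y. The sketch below does this in plain Python for illustration only. The data are hypothetical and were generated without an error term, so the recovered coefficients can be checked exactly. Statistical packages generally use more numerically robust methods, such as a QR decomposition.

```python
def solve(A, rhs):
    """Solve A w = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_multiple(xs, y):
    """Return [a, b1, ..., bk]; xs is a list of predictor columns.

    A column of 1s is prepended so that the first coefficient is the intercept a.
    """
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 0.5*X1 + 0.3*X2
x1 = [2, 4, 6, 9, 12, 15, 18]
x2 = [3, 1, 4, 1, 5, 9, 2]
y = [1 + 0.5 * u + 0.3 * v for u, v in zip(x1, x2)]
coef = fit_multiple([x1, x2], y)
print(coef)
```

With real data the fit is not exact. Each b_i then estimates the change in Y per unit change in X_i while the other predictors are held constant.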
Multiple Regression
Multiple R    0.97210
R²            0.94498
Adjusted R²   0.93276

ANALYSIS OF VARIANCE
             df   Sum of Squares   Mean Square
Regression    2        114.26425      57.13213
Residual      9          6.65241       0.73916

VARIABLES IN THE EQUATION
Variable     b         SEb       Beta (ß)   T       Significance of T
IMPORTANCE   0.28865   0.08608   0.31382    3.353   0.0085
DURATION     0.48108   0.05895   0.76363    8.160   0.0000
(Constant)   0.33732   0.56736              0.595   0.5668
Significance Testing
$H_0: R^2_{pop} = 0$
This is equivalent to the following null hypothesis:
$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
The overall test can be conducted by using an F statistic:
$$F = \frac{SS_{reg}/k}{SS_{res}/(n - k - 1)} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}$$
which has an F distribution with k and (n - k - 1) degrees of freedom.
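The sketch below plugs the multiple regression output into both forms of the F statistic. It uses only the printed sums of squares. The residual df of 9 with k = 2 implies n = 12.

```python
# Values from the multiple regression output (k = 2 predictors, residual df = 9)
ss_reg, ss_res = 114.26425, 6.65241
n, k = 12, 2

# Form 1: ratio of mean squares
f_from_ss = (ss_reg / k) / (ss_res / (n - k - 1))

# Form 2: the same statistic written in terms of R^2
r2 = ss_reg / (ss_reg + ss_res)
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

print(f"R2 = {r2:.5f}, F = {f_from_ss:.2f} (SS form), {f_from_r2:.2f} (R2 form)")
```

Both expressions give F of about 77.3 here, with 2 and 9 degrees of freedom.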
Testing for the significance of the $\beta_i$'s can be done in a manner similar to that in the bivariate case, by using t tests. The significance of the partial coefficient for importance attached to weather may be tested by the following equation:
$$t = \frac{b}{SE_b}$$
which has a t distribution with n - k - 1 degrees of freedom.
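The next sketch applies t = b / SE_b to each row of the multiple regression output, using n = 12 and k = 2.

```python
# b and SE_b copied from the multiple regression output
coefficients = {
    "IMPORTANCE": (0.28865, 0.08608),
    "DURATION": (0.48108, 0.05895),
    "(Constant)": (0.33732, 0.56736),
}
n, k = 12, 2
df = n - k - 1  # degrees of freedom for each t test

t_values = {name: b / se for name, (b, se) in coefficients.items()}
for name, t in t_values.items():
    print(f"{name:<11} t = {t:6.3f} (df = {df})")
```

The results reproduce the printed T column (3.353, 8.160, 0.595) up to rounding. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided p-value. This is an optional dependency, not part of the sketch.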
Relative Importance of Predictors
• Statistical significance.
• Square of the simple correlation coefficient.
• Square of the partial correlation coefficient.
• Measures based on standardized coefficients or beta weights.
• Stepwise regression.
Stepwise Regression
• The purpose of stepwise regression is to select, from a large number of predictor variables, a small subset of variables that account for most of the variation in the dependent or criterion variable.
– In this procedure, predictor variables enter or are removed from the regression equation one at a time.
– There are several approaches: forward inclusion, backward elimination, & stepwise solution.
Stepwise Regression
• Forward inclusion. Initially, there are no predictor variables in the regression equation. Predictor variables are entered one at a time, only if they meet certain criteria specified in terms of the F ratio. The order in which the variables are included is based on their contribution to the explained variance.
• Backward elimination. Initially, all predictor variables are included in the regression equation. Predictors are then removed one at a time based on the F ratio for removal.
• Stepwise solution. Forward inclusion is combined with the removal of predictors that no longer meet the specified criterion at each step.
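Forward inclusion can be sketched in plain Python as follows. At each step, the candidate with the largest F-to-enter is added, as long as that F reaches a threshold. The threshold `f_in = 4.0`, the helper names and the data are all assumptions for illustration. Statistical packages typically let the user set the entry criterion, either as an F value or as a probability.

```python
def solve(A, rhs):
    """Solve A w = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def sse_of(columns, y):
    """Residual sum of squares of y regressed on the columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    w = solve(xtx, xty)
    return sum((yi - sum(wi * ri for wi, ri in zip(w, r))) ** 2 for r, yi in zip(rows, y))


def forward_inclusion(predictors, y, f_in=4.0):
    """Enter predictors one at a time while the best F-to-enter is at least f_in."""
    n = len(y)
    selected = []
    current_sse = sse_of([], y)  # intercept-only model: total sum of squares
    while len(selected) < len(predictors):
        best = None
        for name, col in predictors.items():
            if name in selected:
                continue
            cols = [predictors[s] for s in selected] + [col]
            new_sse = sse_of(cols, y)
            f_enter = (current_sse - new_sse) / (new_sse / (n - len(cols) - 1))
            if best is None or f_enter > best[1]:
                best = (name, f_enter, new_sse)
        if best[1] < f_in:
            break
        selected.append(best[0])
        current_sse = best[2]
    return selected


# Hypothetical data: y depends on x1 only, plus a little noise
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [5, 3, 8, 1, 9, 2, 7, 4, 6, 10]
x3 = [3, 7, 2, 9, 4, 8, 1, 6, 5, 10]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
y = [2 + 1.5 * u + e for u, e in zip(x1, noise)]
predictors = {"x1": x1, "x2": x2, "x3": x3}
chosen = forward_inclusion(predictors, y)
print(chosen)
```

Backward elimination would run the same loop in reverse: start from all predictors and drop the one with the smallest F-to-remove while that F is below a removal threshold.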
27-02-2020
BUSINESS RESEARCH METHOD
Session: 12a
Caution about R²
• The value of R² can be “artificially” increased simply by adding explanatory variables to the regression model.
– When comparing two regression models with the same dependent variable ‘y’ but a different number of explanatory variables, the model with the higher R² value is not necessarily the better one.
• Adjusted R². R², the coefficient of multiple determination, is adjusted for the number of independent variables & the sample size to account for diminishing returns: after the first few variables, additional independent variables do not make much contribution.
Adjusted R²
• For comparing two regression models, it is advisable to compute adjusted R²:
$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(N - 1)}{N - K - 1}$$
where
• K is the number of independent variables in the model, excluding the constant.
• N is the number of observations (data points) in the sample.
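This sketch applies the formula to the two regression outputs given in these notes (N = 12 in both). It also applies it to a third, hypothetical model that adds a useless predictor.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for k predictors (constant excluded) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


duration_only = adjusted_r2(0.87624, n=12, k=1)        # bivariate output
duration_importance = adjusted_r2(0.94498, n=12, k=2)  # multiple regression output
# Hypothetical: a second predictor that nudges R^2 up only from 0.87624 to 0.87700
padded = adjusted_r2(0.87700, n=12, k=2)
print(round(duration_only, 5), round(duration_importance, 5), round(padded, 5))
```

The first two values reproduce the printed 0.86387 and 0.93276. In the hypothetical third case, R² rises slightly but adjusted R² falls (about 0.850), which is exactly the caution stated above.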
Residual Plot: Linear Relationship between Residuals & Time (Autocorrelation)
[Figure: Residuals plotted against Time]
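One simple numeric companion to this plot is the lag-1 autocorrelation of the residuals taken in time order. Values well above 0 indicate the kind of linear drift the plot shows. This measure is an assumed illustration, not taken from the slides, and the residual series below are hypothetical. As an aside, the Durbin-Watson statistic that regression software commonly reports is roughly 2(1 − r₁).

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one before it (time order)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den


# Hypothetical residual series, listed in time order
trending = [-1.5, -1.0, -0.6, -0.1, 0.2, 0.6, 1.0, 1.4]
alternating = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.5]
print(lag1_autocorrelation(trending), lag1_autocorrelation(alternating))
```

The trending series gives a clearly positive value (about 0.61). The alternating series gives a strongly negative one. Both violate the assumption that the error terms are uncorrelated.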
Multicollinearity
• Multicollinearity arises when intercorrelations among the predictors are very high.
• Some problems caused by multicollinearity:
– Partial regression coefficients may not be estimated precisely; their standard errors are likely to be high.
– The magnitudes, as well as the signs, of the partial regression coefficients may change from sample to sample.
– It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
– Predictor variables may be incorrectly included or removed in stepwise regression.
Multicollinearity: Correction
• A simple procedure for adjusting for multicollinearity consists of using only one of the variables in a highly correlated set of variables.
• Alternatively, the set of independent variables can be transformed into a new set of predictors that are mutually independent by using techniques such as principal components analysis.
• More specialized techniques, such as stepwise regression, can also be used.
THANK YOU
3/3/2020

Session-12b + 13a
Factor Analysis
An overview of Factor Analysis

• Consider these eight personality attributes:
– Assertive
– Talkative
– Dominant
– Influential
– Creative
– Imaginative
– Thoughtful
– Intellectual
• Is there redundancy?
• Can we reduce these eight concepts to more basic
dimensions?
An overview of Factor Analysis

1) Ask people to rate themselves on each term.
2) Compute the correlations among the terms: do people who score high on one attribute also score high on another?
Correlation Matrix
Factor Analysis
3) Interpret the pattern of correlations: what is related to what?
4) Identify groups of similar items (“factors”).
5) Psychologize: name the factors. What are the underlying dimensions?
What is Factor Analysis
• An interdependence technique: an entire set of interdependent relationships is examined without making a distinction between dependent & independent variables.
– It examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions.
• The procedures are primarily used for data reduction & summarization.
• It removes redundancy or duplication from a set of correlated variables.
Two forms of factor analysis
• Exploratory
– Let the data indicate what is going on, with no (or few) expectations.
• Confirmatory
– Evaluate specific, clearly articulated hypotheses about the correlational structure among variables.
– Obtain “fit” indices & significance tests.
Data Matrix
• Factor analysis is totally dependent on the correlations between variables.
• Factor analysis summarizes the correlation structure.
[Figure: the data matrix (observations O1 to On by variables v1 to vk) is condensed into a correlation matrix (v1 to vk by v1 to vk) and then into a factor matrix (variables v1 to vk by factors F1 to Fj)]
Designing Factor Analysis

• Variables to be included in factor analysis should be
specified based on past research, theory, & judgment of
researcher.
• Variables should be appropriately measured on an
interval or ratio scale.
• An appropriate sample size should be used. As a rough
guideline, there should be at least four or five times as
many observations (sample size) as there are variables.
Exercise: Factor Analysis

• Six Variables measured on 1-7 scale
– V1: Prevents Cavities
– V2: Shiny Teeth
– V3: Strengthen Gums
– V4: Freshens Breath
– V5: Tooth Decay Unimportant
– V6: Attractive Teeth
Statistics Associated with Factor Analysis
• Bartlett's test of sphericity is a test statistic used to examine the hypothesis that the variables are uncorrelated in the population.
– In other words, the population correlation matrix is an identity matrix: each variable correlates perfectly with itself (r = 1) but has no correlation with the other variables (r = 0).
• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is an index used to examine the appropriateness of factor analysis.
– High values (between 0.5 and 1.0) indicate that factor analysis is appropriate.
– Values below 0.5 imply that factor analysis may not be appropriate.
Correlation Matrix
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy             .660
Bartlett's Test of Sphericity    Approx. Chi-Square      111.314
                                 df                           15
                                 Sig.                       .000
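The slides do not give formulas for these two statistics. The sketch below assumes the standard textbook versions, written in plain Python:

- Bartlett's χ² = −[(n − 1) − (2p + 5)/6] · ln|R| with p(p − 1)/2 degrees of freedom. The reported df of 15 is p(p − 1)/2 for the p = 6 toothpaste variables.
- KMO = Σr² / (Σr² + Σa²) over off-diagonal pairs, where a is the anti-image (partial) correlation obtained from R⁻¹.

The 3 × 3 matrix and n = 30 are hypothetical.

```python
import math


def inverse_and_det(A):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row swap flips the sign of the determinant
        pv = M[col][col]
        det *= pv
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [rv - factor * cv for rv, cv in zip(M[r], M[col])]
    return [row[n:] for row in M], det


def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and df for a p x p correlation matrix."""
    p = len(R)
    _, det = inverse_and_det(R)
    chi2 = -((n - 1) - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2


def kmo(R):
    """Overall KMO: squared correlations vs. squared anti-image correlations."""
    inv, _ = inverse_and_det(R)
    p = len(R)
    sum_r2 = sum_a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += R[i][j] ** 2
                sum_a2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_a2)


# Hypothetical correlation matrix of three variables from n = 30 respondents
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
chi2, df = bartlett_sphericity(R, n=30)
print(f"Bartlett chi-square = {chi2:.2f}, df = {df}, KMO = {kmo(R):.3f}")
```

For an identity matrix ln|R| = 0, so the statistic is 0. This is why a large χ² (like the reported 111.314) rejects the hypothesis of uncorrelated variables. If SciPy is installed, `scipy.stats.chi2.sf(chi2, df)` gives the significance level.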
Determine Number of Factors
• Determination Based on Eigenvalues.
– Only factors with eigenvalues greater than 1.0 are retained.
• Determination Based on Percentage of Variance.
– It is recommended that the factors extracted should account for at least 60% of the variance.
• Determination Based on Scree Plot.
• A Priori Determination.
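The first two rules can be expressed as small Python functions. The eigenvalues are hypothetical values for six standardized variables (so they sum to 6). The 1.0 cut-off and the 60% target are the figures from the slide.

```python
def factors_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Eigenvalue rule: keep factors whose eigenvalue exceeds the cutoff."""
    return sum(1 for ev in eigenvalues if ev > cutoff)


def factors_by_variance(eigenvalues, target=0.60):
    """Smallest number of factors whose cumulative share of variance reaches target."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    cumulative = 0.0
    for count, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        if cumulative / total >= target:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues for six standardized variables
eigenvalues = [2.5, 1.9, 0.7, 0.4, 0.3, 0.2]
print(factors_by_eigenvalue(eigenvalues), factors_by_variance(eigenvalues))
```

Here both rules point to two factors: two eigenvalues exceed 1.0, and the first two factors explain about 73% of the variance. The rules need not agree in general, which is why the scree plot and a priori considerations are also listed.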
Terms Associated with Factor Analysis
• Eigenvalue. The eigenvalue represents the total variance explained by each factor.
• Percentage of variance. The percentage of the total variance attributed to each factor.
• Scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction.
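Eigenvalues of a correlation matrix, their percentage of variance, and the (component number, eigenvalue) pairs that a scree plot draws can be computed in plain Python. The sketch below uses the cyclic Jacobi rotation method, a standard algorithm chosen here for self-containment rather than what statistical packages necessarily use. The correlation matrix is hypothetical.

```python
import math


def jacobi_eigenvalues(A, tol=1e-18, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations (largest first)."""
    a = [[float(v) for v in row] for row in A]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J: update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A: update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlation matrix of three variables
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
eigenvalues = jacobi_eigenvalues(R)
p = len(R)
cumulative = 0.0
for number, ev in enumerate(eigenvalues, start=1):  # scree plot: number vs eigenvalue
    share = 100 * ev / p  # total variance of p standardized variables is p
    cumulative += share
    print(f"Component {number}: eigenvalue {ev:.3f}, "
          f"{share:.1f}% of variance, cumulative {cumulative:.1f}%")
```

The eigenvalues of a correlation matrix always sum to the number of variables. This is why "percentage of variance" is simply each eigenvalue divided by p.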
Results of Principal Components Analysis
Scree Plot
[Figure: scree plot of Eigenvalue (0.0 to 3.0) against Component Number (1 to 6)]
Terms Associated with Factor Analysis

• Factor loadings. Factor loadings are simple correlations
between variables & factors.
• Factor loading plot. A factor loading plot is a plot of the
original variables using the factor loadings as coordinates.
• Factor matrix. A factor matrix contains the factor loadings
of all the variables on all the factors extracted.
Factor Matrix Before & After Rotation
• Before rotation (high loadings marked X): variables 1, 3 & 6 each load highly on one factor, while variables 2, 4 & 5 load highly on both factors.
• After rotation: each of the six variables loads highly on only one of the two factors.
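The slides do not name a rotation method. Varimax, which maximizes the variance of the squared loadings within each factor, is a common choice and is assumed here. For two factors, an orthogonal rotation is a single angle. This sketch therefore uses a brute-force grid search over angles instead of the iterative algorithm statistical packages use. The "unrotated" loadings are built by rotating a hypothetical clean structure by 30 degrees.

```python
import math


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings (raw varimax)."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((v - mean) ** 2 for v in sq) / p
    return total


def rotate_two_factors(loadings, theta):
    """Orthogonally rotate a two-factor loading matrix by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]


def varimax_grid(loadings, steps=900):
    """Try angles in [0, 90 degrees) and keep the one with the largest criterion."""
    best_k = max(range(steps), key=lambda k: varimax_criterion(
        rotate_two_factors(loadings, k * (math.pi / 2) / steps)))
    return rotate_two_factors(loadings, best_k * (math.pi / 2) / steps)


# Hypothetical clean structure, then "unrotated" by mixing the two factors
simple = [[0.9, 0.0], [0.8, 0.0], [0.85, 0.0], [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]]
unrotated = rotate_two_factors(simple, math.radians(30))
rotated = varimax_grid(unrotated)
for before, after in zip(unrotated, rotated):
    print([round(v, 2) for v in before], "->", [round(v, 2) for v in after])
```

Before rotation, several variables load noticeably on both factors. After rotation, each loads highly on only one, which mirrors the before/after pattern above. Factor order and signs can flip, which does not change the interpretation.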
Unrotated Factors
Rotated Factors
Criticisms of Factor Analysis
• Derived factors are often obvious.
– Defense: but we get a quantification.
• “Garbage in, garbage out.”
– Really a criticism of the input variables.
• The correlation matrix is often a poor measure of association of the input variables.
• Labels of factors can be arbitrary or lack a scientific basis.
Thank You
3/3/2020

Session-13
Cluster Analysis
Cluster Analysis
• Techniques used to classify objects or cases into relatively homogeneous groups called clusters.
– They examine an entire set of interdependent relationships.
– There is no distinction between dependent & independent variables.
• Objects in each cluster tend to be similar to each other & dissimilar to objects in other clusters.
– There is no a priori information about group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.
An Ideal Clustering Situation
[Figure: cases plotted against Variable 1 and Variable 2]
A Practical Clustering Situation
[Figure: cases plotted against Variable 1 and Variable 2]
Select a Distance or Similarity Measure
• The most commonly used measure of similarity is the Euclidean distance or its square.
– The Euclidean distance is the square root of the sum of the squared differences in values for each variable.
• The use of different distance measures may lead to different clustering results (it is advisable to use different measures & compare the results).
• If variables are measured in vastly different units, the clustering solution will be influenced by the units of measurement.
– In these cases, the data must be standardized before clustering respondents.
– It is also desirable to eliminate outliers (cases with atypical values).
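The sketch below illustrates both points in Python. It computes the Euclidean distance between cases 1 and 2 of the attitudinal data set used in this session. It also shows column-wise standardization on a hypothetical age/income example, where the raw income gap would otherwise swamp the age gap. The helper names are illustrative.

```python
import math
import statistics


def euclidean(u, v):
    """Square root of the sum of squared differences across all variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def standardize_columns(rows):
    """Rescale every variable (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    params = [(statistics.fmean(c), statistics.stdev(c)) for c in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]


# Cases 1 and 2 on V1..V6
case_1 = [6, 4, 7, 3, 2, 3]
case_2 = [2, 3, 1, 4, 5, 4]
print(euclidean(case_1, case_2))  # 8.0 (squared distance 64)

# Hypothetical respondents: age in years and annual income
people = [[25, 40000], [30, 42000], [45, 90000]]
z = standardize_columns(people)
print(euclidean(people[0], people[1]), euclidean(z[0], z[1]))
```

Unstandardized, the first two people look 2,000 units apart almost entirely because of income. After standardization, age and income contribute on comparable scales.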
Clustering Procedure
Clustering procedures are either hierarchical or nonhierarchical.
• Hierarchical procedures are agglomerative or divisive.
• Agglomerative procedures include:
– Linkage methods: single, complete & average linkage
– Variance methods: Ward’s method
– Centroid methods
Select a Clustering Procedure
• Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure.
– Agglomerative clustering starts with each object in a separate cluster.
– Divisive clustering starts with all the objects grouped in a single cluster.
• Agglomerative methods are commonly used in research.
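Agglomerative Clustering Methods
• Single linkage: the distance between Cluster 1 and Cluster 2 is the minimum distance between their members.
• Complete linkage: the distance is the maximum distance between members of Cluster 1 and Cluster 2.
• Average linkage: the distance is the average distance between members of Cluster 1 and Cluster 2.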

Ward’s Procedure
• Ward's procedure: for each cluster, the means for all the variables are computed. Then, for each object, the squared Euclidean distance to the cluster means is calculated, and these distances are summed over all the objects. At each stage, the two clusters whose merger gives the smallest increase in the overall within-cluster sum of squares are combined.
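The following Python sketch shows how the linkage rules and Ward's criterion score the same pair of clusters. Cluster A holds cases 1 and 3 of this session's attitudinal data, and cluster B holds cases 2 and 13. The grouping is chosen only for illustration. Ward's cost is computed with the standard identity that merging A and B raises the within-cluster sum of squares by |A||B|/(|A|+|B|) times the squared distance between their centroids.

```python
import itertools
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage(A, B):    # nearest pair of members
    return min(euclidean(a, b) for a, b in itertools.product(A, B))


def complete_linkage(A, B):  # farthest pair of members
    return max(euclidean(a, b) for a, b in itertools.product(A, B))


def average_linkage(A, B):   # mean over all cross-cluster pairs
    return sum(euclidean(a, b) for a, b in itertools.product(A, B)) / (len(A) * len(B))


def centroid(cluster):
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def ward_increase(A, B):
    """Increase in the within-cluster sum of squares if A and B are merged."""
    ca, cb = centroid(A), centroid(B)
    return len(A) * len(B) / (len(A) + len(B)) * sum((x - y) ** 2 for x, y in zip(ca, cb))


A = [[6, 4, 7, 3, 2, 3], [7, 2, 6, 4, 1, 3]]  # cases 1 and 3
B = [[2, 3, 1, 4, 5, 4], [2, 2, 1, 5, 4, 4]]  # cases 2 and 13
print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B), ward_increase(A, B))
```

An agglomerative run evaluates one of these scores for every pair of current clusters and merges the best pair. Only the scoring rule differs between the methods.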
Centroid Method
Select a Clustering Procedure
• Of the hierarchical methods, average linkage & Ward's method have been shown to perform better than the other procedures.
• It has been suggested that hierarchical & nonhierarchical methods be used in tandem.
Formulate the Problem
• The most important part of clustering is selecting the variables on which the clustering is based.
– Inclusion of even one or two irrelevant variables may distort an otherwise useful clustering solution.
– Basically, the set of variables selected should describe the similarity between objects in terms that are relevant to the research problem.
• Variables should be selected based on past research, theory, or a consideration of the hypotheses being tested.
– In exploratory research, the researcher should exercise judgment & intuition.
Attitudinal Data For Clustering
Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7 2
Term Associated with Cluster Analysis

• Agglomeration schedule. An agglomeration schedule gives
information on objects or cases being combined at each stage of
a hierarchical clustering process.
Results of Hierarchical Clustering
Agglomeration Schedule Using Ward’s Procedure
        Clusters Combined                        Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficient   Cluster 1   Cluster 2   Next Stage
1 14 16 1.000 0 0 6
2 6 7 2.000 0 0 7
3 2 13 3.500 0 0 15
4 5 11 5.000 0 0 11
5 3 8 6.500 0 0 16
6 10 14 8.160 0 1 9
7 6 12 10.167 2 0 10
8 9 20 13.000 0 0 11
9 4 10 15.583 0 6 12
10 1 6 18.500 6 7 13
11 5 9 23.000 4 8 15
12 4 19 27.750 9 0 17
13 1 17 33.100 10 0 14
14 1 15 41.333 13 0 16
15 2 5 51.833 3 11 18
16 1 3 64.500 14 5 19
17 4 18 79.667 12 0 18
18 2 4 172.662 15 17 19
19 1 2 328.600 16 18 0
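The following Python sketch implements Ward's procedure on the 20 cases from the attitudinal data table. Each coefficient is the cumulative within-cluster sum of squares after each merge. Two entries can be checked against the schedule above. The first merge costs 1.000, because the closest pairs of cases differ by 1 on two variables. The last coefficient, 328.600, is the total sum of squares of all 20 cases about the grand means. Intermediate stages are not checked here, since tied merge costs (among other things) can change the merge order. The function name is illustrative.

```python
DATA = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6],
    [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7],
    [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]


def ward_schedule(data):
    """Agglomerative clustering with Ward's criterion.

    Returns one (cluster_a, cluster_b, coefficient) tuple per stage; clusters are
    lists of 1-based case numbers and the coefficient is the cumulative
    within-cluster sum of squares after the merge.
    """
    n_vars = len(data[0])
    clusters = [[case] for case in range(1, len(data) + 1)]

    def centroid(members):
        return [sum(data[c - 1][v] for c in members) / len(members) for v in range(n_vars)]

    def merge_cost(a, b):
        # Increase in the within-cluster sum of squares caused by merging a and b
        ca, cb = centroid(a), centroid(b)
        return len(a) * len(b) / (len(a) + len(b)) * sum((x - y) ** 2 for x, y in zip(ca, cb))

    total, schedule = 0.0, []
    while len(clusters) > 1:
        cost, i, j = min((merge_cost(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        total += cost
        schedule.append((sorted(clusters[i]), sorted(clusters[j]), total))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return schedule


schedule = ward_schedule(DATA)
for stage, (a, b, coefficient) in enumerate(schedule, start=1):
    print(f"{stage:>2}  {a} + {b}  {coefficient:8.3f}")
```

One reading of the printed schedule is that the coefficient jumps sharply from 79.667 (stage 17) to 172.662 (stage 18). This suggests stopping before that merge, i.e., at three clusters.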
• Dendrogram. A dendrogram, or tree graph, is a graphical device for displaying clustering results. Vertical lines represent clusters that are joined together. The position of a line on the scale indicates the distance at which clusters were joined. The dendrogram is read from left to right.
• Icicle diagram. An icicle diagram is a graphical display of clustering results, so called because it resembles a row of icicles hanging from the eaves of a house. Columns correspond to the objects being clustered, & rows correspond to the number of clusters. An icicle diagram is read from bottom to top.
Vertical Icicle Plot using Ward’s Method
Results of Hierarchical Clustering
Cluster Membership of Cases Using Ward’s Procedure
        Number of Clusters
Case    4   3   2
1 1 1 1
2 2 2 2
3 1 1 1
4 3 3 2
5 2 2 2
6 1 1 1
7 1 1 1
8 1 1 1
9 2 2 2
10 3 3 2
11 2 2 2
12 1 1 1
13 2 2 2
14 3 3 2
15 1 1 1
16 3 3 2
17 1 1 1
18 4 3 2
19 3 3 2
20 2 2 2
Decide on Number of Clusters
• In hierarchical clustering, the distances at which clusters are combined can be used as criteria.
– This information can be obtained from the agglomeration schedule or from the dendrogram.
• The relative sizes of the clusters should be meaningful.
• Theoretical, conceptual, or practical considerations may suggest a certain number of clusters.
Interpreting & Profiling Clusters
• Interpreting & profiling clusters involves examining the cluster centroids.
– Centroids enable us to describe each cluster by assigning it a name or label.
• It is often helpful to profile the clusters in terms of variables that were not used for clustering.
– These may include demographic, psychographic, product usage, media usage, or other variables.
Cluster Centroids
Means of Variables
Cluster No. V1 V2 V3 V4 V5 V6
1 5.750 3.625 6.000 3.125 1.875 3.875
2 1.667 3.000 1.833 3.500 5.500 3.333
3 3.500 5.833 3.333 6.000 3.500 6.000
Results of Nonhierarchical Clustering
Initial Cluster Centers
Cluster
1 2 3
V1 4 2 7
V2 6 3 2
V3 3 2 6
V4 7 4 4
V5 2 7 1
V6 7 2 3
Iteration History (a)
Change in Cluster Centers
Iteration 1 2 3
1 2.154 2.102 2.550
2 0.000 0.000 0.000
a. Convergence achieved due to no or small distance
change. The maximum distance by which any center
has changed is 0.000. The current iteration is 2. The
minimum distance between initial centers is 7.746.
Cluster Membership
Case Number Cluster Distance

1 3 1.414
2 2 1.323
3 3 2.550
4 1 1.404
5 2 1.848
6 3 1.225
7 3 1.500
8 3 2.121
9 2 1.756
10 1 1.143
11 2 1.041
12 3 1.581
13 2 2.598
14 1 1.404
15 3 2.828
16 1 1.624
17 3 2.598
18 1 3.555
19 1 2.154
20 2 2.102
Final Cluster Centers
Cluster
1 2 3
V1 4 2 6
V2 6 3 4
V3 3 2 6
V4 6 4 3
V5 4 6 2
V6 6 3 4
Distances between Final Cluster Centers
Cluster 1 2 3
1 5.568 5.698
2 5.568 6.928
3 5.698 6.928
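The nonhierarchical results above are consistent with Lloyd's k-means algorithm started from the printed initial centres (which equal cases 19, 20 and 3). Lloyd's algorithm is assumed here; the slides do not say which k-means variant was used. The sketch in plain Python repeats assign-to-nearest-centre and move-centre-to-mean until nothing changes. On this data it converges at iteration 2. It reproduces the cluster memberships, the integer-rounded final centres, and the 5.568 distance between centres 1 and 2.

```python
import math

DATA = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6],
    [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7],
    [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]
# Starting centres as printed under "Initial Cluster Centers"
INITIAL_CENTRES = [[4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2], [7, 2, 6, 4, 1, 3]]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_means(data, initial_centres, max_iter=20):
    """Lloyd's k-means: assign cases to the nearest centre, then move each centre
    to the mean of its cases, until the centres stop changing."""
    centres = [[float(v) for v in c] for c in initial_centres]
    for _ in range(max_iter):
        labels = [min(range(len(centres)), key=lambda k: sq_dist(x, centres[k]))
                  for x in data]
        new_centres = []
        for k in range(len(centres)):
            # Assumes no cluster becomes empty (true for this data set)
            members = [x for x, lab in zip(data, labels) if lab == k]
            new_centres.append([sum(col) / len(members) for col in zip(*members)])
        if new_centres == centres:
            break
        centres = new_centres
    return [lab + 1 for lab in labels], centres


labels, centres = k_means(DATA, INITIAL_CENTRES)
print("Cluster membership:", labels)
for k, c in enumerate(centres, start=1):
    print(f"Final centre {k}:", [round(v, 3) for v in c])
print("Distance between centres 1 and 2:", round(math.dist(centres[0], centres[1]), 3))
```

The final centres are the cluster means. They coincide with the Ward's three-cluster centroids, only with the cluster numbers permuted, because both methods end up with the same partition here.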
ANOVA
           Cluster               Error
Variable   Mean Square   df      Mean Square   df   F   Sig.
V1 29.108 2 0.608 17 47.888 0.000
V2 13.546 2 0.630 17 21.505 0.000
V3 31.392 2 0.833 17 37.670 0.000
V4 15.713 2 0.728 17 21.585 0.000
V5 22.537 2 0.816 17 27.614 0.000
V6 12.171 2 1.071 17 11.363 0.001
The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this, and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.
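Each row of this table is a one-way ANOVA of one variable across the three k-means clusters. The sketch below recomputes the rows in Python from the attitudinal data and the cluster membership listed above. As the note says, the resulting F values are descriptive only.

```python
DATA = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6],
    [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7],
    [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]
# Cluster membership of cases 1-20 from the nonhierarchical (k-means) output
MEMBERSHIP = [3, 2, 3, 1, 2, 3, 3, 3, 2, 1, 2, 3, 2, 1, 3, 1, 3, 1, 1, 2]


def cluster_anova(data, labels):
    """Per-variable (MS between, MS within, F) across clusters; descriptive use only."""
    groups = sorted(set(labels))
    n, g = len(data), len(groups)
    results = []
    for v in range(len(data[0])):
        values = [row[v] for row in data]
        grand = sum(values) / n
        ss_between = ss_within = 0.0
        for k in groups:
            vals = [x for x, lab in zip(values, labels) if lab == k]
            mean_k = sum(vals) / len(vals)
            ss_between += len(vals) * (mean_k - grand) ** 2
            ss_within += sum((x - mean_k) ** 2 for x in vals)
        ms_between = ss_between / (g - 1)  # df = 2 here
        ms_within = ss_within / (n - g)    # df = 17 here
        results.append((ms_between, ms_within, ms_between / ms_within))
    return results


anova = cluster_anova(DATA, MEMBERSHIP)
for v, (msb, msw, f) in enumerate(anova, start=1):
    print(f"V{v}: cluster MS = {msb:.3f}, error MS = {msw:.3f}, F = {f:.3f}")
```

The recomputed rows match the printed table (for example V1: 29.108, 0.608, F = 47.888).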
Number of Cases in each Cluster

Cluster 1 6.000
2 6.000
3 8.000
Valid 20.000
Missing 0.000
Ethical Issue in Research
Ethical Issues in the Research Process
• Central to research in the social sciences is the inclusion of living organisms as research subjects.
• This imposes an obligation to treat these organisms in a humane, respectful & ethical manner.
Ethical Issues: Nuremberg Code
• Participation in research should be voluntary.
• Participants have the right to know the nature, purposes & duration of the research.
• The researcher must ensure that participants are not exposed to harmful research practices.
• The research can be terminated by either the participant or the researcher if it becomes obvious to either that continuing the experiment could be unacceptable.
Ethical Issues: American Psychological Association
• Informed consent must include:
– The purpose, expected duration, & procedures of the research
– The right to decline to participate & to withdraw from the research once participation has begun
– The foreseeable consequences of declining or withdrawing
– Reasonably foreseeable factors that may be expected to influence their willingness to participate, such as potential risks, discomfort, or adverse effects
– Any prospective research benefits
– Limits of confidentiality; incentives for participants
– Whom to contact for questions about the research & research participants’ rights
• Consent must be obtained for recording.
• Steps taken to protect prospective participants.
Fraud in Research
• Data Fabrication
– Making up data or results and reporting them
• Falsification
– Manipulating research materials, equipment, or processes, or
changing or omitting data or results such that research is not
accurately represented in research record.
• Plagiarism
– Appropriation of another person’s idea, processes, results or words
without giving appropriate credit
Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur
