
1/7/2020

Business Research Method

Session-1

XLRI- Xavier School of Management, Jamshedpur

Course Outline: Brief


• Functional Knowledge
– Quizzes : 30%
– Class Participation/Assignment : 10%
– Group Project : 30%
– End Term : 30%
Note:
– Class participation will be assessed by the instructor(s) based on class
preparation, meaningful participation, sincerity, discipline, regular
attendance, and general behavior in the class. Disruptive attendance and
course-related indiscipline will be penalized.

– Group Project: Groups of 6 members will each work on a topic. Project
submission guidelines & submission dates will be communicated later.

• Grading will be as per institute norms.


Class Conduct: Guideline


• Discipline in the class will be guided by the student manual of the institute.
• Use of mobile phones is not allowed in the class. Please respect others in
the class by turning off mobile phones & other electronic devices.
• Please refrain from activities that disturb the class (e.g., late arrivals
to class, cross-talking, movement during the class, or any other
disturbing activity).
• Please attend BRM sessions only with your own section (please don't
attend the BRM session of another section).
– However, if your situation is compelling (other than medical), you may attend a BRM
session with another section only after obtaining permission by email, with 'NO attendance' for
that class. Medical issues will be treated differently.

Discussion

Can you share any example of research?


“I don’t know if we should change the packaging of Colgate toothpaste.”

Discussion

Why should we conduct research?


Research… provides information to guide decisions

Research… reduces risk in decision making


Research: Different Terms


• Business Research Method
• Social Research Method
• Market Research
• Social Science Research

Discussion

What is Research?


Defining Research…
• Systematic investigation into and study of materials &
sources in order to establish facts & reach new conclusions
(Oxford Dictionary).

• A studious inquiry or examination (Merriam-Webster Online Dictionary).

• Systematic and objective process of gathering, recording,
and analyzing data for aid in making business decisions
(Zikmund, 2007).

• Systematic enquiry that provides information to guide
managerial decisions (Cooper & Schindler, 2009).


Defining Research
• Research is the systematic & objective
– Identification (of information)
– Collection (of information)
– Analysis (of information)
– Dissemination (of information) &
– Use of information
• for improving decision making related to…
– Identification and Solution of problems & opportunities
in business


Defining Research

Information is used to:
• Identify & define opportunities and problems
• Generate, refine, & evaluate (managerial) actions
• Monitor performance (of the firm or any other entity)
• Improve understanding of the process


Summary of pointers about Research


• Research is all about finding something, the absence of which may
distort our ability to take informed decisions.

• The ability to take an informed decision comes from a systematic
study conducted through various interrelated stages.

• All steps of the research process are information-centric.

• All steps in research are interrelated, & no activity is launched
independently without considering decisions made at previous stages.


Research Suppliers & Services

• Internal
• External
– Full Service: Syndicated Services; Customized Services; Internet Services
– Limited Service: Field Services; Focus Groups & Qualitative Services;
Technical & Analytical Services; Other Services


Research Classification

Discussion:

How can we classify Research?


A Classification of Research

Research
• Problem Identification Research: to help identify problems which are not
necessarily apparent on the surface & yet exist or are likely to arise in
the future.
• Problem Solving Research: to help solve specific problems.


Research Classification: Discussion


• Research on
– Market Potential Research; Market Share Research; Market
Characteristics Research; Sales Analysis Research; Forecasting
Research; Business Trends Research

• Should McDonald's add Italian pasta dinners to its menu?
– To assess preference for Italian pasta dinners among the target group (TG).

• Should P&G add a high-priced, low-foam detergent powder kit to its
product line?
– To identify/assess customer preference for low-foam detergent powder.


Research Classification: Discussion


• Why has SBI’s market share of education loans been decreasing in
recent years, and what steps can be taken to improve market share?
– To identify factors influencing education loan purchase
– To assess SBI’s performance on the criteria of education loan purchase
– To identify methods of improving those parameters.


Steps involved in Research

Discussion

What are the broad steps of research?


Steps of Research Process

Step 1: Defining the Problem

Step 2: Developing an Approach to the Problem

Step 3: Formulating a Research Design

Step 4: Doing Field Work or Collecting Data

Step 5: Preparing and Analyzing Data

Step 6: Preparing and Presenting the Report


Problem Definition

“The truly serious mistakes are made not as a result of wrong answers but because of asking wrong questions”
… Peter Drucker


Problem Definition

• The most important step in research

• Problem definition covers the purpose of the study, relevant
background information, the information needed, and how it
will be used in decision making.


Why is it important to clearly define problem?


• Because problem definition sets the course of the entire
project.
• Because the client is paying for the research, both client and
researcher need to know what to expect.
• The problem definition process provides guidelines on how to
correctly define the research problem.
• Because mistakes made at this level grow into larger, more
expensive mistakes later.
• All the effort, time & money spent from this point on will be
wasted if the problem is not properly defined.


Problem Definition: Genesis


• Drivers for problem formulation:
– Unanticipated change, mainly in the environment of the
focal firm
– Planned change (estimation, effects, outcome)
– Serendipity (random ideas or information)

• Situation narration by management


Exercise before Problem Definition


1. Discussion with Decision maker
2. Discussion with Industry expert
3. Secondary Data
4. Qualitative Research


Environmental Context of the Problem

Past Information & Forecasts

Resources & Constraints

Objectives

Buyer Behavior

Legal Environment

Economic Environment

Marketing & Technological Skills


Management Decision Problem (MDP)


• A statement specifying the type of managerial action
required to solve the problem.
– It asks what a decision maker needs to do.
– It is action oriented.
– It focuses on symptoms.

• The problem faced by the decision maker, for which
research is intended to provide answers or information.


Research Problem (RP)


• A statement specifying the type of information needed by
the decision maker to help solve the management decision
problem and how the information can be obtained
efficiently & effectively.
– It asks what information is needed & how it should be
obtained.
– It is information oriented.
– It focuses on underlying causes.

• A statement of the decision problem in research terms


MDP vs. RP: Illustration


MDP: Should a new product be introduced?
RP: To determine consumer preferences and purchase intentions for the
proposed new product.


Problem Definition
• The MDP asks what the decision maker needs to do, whereas

• the RP asks what information is needed & how it can be
obtained effectively & efficiently.


Problem Definition: Steps Involved


• Understanding the genesis of the problem
• Conducting the four must-do exercises
• Considering the environmental context of the problem
• Developing the MDP & relevant RP(s)

1/10/2020

Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur


ravishekhar@xlri.ac.in
Session-2

Problem Definition: Steps Involved


• Understanding the genesis of the problem
• Conducting the four must-do exercises
• Considering the environmental context of the problem
• Developing the MDP & relevant RP(s)
– The MDP asks what the decision maker needs to do, whereas
– the RP asks what information is needed & how it can be obtained
effectively & efficiently.


MDP vs. RP: Illustration


MDP: Should a new product be introduced?
RP: To determine consumer preferences and purchase intentions for the
proposed new product.

MDP: Should the advertising campaign be changed?
RP: To determine the effectiveness of the current advertising campaign.

MDP vs. RP: Illustration


MDP: Should the price of the brand be increased?
RP: To determine the price elasticity of demand and the impact on sales
and profits of various levels of price changes.

MDP: Should management share an explicit career-development plan with
new recruits?
RP: To identify the merits and demerits of sharing an explicit
career-development plan with new recruits.
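The price RP above asks for the price elasticity of demand, i.e. the percentage change in quantity demanded divided by the percentage change in price. A minimal sketch using the common midpoint (arc) formula; the function name and the test-market prices and quantities are hypothetical, chosen only to illustrate the calculation.

```python
def arc_price_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Midpoint (arc) price elasticity of demand: %change in quantity / %change in price."""
    pct_change_quantity = (q2 - q1) / ((q1 + q2) / 2)  # relative to the average quantity
    pct_change_price = (p2 - p1) / ((p1 + p2) / 2)     # relative to the average price
    return pct_change_quantity / pct_change_price

# Hypothetical test market: price raised from 100 to 110, weekly units fall from 1000 to 900.
e = arc_price_elasticity(100, 1000, 110, 900)
revenue_before, revenue_after = 100 * 1000, 110 * 900
print(f"elasticity = {e:.2f}")
print("revenue:", revenue_before, "->", revenue_after)
```

With these made-up numbers the elasticity is about -1.11; because its magnitude exceeds 1, demand is elastic and revenue falls after the price increase. The impact on profits would additionally require cost data, which this sketch does not model.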


Problem Definition: Discussion


• Are these problems correct?
– Improve the company’s image
– Develop a suitable employee strategy for firm
– Improve the competitive position of the firm
– Develop a marketing strategy for the brand.
• Is the problem correct?
– How should the firm adjust its pricing given that a major competitor
has initiated price changes?
• (How should the firm respond to the competitor’s price changes?)

• Is the problem correct?
– What are the drivers of employee engagement, and what can be
done to improve employee engagement?

Developing an Approach to the Problem


• Focus:
– Developing more specific devices to address the components of the
research problems defined at the previous step

• Includes:
– Theory / objective evidence
– Analytical model (verbal/graphical/mathematical)
– Research question (define it)
– Hypotheses (the end product of this step)


Theory/Objective evidence
• Theory
– Examples of theories related to choice making:
• Theory of rationality
• Bounded rationality theory

• Objective Evidence
– Empirical observation (available in the literature)

Model: Theory of Planned Behavior

(Ajzen, 1985)


Model: Technology Acceptance Model

(Davis, 1989)

Research Question
• MDP: What should be done to improve the patronage of the Big
Bazar store?

RP: To determine the relative strengths and weaknesses of Big
Bazar, vis-à-vis other major competitors, with respect to
factors that influence store patronage.

→ Specific Question 1, Specific Question 2, …, Specific Question n

• Research questions (RQs):
Refined questions or statements of the specific components
of the (research) problem.


Possible Research Questions


• What criteria do households use when selecting department stores?
• How do households evaluate Big Bazar and competing stores in terms of
the choice criteria identified in the above question?
• Which stores are patronized when shopping for specific product
categories?
• What is the market share of Big Bazar and its competitors for specific
product categories?
• What is the demographic and psychological profile of the customers of
Big Bazar? Does it differ from the profile of customers of competing
stores?
• Can store patronage and preference be explained in terms of store
evaluations and customer characteristics?


MDP, RP & RQ: Exercise


• Management Decision Problem (MDP):
– Should Amul launch packaged sweet products?

• Research Problem (RP):
– To determine customer preference for Amul's packaged sweets

• Research Question (RQ):
– Are milk-based sweets popular among target customers?
– What is the perceived quality of packaged sweets?


Development of Research Questions

Research Problem
→ Objective/Theoretical Framework; Analytical Model
→ RQ-1: Specific Question 1; RQ-2: Specific Question 2; …; RQ-n: Specific Question n


Research Hypothesis
• An unproven statement or proposition about a factor or
phenomenon that is of interest to the researcher.
– Often, a hypothesis is a possible answer to the research
question.
– It is mostly about the relationship between two
variables/ two phenomena.
– It is an empirically testable statement.


Null vs Alternate Hypothesis


• Null Hypothesis (H0): a statement about a population that is
assumed to be true.
– … is generally assumed to be true until evidence indicates
otherwise.
– … is a statement of the status quo, one of no difference or no effect.
– If the null hypothesis is not rejected, no changes will be made.

• Alternate Hypothesis (H1): a statement that directly contradicts
the null hypothesis by stating the contrary about the population.
– … is one in which some difference or effect is expected.
– Statement that is hoped or expected to be true instead of the null
hypothesis.
– Accepting the alternative hypothesis will lead to changes in opinions
or actions.
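To make H0 and H1 concrete, here is a minimal sketch of a one-sided z-test for a single proportion using the normal approximation. The scenario (an earlier share of 0.87, a new sample with 152 of 200), the function name, and the 0.05 significance level are all hypothetical. Formally, the test outcome is stated as rejecting or not rejecting H0.

```python
import math

def one_sample_proportion_z_test(successes: int, n: int, p0: float):
    """Lower-tailed test of H0: p = p0 against H1: p < p0 (normal approximation)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # computed assuming H0 is true
    z = (p_hat - p0) / standard_error
    # Standard normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value

# Hypothetical study. H0: the share is still 0.87 (status quo, no change).
# H1: the share has fallen. A new random sample finds 152 of 200 respondents.
z, p_value = one_sample_proportion_z_test(successes=152, n=200, p0=0.87)
alpha = 0.05  # assumed significance level
decision = "reject H0 in favor of H1" if p_value < alpha else "do not reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.6f}: {decision}")
```

If the sample proportion had equalled 0.87, z would be 0 and the p-value 0.5, so H0 would not be rejected and, as the slide notes, no change would follow.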


Department Store: Illustration


• RQ
– Do the customers of Big Bazar exhibit store loyalty, and what are their
characteristics?

• Hypothesis
– H1: Customers who are store loyal are less knowledgeable about
the shopping environment.
– H2: Store-loyal customers are more risk-averse than non-loyal
customers.
– H3: Customers of Big Bazar are loyal. (Inappropriate hypothesis)


Research Questions & Hypotheses: Illustration


RQ: What is the lifestyle of consumers who purchase athletic
footwear based on image?
• Hypotheses:
– H1: Consumers who purchase athletic footwear based on image are
not price sensitive.
(A lifestyle typically reflects an individual's attitudes, way of life, values, or world
view. It can denote the attitudes, interests, opinions, behaviors, and behavioral
orientations.)

RQ: What is the lifestyle of the typical Nike consumer?


• Hypotheses:
– H1: The typical Nike consumer is ‘young and independent’.
– H2: The typical Nike consumer watches sports on television.

1/15/2020

Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur


ravishekhar@xlri.ac.in

Session-3

Problem Definition Process

Problem Situation → Genesis of Problem
→ Discussion with Decision Maker; Interview with Experts;
Secondary Data Analysis; Qualitative Research
→ Environmental Context of Problem
→ Defining MDP
→ Defining RP-1, RP-2, RP-3


Developing an Approach to Problem

Defining RP-1, RP-2, RP-3
→ Defining RQ-1, RQ-2, RQ-3, RQ-4 (based on theoretical knowledge, and on
literature review & qualitative study)
→ Developing hypotheses (Hypothesis-1, Hypothesis-2)

Situation
• Harley-Davidson made such a strong comeback in the early 2000s
that there was a long waiting list to get a Harley-Davidson bike.
• In 2007, its market share was about 50% in the heavyweight bike
category.
• Distributors were urging expansion.
• But the company was skeptical about investing in new
production facilities.


Exercise before Problem Definition


• Discussion with Decision maker:
– Years of declining sales have taught management to be more risk
averse than risk prone.
• Discussion with Industry expert:
– Brand loyalty was a major factor influencing sales and repeat sales of
bikes.
• Secondary Data:
– A vast majority of bike owners also owned automobiles such as cars,
SUVs and trucks.
• Qualitative Research:
– Focus groups with bike owners indicated that bikes were used not as a
primary means of transportation but as a means of recreation. They also
highlighted the importance of the brand.

Environmental context of Problem


• Forecasts called for an increase in consumer spending on
recreation & entertainment in 2015.
• Harley has the necessary resources to achieve its objective of
being the dominant motorcycle brand on a global basis.
• Brand image & brand loyalty played a significant role in
buyer behavior, with well-known brands continuing to
command a premium.
• Harley has the necessary marketing & technological skills to
achieve its objective.


Problem Definition
• MDP:
– Should Harley-Davidson invest to produce more bikes?
• RP:
– To determine whether customers would be loyal buyers of
Harley-Davidson

Approach to Problem

RQ:
• Who are the customers? What are their demographic &
psychographic characteristics?
• Can different types of customers be segmented? Is it
possible to segment the market in a meaningful way?
• How do customers feel about their Harleys? Are all
customers motivated by the same appeal?
• Are the customers loyal to Harley-Davidson? What is the
extent of brand loyalty?


Approach to Problem: Hypothesis

• RQ:
– Can different types of customers be segmented based
on psychographic characteristics?
• Hypothesis:
– H1: There are distinct segments of bike buyers.
(Psychographics is the study of personality, values, opinions, attitudes,
interests, and lifestyles.)
– H2: Each segment is motivated to own a Harley-Davidson
for a different reason.
– H3: Brand loyalty is high among Harley-Davidson
customers in all segments.

Research Design


Research Design: Definition


• Research design is a framework or blueprint for conducting a
research project.
– …details the procedures necessary for obtaining the information needed to
structure or solve research problem(s).
– …lays the foundation for conducting the project.

• Involves:
– (Define the information needed)
– Design exploratory, descriptive, and/or causal phases of research
– Specify measurement & scaling procedures
– Construct & pretest a questionnaire or an appropriate form for data
collection
– Specify sampling process & sample size
– Develop a plan of data analysis


Classification of Research Designs

Research Design
• Exploratory Research Design: provision of insights into & comprehension
of the problem situation confronting researchers
• Conclusive Research Design: assist in determining, evaluating & selecting
the best course of action to take in a given situation


Exploratory vs Conclusive Research

Objective
– Exploratory: To provide insights & understanding
– Conclusive: To test specific hypotheses and examine relationships

Characteristics
– Exploratory: Information needed is defined only loosely. Research process
is flexible & unstructured. Sample is small & non-representative. Analysis
of primary data is qualitative.
– Conclusive: Information needed is clearly defined. Research process is
formal and structured. Sample is large & representative. Data analysis is
quantitative.

Findings/Results
– Exploratory: Tentative
– Conclusive: Conclusive

Outcome
– Exploratory: Generally followed by further exploratory or conclusive research
– Conclusive: Findings used as input into decision making


Classification of Research Designs

Research Design
• Exploratory Research Design: provision of insights into & comprehension
of the problem situation confronting researchers
• Conclusive Research Design: assist in determining, evaluating & selecting
the best course of action to take in a given situation
– Descriptive Research: description of something, usually characteristics &
functions; either Cross-Sectional Design or Longitudinal Design
– Causal Research: to obtain evidence regarding cause-and-effect relationships


A Comparison of Basic Research Designs

Objective
– Exploratory: Discovery of ideas and insights
– Descriptive: Describe characteristics or functions
– Causal: Determine cause-and-effect relationships

Characteristics
– Exploratory: Flexible, versatile; often the front end of total research design
– Descriptive: Marked by the prior formulation of specific hypotheses;
preplanned and structured design
– Causal: Manipulation of independent variables; measure effect on dependent
variables; control mediating variables


Exploratory Research
• Can be conducted by qualitatively analyzing primary data
and secondary data

Primary vs. Secondary Data

Criterion | Primary Data | Secondary Data
Collection purpose | For the problem at hand | For other problems
Collection process | Very involved | Rapid & easy
Collection cost | High | Relatively low
Collection time | Long | Short


Secondary Data: Sources

Secondary Data
• Internal: Ready to Use; Requires Further Processing
• External: Published Materials; Computerized Databases; Syndicated Services


Secondary Data: Uses


• Identify the problem & better define the problem
• Develop an approach to the problem
• Formulate an appropriate research design (for example, by
identifying key variables)
• Answer certain research questions & test some hypotheses
• Interpret primary data more insightfully


Exploratory Research: Uses


• Formulate a problem or define a problem more precisely
• Gain insights for developing an approach to the problem
• Identify alternative courses of action & establish priorities
for further research
• Isolate key variables & relationships for further examination
– Develop hypotheses


Exploratory Research: Methods


• Secondary data analyzed in a qualitative way
• Surveys of experts or surveys with open-ended questions
• Case Study
• Qualitative research (Interview, Focus Group Discussion, …)


Descriptive Research: Use


• To describe characteristics of relevant groups
• To estimate the percentage of units in a specified population
exhibiting a certain behavior
• To determine perceptions about something
• To determine the degree to which variables are associated
• To make specific predictions
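Estimating the percentage of a population that exhibits a behavior is usually reported with a margin of error. A minimal sketch using the simple normal-approximation (Wald) interval; the survey counts and the function name are hypothetical, and other interval methods (e.g. Wilson) behave better for small samples or extreme proportions.

```python
import math

def proportion_with_ci(count: int, n: int, z: float = 1.96):
    """Sample proportion with a normal-approximation (Wald) confidence interval; z=1.96 gives ~95%."""
    p = count / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical survey: 120 of 400 sampled households shop at a given store every week.
p, low, high = proportion_with_ci(120, 400)
print(f"estimate = {p:.1%}, 95% CI = [{low:.1%}, {high:.1%}]")
```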


Descriptive Research: Methods


• Secondary data analyzed quantitatively
• Surveys
• Observational data (from a physical or virtual context)
• Panels
– A sample of respondents who have agreed to provide information at
specified intervals over an extended period.


Longitudinal vs Cross-Sectional Design

• Cross-Sectional Design: a sample is surveyed at time T1.
• Longitudinal Design: a sample is surveyed at T1, and the same sample is
also surveyed at T2.


Longitudinal vs Cross-Sectional Design

Evaluation Criteria | Cross-Sectional Design | Longitudinal Design
Detecting change | - | +
Large amount of data collection | - | +
Accuracy | - | +
Representative sampling | + | -
Response bias | + | -

Note: + indicates a relative advantage over the other procedure;
- indicates a relative disadvantage.
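The "detecting change" advantage of a longitudinal design can be seen with a tiny hypothetical panel; the brand labels and choices below are invented. Two cross-sectional snapshots would show identical brand shares, but re-surveying the same respondents reveals that several of them switched.

```python
from collections import Counter

# Hypothetical panel: brand bought by the SAME 10 respondents at time T1 and at time T2.
t1 = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
t2 = ["A", "A", "A", "B", "B", "A", "A", "B", "B", "B"]

def shares(choices):
    """Aggregate brand shares: all that a single cross-sectional snapshot can show."""
    counts = Counter(choices)
    return {brand: counts[brand] / len(choices) for brand in sorted(counts)}

def turnover(before, after):
    """Brand-switching counts, only available when the same respondents are re-surveyed."""
    return Counter(zip(before, after))

print("Shares at T1:", shares(t1))
print("Shares at T2:", shares(t2))
switch = turnover(t1, t2)
for (b1, b2), n in sorted(switch.items()):
    print(f"{b1} -> {b2}: {n}")
switchers = sum(n for (b1, b2), n in switch.items() if b1 != b2)
print("Respondents who switched:", switchers)
```

Both snapshots show a 50/50 split, yet 4 of the 10 panel members changed brands, which is the kind of change a cross-sectional design cannot detect.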


Causal Research: Uses


• To understand which variables are the cause (independent
variables) & which variables are the effect (dependent
variables) of a phenomenon

• To determine the nature of the relationship between the
causal variables and the effect to be predicted

• METHOD: Experiments


Alternative Research Designs

(a) Exploratory Research (Secondary Data Analysis, Focus Groups)
→ Conclusive Research (Descriptive/Causal)

(b) Conclusive Research (Descriptive/Causal)

(c) Conclusive Research (Descriptive/Causal)
→ Exploratory Research (Secondary Data Analysis, Focus Groups)


Term Project Guideline

Background of Study
Definition of MDP & RP
Developing RQ
Conducting Qualitative Research & Literature Review
Developing Hypothesis
Developing/Adopting Questionnaire
Data Collection
Testing Hypothesis
Drawing Managerial Implication & Conclusion
Limitations of Study

1/21/2020

Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur


ravishekhar@xlri.ac.in

Session-4

Why do we need qualitative research?

• It gives you an intimate understanding of people


– Helps understand people and their social & cultural
contexts


Qualitative Research vs. Discussion Forums


• Focus is on listening
• Attempted objectivity
• Non-competitive
• No stakes to prove
• Information is sought one-way
• Strangers/ peers enhance comfort in sharing information

So what is qualitative research for?


“Centrally concerned with understanding things”

• Exploring, Explaining, Linking


…the evidence - associations, symbols, rituals, …
…with the interpretation - their meaning, value, …
…and
• Identifying
…the deep-rooted bonds/ strength - emotional pay-offs beyond the
rational, the relationships, …
…potential triggers of change, loyalty drivers…
• Develop
…hypotheses of likely future outcomes


And the key limitations…


• It is tentative and diagnostic, not evaluative
• Does not represent your population (or all your consumers)
• Behavior may be artificial because respondents are invited
• Control issues
• Is highly researcher-dependent

Forms (Types) of Qualitative Study


Forms of Qualitative Research


• In-depth Interview & Expert Interview
• Focus Group Discussion & Online FGD
• Projective Technique
• Ethnographic Technique
• Netnographic Study

In-depth Interview
• One-on-one interviews
• Encourages an intimate dialogue
• Variations in interviews
– Depth Interviews – 45 minutes to 1 hour
– Intensive Depth Interviews – 2 to 3 hours
– Focused Interviews – 30 minutes (for advertising checks)
• The interviewer's appearance must match the respondent's


Focus Group Discussions

• Consist of 8-10 homogeneous people
• Encourage spontaneous discussion of a particular subject among
participants
• Moderated by a researcher whose role is to guide the
discussion
• Variations in group discussions
– Focus Group Discussions (FGD) – 1.5 to 2 hours
– Extended Group Discussions (EGD) – 3 hours
– Mini Group Discussions (MGD) – 4 to 6 respondents… for sensitive topics
where a group format is still more comforting
– Conflict Discussions – contrasting behavior

Focus Groups Vs. Depth Interviews

Characteristic | Focus Groups | Depth Interviews
Group synergy & dynamics | + | -
Peer pressure/group influence | - | +
Generation of innovative ideas | + | -
In-depth probing of individuals | - | +
Uncovering hidden motives | - | +
Discussion of sensitive topics | - | +

Note: + indicates a relative advantage over the other procedure;
- indicates a relative disadvantage.


Focus Groups Vs. Depth Interviews

Characteristic | Focus Groups | Depth Interviews
Interviewing competitors | - | +
Interviewing professional respondents | - | +
Scheduling of respondents | - | +
Amount of information | + | -
Bias in moderation & interpretation | + | -
Cost per respondent | + | -
Time (interviewing & analysis) | + | -

Note: + indicates a relative advantage over the other procedure;
- indicates a relative disadvantage.


Choosing the Appropriate Tool

• In-depth interviews when
– Depth is needed on an individual’s practices and attitudes
– Practices and product interaction must be understood, mapping claimed vs.
real behavior (e.g., Harpic usage, cleaning of exhaust fans)
– The subject is sensitive (e.g., body odor)
– Reality context matters (e.g., kind of houses, kind of bathrooms, kind of
surroundings)

• Focus group discussions when


– Need width of responses on practices, attitudes & beliefs
– Participant dynamics will spark new thoughts
– Exploring triggers & barriers
– Concept evaluation & development
– Contrasting user profiles


Defining Projective Techniques


• An unstructured, indirect form of questioning that
encourages respondents to project their underlying
motivations, beliefs, attitudes or feelings regarding issues of
concern.

• In projective techniques, respondents are asked to interpret
the behavior of others.
• In interpreting the behavior of others, respondents indirectly
project their own motivations, beliefs, attitudes, or feelings
into the situation.


Projective Technique- Broad types

• Role play
• Guided fantasy…
• Drawing
• Thematic Apperception Test
• Third person…
• Sentences
• Conversations
• Bubbles…
• Grouping by preference
• Word
• Picture
• Brand personification…


Advantages of Projective Techniques


• May elicit responses that subjects would be hesitant or
unable to give if they knew the purpose of the study.
– But researchers must be aware of ethical issues
(especially regarding refusal).

• Helpful when the issues to be addressed are personal,
sensitive, or subject to strong social norms.

• Helpful when underlying motivations, beliefs, and attitudes
are operating at a subconscious level.


Disadvantages of Projective Techniques


• Require highly trained interviewers.
• Skilled interpreters are required to analyze responses.
• There is a serious risk of interpretation bias.
• They tend to be expensive.
• May require respondents to engage in unusual behavior.


Guidelines for Using Projective Techniques


• Should be used when the required information cannot be
accurately obtained by direct methods.
• Should be used to gain initial insights & understanding.
• Given their complexity, projective techniques should not be
used naively.


Ethnography

Nature of Observation

ACTIVE
• A researcher takes part in the process of the respondent performing their
behavior.
• More like a scene where the respondent demonstrates how they usually do it
for you.
• Periodic questioning or clarification is done on the spot.

PASSIVE
• A researcher acts as an outsider while the respondent is performing their
behavior.
• Everything proceeds naturally and uninterrupted.
• Questioning and clarification are done before or after the process.


Examples: Shadowing; Accompanied shopping; Cooking observations; In-home visit;
A day in the life; Mystery shopping


Ethnography: Importance
• One of the best ways to gain deeper customer insight.
• …to get to know customers and their culture, & the role
certain products play in their lives.
• …shows consumer reality rather than consumer
reconstruction.
• …helps identify contradictions between what people say
they do & what they actually do.
• …enables us to identify their hidden needs, and this is
where real breakthroughs can occur.


Netnography
• Ethnography: Study of a community

• Netnography: Study of an online community

• Data Sources:
– Archival Netnographic Data
– Social Network Analysis
– Elicited Netnographic Data


Netnography: Good & Bad

Advantages
• Large sample possible quickly
• Immediate analysis
• Considerably cheaper
• Sensitive topics accessible
• Historic archives often available

Disadvantages
• Identity validation
• Loss of non-verbal cues
• Loss of intangibles
• Reliability & integrity of information
• Information overload


Reference
• Qualitative Research Discussion Guide: Textbook (pages
167-171)
• FGD Discussion Guide: Textbook (pages 140-142)


Issue
• In recent times, bank education loans have grown, & SBI is one of the
major players in this market.
• In 2010, a study of the education loan market among students of X, a
premier business school in India, was conducted for SBI. It found that
SBI's share of education loans among X's students was 87%.
• However, in 2013 the market share dipped to 82%.
• Another study, conducted in 2016, found that SBI's share of education
loans among students of that business school had slipped further to 76%.
• It was also observed that the market share of CBI, another PSU bank, was
steadily increasing: from 7% in 2010 to 18% in 2016.
• SBI is now worried about losing market share among students & has
hired you to conduct research.
(The above information is purely illustrative.)
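Before framing the problem, it helps to quantify the shifts in the illustrative figures above. A minimal sketch using only those numbers (2013 is given for SBI only); the dictionary and function names are hypothetical.

```python
# Market shares (%) from the illustrative case; 2013 is reported only for SBI.
sbi = {2010: 87, 2013: 82, 2016: 76}
cbi = {2010: 7, 2016: 18}

def pp_change(share, start, end):
    """Absolute change in percentage points."""
    return share[end] - share[start]

def relative_change(share, start, end):
    """Change relative to the starting share, in percent."""
    return 100 * (share[end] - share[start]) / share[start]

for name, share in (("SBI", sbi), ("CBI", cbi)):
    print(f"{name}: {pp_change(share, 2010, 2016):+d} pp, "
          f"{relative_change(share, 2010, 2016):+.1f}% relative (2010-2016)")

# Combined share of all lenders other than SBI and CBI.
others = {year: 100 - sbi[year] - cbi[year] for year in (2010, 2016)}
print("Other lenders (%):", others)
```

On these figures SBI lost 11 percentage points (about 12.6% of its 2010 share), CBI gained 11 points, and other lenders held 6% in both years. That pattern is consistent with students shifting from SBI to CBI, which the research would still need to verify.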


Issue
• Design preference & perception for mobile phones

1/21/2020

Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur


ravishekhar@xlri.ac.in

Session-5

A Look at Research Data

Research Data
• Secondary Data
• Primary Data
– Qualitative Data
– Quantitative Data: Descriptive (Survey Data; Observational & Other Data)
or Causal (Experimental Data)

Measurement and Scale

• If things exist to some extent, they ought to be measured.
• Measurement is the assignment of numbers to objects, events, or
people according to specified rules.
• To assign numbers, we need a scale.

Scale

• A scale is a system of classifying objects & persons
– in a series of steps or degrees
– according to a standard (e.g., relative size, rank, amount).
• Measurements can be along four different scales:
– Nominal
– Ordinal
– Interval
– Ratio

Nominal Scale

• One uses names or labels according to certain characteristics.
– The values of a variable measured on a nominal scale are called
categories.
• Charles Darwin used such categorical scales for species.
– e.g., Telephone numbers; Girls vs. Boys in this session.
• Basic operation: = or ≠

Ordinal Scale

• Ordinal measurements indicate the rank order of items, but not the
size of the differences between them.
– e.g., Class ranks; Hardness of minerals
• A scale may also use names with an order, such as:
– "Below average", "Average", and "Above average"; or
– "Very unsatisfied", "Neutral", and "Very satisfied".
• Basic operation: < vs. >

Interval Scale

• In interval scales, the steps are considered to be equal.
• Equality of successive steps
– Difference between 2 & 1 is the same as the difference
between 7 & 6.
• We can use numbers, steps, phrases, or distance to
represent the successive intervals.
– e.g., Ratings; Fahrenheit/Celsius temperature
• The zero point is arbitrary (e.g., 0 °C is not an absence of
temperature), so ratios of values are not meaningful.
• Basic operations: + and −

Ratio Scale

• Most measurement in the physical sciences & engineering is done on
ratio scales.
• The distinguishing feature of a ratio scale is possession
of a natural (true) zero value.
– e.g., Mass, length, time, plane angle, electric charge, &
GDP
• Basic operation: ratios (× and ÷) are meaningful, e.g., 20 kg is twice
as heavy as 10 kg.

Scales of Measurement: Illustration

Scale      Measure                                 Values for the three runners
Nominal    Numbers assigned to runners             4, 81, 9
Ordinal    Rank order of winners                   Third place, Second place, First place
Interval   Performance rating on a 0 to 10 scale   8.2, 9.1, 9.6
Ratio      Time to finish (seconds)                15.2, 14.1, 13.4

Comparison of Scales: Characteristics

Scale      Label   Order   Distance   Origin
Nominal    Yes     No      No         No
Ordinal    Yes     Yes     No         No
Interval   Yes     Yes     Yes        No
Ratio      Yes     Yes     Yes        Yes
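The table implies which summary statistics are meaningful at each level of measurement. Below is a minimal Python sketch, assuming the common textbook convention used later in this deck: mode for nominal data, median for ordinal, mean and standard deviation for interval, and coefficient of variation only for ratio. The names `SCALE_PROPERTIES` and `permissible_statistics` are illustrative, not part of any library:

```python
# Label, order, distance, origin: the four properties from the comparison table.
SCALE_PROPERTIES = {
    "nominal": (True, False, False, False),
    "ordinal": (True, True, False, False),
    "interval": (True, True, True, False),
    "ratio": (True, True, True, True),
}


def permissible_statistics(scale: str) -> list:
    """Return summary statistics that are meaningful for data on `scale`."""
    label, order, distance, origin = SCALE_PROPERTIES[scale.lower()]
    stats = []
    if label:
        stats.append("mode")  # counting categories needs only labels
    if order:
        stats.append("median")  # a middle value needs an ordering
    if distance:
        stats += ["mean", "standard deviation"]  # arithmetic needs equal intervals
    if origin:
        stats.append("coefficient of variation")  # ratios need a true zero
    return stats


if __name__ == "__main__":
    for name in SCALE_PROPERTIES:
        print(f"{name:>8}: {', '.join(permissible_statistics(name))}")
```

Each higher scale inherits everything permitted on the scales below it, which is why the list only ever grows from nominal to ratio.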

Interval Scale: Variants


• A Likert scale requires respondents to indicate a degree of
agreement or disagreement with each statement about the
stimulus object.
1 = Strongly disagree, 2 = Disagree, 3 = Neither agree nor disagree,
4 = Agree, 5 = Strongly agree
Pantaloon sells high-quality merchandise.   1   2   3   4X   5
• A semantic differential scale is a rating scale with end points
associated with bipolar labels that have semantic meaning.
Pantaloon is:
Modern :--:--X:--:--:--:--: Old-fashioned

A Few Exercises on Scales

• Gender: 1. Male 2. Female
Nominal Scale

• I consider myself to be loyal to the Nike brand.
(Please rate the statement)
Strongly Disagree                  Strongly Agree
1        2        3        4        5
Interval Scale

• How many hours do you use the Internet? ........... Hrs
Ratio Scale

A Few Exercises on Scales

• Education: 1. Less than 10th Std 2. 10th Std 3. 12th Std
4. Graduate 5. Postgraduate
Nominal Scale or Ordinal Scale
• Why do you use the current brand? Please rank the reasons in order
of preference.
a. This is exactly the product I have always wanted to use.
b. This is the best available brand.
c. It is a force of habit.
d. There is really no choice.
Ordinal Scale

• What is your age? ..... Years
Ratio Scale

A Few Exercises on Scales

• This supplier keeps the promises it makes to our firm.
(Please rate the statement)
Strongly Disagree                  Strongly Agree
1        2        3        4        5
Interval Scale

• Please rank the following footwear brands in order of preference.
A. Bata
B. Adidas
C. Nike
D. Reebok
E. Liberty
Ordinal Scale

Questionnaire

• A questionnaire is a formalized set of questions for
obtaining information from respondents.
• Question types: unstructured questions & structured questions.
• Determining the order of questions (opening questions, type of
information, & difficult questions).
• Pretesting of the questionnaire: testing the questionnaire on a
small sample of respondents to identify & eliminate potential problems.

Individual Question Content: Are Several Questions Needed Instead of One?
“Do you think Coca-Cola is a tasty and refreshing soft drink?”
(Incorrect)
• Such a question is called a double-barreled question,
because two or more questions are combined into one.
• To obtain the required information, two distinct questions
should be asked:
“Do you think Coca-Cola is a tasty soft drink?” and
“Do you think Coca-Cola is a refreshing soft drink?”
(Correct)
• Sometimes, several questions are needed to obtain the
required information in an unambiguous manner.

Overcoming Unwillingness to Answer

• Please list all the departments from which you purchased
merchandise on your most recent shopping trip to a department
store.
(Incorrect)
• In the list that follows, please check all the departments from
which you purchased merchandise on your most recent shopping
trip to a department store.
1. Women's dresses ____
2. Men's apparel ____
3. Children's apparel ____
4. Cosmetics ____
.
.
.
16. Jewelry ____
17. Other (please specify) ____
(Correct)

Choosing Question Wording – Use Ordinary Words
• “Do you think the distribution of Thums Up is adequate?”
(Incorrect)
• “Do you think Thums Up is readily available (within 1
kilometer) when you want to buy it?”
(Correct)

Choosing Question Wording – Define the Issue
• Which brand of shampoo do you use?
(Incorrect)
• Define the issue in terms of who, what, when, where, why, and way
(the six Ws). Who, what, when, and where are particularly important.
• Which brand or brands of shampoo have you personally
used at home during the last month? If you used more than
one brand, please list all the brands that apply.
1. Clinic Plus ____
2. Head & Shoulders ____
3. Pantene ____
.
.
.
13. Dove ____
14. Other (please specify) ____
(Correct)

Choosing Question Wording – Use Unambiguous Words
• In a typical month, how often do you shop in department
stores?
_____ Never
_____ Occasionally
_____ Sometimes
_____ Often
_____ Regularly
(Incorrect)
• In a typical month, how often do you shop in department
stores?
_____ Less than once
_____ 1 or 2 times
_____ 3 or 4 times
_____ More than 4 times
(Correct)

Choosing Question Wording – Avoid Leading or Biasing Questions
• Do you think that patriotic Indians should buy imported
automobiles when that would put Indian labor out of work?
_____ Yes
_____ No
_____ Don't know
(Incorrect)
• Do you think that Indians should buy imported automobiles?
_____ Yes
_____ No
_____ Don't know
(Correct)
• A leading question is one that clues the respondent to what
the answer should be.

Choosing Question Wording – Avoid Implicit Assumptions
• Are you in favor of a balanced budget?
(Incorrect)
• Questions should not be worded so that the answer is
dependent upon implicit assumptions about what will
happen as a consequence.
• Are you in favor of a balanced budget (spending on social
security vs. tax collection) if it would result in an increase in
the personal income tax?
(Correct)

Choosing Question Wording – Avoid Generalizations and Estimates
• “What is the annual per capita expenditure on groceries in
your household?”
(Incorrect)
• “What is the monthly (or weekly) expenditure on groceries
in your household?”
and
• “How many members are there in your household?”
(Correct)
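The corrected wording asks respondents for two easy-to-recall facts and leaves the arithmetic to the researcher. A minimal sketch of that derivation, assuming 12 months or 52 weeks per year; the function name `annual_per_capita_grocery` and the example figures are hypothetical:

```python
PERIODS_PER_YEAR = {"weekly": 52, "monthly": 12}  # assumption: 52 weeks per year


def annual_per_capita_grocery(spend: float, period: str, household_size: int) -> float:
    """Annual per capita grocery expenditure from a household's weekly or monthly spend."""
    if household_size < 1:
        raise ValueError("a household needs at least one member")
    return spend * PERIODS_PER_YEAR[period] / household_size


# Hypothetical respondent: Rs. 6,000 per month on groceries, 4 household members.
print(annual_per_capita_grocery(6000, "monthly", 4))  # 18000.0
```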

Sampling
• Census vs. Sampling
– Sampling is the selection of a subset (a statistical sample) of
individuals from within a statistical population to estimate
characteristics of the whole population.
– Why is proper sampling important?
• Target Population
• Sampling Frame
– A representation of the elements of the target population. It consists
of a list or set of directions for identifying the target population.
• Sampling Technique & Sample Size

31-Jan-20

Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur


ravishekhar@xlri.ac.in

Session-6


Classification of Sampling Techniques

Sampling Techniques
• Nonprobability Sampling
– Convenience Sampling
– Judgmental Sampling
– Quota Sampling
– Snowball Sampling
• Probability Sampling
– Simple Random Sampling
– Systematic Sampling
– Stratified Sampling
– Cluster Sampling

Convenience Sampling
• Convenience sampling attempts to obtain a sample of
convenient elements. Often, respondents are selected
because they happen to be in the right place at the right
time.
– use of students, and members of social organizations
– mall intercept interviews without qualifying the
respondents
– “people on the street” interviews

Judgmental Sampling
• Judgmental sampling is a form of convenience sampling in
which the population elements are selected based on the
judgment of the researcher.
– test markets
– purchase engineers selected in industrial marketing
research
– expert witnesses used in court

Quota Sampling
• Quota sampling may be viewed as two-stage restricted
judgmental sampling.
– The first stage consists of developing control categories, or quotas,
of population elements.
– In the second stage, sample elements are selected based on
convenience or judgment.

Control          Population        Sample            Sample
Characteristic   Composition (%)   Composition (%)   Number
Sex
  Male           48                48                480
  Female         52                52                520
  Total          100               100               1000
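The first stage of quota sampling, turning population composition into category quotas, is simple arithmetic. The minimal sketch below reproduces the table above. It assumes proportional quotas and uses largest-remainder rounding, an illustrative choice, so that the quotas always add up to the sample size. `quota_allocation` is a hypothetical helper:

```python
import math


def quota_allocation(population_pct: dict, sample_size: int) -> dict:
    """Split `sample_size` across control categories in proportion to the population."""
    total = sum(population_pct.values())
    exact = {cat: sample_size * pct / total for cat, pct in population_pct.items()}
    quotas = {cat: math.floor(x) for cat, x in exact.items()}
    # Give any leftover units to the categories with the largest fractional remainders.
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda cat: exact[cat] - quotas[cat], reverse=True)
    for cat in by_remainder[:leftover]:
        quotas[cat] += 1
    return quotas


print(quota_allocation({"Male": 48, "Female": 52}, 1000))  # {'Male': 480, 'Female': 520}
```

This only covers the first stage. Choosing which individuals fill each quota is still left to convenience or judgment, which is why the technique remains nonprobability sampling.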

Snowball Sampling
• In snowball sampling, an initial group of respondents is
selected, usually at random.
• After being interviewed, these respondents are asked to
identify others who belong to the target population of
interest.
• Subsequent respondents are selected based on the
referrals.

Simple Random Sampling


• Each element in the population has a known & equal
probability of selection.
• Each possible sample of a given size (n) has a known and
equal probability of being the sample actually selected.
• This implies that every element is selected independently
of every other element.
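In Python, `random.sample` draws without replacement so that every element, and every subset of size n, has the same chance of selection, which matches the definition above. A minimal sketch, assuming the sampling frame is available as a list. The frame of 500 customer IDs and the helper name are hypothetical:

```python
import random


def simple_random_sample(frame: list, n: int, seed=None) -> list:
    """Draw an SRS of size n (without replacement) from the sampling frame."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(frame, n)


frame = [f"C{i:03d}" for i in range(1, 501)]  # hypothetical frame: C001 ... C500
sample = simple_random_sample(frame, 20, seed=7)
print(sample[:5])
```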

Systematic Sampling
• The sample is chosen by selecting a random starting point
and then picking every ith element in succession from the
sampling frame.
– For example, there are 100,000 elements in the population and a
sample of 1,000 is desired. In this case the sampling interval, i, is
100. A random number between 1 and 100 is selected. If, for
example, this number is 23, the sample consists of elements 23, 123,
223, 323, 423, 523, and so on.
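The worked example above translates directly into code. A minimal sketch, assuming elements are numbered from 1 and that N is (roughly) a multiple of n. Passing `start=23` reproduces the example, while omitting it draws a random start between 1 and i. The helper name is illustrative:

```python
import random


def systematic_sample(frame: list, n: int, start=None, seed=None) -> list:
    """Select every i-th element after a random start, where i = N // n."""
    interval = len(frame) // n  # sampling interval i
    if start is None:
        start = random.Random(seed).randint(1, interval)  # random start in 1..i
    # Positions are 1-based (start, start + i, ...); list indices are 0-based.
    positions = range(start, len(frame) + 1, interval)
    return [frame[pos - 1] for pos in positions][:n]


frame = list(range(1, 100_001))  # elements numbered 1 to 100,000
sample = systematic_sample(frame, 1000, start=23)
print(sample[:5], len(sample))  # [23, 123, 223, 323, 423] 1000
```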

Stratified Sampling
• A two-step process in which the population is partitioned
into subpopulations, or strata.
– The strata should be mutually exclusive & collectively
exhaustive in that every population element should be assigned to
one and only one stratum and no population elements should be
omitted.
– Next, elements are selected from each stratum by a random
procedure, usually SRS.
• A major objective of stratified sampling is to increase
precision without increasing cost.
• The elements within a stratum should be as homogeneous
as possible, but the elements in different strata should be as
heterogeneous as possible.
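The two steps described above are partitioning into strata, then running an SRS within each stratum. The minimal sketch below uses proportionate allocation, which is an assumption because the slide does not fix how the sample is split across strata. Rounding can make the total differ slightly from n for awkward splits. The urban/rural population and the helper name are hypothetical:

```python
import random
from collections import defaultdict


def stratified_sample(elements: list, stratum_of, n: int, seed=None) -> list:
    """Proportionate stratified sample: SRS within each stratum, sized by stratum share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in elements:
        strata[stratum_of(element)].append(element)  # each element in exactly one stratum
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(elements))  # proportionate allocation
        sample.extend(rng.sample(members, n_h))  # SRS within the stratum
    return sample


# Hypothetical population: 600 urban and 400 rural households.
population = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = stratified_sample(population, stratum_of=lambda e: e[0], n=100, seed=3)
print(sum(e[0] == "urban" for e in sample), sum(e[0] == "rural" for e in sample))  # 60 40
```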

Cluster Sampling
• The target population is first divided into mutually exclusive
and collectively exhaustive subpopulations, or clusters.
• Then a random sample of clusters is selected, based on a
probability sampling technique such as SRS.
• For each selected cluster, either all the elements are
included in the sample (one-stage) or a sample of elements
is drawn probabilistically (two-stage).
• Elements within a cluster should be as heterogeneous as
possible, but clusters themselves should be as
homogeneous as possible. Ideally, each cluster should be a
small-scale representation of the population.
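The sketch below covers both variants described above: one-stage sampling keeps every element of each chosen cluster, and two-stage sampling draws an SRS of m elements from each. It is a minimal sketch, assuming clusters are given as a dictionary from cluster name to its elements. The city-block frame and the helper name are hypothetical:

```python
import random


def cluster_sample(clusters: dict, k: int, m=None, seed=None):
    """Select k clusters by SRS; keep all their elements (one-stage) or an SRS of m (two-stage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)  # SRS of cluster names
    sample = []
    for name in chosen:
        members = clusters[name]
        if m is None:
            sample.extend(members)  # one-stage: every element of the cluster
        else:
            sample.extend(rng.sample(members, min(m, len(members))))  # two-stage
    return chosen, sample


# Hypothetical frame: 10 city blocks with 20 households each.
blocks = {f"block{b}": [f"block{b}-hh{h}" for h in range(20)] for b in range(10)}
chosen, one_stage = cluster_sample(blocks, k=3, seed=5)
_, two_stage = cluster_sample(blocks, k=3, m=5, seed=5)
print(chosen, len(one_stage), len(two_stage))  # three block names, then 60 and 15
```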

Nonprobability Sampling
• Convenience sampling
– Strengths: Least expensive, least time-consuming, most convenient
– Weaknesses: Selection bias, sample not representative, not recommended
for descriptive or causal research
• Judgmental sampling
– Strengths: Low cost, convenient, not time-consuming
– Weaknesses: Does not allow generalization, subjective
• Quota sampling
– Strengths: Sample can be controlled for certain characteristics
– Weaknesses: Selection bias, no assurance of representativeness
• Snowball sampling
– Strengths: Can estimate rare characteristics
– Weaknesses: Time-consuming

Probability Sampling
• Simple random sampling
– Strengths: Easily understood, results projectable
– Weaknesses: Difficult to construct sampling frame, expensive, lower
precision, no assurance of representativeness
• Systematic sampling
– Strengths: Can increase representativeness, easier to implement than
SRS, sampling frame not necessary
– Weaknesses: Can decrease representativeness
• Stratified sampling
– Strengths: Includes all important subpopulations, precision
– Weaknesses: Difficult to select relevant stratification variables, not
feasible to stratify on many variables, expensive
• Cluster sampling
– Strengths: Easy to implement, cost-effective
– Weaknesses: Imprecise, difficult to compute and interpret results

Sampling Plan: Qualitative Study

Sampling for Qualitative Research


• Type of sampling: always purposive (non-probability: judgmental or
convenience).
• Qualitative research needs to represent the ‘spectrum’ of all
possible points of view on the given topic.

• Guideline on Sample Size

A Few Recruitment Dos & Don’ts

• Don’t make the recruitment criteria so stringent that you end up
meeting a very small niche compared to your universe.
• Mask the category you want to research.
– Otherwise, the participant rehearses & comes in as the consultant.
• Attempt to get homogeneous groups unless the design includes a
conflict group.
– Mixed-gender groups don’t work, due to high socially desirable behaviour
patterns or impression management efforts.
• Don’t put just two peer groups into a single group; this can result in
group conflict.
– Never more than 2 friends even in a peer group, unless the entire group is
one group of friends, but then you need many such groups.
• For creative/developmental interactions, decide in advance whether you
want creative/better-than-average consumers.
– Check for creativity.

Review of What We Have Covered So Far


• Introduction: Research
• Problem Definition
• Approach to Problem
• Research Design
• Qualitative Research
• Measurement Scale
• Questionnaire Designing
• Sampling

Hypothesis Testing: Introduction

Null vs. Alternate Hypothesis


• Null Hypothesis (H0): a statement about a population that is assumed
to be true.
– … is generally assumed to be true until evidence indicates
otherwise.
– … is a statement of the status quo, one of no difference or no effect.
– If the null hypothesis is not rejected, no changes will be made.
• Alternate Hypothesis (H1): a statement that directly contradicts the
null hypothesis by stating the contrary about the population.
– … is one in which some difference or effect is expected.
– … is the statement that is hoped or expected to be true instead of the
null hypothesis.
– Accepting the alternative hypothesis will lead to changes in opinions or
actions.

Hypothesis Formulation: Illustration-1


• A box is designed to hold 25 kg of apples. Farmers fill such
boxes in the field & the boxes are then sold to retailers. Retailers
complain that many boxes do not contain 25 kg of apples. Research is
to be conducted to investigate the issue.
– Develop null & alternate hypotheses for this issue.
• H0: μ ≥ 25
• H1: μ < 25
– Take action if H0 is rejected (H1 is accepted).
• One-tail study.


Hypothesis Formulation: Illustration-2


• The mean weight of passengers' bags in air travel should be
20 kg. The airline wants to investigate whether the weight of
bags is more than 20 kg or not.
– Develop null & alternate hypotheses for this issue.
• H0: μ ≤ 20
• H1: μ > 20
– Take action if H0 is rejected (H1 is accepted).
• One-tail study.

Hypothesis Formulation: Illustration-3


• The mean mileage of a motorbike is expected to be 50 kmpl.
The company wants to investigate whether the mean mileage is 50
kmpl or not.
– Develop null & alternate hypotheses for this issue.
• H0: μ = 50
• H1: μ ≠ 50
– Take action if H0 is rejected (H1 is accepted).
• Two-tail study.

Department Store: Illustration


• RQ
– Do the customers of Big Bazaar exhibit store loyalty, and what are
their characteristics?
• Hypotheses (alternate)
– H1: Customers who are store loyal are less knowledgeable about
the shopping environment than other customers.
– H2: Store-loyal customers are more risk-averse than non-loyal
customers.
– H3: Customers of Big Bazaar are loyal.

Department Store: Illustration


• Hypotheses (null vs. alternate)
– H0: Customers who are store loyal are at least as knowledgeable
about the shopping environment as other customers.
– H1: Customers who are store loyal are less knowledgeable about
the shopping environment than other customers.
– H0: Store-loyal customers are no more risk-averse than non-loyal
customers.
– H1: Store-loyal customers are more risk-averse than non-loyal
customers.
– H0: Customers of Big Bazaar are not loyal.
– H1: Customers of Big Bazaar are loyal.

Types of Hypotheses

Null
– H0: μ = 50
– H0: μ ≤ 50
– H0: μ ≥ 50
Alternate
– HA: μ ≠ 50
– HA: μ > 50
– HA: μ < 50

One tail Study

Two-tail Study

Hypothesis Testing: Introduction


• Hypothesis testing is a method for testing a claim or
hypothesis about a parameter in a population, using data
measured in a sample.
– In other words, it is a systematic way to test claims or ideas about a
group or population.
• Level of significance refers to a criterion of judgment upon which a
decision is made regarding the value stated in a null hypothesis.
– The criterion is based on the probability of obtaining a statistic
measured in a sample if the value stated in the null hypothesis were
true.

Hypothesis Testing: Calculation


• A test statistic is a mathematical formula that allows researchers
to determine the likelihood of obtaining sample outcomes if the
null hypothesis were true. The value of the test statistic is used to
make a decision regarding the null hypothesis.
• A p value is the probability of obtaining a sample outcome at least as
extreme as the one observed, given that the value stated in the null
hypothesis is true.
– The p value is compared to the level of significance (α): if p < α,
the null hypothesis is rejected.
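The sketch below shows how a test statistic turns into a p value and then a decision. It is a minimal sketch, assuming a one-sample Z test with a known population standard deviation. The setting reuses the apple-box example of Illustration-1 (H1: μ < 25), but the sample mean, σ, and n are hypothetical. `z_test` and `decide` are illustrative helpers; `statistics.NormalDist` is from Python's standard library:

```python
from statistics import NormalDist


def z_test(sample_mean: float, mu0: float, sigma: float, n: int, tail: str = "two"):
    """Return (z, p) for a one-sample Z test; tail is 'two', 'less', or 'greater'."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)  # test statistic
    cdf = NormalDist().cdf(z)  # P(Z <= z) under H0
    if tail == "less":
        p = cdf
    elif tail == "greater":
        p = 1 - cdf
    elif tail == "two":
        p = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'two', 'less', or 'greater'")
    return z, p


def decide(p: float, alpha: float = 0.05) -> str:
    return "reject H0" if p < alpha else "do not reject H0"


# Hypothetical sample: 36 boxes, mean 24.6 kg, known sigma of 1.5 kg.
z, p = z_test(24.6, 25, 1.5, 36, tail="less")
print(round(z, 2), round(p, 4), decide(p))  # -1.6 0.0548 do not reject H0
```

Here the sample mean falls below 25 kg, but a shortfall this small would occur about 5.5% of the time even if H0 were true, so at α = 0.05 the evidence is not strong enough to reject it.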

Decision about Hypothesis


• Reject the null hypothesis
– The sample mean is associated with a low probability of
occurrence when the null hypothesis is true.
• Do not reject (retain) the null hypothesis
– The sample mean is associated with a high probability of
occurrence when the null hypothesis is true.
• Note:
– A null hypothesis may be rejected, but it can never be accepted
based on a single test.
– In classical hypothesis testing, there is no way to determine whether
the null hypothesis is true.

Choose a Level of Significance


• Type I Error: false positives
– A Type I error occurs when the sample results lead to the rejection of
the null hypothesis when it is in fact true.
– The probability of a Type I error (α) is also called the level of
significance.
• Type II Error: false negatives
– A Type II error occurs when, based on the sample results, the null
hypothesis is not rejected when it is in fact false.
– The probability of a Type II error is denoted by β.
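The simulation below shows what α means in practice. It repeatedly samples from a population in which H0 is true, then counts how often a two-tailed Z test at α = 0.05 wrongly rejects H0. The setting is Illustration-3 (H0: μ = 50 kmpl), with a hypothetical σ of 5 kmpl and n = 30. This Monte Carlo check is an illustration rather than part of the original material. The long-run false-positive rate should sit near α:

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(trials=2000, n=30, mu0=50.0, sigma=5.0, alpha=0.05, seed=1):
    """Estimate how often a two-tailed Z test rejects H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    false_positives = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]  # data generated under H0
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            false_positives += 1  # rejected a true null hypothesis
    return false_positives / trials


print(type_i_error_rate())  # close to 0.05
```

Lowering α reduces these false positives, but for a fixed sample size it raises β, the chance of missing a real effect.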

A Broad Classification of Hypothesis Tests

Hypothesis Tests
• Tests of Association
• Tests of Differences
– Distributions
– Means
– Proportions
– Median/Rankings

Frequency Distribution

Frequency Distribution
• In a frequency distribution, one variable is considered at a
time.
– A frequency distribution for a variable produces a table of
frequency counts, percentages, & cumulative percentages for all
values associated with that variable.

• Statistics Associated with Frequency Distribution
– Measures of Location
– Measures of Variability
– Measures of Shape
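Below is a minimal sketch of a frequency table with counts, percentages, and cumulative percentages. It uses the Familiarity ratings from the Internet Usage Data exercise (Session 7), with respondent 10's missing value coded as `None`. Percentages are computed over the 29 non-missing responses, which SPSS calls "valid percent". The function name is illustrative:

```python
from collections import Counter

# Familiarity ratings for respondents 1-30 (None = missing, respondent 10).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]


def frequency_table(values):
    """Return (value, count, valid %, cumulative %) rows, ignoring missing values."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        pct = 100 * counts[value] / len(valid)
        cumulative += pct
        rows.append((value, counts[value], round(pct, 1), round(cumulative, 1)))
    return rows


for row in frequency_table(familiarity):
    print(row)
```

For ratings 2 through 7 the counts come out as 2, 6, 6, 3, 8 and 4, so the cumulative percentage reaches 100.0 at rating 7.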

Measures of Location
• Mean
– Most commonly used measure of central tendency.
– Used when data are on an interval or ratio scale.
• Median
– Middle value when data are arranged in ascending or descending
order. It is the 50th percentile.
– Used when data are on an ordinal scale, & also on an interval or
ratio scale.
• Mode
– The value that occurs most frequently & represents the highest
peak of the distribution.
– Mode is a good measure of location when the variable is inherently
categorical or has otherwise been grouped into categories.
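Python's standard `statistics` module computes all three measures directly. A minimal sketch using the same Familiarity ratings (Session 7 exercise), with the missing response dropped first:

```python
import statistics

familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
valid = [v for v in familiarity if v is not None]  # drop the missing response

mean = statistics.mean(valid)      # 137 / 29
median = statistics.median(valid)  # 15th of the 29 ordered values
mode = statistics.mode(valid)      # most frequent rating
print(round(mean, 3), median, mode)  # 4.724 5 6
```

The mean sits below the median and mode, a first hint that the ratings are slightly skewed to the left.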

Measures of Variability
• Variability is a measure of the dispersion or spread of
scores in a distribution.
– Variability ranges from 0 to ∞.
• Range
– Difference between the largest and the smallest values.
• Interquartile Range
– Difference between the 75th and the 25th percentiles.
• Variance
– Mean squared deviation from the mean. The variance can never be
negative.
• Standard Deviation
– Square root of the variance.
• Coefficient of Variation
– Ratio of the SD to the mean, expressed as a percentage; a unitless
measure of relative variability.
– Can be used with ratio scale only.
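The measures above can be sketched with the standard `statistics` module. The sketch assumes the Internet Usage column of the Session 7 data records hours, a ratio-scaled variable, so the coefficient of variation is computed only for it and not for the Familiarity ratings. Quartiles use the module's default "exclusive" method; SPSS may interpolate differently, so its IQR could differ slightly. The helper name `variability` is illustrative:

```python
import statistics

familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
usage_hours = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
               3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def variability(values, ratio_scale=False):
    """Range, IQR, sample variance and SD; CV only for ratio-scaled data."""
    data = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles ('exclusive' method)
    result = {
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "variance": statistics.variance(data),  # sample variance (n - 1 divisor)
        "sd": statistics.stdev(data),
    }
    if ratio_scale:  # CV needs a true zero
        result["cv_percent"] = 100 * result["sd"] / statistics.mean(data)
    return result


print(variability(familiarity))
print(variability(usage_hours, ratio_scale=True))
```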

Measures of Shape: Skewness


• Skewness: A skewed distribution is a distribution of scores
that includes outliers or scores that fall substantially above
or below most other scores in a data set.
– Tendency of deviations from mean to be larger in one direction than
in the other. It can be thought of as tendency for one tail of the
distribution to be heavier than other.

[Figure: symmetric distribution (mean, median, and mode coincide) vs. skewed distribution (mean, median, and mode separate)]

Measures of Shape: Skewness of distribution

• A positively skewed distribution is a distribution of scores
where a few outliers are substantially larger (toward the right
tail in a graph) than most other scores.
• A negatively skewed distribution is a distribution of scores
where a few outliers are substantially smaller (toward the left
tail in a graph) than most other scores.

Measures of Shape: Kurtosis


• Kurtosis
– A measure of the relative peakedness or flatness of the
curve defined by the frequency distribution.
• The kurtosis of a normal distribution is zero when kurtosis is
expressed as excess kurtosis (kurtosis minus 3), as here.
• If kurtosis is positive, the distribution is more peaked than a
normal distribution.
• A negative value means that the distribution is flatter than a
normal distribution.
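Below is a minimal sketch of both shape measures. It first computes the moment coefficients g1 (skewness) and g2 (excess kurtosis), then applies the small-sample adjustments used by Excel's SKEW and KURT functions. Other packages may report the unadjusted values, so small differences between tools are expected. The helper name is illustrative:

```python
import statistics


def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) using Excel-style SKEW/KURT adjustments."""
    data = [v for v in values if v is not None]
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5  # moment coefficient of skewness
    g2 = m4 / m2 ** 2 - 3  # moment coefficient of excess kurtosis (normal = 0)
    skew = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt


familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
print(skewness_kurtosis(familiarity))
```

A negative skewness means the longer tail is on the left. A negative kurtosis means the distribution is flatter than a normal curve.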

2/8/2020

Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur


ravishekhar@xlri.ac.in

Session-7

Cross-Tabulation

Cross-Tabulation
• While a frequency distribution describes one variable at a time, a
cross-tabulation describes two or more variables simultaneously.

• General rule: compute percentages in the direction of the independent
variable, across the dependent variable.
• Of the two example tables, the first is more acceptable than the second.

Statistics Associated with Cross-Tabulation


• Chi-square test for independence: a statistical procedure to determine
whether the frequencies observed at the combinations of levels of two
categorical variables are similar to the frequencies expected if the
two variables were independent.
– To determine whether a systematic association exists, the probability of
obtaining a value of chi-square as large as or larger than the one
calculated from the cross-tabulation is estimated.
– The null hypothesis (H0) of NO association between the two variables will
be rejected only when the calculated value of the test statistic is greater
than the critical value of the chi-square distribution with the appropriate
degrees of freedom.
– An important characteristic of the chi-square statistic is the df
associated with it: df = (r − 1) × (c − 1), where r and c are the numbers
of rows and columns.
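Below is a minimal sketch of the chi-square test of independence for the gender × Internet usage case problem later in this session. It uses a hypothetical cutoff that splits Internet usage into light users (5 or less) and heavy users (more than 5), with Sex coded 1 = male and 2 = female. Each expected count is row total × column total / n. For a 2 × 2 table (df = 1), the p value can be computed exactly with `math.erfc`, because a chi-square variable with one degree of freedom is the square of a standard normal variable:

```python
import math

sex = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
usage_hours = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
               3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]
CUTOFF = 5  # hypothetical: light users <= 5 hours, heavy users > 5


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: male, female; columns: light, heavy.
table = [[sum(1 for s, u in zip(sex, usage_hours) if s == g and (u <= CUTOFF) == light)
          for light in (True, False)] for g in (1, 2)]
chi2, df = chi_square_independence(table)
p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail p value, valid for df = 1 only
print(table, round(chi2, 3), df, round(p, 3))
```

With this split the statistic is about 3.333, below the 5% critical value of 3.841 for 1 df. Under this cutoff, H0 of no association would therefore not be rejected at α = 0.05.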

Strength of Association in Cross-Tabulation


• The phi coefficient (φ) is used as a measure of the strength of
association in the special case of a table with two rows & two
columns (a 2 x 2 table).

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

Strength of Association in Cross-Tabulation


• While the phi coefficient is specific to a 2 x 2 table, the
contingency coefficient (C) can be used to assess the strength
of association in a table of any size. It is applicable to square tables.

$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$

• The contingency coefficient varies between 0 & 1; it is 0 when there is
no association, but it never actually reaches 1.
• The maximum value of the contingency coefficient depends on the size
of the table (number of rows & number of columns). For this
reason, it should be used only to compare tables of the same
size.

Strength of Association in Cross-Tabulation


• Cramer's V is a modified version of the phi correlation
coefficient & is used in tables larger than 2 x 2. It can be used
for rectangular tables.

$$V = \sqrt{\frac{\phi^2}{\min(r-1,\, c-1)}} = \sqrt{\frac{\chi^2/n}{\min(r-1,\, c-1)}}$$
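All three strength-of-association measures follow from the chi-square statistic, the sample size, and the table dimensions, so they can be computed together. A minimal sketch of the formulas above; the helper name and the illustrative input (χ² = 3.333 from a 2 × 2 table with n = 30) are assumptions for demonstration:

```python
import math


def association_strength(chi2: float, n: int, r: int, c: int) -> dict:
    """Phi, contingency coefficient C, and Cramer's V from a chi-square statistic."""
    phi = math.sqrt(chi2 / n)  # intended for 2 x 2 tables
    contingency = math.sqrt(chi2 / (chi2 + n))  # always below 1
    cramers_v = math.sqrt((chi2 / n) / min(r - 1, c - 1))  # equals phi when r or c is 2
    return {"phi": phi, "C": contingency, "V": cramers_v}


# Illustrative input: chi-square = 3.333 from a 2 x 2 table with n = 30.
print(association_strength(10 / 3, 30, 2, 2))
```

For a 2 × 2 table, min(r − 1, c − 1) = 1, so Cramer's V reduces to φ. Both come out to about 0.333 here, and C is about 0.316.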

Exercise

Internet Usage Data


Respondent  Sex   Familiarity  Internet  Attitude Toward  Attitude Toward  Usage of Internet  Usage of Internet
Number                         Usage     Internet         Technology       Shopping           Banking
1           1.00  7.00         14.00     7.00             6.00             1.00               1.00
2           2.00  2.00         2.00      3.00             3.00             2.00               2.00
3           2.00  3.00         3.00      4.00             3.00             1.00               2.00
4           2.00  3.00         3.00      7.00             5.00             1.00               2.00
5           1.00  7.00         13.00     7.00             7.00             1.00               1.00
6           2.00  4.00         6.00      5.00             4.00             1.00               2.00
7           2.00  2.00         2.00      4.00             5.00             2.00               2.00
8           2.00  3.00         6.00      5.00             4.00             2.00               2.00
9           2.00  3.00         6.00      6.00             4.00             1.00               2.00
10          1.00  missing      15.00     7.00             6.00             1.00               2.00
11          2.00  4.00         3.00      4.00             3.00             2.00               2.00
12          2.00  5.00         4.00      6.00             4.00             2.00               2.00
13          1.00  6.00         9.00      6.00             5.00             2.00               1.00
14          1.00  6.00         8.00      3.00             2.00             2.00               2.00
15          1.00  6.00         5.00      5.00             4.00             1.00               2.00
16          2.00  4.00         3.00      4.00             3.00             2.00               2.00
17          1.00  6.00         9.00      5.00             3.00             1.00               1.00
18          1.00  4.00         4.00      5.00             4.00             1.00               2.00
19          1.00  7.00         14.00     6.00             6.00             1.00               1.00
20          2.00  6.00         6.00      6.00             4.00             2.00               2.00
21          1.00  6.00         9.00      4.00             2.00             2.00               2.00
22          1.00  5.00         5.00      5.00             4.00             2.00               1.00
23          2.00  3.00         2.00      4.00             2.00             2.00               2.00
24          1.00  7.00         15.00     6.00             6.00             1.00               1.00
25          2.00  6.00         6.00      5.00             3.00             1.00               2.00
26          1.00  6.00         13.00     6.00             6.00             1.00               1.00
27          2.00  5.00         4.00      5.00             5.00             1.00               1.00
28          2.00  4.00         2.00      3.00             2.00             2.00               2.00
29          1.00  4.00         4.00      5.00             3.00             1.00               2.00
30          1.00  3.00         3.00      7.00             5.00             1.00               2.00
Note: Sex is coded 1 = male, 2 = female. Familiarity is missing for respondent 10.

Case Problem
• To find the frequency distribution of familiarity with the
Internet among the sample.
• To find
– the mean, median & mode;
– the standard deviation; &
– the skewness & kurtosis of the familiarity rating with the Internet
among the sample.

Case Problem: Cross Tabulation


• To make a cross-table of gender and Internet usage.
• To find out
– whether there is any association between these
variables.

Parametric Test

Hypothesis Testing Related to Differences


• Parametric tests assume that the variables of interest are
measured on at least an interval scale.
• Nonparametric tests assume that the variables are measured on
a nominal or ordinal scale.
• Tests can be further classified based on whether one, two, or more
samples are involved.
• Samples are independent if they are drawn randomly from different
populations.
• Samples are paired when the data for the two samples relate to the
same group of respondents.

Snapshot of Hypothesis Testing for Difference

Hypothesis Tests
• Parametric Tests (Metric Tests)
– One Sample: t test; Z test
– Two or More Samples
  – Independent Samples: Two-group t test; Z test
  – Paired Samples: Paired t test
• Non-parametric Tests (Nonmetric Tests)
– One Sample: Chi-square; K-S; Runs; Binomial
– Two or More Samples
  – Independent Samples: Chi-square; Mann-Whitney
  – Paired Samples: Sign; Wilcoxon

Parametric Test
• One Sample Test
• Two independent Sample test
• Paired Sample test

One Sample Test

Problem: One Sample Test


• To test whether familiarity with the Internet is high (> 4) or
not (H0: μ ≤ 4; H1: μ > 4).
• We can use a one-sample t test or Z test, depending on the situation
(e.g., whether the population standard deviation is known).
• The result will show whether familiarity with the Internet is high
or not for the sample.

Problem: One Sample Test


One-Sample Statistics
              N    Mean   Std. Deviation   Std. Error Mean
Familiarity   29   4.72   1.579            .293

One-Sample Test (Test Value = 4)
              t       df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                               Lower    Upper
Familiarity   2.470   28   .020              .724              .12      1.32
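The output above can be reproduced from the raw Familiarity ratings. This minimal sketch uses only Python's standard library, which has no t distribution. The two-tailed 5% critical value for 28 df (about 2.048, taken from a t table) is therefore hard-coded as an assumption for the confidence interval. H1 here is directional (μ > 4) and t is positive, so the one-tailed p value is half the reported two-tailed .020, i.e., .010:

```python
import math
import statistics

familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
valid = [v for v in familiarity if v is not None]
TEST_VALUE = 4
T_CRIT_28 = 2.0484  # t(0.975, df = 28), taken from a t table

n = len(valid)
mean_diff = statistics.mean(valid) - TEST_VALUE
se = statistics.stdev(valid) / math.sqrt(n)  # standard error of the mean
t = mean_diff / se
ci = (mean_diff - T_CRIT_28 * se, mean_diff + T_CRIT_28 * se)
print(n, round(t, 3), n - 1, round(mean_diff, 3), [round(x, 2) for x in ci])
# 29 2.47 28 0.724 [0.12, 1.32]
```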

Two Independent Sample Test

Problem: Two Independent Sample Test


• To test whether the mean familiarity with the Internet for
males & females is the same or different.
• We can use a two-independent-sample t test or Z test,
depending on the situation.
• The result will show whether familiarity with the Internet is the same
or different for males & females.

Two Independent Sample Test


• In the case of means for two independent samples, the
hypotheses take the following form:
H0: μ1 = μ2
H1: μ1 ≠ μ2

Two Independent Sample Test: Variance Test


• An F test of sample variance may be performed if it is not
known whether the two populations have equal variance. In
this case, the hypotheses are:
H0: σ1² = σ2²
H1: σ1² ≠ σ2²

Two Independent Sample Test

Group Statistics: Familiarity
Sex      N    Mean   Std. Deviation   Std. Error Mean
Male     14   5.71   1.267            .339
Female   15   3.80   1.265            .327

Independent Samples Test: Familiarity
Levene's Test for Equality of Variances: F = .015, Sig. = .902

t-test for Equality of Means
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       4.070   27       .000              1.914             .470                    .949           2.879
Equal variances not assumed   4.070   26.857   .000              1.914             .470                    .949           2.880
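Both rows of the t-test output above can be reproduced from the raw data. The pooled-variance t applies when equal variances are assumed. Welch's t, with Welch-Satterthwaite degrees of freedom, applies when they are not. This minimal sketch does not reproduce Levene's test. The 5% two-tailed critical value for 27 df (about 2.052) is hard-coded from a t table as an assumption, because the standard library has no t distribution. The helper name is illustrative:

```python
import math
import statistics

sex = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, None, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
male = [f for s, f in zip(sex, familiarity) if s == 1 and f is not None]
female = [f for s, f in zip(sex, familiarity) if s == 2 and f is not None]
T_CRIT_27 = 2.0518  # t(0.975, df = 27), taken from a t table


def two_sample_t(x, y):
    n1, n2 = len(x), len(y)
    diff = statistics.mean(x) - statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df
    a, b = v1 / n1, v2 / n2
    se_welch = math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return {"diff": diff, "se_pooled": se_pooled, "t_pooled": diff / se_pooled,
            "df_pooled": n1 + n2 - 2, "t_welch": diff / se_welch, "df_welch": df_welch}


res = two_sample_t(male, female)
ci = (res["diff"] - T_CRIT_27 * res["se_pooled"], res["diff"] + T_CRIT_27 * res["se_pooled"])
print({k: round(v, 3) for k, v in res.items()}, [round(x, 3) for x in ci])
```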

2/10/2020

Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur


ravishekhar@xlri.ac.in

Session-8

Paired Sample Test

Problem: Paired Sample Test


• To test whether the mean attitude towards the internet & the mean attitude towards technology are the same or not.

• We can use a paired-sample t-test.

• The result will show whether attitude towards the internet & attitude towards technology are the same or different for the sample.

Paired Sample Test


• Differences in these cases are examined by a paired-samples t-test.
• To compute t for paired samples, the paired difference variable, denoted by D, is formed & its mean & variance are calculated. The t statistic is then computed as:
$t = \frac{\bar{D}}{s_D / \sqrt{n}}$
• Degrees of freedom are n - 1, where n is the number of pairs.

Paired Sample Test

Variable | Number of Cases | Mean | Standard Deviation | Standard Error
Internet Attitude | 30 | 5.167 | 1.234 | 0.225
Technology Attitude | 30 | 4.100 | 1.398 | 0.255

Difference = Internet - Technology

Difference Mean | Standard Deviation | Standard Error | Correlation | 2-tail prob. | t value | Degrees of freedom | 2-tail probability
1.067 | 0.828 | 0.1511 | 0.809 | 0.000 | 7.059 | 29 | 0.000
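The paired t value can be checked from the summary table. A minimal sketch, assuming only the reported standard deviations, the correlation of 0.809 and n = 30: the standard deviation of D follows from the variance of a difference of two correlated variables, and t is the mean difference over its standard error.

```python
import math

n = 30
sd_internet, sd_tech = 1.234, 1.398
r = 0.809                      # correlation between the two attitude ratings
mean_diff = 5.167 - 4.100      # D = Internet - Technology

# Var(D) = s1^2 + s2^2 - 2 * r * s1 * s2 for two correlated measures
sd_d = math.sqrt(sd_internet ** 2 + sd_tech ** 2 - 2 * r * sd_internet * sd_tech)
se_d = sd_d / math.sqrt(n)     # standard error of the mean difference
t = mean_diff / se_d
df = n - 1                     # number of pairs minus one
print(f"s_D = {sd_d:.3f}, SE = {se_d:.4f}, t = {t:.3f}, df = {df}")
```

The reconstructed s_D (about 0.828) and standard error (about 0.151) match the table, and t comes out within rounding of 7.059.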

Hypothesis Testing for Examining Differences

Hypothesis Tests
• Parametric Tests (Metric Tests)
– One Sample: t test; Z test
– Two or More Samples, Independent Samples: two-group t test; Z test
– Two or More Samples, Paired Samples: paired t test
• Non-parametric Tests (Nonmetric Tests)
– One Sample: Chi-Square; K-S; Runs; Binomial
– Two or More Samples, Independent Samples: Chi-Square; Mann-Whitney
– Two or More Samples, Paired Samples: Sign; Wilcoxon

Non-Parametric Test


Non-Parametric Tests
• Nonparametric tests are used when the independent variables are nonmetric.

• Like parametric tests, nonparametric tests are available for testing variables from one sample, two independent samples, or two related samples.

Non-Parametric Test: One Sample test


Non-Parametric Test: One Sample


• Sometimes the researcher wants to test whether the observations for a particular variable could reasonably have come from a particular distribution.
– Kolmogorov-Smirnov (K-S)
– Chi-square test
– Binomial test
– Runs test

One Sample: Non-Parametric Test


• The Kolmogorov-Smirnov (K-S) one-sample test is one such goodness-of-fit test.
• The chi-square test can be performed on a single variable from one sample. In this context, chi-square serves as a goodness-of-fit test.
• The binomial test is also a goodness-of-fit test, for dichotomous variables.
• The runs test is a test of randomness for dichotomous variables (to determine whether the order or sequence in which observations are obtained is random).

K-S One-Sample Test


• To test whether one variable comes from a particular distribution (theoretical distribution vs. observed distribution)

• Hypothesis
H0: Internet usage is normally distributed
H1: Internet usage is NOT normally distributed

K-S One-Sample Test

Descriptive Statistics

Variable | N | Mean | Std. Deviation | Minimum | Maximum
Internet Usage Hrs/Week | 30 | 6.60 | 4.296 | 2 | 15

One-Sample Kolmogorov-Smirnov Test

Internet Usage Hrs/Week
N | 30
Normal Parameters(a,b): Mean | 6.60
Normal Parameters(a,b): Std. Deviation | 4.296
Most Extreme Differences: Absolute | .222
Most Extreme Differences: Positive | .222
Most Extreme Differences: Negative | -.142
Test Statistic | .222
Asymp. Sig. (2-tailed) | .001(c)
a. Test distribution is Normal.
b. Calculated from data.
c. Lilliefors Significance Correction.
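The K-S statistic is the largest gap between the sample's empirical CDF and the theoretical CDF. The raw internet-usage data behind the SPSS output are not shown, so this minimal sketch is demonstrated on a tiny made-up sample. When the mean and standard deviation are estimated from the data (as in the output above, note b), D is computed the same way, but its p-value must come from the Lilliefors table rather than the ordinary K-S table.

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_one_sample_normal(data, mu=None, sigma=None):
    """One-sample K-S D against a normal distribution.
    If mu/sigma are None they are estimated from the data (Lilliefors case)."""
    xs = sorted(data)
    n = len(xs)
    if mu is None:
        mu = sum(xs) / n
    if sigma is None:
        sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    cdf = [normal_cdf(x, mu, sigma) for x in xs]
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf))   # ECDF above F
    d_minus = max(f - i / n for i, f in enumerate(cdf))        # F above ECDF
    # Mirrors SPSS "Most Extreme Differences": Absolute, Positive, Negative
    return {"absolute": max(d_plus, d_minus), "positive": d_plus, "negative": -d_minus}

# Hypothetical two-point sample tested against the standard normal
result = ks_one_sample_normal([-1.0, 1.0], mu=0.0, sigma=1.0)
print(result)
```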

K-S One-Sample Test

K-S Table

Lilliefors Test Table

One-Sample Chi-Square goodness of fit Test


• The chi-square goodness-of-fit test is a statistical procedure used to determine whether the observed frequencies at each level of one categorical variable are similar to or different from the frequencies expected at each level of that variable.

One-Sample Chi-Square goodness of fit Test


• Observed frequency vs. expected frequency

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

• Uniform distribution
H0: The ratings of familiarity with the internet are uniformly distributed.
H1: The ratings of familiarity with the internet are not uniformly distributed.

• Expected distribution
H0: The observed distribution is the same as the expected distribution.
H1: The observed distribution is not the same as the expected distribution.

Exercise: One-Sample Chi-Square Test


Familiarity is measured on a 1-7 scale (only six categories are used, as one category has a count of 0).
1. Test whether the rating is uniformly distributed.
2. Test whether the rating follows the expected distribution below (six categories, as one category has a count of 0):
– 10%
– 20%
– 20%
– 10%
– 25%
– 15%
– If the percentages across all categories do not sum to 100, they are rescaled so that they do.
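Both parts of the exercise can be worked with one small function. A minimal sketch; the observed counts are an assumption, not given on this slide: they are a tally of the 29 familiarity ratings (2 to 7) in the Mann-Whitney combined-ranking table of Session 9a, with the six expected percentages assigned to ratings 2 to 7 in order.

```python
def chi_square_gof(observed, expected_weights=None):
    """Chi-square goodness of fit; weights are rescaled to sum to 1 (uniform if None)."""
    n, k = sum(observed), len(observed)
    weights = expected_weights or [1] * k
    total = sum(weights)                          # rescaling step (e.g. % not summing to 100)
    expected = [n * w / total for w in weights]   # f_e for each category
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi2, k - 1, expected

# ASSUMED counts for ratings 2, 3, 4, 5, 6, 7 (tallied from the Session 9a ranking table)
observed = [2, 6, 6, 3, 8, 4]

chi2_uniform, df_uniform, _ = chi_square_gof(observed)
chi2_expected, df_expected, fe = chi_square_gof(observed, [10, 20, 20, 10, 25, 15])
print(f"uniform:  chi2 = {chi2_uniform:.3f}, df = {df_uniform}")
print(f"expected: chi2 = {chi2_expected:.3f}, df = {df_expected}, f_e = {fe}")
```

For reference, the 5% critical value of chi-square with 5 df is 11.07, so under this assumed tally neither null hypothesis would be rejected. Several expected counts fall below 5, so the approximation is rough here.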

Binomial Test
• Expected proportion (for testing a population proportion)

H0: p = 0.5

H1: p ≠ 0.5
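The binomial test of H0: p = 0.5 can be computed exactly from binomial coefficients. A minimal sketch; the example of 2 successes in 10 trials is hypothetical. Doubling the smaller tail is exact here only because p = 0.5 makes the distribution symmetric.

```python
from math import comb

def binomial_test_p05(k, n):
    """Exact two-sided binomial test of H0: p = 0.5 (k successes in n trials).
    With p = 0.5 the distribution is symmetric, so the two-sided p-value is
    twice the smaller tail, capped at 1."""
    lower = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))

print(binomial_test_p05(2, 10))   # 0.109375 (hypothetical: 2 successes in 10 trials)
```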

One-Sample Runs Test


• A "run" of a sequence is a maximal non-empty segment of the sequence consisting of adjacent equal elements.
– Ex: the 22-element sequence "++++−−−+++−−++++++−−−−" consists of 6 runs, 3 of which consist of "+" & the others of "−".

• The runs test is based on the null hypothesis that each element in the sequence is independently drawn from the same distribution.
• Test of randomness
H0: The observations in the sample are generated randomly.
H1: The observations in the sample are NOT generated randomly.
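The run count in the example, and the usual large-sample test built on it, can be sketched as follows. This is a minimal illustration using the standard normal approximation for the number of runs (expected runs 1 + 2n1n2/n); with groups as small as these, exact tables are preferable. The slide's sequence is typed with an ASCII "-" for the minus sign.

```python
import math

def runs_test(sequence):
    """Runs test for a two-symbol sequence, large-sample normal approximation."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("runs test needs exactly two distinct symbols")
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    n1, n2 = sequence.count(symbols[0]), sequence.count(symbols[1])
    n = n1 + n2
    expected = 1 + 2 * n1 * n2 / n
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return {"runs": runs, "counts": {symbols[0]: n1, symbols[1]: n2},
            "expected": expected, "z": z}

result = runs_test("++++---+++--++++++----")
print(result)
```

For this sequence there are 6 runs against about 11.6 expected, giving z of about -2.55; taken at face value, that would reject randomness at the 5% level because there are fewer runs than chance would produce.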

Non-parametric Test: Two Independent Sample test

Chi-Square Test for independence


Two Independent Sample: Nonparametric Test


Internet Usage | Gender: Male | Gender: Female | Row Total
Light (1) | 5 | 10 | 15
Heavy (2) | 10 | 5 | 15
Column Total | 15 | 15 | 30

Number of males & females who use the Internet for shopping.

• Exercise:
Is the proportion of respondents using the Internet for shopping independent of gender (the same for males & females)?

Two Independent Sample: Nonparametric Test


• Chi-square test for independence: a statistical procedure to determine whether the frequencies observed at the combinations of levels of two categorical variables are similar to the frequencies expected.

• The null hypothesis (H0) of NO association between the two variables is rejected only when the calculated value of the test statistic is greater than the critical value of the chi-square distribution with the appropriate degrees of freedom.
– An important characteristic of the chi-square statistic is the df associated with it: df = (r - 1) x (c - 1).

Two Independent Sample: Nonparametric Test

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

$f_e = \frac{n_r n_c}{n}$

where n_r = total number in the row
n_c = total number in the column
n = total sample size
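The two formulas above translate directly into code for any r x c table. A minimal sketch applied to the usage-by-gender table (rows: light, heavy; columns: male, female):

```python
def chi_square_independence(table):
    """Pearson chi-square for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # f_e = n_r * n_c / n for every cell
    expected = [[nr * nc / n for nc in col_totals] for nr in row_totals]
    chi2 = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

usage_by_gender = [[5, 10], [10, 5]]
chi2, df, expected = chi_square_independence(usage_by_gender)
print(f"chi2 = {chi2:.3f}, df = {df}, expected = {expected}")
```

Every expected count is 7.5 and chi-square is about 3.333 with 1 df. Since this is below the 5% critical value of 3.841, H0 of no association is not rejected, consistent with the p-value of .068 in the SPSS output.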

Two Independent Sample: Nonparametric Test


Chi-Square Tests

Test | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided)
Pearson Chi-Square | 3.333(a) | 1 | .068 | |
Continuity Correction(b) | 2.133 | 1 | .144 | |
Likelihood Ratio | 3.398 | 1 | .065 | |
Fisher's Exact Test | | | | .143 | .072
Linear-by-Linear Association | 3.222 | 1 | .073 | |
N of Valid Cases | 30 | | | |
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 7.50.
b. Computed only for a 2x2 table.
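The other rows of the SPSS output can be reproduced from the same 2 x 2 table. A minimal sketch using the standard formulas: Yates' continuity correction, the likelihood-ratio statistic G² = 2 Σ f_o ln(f_o / f_e), linear-by-linear association (n - 1)r² (for a 2 x 2 table r² equals φ² = χ²/n), and Fisher's exact test from the hypergeometric distribution. The two-sided Fisher value here sums every table whose probability is no larger than the observed one's.

```python
import math
from math import comb

observed = [[5, 10], [10, 5]]          # rows: light, heavy; columns: male, female
n = 30
expected = [[7.5, 7.5], [7.5, 7.5]]    # n_r * n_c / n = 15 * 15 / 30 for every cell
cells = [(fo, fe) for o_row, e_row in zip(observed, expected) for fo, fe in zip(o_row, e_row)]

pearson = sum((fo - fe) ** 2 / fe for fo, fe in cells)
yates = sum((abs(fo - fe) - 0.5) ** 2 / fe for fo, fe in cells)   # continuity correction
g2 = 2 * sum(fo * math.log(fo / fe) for fo, fe in cells)         # no zero cells here
linear_by_linear = (n - 1) * pearson / n                          # (n - 1) * phi^2

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for [[a, b], [c, d]] with fixed margins."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    def prob(x):                        # hypergeometric probability of cell a = x
        return comb(r1, x) * comb(r2, c1 - x) / total
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    p_obs = prob(a)
    lower = sum(prob(x) for x in support if x <= a)
    upper = sum(prob(x) for x in support if x >= a)
    two_sided = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(lower, upper), min(1.0, two_sided)

one_sided, two_sided = fisher_exact_2x2(5, 10, 10, 5)
print(f"Pearson {pearson:.3f}, Yates {yates:.3f}, LR {g2:.3f}, "
      f"linear-by-linear {linear_by_linear:.3f}, Fisher {two_sided:.3f} / {one_sided:.3f}")
```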

Strength of Association in Cross-Tabulation


• The phi coefficient (φ) is used as a measure of the strength of association in the special case of a table with two rows & two columns (a 2 x 2 table).

$\phi = \sqrt{\frac{\chi^2}{n}}$

Strength of Association in Cross-Tabulation


• While the phi coefficient is specific to a 2 x 2 table, the contingency coefficient (C) can be used to assess the strength of association in a table of any size. It can be applied to square tables.

$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$

• The contingency coefficient varies between 0 & 1.

• The maximum value of the contingency coefficient depends on the size of the table (number of rows & number of columns). For this reason, it should be used only to compare tables of the same size.

Strength of Association in Cross-Tabulation


• Cramer's V is a modified version of the phi correlation coefficient & is used in tables larger than 2 x 2. It can be used for rectangular tables.

$V = \sqrt{\frac{\phi^2}{\min(r-1,\; c-1)}}$

$V = \sqrt{\frac{\chi^2 / n}{\min(r-1,\; c-1)}}$
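The three strength-of-association measures are one-liners once chi-square is known. A minimal sketch using the Pearson chi-square of 3.333 (exactly 10/3) from the 2 x 2 usage-by-gender table; the helper for the maximum of C on a square k x k table, sqrt((k - 1)/k), illustrates why C should only be compared across tables of the same size.

```python
import math

def phi(chi2, n):
    return math.sqrt(chi2 / n)

def contingency_coefficient(chi2, n):
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, rows, cols):
    return math.sqrt((chi2 / n) / min(rows - 1, cols - 1))

def max_contingency_coefficient(k):
    """Upper bound of C for a square k x k table."""
    return math.sqrt((k - 1) / k)

chi2, n = 10 / 3, 30
print(phi(chi2, n), contingency_coefficient(chi2, n), cramers_v(chi2, n, 2, 2))
print(max_contingency_coefficient(2), max_contingency_coefficient(5))
```

For a 2 x 2 table Cramer's V reduces to phi (about 0.333 here), while C is about 0.316 and can never exceed about 0.707 in a 2 x 2 table.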
2/14/2020

Business Research Method


Session-9a

Non-Parametric Test: One Sample test



Non-parametric Test: Two Independent Sample test

Mann-Whitney U test


Two Independent Sample: Nonparametric Test


• When the difference in location of two populations is to be compared based on observations from two independent samples, & the variable is measured on an ordinal scale, the Mann-Whitney U test can be used.
• In the Mann-Whitney U test, the two samples are combined & the cases are ranked in order of increasing size.
– The ranking of the combined cases is then assessed group by group.
• Hypothesis
H0: The two populations (male & female) are identical with respect to familiarity with the internet. (The mean ranks of familiarity are the same for the two populations.)

H1: The two populations (male & female) are not identical with respect to familiarity with the internet. (The mean ranks of familiarity are not the same for the two populations.)

Mann-Whitney U test: Ranking of Combined Cases

Familiarity Rating | Rank | Group | Familiarity Rating | Rank | Group
2 | 1.5 | 2 | 5 | 16 | 2
2 | 1.5 | 2 | 5 | 16 | 2
3 | 5.5 | 1 | 6 | 21.5 | 1
3 | 5.5 | 2 | 6 | 21.5 | 1
3 | 5.5 | 2 | 6 | 21.5 | 1
3 | 5.5 | 2 | 6 | 21.5 | 1
3 | 5.5 | 2 | 6 | 21.5 | 1
3 | 5.5 | 2 | 6 | 21.5 | 1
4 | 11.5 | 1 | 6 | 21.5 | 2
4 | 11.5 | 1 | 6 | 21.5 | 2
4 | 11.5 | 2 | 7 | 27.5 | 1
4 | 11.5 | 2 | 7 | 27.5 | 1
4 | 11.5 | 2 | 7 | 27.5 | 1
4 | 11.5 | 2 | 7 | 27.5 | 1
5 | 16 | 1 | | |
Group: 1 = Male, 2 = Female (gender)

Mann-Whitney U: Familiarity by Gender

Ranks

Familiarity
Sex | N | Mean Rank | Sum of Ranks
Male | 14 | 20.25 | 283.50
Female | 15 | 10.10 | 151.50
Total | 29 | |

Test Statistics(a)

Familiarity
Mann-Whitney U | 31.500
Wilcoxon W | 151.500
Z | -3.277
Asymp. Sig. (2-tailed) | .001
Exact Sig. [2*(1-tailed Sig.)] | .001(b)
a. Grouping Variable: Sex
b. Not corrected for ties.
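The SPSS results can be reproduced from the combined-ranking table above. A minimal sketch; the two rating lists are re-keyed from that table, reading group 1 as male and group 2 as female (our inference, and the mapping that reproduces the SPSS rank sums). Ties receive average ranks, U is computed from the male rank sum, and z uses the usual tie-corrected variance.

```python
import math
from collections import Counter

def mann_whitney(x, y):
    """Mann-Whitney U with average ranks for ties and a tie-corrected z."""
    combined = Counter(x + y)
    rank_of, pos = {}, 1
    for value in sorted(combined):            # rank the combined cases
        t = combined[value]
        rank_of[value] = pos + (t - 1) / 2    # average of the tied positions
        pos += t
    n1, n2 = len(x), len(y)
    n = n1 + n2
    r1 = sum(rank_of[v] for v in x)
    r2 = sum(rank_of[v] for v in y)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u = min(u1, n1 * n2 - u1)
    ties = sum(t ** 3 - t for t in combined.values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return {"R1": r1, "R2": r2, "U": u, "z": z}

male = [3, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7]
female = [2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6]
result = mann_whitney(male, female)
print(result, result["R1"] / len(male), result["R2"] / len(female))
```

This gives rank sums of 283.5 and 151.5 (mean ranks 20.25 and 10.10), U = 31.5 and z of about -3.277, matching the output; SPSS's Wilcoxon W is the female rank sum here.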

Paired Sample: Nonparametric Test


Paired Sample: Nonparametric Test


• The Wilcoxon matched-pairs signed-ranks test analyzes the differences between paired observations, taking into account the magnitude of the differences.
– It computes the differences between pairs of variables & ranks the absolute differences.

• Hypothesis (Md: median of the paired differences)
H0: Md = 0
H1: Md ≠ 0

Wilcoxon matched-pairs signed-ranks: Attitude

Respondent | Attitude toward internet | Attitude toward technology | Difference (Technology - Internet) | Difference Rank | Sign of Rank
2 | 3 | 3 | 0 | 0 |
5 | 7 | 7 | 0 | 0 |
19 | 6 | 6 | 0 | 0 |
24 | 6 | 6 | 0 | 0 |
26 | 6 | 6 | 0 | 0 |
27 | 5 | 5 | 0 | 0 |
7 | 4 | 5 | 1 | 7.5 | -
1 | 7 | 6 | -1 | 7.5 | +
3 | 4 | 3 | -1 | 7.5 | +
6 | 5 | 4 | -1 | 7.5 | +
8 | 5 | 4 | -1 | 7.5 | +
10 | 7 | 6 | -1 | 7.5 | +
11 | 4 | 3 | -1 | 7.5 | +
13 | 6 | 5 | -1 | 7.5 | +
14 | 3 | 2 | -1 | 7.5 | +
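The signed-rank logic can be sketched in a few lines: drop zero differences, rank the absolute differences (averaging ties), and sum the ranks separately for positive and negative differences. As an illustration only, it is applied to the 15 respondents listed in the table above; the full sample has 30, so these sums are not the SPSS result.

```python
from collections import Counter

def wilcoxon_signed_rank(pairs):
    """Signed-rank sums for (before, after) pairs; zero differences are dropped."""
    diffs = [after - before for before, after in pairs if after != before]
    counts = Counter(abs(d) for d in diffs)
    rank_of, pos = {}, 1
    for value in sorted(counts):               # average rank for tied |d|
        t = counts[value]
        rank_of[value] = pos + (t - 1) / 2
        pos += t
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus, len(diffs)

# (attitude toward internet, attitude toward technology) for the 15 listed respondents
rows = [(3, 3), (7, 7), (6, 6), (6, 6), (6, 6), (5, 5), (4, 5), (7, 6),
        (4, 3), (5, 4), (5, 4), (7, 6), (4, 3), (6, 5), (3, 2)]
w_plus, w_minus, n_used = wilcoxon_signed_rank(rows)
print(w_plus, w_minus, n_used)
```

In this subset all nine nonzero differences have |d| = 1, so each gets the average rank 5; the single positive difference (Technology > Internet) contributes 5 and the eight negative ones contribute 40.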

Paired Sample: Nonparametric Test


Paired Sample: Nonparametric Test


• The paired-sample sign test analyzes the differences between paired observations, taking into account only the signs of the differences.

• Hypothesis
H0: number of pluses = number of minuses
H1: number of pluses ≠ number of minuses

Paired Sample: Nonparametric Test


Paired sample Sign test

Frequencies

Attitude toward Technology - Attitude toward Internet | N
Negative Differences(a) | 23
Positive Differences(b) | 1
Ties(c) | 6
Total | 30
a. Attitude toward Technology < Attitude toward Internet
b. Attitude toward Technology > Attitude toward Internet
c. Attitude toward Technology = Attitude toward Internet

Test Statistics(a)

Attitude toward Technology - Attitude toward Internet | Exact Sig. (2-tailed) .000(b)
a. Sign Test
b. Binomial distribution used.
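The sign test is an exact binomial test on the nonzero differences with p = 0.5. A minimal sketch using the counts in the output above (23 negative, 1 positive; the 6 ties are excluded):

```python
from math import comb

def sign_test(n_negative, n_positive):
    """Exact two-sided sign test; ties must already be excluded.
    Under H0 each nonzero difference is + or - with probability 0.5."""
    n = n_negative + n_positive
    k = min(n_negative, n_positive)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)

p = sign_test(23, 1)
print(f"exact two-sided p = {p:.2e}")
```

The exact p-value is 2 x 25 / 2^24, about 3 x 10^-6, which SPSS displays as .000.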



Business Research Method


Session-9b

Causality

• Concept of Causality in Research
– X is only one of a number of possible causes of Y.
– The occurrence of X makes the occurrence of Y more probable (X is a probabilistic cause of Y).

• Conditions for Causality
– Concomitant variation is the extent to which a cause, X, & an effect, Y, occur together or vary together in the way predicted by the hypothesis under consideration.
– The time order of occurrence condition states that the causing event must occur either before or simultaneously with the effect; it cannot occur afterwards.
– Absence of other possible causal factors means that the factor or variable being investigated should be the only possible causal explanation.

Definitions of Terms

• Independent variables
– Variables or alternatives that are manipulated & whose effects
are measured & compared, e.g., price levels.
• Test units
– Individuals, organizations, or other entities whose response to
the independent variables or treatments is being examined,
e.g., consumers or stores.
• Dependent variables
– Variables which measure effect of independent variables on test
units, e.g., sales, profits, market shares.
• Extraneous variables
– Variables other than independent variables that affect response
of test units, e.g., store size, store location, competitive effort.

Experiment & Experimental Design

• Experiment
– Process of manipulating one or more independent variables & measuring their effect on one or more dependent variables, while controlling for the extraneous variables

• Experimental design is a set of procedures specifying:
– test units & how these units are to be divided into homogeneous subsamples
– what independent variables or treatments are to be manipulated
– what dependent variables are to be measured
– how extraneous variables are to be controlled

Illustration:
• Whether humor has a positive effect on the purchase intention of products that are purchased impulsively.

Validity in Experiment

• Internal validity refers to whether manipulation of the independent variables or treatments actually caused the observed effects on the dependent variables.
– Did the manipulation of the independent variable (e.g., humor) do what it was supposed to do?
– Control of extraneous variables is a necessary condition for establishing internal validity.

• External validity refers to whether the cause-and-effect relationships found in the experiment can be generalized.
– To what populations, settings, times, independent variables, & dependent variables can the results be projected?

Laboratory Experiment Vs. Field Experiment

Extraneous Variables: Sources

• History refers to specific events that are external to the experiment but occur at the same time as the experiment.
• Maturation refers to changes in the test units themselves that occur with the passage of time.
• Testing effects are caused by the process of experimentation. These are the effects on the experiment of taking a measure on the dependent variable before & after presentation of the treatment.
• Instrumentation refers to changes in the measuring instrument, in the observers, or in the scores themselves.
• Selection bias refers to improper assignment of test units to treatment conditions.
• Mortality refers to loss of test units while the experiment is in progress.

Control of Extraneous Variables

• Randomization refers to random assignment of test units to experimental groups by using random numbers. Treatment conditions are also randomly assigned to experimental groups.
• Matching involves comparing test units on a set of key background variables before assigning them to treatment conditions.
• Design control involves the use of experiments designed to control specific extraneous variables.
• Statistical control involves measuring extraneous variables & adjusting for their effects through statistical analysis.

Limitations of Experiment

• Experiments can be time consuming, particularly if the researcher is interested in measuring long-term effects.
• Experiments are often expensive. The requirements of an experimental group, a control group, & multiple measurements add significantly to the cost of research.
• Experiments can be difficult to administer. It may be impossible to control for the effects of extraneous variables, particularly in a field environment.
• Competitors may deliberately contaminate the results of a field experiment.

Experimental Design

One-Shot Case Study

X O1
• A single group of test units is exposed to a treatment X.
• A single measurement on the dependent variable is taken.
• There is no random assignment of test units.
• The one-shot case study is more appropriate for exploratory than for conclusive research.

Note:
X: Exposure to a treatment
O: Observation

One-Group Pretest-Posttest Design

O1 X O2
• A group of test units is measured twice.
• There is no control group.
• The treatment effect is computed as O2 - O1.
• The validity of this conclusion is questionable since extraneous variables are largely uncontrolled.

Note:
X: Exposure to a treatment
O: Observation

Static Group Design

EG: X O1
CG: O2
• A two-group experimental design.
• The EG is exposed to the treatment, & the CG is not.
• Measurements on both groups are made only after the treatment.
• Test units are not assigned at random.
• The treatment effect would be measured as O1 - O2.
Note
EG: Experimental group (EG)
CG: Control group (CG)
X: Exposure to a treatment
O: Observation

Pretest-Posttest Control Group Design

EG: R O1 X O2
CG: R O3 O4

• Test units are randomly assigned to either the EG or the CG.
• A pretreatment measure is taken on each group.
• The treatment effect is measured as: (O2 - O1) - (O4 - O3).

Note
EG: Experimental group (EG)
CG: Control group (CG)
R: Randomization
X: Exposure to a treatment

Posttest-Only Control Group Design

EG: R X O1
CG: R O2

• The treatment effect is obtained by: TE = O1 - O2
• Except for pre-measurement, implementation of this design is very similar to that of the pretest-posttest control group design.

Note
EG: Experimental group (EG)
CG: Control group (CG)
R: Randomization
X: Exposure to a treatment
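The treatment-effect (TE) formulas of the designs above can be written as small functions, which makes it easy to see what a control group buys. A minimal sketch; the observation scores are hypothetical numbers chosen only for illustration.

```python
# Treatment-effect (TE) formulas for the experimental designs in this session.
# O1..O4 are observation scores; the values used below are hypothetical.

def te_one_group_pretest_posttest(o1, o2):
    """O1 X O2: change in the single group (extraneous variables uncontrolled)."""
    return o2 - o1

def te_static_group(o1, o2):
    """EG: X O1 / CG: O2 (no randomization): post-treatment gap between groups."""
    return o1 - o2

def te_pretest_posttest_control(o1, o2, o3, o4):
    """EG: R O1 X O2 / CG: R O3 O4: change in EG minus change in CG."""
    return (o2 - o1) - (o4 - o3)

def te_posttest_only_control(o1, o2):
    """EG: R X O1 / CG: R O2: post-treatment gap between randomized groups."""
    return o1 - o2

# Hypothetical scores: both groups drift up by 4 points without treatment
# (e.g., history or maturation), and the EG gains an extra 8 points.
naive = te_one_group_pretest_posttest(50, 62)              # includes the drift
controlled = te_pretest_posttest_control(50, 62, 51, 55)   # drift removed
print(naive, controlled)
```

With these made-up numbers the one-group design reports 12, whereas the pretest-posttest control group design attributes only 8 to the treatment, because the control group's change of 4 is netted out.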

Randomized Block Design

• Test units are blocked, or grouped, on the basis of an external variable.

• By blocking, the researcher ensures that the various experimental & control groups are matched closely on the external variable.

• It is useful when there is only one major external variable, such as store size, that might influence the dependent variable.

Randomized Block Design

Block Number | Store Patronage | Treatment Group: Commercial A | Commercial B | Commercial C
1 | Heavy | A | B | C
2 | Medium | A | B | C
3 | Low | A | B | C
4 | None | A | B | C

Factorial Design

• It is used to measure the effects of two or more independent variables at various levels.
• A factorial design may also be conceptualized as a table.
• In a two-factor design, each level of one variable represents a row & each level of another variable represents a column.

Factorial Design

Amount of Store Information (rows) by Amount of Humor (columns)

Amount of Store Information | No Humor | Medium Humor | High Humor
Low | A | B | C
Medium | D | E | F
High | G | H | I

One way ANOVA


ANOVA: Introduction

• Analysis of variance (ANOVA) is used as a test of means for two or more populations.
– The null hypothesis is that all means are equal.

• ANOVA must have a dependent variable that is metric (measured using an interval or ratio scale).

• There must also be one or more independent variables that are all categorical (nonmetric).

ANOVA: Introduction

• Categorical independent variables are also called factors.
• A particular combination of factor levels, or categories, is called a treatment.
• One-way ANOVA involves only one categorical variable, or a single factor. In one-way ANOVA, a treatment is the same as a factor level.
• If two or more factors are involved, the analysis is termed n-way ANOVA.
• If the set of independent variables consists of both categorical & metric variables, the technique is Analysis of Covariance (ANCOVA).
– In this case, categorical independent variables are still referred to as factors, whereas metric independent variables are referred to as covariates.

Statistics Associated with One-Way ANOVA

• SSbetween. Also denoted as SSx, this is the variation in Y related to the variation in the means of the categories of X. It represents the variation between categories of X, or the portion of the sum of squares in Y related to X.
• SSwithin. Also referred to as SSerror, this is the variation in Y due to the variation within each of the categories of X. This variation is not accounted for by X.
• SSy. This is the total variation in Y.

Decomposition of Total Variation

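• Independent variable X has categories X1, X2, X3, …, Xc; each category contains observations Y1, Y2, …, Yn, & the total sample contains all N observations.
• The category means are Ȳ1, Ȳ2, Ȳ3, …, Ȳc; the mean of the total sample is Ȳ.
• Within-category variation = SSwithin
• Between-category variation = SSbetween
• Total variation = SSy

$SS_y = SS_{between} + SS_{within}$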

Statistics Associated with One-way ANOVA

• F statistic. The null hypothesis that the category means are equal in the population is tested by an F statistic based on the ratio of the mean square related to X & the mean square related to error.
• Mean square: sum of squares divided by the appropriate degrees of freedom.
• eta² (η²). The strength of the effects of X on Y is measured by η², which varies between 0 & 1. It is calculated as:

$\eta^2 = \frac{SS_x}{SS_y}$

Conducting One-Way ANOVA

• The null hypothesis may be tested by the F statistic based on the ratio between these two estimates:

$F = \frac{SS_x/(c-1)}{SS_{error}/(N-c)} = \frac{MS_x}{MS_{error}}$

• This statistic follows the F distribution, with (c - 1) & (N - c) degrees of freedom (df).
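The decomposition and the F ratio above can be computed directly from raw data. A minimal sketch on hypothetical data (three categories of X with three observations of Y each); SSy is computed independently so the identity SSy = SSx + SSerror can be checked.

```python
def one_way_anova(groups):
    """One-way ANOVA from raw data: SSy decomposed into SSx (between) and SSerror (within)."""
    values = [y for g in groups for y in g]
    n_total, c = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_x = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_y = sum((y - grand_mean) ** 2 for y in values)    # total variation
    ms_x = ss_x / (c - 1)
    ms_error = ss_error / (n_total - c)
    return {"SSx": ss_x, "SSerror": ss_error, "SSy": ss_y,
            "F": ms_x / ms_error, "df": (c - 1, n_total - c), "eta2": ss_x / ss_y}

# Hypothetical data: three categories of X, three observations each
result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)
```

Here SSx = 54, SSerror = 6 and SSy = 60, so F = (54/2)/(6/6) = 27 on (2, 6) df and η² = 0.9.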

Interpret Results

• If the null hypothesis of equal category means is not rejected, then the independent variable does not have a significant effect on the dependent variable.

• On the other hand, if the null hypothesis is rejected, then the effect of the independent variable is significant.

• A comparison of the category mean values will indicate the nature of the effect of the independent variable.

One way ANOVA: Exercise

Effect of Promotion or Coupon on Sales

Illustrative Applications of One-way ANOVA

• A department store wants to determine the effect of in-store promotion (X) on sales (Y).

The null hypothesis is that the category means are equal:
H0: µ1 = µ2 = µ3.

One-Way ANOVA: Effect of In-store Promotion on Store Sales

Source of Variation | Sum of squares | df | Mean square | F ratio | F prob.
Between groups (Promotion) | 106.067 | 2 | 53.033 | 17.944 | 0.000
Within groups (Error) | 79.800 | 27 | 2.956 | |
TOTAL | 185.867 | 29 | 6.409 | |

Cell means

Level of Promotion | Count | Mean
High (1) | 10 | 8.300
Medium (2) | 10 | 6.200
Low (3) | 10 | 3.700
TOTAL | 30 | 6.067
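The between-groups line of this table can be rebuilt from the cell means and counts alone, with the within-groups SS (79.800) taken from the table. A minimal sketch; it also uses the closed form P(F > f) = (1 + 2f/df2)^(-df2/2), which holds only when the numerator has exactly 2 df, as it does here with three promotion levels.

```python
counts = [10, 10, 10]            # high, medium, low promotion
means = [8.300, 6.200, 3.700]    # cell means from the output
ss_error = 79.800                # within-groups SS from the table

n, c = sum(counts), len(counts)
grand_mean = sum(k * m for k, m in zip(counts, means)) / n
ss_x = sum(k * (m - grand_mean) ** 2 for k, m in zip(counts, means))
f_ratio = (ss_x / (c - 1)) / (ss_error / (n - c))
eta2 = ss_x / (ss_x + ss_error)
# Exact upper-tail probability for an F distribution with 2 numerator df
p_value = (1 + 2 * f_ratio / (n - c)) ** (-(n - c) / 2)
print(f"SSx = {ss_x:.3f}, F = {f_ratio:.3f}, eta2 = {eta2:.3f}, p = {p_value:.1e}")
```

This recovers SSx of about 106.067 and F of about 17.944, gives η² of about 0.571, and a p-value near 10^-5, shown as 0.000 in the table.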

Issues in Interpretation: Multiple Comparisons

• If the null hypothesis of equal means is rejected, we can only conclude that not all of the group means are equal. We may wish to examine differences among specific means.

• This can be done by specifying appropriate contrasts, or comparisons used to determine which of the means are statistically different.

Issues in Interpretation: Multiple Comparisons

• A posteriori contrasts are made after the analysis. These are generally multiple comparison tests.
• These tests, in order of decreasing power, include least significant difference, Duncan's multiple range test, Student-Newman-Keuls, Tukey's alternate procedure, honestly significant difference, modified least significant difference, & Scheffe's test.

• Of these tests, ‘least significant difference’ is the most powerful.

Assumptions in ANOVA

• Ordinarily, the categories of the independent variable are assumed to be fixed. Inferences are made only to the specific categories considered. This is referred to as the fixed-effects model.

• The error term is normally distributed, with a zero mean & a constant variance.

• The error is NOT related to any of the categories of X.

• The error terms are uncorrelated. If the error terms are correlated (i.e., the observations are not independent), the F ratio can be seriously distorted.

Thank You

22-Feb-20

Business Research Method


Session-10

Two / n-way ANOVA



Two-way ANOVA

Source of Variation | Sum of squares | df | Mean square | F | Sig. of F | ω²
Main Effects
Promotion | 106.067 | 2 | 53.033 | 54.862 | 0.000 | 0.557
Coupon | 53.333 | 1 | 53.333 | 55.172 | 0.000 | 0.280
Combined | 159.400 | 3 | 53.133 | 54.966 | 0.000 |
Two-way interaction | 3.267 | 2 | 1.633 | 1.690 | 0.206 |
Model | 162.667 | 5 | 32.533 | 33.655 | 0.000 |
Residual (error) | 23.200 | 24 | 0.967 | | |
TOTAL | 185.867 | 29 | 6.409 | | |

Two-way ANOVA

Cell Means

Promotion | Coupon | Count | Mean
High | Yes | 5 | 9.200
High | No | 5 | 7.400
Medium | Yes | 5 | 7.600
Medium | No | 5 | 4.800
Low | Yes | 5 | 5.400
Low | No | 5 | 2.000
TOTAL | | 30 |

Factor Level Means

Promotion | Coupon | Count | Mean
High | | 10 | 8.300
Medium | | 10 | 6.200
Low | | 10 | 3.700
| Yes | 15 | 7.400
| No | 15 | 4.733
Grand Mean | | 30 | 6.067
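Because the design is balanced (5 stores per cell), the main-effect and interaction sums of squares in the two-way table follow from the cell means alone. A minimal sketch; the residual SS (23.200 on 24 df) is taken from the ANOVA table, and the interaction p-value uses the exact closed form for an F distribution with 2 numerator df.

```python
# Balanced two-way ANOVA sums of squares from cell means (5 stores per cell).
cell_means = {
    ("High", "Yes"): 9.2, ("High", "No"): 7.4,
    ("Medium", "Yes"): 7.6, ("Medium", "No"): 4.8,
    ("Low", "Yes"): 5.4, ("Low", "No"): 2.0,
}
n_cell = 5
promotions = ["High", "Medium", "Low"]
coupons = ["Yes", "No"]

grand = sum(cell_means.values()) / len(cell_means)
promo_mean = {p: sum(cell_means[p, c] for c in coupons) / len(coupons) for p in promotions}
coupon_mean = {c: sum(cell_means[p, c] for p in promotions) / len(promotions) for c in coupons}

ss_promotion = n_cell * len(coupons) * sum((promo_mean[p] - grand) ** 2 for p in promotions)
ss_coupon = n_cell * len(promotions) * sum((coupon_mean[c] - grand) ** 2 for c in coupons)
# Interaction: what remains in each cell after removing both main effects
ss_interaction = n_cell * sum(
    (cell_means[p, c] - promo_mean[p] - coupon_mean[c] + grand) ** 2
    for p in promotions for c in coupons)

ms_error = 23.200 / 24                      # residual SS / df from the table
f_interaction = (ss_interaction / 2) / ms_error
p_interaction = (1 + 2 * f_interaction / 24) ** (-24 / 2)   # exact for 2 numerator df
print(ss_promotion, ss_coupon, ss_interaction, f_interaction, p_interaction)
```

The sums of squares come out at about 106.067, 53.333 and 3.267. For the interaction, F of about 1.690 on (2, 24) df gives p of about 0.206, so the interaction is not significant.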

Issues in Interpretation

• Multiple comparisons
• Interaction effects
• Relative importance of factors

Issues in Interpretation: Interaction effects

• An interaction occurs when the effect of one independent variable on the dependent variable is NOT the same at the different levels of another independent variable.
– The effect of one independent variable (on a dependent variable) depends on the level of another independent variable.

Example:

• Medicines A & B may have no effect when either is taken alone. But the two together may have an effect. “The whole is different from the sum of the parts.”

• Good teachers & small classrooms might both encourage learning. A good teacher in a small classroom might be especially effective.

Patterns of Interaction

(Figure: four panels plotting Y against the levels X11, X12 & X13 of one factor, with separate lines for levels X21 & X22 of the other factor. Case 1: No Interaction; Case 2: Interaction; Case 3: Interaction; Case 4: Interaction.)

Issues in Interpretation: Relative Importance

• Omega squared (ω²): it indicates what proportion of the variation in the dependent variable is related to a particular independent variable or factor, & is calculated as follows:

$\omega_x^2 = \frac{SS_x - (df_x \times MS_{error})}{SS_y + MS_{error}}$

• Normally, ω² is interpreted only for statistically significant effects.

• For in-store promotion:

$\omega_p^2 = \frac{106.067 - (2 \times 0.967)}{185.867 + 0.967} = 0.557$

Issues in Interpretation: Relative Importance

• Likewise, ω² associated with couponing is:

$\omega_c^2 = \frac{53.333 - (1 \times 0.967)}{185.867 + 0.967} = 0.280$

• As a guide to interpreting ω², a large experimental effect produces an index of 0.15 or greater, a medium effect produces an index of around 0.06, & a small effect produces an index of 0.01.
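The ω² values in the two-way ANOVA table can be reproduced with the formula above, using the residual mean square (0.967) and total SS (185.867) from that table. A minimal sketch:

```python
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """omega^2 = (SS_effect - df_effect * MS_error) / (SS_total + MS_error)"""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

ms_error, ss_total = 0.967, 185.867          # from the two-way ANOVA table
omega_promotion = omega_squared(106.067, 2, ms_error, ss_total)
omega_coupon = omega_squared(53.333, 1, ms_error, ss_total)
print(round(omega_promotion, 3), round(omega_coupon, 3))
```

Both values (about 0.557 and 0.280) exceed the 0.15 benchmark for a large effect.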

ANCOVA


Analysis of Covariance (ANCOVA)

• When examining the differences in the mean values of the dependent variable related to the effect of controlled independent variables, it is often necessary to take into account the influence of uncontrolled independent variables.

ANCOVA: Examples

• In determining how different groups exposed to different commercials evaluate a brand, it may be necessary to control for prior knowledge.

• In determining how different price levels will affect a household's cereal consumption, it may be essential to take household size into account.

ANCOVA: Illustration

• Suppose we want to determine the effect of in-store promotion & couponing on sales while controlling for the effect of clientele.

Analysis of Covariance

Source of Variation | Sum of Squares | df | Mean Square | F | Sig. of F
Covariance
Clientele | 0.838 | 1 | 0.838 | 0.862 | 0.363
Main effects
Promotion | 106.067 | 2 | 53.033 | 54.546 | 0.000
Coupon | 53.333 | 1 | 53.333 | 54.855 | 0.000
Combined | 159.400 | 3 | 53.133 | 54.649 | 0.000
2-Way Interaction
Promotion * Coupon | 3.267 | 2 | 1.633 | 1.680 | 0.208
Model | 163.505 | 6 | 27.251 | 28.028 | 0.000
Residual (Error) | 22.362 | 23 | 0.972 | |
TOTAL | 185.867 | 29 | 6.409 | |

Covariate | Raw Coefficient
Clientele | -0.078

MANOVA


Multivariate Analysis of Variance (MANOVA)

• MANOVA is similar to ANOVA, except that instead of one metric dependent variable, we have two or more.

• In MANOVA, the null hypothesis is that the means on multiple dependent variables are equal across groups.

• MANOVA is appropriate when there are two or more dependent variables that are correlated.

• If, however, there are multiple dependent variables that are uncorrelated or orthogonal, ANOVA on each of the dependent variables is more appropriate.

MANOVA: Example

• Suppose that four groups, each consisting of 100 randomly selected individuals, were exposed to four different commercials about Tide detergent.

• After seeing the commercial, each individual provided ratings on three dependent variables: preference for Tide, preference for P&G, & preference for the commercial itself.

Nonmetric Analysis of Variance


Nonmetric Analysis of Variance

• It examines the difference in central tendencies of more than two groups when the dependent variable does not exhibit a normal distribution.
– Kruskal-Wallis one-way analysis of variance
– k-sample median test

Kruskal–Wallis one-way analysis of variance

• This is an extension of the Mann-Whitney test.

• This test examines the difference in medians; it is also sometimes called one-way ANOVA on ranks.
• All cases from the k groups are ordered in a single ranking. If the k populations are the same, the groups should be similar in terms of ranks within each group. The rank sum is calculated for each group. From these, the Kruskal-Wallis H statistic, which has (approximately) a chi-square distribution, is computed.

Kruskal–Wallis one-way analysis of variance

Ranks

SALES
PROMOTION | N | Mean Rank
1 | 10 | 23.50
2 | 10 | 15.40
3 | 10 | 7.60
Total | 30 |

Test Statistics(a,b)

SALES
Chi-Square | 16.529
df | 2
Asymp. Sig. | .000
a. Kruskal Wallis Test
b. Grouping Variable: PROMOTION
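H can be recovered from the group sizes and mean ranks, since each group's rank sum is n_j times its mean rank. A minimal sketch of the uncorrected statistic, H = 12/(N(N+1)) Σ n_j R̄_j² - 3(N+1). SPSS additionally divides by the tie correction 1 - Σ(t³ - t)/(N³ - N), which needs the raw sales data, so the value below is somewhat smaller than the reported 16.529.

```python
import math

def kruskal_wallis_h(counts, mean_ranks):
    """Kruskal-Wallis H (no tie correction) from group sizes and mean ranks."""
    n_total = sum(counts)
    s = sum(n * r ** 2 for n, r in zip(counts, mean_ranks))
    h = 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)
    return h, len(counts) - 1

h, df = kruskal_wallis_h([10, 10, 10], [23.50, 15.40, 7.60])
p = math.exp(-h / 2)   # chi-square upper tail; exp(-x/2) is exact only for 2 df
print(f"H = {h:.3f}, df = {df}, p = {p:.5f}")
```

The uncorrected H is about 16.31 on 2 df with p well below .001, the same conclusion as the SPSS output.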

K-Sample Median test

• It is a nonparametric test of the null hypothesis that the medians of the populations from which two or more samples are drawn are identical.
• The data in each sample are assigned to two groups, one consisting of data whose values are higher than the median of all the samples combined, & the other consisting of data whose values are at the median or below.
• A Pearson's chi-squared test is then used to determine whether the observed frequencies in each sample differ from the expected frequencies derived from a distribution combining all the samples.

K-Sample Median test

Frequencies
                        PROMOTION
                        1     2     3
SALES   > Median        9     4     1
        <= Median       1     6     9

Test Statistics(a)
                   SALES
N                     30
Median              6.00
Chi-Square        13.125(b)
df                     2
Asymp. Sig.         .001
a. Grouping Variable: PROMOTION
b. 3 cells (50.0%) have expected frequencies less than 5. The minimum expected cell frequency is 4.7.
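The median-split logic described above can be sketched as follows, assuming one list of raw scores per group. The names `k_sample_median_test` and `chi_square_statistic` are hypothetical, values equal to the grand median are counted in the lower group (as in the "<= Median" row above), and no continuity correction is applied. Applied to the observed frequencies in the PROMOTION output, the chi-square step reproduces the reported 13.125 with (2 - 1)(3 - 1) = 2 degrees of freedom.

```python
from statistics import median


def chi_square_statistic(table):
    """Pearson chi-square for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def k_sample_median_test(*samples):
    """Split every sample at the grand median; return the 2 x k table and its chi-square."""
    grand_median = median([v for s in samples for v in s])
    above = [sum(1 for v in s if v > grand_median) for s in samples]
    at_or_below = [len(s) - a for s, a in zip(samples, above)]
    table = [above, at_or_below]
    return table, chi_square_statistic(table)


# Observed frequencies from the PROMOTION example (> median row, <= median row)
print(round(chi_square_statistic([[9, 4, 1], [1, 6, 9]]), 3))  # 13.125
```

SciPy also provides this test as `scipy.stats.median_test`.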

Nonmetric Analysis of Variance

• The Kruskal-Wallis test is more powerful than the k-sample median test because it uses the rank value of each case, not merely its location relative to the median.
• However, if there are a large number of tied ranks in the data, the k-sample median test may be a better choice.


BUSINESS RESEARCH METHOD

Session: 11

CORRELATION


Product Moment Correlation

• The product moment correlation, r, summarizes the strength of association between two metric (interval- or ratio-scaled) variables, say X & Y.
– It is an index used to determine whether a linear, or straight-line, relationship exists between X and Y.
– Proposed by Karl Pearson; also known as the Pearson correlation coefficient.
– Also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.
• r varies between -1.0 & +1.0.
• r between two variables is the same regardless of their underlying units of measurement.
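A minimal sketch of the product moment correlation, using hypothetical data (not from the slides); `pearson_r` is our own helper name. It also checks two properties discussed here: r does not change when the units of X or Y change, and r captures only linear association, so a perfect but U-shaped pattern such as y = x² over a symmetric range gives r = 0.

```python
import math


def pearson_r(x, y):
    """Product moment (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products of deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Changing units (x multiplied by 100, y shifted by 10) leaves r unchanged.
r_rescaled = pearson_r([100 * a for a in x], [b + 10 for b in y])

# A perfect but U-shaped relationship, y = x**2, still gives r = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(xs, [a * a for a in xs])
print(round(r, 4), round(r_rescaled, 4), r_curve)  # 0.7746 0.7746 0.0
```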

A Nonlinear Relationship for Which r = 0

[Figure: plot of Y against X, with X running from -3 to 3, showing a nonlinear relationship for which r = 0]


Partial Correlation
• The partial correlation coefficient measures the association between two variables after controlling for, or adjusting for, the effects of one or more additional variables.

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{1 - r_{xz}^2}\,\sqrt{1 - r_{yz}^2}}$$

• Partial correlations have an order associated with them. The order indicates how many variables are being adjusted or controlled.

Partial Correlation: Example

• The correlation between cereal consumption & income is 0.28.
– The correlation between income & household size is 0.48.
– The correlation between cereal consumption & household size is 0.56.
• The first-order partial correlation between cereal consumption & income, holding the effect of household size constant, is 0.02.

• When a partial correlation is larger than its respective zero-order correlation, a suppressor effect is involved.
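Plugging the example's zero-order correlations into the partial correlation formula confirms the reported value. This is a small sketch with a hypothetical function name, `partial_corr`; here x is cereal consumption, y is income and z is household size.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z from three zero-order correlations."""
    return (r_xy - r_xz * r_yz) / (math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2))


# x = cereal consumption, y = income, z = household size
r_partial = partial_corr(r_xy=0.28, r_xz=0.56, r_yz=0.48)
print(round(r_partial, 2))  # 0.02
```

The numerator shows why the value collapses: 0.28 - (0.56)(0.48) = 0.0112, so almost all of the income and cereal association is carried by household size.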


Part Correlation Coefficient

• The part correlation coefficient represents the correlation between Y & X when the linear effects of the other independent variables have been removed from X but not from Y. The part correlation coefficient, $r_{y(x.z)}$, is calculated as:

$$r_{y(x.z)} = \frac{r_{xy} - r_{yz}\,r_{xz}}{\sqrt{1 - r_{xz}^2}}$$

• The partial correlation coefficient is generally viewed as more important than the part correlation coefficient.
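Using the same correlations, here is a sketch of the part correlation next to the partial correlation (hypothetical function names; we take cereal consumption as Y, income as X and household size as Z, which is our own choice of roles). Because the part correlation equals the partial correlation multiplied by $\sqrt{1 - r_{yz}^2}$, it can never be larger in absolute value than the corresponding partial correlation.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """r_xy.z: the effect of z is removed from both x and y."""
    return (r_xy - r_xz * r_yz) / (math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2))


def part_corr(r_xy, r_yz, r_xz):
    """r_y(x.z): the effect of z is removed from x only."""
    return (r_xy - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)


# y = cereal consumption, x = income, z = household size
part = part_corr(r_xy=0.28, r_yz=0.56, r_xz=0.48)
partial = partial_corr(r_xy=0.28, r_xz=0.48, r_yz=0.56)
print(round(part, 4), round(partial, 4))
```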

Nonmetric Correlation
• Spearman's rho & Kendall's tau are two measures of nonmetric correlation.
– Both measures use rankings rather than the absolute values of the variables. Both vary from -1.0 to +1.0.

• When the data contain a large number of tied ranks, Kendall's τ seems more appropriate; otherwise, Spearman's rho should be preferred.
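Both rank-based measures can be sketched in a few lines. Spearman's rho is computed here as the Pearson correlation of the ranks (tied values receive average ranks), and Kendall's tau is the tau-b variant, which adjusts for ties in either variable. The data are hypothetical rankings and the function names are our own.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """(concordant - discordant) pairs, adjusted for pairs tied in x or in y."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((pairs - ties_x) * (pairs - ties_y))


x = [1, 2, 3, 4, 5]    # hypothetical rankings of five items by two judges
y = [2, 1, 4, 3, 5]
print(round(spearman_rho(x, y), 3), round(kendall_tau_b(x, y), 3))  # 0.8 0.6
```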


REGRESSION

Regression
• $Y_i = \beta_0 + \beta_1 X_i + e_i$

• For which type of research design is regression analysis used?


Regression
• Examines associative relationships between a metric dependent variable & one or more independent variables (it does not imply or assume any causality) in the following ways:
– Determine whether independent variables explain a significant
variation in dependent variable: whether a relationship exists.
– Determine how much of variation in dependent variable can be
explained by independent variables: strength of relationship.
– Determine structure or form of relationship: mathematical
equation relating independent and dependent variables.
– Control for other independent variables when evaluating
contributions of a specific variable or set of variables.
– Predict values of the dependent variable.


Regression is also sometimes called a predictive technique.


Plot Scatter Diagram

• A scatter diagram, or scattergram, is a plot of the values of two variables for all cases or observations.

• The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure.

• In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum_{j=1}^{n} e_j^2$.
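For a single predictor the least-squares procedure has a closed-form solution: $b = \sum (X_j - \bar{X})(Y_j - \bar{Y}) / \sum (X_j - \bar{X})^2$ and $a = \bar{Y} - b\bar{X}$. The sketch below applies it to hypothetical data (the attitude and duration data behind the following plots are not listed here) and reports the sum of squared errors that the fitted line minimizes; moving either coefficient away from its estimate makes that sum larger.

```python
def least_squares_line(x, y):
    """Return (a, b) for the line y_hat = a + b*x minimizing the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical deviations e_j = y_j - (a + b*x_j)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]   # hypothetical data, not from the slides
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(a, b, sum_squared_errors(x, y, a, b))
```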


Plot of Attitude with Duration

[Figure: scatter plot of Attitude (vertical axis) against Duration of Residence (horizontal axis, 0 to 18)]


Which Straight Line is Best?

[Figure: four candidate straight lines (Line 1 to Line 4) drawn through the attitude and duration scatter plot]


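Bivariate Regression

[Figure: the line $Y = \beta_0 + \beta_1 X$, with each observed value $Y_j$ deviating from the line by the error $e_j$, shown at points $X_1$ to $X_5$]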


Bivariate Regression
Multiple R        0.93608
R²                0.87624
Adjusted R²       0.86387
Standard Error    1.22329

ANALYSIS OF VARIANCE
              df    Sum of Squares    Mean Square
Regression     1         105.95222      105.95222
Residual      10          14.96444        1.49644
F = 70.80266    Significance of F = 0.0000

VARIABLES IN THE EQUATION
Variable          b        SEb     Beta (ß)       T    Significance of T
Duration     0.58972    0.07008    0.93608    8.414    0.0000
(Constant)   1.07932    0.74335               1.452    0.1772
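The fit statistics in this output are tied together by simple identities. The sketch below recomputes them from the printed sums of squares and coefficients: $R^2 = SS_{reg}/(SS_{reg} + SS_{res})$, Multiple R is its square root, the standard error of the estimate is $\sqrt{SS_{res}/(n - 2)}$, and in bivariate regression the square of the t statistic for b equals F. Because the inputs are rounded, the last digits can differ slightly from the output.

```python
import math

# Values copied from the bivariate regression output (Duration as the only predictor)
ss_reg, ss_res = 105.95222, 14.96444
df_reg, df_res = 1, 10
b, se_b = 0.58972, 0.07008

n = df_reg + df_res + 1                            # 12 cases
r2 = ss_reg / (ss_reg + ss_res)                    # share of total variation explained
multiple_r = math.sqrt(r2)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - df_reg - 1)
std_error = math.sqrt(ss_res / df_res)             # standard error of the estimate
f = (ss_reg / df_reg) / (ss_res / df_res)          # overall F test
t = b / se_b                                       # t test for the slope

print(f"R={multiple_r:.5f}  R2={r2:.5f}  adj R2={adj_r2:.5f}  SE={std_error:.5f}")
print(f"F={f:.3f}  t={t:.3f}  t squared={t * t:.3f}")
```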


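Strength & Significance of Association

• Another, equivalent test for examining the significance of the linear relationship between X & Y (significance of b) is the test for significance of the coefficient of determination. The hypotheses in this case are:

$H_0: R^2_{pop} = 0$

$H_1: R^2_{pop} > 0$

• The test statistic is $F = SS_{reg} / [SS_{res}/(n - 2)]$, which has an F distribution with 1 and n - 2 degrees of freedom.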


Test for Significance

• The statistical significance of the linear relationship between X & Y may be tested by examining the hypotheses:

$H_0: \beta_1 = 0$

$H_1: \beta_1 \neq 0$

• A t statistic with n - 2 degrees of freedom can be used, where

$$t = \frac{b}{SE_b}$$

• $SE_b$ denotes the standard deviation of b & is called the standard error.


Standardized regression coefficient

• Standardization is the process by which raw data are transformed into new variables that have a mean of 0 & a variance of 1. When the data are standardized, the intercept assumes a value of 0.
• The term beta coefficient or beta weight is used to denote the standardized regression coefficient. In bivariate regression, the standardized coefficient equals the simple correlation:

$$B_{yx} = B_{xy} = r_{xy}$$

• Relationship between standardized & non-standardized regression coefficients:

$$B_{yx} = b_{yx}\,\frac{S_x}{S_y}$$
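A quick numerical check of both formulas, on hypothetical data: standardizing both variables and refitting gives a slope equal to $b_{yx}(S_x/S_y)$, which in the bivariate case is the correlation r, and an intercept that is zero up to rounding error. The helper names `slope` and `standardize` are our own.

```python
from statistics import mean, stdev


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def standardize(v):
    """Transform to z-scores: mean 0, standard deviation 1."""
    m, s = mean(v), stdev(v)
    return [(a - m) / s for a in v]


x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
b_yx = slope(x, y)                                # unstandardized slope
beta_from_formula = b_yx * stdev(x) / stdev(y)    # B_yx = b_yx (S_x / S_y)
zx, zy = standardize(x), standardize(y)
beta_refit = slope(zx, zy)                        # slope after standardizing both variables
intercept = mean(zy) - beta_refit * mean(zx)      # should be (numerically) 0
print(b_yx, beta_from_formula, beta_refit, intercept)
```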


Assumptions of Regression
• The error term is normally distributed.
• The mean of the error term is 0.

• The variance of the error term is constant. This variance does not depend on the values assumed by X.
• The error terms are uncorrelated, i.e., the observations have been drawn independently.


Multiple Regression
The general form of the multiple regression model is as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + e$$

which is estimated by the following equation:

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$$

• As before, the coefficient a represents the intercept, but the b's are now partial regression coefficients.
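The coefficients a, b1, ..., bk are found by least squares, which amounts to solving the normal equations $(X^\top X)\beta = X^\top y$. Below is a minimal pure-Python sketch using Gauss-Jordan elimination; the function name `ols` and the data (generated exactly from $Y = 1 + 2X_1 - 3X_2$) are hypothetical, and there is no check for a singular $X^\top X$ (as happens with perfectly collinear predictors). In practice a statistics package would be used instead.

```python
def ols(X, y):
    """Least-squares estimates [a, b1, ..., bk] from the normal equations (X'X) beta = X'y."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend a column of 1s for a
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]          # augmented matrix [X'X | X'y]
    for col in range(p):                                  # Gauss-Jordan elimination
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Hypothetical data generated exactly from Y = 1 + 2*X1 - 3*X2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
print([round(c, 6) for c in ols(X, y)])  # [1.0, 2.0, -3.0]
```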


Multiple Regression
Multiple R        0.97210
R²                0.94498
Adjusted R²       0.93276
Standard Error    0.85974

ANALYSIS OF VARIANCE
              df    Sum of Squares    Mean Square
Regression     2         114.26425       57.13213
Residual       9           6.65241        0.73916
F = 77.29364    Significance of F = 0.0000

VARIABLES IN THE EQUATION
Variable          b        SEb     Beta (ß)       T    Significance of T
IMPORTANCE   0.28865    0.08608    0.31382    3.353    0.0085
DURATION     0.48108    0.05895    0.76363    8.160    0.0000
(Constant)   0.33732    0.56736               0.595    0.5668


Significance Testing

$$H_0: R^2_{pop} = 0$$

This is equivalent to the following null hypothesis:

$$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$$

The overall test can be conducted by using an F statistic:

$$F = \frac{SS_{reg}/k}{SS_{res}/(n - k - 1)} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}$$

which has an F distribution with k and (n - k - 1) degrees of freedom.
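Both forms of the F statistic can be checked against the multiple regression output. In this sketch (hypothetical function names) n = 12 is implied by the degrees of freedom (2 + 9 + 1); the small gap between the two results comes from the rounding of the printed R².

```python
def f_from_ss(ss_reg, ss_res, n, k):
    """Overall F from the regression and residual sums of squares."""
    return (ss_reg / k) / (ss_res / (n - k - 1))


def f_from_r2(r2, n, k):
    """The same F expressed through the coefficient of multiple determination."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Values from the multiple regression output: n = 12 cases, k = 2 predictors
print(round(f_from_ss(114.26425, 6.65241, n=12, k=2), 3))
print(round(f_from_r2(0.94498, n=12, k=2), 3))
```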


Significance Testing

Testing for the significance of the $\beta_i$'s can be done in a manner similar to that in the bivariate case by using t tests. The significance of the partial coefficient for importance attached to weather may be tested by the following equation:

$$t = \frac{b}{SE_b}$$

which has a t distribution with n - k - 1 degrees of freedom.
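Applying this to the multiple regression output (n = 12 cases, k = 2 predictors) reproduces the printed t values, up to rounding of the displayed coefficients:

```latex
t_{\text{IMPORTANCE}} = \frac{b}{SE_b} = \frac{0.28865}{0.08608} \approx 3.353,
\qquad
t_{\text{DURATION}} = \frac{0.48108}{0.05895} \approx 8.16,
\qquad
df = n - k - 1 = 12 - 2 - 1 = 9
```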


Relative Importance of Predictors

• Statistical significance.
• Square of the simple correlation coefficient.
• Square of the partial correlation coefficient.
• Measures based on standardized coefficients or beta weights.
• Stepwise regression.


Stepwise Regression
• The purpose of stepwise regression is to select, from a large number of predictor variables, a small subset of variables that account for most of the variation in the dependent or criterion variable.
– In this procedure, predictor variables enter or are removed from the regression equation one at a time.
– There are several approaches: forward inclusion, backward elimination, & stepwise solution.


Stepwise Regression
• Forward inclusion. Initially, there are no predictor variables in the regression equation. Predictor variables are entered one at a time, only if they meet certain criteria specified in terms of the F ratio. The order in which variables are included is based on their contribution to explained variance.
• Backward elimination. Initially, all predictor variables are included in the regression equation. Predictors are then removed one at a time based on the F ratio for removal.
• Stepwise solution. Forward inclusion is combined with the removal of predictors that no longer meet the specified criterion at each step.

27-02-2020

BUSINESS RESEARCH METHOD

Session: 12a


Caution about R²
• The value of R² can be “artificially” increased simply by adding explanatory variables to the regression model.
– When comparing two regression models with the same dependent variable ‘y’ but a different number of explanatory variables, the model with the higher R² value is not necessarily the better one.

• Adjusted R². R², the coefficient of multiple determination, is adjusted for the number of independent variables and the sample size to account for diminishing returns: after the first few variables, additional independent variables do not make much contribution.

Adjusted R²
• For comparing two regression models, it is advisable to compute adjusted R²:

$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(N - 1)}{N - K - 1}$$

where
• K is the number of independent variables in the model, excluding the constant.
• N is the number of points in your data sample.
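A sketch of the adjusted R² formula applied to the two regression outputs in this material (N = 12, implied by their degrees of freedom); `adjusted_r2` is our own helper name. The last line uses hypothetical numbers to illustrate the caution about R²: adding a predictor that barely raises R² lowers adjusted R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(round(adjusted_r2(0.87624, n=12, k=1), 4))  # 0.8639 (duration only)
print(round(adjusted_r2(0.94498, n=12, k=2), 4))  # 0.9328 (duration + importance)

# Hypothetical: a second predictor that lifts R^2 only from 0.87624 to 0.8763
print(adjusted_r2(0.8763, n=12, k=2) < adjusted_r2(0.87624, n=12, k=1))  # True
```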


Residual Plot: Linear Relationship between Residuals & Time (Autocorrelation)

[Figure: residuals plotted against time]


Multicollinearity
• Multicollinearity arises when intercorrelations among predictors are very high.
• A few problems due to multicollinearity:
– Partial regression coefficients may not be estimated precisely; standard errors are likely to be high.
– The magnitudes, as well as the signs, of partial regression coefficients may change from sample to sample.
– It becomes difficult to assess the relative importance of the independent variables in explaining variation in the dependent variable.
– Predictor variables may be incorrectly included or removed in stepwise regression.
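The slides do not name a diagnostic, but one common way to quantify multicollinearity (our addition) is the variance inflation factor, $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor j on the other predictors; $\sqrt{VIF_j}$ is the factor by which the standard error of $b_j$ is inflated. Equivalently, $VIF_j$ is the j-th diagonal element of the inverse of the predictors' correlation matrix, which is what this sketch computes. The correlation matrices are hypothetical, and there is no handling of singular matrices.

```python
def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix (assumes it is non-singular)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(corr):
    """VIF_j = j-th diagonal element of the inverse predictor correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


print(variance_inflation_factors([[1, 0.2], [0.2, 1]]))    # mild intercorrelation
print(variance_inflation_factors([[1, 0.95], [0.95, 1]]))  # severe intercorrelation
```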


Multicollinearity: Correction
• A simple procedure for adjusting for multicollinearity consists of using only one of the variables in a highly correlated set of variables.

• Alternatively, the set of independent variables can be transformed into a new set of predictors that are mutually independent by using techniques such as principal components analysis.

• More specialized techniques, such as stepwise regression, can also be used.


THANK YOU

3/3/2020

Business Research Method

Prof. Ravi Shekhar Kumar

XLRI- Xavier School of Management, Jamshedpur


ravishekhar@xlri.ac.in
Session-12b + 13a

Factor Analysis


An overview of Factor Analysis


• Consider these eight personality attributes:
– Assertive
– Talkative
– Dominant
– Influential
– Creative
– Imaginative
– Thoughtful
– Intellectual
• Is there redundancy?
• Can we reduce these eight concepts to more basic
dimensions?

An overview of Factor Analysis


1) Ask people to rate themselves on each term.
2) Compute correlations among the terms: do people who score high on one attribute score high on the other?

[Figure: correlation matrix of the eight attributes]


Factor Analysis
3) Interpret the pattern of correlations: what is related to what?

4) Identify groups of similar items (“factors”).

5) Psychologize: name the factors. What are the underlying dimensions?

What is Factor Analysis


• An interdependence technique, in that an entire set of interdependent relationships is examined without making a distinction between dependent & independent variables.
– Examines interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions.

• Procedures primarily used for data reduction & summarization.

• Removes redundancy or duplication from a set of correlated variables.


Two forms of factor analysis


• Exploratory
– Let the data indicate what’s going on, with no (or little) expectations

• Confirmatory
– Evaluate a specific, clearly articulated hypothesis about the correlational structure among variables
– Get “fit” indices & significance tests

Data Matrix
• Factor analysis is entirely dependent on the correlations between variables.
• Factor analysis summarizes the correlation structure.

[Figure: the data matrix (observations O1…On by variables v1…vk) yields a k × k correlation matrix (v1…vk by v1…vk), which factor analysis summarizes as a factor matrix (variables v1…vk by factors F1…Fj)]


Designing Factor Analysis


• Variables to be included in factor analysis should be specified based on past research, theory, & the judgment of the researcher.
• Variables should be appropriately measured on an interval or ratio scale.
• An appropriate sample size should be used. As a rough guideline, there should be at least four or five times as many observations (sample size) as there are variables.

Exercise: Factor Analysis


• Six variables measured on a 1-7 scale
– V1: Prevents Cavities
– V2: Shiny Teeth
– V3: Strengthen Gums
– V4: Freshens Breath
– V5: Tooth Decay Unimportant
– V6: Attractive Teeth


Statistics Associated with Factor Analysis


• Bartlett's test of sphericity is a test statistic used to examine the hypothesis that the variables are uncorrelated in the population.
– In other words, the population correlation matrix is an identity matrix: each variable correlates perfectly with itself (r = 1) but has no correlation with the other variables (r = 0).

• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is an index used to examine the appropriateness of factor analysis.
– High values (between 0.5 and 1.0) indicate that factor analysis is appropriate.
– Values below 0.5 imply that factor analysis may not be appropriate.
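Bartlett's statistic is usually computed as $\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln |R|$ with $p(p - 1)/2$ degrees of freedom, where p is the number of variables and |R| is the determinant of the sample correlation matrix. A sketch follows; the function names are hypothetical, the correlation values are made up, and the p-value (from a chi-square table or library) is not computed. With p = 6 variables it gives the 15 degrees of freedom reported in the KMO and Bartlett's Test output.

```python
import math


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in m]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n):
    """Chi-square and df for H0: the population correlation matrix is an identity matrix."""
    p = len(corr)
    chi2 = -((n - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    return chi2, p * (p - 1) // 2


identity = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
print(bartlett_sphericity(identity, n=30)[1])                 # 15
print(bartlett_sphericity([[1.0, 0.6], [0.6, 1.0]], n=30))    # hypothetical 2-variable case
```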


Correlation Matrix

[Figure: correlation matrix of variables V1 to V6]

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .660
Bartlett's Test of Sphericity    Approx. Chi-Square    111.314
                                 df                         15
                                 Sig.                     .000


Determine Number of Factors

• Determination Based on Eigenvalues.
– Only factors with eigenvalues greater than 1.0 are retained.
• Determination Based on Percentage of Variance.
– It is recommended that the factors extracted account for at least 60% of the variance.
• Determination Based on Scree Plot.
• A Priori Determination.
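The eigenvalue and percentage-of-variance rules can be sketched for a correlation matrix as follows. Eigenvalues are found with the classical cyclic Jacobi rotation method (chosen here only to keep the example dependency-free; statistics packages use other routines), and the correlation matrix is hypothetical. For a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that sum is the share of variance the factor explains.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [[float(v) for v in row] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off_diag = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diag < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that element (p, q) becomes zero
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):   # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def number_of_factors(eigenvalues):
    """Kaiser rule (eigenvalue > 1) plus % and cumulative % of variance per factor."""
    total = sum(eigenvalues)   # equals the number of variables for a correlation matrix
    cumulative, table = 0.0, []
    for i, ev in enumerate(eigenvalues, start=1):
        pct = 100 * ev / total
        cumulative += pct
        table.append((i, round(ev, 3), round(pct, 1), round(cumulative, 1)))
    return sum(1 for ev in eigenvalues if ev > 1.0), table


R = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]   # hypothetical correlations
kaiser, table = number_of_factors(jacobi_eigenvalues(R))
print(kaiser, table)
```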


Terms Associated with Factor Analysis


• Eigenvalue. The eigenvalue represents the total variance explained by each factor.
• Percentage of variance. The percentage of the total variance attributed to each factor.
• Scree plot. A scree plot is a plot of eigenvalues against the number of factors in order of extraction.


Results of Principal Components Analysis


Scree Plot

[Figure: scree plot of eigenvalues (vertical axis, 0.0 to 3.0) against component number (1 to 6)]


Terms Associated with Factor Analysis


• Factor loadings. Factor loadings are simple correlations
between variables & factors.
• Factor loading plot. A factor loading plot is a plot of the
original variables using the factor loadings as coordinates.
• Factor matrix. A factor matrix contains the factor loadings
of all the variables on all the factors extracted.


Results of Principal Components Analysis


Results of Principal Components Analysis


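Factor Matrix Before & After Rotation

• High loadings before rotation: variables 1, 3 & 6 each have a high loading on one factor, while variables 2, 4 & 5 have high loadings on both factors 1 and 2.
• High loadings after rotation: each of the six variables has a high loading on only one of the two factors.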


Unrotated Factors


Rotated Factors


Criticisms of Factor Analysis


• Derived factors are often obvious.
– Defense: but we get a quantification.
• “Garbage in, garbage out.”
– Really a criticism of the input variables.
• The correlation matrix is often a poor measure of association among input variables.
• Labels of factors can be arbitrary or lack a scientific basis.

Thank You


Business Research Method

Session-13

Cluster Analysis


Cluster Analysis
• Techniques used to classify objects or cases into relatively
homogeneous groups called clusters.
– Examine an entire set of interdependent relationships.
– No distinction between dependent & independent variables.

• Objects in each cluster tend to be similar to each other &


dissimilar to objects in other clusters.
– No a priori information about group or cluster membership for any
of the objects. Groups or clusters are suggested by data, not defined
a priori.

An Ideal Clustering Situation

[Figure: cases plotted on Variable 1 and Variable 2]


A Practical Clustering Situation

[Figure: cases plotted on Variable 1 and Variable 2]

Select a Distance or Similarity Measure

• The most commonly used measure of similarity is the Euclidean distance or its square.
– Euclidean distance is the square root of the sum of the squared differences in values for each variable.

• The use of different distance measures may lead to different clustering results (it is advisable to use different measures & compare the results).

• If variables are measured in vastly different units, the clustering solution will be influenced by the units of measurement.
– In these cases, before clustering respondents, we must standardize the data.
– It is also desirable to eliminate outliers (cases with atypical values).
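A sketch of Euclidean distance and of standardizing variables before clustering (helper names are our own). The distance uses cases 1 and 2 of the attitudinal data in this session; the age and income figures are hypothetical, chosen to show how a variable measured in large units would otherwise dominate the distance.

```python
import math
from statistics import mean, stdev


def euclidean(a, b):
    """Square root of the sum of squared differences over all variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize_columns(rows):
    """Rescale each variable (column) to z-scores with mean 0 and standard deviation 1."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


case_1 = [6, 4, 7, 3, 2, 3]   # cases 1 and 2 of the attitudinal data (V1..V6)
case_2 = [2, 3, 1, 4, 5, 4]
print(euclidean(case_1, case_2))  # 8.0

# Hypothetical respondents: [age in years, income in rupees]. Unstandardized,
# income differences swamp age differences; after standardization both count.
people = [[25, 300000], [35, 500000], [45, 400000]]
z = standardize_columns(people)
print(euclidean(people[0], people[1]), euclidean(z[0], z[1]))
```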


Clustering Procedure
Clustering Procedures
• Hierarchical
– Agglomerative
   – Linkage Methods: Single, Complete, Average
   – Variance Methods: Ward’s Method
   – Centroid Methods
– Divisive
• Nonhierarchical

Select a Clustering Procedure


• Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure.
– Agglomerative clustering starts with each object in a separate
cluster.
– Divisive clustering starts with all the objects grouped in a single
cluster.
• Agglomerative methods are commonly used in research.


Agglomerative Clustering Methods

• Single Linkage: minimum distance between Cluster 1 and Cluster 2.
• Complete Linkage: maximum distance between Cluster 1 and Cluster 2.
• Average Linkage: average distance between Cluster 1 and Cluster 2.
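The three linkage rules differ only in how they summarize the distances between the members of two clusters. The sketch below also includes the distance between cluster centroids, which the centroid method uses. The points and function names are hypothetical.

```python
import math
from itertools import product


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1, c2):
    """Minimum distance between any member of c1 and any member of c2."""
    return min(euclidean(a, b) for a, b in product(c1, c2))


def complete_linkage(c1, c2):
    """Maximum distance between any member of c1 and any member of c2."""
    return max(euclidean(a, b) for a, b in product(c1, c2))


def average_linkage(c1, c2):
    """Average of all pairwise distances between members of c1 and c2."""
    return sum(euclidean(a, b) for a, b in product(c1, c2)) / (len(c1) * len(c2))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def centroid_distance(c1, c2):
    """Distance between the two cluster centroids (centroid method)."""
    return euclidean(centroid(c1), centroid(c2))


cluster_1 = [(0, 0), (0, 1)]   # hypothetical points on two variables
cluster_2 = [(3, 0), (4, 0)]
for name, rule in [("single", single_linkage), ("complete", complete_linkage),
                   ("average", average_linkage), ("centroid", centroid_distance)]:
    print(name, round(rule(cluster_1, cluster_2), 3))
```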

Agglomerative Clustering Methods

Ward’s Procedure

• Ward's procedure: For each cluster, the means for all the variables are computed. Then, for each object, the squared Euclidean distance to the cluster means is calculated. These distances are summed for all the objects. At each stage, the two clusters whose merger gives the smallest increase in the overall within-cluster sum of squares are combined.
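A brute-force sketch of Ward's procedure on the 20 cases of the attitudinal data used in this session (function names are our own). Merging clusters A and B increases the within-cluster sum of squares by $\frac{n_A n_B}{n_A + n_B}\lVert \bar{x}_A - \bar{x}_B \rVert^2$, so at each stage the pair with the smallest increase is merged and the running total is reported as the coefficient. We have only checked the first stages (1.000, 2.000, 3.500) and the final coefficient (328.600, which equals the total sum of squares of the data) against the agglomeration schedule; tied pairs may be merged in a different order than SPSS uses.

```python
# Attitudinal data (V1..V6) for cases 1-20, as listed in the data table for this session
DATA = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6],
    [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7],
    [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def ward_schedule(data):
    """Return one (cases_a, cases_b, coefficient) tuple per stage.

    Cases are 1-based; the coefficient is the total within-cluster
    sum of squares after the merge.
    """
    clusters = [[i] for i in range(len(data))]
    total_ss, schedule = 0.0, []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                ca = centroid([data[k] for k in a])
                cb = centroid([data[k] for k in b])
                dist2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
                # Increase in within-cluster SS if clusters a and b were merged
                increase = len(a) * len(b) / (len(a) + len(b)) * dist2
                if best is None or increase < best[0]:
                    best = (increase, i, j)
        increase, i, j = best
        total_ss += increase
        schedule.append(([k + 1 for k in clusters[i]], [k + 1 for k in clusters[j]], total_ss))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return schedule


for stage, (a, b, coef) in enumerate(ward_schedule(DATA), start=1):
    print(stage, a, b, round(coef, 3))
```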


Agglomerative Clustering Methods

Centroid Method


Select a Clustering Procedure


• Of hierarchical methods, average linkage & Ward's
methods have been shown to perform better than other
procedures.

• It has been suggested that hierarchical & nonhierarchical


methods be used in tandem.


Formulate the Problem


• The most important part of clustering is selecting the variables on which clustering is based.
– Inclusion of even one or two irrelevant variables may distort an otherwise useful clustering solution.
– Basically, the set of variables selected should describe the similarity between objects in terms that are relevant to the research problem.

• Variables should be selected based on past research, theory, or a consideration of the hypotheses being tested.
– In exploratory research, the researcher should exercise judgment & intuition.


Attitudinal Data For Clustering

Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7 2


Terms Associated with Cluster Analysis


• Agglomeration schedule. An agglomeration schedule gives
information on objects or cases being combined at each stage of
a hierarchical clustering process.


Results of Hierarchical Clustering

Agglomeration Schedule Using Ward’s Procedure

        Clusters combined                     Stage cluster first appears
Stage   Cluster 1   Cluster 2   Coefficient   Cluster 1   Cluster 2   Next stage
1 14 16 1.000 0 0 6
2 6 7 2.000 0 0 7
3 2 13 3.500 0 0 15
4 5 11 5.000 0 0 11
5 3 8 6.500 0 0 16
6 10 14 8.160 0 1 9
7 6 12 10.167 2 0 10
8 9 20 13.000 0 0 11
9 4 10 15.583 0 6 12
10 1 6 18.500 6 7 13
11 5 9 23.000 4 8 15
12 4 19 27.750 9 0 17
13 1 17 33.100 10 0 14
14 1 15 41.333 13 0 16
15 2 5 51.833 3 11 18
16 1 3 64.500 14 5 19
17 4 18 79.667 12 0 18
18 2 4 172.662 15 17 19
19 1 2 328.600 16 18 0


Terms Associated with Cluster Analysis

• Dendrogram. A dendrogram, or tree graph, is a graphical device for displaying clustering results. Vertical lines represent clusters that are joined together. The position of a line on the scale indicates the distance at which clusters were joined. The dendrogram is read from left to right.


Terms Associated with Cluster Analysis

• Icicle diagram. An icicle diagram is a graphical display of clustering results, so called because it resembles a row of icicles hanging from the eaves of a house. Columns correspond to the objects being clustered, & rows correspond to the number of clusters. An icicle diagram is read from bottom to top.


Vertical Icicle Plot using Ward’s Method


Results of Hierarchical Clustering

Cluster Membership of Cases Using Ward’s Procedure


              Number of Clusters
Case label    4    3    2

1 1 1 1
2 2 2 2
3 1 1 1
4 3 3 2
5 2 2 2
6 1 1 1
7 1 1 1
8 1 1 1
9 2 2 2
10 3 3 2
11 2 2 2
12 1 1 1
13 2 2 2
14 3 3 2
15 1 1 1
16 3 3 2
17 1 1 1
18 4 3 2
19 3 3 2
20 2 2 2


Decide on Number of Clusters


• In hierarchical clustering, distances at which clusters are
combined can be used as criteria.
– This information can be obtained from agglomeration schedule or
from dendrogram.
• Relative sizes of clusters should be meaningful.
• Theoretical, conceptual, or practical considerations may
suggest a certain number of clusters.


Interpreting & Profiling Clusters


• Interpreting & profiling clusters involves examining cluster
centroids.
– Centroids enable us to describe each cluster by assigning it a name
or label.

• It is often helpful to profile clusters in terms of variables that


were not used for clustering.
– These may include demographic, psychographic, product usage,
media usage, or other variables.


Cluster Centroids

Means of Variables

Cluster No. V1 V2 V3 V4 V5 V6
1 5.750 3.625 6.000 3.125 1.750 3.875

2 1.667 3.000 1.833 3.500 5.500 3.333

3 3.500 5.833 3.333 6.000 3.500 6.000


Results of Nonhierarchical Clustering

Initial Cluster Centers

Cluster
1 2 3
V1 4 2 7
V2 6 3 2
V3 3 2 6
V4 7 4 4
V5 2 7 1
V6 7 2 3

Iteration History(a)
Change in Cluster Centers
Iteration 1 2 3
1 2.154 2.102 2.550
2 0.000 0.000 0.000
a. Convergence achieved due to no or small distance
change. The maximum distance by which any center
has changed is 0.000. The current iteration is 2. The
minimum distance between initial centers is 7.746.


Results of Nonhierarchical Clustering


Cluster Membership

Case Number Cluster Distance


1 3 1.414
2 2 1.323
3 3 2.550
4 1 1.404
5 2 1.848
6 3 1.225
7 3 1.500
8 3 2.121
9 2 1.756
10 1 1.143
11 2 1.041
12 3 1.581
13 2 2.598
14 1 1.404
15 3 2.828
16 1 1.624
17 3 2.598
18 1 3.555
19 1 2.154
20 2 2.102


Results of Nonhierarchical Clustering

Final Cluster Centers

Cluster
1 2 3
V1 4 2 6
V2 6 3 4
V3 3 2 6
V4 6 4 3
V5 4 6 2
V6 6 3 4

Distances between Final Cluster Centers

Cluster 1 2 3
1 5.568 5.698
2 5.568 6.928
3 5.698 6.928


Results of Nonhierarchical Clustering

ANOVA

             Cluster               Error
       Mean Square   df    Mean Square   df        F     Sig.
V1 29.108 2 0.608 17 47.888 0.000
V2 13.546 2 0.630 17 21.505 0.000
V3 31.392 2 0.833 17 37.670 0.000
V4 15.713 2 0.728 17 21.585 0.000
V5 22.537 2 0.816 17 27.614 0.000
V6 12.171 2 1.071 17 11.363 0.001
The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this, and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.

Number of Cases in each Cluster


Cluster 1 6.000
2 6.000
3 8.000
Valid 20.000
Missing 0.000
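A minimal Lloyd-style k-means sketch (assign each case to the nearest center, recompute the centers as cluster means, repeat until no center moves), started from the initial cluster centers shown in the output above. This is an illustration, not SPSS's QUICK CLUSTER code, and the function names are our own; on these data it reproduces the cluster membership listed above (clusters of 6, 6 and 8 cases) and, for example, the distance of 2.154 between case 19 and its final cluster center.

```python
import math

# Attitudinal data (V1..V6) for cases 1-20, as listed in the data table for this session
DATA = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6],
    [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7],
    [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]
# Initial centers from the "Initial Cluster Centers" output (one list per cluster)
INITIAL_CENTERS = [[4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2], [7, 2, 6, 4, 1, 3]]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(data, centers, max_iter=20):
    """Assign each case to its nearest center, recompute centers as means, repeat."""
    centers = [[float(v) for v in c] for c in centers]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda k: euclidean(row, centers[k]))
                  for row in data]
        new_centers = []
        for k in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == k]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[k])
        if new_centers == centers:  # no center moved, so the solution has converged
            break
        centers = new_centers
    return labels, centers


labels, centers = k_means(DATA, INITIAL_CENTERS)
for k in range(len(centers)):
    cases = [i + 1 for i, lab in enumerate(labels) if lab == k]
    print(f"Cluster {k + 1}: cases {cases}, center {[round(v, 3) for v in centers[k]]}")
```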


Ethical Issue in Research


Ethical issue in Research Process


• Central to research in social science is the inclusion of living organisms as research subjects.
• This imposes an obligation to treat these organisms in a humane, respectful & ethical manner.


Ethical issue: Nuremberg Code


• Participation in research should be voluntary.
• The participant has the right to know the nature, purposes & duration of the research.
• The researcher must ensure that participants are not exposed to harmful research practices.
• Research can be terminated by either the participant or the researcher if it becomes obvious to either that continuation of the experiment could be unacceptable.


Ethical issue:
American Psychological Association
• Informed consent must include
– Purpose, expected duration, & procedures of the research
– Right to decline to participate & to withdraw from the research once participation has begun
– Foreseeable consequences of declining or withdrawing
– Reasonably foreseeable factors that may be expected to influence their willingness to participate, such as potential risks, discomfort, or adverse effects
– Any prospective research benefits
– Limits of confidentiality; incentives for participants
– Whom to contact for questions about the research & research participants’ rights
• Consent must be obtained for recording
• Steps taken to protect prospective participants

Fraud in Research
• Data Fabrication
– Making up data or results and reporting them
• Falsification
– Manipulating research materials, equipment, or processes, or
changing or omitting data or results such that research is not
accurately represented in research record.
• Plagiarism
– Appropriation of another person’s ideas, processes, results or words without giving appropriate credit
