Session-1
1
1/7/2020
Discussion
2
1/7/2020
I don’t know whether we should change the packaging of Colgate toothpaste.
Discussion
3
1/7/2020
Research…
Provides information
to guide decisions
Research…
Reduces risk in
decision making
4
1/7/2020
Discussion
What is Research?
10
5
1/7/2020
Defining Research…
• Systematic investigation into and study of materials &
sources in order to establish facts & reach new conclusions
(Oxford dictionary).
11
Defining Research
• Research is the systematic & objective
– Identification (of information)
– Collection (of information)
– Analysis (of information)
– Dissemination (of information) &
– Use of information
• for improving decision making related to…
– Identification and Solution of problems & opportunities
in business
12
6
1/7/2020
Defining Research
13
14
7
1/7/2020
RESEARCH SUPPLIERS
• Internal
• External
– Syndicated Services
– Customized Services
– Internet Services
– Field Services
– Focus Groups & Qualitative Services
– Technical & Analytical Services
– Other Services
15
Research Classification
Discussion:
16
8
1/7/2020
A Classification of Research
Research
17
18
9
1/7/2020
19
Discussion
20
10
1/7/2020
21
21
Problem Definition
22
11
1/7/2020
Problem Definition
Objectives
Buyer Behavior
Legal Environment
Economic Environment
Problem Definition
• The MDP asks what the decision maker needs to do, whereas the RP asks what information is needed.
1/10/2020
• Includes:
– Theory / Objective evidence
– Analytical model (verbal/graphical/mathematical)
– Research question (define it)
– Hypotheses (End product in this step)
3
1/10/2020
Theory/Objective evidence
• Theory
– Example for choice making related theories:
• Theory of rationality
• Bounded rationality Theory
• Objective Evidence
– Empirical observation (available in literature)
(Ajzen, 1985)
4
1/10/2020
(Davis 1989)
Research Question
• MDP: What should be done to improve the patronage of the Big
Bazaar store?
• Research questions (RQ):
Refined questions or statements of the specific components
of the (research) problem.
Research Problem
Objective/
Theoretical
Framework;
Analytical
Model
13
Research Hypothesis
• An unproven statement or proposition about a factor or
phenomenon that is of interest to the researcher.
– Often, a hypothesis is a possible answer to the research
question.
– It is mostly about the relationship between two
variables/ two phenomena.
– An empirically testable statement
14
7
1/10/2020
15
• Hypothesis
– H1: Customers who are store loyal are less knowledgeable about
the shopping environment.
– H2: Store-loyal customers are more risk-averse than are non-loyal
customers.
– H3: Customers of Big Bazaar are loyal.
• Inappropriate hypothesis
1/15/2020
Session-3
[Figure: Defining the problem. Inputs: Discussion with Decision Maker; Interview with Experts; Secondary Data Analysis; Qualitative Research. Other labels: Environmental Context of Problem; Problem Situation; Genesis of Problem; Defining MDP; Defining RP-1; Defining RP-2; Defining RP-3.]
1
1/15/2020
[Figure: From research problems to hypotheses. Stages shown: Defining RP-1 to RP-3; Defining RQ-1 to RQ-4; Developing Hypothesis-1 and Hypothesis-2 (the label Hypothesis-2 appears twice). Annotations: Based on Theoretical Knowledge; Based on Literature Review and Qualitative study.]
Situation
• Harley-Davidson made such a strong comeback in the early 2000s that
there was a long waiting list to get a Harley-Davidson bike.
• In 2007, its market share was about 50% in the heavyweight bike
category.
• Distributors were urging expansion.
• But the company was skeptical about investing in new
production facilities.
2
1/15/2020
3
1/15/2020
Problem Definition
• MDP:
– Should Harley-Davidson invest to produce more bikes?
• RP:
– To determine whether customers would be loyal buyers of
Harley-Davidson
Approach to Problem
RQ:
• Who are customers? What are their demographic &
psychographic characteristics?
• Can different types of customers be segmented? Is it
possible to segment market in a meaningful way?
• How do customers feel regarding their Harleys? Are all
customers motivated by the same appeal?
• Are the customers loyal to Harley-Davidson? What is the
extent of brand loyalty?
4
1/15/2020
• RQ:
– Can different types of customers be segmented based
on psychographic characteristics?
• Hypothesis:
– H1: There are distinct segments of bike buyers
Psychographics is the study of personality, values, opinions, attitudes,
interests, and lifestyles.
Research Design
10
5
1/15/2020
• Involves:
– (Define the information needed)
– Design exploratory, descriptive, and/or causal phases of research
– Specify measurement & scaling procedures
– Construct & pretest a questionnaire or an appropriate form for data
collection
– Specify sampling process & sample size
– Develop a plan of data analysis
11
Research Design
12
6
1/15/2020
Objective:
• Exploratory: To provide insights & understanding
• Conclusive: To test specific hypotheses and examine relationships
13
Research Design
Cross-Sectional Longitudinal
Design Design
14
7
1/15/2020
• Often the front end of total research design
• Preplanned and structured design
• Measure effect on dependent variables
• Control mediating variables
15
Exploratory Research
• Can be conducted by analyzing (qualitatively) Primary data
and Secondary data
16
8
1/15/2020
Secondary Data
• Internal
– Ready to Use
– Requires Further Processing
• External
– Published Materials
– Computerized Databases
– Syndicated Services
• Cross-Sectional Design: Sample Surveyed at T1
• Longitudinal Design: Sample Surveyed at T1; Same Sample also Surveyed at T2
(Time: T1, T2)
23
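Evaluation Criterion               Cross-Sectional   Longitudinal
Detecting Change                          -                +
Large amount of data collection           -                +
Accuracy                                  -                +
Representative Sampling                   +                -
Response bias                             +                -
(+ indicates a relative advantage; - a relative disadvantage)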
24
12
1/15/2020
• METHOD: Experiments
25
(a) Exploratory Research (Secondary Data Analysis; Focus Groups) → Conclusive Research (Descriptive/Causal)
(b) Conclusive Research (Descriptive/Causal)
(c) Conclusive Research (Descriptive/Causal) → Exploratory Research (Secondary Data Analysis; Focus Groups)
26
13
1/15/2020
Background of Study
Definition of MDP & RP
Developing RQ
Conducting Qualitative Research & Literature Review
Developing Hypothesis
Developing /Adopting Questionnaire
Data Collection
Testing Hypothesis
Drawing Managerial Implication & Conclusion
Limitation of Study
1/21/2020
Session-4
In-depth Interview
• One on one interviews
• Encourages an intimate dialogue
• Variations in interviews
– Depth Interviews – 45 minutes to 1 hour
– Intensive Depth Interviews – 2 to 3 hours
– Focused Interviews – 30 minutes (for advertising check)
• The interviewer’s appearance should match the respondent’s
Word
Picture
Brand personification …
Ethnography
Nature of Observation
ACTIVE
• The researcher takes part in the process while the respondent performs the behavior.
• More like a scene where the respondent demonstrates how they usually do it for you.
• Periodic questioning or clarification is done on the spot.
PASSIVE
• The researcher acts as an outsider while the respondent performs the behavior.
• Everything proceeds naturally and uninterrupted.
• Questioning and clarification are done before or after the process.
Ethnography: Importance
• One of the best ways to gain deeper customer insight.
• …to get to know customers and their culture, & role
certain products play in their lives.
• …shows consumer reality rather than consumer
reconstruction.
• …helps identify contradictions between what people say
they do & what they actually do.
• … enables us to identify their hidden needs- and this is
where real breakthroughs can occur.
22
11
1/21/2020
Netnography
• Ethnography: Study of a community
• Data Sources:
– Archival Netnographic Data
– Social Network Analysis
– Elicited Netnographic Data
23
Advantage Disadvantage
24
24
12
1/21/2020
Reference
• Qualitative Research- Discussion Guide: Textbook (page
167- 171)
• FGD discussion Guide: Textbook (page 140-142)
25
Issue
• In recent times, education loans from banks have grown, & SBI is one of the
major players in this market.
• In 2010, the education loan market among students of X, a premier business
school in India, was studied for SBI. It was found that among the students of X
Business school, SBI's market share of education loans was 87%.
• However, in 2013 the market share dipped to 82%.
• When the study was repeated in 2016, it was found that SBI's market share
of education loans among students of that business school had slipped further
to 76%.
• It was also observed that the market share of CBI, another PSU bank, has been
increasing steadily. In 2010, CBI's market share of education loans
was 7%, and it had increased to 18% by 2016.
• SBI is now worried about losing market share among students & has
hired you to conduct research.
(Above Information is just an illustration)
26
13
1/21/2020
Issue
• Design preference & perception for mobile phones
27
28
14
1/21/2020
Session-5
Research Data
Descriptive Causal
1
1/21/2020
Scale
2
1/21/2020
Nominal Scale
• Basic operation: = or ≠
Ordinal Scale
3
1/21/2020
Interval Scale
Ratio Scale
4
1/21/2020
• Nominal: Numbers Assigned to Runners (4, 81, 9)
• Ordinal: Rank Order of Winners
Questionnaire
Sampling
• Census Vs. Sampling
– Sampling is the selection of a subset (a statistical sample) of
individuals from within a statistical population to estimate
characteristics of the whole population
– Why is proper sampling important?
• Target Population
• Sampling Frame
– A representation of the elements of the target population. It consists
of a list or set of directions for identifying the target population
• Sampling technique & Sample Size
31-Jan-20
Session-6
Sampling Techniques
• Nonprobability Sampling
• Probability Sampling
Convenience Sampling
• Convenience sampling attempts to obtain a sample of
convenient elements. Often, respondents are selected
because they happen to be in the right place at the right
time.
– use of students, and members of social organizations
– mall intercept interviews without qualifying the
respondents
– “people on the street” interviews
2
31-Jan-20
Judgmental Sampling
• Judgmental sampling is a form of convenience sampling in
which the population elements are selected based on the
judgment of the researcher.
– test markets
– purchase engineers selected in industrial marketing
research
– expert witnesses used in court
Quota Sampling
• Quota sampling may be viewed as two-stage restricted
judgmental sampling.
– The first stage consists of developing control categories, or quotas,
of population elements.
– In the second stage, sample elements are selected based on
convenience or judgment.
Control            Population         Sample             Sample
Characteristic     composition (%)    composition (%)    composition (Number)
Sex
  Male                   48                 48                 480
  Female                 52                 52                 520
  Total                 100                100                1000
3
31-Jan-20
Snowball Sampling
• In snowball sampling, an initial group of respondents is
selected, usually at random.
• After being interviewed, these respondents are asked to
identify others who belong to the target population of
interest.
• Subsequent respondents are selected based on the
referrals.
Sampling Techniques
• Nonprobability Sampling Techniques
• Probability Sampling Techniques
4
31-Jan-20
Systematic Sampling
• The sample is chosen by selecting a random starting point
and then picking every i-th element in succession from the
sampling frame.
– For example, there are 100,000 elements in the population and a
sample of 1,000 is desired. In this case the sampling interval, i, is
100. A random number between 1 and 100 is selected. If, for
example, this number is 23, the sample consists of elements 23, 123,
223, 323, 423, 523, and so on.
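The selection rule above can be sketched in Python. This is a minimal illustration rather than a prescribed procedure: the function name `systematic_sample` and the seeded random generator are assumptions made for the example, while the population of 100,000 and the sample of 1,000 come from the slide.

```python
import random

def systematic_sample(frame, sample_size, rng=None):
    """Return every i-th element of the frame after a random start (hypothetical helper)."""
    rng = rng or random.Random()
    interval = len(frame) // sample_size      # sampling interval i
    start = rng.randint(1, interval)          # random start between 1 and i, inclusive
    positions = range(start, start + interval * sample_size, interval)
    return [frame[pos - 1] for pos in positions]  # positions are 1-based

population = list(range(1, 100_001))          # elements numbered 1 to 100,000
sample = systematic_sample(population, 1_000, random.Random(23))
print(len(sample), sample[:3])
```

Because only the starting point is random, every later element is fixed once the start is drawn, which is why the ordering of the sampling frame matters for this technique.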
10
5
31-Jan-20
Stratified Sampling
• A two-step process in which the population is partitioned
into subpopulations, or strata.
– The strata should be mutually exclusive & collectively
exhaustive in that every population element should be assigned to
one and only one stratum and no population elements should be
omitted.
– Next, elements are selected from each stratum by a random
procedure, usually SRS.
• A major objective of stratified sampling is to increase
precision without increasing cost.
• The elements within a stratum should be as homogeneous
as possible, but the elements in different strata should be as
heterogeneous as possible.
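The two steps can be sketched as follows. The sketch assumes proportionate allocation (each stratum contributes in proportion to its size), which the slide does not specify; the function name `stratified_sample` and the two-stratum frame are hypothetical.

```python
import random

def stratified_sample(strata, total_size, rng=None):
    """Draw a simple random sample from each stratum, proportional to stratum size."""
    rng = rng or random.Random()
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, share)   # SRS without replacement
    return sample

# Hypothetical sampling frame: two mutually exclusive, collectively exhaustive strata.
strata = {
    "male": [f"M{i}" for i in range(4_800)],
    "female": [f"F{i}" for i in range(5_200)],
}
sample = stratified_sample(strata, 1_000, random.Random(0))
print({name: len(members) for name, members in sample.items()})
```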
11
Cluster Sampling
• The target population is first divided into mutually exclusive
and collectively exhaustive subpopulations, or clusters.
• Then a random sample of clusters is selected, based on a
probability sampling technique such as SRS.
• For each selected cluster, either all the elements are
included in the sample (one-stage) or a sample of elements
is drawn probabilistically (two-stage).
• Elements within a cluster should be as heterogeneous as
possible, but clusters themselves should be as
homogeneous as possible. Ideally, each cluster should be a
small-scale representation of the population.
• H0: μ ≥ 25
• H1: μ < 25
– Take action if H0 is rejected (H1 is accepted).
20
10
31-Jan-20
• H0: μ ≤ 20
• H1: μ > 20
– Take action if H0 is rejected (H1 is accepted).
21
21
• H0: μ = 50
• H1: μ ≠ 50
– Take action if H0 is rejected (H1 is accepted).
• Two-Tail Study
22
22
11
31-Jan-20
• Hypothesis (alternate)
– H1: Customers who are store loyal are less knowledgeable about
the shopping environment.
– H2: Store-loyal customers are more risk-averse than are non-loyal
customers.
– H3: Customers of Big Bazaar are loyal.
23
– H0: Customers who are store loyal are at least as
knowledgeable about the shopping environment as other
customers.
– H1: Customers who are store loyal are less knowledgeable about
the shopping environment.
24
12
31-Jan-20
Types of Hypotheses
Null
– H0: μ = 50
– H0: μ ≤ 50
– H0: μ ≥ 50
Alternate
– HA: μ ≠ 50
– HA: μ > 50
– HA: μ < 50
Two-tail Study
27
27
28
28
14
31-Jan-20
29
29
• Note:
– A null hypothesis may be rejected, but it can never be accepted
based on a single test.
– In classical hypothesis testing, there is no way to determine whether
the null hypothesis is true.
30
30
15
31-Jan-20
31
31
Hypothesis Tests
• Tests of Association
• Tests of Differences
– Distributions
– Means
– Proportions
– Median/Rankings
32
32
16
31-Jan-20
Frequency Distribution
33
Frequency Distribution
• In a frequency distribution, one variable is considered at a
time.
– A frequency distribution for a variable produces a table of
frequency counts, percentages, & cumulative percentages for all
values associated with that variable.
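A frequency table of this kind can be built in a few lines. This is a sketch under stated assumptions: the function name `frequency_table` and the list of ratings are hypothetical and are not the course data set.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent, cumulative percent) rows for one variable."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows

ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # hypothetical 5-point ratings
for row in frequency_table(ratings):
    print(row)
```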
34
34
17
31-Jan-20
Measures of Location
• Mean
– Most commonly used measure of central tendency.
– Used when data is in interval or ratio scale.
• Median
– Middle value when data are arranged in ascending or descending
order. It is the 50th percentile.
– Used when data are on an ordinal scale; it can also be used with interval or ratio scales.
• Mode
– The value that occurs most frequently & represents the highest
peak of the distribution.
– Mode is a good measure of location when the variable is inherently
categorical or has otherwise been grouped into categories.
35
35
Measures of Variability
• Variability is a measure of the dispersion or spread of
scores in a distribution.
– Variability ranges from 0 to ∞.
• Range
• Interquartile Range
• Variance
– Mean squared deviation from the mean. The variance can never be
negative.
• Standard Deviation
– Square root of the variance.
• Coefficient of variation
– Ratio of SD to the mean expressed as a percentage & is a unitless
measure of relative variability.
– Can be used with ratio scale only.
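These measures can be computed with Python's standard `statistics` module. The sketch below takes the slide's wording literally and uses the population form of the variance (mean squared deviation, dividing by n); statistical packages often report the sample form (dividing by n - 1) instead. The function name `variability` and the data are hypothetical.

```python
import statistics

def variability(values):
    """Range, variance, standard deviation, and coefficient of variation (in %)."""
    mean = statistics.mean(values)
    variance = statistics.pvariance(values)   # mean squared deviation from the mean
    sd = statistics.pstdev(values)            # square root of the variance
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,      # meaningful only for ratio-scaled data
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical ratio-scaled scores
print(variability(scores))
```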
36
36
18
31-Jan-20
Symmetric Distribution
Skewed Distribution
37
• A positively skewed distribution is a distribution of scores where a few outliers are substantially larger (toward the right tail in a graph) than most other scores.
• A negatively skewed distribution is a distribution of scores where a few outliers are substantially smaller (toward the left tail in a graph) than most other scores.
2/8/2020
Session-7
Cross-Tabulation
1
2/8/2020
Cross-Tabulation
• While a frequency distribution describes one variable at a time, a
cross-tabulation describes two or more variables simultaneously.
• General rule is to compute % in the direction of the independent variable, across the dependent variable.
2
2/8/2020
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
3
2/8/2020
$$V = \sqrt{\frac{\phi^2}{\min(r-1,\; c-1)}} = \sqrt{\frac{\chi^2 / n}{\min(r-1,\; c-1)}}$$
Exercise
4
2/8/2020
Case Problem
• To find out frequency distribution of Familiarity with
Internet among sample.
• To find out
– Mean, Median & Mode;
– Standard deviation; &
– Skewness & Kurtosis of Familiarity rating with Internet
among sample.
10
10
5
2/8/2020
• To find out
– Whether there is any association between these
variables or not
11
11
Parametric Test
12
6
2/8/2020
Hypothesis Tests
• Tests of Association
• Tests of Differences
– Distributions
– Means
– Proportions
– Median/Rankings
13
13
14
14
7
2/8/2020
Hypothesis Tests
• Parametric tests
– Independent Samples: Two-Group t test; Z test
– Paired Samples: Paired t test
• Nonparametric tests
– Independent Samples: Chi-Square; Mann-Whitney
– Paired Samples: Sign; Wilcoxon
15
15
Parametric Test
• One Sample Test
• Two independent Sample test
• Paired Sample test
16
16
8
2/8/2020
17
18
18
9
2/8/2020
19
19
One-Sample Test (Test Value = 4)
                                                                95% Confidence Interval of the Difference
              t       df   Sig. (2-tailed)   Mean Difference    Lower    Upper
Familiarity   2.470   28   .020              .724               .12      1.32
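The t value in a one-sample test is the mean difference divided by its standard error, with n - 1 degrees of freedom. A minimal sketch of that calculation follows; the function name `one_sample_t` and the five ratings are hypothetical and are not the Familiarity data behind the output above.

```python
import math
import statistics

def one_sample_t(values, test_value):
    """Return (t statistic, degrees of freedom) for H0: population mean = test_value."""
    n = len(values)
    mean_difference = statistics.mean(values) - test_value
    standard_error = statistics.stdev(values) / math.sqrt(n)  # sample SD, n - 1 denominator
    return mean_difference / standard_error, n - 1

ratings = [3, 5, 4, 6, 7]                    # hypothetical ratings
t_stat, df = one_sample_t(ratings, test_value=4)
print(round(t_stat, 3), df)
```

The significance level is then read from the t distribution with the returned degrees of freedom, which this standard-library sketch leaves out.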
20
20
10
2/8/2020
21
21
22
22
11
2/8/2020
23
24
24
12
2/8/2020
25
25
H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 \neq \sigma_2^2$
26
26
13
2/8/2020
Group Statistics (columns: Sex, N, Mean, Std. Deviation, Std. Error Mean)
27
27
28
14
2/10/2020
Session-8
1
2/10/2020
2
2/10/2020
3
2/10/2020
4
2/10/2020
10
10
5
2/10/2020
Hypothesis Tests
11
11
Non-Parametric Test
12
6
2/10/2020
Non-Parametric Tests
• Nonparametric tests are used when the independent
variables are nonmetric.
13
13
14
7
2/10/2020
15
15
16
16
8
2/10/2020
• Hypothesis
H0: Internet usage is normally distributed
H1: Internet usage is NOT normally distributed
17
17
Descriptive Statistics
                          N    Mean   Std. Deviation   Minimum   Maximum
Internet Usage Hrs/Week   30   6.60   4.296            2         15
18
9
2/10/2020
19
19
K-S Table
20
20
10
2/10/2020
Lilliefors
Test Table
21
21
22
22
11
2/10/2020
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
• Uniform distribution
Ho: The ratings of familiarity with internet are uniformly distributed
H1: The ratings of familiarity with internet are not uniformly
distributed.
• Expected Distribution
Ho: The observed distribution is the same as the expected distribution
H1: The observed distribution is not the same as the expected
distribution
23
23
24
24
12
2/10/2020
Binomial Test
• Expected Proportion (for testing Population proportion)
H0: p = 0.5
H1: p ≠ 0.5
25
25
26
26
13
2/10/2020
27
Column Total 15 15
• Exercise:
Is the proportion of respondents using the Internet for
shopping independent of gender (males and females)?
28
28
14
2/10/2020
29
29
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad f_e = \frac{n_r\, n_c}{n}$$
30
30
15
2/10/2020
31
31
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
32
32
16
2/10/2020
33
$$V = \sqrt{\frac{\phi^2}{\min(r-1,\; c-1)}} = \sqrt{\frac{\chi^2 / n}{\min(r-1,\; c-1)}}$$
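The chi-square statistic, the phi coefficient, and Cramér's V can be computed directly from a contingency table. The sketch below is illustrative only: the function names and the 2 × 2 counts are hypothetical (the counts were chosen so that each column totals 15), and they are not the course data.

```python
import math

def chi_square(table):
    """Return (chi-square, n) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # f_e = n_r * n_c / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi_coefficient(chi2, n):
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    return math.sqrt((chi2 / n) / min(rows - 1, cols - 1))

table = [[5, 10],    # hypothetical counts: rows are usage levels,
         [10, 5]]    # columns are the two gender groups
chi2, n = chi_square(table)
phi = phi_coefficient(chi2, n)
v = cramers_v(chi2, n, rows=2, cols=2)
print(round(chi2, 3), round(phi, 3), round(v, 3))
```

For a 2 × 2 table, min(r - 1, c - 1) is 1, so V equals phi.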
34
34
17
2/10/2020
35
18
2/14/2020
Session-9a
1
2/14/2020
2
2/14/2020
Mann-Whitney U test
3
2/14/2020
H1: The two populations (male & female) are not identical with respect
to familiarity with internet. (Mean Rank with respect to familiarity for two
populations are not same)
7
4
2/14/2020
Test Statistics (a)
                                   Familiarity
Mann-Whitney U                     31.500
Wilcoxon W                         151.500
Z                                  -3.277
Asymp. Sig. (2-tailed)             .001
Exact Sig. [2*(1-tailed Sig.)]     .001 (b)
a. Grouping Variable: Sex
b. Not corrected for ties.
10
5
2/14/2020
• Hypothesis
H0: Md = 0
H1: Md ≠ 0
11
11
6
2/14/2020
13
13
• Hypothesis
H0: pluses = minuses
H1: pluses ≠ minuses
14
14
7
2/14/2020
Frequencies: Attitude toward Technology - Attitude toward Internet
                             N
Negative Differences (a)     23
Positive Differences (b)     1
Ties (c)                     6
Total                        30
a. Attitude toward Technology < Attitude toward Internet
b. Attitude toward Technology > Attitude toward Internet
c. Attitude toward Technology = Attitude toward Internet

Test Statistics (a): Attitude toward Technology - Attitude toward Internet
Exact Sig. (2-tailed)        .000 (b)
a. Sign Test
b. Binomial distribution used.
15
15
Hypothesis Tests
16
16
8
2/14/2020
17
9
2/14/2020
Session-9b
Causality
1
2/14/2020
Definitions of Terms
• Independent variables
– Variables or alternatives that are manipulated & whose effects
are measured & compared, e.g., price levels.
• Test units
– Individuals, organizations, or other entities whose response to
the independent variables or treatments is being examined,
e.g., consumers or stores.
• Dependent variables
– Variables which measure effect of independent variables on test
units, e.g., sales, profits, market shares.
• Extraneous variables
– Variables other than independent variables that affect response
of test units, e.g., store size, store location, competitive effort.
• Experiment
– Process of manipulating one or more independent variables and
measuring their effect on one or more dependent variables, while
controlling for the extraneous variable
Illustration:
• Whether humor has positive effect on the purchase intention of the
products that are purchased impulsively.
4
2
2/14/2020
Validity in Experiment
3
2/14/2020
Limitations of Experiment
4
2/14/2020
Experimental Design
X O1
• A single group of test units is exposed to a treatment X.
• A single measurement on dependent variable is taken.
• There is no random assignment of test units.
• One-shot case study is more appropriate for exploratory than for
conclusive research.
Note:
X: Exposure to a treatment
O: Observation
10
10
5
2/14/2020
O1 X O2
• A group of test units is measured twice.
Note:
X: Exposure to a treatment
O: Observation
11
11
EG: X O1
CG: O2
• A two-group experimental design.
• EG is exposed to treatment, & CG is not.
• Measurements on both groups are made only after treatment.
• Test units are not assigned at random.
• Treatment effect would be measured as O1 - O2.
Note
EG: Experimental group (EG)
CG: Control group (CG)
X: Exposure to a treatment
12
12
6
2/14/2020
EG: R O1 X O2
CG: R O3 O4
Note
EG: Experimental group (EG)
CG: Control group (CG)
R: Randomization
X: Exposure to a treatment
13
13
EG: R X O1
CG: R O2
Note
EG: Experimental group (EG)
CG: Control group (CG)
R: Randomization
X: Exposure to a treatment
14
14
7
2/14/2020
15
15
                                   Treatment Groups
Block Number   Store Patronage     Commercial A   Commercial B   Commercial C
1              Heavy               A              B              C
2              Medium              A              B              C
3              Low                 A              B              C
4              None                A              B              C
16
16
8
2/14/2020
Factorial Design
17
17
Factorial Design
                                Amount of Humor
Amount of Store Information     No Humor   Medium Humor   High Humor
Low                             A          B              C
Medium                          D          E              F
High                            G          H              I
18
18
9
2/14/2020
19
ANOVA: Introduction
• There must also be one or more independent variables that are all
categorical (nonmetric).
20
20
10
2/14/2020
ANOVA: Introduction
21
21
22
22
11
2/14/2020
[Figure: Decomposition of total variation. Independent Variable X has categories X1, X2, X3, …, Xc, each with observations Y1, Y2, …, Yn and a category mean; the total sample has observations Y1, Y2, …, YN. Within Category Variation = SSwithin; Between Category Variation = SSbetween; Total Variation = SSy.]
23
23
24
24
12
2/14/2020
$$F = \frac{SS_x/(c-1)}{SS_{error}/(N-c)} = \frac{MS_x}{MS_{error}}$$
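The F ratio can be computed by splitting the total variation into its between-category and within-category parts. The sketch below is a plain-Python illustration; the function name `one_way_anova_f` and the three groups of scores are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, SS between, SS within) for a list of groups of observations."""
    all_values = [y for group in groups for y in group]
    N, c = len(all_values), len(groups)        # total sample size, number of categories
    grand_mean = sum(all_values) / N
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_x = ss_between / (c - 1)                # mean square due to X
    ms_error = ss_within / (N - c)             # mean square due to error
    return ms_x / ms_error, ss_between, ss_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]     # hypothetical scores for three categories
f_ratio, ss_between, ss_within = one_way_anova_f(groups)
print(round(f_ratio, 3), round(ss_between, 3), round(ss_within, 3))
```

The two sums of squares add up to the total variation, SSy, which is a quick check on any hand calculation.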
25
25
Interpret Results
26
26
13
2/14/2020
27
28
28
14
2/14/2020
29
29
Cell means
30
30
15
2/14/2020
31
31
32
32
16
2/14/2020
Assumptions in ANOVA
33
33
Thank You
34
17
22-Feb-20
Session-10
1
22-Feb-20
2
22-Feb-20
Two-way ANOVA
Two-way ANOVA
Cell Means
Promotion    Coupon   Count   Mean
High         Yes      5       9.200
High         No       5       7.400
Medium       Yes      5       7.600
Medium       No       5       4.800
Low          Yes      5       5.400
Low          No       5       2.000
TOTAL                 30

Factor Level Means
Promotion    Coupon   Count   Mean
High                  10      8.300
Medium                10      6.200
Low                   10      3.700
             Yes      15      7.400
             No       15      4.733
Grand Mean            30      6.067
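The factor-level means and the grand mean follow from the cell means as count-weighted averages. The short sketch below reproduces that arithmetic from the cell counts and means in the table; the function name `level_means` is hypothetical.

```python
# (promotion level, coupon) -> (count, mean), taken from the cell means table
cell_means = {
    ("High", "Yes"): (5, 9.2), ("High", "No"): (5, 7.4),
    ("Medium", "Yes"): (5, 7.6), ("Medium", "No"): (5, 4.8),
    ("Low", "Yes"): (5, 5.4), ("Low", "No"): (5, 2.0),
}

def level_means(cells, factor_index):
    """Count-weighted mean for each level of one factor (0 = promotion, 1 = coupon)."""
    totals = {}
    for key, (count, mean) in cells.items():
        level = key[factor_index]
        n, weighted_sum = totals.get(level, (0, 0.0))
        totals[level] = (n + count, weighted_sum + count * mean)
    return {level: (n, weighted_sum / n) for level, (n, weighted_sum) in totals.items()}

promotion = level_means(cell_means, 0)
coupon = level_means(cell_means, 1)
total_n = sum(count for count, _ in cell_means.values())
grand_mean = sum(count * mean for count, mean in cell_means.values()) / total_n
print(promotion, coupon, round(grand_mean, 3))
```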
6
3
22-Feb-20
Issues in Interpretation
• Multiple comparisons,
• Interaction effects
• Relative importance of factors
Example:
• Medicines A & B may have no effect when either is taken alone. But, the two
together may have an effect. “The whole is different from the sum of the
parts.”
• Good teachers & small classrooms might both encourage learning. A good
teacher in a small classroom might be especially effective.
4
22-Feb-20
Patterns of Interaction
[Figure: Patterns of interaction. Each panel plots Y against X11, X12, X13 with separate lines for X21 and X22. Panel titles that survive: Case 3: Interaction; Case 4: Interaction.]
= 0.557
10
10
5
22-Feb-20
= 0.280
11
11
ANCOVA
12
6
22-Feb-20
13
13
ANCOVA: Examples
14
14
7
22-Feb-20
ANCOVA: Illustration
15
15
Analysis of Covariance
16
16
8
22-Feb-20
MANOVA
17
• If, however, there are multiple dependent variables that are uncorrelated or
orthogonal, ANOVA on each of dependent variables is more appropriate.
18
18
9
22-Feb-20
MANOVA: Example
19
19
20
10
22-Feb-20
21
21
22
22
11
22-Feb-20
Ranks
Test Statistics (a, b)
              SALES
Chi-Square    16.529
df            2
Asymp. Sig.   .000
a. Kruskal Wallis Test
b. Grouping Variable: PROMOTION
23
23
24
24
12
22-Feb-20
Frequencies
                       PROMOTION
                       1    2    3
SALES   > Median       9    4    1
        <= Median      1    6    9

Test Statistics (a)
              SALES
N             30
Median        6.00
Chi-Square    13.125 (b)
df            2
Asymp. Sig.   .001
a. Grouping Variable: PROMOTION
b. 3 cells (50.0%) have expected frequencies less than 5. The minimum expected cell frequency is 4.7.
25
25
26
26
13
22-Feb-20
Session: 11
CORRELATION
1
22-Feb-20
[Figure: plot of Y against X, with X ranging from -3 to 3]
4
2
22-Feb-20
Partial Correlation
Partial correlation coefficient measures association
between two variables after controlling for, or adjusting
for, effects of one or more additional variables.
$$r_{xy.z} = \frac{r_{xy} - (r_{xz})(r_{yz})}{\sqrt{1 - r_{xz}^2}\,\sqrt{1 - r_{yz}^2}}$$
3
22-Feb-20
$$r_{y(x.z)} = \frac{r_{xy} - r_{yz}\, r_{xz}}{\sqrt{1 - r_{xz}^2}}$$
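Both coefficients can be computed from the three simple correlations. In the sketch below, the second formula is treated as the part (semipartial) correlation, in which the effect of z is removed from x only; the function names and the correlation values are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z in both variables."""
    return (r_xy - r_xz * r_yz) / (math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2))

def part_correlation(r_xy, r_xz, r_yz):
    """Correlation between y and x after removing the effect of z from x only."""
    return (r_xy - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)

# Hypothetical simple correlations
r_xy, r_xz, r_yz = 0.8, 0.6, 0.5
print(round(partial_correlation(r_xy, r_xz, r_yz), 4),
      round(part_correlation(r_xy, r_xz, r_yz), 4))
```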
Nonmetric Correlation
Spearman's rho & Kendall's tau are two measures of
nonmetric correlation.
– Both measures use rankings rather than absolute values of
variables. Both vary from -1.0 to +1.0.
4
22-Feb-20
REGRESSION
Regression
• Yi = β0 + β1Xi + ei
10
5
22-Feb-20
Regression
• Examines associative relationships between a metric
dependent variable & one or more independent variables
(does not imply or assume any causality) in following ways:
– Determine whether independent variables explain a significant
variation in dependent variable: whether a relationship exists.
– Determine how much of variation in dependent variable can be
explained by independent variables: strength of relationship.
– Determine structure or form of relationship: mathematical
equation relating independent and dependent variables.
– Control for other independent variables when evaluating
contributions of a specific variable or set of variables.
– Predict values of the dependent variable.
11
12
6
22-Feb-20
13
[Figure: plot of Attitude against Duration of Residence]
14
7
22-Feb-20
[Figure: plot with lines labeled Line 2, Line 3, Line 4]
15
Bivariate Regression
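[Figure: regression line Y = β0 + β1X, showing the errors eJ between observed values YJ and the line, with X1, X2, X3, X4, X5 marked on the X axis]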
16
8
22-Feb-20
Bivariate Regression
Multiple R 0.93608
R2 0.87624
Adjusted R2 0.86387
Standard Error 1.22329
ANALYSIS OF VARIANCE
df Sum of Squares Mean Square
17
H0: R2pop = 0
18
9
22-Feb-20
19
20
10
22-Feb-20
Assumptions of Regression
Error term is normally distributed.
Mean of error term is 0.
21
Multiple Regression
General form of multiple regression model is as
follows:
Y = β 0 + β 1 X1 + β 2 X2 + β 3 X3+ . . . + β k X k + e
which is estimated by the following equation:
Ŷ = a + b1X1 + b2X2 + b3X3+ . . . + bkXk
22
11
22-Feb-20
Multiple Regression
Multiple R 0.97210
R2 0.94498
Adjusted R2 0.93276
Standard Error 0.85974
ANALYSIS OF VARIANCE
df Sum of Squares Mean Square
23
Significance Testing
H0: $R^2_{pop} = 0$
H0: $\beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
$$F = \frac{SS_{reg}/k}{SS_{res}/(n-k-1)} = \frac{R^2/k}{(1-R^2)/(n-k-1)}$$
12
22-Feb-20
Significance Testing
$$t = \frac{b}{SE_b}$$
25
26
13
22-Feb-20
Stepwise Regression
Purpose of stepwise regression is to select, from a
large number of predictor variables, a small subset of
variables that account for most of variation in
dependent or criterion variable.
– In this procedure, predictor variables enter or are removed from
the regression equation one at a time.
– It has several approaches - Forward inclusion; Backward
elimination; & Stepwise solution.
27
Stepwise Regression
Forward inclusion. Initially, there are no predictor variables in
regression equation. Predictor variables are entered one at a time,
only if they meet certain criteria specified in terms of F ratio. Order
in which variables are included is based on contribution to
explained variance.
Backward elimination. Initially, all predictor variables are
included in regression equation. Predictors are then removed one
at a time based on F ratio for removal.
Stepwise solution. Forward inclusion is combined with removal of
predictors that no longer meet specified criterion at each step.
28
14
27-02-2020
Session: 12a
Caution about R²
Value of R² can be “artificially” increased by simply
adding explanatory variables to the regression model.
– For comparing two regression models with same dependent
variable ‘y’ but differing number of explanatory variables – the
model with higher R² value is not necessarily the better one.
Adjusted R²
For comparing two regression models, it is advisable to
compute adjusted R²
Adjusted R² = 1 − (1 − R²)(N − 1)/(N − K − 1)
Where
• K is the number of independent variables in the model, excluding
the constant.
• N is the number of points in your data sample.
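The adjustment can be sketched as follows, assuming the standard formula 1 - (1 - R²)(N - 1)/(N - K - 1). The first call uses the R² reported in the regression output earlier in this deck with N = 12 and K = 2, which are assumptions chosen because they reproduce the reported adjusted R² to rounding. The two comparison models are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# R2 from the slides; n = 12 and k = 2 are assumed, not stated in the source.
reported = adjusted_r2(0.94498, n=12, k=2)

# Two hypothetical models on the same 20 observations: model B adds one
# predictor and gains a little R2, yet its adjusted R2 is lower.
model_a = adjusted_r2(0.800, n=20, k=2)
model_b = adjusted_r2(0.805, n=20, k=3)
print(round(reported, 5), round(model_a, 4), round(model_b, 4))
```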
[Figure: plot of residuals against time]
Multicollinearity
Multicollinearity arises when intercorrelations among the predictors
are very high.
• Problems caused by multicollinearity
– Partial regression coefficients may not be estimated precisely.
Standard errors are likely to be high.
– Magnitudes, as well as the signs of partial regression coefficients,
may change from sample to sample.
– It becomes difficult to assess relative importance of independent
variables in explaining variation in dependent variable.
– Predictor variables may be incorrectly included or removed in
stepwise regression.
Multicollinearity: Correction
• A simple procedure for adjusting for multicollinearity consists of
using only one of the variables in a highly correlated set of
variables.
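One way to sketch this correction is to scan the predictors in order and keep a variable only if it is not highly correlated with any variable already kept. The cutoff of 0.9, the data, and the function names are hypothetical; the slides do not specify what counts as "highly correlated".

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def keep_one_per_correlated_set(predictors, threshold=0.9):
    """Keep a predictor only if |r| with every kept predictor is at most threshold."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictors: x2 is almost exactly twice x1.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1],
    "x3": [5, 1, 4, 2, 6, 3],
}
kept = keep_one_per_correlated_set(predictors)
print(kept)
```

Which member of a correlated set to retain is a judgment call; this sketch simply keeps the first one encountered.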
THANK YOU
3/3/2020
Factor Analysis
Factor Analysis
3) Interpret the pattern of correlations – what is related to what?
• Confirmatory
– Evaluate specific, clearly articulated hypotheses about the
correlational structure among variables
– Get “fit” indices & significance tests
Data Matrix
• Factor analysis is totally dependent on correlations between
variables.
• Factor analysis summarizes correlation structure
Data Matrix
Correlation Matrix
Scree Plot
[Figure: scree plot of eigenvalue (vertical axis, 0.0 to 3.0) against component number (horizontal axis, 1 to 6)]
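The values plotted in a scree plot are the eigenvalues of the correlation matrix, sorted from largest to smallest. A minimal sketch, assuming NumPy and a small hypothetical data matrix (8 respondents by 4 variables, not the data behind the plot in the slides):

```python
import numpy as np

# Hypothetical ratings: rows are respondents, columns are variables.
data = np.array([
    [7, 6, 2, 3],
    [6, 7, 3, 2],
    [2, 3, 6, 7],
    [3, 2, 7, 6],
    [5, 5, 4, 4],
    [1, 2, 7, 7],
    [7, 7, 1, 2],
    [4, 3, 5, 5],
], dtype=float)

# Correlation matrix between variables (columns).
corr = np.corrcoef(data, rowvar=False)

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order,
# so reverse them to get the order used on a scree plot.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

for component, value in enumerate(eigenvalues, start=1):
    print(component, round(float(value), 3))
```

Because the diagonal of a correlation matrix is all ones, the eigenvalues sum to the number of variables.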
High Loadings Before Rotation (left) and High Loadings After Rotation (right)
Factors Factors
Variables 1 2 Variables 1 2
1 X 1 X
2 X X 2 X
3 X 3 X
4 X X 4 X
5 X X 5 X
6 X 6 X
Unrotated Factors
Rotated Factors
Thank You
Cluster Analysis
Cluster Analysis
• Techniques used to classify objects or cases into relatively
homogeneous groups called clusters.
– Examine an entire set of interdependent relationships.
– No distinction is made between dependent & independent variables.
[Figure: scatter plots of objects on Variable 1 and Variable 2]
Clustering Procedure
Clustering Procedures
• Hierarchical
– Agglomerative (includes Ward’s Method)
– Divisive
• Nonhierarchical
[Figure: Complete Linkage, the maximum distance between Cluster 1 and Cluster 2]
[Figure: Average Linkage, the average distance between Cluster 1 and Cluster 2]
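The two linkage rules pictured here can be sketched in a few lines: complete linkage takes the maximum distance over all pairs of objects drawn from the two clusters, and average linkage takes the mean of those pairwise distances. The points and function names below are hypothetical, and Euclidean distance is assumed.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def complete_linkage(cluster_a, cluster_b):
    """Maximum distance between any object in A and any object in B."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Average of all pairwise distances between objects in A and objects in B."""
    distances = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return sum(distances) / len(distances)


# Hypothetical clusters of two objects each.
cluster_1 = [(0.0, 0.0), (1.0, 0.0)]
cluster_2 = [(4.0, 0.0), (6.0, 0.0)]
print(complete_linkage(cluster_1, cluster_2), average_linkage(cluster_1, cluster_2))
```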
• Ward's procedure: For each cluster, the means for all the variables are
computed. Then, for each object, the squared Euclidean distance to the
cluster means is calculated. These distances are summed for all the
objects. At each stage, the two clusters whose merger produces the smallest
increase in the overall within-cluster sum of squared distances are combined.
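A single stage of Ward's procedure can be sketched as follows: compute the within-cluster sum of squared distances for each cluster, then find the pair whose merger increases the total by the least. The clusters and function names are hypothetical.

```python
def within_ss(cluster):
    """Sum of squared Euclidean distances from each object to the cluster means."""
    n = len(cluster)
    dims = len(cluster[0])
    means = [sum(p[d] for p in cluster) / n for d in range(dims)]
    return sum((p[d] - means[d]) ** 2 for p in cluster for d in range(dims))


def ward_increase(cluster_a, cluster_b):
    """Increase in total within-cluster sum of squares if A and B are merged."""
    return within_ss(cluster_a + cluster_b) - within_ss(cluster_a) - within_ss(cluster_b)


def best_merge(clusters):
    """Return (increase, i, j) for the pair of clusters with the smallest increase."""
    candidates = [
        (ward_increase(clusters[i], clusters[j]), i, j)
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    ]
    return min(candidates)


# Three hypothetical clusters of two objects each.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(3.0, 0.0), (5.0, 0.0)],
    [(10.0, 0.0), (12.0, 0.0)],
]
increase, i, j = best_merge(clusters)
print(increase, i, j)
```

Repeating this step until the desired number of clusters remains gives the full agglomerative schedule.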
Centroid Method
Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7 2
Cluster membership by number of clusters in the solution
Case No. 4 Clusters 3 Clusters 2 Clusters
1 1 1 1
2 2 2 2
3 1 1 1
4 3 3 2
5 2 2 2
6 1 1 1
7 1 1 1
8 1 1 1
9 2 2 2
10 3 3 2
11 2 2 2
12 1 1 1
13 2 2 2
14 3 3 2
15 1 1 1
16 3 3 2
17 1 1 1
18 4 3 2
19 3 3 2
20 2 2 2
Cluster Centroids
Means of Variables
Cluster No. V1 V2 V3 V4 V5 V6
1 5.750 3.625 6.000 3.125 1.750 3.875
Initial Cluster Centers
Cluster
1 2 3
V1 4 2 7
V2 6 3 2
V3 3 2 6
V4 7 4 4
V5 2 7 1
V6 7 2 3
Iteration History (a)
Change in Cluster Centers
Iteration 1 2 3
1 2.154 2.102 2.550
2 0.000 0.000 0.000
a. Convergence achieved due to no or small distance
change. The maximum distance by which any center
has changed is 0.000. The current iteration is 2. The
minimum distance between initial centers is 7.746.
Final Cluster Centers
Cluster
1 2 3
V1 4 2 6
V2 6 3 4
V3 3 2 6
V4 6 4 3
V5 4 6 2
V6 6 3 4
Distances between Final Cluster Centers
Cluster 1 2 3
1 - 5.568 5.698
2 5.568 - 6.928
3 5.698 6.928 -
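The nonhierarchical output can be approximated with a plain K-means loop: assign every case to its nearest center, recompute each center as the mean of its cases, and stop when no center moves. The sketch uses the 20 cases from the data table in this deck and, as a starting point, the three centers of the first K-means table (V1 to V6 = 4 6 3 7 2 7, 2 3 2 4 7 2 and 7 2 6 4 1 3), which are assumed here to be the initial centers. The function names are hypothetical, and the statistical package that produced the slides may differ in details such as tie handling.

```python
from math import dist  # Euclidean distance, Python 3.8+

# V1..V6 for cases 1..20, copied from the data table.
cases = [
    (6, 4, 7, 3, 2, 3), (2, 3, 1, 4, 5, 4), (7, 2, 6, 4, 1, 3), (4, 6, 4, 5, 3, 6),
    (1, 3, 2, 2, 6, 4), (6, 4, 6, 3, 3, 4), (5, 3, 6, 3, 3, 4), (7, 3, 7, 4, 1, 4),
    (2, 4, 3, 3, 6, 3), (3, 5, 3, 6, 4, 6), (1, 3, 2, 3, 5, 3), (5, 4, 5, 4, 2, 4),
    (2, 2, 1, 5, 4, 4), (4, 6, 4, 6, 4, 7), (6, 5, 4, 2, 1, 4), (3, 5, 4, 6, 4, 7),
    (4, 4, 7, 2, 2, 5), (3, 7, 2, 6, 4, 3), (4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2),
]
# Starting centers, assumed to be the initial cluster centers.
initial_centers = [(4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2), (7, 2, 6, 4, 1, 3)]


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]


def update(points, labels, k):
    """Mean of the points in each cluster (assumes no cluster is empty)."""
    centers = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centers


def k_means(points, centers, max_iter=10):
    """Alternate assignment and update steps until the centers stop moving."""
    for iteration in range(1, max_iter + 1):
        labels = assign(points, centers)
        new_centers = update(points, labels, len(centers))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift == 0.0:
            break
    return centers, labels, iteration


centers, labels, iterations = k_means(cases, initial_centers)
for c in centers:
    print([round(v, 3) for v in c])
```

With these inputs the loop stops at the second iteration, and the distances between the resulting centers can be checked against the distance table using `dist(centers[0], centers[1])` and so on.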
ANOVA
Cluster Error
Mean Square df Mean Square df F Sig.
V1 29.108 2 0.608 17 47.888 0.000
V2 13.546 2 0.630 17 21.505 0.000
V3 31.392 2 0.833 17 37.670 0.000
V4 15.713 2 0.728 17 21.585 0.000
V5 22.537 2 0.816 17 27.614 0.000
V6 12.171 2 1.071 17 11.363 0.001
The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this, and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.
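Each F in the table is the cluster (between-group) mean square divided by the error (within-group) mean square. As a sketch, the V1 row can be recomputed from the V1 column of the data table, assuming the cluster labels are those of the three-cluster column in the membership table. The function name is hypothetical.

```python
# V1 for cases 1..20, from the data table.
v1 = [6, 2, 7, 4, 1, 6, 5, 7, 2, 3, 1, 5, 2, 4, 6, 3, 4, 3, 4, 2]
# Assumed labels: the three-cluster column of the membership table.
labels = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2]


def cluster_f(values, labels):
    """Return (cluster mean square, error mean square, F) for one variable."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    ms_cluster = ss_between / (k - 1)  # df = k - 1
    ms_error = ss_within / (n - k)     # df = n - k
    return ms_cluster, ms_error, ms_cluster / ms_error


ms_cluster, ms_error, f_value = cluster_f(v1, labels)
print(round(ms_cluster, 3), round(ms_error, 3), round(f_value, 3))
```

As the note in the slides says, these F values are descriptive only, because the clusters were formed to maximize exactly these differences.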
Ethical issues:
American Psychological Association
• Informed consent must include
– Purpose, expected duration, & procedures of research
– Right to decline to participate & to withdraw from research once
participation has begun
– Foreseeable consequences of declining or withdrawing
– Reasonably foreseeable factors that may be expected to influence
their willingness to participate such as potential risk, discomfort, or
adverse effects
– Any prospective research benefits
– Limits of confidentiality, Incentives for participants
– Whom to contact for questions about research & research
participants’ rights
• Consent must be obtained for recording
• Steps taken to protect prospective participants
Fraud in Research
• Data Fabrication
– Making up data or results and reporting them
• Falsification
– Manipulating research materials, equipment, or processes, or
changing or omitting data or results such that research is not
accurately represented in research record.
• Plagiarism
– Appropriation of another person’s ideas, processes, results or words
without giving appropriate credit