
January 8th, 2021

Final project report to be submitted after 7 days of completion of subject

Primary research project, consumer centric | Psycho-graphic study

Stages involved in Business Research

1. Problem Statement
a. Exploration <> Objective / Research questions
b. Methods of Validation / Basic Research
c. Questionnaire formulation
d. Sampling plan
e. Methods of data collection
Above are all steps in Desk research/planning
2. Field visit
3. Editing and cleaning of data
4. Data analysis & interpretation > Leads to solutions for step (a)

January 9th, 2021

Research Design: the blueprint for the planning phase of research. It is always made in such a fashion that it
answers the 6 W's (What, Why, How, Who, When, Where)

U (disturbances/errors) are called extraneous variables – their influence is minimized in causal designs more than in descriptive designs

Exploratory Design Ways:


1. Self and Peer experience
2. Literature studies, internet
3. Secondary data analysis, market available data
4. Expert opinion
a. Expert interview
b. panel discussion
5. Qualitative analysis on end consumer
a. Direct
i. Focus Groups
ii. in-depth interviews
b. Indirect – projective techniques
i. Word association test
ii. Sentence completion test
iii. Thematic apperception test
iv. Role play

RD and Dupont to be read/completed - Done

Week of Jan 17th to schedule 2 hr sessions with Rastogi sir……

January 14th

FGDs should have a common ground of discussion for coming to a conclusion

FGDs should be homogeneous

FGDs should have a moderator


January 16th

Comparison of the four qualitative methods:

Focus Group Discussion: end consumers; 8-12 participants; homogeneous group; aim is the minimum common thought/set of variables; inexpensive; moderator is one who knows the problem; 35-40

Panel Discussion: experts; 3-4 participants; heterogeneous group; aim is to get as much diverse information as possible; expensive; moderator keeps the discussion on track and everyone in check; 25-30

In-depth interview: consumers; unstructured, free flow; loosely prepared questions; time consuming; less expensive; uses laddering, hidden-issue & symbolic analysis; views are expressed after a thought process

Expert interview: experts; structured, pointed questions; fixed set of questions; fixed time; expensive; direct answers with logic and a point of view; views are direct

January 23rd:

Projective Techniques – word association (top-of-the-mind recall), sentence completion, thematic
apperception, role play

DuPont Discussion (Jan 22):

a. To study the factors influencing the buying decision in the residential segment

b. To study the market share vs competitors

c. To understand the demographics

d. Would the launch in the residential segment influence the commercial segment?

i. Whether feel/touch influences the buying decision

ii. Whether the visual / design / pattern of the carpet influences…

iii. Does quality influence

iv. Does durability influence

v. Does color influence

vi. Does the purpose of use influence

vii. Does price influence

Causal/Experimental design: Impact of cause on variable, keeping everything else constant


3 elements of Experimental design: Randomization, Replication & local control

Experimental Design: Pre-experimental, true experimental, quasi-experimental, statistical

Scale and scaling design – Causal design and DuPont case discussion

February 4th:

Causal Design: 3 basic principles of Randomization, Replication and Local Control

Pre-Experimental design: least efficient

Extraneous variables are those we are not interested in studying but which can still affect the dependent
variable – the design should minimize the influence of all such disturbances/errors

Static Design – experimental group vs control group. The division is not based on any scientific principle but
on the arbitrary decision of the researcher

Causal design types and discussion

Demographic variables are nominal or ratio

Read Scales and Scaling techniques > Psychographic variables > Comparative (ordinal) and Non-
comparative (interval)

Paired comparison: if the row item is preferred over the column item, we give 1, else 0. Diagonal elements are
ignored (no comparison between the same item). The two triangles should be complementary, and the rows are
summed up to find the parameter with maximum weightage.
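The paired-comparison tally can be sketched in plain Python; the attribute names and the single respondent's preferences below are made up for illustration.

```python
# Paired-comparison tally: preference[i][j] = 1 if the row attribute is
# preferred over the column attribute, else 0; the diagonal is ignored.
# Attribute names and responses are hypothetical.
attributes = ["price", "quality", "design"]
preference = [
    [0, 1, 1],  # price preferred over quality and over design
    [0, 0, 1],  # quality preferred over design
    [0, 0, 0],
]

# Consistency check: for each off-diagonal pair, exactly one cell is 1,
# so the two triangles are complementary.
for i in range(len(attributes)):
    for j in range(i + 1, len(attributes)):
        assert preference[i][j] + preference[j][i] == 1

# Row sums give each attribute's weightage; the largest sum wins.
row_totals = {a: sum(row) for a, row in zip(attributes, preference)}
ranking = sorted(row_totals, key=row_totals.get, reverse=True)
print(row_totals)  # {'price': 2, 'quality': 1, 'design': 0}
print(ranking)     # ['price', 'quality', 'design']
```

With more respondents, the same tally matrices would simply be summed before ranking.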

Read Questionnaire & pg 360 Hospital questionnaire | pros & cons of design

Sem-2: April 15th – 20th

Exploration > Objectives to be ascertained > Questionnaire to be made (should start with the purpose of the
survey) > Sampling plan (size, target and method of selecting the sample) > Method of collecting the data >
Analysis of the data (hypothesis testing)
April 2nd: Cosmopolitan survey on sexual behavior of US women

To cover: Nike case, US women survey

Q3. Non probabilistic convenient sampling

Q4.

Notes on simple regression – revise

Regression equation: Y = α + B1X + Ei

Ei = random error

B1 = regression slope

Multiple regression or multi-variate regression: assumes the variables are linear in nature and the
X's are independent.

Either convert non-linear into linear or apply using log linear

Y = Dependent, regressed, study variable

α = constant, fixed effect


B1, B2, B3 = partial regression co-efficients

X1, X2, X3 = based on exploratory research(if correct variables are surveyed, U would be less impactful)

U = disturbance, brings randomness in equation

Y = α + B1X1 + B2X2 + B3X3 + U (with variance σ2)

If X1 is changed by 1 unit, keeping the other X's constant, Y will change by B1 units = partial regression

Assumptions for multi-variate:

1. Linear relationship b/w X & Y


2. Y is normally distributed (in case of violation, Y should be measured on interval scale)
3. Multiple regression cannot be applied for ordinal or nominal case variables

Logistic regression: Used for categories and assessment of risks

Discriminant regression: discrimination b/w categories

Scatterplots are drawn assuming linearity, b/w Y & Y’(predicted value)

Steps for regression:


1. Test the assumptions
2. Fit the regression equation & interpret the coefficients
Y^ = α^ + B1^X1 + B2^X2 + B3^X3 (estimated best-fit line)
3. Interpret the efficiency of the forecast using R2
4. Test for the significance of the regression equation using ANOVA
5. Test for the significance of the independent variables using ‘t’ test
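Steps 2 and 3 above can be sketched with numpy in place of SPSS; the data below are made up and follow an exact linear relation, so R2 comes out as 1. The significance tests of steps 4 and 5 would additionally need F and t critical values, which are omitted here.

```python
import numpy as np

# Made-up data with an exact linear relation Y = 4 + 0.3*X1 + 0.7*X2
# (U = 0), so the fitted coefficients are recovered exactly.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = 4.0 + 0.3 * X1 + 0.7 * X2

# Step 2: fit the regression equation (design matrix with an
# intercept column) by ordinary least squares.
X = np.column_stack([np.ones_like(X1), X1, X2])
coef, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ coef

# Step 3: efficiency of the forecast, R2 = 1 - SSE/SST.
sse = float(np.sum((Y - Y_hat) ** 2))
sst = float(np.sum((Y - Y.mean()) ** 2))
r2 = 1.0 - sse / sst

print(np.round(coef, 3))  # coefficients [4, 0.3, 0.7] recovered
print(round(r2, 3))       # 1.0
```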

Case study - Sales as a function of other variables

1. Test for normality of "Sales" using the Kolmogorov-Smirnov test

H0: Sales follows a normal distribution

H1: Sales does not follow a normal distribution

Z = 0.244, p = 0.000, α = 0.05

Since p < α, reject H0: sales does not follow a normal distribution, but sales is measured on a ratio scale,
hence we proceed with regression

2. Fit the model for predicted values | Run Analyze>Regression>Linear


a. All are positively influencing other than “index”
b. If market potential is increased by 1 unit (1 lakh), sales will increase by 0.312 lakhs,
keeping other variables constant
c. If dealers are increased by 1, sales increase by 0.309 lakhs
d. If the number of salespersons is increased by 1, sales will increase by 0.696 lakhs
e. If competition increases by 1 unit, sales will decrease by 3.219 lakhs
3. For comparing relative importance between different influencers, we look at standardized β
coefficients
4. Efficiency R2 = 0.9737, the remainder is U | Adding more variables can only increase R2, never reduce it
a. Adjusted R2 corrects for the number of independent variables added
5. Construct scatter plot to check linearity
6. To test for the significance of R2/goodness of fit | The highest value of R2 decides which model is
the best fit
a. H0: The model insignificantly explains sales
b. H1: The model significantly explains sales
Since F = 265.339, p = 0 & α = 0.05, p < α, reject H0
7. Significance of the independent variables (all) | the t-stat indicates significance
a. H0: Market potential insignificantly influences sales (B1 = 0)
b. H1: Market potential significantly influences sales (B1 ≠ 0)
Since p < α, reject H0
8. To test the model | Best Linear Unbiased Estimator (BLUE) model | Gauss-Markov conditions
9. Assumptions for BLUE are:
a. The mean of residuals is 0
b. The residuals are normally distributed
c. Variance of residuals is constant(homogeneity of variance)
d. Successive residuals are uncorrelated/independent among themselves | Violation of
this assumption is called auto-correlation | All time series are auto-regressive
e. The independent variables X are independent among themselves(violation of this will
become multi collinearity)

Identification of multi-collinearity:

1. Variance Inflating Factor = 1/(1 - R2) | VIF is always ≥ 1
a. If VIF = 1, R2 = 0 => MC is absent
b. If VIF = 1-6 => MC is insignificant
c. If VIF = 6-10, MC is significant (researcher's decision zone)
d. If VIF > 10, severe MC => remedial action is required
2. Condition indices:
a. If CI = 1, R2 = 0 => MC is absent
b. If CI <= 15 => MC is insignificant
c. If CI = 15-25, MC is significant (researcher's decision zone)
d. If CI > 25, severe MC => remedial action is required
3. Generally VIF and CI will give similar results. In case of conflicts, CI is given priority.
4. For reducing multi-collinearity, we drop the variables one by one. The variable having the highest VIF is
dropped first, and so on.
5. Collinearity stats: Tolerance is the unexplained proportion of variance, the reciprocal of VIF | we look at the
dimensional CI and then start removing the variables with the highest VIF
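The VIF computation above can be sketched with numpy by regressing each independent variable on the others. The three made-up variables below include one (X3) deliberately built as a near-copy of X1, so the VIFs of X1 and X3 come out far above 10 while X2 stays near 1.

```python
import numpy as np

# Made-up independent variables; X3 is almost collinear with X1.
rng = np.random.default_rng(0)
X1 = rng.normal(size=50)
X2 = rng.normal(size=50)
X3 = X1 + 0.01 * rng.normal(size=50)

def vif(target, others):
    """VIF = 1/(1 - R2) from regressing `target` on the other X's."""
    A = np.column_stack([np.ones_like(target)] + others)
    coef, _, _, _ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1.0 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

vifs = {
    "X1": vif(X1, [X2, X3]),
    "X2": vif(X2, [X1, X3]),
    "X3": vif(X3, [X1, X2]),
}
for name, v in vifs.items():
    print(name, round(v, 1))  # X1 and X3 far above 10; X2 near 1
```

Following the dropping rule above, the variable with the highest VIF would be removed first and the VIFs recomputed.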

May 21st – EOD submission for project

Successive residuals should be independent among themselves – if this condition is violated, it becomes auto-
correlation/serial correlation.

The most common form of detection is the Durbin-Watson test: D ≈ 2(1 - ρ), where ρ is the correlation
coefficient between two successive residuals. Since ρ lies b/w -1 and 1, D lies b/w 0 and 4.

H0: AC is insignificant (ρ = 0)

H1: AC is significant (ρ ≠ 0)

 D = 0, ρ = 1, perfect positive auto-correlation
 D = 2, ρ = 0, auto-correlation absent
 D = 4, ρ = -1, perfect negative auto-correlation
 D = 1.5-2 & 2-2.85, auto-correlation insignificant (positive & negative respectively)
 D = 0-1.5, significant positive auto-correlation
 D = 2.85-4, significant negative auto-correlation

Remedial action: In case AC is present, we apply Generalized least squares/weighted least squares
method for estimation in place of OLS.
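The Durbin-Watson statistic itself is simple to compute; a sketch with two made-up residual series, one alternating in sign (negative auto-correlation, D well above 2) and one drifting smoothly (positive auto-correlation, D near 0):

```python
import numpy as np

# Durbin-Watson: D = sum((e_t - e_{t-1})^2) / sum(e_t^2), which is
# approximately 2 * (1 - rho) for the lag-1 correlation rho.

def durbin_watson(e):
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

alternating = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.5, -0.5]  # sign flips
smooth = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]           # slow drift

print(round(durbin_watson(alternating), 2))  # 3.35: negative AC
print(round(durbin_watson(smooth), 2))       # near 0: positive AC
```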

3) Variance of residuals = constant(homoscedasticity)

Homoscedasticity, or homogeneity of variances, is an assumption of equal or similar variances in


different groups being compared. This is an important assumption of parametric statistical tests
because they are sensitive to any dissimilarities. Uneven variances in samples result in biased and
skewed test results.

If the variance is not constant, there will be a high degree of loss in predictive ability/forecast accuracy

Heteroscedasticity is the biggest deterrent, as it causes high variance in the observations.

(i) BPG (Breusch-Pagan-Godfrey) test


(ii) Goldfeld-Quandt test
(iii) White test – the best, an assumption-free test based on the chi-square distribution

Parametric statistics are based on assumptions about the distribution of population from which the
sample was taken. Nonparametric statistics are not based on assumptions, that is, the data can be
collected from a sample that does not follow a specific distribution.

These tests are not on SPSS, but on E-Views and R | Instead of an inferential test, we can use a diagrammatic
test: plot a scatter b/w the standardized predicted values and the standardized residuals. If there is no evident
pattern, the data is homoscedastic; an evident pattern indicates heteroscedasticity.

Remedial actions: we apply Generalized least squares/weighted least squares method for estimation in
place of OLS.
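A minimal weighted-least-squares sketch for the remedial action above, assuming the later observations have higher residual variance and so receive smaller weights; the data and weights are made up, with an underlying relation of roughly Y = 2X.

```python
import numpy as np

# WLS via OLS: scale each row of the design matrix and Y by sqrt(w),
# then reuse ordinary least squares on the transformed data.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
w = np.array([1.0, 1.0, 1.0, 0.25, 0.25])  # later points are noisier

sw = np.sqrt(w)[:, None]
coef, _, _, _ = np.linalg.lstsq(X * sw, Y * np.sqrt(w), rcond=None)
print(np.round(coef, 2))  # intercept near 0, slope near 2
```

Generalized least squares extends the same idea to correlated residuals, which is why it also serves as the remedy for auto-correlation.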

2) Mean of residuals = 0 | Sum of deviations about mean is always 0

1) Residuals are normally distributed | If the residuals are not normally distributed, this indicates a mis-
specification error (e.g., non-linearity is present)

Analyze>Regression>Linear> Plots + Stats + Save

To check for normality diagrammatically, histogram or NPP can be selected

Purchase intention case study:


1. Ran regression and checked for normality and linearity – SPSS
R2 = 43.6% - the model explains only ~44% of the buying behaviour
Durbin = 1.9, very close to 2

Using ANOVA model significant, null hypo rejected – F and p-values

Looking at the std. beta coefficients – the highest is "like to travel", > 0.5

Unstd. coefficients are used to build predictive models only

This is a fit case of multi-collinear data: all variables are individually insignificant

The 2nd approach to eliminate multi-collinearity is factor analysis

It eliminates MC

It reduces the number of independent variables, making the model more logical

Factors are latent constructs, which are not directly observed but derived
Factor analysis uses:
1. It is used for data reduction

2. It helps to eliminate multi-collinearity

Pre-requisites are:

1. The variables should be on the metric scale – interval or ratio scale


2. The variables should have significant correlation b/w themselves
H0: Correlation is insignificant (R2 = 0)
H1: Correlation is significant (R2 > 0)
We use Bartlett's test of sphericity (χ2) – it is a chi-square test
3. The sample size should be sufficiently large/adequate – tested by the Kaiser-Meyer-Olkin
(KMO) value > 0.5. It is called the adequacy test

Assumptions 1 & 2 are necessary, while 3 is a sufficient condition (the sample should be at least 5 times the
number of observed variables)
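Bartlett's test statistic can be sketched from the determinant of the correlation matrix using the standard formula χ2 = -[(n - 1) - (2p + 5)/6] · ln|R| with df = p(p - 1)/2; the matrix and sample size below are made up, and the chi-square p-value lookup is omitted.

```python
import numpy as np
from math import log

# If R were an identity matrix (no correlation at all), det(R) = 1
# and chi2 = 0; the further det(R) falls below 1, the larger chi2.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
n, p = 200, R.shape[0]

chi2 = -((n - 1) - (2 * p + 5) / 6) * log(np.linalg.det(R))
df = p * (p - 1) // 2
print(round(chi2, 1), df)  # a large chi2 on 3 df: reject H0
```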

Differences b/w FA and regression:

1. In FA the lambdas are called factor loadings, vs the regression coefficients in regression


2. Regression has a fixed effect called alpha; FA does not
3. FA has no disturbance term U

The factor derives its name from its dominating variables. The dominating variables mostly have a high
correlation amongst themselves. FA is also called segmentation of variables.

Since the study was conducted on a 9-point Likert scale, the 1st assumption is fulfilled

H0: Correlation is insignificant

H1: Correlation is significant

χ2 = 7280.059, p = 0, α = 0.05 => Since p < α, the null is rejected

KMO = 0.750, which is > 0.5, therefore the 2nd condition is satisfied

The determinant of the correlation matrix R has to be looked at.

Factor Analysis:

1. Is an interdependence technique
2. Used to reduce the no. of observed variables
3. Scores/data for the latent variables/constructs are generated
4. Helps to remove multi-collinearity

F = λ1X1 + λ2X2 + … + λnXn

Theoretically, no of factors = no of variables

Conditions:

1. Variables should be measured on metric scale(interval or ratio)


2. Variables should be significantly correlated, R>0
3. Sample should be adequate

Exploratory (Purchase intention) vs confirmatory (Dell): basically knowing whether the factors and groupings
are known or not

Steps:

1. Test assumptions
2. Identify no. of factors to be extracted
a. Exploratory: the no. of factors to be extracted is not known
i. Eigenvalue > 1 approach
ii. Scree plot approach: a diagrammatic approach plotting the eigenvalues vis-a-vis
the factors on the x-axis. We try to find the elbow in the graph, and the
corresponding number of factors is extracted. This is an indicative, subjective approach. It
generally gives 1 factor more than the eigenvalue approach.
b. Confirmatory: no. of factors to be extracted are known
3. Method of extraction: if the factor axes intersect at right angles, there is no correlation b/w the factors
a. PCA (Principal Component Analysis) – factors are extracted at 90 degrees (orthogonal extraction)
b. Principal Axis Factoring – factors are extracted at any angle other than 90 degrees; MC will
still remain
4. Redistribution of variability or rotation:
a. Varimax – variance maximizing within each factor, to have optimal variance across all
factors. Orthogonal, at 90 degrees
b. Quartimax – at 45 degrees
c. Equamax – at 60 degrees
d. Promax –
e. (There should be more than 15 iterations to allow for variability) | The purpose is to
stabilize and converge.
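The eigenvalue > 1 approach can be sketched with numpy; the made-up correlation matrix below contains two blocks of correlated variables, so two eigenvalues exceed 1 and two factors are extracted. It also illustrates that the eigenvalues always sum to the number of variables.

```python
import numpy as np

# Made-up correlation matrix: variables 1-2 form one correlated block
# and variables 3-4 another, with weak cross-block correlation.
R = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]  # sorted, largest first
n_factors = int(np.sum(eigenvalues > 1))

print(np.round(eigenvalues, 2))            # two values above 1, two below
print(n_factors)                           # 2
print(round(float(eigenvalues.sum()), 6))  # 4.0 = number of variables
```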

Analyze > Dimension reduction > Factor

Descriptives > KMO & coefficients | Extraction > PCA with scree plot & the exploratory or confirmatory choice

H0: Correlation is insignificant

H1: Correlation is significant

χ2 = 7280.059, p = 0, α = 0.05 => Since p < α, the null is rejected. Hence the 2nd assumption is fulfilled

Since the study was conducted on a 9-point Likert scale, the 1st assumption is fulfilled

KMO = 0.750, which is > 0.5, therefore the 3rd condition is satisfied

The determinant of the correlation matrix R has to be looked at.

P.S: Sum of eigen values will be equal to no of variables

The higher the communality, the more important the variable.


The higher the factor loading, the more important the variable is to the given factor

P.S: If the sample correlation, r = 0.43, the variable will be insignificant

Factor loadings will lie b/w -1 to +1

5. Name the factors on the basis of dominating variables


6. Calculate the factor scores(all individual scores of each factors)

Purchase intention:

a. 1st factor having 6 variables(monetary) – F1 – Financial behaviour


b. 2nd factor having 4 variables(style) – F2 – Style behaviour
c. 3rd factor having 3 variables(Future/optimism) – F3 – Future prediction
d. 4th factor having 3 variables(Confidence) – F4 – Confidence
e. 5th factor having 3 variables(travel) – F5 – Travel
f. 6th factor having 2 variables(home) – F6 – Family oriented

>Scores>Save as variables + regression

Then build regression model using factors extracted

Collinearity diagnostics, residuals and predicted values, standardized and unstandardized coefficients

Durbin is insignificant

Auto-correlation is positive

Heteroscedasticity is present

All are positively influencing the intention to buy
