1. Problem Statement
a. Exploration > Objective / Research questions
b. Methods of Validation / Basic Research
c. Questionnaire formulation
d. Sampling plan
e. Methods of data collection
The above are all steps in desk research/planning.
2. Field visit
3. Editing and cleaning of data
4. Data analysis & interpretation > leads to solutions for step 1(a)
Research Design: Blueprint for the planning phase of research. It is always made in such a fashion that it
answers the 6 W's (What, Why, How, Who, When, Where).
U/disturbances/errors are called extraneous variables – minimized more in causal designs than in descriptive ones.
January 14th
Focus groups (FGs) should be homogeneous.
January 23rd:
Projective techniques – word association (top-of-the-mind recall), sentence completion, thematic
apperception, role play
Dupont Discussion:
Scale and scaling design – Causal design and DuPont case discussion
February 4th:
Extraneous errors/variables – those the researcher is not interested in studying in themselves; they are
handled only while studying the dependent and uncontrolled variables, to minimize the influence of all
other disturbances/errors.
Static design – experimental group vs control group. The division is not based on any scientific principle but
on the arbitrary decision of the researcher.
Read: scales and scaling techniques > psychographic variables > comparative (ordinal) and non-
comparative (interval)
Paired comparison: if the row item is preferred over the column item, we assign 1, else 0. Diagonal
elements are ignored (no comparison between an item and itself), and the two triangles must be
consistent with each other. The rows are then summed to find the parameter with maximum weightage.
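A minimal sketch of this tally in Python, using made-up preference data for three hypothetical brands (not from the notes):

```python
# Paired-comparison tally: prefs[i][j] = 1 if the row brand i is preferred
# over the column brand j, else 0. Diagonal is ignored (no self-comparison).
brands = ["A", "B", "C"]
prefs = [
    [None, 1, 1],    # A preferred over B and over C
    [0, None, 1],    # B loses to A, beats C
    [0, 0, None],    # C loses to both
]

# Consistency check on the two triangles: if i beats j, j must not beat i.
for i in range(3):
    for j in range(i + 1, 3):
        assert prefs[i][j] + prefs[j][i] == 1

# Row sums give each brand's total "wins" -> preference weightage.
scores = {b: sum(v for v in row if v is not None)
          for b, row in zip(brands, prefs)}
ranking = sorted(scores, key=scores.get, reverse=True)
```

The brand with the highest row sum carries the maximum weightage for the parameter being compared.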
Exploration > objectives to be ascertained > questionnaire to be made (should start with the purpose of the
survey) > sampling plan (size, target, and method of selecting the sample) > method of collecting the data >
analysis of the tool (hypothesis testing)
April 2nd: Cosmopolitan survey on sexual behavior of US women
Q4.
Regression equation: Yi = α + βXi + Ei
Ei = random error
Multiple regression or multi-variate regression: assumes the variables are linear in nature and the
X's are independent.
X1, X2, X3 = based on exploratory research (if the correct variables are surveyed, U will be less impactful)
Y = α + B1X1 + B2X2 + B3X3 + U, with error variance σ²
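The model above can be fitted by ordinary least squares; a minimal sketch on simulated data (every number below is hypothetical, not from the case):

```python
import numpy as np

# Sketch of the model Y = alpha + b1*X1 + b2*X2 + b3*X3 + U, fitted by OLS.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                  # X1, X2, X3, independent by design
true_beta = np.array([2.0, 1.5, -0.5, 3.0])  # alpha, b1, b2, b3 (made up)
U = rng.normal(scale=0.1, size=n)            # random disturbance
y = true_beta[0] + X @ true_beta[1:] + U

# Design matrix with an intercept column; solve min ||y - A b||^2.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - A @ beta_hat                 # should average to ~0 with an intercept
```

With a small disturbance, the estimated coefficients land close to the true ones, which is the point of choosing the right X's in exploratory research.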
H1: Sales does not follow a normal distribution; but sales is measured on a ratio scale, hence using
Identification of multi-collinearity:
1. Variance Inflating Factor (VIF) = 1/(1 - R2) | VIF always lies at or above 1
a. If VIF = 1, R2 = 0 => MC is absent
b. If 1 < VIF <= 6 => MC is insignificant
c. If 6 < VIF <= 10 => MC is significant (researcher's decision zone)
d. If VIF > 10 => severe MC => remedial action is required
2. Condition Index (CI):
a. If CI = 1, R2 = 0 => MC is absent
b. If CI <= 15 => MC is insignificant
c. If 15 < CI <= 25 => MC is significant (researcher's decision zone)
d. If CI > 25 => severe MC => remedial action is required
3. Generally VIF and CI will give similar results. In case of conflicts, CI is given priority.
4. For reducing multicollinearity, we drop the variables one by one. The variable having the highest VIF is
dropped first, and then the others in turn.
5. Collinearity stats: Tolerance is the percentage of unexplained variance (1 - R2) and the reciprocal of
VIF | we look at the per-dimension CI and then start removing the variables with the highest VIF.
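The VIF rule in point 1 can be illustrated directly. A sketch on simulated data in which one regressor nearly duplicates another (all data made up):

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress column k on the remaining columns
    (plus an intercept) and return 1 / (1 - R^2_k)."""
    n, p = X.shape
    out = []
    for k in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, k], rcond=None)
        resid = X[:, k] - others @ coef
        r2 = 1 - resid.var() / X[:, k].var()
        out.append(1 / (1 - r2))
    return np.array(out)

# x3 is a near-duplicate of x1, so both should show a large VIF (> 10),
# while x2 stays near the no-multicollinearity value of 1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + rng.normal(scale=0.05, size=500)
vifs = vif(np.column_stack([x1, x2, x3]))
```

Per point 4, the variable with the highest VIF would be dropped first and the VIFs recomputed.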
Successive residuals should be independent among themselves – if this condition is violated, we have
autocorrelation/serial correlation.
The most common means of detection is the Durbin-Watson test (based on the correlation coefficient
between two successive residual terms).
H0: AC is insignificant(R2=0)
H1: AC is significant(R2>0)
Remedial action: In case AC is present, we apply Generalized least squares/weighted least squares
method for estimation in place of OLS.
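The Durbin-Watson statistic itself is simple to compute: d = Σ(e_t − e_{t−1})² / Σe_t², where d ≈ 2 means no autocorrelation, d → 0 positive AC, and d → 4 negative AC. A sketch with made-up residual series:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
    d ~ 2: no autocorrelation; d -> 0: positive AC; d -> 4: negative AC."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series (not from the notes):
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # sign flips -> negative AC, d near 4
persistent  = [1, 1, 1, 1, -1, -1, -1, -1]   # long runs  -> positive AC, d near 0
```

If d falls far from 2 in either direction, GLS/WLS estimation replaces OLS as noted above.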
If the error variance is not constant, there will be a high degree of loss in predictive ability/forecast accuracy.
Parametric statistics are based on assumptions about the distribution of population from which the
sample was taken. Nonparametric statistics are not based on assumptions, that is, the data can be
collected from a sample that does not follow a specific distribution.
Heteroscedasticity tests are not available in SPSS, but are in EViews and R | Instead of an inferential test,
we can use a diagrammatic test: plot a scatter between the standardized predicted values and the
standardized residuals. If there is no evident pattern, the data is homoscedastic; if there is a pattern, it is
heteroscedastic.
Remedial actions: we apply Generalized least squares/weighted least squares method for estimation in
place of OLS.
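The diagrammatic test can be approximated numerically: order the residuals by the predicted value and compare the spread of the two halves. This is a Goldfeld-Quandt-style sketch on simulated data, not a formal inferential test:

```python
import numpy as np

# Simulated regression where the error spread grows with x, i.e. the data
# are heteroscedastic by construction (all numbers hypothetical).
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(1, 10, 400))
y = 3 + 2 * x + rng.normal(scale=x)      # error std. dev. proportional to x

A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

# Compare residual variance in the low-x half vs the high-x half:
# a ratio far above 1 is the "evident pattern" of heteroscedasticity.
half = len(x) // 2
ratio = resid[half:].var() / resid[:half].var()
```

A ratio near 1 would instead indicate homoscedastic data, in which case OLS can be kept.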
1) Residuals are normally distributed | If residuals are not normally distributed, this indicates a mis-
specification error and calls the assumed linearity into question.
It eliminates MC
Factors are latent constructs which are not directly observed but derived.
Factor analysis assumptions:
1. It is used for data reduction
Pre-requisites are:
Assumptions 1 & 2 are necessary, while 3 is a sufficient condition (observations should be at least 5 times
the number of variables).
A factor derives its name from its dominating variable. Variables grouped under the same factor mostly
have a high correlation amongst themselves. Factor analysis is also called segmentation of variables.
Since the study was conducted on a 9-point Likert scale, the 1st assumption is fulfilled.
Factor Analysis:
1. Is an interdependence technique
2. Used to reduce the no. of variables(observed)
3. Scores/data of latent variables/constructs is generated
4. Helps to remove multicollinearity
Conditions:
Steps:
1. Test assumptions
2. Identify no. of factors to be extracted
a. Exploratory: no. of factors to be extracted is not known
i. Eigen value > 1 approach
ii. Scree plot approach: a diagrammatic approach plotting the eigenvalues vis-à-vis
the factors on the x-axis. We find the elbow in the graph and extract the
corresponding number of factors. This is an indicative, subjective approach; it
generally gives one factor more than the eigenvalue approach.
b. Confirmatory: no. of factors to be extracted are known
3. Method of extraction: if the factor axes intersect at right angles, there is no correlation between
them; variables/factors are extracted at 90 degrees (orthogonal extraction)
a. PCA (Principal Component Analysis) – factors are extracted at 90 degrees, orthogonally
b. Principal Axis Factoring – factors are extracted at any other angle except 90 degrees; MC will
still remain.
4. Redistribution of variability or rotation:
a. Varimax – variance maximizing in each of the factor to have optimal variance across all
factors. Orthogonal at 90 degrees
b. Quartimax – at 45 degrees
c. Equamax – at 60 degrees
d. Promax – oblique rotation (factors are allowed to correlate)
e. (There should be at least 15 iterations to allow for variability) | The purpose is to
stabilize and converge.
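The eigenvalue-greater-than-1 rule from step 2 can be sketched on hypothetical data built from two underlying constructs (all data simulated, not from the study):

```python
import numpy as np

# Eigenvalue > 1 rule: each retained factor should explain more variance
# than a single standardized variable. Six observed variables are generated
# from two latent constructs, so two factors should be extracted.
rng = np.random.default_rng(3)
n = 300
f1, f2 = rng.normal(size=(2, n))         # two hypothetical latent factors
noise = lambda: 0.3 * rng.normal(size=n)
X = np.column_stack([
    f1 + noise(), f1 + noise(), f1 + noise(),   # variables loading on factor 1
    f2 + noise(), f2 + noise(), f2 + noise(),   # variables loading on factor 2
])

corr = np.corrcoef(X, rowvar=False)              # 6 x 6 correlation matrix
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_factors = int(np.sum(eigenvalues > 1))         # eigenvalue > 1 criterion
# A scree plot would chart `eigenvalues` against factor number on the
# x-axis and look for the elbow.
```

Here two eigenvalues clearly exceed 1 and the rest fall well below it, so both the eigenvalue rule and the scree elbow point to two factors.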
Descriptives>KMO & Coefficients | Extraction>PCA with Scree plot & exploratory or confirmatory choice
χ2 = 7280.059, p = 0.000, α = 0.05. Since p < α, the null is rejected; hence the 2nd assumption is fulfilled.
Since the study was conducted on a 9-point Likert scale, the 1st assumption is fulfilled.
Purchase intention:
Durbin is insignificant
Heteroscedasticity is present