UNIT-IV
Data Preparation – editing – Coding – Data entry – Validity of data – Qualitative Vs
Quantitative data analyses – Bivariate and Multivariate statistical techniques – Factor
analysis – Discriminant analysis – cluster analysis – multiple regression and correlation
– multidimensional scaling – Conjoint Analysis – Application of statistical software for
data analysis.
1. Data Preparation – editing – Coding – Data entry – Validity of data
Data analysis: After data are collected from primary or secondary sources, they are arranged
so that they can be analyzed and interpreted with the help of statistical tools such as
correlation, regression, ANOVA and Structural Equation Modeling (SEM).
Data Processing: Data processing refers to the process of converting data from one format to
another. It transforms raw data into valuable information.
Steps of data processing:
1. Validity of data
2. Data editing and coding
3. Data classification
4. Data entry
5. Data tabulation.
Data editing: Editing ensures that the data obtained is complete in all respects, that it is
accurate in terms of information recorded and responses sought, that the response format is
in the form that was instructed, and that the data is structured in a manner that entering
the information will not be a problem. It involves checking the contents for completeness
and checking the responses for internal consistency.
Editing is useful to the researcher in the cases listed below, where the interviewer may:
❖ Forget to ask questions.
❖ Forget to record a response.
❖ Wrongly classify a response.
❖ Write only half a response.
❖ Write illegibly.
Various ways of data editing
1) By inference: Certain questions in a research instrument may be related to one another,
and based on this relationship the researcher can edit a missing or doubtful response.
This method can itself introduce errors that surface during data analysis, so careful
editing is important.
Coding: Q1) 1) Male 2) Female
2) Educational qualification: a) UG b) PG c) Others
Coding: Q2) 1) UG 2) PG 3) Others
Classification of data
Data classification: Classification of the data implies that the collected raw data is
categorized into common groups having common features. Data having common characteristics
are placed in a common group. The entire data collected is categorized into various groups
or classes, which convey a meaning to the researcher.
Classification is done in two ways:
1. Classification according to attribute: gender, marital status, etc.
2. Classification according to class intervals: production, weight, height, etc.
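The editing, coding and classification steps described above can be sketched in a few lines of Python. This is only an illustration: the codebook values follow the Q1/Q2 coding examples, while the field names, the helper functions and the respondent records are hypothetical and made up for the example.

```python
# Hypothetical codebook mirroring the Q1/Q2 coding examples.
GENDER_CODES = {"male": 1, "female": 2}
EDUCATION_CODES = {"ug": 1, "pg": 2, "others": 3}


def is_complete(record, required_fields):
    """Editing check: every required field has a non-empty response."""
    return all(record.get(field) not in (None, "") for field in required_fields)


def code_response(record):
    """Coding: convert text answers into their numeric codes."""
    return (
        GENDER_CODES[record["gender"].strip().lower()],
        EDUCATION_CODES[record["education"].strip().lower()],
    )


def classify_by_attribute(records, key):
    """Classification by attribute: group records sharing the same value."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record)
    return groups


def classify_by_interval(values, width):
    """Classification by class interval: count values in [lower, lower + width)."""
    counts = {}
    for value in values:
        lower = int(value // width) * width
        interval = (lower, lower + width)
        counts[interval] = counts.get(interval, 0) + 1
    return dict(sorted(counts.items()))


# Made-up responses, for illustration only.
respondents = [
    {"gender": "Male", "education": "UG", "weight": 62},
    {"gender": "Female", "education": "PG", "weight": 55},
    {"gender": "Female", "education": "Others", "weight": 71},
    {"gender": "Male", "education": "", "weight": 68},  # incomplete response
]

required = ["gender", "education", "weight"]
usable = [r for r in respondents if is_complete(r, required)]
coded = [code_response(r) for r in usable]
by_gender = classify_by_attribute(usable, "gender")
weight_classes = classify_by_interval([r["weight"] for r in usable], width=10)

print(coded)
print({gender: len(rows) for gender, rows in by_gender.items()})
print(weight_classes)
```

Dropping incomplete records is just one possible editing policy; in practice the researcher may instead go back to the respondent or edit by inference.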
2. Data analysis
Data analysis is a process of gathering, modeling and transforming data with the goal
of highlighting useful information, suggesting conclusions and supporting decision-making.
Analysis means the computation of certain indices or measures, along with searching for
patterns of relationships that exist among the data groups.
“Data analysis helps the researcher to identify the relationship among variables.”
3. Factor analysis
“Factor analysis is a part of the General Linear Model (GLM); it is a technique that is used
to reduce a large number of variables into a smaller number of factors.”
For example, to measure the IQ of students, a researcher used 9 questions/items.
After factor analysis, the researcher retains 7 items/questions to measure IQ:
factor analysis removed Question 5 and Question 9.
Assumptions of factor analysis
There is a linear relationship between the variables.
There is no perfect multicollinearity.
Relevant variables are included in the analysis.
There is true correlation between variables and factors.
Several extraction methods are available, but principal component analysis is used
most commonly.
No outliers: It is assumed that there are no outliers in the data.
Adequate sample size: The number of cases must be greater than the number of factors.
No perfect multicollinearity: Factor analysis is an interdependency technique. There
should not be perfect multicollinearity between the variables.
Homoscedasticity: Since factor analysis is a linear function of measured variables, it
does not require homoscedasticity between the variables.
Linearity: Factor analysis is also based on the linearity assumption. Non-linear variables
can also be used; after transformation, however, they become linear variables.
Interval data: Interval data are assumed.
Types of factoring:
There are different methods used to extract the factors from the data set:
1. Principal component analysis: This is the most common method used by researchers.
PCA starts by extracting the maximum variance and puts it into the first factor. After
that, it removes the variance explained by the first factor and then extracts the
maximum remaining variance for the second factor. This process continues to the last factor.
2. Common factor analysis: The second most preferred method among researchers. It
extracts the common variance and puts it into factors. This method does not include
the unique variance of the variables. This method is used in SEM.
3. Image factoring: This method is based on the correlation matrix. The OLS regression
method is used to predict the factor in image factoring.
4. Maximum likelihood method: This method also works on the correlation matrix, but it
uses the maximum likelihood method to factor.
5. Other methods of factor analysis: Alpha factoring and least-squares methods are also
available. Weighted least squares is another regression-based method which is used
for factoring.
Rotation method in Factor analysis: (1) No rotation method, (2) Varimax rotation method, (3)
Quartimax rotation method, (4) Direct oblimin rotation method, and (5) Promax rotation method
STEPS TO DO FACTOR ANALYSIS USING SPSS
Analyze -> Data reduction -> Factor -> Move variables -> Click Extraction -> Select principal
components -> Eigenvalue: type 1 -> Maximum iterations for convergence: type items -> Click
varimax
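Principal component extraction, as described under the types of factoring, can be sketched with NumPy as an eigen-decomposition of the correlation matrix. This is a minimal illustration, not what SPSS does internally: the data are synthetic, the two latent factors and the noise level are assumptions, rotation is not performed, and the "eigenvalue greater than 1" cutoff simply mirrors the value typed in the extraction step.

```python
import numpy as np


def principal_components(data):
    """Return eigenvalues (largest first) and loadings of the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)             # variables are in columns
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Loadings: eigenvectors scaled by the square root of their eigenvalue.
    loadings = eigenvectors * np.sqrt(np.clip(eigenvalues, 0, None))
    return eigenvalues, loadings


rng = np.random.default_rng(0)
# Synthetic data: 6 observed items driven by 2 hypothetical latent factors.
latent = rng.normal(size=(200, 2))
weights = np.array([[1, 1, 1, 0, 0, 0],
                    [0, 0, 0, 1, 1, 1]], dtype=float)
items = latent @ weights + 0.3 * rng.normal(size=(200, 6))

eigenvalues, loadings = principal_components(items)
retained = int(np.sum(eigenvalues > 1.0))        # components with eigenvalue > 1
explained = eigenvalues / eigenvalues.sum()      # share of total variance

print("eigenvalues:", np.round(eigenvalues, 3))
print("components retained:", retained)
print("variance explained:", np.round(explained[:retained], 3))
```

Each successive component captures the largest share of the variance left over by the previous ones, which is why the eigenvalues come out in decreasing order.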
4. Discriminant analysis
Discriminant analysis: Discriminant analysis is a statistical method that is used by
researchers to help them understand the relationship between a "dependent
variable" and one or more "independent variables".
A dependent variable is the variable that a researcher is trying to explain or predict
from the values of the independent variables.
Discriminant analysis is most often used to help a researcher predict the group or category
to which a subject belongs. For example, when individuals are interviewed for a job,
managers will not know for sure how job candidates will perform on the job if hired.
Suppose, however, that a human resource manager has a list of current employees who
have been classified into two groups: "high performers" and "low performers."
These individuals have been working for the company for some time, have been
evaluated by their supervisors, and are known to fall into one of these two mutually
exclusive categories.
The manager also has information on the employees' backgrounds: educational
attainment, prior work experience, participation in training programs, work attitude
measures, personality characteristics, and so forth. This information was known at the
time these employees were hired.
The manager wants to be able to predict, with some confidence, which future job
candidates are high performers and which are not. A researcher or consultant can use
discriminant analysis, along with existing data, to help in this task.
1) Analyze
2) Classify
3) Discriminant
6) Click OK
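The high performer / low performer example can be sketched with Fisher's linear discriminant for two groups. This is one common form of discriminant analysis, chosen here for illustration and not necessarily the exact procedure a statistical package runs. The employee data are synthetic, the two background variables (years of experience, training hours) are assumptions, and the cutoff assumes both groups are equally likely.

```python
import numpy as np


def fit_discriminant(high, low):
    """Fisher's linear discriminant for two groups (one row per employee)."""
    mean_high, mean_low = high.mean(axis=0), low.mean(axis=0)
    # Pooled within-group covariance matrix.
    pooled = (
        (len(high) - 1) * np.cov(high, rowvar=False)
        + (len(low) - 1) * np.cov(low, rowvar=False)
    ) / (len(high) + len(low) - 2)
    weights = np.linalg.solve(pooled, mean_high - mean_low)
    # Midpoint between the two group means on the discriminant axis
    # (assumes equal prior probabilities for the two groups).
    cutoff = weights @ (mean_high + mean_low) / 2
    return weights, cutoff


def predict_group(candidates, weights, cutoff):
    """Label each candidate by the side of the cutoff its score falls on."""
    scores = np.asarray(candidates, dtype=float) @ weights
    return np.where(scores > cutoff, "high", "low")


rng = np.random.default_rng(42)
# Synthetic current employees: [years of experience, training hours].
high_performers = rng.normal(loc=[6.0, 40.0], scale=[1.0, 5.0], size=(30, 2))
low_performers = rng.normal(loc=[2.0, 15.0], scale=[1.0, 5.0], size=(30, 2))

weights, cutoff = fit_discriminant(high_performers, low_performers)

# Two hypothetical job candidates.
labels = predict_group([[7.0, 45.0], [1.0, 10.0]], weights, cutoff)
print(labels)
```

The function is fitted on employees whose group is already known and then applied to candidates whose group is not, which matches the manager's prediction task.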
5. Cluster analysis
iii) Partial Correlation: In partial correlation, more than two factors are involved,
but correlation is studied between only two of them while the other factors are
assumed to be constant.
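For three variables, the partial correlation between x and y with z held constant can be computed from the three pairwise correlations using the standard first-order formula. The sketch below assumes the pairwise coefficients are already known; the numeric values are made up for illustration.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical pairwise correlations.
r = partial_correlation(r_xy=0.8, r_xz=0.6, r_yz=0.5)
print(round(r, 4))
```

When z is uncorrelated with both x and y, the partial correlation reduces to the ordinary correlation between x and y.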
Methods of Studying Linear Correlation
1) Scatter Diagram: Scatter diagram is a special type of dot chart.
2) Karl Pearson's Coefficient of Correlation: Karl Pearson, a great biometrician
and statistician, suggested a mathematical method for measuring the magnitude
of linear relationship between two variables. Karl Pearson's method is the most
widely used method in practice and is known as Pearson Coefficient of
Correlation. It is denoted by the symbol 'r';
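$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} $$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are the deviations of each series from its mean.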
3) Spearman's Rank Correlation: When the variables under consideration are not
capable of quantitative measurement but can be arranged in serial order (ranks),
we find correlation between the ranks of two series. This happens when we deal
with qualitative characteristics such as honesty, beauty, etc. This method is called Spearman's rank correlation.
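Both coefficients can be computed directly from their definitions. In the sketch below, Pearson's r uses deviations from the mean, and Spearman's coefficient is obtained by applying the same formula to the ranks, with tied values sharing their average rank (a common convention, assumed here). The two data series are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sum(xy) / sqrt(sum(x^2) * sum(y^2)) on deviations from the mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 4), round(rho, 4))
```

Because Spearman's coefficient depends only on the ordering, any strictly increasing relationship (for example y = x squared on positive x) gives a rank correlation of 1 even when the relationship is not linear.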
Types of regression
1) Simple linear regression: One dependent variable and one independent variable.
Example: effect of salesperson salary on sales.
2) Multiple linear regression: One dependent variable and multiple independent variables.
Example: effect of salesperson salary and showroom rent on sales.
3) Structural Equation Modeling: Multiple independent variables and multiple dependent
variables.
Linear regression: models the relationship between the criterion (the scalar response)
and one or more predictors (explanatory variables).
Logistic regression: is used when the dependent variable is dichotomous.
Polynomial regression: is used for curvilinear data. Polynomial regression is fit with
the method of least squares.
Stepwise regression: is used for fitting regression models in which the choice of
predictive variables is carried out automatically. With each step, a variable is added
to or subtracted from the set of explanatory variables.
Ridge regression: is a technique for analyzing multiple regression data that suffer from
multicollinearity. It regularizes (shrinks) the coefficients but, unlike lasso, does not
perform variable selection.
Lasso regression: is a regression analysis method that performs both variable selection
and regularization.
ElasticNet regression: is a regularized regression method that linearly combines the
penalties of the lasso and ridge methods.
STEPS IN LINEAR REGRESSION
1. State the hypothesis.
2. State the null hypothesis.
3. Gather the data.
4. Compute the regression equation.
5. Examine tests of statistical significance and measures of association.
6. Relate statistical findings to the hypothesis. Accept or reject the hypothesis.
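Steps 3 to 5 can be sketched with NumPy for the sales example: gather the data, compute the regression equation by least squares, and examine R-squared as a measure of association. The data are synthetic, and the "true" coefficients used to generate them are assumptions. A small ridge fit is included only to show the shrinking effect of regularization; significance tests are not covered.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ...; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(X)), X])   # add the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1 - ss_res / ss_tot


def fit_ridge(X, y, alpha):
    """Ridge fit on centred data; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    slopes = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ slopes
    return np.concatenate([[intercept], slopes])


rng = np.random.default_rng(1)
# Synthetic data: sales explained by salesperson salary and showroom rent.
salary = rng.uniform(20, 60, size=50)
rent = rng.uniform(10, 40, size=50)
X = np.column_stack([salary, rent])
sales = 5 + 2 * salary + 0.5 * rent + rng.normal(scale=1.0, size=50)

coef, r_squared = fit_ols(X, sales)
ridge_coef = fit_ridge(X, sales, alpha=100.0)

print("OLS   b0, b_salary, b_rent:", np.round(coef, 3))
print("R^2:", round(float(r_squared), 4))
print("Ridge b0, b_salary, b_rent:", np.round(ridge_coef, 3))
```

With alpha set to 0 the ridge fit coincides with ordinary least squares; larger values pull the slope coefficients toward zero.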
8. Multi-dimensional scaling
Multidimensional scaling is a visual representation of distances or dissimilarities between
sets of objects.
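One way to obtain such a representation is classical (Torgerson) multidimensional scaling, which turns a matrix of pairwise distances into coordinates via an eigen-decomposition. The sketch below uses NumPy and a made-up set of four objects; other MDS variants (for example non-metric MDS) exist and are not shown.

```python
import numpy as np


def classical_mds(distances, dims=2):
    """Classical MDS: coordinates whose pairwise distances approximate the input."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to get an inner-product matrix.
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    top = np.argsort(eigenvalues)[::-1][:dims]       # largest eigenvalues first
    scale = np.sqrt(np.clip(eigenvalues[top], 0, None))
    return eigenvectors[:, top] * scale


# Hypothetical objects, used only to build a distance matrix.
points = np.array([[0, 0], [3, 0], [3, 4], [0, 4]], dtype=float)
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(distances, dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

print(np.round(coords, 3))
print(np.allclose(recovered, distances))
```

The recovered coordinates may be rotated or reflected relative to the original points; only the distances between objects are preserved, which is what the map is meant to show.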
9. Conjoint analysis
Conjoint analysis is one of the most popular tools used for market research
purposes. It is an advanced exploratory technique used to determine how people
make decisions and which factors they place real value on in various products
and services. It has been widely employed for product/service analysis
purposes since the 1970s.
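A common way to quantify the value placed on each factor is to estimate part-worth utilities by regressing profile ratings on dummy-coded attribute levels. The sketch below is only an illustration under stated assumptions: the two attributes, their levels and the single respondent's ratings are all made up, and the first level of each attribute is used as the reference level.

```python
from itertools import product

import numpy as np

# Hypothetical attributes and levels; the first level is the reference.
attributes = {"brand": ["A", "B"], "price": ["low", "mid", "high"]}
profiles = list(product(*attributes.values()))        # full factorial: 6 profiles
# Made-up ratings from one respondent, in the same order as `profiles`.
ratings = np.array([9.0, 7.0, 4.0, 7.0, 5.0, 2.0])


def dummy_code(profiles, attributes):
    """Build a design matrix with an intercept and one column per non-reference level."""
    columns, names = [np.ones(len(profiles))], ["intercept"]
    for index, (attr, levels) in enumerate(attributes.items()):
        for level in levels[1:]:
            columns.append(np.array([float(p[index] == level) for p in profiles]))
            names.append(f"{attr}={level}")
    return np.column_stack(columns), names


design, names = dummy_code(profiles, attributes)
coef, *_ = np.linalg.lstsq(design, ratings, rcond=None)
part_worths = dict(zip(names, coef))

# Relative importance: range of part-worths within each attribute.
ranges = {}
for attr, levels in attributes.items():
    worths = [0.0] + [part_worths[f"{attr}={level}"] for level in levels[1:]]
    ranges[attr] = max(worths) - min(worths)
total = sum(ranges.values())
importance = {attr: value / total for attr, value in ranges.items()}

print({name: round(float(value), 3) for name, value in part_worths.items()})
print({attr: round(float(value), 3) for attr, value in importance.items()})
```

An attribute whose levels produce a wide spread of part-worths matters more to the respondent's decision than one whose levels barely change the rating.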
1) Interpretation
Qualitative: It relies on interpretation and logic. Qualitative researchers present
their analyses using text and arguments.
Quantitative: This analysis relies on statistics. Quantitative researchers use graphs
and tables to present their analysis.
2) Procedures and Rules
Qualitative: Qualitative analysis has no set rules; rather, guidelines are there to
support the analysis.
Quantitative: Quantitative analysis follows agreed-upon standardised procedures and rules.
5) Reliability
Qualitative: Qualitative research is valid but less reliable (or consistent). It has
a corresponding weakness in its ability to compare variables in different conditions.
Quantitative: Reliability is easy to establish, and quantitative studies generally
involve sophisticated comparisons of variables in different conditions.
8) Suitability
Qualitative: More suitable when time and resources are limited.
Quantitative: Relies on more extensive interviewing.