
UNIT-IV
Data preparation – Editing – Coding – Data entry – Validity of data – Qualitative vs.
quantitative data analysis – Bivariate and multivariate statistical techniques – Factor
analysis – Discriminant analysis – Cluster analysis – Multiple regression and correlation
– Multidimensional scaling – Conjoint analysis – Application of statistical software for
data analysis.
1. Data preparation – Editing – Coding – Data entry – Validity of data
Data analysis: After data are collected from primary or secondary sources, they are arranged
so that they may be analyzed and interpreted with the help of statistical tools such as
correlation, regression, ANOVA and structural equation modeling (SEM).

Data processing: Data processing refers to converting data from one format to another; it
transforms raw data into valuable information.
Steps of data processing:
1. Validity of data
2. Data editing and coding
3. Data classification
4. Data entry
5. Data tabulation.

Figure: Steps of data processing

Figure: Data Processing in Quantitative research analysis



Software packages used for data analysis:
1) MS Excel
2) SPSS (Statistical Package for the Social Sciences)
3) R
4) LISREL (Linear Structural Relations)
5) SmartPLS
6) AMOS (Analysis of Moment Structures)
7) SAS

Data editing

Data editing: The first step in processing the data is editing the completed
schedules/questionnaires. Editing is the process of checking the data to detect and correct
errors and omissions. Editing is done at two stages: first at the fieldwork stage and second
at the office.

Types of data editing

1) Field editing: Under the stress of interviewing, the interviewer cannot always record
responses completely and legibly. Therefore, after each interview is over, he should review
the schedule to complete abbreviated responses, rewrite illegible responses and correct
omissions.

2) Office or central editing: After data collection, all completed schedules/questionnaires
should be thoroughly checked in the office for completeness, accuracy and uniformity.

Benefits of data editing

 The data obtained are complete in all respects.
 The data are accurate in terms of the information recorded and the responses sought.
 The response format is in the form that was instructed.
 The data are structured so that entering the information will not be a problem.
 The contents are checked for completeness.
 The responses are checked for internal consistency.

Editing helps the researcher catch cases where the interviewer may:

 Forget to ask questions.
 Forget to record a response.
 Wrongly classify a response.
 Write only half a response.
 Write illegibly.

Ways of data editing

1) By inference: Certain questions in a research instrument may be related to one another,
and on this basis the researcher can edit. Errors can also creep in with this method during
data analysis, so careful editing is important.
2) By recalling the respondent's answer.
3) By going back to the respondents.

Data coding

Data coding: Coding means assigning numerals or other symbols to the categories or responses.

Steps in coding quantitative data:
i. Developing a code book
ii. Pretesting the code book
iii. Coding the data
iv. Verifying the data

Coding example:
1) Gender: a) Male b) Female
Coding: Q1) 1) Male 2) Female
2) Educational qualification: a) UG b) PG c) Others
Coding: Q2) 1) UG 2) PG 3) Others

Classification of data

Data classification: Classification of the data implies that the collected raw data are
categorized into common groups having common features. Data having common characteristics
are placed in a common group; the entire data set is categorized into various groups or
classes, which convey a meaning to the researcher.

Classification is done in two ways:
1. Classification according to attributes: gender, marital status, etc.
2. Classification according to class intervals: production, weight, height, etc.
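The coding example above can be implemented directly. A minimal Python sketch, using a hypothetical code book for the Gender and Educational-qualification items (the names and responses are illustrative, not from the text):

```python
# Hypothetical code book mapping response categories to numeric codes,
# mirroring the Gender and Educational qualification items above.
CODE_BOOK = {
    "gender": {"Male": 1, "Female": 2},
    "education": {"UG": 1, "PG": 2, "Others": 3},
}

def code_response(variable, raw_value):
    """Return the numeric code for a raw categorical response."""
    try:
        return CODE_BOOK[variable][raw_value]
    except KeyError:
        raise ValueError(f"Unknown response {raw_value!r} for {variable!r}")

# Coding a few raw questionnaire responses, ready for data entry.
raw_records = [("gender", "Male"), ("education", "PG"), ("gender", "Female")]
coded = [code_response(var, val) for var, val in raw_records]
```

Verifying the data (step iv) then amounts to checking that every recorded response appears in the code book, which the lookup above enforces.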

Tabulation of data

Tabulation is an orderly arrangement of data in rows and columns. It summarizes the raw data
and displays them in the form of statistical tables.
Example: row- and column-wise arrangement of data, such as the data view in SPSS; tabulation
can be performed using SPSS.

Graphing of data

Graphing is the visual representation of data. Data are presented as absolute numbers or
percentages.
Example: bar graph, pie graph, line graph, histogram.
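Outside SPSS, a simple one-way tabulation (frequency counts and percentages) can be produced with Python's standard library; the coded responses below are made up for illustration:

```python
from collections import Counter

# Hypothetical coded responses for an education item (1=UG, 2=PG, 3=Others).
responses = [1, 2, 2, 1, 3, 2, 1, 1]

# Tabulate: absolute counts and percentages per category code.
counts = Counter(responses)
total = len(responses)
table = {code: (n, round(100 * n / total, 1)) for code, n in sorted(counts.items())}
# table maps each code to (count, percentage), the kind of row-and-column
# summary that could also feed a bar or pie graph.
```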

2. Data analysis

Data analysis is a process of gathering, modeling and transforming data with the goal
of highlighting useful information, suggesting conclusions and supporting decision-
making.
Analysis means computation of certain indices or measures, along with searching for patterns of
relationships that exist among the data groups.
“Data analysis helps the researcher to identify the relationship among variables”

Figure: Data analysis techniques



Descriptive analysis: The study of the distribution of variables is termed descriptive
analysis.

Causal analysis: Causal analysis is concerned with the study of how one or more variables
affect changes in another variable.

Inferential analysis: Inferential analysis is concerned with testing hypotheses and
estimating population values based on sample values.

UNIVARIATE ANALYSIS - One variable is analyzed at a time. Typical univariate analyses are:
1. Frequency distribution
2. Measures of central tendency
   Mean: the average value of the data
   Median: the middle value of the data
   Mode: the point of maximum frequency
3. Measures of dispersion
    Range
    Mean deviation
    Standard deviation

BIVARIATE ANALYSIS - Two variables are analyzed at a time. Common bivariate analyses are:
1. Simple correlation
2. Simple regression
3. Two-way ANOVA

MULTIVARIATE ANALYSIS - More than two variables are analyzed at a time. Common multivariate
analyses are:
1. Multiple correlation
2. Multiple regression
3. MANOVA (multivariate ANOVA)

CLASSIFICATION OF MULTIVARIATE METHODS

PARAMETRIC TESTS

These tests depend on the assumption that the population(s) from which the data are randomly
sampled have a normal distribution. Types of parametric tests are:
1. t-test
2. z-test
3. F-test
4. χ²-test

When to use a parametric test:
 Confidence interval for a population mean, with known standard deviation.
 Confidence interval for a population mean, with unknown standard deviation.
 Confidence interval for a population variance.
 Confidence interval for the difference of two means, with unknown standard deviation.

NON-PARAMETRIC TESTS

 Do not involve population parameters (e.g. no assumed probability distribution).
 Data may be measured on any scale (ratio or interval, ordinal or nominal).

Examples: rank sum test; Kolmogorov-Smirnov test (goodness of fit, comparing two
populations); Mann-Whitney U test; Kruskal-Wallis test; one-sample run test; rank
correlation.
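The univariate measures listed above can be computed with Python's standard `statistics` module; the sample scores below are invented for illustration:

```python
import statistics

# Hypothetical sample of nine test scores.
data = [12, 15, 15, 14, 19, 15, 18, 13, 14]

mean = statistics.mean(data)      # average value
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # point of maximum frequency
stdev = statistics.stdev(data)    # sample standard deviation
rng = max(data) - min(data)       # range
mean_dev = sum(abs(x - mean) for x in data) / len(data)  # mean deviation
```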

3. Factor analysis

“Factor analysis is a part of the general linear model (GLM); it is a technique used to
reduce a large number of variables to a smaller number of factors.”
For example, to measure students' IQ a researcher uses 9 questions/items; after factor
analysis the researcher is left with 7 items/questions. Factor analysis removed Question 5
and Question 9 from the IQ measure.

Figure: Factor analysis basic idea

Key concepts and terms:


Exploratory factor analysis: Assumes that any indicator or variable may be
associated with any factor. This is the most common factor analysis used by
researchers and it is not based on any prior theory.
Confirmatory factor analysis (CFA): Used to determine the factors and factor
loadings of measured variables, and to confirm what is expected on the basis of
pre-established theory. CFA assumes that each factor is associated with a specified
subset of measured variables. It commonly uses two approaches: the traditional
method and the SEM approach.

Assumptions of factor analysis

 Linearity: Factor analysis is based on the assumption of a linear relationship between
variables. Non-linear variables can also be used; after transformation, however, they become
linear.
 No perfect multicollinearity: Factor analysis is an interdependency technique, so there
should not be perfect multicollinearity between the variables.
 Relevant variables are included in the analysis.
 There is true correlation between variables and factors.
 No outliers: It is assumed that there are no outliers in the data.
 Adequate sample size: The number of cases must be greater than the number of factors.
 Homoscedasticity: Since factor analysis is a linear function of the measured variables, it
does not require homoscedasticity between the variables.
 Interval data: Interval data are assumed.
 Several extraction methods are available, but principal component analysis is used most
commonly.

Types of factoring

There are different methods used to extract the factors from the data set:

1. Principal component analysis (PCA): This is the most common method used by researchers.
PCA starts by extracting the maximum variance and putting it into the first factor. It then
removes the variance explained by the first factor and extracts the maximum remaining
variance for the second factor. This process continues to the last factor.

2. Common factor analysis: The second most preferred method among researchers, it extracts
the common variance and puts it into factors. This method does not include the unique
variance of the variables, and it is used in SEM.

3. Image factoring: This method is based on the correlation matrix; OLS regression is used
to predict the factors.

4. Maximum likelihood method: This method also works on the correlation matrix, but uses
maximum likelihood estimation for factoring.

5. Other methods of factor analysis: Alpha factoring and weighted least squares are further
regression-based methods used for factoring.

Values to consider in factor analysis

Factor loading: should be greater than 0.7

Eigenvalues: should be greater than 1

Rotation methods in factor analysis: (1) no rotation, (2) varimax rotation, (3) quartimax
rotation, (4) direct oblimin rotation, and (5) promax rotation.

STEPS TO DO FACTOR ANALYSIS USING SPSS
Analyze -> Data Reduction -> Factor -> move variables -> click Extraction -> select
Principal components -> for eigenvalue type 1 -> for maximum iterations for convergence type
the number of items -> click Varimax
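The extraction logic of principal component analysis, including the eigenvalue > 1 (Kaiser) rule, can be sketched with NumPy. The questionnaire data below are invented, and this is a bare-bones illustration of extraction only, not the full SPSS procedure (no rotation step):

```python
import numpy as np

# Hypothetical responses: 6 respondents x 4 questionnaire items (Likert 1-5).
X = np.array([
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [5, 5, 1, 1],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
    [1, 1, 5, 5],
], dtype=float)

# Standardize the items, then form their correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Z.T @ Z) / len(Z)

# Eigendecomposition: each eigenvalue is the variance captured by one component.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Kaiser criterion: retain components with eigenvalue greater than 1.
n_factors = int(np.sum(eigvals > 1))

# Loadings of each item on the retained components.
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
```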

4. Discriminant analysis
Discriminant analysis: Discriminant analysis is a statistical method used by
researchers to understand the relationship between a dependent variable and one or
more independent variables.

A dependent variable is the variable that a researcher is trying to explain or predict
from the values of the independent variables.

Discriminant analysis is similar to regression analysis and analysis of variance


(ANOVA). The principal difference between discriminant analysis and the other
two methods is with regard to the nature of the dependent variable. In
discriminant analysis, the dependent variable must be a "categorical variable."

When to use discriminant analysis

Discriminant analysis is most often used to help a researcher predict the group or category
to which a subject belongs. For example, when individuals are interviewed for a job,
managers will not know for sure how job candidates will perform on the job if hired.
Suppose, however, that a human resource manager has a list of current employees who
have been classified into two groups: "high performers" and "low performers."
These individuals have been working for the company for some time, have been
evaluated by their supervisors, and are known to fall into one of these two mutually
exclusive categories.
The manager also has information on the employees' backgrounds: educational
attainment, prior work experience, participation in training programs, work attitude
measures, personality characteristics, and so forth. This information was known at the
time these employees were hired.
The manager wants to be able to predict, with some confidence, which future job
candidates are high performers and which are not. A researcher or consultant can use
discriminant analysis, along with existing data, to help in this task.

Steps to perform discriminant analysis


STEPS TO PERFORM DISCRIMINANT
ANALYSIS IN SPSS

1) Analyze-

2) Classify-

3) Discriminant-

4) Define grouping and independent


variables-

5) Define range where minimum is 0


and maiximum is 1

6) Click OK
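The high/low-performer example can be sketched numerically with Fisher's linear discriminant, the two-group case of discriminant analysis. The HR data below are invented for illustration:

```python
import numpy as np

# Hypothetical HR data: (training hours, attitude score) for six employees,
# labelled 1 = high performer, 0 = low performer.
X = np.array([[40, 8], [35, 9], [38, 7], [10, 3], [12, 4], [8, 2]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])

# Fisher's discriminant direction: w = Sw^-1 (m1 - m0),
# where Sw is the pooled within-group covariance.
m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)
w = np.linalg.solve(Sw, m1 - m0)

# Classify a new candidate: is its projection past the midpoint of the
# two projected group means?
threshold = w @ (m0 + m1) / 2
candidate = np.array([36.0, 8.0])
predicted = int(w @ candidate > threshold)  # 1 = predicted high performer
```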

5. Cluster analysis

Cluster analysis is a class of techniques used to classify objects or cases into
relatively homogeneous groups called clusters. Cluster analysis is also called
classification analysis or numerical taxonomy. In cluster analysis, there is no prior
information about group or cluster membership.
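A minimal k-means sketch in NumPy illustrates how clusters emerge without prior group information; the customer data and the choice of k = 2 are invented for illustration:

```python
import numpy as np

# Hypothetical customer data: (annual income, spending score) for six customers.
X = np.array([[15, 80], [16, 78], [14, 82], [70, 20], [72, 18], [68, 22]], dtype=float)

def kmeans(X, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen cases as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each case to its nearest centroid.
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned cases
        # (keep the old centroid if a cluster ends up empty).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

labels, centroids = kmeans(X, k=2)
```

With these data the first three customers end up in one cluster and the last three in the other, whichever cluster numbers the algorithm happens to assign.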

6. Correlation and Multiple regression


Correlation: Correlation is the linear association between two random variables.
Examples: the relationship between the height of fathers and the height of their children;
the relationship between sales and employee salary.
Correlation analysis shows the nature and strength of the relationship between two
variables, and is applied when the variables are interdependent.
The correlation coefficient (r) lies between −1 and +1.
No correlation: a correlation of 0 indicates that there is no relationship between the variables.
Positive correlation: a correlation of +1 indicates a perfect positive correlation.
Negative correlation: a correlation of −1 indicates a perfect negative correlation.

 1 is a perfect positive correlation


 0 is no correlation (the values don't seem linked at all)
 -1 is a perfect negative correlation

Positive and negative correlation

 Correlation is Positive when the values increase together, and


 Correlation is Negative when one value decreases as the other
increases

Methods of studying correlation

Figure: Correlation strength and association

1) On the basis of direction: This category includes the following types:
i) Positive correlation: the values of the two variables move in the same direction.
ii) Negative correlation: the values of the two variables move in opposite directions.

2) On the basis of the ratio of change: On this basis, correlation is categorized as:
i) Linear correlation: when plotted on a graph, the points tend to form a straight line.
ii) Non-linear (curvilinear) correlation: when plotted on a graph, the points do not form a
straight line.

3) On the basis of the number of variables: This category includes:
i) Simple correlation: only two variables are studied, say price and demand.
ii) Multiple correlation: the relationship among three or more variables, such as
production, rainfall and use of fertilizers, is studied together.
iii) Partial correlation: more than two variables are involved, but correlation is studied
between only two of them, the other variables being assumed constant.
Methods of Studying Linear Correlation
1) Scatter Diagram: Scatter diagram is a special type of dot chart.
2) Karl Pearson's Coefficient of Correlation: Karl Pearson, a great biometrician
and statistician, suggested a mathematical method for measuring the magnitude
of linear relationship between two variables. Karl Pearson's method is the most
widely used method in practice and is known as Pearson Coefficient of
Correlation. It is denoted by the symbol 'r';

r = Σxy / √(Σx² × Σy²),  where x = X – X̄ and y = Y – Ȳ
3) Spearman's Rank Correlation: When the variables under consideration are not
capable of quantitative measurement but can be arranged in serial order (ranks),
we find correlation between the ranks of two series. This happens when we deal
with qualitative characteristics such as honesty, beauty, etc. This method is called
Spearman's Rank Difference Method or Ranking Method and the correlation


coefficient so obtained is called Rank Correlation Coefficient and is denoted by
r. This method was developed by Charles Edward Spearman, a British psychologist, in 1904.
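Karl Pearson's formula above translates directly into code; the salary and sales figures below are made up for illustration:

```python
import math

# Hypothetical data: salesperson salary (X, thousands) and sales (Y, units).
X = [20, 25, 30, 35, 40]
Y = [210, 240, 300, 320, 380]

# r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), where x = X - mean(X), y = Y - mean(Y)
mx, my = sum(X) / len(X), sum(Y) / len(Y)
dx = [v - mx for v in X]
dy = [v - my for v in Y]
r = sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
    sum(a * a for a in dx) * sum(b * b for b in dy)
)
# r comes out close to +1 here: salary and sales move together
# (a strong positive correlation).
```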
7. Regression analysis
Regression analysis is a set of statistical processes for estimating the relationships
between a dependent (criterion) variable and independent (predictor) variables.
Regression analysis explains changes in the criterion in relation to changes in selected
predictors.
The three major uses of regression analysis are determining the strength of predictors,
forecasting an effect, and trend forecasting.

Types of regression

 Linear regression: models the relationship between the criterion (scalar response) and
one or more predictors (explanatory variables).
 Logistic regression: used when the dependent variable is dichotomous.
 Polynomial regression: used for curvilinear data; the polynomial is fitted with the
method of least squares.
 Stepwise regression: used for fitting predictive regression models automatically; at each
step a variable is added to or removed from the set of explanatory variables.
 Ridge regression: a technique for analyzing multiple regression data affected by
multicollinearity.
 Lasso regression: a regression analysis method that performs both variable selection and
regularization.
 ElasticNet regression: a regularized regression method that linearly combines the
penalties of the lasso and ridge methods.

Forms of linear regression:

1) Simple linear regression: one dependent variable and one independent variable.
Example: the relationship of sales to salesperson salary.

2) Multiple linear regression: one dependent variable and multiple independent variables.
Example: the relationship of sales to salesperson salary and showroom rent.

3) Structural equation modeling (SEM): multiple independent variables and multiple
dependent variables.

STEPS IN LINEAR REGRESSION

1. State the hypothesis.
2. State the null hypothesis.
3. Gather the data.
4. Compute the regression equation.
5. Examine tests of statistical significance and measures of association.
6. Relate the statistical findings to the hypothesis; accept or reject the hypothesis.
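Step 4, computing the regression equation, reduces for simple linear regression to the least-squares formulas for slope and intercept; a sketch with invented salary/sales data:

```python
# Hypothetical data: salesperson salary (X, thousands) and sales (Y, units).
X = [20, 25, 30, 35, 40]
Y = [210, 240, 300, 320, 380]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n

# Least squares: slope b = sum((X-mx)(Y-my)) / sum((X-mx)^2), intercept a = my - b*mx
b = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
a = my - b * mx

# Fitted equation Y = a + b*X; predict sales for a salary of 32 (thousand).
predicted = a + b * 32
```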

ASSUMPTIONS OF LINEAR REGRESSION


1. Both the independent (X) and the dependent (Y) variables are measured at the
interval or ratio level.
2. The relationship between the independent (X) and the dependent (Y) variables is
linear.
3. Errors in prediction of the value of Y are distributed in a way that approaches the
normal curve.
4. Errors in prediction of the value of Y are all independent of one another.
5. The distribution of the errors in prediction of the value of Y is constant regardless of
the value of X
Some of the uses of the regression analysis are given below:
(i) Regression Analysis helps in establishing a functional relationship between two or
more variables.
(ii) With the use of electronic machines and computers, the tedium of calculating
regression equations, particularly those expressing multiple and non-linear relations, has
been reduced considerably.
(iii) Since most of the problems of economic analysis are based on cause and effect
relationship, the regression analysis is a highly valuable tool in economic and business
research.
(iv) The regression analysis is very useful for prediction purposes. Once a functional
relationship is established the value of the dependent variable can be estimated from
the given value of the independent variables.

8. Multi-dimensional scaling
Multidimensional scaling is a visual representation of the distances or dissimilarities
between sets of objects.

Figure: Multi-dimensional scaling perceptual map

9. Conjoint analysis
Conjoint analysis is one of the most popular tools used for market research.
It is an advanced exploratory technique used to determine how people make decisions
and which factors they place real value on in various products and services. It has
been widely employed for product/service analysis since the 1970s.

Figure: Conjoint analysis Example



Qualitative data analysis


Qualitative data analysis: Qualitative data analysis is a systematic process of
searching, arranging and organizing the interview transcripts, observation notes, or other non-
textual materials that the investigator gathers to increase the understanding of the
phenomenon – Wong 2008

Quantitative Data Analysis Techniques:


Quantitative data arise when numbers result from a process of measurement. When one
measures something, a value is selected from a scale of values that corresponds to the
observation made of some object or situation, or to a response to a question addressed
to an individual.

Basis of difference: qualitative vs. quantitative data analysis

1) Interpretation
Qualitative: relies on interpretation and logic; qualitative researchers present their
analyses using text and arguments.
Quantitative: relies on statistics; quantitative researchers use graphs and tables to
present their analysis.

2) Procedures and rules
Qualitative: has no set rules; rather, guidelines exist to support the analysis.
Quantitative: follows agreed-upon, standardised procedures and rules.

3) Occurrence
Qualitative: analysis occurs simultaneously with data collection.
Quantitative: analysis occurs only after data collection is finished.

4) Methodology
Qualitative: methods of analysis may vary depending on the situation.
Quantitative: methods of analysis are determined in advance as part of the study design.

5) Reliability
Qualitative: research has validity but is less reliable (consistent), with a corresponding
weakness in the ability to compare variables across different conditions.
Quantitative: reliability is easy to establish, and analyses generally involve
sophisticated comparisons of variables across different conditions.

6) Questions
Qualitative: open-ended questions and probing yield detailed information that illuminates
nuances and highlights diversity.
Quantitative: specific questions obtain predetermined responses to standardized questions.

7) Information
Qualitative: provides information on the application of the program in a specific context
to a specific population.
Quantitative: more likely provides information on the broad application of the program.

8) Suitability
Qualitative: more suitable when time and resources are limited.
Quantitative: relies on more extensive interviewing.

Application of Statistical Software for Data Analysis


Statistics is the science of making effective use of numerical data relating to
groups of individuals or experiments. It deals with all aspects of this, including not only
the collection, analysis and interpretation of such data, but also the planning of the
collection of data, in terms of the design of surveys and experiments.
Some of the more common software packages used for data analysis are as follows:
1) MS Excel
2) SPSS (Statistical Package for the Social Sciences)
3) SAS (Statistical Analysis System)
4) STATA
5) R
6) LISREL (Linear Structural Relations)
7) AMOS (Analysis of Moment Structures)
8) SmartPLS
9) Visual PLS

******************All the Best***************
