the activity that ensures the accuracy of the data and their

conversion from raw form to reduced and classified forms that

are more appropriate for analysis. Preparing a descriptive

statistical summary is another preliminary step leading to an

understanding of the collected data.
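
A descriptive summary of this kind can be sketched in a few lines of Python. The scores below are hypothetical survey responses and the function name describe is an assumption made for illustration only.

```python
import statistics

# Hypothetical survey responses: satisfaction scores on a 1-5 scale.
scores = [4, 5, 3, 4, 2, 5, 4, 3, 4, 1]

def describe(values):
    """Return a small descriptive summary of a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Looking at the centre (mean, median, mode) and the spread (standard deviation, range) before any further analysis helps spot entry errors and unusual cases early.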

Editing, Coding, Data Entry: Editing detects errors and

omissions, corrects them when possible, and certifies that

maximum data quality standards are achieved. Types of editing:

field editing and central editing. Coding involves assigning

numbers or other symbols to answers so that the responses can

be grouped into a limited number of categories. In coding,

categories are the partitions of a data set of a given variable

(e.g., if the variable is gender, the partitions are male and

female). Categorization is the process of using rules to partition

a body of data. Both closed- and open-response questions must

be coded. A codebook, or coding scheme, contains each variable

in the study and specifies the application of coding rules to the

variable. It is used by the researcher or research staff to promote

more accurate and more efficient data entry or data analysis. It is

also the definitive source for locating the positions of variables in

the data file during analysis. Coding rules - Four rules guide the

precoding and postcoding and categorization of a data set. The

categories within a single variable should be: Appropriate to the

research problem and purpose. Exhaustive. Mutually exclusive.

Derived from one classification dimension. Content analysis

follows a systematic process for coding and drawing inferences

from texts. It starts by determining which units of data will be

analyzed. Types of content-analysis units: 1) Syntactical units can

be words, phrases, sentences, or paragraphs; words are the

smallest and most reliable data units to analyze; 2) Referential

units are described by words, phrases, and sentences; they may

be objects, events, persons, and so forth, to which a verbal or

textual expression refers; 3) Propositional units are assertions

about an object, event, person, and so on; 4) Thematic units are

topics contained within (and across) texts; they represent higher-level

abstractions inferred from the text and its context. Missing

data are information from a participant or case that is not

available for one or more variables of interest. In survey studies,

missing data typically occur when participants accidentally skip,

refuse to answer, or do not know the answer to an item on the

questionnaire. Data entry converts information gathered by

secondary or primary methods to a medium for viewing and

manipulation. Keyboarding remains a mainstay for researchers

who need to create a data file immediately and store it in a

minimal space on a variety of media.
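
A minimal sketch of how a codebook can drive coding and flag missing data. The variables, the codes, and the missing-value code -9 are assumptions chosen for illustration, not a standard.

```python
# Hypothetical codebook: each variable maps answer text to a numeric code.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "owns_car": {"yes": 1, "no": 0},
}
MISSING = -9  # assumed code for skipped, refused, or "don't know" answers

def code_response(variable, answer):
    """Translate one raw answer into its numeric code."""
    if answer is None or answer.strip() == "":
        return MISSING
    categories = CODEBOOK[variable]
    key = answer.strip().lower()
    if key not in categories:
        # Categories must be exhaustive, so flag answers the codebook does not cover.
        raise ValueError(f"No code for {variable!r} answer {answer!r}")
    # A dict lookup returns exactly one code, so categories stay mutually exclusive.
    return categories[key]

def code_record(record):
    """Code every codebook variable for one questionnaire."""
    return {var: code_response(var, record.get(var)) for var in CODEBOOK}

raw = [
    {"gender": "Female", "owns_car": "yes"},
    {"gender": "male", "owns_car": ""},   # item left blank
    {"gender": "male"},                   # item skipped entirely
]
coded = [code_record(r) for r in raw]
print(coded)
# [{'gender': 2, 'owns_car': 1}, {'gender': 1, 'owns_car': -9}, {'gender': 1, 'owns_car': -9}]
```

Keeping the rules in one structure means data entry and analysis both read from the same definitive source.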

Validity of data: In general, validity is an indication of how sound

your research is. More specifically, validity applies to both the

design and the methods of your research. Validity in data

collection means that your findings truly represent the

phenomenon you are claiming to measure. Valid claims are solid

claims.

Qualitative Vs Quantitative data analyses: Read Exhibit 7-2.

Bivariate and Multivariate statistical techniques: Bivariate

studies are different from univariate studies because they allow

the researcher to analyze the relationship between two variables

(often denoted as X, Y) in order to test simple hypotheses of

association and causality. For example, if you wanted to know

whether there is a relationship between the number of students

in an engineering classroom (independent variable) and their

grades in that subject (dependent variable), you would use

bivariate analysis since it measures two elements based on the

observation of data. Four steps to conducting bivariate analysis:

1) Define the nature of the relationship; 2) Identify the type and

direction of the relationship; 3) Determine if the relationship is

statistically significant; 4) Identify the strength of the relationship.
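
The four steps can be sketched with a Pearson correlation. The class sizes and grades are made-up numbers, the strength cut-offs are a common rule of thumb rather than a fixed standard, and 2.306 is the two-tailed critical t value for alpha = 0.05 with 8 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing whether the correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical data: class size (X) and average grade (Y) for ten classes.
class_size = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
avg_grade = [88, 85, 86, 80, 78, 79, 74, 70, 72, 65]

r = pearson_r(class_size, avg_grade)            # steps 1 and 2: nature and direction
t = t_statistic(r, len(class_size))             # step 3: significance
T_CRITICAL = 2.306                              # two-tailed, alpha = 0.05, df = 8
direction = "negative" if r < 0 else "positive"
significant = abs(t) > T_CRITICAL
# Step 4: strength, using assumed rule-of-thumb cut-offs.
strength = "strong" if abs(r) >= 0.7 else "moderate" if abs(r) >= 0.3 else "weak"

print(f"r = {r:.3f} ({strength}, {direction}), t = {t:.2f}, significant: {significant}")
```

A significant correlation shows association only; a causal claim still depends on the research design.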

Multivariate studies are similar to bivariate studies, but

multivariate studies involve more than two variables at once. For

example, if an advertiser wanted to examine the effectiveness of

three different banner ads on a popular website, the advertiser

could measure each ad's click rate for both men and women.

Researchers could then use multivariate statistical analysis to

examine the relationships between all of the variables.

Multivariate analytical techniques represent a variety of

mathematical models used to measure and quantify outcomes,

taking into account important factors that can influence this

relationship. The most popular is multiple regression analysis

which helps one understand how the typical value of the

dependent variable changes when any one of the independent

variables is varied, while the other independent variables are

held fixed. Other techniques include factor analysis, path analysis,

and multivariate analysis of variance (MANOVA).
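
A minimal sketch of multiple regression by ordinary least squares, solving the normal equations directly. The variable names and the data are hypothetical; the sales figures are generated from a known rule so the recovered coefficients can be checked.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated from: sales = 10 + 2*ad_spend + 3*store_count
predictors = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5), (6, 4)]
sales = [10 + 2 * a + 3 * s for a, s in predictors]

intercept, b_ad, b_store = fit_multiple_regression(predictors, sales)
print(f"intercept={intercept:.2f}, ad_spend={b_ad:.2f}, store_count={b_store:.2f}")
```

Each slope is read as the change in the dependent variable for a one-unit change in that predictor while the other predictor is held fixed.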

Factor analysis: It is a statistical tool that measures the impact of

a few unobserved variables called factors on a large number of

observed variables. It is used as a data reduction method. It may

be used to explore the relationship between variables or to confirm a hypothesis. It is

often used to determine a linear relationship between variables

before subjecting them to further analysis. Principal Factor

Analysis is also called Common Factor Analysis and it aims to

identify the minimum number of factors that can account for the

correlation between a given set of variables. Other types of Factor

Analysis include Image factoring, Alpha factoring, Principal

Component Analysis and so on.
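
As a rough illustration of data reduction, the sketch below extracts the first principal component of a correlation matrix by power iteration. This is Principal Component Analysis, not common factor analysis, and the three sets of test scores are invented for the example.

```python
import math

def standardize(column):
    """Convert a column to z-scores using the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def correlation_matrix(columns):
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(matrix, iterations=200):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    size = len(matrix)
    v = [1.0] * size
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)
    )
    return eigenvalue, v

# Hypothetical scores of eight students on three observed variables.
maths = [52, 60, 65, 70, 78, 85, 90, 95]
physics = [50, 58, 66, 68, 80, 83, 92, 94]
chemistry = [55, 57, 68, 72, 75, 88, 89, 97]

corr = correlation_matrix([maths, physics, chemistry])
eigenvalue, vector = first_component(corr)
loadings = [math.sqrt(eigenvalue) * x for x in vector]  # correlation of each variable with the component
explained = eigenvalue / len(corr)                      # share of total variance

print("loadings:", [round(l, 3) for l in loadings])
print("variance explained:", round(explained, 3))
```

When one component explains most of the variance, the three observed scores can reasonably be summarized by a single underlying dimension.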

Discriminant analysis: It is a statistical tool with an objective to

assess the adequacy of a classification, given the group

memberships; or to assign objects to one group among a number

of groups. For any kind of Discriminant Analysis, some group

assignments should be known beforehand. Discriminant Analysis

is quite close to being a graphical version of MANOVA and is often

used to complement the findings of Cluster Analysis and Principal

Components Analysis. When Discriminant Analysis is used to

separate two groups, it is called Discriminant Function Analysis

(DFA); when there are more than two groups, the

Canonical Variates Analysis (CVA) method is used. Discriminant

Analysis has various benefits as a statistical tool and is quite

similar to regression analysis. It can be used to determine which

predictor variables are related to the dependent variable and to

predict the value of the dependent variable given certain values

of the predictor variables. Discriminant Analysis is also

widely used by marketers to create perceptual maps and has

some benefits over other methods that use perceived distances,

such as the option of using tests of significance to check for

dissimilarities among products and the fact that the distances between

two products are not affected by other products included

in the study. Discriminant Analysis is often used in combination

with cluster analysis. For example, the loans department of a bank may want

to assess the creditworthiness of applicants before disbursing

loans. It may use Discriminant Analysis to find out whether an

applicant is a good credit risk or not.
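
The credit-risk example can be sketched as a two-group linear discriminant function with two predictors. The applicant figures (income in thousands, debt ratio in percent) are invented, and the midpoint cut-off assumes equal group sizes and equal misclassification costs.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]

def scatter(rows, mean):
    """2x2 within-group sums of squares and cross-products."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_discriminant(good, bad):
    """Return discriminant weights w and the cut-off score between the groups."""
    m_good, m_bad = mean_vector(good), mean_vector(bad)
    sg, sb = scatter(good, m_good), scatter(bad, m_bad)
    dof = len(good) + len(bad) - 2
    pooled = [[(sg[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [m_good[0] - m_bad[0], m_good[1] - m_bad[1]]
    # w = inverse pooled covariance times the difference in group means
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(m_good[i] + m_bad[i]) / 2 for i in range(2)]
    cutoff = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, cutoff

def classify(applicant, w, cutoff):
    score = w[0] * applicant[0] + w[1] * applicant[1]
    return "good" if score > cutoff else "bad"

# Hypothetical past applicants with known group membership.
good_risks = [(60, 20), (75, 15), (80, 25), (90, 10), (70, 18)]
bad_risks = [(30, 45), (40, 50), (35, 40), (45, 55), (25, 48)]

weights, cutoff = fit_discriminant(good_risks, bad_risks)
print(classify((85, 12), weights, cutoff))  # new applicant: high income, low debt
print(classify((28, 52), weights, cutoff))  # new applicant: low income, high debt
```

Group membership of past applicants has to be known beforehand; the function is then applied to new applicants.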

Cluster analysis: It is a statistical tool used to classify objects into

groups, such that the objects belonging to one group are much

more similar to each other and rather different from objects

belonging to other groups. It is generally used for exploratory

data analysis and serves as a method of discovery by solving

classification issues. Types: 1) Hierarchical cluster analysis methods:

(a) Agglomerative methods: all objects start in separate

clusters, the most similar objects are combined step by step, and this process

is repeated until all objects are in a single cluster. Finally, the

optimum number of clusters is chosen from among all options.

(b) Divisive methods: all objects start in the same cluster

and the reverse of the agglomerative method is used. 2) Non-hierarchical

cluster analysis methods (of which k-means clustering is the best known):

These are generally used when large data

sets are involved. Further, these provide the flexibility of moving

a subject from one cluster to another. The main benefit of Cluster

Analysis is that it allows us to group similar data together. This

helps us identify patterns between data elements. It reveals

associations between data objects and helps to outline structure

which might not have been apparent previously but gives much

sense and meaning to the data when discovered. Once a clear

structure emerges, it allows easier decision making.
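
A minimal sketch of the agglomerative method, using average linkage and stopping at a chosen number of clusters instead of merging all the way to one. The six points are hypothetical, and average linkage is only one of several possible merge rules.

```python
import math

def average_linkage(c1, c2, points):
    """Mean distance between all pairs of points drawn from two clusters."""
    total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every object starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # combine the most similar pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical customers described by two standardized attributes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

groups = agglomerative(points, n_clusters=2)
print(groups)  # [[0, 1, 2], [3, 4, 5]]
```

Running the same function with different values of n_clusters is one simple way to compare candidate solutions before choosing the number of clusters.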

Multiple regression and correlation: Multiple regression relates a

continuous dependent variable to two or more independent variables.

When the dependent variable is categorical, logistic regression is used

instead. Logistic regression aims to

measure the relationship between a categorical dependent

variable and one or more independent variables (usually

continuous) by estimating the dependent variable's probability

scores. A categorical variable is a variable that can take values

falling in limited categories instead of being continuous. Logistic

regression uses regression to predict the outcome of a categorical

dependent variable on the basis of predictor variables. The

probable outcomes of a single trial are modeled as a function of

the explanatory variable using a logistic function. Logistic

modeling is done on categorical data which may be of various

types including binary and nominal. For example, a variable

might be binary and have two possible categories of yes and

no; or it may be nominal, say hair color, which may be black, brown, red,

gold, or grey. Another objective of logistic regression is to check

if the probability of getting a particular value of the dependent

variable is related to the independent variable. Multiple logistic

regression is used when there is more than one independent

variable under study. For example, logistic regression would help

identify factors like product quality, service quality, brand image,

reward programs, etc., that impact customers' loyalty and

willingness to recommend a retail store's products to others. The

results would help improve the store's performance on these

parameters and increase customer loyalty.
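
A minimal sketch of binary logistic regression fitted by batch gradient descent. The ratings and loyalty labels are invented, and the learning rate and number of epochs are assumptions that suit this small data set only.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, x):
    """Probability of the outcome coded 1; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=5000):
    """Fit weights by batch gradient descent on the average log-loss."""
    k = len(rows[0])
    n = len(rows)
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            err = predict_probability(weights, x) - y
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * x[j]
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights

# Hypothetical customers: (product quality rating, service quality rating) on a 1-5 scale.
ratings = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (4, 2), (2, 4),
           (3, 2), (2, 3), (3, 4), (4, 3), (4, 4), (5, 4), (4, 5), (5, 5)]
# 1 = loyal customer, 0 = not loyal (binary dependent variable).
loyal = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

weights = fit_logistic(ratings, loyal)
print("weights:", [round(w, 2) for w in weights])
print("P(loyal | 5, 5) =", round(predict_probability(weights, (5, 5)), 3))
print("P(loyal | 1, 1) =", round(predict_probability(weights, (1, 1)), 3))
```

A positive weight means that a higher rating on that factor raises the estimated probability of loyalty.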

Multidimensional scaling (MDS): a means of visualizing the level of

similarity of individual cases of a dataset. It refers to a set of

related ordination techniques used in information visualization, in

particular to display the information contained in a distance

matrix. Steps in conducting MDS research: 1) Formulating the problem; 2) Obtaining input

data; 3) Running the MDS statistical program; 4) Deciding the number

of dimensions; 5) Mapping the results and defining the

dimensions; 6) Testing the results for reliability and validity; 7)

Reporting the results comprehensively. For example, in marketing, MDS

is a statistical technique for taking the preferences and

perceptions of respondents and representing them on a visual

grid, called a perceptual map. By mapping multiple attributes and

multiple brands at the same time, a greater understanding of the

marketplace and of consumers' perceptions can be achieved, as

compared with a basic two-attribute perceptual map.
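
A rough sketch of how a two-dimensional map can be recovered from a distance matrix. It uses classical (metric) MDS with power iteration, which is one possible technique among many; the brand positions that generate the distances are hypothetical, and the routine assumes the distances really fit in the requested number of dimensions.

```python
import math
import random

def classical_mds(dist, dims=2, iterations=500):
    """Classical MDS: coordinates whose pairwise distances approximate dist."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double-centre the squared distances (dist is assumed symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        for _ in range(iterations):  # power iteration for the largest eigenvalue
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(
            v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n)
        )
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Remove the dimension just found before extracting the next one.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Hypothetical "true" positions of five brands, used only to build the input distances.
hidden = [(0, 0), (4, 0), (5, 3), (1, 4), (2, 2)]
dist = [[math.dist(p, q) for q in hidden] for p in hidden]

brand_map = classical_mds(dist, dims=2)
for name, (x, y) in zip("ABCDE", brand_map):
    print(f"brand {name}: ({x:.2f}, {y:.2f})")
```

The recovered map can be rotated or mirrored relative to the original positions; only the distances between brands are meaningful, which is why the dimensions have to be interpreted and named afterwards.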

Application of statistical software for data analysis: The following

statistical software packages and their features are used for data

analysis: 1) SAS/STAT: SAS/STAT software is designed for both

specialized and enterprise-wide analytical needs. It relies more

on coding and less on a menu-driven way of doing

analysis.

SAS/STAT software provides a complete,

comprehensive set of tools that can meet the data analysis needs

of the entire organization. Features: ANOVA; mixed models

(linear mixed, non-linear mixed, and general linear models);

Regression; Categorical data analysis; Bayesian analysis;

Multivariate analysis; Survival analysis; Psychometric analysis;

Cluster analysis; Nonparametric analysis; Survey data analysis;

Multiple imputation for missing values. 2) SPSS: It is more

menu-driven and requires less coding. Features: analysing variables separately;

comparing multiple variables; association between variables. 3)

R: It is entirely code-driven and supports the latest methods of

data analysis. Features: most data analysis methods can be done using R;

creating unique and beautiful data visualizations; getting better

results faster; drawing on the talents of statisticians worldwide, as

they make method libraries for free usage.

