
AMR Chapter Summary by group 3

Submitted by,
Group No. 3

Group members
Rahul Dev Verma MBA07231
Sasi Kumar Bhogi MBA07221
Nayan Trivedi MBA07222
Sayee Prasad Kompella MBA07206
Prateek Seth MBA07243

Chapter 14: Data Preparation

Data preparation is the process of cleaning and converting raw or unstructured data before
processing and analysis. It is a crucial step before analysis begins and often involves reformatting the
data, correcting errors in the data, and combining data sets to enrich the data.

The Data-Preparation Process

The data-preparation process is shown in the figure below. The whole process is guided by
the preliminary plan of data analysis that was formulated in the research design stage. The first
step is to check for acceptable questionnaires. This is followed by editing, coding, and
transcribing the data. The data are then cleaned and a treatment for missing responses is
prescribed. Often, statistical adjustment of the data is needed to make them representative of the
population of interest. The researcher is then responsible for choosing an appropriate data analysis
strategy. The final data analysis strategy differs from the preliminary plan of data analysis because of
the information and insights gained since the preliminary plan was drawn up. Data
preparation should begin as soon as the first batch of questionnaires is received from
the field, while the fieldwork is still going on. That way, if any problems are detected, the
fieldwork can be modified to incorporate corrective action.

Marketing Research and Social Media

Social media data collection and research can be a very dynamic process. Data generated by a
sizeable networked panel can be made freely available to members for analysis by researchers
with a suitable set of Web 2.0 tools, enabling the discussions to organize and
reorganize within the panel dynamically. Unlike in traditional data collection, respondents do not
just respond to questions. Instead, they generate and revise the data through their collaborative
participation. They may change their positions in response to others, regardless of whether the
others are researchers, customers, or respondents.

Statistical Software

Major statistical packages such as SPSS from IBM, Excel from Microsoft, SAS from SAS
Institute, and Minitab from Minitab, LLC have websites that can be consulted for
a variety of information. These packages also include options for handling missing responses and
statistically adjusting the data. In addition, a number of other statistical packages can now
be found on the Internet. Although some of these programs may not offer integrated
data analysis and management, they can nevertheless be useful for conducting specific statistical
analyses.

SPSS Windows

SPSS Windows provides a powerful statistical analysis and data management system in a graphical
environment. A typical SPSS user should therefore be familiar with basic spreadsheet concepts,
in particular:
• Highlighting
• Cut, Copy and Paste
• Mouse gestures
• Dialog boxes, radio buttons, and checkboxes
• Windowing characteristics
• Online Help

Chapter 15: Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

This chapter gives an overview of frequency distribution, cross-tabulation, and hypothesis testing. The
objective of a frequency distribution is to obtain a count of the number of responses
associated with different values of a variable.
Frequency Distribution:
A frequency distribution produces a table of frequency counts, percentages,
and cumulative percentages for all the values associated with that variable. A frequency
distribution also helps determine the shape of the empirical distribution of that
variable. It may be used to construct a vertical bar chart or a histogram.
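As an illustration, the table described above can be sketched in a few lines of Python. This is only a minimal sketch: the function name frequency_table and the sample responses are made up for the example and do not come from the chapter.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows

# Hypothetical responses on a 1-5 rating scale (illustrative data only).
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
for value, count, percent, cumulative in frequency_table(responses):
    print(f"{value}  {count:>3}  {percent:5.1f}%  {cumulative:5.1f}%")
```

The last cumulative percentage should always come out at 100, which is a quick check that no response was dropped.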
The most used statistics associated with frequency distribution are
• measures of location → mean, mode, and median
• measures of variability → range, interquartile range, standard deviation, and
coefficient of variation
• measures of shape → skewness and kurtosis
Measures of Location: A measure of location is a statistic that describes a location within a data set.
Measures of central tendency describe the centre of the distribution.
Mean: It is the most commonly used measure of central tendency. It is the
average value of the set, that is, the value obtained by summing all elements in the set and
dividing by the number of elements.
Mode: It is the measure of central tendency given by the value that appears most frequently in
the given sample.
Median: It is the measure of central tendency given by the middle value when the data are sorted:
half of the values fall above it and half fall below.
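The measures of location, together with some of the measures of variability listed earlier, can be computed with Python's built-in statistics module. The sketch below is illustrative only: the helper name describe and the ratings are assumptions, and the interquartile range is left out because its definition varies between packages.

```python
import statistics

def describe(values):
    """Summarise a sample with common measures of location and variability."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "stdev": stdev,
        "coefficient_of_variation": stdev / mean,
    }

# Hypothetical ratings (illustrative data only).
ratings = [2, 3, 3, 4, 4, 4, 5, 6, 7]
for name, value in describe(ratings).items():
    print(f"{name}: {value}")
```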
Measures of Shape: Measures of shape are also useful in understanding the nature of the
distribution.
Skewness: Skewness is the tendency of the deviations from the mean to be larger in one
direction than in the other. It is a characteristic of a distribution that assesses its symmetry about the
mean.
Kurtosis: Kurtosis is a measure of the relative peakedness or flatness of the curve of the
frequency distribution.
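One common way to put numbers on these two measures is through central moments. The sketch below assumes the simple moment-based (population) formulas, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 − 3; statistical packages often apply small-sample corrections, so their output may differ slightly. The function names are made up for the example.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment-based skewness: 0 for a symmetric sample, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal curve scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical samples (illustrative data only).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

A positive excess kurtosis suggests a more peaked curve than the normal distribution and a negative value a flatter one.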
Hypothesis Testing:
Steps that are involved in hypothesis testing:
Step-1: Formulate the null hypothesis as H0 and the alternative hypothesis as H1.
Step-2: Choose an appropriate statistical technique.
Step-3: Select the level of significance α.
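Step-4: Determine the sample size, collect the data, and calculate the value of the test statistic.
Step-5: Determine the probability associated with the test statistic under the null hypothesis.
Alternatively, determine the critical values of the test statistic that divide the rejection and nonrejection regions.
Step-6: Compare the probability associated with the test statistic with the level of significance
specified. Alternatively, determine whether the test statistic falls into the rejection or the nonrejection region.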
Step-7: Make a decision whether to reject or not reject the null hypothesis.
Step-8: Express that decision in terms of the marketing research problem.
Cross-Tabulations:
This is used when there is a need to examine the relationship between 2 or more variables at a time.
It combines the frequency distributions of those 2 or more variables in 1 table. For
example, the relationship between the variable store sales and the variable gender can be examined by
putting them in a crosstabulation. These tables are also called contingency tables.
Crosstabulation is generally preferred when there are only 2 variables.
If there are 2 variables it is called bivariate crosstabulation. If we take a third variable into consideration, the
relationship between the first 2 variables can be refined. It might also reveal that there was no
relationship between the initial 2 variables.
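A bivariate crosstabulation is simply a count of how often each pair of categories occurs together. The following is a minimal sketch; the function name crosstab and the gender and usage data are hypothetical and only serve as an example.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Count co-occurrences of two categorical variables.

    Returns (row_labels, col_labels, table) where table[i][j] is the number of
    cases with row_labels[i] on the first variable and col_labels[j] on the second.
    """
    counts = Counter(zip(row_values, col_values))
    row_labels = sorted(set(row_values))
    col_labels = sorted(set(col_values))
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table

# Hypothetical respondents (illustrative data only).
gender = ["F", "F", "M", "M", "F", "M"]
usage = ["high", "low", "low", "low", "high", "high"]

rows, cols, table = crosstab(gender, usage)
print("      " + "  ".join(cols))
for label, counts_row in zip(rows, table):
    print(f"{label:>5} " + "  ".join(f"{c:>4}" for c in counts_row))
```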

Chi-Squared test:
This test is used to check the statistical significance of the association observed in a crosstabulation. The chi-squared
test is performed by taking the null hypothesis as
H0: there is no association between the 2 variables.
The expected frequency of each cell is calculated assuming that there is no association between the
variables: expected frequency = (row total × column total)/sample size. The difference between the
expected frequency and the observed frequency is then calculated for each cell.
Chi-square is calculated by squaring each difference, dividing it by the
expected frequency, and summing the results over all cells.
The next step is to calculate the degrees of freedom: DoF = (number of rows − 1) × (number of columns − 1).
The calculated chi-square is then looked up in the chi-square distribution table for that
DoF to obtain the corresponding p (probability) value. If p is below the chosen
level of significance, the null hypothesis is rejected; otherwise it is not rejected.
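The chi-square calculation can be sketched as follows. The sketch assumes the usual expected frequency, (row total × column total)/sample size, and degrees of freedom, (rows − 1) × (columns − 1); the observed counts are hypothetical. It returns only the statistic and the degrees of freedom, so the p value still has to be read from a chi-square table or a statistics package.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a contingency table.

    observed is a list of rows, each a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

# Hypothetical 2 x 2 table (illustrative counts only).
observed = [[10, 20],
            [20, 10]]
statistic, dof = chi_square(observed)
print(statistic, dof)
# With 1 degree of freedom and alpha = 0.05 the tabulated critical value is 3.841;
# a statistic above that leads to rejecting H0 of no association.
```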
Parametric tests:
Chi-square is a nonparametric test, whereas parametric tests include the t test and the z test.
Parametric tests can be for one sample or for two samples.
T-test:
This test is used to make inferences about a population mean. It assumes that the
variable is normally distributed, that a hypothesized value of the population mean is given, and that the
population variance is unknown and is estimated from the sample variance. The standard
error of the sample mean is calculated by dividing the standard deviation of the sample by
the square root of the sample size. Then, t is calculated as
t = (mean of the sample − mean of the population)/standard error of the sample mean

Steps for Hypothesis testing:

Step-1: Formulate the null hypothesis H0 and the alternative hypothesis H1.
Step-2: Select the level of significance alpha (generally 0.05).
Step-3: Calculate the mean and standard deviation of the sample.
Step-4: Calculate the t-stat.
Step-5: Calculate the degrees of freedom (n − 1 for one sample).
Step-6: Find the critical value of t, i.e. t(alpha/2). If |t-stat| < t(alpha/2), do not reject the null hypothesis; otherwise reject it.
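The one-sample steps can be sketched in Python as below. This is only an illustration under the usual assumptions: the standard error is the sample standard deviation divided by the square root of n, and the degrees of freedom are n − 1. The function name, the sample and the hypothesized mean are made up.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)       # n - 1 denominator
    standard_error = sample_sd / math.sqrt(n)  # standard error of the mean
    t_stat = (sample_mean - hypothesized_mean) / standard_error
    return t_stat, n - 1

# Hypothetical ratings tested against H0: population mean = 5 (illustrative only).
sample = [4, 5, 6, 7, 8]
t_stat, dof = one_sample_t(sample, 5)
print(t_stat, dof)
# |t_stat| is then compared with the tabulated critical value, for example
# 2.776 for alpha = 0.05 (two-tailed) and 4 degrees of freedom.
```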
Similarly, the t-test can be applied to two independent samples. The hypotheses are taken as
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
The rest of the steps are the same as mentioned above.
Note: this form of the test can only be applied when the variances of the two populations are the same.
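For two independent samples with equal population variances, the t-stat is usually built on a pooled variance estimate. The sketch below assumes that standard pooled form with n1 + n2 − 2 degrees of freedom; the function name and the data are hypothetical.

```python
import math
import statistics

def pooled_two_sample_t(sample_1, sample_2):
    """Return (t statistic, degrees of freedom), assuming equal population variances."""
    n1, n2 = len(sample_1), len(sample_2)
    var_1 = statistics.variance(sample_1)  # sample variances (n - 1 denominator)
    var_2 = statistics.variance(sample_2)
    dof = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / dof
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t_stat = (statistics.mean(sample_1) - statistics.mean(sample_2)) / standard_error
    return t_stat, dof

# Hypothetical scores from two independent groups (illustrative only).
group_1 = [1, 2, 3, 4, 5]
group_2 = [3, 4, 5, 6, 7]
print(pooled_two_sample_t(group_1, group_2))
```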
Whether the two population variances are equal can be checked with another test called the F-test.
The hypotheses are taken as
H0 : σ1² = σ2²
H1 : σ1² ≠ σ2²
The F-stat is the ratio of the two sample variances, and the rest of the steps are done in a similar fashion to the t-test.
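The F-test compares the variances of two samples through their ratio. A minimal sketch, assuming the common convention of putting the larger sample variance in the numerator so that F is at least 1 (the function name and data are illustrative):

```python
import statistics

def f_statistic(sample_1, sample_2):
    """Return (F, numerator dof, denominator dof) with the larger variance on top."""
    var_1 = statistics.variance(sample_1)
    var_2 = statistics.variance(sample_2)
    if var_1 >= var_2:
        return var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1
    return var_2 / var_1, len(sample_2) - 1, len(sample_1) - 1

# Hypothetical samples (illustrative only).
print(f_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

The resulting F is compared with the critical value from an F table for the two degrees of freedom.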
If the samples are not independent but paired, a paired-sample t-test is performed. Here
the difference between the two observations in each pair is taken as D, and the t-stat is calculated from the mean and standard deviation of D.
DoF is calculated as n − 1, where n is the number of pairs.
The hypotheses then will be
H0 : µD = 0
H1 : µD ≠ 0
The critical value of t is found and the rest of the procedure is the same as for the t-test.
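For paired samples the calculation reduces to a one-sample t-test on the differences D. The sketch below assumes t = mean of D / (standard deviation of D / square root of n) with n − 1 degrees of freedom; the before and after scores are hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [b - a for a, b in zip(first, second)]  # D for each pair
    n = len(differences)
    mean_d = statistics.mean(differences)
    sd_d = statistics.stdev(differences)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical before/after ratings from the same five respondents.
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]
print(paired_t(before, after))
```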
Non-parametric tests:
One sample:
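KS one-sample test: compares the cumulative relative frequencies observed in the sample with those
expected under the null hypothesis. K is calculated as K = Max|Ai − Oi|, where Ai is the expected and
Oi the observed cumulative relative frequency for category i. The higher the value of K, the higher
the chance that the null hypothesis is false.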
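The K statistic can be sketched for ordered categories as follows. The sketch assumes Ai is the cumulative relative frequency expected under the null hypothesis and Oi the one observed in the sample; the counts and the equal-preference null distribution are hypothetical.

```python
from itertools import accumulate

def ks_statistic(observed_counts, expected_proportions):
    """K = max |Ai - Oi| over ordered categories.

    observed_counts: sample counts per category, in category order.
    expected_proportions: proportions under the null hypothesis, summing to 1.
    """
    n = sum(observed_counts)
    observed_cumulative = list(accumulate(count / n for count in observed_counts))
    expected_cumulative = list(accumulate(expected_proportions))
    return max(abs(a - o) for a, o in zip(expected_cumulative, observed_cumulative))

# Hypothetical counts over four ordered categories, tested against equal
# preference (25% per category) as the null distribution.
print(ks_statistic([10, 20, 30, 40], [0.25, 0.25, 0.25, 0.25]))
```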
Two independent samples:
If there are 2 independent samples, the Mann-Whitney U test is used.
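The chapter summary does not spell out the calculation, so the following is only a sketch of the commonly used rank-sum form of the U statistic: both samples are ranked together (ties get the average rank), and U1 = R1 − n1(n1 + 1)/2, where R1 is the sum of the ranks of the first sample. The function names and data are illustrative.

```python
def average_ranks(values):
    """1-based ranks of values, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(sample_1, sample_2):
    """Return the smaller of U1 and U2 for two independent samples."""
    n1, n2 = len(sample_1), len(sample_2)
    ranks = average_ranks(list(sample_1) + list(sample_2))
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Hypothetical samples (illustrative only).
print(mann_whitney_u([1, 4, 5], [2, 3, 6]))
```

A small U means the two samples barely overlap in rank, which counts as evidence against the null hypothesis that they come from the same population.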

Paired Samples:
For paired samples, the Wilcoxon matched-pairs signed-ranks test is used.
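As with the previous test, the summary gives no formula, so this is a sketch of the usual procedure under stated assumptions: pairs with zero difference are dropped, the absolute differences are ranked (ties get the average rank), and the ranks are summed separately for positive and negative differences. The function name and data are hypothetical.

```python
def wilcoxon_signed_rank_sums(first, second):
    """Return (sum of ranks of positive differences, sum of ranks of negative ones)."""
    # Differences for each pair; pairs with no change are dropped.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank for ties
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(rank for rank, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical before/after ratings from the same five respondents.
before = [10, 12, 9, 11, 13]
after = [12, 11, 12, 11, 17]
print(wilcoxon_signed_rank_sums(before, after))
```

The smaller of the two sums is the value normally compared with the tabulated critical value.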
