
Quantitative Data Analysis:

Hypothesis Testing

Florencia / 201650281
Mario Alexander Setiawan / 2016505..
Purpose of hypothesis testing -> determine accurately whether the null hypothesis can be rejected in favor of the alternate hypothesis.
Type I Errors, Type II Errors, and Statistical Power
Two kinds of errors

• Type I Error -> alpha (α): the probability of rejecting the null hypothesis when it is actually true.
• Type II Error -> beta (β): the probability of failing to reject the null hypothesis when the alternate hypothesis is actually true.

Example:
Ho: All honest people are good
Ha: Not all honest people are good
Type I Error: concluding that not all honest people are good when, in fact, all honest people are good (rejecting Ho incorrectly).
Type II Error: concluding that all honest people are good when, in fact, not all honest people are good (failing to reject Ho incorrectly).
Statistical Power

Statistical power depends on:
1. Alpha (α): the statistical significance criterion used in the test.
2. Effect size: the size of a difference or the strength of a relationship in the population.
3. Sample size: larger samples make the test more accurate and more powerful (see the sketch below).
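
As an illustration (not part of the original slides), the interplay of alpha, effect size, sample size, and power can be computed directly. This is a minimal sketch assuming Python with the statsmodels package; the effect size of 0.5 and the other numbers are illustrative values, not taken from the source.

```python
# A minimal power-analysis sketch (assumes Python with statsmodels installed);
# the effect size, alpha, and sample sizes are illustrative values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (d = 0.5)
# at alpha = 0.05 with 80% power.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required sample size per group: {n_per_group:.1f}")

# Power achieved with only 30 observations per group.
achieved_power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"Power with n = 30 per group: {achieved_power:.2f}")
```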
Choosing the Appropriate Statistical Technique
The choice of the appropriate statistical technique depends on:
a. The number of variables (dependent/independent)
b. The scale of measurement (metric/nonmetric)

Statistical techniques are divided into two categories:
a. Univariate techniques
b. Multivariate techniques
Univariate Techniques
1. Testing a hypothesis about a single mean

• One-sample t-test: tests whether the mean of a single sample differs from a known comparison standard (the hypothesized population mean); see the sketch below.
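
As an illustration (not part of the original slides), a minimal one-sample t-test sketch, assuming Python with scipy; the sample values and the comparison standard of 50 are made up.

```python
# A minimal one-sample t-test sketch (assumes Python with scipy installed);
# the sample values and the comparison standard of 50 are made up.
from scipy import stats

sample = [52, 48, 55, 60, 47, 53, 58, 49, 51, 56]
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```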


2. Testing hypotheses about two related means

• Paired-samples t-test: examines differences in the same group before and after a treatment.
• Wilcoxon signed-rank test: a nonparametric test for examining significant differences between two samples or repeated measurements on a single sample.
• McNemar's test: examines the significance of the difference between two dependent samples when the variable of interest is dichotomous.

The first two tests are illustrated in the sketch below.
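
As an illustration (not part of the original slides), a minimal related-samples sketch assuming Python with scipy; the before/after scores are made up.

```python
# A minimal related-samples sketch (assumes Python with scipy installed);
# the before/after scores are made up.
from scipy import stats

before = [72, 68, 75, 80, 66, 70, 74, 69]
after = [75, 70, 79, 83, 69, 72, 78, 70]

t_stat, p_t = stats.ttest_rel(before, after)   # paired-samples t-test
w_stat, p_w = stats.wilcoxon(before, after)    # Wilcoxon signed-rank test
print(f"Paired t-test: t = {t_stat:.2f}, p = {p_t:.3f}")
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {p_w:.3f}")
```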
3. Testing hypotheses about two unrelated means

• Independent-samples t-test: examines whether there is a significant difference in the means of two groups on the variable of interest (see the sketch below).
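
As an illustration (not part of the original slides), a minimal independent-samples t-test sketch assuming Python with scipy; the two groups' scores are made up.

```python
# A minimal independent-samples t-test sketch (assumes Python with scipy
# installed); the two groups' scores are made up.
from scipy import stats

group_a = [23, 27, 25, 30, 28, 26, 24, 29]
group_b = [20, 22, 25, 21, 23, 19, 24, 22]
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```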
4. Testing hypotheses about several means

• Analysis of variance (ANOVA): examines significant mean differences among more than two groups on an interval- or ratio-scaled dependent variable (see the sketch below).
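
As an illustration (not part of the original slides), a minimal one-way ANOVA sketch assuming Python with scipy; the three groups' scores are made up.

```python
# A minimal one-way ANOVA sketch for more than two groups (assumes Python
# with scipy installed); the group scores are made up.
from scipy import stats

group_1 = [12, 15, 14, 16, 13]
group_2 = [18, 20, 17, 19, 21]
group_3 = [14, 13, 15, 16, 12]
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```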
Multivariate Techniques

Regression Analysis

• Multiple regression analysis: uses more than one independent variable to explain variance in the dependent variable.

Standardized regression coefficients
Standardized regression coefficients (or beta coefficients) are the estimates resulting from a multiple regression analysis performed on variables that have been standardized (a process whereby the variables are transformed into variables with a mean of 0 and a standard deviation of 1). They allow the researcher to compare the relative effects of the independent variables on the dependent variable when the independent variables are measured in different units.

For example, let's say your model involves how parents' income and education level affect their offspring's lifetime earnings. Parents' income is measured in dollars and education level is measured in years of school. Standardizing these variables means that they can be compared to each other in the model. Say income has a standardized beta coefficient of .2 and education level has a beta of .34. The model then shows that with every increase of one standard deviation in parents' income, an offspring's earnings rise by .2 standard deviations, assuming the other variable (education level) is held constant. With an increase of one standard deviation in education level, earnings rise by .34 standard deviations, assuming parents' income is held constant. The sketch below reproduces this idea on simulated data.
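
As an illustration (not part of the original slides), a minimal standardized-regression sketch assuming Python with pandas and statsmodels; the variable names and the simulated data are made up for demonstration.

```python
# A minimal standardized-regression sketch (assumes Python with pandas and
# statsmodels installed); the data are simulated for illustration only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "parent_income": rng.normal(60_000, 15_000, 200),   # dollars
    "education_years": rng.normal(14, 2, 200),          # years of school
})
df["offspring_earnings"] = (
    0.5 * df["parent_income"] + 8_000 * df["education_years"]
    + rng.normal(0, 20_000, 200)
)

# Standardize every variable (mean 0, standard deviation 1), then fit OLS;
# the resulting coefficients are the beta (standardized) coefficients.
z = (df - df.mean()) / df.std()
X = sm.add_constant(z[["parent_income", "education_years"]])
betas = sm.OLS(z["offspring_earnings"], X).fit().params
print(betas)
```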
Regression with dummy variables
A dummy variable is a variable that has two or more distinct levels, which are coded 0 or 1.

Dummy variables allow us to use nominal or ordinal variables as independent variables to explain, understand, or predict the dependent variable.

Example: whether a student joins an extra course (qualitative) is transformed into a dummy variable used to explain the student's score (quantitative):
• D = 1 (joins the extra course)
• D = 0 (does not join the extra course)
A regression sketch using this dummy follows below.
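
As an illustration (not part of the original slides), a minimal dummy-variable regression sketch assuming Python with pandas and statsmodels; the scores are simulated for demonstration.

```python
# A minimal dummy-variable regression sketch (assumes Python with pandas and
# statsmodels installed); the scores are simulated for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"extra_course": rng.integers(0, 2, 100)})  # D = 1 join, D = 0 not
df["score"] = 60 + 8 * df["extra_course"] + rng.normal(0, 5, 100)

# The coefficient on the dummy is the estimated score difference between
# students who joined the extra course and those who did not.
model = smf.ols("score ~ extra_course", data=df).fit()
print(model.params)
```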
Multicollinearity
Multicollinearity is an often-encountered statistical phenomenon in which two or more independent variables in a multiple regression model are highly correlated.

Ways to detect multicollinearity:
1. Check the correlation matrix for the independent variables. The presence of high correlations (commonly 0.7 or more) is a first sign of sizeable multicollinearity.
2. Use tolerance and the variance inflation factor (VIF), produced by SPSS. A common cutoff is a tolerance value of 0.10, which corresponds to a VIF of 10:
• Tolerance value > 0.10: no multicollinearity
• Tolerance value < 0.10: multicollinearity
• VIF > 10: multicollinearity
• VIF < 10: no multicollinearity
A sketch of the VIF computation follows below.
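
As an illustration (not part of the original slides), a minimal VIF and tolerance computation assuming Python with pandas and statsmodels; the predictors are simulated, with one deliberately made collinear.

```python
# A minimal VIF/tolerance sketch (assumes Python with pandas and statsmodels
# installed); the predictors are simulated, with x2 deliberately collinear with x1.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": 0.9 * x1 + rng.normal(scale=0.3, size=200),  # highly correlated with x1
    "x3": rng.normal(size=200),
})

Xc = sm.add_constant(X)  # compute VIF on a design matrix that includes a constant
for i, col in enumerate(Xc.columns):
    if col == "const":
        continue
    vif = variance_inflation_factor(Xc.values, i)
    print(f"{col}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```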

Methods to reduce multicollinearity:

1. Reduce the set of independent variables to a set that is not collinear (this may lead to omitted variable bias).
2. Use more sophisticated ways to analyze the data, such as ridge regression.
3. Create a new variable that is a composite of the highly correlated variables.
Other multivariate tests and analyses

1. Discriminant Analysis
Helps to identify the independent variables that discriminate a nominally scaled dependent variable of interest.

2. Logistic Regression
Logistic regression is also used when the dependent variable is nonmetric (qualitative).
• Often preferred when the dependent variable has only two groups (e.g., yes/no, good/bad).
• Allows the researcher to predict a discrete outcome, such as "will purchase the product / will not purchase the product," from a set of variables that may be continuous, discrete, or dichotomous.
A logistic-regression sketch follows at the end of this section.

3. Conjoint Analysis
Is a statistical technique that is used in many fields, including marketing, product management, and operations research. In marketing, conjoint analysis is used to understand how consumers develop preferences for products or services.

Example of a conjoint analysis question:

Attribute        Phone X          Phone Y
Memory           12 GB            16 GB
Battery life     24 hours         12 hours
Camera           8 megapixels     16 megapixels
Price            $249             $319

Would you choose Phone X or Phone Y?

Conjoint analysis takes these attribute and level descriptions of products (memory, etc.) and uses them by asking participants to make choices between products. By collecting enough choices, it is possible to establish how important each of the levels is relative to the others, known as its utility level.
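
As an illustration (not part of the original slides), a minimal logistic-regression sketch for a two-group outcome, assuming Python with scikit-learn; the predictors and outcome are simulated.

```python
# A minimal logistic-regression sketch for a two-group (yes/no) outcome
# (assumes Python with scikit-learn installed); the data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))  # e.g. two continuous predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)  # 1 = will purchase

model = LogisticRegression().fit(X, y)
print("Coefficients:", model.coef_)
print("Predicted purchase probability for one new case:",
      model.predict_proba([[0.4, -1.2]])[0, 1])
```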
Other multivariate tests and analyses

4. Two-way ANOVA
Can be used to examine the effect of two nonmetric independent variables on a single metric dependent variable. Two-way ANOVA enables us to examine not only main effects (the effects of the independent variables on the dependent variable) but also interaction effects that exist between the independent variables (or factors).
Example: the effects of gender and experience (factors) on salaries (dependent variable), as in the sketch below.
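
As an illustration (not part of the original slides), a minimal two-way ANOVA sketch with main and interaction effects, assuming Python with pandas and statsmodels; the salary data are simulated.

```python
# A minimal two-way ANOVA sketch with main and interaction effects
# (assumes Python with pandas and statsmodels installed); data are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "gender": rng.choice(["male", "female"], 120),
    "experience": rng.choice(["low", "high"], 120),
})
df["salary"] = (40_000 + 5_000 * (df["experience"] == "high")
                + 2_000 * (df["gender"] == "male") + rng.normal(0, 3_000, 120))

# C(...) marks the factors as categorical; '*' adds main effects and interaction.
model = smf.ols("salary ~ C(gender) * C(experience)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```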

5. MANOVA
Is similar to ANOVA, with the difference that ANOVA tests the mean differences of more than two groups on one dependent variable, whereas MANOVA tests mean differences among groups across several dependent variables simultaneously, by using sums of squares and cross-product matrices. A sketch follows below.
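
As an illustration (not part of the original slides), a minimal MANOVA sketch testing group differences on two dependent variables at once, assuming Python with pandas and statsmodels; the data are simulated.

```python
# A minimal MANOVA sketch: group differences across two dependent variables
# at once (assumes Python with pandas and statsmodels installed); data simulated.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(5)
df = pd.DataFrame({"group": rng.choice(["A", "B", "C"], 150)})
shift = df["group"].map({"A": 0.0, "B": 0.5, "C": 1.0})
df["dv1"] = shift + rng.normal(size=150)
df["dv2"] = 2 * shift + rng.normal(size=150)

manova = MANOVA.from_formula("dv1 + dv2 ~ group", data=df)
print(manova.mv_test())
```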

6. Canonical Correlation
Examines the relationship between two or more dependent variables and several independent variables. For example, one set of variables (quality of work, output, rejection rate) can be related to a second set (engrossment in work, timely completion of work, number of absences), as in the sketch below.
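
As an illustration (not part of the original slides), a minimal canonical-correlation sketch relating one set of variables to another, assuming Python with scikit-learn; the two variable sets are simulated.

```python
# A minimal canonical-correlation sketch relating one set of variables to
# another (assumes Python with scikit-learn installed); data are simulated.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))   # e.g. quality of work, output, rejection rate
Y = X @ rng.normal(size=(3, 3)) + rng.normal(scale=0.5, size=(100, 3))  # second set

cca = CCA(n_components=2).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)
for i in range(2):
    r = np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1]
    print(f"Canonical correlation {i + 1}: {r:.2f}")
```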
In sum

Several univariate, bivariate, and multivariate techniques are available to analyze sample data. Using these techniques allows us to generalize the results obtained from the sample to the population at large.

As explained in this chapter, the choice of statistical technique depends on the number of variables you are examining, on the scale of measurement of your variables, on whether the assumptions of parametric tests are met, and on the size of your sample.
DATA WAREHOUSING, DATA MINING, AND OPERATIONS RESEARCH

Data Warehousing: The process of extracting, transferring, and integrating data spread across multiple external databases and even operating systems, with a view to facilitating analysis and decision making.

Data Mining: Data mining more effectively leverages the data warehouse by identifying hidden relations and patterns in the data stored in it.

Operations Research: Tools to simplify and thus clarify certain types of complex problems that lend themselves to quantification.
SOME SOFTWARE PACKAGES USEFUL FOR DATA ANALYSIS

LISREL: Is designed to estimate and test structural equation models.

MATLAB: Is a computer program that was originally designed to simplify the implementation of numerical linear algebra routines.

Mplus: Is a statistical modeling program that offers researchers a wide choice of models, estimators, and algorithms.

Qualtrics: Allows users to do many kinds of online data collection and data analysis, including employee evaluations, website feedback, marketing research, and customer satisfaction and loyalty research.
SOME SOFTWARE PACKAGES USEFUL FOR DATA ANALYSIS

SAS: Is an integrated system of software products, capable of performing a broad range of statistical analyses such as descriptive statistics, multivariate techniques, and time series analyses.

SPSS: Is a data management and analysis program designed to do statistical data analysis, including descriptive statistics such as plots, frequencies, charts, and lists, as well as sophisticated statistical procedures like ANOVA, factor analysis, cluster analysis, and categorical data analysis.

Stata: Is a general-purpose statistical software package that supports various statistical and econometric methods, graphics, and enhanced features for data manipulation, programming, and matrix manipulation.

SPSS AMOS: Is designed to estimate and test structural equation models.
