Statistical Tools in Research

Statistical Tools used in Research
Submitted to: Dr. Bhagwan Singh
Submitted by: Subhrat Sharma, CUHP13MBA85
Central University of Himachal Pradesh

Contents
• Statistics Defined
• Correlation
• Hypothesis
• Hypothesis: Alpha
• Hypothesis: Beta
• Chi-Square Test of Independence
• Regression analysis
• Factor Analysis
• References
Statistics is the science and practice of developing human knowledge through the use of
empirical data expressed in quantitative form. It is based on statistical theory which is a branch
of applied mathematics. Within statistical theory, randomness and uncertainty are modelled by
probability theory. (Wikipedia Encyclopaedia)
What is statistics?
• The collecting, summarizing, and analysing of data.
• The term also refers to raw numbers, or “stats”, and to the summarization of data.
Example: Frequencies
Correlation allows an examination of the relationship between variables: is there a relationship between these variables? Are they positively or negatively related?
A correlation coefficient of 0 means that there is no linear relationship between the variables, -1 a perfect negative relationship, and +1 a perfect positive relationship.
Important: Correlation is not causation. A correlation does not mean that one thing causes the other; there could be other reasons the data show a good correlation.
Ex. What is the relationship between exercise and depression?

• Does depression increase when exercise increases?
• Does depression decrease when exercise increases?
• Is there no significant correlation between exercise and depression?
• Correlation is positive when the values increase together, and
• Correlation is negative when one value decreases as the other increases.
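The exercise and depression question can be made concrete with a small calculation. The following is a minimal sketch that computes the Pearson correlation coefficient by hand; the weekly exercise hours and depression scores are made-up illustrative numbers, not data from any study, and the function name `pearson_r` is an assumption of this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: weekly exercise hours and a depression score
exercise_hours = [0, 1, 2, 3, 4, 5, 6]
depression_score = [22, 20, 19, 15, 14, 11, 9]

r = pearson_r(exercise_hours, depression_score)
print(f"r = {r:.3f}")  # a value close to -1 indicates a strong negative relationship
```

In this invented data set the score falls as exercise rises, so r is close to -1. Even so, the coefficient alone says nothing about whether exercise causes the change.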
Null hypothesis: A hypothesis put forward to argue that a relationship or pattern does not exist.
• Cholesterol study example: In a Randomized Control Trial, the control group and the treatment group have equal levels of cholesterol at the end of a study.
• Null hypothesis: Groups A and B are equal.
• Denoted by H0.
Alternative hypothesis: A statement of what the study is set to establish.
• Alternative hypothesis: Groups A and B have different levels of cholesterol.
• Denoted by H1.
The null hypothesis is retained (not rejected) if the findings are not significant.
The null hypothesis is rejected if the findings are significant.

Alpha level, or significance level, is the threshold chosen by the researcher for deciding whether to reject or retain the null hypothesis. It is a pre-determined value, not calculated.
• In other words, if we select an alpha of .05, findings would be deemed statistically significant if their p-value were found to be .05 or less.
What does this mean?

• Alpha indicates the probability that the null hypothesis will be rejected when it is true (in other words, the null hypothesis is wrongly rejected).
This is called a Type 1 error or alpha error.
E.g. in a trial of new Drug X, the null hypothesis might be that the new Drug X is no better than the current Drug Y.
• H0: there is no difference between Drug X and Drug Y.
• A Type 1 error would occur if we concluded that the two drugs produced different effects when there was no difference between them.
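The meaning of alpha as a Type 1 error rate can be illustrated by simulation. The sketch below repeatedly draws samples from a population in which the null hypothesis really is true and counts how often a two-sided z-test rejects it at alpha = .05. The sample size, number of simulated studies, known standard deviation, and the helper `z_test_p_value` are all assumptions made for this illustration.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05        # significance level chosen in advance
N_PER_STUDY = 30    # hypothetical sample size per simulated study
N_STUDIES = 2000    # hypothetical number of simulated studies

rng = random.Random(0)  # fixed seed so the run is repeatable
std_normal = NormalDist()

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - std_normal.cdf(abs(z)))

false_rejections = 0
for _ in range(N_STUDIES):
    # H0 is true here: the data really come from a mean-0 population
    sample = [rng.gauss(0.0, 1.0) for _ in range(N_PER_STUDY)]
    if z_test_p_value(sample) <= ALPHA:
        false_rejections += 1  # a Type 1 error

type1_rate = false_rejections / N_STUDIES
print(f"Observed Type 1 error rate: {type1_rate:.3f} (alpha = {ALPHA})")
```

Under these assumptions the observed rejection rate should land near the chosen alpha, which is what "the probability of wrongly rejecting a true null hypothesis" means in practice.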
Type 2 error is failing to detect an association when one exists, or failing to reject the null hypothesis when it is actually false.
• You kept the null hypothesis when you should not have.
• E.g. Drug X and Drug Y produced different effects, and it was concluded that they produce the same effects.
Beta is the probability of making a Type 2 error when testing a hypothesis.
• The chi-square test for independence is applied when you have two qualitative (categorical) variables from a single population.
• It is used to determine whether there is a significant association between the two variables.
• For example, in an election survey, voters might be classified by gender (male or female) and voting preference (BJP, Congress or AAP).
• We could use a chi-square test for independence to determine whether gender is related to voting preference.
Voting Preferences
               BJP    Congress   AAP    Row total
Male           200    150        50     400
Female         250    300        50     600
Column total   450    450        100    1000
When to Use Chi-Square Test for Independence
The test procedure is appropriate when the following conditions are met:
• The sampling method is simple random sampling.
• Each population is at least 10 times as large as its respective sample.
• The variables under study are each categorical.
• If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
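As a worked illustration, the test can be applied to the voting-preference counts given earlier. The sketch below uses the standard formulas: the expected count for a cell is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The p-value line relies on a shortcut that is valid only for 2 degrees of freedom, as noted in the comments; the variable names are assumptions of this example.

```python
import math

# Observed counts from the voting-preference table
observed = {
    "Male":   {"BJP": 200, "Congress": 150, "AAP": 50},
    "Female": {"BJP": 250, "Congress": 300, "AAP": 50},
}
parties = ["BJP", "Congress", "AAP"]

row_totals = {g: sum(counts.values()) for g, counts in observed.items()}
col_totals = {p: sum(observed[g][p] for g in observed) for p in parties}
grand_total = sum(row_totals.values())

chi_square = 0.0
min_expected = float("inf")
for g in observed:
    for p in parties:
        # Expected count if gender and preference were independent
        expected = row_totals[g] * col_totals[p] / grand_total
        min_expected = min(min_expected, expected)
        chi_square += (observed[g][p] - expected) ** 2 / expected

df = (len(observed) - 1) * (len(parties) - 1)

# Shortcut valid ONLY for df == 2: the upper-tail probability is exp(-x / 2).
# For other degrees of freedom a statistics library would be needed.
assert df == 2
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}")
print(f"smallest expected count = {min_expected:.0f}")
```

With these counts the statistic is about 16.2 on 2 degrees of freedom and the p-value is about 0.0003, so at an alpha of .05 the null hypothesis of independence would be rejected. The smallest expected count is 40, which satisfies the "at least 5" condition.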
Regression analysis is used when you want to predict a continuous dependent variable from a number
of independent variables.
Regression analysis is widely used for prediction and forecasting.
Classical assumptions for regression analysis include:

• The sample is representative of the population for the inference or prediction.
• The error is a random variable with a mean of zero conditional on the explanatory variables.
• The independent variables are measured with no error.
• The predictors are linearly independent, i.e. it is not possible to express any predictor as a linear combination of the others.
• The errors are uncorrelated, that is, the variance–covariance matrix of the errors is diagonal and each non-zero element is the
variance of the error.
• The variance of the error is constant across observations (homoscedasticity). If not, weighted least squares or other methods
might instead be used.
Illustration of linear regression on a data set.
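A minimal sketch of the idea, restricted to a single independent variable for brevity: ordinary least squares chooses the slope and intercept that minimise the sum of squared errors. The data points and the helper name `fit_line` are invented for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: one predictor x and a continuous outcome y
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)

# Residuals are the observed values minus the fitted values
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Prediction for a new, unseen value of the predictor
predicted_at_6 = intercept + slope * 6

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"prediction at x = 6: {predicted_at_6:.2f}")
```

When an intercept is fitted, the residuals sum to zero in the sample, which mirrors the zero-mean error assumption listed above. Checking the other assumptions (constant variance, uncorrelated errors) needs residual plots or further tests that this sketch does not cover.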
Factor analysis is a useful tool for investigating variable relationships for complex concepts such as socioeconomic status, dietary patterns, or psychological scales.
It allows researchers to investigate concepts that are not easily measured directly by collapsing a large number of variables into a few interpretable underlying factors.
What is a factor?
• The key concept of factor analysis is that multiple observed variables have similar patterns of responses because they are all associated with a latent (i.e. not directly measured) variable.
• For example, people may respond similarly to questions about income, education, and occupation, which are all associated with the latent variable socioeconomic status.
Indicators of wealth, with six variables and two resulting factors.
Variables                                             Factor 1   Factor 2
Income                                                0.65       0.11
Education                                             0.59       0.25
Occupation                                            0.48       0.19
House value                                           0.38       0.60
Number of public parks in neighbourhood               0.13       0.57
Number of violent crimes per year in neighbourhood    0.23       0.55
• The variable with the strongest association to the underlying latent variable, Factor 1, is income, with a factor loading of 0.65.
• Since factor loadings can be interpreted like standardized regression coefficients,
one could also say that the variable income has a correlation of 0.65 with Factor 1.
This would be considered a strong association for a factor analysis in most research
fields.
• Two other variables, education and occupation, are also associated with Factor 1.
Based on the variables loading highly onto Factor 1, we could call it “Individual
socioeconomic status.”
• Notice that the variable house value also is marginally important in Factor 1
(loading = 0.38). This makes sense, since the value of a person’s house should be
associated with his or her income.
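The reading of the loadings described here can be sketched in a few lines. The example below does not perform a factor analysis; it only takes the six pairs of loadings from the indicators-of-wealth table, assigns each variable to the factor on which it loads most strongly, and flags variables that load noticeably on both. The 0.3 cut-off for "noticeably" is an assumption chosen for illustration, not a rule taken from the source.

```python
# Factor loadings copied from the indicators-of-wealth table: (Factor 1, Factor 2)
loadings = {
    "Income": (0.65, 0.11),
    "Education": (0.59, 0.25),
    "Occupation": (0.48, 0.19),
    "House value": (0.38, 0.60),
    "Number of public parks in neighbourhood": (0.13, 0.57),
    "Number of violent crimes per year in neighbourhood": (0.23, 0.55),
}

CROSS_LOADING_CUTOFF = 0.3  # hypothetical threshold, chosen for illustration

# Assign each variable to the factor with the larger loading
dominant_factor = {
    variable: 1 if f1 >= f2 else 2
    for variable, (f1, f2) in loadings.items()
}

# Variables whose loadings on BOTH factors reach the cut-off
cross_loading = [
    variable
    for variable, (f1, f2) in loadings.items()
    if min(f1, f2) >= CROSS_LOADING_CUTOFF
]

for variable, factor in dominant_factor.items():
    print(f"{variable}: Factor {factor}")
print("Loads on both factors:", cross_loading)
```

Under this cut-off, income, education and occupation group on Factor 1, the remaining three variables group on Factor 2, and house value is the only variable flagged as loading on both, which matches the interpretation given in the text.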
References
• http://dss.princeton.edu/online_help/analysis/regression_intro.htm
• http://stattrek.com/chi-square-test/independence.aspx
• http://www.statsoft.com/Textbook/Principal-Components-Factor-Analysis
• http://www.theanalysisfactor.com/factor-analysis-1-introduction/
• http://mathworld.wolfram.com/HypothesisTesting.html
• https://www.mathsisfun.com/data/correlation.html
