Hypothesis testing is a formal procedure for investigating our ideas about the world using

statistics. It is most often used by scientists to test specific predictions, called hypotheses, that
arise from theories.

There are 5 main steps in hypothesis testing:


1. State your research hypothesis as a null hypothesis (H0) and an alternate hypothesis (Ha or H1).
2. Collect data in a way designed to test the hypothesis.
3. Perform an appropriate statistical test.
4. Decide whether to reject or fail to reject your null hypothesis.
5. Present the findings in your results and discussion section.

Step 1: State your null and alternate hypothesis


- After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.
- The alternate hypothesis is usually your initial hypothesis that predicts a relationship between
variables. The null hypothesis is a prediction of no relationship between the variables you are
interested in.

Step 2: Collect data


- For a statistical test to be valid, it is important to perform sampling and collect data in a way
that is designed to test your hypothesis. If your data are not representative, then you cannot
make statistical inferences about the population you are interested in.

Step 3: Perform a statistical test


- There are a variety of statistical tests available, but they are all based on the comparison of
within-group variance (how spread out the data is within a category) versus between-group
variance (how different the categories are from one another).
- If the between-group variance is large enough that there is little or no overlap between groups,
then your statistical test will reflect that by showing a low p-value. This means it is unlikely that
the differences between these groups came about by chance.
- Alternatively, if there is high within-group variance and low between-group variance, then your
statistical test will reflect that with a high p-value. This means it is likely that any difference you
measure between groups is due to chance.
- Your choice of statistical test will be based on the type of variables and the level of
measurement of your collected data.
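
As an illustration of this step, here is a minimal sketch (not part of the original text) of how one such test, an independent two-sample t-test, might be run in Python with SciPy; the group names and measurements are hypothetical.

```python
# Hedged sketch: a two-sample t-test comparing two hypothetical groups with SciPy.
from scipy import stats

treatment = [23.1, 25.4, 26.8, 24.9, 27.2, 25.7]  # hypothetical measurements
control = [21.0, 22.3, 20.8, 23.5, 21.9, 22.7]    # hypothetical measurements

# ttest_ind weighs the between-group difference in means against the within-group spread.
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A low p-value suggests the difference between the groups is unlikely to be due to chance.
```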

Step 4: Decide whether to reject or fail to reject your null hypothesis


- Based on the outcome of your statistical test, you will decide whether to reject or fail to reject your null hypothesis. In most cases you will use the p-value from your test as a guide, comparing it against a predetermined significance level (often 0.05).

Step 5: Present your findings


- The results of hypothesis testing will be presented in the results and discussion sections of
your research paper, dissertation or thesis.
- In the results section you should give a brief summary of the data and a summary of the
results of your statistical test (for example, the estimated difference between group means and
associated p-value). In the discussion, you can discuss whether your initial hypothesis was
supported by your results or not.

6 Common Statistical Tools


1. Mean
The mean is a common measure of central tendency. In statistics, the arithmetic mean is the sum of all observations divided by the number of observations, and it represents the central value of a data set or distribution through a single number. To calculate it, simply add up the values in the data set and divide by the total number of values.

The formula for mean is as follows:


Mean = (sum of all values) / (total number of values in the data set)
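
To make the formula concrete, here is a minimal sketch in Python using a small hypothetical data set.

```python
# Minimal sketch: arithmetic mean of a hypothetical data set.
values = [4, 8, 6, 5, 7]

mean = sum(values) / len(values)  # (sum of all values) / (total number of values)
print(mean)  # 6.0
```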

2. Standard deviation
Standard deviation is a measure of the spread of a data set. The variance of a data set is the average squared difference between each data value and the mean; the standard deviation is the square root of the variance and describes how the data values spread around the mean.

The formula for standard deviation is:


σ = √( ∑(x − x̄)² / n )

The symbol for standard deviation is "σ." In the formula, "∑" stands for summation, "x" is each value in the data set, "x̄" is the mean of the data set, and "n" is the number of data points in the population.
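
The following minimal sketch applies the formula above to a small hypothetical data set (dividing by n, i.e. treating the data as a population).

```python
# Minimal sketch: population standard deviation following the formula above.
import math

values = [4, 8, 6, 5, 7]
n = len(values)
mean = sum(values) / n

variance = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
sigma = math.sqrt(variance)                          # standard deviation
print(round(sigma, 3))  # 1.414
```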

3. Hypothesis testing
A hypothesis is a claim or assumption about a data set. Hypothesis testing is a standard process for drawing conclusions about a population parameter or a population probability distribution. A common example is the t-test, which is useful for comparing two sets of random variables within a data set. Hypothesis testing compares the data against various hypotheses and assumptions and assists in forecasting and decision-making.

For example, to test whether a proportion P equals 0.5, the hypotheses would be stated as:


H0: P = 0.5
H1: P ≠ 0.5

Researchers interpret statistical hypothesis test results to make a specific claim. They measure the p-value, which is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. The null hypothesis is the basic assumption that researchers make when starting a test or experiment and accept or reject according to the findings. They test the alternative hypothesis, which is the opposite of the null hypothesis, to determine whether the null hypothesis is true. If they find adequate evidence that the null hypothesis is not true, they reject it in favour of the alternative hypothesis.
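
As a hedged illustration of the H0: P = 0.5 example above, the sketch below runs a two-sided binomial test on hypothetical data (42 successes out of 100 trials), assuming SciPy 1.7+ where stats.binomtest is available.

```python
# Hedged sketch: two-sided binomial test of H0: P = 0.5 on hypothetical counts.
from scipy import stats

result = stats.binomtest(k=42, n=100, p=0.5, alternative="two-sided")
print(f"p-value = {result.pvalue:.4f}")

# If the p-value falls below the chosen significance level (commonly 0.05),
# we reject H0: P = 0.5 in favour of H1: P != 0.5; otherwise we fail to reject H0.
```
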
4. Regression
Regression describes the relationship between a dependent variable and an independent variable. Researchers and statisticians use it to explain how one variable influences another, or how changes in one variable trigger changes in another. Regression graphs and charts can help show the relationships between the variables and trends over a specific period of time.
The formula for regression is:

Y = a + b(x) + ε

In this formula, "Y" is the dependent variable, "x" is the independent variable, "a" stands for the intercept, "b" is the slope, and "ε" denotes the regression residual (error term).

5. Sample size determination


In statistics, it is critical to determine the right size of the sample to get accurate results and
predictions. In most cases, businesses have a large amount of data to process and analyze and
may study only a part of it in greater detail. Statisticians determine the correct sample size by
considering factors like cost, time, or convenience.

There is no single formula for sample size determination. Researchers and statisticians design special data collection and sampling methods according to the study and the population size. For a generic study, using a published sample size table can be helpful.

6. Variance
Variance in statistics refers to the expected squared deviation of the values in a data set from their mean. Businesses use it to measure the volatility of the market and the stability of a specific investment's return within a period. It is useful for analysing data mathematically, but because variance is expressed in squared units, you may have to take the square root of the sample variance (the standard deviation) to interpret the spread on the original scale. In short, variance measures the variability of the data around the average.

To calculate it, take the difference between each number in the data set and the mean, then square each difference to make it positive. Finally, divide the sum of the squares by one less than the number of values (for a sample). The formula for the sample variance is:

σ² = ∑(x − x̄)² / (n − 1)

The symbol "σ²" illustrates variance, while its square root, the standard deviation, is "σ." In this
formula, "x" represents the ith data point, "xˉ" is the mean value of all data points, "n" denotes
the number of data points.
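
Here is a minimal sketch applying the sample-variance formula above to a small hypothetical data set.

```python
# Minimal sketch: sample variance following the formula above (divide by n - 1).
values = [4, 8, 6, 5, 7]
n = len(values)
mean = sum(values) / n

sample_variance = sum((x - mean) ** 2 for x in values) / (n - 1)
print(round(sample_variance, 3))         # 2.5

# The square root gives the sample standard deviation on the original scale.
print(round(sample_variance ** 0.5, 3))  # 1.581
```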

STATISTICAL TOOLS IN RESEARCH


1. Mean: The arithmetic mean, more commonly known as the average, is the sum of a list of numbers divided by the number of items on the list. The mean is useful in determining the overall trend of a data set or providing a rapid snapshot of your data. Another advantage of the mean is that it is very easy and quick to calculate.
2. Standard Deviation: The standard deviation, often represented with the Greek letter sigma (σ), is the measure of the spread of data around the mean. A high standard deviation signifies that data are spread more widely from the mean, whereas a low standard deviation signals that more data points lie close to the mean.
3. Regression: Regression models the relationships between dependent and explanatory
variables, which are usually charted on a scatterplot.
- The regression line also designates whether those relationships are strong or
weak.
- Regression is commonly taught in high school or college statistics courses with
applications for science or business in determining trends over time.
4. One sample t-test: A one sample t-test allows us to test whether a sample mean (of a normally distributed interval variable) significantly differs from a hypothesized value.
- For example, if the mean writing score of a particular sample of students is statistically significantly different from (and above) the hypothesized test value, we would conclude that this group of students has a significantly higher mean on the writing test than the given value (a code sketch after this list shows how such a test might be run).
5. One sample median test: A one sample median test allows us to test whether a sample
median differs significantly from a hypothesized value.
6. Binomial test: A one sample binomial test allows us to test whether the proportion of
successes on a two-level categorical dependent variable significantly differs from a
hypothesized value.
7. Chi-square goodness of fit: A chi-square goodness of fit test allows us to test whether
the observed proportions for a categorical variable differ from hypothesized proportions.
8. Wilcoxon-Mann-Whitney test: The Wilcoxon-Mann-Whitney test is a non-parametric
analog to the independent samples t-test and can be used when you do not assume that
the dependent variable is a normally distributed interval variable (you only assume that
the variable is at least ordinal).
9. Chi-square test: A chi-square test is used when you want to see if there is a
relationship between two categorical variables.
10. Fisher's exact test: The Fisher's exact test is used when one or more cells of a contingency table have an expected frequency of five or less.
11. One-way ANOVA: A one-way analysis of Variance (ANOVA) is used when you have a
categorical independent variable (with two or more categories) and a normally distributed
interval dependent variable and you wish to test for differences in the means of the
dependent variable broken down by the levels of the independent variable.
12. Kruskal Wallis test: The Kruskal Wallis test is used when you have one independent variable with two or more levels and an ordinal dependent variable. In other words, it is the non-parametric version of ANOVA and a generalized form of the Mann-Whitney test, since it permits two or more groups.
13. Paired t-test: A paired (samples) t-test is used when you have two related observations
(i.e. two observations per subject) and you want to see if the means on these two
normally distributed interval variables differ from one another.
14. Wilcoxon signed rank sum test: The Wilcoxon signed rank sum test is the
non-parametric version of a paired samples t-test. You use the Wilcoxon signed rank
sum test when you do not wish to assume that the difference between the two variables
is interval and normally distributed (but you do assume the difference is ordinal).
15. McNemar test: You would perform McNemar test if you were interested in the marginal
frequencies of two binary outcomes. These binary outcomes may be the same outcome
variable on matched pairs (like a case-control study) or two outcome variables from a
single group.
16. One-way repeated measures ANOVA: You would perform a one-way repeated
measures analysis of variance if you had one categorical independent variable and a
normally distributed interval dependent variable that was repeated at least twice for each
subject. This is the equivalent of the paired samples t-test, but allows for two or more
levels of the categorical variable. This tests whether the mean of the dependent variable
differs by the categorical variable.
17. Repeated measures logistic regression: If you have a binary outcome measured
repeatedly for each subject and you wish to run a logistic regression that accounts for
the effect of these multiple measures from each subject, you can perform a repeated
measures logistic regression.
18. Factorial ANOVA: A factorial ANOVA has two or more categorical independent
variables (either with or without the interactions) and a single normally distributed interval
dependent variable.
19. Friedman test: You perform a Friedman test when you have one within-subjects
independent variable with two or more levels and a dependent variable that is not
interval and normally distributed (but at least ordinal). The null hypothesis in this test is that the distribution of the ranks of each type of score (e.g., reading, writing and math) is the same.
20. Factorial logistic regression: A factorial logistic regression is used when you have two or more categorical independent variables but a dichotomous dependent variable.
21. Correlation: A correlation is useful when you want to see the linear relationship between
two (or more) normally distributed interval variables. Although it is assumed that the
variables are interval and normally distributed, we can include dummy variables when
performing correlations.
22. Simple linear regression: Simple linear regression allows us to look at the linear
relationship between one normally distributed interval predictor and one normally
distributed interval outcome variable.
23. Non-parametric correlation: A Spearman correlation is used when one or both of the
variables are not assumed to be normally distributed and interval (but are assumed to be
ordinal).
24. Multiple regression: Multiple regression is very similar to simple regression, except that in multiple regression you have more than one predictor variable in the equation.
25. Analysis of covariance: Analysis of covariance is like ANOVA, except that in addition to the categorical predictors you also have continuous predictors.
26. One-way MANOVA: MANOVA (multivariate analysis of variance) is like ANOVA, except
that there are two or more dependent variables. In a one-way MANOVA, there is one
categorical independent variable and two or more dependent variables.
27. Canonical correlation: Canonical correlation is a multivariate technique used to
examine the relationship between two groups of variables. For each set of variables, it
creates latent variables and looks at the relationships among the latent variables. It
assumes that all variables in the model are interval and normally distributed.
28. Factor analysis: Factor analysis is a form of exploratory multivariate analysis that is
used to either reduce the number of variables in a model or to detect relationships
among variables. All variables involved in the factor analysis need to be continuous and
are assumed to be normally distributed. The goal of the analysis is to try to identify
factors which underlie the variables.
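
As referenced in the one sample t-test item above, the following hedged sketch shows how a few of the listed tests could be run in Python with SciPy; the scores and expected counts are hypothetical.

```python
# Hedged sketch: running a few of the listed tests with SciPy on hypothetical data.
from scipy import stats

group_a = [88, 92, 79, 85, 90, 83]  # hypothetical scores
group_b = [75, 81, 78, 86, 72, 80]  # hypothetical scores

# One sample t-test: does the mean of group_a differ from a hypothesized value of 80?
print(stats.ttest_1samp(group_a, popmean=80))

# Chi-square goodness of fit: do observed counts match hypothesized proportions?
print(stats.chisquare(f_obs=[18, 22, 20], f_exp=[20, 20, 20]))

# Wilcoxon-Mann-Whitney test: non-parametric comparison of two independent groups.
print(stats.mannwhitneyu(group_a, group_b))

# One-way ANOVA: compare means across two or more independent groups.
print(stats.f_oneway(group_a, group_b))
```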
