
HYPOTHESIS TEST AJAY RM

Null Hypothesis

In statistical inference on observed data from a scientific experiment, the null hypothesis refers to a general statement or default position that there is no relationship between two measured phenomena,[1] or that a potential medical treatment has no effect.[2] Rejecting or disproving the null hypothesis means concluding that there are grounds for believing that there is a relationship between the two phenomena.
For instance, a certain drug may reduce the chance of having a heart
attack. Possible null hypotheses are "this drug does not reduce the
chances of having a heart attack" or "this drug has no effect on the
chances of having a heart attack". The test of the hypothesis consists
of administering the drug to half of the people in a study group as a
controlled experiment. If the data show a statistically significant
change in the people receiving the drug, the null hypothesis is
rejected.
The choice of null hypothesis (H0) and consideration of directionality
(see "one-tailed test") is critical. Consider the question of whether a
tossed coin is fair (i.e. that on average it lands heads up 50% of the
time). A potential null hypothesis is "this coin is not biased toward
heads" (one-tail test). The experiment is to repeatedly toss the coin.
A possible result of 5 tosses is 5 heads. Under this null hypothesis,
the data are considered unlikely (with a fair coin, the probability of
this is 1/25=3.1% and the result would be even more unlikely if the
coin were biased in favour of tails). The data refute the null
hypothesis (that the coin is either fair or biased toward tails) and the
conclusion is that the coin is biased towards heads.
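
That probability can be checked with a couple of lines of Python; the 0.4 used for a tails-biased coin is a hypothetical value, everything else comes from the example above:

```python
# Probability of 5 heads in 5 tosses of a fair coin
p_fair = 0.5 ** 5
print(f"{p_fair:.4f}")   # 0.0312, i.e. about 3.1%

# For a coin biased toward tails (e.g. P(heads) = 0.4, a hypothetical value),
# five heads in a row is even less likely
p_tails_biased = 0.4 ** 5
print(f"{p_tails_biased:.4f}")   # 0.0102
```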



Alternative Hypothesis

There are two types of statistical hypotheses.

Null hypothesis. The null hypothesis, denoted by H0, is usually the
hypothesis that sample observations result purely from chance.

Alternative hypothesis. The alternative hypothesis, denoted by H1 or
Ha, is the hypothesis that sample observations are influenced by
some non-random cause.
For example, suppose we wanted to determine whether a coin was
fair and balanced. A null hypothesis might be that half the flips would
result in Heads and half, in Tails. The alternative hypothesis might be
that the number of Heads and Tails would be very different.
Symbolically, these hypotheses would be expressed as

H0: p = 0.5
Ha: p ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10
Tails. Given this result, we would be inclined to reject the null
hypothesis. That is, we would conclude that the coin was probably
not fair and balanced.
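
Given these hypotheses and the observed 40 heads in 50 flips, a two-sided binomial test makes that decision concrete. A minimal sketch, assuming a reasonably recent SciPy:

```python
from scipy.stats import binomtest

# H0: p = 0.5, Ha: p != 0.5; observed 40 heads in 50 flips
result = binomtest(k=40, n=50, p=0.5, alternative="two-sided")
print(result.pvalue)   # a very small p-value, so we reject H0
```
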
Chi Square

What is a Chi Square?
The Chi Square is a statistical test that measures how close the observed values in the data are to what would be expected under a particular given assumption.

How to Use a Chi Square
1. Begin with a hypothesis before you start your data analysis. A common hypothesis in much research is that there is no correlation between the two variables of interest.

What is the symbol for chi square?
The lower-case Greek letter chi (χ), which looks similar to an X except that the forward stroke is curved at each end. It is part of Unicode and is available in Microsoft Word and other programs.

What is chi-square?
(kī′ skwâr′) n. A test statistic calculated as the sum, over all categories, of the squared difference between the observed and expected values, divided by the expected value:
χ² = Σ (O − E)² / E
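
For illustration, here is a minimal SciPy sketch of a chi-square goodness-of-fit test; the observed die-roll counts and the fair-die expectation are hypothetical values chosen for the example:

```python
from scipy.stats import chisquare

# Hypothetical die-roll counts: observed frequencies for faces 1-6 over 100 rolls
observed = [18, 22, 16, 14, 12, 18]
# Expected frequencies under the assumption of a fair die
expected = [100 / 6] * 6

# chi-square statistic = sum of (O - E)^2 / E over all categories
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)
```
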
A hypothesis

A hypothesis (plural hypotheses) is a proposed explanation for a
phenomenon. For a hypothesis to be a scientific hypothesis, the
scientific method requires that one can test it. Scientists generally
base scientific hypotheses on previous observations that cannot
satisfactorily be explained with the available scientific theories. Even
though the words "hypothesis" and "theory" are often used
synonymously, a scientific hypothesis is not the same as a scientific
theory. A scientific hypothesis is a proposed explanation of a
phenomenon which still has to be rigorously tested. In contrast, a
scientific theory has undergone extensive testing and is generally
accepted to be the accurate explanation behind an
observation.[1] A working hypothesis is a
provisionally accepted hypothesis proposed for further research.[2]
A t-test

A t-test helps you compare whether two groups have different
average values (for example, whether men and women have
different average heights).
Let's say you're curious about whether New Yorkers and Kansans spend a different amount of money per month on movies. It's impractical to ask every New Yorker and Kansan about their movie spending, so instead you ask a sample of each, say 300 New Yorkers and 300 Kansans, and the averages are $14 and $18. The t-test asks whether that difference is probably representative of a real difference between Kansans and New Yorkers generally or whether that difference is most likely a meaningless statistical fluke.

Technically, it asks the following: if there were in fact no difference between Kansans and New Yorkers generally, what are the chances that randomly selected groups from those populations would be as different as these randomly selected groups are? For example, if Kansans and New Yorkers as a whole actually spent the same amount of money on average, it's very unlikely that 300 randomly selected New Yorkers would average exactly $14 while 300 randomly selected Kansans average exactly $18. So if your sampling yielded those results, you would conclude that the difference in the sample groups is most likely representative of a meaningful difference between the populations as a whole.
Formula

t = (Mx − My) / √( Sx² / nx + Sy² / ny )

M = mean
n = number of scores per group

S² = Σ(x − M)² / (n − 1)

x = individual scores
M = mean
n = number of scores in group

Steps
Create four columns: "x", "(x − Mx)²", "y", "(y − My)²"

1. Put the raw data for group X in column x, and for group Y in column y
2. Calculate the mean for both groups
3. Calculate deviation scores for each group by subtracting the group mean from each score and squaring the result, then put these in the columns "(x − Mx)²" and "(y − My)²"
4. Sum the squared deviation scores for each group
5. Calculate S² for each group
6. Set up the formula
7. Calculate t
8. Check to see if t is statistically significant on a probability table with df = N − 2 and p < .05 (N = total number of scores)
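
A minimal Python sketch that follows these steps and then cross-checks the result with SciPy's built-in independent-samples t-test; the two spending samples are hypothetical numbers invented for the illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical monthly movie spending (dollars) for two small groups
x = np.array([14, 12, 16, 15, 13, 14, 17, 11])   # "New Yorkers"
y = np.array([18, 20, 17, 19, 16, 18, 21, 19])   # "Kansans"

Mx, My = x.mean(), y.mean()
# S^2 for each group: sum of squared deviations / (n - 1)
Sx2 = ((x - Mx) ** 2).sum() / (len(x) - 1)
Sy2 = ((y - My) ** 2).sum() / (len(y) - 1)

# t statistic from the formula above
t = (Mx - My) / np.sqrt(Sx2 / len(x) + Sy2 / len(y))
print(t)

# Cross-check with SciPy (equal group sizes, so the statistics match)
print(stats.ttest_ind(x, y))
```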

F-Test

Any statistical test that uses the F-distribution can be called an F-test. It is used when the sample size is small, i.e. n < 30.

For example, suppose one is interested in testing whether there is any significant difference between the mean height of male and female students in a particular college. In such a situation, the t-test for difference of means can be applied.
However, one assumption of the t-test is that the variances of the two populations are equal; here the two populations are the populations of heights of male and female students. Unless this assumption is true, the t-test for difference of means cannot be carried out.

F-test for testing equality of variance is used to test the hypothesis of
equality of two population variances. The example considered above
requires the application of this test.
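
A minimal sketch of that variance-ratio F-test; the male and female height samples are hypothetical values, and since SciPy does not appear to ship a direct two-sample variance F-test, the statistic and p-value are computed from the F distribution directly:

```python
import numpy as np
from scipy import stats

# Hypothetical heights (cm) of male and female students
male = np.array([172.0, 168.5, 175.2, 170.1, 169.8, 174.3, 171.6, 167.9])
female = np.array([161.2, 158.7, 163.4, 160.0, 159.5, 166.8, 157.9, 164.1])

# F statistic: ratio of the larger sample variance to the smaller one
var_m, var_f = male.var(ddof=1), female.var(ddof=1)
if var_m >= var_f:
    F, df1, df2 = var_m / var_f, len(male) - 1, len(female) - 1
else:
    F, df1, df2 = var_f / var_m, len(female) - 1, len(male) - 1

# Two-sided p-value from the F distribution
p = min(2 * stats.f.sf(F, df1, df2), 1.0)
print(F, p)
```
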
F-test for testing equality of several means. Test for equality of
several means is carried out by the technique named ANOVA.
For example, suppose that the efficacy of a drug is to be tested at three levels, say 100 mg, 250 mg and 500 mg. A test is conducted among fifteen human subjects taken at random, with five subjects being administered each level of the drug.

To test if there are significant differences among the three levels of
the drug in terms of efficacy, the ANOVA technique has to be
applied. The test used for this purpose is the F-test.
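
A minimal sketch of that one-way ANOVA using SciPy's f_oneway; the efficacy scores for the three dose groups are hypothetical numbers for illustration only:

```python
from scipy.stats import f_oneway

# Hypothetical efficacy scores for five subjects at each dose level
dose_100 = [12.1, 13.4, 11.8, 12.9, 13.0]
dose_250 = [14.2, 15.1, 13.8, 14.7, 15.0]
dose_500 = [16.3, 17.0, 15.9, 16.8, 17.2]

# One-way ANOVA: F statistic and p-value for equality of the three means
F, p = f_oneway(dose_100, dose_250, dose_500)
print(F, p)
```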

School n Mean StDev
BB&N 23 4.3 0.4
Roxbury Latin 25 3.9 0.6
Winsor 26 4.2 0.3
Belmont Hill 29 3.1 0.3
Is there any significant difference between these schools' AP English scores?
(Assume that the populations are normally distributed)



H0: μBB&N = μRL = μWinsor = μBelHill
The mean AP English Test scores in BB&N, Roxbury Latin, Winsor, and Belmont Hill are all the same.

HA: The mean AP English Test scores in BB&N, Roxbury Latin, Winsor, and Belmont Hill are not all the same.

Conditions:
Random samples taken
The standard deviations are similar: no standard deviation is more than twice any other
All of the populations are normally distributed

Plug the F statistic into the F distribution (df = 3, 99). The shaded area
has a p-value of nearly 0.
Since all the conditions were met, we have conclusive evidence (df = 3, 99, p ≈ 0) to reject the null hypothesis that the mean AP English Test scores in BB&N, Roxbury Latin, Winsor, and Belmont Hill are all the same.

Difference between one-way ANOVA and two-way ANOVA

An example of when a one-way ANOVA could be used is if you want to
determine if there is a difference in the mean height of stalks of three
different types of seeds. Since there is more than one mean and only one factor that could be making the heights different, you can use a one-way ANOVA.

Now, if you take these three different types of seeds and then add the possibility that four different types of fertilizer are used, then you would want to use a two-way ANOVA. The mean height of the stalks could be different for a combination of several reasons:

The types of seed could cause the change,
the types of fertilizer could cause the change, and/or
there is an interaction between the type of seed and the type of fertilizer.

There are two factors here (type of seed and type of fertilizer), so, if the assumptions hold, then you can use a two-way ANOVA, as sketched below.
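
Here is a minimal two-way ANOVA sketch for the seed-and-fertilizer setup, assuming pandas and statsmodels are available; the stalk heights are randomly generated placeholder data, not real measurements:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Placeholder stalk heights for 3 seed types x 4 fertilizers, 5 plants each
rng = np.random.default_rng(0)
rows = []
for seed in ["seed_A", "seed_B", "seed_C"]:
    for fert in ["fert_1", "fert_2", "fert_3", "fert_4"]:
        for _ in range(5):
            rows.append({"seed": seed, "fertilizer": fert,
                         "height": 50 + rng.normal(0, 5)})
data = pd.DataFrame(rows)

# Two-way ANOVA: main effect of seed, main effect of fertilizer,
# and the seed x fertilizer interaction
model = ols("height ~ C(seed) * C(fertilizer)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```
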
Standard Deviation

The Standard Deviation is a measure of how spread out numbers are.
Its symbol is σ (the Greek letter sigma).
The formula is easy: it is the square root of the Variance. So now you ask,
"What is the Variance?"
Variance
The Variance is defined as:
The average of the squared differences from the Mean
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.
Find out the Mean, the Variance, and the Standard Deviation.
Your first step is to find the Mean:
Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

To calculate the Variance, take each difference from the Mean, square it, and then average the result:

Variance = [ (600 − 394)² + (470 − 394)² + (170 − 394)² + (430 − 394)² + (300 − 394)² ] / 5
= [ 206² + 76² + (−224)² + 36² + (−94)² ] / 5
= [ 42,436 + 5,776 + 50,176 + 1,296 + 8,836 ] / 5
= 108,520 / 5
= 21,704

So, the Variance is 21,704.
And the Standard Deviation is just the square root of the Variance, so:
Standard Deviation: σ = √21,704 = 147.32... ≈ 147
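
The same calculation as a minimal NumPy sketch, using only the five heights above:

```python
import numpy as np

heights = np.array([600, 470, 170, 430, 300])  # mm

mean = heights.mean()           # 394.0
variance = heights.var(ddof=0)  # population variance: 21704.0
std_dev = np.sqrt(variance)     # 147.32...
print(mean, variance, std_dev)
```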

Parametric and nonparametric tests

Parametric and nonparametric are two broad classifications of
statistical procedures.
Parametric tests are based on assumptions about the distribution of
the underlying
population from which the sample was taken. The most common
parametric assumption is that data are approximately normally
distributed.
Nonparametric tests do not rely on assumptions about the shape or
parameters of the underlying population distribution.

If the data deviate strongly from the assumptions of a parametric procedure, using the parametric procedure could lead to incorrect conclusions.

Parametric Tests
Parametric tests are used when you have ratio or interval data. Parametric
tests are more likely to detect significance.
Parametric tests include:
o Correlation
o Regression
o Multiple Regression
o T Tests
o One-Way ANOVA
o Two-Way ANOVA

Non-Parametric Tests
o Non-parametric tests are used when you have nominal or
ordinal data. Non-parametric tests are less likely to
detect significance.
o Never do a non-parametric test when you can do a
parametric test. Non-parametric tests are not as powerful
so it is much harder to find significant results.

Use non-parametric tests when:

(1) Your data is nominal or ordinal
(2) Your sample size is too small for a parametric test
Non-parametric tests include:
o Chi-Square
o Mann-Whitney U
o Wilcoxon T Test
o Kruskal-Wallis
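
For example, here is a minimal sketch of a Mann-Whitney U test with SciPy; the two sets of ordinal ratings are hypothetical values for illustration:

```python
from scipy.stats import mannwhitneyu

# Hypothetical satisfaction ratings (ordinal, 1-5) from two groups
group_a = [3, 4, 2, 5, 4, 3, 4, 5]
group_b = [2, 1, 3, 2, 3, 1, 2, 2]

# Non-parametric test: no normality assumption about the populations
U, p = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(U, p)
```
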
Correlation


When two sets of data are strongly linked together we say they have
a High Correlation.
The word Correlation is made of Co- (meaning "together"), and
Relation
Correlation is Positive when the values increase together, and
Correlation is Negative when one value decreases as the other
increases
Correlation can have a value:
o 1 is a perfect positive correlation
o 0 is no correlation (the values don't seem linked at all)
o -1 is a perfect negative correlation
The value shows how good the correlation is (not how steep the line is), and
if it is positive or negative.
How To Calculate
How is a correlation value such as 0.9575 (from a Temperature vs Ice Cream Sales example) calculated?
One way is "Pearson's Correlation". There is software that can calculate it, such as the CORREL() function in Excel or OpenOffice Calc ...
... but here is how to calculate it yourself:
Let us call the two sets of data "x" and "y" (in our case Temperature is x and Ice Cream Sales is y):
Step 1: Find the mean of x, and the mean of y
Step 2: Subtract the mean of x from every x value (call them "a"), and do the same for y (call them "b")
Step 3: Calculate a × b, a² and b² for every value
Step 4: Sum up a × b, sum up a², and sum up b²
Step 5: Divide the sum of a × b by the square root of [(sum of a²) × (sum of b²)]
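
Here is a minimal NumPy sketch of those five steps; the temperature and ice cream sales figures are hypothetical numbers, not the data behind the 0.9575 value:

```python
import numpy as np

# Hypothetical data: temperature (x) and ice cream sales (y)
x = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1])
y = np.array([215, 325, 185, 332, 406, 522, 412, 614])

# Steps 1-2: deviations from the means
a = x - x.mean()
b = y - y.mean()

# Steps 3-5: sum of a*b divided by sqrt of (sum of a^2)(sum of b^2)
r = (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())
print(r)

# Cross-check with NumPy's built-in correlation coefficient
print(np.corrcoef(x, y)[0, 1])
```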

Regression analysis


In statistics, regression analysis is a statistical process for estimating
the relationships among variables.
It includes many techniques for modeling and analyzing several
variables, when the focus is on the relationship between a
dependent variable and one or more independent variables.
More specifically, regression analysis helps one understand how the
typical value of the dependent variable (or 'criterion variable')
changes
when any one of the independent variables is varied, while the other
independent variables are held fixed.
Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables; that is, the average value of the dependent variable when the independent variables are fixed.
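
As a minimal sketch, a simple linear regression of one dependent variable on one independent variable can be fitted with NumPy's polyfit; the x and y values below are placeholder data:

```python
import numpy as np

# Placeholder data: one independent variable (x) and a dependent variable (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1, 14.2, 15.9])

# Fit y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)

# Estimated average value of y when x is fixed at 10
# (the conditional expectation under the fitted linear model)
print(slope * 10 + intercept)
```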
