Attribution Non-Commercial (BY-NC)

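A statistical hypothesis is an assumption about a population parameter. This

assumption may or may not be true. The best way to determine whether it is true would be to

examine the entire population. Since that is often impractical, researchers typically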

examine a random sample from the population. If sample data are not consistent with

the statistical hypothesis, the hypothesis is rejected.

Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis

that sample observations result purely from chance.

Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the

hypothesis that sample observations are influenced by some non-random cause.

For example, suppose we wanted to determine whether a coin was fair and

balanced. A null hypothesis might be that half the flips would result in Heads and half, in

Tails. The alternative hypothesis might be that the number of Heads and Tails would be

very different. Symbolically, these hypotheses would be expressed as

H0: P = 0.5

Ha: P ≠ 0.5

Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given

this result, we would be inclined to reject the null hypothesis. We would conclude, based

on the evidence, that the coin was probably not fair and balanced.
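
The coin example can be checked with an exact binomial calculation. The sketch below is one way to do it, assuming a two-sided test that adds up the probabilities of all outcomes at least as far from the expected count as the observed one. The function name is illustrative and does not come from the original text.

```python
from math import comb

def binomial_two_sided_p(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for a coin-flip count under H0: P = p_null.

    Assumption: "as extreme" means at least as far from the expected
    number of heads as the observed count (one common convention).
    """
    expected = flips * p_null
    observed_gap = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        if abs(k - expected) >= observed_gap:
            # Binomial probability of exactly k heads under H0
            total += comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
    return total

p_value = binomial_two_sided_p(heads=40, flips=50)
print(f"p-value = {p_value:.6f}")
```

With 40 heads in 50 flips the p-value is on the order of 0.00002, far below any conventional significance level, which agrees with the informal conclusion above.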

Hypothesis Tests

Statisticians follow a formal process to determine whether to reject a null hypothesis,

based on sample data. This process, called hypothesis testing, consists of four steps.

State the hypotheses. This involves stating the null and alternative hypotheses.

The hypotheses are stated in such a way that they are mutually exclusive. That

is, if one is true, the other must be false.

Formulate an analysis plan. The analysis plan describes how to use sample

data to evaluate the null hypothesis. The evaluation often focuses around a

single test statistic.

Analyze sample data. Find the value of the test statistic (mean score,

proportion, t-score, z-score, etc.) described in the analysis plan.

Interpret results. Apply the decision rule described in the analysis plan. If the

value of the test statistic is unlikely, based on the null hypothesis, reject the null

hypothesis.

Decision Rules

The analysis plan includes decision rules for rejecting the null hypothesis. In

practice, statisticians describe these decision rules in two ways - with reference to a P-

value or with reference to a region of acceptance.

P-value. The strength of evidence in support of the null hypothesis is measured by

the P-value. Suppose the test statistic is equal to S. The P-value is the

probability of observing a test statistic at least as extreme as S, assuming the null

hypothesis is true. If the P-value is less than the significance level, we reject the

null hypothesis.

Region of acceptance. The region of acceptance is a range of values. If the test

statistic falls within the region of acceptance, the null hypothesis is not rejected.

The region of acceptance is defined so that the chance of making a Type I error

is equal to the significance level.

The set of values outside the region of acceptance is called the region of

rejection. If the test statistic falls within the region of rejection, the null

hypothesis is rejected. In such cases, we say that the hypothesis has been

rejected at the α level of significance.

These approaches are equivalent. Some statistics texts use the P-value

approach; others use the region of acceptance approach. In subsequent lessons, this

tutorial will present examples that illustrate each approach.

TYPES OF HYPOTHESIS

A test of a statistical hypothesis, where the region of rejection is on only one side

of the sampling distribution, is called a one-tailed test. For example, suppose the null

hypothesis states that the mean is less than or equal to 10. The alternative hypothesis

would be that the mean is greater than 10. The region of rejection would consist of a

range of numbers located on the right side of sampling distribution; that is, a set of

numbers greater than 10.

A test of a statistical hypothesis, where the region of rejection is on both sides of

the sampling distribution, is called a two-tailed test. For example, suppose the null

hypothesis states that the mean is equal to 10. The alternative hypothesis would be that

the mean is less than 10 or greater than 10. The region of rejection would consist of a

range of numbers located on both sides of sampling distribution; that is, the region of

rejection would consist partly of numbers that were less than 10 and partly of numbers

that were greater than 10.
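
To make the difference between the two rejection regions concrete, the following sketch computes critical values on the standardized (z) scale. It assumes the test statistic is normally distributed; the function name and the `tails` argument are illustrative choices, not part of the original text.

```python
from statistics import NormalDist

def critical_values(alpha: float, tails: str):
    """Return (lower, upper) critical z-values; None means no boundary on that side.

    Assumption: the test statistic follows a standard normal distribution.
    """
    z = NormalDist()
    if tails == "right":   # H1: parameter is greater than the null value
        return (None, z.inv_cdf(1 - alpha))
    if tails == "left":    # H1: parameter is less than the null value
        return (z.inv_cdf(alpha), None)
    if tails == "two":     # H1: parameter differs from the null value
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)
    raise ValueError("tails must be 'right', 'left' or 'two'")

print(critical_values(0.05, "right"))  # rejection region: z above about 1.645
print(critical_values(0.05, "two"))    # rejection region: |z| above about 1.960
```

At the same significance level, a two-tailed test splits α between both tails, so each boundary lies further from the center than the single boundary of a one-tailed test.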

LEVEL OF SIGNIFICANCE

The significance level of a test is the probability that the test statistic will reject

the null hypothesis when the null hypothesis is true. Significance is a property of the

distribution of a test statistic, not of any particular draw of the statistic. The significance

level is usually denoted by the Greek symbol α (lower case alpha). Popular levels of

significance are 5% (0.05), 1% (0.01) and 0.1% (0.001). If a test of significance gives a

p-value lower than the α-level, the null hypothesis is rejected. Such results are

informally referred to as 'statistically significant'. For example, if someone argues that

"there's only one chance in a thousand this could have happened by coincidence," a

0.001 level of statistical significance is being implied. The lower the significance level,

the stronger the evidence required. Choosing level of significance is an arbitrary task,

but for many applications, a level of 5% is chosen, for no better reason than that it is

conventional.

In some situations it is convenient to express the statistical significance as 1 − α.

In general, when interpreting a stated significance, one must be careful to note what,

precisely, is being tested statistically.

Smaller α-levels give greater

confidence in the determination of significance, but run an increased risk of failing to

reject a false null hypothesis (a Type II error, or "false negative determination"), and so

have less statistical power. The selection of an α-level thus inevitably involves a

compromise between significance and power, and consequently between the Type I error and

the Type II error. More powerful experiments - usually experiments with more subjects

or replications - can obviate this choice to an arbitrary degree.
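
The trade-off between α and power can be illustrated numerically. The sketch below assumes a right-tailed z-test with known σ and a hypothetical true shift in the mean (`effect`); the numbers are made up for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed_z(alpha: float, effect: float, sigma: float, n: int) -> float:
    """Power of a right-tailed z-test when the true mean exceeds the null mean by `effect`.

    Assumptions: normal data, known sigma, simple random sample of size n.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)          # boundary of the rejection region
    shift = effect * sqrt(n) / sigma       # true mean of the z-statistic
    return 1 - z.cdf(z_crit - shift)       # P(reject H0 | H0 is false)

# Hypothetical scenario: true shift of 2 units, sigma = 10
for alpha in (0.05, 0.01):
    for n in (64, 256):
        power = power_right_tailed_z(alpha, effect=2, sigma=10, n=n)
        print(f"alpha={alpha}, n={n}: power={power:.3f}, Type II risk={1 - power:.3f}")
```

Lowering α from 0.05 to 0.01 reduces power at a fixed sample size, while increasing the sample size raises power at either α, which is the sense in which a larger experiment eases the compromise.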

In some fields, for example nuclear and particle physics, it is common to express

statistical significance in units of "σ" (sigma), the standard deviation of a Gaussian

distribution. A statistical significance of "nσ" can be converted into a value of α via use

of the error function:

α = 1 − erf (n/√2)

For example, if a theory predicts a parameter to have a value of, say, 100, and one

measures the parameter to be 109 ± 3, then one might report the measurement as a

"3σ deviation" from the theoretical prediction. In terms of α, this statement is equivalent

to saying that "assuming the theory is true, the likelihood of obtaining the experimental

result by coincidence is 0.27%" (since 1 − erf (3/√2) = 0.0027).
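
The conversion from "nσ" to α is easy to reproduce. This is a minimal sketch using the complementary error function from the Python standard library (erfc(x) = 1 − erf(x)); the function name is illustrative.

```python
from math import erfc, sqrt

def sigma_to_alpha(n_sigma: float) -> float:
    """Two-sided tail probability of a Gaussian beyond n_sigma standard deviations.

    Equivalent to 1 - erf(n_sigma / sqrt(2)); erfc avoids round-off for large n_sigma.
    """
    return erfc(n_sigma / sqrt(2.0))

# The measurement 109 +/- 3 against a prediction of 100 is a 3-sigma deviation
n_sigma = (109 - 100) / 3
print(f"{n_sigma:.0f} sigma -> alpha = {sigma_to_alpha(n_sigma):.4f}")  # 0.0027
```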

Fixed significance levels such as those mentioned above may be regarded as

useful in exploratory data analyses. However, modern statistical advice is that, where

the outcome of a test is essentially the final outcome of an experiment or other study,

the p-value should be quoted explicitly. And, importantly, it should be quoted whether

or not the p-value is judged to be significant. This is to allow maximum information to be

transferred from a summary of the study into meta-analyses.

The critical value(s) for a hypothesis test is a threshold to which the value of the

test statistic in a sample is compared to determine whether or not the null hypothesis is

rejected.

The critical value for any hypothesis test depends on the significance level at

which the test is carried out, and whether the test is one-sided or two-sided.

The steps in a hypothesis test are as follows:

(Note: The methodology below works equally well for both one-tail and two-tail

hypothesis testing.)

1. State the null hypothesis, H0, and the alternative hypothesis, H1.

2. Choose the level of significance, α, according to the importance of the risk of

committing Type I errors. Determine the sample size, n, based on the resources

available to collect the data.

3. Determine the test statistic and sampling distribution. When the hypotheses involve

the population mean, μ, the test statistic is z when σ is known and t when σ is not

known. These test statistics follow the normal distribution and the t-distribution

respectively.

4. Determine the critical values that divide the rejection and non-rejection regions.

Note: For ethical reasons, the level of significance and critical values should be

determined prior to conducting the test. The test should be designed so that the

predetermined values do not influence the test results.

5. Collect the data and compute the test statistic.

Draw Conclusions

6. Evaluate the test statistic and determine whether or not to reject the null hypothesis.

Summarize the results and state a managerial conclusion in the context of the problem.

Example:

A phone industry manager thinks that customer monthly cell phone bills have increased

and now average over $52 per month. The company asks you to test this claim. The

population standard deviation, σ, is known to be equal to 10 from historical data.

The Hypotheses

1. H0: μ ≤ 52

H1: μ > 52

Study Design

2. After consulting with the manager and discussing error risk, we choose a level of

significance, α, of 0.10. Our resources allow us to sample 64 cell phone bills.

3. Since our hypothesis involves the population mean and we know the population

standard deviation, our test statistic is z and follows the normal distribution.

4. In determining the critical value, we first recognize this test as a one-tail test since the

null hypothesis involves an inequality, ≤. Therefore the rejection region is entirely on the

side of the distribution greater than the historic mean - right tail.

We want to determine a z-value for which the area to the right of that value is 0.10, our

α. We can use the cumulative normal distribution table (which gives areas to the left of

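the z-value) and find the z-value whose cumulative area is 0.90. It falls between the tabulated 1.28 and 1.29, so we take z = 1.285 (the exact value is 1.2816). This is our critical value.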

The Study

5. We conduct our study and find that the mean of the 64 sample cell phone bills is

53.1. We compute the test statistic, z = (xbar-μ)/(σ/√n) = (53.1-52)/(10/√64) = 0.88.

Conclusions

6. Since 0.88 is less than the critical value of 1.285, we do not reject the null hypothesis.

We report to the company that, based on our testing, there is not sufficient evidence that the

mean cell phone bill has increased above $52 per month.
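
The cell phone example can be reproduced in a few lines. The sketch below uses the numbers from the example and applies both decision rules described earlier (critical value and p-value); the function name is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_z_test(xbar: float, mu0: float, sigma: float, n: int, alpha: float):
    """Right-tailed z-test for a mean with known population standard deviation."""
    z = (xbar - mu0) / (sigma / sqrt(n))       # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value for the right tail
    p_value = 1 - NormalDist().cdf(z)          # area to the right of z
    reject = z > z_crit
    return z, z_crit, p_value, reject

z, z_crit, p_value, reject = right_tailed_z_test(xbar=53.1, mu0=52, sigma=10, n=64, alpha=0.10)
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```

Both rules agree: z = 0.88 is below the critical value, and the p-value of about 0.19 is above α = 0.10, so the null hypothesis is not rejected.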

TEST OF SIGNIFICANCE

When a statistic is significant, it simply means that you are very sure that the

statistic is reliable. It doesn't mean the finding is important or that it has any decision-

making utility.

For example, suppose we give 1,000 people an IQ test, and we ask if there is a

significant difference between male and female scores. The mean score for males is 98

and the mean score for females is 100. We use an independent-groups t-test and find

that the difference is significant at the .001 level. The big question is, "So what?" The

difference between 98 and 100 on an IQ test is a very small difference...so small, in

fact, that it’s not even important.

Then why did the t-statistic come out significant? Because there was a large

sample size. When you have a large sample size, very small differences will be

detected as significant. This means that you are very sure that the difference is real

(i.e., it didn't happen by fluke). It doesn't mean that the difference is large or important. If

we had only given the IQ test to 25 people instead of 1,000, the two-point difference

between males and females would not have been significant.

Significance is a statistical term that tells how sure you are that a difference or

relationship exists. To say that a significant difference or relationship exists only tells

half the story. We might be very sure that a relationship exists, but is it a strong,

moderate, or weak relationship? After finding a significant relationship, it is important to

evaluate its strength. Significant relationships can be strong or weak. Significant

differences can be large or small. It just depends on your sample size.

Many researchers use the word "significant" to describe a finding that may have

decision-making utility to a client. From a statistician's viewpoint, this is an incorrect use

of the word. However, the word "significant" has virtually universal meaning to the

public. Thus, many researchers use the word "significant" to describe a difference or

relationship that may be strategically important to a client (regardless of any statistical

tests). In these situations, the word "significant" is used to advise a client to take note of

a particular difference or relationship because it may be relevant to the company's

strategic plan. The word "significant" is not the exclusive domain of statisticians and

either use is correct in the business world. Thus, for the statistician, it may be wise to

adopt a policy of always referring to "statistical significance" rather than simply

"significance" when communicating with the public.

One important concept in significance testing is whether you use a one-tailed or

two-tailed test of significance. The answer is that it depends on your hypothesis. When

your research hypothesis states the direction of the difference or relationship, then you

use a one-tailed probability. For example, a one-tailed test would be used to test these

null hypotheses: Females will not score significantly higher than males on an IQ test.

Blue collar workers will not buy significantly more product than white collar workers.

Superman is not significantly stronger than the average person. In each case, the null

hypothesis (indirectly) predicts the direction of the difference. A two-tailed test would be

used to test these null hypotheses: There will be no significant difference in IQ scores

between males and females. There will be no significant difference in the amount of

product purchased between blue collar and white collar workers. There is no significant

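difference in strength between Superman and the average person. For a symmetric test statistic such as t or z, and an observed difference in the predicted direction, the one-tailed

probability is exactly half the value of the two-tailed probability.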

There is a raging controversy (for about the last hundred years) on whether or

not it is ever appropriate to use a one-tailed test. The rationale is that if you already

know the direction of the difference, why bother doing any statistical tests. While it is

generally safest to use two-tailed tests, there are situations where a one-tailed test

seems more appropriate. The bottom line is that it is the choice of the researcher

whether to use one-tailed or two-tailed research questions.

A test of significance involves comparing a test statistic that we

have calculated to some critical value for the statistic. It doesn't matter what type of

statistic we are calculating (e.g., a t-statistic, a chi-square statistic, an F-statistic, etc.),

the procedure to test for significance is the same.

1. Decide on the critical alpha level you will use (i.e., the error rate you are willing to

accept).

2. Conduct the research.

3. Calculate the statistic.

4. Compare the statistic to a critical value obtained from a table.

If your statistic is higher than the critical value from the table:

• You reject the null hypothesis.

• The probability is small that the difference or relationship happened by chance,

and p is less than the critical alpha level (p < alpha ).

If your statistic is lower than the critical value from the table:

• You fail to reject the null hypothesis.

• The probability is high that the difference or relationship happened by chance,

and p is greater than the critical alpha level (p > alpha).

Modern computer software can calculate exact probabilities for most test statistics. If

you have an exact probability from computer software, simply compare it to your critical

alpha level. If the exact probability is less than the critical alpha level, your finding is

significant, and if the exact probability is greater than your critical alpha level, your

finding is not significant. Using a table is not necessary when you have the exact

probability for a statistic.

In hypothesis testing, there are two types of errors. The first is type I error and

the second is type II error.

Type I error

In hypothesis testing, a type I error occurs when we reject the null

hypothesis although it is true. The probability of a type I error is denoted

by alpha. On a plot of the sampling distribution under the null hypothesis, the critical region is called

the alpha region. Even though it is unlikely that the test statistic will fall into the critical

region when the null hypothesis is true, it is still possible. When this occurs, we

reject H0, when indeed it is true, and therefore make an error in doing so.

Type II errors

In hypothesis testing, a type II error occurs when we fail to reject the null hypothesis although

it is false. The probability of a type II error is denoted by beta. The part of the

acceptance region that lies under the true (alternative) sampling distribution is called the beta region.

T-TEST

Description

The t-test (or student's t-test) gives an indication of the separateness of two sets

of measurements, and is thus used to check whether two sets of measures are

essentially different (and usually that an experimental effect has been demonstrated).

The typical way of doing this is with the null hypothesis that means of the two sets of

measures are equal.

The t-test assumes that the underlying variances are equal (if not, use Welch's test).

It is used when there is random assignment and only two sets of measurement to

compare.

• Matched-pair t-test: When samples appear in pairs (e.g. before-and-after).

A single-sample t-test compares a sample against a known value, for example

where measures of a manufactured item are compared against the required standard.

Calculation

The value of t may be calculated using packages such as SPSS. The actual calculation

for two groups is:

t = experimental effect / variability

= difference between group means /

standard error of difference between group means
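
As a rough illustration of the calculation, the sketch below computes an independent-groups t statistic by hand. It assumes equal underlying variances (the pooled-variance form) and uses made-up data; a statistics package would normally do this for you.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Independent-groups t statistic, assuming equal variances in both groups."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2                                   # degrees of freedom
    # Pooled estimate of the common variance
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    # Standard error of the difference between group means
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / se_diff
    return t, df

# Hypothetical scores for two small groups
t, df = pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t({df}) = {t:.2f}")
```

The numerator is the experimental effect (the difference between group means) and the denominator is the variability (the standard error of that difference), matching the formula above.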

Interpretation

The resultant t-value is then looked up in a t-table to determine the probability

of obtaining a value at least this extreme if the two sets of measures do not really differ, and hence what

can be claimed about the efficacy of the experimental treatment.

Effect

The t-value can also be converted to a Pearson r-value to measure effect, which

can be calculated as:

r = SQRT( t² / (t² + DF))

where DF is the degrees of freedom.

In a t-test, DF = N1 + N2 - 2.

Reporting

Reporting a t-test might look something like this:

On average, holidays in the south

(M=24.1, SE=1.5) were significantly preferred to holidays in the north

(M=20.1, SE=1.2), t(22)=2.3, p<.05, r=.44.

In this, 'M' is the mean and 'SE' the standard error of each sample. In 't(X)=Y', X is

the degrees of freedom and Y is the t-value. 'p' is the p-value of the test and

'r' is the effect.
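
The effect size in the reported example can be checked with the conversion formula given under "Effect". This is a minimal sketch; the function name is illustrative.

```python
from math import sqrt

def t_to_r(t: float, df: int) -> float:
    """Convert a t-value to a Pearson r effect size: r = SQRT(t^2 / (t^2 + DF))."""
    return sqrt(t * t / (t * t + df))

# Values from the reported example: t(22) = 2.3
r = t_to_r(2.3, 22)
print(f"r = {r:.2f}")  # 0.44
```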

Discussion

The t-test was described in 1908 by William Sealy Gosset for monitoring the

brewing at Guinness in Dublin. Guinness considered the use of statistics a trade secret,

so he published his test under the pen-name 'Student' -- hence the test is now often

called the 'Student's t-test'.

The t-test is a basic test that is limited to two groups. For multiple groups, you

would have to compare each pair of groups, for example with three groups there would

be three tests (AB, AC, BC), whilst with seven groups there would need to be 21 tests.

The basic principle is to test the null hypothesis that the means of the two groups

are equal.

A significant problem with this is that each t-test is typically run at the 5%

significance level (95% confidence, α = 0.05). Across multiple tests the chances of a Type I error accumulate and hence reduce the

validity of the results.
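
The accumulation of error across pairwise tests can be sketched numerically. The calculation below assumes the tests are independent, which pairwise t-tests on shared groups are not, so the figures are only a rough guide; the function names are illustrative.

```python
from math import comb

def pairwise_tests(groups: int) -> int:
    """Number of distinct pairs of groups, i.e. t-tests needed to compare every pair."""
    return comb(groups, 2)

def familywise_error(alpha: float, tests: int) -> float:
    """Chance of at least one Type I error across `tests` tests.

    Assumption: the tests are independent (an approximation here).
    """
    return 1 - (1 - alpha) ** tests

for groups in (3, 7):
    m = pairwise_tests(groups)
    print(f"{groups} groups: {m} tests, familywise error about {familywise_error(0.05, m):.2f}")
```

With three groups the chance of at least one false positive is already about 14%, and with seven groups (21 tests) it is roughly two in three under the independence assumption.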

Z-TEST

Description

The Z-test compares sample and population means to determine if there is a

significant difference.

It requires a simple random sample from a population with a Normal distribution

and where the population mean and standard deviation are known.

The Z-test is a statistical test of the null hypothesis that a population parameter μ is equal to a

given value μ0. We construct a z-statistic for the null hypothesis, i.e. a statistic which,

under the null hypothesis, has mean zero and approximately a standard normal

distribution. For a one-tailed (upper-tail) test with probability p, we do not reject the null hypothesis if z is less than zp,

where zp is the pth percentile of the standard normal distribution.

Calculation

The z measure is calculated as:

z = (x - μ) / SE

where x is the sample mean to be standardized, μ (mu) is the population

mean and SE is the standard error of the mean.

SE = σ / SQRT(n)

where σ is the population standard deviation and n is the sample size.

The z value is then looked up in a z-table. A negative z value means it is below the

population mean (the sign is ignored in the lookup table).

Discussion

The Z-test is typically used with standardized tests, checking whether the scores from

a particular sample are within or outside the standard test performance.

The z value indicates how many standard errors the sample mean lies from the

population mean.

CORRELATION

The correlation is one of the most common and most useful statistics. A

correlation is a single number that describes the degree of relationship between two

variables. Let's work through an example to show you how this statistic is computed.

Correlation Example

Let's assume that we want to look at the relationship between two variables,

height (in inches) and self esteem. Perhaps we have a hypothesis that how tall you are

affects your self esteem (incidentally, I don't think we have to worry about the direction

of causality here -- it's not likely that self esteem causes your height!). Let's say we

collect some information on twenty individuals (all male -- we know that the average

height differs for males and females so, to keep this example simple we'll just use

males). Height is measured in inches. Self esteem is measured based on the average

of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the

data for the 20 cases (don't take this too seriously -- I made this data up to illustrate

what a correlation is):

Person Height Self Esteem

1 68 4.1

2 71 4.6

3 62 3.8

4 75 4.4

5 58 3.2

6 60 3.1

7 67 3.8

8 68 4.1

9 71 4.3

10 69 3.7

11 68 3.5

12 67 3.2

13 63 3.7

14 62 3.3

15 60 3.4

16 63 4.0

17 65 4.1

18 67 3.8

19 63 3.4

20 61 3.6

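Now, let's take a quick look at the histogram for each variable (histograms not reproduced here). The descriptive statistics are:

Variable      Mean    StDev      Variance   Sum    Minimum  Maximum  Range
Height        65.4    4.40574    19.4105    1308   58       75       17
Self Esteem   3.755   0.426090   0.181553   75.1   3.1      4.6      1.5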

You should immediately see in the bivariate plot that the relationship between the

variables is a positive one (if you can't see that, review the section on types of

relationships) because if you were to fit a single straight line through the dots it would

have a positive slope or move up from left to right. Since the correlation is nothing more

than a quantitative estimate of the relationship, we would expect a positive correlation.

A positive relationship means that, in

general, higher scores on one variable tend to be paired with higher scores on the other

and that lower scores on one variable tend to be paired with lower scores on the other.

You should confirm visually that this is generally true in the plot above.

Calculating the Correlation

Now we're ready to compute the correlation value. The formula for the correlation is:

r = (N·Σxy − Σx·Σy) / SQRT( (N·Σx² − (Σx)²) · (N·Σy² − (Σy)²) )

We use the symbol r to stand for the correlation. Through the magic of

mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is

negative, we have a negative relationship; if it's positive, the relationship is positive. You

don't need to know how we came up with this formula unless you want to be a

statistician. But you probably will need to know how the formula relates to real data --

how you can use the formula to compute the correlation. Let's look at the data we need

for the formula. Here's the original data with the other necessary columns:

Person Height (x) Self Esteem (y) x*y x*x y*y

1 68 4.1 278.8 4624 16.81

2 71 4.6 326.6 5041 21.16

3 62 3.8 235.6 3844 14.44

4 75 4.4 330 5625 19.36

5 58 3.2 185.6 3364 10.24

6 60 3.1 186 3600 9.61

7 67 3.8 254.6 4489 14.44

8 68 4.1 278.8 4624 16.81

9 71 4.3 305.3 5041 18.49

10 69 3.7 255.3 4761 13.69

11 68 3.5 238 4624 12.25

12 67 3.2 214.4 4489 10.24

13 63 3.7 233.1 3969 13.69

14 62 3.3 204.6 3844 10.89

15 60 3.4 204 3600 11.56

16 63 4 252 3969 16

17 65 4.1 266.5 4225 16.81

18 67 3.8 254.6 4489 14.44

19 63 3.4 214.2 3969 11.56

20 61 3.6 219.6 3721 12.96

Sum = 1308 75.1 4937.6 85912 285.45

The first three columns are the same as in the table above. The next three

columns are simple computations based on the height and self esteem data. The

bottom row consists of the sum of each column. This is all the information we need to

compute the correlation. Here are the values from the bottom row of the table (where N

is 20 people) as they are related to the symbols in the formula:

N = 20

Σxy = 4937.6

Σx = 1308

Σy = 75.1

Σx² = 85912

Σy² = 285.45

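Now, when we plug these values into the formula given above, we get the

following (I show it here tediously, one step at a time):

r = (20(4937.6) − (1308)(75.1)) / SQRT( (20(85912) − (1308)²) · (20(285.45) − (75.1)²) )

r = (98752 − 98230.8) / SQRT( (1718240 − 1710864) · (5709 − 5640.01) )

r = 521.2 / SQRT( 7376 · 68.99 )

r = 521.2 / SQRT( 508870.24 )

r = 521.2 / 713.35

r = .73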

So, the correlation for our twenty cases is .73, which is a fairly strong positive

relationship. I guess there is a relationship between height and self esteem, at least in

this made up data!
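
The same computation can be done in a few lines of code. This sketch applies the sums-based formula to the twenty cases listed in the table; the variable and function names are illustrative.

```python
from math import sqrt

heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def pearson_r(x, y):
    """Pearson correlation from the column sums (N, sum x, sum y, sum xy, sum x^2, sum y^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

r = pearson_r(heights, self_esteem)
print(f"r = {r:.2f}")  # 0.73
```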

Once you've computed a correlation, you can determine the probability that the

observed correlation occurred by chance. That is, you can conduct a significance test.

Most often you are interested in determining the probability that the correlation is a real

one and not a chance occurrence. In this case, you are testing the mutually

exclusive hypotheses:

Null Hypothesis: r = 0

Alternative Hypothesis: r ≠ 0

The easiest way to test this hypothesis is to find a statistics book that has a table

of critical values of r. Most introductory statistics texts would have a table like this. As in

all hypotheses testing, you need to first determine the significance level. Here, I'll use

the common significance level of alpha = .05. This means that I am conducting a test

where the odds that the correlation is a chance occurrence are no more than 5 out of

100. Before I look up the critical value in a table I also have to compute the degrees of

freedom or df. The df is simply equal to N-2 or, in this example, is 20-2 = 18. Finally, I

have to decide whether I am doing a one-tailed or two-tailed test. In this example, since

I have no strong prior theory to suggest whether the relationship between height and

self esteem would be positive or negative, I'll opt for the two-tailed test. With these three

pieces of information -- the significance level (alpha = .05), degrees of freedom (df =

18), and type of test (two-tailed) -- I can now test the significance of the correlation I

found. When I look up this value in the handy little table at the back of my statistics book

I find that the critical value is .4438. This means that if my correlation is greater than

.4438 or less than -.4438 (remember, this is a two-tailed test) I can conclude that the

odds are less than 5 out of 100 that this is a chance occurrence. Since my correlation of

.73 is actually quite a bit higher, I conclude that it is not a chance finding and that the

correlation is "statistically significant" (given the parameters of the test). I can reject the

null hypothesis and accept the alternative.
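
The critical value of r in such tables can be derived from the t distribution. The sketch below assumes the usual relationship t = r·SQRT(df / (1 − r²)) and takes the two-tailed .05 critical t for 18 degrees of freedom (2.101) from a standard t-table, since the Python standard library has no t distribution; the names are illustrative.

```python
from math import sqrt

# Two-tailed critical t for alpha = .05 and df = 18, taken from a standard t-table
T_CRIT_DF18 = 2.101

def r_to_t(r: float, df: int) -> float:
    """t statistic for testing whether a correlation differs from zero."""
    return r * sqrt(df / (1 - r * r))

def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| that reaches significance, obtained by inverting r_to_t."""
    return t_crit / sqrt(t_crit ** 2 + df)

df = 20 - 2
r_crit = critical_r(T_CRIT_DF18, df)
t_stat = r_to_t(0.73, df)
print(f"critical r = {r_crit:.4f}")   # about .4438
print(f"t for r = .73: {t_stat:.2f}, significant: {t_stat > T_CRIT_DF18}")
```

The derived critical r matches the tabled value of .4438, and the observed correlation of .73 exceeds it comfortably.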

All I've shown you so far is how to compute a correlation between two variables.

In most studies we have considerably more than two variables. Let's say we have a

study with 10 interval-level variables and we want to estimate the relationships among

all of them (i.e., between all possible pairs of variables). In this instance, we have 45

unique correlations to estimate (more later on how I knew that!). We could do the above

computations 45 times to obtain the correlations. Or we could use just about any

statistics program to automatically compute all 45 with a simple click of the mouse.

I used a simple statistics program to generate random data for 10 variables with 20

cases (i.e., persons) for each variable. Then, I told the program to compute the

correlations among these variables. Here's the result:

        C1      C2      C3      C4      C5      C6      C7      C8      C9      C10
C1    1.000
C2    0.274   1.000
C3   -0.134  -0.269   1.000
C4    0.201  -0.153   0.075   1.000
C5   -0.129  -0.166   0.278  -0.011   1.000
C6   -0.095   0.280  -0.348  -0.378  -0.009   1.000
C7    0.171  -0.122   0.288   0.086   0.193   0.002   1.000
C8    0.219   0.242  -0.380  -0.227  -0.551   0.324  -0.082   1.000
C9    0.518   0.238   0.002   0.082  -0.015   0.304   0.347  -0.013   1.000
C10   0.299   0.568   0.165  -0.122  -0.106  -0.169   0.243   0.014   0.352   1.000

This type of table is called a correlation matrix. It lists the variable names (C1-

C10) down the first column and across the first row. The diagonal of a correlation matrix

(i.e., the numbers that go from the upper left corner to the lower right) always consists of

ones. That's because these are the correlations between each variable and itself (and a

variable is always perfectly correlated with itself). This statistical program only shows

the lower triangle of the correlation matrix. In every correlation matrix there are two

triangles that are the values below and to the left of the diagonal (lower triangle) and

above and to the right of the diagonal (upper triangle). There is no reason to print both

triangles because the two triangles of a correlation matrix are always mirror images of

each other (the correlation of variable x with variable y is always equal to the correlation

of variable y with variable x). When a matrix has this mirror-image quality above and

below the diagonal we refer to it as a symmetric matrix. A correlation matrix is always a

symmetric matrix.

To locate the correlation for any pair of variables, find the value in the table for

the row and column intersection for those two variables. For instance, to find the

correlation between variables C5 and C2, I look for where row C2 and column C5 is (in

this case it's blank because it falls in the upper triangle area) and where row C5 and

column C2 is and, in the second case, I find that the correlation is -.166.

OK, so how did I know that there are 45 unique correlations when we have 10

variables? There's a handy simple little formula that tells how many pairs (e.g.,

correlations) there are for any number of variables:

N * (N - 1) / 2

where N is the number of variables. In the example, I had 10 variables, so I know I have

(10 * 9)/2 = 90/2 = 45 pairs.
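
To see where the 45 comes from, the sketch below builds the lower triangle of a correlation matrix for 10 variables of randomly generated data (20 cases each) and counts the off-diagonal entries. The data are random and purely illustrative, so the correlations will not match the matrix shown earlier.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation computed from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lower_triangle(columns):
    """Row i holds the correlations of variable i with variables 0..i (diagonal included)."""
    return [[pearson(columns[i], columns[j]) for j in range(i + 1)]
            for i in range(len(columns))]

random.seed(0)  # fixed seed so the illustration is repeatable
data = [[random.gauss(0, 1) for _ in range(20)] for _ in range(10)]
matrix = lower_triangle(data)

# Every row contributes its entries below the diagonal
unique_pairs = sum(len(row) - 1 for row in matrix)
print(unique_pairs, 10 * 9 // 2)  # both are 45
```

Only the lower triangle is computed because the correlation of x with y equals the correlation of y with x.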

Other Correlations

The specific type of correlation I've illustrated here is known as the Pearson

Product Moment Correlation. It is appropriate when both variables are measured at

an interval level. However there are a wide variety of other types of correlations for

other circumstances. For instance, if you have two ordinal variables, you could use the

Spearman Rank Order Correlation (rho) or the Kendall Rank Order Correlation (tau).

When one measure is a continuous interval level one and the other is dichotomous (i.e.,

two-category) you can use the Point-Biserial Correlation.

Regression Analysis

Simple regression is used to examine the relationship between one dependent

and one independent variable. After performing an analysis, the regression statistics

can be used to predict the dependent variable when the independent variable is known.

Regression goes beyond correlation by adding prediction capabilities.

People use regression on an intuitive level every day. In business, a well-dressed

man is thought to be financially successful. A mother knows that more sugar in her

children's diet results in higher energy levels. The ease of waking up in the morning

often depends on how late you went to bed the night before. Quantitative regression

adds precision by developing a mathematical formula that can be used for predictive

purposes.

For example, a medical researcher might want to use body weight (independent

variable) to predict the most appropriate dose for a new drug (dependent variable). The

purpose of running the regression is to find a formula that fits the relationship between

the two variables. Then you can use that formula to predict values for the dependent

variable when only the independent variable is known. A doctor could prescribe the

proper dose based on a person's body weight.

The regression line (known as the least squares line) is a plot of the expected

value of the dependent variable for all values of the independent variable. Technically, it

is the line that minimizes the sum of the squared residuals. The regression line is the one that

best fits the data on a scatterplot.

Using the regression equation, the dependent variable may be predicted from the

independent variable. The slope of the regression line (b) is defined as the rise divided

by the run. The y intercept (a) is the point on the y axis where the regression line would

intercept the y axis. The slope and y intercept are incorporated into the regression

equation. The intercept is usually called the constant, and the slope is referred to as the

coefficient. Since the regression model is usually not a perfect predictor, there is also an

error term in the equation.

the independent variable. Here are three equivalent ways to mathematically describe a

linear regression model.

y = a + bx + e
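To make the equation concrete, the following sketch estimates the intercept (a) and slope (b) by ordinary least squares. The function name `fit_line` and the four data points are hypothetical, chosen so that the points lie exactly on y = 1 + 2x and the error term is zero.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: rise over run
    a = mean_y - b * mean_x  # intercept: where the line crosses the y axis
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(x, y)
print(a, b)  # 1.0 2.0
```

With real data the points scatter around the line, and the residuals (observed y minus a + b*x) are the sample counterpart of the error term e.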

The significance of the slope of the regression line is determined from the

t-statistic. The associated p-value is the probability of obtaining a slope (or, equivalently, a

correlation coefficient) at least as extreme as the one observed if the true correlation is zero.

Some researchers prefer to report the F-ratio instead of the t-statistic. In simple regression,

the F-ratio is equal to the t-statistic squared.

The t-statistic for the significance of the slope is essentially a test to determine if

the regression model (equation) is usable. If the slope is significantly different than zero,

then we can use the regression model to predict the dependent variable for any value of

the independent variable.

On the other hand, take an example where the slope is zero. It has no prediction

ability because for every value of the independent variable, the prediction for the

dependent variable would be the same. Knowing the value of the independent variable

would not improve our ability to predict the dependent variable. Thus, if the slope is not

significantly different than zero, don't use the model to make predictions.

The coefficient of determination (r-squared) is the square of the correlation

coefficient. Its value may vary from zero to one. It has the advantage over the

correlation coefficient in that it may be interpreted directly as the proportion of variance

in the dependent variable that can be accounted for by the regression equation. For

example, an r-squared value of .49 means that 49% of the variance in the dependent

variable can be explained by the regression equation. The other 51% is unexplained.

The standard error of the estimate for regression measures the amount of

variability in the points around the regression line. It is the standard deviation of the data

points as they are distributed around the regression line. The standard error of the

estimate can be used to develop confidence intervals around a prediction.
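The two summary measures just described can be computed directly from the residuals. The sketch below is illustrative only: the five data points and the function name `regression_summary` are hypothetical, and the standard error of the estimate is taken as the square root of the residual sum of squares divided by n - 2, the usual convention for simple regression.

```python
import math


def regression_summary(x, y):
    """Return (r_squared, standard_error_of_estimate) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Unexplained variation: squared distances from the regression line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation: squared distances from the mean of y.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - sse / sst
    see = math.sqrt(sse / (n - 2))
    return r_squared, see


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_squared, see = regression_summary(x, y)
print(round(r_squared, 3))  # 0.6
print(round(see, 3))        # 0.894
```

Here 60% of the variance in y is accounted for by the line and 40% is unexplained.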

Example

advertising expenditures and its sales volume. The independent variable is advertising

budget and the dependent variable is sales volume. A lag time of one month will be

used because sales are expected to lag behind actual advertising expenditures. Data

was collected for a six month period. All figures are in thousands of dollars. Is there a

significant relationship between advertising budget and sales volume?

| Advertising ($000) | Sales ($000) |
|---|---|
| 4.2 | 27.1 |
| 6.1 | 30.4 |
| 3.9 | 25.0 |
| 5.7 | 29.7 |
| 7.3 | 40.1 |
| 5.9 | 28.8 |

--------------------------------------------------

Standard error of the estimate = 2.568

t-test for the significance of the slope = 4.095

Degrees of freedom = 4

Two-tailed probability = .0149

r-squared = .807

You might make a statement in a report like this: A simple linear regression was

performed on six months of data to determine if there was a significant relationship

between advertising expenditures and sales volume. The t-statistic for the slope was

significant at the .05 critical alpha level, t(4)=4.10, p=.015. Thus, we reject the null

hypothesis and conclude that there was a significant positive relationship between

advertising expenditures and sales volume. Furthermore, 80.7% of the variability in

sales volume could be explained by advertising expenditures.
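The reported statistics can be cross-checked against one another. In simple regression the t-statistic for the slope satisfies t = sqrt(r-squared * df / (1 - r-squared)), and the F-ratio equals t squared. The sketch below uses only the values printed in the output above; the variable names are assumptions, and the small gap between the two t values is what rounding r-squared to three decimals would produce.

```python
import math

# Values as reported in the example output.
r_squared = 0.807
df = 4
t_reported = 4.095

# t recovered from r-squared and the degrees of freedom.
t_from_r2 = math.sqrt(r_squared * df / (1.0 - r_squared))

# Equivalent F-ratio for the same test.
f_ratio = t_reported ** 2

print(round(t_from_r2, 2))  # 4.09
print(round(f_ratio, 2))    # 16.77
```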

ANOVA

In statistics, analysis of variance (ANOVA) is a collection of statistical models,

and their associated procedures, in which the observed variance in a particular variable

is partitioned into components due to different sources of variation. In its simplest form

ANOVA provides a statistical test of whether or not the means of several groups are all

equal, and therefore generalizes Student's two-sample t-test to more than two groups.

ANOVAs are helpful because they possess an advantage over a two-sample t-test.

Doing multiple two-sample t-tests would result in an increased chance of committing a

type I error. For this reason, ANOVAs are useful in comparing three or more means.
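The inflation of the type I error rate can be illustrated with a small calculation. This is a sketch under a simplifying assumption that is not in the text: the pairwise tests are treated as independent, which is only approximately true when they share groups. The function name `familywise_error` is hypothetical.

```python
from math import comb


def familywise_error(n_groups, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one type I error).

    Assumes the pairwise tests are independent (a simplification).
    """
    n_tests = comb(n_groups, 2)
    return n_tests, 1.0 - (1.0 - alpha) ** n_tests


tests_3, error_3 = familywise_error(3)
tests_5, error_5 = familywise_error(5)

print(tests_3, round(error_3, 3))  # 3 0.143
print(tests_5, round(error_5, 3))  # 10 0.401
```

With three groups the chance of at least one false rejection is already about 14% rather than 5%.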

1. Fixed-effects models assume that the data came from normal populations which

may differ only in their means. (Model 1)

2. Random effects models assume that the data describe a hierarchy of different

populations whose differences are constrained by the hierarchy. (Model 2)

3. Mixed-effect models describe the situations where both fixed and random effects

are present. (Model 3)

In practice, there are several types of ANOVA, depending on the number of treatments

and the way they are applied to the subjects in the experiment:

• One-way ANOVA is used to test for differences among two or more independent

groups. Typically, however, the one-way ANOVA is used to test for differences

among at least three groups, since the two-group case can be covered by a t-test

(Gosset, 1908). When there are only two means to compare, the t-test and the

ANOVA F-test are equivalent; the relation between ANOVA and t is given by

F = t².

• Factorial ANOVA is used when the experimenter wants to study the effects of

two or more treatment variables. The most commonly used type of factorial

ANOVA is the 2×2 (read "two by two") design, where there are two independent

variables and each variable has two levels or distinct values. However, such use

of ANOVA for analysis of 2^k factorial designs and fractional factorial designs is

"confusing and makes little sense"; instead it is suggested to refer the value of

the effect divided by its standard error to a t-table. Factorial ANOVA can also be

multi-level such as 3×3, etc. or higher order such as 2×2×2, etc. Since the

introduction of data analytic software, the utilization of higher order designs and

analyses has become quite common.

• Repeated measures ANOVA is used when the same subjects are used for each

treatment (e.g., in a longitudinal study). Note that such within-subjects designs

can be subject to carry-over effects.

• Mixed-design ANOVA. When one wishes to test two or more independent

groups subjecting the subjects to repeated measures, one may perform a

factorial mixed-design ANOVA, in which one factor is a between-subjects

variable and the other is within-subjects variable. This is a type of mixed-effect

model.

• Multivariate analysis of variance (MANOVA) is used when there is more than

one dependent variable.
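The equivalence between the one-way ANOVA F-test and the two-sample t-test noted in the first item of the list can be checked numerically. The sketch below computes both statistics for two small hypothetical groups; the function names and data are assumptions, and the t-test is the pooled-variance (equal variances) version.

```python
def one_way_f(groups):
    """F statistic for a one-way fixed-effects ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def pooled_t(g1, g2):
    """Two-sample t statistic with pooled variance."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((v - m1) ** 2 for v in g1)
    ss2 = sum((v - m2) ** 2 for v in g2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5


# Hypothetical data for two independent groups.
group_a = [4.0, 5.0, 6.0]
group_b = [7.0, 9.0, 11.0]

f_value = one_way_f([group_a, group_b])
t_value = pooled_t(group_a, group_b)

print(round(f_value, 4))       # 9.6
print(round(t_value ** 2, 4))  # 9.6
```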

goodness of fit and tests of independence. A test of goodness of fit establishes whether

or not an observed frequency distribution differs from a theoretical distribution. A test of

independence assesses whether paired observations on two variables, expressed in a

contingency table, are independent of each other – for example, whether people from

different regions differ in the frequency with which they report that they support a

political candidate.

The first step in the chi-square test is to calculate the chi-square statistic. In order

to avoid ambiguity, the value of the test-statistic is denoted by Χ² rather than χ² (i.e.

uppercase chi instead of lowercase); this also serves as a reminder that the distribution

of the test statistic is not exactly that of a chi-square random variable. However some

authors do use the χ² notation for the test statistic. An exact test which does not rely on

using the approximate χ² distribution is Fisher's exact test: this is substantially more

accurate in evaluating the significance level of the test, especially with small numbers of

observations.

The chi-square statistic is calculated by finding the difference between each

observed and theoretical frequency for each possible outcome, squaring them, dividing

each by the theoretical frequency, and taking the sum of the results. A second important

part of determining the test statistic is to define the degrees of freedom of the test: this

is essentially the number of observed frequencies adjusted for the effect of using some

of those observations to define the "theoretical frequencies".
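The calculation can be sketched for a goodness-of-fit test using the coin example from the start of these notes (40 Heads and 10 Tails in 50 flips, against a fair-coin expectation of 25 each). The function name `chi_square_statistic` is hypothetical; the degrees of freedom are taken as the number of categories minus one, because the expected counts are fixed only by the total number of flips.

```python
def chi_square_statistic(observed, expected):
    """Sum over outcomes of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin example: 50 flips, 40 Heads and 10 Tails.
observed = [40, 10]
# Theoretical frequencies under the null hypothesis P = 0.5.
expected = [25.0, 25.0]

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

print(statistic)           # 18.0
print(degrees_of_freedom)  # 1
```

A statistic of 18 on 1 degree of freedom is far beyond the conventional .05 critical value of 3.84, in line with the earlier conclusion that the coin is probably not fair.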

Nonparametric Statistics

General Purpose:

first used by Wolfowitz, 1942) first requires a basic understanding of parametric

statistics. Elementary Concepts introduces the concept of statistical significance

testing based on the sampling distribution of a particular statistic (you may want to

review that topic before reading on). In short, if we have a basic knowledge of the

underlying distribution of a variable, then we can make predictions about how, in

repeated samples of equal size, this particular statistic will "behave," that is, how it is

distributed. For example, if we draw 100 random samples of 100 adults each from the

general population, and compute the mean height in each sample, then the distribution

of the standardized means across samples will likely approximate the normal

distribution (to be precise, Student's t distribution with 99 degrees of freedom; see

below). Now imagine that we take an additional sample in a particular city ("Tallburg")

where we suspect that people are taller than the average population. If the mean height

in that sample falls outside the upper 95% tail area of the t distribution then we conclude

that, indeed, the people of Tallburg are taller than the average population.

equal size, the standardized means (for height) will be distributed following the t

distribution (with a particular mean and variance). However, this will only be true if in the

population the variable of interest (height in our example) is normally distributed, that is,

if the distribution of people of particular heights follows the normal distribution (the bell-

shape distribution).

For many variables of interest, we simply do not know for sure that this is the

case. For example, is income distributed normally in the population? -- probably not.

The incidence rates of rare diseases are not normally distributed in the population, the

number of car accidents is also not normally distributed, and neither are very many

other variables in which a researcher might be interested.

For more information on the normal distribution, see Elementary Concepts; for

information on tests of normality, see Normality tests.

Sample size

Another factor that often limits the applicability of tests based on the assumption

that the sampling distribution is normal is the size of the sample of data available for the

analysis (sample size; n). We can assume that the sampling distribution is normal even

if we are not sure that the distribution of the variable in the population is normal, as long

as our sample is large enough (e.g., 100 or more observations). However, if our sample

is very small, then those tests can be used only if we are sure that the variable is

normally distributed, and there is no way to test this assumption if the sample is small.

Problems in measurement

Applications of tests that are based on the normality assumptions are further

limited by a lack of precise measurement. For example, let us consider a study where

grade point average (GPA) is measured as the major variable of interest. Is an A

average twice as good as a C average? Is the difference between a B and an A

average comparable to the difference between a D and a C average? Somehow, the

GPA is a crude measure of scholastic accomplishments that only allows us to establish

a rank ordering of students from "good" students to "poor" students. This general

measurement issue is usually discussed in statistics textbooks in terms of types of

measurement or scale of measurement. Without going into too much detail, most

common statistical techniques such as analysis of variance (and t-tests), regression,

etc., assume that the underlying measurements are at least of interval scale, meaning that

equally spaced intervals on the scale can be compared in a meaningful manner (e.g., B

minus A is equal to D minus C). However, as in our example, this assumption is very

often not tenable, and the data rather represent a rank ordering of observations

(ordinal) rather than precise measurements.

Hopefully, after this somewhat lengthy introduction, the need is evident for

statistical procedures that enable us to process data of "low quality," from small

samples, on variables about which nothing is known (concerning their distribution).

Specifically, nonparametric methods were developed to be used in cases when the

researcher knows nothing about the parameters of the variable of interest in the

population (hence the name nonparametric). In more technical terms, nonparametric

methods do not rely on the estimation of parameters (such as the mean or the standard

deviation) describing the distribution of the variable of interest in the population.

Therefore, these methods are also sometimes (and more appropriately) called

parameter-free methods or distribution-free methods.

Basically, there is at least one nonparametric equivalent for each general type of parametric

test. In general, these tests fall into the following categories:

Tests of differences between groups (independent samples);

Tests of differences between variables (dependent samples);

Tests of relationships between variables.

Usually, when we have two samples that we want to compare concerning their

mean value for some variable of interest, we would use the t-test for independent

samples; nonparametric alternatives for this test are the Wald-Wolfowitz runs test, the

Mann-Whitney U test, and the Kolmogorov-Smirnov two-sample test. If we have multiple

groups, we would use analysis of variance (see ANOVA/MANOVA); the nonparametric

equivalents to this method are the Kruskal-Wallis analysis of ranks and the Median test.
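To give a feel for how one of these alternatives works, here is a minimal sketch of the Mann-Whitney U statistic for two independent samples. It only counts, over all cross-sample pairs, how often a value from one sample exceeds a value from the other (ties count one half); it does not compute a p-value. The function name and the data are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): pairwise 'wins' for each sample; ties count 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # Every cross-sample pair is a win for one side or a split tie.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical scores for two independent groups.
sample_a = [12, 15, 11, 18]
sample_b = [14, 20, 22, 19, 17]

u_a, u_b = mann_whitney_u(sample_a, sample_b)
print(u_a, u_b)  # 3.0 17.0
```

A very small U for one sample indicates that its values tend to fall below those of the other sample. Because only the ordering of the values matters, no assumption about the shape of the distribution is needed.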

customarily use the t-test for dependent samples (in Basic Statistics for example, if

we wanted to compare students' math skills at the beginning of the semester with their

skills at the end of the semester). Nonparametric alternatives to this test are the Sign

test and Wilcoxon's matched pairs test. If the variables of interest are dichotomous in

nature (i.e., "pass" vs. "no pass") then McNemar's Chi-square test is appropriate. If

there are more than two variables that were measured in the same sample, then we

would customarily use repeated measures ANOVA. Nonparametric alternatives to this

method are Friedman's two-way analysis of variance and Cochran Q test (if the variable

was measured in terms of categories, e.g., "passed" vs. "failed"). Cochran Q is

particularly useful for measuring changes in frequencies (proportions) across time.
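The Sign test mentioned above can be sketched in a few lines. It looks only at the direction of each paired difference and refers the count of positive differences to a binomial distribution with p = 0.5. The function name and the before/after scores are hypothetical; pairs with no change are dropped, which is the usual convention.

```python
from math import comb


def sign_test_p(before, after):
    """Two-sided exact Sign test p-value for paired observations."""
    # Keep only pairs that changed; zero differences carry no sign.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    n_positive = sum(d > 0 for d in diffs)
    k = min(n_positive, n - n_positive)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical math scores at the start and end of a semester.
before = [60, 72, 55, 80, 66, 70, 58, 75]
after = [68, 75, 54, 88, 70, 79, 65, 75]

p_value = sign_test_p(before, after)
print(p_value)  # 0.125
```

Six of the seven changed scores improved, yet the p-value is .125, which illustrates the limited power of the Sign test in small samples.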

correlation coefficient. Nonparametric equivalents to the standard correlation coefficient

are Spearman R, Kendall Tau, and coefficient Gamma (see Nonparametric

correlations). If the two variables of interest are categorical in nature (e.g., "passed" vs.

"failed" by "male" vs. "female") appropriate nonparametric statistics for testing the

relationship between the two variables are the Chi-square test, the Phi coefficient, and

the Fisher exact test. In addition, a simultaneous test for relationships between multiple

cases is available: Kendall coefficient of concordance. This test is often used for

expressing inter-rater agreement among independent judges who are rating (ranking)

the same stimuli.

Descriptive statistics

When one's data are not normally distributed, and the measurements at best

contain rank order information, then computing the standard descriptive statistics (e.g.,

mean, standard deviation) is sometimes not the most informative way to summarize the

data. For example, in the area of psychometrics it is well known that the rated intensity

of a stimulus (e.g., perceived brightness of a light) is often a logarithmic function of the

actual intensity of the stimulus (brightness as measured in objective units of Lux). In this

example, the simple mean rating (sum of ratings divided by the number of stimuli) is not

an adequate summary of the average actual intensity of the stimuli. (In this example,

one would probably rather compute the geometric mean.) Nonparametrics and

Distributions will compute a wide variety of measures of location (mean, median,

mode, etc.) and dispersion (variance, average deviation, quartile range, etc.) to provide

the "complete picture" of one's data.
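The difference between these measures of location is easy to see on skewed data. The sketch below uses Python's standard `statistics` module on hypothetical intensity values that span several orders of magnitude.

```python
from statistics import geometric_mean, mean, median

# Hypothetical stimulus intensities spanning several orders of magnitude.
intensities = [1.0, 10.0, 100.0, 1000.0]

arithmetic = mean(intensities)
geometric = geometric_mean(intensities)
middle = median(intensities)

print(arithmetic)           # 277.75
print(round(geometric, 2))  # 31.62
print(middle)               # 55.0
```

The arithmetic mean is dominated by the largest value, while the geometric mean averages on the logarithmic scale and gives a more representative summary when the scale is multiplicative.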

procedures. Each nonparametric procedure has its peculiar sensitivities and blind spots.

For example, the Kolmogorov-Smirnov two-sample test is not only sensitive to

differences in the location of distributions (for example, differences in means) but is also

greatly affected by differences in their shapes. The Wilcoxon matched pairs test

assumes that one can rank order the magnitude of differences in matched observations

in a meaningful manner. If this is not the case, one should rather use the Sign test. In

general, if the result of a study is important (e.g., does a very expensive and painful

drug therapy help people get better?), then it is always advisable to run different

nonparametric tests; should discrepancies in the results occur contingent upon which

test is used, one should try to understand why some tests give different results. On the

other hand, nonparametric statistics are less statistically powerful (sensitive) than their

parametric counterparts, and if it is important to detect even small effects (e.g., is this

food additive harmful to people?) one should be very careful in the choice of a test

statistic.

Nonparametric methods are most appropriate when the sample sizes are small.

When the data set is large (e.g., n > 100) it often makes little sense to use

nonparametric statistics at all. Elementary Concepts briefly discusses the idea of the

central limit theorem. In a nutshell, when the samples become very large, then the

sample means will follow the normal distribution even if the respective variable is not

normally distributed in the population, or is not measured very well. Thus, parametric

methods, which are usually much more sensitive (i.e., have more statistical power) are

in most cases appropriate for large samples. However, the tests of significance of many

of the nonparametric statistics described here are based on asymptotic (large sample)

theory; therefore, meaningful tests can often not be performed if the sample sizes

become too small. Please refer to the descriptions of the specific tests to learn more

about their power and efficiency.

Nonparametric Correlations

coefficients (Spearman R, Kendall Tau, and Gamma coefficients). Note that the chi-

square statistic computed for two-way frequency tables, also provides a careful

measure of a relation between the two (tabulated) variables, and unlike the correlation

measures listed below, it can be used for variables that are measured on a simple

nominal scale.

Spearman R

Spearman R (Siegel & Castellan, 1988) assumes that the variables under

consideration were measured on at least an ordinal (rank order) scale, that is, that the

individual observations can be ranked into two ordered series. Spearman R can be

thought of as the regular Pearson product moment correlation coefficient, that is, in

terms of proportion of variability accounted for, except that Spearman R is computed

from ranks.
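The description of Spearman R as a Pearson correlation computed from ranks translates directly into code. In the sketch below, the helper names (`ranks`, `pearson`, `spearman`) and the data are hypothetical; tied values receive the average of the ranks they span.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman R: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y increases with x except for one swapped pair.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 100, 25]

print(round(spearman(x, y), 4))  # 0.9
```

The extreme value 100 has no special influence here, because only its rank enters the calculation.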

Kendall tau

assumptions. It is also comparable in terms of its statistical power. However, Spearman

R and Kendall tau are usually not identical in magnitude because their underlying logic

as well as their computational formulas are very different. Siegel and Castellan (1988)

express the relationship of the two measures in terms of the inequality -1 ≤ 3 * Kendall tau - 2 * Spearman R ≤ 1. More

importantly, Kendall tau and Spearman R imply different interpretations: Spearman R

can be thought of as the regular Pearson product moment correlation coefficient, that is,

in terms of proportion of variability accounted for, except that Spearman R is computed

from ranks. Kendall tau, on the other hand, represents a probability, that is, it is the

difference between the probability that in the observed data the two variables are in the

same order versus the probability that the two variables are in different orders.
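The probability interpretation of Kendall tau can be shown by counting pairs of observations. The sketch below computes the simplest variant (often called tau-a), which makes no adjustment for ties; the function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # the pair is ordered the same way on x and y
        elif product < 0:
            discordant += 1   # the pair is ordered in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: y increases with x except for one swapped pair.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 100, 25]

tau = kendall_tau_a(x, y)
print(tau)  # 0.8
```

Nine of the ten pairs are in the same order and one is reversed, so tau is 0.9 minus 0.1. The Spearman R for the same data is 0.9, which illustrates that the two coefficients usually differ in magnitude.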

Gamma

Kendall tau when the data contain many tied observations. In terms of the underlying

assumptions, Gamma is equivalent to Spearman R or Kendall tau; in terms of its

interpretation and computation it is more similar to Kendall tau than Spearman R. In

short, Gamma is also a probability; specifically, it is computed as the difference between

the probability that the rank ordering of the two variables agree minus the probability

that they disagree, divided by 1 minus the probability of ties. Thus, Gamma is basically

equivalent to Kendall tau, except that ties are explicitly taken into account.
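The verbal definition of Gamma can be followed step by step in code. The sketch below is a plain pair-counting version in which a pair counts as tied if it is tied on either variable; the function name and data are hypothetical, and the result is undefined if every pair is tied.

```python
from itertools import combinations


def gamma_coefficient(x, y):
    """(P(agree) - P(disagree)) / (1 - P(tie)), computed over all pairs."""
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        else:
            tied += 1  # tied on x, on y, or on both
    n_pairs = concordant + discordant + tied
    p_agree = concordant / n_pairs
    p_disagree = discordant / n_pairs
    p_tie = tied / n_pairs
    # Division by zero would occur only if every pair were tied.
    return (p_agree - p_disagree) / (1.0 - p_tie)


# Hypothetical ordinal ratings with many ties.
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 1, 3, 2, 3]

gamma = gamma_coefficient(x, y)
print(round(gamma, 4))  # 0.5556
```

Of the 15 pairs, 7 agree, 2 disagree, and 6 are tied, so Gamma is (7 - 2) / (7 + 2). Dividing by all 15 pairs instead would give 0.333, which shows how setting the tied pairs aside changes the result.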

Parametric tests

Conventional statistical procedures are also called parametric tests. In a parametric

test a sample statistic is obtained to estimate the population parameter. Because this

estimation process involves a sample, a sampling distribution, and a population, certain

parametric assumptions are required to ensure all components are compatible with

each other. For example, in Analysis of Variance (ANOVA) there are three assumptions:

• Observations are independent.

• The sample data have a normal distribution.

• Scores in different groups have homogeneous variances.

In a repeated measure design, it is assumed that the data structure conforms to the

compound symmetry. A regression model assumes the absence of collinearity, the

absence of autocorrelation, random residuals, linearity, etc. In structural equation

modeling, the data should be multivariate normal.

comparing means in terms of variance with reference to a normal distribution. The

inventor of ANOVA, Sir R. A. Fisher (1935) clearly explained the relationship among the

mean, the variance, and the normal distribution: "The normal distribution has only two

characteristics, its mean and its variance. The mean determines the bias of our

estimate, and the variance determines its precision." (p.42) It is generally known that the

estimation is more precise as the variance becomes smaller and smaller.

Parametric statistics is a branch of statistics that assumes data has come from a type

of probability distribution and makes inferences about the parameters of the distribution.

[1]

Most well-known elementary statistical methods are parametric.[2]

parametric methods.[3] If those extra assumptions are correct, parametric methods can

produce more accurate and precise estimates. They are said to have more statistical

power. However, if those assumptions are incorrect, parametric methods can be very

misleading. For that reason they are often not considered robust. On the other hand,

parametric formulae are often simpler to write down and faster to compute. In some, but

definitely not all cases, their simplicity makes up for their non-robustness, especially if

care is taken to examine diagnostic statistics.[4]

Because parametric statistics require a probability distribution, they are not distribution-

free.[5]

Example

Suppose we have a sample of 99 test scores with a mean of 100 and a standard

deviation of 10. If we assume all 99 test scores are random samples from a normal

distribution we predict there is a 1% chance that the 100th test score will be higher than

123.65 (that is the mean plus 2.365 standard deviations) assuming that the 100th test

score comes from the same distribution as the others. The normal family of distributions

all have the same shape and are parameterized by mean and standard deviation. That

means if you know the mean and standard deviation, and that the distribution is normal,

you know the probability of any future observation. Parametric statistical methods are

used to compute the 2.365 value above, given 99 independent observations from the

same normal distribution.

scores. We don't need to assume anything about the distribution of test scores to

reason that before we gave the test it was equally likely that the highest score would be

any of the first 100. Thus there is a 1% chance that the 100th is higher than any of the

99 that preceded it.
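The rank argument can be checked by simulation. The sketch below draws scores from a skewed (exponential) distribution, chosen arbitrarily to stress that the result does not depend on normality, and estimates how often the 100th score is the highest. The function name, the distribution, the number of trials, and the seed are all assumptions made for illustration.

```python
import random


def prob_last_is_max(n=100, trials=20000, seed=0):
    """Estimate the chance that the last of n independent scores is the largest."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Scores from a skewed, clearly non-normal distribution.
        scores = [rng.expovariate(1.0) for _ in range(n)]
        if scores[-1] == max(scores):
            hits += 1
    return hits / trials


estimate = prob_last_is_max()
print(estimate)  # close to 0.01
```

Replacing the exponential draws with any other continuous distribution should leave the estimate near 1%, since the argument uses only the ordering of the scores.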

There are two types of test data and consequently different types of analysis. As

the table below shows, parametric data has an underlying normal distribution which

allows for more conclusions to be drawn as the shape can be mathematically

described. Anything else is non-parametric.

| | Parametric | Non-parametric |
|---|---|---|
| Assumed distribution | Normal | Any |
| Assumed variance | Homogeneous | Any |
| Typical data | Ratio or Interval | Ordinal or Nominal |
| Data set relationships | Independent | Any |
| Usual central measure | Mean | Median |
| Benefits | Can draw more conclusions | Simplicity; Less affected by outliers |

Tests

| Choosing | Choosing parametric test | Choosing a non-parametric test |
|---|---|---|
| Correlation test | Pearson | Spearman |
| Independent measures, 2 groups | Independent-measures t-test | Mann-Whitney test |
| Independent measures, >2 groups | One-way, independent-measures ANOVA | Kruskal-Wallis test |
| Repeated measures, 2 conditions | Matched-pair t-test | Wilcoxon test |
| Repeated measures, >2 conditions | One-way, repeated measures ANOVA | Friedman's test |
