
Statistical Tests

1. Statistical Hypothesis Testing

2. Relationships
 2.1 Linear Relationship
 2.2 Non-Linear Relationship

3. Correlations
 3.1 Pearson Product-Moment Correlation
 3.2 Spearman rho
 3.3 Partial Correlation
 3.4 Correlation and Causation

4. Regression
 4.1 Linear Regression
 4.2 Multiple Regression
 4.3 Correlation and Regression

5. Student's T-Test
 5.1 Independent One-Sample T-Test
 5.2 Independent Two-Sample T-Test
 5.3 Dependent T-Test for Paired Samples
 5.4 Another article on Student's T-Test

6. ANOVA
 6.1 One-Way ANOVA
 6.2 Two-Way ANOVA
 6.3 Factorial ANOVA
 6.4 Repeated Measures ANOVA

7. Nonparametric Statistics
 7.1 Cohen's Kappa
 7.2 Mann-Whitney U-Test
 7.3 Wilcoxon Signed Rank Test

8. Other Ways to Analyze Data
 8.1 Chi Square Test
 8.2 Z-Test
 8.3 F-Test
 8.4 Factor Analysis
 8.5 ROC Curve Analysis
 8.6 Meta Analysis

1. Statistical Hypothesis Testing

Statistical hypothesis testing is used to determine whether an experiment provides enough evidence to reject a proposition.

It is also used to rule out chance as an explanation for the outcome of an experiment and to establish the validity of its relationship with the event under consideration.

For example, suppose you want to study the effect of smoking on the occurrence of lung
cancer cases. If you take a small group, it may happen that there appears no correlation at all,
and you find that there are many smokers with healthy lungs and many non-smokers with
lung cancer.

However, this may simply have happened by chance, and the pattern may not hold in the overall population. In order to remove this element of chance and increase the reliability of our hypothesis, we use statistical hypothesis testing.

Here, you first assume a hypothesis that smoking and lung cancer are unrelated. This is called the 'null hypothesis', which is central to any statistical hypothesis testing.

You then choose a distribution for the experimental group. The normal distribution is one of the most common distributions encountered in nature, but other distributions may be appropriate in special cases.

The Critical Value

Limits are then set on a critical value, beyond which the experiment is taken to provide enough evidence to reject the null hypothesis. This is generally set at a 5% or 1% chance probability.

This means that if the experiment suggests that the probability of obtaining the observed result by chance is less than this critical value, then the null hypothesis can be rejected.
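
As a minimal sketch of this decision rule in Python (assuming, purely for illustration, a hypothetical test statistic z = 2.3 that follows a standard normal distribution under the null hypothesis, with the 5% level as the critical value):

# Decision rule sketch: compare a two-sided p-value with the significance level.
from scipy import stats

z = 2.3        # hypothetical observed test statistic
alpha = 0.05   # significance level (the 5% critical chance probability)

# Probability of a result at least this extreme under the null hypothesis.
p_value = 2 * stats.norm.sf(abs(z))

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: enough evidence to reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: not enough evidence to reject the null hypothesis")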

If the null hypothesis is rejected, then we need to look for an alternative hypothesis that is in
line with the experimental observations.

There is also a gray area in between, for example around the 15-20% level, in which it is hard to say whether the null hypothesis can be rejected. In such cases, we can say that there is reason enough to doubt the validity of the null hypothesis, but there is not enough evidence to reject it altogether.

A result in the gray area often leads to more exploration before concluding anything.

Accepting a Hypothesis

Another feature of statistical hypothesis testing is that an experiment can only cast doubt on the validity of the null hypothesis; no experiment can somehow demonstrate that the null hypothesis is actually valid. This is because of the falsifiability principle of the scientific method.

Therefore it is a tricky situation for someone who wants to show the independence of two events, like smoking and lung cancer in our previous example.

This problem can be overcome by using a confidence interval and then arguing that the experimental data reveal that the first event has at most a negligible effect (bounded by the confidence interval), if any, on the second event.

One can argue, for example, that any dependence is confined to within 0.05 times the standard deviation.

2. Relationship between Variables

It is very important to understand the relationships between variables in order to draw the right conclusions from a statistical analysis. Without this understanding, you can fall into the many pitfalls that accompany statistical analysis and infer wrong results from your data.

There are several different kinds of relationships between variables. Before drawing a
conclusion, you should first understand how one variable changes with the other. This means
you need to establish how the variables are related - is the relationship linear or quadratic or
inverse or logarithmic or something else?

Suppose you measure the volume of a gas in a cylinder along with its pressure. Now you start compressing the gas by pushing in a piston, all while maintaining the gas at room temperature. The volume of the gas decreases while the pressure increases. You note down the different values on graph paper.

If you take enough measurements, you can see the shape of a hyperbola defined by xy = constant. This is because gases follow Boyle's law, which says that at constant temperature, PV = constant. Here, by taking data, you are relating the pressure of the gas with its volume. Many other relationships, in contrast, are linear in nature.

Relationships in Physical and Social Sciences

Relationships between variables need to be studied and analyzed before conclusions are drawn from them. In natural science and engineering, this is usually more straightforward, as you can keep all parameters except one constant and study how this one parameter affects the result under study.

However, in social sciences, things get much more complicated because parameters may or
may not be directly related. There could be a number of indirect consequences and deducing
cause and effect can be challenging.

Only when a change in one variable actually causes a change in another parameter is there a causal relationship. Otherwise, it is simply a correlation, and correlation doesn't imply causation. There are ample examples of fallacies that arise from confusing the two.

A famous example to prove the point: increased ice-cream sales show a strong correlation with deaths by drowning. It would obviously be wrong to conclude that consuming ice-cream causes drowning. The explanation is that more ice-cream gets sold in the summer, when more people go to the beach and other bodies of water, and therefore more deaths by drowning occur.

Positive and Negative Correlation

Correlation between variables can be positive or negative. Positive correlation implies that an increase in one quantity is accompanied by an increase in the other, whereas in negative correlation, an increase in one variable is accompanied by a decrease in the other.

It is important to understand the relationship between variables to draw the right conclusions.
Even the best scientists can get this wrong and there are several instances of how studies get
correlation and causation mixed up.

2.1 Linear Relationship

A linear relationship is one where increasing or decreasing one variable n times will cause a
corresponding increase or decrease of n times in the other variable too. In simpler words, if
you double one variable, the other will double as well.

Some Examples of Linear Relationships

First, let us understand linear relationships. These relationships between variables are such
that when one quantity doubles, the other doubles too.

For example:

 For a given material, if the volume of the material is doubled, its weight will also
double. This is a linear relationship. If the volume is increased 10 times, the weight
will also increase by the same factor.
 If you take the perimeter of a square and its side, they are linearly related. If you take a square that has sides twice as large, the perimeter will also become twice as large.
 The cost of objects is usually linear. If a notebook costs $1, then ten notebooks will
cost $10.
 The force of gravity between the earth and an object is linear in nature. If the mass of the object doubles, the force of gravity acting on it will also double.

As can be seen from the above examples, a number of very important physical phenomena
can be described by a linear relationship.

Apart from these physical processes, there are many correlations between variables that can
be approximated by a linear relationship. This greatly simplifies a problem at hand because a
linear relationship is much simpler to study and analyze than a non-linear one.

Constant of Proportionality

The constant of proportionality is an important concept that emerges from a linear relationship. By using this constant, we can formulate the actual formula that describes one variable in terms of the other.

For example, in our first example, the constant of proportionality between mass and volume
is called density. Thus we can mathematically write:

Mass = density x volume

The constant of proportionality, the density, is defined from the above equation - it is the
mass per unit volume of the material.

If you plot these variables on a graph paper, the slope of the straight line is the constant of
proportionality.

In this example, if you plot mass on the y-axis and volume on the x-axis, you will find that
the slope of the line thus formed gives the density.
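
A minimal sketch of this idea, assuming hypothetical mass and volume measurements for a single material (the values are illustrative only):

# Estimate the density as the slope of the mass-versus-volume line.
import numpy as np

volume = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # cm^3 (hypothetical)
mass = np.array([2.7, 5.4, 8.1, 10.8, 13.5])   # g (hypothetical)

# Fit a straight line mass = slope * volume + intercept; the slope is the density.
slope, intercept = np.polyfit(volume, mass, 1)
print(f"estimated density (slope): {slope:.2f} g/cm^3")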

Linear relationships are not limited to physical phenomena but are frequently encountered in all kinds of scientific research and methodologies. An understanding of linear relationships is therefore essential for understanding relationships between variables in general.

2.2 Non-Linear Relationship

Non-linear relationships are fundamental to many physical and statistical phenomena, and their study is important for fully understanding the world around us.

Linear relationships are the easiest to understand and study, and a number of very important physical phenomena are linear. However, linearity does not cover the whole range of phenomena we need to describe, and non-linear relationships are fundamental to a number of the most important and intriguing physical and social phenomena around us.

Examples of Non-Linear Relationships

As their name suggests, non-linear relationships are not linear, which means that doubling one variable does not double the other.

There are an endless variety of non-linear relationships that one can encounter. However,
most of them can still fit into other categories, like polynomial, logarithmic, etc.

Examples:

 The side of a square and its area are not linear. In fact, this is a quadratic relationship.
If you double the side of a square, its area will increase 4 times.
 While charging a capacitor, the amount of charge and time are non-linearly
dependent. Thus the capacitor is not twice as charged after 2 seconds as it was after 1
second. This is an exponential relationship.

Studying Non-Linear Relationships

Even though non-linear relationships are much more complicated than linear ones, they can
be studied in their own right. If you are studying these, you should first see if they fit any
standard shapes like parabolas or exponential curves. These are commonly occurring
relationships between variables.

For example, the pressure and volume of nitrogen during an isentropic expansion are related as PV^1.4 = constant, which is highly non-linear but fits neatly into this equation.

Next, a number of non-linear relationships are monotonic in nature. This means they do not oscillate but steadily increase or decrease. This is convenient, because they behave qualitatively like linear relationships in a number of cases.

Approximations

A linear relationship is the simplest to understand and therefore can serve as the first
approximation of a non-linear relationship. The limits of validity need to be well noted. In
fact, a number of phenomena were thought to be linear but later scientists realized that this
was only true as an approximation.

Consider the special theory of relativity, which redefined our perceptions of space and time. It gives the full non-linear relationships between variables; these can be approximated as linear, as in Newtonian mechanics, as a first approximation at lower speeds. If you consider momentum, in Newtonian mechanics it is linearly dependent on velocity: if you double the velocity, the momentum will double. However, at speeds approaching that of light, this becomes a highly non-linear relationship.

Some of the greatest scientific challenges need the study of non-linear relationships. The
study of turbulence, which is one of the greatest unsolved problems in science and
engineering, needs the study of a non-linear differential equation.

3. Statistical Correlation

Statistical correlation is a statistical technique which tells us if two variables are related.

For example, consider the variables family income and family expenditure. It is well known
that income and expenditure increase or decrease together. Thus they are related in the sense
that change in any one variable is accompanied by change in the other variable.

Again, the price and demand of a commodity are related variables; when price increases, demand will tend to decrease and vice versa.

If the change in one variable is accompanied by a change in the other, then the variables are
said to be correlated. We can therefore say that family income and family expenditure, price
and demand are correlated.

Relationship between Variables

Correlation can tell you something about the relationship between variables. It is used to
understand:

1. whether the relationship is positive or negative
2. the strength of the relationship.

Correlation is a powerful tool that provides these vital pieces of information.

In the case of family income and family expenditure, it is easy to see that they both rise or fall
together in the same direction. This is called positive correlation.

In case of price and demand, change occurs in the opposite direction so that increase in one is
accompanied by decrease in the other. This is called negative correlation.

Coefficient of Correlation

Statistical correlation is measured by what is called the coefficient of correlation (r). Its numerical value ranges from +1.0 to -1.0, and it gives us an indication of the strength of the relationship.

In general, r > 0 indicates positive relationship, r < 0 indicates negative relationship while r =
0 indicates no relationship (or that the variables are independent and not related). Here r =
+1.0 describes a perfect positive correlation and r = -1.0 describes a perfect negative
correlation.

The closer the coefficient is to +1.0 or -1.0, the greater the strength of the relationship between the variables.

As a rule of thumb, the following guidelines on strength of relationship are often useful
(though many experts would somewhat disagree on the choice of boundaries).

Value of r                     Strength of relationship
-1.0 to -0.5 or 0.5 to 1.0     Strong
-0.5 to -0.3 or 0.3 to 0.5     Moderate
-0.3 to -0.1 or 0.1 to 0.3     Weak
-0.1 to 0.1                    None or very weak

Correlation is only appropriate for examining the relationship between meaningful quantifiable data (e.g. air pressure, temperature) rather than categorical data such as gender, favorite color, etc.
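
As a minimal sketch in Python (with hypothetical paired measurements; the variable names and values are purely illustrative), the coefficient and its p-value can be computed with scipy:

# Compute the correlation coefficient r for two quantifiable variables.
from scipy import stats

temperature = [18, 21, 24, 27, 30, 33]   # degrees Celsius (hypothetical)
sales = [110, 135, 150, 180, 210, 260]   # ice-cream units sold per day (hypothetical)

r, p_value = stats.pearsonr(temperature, sales)
print(f"r = {r:.2f}, p = {p_value:.3f}")   # r near +1 indicates a strong positive relationship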

Disadvantages

While 'r' (correlation coefficient) is a powerful tool, it has to be handled with care.

1. The most commonly used correlation coefficients only measure linear relationships. It is therefore perfectly possible that while there is a strong non-linear relationship between the variables, r is close to 0 or even exactly 0. In such a case, a scatter diagram can roughly indicate the existence or otherwise of a non-linear relationship.
2. One has to be careful in interpreting the value of 'r'. For example, one could compute 'r' between shoe size and intelligence, or between height and income. Irrespective of the value of 'r', such a correlation makes no sense and is hence termed a chance or nonsense correlation.
3. 'r' should not be used to say anything about a cause and effect relationship. Put differently, by examining the value of 'r', we could conclude that variables X and Y are related. However, the same value of 'r' does not tell us if X influences Y or the other way round. Statistical correlation should not be the primary tool used to study causation, because of the problem of third variables.

3.1 Pearson Product-Moment Correlation

Pearson Product-Moment Correlation is one of the measures of correlation which quantifies


the strength as well as direction of such relationship. It is usually denoted by Greek letter ρ.

In the study of relationships, two variables are said to be correlated if change in one variable
is accompanied by change in the other - either in the same or reverse direction.

Conditions

This coefficient is used if two conditions are satisfied:

1. the variables are on an interval or ratio scale of measurement
2. a linear relationship between them is suspected

Positive and Negative Correlation

The coefficient (ρ) is computed as the ratio of covariance between the variables to the
product of their standard deviations. This formulation is advantageous.
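
Written out, with cov(X, Y) the covariance of X and Y, and σX, σY their standard deviations, the coefficient is:

ρ = cov(X, Y) / (σX × σY)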

First, it tells us the direction of relationship. Once the coefficient is computed, ρ > 0 will
indicate positive relationship, ρ < 0 will indicate negative relationship while ρ = 0 indicates
non existence of any relationship.

Second, it ensures (mathematically) that the numerical value of ρ ranges from -1.0 to +1.0. This enables us to get an idea of the strength of the relationship - or rather the strength of the linear relationship between the variables. The closer the coefficient is to +1.0 or -1.0, the greater the strength of the linear relationship.

As a rule of thumb, the following guidelines are often useful (though many experts could
somewhat disagree on the choice of boundaries).

Range of ρ

Value of ρ                     Strength of relationship
-1.0 to -0.5 or 0.5 to 1.0     Strong
-0.5 to -0.3 or 0.3 to 0.5     Moderate
-0.3 to -0.1 or 0.1 to 0.3     Weak
-0.1 to 0.1                    None or very weak

Properties of ρ

This measure of correlation has interesting properties, some of which are enunciated below:

1. It is independent of the units of measurement. It is in fact unit free. For example, ρ
between highest day temperature (in Centigrade) and rainfall per day (in mm) is not
expressed either in terms of centigrade or mm.
2. It is symmetric. This means that ρ between X and Y is exactly the same as ρ between
Y and X.
3. Pearson's correlation coefficient is independent of change in origin and scale. Thus ρ
between temperature (in Centigrade) and rainfall (in mm) would numerically be equal
to ρ between temperature (in Fahrenheit) and rainfall (in cm).
4. If the variables are independent of each other, then one would obtain ρ = 0. However, the converse is not true. In other words, ρ = 0 does not imply that the variables are independent - it only indicates the non-existence of a linear relationship (as the sketch below illustrates).
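
The following minimal sketch illustrates property 4 with hypothetical data: y is completely determined by x through a quadratic relationship, yet ρ comes out near zero because the relationship is not linear.

# A deterministic but non-linear relationship gives a Pearson coefficient near 0.
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 100)
y = x ** 2                       # y depends entirely on x, symmetrically about 0

rho, p_value = stats.pearsonr(x, y)
print(f"rho = {rho:.3f}")        # approximately 0 despite y being a function of x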

Caveats and Warnings

While ρ is a powerful tool, it is a much abused one and hence has to be handled carefully.

1. People often tend to forget or gloss over the fact that ρ is a measure of linear
relationship. Consequently a small value of ρ is often interpreted to mean non
existence of relationship when actually it only indicates non existence of a linear
relationship or at best a very weak linear relationship.

Under such circumstances it is possible that a non-linear relationship exists.

A scatter diagram can reveal this, and one is well advised to inspect it before firmly concluding the non-existence of a relationship. If the scatter diagram points to a non-linear relationship, an appropriate transformation can often attain linearity, in which case ρ can be recomputed.

2. One has to be careful in interpreting the value of ρ.

For example, one could compute ρ between shoe size and intelligence of individuals, or between height and income. Irrespective of the value of ρ, such a correlation makes no sense and is hence termed a chance or nonsense correlation.

3. ρ should not be used to say anything about cause and effect relationship. Put
differently, by examining the value of ρ, we could conclude that variables X and Y
are related.

However the same value of ρ does not tell us if X influences Y or the other way round
- a fact that is of grave import in regression analysis.

3.2 Spearman Rank Correlation Coefficient

The Spearman Rank Correlation Coefficient is a non-parametric measure of correlation that uses ranks to calculate the correlation.

Whenever we are interested to know if two variables are related to each other, we use a
statistical technique known as correlation. If the change in one variable brings about a change
in the other variable, they are said to be correlated.

A well known measure of correlation is the Pearson product moment correlation coefficient
which can be calculated if the data is in interval/ ratio scale.

The Spearman Rank Correlation Coefficient is its analogue when the data are in terms of ranks. One can therefore also call it the correlation coefficient between the ranks. It is also known as "Spearman's rho" or the "Spearman rank correlation", and is often denoted by rs.

Example

As an example, let us consider a musical (solo vocal) talent contest where 10 competitors are
evaluated by two judges, A and B. Usually judges award numerical scores for each contestant
after his/her performance.

A product moment correlation coefficient of scores by the two judges hardly makes sense
here as we are not interested in examining the existence or otherwise of a linear relationship
between the scores.

What makes more sense is the correlation between the ranks of the contestants as judged by the two judges. The Spearman Rank Correlation Coefficient can indicate whether the judges agree with each other's views as far as the talent of the contestants is concerned (even though they might award different numerical scores) - in other words, whether the judges are unanimous.

Interpretation of Numerical Values

The numerical value of the correlation coefficient, rs, ranges between -1 and +1. It indicates how closely the two sets of ranks are related.

rs = rank correlation coefficient

In general,

 rs > 0 implies positive agreement among the ranks
 rs < 0 implies negative agreement (or agreement in the reverse direction)
 rs = 0 implies no agreement

The closer rs is to 1, the better the agreement, while rs closer to -1 indicates strong agreement in the reverse direction.

Assigning Ranks

In order to compute Spearman Rank Correlation Coefficient, it is necessary that the data be
ranked. There are a few issues here.

Suppose that the scores of the judges (out of 10) were as follows:

Contestant No.        1    2    3    4    5    6    7    8    9   10
Score by Judge A      5    9    3    8    6    7    4    8    4    6
Score by Judge B      7    8    6    7    8    5   10    6    5    8

Ranks are assigned separately for the two judges either starting from the highest or from the
lowest score. Here, the highest score given by Judge A is 9.

If we begin from the highest score, we assign rank 1 to contestant 2 corresponding to the
score of 9.

The second highest score is 8, but two competitors have been awarded the score of 8. In this case both competitors are assigned a common rank, which is the arithmetic mean of ranks 2 and 3, i.e. 2.5. In this way, the scores of Judge A can be converted into ranks.

Similarly, ranks are assigned to the scores awarded by Judge B and then difference between
ranks for each contestant are used to evaluate rs. For the above example, ranks are as follows.

Contestant No.                  1    2    3    4    5    6    7    8    9   10
Ranks of scores by Judge A      7    1   10  2.5  5.5    4  8.5  2.5  8.5  5.5
Ranks of scores by Judge B    5.5    3  7.5  5.5    3  9.5    1  7.5  9.5    3
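
A minimal sketch of this example in Python; scipy.stats.spearmanr ranks the raw scores internally, handling ties with averaged ranks as described above:

# Spearman rank correlation between the two judges' scores.
from scipy import stats

judge_a = [5, 9, 3, 8, 6, 7, 4, 8, 4, 6]
judge_b = [7, 8, 6, 7, 8, 5, 10, 6, 5, 8]

rs, p_value = stats.spearmanr(judge_a, judge_b)
print(f"rs = {rs:.2f}, p = {p_value:.3f}")   # rs > 0 would indicate agreement between the judges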

The Spearman Rank Correlation Coefficient tries to assess the relationship between the ranks without making any assumptions about the nature of their relationship. Hence it is a non-parametric measure - a feature which has contributed to its popularity and widespread use.

Advantages and Caveats

Other measures of correlation are parametric in the sense of being based on possible
relationship of a parameterized form, such as a linear relationship.

Another advantage with this measure is that it is much easier to use since it does not matter
which way we rank the data, ascending or descending. We may assign rank 1 to the smallest
value or the largest value, provided we do the same thing for both sets of data.

The only requirement is that data should be ranked or at least converted into ranks.

3.3 Partial Correlation Analysis

Partial correlation analysis involves studying the linear relationship between two variables
after excluding the effect of one or more independent factors.

Simple correlation is not an all-encompassing technique in such circumstances. In order to get a correct picture of the relationship between two variables, we should first eliminate the influence of other variables.

For example, study of partial correlation between price and demand would involve studying
the relationship between price and demand excluding the effect of money supply, exports,
etc.

What Correlation does not Provide

Generally, a large number of factors simultaneously influence all social and natural
phenomena. Correlation and regression studies aim at studying the effects of a large number
of factors on one another.

In simple correlation, we measure the strength of the linear relationship between two
variables, without taking into consideration the fact that both these variables may be
influenced by a third variable.

For example, when we study the correlation between price (dependent variable) and demand
(independent variable), we completely ignore the effect of other factors like money supply,
import and exports etc. which definitely have a bearing on the price.

Range

The correlation co-efficient between two variables X1 and X2, studied partially after
eliminating the influence of the third variable X3 from both of them, is the partial correlation
co-efficient r12.3.

Simple correlation between two variables is called the zero order co-efficient since in simple
correlation, no factor is held constant. The partial correlation studied between two variables
by keeping the third variable constant is called a first order co-efficient, as one variable is
kept constant. Similarly, we can define a second order co-efficient and so on. The partial
correlation co-efficient varies between -1 and +1. Its calculation is based on the simple
correlation co-efficient.
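
As a minimal sketch of that calculation (using the standard first-order formula; the zero-order correlation values are hypothetical and purely illustrative):

# First-order partial correlation r12.3, computed from simple correlations.
import math

r12 = 0.70   # correlation between X1 and X2 (hypothetical)
r13 = 0.50   # correlation between X1 and X3 (hypothetical)
r23 = 0.60   # correlation between X2 and X3 (hypothetical)

# Standard formula: partial out X3 from both X1 and X2.
r12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))
print(f"r12.3 = {r12_3:.2f}")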

The partial correlation analysis assumes great significance in cases where the phenomena
under consideration have multiple factors influencing them, especially in physical and
experimental sciences, where it is possible to control the variables and the effect of each
variable can be studied separately. This technique is of great use in various experimental
designs where various interrelated phenomena are to be studied.

Limitations

However, this technique suffers from some limitations, some of which are stated below.

 The calculation of the partial correlation co-efficient is based on the simple correlation co-efficient. However, the simple correlation coefficient assumes a linear relationship. Generally this assumption is not valid, especially in the social sciences, as linear relationships rarely exist in such phenomena.
 As the order of the partial correlation co-efficient goes up, its reliability goes down.
 Its calculation is somewhat cumbersome - often difficult for the mathematically uninitiated (though software has made life a lot easier).

Multiple Correlation

Another technique used to overcome the drawbacks of simple correlation is multiple regression analysis.

Here, we study the effects of all the independent variables simultaneously on a dependent variable. For example, the correlation co-efficient between the yield of paddy (X1) and the other variables, viz. type of seedlings (X2), manure (X3), rainfall (X4) and humidity (X5), is the multiple correlation co-efficient R1.2345. This co-efficient takes a value between 0 and +1.

The limitations of multiple correlation are similar to those of partial correlation. If multiple
and partial correlation are studied together, a very useful analysis of the relationship between
the different variables is possible.

3.4 Correlation and Causation

Correlation and causation, closely related to confounding variables, refers to the incorrect assumption that because two things correlate, there is a causal relationship between them.

Causality is the area of statistics that is most commonly misused, and misinterpreted, by non-
specialists. Media sources, politicians and lobby groups often leap upon a perceived
correlation, and use it to 'prove' their own beliefs. They fail to understand that, just because
results show a correlation, there is no proof of an underlying causality.

Many people assume that because a poll, or a statistic, contains many numbers, it must be
scientific, and therefore correct.

Patterns of Causality in the Mind

Unfortunately, the human mind is built to try and subconsciously establish links between
many contrasting pieces of information. The brain often tries to construct patterns from
randomness, so jumps to conclusions, and assumes that a relationship exists.

Overcoming this tendency is part of the academic training of students and academics in most fields, from physics to the arts. The ability to evaluate data objectively is absolutely crucial to academic success.

The Sensationalism of the Media

The best way to look at the misuse of correlation and causation is by looking at an example:

A survey, as reported in a British newspaper, involved questioning a group of teenagers about
their behavior, and establishing whether their parents smoked. The newspaper reported, as
fact, that children whose parents smoked were more likely to exhibit delinquent behavior.

The results seemed to show a correlation between the two variables, so the paper printed the
headline; "Parental smoking causes children to misbehave." The Professor leading the
investigation stated that cigarette packets should carry warnings about social issues alongside
the prominent health warnings.

(Source http://www.criticalthinking.org.uk/smokingparents/)

However, there are a number of problems with this assumption. The first is that correlations
can often work in reverse. For example, it is perfectly possible that the parents smoked
because of the stress of looking after delinquent children.

Another cause may be that social class causes the correlation; the lower classes are usually
more likely to smoke and are more likely to have delinquent children. Therefore, parental
smoking and delinquency are both symptoms of the problem of poverty and may well have
no direct link between them.

Emotive Bias Influences Causality

This example highlights another reason behind correlation and causation errors, because the
Professor was strongly anti-smoking. He was hoping to find a link that would support his
own agenda. This is not to say that his results were useless, because they showed that there is
a root cause behind the problems of delinquency and the likelihood of smoking. This,
however, is not the same as a cause and effect relationship, and he allowed his emotions to
cloud his judgment. Smoking is a very emotive subject, but academics must remain aloof and
unbiased if internal validity is to remain intact.

The Cost of Disregarding Correlation and Causation

The principle of incorrectly linking correlation and causation is closely linked to post-hoc
reasoning, where incorrect assumptions generate an incorrect link between two effects.

The principle of correlation and causation is very important for anybody working as a
scientist or researcher. It is also a useful principle for non-scientists, especially those studying
politics, media and marketing. Understanding causality promotes a greater understanding,
and honest evaluation of the alleged facts given by pollsters.

Imagine an expensive advertising campaign, based around intense market research, where
misunderstanding a correlation could cost a lot of money in advertising, production costs, and
damage to the company's reputation.

Bibliography

Coon, D. & Mitterer, J.O. (2009). Psychology: A Journey (4th Ed.). Belmont, CA: Cengage Learning.

Kassin, S.M., Fein, S., & Markus, H.R. (2011). Social Psychology. Belmont, CA: Wadsworth Cengage Learning.

Kornblum, W. (2003). Sociology in a Changing World (6th Ed.). Belmont, CA: Wadsworth Cengage Learning.

Smoking Parents Cause Teenage Delinquency (2006, August 21). Criticalthinking.org.uk. Retrieved Feb. 26, 2008 from http://www.criticalthinking.org.uk/smokingparents/

4. Regression

4.1 Linear Regression Analysis

Linear regression analysis is a powerful technique used for predicting the unknown value of a
variable from the known value of another variable.

More precisely, if X and Y are two related variables, then linear regression analysis helps us to predict the value of Y for a given value of X, or vice versa.

For example, the age of a human being and maturity are related variables; linear regression analysis can then predict the level of maturity given the age of a human being.

Dependent and Independent Variables

By linear regression, we mean models with just one independent and one dependent variable.
The variable whose value is to be predicted is known as the dependent variable and the one
whose known value is used for prediction is known as the independent variable.

Two Lines of Regression

There are two lines of regression- that of Y on X and X on Y. The line of regression of Y on
X is given by Y = a + bX where a and b are unknown constants known as intercept and slope
of the equation. This is used to predict the unknown value of variable Y when value of
variable X is known.

Y = a + bX

On the other hand, the line of regression of X on Y is given by X = c + dY, which is used to predict the unknown value of variable X using the known value of variable Y. Often, only one of these lines makes sense.

Exactly which of these is appropriate for the analysis at hand depends on the labeling of the dependent and independent variables in the problem to be analyzed.

Choice of Line of Regression

For example, consider two variables crop yield (Y) and rainfall (X). Here construction of
regression line of Y on X would make sense and would be able to demonstrate the
dependence of crop yield on rainfall. We would then be able to estimate crop yield given
rainfall.

Careless use of linear regression analysis could mean construction of regression line of X on
Y which would demonstrate the laughable scenario that rainfall is dependent on crop yield;
this would suggest that if you grow really big crops you will be guaranteed a heavy rainfall.

Regression Coefficient

The coefficient of X in the line of regression of Y on X is called the regression coefficient of Y on X. It represents the change in the value of the dependent variable (Y) corresponding to a unit change in the value of the independent variable (X).

For instance, if the regression coefficient of Y on X is 0.53, it would indicate that Y will increase by 0.53 units if X increases by 1 unit. A similar interpretation can be given for the regression coefficient of X on Y.

Once a line of regression has been constructed, one can check how good it is (in terms of predictive ability) by examining the coefficient of determination (R2). R2 always lies between 0 and 1. All software provides it whenever a regression procedure is run.

R2 - coefficient of determination

The closer R2 is to 1, the better the model and its prediction. A related question is whether the independent variable significantly influences the dependent variable. Statistically, it is equivalent to testing the null hypothesis that the regression coefficient is zero. This can be done using a t-test.
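
A minimal sketch of this, assuming hypothetical rainfall and crop-yield data in line with the example above (all values are illustrative only):

# Fit the regression of Y (crop yield) on X (rainfall) and report R2 and the slope's p-value.
from scipy import stats

rainfall = [60, 75, 90, 105, 120, 135]        # mm (hypothetical)
crop_yield = [2.1, 2.6, 2.9, 3.4, 3.9, 4.1]   # tonnes per hectare (hypothetical)

result = stats.linregress(rainfall, crop_yield)
print(f"Y = {result.intercept:.2f} + {result.slope:.3f} X")
print(f"R2 = {result.rvalue**2:.2f}, p-value for slope = {result.pvalue:.4f}")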

Assumption of Linearity

Linear regression does not test whether data is linear. It finds the slope and the intercept
assuming that the relationship between the independent and dependent variable can be best
explained by a straight line.

One can construct a scatter plot to confirm this assumption. If the scatter plot reveals a non-linear relationship, a suitable transformation can often be used to attain linearity.

4.2 Multiple Regression Analysis

Multiple regression analysis is a powerful technique used for predicting the unknown value
of a variable from the known value of two or more variables- also called the predictors.

More precisely, multiple regression analysis helps us to predict the value of Y for given
values of X1, X2, …,Xk.

For example, the yield of rice per acre depends upon the quality of seed, fertility of soil, fertilizer used, temperature and rainfall. If one is interested in studying the joint effect of all these variables on rice yield, one can use this technique.

An additional advantage of this technique is that it also enables us to study the individual influence of these variables on yield.

Dependent and Independent Variables

By multiple regression, we mean models with just one dependent and two or more independent (explanatory) variables. The variable whose value is to be predicted is known as the dependent variable and the ones whose known values are used for prediction are known as independent (explanatory) variables.

The Multiple Regression Model

In general, the multiple regression equation of Y on X1, X2, …,Xk is given by:

Y = b0 + b1X1 + b2X2 + … + bkXk

Interpreting Regression Coefficients

Here b0 is the intercept and b1, b2, b3, …, bk are analogous to the slope in the linear regression equation and are also called regression coefficients. They can be interpreted in the same way as the slope. Thus if bi = 2.5, it would indicate that Y will increase by 2.5 units if Xi increases by 1 unit, holding the other predictors constant.

The appropriateness of the multiple regression model as a whole can be tested by the F-test in
the ANOVA table. A significant F indicates a linear relationship between Y and at least one
of the X's.

How Good Is the Regression?

Once a multiple regression equation has been constructed, one can check how good it is (in
terms of predictive ability) by examining the coefficient of determination (R2). R2 always
lies between 0 and 1.

R2 - coefficient of determination

All software provides it whenever a regression procedure is run. The closer R2 is to 1, the better the model and its prediction.
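
A minimal sketch with two predictors, assuming hypothetical data loosely based on the rice-yield example (all variable names and values are illustrative):

# Fit Y = b0 + b1*X1 + b2*X2 by least squares and compute R2.
import numpy as np

x1 = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])     # e.g. fertilizer used (hypothetical)
x2 = np.array([80.0, 95.0, 100.0, 110.0, 120.0, 130.0])  # e.g. rainfall (hypothetical)
y = np.array([1.8, 2.3, 2.6, 3.1, 3.5, 3.8])      # yield (hypothetical)

# Design matrix with a column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs

y_hat = X @ coeffs
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"Y = {b0:.2f} + {b1:.2f} X1 + {b2:.3f} X2,  R2 = {r2:.2f}")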

A related question is whether the independent variables individually influence the dependent
variable significantly. Statistically, it is equivalent to testing the null hypothesis that the
relevant regression coefficient is zero.

This can be done using a t-test. If the t-test of a regression coefficient is significant, it indicates that the variable in question influences Y significantly while controlling for the other independent explanatory variables.

Assumptions

The multiple regression technique does not test whether the data are linear. On the contrary, it proceeds by assuming that the relationship between Y and each of the Xi's is linear. Hence, as a rule, it is prudent to always look at the scatter plots of (Y, Xi), i = 1, 2, …, k. If any plot suggests non-linearity, one may use a suitable transformation to attain linearity.

Another important assumption is the non-existence of multicollinearity - that is, the independent variables are not related among themselves. At a very basic level, this can be tested by computing the correlation coefficient between each pair of independent variables.

Other assumptions include those of homoscedasticity and normality.

Multiple regression analysis is used when one is interested in predicting a continuous dependent variable from a number of independent variables. If the dependent variable is dichotomous, then logistic regression should be used.

4.3 Correlation and Regression

Correlation and linear regression are the most commonly used techniques for investigating
the relationship between two quantitative variables.

The goal of a correlation analysis is to see whether two measurement variables co vary, and
to quantify the strength of the relationship between the variables, whereas regression
expresses the relationship in the form of an equation.

For example, in students taking a Maths and English test, we could use correlation to
determine whether students who are good at Maths tend to be good at English as well, and
regression to determine whether the marks in English can be predicted for given marks in
Maths.

What a Scatter Diagram Tells Us

The starting point is to draw a scatter of points on a graph, with one variable on the X-axis
and the other variable on the Y-axis, to get a feel of the relationship (if any) between the
variables as suggested by the data. The closer the points are to a straight line, the stronger the
linear relationship between two variables.

Why Use Correlation?

We can use the correlation coefficient, such as the Pearson Product Moment Correlation
Coefficient, to test if there is a linear relationship between the variables. To quantify the
strength of the relationship, we can calculate the correlation coefficient (r). Its numerical
value ranges from +1.0 to -1.0. r > 0 indicates a positive linear relationship, r < 0 indicates a negative linear relationship, while r = 0 indicates no linear relationship.

A Caveat

It must, however, be considered that there may be a third variable related to both of the
variables being investigated, which is responsible for the apparent correlation. Correlation
does not imply causation. Also, a nonlinear relationship may exist between two variables that
would be inadequately described, or possibly even undetected, by the correlation coefficient.

Why Use Regression

In regression analysis, the problem of interest is the nature of the relationship itself between
the dependent variable (response) and the (explanatory) independent variable.

The analysis consists of choosing and fitting an appropriate model, done by the method of
least squares, with a view to exploiting the relationship between the variables to help estimate
the expected response for a given value of the independent variable. For example, if we are
interested in the effect of age on height, then by fitting a regression line, we can predict the
height for a given age.
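
A minimal sketch of the Maths/English marks example mentioned earlier in this section (the marks are hypothetical), showing correlation used for strength and regression used for prediction:

# Correlation quantifies co-variation; regression gives an equation for prediction.
from scipy import stats

maths = [35, 48, 52, 61, 70, 75, 83, 90]        # hypothetical marks
english = [40, 45, 58, 59, 66, 72, 80, 86]      # hypothetical marks

r, _ = stats.pearsonr(maths, english)           # strength of the linear relationship
fit = stats.linregress(maths, english)          # fitted line by least squares

print(f"r = {r:.2f}")
print(f"English = {fit.intercept:.1f} + {fit.slope:.2f} * Maths")
print(f"predicted English mark for Maths = 65: {fit.intercept + fit.slope * 65:.1f}")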

Assumptions

Some underlying assumptions governing the uses of correlation and regression are as follows.

The observations are assumed to be independent. For correlation, both variables should be
random variables, but for regression only the dependent variable Y must be random. In
carrying out hypothesis tests, the response variable should follow Normal distribution and the
variability of Y should be the same for each value of the predictor variable. A scatter diagram
of the data provides an initial check of the assumptions for regression.

Uses of Correlation and Regression

There are three main uses for correlation and regression.

 One is to test hypotheses about cause-and-effect relationships. In this case, the experimenter determines the values of the X-variable and sees whether variation in X causes variation in Y. For example, giving people different amounts of a drug and measuring their blood pressure.
 The second main use for correlation and regression is to see whether two variables are
associated, without necessarily inferring a cause-and-effect relationship. In this case,
neither variable is determined by the experimenter; both are naturally variable. If an
association is found, the inference is that variation in X may cause variation in Y, or
variation in Y may cause variation in X, or variation in some other factor may affect
both X and Y.
 The third common use of linear regression is estimating the value of one variable
corresponding to a particular value of the other variable.

5. Student’s T-Test

The student's t-test is a statistical method that is used to see if two sets of data differ
significantly.

The method assumes that, if the null hypothesis is true, the test statistic follows a Student's t-distribution, which in turn requires the data to be approximately normally distributed. The null hypothesis will usually stipulate that there is no significant difference between the means of the two data sets.

It is best used to try and determine whether there is a difference between two independent
sample groups. For the test to be applicable, the sample groups must be completely
independent, and it is best used when the sample size is too small to use more advanced
methods.

Before using this type of test it is essential to plot the sample data from the two samples and
make sure that it has a reasonably normal distribution, or the student's t test will not be
suitable. It is also desirable to randomly assign samples to the groups, wherever possible.

Example

You might be trying to determine if there is a significant difference in test scores between
two groups of children taught by different methods.

The null hypothesis might state that there is no significant difference in the mean test scores of the two sample groups and that any difference is down to chance.

The student's t test can then be used to try and disprove the null hypothesis.

Restrictions

The two sample groups being tested must have a reasonably normal distribution. If the
distribution is skewed, then the student's t test is likely to throw up misleading results. The
distribution should have only one main peak (= mode) near the mean of the group.

If the data does not adhere to the above parameters, then either a large data sample is needed
or, preferably, a more complex form of data analysis should be used.

Results

The student's t test can let you know if there is a significant difference in the means of the two sample groups, providing grounds to reject the null hypothesis. Like all statistical tests, it cannot prove anything, as there is always a chance of experimental error occurring, but the test can support a hypothesis. It remains useful for small sample populations, for determining whether there is a significant difference between the groups.

5.1 Independent One-Sample T-Test

An independent one-sample t-test is used to test whether the average of a sample differs significantly from a population mean, a specified value μ0.

When you compare each sample to a "known truth", you would use the (independent) one-
sample t-test. If you are comparing two samples not strictly related to each other, the
independent two-sample t-test is used.

Any single sample statistical test that uses t-distribution can be called a 'one-sample t-test'.
This test is used when we have a random sample and we want to test if it is significantly
different from a population mean.

Hypothesis to Be Tested

Generally speaking, this test involves testing the null hypothesis H0: μ = μ0 against the
alternative hypothesis, H1: μ ≠ μ0 where μ is the population mean and μ0 is a specific value
of the population mean that we would like to test for acceptance.

An example may clarify the calculation and hypothesis testing of the independent one-sample
t-test better.

An Example

Suppose that the teacher of a school claims that an average student of his school studies 8
hours per day during weekends and we desire to test the truth of this claim.

The statistical methodology for this purpose requires that we begin by first specifying the
hypothesis to be tested.

In this case, the null hypothesis would be H0: μ = 8, which essentially states that mean hours
of study per day is no different from 8 hours. And the alternative hypothesis is, H1: μ ≠ 8,
which is negation of the teacher's claim.

Collecting Samples

In the next step, we take a sample of say 10 students of the school and collect data on how
long they study during weekends.

These 10 different study hours are our data.

Suppose that the sample mean turns out to be 6.5 hours.

We cannot infer anything directly from this mean as to whether the claim should be accepted or rejected, as it could very well have happened by sheer luck (even though the sample was drawn randomly) that the students included in the sample were those who studied fewer than 8 hours.

On the other hand, it could also be the case that the claim was indeed inappropriate.

To draw a scientifically valid conclusion, we can perform an independent one-sample t-test, which helps us to either accept or reject the null hypothesis.

If the null hypothesis is rejected, it means that the sample came from a population with mean
study hours significantly different from 8 hours.

On the other hand, if the null hypothesis is accepted, it means that there is no evidence to suggest that average study hours were significantly different from 8 hours - a result consistent with the teacher's claim.
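
A minimal sketch of this example, using 10 hypothetical study-hour values chosen so that the sample mean is 6.5, as quoted above:

# One-sample t-test of H0: mu = 8 against H1: mu != 8.
from scipy import stats

hours = [6, 7, 5.5, 8, 6, 7.5, 5, 7, 6.5, 6.5]   # hypothetical data, mean = 6.5

t_stat, p_value = stats.ttest_1samp(hours, popmean=8)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

alpha = 0.05
if p_value < alpha:
    print("Reject H0: mean study time differs significantly from 8 hours")
else:
    print("Do not reject H0: no evidence that mean study time differs from 8 hours")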

Assumptions

This test is one of the most popular small-sample tests, widely used in all disciplines - medicine, behavioral science, physical science, etc. However, this test can be used only if the background assumptions are satisfied.

 The population from which the sample has been drawn should be normal - appropriate statistical methods exist for testing this assumption (for example, the Kolmogorov-Smirnov non-parametric test). It has however been shown that minor departures from normality do not affect this test - this is indeed an advantage.
 The population standard deviation is not known.
 Sample observations should be random.

A Small Sample Test

This test is a small-sample test. It is difficult to draw a clear line of demarcation between large and small samples, but statisticians have generally agreed that a sample may be considered small if its size is less than 30.

The tests used for dealing with problems relating to large samples are different from those used for small samples. We often use the z-test for large samples.

5.2 Independent Two-Sample T-Test

The independent two-sample t-test is used to test whether two population means are significantly different from each other, using the means of randomly drawn samples.

Any statistical test that uses two samples drawn independently of each other and using t-
distribution, can be called a 'two-sample t-test'.

Hypothesis Testing

Generally speaking, this test involves testing the null hypothesis H0: μ(x) = μ(y) against the
alternative research hypothesis, H1: μ(x) ≠ μ(y) where μ(x) and μ(y) are respectively the
population mean of the two populations from which the two samples have been drawn.

Hypothesis testing is frequently used for the scientific method.

An Example

Suppose that a school has two buildings - one for girls and the other for boys. Suppose that the principal wants to know if the pupils of the two buildings are working equally hard, in the sense that they put in an equal number of hours of study on average.

Statistically speaking, the principal is interested in testing whether the average number of
hours studied by boys is significantly different from the average for girls.

Steps

1. To calculate, we begin by specifying the hypothesis to be tested.

In this case, the null hypothesis would be H0: μ(boys) = μ(girls), which essentially
states that mean study hours for boys and girls are no different.

The alternative research hypothesis is H1: μ(boys) ≠ μ(girls).

2. In the second step, we take a sample of, say, 10 students from the boys' building and 15 from the girls' building and collect data on how long they study daily. These 10 and 15 different study hours are our two samples.

It is not difficult to see that the two samples have been drawn independently of each other - an essential requirement of the independent two-sample t-test.

Suppose that the sample mean turns out to be 7.25 hours for boys and 8.5 for girls. We cannot infer anything directly from these sample means - specifically, as to whether boys and girls were equally hardworking - as it could very well have happened by sheer luck (even though the samples were drawn randomly) that the boys included in the sample were those who studied fewer hours.

On the other hand, it could also be the case that girls were indeed working harder than boys.

3. The third step involves performing the independent two-sample t-test, which helps us to either accept or reject the null hypothesis (a minimal sketch in Python follows these steps).

If the null hypothesis is rejected, it means that the two buildings differed significantly in terms of the number of hours studied.

On the other hand, if the null hypothesis is accepted, one can conclude that there is no evidence to suggest that the two buildings differed significantly and that boys and girls can be said to be at par.
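
A minimal sketch of these steps, assuming hypothetical study hours consistent with the quoted sample means (7.25 hours for 10 boys, 8.5 hours for 15 girls):

# Independent two-sample t-test of H0: mu(boys) = mu(girls).
from scipy import stats

boys = [6.5, 7, 7, 7.5, 7, 8, 6, 8, 7.5, 8]                          # hypothetical, mean 7.25
girls = [8, 8.5, 9, 8, 7.5, 9, 8.5, 8, 9.5, 8.5, 8, 9, 8.5, 8.5, 9]  # hypothetical, mean 8.5

# equal_var=True reflects the equal-variance assumption listed below.
t_stat, p_value = stats.ttest_ind(boys, girls, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")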

Assumptions

Along with the independent one-sample t-test, this test is one of the most widely used tests. However, this test can be used only if the background assumptions are satisfied.

 The populations from which the samples have been drawn should be normal - appropriate statistical methods exist for testing this assumption (for example, the Kolmogorov-Smirnov non-parametric test). One needs to note that the normality assumption has to be tested individually and separately for the two samples. It has however been shown that minor departures from normality do not affect this test - this is indeed an advantage.
 The standard deviations of the populations should be equal, i.e. σX² = σY² = σ², where σ² is unknown. This assumption can be tested by the F-test.
 Samples have to be randomly drawn, independently of each other. There is however no requirement that the two samples be of equal size - oftentimes they will be unequal, though the case of equal size cannot be ruled out.

5.3 Dependent T-Test for Paired Samples

The dependent t-test for paired samples is used when the samples are paired. This implies that
each individual observation of one sample has a unique corresponding member in the other
sample.

 One sample has been tested twice (repeated measures)

Or,

 Two samples have been "matched" or "paired", in some way. (matched subjects
design)

The emphasis being on pairing of observations, it is obvious that the samples are dependent -
hence the name.

Any statistical test involving paired samples and using t-distribution can be called 't-test for
paired samples'.

An Example

Let us illustrate the meaning of a paired sample. Suppose that we are required to examine if a
newly developed intervention program for disadvantaged students has an impact. For this
purpose, we need to obtain scores from a sample of n such students in a standardized test
before administering the program.

After the program is over, the same test needs to be administered to the same group of
students and scores obtained again.

There are two samples: 1) the sample of prior intervention scores (pretest) and, 2) the post
intervention scores (posttest). The samples are related in the sense that each pretest has a
corresponding posttest as both were obtained from the same student.

If the score of the ith student before and after the program is xi and yi respectively, then the pair (xi, yi) corresponds to the same subject (the student, in this case).

This is what is meant by a paired sample. It is very important that the two scores for each individual student be correctly identified and paired, as the differences di = xi - yi are used to determine the test statistic and consequently the p-value.
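
A minimal sketch, assuming hypothetical pretest and posttest scores for eight students (all values are illustrative only); scipy works from the signed differences di = xi - yi internally:

# Dependent (paired) t-test comparing scores before and after the program.
from scipy import stats

pretest = [52, 60, 45, 70, 58, 63, 49, 55]    # scores before the program (hypothetical)
posttest = [58, 62, 51, 74, 57, 70, 56, 60]   # scores after the program (hypothetical)

t_stat, p_value = stats.ttest_rel(pretest, posttest)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")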

Steps

1. With the above framework, the null hypothesis would be H0: there is no significant
difference between pre and post intervention scores, which essentially states that the
intervention program was not effective. The alternative hypothesis is H1: there is
significant difference between pre and post intervention scores.
2. Once the hypotheses have been framed, the second step involves taking the sample of pre and post intervention scores and computing the differences di = xi - yi and their mean. Logically speaking, a mean difference close to zero could indicate truth of the null hypothesis.

However, nothing concrete can be interpreted from it - specifically, as to whether the intervention program did have an impact - as it could very well have happened by sheer luck (even though the students were drawn randomly) that for this sample of students the scores did not change much.

On the other hand, it could also be the case that the program was indeed useful.

3. The third step involves performing the dependent t-test for paired samples which
helps us to either accept or reject the null hypothesis. If the null hypothesis is rejected,
one can infer that the program was useful.

On the other hand if the null hypothesis is accepted, one can conclude that there is no
evidence to suggest the program did have an impact.

Assumptions

This test has a few background assumptions which need to be satisfied.

1. The sample of differences (di's) should be normally distributed - an assumption that can
be tested, for instance with the Kolmogorov-Smirnov non-parametric test.

It has, however, been shown that minor departures from normality do not seriously affect
this test - this is indeed an advantage.

2. The samples should be dependent and it should be possible to identify specific pairs.
3. An obvious requirement is that the two samples should be of equal size.
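
The first assumption can be checked in code as well. The sketch below, again with made-up scores, uses the Kolmogorov-Smirnov test mentioned above (and the Shapiro-Wilk test as a common alternative); note that estimating the normal parameters from the data strictly calls for the Lilliefors correction, so treat this as an approximate check.

# A minimal sketch of checking normality of the paired differences.
# Reuses the made-up pretest/posttest scores from the previous sketch.
import numpy as np
from scipy import stats

pretest  = np.array([52, 60, 45, 58, 63, 49, 55, 61, 47, 50])
posttest = np.array([58, 62, 50, 57, 68, 55, 59, 66, 52, 54])
d = posttest - pretest

# Kolmogorov-Smirnov test against a normal distribution whose mean and
# standard deviation are estimated from the differences themselves.
# (Strictly, estimating the parameters from the data calls for the
# Lilliefors correction; the Shapiro-Wilk test is a common alternative.)
ks_stat, ks_p = stats.kstest(d, 'norm', args=(d.mean(), d.std(ddof=1)))
sw_stat, sw_p = stats.shapiro(d)
print(f"KS p = {ks_p:.3f}, Shapiro-Wilk p = {sw_p:.3f}")
# Large p-values give no reason to doubt the normality assumption.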

For Small Samples

This test is a small sample test. It is difficult to draw a clear line of demarcation between
large and small samples.

Statisticians have generally agreed that a sample may be considered small if its size is less
than 30.

5.4 Student’s T-Test (II)

Any statistical test that uses the t-distribution can be called a t-test, or "Student's t-test". It is
basically used when the sample size is small, i.e. n < 30.

For example, if a person wants to test the hypothesis that the mean height of students of a
college is not different from 150 cm, he can take a sample of size, say, 20 from the college.
From the mean height of these students, he can test the hypothesis. The test to be used for this
purpose is the t-test.

Student's T-test for Different Purposes

There are different types of t-tests, each for a different purpose. Some of the popular types are
outlined below:

1. Student's t-test for a single mean is used to test a hypothesis on a specific value of the
population mean. Statistically speaking, we test the null hypothesis H0: μ = μ0 against
the alternative hypothesis H1: μ ≠ μ0, where μ is the population mean and μ0 is a
specific value of the population mean that we would like to test for acceptance.

The example on heights of students explained above requires this test. In that
example, μ0 = 150. (A short code sketch covering this test and the next appears after
this list.)

2. The t-test for difference of means is used to test the hypothesis that two populations
have the same mean.

For example suppose one is interested to test if there is any significant difference
between the mean height of male and female students in a particular college. In such a
situation, t-test for difference of means can be applied. One would have to take two
independent samples from the college- one from males and the other from females in
order to perform this test.

An additional assumption of this test is that the variance of the two populations is
equal.

3. A paired t-test is usually used when the two samples are dependent- this happens
when each individual observation of one sample has a unique relationship with a
particular member of the other sample.

For example we may wish to test if a newly developed intervention program for
disadvantaged students is useful. For this, we need to obtain scores from say 22
students in a standardized test before administering the program. After the program is
over, the same test needs to be administered again on the same group of 22 students
and scores obtained.

The two samples - the sample of prior intervention scores and the sample of post-
intervention scores - are related, as each student has two scores. The samples are
therefore dependent. The paired t-test is applicable in such scenarios.

4. A t-test for correlation coefficient is used for testing an observed sample correlation
coefficient (r).

For example, suppose a random sample of 27 pairs of observations from a normal
population gave a correlation coefficient of 0.2. Notice that this is the sample correlation
coefficient obtained from a sample of observations. One may be interested to know
whether the variables are correlated in the population. In this case we can use the t-test
for the correlation coefficient.

5. A t-test for testing significance of regression coefficients is used to test the
significance of regression coefficients in the linear and multiple regression setup.
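
As referenced under the first type, the following minimal sketch illustrates the t-test for a single mean and the t-test for difference of means using SciPy. The height data are invented for illustration; only μ0 = 150 comes from the example above.

# A minimal sketch of t-test types 1 and 2 above, using SciPy.
# Heights are made-up numbers; mu0 = 150 follows the example in the text.
from scipy import stats

heights = [148, 152, 151, 149, 150, 153, 147, 155, 150, 149,
           151, 152, 148, 150, 154, 149, 151, 150, 152, 153]

# 1. Single mean: H0: mu = 150 vs H1: mu != 150
t1, p1 = stats.ttest_1samp(heights, popmean=150)

# 2. Difference of means (assumes equal population variances)
male   = [172, 168, 175, 170, 169, 174, 171, 173]
female = [161, 165, 158, 163, 160, 166, 162, 164]
t2, p2 = stats.ttest_ind(male, female, equal_var=True)

print(f"single mean: t = {t1:.3f}, p = {p1:.4f}")
print(f"difference of means: t = {t2:.3f}, p = {p2:.4f}")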

Assumptions

Irrespective of the type of t-test used, two assumptions have to be met.

1. the populations from which the samples are drawn are normal.
2. the population standard deviation is not known.

Student's t-test is a small sample test. It is difficult to draw a clear line of demarcation
between large and small samples. Statisticians have generally agreed that a sample may be
considered small if its size is < 30.

The tests used for dealing with problems relating to large samples are different from those
used for small samples. We often use the z-test for large samples.

6.ANOVA

The Analysis of Variance, popularly known as the ANOVA, can be used in cases where there
are more than two groups.

When we have only two samples we can use the t-test to compare the means of the samples
but it might become unreliable in case of more than two samples. If we only compare two
means, then the t-test (independent samples) will give the same results as the ANOVA.

It is used to compare the means of more than two samples. This can be understood better with
the help of an example.

One-Way ANOVA

EXAMPLE: Suppose we want to test the effect of five different exercises. For this, we recruit
20 men and assign one type of exercise to 4 men (5 groups). Their weights are recorded after
a few weeks.

We may find out whether the effect of these exercises on them is significantly different or not
and this may be done by comparing the weights of the 5 groups of 4 men each.

The example above is a case of one-way balanced ANOVA.

It has been termed one-way as there is only one factor whose effect has been studied, and
balanced as the same number of men has been assigned to each exercise. Thus the basic
idea is to test whether the samples are all alike or not.

Why Not Multiple T-Tests?

As mentioned above, the t-test can only be used to test differences between two means. When
there are more than two means, it is possible to compare each mean with each other mean
using many t-tests.

But conducting such multiple t-tests inflates the probability of a Type I error (declaring a
difference that is merely due to chance), and in such circumstances we use ANOVA. Thus, this
technique is used whenever an alternative procedure is needed for testing hypotheses concerning
means when there are several populations.

One Way and Two Way ANOVA

Now some questions may arise as to what are the means we are talking about and why
variances are analyzed in order to derive conclusions about means. The whole procedure can
be made clear with the help of an experiment.

Let us study the effect of fertilizers on the yield of wheat. We apply five fertilizers, each of
different quality, to five plots of wheat each. The yield from each plot of land is recorded and
the difference in yield among the plots is observed. Here, fertilizer is a factor and the
different qualities of fertilizers are called levels.

This is a case of one-way or one-factor ANOVA since there is only one factor, fertilizer. We
may also be interested to study the effect of fertility of the plots of land. In such a case we
would have two factors, fertilizer and fertility. This would be a case of two-way or two-factor
ANOVA. Similarly, a third factor may be incorporated to have a case of three-way or three-
factor ANOVA.

Chance Cause and Assignable Cause

In the above experiment the yields obtained from the plots may be different and we may be
tempted to conclude that the differences exist due to the differences in quality of the
fertilizers.

But this difference may also be the result of certain other factors which are attributed to
chance and which are beyond human control. This factor is termed as “error”. Thus, the
differences or variations that exist within a plot of land may be attributed to error.

Thus, estimates of the amount of variation due to assignable causes (or variance between the
samples) as well as due to chance causes (or variance within the samples) are obtained
separately and compared using an F-test and conclusions are drawn using the value of F.

Assumptions

There are four basic assumptions used in ANOVA.

 the expected values of the errors are zero
 the variances of all errors are equal to each other
 the errors are independent
 they are normally distributed

6.1 One-Way ANOVA

A One-Way ANOVA (Analysis of Variance) is a statistical technique by which we can test if
three or more means are equal. It tests if the value of a single variable differs significantly
among three or more levels of a factor.

We can say we have a framework for one-way ANOVA when we have a single factor with
three or more levels and multiple observations at each level.

In this kind of layout, we can calculate the mean of the observations within each level of our
factor.

The concepts of factor, levels and multiple observations at each level can be best understood
by an example.

Factor and Levels - An Example

Let us suppose that the Human Resources Department of a company desires to know if
occupational stress varies according to age.

The variable of interest is therefore occupational stress as measured by a scale.

The factor being studied is age. There is just one factor (age) and hence a situation
appropriate for one-way ANOVA.

Further suppose that the employees have been classified into three groups (levels):

 less than 40
 40 to 55
 above 55

These three groups are the levels of factor age - there are three levels here. With this design,
we shall have multiple observations in the form of scores on Occupational Stress from a
number of employees belonging to the three levels of factor age. We are interested to know
whether all the levels i.e. age groups have equal stress on the average.

Non-significance of the test statistic (F-statistic) associated with this technique would imply
that age has no effect on stress experienced by employees in their respective occupations. On
the other hand, significance would imply that stress afflicts different age groups differently.

Hypothesis Testing

Formally, the null hypothesis to be tested is of the form:

H0: All the age groups have equal stress on the average or μ1 = μ2 = μ3 , where μ1, μ2, μ3 are
mean stress scores for the three age groups.

The alternative hypothesis is:

H1: The mean stress of at least one age group is significantly different.
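
A minimal sketch of this one-way ANOVA using SciPy is shown below. The occupational stress scores for the three age groups are invented for illustration.

# A minimal sketch of one-way ANOVA with scipy.stats.f_oneway.
# The occupational stress scores for the three age groups are made up.
from scipy import stats

under_40  = [55, 60, 58, 62, 57, 59]
age_40_55 = [64, 66, 61, 68, 63, 65]
over_55   = [60, 58, 62, 59, 61, 60]

f_stat, p_value = stats.f_oneway(under_40, age_40_55, over_55)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A p-value below 0.05 would suggest that mean stress differs
# across at least one pair of age groups.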

One-way Anova and T-Test

The one-way ANOVA is an extension of the independent two-sample t-test.

In the above example, if we considered only two age groups, say below 40 and above 40,
then the independent samples t-test would have been enough although application of
ANOVA would have also produced the same result.

In the example considered above, there were three age groups and hence it was necessary to
use one-way ANOVA.

Often the interest is on acceptance or rejection of the null hypothesis. If it is rejected, this
technique will not identify the level which is significantly different. One has to perform t-
tests for this purpose.

This implies that if a difference between the means exists, we would have to carry out 3C2 = 3
independent t-tests in order to locate the level which is significantly different. It would be kC2
t-tests in the general one-way ANOVA design with k levels.
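
The follow-up comparisons can be sketched as below, reusing the same kind of made-up data. In practice one would normally apply a multiple-comparison correction (e.g. Bonferroni) or a dedicated post-hoc procedure such as Tukey's HSD, since repeated t-tests inflate the Type I error rate.

# A minimal sketch of the kC2 follow-up comparisons mentioned above.
# In practice a multiple-comparison correction (e.g. Bonferroni) or a
# dedicated post-hoc procedure such as Tukey's HSD is normally used.
from itertools import combinations
from scipy import stats

groups = {
    "under_40": [55, 60, 58, 62, 57, 59],
    "40_to_55": [64, 66, 61, 68, 63, 65],
    "over_55":  [60, 58, 62, 59, 61, 60],
}

for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(a, b, equal_var=True)
    print(f"{name_a} vs {name_b}: t = {t:.3f}, p = {p:.4f}")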

Advantages

One of the principal advantages of this technique is that the number of observations need not
be the same in each group.

Additionally, layout of the design and statistical analysis is simple.

Assumptions

For the validity of the results, some assumptions must be checked to hold before the
technique is applied. These are:

 Each level of the factor is applied to a sample. The population from which the sample
was obtained must be normally distributed.
 The samples must be independent.
 The variances of the population must be equal.

Replication and Randomization

In general, ANOVA experiments need to satisfy three principles - replication, randomization
and local control.

Out of these three, only replication and randomization have to be satisfied while designing
and implementing any one-way ANOVA experiment.

Replication refers to the application of each individual level of the factor to multiple subjects.
In the above example, in order to apply the principle of replication, we had obtained
occupational stress scores from more than one employee in each level (age group).

Randomization refers to the random allocation of the experimental units. In our example,
employees were selected randomly for each of the age groups.

6.2 Two-Way ANOVA

A Two-Way ANOVA is useful when we desire to compare the effect of multiple levels of
two factors and we have multiple observations at each level.

One-Way ANOVA compares three or more levels of one factor. But some experiments
involve two factors each with multiple levels in which case it is appropriate to use Two-Way
ANOVA.

Let us discuss the concepts of factors, levels and observation through an example.

Factors and Levels - An Example


A Two-Way ANOVA is a design with two factors.

Let us suppose that the Human Resources Department of a company desires to know if
occupational stress varies according to age and gender.

The variable of interest is therefore occupational stress as measured by a scale.

There are two factors being studied - age and gender.

Further suppose that the employees have been classified into three groups or levels:

 age less than 40
 40 to 55
 above 55

In addition employees have been labeled into gender classification (levels):

 male
 female

In this design, factor age has three levels and gender two. In all, there are 3 x 2 = 6 groups or
cells. With this layout, we obtain scores on occupational stress from employee(s) belonging
to the six cells.

Testing for Interaction

There are two versions of the Two-Way ANOVA.

The basic version has one observation in each cell - one occupational stress score from one
employee in each of the six cells.

The second version has more than one observation per cell but the number of observations in
each cell must be equal. The advantage of the second version is it also helps us to test if there
is any interaction between the two factors.

For instance, in the example above, we may be interested to know if there is any interaction
between age and gender.

This helps us to know if age and gender are independent of each other - they are independent
if the effect of age on stress remains the same irrespective of whether we take gender into
consideration.

Hypothesis Testing

In the basic version there are two null hypotheses to be tested.

 H01: All the age groups have equal stress on the average
 H02: Both the gender groups have equal stress on the average.

In the second version, a third hypothesis is also tested:

 H03: The two factors are independent or that interaction effect is not present.

The computational aspect involves computing F-statistic for each hypothesis.
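
A minimal sketch of the second version (with interaction) using the statsmodels formula interface is given below. The data frame and its column names (stress, age_group, gender) are hypothetical and the numbers are invented; each of the six cells contains two observations so that the interaction can be tested.

# A minimal sketch of the second version of the two-way ANOVA (with
# interaction) using statsmodels. The data frame and its column names
# (stress, age_group, gender) are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "stress":    [55, 58, 62, 60, 64, 61, 59, 57, 66, 63, 60, 62],
    "age_group": ["<40", "<40", "40-55", "40-55", ">55", ">55"] * 2,
    "gender":    ["male"] * 6 + ["female"] * 6,
})

model = ols("stress ~ C(age_group) * C(gender)", data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)  # F and p-values for age, gender and their interaction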

Assumption

The assumptions in both versions remain the same - normality, independence and equality of
variance.

Advantages

 An important advantage of this design is that it is more efficient than its one-way
counterpart. There are two assignable sources of variation - age and gender in our
example - and this helps to reduce error variation, thereby making this design more
efficient.
 Unlike One-Way ANOVA, it enables us to test the effect of two factors at the same
time.

 One can also test for independence of the factors provided there is more than one
observation in each cell. The only restriction is that the number of observations in
each cell has to be equal (there is no such restriction in the case of one-way ANOVA).

Replication, Randomization and Local Control

A Two-Way ANOVA satisfies all three principles of design of experiments, namely
replication, randomization and local control.

The principles of replication and randomization need to be satisfied in a manner similar to
One-Way ANOVA.

The principle of local control means to make the observations as homogeneous as possible so
that error due to one or more assignable causes may be removed from the experimental error.

In our example if we divided the employees only according to their age, then we would have
ignored the effect of gender on stress which would then accumulate with the experimental
error.

But we divided them not only according to age but also according to gender which would
help in reducing the error - this is application of the principle of local control for reducing
error variation and making the design more efficient.

6.3 Factorial ANOVA

Experiments where the effects of more than one factor are considered together are called
'factorial experiments' and may sometimes be analyzed with the use of factorial ANOVA.

For instance, the academic achievement of a student depends on study habits of the student as
well as home environment. We may have two simple experiments, one to study the effect of
study habits and another for home environment.

Independence of Factors

But these experiments will not give us any information about the dependence or
independence of the two factors, namely study habit and home environment.

In such cases, we resort to Factorial ANOVA which not only helps us to study the effect of
two or more factors but also gives information about their dependence or independence in the
same experiment. There are many types of factorial designs like 2², 2³, 3², etc. The simplest
of them all is the 2² or 2 x 2 experiment.

An Example

In these experiments, the factors are applied at different levels. In a 2 x 2 factorial design,
there are 2 factors each being applied in two levels.

Let us illustrate this with the help of an example. Suppose that a new drug has been
developed to control hypertension.

We want to test the effect of quantity of the drug taken and the effect of gender. Here, the
quantity of the drug is the first factor and gender is the second factor (or vice versa).

Suppose that we consider two quantities, say 100 mg and 250 mg of the drug (1 / 2). These
two quantities are the two levels of the first factor.

Similarly, the two levels of the second factor are male and female (A / B).

Thus we have two factors each being applied at two levels. In other words, we have a 2 x 2
factorial design.

Here we have 4 different treatment groups, one for each combination of levels of factors - by
convention, the groups are denoted by A1, A2, B1, B2. These groups mean the following.

 A1 : 100 mg of the drug applied on male patients
 A2 : 250 mg of the drug applied on male patients
 B1 : 100 mg of the drug applied on female patients
 B2 : 250 mg of the drug applied on female patients.

Here, the quantity of the drug and gender are the independent variables whereas reduction of
hypertension after one month is the dependent variable.

Main Effects and Interaction

A main effect is a consistent difference in the outcome between the levels of a factor.

In our example, there are two main effects - quantity and gender.

Factorial ANOVA also enables us to examine the interaction effect between the factors. An
interaction effect is said to exist when differences on one factor depend on the level of other
factor.

However, it is important to remember that interaction is between factors and not levels. We
know that there is no interaction between the factors when we can talk about the effect of one
factor without mentioning the other factor.

Hypothesis Testing

In the above example, there are three hypotheses to be tested. These are:

H01: Main effect 'quantity' is not significant

H02: Main effect 'gender' is not significant

H03: Interaction effect is not present.

For main effect gender, the null hypothesis means that there is no significant difference in
reduction of hypertension in males and females.

The null hypothesis for the main effect quantity means that there is no significant difference
in reduction of hypertension whether the patients are given 100 mg or 250 mg of the drug.

For the interaction effect, the null hypothesis means that the two main effects gender and
quantity are independent. The computational aspect involves computing F-statistic for each
hypothesis.

Advantages

Factorial design has several important features.

 Factorial designs are the ultimate designs of choice whenever we are interested in
examining treatment variations.
 Factorial designs are efficient. Instead of conducting a series of independent studies,
we are effectively able to combine these studies into one.
 Factorial designs are the only effective way to examine interaction effects.

6.4 Repeated Measures ANOVA

Repeated Measures ANOVA is a technique used to test the equality of means.

It is used when all the members of a random sample are tested under a number of conditions.
Here, we have several measurements for each member of the sample, as each member is
exposed to the different conditions.


In other words, the measurement of the dependent variable is repeated. It is not possible to
use the standard ANOVA in such a case because such data violate the assumption of
independence, and the standard ANOVA cannot model the correlation between the repeated
measures.

Not Multivariate Design

However, it must be noted that a repeated measures design is very much different from a
multivariate design.

For both, samples are measured on several occasions, or trials, but in the repeated measures
design, each trial represents the measurement of the same characteristic under a different
condition.

For example, repeated measures ANOVA can be used to compare the number of oranges
produced by an orange grove in years one, two and three. The measurement is the number of
oranges and the condition that changes is the year.

But in a multivariate design, each trial represents the measurement of a different
characteristic.

Thus, to compare the number, weight and price of oranges repeated measures ANOVA
cannot be used. The three measurements are number, weight, and price, and these do not
represent different conditions, but different qualities.

Why Use Repeated Measures Design?

Repeated measures design is used for several reasons:

 By collecting data from the same participants under repeated conditions the individual
differences can be eliminated or reduced as a source of between group differences.
 Also, the sample size is not divided between conditions or groups and thus inferential
testing becomes more powerful.
 This design also proves to be economical when sample members are difficult to
recruit because each member is measured under all conditions.

Assumption

This design is based on the assumption of Sphericity, which means that the variance of the
population difference scores for any two conditions should be the same as the variance of the
population difference scores for any other two conditions.

But this condition is only relevant to the one-way repeated measures ANOVA and in other
cases this assumption is commonly violated.

Hypothesis

The null hypothesis to be tested here is:

H0: There are no differences between population means.

Some differences will occur in the sample. It is desired to draw conclusions about the
population from which it was taken, not about the sample. The F-ratios are used for the
analysis of variance and conclusions are drawn accordingly.
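
A minimal sketch of a one-way repeated measures ANOVA, assuming the statsmodels AnovaRM class and invented orange-yield data in long format (one row per grove per year), is shown below.

# A minimal sketch of a one-way repeated measures ANOVA using
# statsmodels' AnovaRM. The data (orange yields per grove over three
# years) are made up; each grove is measured in every year.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "grove":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "year":    ["y1", "y2", "y3"] * 4,
    "oranges": [110, 125, 130, 98, 110, 118, 105, 115, 121, 120, 128, 135],
})

res = AnovaRM(data, depvar="oranges", subject="grove", within=["year"]).fit()
print(res)  # F-statistic and p-value for the within-subject factor "year"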

Within-Subject Design

The repeated measures design is also known as a within-subject design.

The data presented in this design includes a measure repeated over time, a measure repeated
across more than one condition or several related and comparable measures.

Possible Designs for Repeated Measures

 One-way repeated measures
 Two-way repeated measures
 Two-way mixed split-plot design (SPANOVA)

7.Nonparametric Statistics

Nonparametric statistics are statistical methods that do not assume a prior distribution for the data. When an
experiment is performed or data collected for some purpose, it is usually assumed that it fits
some given probability distribution, typically the normal distribution. This is the basis on
which the data is interpreted. When these assumptions are not made, it becomes
nonparametric statistics.

There are several advantages of using nonparametric statistics. As can be expected, since
there are fewer assumptions that are made about the sample being studied, nonparametric
statistics are usually wider in scope as compared to parametric statistics that actually assume
a distribution. This is mainly the case when we do not know a lot about the sample we are
studying and making a priori assumptions about data distributions might not give us accurate
results and interpretations. This directly translates into an increase in robustness.

However, there are also some disadvantages of nonparametric statistics. The main
disadvantage is that the degree of confidence is usually lower for these types of studies. This
means for the same sample under consideration, the results obtained from nonparametric
statistics have a lower degree of confidence than if the results were obtained using parametric
statistics. Of course, this is assuming that the study is such that it is valid to assume a
distribution for the sample.

There are many experimental scenarios in which we can assume a normal distribution. For
example if an experiment looks at the correlation between a healthy morning breakfast and
IQ, the experimenter can assume beforehand that the IQs of the sample size follow a normal
distribution within the sample, assuming the sample is chosen randomly from the population.
On the other hand, if this assumption is not made, then the experimenter is following
nonparametric statistics methods.

However, there could be another experiment that measures the resistance of the human body
to a strain of bacteria. In such a case, it is not possible to determine if the data will be
normally distributed. It might happen that all people are resistant to the strain of bacteria
under study or perhaps no one is. Again, there could be other considerations as well. It could
be that people of a particular ethnicity are born with that resistance while none of the others
are. In such cases, it is not right to assume a normal distribution of data. These are the
situations in which nonparametric statistics should be used. There are many tests that tell us
whether the data can be assumed to be normally distributed or not.

7.1 Cohen’s Kappa

Cohen's Kappa is an index that measures interrater agreement for categorical (qualitative)
items.

The items are indicators of the extent to which two raters who are examining the same set of
categorical data, agree while assigning the data to categories, for example, classifying a
tumor as 'malignant' or 'benign'.

Comparison between the level of agreement between two sets of dichotomous scores or
ratings (an alternative between two choices, e.g. accept or reject) assigned by two raters to
certain qualitative variables can be easily accomplished with the help of simple percentages,
i.e. taking the ratio of the number of ratings for which both the raters agree to the total
number of ratings. But despite the simplicity involved in its calculation, percentages can be
misleading and do not reflect the true picture, since they do not take into account the
agreement that arises purely by chance.

Using percentages can result in two raters appearing to be highly reliable and completely in
agreement, even if they have assigned their scores completely randomly and they actually do
not agree at all. Cohen's Kappa overcomes this issue as it takes into account agreement
occurring by chance.

How to Compute Cohen's Kappa

The formula for Cohen's Kappa is:

К = [Pr(a) - Pr(e)] / [1 - Pr(e)]

Where:
Pr(a) = Observed percentage of agreement,
Pr(e) = Expected percentage of agreement.

The observed percentage of agreement implies the proportion of ratings where the raters
agree, and the expected percentage is the proportion of agreements that are expected to occur
by chance as a result of the raters scoring in a random manner. Hence Kappa is the proportion
of agreements that is actually observed between raters, after adjusting for the proportion of
agreements that take place by chance.

Let us consider the following 2×2 contingency table, which depicts the probabilities of two
raters classifying objects into two categories.

                                 Rater 1
                        Category 1   Category 2   Total

Rater 2   Category 1       P11           P12       P10
          Category 2       P21           P22       P20

          Total            P01           P02        1

Then
Pr(a) = P11 + P22 (the proportion of agreements, i.e. the diagonal of the table)
Pr(e) = P10 × P01 + P20 × P02 (the agreement expected by chance from the marginal totals)
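
A minimal sketch of the calculation from raw ratings is given below. The two lists of ratings are invented; the code simply applies the formula above, and sklearn.metrics.cohen_kappa_score should give the same value.

# A minimal sketch of Cohen's Kappa computed directly from the formula.
# The two lists of made-up ratings represent the same 10 tumours
# classified by two raters as 'malignant' (M) or 'benign' (B).
from collections import Counter

rater1 = ["M", "M", "B", "B", "M", "B", "B", "M", "B", "B"]
rater2 = ["M", "B", "B", "B", "M", "B", "M", "M", "B", "B"]
n = len(rater1)

# Observed agreement Pr(a): proportion of items on which the raters agree.
pr_a = sum(a == b for a, b in zip(rater1, rater2)) / n

# Expected agreement Pr(e): sum over categories of the product of the
# two raters' marginal proportions for that category.
c1, c2 = Counter(rater1), Counter(rater2)
pr_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(rater1) | set(rater2))

kappa = (pr_a - pr_e) / (1 - pr_e)
print(f"Pr(a) = {pr_a:.2f}, Pr(e) = {pr_e:.2f}, kappa = {kappa:.3f}")
# sklearn.metrics.cohen_kappa_score(rater1, rater2) gives the same value.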

Interpretation

The value of К ranges between -1 and +1, similar to Karl Pearson's co-efficient of correlation
'r'. In fact, Kappa and r assume similar values if they are calculated for the same set of
dichotomous ratings for two raters.

A value of kappa equal to +1 implies perfect agreement between the two raters, while that of
-1 implies perfect disagreement. If kappa assumes the value 0, then this implies that there is
no relationship between the ratings of the two raters, and any agreement or disagreement is
due to chance alone. A kappa value of 0.70 is generally considered to be satisfactory.
However, the desired reliability level varies depending on the purpose for which kappa is
being calculated.

Caveats

Kappa is very easy to calculate given the software available for the purpose and is
appropriate for testing whether agreement exceeds chance levels. However, some questions
arise regarding the proportion of chance, or expected agreement, which is the proportion of
times the raters would agree by chance alone. This term is relevant only in case the raters are
independent, but the clear absence of independence calls its relevance into question.

Also, kappa requires two raters to use the same rating categories. But it cannot be used in
case we are interested to test the consistency of ratings for raters that use different categories,
e.g. if one uses the scale 1 to 5, and the other 1 to 10.

7.2 Mann-Whitney U-Test

Mann-Whitney-Wilcoxon (MWW) / Wilcoxon Rank-Sum Test

Non-parametric tests are basically used in order to overcome the underlying assumption of
normality in parametric tests. Quite general assumptions regarding the population are used in
these tests.

A case in point is the Mann-Whitney U-test (also known as the Mann-Whitney-Wilcoxon
(MWW) or Wilcoxon Rank-Sum Test). Unlike its parametric counterpart, the t-test for two
samples, this test does not assume that the difference between the samples is normally
distributed, or that the variances of the two populations are equal. Thus, when the validity of
the assumptions of the t-test is questionable, the Mann-Whitney U-Test comes into play and
hence has wider applicability.

The Method

The Mann-Whitney U-test is used to test whether two independent samples of observations
are drawn from the same or identical distributions. An advantage with this test is that the two
samples under consideration may not necessarily have the same number of observations.

This test is based on the idea that the particular pattern exhibited when 'm' number of X
random variables and 'n' number of Y random variables are arranged together in increasing
order of magnitude provides information about the relationship between their parent
populations.

The Mann-Whitney test criterion is based on the magnitude of the Y's in relation to the X's,
i.e. the position of Y's in the combined ordered sequence. A sample pattern of arrangement
where most of the Y's are greater than most of the X's or vice versa would be evidence
against random mixing. This would tend to discredit the null hypothesis of identical
distribution.

Assumptions

The test has two important assumptions. First the two samples under consideration are
random, and are independent of each other, as are the observations within each sample.
Second the observations are numeric or ordinal (arranged in ranks).

How to Calculate the Mann-Whitney U

In order to calculate the U statistics, the combined set of data is first arranged in ascending
order with tied scores receiving a rank equal to the average position of those scores in the
ordered sequence.

Let T denote the sum of ranks for the first sample. The Mann-Whitney test statistic is then
calculated using U = n1 n2 + {n1 (n1 + 1)/2} - T , where n1 and n2 are the sizes of the first
and second samples respectively.

An Example

An example can clarify better. Consider the following samples.

Sample A

Observation 25 25 19 21 22 19 15

Rank 15.5 15.5 9.5 13 14 9.5 3.5

Sample B

Observation 18 14 13 15 17 19 18 20 19

Rank 6.5 2 1 3.5 5 9.5 6.5 12 9.5

Here, T = 80.5, n1 = 7, n2 = 9. Hence, U = (7 * 9) + [{7 * (7+1)}/2] - 80.5 = 10.5.

We next compare the value of calculated U with the value given in the Tables of Critical
Values for the Mann-Whitney U-test, where the critical values are provided for given n1 and
n2 , and accordingly accept or reject the null hypothesis. Even though the distribution of U is
known, the normal distribution provides a good approximation in case of large samples.
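
The example can be reproduced in code. The sketch below uses SciPy's mannwhitneyu; note that conventions differ between texts and libraries, so SciPy reports the U statistic for the first sample passed in, and the hand-computed 10.5 is recovered as the smaller of the two possible U values.

# A minimal sketch of the Mann-Whitney U-test on the example data above.
# Note that conventions differ: SciPy reports the U statistic for the
# first sample passed in, so it may be n1*n2 minus the hand-computed value.
from scipy import stats

sample_a = [25, 25, 19, 21, 22, 19, 15]
sample_b = [18, 14, 13, 15, 17, 19, 18, 20, 19]

u_stat, p_value = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
# min(u_stat, len(sample_a) * len(sample_b) - u_stat) recovers 10.5,
# the value computed by hand in the example.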

Hypothesis On Equality of Medians

Often this statistic is used to test a hypothesis regarding equality of medians. The logic
is simple - since the U statistic tests whether two samples are drawn from identical populations,
equality of medians follows.

As a Counterpart of T-Test

The Mann-Whitney U test is truly the non-parametric counterpart of the two-sample t-test. To
see this, one needs to recall that the t-test tests for equality of means when the underlying
assumptions of normality and equality of variance are satisfied. Thus the t-test tests whether the
two samples have been drawn from an identical normal population. The Mann-Whitney U test is
its generalization.

7.3 Wilcoxon Signed Rank Test

The Wilcoxon Signed Rank Test is a non-parametric statistical test for testing hypothesis on
median.

The test has two versions: "single sample" and "paired samples / two samples".

Single Sample

The first version is the analogue of independent one sample t-test in the non parametric
context. It uses a single sample and is recommended for use whenever we desire to test a
hypothesis about population median.

m0 = the specific value of population median

The null hypothesis here is of the form H0 : m = m0 , where m0 is the specific value of
population median that we wish to test against the alternative hypothesis H1 : m ≠ m0 .

For example, let us suppose that the manager of a boutique claims that the median income of his
clients is $24,000 per annum. To test if this is tenable, the analyst will obtain the yearly
income of a sample of his clients and test the null hypothesis H0 : m = 24,000.

Paired Samples

The second version of the test uses paired samples and is the non parametric analogue of
dependent t-test for paired samples.

This test uses two samples but it is necessary that they should be paired. Paired samples
imply that each individual observation of one sample has a unique corresponding member in
the other sample.

An Example - Paired Samples

For example, suppose that we have a sample of weights of n obese adults before they are
subjected to a change of diet.

After a lapse of six months, we would like to test whether there has been any significant loss
in weight as a result of change in diet. One could be tempted to straightaway use the
dependent t-test for paired samples here.

However, that test has certain assumptions, notable among them being normality. If this
normality assumption is not satisfied, one would have to go for the non-parametric Wilcoxon
Signed Rank Test.

The null hypothesis then would be that there has been no significant reduction in median
weight after six months against the alternative that medians before and after significantly
differ.

Normality Assumption is not Required

Most of the standard statistical techniques can be used provided certain standard assumptions
such as independence, normality etc. are satisfied.

Often these techniques cannot be used if the normality assumption is not satisfied. Among
others, the t-test requires this assumption and it is not advisable to use it if this assumption is
violated.

Advantages

The advantage with Wilcoxon Signed Rank Test is that it neither depends on the form of the
parent distribution nor on its parameters. It does not require any assumptions about the shape
of the distribution.

For this reason, this test is often used as an alternative to the t-test whenever the population
cannot be assumed to be normally distributed. Even if the normality assumption holds, it has
been shown that the efficiency of this test compared to the t-test is almost 95%.

Let us illustrate how signed ranks are created in one-sample case by considering the example
explained above. Assume that a sample of yearly incomes of 10 customers was collected. The
null hypothesis to be tested is H0 : m = 24,000.

We first calculate the deviations of the given observations from 24,000 and then rank them in
order of magnitude. This has been done in the following table:

Income     Deviation   Signed Rank

23,928        -72          -1
24,500        500           5.5
23,880       -120          -2
24,675        675           7
21,965      -2035         -10
22,900      -1100          -9
23,500       -500          -5.5
24,450        450           4
22,998      -1002          -8
23,689       -311          -3

The deviations are ranked in increasing order of absolute magnitude and then the ranks are
given the signs of the corresponding deviations.

In the above table the absolute difference 500 occurs twice. In such a case, we assign a common
rank which is the arithmetic mean of their respective ranks. Hence both were assigned the rank
5.5, the arithmetic mean of 5 and 6.

In a two sample case, the ranks are assigned in a similar way. The only difference is that in a
two sample case we first find out the differences between the corresponding observations of
the samples and then rank them in increasing order of magnitude.

The ranks are then given the sign of the corresponding differences.
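
A minimal sketch of the one-sample version applied to the income example above is shown below, using SciPy's wilcoxon on the deviations from the hypothesized median.

# A minimal sketch of the one-sample Wilcoxon Signed Rank Test for the
# income example above (H0: median income = 24,000), using SciPy.
# scipy.stats.wilcoxon works on the differences from the hypothesized median.
from scipy import stats

incomes = [23928, 24500, 23880, 24675, 21965,
           22900, 23500, 24450, 22998, 23689]
m0 = 24000
differences = [x - m0 for x in incomes]

w_stat, p_value = stats.wilcoxon(differences)
print(f"W = {w_stat}, p = {p_value:.4f}")
# A small p-value would cast doubt on the manager's claim about the median.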

8. Other Ways to Analyze Data

8.1 Chi Square Test

Any statistical test that uses the chi square distribution can be called a chi square test. It is
applicable to both large and small samples, depending on the context.

For example suppose a person wants to test the hypothesis that success rate in a particular
English test is similar for indigenous and immigrant students.

If we take a random sample of, say, 80 students and record both the indigenous/immigrant
status and the success/failure status of each student, the chi square test can be applied to test
the hypothesis.
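
A minimal sketch of this test of independence using SciPy is given below; the 2 x 2 table of counts (indigenous/immigrant by success/failure) is invented, but the total of 80 students matches the example.

# A minimal sketch of the chi square test for independence of two
# attributes, using made-up counts for the English-test example above
# (rows: indigenous / immigrant, columns: success / failure).
from scipy.stats import chi2_contingency

observed = [[28, 12],   # indigenous: success, failure
            [22, 18]]   # immigrant:  success, failure

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p_value:.4f}")
# A small p-value would suggest success is not independent of status.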

There are different types of chi square tests, each for a different purpose. Some of the popular
types are outlined below.

Tests for Different Purposes

1. Chi square test for testing goodness of fit is used to decide whether there is any
difference between the observed (experimental) value and the expected (theoretical)
value.

For example given a sample, we may like to test if it has been drawn from a normal
population. This can be tested using chi square goodness of fit procedure.

2. Chi square test for independence of two attributes. Suppose N observations are
considered and classified according to two characteristics, say A and B. We may be
interested in testing whether the two characteristics are independent. In such a case, we
can use the chi square test for independence of two attributes.

The example considered above testing for independence of success in the English test
vis a vis immigrant status is a case fit for analysis using this test.

3. Chi square test for single variance is used to test a hypothesis on a specific value of
the population variance. Statistically speaking, we test the null hypothesis H0: σ² = σ0²
against the alternative hypothesis H1: σ² ≠ σ0², where σ² is the population variance and
σ0² is a specific value of the population variance that we would like to test for acceptance.

In other words, this test enables us to test if the given sample has been drawn from a
population with specific variance σ0. This is a small sample test to be used only if
sample size is less than 30 in general.

Assumptions

The chi square test for single variance has an assumption that the population from which the
sample has been drawn is normal. This normality assumption need not hold for the chi square
goodness of fit test and the test for independence of attributes.

However, while implementing these two tests, one has to ensure that the expected frequency in
any cell is not less than 5. If it is less than 5, then it has to be pooled with the preceding or
succeeding cell so that the expected frequency of the pooled cell is at least 5.

Non Parametric and Distribution Free

It has to be noted that the Chi square goodness of fit test and test for independence of
attributes depend only on the set of observed and expected frequencies and degrees of
freedom. These two tests do not need any assumption regarding distribution of the parent
population from which the samples are taken.

Since these tests do not involve any population parameters or characteristics, they are also
termed as non parametric or distribution free tests. An additional important fact on these two
tests is they are sample size independent and can be used for any sample size as long as the
assumption on minimum expected cell frequency is met.

8.2 Z-Test

Z-test is a statistical test where normal distribution is applied and is basically used for dealing
with problems relating to large samples when n ≥ 30.

n = sample size

For example suppose a person wants to test if both tea & coffee are equally popular in a
particular town. Then he can take a sample of size say 500 from the town out of which
suppose 280 are tea drinkers. To test the hypothesis, he can use Z-test.
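
A minimal sketch of this z-test for a single proportion, computed from first principles for the tea-drinkers example (280 of 500, p0 = 0.5), is shown below; statsmodels' proportions_ztest is a ready-made alternative.

# A minimal sketch of the z-test for a single proportion, computed from
# first principles for the tea-drinkers example (280 of 500, p0 = 0.5).
from math import sqrt
from scipy.stats import norm

count, n, p0 = 280, 500, 0.5
p_hat = count / n

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * norm.sf(abs(z))          # two-sided p-value
print(f"z = {z:.3f}, p = {p_value:.4f}")
# statsmodels.stats.proportion.proportions_ztest offers a ready-made version.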

Z-Test's for Different Purposes

There are different types of Z-tests, each for a different purpose. Some of the popular types are
outlined below:

1. z test for single proportion is used to test a hypothesis on a specific value of the
population proportion.

Statistically speaking, we test the null hypothesis H0: p = p0 against the alternative
hypothesis H1: p ≠ p0, where p is the population proportion and p0 is a specific value
of the population proportion we would like to test for acceptance.

The example on tea drinkers explained above requires this test. In that example, p0 =
0.5. Notice that in this particular example, proportion refers to the proportion of tea
drinkers.

2. z test for difference of proportions is used to test the hypothesis that two populations
have the same proportion.

For example suppose one is interested to test if there is any significant difference in
the habit of tea drinking between male and female citizens of a town. In such a
situation, Z-test for difference of proportions can be applied.

One would have to obtain two independent samples from the town- one from males
and the other from females and determine the proportion of tea drinkers in each
sample in order to perform this test.

3. z -test for single mean is used to test a hypothesis on a specific value of the population
mean.

Statistically speaking, we test the null hypothesis H0: μ = μ0 against the alternative
hypothesis H1: μ ≠ μ0, where μ is the population mean and μ0 is a specific value of
the population mean that we would like to test for acceptance.

Unlike the t-test for single mean, this test is used if n ≥ 30 and population standard
deviation is known.

4. z test for single variance is used to test a hypothesis on a specific value of the
population variance.

Statistically speaking, we test the null hypothesis H0: σ² = σ0² against H1: σ² ≠ σ0², where
σ² is the population variance and σ0² is a specific value of the population variance that we
would like to test for acceptance.

In other words, this test enables us to test if the given sample has been drawn from a
population with specific variance σ0. Unlike the chi square test for single variance,
this test is used if n ≥ 30.

5. Z-test for testing equality of variance is used to test the hypothesis of equality of two
population variances when the sample size of each sample is 30 or larger.

Assumption

Irrespective of the type of Z-test used it is assumed that the populations from which the
samples are drawn are normal.

8.3 F-Test

Any statistical test that uses the F-distribution can be called an F-test. It is used when the
sample size is small, i.e. n < 30.

For example suppose one is interested to test if there is any significant difference between the
mean height of male and female students in a particular college. In such a situation, t-test for
difference of means can be applied.

However one assumption of t-test is that the variance of the two populations is equal- here
two populations are the population of heights of male and female students. Unless this
assumption is true, the t-test for difference of means cannot be carried out.

The F-test can be used to test the hypothesis that the population variances are equal.
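
A minimal sketch of this F-test for equality of variances is given below, with the larger sample variance placed in the numerator by convention. The height samples are invented for illustration.

# A minimal sketch of the F-test for equality of two population variances,
# with the larger sample variance placed in the numerator by convention.
# The height samples are made-up numbers.
import numpy as np
from scipy.stats import f

male   = np.array([172, 168, 175, 170, 169, 174, 171, 173])
female = np.array([161, 165, 158, 163, 160, 166, 162, 164])

var_m, var_f = male.var(ddof=1), female.var(ddof=1)
if var_m >= var_f:
    F, dfn, dfd = var_m / var_f, len(male) - 1, len(female) - 1
else:
    F, dfn, dfd = var_f / var_m, len(female) - 1, len(male) - 1

p_value = 2 * f.sf(F, dfn, dfd)        # two-sided p-value
print(f"F = {F:.3f}, p = {p_value:.4f}")
# A large p-value supports the equal-variance assumption needed by the t-test.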

F-test's for Different Purposes

There are different types of F-tests, each for a different purpose. Some of the popular types are
outlined below.

1. F-test for testing equality of variance is used to test the hypothesis of equality of two
population variances. The example considered above requires the application of this
test.
2. F-test for testing equality of several means. Test for equality of several means is
carried out by the technique named ANOVA.

For example suppose that the efficacy of a drug is sought to be tested at three levels
say 100mg, 250mg and 500mg. A test is conducted among fifteen human subjects
taken at random- with five subjects being administered each level of the drug.

To test if there are significant differences among the three levels of the drug in terms
of efficacy, the ANOVA technique has to be applied. The test used for this purpose is
the F-test.

3. F-test for testing significance of regression is used to test the significance of the
regression model. The appropriateness of the multiple regression model as a whole can
be tested by this test. A significant F indicates a linear relationship between Y and at
least one of the X's.

Assumptions

Irrespective of the type of F-test used, one assumption has to be met. The populations from
which the samples are drawn have to be normal. In the case of F-test for equality of variance,
a second assumption has to be satisfied in that the larger of the sample variances has to be
placed in the numerator of the test statistic.

Like t-test, F-test is also a small sample test and may be considered for use if sample size is <
30.

Deciding

In attempting to reach decisions, we always begin by specifying the null hypothesis against a
complementary hypothesis called alternative hypothesis. The calculated value of the F-test
with its associated p-value is used to infer whether one has to accept or reject a null
hypothesis.

All statistical software provides these p-values. If the associated p-value is small (i.e. < 0.05),
we say that the test is significant at 5% and one may reject the null hypothesis and accept the
alternative one.

On the other hand, if the associated p-value of the test is > 0.05, one may accept the null
hypothesis and reject the alternative. Evidence against the null hypothesis will be considered
very strong if the p-value is less than 0.01. In that case, we say that the test is significant at 1%.

8.4 Factor Analysis

Factor analysis is a statistical approach that can be used to analyze a large number of
interrelated variables and to categorize these variables using their common aspects.

The approach involves finding a way of representing correlated variables together to form a
new smaller set of derived variables with minimum loss of information. So, it is a type of a
data reduction tool and it removes redundancy or duplication from a set of correlated
variables.

Also, factors are formed that are relatively independent of one another. But since it requires
the data to be correlated, all assumptions that apply to correlation are relevant here.

Main Types

There are two main types of factor analysis. The two main types are:

 Principal component analysis - this method provides a unique solution so that the
original data can be reconstructed from the results. Thus, this method not only
provides a solution but also works the other way round, i.e., provides data from the
solution. The solution generated includes as many factors as there are variables.
 Common factor analysis - this technique uses an estimate of the common variance among
the original variables to generate the solution. Due to this, the number of factors will
always be less than the number of original variables. So, the term factor analysis usually
refers to common factor analysis.
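
A minimal sketch contrasting the two approaches on a small, invented questionnaire-style data set is given below, using scikit-learn's PCA and FactorAnalysis classes as stand-ins for principal component analysis and (maximum-likelihood) common factor analysis.

# A minimal sketch of the two approaches using scikit-learn on a small
# made-up questionnaire data set (rows: respondents, columns: items).
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))                 # two hidden factors
loadings = rng.normal(size=(2, 6))                 # six observed items
items = latent @ loadings + 0.5 * rng.normal(size=(100, 6))

pca = PCA(n_components=2).fit(items)               # principal component analysis
fa = FactorAnalysis(n_components=2).fit(items)     # common factor analysis

print("PCA explained variance ratio:", pca.explained_variance_ratio_)
print("Factor loadings:\n", fa.components_)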

Main Uses

The main uses of factor analysis can be summarized as given below. It helps us in:

 Identification of underlying factors - the aspects common to many variables can be
identified and the variables can be clustered into homogeneous sets. Thus, new sets of
variables can be created. This allows us to gain insight into categories.
 Screening of variables - it helps us to identify groupings so that we can select one
variable to represent many.

Example

Let us consider an example to understand the use of factor analysis.

Suppose we want to know whether certain aspects such as “task skills” and “communication
skills” attribute to the quality of “leadership” or not. We prepare a questionnaire with 20
items, 10 of them pertaining to task elements and 10 to communication elements.

Before using the questionnaire on the sample we use it on a small group of people, who are
like those in the survey. When we analyze the data we try to see if there are really two factors
and if those factors represent the aspects of task and communication skills.

In this way, factors can be found to represent variables with similar aspects.

8.5 ROC Curve Analysis

In almost all fields of human activity, there is often a need to discriminate between good and
bad, presence and absence. Various tests have been designed to meet this objective. The ROC
curve technique has been designed to attain two objectives in this regard.

ROC - Receiver Operating Characteristic


First, it can be used to calibrate (in some sense) a test so that it is able to perform the
discrimination activity well. Second, it can be used to choose between tests and specify best
among them.

What is ROC?

If one applies to a bank for credit, it is most likely that the bank will calculate a credit score
out of the applicant’s background. A higher score could indicate a good customer with
minimal chance of default. The banker could refuse credit if the score is low. Often a credit
score cut-off is used below which the application is rejected.

It is not difficult to see that there is always an element of risk here – risk of committing two
types of errors. A good prospective customer (one who would not default) could be refused
credit by the bank and a bad one could be approved credit. Clearly the banker would like the
cut-off be fixed in a manner that chances of both the errors are minimized if not entirely
eliminated.

The ROC curve was invented during World War 2 to help radar operators detect weak signals
from aircraft.

While complete elimination is impossible, the ROC curve analysis is a technique which
contributes to this endeavour. A related problem is the question of choosing between methods
of identifying good/bad customers should there be a choice. The ROC curve analysis
technique can be of use even here.

The Plot

In order to draw the ROC curve, the concepts of ‘Sensitivity’ and ‘Specificity’ are used – the
curve actually is the plot of sensitivity (in the y axis) against 1- specificity (in the x axis) for
different values of the cut-off.

To understand these concepts, assume that we select a sample of z customers of the bank by
retrospective sampling method. Further suppose that m and n of these are good and bad
(defaulting) customers respectively (m+n=z). Next, we use the credit scale on these
customers and calculate their credit scores. Then we use the cut-off and label customers good
or bad according to whether the credit score is above or below the cut-off. Out of m good
customers, the test classified x of them as good while the remaining m-x were classified as
bad.

In the parlance of the ROC curve, x is termed TP (for true positive, meaning that the credit
scale was able to correctly identify these customers as good) while m-x is termed FN (for
false negative). Further suppose that out of n bad customers, the test classified y of them as
bad while the remaining n-y were classified as good. In ROC parlance, y is termed TN (for
true negative) while n-y is termed FP (for false positive).

Actual vs Credit Scale

                                       Actual status of customers
                                          Good          Bad

As predicted by credit scale   Good         x           n-y
                               Bad         m-x           y

                               Total        m            n

Sensitivity and Specificity

The probability that among the good customers the test will identify a customer as good is
known as ‘Sensitivity’ of the test for that cut-off (given by x/m). On the other hand, among
the bad customers, the probability that the test will identify a customer as bad is known as
‘Specificity’ (given by y/n) of the test again for the same cut-off.

Shape of a Good Curve

To draw the curve, the sensitivity and specificity are determined for a range of cut-offs. Then
sensitivity and 1-specificity are plotted. A good test is one in which the curve is closer to the
upper left corner.

So far so good but how does one determine the optimal cut-off? The banker would like to
determine that cut-off for which sensitivity is high and 1-specificity is low – ideally 100%
sensitivity with 100% specificity. That is easier said than done as the best of the curves is not
a vertical line but one which rises steeply initially and then slowly. The highest point on the
curve has 100% sensitivity and 0% specificity. In other words as one of sensitivity or
specificity increases, the other decreases and vice versa. The problem of determining the
ideal cut-off is to choose one depending upon the extent of sensitivity and specificity that the
decision maker is comfortable with. Having thus fixed a cut-off, the banker can then use it for
evaluating fresh credit applications.

The fact that the best of the tests has a curve which rises steeply initially is used to choose
between tests. A test can be called the best if its corresponding ROC curve is higher than
others.

8.6 Meta-Analysis

Meta analysis is a statistical technique developed by social scientists, who are very limited in
the type of experiments they can perform.

Social scientists have great difficulty in designing and implementing true experiments, so
meta-analysis gives them a quantitative tool to analyze statistically data drawn from a number
of studies, performed over a period of time.

Medicine and psychology increasingly use this method, as a way of avoiding time-consuming
and intricate studies, largely repeating the work of previous research.

What is Meta-Analysis?

Social studies often use very small sample sizes, so any statistics used generally give results
containing large margins of error.

This can be a major problem when interpreting and drawing conclusions, because it can mask
any underlying trends or correlations. Such conclusions are only tenuous, at best, and leave
the research open for criticism.

Meta-analysis is the process of drawing from a larger body of research, and using powerful
statistical analyses on the conglomerated data.

This gives a much larger sample population and is more likely to generate meaningful and
usable data.

The Advantages of Meta-Analysis

Meta-analysis is an excellent way of reducing the complexity and breadth of research,
allowing funds to be diverted elsewhere. For rare medical conditions, it allows researchers to
collect data from further afield than would be possible for one research group.

As the method becomes more common, database programs have made the process much
easier, with professionals working in parallel able to enter their results and access the data.
This allows constant quality assessment and also reduces the chances of unnecessary repeat
research, as papers can often take many months to be published, and the computer records
ensure that any researcher is aware of the latest directions and results.

The field of meta study is also a lot more rigorous than the traditional literature review, which
often relies heavily upon the individual interpretation of the researcher.

When used with the databases, a meta study allows a much wider net to be cast than by the
traditional literature review, and is excellent for highlighting correlations and links between
studies that may not be readily apparent as well as ensuring that the compiler does not
subconsciously infer correlations that do not exist.

The Disadvantages of Meta-Analysis

There are a number of disadvantages to meta-analysis, of which a researcher must be aware
before relying upon the data and generated statistics.

The main problem is that there is the potential for publication bias and skewed data.

Research generating results not refuting a hypothesis may tend to remain unpublished, or
risks not being entered into the database. If the meta study is restricted to the research with
positive results, then the validity is compromised.

The researcher compiling the data must make sure that all research is quantitative, rather than
qualitative, and that the data is comparable across the various research programs, allowing a
genuine statistical analysis.

It is important to pre-select the studies, ensuring that all of the research used is of a sufficient
quality to be used.

One erroneous or poorly conducted study can place the results of the entire meta-analysis at
risk. On the other hand, setting almost unattainable criteria for inclusion can leave
the meta study with too small a sample size to be statistically relevant.

Striking a balance can be a little tricky, but the whole field is in a state of constant
development, incorporating protocols similar to the scientific method used for normal
quantitative research.

Finding the data is rapidly becoming the real key, with skilled meta-analysts developing a
skill-set of library based skills, finding information buried in government reports and
conference data, developing the knack of assessing the quality of sources quickly and
effectively.

Conclusions and the Future

Meta-analysis is here to stay, as an invaluable tool for research, and is rapidly gaining
momentum as a stand-alone discipline, with practitioners straddling the divide between
statisticians and librarians.

The conveniences, as long as the disadvantages are taken into account, are too apparent to
ignore, and a meta study can reduce the need for long, expensive and potentially intrusive
repeated research studies.
