Definition of terms
Treatment. Any procedure applied to the experimental subject, the effect of which is
to be measured and analyzed.
Experimental Subject. The main material used in the experiment. It can be a person,
an animal, or a device.
Randomization. This refers to the assignment of the experimental subjects to the
treatments by chance.
Replication. Repeating the experiment several times so that the variation among the
observations can be estimated.
Randomized Complete Block Design
Population is the collection of elements that have some characteristic in common. The
number of elements in the population is the population size.
Sample is a subset of the population. The process of selecting a sample is known as
sampling. The number of elements in the sample is the sample size.
There are many sampling techniques, which are grouped into two categories:
Probability Sampling and Non-Probability Sampling
Probability Sampling
This sampling technique uses randomization so that every element of the population has a
known, nonzero chance of being part of the selected sample (an equal chance in the simplest
designs). It is also known as random sampling.
Simple Random Sampling
Every element has an equal chance of being selected as part of the sample. It is
used when we have no prior information about the target population.
Three methods of SRS: the lottery method, a table of random numbers (or digits),
and the simple random sampling formula.
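The lottery method can be sketched in a few lines of Python. This is only an illustration: the function name simple_random_sample, the seed, and the population of 20 numbered units are assumptions made for the example, not part of the original notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded only so the draw can be repeated
    return rng.sample(list(population), n)

# Hypothetical population: 20 units labelled 1..20
population = list(range(1, 21))
sample = simple_random_sample(population, 5, seed=42)
print(sample)
```

Sampling is done without replacement, so no unit can appear twice in the sample.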
Stratified Sampling
This technique divides the elements of the population into small subgroups (strata)
based on similarity, in such a way that the elements within each stratum are homogeneous
while the strata differ from one another. Elements are then randomly
selected from each of these strata. We need prior information about the population
to create the subgroups.
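A minimal sketch of stratified sampling with proportional allocation follows. The helper stratified_sample, the urban/rural strata, and the 10% sampling fraction are hypothetical choices for illustration; other allocation rules are possible.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into strata, then draw the same fraction at random from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural units
units = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(units, stratum_of=lambda u: u[0], fraction=0.1, seed=1)
print(len(sample))
```

With a 10% fraction the sample holds 6 urban and 4 rural units, mirroring the population split.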
Cluster Sampling
Our entire population is divided into clusters or sections and then the clusters are
randomly selected. All the elements of the cluster are used for sampling. Clusters are
identified using details such as age, sex, location etc.
Cluster sampling can be done in the following ways:
Single Stage Cluster Sampling. Entire clusters are selected randomly, and every element
of each selected cluster is sampled.
Two Stage Cluster Sampling. Clusters are selected randomly, and then elements are
randomly selected from within those selected clusters.
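The two variants differ only in what happens after the clusters are drawn, which the sketch below tries to show. The function cluster_sample, its per_cluster parameter, and the five villages of ten people are assumptions made for the example.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Pick clusters at random; keep all members (single stage) or
    a random subset of per_cluster members from each (two stage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)                      # single stage
        else:
            size = min(per_cluster, len(members))
            sample.extend(rng.sample(members, size))    # two stage
    return chosen, sample

# Hypothetical clusters: five villages with ten people each
clusters = {f"village_{c}": [f"village_{c}-person_{i}" for i in range(10)]
            for c in "ABCDE"}
chosen_single, single_stage = cluster_sample(clusters, 2, seed=0)
chosen_two, two_stage = cluster_sample(clusters, 2, per_cluster=3, seed=0)
print(len(single_stage), len(two_stage))
```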
Systematic Sampling
Here the selection of elements is systematic rather than random, except for the first element.
Elements of the sample are chosen at regular intervals from the population. All the elements are
first arranged in a sequence, so that the random starting point gives each element an equal
chance of being selected.
For a sample of size n, we divide our population of size N into subgroups of k elements.
We select our first element randomly from the first subgroup of k elements.
To select the other elements of the sample, proceed as follows:
The number of elements in each subgroup is k, i.e. N/n.
So if our first element is n1, then
the second element is n2 = n1 + k,
the third element is n3 = n2 + k, and so on.
Taking an example with N = 20 and n = 5:
The number of elements in each subgroup is k = N/n = 20/5 = 4.
Now, randomly select the first element from the first subgroup.
If we select n1 = 3, then
n2 = n1 + k = 3 + 4 = 7
n3 = n2 + k = 7 + 4 = 11
n4 = n3 + k = 11 + 4 = 15
n5 = n4 + k = 15 + 4 = 19
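The arithmetic of the example can be checked with a short sketch. The function systematic_sample and its start parameter are illustrative names; positions are numbered from 1, and N is assumed to be a multiple of n.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the positions (1-based) chosen by systematic sampling."""
    k = N // n                                   # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in first subgroup
    return [start + i * k for i in range(n)]

positions = systematic_sample(20, 5, start=3)
print(positions)  # [3, 7, 11, 15, 19]
```

Only the starting position is random; every later position is fixed by the interval k.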
Multi-Stage Sampling
It is a combination of two or more of the methods described above. The population is divided into
multiple clusters, and these clusters are then further divided and grouped into various
subgroups (strata) based on similarity. One or more clusters can be randomly selected from
each stratum. This process continues until the clusters cannot be divided any further. For example,
a country can be divided into states, cities, and urban and rural areas, and all the areas with similar
characteristics can be merged together to form a stratum.
Non-Probability Sampling
It does not rely on randomization. This technique relies instead on the researcher's ability to
select elements for the sample. The outcome of sampling might be biased, because not all
elements of the population have an equal chance of being part of the sample. This type of
sampling is also known as non-random sampling.
Convenience Sampling
Here the samples are selected based on availability. This method is used when
suitable elements are rare or costly to obtain, so the elements that are most convenient
to reach are selected.
Purposive Sampling
This is based on the intention or purpose of the study. Only those elements that
best suit the purpose of the study are selected from the population.
Quota Sampling
This type of sampling depends on some pre-set standard. It selects a representative
sample from the population: the proportion of each characteristic or trait in the sample should
be the same as in the population. Elements are selected until the exact proportions of certain
types of data are obtained or sufficient data in the different categories have been collected.
For example: If our population has 45% females and 55% males then our sample should
reflect the same percentage of males and females.
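The quota rule in this example can be sketched as follows. The function quota_sample, the sample size of 20 (giving quotas of 9 females and 11 males), and the stream of respondents are all hypothetical; note that respondents are accepted in the order they are encountered, not at random.

```python
def quota_sample(stream, category_of, quotas):
    """Accept units in arrival order until every category quota is filled."""
    remaining = dict(quotas)
    sample = []
    for unit in stream:
        category = category_of(unit)
        if remaining.get(category, 0) > 0:
            sample.append(unit)
            remaining[category] -= 1
        if not any(remaining.values()):   # all quotas filled
            break
    return sample

# Hypothetical quotas for a sample of 20: 45% females, 55% males
quotas = {"F": 9, "M": 11}
# Hypothetical stream of respondents in the order they were met
stream = [("F", i) if i % 3 == 0 else ("M", i) for i in range(100)]
sample = quota_sample(stream, category_of=lambda u: u[0], quotas=quotas)
print(len(sample))
```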
Referral /Snowball Sampling
This technique is used in situations where the population is largely unknown or
rare. We therefore take the help of the first element selected from the population
and ask them to recommend other elements who fit the description of the sample needed.
The referrals continue in this way, increasing the size of the sample like a snowball.
For example: it is used for highly sensitive topics such as HIV/AIDS, where people will
not openly discuss the subject or participate in surveys to share information about it.
Not everyone affected will respond to the questions asked, so researchers can contact people they
know, or volunteers, to get in touch with affected people and collect information.
It helps in situations where we do not have access to enough people with the characteristics
we are seeking. It starts with finding a few people to study.
Dependent t-test
When to use
Used to test either a "change" or a "difference" in means between two related groups, but
not both at the same time.
Why to use
It compares the means of two related groups to determine whether there is a statistically
significant difference between these means.
Formula
General steps
1. Set-up and test appropriate statistic
2. Formulate and interpret your conclusion.
3. State null hypothesis and alternative hypothesis
NULL: There is no significant difference
ALTERNATIVE: There is a significant difference
4. State the level of significance (the probability that the test statistic falls in the rejection region when the null hypothesis is true)
α = 0.05
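5. Solve for the differences (D) and the squared differences (D²)
To get ΣD:
Subtract B from A for each pair (D = A − B)
Then add all of the values
To get ΣD²:
Square each difference (A − B)
Then add all of the values
6. Compute t
$$t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}}$$
n: Number of pairs
ΣD: Sum of the differences (sum of A − B)
ΣD²: Sum of the squared differences
(ΣD)²: Sum of the differences, squared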
7. Find critical value of t
Find the critical value in the t-table, using the degrees of freedom (df = n − 1, where n is
the number of pairs). Compare your computed t-value with the critical value from the t-table.
8. Conclusion
Refer to the hypothesis in doing so.
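The computation in steps 5 and 6 can be sketched as below, using the common sum-of-differences form t = ΣD / sqrt((nΣD² − (ΣD)²) / (n − 1)). The function paired_t and the before/after scores are hypothetical and serve only to show the arithmetic.

```python
import math

def paired_t(a, b):
    """Dependent (paired) t statistic from the sums of D and D squared."""
    d = [x - y for x, y in zip(a, b)]        # D = A - B for each pair
    n = len(d)
    sum_d = sum(d)                            # sum of the differences
    sum_d2 = sum(v * v for v in d)            # sum of the squared differences
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1                           # statistic and degrees of freedom

# Hypothetical paired scores for five subjects
after = [12, 14, 10, 13, 16]
before = [10, 12, 9, 11, 13]
t_value, df = paired_t(after, before)
print(round(t_value, 4), df)
```

Here ΣD = 10 and ΣD² = 22, so t is about 6.32 with df = 4; the null hypothesis is rejected if the absolute value of t exceeds the critical value from the t-table.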
INDEPENDENT T-TEST
When to use:
We use Independent Sample T Test to compare the means of two independent groups in
order to determine whether there is statistical evidence that the associated population
means are significantly different.
Why use:
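The independent t-test is used for hypothesis testing in statistics.
Calculating a t-test requires three key data values: the difference between the group means,
the standard deviation of each group, and the number of observations in each group.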
Formula:
General Steps:
1. Define Null and Alternative Hypothesis
2. State Alpha
3. Calculate Degrees of Freedom
4. State Decision Rule
5. Calculate Test Statistics
6. State Results
7. State Conclusion
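The formula is not reproduced in these notes, so the sketch below assumes the pooled-variance (equal variances) form of the independent t-test; if the group variances differ markedly, a different form would be needed. The function independent_t and the two small groups are hypothetical.

```python
import math
from statistics import mean, variance

def independent_t(x, y):
    """Pooled-variance t statistic for two independent groups (assumes equal variances)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / standard_error, df

# Hypothetical scores for two independent groups
group_1 = [5, 7, 9]
group_2 = [1, 3, 5]
t_value, df = independent_t(group_1, group_2)
print(round(t_value, 4), df)
```

The decision rule in step 4 then compares the absolute value of t with the critical value from the t-table at the stated alpha and df.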
One-way ANOVA
Used to determine whether there is a significant difference between the means of two or more
independent groups.
It is usually applied when there are three or more groups, since two groups can be compared with a t-test.
It has only one factor (independent variable).
dfbetween = k – 1
dfwithin = n – k
dftotal = dfbetween + dfwithin
a) Look for the critical value using dfbetween , dfwithin and an F-table.
b) If the F-value is greater than the critical value, reject the null hypothesis.
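$MS_{\text{between}} = \dfrac{SS_{\text{between}}}{df_{\text{between}}}$
$MS_{\text{within}} = \dfrac{SS_{\text{within}}}{df_{\text{within}}}$
$F = \dfrac{MS_{\text{between}}}{MS_{\text{within}}}$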
6) State Results
7) State Conclusion
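The one-way ANOVA quantities (sums of squares, degrees of freedom, mean squares, and F) can be computed as in this sketch. The function one_way_anova and the three groups of scores are hypothetical; the sums of squares use the usual between-group and within-group definitions.

```python
def one_way_anova(groups):
    """Return F with its between and within degrees of freedom."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical scores for three independent groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova(groups)
print(round(f_value, 4), df_between, df_within)
```

For these data SS between is 26 and SS within is 6, giving F = 13 on 2 and 6 degrees of freedom, which is then compared with the critical value from the F-table.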
Two-way ANOVA
Compares the mean differences between groups that have been split on two independent
variables.
Use a two-way ANOVA when you have one measurement variable and two factors.
Its primary purpose is to understand if there is an interaction between the two independent
variables on the dependent variable.
α = 0.05
3) Calculate the degrees of freedom
df(first factor) = a − 1
df(second factor) = b − 1
df(first * second) = (a − 1)(b − 1)
dferror = N – ab
dftotal = N – 1
a) Look for the critical values for the first factor, second factor, and for the
interaction in the F-table.
b) If the F-value is greater than each of the critical values, reject the null
hypothesis.
5) Calculate test statistics
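$SS(\text{second factor}) = \dfrac{\sum\left(\sum b\right)^2}{an} - \dfrac{T^2}{N}$
where Σb is the total of the observations at one level of the second factor, n is the number of
observations per cell, and T is the grand total of all N observations.
$MS(\text{second factor}) = \dfrac{SS(\text{second factor})}{df(\text{second factor})}$
$MS(\text{first * second}) = \dfrac{SS(\text{first * second})}{df(\text{first * second})}$
$MS(\text{error}) = \dfrac{SS(\text{error})}{df(\text{error})}$
$F(\text{first factor}) = \dfrac{MS(\text{first factor})}{MS(\text{error})}$
$F(\text{second factor}) = \dfrac{MS(\text{second factor})}{MS(\text{error})}$
$F(\text{first * second}) = \dfrac{MS(\text{first * second})}{MS(\text{error})}$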
6) State Results
7) State Conclusion
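A sketch of the whole two-way computation for a balanced design (the same number of observations in every cell) is given below. The function two_way_anova, the layout of the cells argument, and the 2 x 2 data set are assumptions made for illustration; unbalanced designs need a different treatment.

```python
def two_way_anova(cells):
    """cells[i][j] holds the n observations for level i of the first factor
    and level j of the second factor (balanced design assumed)."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    N = a * b * n
    values = [x for row in cells for cell in row for x in cell]
    correction = sum(values) ** 2 / N                      # T^2 / N
    ss_total = sum(x * x for x in values) - correction
    ss_first = sum(sum(x for cell in row for x in cell) ** 2
                   for row in cells) / (b * n) - correction
    ss_second = sum(sum(x for row in cells for x in row[j]) ** 2
                    for j in range(b)) / (a * n) - correction
    ss_cells = sum(sum(cell) ** 2 for row in cells for cell in row) / n - correction
    ss_interaction = ss_cells - ss_first - ss_second
    ss_error = ss_total - ss_cells
    df = {"first": a - 1, "second": b - 1,
          "interaction": (a - 1) * (b - 1), "error": N - a * b}
    ms_error = ss_error / df["error"]
    f = {"first": ss_first / df["first"] / ms_error,
         "second": ss_second / df["second"] / ms_error,
         "interaction": ss_interaction / df["interaction"] / ms_error}
    return f, df

# Hypothetical 2 x 2 design with two observations per cell
cells = [[[1, 2], [3, 4]],
         [[5, 6], [7, 9]]]
f_values, df = two_way_anova(cells)
print({name: round(value, 4) for name, value in f_values.items()}, df)
```

Each of the three F-values is compared with its own critical value, found from the F-table using the corresponding df and df error.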
Friedman Test
When to use:
It is used to test for differences between related groups (treatments) when the dependent variable being measured is ordinal.
Why to use:
When the data depart markedly from a normal distribution, this test is preferred
over ANOVA.
Formula:
$F_r = \dfrac{12}{bt(t+1)} \sum T^2 - 3b(t+1)$
where: b = no. of blocks
t = no. of treatments
T = sum of the ranks for each treatment across all blocks
Steps:
a. State the null hypothesis
b. State the alternative hypothesis
c. Arrange the recorded observations in a two-way table in which the treatments are placed
in columns and the blocks in rows.
d. Rank the data within each block, giving rank 1 to the lowest value; tied values receive the
average of the ranks they would otherwise occupy.
e. In each treatment/ column, get the sum of the ranks.
f. Compute the Friedman test Statistics, Fr, using the formula.
g. Test the null hypothesis. Compare Fr with the critical value of the χ² distribution at d.f. =
t − 1 and α = 0.05. If Fr ≥ χ² critical, the null hypothesis is rejected.
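Steps c to f can be sketched as follows, using the standard statistic Fr = 12 / (bt(t+1)) · ΣT² − 3b(t+1). The helper names rank_with_ties and friedman_statistic and the 4 x 3 table are hypothetical, and no correction for ties is applied beyond average ranks.

```python
def rank_with_ties(values):
    """Rank values from 1 (lowest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1           # mean of the tied positions
        for i in order[pos:end + 1]:
            ranks[i] = average
        pos = end + 1
    return ranks

def friedman_statistic(table):
    """table[row] is one block; columns are treatments."""
    b, t = len(table), len(table[0])
    rank_sums = [0.0] * t
    for block in table:
        for j, rank in enumerate(rank_with_ties(block)):
            rank_sums[j] += rank                 # T for each treatment
    return 12 / (b * t * (t + 1)) * sum(T * T for T in rank_sums) - 3 * b * (t + 1)

# Hypothetical data: 4 blocks (rows) by 3 treatments (columns)
table = [[10, 20, 30], [11, 21, 31], [12, 22, 32], [13, 23, 33]]
fr = friedman_statistic(table)
print(fr)
```

Every block ranks the treatments 1, 2, 3, so the rank sums are 4, 8, and 12 and Fr = 8, which is compared with the χ² critical value at d.f. = t − 1 = 2.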
CHI SQUARE
The Chi-square test is intended to test how likely it is that an observed distribution is due
to chance
Properties of Chi square
1. It is also called a "goodness of fit" statistic, because it measures how well the observed
distribution of data fits with the distribution that is expected if the variables are independent
2. The mean of the χ² distribution is equal to the number of degrees of freedom (n).
$\text{Mean}(\chi^2) = n$
3. The variance is equal to two times the number of degrees of freedom.
$\text{Var}(\chi^2) = 2n$
4. The median of the χ² distribution divides the area under the curve into two equal parts, each part
being 0.5.
5. The mode of the χ² distribution is equal to (n − 2), for n ≥ 2.
$\text{Mode}(\chi^2) = n - 2$
6. Since chi-square values are never negative, the chi-square curve is positively skewed.
WHEN TO USE
The chi-square statistic is commonly used for testing relationships between two categorical
variables. The null hypothesis of the chi-square test is that no relationship exists between the
categorical variables in the population; they are independent.
WHY TO USE
It is intended to test how likely it is that an observed distribution is due to chance. It is
also called a "goodness of fit" statistic, because chi-square can be used in this kind of
problem to measure how well the observed distribution fits the distribution that is
expected if the corresponding variables are independent.
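The comparison of observed counts with the counts expected under independence can be sketched as below, using the usual statistic Σ(O − E)² / E with E = (row total × column total) / grand total. The function chi_square_independence and the 2 x 2 table of counts are hypothetical.

```python
def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count under independence
            chi2 += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table of counts for two categorical variables
observed = [[10, 20],
            [20, 10]]
chi2, df = chi_square_independence(observed)
print(round(chi2, 4), df)
```

Every expected count here is 15, so the statistic is 4 × 25/15, about 6.67, with 1 degree of freedom; it is then compared with the critical value from the χ² table.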
STEPS:
1. State the null hypothesis