
COLLEGE OF INFORMATION AND COMPUTING SCIENCES

Instructional Material for students

MODULE 4:
SAMPLING DISTRIBUTION,

Course overview

The course introduces the students to various methods of statistical analyses as applied in various industries and
enterprises. Through the use of primary statistical techniques, the students attain a meaningful understanding of
statistical reasoning within the context of management decision-making. Topics essentially focus on statistical
description, statistical induction, and analysis of statistical relationship.

Objectives

After successful completion of this module, the student will be able to:
• Identify their learning outcomes and expectations for the course;
• Recognize their capacity to create new understandings from reflecting on the course;
• Know the capabilities of Descriptive Statistics.

Module Content:
• Sampling Distribution
    o Mean and Standard Deviation of Sample Mean
    o The Sampling Distribution of the Sample Mean
• Statistical Relationship
    o Overview of Statistical Relationship
    o Line graph and Scatter Plot
    o Correlation
    o The Correlation Coefficient and Cohen’s D
• Supplemental Videos

Sampling Distributions
A statistic, such as the sample mean or the sample standard deviation, is a number computed from a sample. Since a sample
is random, every statistic is a random variable: it varies from sample to sample in a way that cannot be predicted with
certainty. As a random variable it has a mean, a standard deviation, and a probability distribution. The probability distribution
of a statistic is called its sampling distribution.
This module introduces the concepts of the mean, the standard deviation, and the sampling distribution of a sample statistic,
with an emphasis on the sample mean x̄.
The Mean and Standard Deviation of the Sample Mean
Suppose we wish to estimate the mean μ of a population. In actual practice we would typically take just one sample.
Imagine however that we take sample after sample, all of the same size n, and compute the sample mean x̄ of each
one. We will likely get a different value of x̄ each time. The sample mean x̄ is a random variable: it varies from sample to
sample in a way that cannot be predicted with certainty. We will write X̄ when the sample mean is thought of as a
random variable, and write x̄ for the values that it takes. The random variable X̄ has a mean, denoted μ_X̄,
and a standard deviation, denoted σ_X̄. Here is an example with such a small population and small sample size that we
can actually write down every single sample.

Now we apply the formulas for the mean and standard deviation of a discrete random variable (Section 4.2.2, “The Mean
and Standard Deviation of a Discrete Random Variable”) to X̄. We obtain μ_X̄ = 158 and σ_X̄ = √10.
The mean and standard deviation of the population {152,156,160,164} in the example are μ = 158 and σ = √20.
The mean of the sample mean X̄ that we have just computed is exactly the mean of the population. The
standard deviation of the sample mean X̄ that we have just computed is the standard deviation of the population
divided by the square root of the sample size: √10 = √20/√2. These relationships are not
coincidences, but are illustrations of the following formulas.

μ_X̄ = μ
σ_X̄ = σ/√n

The first formula says that if we could take every possible sample from the population and compute the corresponding
sample mean, then those numbers would center at the number we wish to estimate, the population mean μ.

The second formula says that averages computed from samples vary less than individual measurements on the
population do, and quantifies the relationship.
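
The two formulas can be checked by brute force on the rower population. The following is a minimal sketch, assuming that samples of size two are drawn with replacement and that order matters, which is the setup consistent with the value σ_X̄ = √10 quoted above. The variable names are ours, chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

population = [152, 156, 160, 164]  # rower weights from the example
n = 2                              # sample size

# Every ordered sample of size n, drawn with replacement (assumption).
samples = list(product(population, repeat=n))

# Sampling distribution: P(x_bar) = (samples with that mean) / (all samples).
counts = Counter(Fraction(sum(s), n) for s in samples)
distribution = {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

# Mean and variance of the discrete random variable X_bar.
mu_xbar = sum(mean * p for mean, p in distribution.items())
var_xbar = sum((mean - mu_xbar) ** 2 * p for mean, p in distribution.items())

# Population mean and variance, for comparison.
mu = Fraction(sum(population), len(population))
var = sum((x - mu) ** 2 for x in population) / len(population)

print(mu_xbar, mu)        # both 158
print(var_xbar, var / n)  # both 10
print(sqrt(var_xbar))     # square root of 10, about 3.162
```

Exact fractions are used so that the equalities hold without rounding error.
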
The Sampling Distribution of the Sample Mean

The Central Limit Theorem

In "Example 1" in "The Mean and Standard Deviation of the Sample Mean" we constructed the probability distribution
of the sample mean for samples of size two drawn from the population of four rowers. The probability distribution is:

x̄       152    154    156    158    160    162    164
P(x̄)    1/16   2/16   3/16   4/16   3/16   2/16   1/16

Figure 2.1 "Distribution of a Population and a Sample Mean" shows a side-by-side comparison of a histogram for the
original population and a histogram for this distribution. Whereas the distribution of the population is uniform, the
sampling distribution of the mean has a shape approaching the shape of the familiar bell curve. This phenomenon of
the sampling distribution of the mean taking on a bell shape even though the population distribution is not bell-
shaped happens in general. Here is a somewhat more realistic example.

Figure 2.1  Distribution of a Population and a Sample Mean


Suppose we take samples of size 1, 5, 10, or 20 from a population that consists entirely of the numbers 0 and 1, half
the population 0, half 1, so that the population mean is 0.5. The sampling distributions are:

Histograms illustrating these distributions are shown in Figure 2.2 "Distributions of the Sample Mean".

Figure 2.2  Distributions of the Sample Mean


As n increases the sampling distribution of X̄ evolves in an interesting way: the probabilities on the lower and
the upper ends shrink and the probabilities in the middle become larger in relation to them. If we were to continue to
increase n then the shape of the sampling distribution would become smoother and more bell-shaped.
What we are seeing in these examples does not depend on the particular population distributions involved. In
general, one may start with any distribution and the sampling distribution of the sample mean will increasingly
resemble the bell-shaped normal curve as the sample size increases. This is the content of the Central Limit
Theorem.
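
The sampling distributions for the 0/1 population described above can be computed exactly rather than read from a table. The sketch below assumes independent draws with each value equally likely, so the number of ones in a sample follows a binomial distribution. The function name is ours, chosen for illustration.

```python
from math import comb

def sample_mean_distribution(n):
    """Exact distribution of the sample mean for n draws from a 0/1 population with P(1) = 0.5."""
    # The count of ones is Binomial(n, 0.5); the sample mean is that count divided by n.
    return {k / n: comb(n, k) / 2 ** n for k in range(n + 1)}

for n in (1, 5, 10, 20):
    dist = sample_mean_distribution(n)
    mean = sum(x * p for x, p in dist.items())
    sd = sum((x - mean) ** 2 * p for x, p in dist.items()) ** 0.5
    # Probability that the sample mean lands on one of the two extreme values, 0 or 1.
    extremes = dist[0.0] + dist[1.0]
    print(f"n={n:2d}  mean={mean:.3f}  sd={sd:.4f}  P(0 or 1)={extremes:.6f}")
```

In every row the mean stays at 0.5 while the standard deviation and the probability of the extreme values shrink as n grows, which is the behaviour described in the text.
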

The Central Limit Theorem


For samples of size 30 or more, the sample mean is approximately normally distributed, with mean μ_X̄ = μ and
standard deviation σ_X̄ = σ/√n, where n is the sample size. The larger the sample size, the better the
approximation.

The Central Limit Theorem is illustrated for several common population distributions in Figure 2.3 "Distribution of
Populations and Sample Means".

Figure 2.3  Distribution of Populations and Sample Means

The dashed vertical lines in the figures locate the population mean. Regardless of the distribution of the population,
as the sample size is increased the shape of the sampling distribution of the sample mean becomes increasingly bell-
shaped, centered on the population mean. Typically by the time the sample size is 30 the distribution of the sample
mean is practically the same as a normal distribution.
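
A small simulation can make this convergence visible. The sketch below is only an illustration: it assumes an exponential population with mean 1 and standard deviation 1 (a strongly skewed distribution that we picked ourselves), and the function names and the number of repetitions are our own choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n, repetitions=10000):
    """Return the means of `repetitions` samples of size n from an exponential population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repetitions)]

def share_within_one_sd(values):
    """Fraction of values within one standard deviation of their mean (about 0.68 for a normal curve)."""
    center, spread = statistics.fmean(values), statistics.pstdev(values)
    return sum(abs(v - center) <= spread for v in values) / len(values)

for n in (1, 5, 30):
    means = simulate_sample_means(n)
    print(f"n={n:2d}  mean={statistics.fmean(means):.3f}  sd={statistics.pstdev(means):.3f}  "
          f"sigma/sqrt(n)={1 / n ** 0.5:.3f}  within 1 sd={share_within_one_sd(means):.3f}")
```

The simulated mean should stay close to the population mean, the simulated standard deviation should track σ/√n, and the share of sample means within one standard deviation should move toward the normal-curve value of about 0.68 as n increases.
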

The importance of the Central Limit Theorem is that it allows us to make probability statements about the sample
mean, specifically in relation to its value in comparison to the population mean, as we will see in the examples. But
to use the result properly we must first realize that there are two separate random variables (and therefore two
probability distributions) at play:

1. X, the measurement of a single element selected at random from the population; the distribution of X is the
distribution of the population, with mean the population mean μ and standard deviation the population
standard deviation σ;

2. X̄, the mean of the measurements in a sample of size n; the distribution of X̄ is its sampling
distribution, with mean μ_X̄ = μ and standard deviation σ_X̄ = σ/√n.

Note that if in Note 2.11 "Example 3" we had been asked to compute the probability that the value of a single randomly
selected element of the population exceeds 113, that is, to compute the number P(X > 113), we would not have been able
to do so, since we do not know the distribution of X, but only that its mean is 112 and its standard deviation is 40. By
contrast we could compute P(X̄ > 113) even without complete knowledge of the distribution of X because the
Central Limit Theorem guarantees that X̄ is approximately normal.
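
The calculation of P(X̄ > 113) can be sketched as follows. The population mean 112 and standard deviation 40 come from the text. The sample size is not stated in this module, so n = 50 below is a hypothetical value used only for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 112, 40  # population mean and standard deviation quoted in the text
n = 50               # hypothetical sample size, chosen only for illustration
threshold = 113

# By the Central Limit Theorem, X_bar is approximately Normal(mu, sigma / sqrt(n)).
x_bar = NormalDist(mu=mu, sigma=sigma / sqrt(n))
p_exceeds = 1 - x_bar.cdf(threshold)
print(round(p_exceeds, 2))  # about 0.43 for n = 50
```

Only the mean and standard deviation of the population are needed here. The same calculation for a single observation X would require the full population distribution.
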
Statistical Relationship Definition
A statistical relationship is an association between two or more variables that is established statistically, for
example from survey data. It is a mixture of deterministic and random relationships.

Overview of Statistical Relationship

Deterministic Relationship: This is an exact relationship between two variables. Example: If Paul
earns 20 per hour, then for every hour he works, he earns exactly 20 more.

Random Relationship: This is an inexact relationship. There seems to be a relationship between two
variables, but it does not hold exactly. Example: Joe spent 100 on advertising to get sales worth 300, but there is no
guarantee of sales worth 300 again if he spends another 100.

A statistical relationship combines deterministic and random elements. A relationship between two variables exists if
the values of one variable are associated with the values of the other. Such a relationship need not be purely
deterministic or purely random; it can be either, or a mixture of the two.
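
The contrast between the two kinds of relationship can be illustrated with a short sketch. Paul's pay follows the rate given in the example. The sales model (three times the advertising spend plus normally distributed noise) is a hypothetical one that we made up for illustration, as are the function names.

```python
import random

def pay(hours, rate=20):
    """Deterministic relationship: the same hours always give the same pay."""
    return rate * hours

def sales(ad_spend, rng):
    """Relationship with a random part: sales scatter around 3 x ad spend (hypothetical model)."""
    return 3 * ad_spend + rng.gauss(0, 50)

rng = random.Random(1)  # fixed seed so the run is repeatable

print(pay(3), pay(3))                   # 60 60, identical every time
print(sales(100, rng), sales(100, rng)) # two different values scattered around 300
```
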

A few examples of statistical relationships are:

·       Height and weight have a positive relationship.
·       Alcohol consumption and blood alcohol content have a positive relationship.
·       Driving speed and gasoline mileage have a negative relationship.
·       Math scores and statistics scores are related.

Line graph or scatter plot


A line graph or a scatter plot can give us an idea about statistical relationships. This doesn’t have to be a straight
line; it can be a nonlinear relationship like a parabola. The pattern doesn’t have to be perfect; it can be like in the
scatter plot below.

The above two figures are examples that show the statistical relationship.

Statistical relationships can be positive or negative and strong or weak. Correlation and Cohen’s D are two
measures of relationship strength.

Correlation
Correlation is used to test the relationship between the variables in statistical relationships. It is a measure of how
variables are related to each other. This study of the relationship between variables in a statistical relationship is
called correlation analysis.

Variables can be highly correlated or only weakly correlated.

Calorie intake and weight gain are an example of high correlation.

A dog’s name and the pedigree it likes are an example of low correlation or no relationship at all.

Knowing how strongly variables are correlated is useful because it lets you make predictions about future behavior.
Predicting what the future holds is very useful in areas like healthcare and business.

Correlation analysis is very important in education and research. It is used for the following:

·       Ascertaining features of psychological and educational tests (for instance, their validity)
·       Testing the consistency of the data with a hypothesis
·       Predicting trends and anticipating the value of one variable from knowledge of other variables
·       Constructing psychological and educational models and theories
·       Isolating the influence of individual variables

The Correlation Coefficient and Cohen’s D


·       Correlation Coefficient

Correlation coefficients are used to assign a value to the relationship. Correlation coefficients have a value between
-1 and 1. In this range, 0 indicates no linear relationship, 1 indicates a perfect positive linear relationship, and -1
indicates a perfect negative linear relationship. Values closer to -1 or 1 indicate stronger relationships.

There are several types of correlation coefficient formulas. A few of them are the sample correlation coefficient,
population correlation coefficient, Pearson correlation coefficient, and Goodman and Kruskal’s lambda coefficient.
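
As an illustration, the sample Pearson correlation coefficient can be computed directly from its definition. The sketch below is a minimal version without handling for constant data (which would divide by zero). All data values are hypothetical, invented for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length with at least 2 points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of products of deviations
    sxx = sum(a * a for a in dx)              # sum of squared deviations of x
    syy = sum(b * b for b in dy)              # sum of squared deviations of y
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
hours_worked = [1, 2, 3, 4, 5]
pay = [20 * h for h in hours_worked]  # exact: 20 per hour
speed = [40, 50, 60, 70, 80]
mileage = [33, 30, 28, 25, 21]        # tends to fall as speed rises

print(pearson_r(hours_worked, pay))   # 1.0, a perfect positive linear relationship
print(pearson_r(speed, mileage))      # negative and close to -1
```
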

The below image shows the different correlation strengths:

·       Cohen’s D:

Cohen’s D is one of the most popular measures of effect size, that is, of how large the difference between two groups
is. For example, it can be used to judge whether Medication A has a better effect than Medication B, and by how much.

The formula for Cohen’s D for Welch test is:

d = (M₁ - M₂) / S_pooled

where M₁ is the mean of the first group, M₂ is the mean of the second group, and S_pooled is the pooled standard
deviation of the two groups:

S_pooled = √((S₁² + S₂²) / 2)

Here S₁ is the standard deviation of the first group and S₂ is the standard deviation of the second group.

In the above formula, it does not matter which group mean is M₁ and which is M₂. Usually, the larger
mean is taken as M₁ and the smaller as M₂ so that Cohen’s D turns out to be positive; equivalently, Cohen’s D
is reported using the absolute value of the difference between the means M₁ and M₂. The pooled
standard deviation in this formula is a kind of average of the two group standard deviations, called the
pooled within-groups standard deviation.
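
A minimal sketch of the calculation follows. It assumes the pooled standard deviation is the square root of the average of the two group variances, and it reports the result as a positive value, as described above. The function name and the summary statistics are hypothetical, invented for illustration.

```python
from math import sqrt

def cohens_d(mean_1, mean_2, sd_1, sd_2):
    """Cohen's D with pooled SD sqrt((sd_1^2 + sd_2^2) / 2), reported as a positive value."""
    s_pooled = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return abs(mean_1 - mean_2) / s_pooled

# Hypothetical summary statistics for two medication groups.
print(cohens_d(mean_1=10.0, mean_2=8.0, sd_1=2.0, sd_2=2.0))  # 1.0
```

Because of the absolute value, swapping the two groups gives the same result.
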

Assessment:

Sampling Distributions

Supplemental Video:

Sampling Distribution
• https://youtu.be/z0Ry_3_qhDw
• https://youtu.be/p24UTvbKZog

Mean and Standard Deviation of Sampling Distribution
• https://youtu.be/Ly8oz-aL3IU
• https://youtu.be/Z66GkDW9l1A

Sampling Distribution of Sample Mean
• https://youtu.be/0Rxx1hb5Mmc

Statistics - How to Perform Linear Transformation of Non-linear bivariate relationship
• https://youtu.be/aw__r4XQsIw
• https://youtu.be/unCGTTdpZRw

Statistics - Interpreting Scatterplots and Correlation in Filipino
• https://youtu.be/yP3wIjfrxaY
• https://youtu.be/BHX8NVdtExk
• https://youtu.be/XwEWvD0IbWw

Correlation Coefficient
• https://youtu.be/11c9cs6WpJU
• https://youtu.be/lVOzlHx_15s
• https://youtu.be/MR9M2zN0HFU

Cohen’s D
• https://youtu.be/IetVSlrndpI
• https://youtu.be/GDe4M0xEghs
• https://youtu.be/5rXOy1S5bVk
• https://youtu.be/lTlEDQK0vQg

