STATISTICS AND PROBABILITY (STAT001) REVIEWER

--------------------------STATISTICS-----------------------------
▪ a mathematical science that includes methods of collecting, organizing, and analyzing data in such a way that meaningful conclusions can be drawn from them.
▪ derived from the Latin “statisticum collegium” (council of state) or the Italian “statista”; the meaning of these words is “political state” or “government.”

--------------------------PROBABILITY--------------------------
▪ means possibility.
▪ a branch of mathematics that deals with the occurrence of random events.
▪ expressed as a value from zero to one; it describes how likely an event is to happen.

MODULE 1: RANDOM VARIABLES AND PROBABILITY DISTRIBUTION FOR DISCRETE RANDOM VARIABLE

----------------------RANDOM VARIABLE--------------------
▪ is a function that associates a real number with each element in the sample space.
▪ a variable whose values are determined by chance.

--------------DISCRETE RANDOM VARIABLE-------------
▪ a random variable is discrete if its set of possible outcomes is countable.
▪ represents count data, e.g., the number of defective chairs produced in a factory.

-----------CONTINUOUS RANDOM VARIABLE-----------
▪ a random variable is continuous if it takes on values on a continuous scale.
▪ represents measured data, e.g., heights, weights, and temperatures.

---------------PROBABILITY DISTRIBUTION---------------
▪ is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range is bounded by the minimum and maximum possible values.
_____________________________________________
DISCRETE PROBABILITY DISTRIBUTION / PROBABILITY MASS FUNCTION
▪ consists of the values a random variable can assume and the corresponding probabilities of those values.

--PROPERTIES OF PROBABILITY DISTRIBUTION----
▪ the probability of each value of the random variable must be between 0 and 1, inclusive. In symbols: 0 ≤ P(X) ≤ 1.
▪ the sum of the probabilities of all the values of the random variable must be equal to 1. In symbols: ∑P(X) = 1.

MODULE 2: MEAN AND VARIANCE OF A DISCRETE RANDOM VARIABLE

-------MEAN OF PROBABILITY DISTRIBUTION--------
▪ The mean of a discrete random variable X is a weighted average of the possible values that the random variable can take.
▪ The common symbol for the mean (also known as the expected value of X) is µ.
▪ Mean: µ = E(X) = ∑[x • P(x)]

-----------MEAN OF THE RANDOM VARIABLE----------
▪ describes the long-run theoretical average, but it says nothing about the spread of the distribution.
▪ This spread or variability is measured by the variance and standard deviation.
_____________________________________________
VARIANCE AND STANDARD DEVIATION OF PROBABILITY DISTRIBUTION

---------------------------VARIANCE------------------------------
▪ is a measure of spread or dispersion.
▪ measures the variation of the values of a random variable from the mean.
▪ Symbol used: σ². Its square root is called the standard deviation; symbol: σ.
▪ Variance: σ² = ∑[x² • P(x)] − µ²
▪ Standard Deviation: σ = √(∑[x² • P(x)] − µ²)
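
The formulas µ = ∑[x • P(x)] and σ² = ∑[x² • P(x)] − µ² can be checked with a short Python sketch. The function name `pmf_summary` and the coin-toss data are hypothetical, chosen only for illustration; the two validity checks mirror the two properties of a probability distribution.

```python
import math


def pmf_summary(pmf):
    """Return (mean, variance, std_dev) of a discrete distribution.

    `pmf` maps each value x to its probability P(x).
    """
    probs = list(pmf.values())
    # Property 1: every probability lies between 0 and 1, inclusive.
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError("each P(x) must satisfy 0 <= P(x) <= 1")
    # Property 2: the probabilities sum to 1.
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("the probabilities must sum to 1")

    mean = sum(x * p for x, p in pmf.items())                   # sum of x * P(x)
    variance = sum(x**2 * p for x, p in pmf.items()) - mean**2  # sum of x^2 * P(x), minus mean^2
    return mean, variance, math.sqrt(variance)


# Hypothetical data: number of heads in two tosses of a fair coin.
coin_pmf = {0: 0.25, 1: 0.50, 2: 0.25}
mean, variance, std_dev = pmf_summary(coin_pmf)
print(mean, variance, round(std_dev, 4))  # 1.0 0.5 0.7071
```

For this table the mean is 1 head, the variance is 0.5, and the standard deviation is about 0.71.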
Compiled by: Ezekiel Adduru
MODULE 3: NORMAL DISTRIBUTION

------PROPERTIES OF NORMAL DISTRIBUTION-------
▪ The graph is bell-shaped.
▪ The curve is symmetrical about its center.
▪ The mean, median, and mode coincide at the center.
▪ The width of the curve is determined by the standard deviation of the distribution.
▪ The tails of the curve flatten out indefinitely along the horizontal axis, always approaching the axis but never touching it. That is, the curve is asymptotic to the baseline.
▪ The area under the curve is 1. Thus, it represents the probability, proportion, or percentage associated with specific sets of measurement values.

------------------NORMAL DISTRIBUTION-------------------
▪ If a distribution consists of a very large number of cases and the three measures of average (mean, median, and mode) are equal, then the distribution is symmetrical.
▪ In Statistics, such a distribution is called a normal distribution or simply a normal curve.
▪ Its equation was developed in the 18th century by Abraham de Moivre.
▪ In the 19th century, Carl Friedrich Gauss developed the mathematical equation for the normal curve from the study of errors. The curve is also known as the bell curve or the Gaussian distribution.

----------------STANDARD NORMAL CURVE--------------
▪ is a normal probability distribution that has a mean µ = 0 and a standard deviation σ = 1. The values on the base of the curve are called z-scores.
▪ The area (shaded part) under the normal curve is equal to 1. Mathematicians have already tabulated the areas under the normal curve for us.

------AREA UNDER STANDARD NORMAL CURVE-----
▪ The area under the curve is 1. It represents the probability, proportion, or percentage associated with specific sets of measurement values. Thus, we can use the area under the normal curve to find probabilities.

-----------------------------Z-SCORES-----------------------------
▪ Represent the distances from the center measured in standard deviation units.

-----PERCENTILE UNDER THE NORMAL CURVE-----
▪ The area under the curve is 1. Thus, it represents the probability, proportion, or percentage associated with specific sets of measurement values.
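
As a rough illustration of how areas under the standard normal curve translate into probabilities and percentiles, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. It assumes the usual definition of the z-score, z = (x − µ) / σ; the test score of 85, mean of 70, and standard deviation of 10 are made-up values.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mu, sigma):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mu) / sigma


def area_left_of(z):
    """Area under the standard normal curve to the left of z."""
    return STANDARD_NORMAL.cdf(z)


def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)


def percentile_to_z(percentile):
    """z-score below which the given percentage of the area lies."""
    return STANDARD_NORMAL.inv_cdf(percentile / 100)


# Hypothetical values: a score of 85 when the mean is 70 and sigma is 10.
z = z_score(85, 70, 10)
print(z)                              # 1.5
print(round(area_left_of(z), 4))      # 0.9332
print(round(area_between(-1, 1), 4))  # 0.6827
print(round(percentile_to_z(95), 3))  # 1.645
```

These values agree with a printed z-table: about 93.32% of the area lies to the left of z = 1.5, and the 95th percentile sits at z ≈ 1.645.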

▪ https://geogebra.org/classic/dKvByYEt
▪ Z-score formula: z = (x − µ) / σ

MODULE 4: SAMPLING DISTRIBUTION

--------------------RANDOM SAMPLING----------------------
▪ Randomized samples will have some sampling error, since a sample is only an approximation of the population from which it is drawn.
▪ A sampling error is a deviation of a sampled value from the true population value that arises because the sample is not representative of the population or is biased in some way.
▪ An unbiased sample can be an accurate representation of the entire population and can help you draw conclusions about the population.

---------------PARAMETER & STATISTICS------------------

PARAMETER
▪ is a measure or characteristic obtained by using all the data values in the population.
▪ can be at a numerical or nominal level of measurement and is usually referred to as the true value of the population.
▪ is used to describe a specific characteristic of the entire population (e.g., the average age of all the employees in an organization).
▪ The most commonly used parameters are the measures of central tendency, specifically the population mean, which is denoted by the symbol µ.
▪ When making an inference (conclusion) about the population, the parameter is usually unknown because it would be impractical to collect information from every member of the population. Instead, we use a statistic from a sample picked from the population to draw a conclusion about the parameter.

PARAMETER vs. STATISTICS
▪ A statistic describes a sample of a population while a parameter describes the entire population.

STATISTICS
▪ estimate the parameter, since the parameter is unknown.
▪ Sampling errors are statistical errors that arise when a sample does not represent the whole population. They are the difference between the real values of the population and the values derived by using samples from the population.
▪ Note: The number of samples of size n that can be drawn from a population of size N is given by the combination NCn = N! / [n!(N − n)!].
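
To make the NCn count concrete, the sketch below lists every sample of size n = 2 from a small population and compares the count with `math.comb`. The five population values are hypothetical. The sketch also averages the sample means, which previews the idea of a sampling distribution of the sample means.

```python
from itertools import combinations
from math import comb

# Hypothetical population of N = 5 values; samples of size n = 2.
population = [2, 4, 6, 8, 10]
n = 2

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))
print(len(samples), comb(len(population), n))  # 10 10

# Mean of each sample, then the mean of those sample means.
sample_means = [sum(sample) / n for sample in samples]
mean_of_sample_means = sum(sample_means) / len(sample_means)
population_mean = sum(population) / len(population)
print(mean_of_sample_means, population_mean)  # 6.0 6.0
```

Here 5C2 = 10 samples are possible, and the average of the ten sample means equals the population mean of 6.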
▪ The probability distribution of the sample means is called the sampling distribution of the sample means.

MODULE 5: MEAN AND VARIANCE OF SAMPLING DISTRIBUTION

-------------MEAN OF THE SAMPLE MEANS--------------
▪ If repeated random samples of a given size n are taken from a population of values for a quantitative variable, where the population mean is μ (mu) and the population standard deviation is σ (sigma), then the mean of all sample means (x̄) is the population mean μ.
_____________________________________________
VARIANCE AND STANDARD DEVIATION OF SAMPLING DISTRIBUTION
▪ Standard Deviation (standard error of the mean) of the sampling distribution: σ_x̄ = σ / √n
▪ Variance of the sampling distribution: σ²_x̄ = σ² / n

SYMBOLS:
σ = population standard deviation
σ² = population variance
N = population size
n = sample size

-----------------------------EXAMPLE-----------------------------
▪ The population has a mean of 30 and a standard deviation of 5. The sample size n of the sampling distribution is 9. What is the variance of the sampling distribution?
▪ GIVEN: µ = 30, σ = 5, n = 9
▪ FIND: σ²_x̄
▪ FORMULA: σ²_x̄ = σ² / n
▪ SOLUTION: σ²_x̄ = 5² / 9 = 25 / 9
▪ ANSWER: 2.78

MODULE 6: SAMPLING DISTRIBUTION OF THE SAMPLE MEANS
_____________________________________________
VARIANCE AND STANDARD DEVIATION OF SAMPLING DISTRIBUTION
▪ The standard deviation (σ_x̄) of the sampling distribution of the sample means is also known as the standard error of the mean.
▪ It measures the degree of accuracy of the sample mean (x̄) as an estimate of the population mean.
▪ A good estimate of the mean is obtained if the standard error of the mean is small or close to zero; a poor estimate results if the standard error of the mean is large.
▪ Thus, if we want a good estimate of the population mean, we have to make n sufficiently large. This fact is stated as a theorem, known as the Central Limit Theorem.

--------------------CENTRAL LIMIT THEOREM--------------
▪ Note: as you increase the value of n from 1, the sampling distribution of the mean (blue) approaches the normal distribution (red).

-----------------------------EXAMPLE-----------------------------
▪ A survey found that a family generates an average of 17.2 pounds of garbage per week. Assume that the standard deviation of the distribution is 2.5 pounds. Find the probability that the mean of a sample of 55 families will be between 17 and 18 pounds.

Given:
µ = 17.2
σ = 2.5
x̄ = 17 and 18
n = 55
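
Both examples in this part of the reviewer can be checked numerically. The sketch below assumes the standard error σ / √n and treats the sample mean as normally distributed, as the Central Limit Theorem suggests for large n; the function names are hypothetical. The reviewer's answer of 0.7135 for the garbage example matches z-scores rounded to two decimals (−0.59 and 2.37), as read from a printed table, while unrounded z-scores give roughly 0.7147.

```python
from math import sqrt
from statistics import NormalDist


def sampling_variance(sigma, n):
    """Variance of the sampling distribution of the mean: sigma^2 / n."""
    return sigma**2 / n


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def prob_mean_between(mu, sigma, n, low, high):
    """P(low < x-bar < high), treating x-bar as normal with SD sigma/sqrt(n)."""
    sampling_dist = NormalDist(mu=mu, sigma=standard_error(sigma, n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


# Example 1: sigma = 5, n = 9  ->  25 / 9
variance_example = sampling_variance(sigma=5, n=9)
print(round(variance_example, 2))  # 2.78

# Example 2: garbage survey, mu = 17.2, sigma = 2.5, n = 55
garbage_prob = prob_mean_between(mu=17.2, sigma=2.5, n=55, low=17, high=18)
print(round(garbage_prob, 4))  # about 0.7147 with unrounded z-scores

# Same example with z rounded to two decimals, as a z-table lookup would do.
table_prob = NormalDist().cdf(2.37) - NormalDist().cdf(-0.59)
print(round(table_prob, 4))  # 0.7135
```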

P(17 < x̄ < 18) = 0.7135

The probability that the mean of a sample of 55 families will be between 17 and 18 is 0.7135.
_____________________________________________
KEY TAKEAWAYS ON USING THE GEOGEBRA PROBABILITY CALCULATOR
https://geogebra.org/classic/dKvByYEt

Greater than/at least: Right side or ([)
Less than/at most: Left side or (])
Between: Middle or ([])
When encoding the values, remember to put the lower value first.

MODULE 7: T-DISTRIBUTION (STUDENT’S T-DISTRIBUTION)

---------------CENTRAL LIMIT THEOREM------------------
▪ According to the central limit theorem, the sampling distribution of a statistic (like a sample mean) will follow a normal distribution, as long as the sample size is sufficiently large.
▪ Therefore, when we know the standard deviation of the population, we can compute a z-score and use the normal distribution to evaluate probabilities with the sample mean.
▪ But sample sizes are sometimes small, and often we do not know the standard deviation of the population. When either of these problems occurs, statisticians rely on the distribution of the t statistic (also known as the t score), whose values are given by: t = (x̄ − µ) / (s / √n)

-----------------STUDENT T-DISTRIBUTION----------------
▪ Was formulated in 1908 by William Sealy Gosset, an employee of an Irish brewery. He was involved in researching new methods of manufacturing ale and was concerned with smaller sample sizes for quality control. Because brewery employees were not allowed to publish results, Gosset published his findings under the pseudonym “Student.” Hence, the t-distribution is also called Student’s t-distribution.
▪ allows us to conduct statistical analyses on certain data sets that are not appropriate for analysis using the normal distribution.
▪ The particular form of the t distribution is determined by its degrees of freedom. The degrees of freedom refers to the number of independent observations in a set of data.
_____________________________________________
PROPERTIES OF STUDENT’S T-DISTRIBUTION
▪ The distribution curve is bell-shaped.
▪ The curve is symmetrical about the mean, which is located at the center.
▪ The mean, the median, and the mode coincide at the center.
▪ The tails of the curve flatten out indefinitely along the horizontal axis, always approaching the axis but never touching it. That is, the curve is asymptotic to the baseline.
▪ The area under the curve is 1. Thus, it represents the probability, proportion, or percentage associated with specific sets of measurement values.

-----------------T-DISTRIBUTION TABLE--------------------
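
A printed t-table lists critical values by degrees of freedom and tail area. As a stand-in, the following sketch computes such values using only the Python standard library. It is one possible approach, not the method used by any particular table or calculator: it integrates the t density with Simpson's rule and finds percentiles by bisection. All function names are hypothetical, and `math.gamma` limits it to moderate degrees of freedom (roughly below 300).

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area


def t_percentile(p, df, tol=1e-7):
    """t-value with area p to its left, found by bisection."""
    low, high = -100.0, 100.0
    while high - low > tol:
        mid = (low + high) / 2
        if t_cdf(mid, df) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# 95th percentile (right-tail area 0.05) with 12 degrees of freedom.
t_95 = t_percentile(0.95, df=12)
print(round(t_95, 3))  # 1.782
```

By symmetry, the 5th percentile for the same degrees of freedom is the negative of this value.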
-------------WHEN TO USE T-DISTRIBUTION?-----------
▪ The population distribution is normal.
▪ The population distribution is symmetric, unimodal, without outliers, and the sample size is less than 30.

--------------HOW TO USE T-DISTRIBUTION?------------
To find the t critical value, you need to specify:
▪ The type of test (one-tailed or two-tailed).
▪ A significance level (common choices are 0.01, 0.05, and 0.10).
▪ The degrees of freedom: subtract 1 from the sample size.

Note:
▪ If two-tailed, the sign is ±.
▪ If one-tailed and left-tailed, the t-value is negative (−).
▪ If one-tailed and right-tailed, the t-value is positive (+).

------FINDING PERCENTILE AND PROBABILITY------
▪ When a sample of size n is drawn from a population having a normal (or nearly normal) distribution, the sample mean can be transformed into a t-score using the equation presented at the beginning of this lesson: t = (x̄ − µ) / (s / √n)

Where:
▪ x̄ is the sample mean
▪ μ is the population mean
▪ s is the standard deviation of the sample
▪ n is the sample size; and
▪ the degrees of freedom are equal to n − 1 (df = n − 1).
▪ Similar to the z-distribution, areas under the bell curve correspond to probabilities under the t-distribution. Hence, we can find probabilities/percentiles by finding areas under the t-distribution.

-----------------------------EXAMPLES---------------------------
What is the 95th percentile under the t-distribution whose degrees of freedom is 12?

A: Since the 95th percentile is above the 50th percentile, the value lies on the right side of the curve, so the t-value is positive. The degrees of freedom are 12, and the area in the right tail is 1 − 0.95 = 0.05.
▪ In the t-distribution table, the column for an area of 0.05 in one tail and the row for 12 degrees of freedom intersect at 1.782. Therefore, the t-score is +1.782, and the area to its right is 0.05.

MODULE 8: CONFIDENCE LEVEL AND CONFIDENCE INTERVAL

----------------POPULATION PROPORTION---------------
▪ A population proportion (p) is the fraction of the population that has a certain characteristic.
▪ For example, a veterinary clinic reports that out of 3,412 animals registered at the clinic, 1,712 are dogs, 1,012 are cats, and the rest are rodents or birds. What is the population proportion, p, for dogs at the clinic?
▪ The number of dogs is 1,712 and the total number of animals is 3,412. Therefore, p = 1,712/3,412. As a decimal, that’s p = 1712/3412 = 0.502 (to three decimal places).
▪ To get p, divide the number of items you’re interested in (in the above case, dogs) by the total population (for the above question, animals in the clinic). As a formula: p = x / n, where:
▪ x - the number of items you’re interested in.
▪ n - the total number of items in the population.
▪ In the real world, you usually don’t know facts about the entire population, so you use sample data to estimate p. This sample proportion is written as p̂, pronounced p-hat.
✓ It’s calculated in the same way, except you use data from a sample: divide the number of items you’re interested in by the total number of items in the sample.

-----------------CONFIDENCE INTERVAL--------------------
▪ is how much uncertainty there is with any particular statistic.
▪ is often used with a margin of error.
▪ tells you how confident you can be that the results from a poll or survey reflect what you would expect to find if it were possible to survey the entire population.
▪ is intrinsically connected to a confidence level.
▪ Confidence levels are expressed as a percentage (for example, a 95% confidence level). It means that should you repeat an experiment or survey over and over again, 95 percent of the time your results will match the results you get from a population (in other words, your statistics would be sound!).
▪ Confidence intervals are your results, and they are usually numbers.
_____________________________________________
SUMMARY OF SOLUTION WITH SYMBOLS AND LEGEND

--------LENGTH OF A CONFIDENCE INTERVAL--------


▪ If a confidence interval for a parameter p (population proportion) is 45% to 60%, then the length of the interval is simply the difference between the two endpoints: 60% − 45% = 15%.
▪ We are most interested, of course, in obtaining confidence intervals that are as narrow as possible.
▪ So, what can we do to ensure that we obtain as narrow an interval as possible? In the case of the Z-interval, the length is: 2 • z(α/2) • √(p̂q̂ / n). Because n is in the denominator, a larger sample gives a narrower interval.
▪ WHERE:
✓ z(α/2) = critical z-value for the chosen confidence level
✓ n = sample size
✓ p̂ = (number of desired outcomes) / (number of items in the sample)
✓ q̂ = 1 − p̂
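
The effect of sample size on interval length can be seen in a small sketch. It assumes the usual Z-interval for a proportion, p̂ ± z(α/2) • √(p̂q̂ / n), whose length is twice the margin of error; the survey counts (210 of 400, then 840 of 1,600) and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_for_proportion(successes, n, confidence=0.95):
    """Return (low, high) of the Z-interval for a population proportion."""
    p_hat = successes / n  # sample proportion
    q_hat = 1 - p_hat
    # Critical value z(alpha/2): area (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)  # margin of error
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 210 of 400 respondents have the characteristic.
low, high = z_interval_for_proportion(210, 400)
length = high - low
print(round(low, 3), round(high, 3), round(length, 3))  # about 0.476 0.574 0.098

# Same sample proportion with four times the sample size.
low_big, high_big = z_interval_for_proportion(840, 1600)
length_big = high_big - low_big
print(round(length_big, 3))  # about 0.049, half the earlier length
```

Quadrupling n halves the length, since the length is proportional to 1 / √n.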

▪ Confidence level refers to the percentage of probability, or certainty, that the confidence interval would contain the true population parameter when you draw a random sample many times. Or, in the vernacular, "we are 99% certain (confidence level) that most of these samples (confidence intervals) contain the true population parameter."

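----------------------------SAMPLE SIZE-------------------------
▪ Sample Size when Estimating Population Mean µ: n = (z(α/2) • σ / E)²
▪ Note: Sample size must be a whole number. Always round up your answer to the next whole number.
▪ Sample Size when Estimating Population Proportion p: n = p̂q̂ • (z(α/2) / E)²
▪ Estimating Population Proportion when both p̂ & q̂ are unknown: n = 0.25 • (z(α/2) / E)²
▪ Here E is the margin of error and z(α/2) is the critical z-value for the chosen confidence level.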
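
A minimal sketch of the sample-size calculations, assuming the standard textbook formulas n = (z(α/2) • σ / E)² for a mean and n = p̂q̂ • (z(α/2) / E)² for a proportion, with p̂ = q̂ = 0.5 when no prior estimate is available. The function names and input values are hypothetical. `math.ceil` applies the always-round-up rule.

```python
from math import ceil
from statistics import NormalDist


def critical_z(confidence):
    """z(alpha/2) for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest whole n with margin of error <= `margin` for a mean."""
    return ceil((critical_z(confidence) * sigma / margin) ** 2)


def sample_size_for_proportion(margin, confidence=0.95, p_hat=0.5):
    """Smallest whole n for a proportion; p_hat = 0.5 when unknown."""
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (critical_z(confidence) / margin) ** 2)


# Hypothetical inputs: sigma = 2.5, margin of error E = 0.5, 95% confidence.
print(sample_size_for_mean(sigma=2.5, margin=0.5))            # 97
# Proportion with no prior estimate, E = 0.05, 95% confidence.
print(sample_size_for_proportion(margin=0.05))                # 385
# Proportion with a prior estimate p-hat = 0.3.
print(sample_size_for_proportion(margin=0.05, p_hat=0.3))     # 323
```

Using p̂ = 0.5 gives the largest required n, which is why it serves as the safe default when nothing is known about the proportion.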
