--------------------------STATISTICS-----------------------------
▪ a mathematical science including methods of collecting, organizing, and analyzing data in such a way that meaningful conclusions can be drawn from them.
▪ derived from the Latin word “statisticum collegium” (council of state) or the Italian word “statista”; the meaning of these words is “political state” or “government.”
--------------------------PROBABILITY--------------------------
▪ means possibility.
▪ a branch of mathematics that deals with the occurrence of a random event.
▪ Its value is expressed from zero to one. Probability was introduced in mathematics to predict how likely events are to happen.
MODULE 1: RANDOM VARIABLES AND PROBABILITY DISTRIBUTION FOR DISCRETE RANDOM VARIABLE
----------------------RANDOM VARIABLE ---------------------
▪ is a function that associates a real number to each element in the sample space.
▪ a variable whose values are determined by chance.
--DISCRETE PROBABILITY / PROBABILITY MASS DISTRIBUTION--
▪ consists of the values a random variable can assume and the corresponding probabilities of the values.
--PROPERTIES OF PROBABILITY DISTRIBUTION----
▪ the probability of each value of the random variable must be between or equal to 0 and 1. In symbols, we write it as 0 ≤ P(X) ≤ 1.
▪ the sum of the probabilities of all the values of the random variable must be equal to 1. In symbols, we write it as ∑P(X) = 1.
MODULE 2: MEAN AND VARIANCE OF A DISCRETE RANDOM VARIABLE
-------MEAN OF PROBABILITY DISTRIBUTION--------
▪ The mean of a discrete random variable X is a weighted average of the possible values that the random variable can take.
▪ The common symbol for the mean (also known as the expected value of X) is µ.
▪ Mean: µ = E(X) = ∑ [x • P(x)]
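The two properties of a probability distribution and the mean formula µ = ∑ [x • P(x)] can be checked numerically. The following is a minimal sketch, not part of the original reviewer: the function names and the coin-toss distribution are assumptions chosen for illustration.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """Check both properties: 0 <= P(X) <= 1 for every value, and sum of P(X) == 1."""
    probs = list(pmf.values())
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1


def mean(pmf):
    """Mean (expected value): the sum of x * P(x) over all values x."""
    return sum(x * p for x, p in pmf.items())


# Hypothetical example: X = number of heads in two tosses of a fair coin.
# Fractions are used so the sum-to-1 check is exact (no floating-point rounding).
coin_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

valid = is_valid_pmf(coin_pmf)
mu = mean(coin_pmf)
print(valid, mu)  # True 1
```

Exact fractions are a design choice here; with floating-point probabilities the sum-to-1 check would need a small tolerance instead of strict equality.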
------------------NORMAL DISTRIBUTION-------------------
▪ If a distribution consists of a very large number of cases and the three measures of average (mean, median, and mode) are equal, then the distribution is symmetrical.
▪ In Statistics, such a distribution is called a normal distribution or simply a normal curve.
▪ Its equation was first developed in the 18th century by Abraham de Moivre.
▪ In the 19th century, Carl Friedrich Gauss developed the mathematical equation for the normal curve from the study of errors. The normal curve is also known as the bell curve or Gaussian distribution.
▪ The area under the curve is 1. It represents the probability, proportion, or percentage associated with specific sets of measurement values. Thus, we can use the area under the normal curve to find the probability.
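To illustrate how an area under the normal curve is read as a probability, here is a small sketch using Python's standard `statistics.NormalDist`. The helper name `area_between` and the chosen intervals are assumptions for illustration only.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def area_between(dist, low, high):
    """Area under the curve between low and high, i.e. P(low < X < high)."""
    return dist.cdf(high) - dist.cdf(low)


# Probability of falling within one standard deviation of the mean.
within_one_sd = area_between(std_normal, -1, 1)

# Over a very wide interval the area is essentially the whole curve, i.e. 1.
nearly_total = area_between(std_normal, -8, 8)

print(round(within_one_sd, 4))  # 0.6827
print(round(nearly_total, 4))   # 1.0
```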
STATISTICS
▪ estimate the parameter since the parameter is unknown.
▪ Sampling errors are statistical errors that arise when a
sample does not represent the whole population. They
are the difference between the real values of the
population and the values derived by using samples from
the population.
--------------------RANDOM SAMPLING-----------------------
▪ A randomized sample will have some sampling error, since it is only an approximation of the population from which it is drawn.
▪ A sampling error is a deviation in sampled value versus
the true population value due to the fact the sample is not
representative of the population or biased in some way.
▪ Note: The number of samples of size n that can be drawn from a population of size N is given by NCn (the number of combinations of N items taken n at a time).
▪ The probability distribution of the sample means is called the sampling distribution of the sample means.
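The count NCn and the sampling distribution of the sample means can be built by listing every possible sample. This is a minimal sketch under stated assumptions: the population values and sample size below are hypothetical, and samples are drawn without replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

population = [1, 2, 3, 4, 5]  # hypothetical population, N = 5
n = 2                         # sample size

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))
print(len(samples), comb(len(population), n))  # 10 10

# Mean of each sample, kept as an exact fraction.
sample_means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct sample mean with its probability.
counts = Counter(sample_means)
sampling_distribution = {
    m: Fraction(c, len(samples)) for m, c in sorted(counts.items())
}
for m, p in sampling_distribution.items():
    print(float(m), p)

# Mean of the sampling distribution, using the same formula: sum of x * P(x).
mean_of_means = sum(m * p for m, p in sampling_distribution.items())
print(mean_of_means)  # 3, which equals the population mean
```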
Compiled by: Ezekiel Adduru
STATISTICS AND PROBABILITY (STAT001) REVIEWER
MODULE 5: MEAN AND VARIANCE OF SAMPLING DISTRIBUTION
SYMBOLS:
σ = population standard deviation
σ² = population variance
N = population size
n = sample size
-----------------------------EXAMPLE-----------------------------
▪ The population has a mean of 30 and a standard deviation of 5. The sample size n of the sampling distribution is 9. What is the variance of the sampling distribution?
▪ GIVEN:
µ = 30
σ = 5
n = 9
▪ FIND:
variance of the sampling distribution = ?
▪ FORMULA:
variance of the sampling distribution = σ² / n
▪ SOLUTION:
5² / 9 = 25 / 9
▪ ANSWER: 2.78
MODULE 6: SAMPLING DISTRIBUTION OF THE SAMPLE MEANS
--------------------CENTRAL LIMIT THEOREM--------------
▪ Note: as you increase the value of n from 1, the sampling distribution of the mean (blue) approaches the normal distribution (red).
-----------------------------EXAMPLE-----------------------------
▪ A survey has found that a family generates an average of 17.2 pounds of garbage per week. Assume that the standard deviation of the distribution is 2.5 pounds. Find the probability that the mean of a sample of 55 families will be between 17 and 18.
Given:
µ = 17.2
σ = 2.5
x̄ = 17 and 18
n = 55
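Both worked examples in this part of the reviewer can be checked with a short script. This is a sketch, not the reviewer's own solution: it assumes the variance of the sampling distribution is σ²/n and that, by the Central Limit Theorem, the sample mean is approximately normal with mean µ and standard deviation σ/√n. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def sampling_variance(sigma, n):
    """Variance of the sampling distribution of the sample means: sigma^2 / n."""
    return sigma ** 2 / n


def prob_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high), treating the sample mean as normal
    with mean mu and standard deviation sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


# Variance example: sigma = 5, n = 9.
variance = sampling_variance(sigma=5, n=9)
print(round(variance, 2))  # 2.78

# Garbage example: mu = 17.2, sigma = 2.5, n = 55, mean between 17 and 18.
probability = prob_mean_between(mu=17.2, sigma=2.5, n=55, low=17, high=18)
print(round(probability, 4))
```

The second result comes out at roughly 0.71. A z-table solution that rounds the z-scores to two decimal places may differ slightly in the later digits.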
-----------------STUDENT T-DISTRIBUTION----------------
▪ Was formulated in 1908 by William Sealy Gosset, an employee of the Guinness brewery in Dublin, Ireland. He was involved in researching new methods of manufacturing ale and was concerned with smaller sample sizes for quality control. Because brewing employees were not allowed to publish results, Gosset published his findings under the pseudonym “Student.” Hence, the t-distribution is also called Student’s t-distribution.
-------------WHEN TO USE T-DISTRIBUTION? -----------
▪ The population distribution is normal.
▪ The population distribution is symmetric, unimodal, without outliers, and the sample size is less than 30.
--------------HOW TO USE T-DISTRIBUTION? ------------
To find the t critical value, you need to specify:
▪ The type of test (one-tailed or two-tailed).
▪ A significance level (common choices are 0.01, 0.05, and 0.10).
▪ The degrees of freedom: to determine them, subtract 1 from the sample size (df = n − 1).
Note:
▪ If two-tailed, the sign is ±.
▪ If one-tailed, left-tailed, the t-value is negative (-).
▪ If one-tailed, right-tailed, the t-value is positive (+).
------FINDING PERCENTILE AND PROBABILITY------
▪ When a sample of size n is drawn from a population having a normal (or nearly normal) distribution, the sample mean can be transformed into a t-score using the equation:
t = (x̄ − μ) / (s / √n)
Where:
▪ x̄ is the sample mean
▪ μ is the population mean
▪ s is the standard deviation of the sample
▪ n is the sample size and;
▪ degrees of freedom are equal to n − 1 (df = n − 1).
▪ Similar to the z-distribution, the areas under the bell curve correspond to probabilities under the t-distribution. Hence, we can find probabilities/percentiles by finding areas under the t-distribution.
-----------------------------EXAMPLES----------------------------
What is the 95th percentile under the t-distribution whose degrees of freedom is 12?
A: Since the 95th percentile is above the 50th percentile, the value lies on the right side of the graph, so the t-value is positive. The degrees of freedom are 12, and the area in the right tail is 1 − 0.95 = 0.05.
▪ In the t-distribution table, the row for 12 degrees of freedom and the column for an area of 0.05 in one tail intersect at 1.782. Therefore, the t-score is +1.782.
MODULE 8: CONFIDENCE LEVEL AND CONFIDENCE INTERVAL
----------------POPULATION PROPORTION ---------------
▪ A population proportion (p) is a fraction of the population that has a certain characteristic.
▪ For example, a veterinary clinic reports that out of 3,412 animals registered at the clinic, 1,712 are dogs, 1,012 are cats and the rest are rodents or birds. What is the population proportion, p, for dogs at the clinic?
▪ The number of dogs is 1,712 and the total number of animals is 3,412. Therefore, p = 1,712/3,412. As a decimal, that’s p = 1712/3412 = 0.502 (to three decimal places).
▪ To get “p”, just divide the number of items you’re interested in (in the above case, that’s dogs) by the total population (for the above question, that’s animals in the clinic). As a formula, it’s written as p = x / n, where:
▪ x- the number of items you’re interested in.
▪ n- the total number of items in the population.
▪ In the real world, you usually don’t know facts about the entire population and so you use sample data to estimate p. This sample proportion is written as p̂, pronounced p-hat.
✓ It’s calculated in the same way, except you use data from a sample: just divide the number of items you’re interested in by the total number of items in the sample.
-----------------CONFIDENCE INTERVAL--------------------
▪ is how much uncertainty there is with any particular statistic.
▪ are often used with a margin of error.
▪ tells you how confident you can be that the results from a poll or survey reflect what you would expect to find if it were possible to survey the entire population.
▪ are intrinsically connected to confidence levels.
▪ Confidence levels are expressed as a percentage (for example, a 95% confidence level). It means that should you repeat an experiment or survey over and over again, 95 percent of the time your results will match the results obtained from the entire population.
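The t-score transformation and the proportion calculation in this part of the reviewer are both one-line formulas, sketched below. The function names are illustrative. The clinic figures come from the example in the text, while the sample values passed to `t_score` are hypothetical.

```python
from math import sqrt


def t_score(sample_mean, population_mean, sample_sd, n):
    """t = (x-bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - population_mean) / (sample_sd / sqrt(n))


def proportion(x, n):
    """Proportion = items of interest (x) divided by the total number of items (n)."""
    return x / n


# Clinic example from the text: 1,712 dogs out of 3,412 animals.
p_dogs = proportion(1712, 3412)
print(round(p_dogs, 3))  # 0.502

# Hypothetical sample: x-bar = 52, mu = 50, s = 4, n = 16.
t = t_score(sample_mean=52, population_mean=50, sample_sd=4, n=16)
df = 16 - 1
print(t, df)  # 2.0 15
```

Looking up a t critical value or percentile still needs a t-table or a statistics library, since Python's standard library has no t-distribution.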
----------------------------SAMPLE SIZE--------------------------
▪ Sample Size when Estimating Population
Mean µ:
▪ Note: Sample size must be a whole number. Always round your answer up to the next whole number.
▪ Sample Size when Estimating Population
Proportion p:
▪ Estimating Population Proportion when both p̂ and q̂ are unknown:
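The formulas for these three cases did not survive in this copy of the reviewer. The sketch below therefore assumes the standard textbook forms: n = (z·σ / E)² for a mean, n = p̂·q̂·(z / E)² for a proportion, and p̂ = q̂ = 0.5 when both are unknown. Here z is the critical value for the chosen confidence level and E is the margin of error. The function names and input values are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def z_critical(confidence):
    """Two-tailed z critical value, e.g. about 1.96 for 95% confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Assumed formula: n = (z * sigma / E)^2, rounded up to a whole number."""
    z = z_critical(confidence)
    return ceil((z * sigma / margin_of_error) ** 2)


def sample_size_for_proportion(margin_of_error, confidence=0.95, p_hat=0.5):
    """Assumed formula: n = p_hat * q_hat * (z / E)^2, rounded up.
    The default p_hat = 0.5 covers the case where p_hat and q_hat are unknown."""
    z = z_critical(confidence)
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z / margin_of_error) ** 2)


# Hypothetical inputs: sigma = 15, E = 2, 95% confidence.
n_mean = sample_size_for_mean(sigma=15, margin_of_error=2)
print(n_mean)  # 217

# Hypothetical inputs: E = 0.05, 95% confidence, p_hat and q_hat unknown.
n_prop = sample_size_for_proportion(margin_of_error=0.05)
print(n_prop)  # 385
```

`ceil` is used instead of `round` because the note above says to always round the sample size up.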