Statistics Review Notes

STATISTICS: REVIEW NOTES
Melissa E. Agulto, Ph.D.

Professor, Central Luzon State University
I. BASIC CONSIDERATIONS
1.1. Statistics, Defined
Singular Sense: The science that deals with the collection, organization, presentation, analysis and
interpretation of data
Plural Sense: A set of numerical information; a processed data set. Examples are: statistics on
enrollment, graduates
1.2 Two Phases/Fields of Statistics
Descriptive Statistics – deals with the methods of collecting/gathering, organizing, summarizing,
presenting data and their interpretation. Examples are: measures of central tendency, variability,
skewness
Inferential Statistics – deals with making generalizations or drawing conclusions/judgments about the
entire set of data based on the analysis of a subset of data. Examples are: sampling and sampling
distributions, estimation, hypothesis testing
1.3 Basic Terms
Universe – the set of all entities under study; addresses the question, “Who do we want to study?”
Variable – a quantity that can assume any of the range or prescribed set of values; a characteristic that
manifests differences/variation in magnitudes; the characteristic that is measurable/observable in all
units in the universe; answers the question, “What do we wish to know from the elements of the
universe?”
Constant – a quantity that takes a single fixed value
Qualitative Variables – those that cannot be ordered in some dimension; categorical data; classifications
are simply labels, assuming alphanumeric possible values
Quantitative Variables – those that assume numerical data
Discrete Data – those that are usually associated with count values
Continuous Data – those that are usually associated with measurement values
Distribution – the pattern of variation of the variable, displaying how often each value occurs in the data
set
1.4 Data Gathering
Ways of Data Gathering

• Objective Method – done by getting measurements, making direct observations
• Subjective Method – done by asking the respondent for the data/information required
• Use of Existing Records – done by utilizing the data previously gathered by certain
persons/agencies
Types of Data
• Primary Data – data gathered by the user directly from the units in the universe
• Secondary Data – data gathered not directly from the units in the universe
Raw Data – the data as gathered/collected
Array – arrangement of raw numerical data in ascending or descending order of magnitude

Grouped Data – arrangement/organization of data in a frequency distribution form
1.5. Methods of Data Presentation
Textual Method – uses sentences and paragraphs, giving a narrative that describes the characteristics
of the universe or the population based on the data gathered and organized
Tabular Method – data are organized in rows and columns to present the greatest amount of
information
Graphical method – uses pictures, figures to display trends and distribution of data. Examples are: bar
graph, line graph, pie diagram, pictograph, statistical map, histogram, frequency polygon, ogive.
• Bar graph – represents the frequency or magnitude of quantities of each of the categories as a
bar rising vertically (or horizontally) from the horizontal (or vertical) axis, with the height (or
width) of each bar being proportional to the frequency or magnitude of the corresponding
category
• Line graph – obtained by plotting the frequency of a category above the point on the horizontal
axis representing the category, and then joining the points with straight line segments.
• Pie diagram – a circle subdivided into a number of slices that represent the different categories,
and with each slice proportional to the percentage corresponding to the category
• Pictograph – makes use of symbols and is used to compare a few discrete data, usually of one
kind
• Statistical map – shows the geographical location and may contain different symbols on the
map. This should carry a legend to tell the meaning of the symbols
• Histogram – a bar graph associated with a frequency distribution table. It is constructed by
marking off the true class boundaries along the horizontal axis and erecting over each class
interval a rectangle whose height is equal to the frequency of the class.
• Frequency polygon – a line graph associated with a frequency distribution table. It is
constructed by plotting the class mark vs. the frequency for the class and then joining the points
with straight line segments.
• Ogive – a graphical representation of the cumulative frequency of a frequency distribution table
1.6. Frequency Distribution Table, FDT
A tabular presentation of a given set of data that presents the classes/categories established for the data
and the frequency of observations falling under a specified class/category
Construction of a Quantitative FDT

a. Determine the lowest value (LV) and the highest value (HV) in the set of data to compute the
range, R = HV – LV
b. Determine the number of classes, k = √N , where N is the number of observations in the data set
and k being rounded off to the nearest integer. Each class is an interval of values defined by its
lower limit (LL) and upper limit (UL).
c. Obtain the class size or class width, c = R/k, rounding off c to the nearest value with precision the
same as those of the raw data.
d. Construct the classes as follows:
The LL of the lowest class is LV. The lower limits of the succeeding classes are set by adding c
to the lower limit of the preceding class.
The UL of the lowest class is set as the lower limit of the next class minus one unit of measure.
The upper limits of the succeeding classes are computed by adding c to the UL of the
preceding class.
Tally the data by counting the frequency or number of observations that belong to each of the
classes
e. Construct other columns of information, such as:
• Class Mark or Midpoint – obtained by adding the lower and the upper class limits and
dividing by two.
• Relative Frequency – the frequency of a class expressed as a percentage of the total number
of observations
• Cumulative Frequency
< CF – the number of observations less than or equal to the upper limit of a class
> CF – the number of observations greater than or equal to the lower limit of a class
• Relative Cumulative Frequency – the cumulative frequency expressed as a percentage of the
total number of observations
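The construction steps above can be sketched in Python. This is only an illustration: the function name `build_fdt`, the `unit` parameter (one unit of measure, 1 for whole-number data) and the sample scores are assumptions, and the loop keeps adding classes until the highest value is covered, which the steps above do not state explicitly. Note also that Python's `round` rounds exact halves to the nearest even integer.

```python
import math

def build_fdt(data, unit=1):
    """Build a frequency distribution table following the steps above."""
    n = len(data)
    lv, hv = min(data), max(data)
    data_range = hv - lv                      # R = HV - LV
    k = round(math.sqrt(n))                   # number of classes, k = sqrt(N)
    c = max(unit, round(data_range / k / unit) * unit)  # class width, c = R/k
    rows, lower, cumulative = [], lv, 0
    while lower <= hv:                        # assumption: add classes until HV is covered
        upper = lower + c - unit              # UL = next LL minus one unit of measure
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "LL": lower, "UL": upper,
            "mark": (lower + upper) / 2,      # class mark
            "f": freq,
            "rf": 100 * freq / n,             # relative frequency, in percent
            "<CF": cumulative,                # less-than cumulative frequency
        })
        lower += c
    return rows

# Hypothetical data set of 16 whole-number scores
scores = [12, 15, 17, 18, 20, 21, 22, 22, 24, 25, 27, 28, 30, 31, 33, 35]
fdt = build_fdt(scores)
for row in fdt:
    print(row)
```

For these scores, R = 23, k = 4 and c = 6, giving the classes 12–17, 18–23, 24–29 and 30–35.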
II. ELEMENTS OF SAMPLING AND DESCRIPTIVE STATISTICS
2.1. Definition of Terms
Population – the totality of all possible values (measurements, counts, etc.) of a particular characteristic for
a specific group of objects
Sample – a part of a population selected according to some rule or plan. It is desired that the sample be
representative of the population
Parameter – a value computed from a population; a number describing some property of a population
Statistic – a value computed from a sample; a number describing some property of a sample.
Sampling – the process of selecting a part of the universe or the population
Sampling Design – the set of rules or procedures employed in selecting the sample, including the sampling
scheme or the manner by which the samples are taken and the sample size which is the number of
sample units taken from the universe or population.
2.2. Methods of Sampling
Non-probability Sampling – the elements of the universe or population have no known chance of being
taken in the sample
Probability Sampling – assigns a known probability of selection for all possible samples; allows for the
computation of sampling error, or the error in inference inherent to the fact that what was observed
was only a sample
2.3. Sampling Procedures
Probability Sampling Procedures

• Simple Random Sampling – the elements of the universe or population have equal chance of being
included in the sample; applicable when the universe is believed to be homogeneous
• Stratified Random Sampling – the elements of the universe/population are first grouped into strata and
simple random samples are taken from each stratum; applicable under the following situations:
information is required for certain subdivisions of the population; the population is extremely
heterogeneous; the problem of sampling may differ in different parts of the population
• Cluster Sampling – the elements are grouped into clusters, for example, geographical location, and a
simple random sample of clusters is selected and all the elements of the selected clusters are included
in the sample
• Systematic Sampling – adopts a skipping pattern in the selection of the sample units; the only sampling
scheme that allows sample selection without a sampling frame
• Multi-stage Sampling – characterized by sampling being done in stages before the ultimate sampling
units are selected
Non-probability Sampling Procedures

• Purposive sampling
• Quota sampling
• Judgment sampling
• Accidental sampling
2.4 Some Descriptive Statistics
Measures of Central Tendency – values computed from the data around which the observations tend to center or cluster
a. Arithmetic Mean – the arithmetic average of all the values

The sample mean, for ungrouped data, is computed as:
$\bar{X} = \sum X_i / n$, where n = sample size
For grouped data:
$\bar{X} = \sum f_i X_i / \sum f_i$, where fi = frequency of the ith class and Xi = class mark for the ith class
The population mean, μ, for ungrouped data is:
$\mu = \sum X_i / N$, where N = population size
Properties:
• The sum of the deviations from the arithmetic mean is zero
• The sum of the squares of the deviations from the arithmetic mean is less than the sum of the
squares of the deviations from any other value
Advantages:
• The most commonly used average
• Easy to compute
• Easily understood
• Lends itself to algebraic manipulation
Disadvantage:
• Unduly affected by extreme values and may therefore be far from representative of the sample
b. Weighted Mean, $\bar{X}_w$ – an average of n quantities obtained by attaching more significance (or weight) to some
of the numbers than to others.
$\bar{X}_w = \sum_{i=1}^{n} W_i X_i \Big/ \sum_{i=1}^{n} W_i$, where Wi = assigned weight for the ith quantity
c. Geometric Mean, G – the nth root of the product of n positive numbers; used primarily to average
data for which the ratio of consecutive terms remains approximately constant, which occurs with such
data as rates of change, ratios, etc.
$G = (X_1 X_2 \cdots X_n)^{1/n}$
d. Harmonic Mean, H - for n numbers, the number n divided by the sum of the reciprocals of the n
numbers; most frequently used in averaging speeds for various distances covered where the distances
remain constant, also in finding the average cost of common commodity when several different
purchases are made by investing the same amount of money each time.
H = n / ( ∑(1/Xi) )
e. Midrange, MR
MR = (Xmin + Xmax)/2
Quick and easy to compute but is often inefficient because all the information contained in the
intermediate values has been ignored.
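As a quick numerical illustration of the averages defined above, the following sketch evaluates each one for a small made-up data set. The values and weights are assumptions chosen only to keep the arithmetic easy to follow; `geometric_mean` and `harmonic_mean` come from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import mean, geometric_mean, harmonic_mean

values = [2, 4, 8]        # hypothetical observations
weights = [1, 2, 1]       # hypothetical weights, one per observation

arithmetic = mean(values)                                              # sum(X) / n
weighted = sum(w * x for w, x in zip(weights, values)) / sum(weights)  # sum(W X) / sum(W)
geometric = geometric_mean(values)                                     # (X1 X2 ... Xn)^(1/n)
harmonic = harmonic_mean(values)                                       # n / sum(1/X)
midrange = (min(values) + max(values)) / 2                             # (Xmin + Xmax) / 2

# Property of the arithmetic mean: deviations from it sum to zero
deviation_sum = sum(x - arithmetic for x in values)

print(arithmetic, weighted, geometric, harmonic, midrange, deviation_sum)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean; the output above shows that ordering.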
f. Mode – the value that occurs most frequently (absolute mode). There could be several modes; a
relative mode is a value that occurs more frequently than neighboring values even if it is not an
absolute mode
For grouped data, the mode Mo is:
$Mo = L_{Mo} + \frac{d_1}{d_1 + d_2}\, w$
where: LMo = lower limit of the modal class
d1 = the difference, sign neglected, between the frequency of the modal class and the frequency of the preceding class
d2 = the difference, sign neglected, between the frequency of the modal class and the frequency of the following class
w = width of the modal class
g. Median – Half of the observations should have a value less than the median and half should have a
value greater than the median
The median, Md, for ungrouped data arranged in an array is computed as:
$Md = X_{(N+1)/2}$ if N is odd
$Md = \frac{1}{2}\left( X_{N/2} + X_{(N/2)+1} \right)$ if N is even
For grouped data,
$Md = L_{Md} + \frac{(N+1)/2 - S}{f_{Md}}\, w$
where: LMd = lower limit of the median class
N = number of observations in the sample
S = sum of the frequencies in all classes preceding the median class
fMd = frequency of the median class
w = width of the median class
h. Percentile, Decile, and Quartile Limits

Percentiles – values dividing the array into 100 equal parts
Deciles – values dividing the array into 10 equal parts
Quartiles – values dividing the array into 4 equal parts
The jth Percentile is obtained as follows:

• Arrange the data in increasing order
• Compute for k
k = (j /100) N where N = total number of observations
• If k is a whole number, then the jth percentile is the average of the values in the kth and (k + 1)th
position. Otherwise, it is the observation in the next higher whole number position.
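The percentile rule above translates almost directly into code. The sketch below is illustrative only: the function name `percentile` and the sample data are assumptions, and j is restricted to whole numbers from 1 to 99 so that integer arithmetic can decide whether k is a whole number.

```python
def percentile(data, j):
    """jth percentile (j = 1..99) using the positional rule above."""
    ordered = sorted(data)                 # arrange the data in increasing order
    n = len(ordered)
    k, remainder = divmod(j * n, 100)      # k = (j/100) N, split into whole part and remainder
    if remainder == 0:
        # k is a whole number: average the kth and (k+1)th values (1-based positions)
        return (ordered[k - 1] + ordered[k]) / 2
    # otherwise take the observation in the next higher whole-number position
    return ordered[k]

# Hypothetical array of N = 10 observations
data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1 = percentile(data, 25)    # k = 2.5, so the 3rd value
md = percentile(data, 50)    # k = 5, so the average of the 5th and 6th values
q3 = percentile(data, 75)    # k = 7.5, so the 8th value
print(q1, md, q3)
```

Quartiles and deciles follow from the same function, since Q1, Q2, Q3 are the 25th, 50th and 75th percentiles and the deciles are the 10th, 20th, ..., 90th percentiles.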
Measures of Dispersion – values used to describe the extent of dispersion or variability of data
a. Range – the difference between the largest and the smallest measurement in the data set
R = Xmax - Xmin
b. Mean Deviation, MD – the arithmetic mean of the absolute deviations from the mean
$MD = \frac{\sum |X_i - \bar{X}|}{n}$
c. Variance
For population variance, σ²:
$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
For sample variance, S²:
$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
where: n – 1 = degrees of freedom, or the number of values that are free to vary after certain restrictions have been placed upon the data
The computing formula is:
$S^2 = \frac{n \sum X_i^2 - \left( \sum X_i \right)^2}{n (n - 1)}$
For grouped data, the sample variance is obtained from:
$S^2 = \frac{n \sum f_i X_i^2 - \left( \sum f_i X_i \right)^2}{n (n - 1)}$
d. Standard Deviation – the square root of the variance
e. Coefficient of Variation, CV – a measure of variation relative to the mean; an ideal device for
comparing the variation in two series of data that are measured in two different units, e.g. a
comparison of variation in height with variation in weight
$CV = S / \bar{X}$
f. Quartile Deviation or Semi-Interquartile Range, Q – points out that 50 per cent of the total
distribution is comprised of variates lying between the first and third quartiles
Q = ½ ( Q3 - Q1)
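The measures of dispersion above can be checked numerically with a short sketch. The five values used here are the battery lifetimes that appear in the chi-square example later in these notes; the variable names are assumptions. The definitional and computing formulas for S² should agree.

```python
import math
from statistics import mean, variance, stdev

data = [1.9, 2.4, 3.0, 3.5, 4.2]
n = len(data)
x_bar = mean(data)

data_range = max(data) - min(data)                      # R = Xmax - Xmin
mean_deviation = sum(abs(x - x_bar) for x in data) / n  # MD
s2 = variance(data)                                     # S^2 with n - 1 in the denominator
s2_computing = (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))
s = stdev(data)                                         # standard deviation, sqrt(S^2)
cv = s / x_bar                                          # coefficient of variation

print(data_range, mean_deviation, s2, s2_computing, s, cv)
```

The coefficient of variation is unit-free, which is what makes it suitable for comparing series measured in different units.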
Measures of Skewness – values measuring the extent of departure of the distribution from symmetry
a. Pearson’s First Coefficient of Skewness
$Skewness = (\bar{X} - Mo) / S$
b. Pearson’s Second Coefficient of Skewness, SK
$SK = 3 (\bar{X} - Md) / S$
If SK = 0, the distribution of data is symmetric
If SK < 0, the distribution is negatively skewed, or the frequency curve of the distribution has a longer tail to the left of the central maximum than to the right
If SK > 0, the distribution is positively skewed
c. Moment Coefficient of Skewness, a3
$a_3 = \frac{m_3}{S^3} = \frac{\sum (X - \bar{X})^3 / n}{S^3}$
If a3 = 0, the distribution is perfectly symmetrical, such as the normal curve
Measures of Kurtosis – a value that measures the flatness or peakedness of the distribution of data,
usually taken relative to a normal curve
• Leptokurtic – a distribution having a relatively high peak
• Platykurtic – a flat-topped distribution
• Mesokurtic – a distribution which is not very peaked or very flat-topped, like the normal distribution
a. Moment Coefficient of Kurtosis, a4
$a_4 = \frac{m_4}{S^4} = \frac{\sum (X - \bar{X})^4 / n}{(S^2)^2}$
If a4 = 3, distribution is normal
b. Kurtosis, K
K = ( a4 - 3) If K = 0, distribution is normal
K > 0, distribution is leptokurtic
K < 0, distribution is platykurtic
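A minimal sketch of the moment coefficients follows. It assumes that S² in the a3 and a4 formulas is taken as the second central moment (divisor n), which is one common convention; using the n – 1 sample variance instead would change the values slightly. The function name and the data are hypothetical.

```python
def moment_coefficients(data):
    """Return (a3, a4, K) using central moments with divisor n (an assumption)."""
    n = len(data)
    x_bar = sum(data) / n
    m2 = sum((x - x_bar) ** 2 for x in data) / n   # second central moment, used as S^2
    m3 = sum((x - x_bar) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - x_bar) ** 4 for x in data) / n   # fourth central moment
    a3 = m3 / m2 ** 1.5                            # moment coefficient of skewness
    a4 = m4 / m2 ** 2                              # moment coefficient of kurtosis
    return a3, a4, a4 - 3                          # K = a4 - 3

a3, a4, k = moment_coefficients([1, 2, 3, 4, 5])   # symmetric, evenly spread data
print(a3, a4, k)
```

For this symmetric data set a3 is zero, and K is negative, so the distribution would be described as platykurtic.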
III. SETS AND PROBABILITY
3.1 Terms, Defined
Set – a well-defined collection of objects, e.g. the rivers in the Philippines, the monthly rainfall in CLSU
Element – each object in a set; a member of the set
Equal Sets – sets having exactly the same elements
Null Set or Empty Set – a set that contains no elements; the symbol “Ø” denotes this empty set
Subset – if every element of a set A is also an element of a set B, then A is called a subset of B
B = { 1, 2, 3, 4, 5, 6 }
A = { 2, 4, 6 }
or A ⊂ B, where “⊂” reads “is a subset of” or “is contained in”
Sample Space – a set whose elements represent all possible outcomes of an experiment. For example, in
tossing a die,
S = { 1, 2, 3, 4, 5, 6 }
Sample Point – an element of a sample space
Event – a subset of a sample space
Simple Event – an event that is a set containing only one element of the sample space
Compound Event – an event which can be expressed as the union of simple events
Let S = { heart, spade, club, diamond }
A = { heart } is a simple event of drawing a heart from a deck of 52 cards
B = { heart } U { diamond } = { heart, diamond } is a compound event of drawing a red card
3.2. Ways of Describing a Set

a. List the members separated by a comma and enclosed in braces, if the set has a finite number of
elements
A = { 1, 2, 3, 4, 5, 6 }
B = { H, T }
b. Give a statement or rule, e.g. “C is the set of all X such that X is a month in CLSU with rainfall over 5
inches”, or
C = { X | X is a month in CLSU with rainfall over 5 inches }
3.3. Set Operations

a. The intersection of two sets A and B is the set of elements that are common to A and B.
Let A = { 1, 2, 3, 4, 5 }
B = { 2, 4, 6, 8 }
Then,
A ∩ B = { 2, 4 }
Let P = { a, e, i, o, u }
Q = { r, s, t }
Then,
P ∩ Q = Ø; sets P and Q are disjoint, that is, P and Q have no elements in common
b. The union of two sets A and B is the set of elements that belong to A or to B, or to both.
Let A = { 2, 3, 5, 8 }
B = { 3, 6, 8 }
Then,
A U B = { 2, 3, 5, 6, 8 }
c. If A is a subset of the universal set U, then the complement of A with respect to U is the set of all
elements of U that are not in A. The complement of A is denoted by A’.
From the above definitions/operations, it should be noted that:

• A ∩ Ø = Ø
• A U Ø = A
• A ∩ A’ = Ø
• A U A’ = U
• U’ = Ø
• Ø’ = U
• (A’)’ = A
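The set operations and identities above can be tried out with Python's built-in `set` type. In this sketch the universal set `U` (the integers 1 to 10) and the helper `complement` are assumptions introduced for illustration; the sets A and B are those of the intersection example.

```python
U = set(range(1, 11))          # hypothetical universal set {1, ..., 10}
A = {1, 2, 3, 4, 5}
B = {2, 4, 6, 8}
EMPTY = set()

def complement(s):
    """Complement of s with respect to U."""
    return U - s

intersection = A & B           # elements common to A and B
union = {2, 3, 5, 8} | {3, 6, 8}   # the union example above
disjoint = {"a", "e", "i", "o", "u"}.isdisjoint({"r", "s", "t"})

print(intersection, union, disjoint, complement(A))
```

The identities listed above can be checked the same way, for example `A & complement(A) == EMPTY` and `complement(complement(A)) == A`.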
3.4. Counting Sample Points
Principles of Counting
• Multiplication Rule
• Permutation
• Combination
Multiplication Rule
• Theorem 1. If an operation can be performed in n1 ways, and if for each of these a second
operation can be performed in n2 ways, then the two operations can be performed together in
(n1)(n2) ways.
Example. How many sample points are in the sample space when a pair of dice is thrown once?
Solution. The first die can land in any one of 6 ways. For each of these 6 ways, the second die can
also land in 6 ways. Therefore, the pair of dice can land in (6) (6) = 36 ways
• Theorem 2. Generalized Multiplication Rule. If an operation can be performed in n1 ways, and if
for each of these a second operation can be performed in n2 ways, and for each of the first two, a
third operation can be performed in n3 ways, and so forth, then the sequence of k operations can be
performed in
(n1)(n2)(n3) … (nk) ways
Example. Registrants at a large convention are offered 5 sightseeing tours on each of 3 days. In
how many ways can a person arrange to go on a sightseeing tour planned by this convention (a)
with repetition, (b) without repetition.
Solution.
a. N = (5)(5)(5) = 125 ways
b. N = (5)(4)(3) = 60 ways
Permutations – an arrangement of all or part of a set of objects where the composition of the group and
the order of the items within the group are both important
• Theorem 3. The number of permutations of n distinct objects taken n at a time is
n!
Example. How many distinct permutations can be made from the letters of the word “help”?
Solution. N = n!
= 4!
= (4)(3)(2)(1) = 24
• Theorem 4. The number of permutations of n distinct objects taken r at a time is
$_nP_r = \frac{n!}{(n-r)!}$
Example. How many three-digit numbers can be formed from the digits
2, 3, 4, 5, 6, 7, and 8 if each digit can be used only once?
Solution.
$_7P_3 = \frac{7!}{(7-3)!} = \frac{7!}{4!} = \frac{(7)(6)(5)(4!)}{4!} = (7)(6)(5) = 210$
• Theorem 5. The number of permutations of n distinct objects arranged in a circle is
(n – 1)!
Example. In how many ways can 11 different bushes be planted in a circular arrangement?
Solution.
N = (11 – 1)! = 10! = 3 628 800 ways
• Theorem 6. The number of distinct permutations of n things of which n1 are of one kind, n2 are of
a second kind, …, nk of a kth kind is
$\frac{n!}{n_1!\, n_2! \cdots n_k!}$
Example. How many distinct permutations can be formed from the letters of the word
“PHILIPPINES”?
Solution.
P = 3, H = 1, I = 3, L = 1, N = 1, E = 1, S = 1
$N = \frac{11!}{3!\, 3!\, 1!\, 1!\, 1!\, 1!\, 1!} = 1\,108\,800$ ways
Combination – the number of selections of r objects from n objects without regard to order ; a group of
objects where the composition of the group, but not the order, is important
• Theorem 7. The number of combinations of n distinct objects taken r at a time is
$_nC_r = \frac{n!}{(n-r)!\, r!}$
Example. How many ways are there to select 3 candidates from 9 equally qualified recent
graduates for openings in an engineering firm?
Solution.
$_9C_3 = \frac{9!}{6!\, 3!} = 84$
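The worked counting examples above can be verified with the standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The variable names are assumptions; the numbers come from the examples.

```python
from collections import Counter
from math import comb, factorial, perm

dice_outcomes = 6 * 6                       # Theorem 1: pair of dice
tours_with_repetition = 5 ** 3              # Theorem 2: 5 tours on each of 3 days
tours_without_repetition = 5 * 4 * 3
help_arrangements = factorial(4)            # Theorem 3: letters of "help"
three_digit_numbers = perm(7, 3)            # Theorem 4: 7P3
circular_bushes = factorial(11 - 1)         # Theorem 5: (n - 1)!

# Theorem 6: n! divided by the factorial of each letter count
letter_counts = Counter("PHILIPPINES")
philippines = factorial(sum(letter_counts.values()))
for count in letter_counts.values():
    philippines //= factorial(count)

candidate_selections = comb(9, 3)           # Theorem 7: 9C3

print(dice_outcomes, tours_with_repetition, tours_without_repetition,
      help_arrangements, three_digit_numbers, circular_bushes,
      philippines, candidate_selections)
```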
3.5. Probability, Defined

A value between 0 and 1, inclusive of limits, that measures how likely a particular event will occur
• A priori Approach – defines the probability of an event A as the number of sample points in the
event A, n(A), divided by the number of sample points in the sample space, n(S)
P(A) = n(A) / n(S)
Example. One card is selected at random from 50 cards carrying numbers from 1 to 50. Find the
probability that the number of the card is (a) divisible by 10; (b) an odd number.
Solution.
Let T be the event that the card carries a number that is divisible by 10.
T = { 10, 20, 30, 40, 50}
P (T) = 5/50 = 1/10
Let O be the event that the card has an odd number.

O = {1, 3, 5, …, 49}
P(O) = 25/50 = ½
• A posteriori Approach – uses the relative frequency of the event’s occurrence to measure the
chance; defines the probability of event A as the number of times the event occurs divided by the
number of times the experiment is repeated.
$P(A) = \frac{\text{number of times event A has occurred}}{\text{number of times the experiment was repeated}}$
• Subjective Approach – uses the perception of the person to determine the chance of occurrence of
an event; an intelligent “guessing”
3.6. Some Probability Rules
a. The probability of an event is nonnegative and never exceeds one.

0 ≤ P(Ei) ≤ 1
b. The sum of the probabilities of all possible outcomes in the sample space is 1.
∑ P(Ei) = 1
c. If A and A’ are complementary events, then
P(A’) = 1 - P(A)
Example. A coin is tossed 6 times in succession. What is the probability that at least 1 head
occurs?
Solution. Let E be the event that at least 1 head occurs. The sample space S consists of $2^6 = 64$
sample points, since each toss can result in 2 outcomes. Now,
P(E) = 1 - P(E’), where E’ is the event that no head occurs. This can happen in only
one way, when all tosses result in a tail, so P(E’) = 1/64.
Then, P(E) = 1 - 1/64 = 63/64
d. The probability of the union of mutually exclusive events is the sum of the
probabilities of the separate events.
P(E1 U E2) = P(E1) + P(E2)
• The probability statement P(E1 U E2) signifies the probability of a union of events and is read “the
probability of E1 or E2”
• Two events A and B are mutually exclusive if A ∩ B = Ø, i.e. they are disjoint, or they have no
points in common, or they cannot both occur at the same time.
Example. Suppose a die is tossed. Let A be the event that an even number turns up and let B
be the event that an odd number shows. The intersection of the sets A = {2, 4, 6} and B = {1,
3, 5} is A ∩ B = Ø since they have no points in common. Therefore, A and B are mutually
exclusive events.
e. The probability of two independent events occurring simultaneously or in succession is the product of
the individual probabilities.
P(E1 ∩ E2) = P(E1) x P(E2)
The probability statement P(E1 ∩ E2) is called the intersection or joint probability and is read “the
probability of E1 and E2”
f. The more general rule for the union of probabilities is

P(E1 U E2) = P(E1) + P(E2) - P(E1 ∩ E2)
Example. The probability that a student passes mathematics is 2/3 and the probability that he
passes English is 4/9. If the probability of passing at least one course is 4/5, what is the
probability that he will pass both courses?
Solution.
P(M ∩ E) = P(M) + P(E) - P(M U E)
= 2/3 + 4/9 - 4/5
= 14/45
g. For conditional probabilities, i.e., the probability that event E1 occurs given that event E2 has occurred
$P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}$
If events E1 and E2 are independent, P(E1 | E2) = P(E1)
Consider the following example of events that are neither independent nor mutually exclusive: An urban
drainage canal reaches flood stage each summer with relative frequency of 0.10; power failures in
industries along the canal occur with a probability of 0.20. Experience shows that when there is a
flood the chances of a power failure are raised to 0.40.
The probability statements are:
P(flood) = P(F) = 0.10
P(no flood) = P(F’) = 0.90
P(power failure) = P(P) = 0.20
P(no power failure) = P(P’) = 0.80
P(power failure given that a flood occurs) = P(P | F) = 0.40
The probability of a flood or a power failure occurring is then
P(F U P) = P(F) + P(P) - P(F ∩ P)
= P(F) + P(P) - P(P | F) x P(F)
= 0.10 + 0.20 - (0.40 x 0.10)
= 0.10 + 0.20 - 0.04
= 0.26
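The general union rule and the conditional probability rule can be checked with exact fractions. The sketch below reworks the mathematics/English example and the flood/power-failure example; the variable names are assumptions, and `fractions.Fraction` is used only to avoid floating-point rounding.

```python
from fractions import Fraction

# Mathematics / English example: P(M and E) = P(M) + P(E) - P(M or E)
p_math, p_english, p_at_least_one = Fraction(2, 3), Fraction(4, 9), Fraction(4, 5)
p_both = p_math + p_english - p_at_least_one

# Flood / power failure example
p_flood = Fraction(10, 100)
p_power = Fraction(20, 100)
p_power_given_flood = Fraction(40, 100)

p_joint = p_power_given_flood * p_flood            # P(F and P) = P(P | F) x P(F)
p_flood_or_power = p_flood + p_power - p_joint     # general union rule

# If F and P were independent, the joint probability would equal P(F) x P(P)
independent = (p_joint == p_flood * p_power)

print(p_both, float(p_flood_or_power), independent)
```

Here the joint probability 0.04 differs from P(F) x P(P) = 0.02, which confirms that the two events are not independent.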
3.7. Other Sample Problems
a. What is the probability of getting a Jack or a Spade if one card is drawn from a deck of 52 cards?
Let A be the event that a Jack will be drawn, P(A) = 4/52
B be the event that a Spade will be drawn, P(B) = 13/52
A∩B be the event that the card drawn is both a Jack and a Spade, P(A∩B) = 1/52
Then,
P(A U B) = P(A) + P(B) - P(A∩B)
= 4/52 + 13/52 - 1/52
= 16/52 = 4/13
b. If 3 coins are tossed, what is the probability that at least 1 head occurs?
P(at least one head) = 1 - P(no head)

= 1 - 1/8
= 7/8
c. A box contains 6 white balls and 4 black balls. If three balls are to be drawn from the box, what is the
probability that all three are white balls?
P(W1,W2,W3) = (6/10) (5/9) (4/8)

= 120/720 = 1/6
d. Find the probability that exactly 3 aces will appear when 5 cards are drawn with replacement
$P(3A, 2N) = {}_5C_3\, (4/52)^3 (48/52)^2 = 0.0039$
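The four sample problems can be reproduced with a few lines of Python. This is a sketch with assumed variable names; problem (c) is treated as drawing without replacement, as the factors 6/10, 5/9, 4/8 imply, and problem (b) is checked by listing all 8 outcomes.

```python
from fractions import Fraction
from itertools import product
from math import comb

# (a) Jack or Spade: P(A) + P(B) - P(A and B)
p_jack_or_spade = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)

# (b) At least one head in 3 tosses, by enumerating the sample space
tosses = list(product("HT", repeat=3))
p_at_least_one_head = Fraction(sum("H" in t for t in tosses), len(tosses))

# (c) Three white balls in three draws without replacement
p_three_white = Fraction(6, 10) * Fraction(5, 9) * Fraction(4, 8)

# (d) Exactly 3 aces in 5 draws with replacement
p_ace = Fraction(4, 52)
p_three_aces = comb(5, 3) * p_ace ** 3 * (1 - p_ace) ** 2

print(p_jack_or_spade, p_at_least_one_head, p_three_white, float(p_three_aces))
```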
IV. RANDOM VARIABLES
4.1. Concept of Random Variable
From performing random experiments, our outcomes are treated as variables whose values occur by
chance, and thus are referred to as random variables.
Random Variable – a rule or function whose value is a real number determined by each element in the
sample space .
Example. In the random experiment of tossing two coins, we describe the event of “outcome of tails”
by a numerical characteristic.
Sample Space Real Number
HH 0
HT 1
TH 1
TT 2
The random variable Y = number of tails of tossing 2 coins
Y = 0 is equivalent to the event { HH }
Y = 1 is equivalent to the event { HT, TH }
Y = 2 is equivalent to the event { TT }
Discrete Random Variable – one that assumes a finite or countably infinite number of numerical values; a
random variable defined over a discrete sample space or a sample space containing a finite number of
points, or an unending sequence with as many elements as there are whole numbers; represents count
data.
Continuous Random Variable – one that assumes an uncountably infinite number of values corresponding to
the points on a number line; a variable defined over a continuous sample space or a sample space with
an infinite number of elements, or as many as the number of points on a line segment, such as all possible
heights, weights, temperatures; represents measured data.
Expected Value – an average of the values of the random variable obtained through a large number of
repetitions of the experiment
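To make the two-coin example concrete, the sketch below enumerates the sample space, builds the probability distribution of Y = number of tails, and computes its expected value as the probability-weighted sum of the values. It assumes fair coins, so that the four sample points are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=2))          # HH, HT, TH, TT

# Random variable Y maps each sample point to its number of tails
tail_counts = Counter(point.count("T") for point in sample_space)

# Probability distribution of Y, assuming equally likely sample points
distribution = {y: Fraction(count, len(sample_space))
                for y, count in tail_counts.items()}

expected_value = sum(y * p for y, p in distribution.items())

print(distribution, expected_value)
```

The probabilities sum to 1, and the expected value of 1 tail matches the long-run average one would anticipate from repeated tosses of two fair coins.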
4.2 Probability Distribution
Discrete Probability Distribution - the list of all the possible values of a discrete random variable together
with their associated probabilities.
• The sum of the probabilities for all the possible values of the random variable in a probability
distribution is 1.
• Examples of discrete probability distributions are: Uniform, Binomial, Multinomial, Poisson,
Negative Binomial, Geometric, Hypergeometric
Probability Density Function – a smooth curve or function f(X), describing the distribution of a continuous
random variable; the area under the curve is equal to one
• Examples of continuous probability distributions are: Normal, Exponential, Gamma, Lognormal,
Extreme Value, Pearson, Beta
4.3. Normal Distribution
Features:
• Has a bell-shaped curve
• Has a mean μ and variance σ², X ~ N (μ, σ²)
• Normal curve is symmetric
• Mean, median, and mode coincide at the center of the curve
• Asymptotic to the X-axis
• Has a total area equal to one
Standard Normal Distribution


The transformation of the normal random variable X to a standard normal variable Z takes the
form:
$Z = \frac{X - \mu}{\sigma}$, where Z ~ N (μ = 0, σ² = 1)
4.4 Finding the Area or Probability
P (Z < Zo) = area to the left of Zo
Example. P (Z < 1.23) = 0.8907
P (Z < -2.51) = 0.0060
P (Z > Zo) = area to the right of Zo
Example. P (Z > 3.07) = 0.0011; by symmetry, P (Z < -3.07) = 0.0011
P (Z > -0.3) = 0.6179; by symmetry, P (Z < 0.3) = 0.6179
P (Z1 < Z < Z2) = area between Z1 and Z2 = P (Z < Z2) - P (Z < Z1)
Example. P (-2 < Z < 0.64) = 0.7161
P (0.58 < Z < 1.93) = 0.2542
4.5. Finding the Value of Zo, Given Area or Probability
P (Z < Zo) = 0.8708; Zo = 1.13
P (Z < Zo) = 0.0516; Zo = -1.63
P (Z > Zo) = 0.0222; Zo = 2.01
P (Z > Zo) = 0.6808; Zo = -0.47
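Table look-ups like those above can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), whose `cdf` method gives the area to the left of a value and whose `inv_cdf` method goes from an area back to Zo. The variable names are assumptions, and results may differ from four-decimal table values in the last digit because the table entries are themselves rounded.

```python
from statistics import NormalDist

std_normal = NormalDist()                 # mu = 0, sigma = 1

# Areas, given Zo
area_left = std_normal.cdf(1.23)                             # P(Z < 1.23)
area_right = 1 - std_normal.cdf(3.07)                        # P(Z > 3.07)
area_between = std_normal.cdf(0.64) - std_normal.cdf(-2)     # P(-2 < Z < 0.64)

# Zo, given an area
z_from_left_area = std_normal.inv_cdf(0.8708)                # P(Z < Zo) = 0.8708
z_from_right_area = std_normal.inv_cdf(1 - 0.6808)           # P(Z > Zo) = 0.6808

print(round(area_left, 4), round(area_right, 4), round(area_between, 4))
print(round(z_from_left_area, 2), round(z_from_right_area, 2))
```

For a right-tail area the look-up uses 1 minus the area, since `inv_cdf` always works with the area to the left.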
V. SAMPLING DISTRIBUTIONS
5.1 Terminologies
Sampling Distribution - the probability distribution of a statistic
Standard Error of the Statistic – the standard deviation of the sampling distribution
5.2 Normal Distribution
a. Sampling Distribution of a Mean when n is large and σ is known

If random samples of size n are drawn from a large or infinite population with mean μ and variance
σ², then the sampling distribution of the sample mean $\bar{X}$
is approximately normally distributed with mean $\mu_{\bar{X}} = \mu$ and standard deviation of the mean $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
Hence,
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
is a value of a standard normal variable.
Example. An electrical firm manufactures light bulbs that have a length of life that is approximately
normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an average life of less than 775 hours.
Solution.
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{775 - 800}{40 / \sqrt{16}} = -2.5$
$P(\bar{X} < 775) = P(Z < -2.5) = 0.0062$
b. Sampling Distribution of the Difference of Means, $\bar{X}_1 - \bar{X}_2$, when σ1 and σ2 are known – is
approximately normally distributed, with Z as a value of a standard normal variable
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2 / n_1) + (\sigma_2^2 / n_2)}}$
Example. The generator sets of manufacturer A have a mean life time of 6.5 years and a standard
deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a
standard deviation of 0.8 year. What is the probability that a random sample of 36 generator sets
from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of
a sample of 49 generator sets from manufacturer B?
Solution.
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2 / n_1) + (\sigma_2^2 / n_2)}} = \frac{1.0 - (6.5 - 6.0)}{\sqrt{(0.81/36) + (0.64/49)}} = 2.651$
$P(\bar{X}_1 - \bar{X}_2 \ge 1.0) = P(Z \ge 2.651) = 1 - P(Z < 2.651) = 1 - 0.9960 = 0.0040$
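Both sampling-distribution examples in this section can be recomputed directly. The sketch below uses `statistics.NormalDist` for the standard normal areas; the variable names are assumptions, and direct computation may differ from table-based figures in the third decimal place of Z.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# (a) Light bulbs: mu = 800, sigma = 40, n = 16, P(sample mean < 775)
z_mean = (775 - 800) / (40 / math.sqrt(16))
p_mean = std_normal.cdf(z_mean)

# (b) Generator sets: P(difference of sample means >= 1.0)
mu_a, sigma_a, n_a = 6.5, 0.9, 36
mu_b, sigma_b, n_b = 6.0, 0.8, 49
standard_error = math.sqrt(sigma_a ** 2 / n_a + sigma_b ** 2 / n_b)
z_diff = (1.0 - (mu_a - mu_b)) / standard_error
p_diff = 1 - std_normal.cdf(z_diff)

print(z_mean, round(p_mean, 4))
print(round(z_diff, 3), round(p_diff, 4))
```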
5.3 t- Distribution
If $\bar{X}$ and S² are the mean and variance, respectively, of a random sample of size n taken from a normal
population having the mean μ and unknown variance σ², then
$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
is a value of a random variable T having the t-distribution with df = n – 1 degrees of freedom
Example. A manufacturer of light bulbs claims that his bulbs will burn on the average 500 hours. To
maintain this average, he tests 25 bulbs each month. If the computed t-value falls between –t0.05 and
t0.05, he is satisfied with his claim. What conclusion should he draw from a sample that has a mean of 518
hours and a standard deviation of 40 hours? Assume the distribution of burning times to be
approximately normal.
Solution. From the t-table, t0.05 = 1.711 for df = 24. Therefore the manufacturer is satisfied with his claim if a
sample of 25 bulbs yields a t-value between -1.711 and 1.711.
$t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{518 - 500}{40 / \sqrt{25}} = 2.25$
This is a value well above 1.711. Hence, the manufacturer is likely to conclude that his bulbs are a
better product than what he thought.
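The light-bulb computation can be reproduced in a few lines. This is a minimal sketch with illustrative variable names; the critical value 1.711 is simply the table value quoted in the solution, not something the code derives.

```python
from math import sqrt

# Values from the light-bulb example.
n = 25
sample_mean = 518.0
claimed_mean = 500.0
sample_sd = 40.0
t_crit = 1.711  # t_0.05 with df = 24, read from the t-table as in the text

# t statistic: (sample mean - claimed mean) / (S / sqrt(n)).
t = (sample_mean - claimed_mean) / (sample_sd / sqrt(n))

# The manufacturer is satisfied only if t lies between -t_crit and t_crit.
within_limits = -t_crit <= t <= t_crit

print(f"t = {t:.2f}, inside (-{t_crit}, {t_crit}): {within_limits}")
```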
5.4 Chi-Square Distribution

If S² is the variance of a random sample of size n taken from a normal population having the variance σ²,
then
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$
is a value of a random variable χ² having the Chi-square distribution with df = n – 1.
Example. A manufacturer of car batteries guarantees that his batteries will last, on the average, 3 years
with a standard deviation of 1 year. If 5 of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2
years, is the manufacturer still convinced that his batteries have a standard deviation of 1 year?
Solution.
$$S^2 = \frac{(1.9^2 + \dots + 4.2^2) - (1.9 + \dots + 4.2)^2/5}{5 - 1} = 0.815$$
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{(5-1)(0.815)}{1} = 3.26$$
This is a value from a χ²-distribution with df = 4. Since 95% of the χ² values with df = 4 fall
between 0.484 and 11.143, the computed value with σ² = 1 is reasonable, and therefore the
manufacturer has no reason to suspect that the standard deviation is different from 1 year.
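The battery example can be verified as follows. This is a minimal sketch with illustrative names; the bounds 0.484 and 11.143 are the table values quoted in the solution and are entered by hand.

```python
from statistics import variance

# Battery lifetimes (years) from the example.
lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]
sigma_sq = 1.0                 # claimed population variance (sd = 1 year)
lower, upper = 0.484, 11.143   # central 95% range of chi-square, df = 4 (table values)

n = len(lifetimes)
s_sq = variance(lifetimes)           # sample variance, divisor n - 1
chi_sq = (n - 1) * s_sq / sigma_sq   # chi-square statistic

# The claim is considered reasonable if the statistic falls inside the range.
reasonable = lower < chi_sq < upper

print(f"S^2 = {s_sq:.3f}, chi-square = {chi_sq:.2f}, reasonable: {reasonable}")
```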
5.5 F-Distribution
If S1² and S2² are the variances of independent random samples of sizes n1 and n2 taken from normal
populations with variances σ1² and σ2², respectively, then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$
is a value of a random variable having the F-distribution with df1 = n1 – 1 and df2 = n2 – 1.
Writing Fα(df1, df2) for the F value with df1 and df2 degrees of freedom that leaves an area of α to its
right, note that the degrees of freedom are interchanged in the reciprocal:
$$F_{1-\alpha}(df_1, df_2) = \frac{1}{F_{\alpha}(df_2, df_1)}$$
Example. If S1² and S2² represent the variances of independent random samples of sizes n1 = 25 and n2 =
31, taken from normal populations with variances σ1² = 10 and σ2² = 15, respectively, find P(S1²/S2² >
1.26).
Solution.
$$F = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} = \frac{15}{10}(1.26) = 1.89$$
From the F-distribution table, F0.05(24, 30) = 1.89, so

P(S1²/S2² > 1.26) = P(F > 1.89) = 0.05
VI. ESTIMATION
6.1 Terminologies
Estimate – a number obtained from sample data that is proposed as the value of a parameter
Estimator – a rule that tells how to obtain an estimate
Point Estimate – a single value used to represent the value of the parameter of interest
Interval Estimate or Confidence Interval – a range of possible values for the unknown parameter with some
measure of the degree of certainty
Confidence Level (Confidence Coefficient) – the degree of certainty associated with an interval estimate,
usually represented by the symbol (1 – α); α itself is the level of significance
6.2 Properties of the Best Estimator
Unbiased – an estimator $\hat{\theta}$ is unbiased if its mean or expected value is equal to θ, i.e.

$$E(\hat{\theta}) = \theta$$
Efficient – if $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators for the parameter θ, that is
$E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $V(\hat{\theta}_1) < V(\hat{\theta}_2)$
Consistent – tends to get closer to the population parameter as the sample size becomes large
Sufficient – utilizes all the information contained in the sample for the purpose of estimating a given
parameter, so that no other estimator can provide any more information
6.3 The ( 1 – α ) 100% Confidence Intervals for Several Parameters

Mean, μ, if σ is known or n ≥ 30
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where Zα/2 is the Z value leaving an area of α/2 to the right

Mean, μ, if σ is unknown and n < 30
$$\bar{X} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}$$
where tα/2 is the t value with df = n – 1, leaving an area of α/2 to the right

Difference Between Two Means, μ1 – μ2, when σ1 and σ2 are known
$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where Zα/2 is the Z value leaving an area of α/2 to the right

For μ1 – μ2, when σ1 = σ2 but unknown
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where tα/2 is the t value with df = n1 + n2 – 2, leaving an area of α/2 to the right,
and Sp² is the pooled estimate of σ²:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

For μ1 – μ2, when σ1 ≠ σ2 and both are unknown
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
where tα/2 is the t value leaving an area of α/2 to the right, with
$$df = \frac{(S_1^2/n_1 + S_2^2/n_2)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$

For μ1 – μ2 = μD, paired observations
$$\bar{d} \pm t_{\alpha/2}\,\frac{S_d}{\sqrt{n}}$$
where d̄ = mean of the differences of n random pairs of measurements
Sd = standard deviation of the differences of n random pairs
tα/2 = t value with df = n – 1, leaving an area of α/2 to the right

For σ²
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}$$
where χ²α/2 and χ²1–α/2 are χ² values with df = n – 1, leaving areas of α/2 and 1 – α/2, respectively, to the right

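For the ratio σ1²/σ2²
$$\frac{S_1^2}{S_2^2}\cdot\frac{1}{f_{\alpha/2}(df_1, df_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\cdot f_{\alpha/2}(df_2, df_1)$$
with df1 = n1 – 1 and df2 = n2 – 1; note that the degrees of freedom are interchanged in the upper limit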
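As an illustration of the first interval in this list (mean with σ known), here is a minimal sketch using the Python standard library. The function name and the numbers in the usage lines (sample mean 2.6, σ = 0.3, n = 36) are hypothetical values chosen for illustration, not data from these notes. The t-based and χ²-based intervals follow the same pattern but need table values or a statistics package for the critical points.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_mean(xbar, sigma, n, conf=0.95):
    """(1 - alpha)100% confidence interval for mu when sigma is known."""
    alpha = 1 - conf
    # Z value leaving an area of alpha/2 to the right.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical illustration values (not from the notes).
lower, upper = z_interval_mean(xbar=2.6, sigma=0.3, n=36, conf=0.95)
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```

Raising the confidence level widens the interval, since Zα/2 grows as α shrinks.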
VII. TESTS OF HYPOTHESES
7.1 Definition of Terms
Statistical Hypothesis – an assumption or statement which may or may not be true concerning one or more
population parameters
Null Hypothesis, Ho - the hypothesis that we formulate with the hope of rejecting
Alternative Hypothesis, Ha – the hypothesis that is hoped to be accepted when the null hypothesis is
rejected
Type I Error - the error of rejecting a null hypothesis when in fact it is true
Type II Error - the error of accepting a null hypothesis when it is false
Level of Significance of the Test – the probability of committing a Type I error, denoted by α
One-Tailed Test - a test of any statistical hypothesis where the alternative is one-sided, i.e. Ha: θ > θo
or Ha: θ < θo
Two-Tailed Test - a test of any statistical hypothesis where the alternative is two-sided, i.e. Ha: θ ≠ θo
7.2 Steps in Hypothesis Testing
a. Formulate Ho: θ = θo
b. Formulate Ha
c. Choose a level of significance equal to α
d. Select the appropriate test statistic and establish the critical/rejection region(s)
e. Compute the value of the test statistic from a random sample of size n
f. Make a conclusion. Reject Ho if the value of the test statistic falls in the critical region; otherwise accept
Ho
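Steps a to f can be sketched in code for the simplest case, a two-tailed Z-test of a mean with σ known. This is only an illustration under stated assumptions: the function name and the sample figures (sample mean 7.8, hypothesized mean 8, σ = 0.5, n = 50) are hypothetical and do not come from these notes.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Test Ho: mu = mu0 against Ha: mu != mu0 when sigma is known."""
    # Steps c and d: level of significance and critical region |z| > z_{alpha/2}.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step e: value of the test statistic from the sample.
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Step f: reject Ho if the statistic falls in the critical region.
    reject = abs(z) > z_crit
    return z, z_crit, reject

# Hypothetical illustration values (not from the notes).
z, z_crit, reject = two_tailed_z_test(xbar=7.8, mu0=8.0, sigma=0.5, n=50)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject Ho: {reject}")
```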
7.3 Selection of Test Procedure
Z-test and t-test - for comparing the mean of a population to a hypothetical value; also for comparing the
means of two populations
Regression Analysis – for determining the functional form of relationship between two or more variables;
for predicting the value of the dependent variable given the values of the independent variables
Correlation Analysis - for determining the strength or degree of linear relationship between two variables
Analysis of Variance – for comparing the means of two or more populations based on partitioning the total
variance of the variable of interest into several sources or components of variation
Chi-square Goodness of Fit Test - for testing whether the observed frequency is in agreement with the
expected or hypothesized frequency
Chi-square Test of Independence – for testing whether two variables are independent of each other
7.4. Summary of Statistical Tests

VIII. REGRESSION AND CORRELATION
8.1 Terminologies
Regression Analysis – the analysis that can be used to examine data and draw conclusions about the
functional relationship existing among variables
Correlation analysis - the analysis that can be used to indicate by a quantitative measure the strength of
the relationship between the variables
Regression Model - a mathematical equation that describes the functional relationship among the
variables observed
8.2 Simple Linear Regression

– involves one independent variable X, in which the postulated model for the response variable Y takes
the form
$$Y = \alpha + \beta X$$
which can be estimated by the line
$$\hat{Y} = a + bX$$
where
$$b = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2} = \frac{\sum X_iY_i - (\sum X_i)(\sum Y_i)/n}{\sum X_i^2 - (\sum X_i)^2/n} = \frac{SP_{XY}}{SS_X}$$
$$a = \bar{Y} - b\bar{X}$$
8.3 Sample Application
Student   CAT Score, X   Math Grade, Y
   1           65              85
   2           50              74
   3           55              76
   4           65              90
   5           55              85
   6           70              87
   7           65              94
   8           70              98
   9           55              81
  10           70              91
  11           50              76
  12           55              74
We find that:
∑Xi = 725, ∑Yi = 1011, ∑XiYi = 61 685, ∑Xi² = 44 475, ∑Yi² = 85 905, X̄ = 60.417, Ȳ = 84.250
Then,
$$b = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2} = \frac{(12)(61685) - (725)(1011)}{(12)(44475) - (725)^2} = \frac{7245}{8075} = 0.8972 \approx 0.897$$
$$a = \bar{Y} - b\bar{X} = 84.250 - (0.8972)(60.417) = 30.04$$
The regression line is given by
$$\hat{Y} = 30.04 + 0.897X$$
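The slope and intercept for the CAT score data can be reproduced directly from the formulas in 8.2. This is a minimal sketch; the variable names are illustrative, and the data are the twelve (X, Y) pairs from the table in 8.3.

```python
# CAT scores (X) and Math grades (Y) for the 12 students in 8.3.
cat = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
grade = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(cat)

sum_x, sum_y = sum(cat), sum(grade)
sum_xy = sum(x * y for x, y in zip(cat, grade))
sum_x2 = sum(x * x for x in cat)

# Corrected sum of products and sum of squares.
sp_xy = sum_xy - sum_x * sum_y / n
ss_x = sum_x2 - sum_x**2 / n

b = sp_xy / ss_x                 # slope
a = sum_y / n - b * sum_x / n    # intercept: Y-bar minus b times X-bar

print(f"b = {b:.3f}, a = {a:.2f}")
```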
8.4 Unbiased Estimate of σ² with df = n – 2
$$S_E^2 = \frac{1}{n-2}\left(SS_Y - b^2\,SS_X\right) = \frac{SS_Y - b\,SP_{XY}}{n-2}$$
where: SSX = ∑Xi² – (∑Xi)²/n

SSY = ∑Yi² – (∑Yi)²/n

SPXY = ∑XiYi – (∑Xi)(∑Yi)/n
8.5 Tests of Hypothesis


Ho: α = αo vs. Ha: α ≠ αo
Test Statistic, t = (a – αo) / Sa, where a is the estimated intercept
Where Sa² = SE² (1/n + X̄²/SSX)
Ho is rejected if |t| > tα/2, n – 2

Ho: β = βo vs. Ha: β ≠ βo
Test Statistic, t = (b – βo) / Sb, where b is the estimated slope
Where Sb² = SE² / SSX
Ho is rejected if |t| > tα/2, n – 2
Example. Test the significance of the slope of the line in the previous exercise.
a. Ho: β = 0
b. Ha: β ≠ 0
c. Test statistic, t = (b – βo) / Sb
d. Critical region: all |t| ≥ tα/2, n – 2 = t.025, 10 = 2.228
e. Computation:
$$S_E^2 = \frac{SS_Y - b\,SP_{XY}}{n-2} = \frac{(85905 - 1011^2/12) - 0.897\,[61685 - (725)(1011)/12]}{10} = 18.67$$
$$SS_X = \sum X_i^2 - (\sum X_i)^2/n = 44475 - (725)^2/12 = 672.917$$
$$S_b = \sqrt{S_E^2/SS_X} = \sqrt{18.67/672.917} = 0.167$$
$$t = \frac{b - \beta_o}{S_b} = \frac{0.897 - 0}{0.167} = 5.37$$
f. Decision and conclusion: Since the computed t value of 5.37 exceeds 2.228, Ho is
rejected, and we conclude that the slope differs significantly from zero. This suggests that the
students' Math grade increases by about 0.897 unit for every unit
increase in CAT score.
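The slope test can be recomputed from the raw data. This is a minimal sketch with illustrative variable names; the critical value 2.228 is the table value quoted in step d. Carried at full precision, SSX is about 672.917 and t comes out near 5.39; the 5.37 in the worked example reflects rounding b and Sb to three decimals.

```python
from math import sqrt

# CAT scores (X) and Math grades (Y) for the 12 students in 8.3.
cat = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
grade = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(cat)

sum_x, sum_y = sum(cat), sum(grade)
ss_x = sum(x * x for x in cat) - sum_x**2 / n
ss_y = sum(y * y for y in grade) - sum_y**2 / n
sp_xy = sum(x * y for x, y in zip(cat, grade)) - sum_x * sum_y / n

b = sp_xy / ss_x                       # estimated slope
se_sq = (ss_y - b * sp_xy) / (n - 2)   # unbiased estimate of sigma^2
s_b = sqrt(se_sq / ss_x)               # standard error of the slope
t = (b - 0) / s_b                      # test statistic for Ho: beta = 0

t_crit = 2.228                         # t_.025 with df = 10, from the t-table
reject = abs(t) >= t_crit

print(f"SE^2 = {se_sq:.2f}, Sb = {s_b:.3f}, t = {t:.2f}, reject Ho: {reject}")
```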
8.6 Linear Correlation Coefficient, r
$$r = b\sqrt{\frac{SS_X}{SS_Y}} = \frac{SP_{XY}}{\sqrt{SS_X\,SS_Y}}$$

Value ranges from -1 to +1

r = -1 or + 1, perfect linear relationship exists between the values of X and Y in the sample

r is close to +1 or -1, the linear relationship between the two variables is strong; there is a high
correlation

r is close to zero, the linear relationship between X and Y is weak or perhaps nonexistent
Example. Determine the degree of relationship of the variables X and Y, given the following data:
X   1   2   3   3   4   5
Y   6   5   5   4   2   2
∑Xi = 18, ∑Yi = 24, ∑XiYi = 61, ∑Yi² = 110, ∑Xi² = 64
$$r = \frac{61 - (18)(24)/6}{\sqrt{(64 - 18^2/6)(110 - 24^2/6)}} = \frac{-11}{\sqrt{(10)(14)}} = -0.93$$
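The correlation coefficient for this small data set can be checked with the computational form r = SPXY / √(SSX · SSY). This is a minimal sketch; the variable names are illustrative.

```python
from math import sqrt

# Data from the correlation example.
x = [1, 2, 3, 3, 4, 5]
y = [6, 5, 5, 4, 2, 2]
n = len(x)

# Corrected sum of products and sums of squares.
sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(xi * xi for xi in x) - sum(x) ** 2 / n
ss_y = sum(yi * yi for yi in y) - sum(y) ** 2 / n

r = sp_xy / sqrt(ss_x * ss_y)

print(f"SPXY = {sp_xy}, SSX = {ss_x}, SSY = {ss_y}, r = {r:.2f}")
```

The negative sign indicates that Y tends to decrease as X increases.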
8.7. Test for the Significance of Correlation Coefficient, ρ
Ho: ρ = 0 vs. Ha: ρ ≠ 0
Test Statistic, $t = r\sqrt{n-2}\,/\sqrt{1-r^2}$
Critical region: all |t| ≥ tα/2, n – 2

Example. Test the significance of r = –0.93 obtained from the n = 6 pairs in 8.6.
a. Ho: ρ = 0
b. Ha: ρ ≠ 0
c. Test Statistic, $t = r\sqrt{n-2}\,/\sqrt{1-r^2}$
d. Critical region: all |t| ≥ tα/2, n – 2 = t.025, 4 = 2.776
e. Computation:
$$t = \frac{-0.93\sqrt{6-2}}{\sqrt{1-(-0.93)^2}} = -5.06$$
f. Decision and Conclusion. Since |–5.06| exceeds 2.776, Ho is rejected.
The correlation coefficient between X and Y differs significantly from zero.
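The test statistic for the correlation coefficient can be evaluated as follows. This is a minimal sketch with illustrative names; it uses the rounded r = –0.93 from the example and the table value 2.776, so it reproduces the hand computation rather than a full-precision result.

```python
from math import sqrt

r = -0.93        # correlation coefficient from the example in 8.6 (rounded)
n = 6            # number of (X, Y) pairs
t_crit = 2.776   # t_.025 with df = 4, from the t-table

# t = r * sqrt(n - 2) / sqrt(1 - r^2); the sign of t follows the sign of r.
t = r * sqrt(n - 2) / sqrt(1 - r**2)
reject = abs(t) >= t_crit

print(f"t = {t:.2f}, reject Ho: {reject}")
```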
8.8 Coefficient of Determination, r²
A number that expresses the proportion of the total variation in the values of the variable Y that can be
accounted for or explained by the linear relationship with the values of the variable X.
When r = 0.6, r² = 0.36; that is, 36% of the total variation of the values of Y in the sample is accounted for by a linear
relationship with the values of X.