STATISTICS AND PROBABILITY

Statistics
- Method of collecting, organizing, presenting, analyzing, and interpreting data
- Tool for decision making
Data and Variable
Data – Refers to any information
- Facts and statistics collected together for reference or analysis.
Universe – A particular sphere of activity, interest, or experience
- Collection (the whole or totality)
Population – Set of all possible values of a variable
- A finite or infinite collection of items under consideration
Variable – A quantity that, during a calculation, is assumed to vary or be capable of varying in value
Qualitative and Quantitative
Qualitative – A variable whose values are descriptive labels (adjectives), such as color, gender, nationality, etc.
- Expressed in categories
Quantitative – A variable that takes numerical values for which arithmetic makes sense.
- Numerical
Discrete Variable – A variable whose data can be counted
- Can take on only certain (countable) values
Continuous Variable – Has an infinite number of possible values; the probability associated with any single particular value of a continuous distribution is zero
Scale or Level of Measurement
Nominal level – Refers to categorical data, such as the name of your school, type, etc.
- Variable that is categorical
- Non-numeric, no sense of ordering
Ordinal level – Categorical data that can be ranked or ordered, but the differences between ranks are not meaningful (e.g., 1st, 2nd, 3rd place)
Interval level – Like ordinal data, except that the intervals between values are equally spaced
- Numerical scales in which intervals have the same interpretation throughout; there is no true zero
Ratio level – A measurement scale with an absolute zero that is meaningful
- Highest level of measurement; a value of 0 means the quantity is absent
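Probability – The measure of the likelihood that an event will occur
- Quantified as a number between 0 and 1
$$P(E) = \frac{N(E)}{N(S)}$$
Sample space (S) – Set of all possible outcomes; N(S) is the number of possible outcomes
Event (E) – The outcome or set of outcomes of interest; N(E) is the number of favorable outcomes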
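Example / Seatwork: Two fair dice are rolled, so N(S) = 36. Find the probability that the sum of the dice is:
A. 7: P = 6/36 = 1/6
B. 4: P = 3/36 = 1/12
C. 5: P = 4/36 = 1/9
D. 10: P = 3/36 = 1/12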
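Here is a minimal Python sketch, not part of the original notes, that checks the seatwork by brute force. It lists all 36 equally likely outcomes of two fair dice and applies P(E) = N(E)/N(S) with exact fractions. The helper name `prob_sum` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 ordered outcomes of rolling two fair dice
SAMPLE_SPACE = list(product(range(1, 7), repeat=2))

def prob_sum(target):
    """P(E) = N(E) / N(S), where E = 'the two dice add up to target' (hypothetical helper)."""
    n_event = sum(1 for a, b in SAMPLE_SPACE if a + b == target)
    return Fraction(n_event, len(SAMPLE_SPACE))

for target in (7, 4, 5, 10):
    print(target, prob_sum(target))
```

The fractions print in lowest terms, so they can be compared directly with answers A to D.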
Random Variables
A random variable, usually written X, is a variable whose possible values are numerical
outcomes of a random phenomenon. As a function, a random variable is required to be
measurable, which rules out certain pathological cases in which the quantity the
random variable returns is infinitely sensitive to small changes in the outcome.
Example: Suppose two coins are tossed, so the sample space is S = {HH, HT, TH, TT}.
Let X represent the number of heads. Each sample point is associated with a value of X, as shown in the table below:
Outcome X P(x)
HH 2 ¼
HT 1 ¼
TH 1 ¼
TT 0 ¼
Hence, the random variable X takes the values 0, 1, 2 for this random experiment.
As the example shows, X takes only a finite number of values, and each value has an associated probability: P(X = 0) = 1/4, P(X = 1) = 2/4 = 1/2, and P(X = 2) = 1/4.
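The same idea can be expressed in code. This illustrative sketch maps each outcome of two coin tosses to X = number of heads and adds the probabilities of outcomes that share a value of X. The variable names are illustrative assumptions, not taken from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins: HH, HT, TH, TT
outcomes = ["".join(p) for p in product("HT", repeat=2)]

# Random variable X = number of heads; each sample point maps to one value of X
x_values = {o: o.count("H") for o in outcomes}

# Every outcome has probability 1/4, so P(X = x) = (number of outcomes with x heads) / 4
counts = Counter(x_values.values())
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(distribution)
```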
Two Types of Random Variable

A. Discrete Random Variable – One which may take only a countable number of
distinct values, such as 0, 1, 2, 3, 4, … Discrete random variables are usually (but not
necessarily) counts.
Examples of discrete random variables include the number of children in a family, the
Friday night attendance at a cinema, and the number of patients in a doctor's surgery.
The Probability Distribution of a discrete random variable is a list of probabilities
associated with each of its possible values. It is also sometimes called the probability
function or the probability mass function.
Example: Let X represent the sum of two dice.
Then the probability distribution of X is as follows:
X | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
P(x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36
[Figure: bar graph of the probability distribution of X, with P(x) in 36ths on the vertical axis and X = 2 to 12 on the horizontal axis]
B. Continuous Random Variable – One which takes an infinite number of possible values.
Continuous random variables are usually measurements. Examples are height,
weight, the amount of sugar in an orange, and the time required to run a mile.
Example: The temperature of the patients in a particular clinic lies between 37ºC and 40ºC.
We write this as X = {x | 37 ≤ x ≤ 40}.
Probability Mass Function (PMF) is a function that gives the probability that a discrete
random variable is exactly equal to some value. The probability mass function is often
the primary means of defining a discrete probability distribution, and such functions exist
for either scalar or multivariate random variables whose domain is discrete.
Each probability in a probability mass function lies between 0 and 1, and together the probabilities sum to one: ΣP(x) = 1.
Mean and Variance of Discrete Random Variable

The mean of a discrete random variable X is a weighted average of the possible values
that the random variable can take. Unlike the sample mean of a group of observations,
which gives each observation equal weight, the mean of a random variable weights each
outcome xi according to its probability, pi. The common symbol for the mean (also
known as the expected value of X, E(X)) is µ, formally defined by:
$$\mu = \sum x \, P(x)$$
The variance of a discrete random variable X measures the spread, or variability, of the
distribution, and is defined by:
$$\sigma^2 = \sum (x - \mu)^2 \, P(x)$$
Example 1:
X | 0 | 1 | 2 | 3 | 4
P(x) | 0.10 | 0.20 | 0.30 | 0.25 | 0.15
Mean:
$$\mu = \sum x \, P(x)$$
$$\mu = (0)(0.10) + (1)(0.20) + (2)(0.30) + (3)(0.25) + (4)(0.15)$$
$$\mu = 0 + 0.2 + 0.6 + 0.75 + 0.6 = 2.15$$
Variance:
$$\sigma^2 = \sum (x - \mu)^2 \, P(x)$$
$$\sigma^2 = (0-2.15)^2(0.10) + (1-2.15)^2(0.20) + (2-2.15)^2(0.30) + (3-2.15)^2(0.25) + (4-2.15)^2(0.15)$$
$$\sigma^2 = 0.46225 + 0.2645 + 0.00675 + 0.180625 + 0.513375 = 1.4275$$
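Here is a small Python sketch of the two formulas above. It assumes the distribution is given as parallel lists of values and probabilities; the function name `discrete_mean_variance` is hypothetical. It also checks that the probabilities sum to one, as a PMF must.

```python
import math

def discrete_mean_variance(xs, ps):
    """Mean mu = sum x*P(x) and variance sigma^2 = sum (x - mu)^2 * P(x) (hypothetical helper)."""
    # A valid probability distribution must sum to 1
    if not math.isclose(sum(ps), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
    return mu, var

# Example 1 from the notes
mu, var = discrete_mean_variance([0, 1, 2, 3, 4], [0.10, 0.20, 0.30, 0.25, 0.15])
print(round(mu, 4), round(var, 4))  # 2.15 1.4275
```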
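Example 2:
x | f | P(x)
0 | 1 | 1/18
1 | 6 | 6/18 or 1/3
2 | 8 | 8/18 or 4/9
3 | 3 | 3/18 or 1/6
Total | 18 | 1
Mean:
$$\mu = \sum x \, P(x) = (0)\left(\tfrac{1}{18}\right) + (1)\left(\tfrac{1}{3}\right) + (2)\left(\tfrac{4}{9}\right) + (3)\left(\tfrac{1}{6}\right)$$
$$\mu = 0 + \tfrac{1}{3} + \tfrac{8}{9} + \tfrac{1}{2} = \tfrac{31}{18} = 1\tfrac{13}{18} \approx 1.7222$$
Variance:
$$\sigma^2 = \sum (x - \mu)^2 \, P(x) = \left(0 - \tfrac{31}{18}\right)^2\left(\tfrac{1}{18}\right) + \left(1 - \tfrac{31}{18}\right)^2\left(\tfrac{1}{3}\right) + \left(2 - \tfrac{31}{18}\right)^2\left(\tfrac{4}{9}\right) + \left(3 - \tfrac{31}{18}\right)^2\left(\tfrac{1}{6}\right)$$
$$\sigma^2 = \tfrac{961}{5832} + \tfrac{169}{972} + \tfrac{25}{729} + \tfrac{529}{1944} = \tfrac{209}{324} \approx 0.6451$$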
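This sketch double-checks the fraction arithmetic. It converts the frequencies to exact probabilities with Python's `fractions.Fraction` and recomputes µ and σ². The list names are illustrative.

```python
from fractions import Fraction

xs = [0, 1, 2, 3]
freqs = [1, 6, 8, 3]          # frequencies f from the table
total = sum(freqs)            # 18

# P(x) = f / total, kept as exact fractions
ps = [Fraction(f, total) for f in freqs]

mu = sum(x * p for x, p in zip(xs, ps))
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
print(mu, round(float(mu), 4))    # 31/18 1.7222
print(var, round(float(var), 4))  # 209/324 0.6451
```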
Normal distribution
A normal distribution is an arrangement of a data set in which most values cluster in
the middle of the range and the rest taper off symmetrically toward either extreme.
- The normal distribution is a mathematical model represented by a bell-shaped
curve which is symmetric with respect to the mean.
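- The normal curve never intersects or touches the horizontal axis; it approaches the axis as x moves away from the mean in either direction.
- The mean, median, and mode of the normal distribution are equal: mean = median = mode.
- The total area under the normal curve is equal to 1, or 100%.
- The standard normal distribution has a mean of 0 and a standard deviation of 1.
Formula and notation
$$z = \frac{x - \mu}{\sigma}$$
where x is a raw score, µ is the mean, σ is the standard deviation, and z is the number of standard deviations that x lies from the mean.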
EXAMPLES (areas under the standard normal curve):
1. From 0 to 1.35 = 0.4115 or 41.15%
2. From 0 to -2.01 = 0.4778 or 47.78%
3. Above 1.37 = 0.0853 or 8.53% (0.5 - 0.4147 = 0.0853)
4. Above -1.12 = 0.8686 or 86.86% (0.5 + 0.3686 = 0.8686)
5. Below -2 = 0.0228 or 2.28% (0.5 - 0.4772 = 0.0228)
6. Below 1 = 0.8413 or 84.13% (0.5 + 0.3413 = 0.8413)
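The table areas above can be reproduced with the error function, assuming a standard normal z-table is being used. The cumulative area to the left of z is Φ(z) = ½[1 + erf(z/√2)]. The helper names `phi` and `area_zero_to` are hypothetical. The results agree with the four-decimal table values.

```python
import math

def phi(z):
    """Standard normal cumulative area to the left of z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_zero_to(z):
    """Area between 0 and z, which is what a 'from 0 to z' z-table lists."""
    return abs(phi(z) - 0.5)

print(round(area_zero_to(1.35), 4))   # 1. from 0 to 1.35
print(round(area_zero_to(-2.01), 4))  # 2. from 0 to -2.01
print(round(1 - phi(1.37), 4))        # 3. above 1.37
print(round(1 - phi(-1.12), 4))       # 4. above -1.12
print(round(phi(-2), 4))              # 5. below -2
print(round(phi(1), 4))               # 6. below 1
```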
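SEATWORK (find the area under the normal curve; the computations below use µ = 50 and σ = 30):
A. 10 to 50
$$z = \frac{10 - 50}{30} = -1.33$$
A = 0.4082 or 40.82%
B. 25 to 50
$$z = \frac{25 - 50}{30} = -0.83$$
A = 0.2967 or 29.67%
C. 32 to 50
D. 50 to 65
$$z = \frac{65 - 50}{30} = 0.5$$
A = 0.1915 or 19.15%
E. 50 to 78
$$z = \frac{78 - 50}{30} = 0.93$$
A = 0.3238 or 32.38%
F. Below 45
$$z = \frac{45 - 50}{30} = -0.17$$
A = 0.5 - 0.0675
A = 0.4325 or 43.25%
G. Below 38
$$z = \frac{38 - 50}{30} = -0.4$$
Table area (0 to 0.4) = 0.1554
A = 0.5 - 0.1554
A = 0.3446 or 34.46%
H. Above 55
$$z = \frac{55 - 50}{30} = 0.17$$
A = 0.5 - 0.0675
A = 0.4325 or 43.25%
I. Above 48
$$z = \frac{48 - 50}{30} = -0.07$$
A = 0.5 + 0.0279
A = 0.5279 or 52.79%
J. Between 20 and 80
K. Between 27 and 47
L. Between 61 and 70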
SAMPLING AND SAMPLING DISTRIBUTION
A sampling distribution is a probability distribution of a statistic obtained through a large
number of samples drawn from a specific population. The sampling distribution of a
given population is the distribution of frequencies of a range of different outcomes that
could possibly occur for a statistic of a population. Sampling is the process of choosing a
representative sample from a target population and collecting data from that sample in
order to understand something about the population as a whole.
Example number 1:
x (Sample) | (x - x̄) | (x - x̄)²
3 | -2.5 | 6.25
4 | -1.5 | 2.25
5 | -0.5 | 0.25
6 | 0.5 | 0.25
7 | 1.5 | 2.25
8 | 2.5 | 6.25
N = 6, Σx = 33, Σ(x - x̄)² = 17.5
• A sampling distribution of sample means is a theoretical distribution of the values that
the mean of a sample takes on in all of the possible samples of a specific size that can be
made from a given population.
Formula: Mean (x̄)
$$\bar{x} = \frac{\sum x}{N} = \frac{33}{6} = 5.5$$
• Population variance (σ²) tells us how data points in a specific population are spread out.
It is the average of the squared distances from each data point in the population to the mean.
Formula:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{17.5}{6} = 2.9167$$
• The standard deviation of a population gives researchers the amount of dispersion of
data for an entire population of survey respondents. A population standard deviation
represents a parameter, not a statistic. Parameters are numerical properties of a
population. A statistic, conversely, is a number computed from sample data.
Researchers use statistics to estimate parameters.
Formula:
$$\sigma = \sqrt{\sigma^2} = \sqrt{2.9167} = 1.7078$$
• The sample variance, s², is used to calculate how varied a sample is. Sample
variance is defined as the variance estimated from a sample. Recall that a sample
is a collection of data taken from the population, which may contain a very large amount of data.
Formula:
$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1} = \frac{17.5}{5} = 3.5$$
• A sample standard deviation estimates the standard deviation of a population based
on a random sample. The sample standard deviation, unlike the population standard
deviation, is a statistic that measures the dispersion of the data around the sample mean.
In statistics, "mean" is the average of a set of numbers; to obtain the mean,
researchers add together a list of numbers and divide the total by how many numbers
are on the list. To calculate the sample standard deviation, researchers divide the sum of the
squared deviations by the number of data values minus 1, then take the square root.
Formula:
$$s = \sqrt{s^2} = \sqrt{3.5} = 1.8708$$
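This Python sketch shows the difference between dividing by N and by N - 1, using the data from Example number 1. The helper `spread` is a hypothetical name. The last two lines cross-check the results against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

def spread(data):
    """Return (mean, population variance, population SD, sample variance, sample SD)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    pop_var = ss / n                          # divide by N
    samp_var = ss / (n - 1)                   # divide by N - 1
    return mean, pop_var, math.sqrt(pop_var), samp_var, math.sqrt(samp_var)

data = [3, 4, 5, 6, 7, 8]   # Example number 1
mean, pvar, psd, svar, ssd = spread(data)
print(mean, round(pvar, 4), round(psd, 4), svar, round(ssd, 4))

# Cross-check against the standard library
assert math.isclose(pvar, statistics.pvariance(data))
assert math.isclose(svar, statistics.variance(data))
```

Calling `spread(list(range(1, 11)))` reproduces the numbers in Example 2 below.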
Example 2.
x | x - x̄ | (x - x̄)²
1 | -4.5 | 20.25
2 | -3.5 | 12.25
3 | -2.5 | 6.25
4 | -1.5 | 2.25
5 | -0.5 | 0.25
6 | 0.5 | 0.25
7 | 1.5 | 2.25
8 | 2.5 | 6.25
9 | 3.5 | 12.25
10 | 4.5 | 20.25
N = 10, Σx = 55, Σ(x - x̄)² = 82.5
Mean:
$$\bar{x} = \frac{\sum x}{N} = \frac{55}{10} = 5.5$$
Population Variance:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{82.5}{10} = 8.25$$
Population Standard Deviation:
$$\sigma = \sqrt{\sigma^2} = \sqrt{8.25} = 2.8723$$
Sample Variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1} = \frac{82.5}{9} = 9.1667$$
Sample Standard Deviation:
$$s = \sqrt{s^2} = \sqrt{9.1667} = 3.0277$$
Example 3.
x (Sample) | (x - x̄) | (x - x̄)²
11 | -4.5 | 20.25
12 | -3.5 | 12.25
13 | -2.5 | 6.25
14 | -1.5 | 2.25
15 | -0.5 | 0.25
16 | 0.5 | 0.25
17 | 1.5 | 2.25
18 | 2.5 | 6.25
19 | 3.5 | 12.25
20 | 4.5 | 20.25
N = 10, Σx = 155, Σ(x - x̄)² = 82.5
Mean:
$$\bar{x} = \frac{\sum x}{N} = \frac{155}{10} = 15.5$$
Population Variance:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{82.5}{10} = 8.25$$
Population S.D.:
$$\sigma = \sqrt{\sigma^2} = \sqrt{8.25} = 2.8723$$
Sample Variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1} = \frac{82.5}{10 - 1} = 9.1667$$
Sample S.D.:
$$s = \sqrt{s^2} = \sqrt{9.1667} = 3.0277$$
Practice Seatwork:
1.
x | x - x̄ | (x - x̄)²
6 | -1.5 | 2.25
7 | -0.5 | 0.25
8 | 0.5 | 0.25
9 | 1.5 | 2.25
n = 4, Σx = 30, Σ(x - x̄)² = 5
Mean:
$$\bar{x} = \frac{\sum x}{N} = \frac{30}{4} = 7.5$$
Population Variance:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{5}{4} = 1.25$$
Population S.D.:
$$\sigma = \sqrt{\sigma^2} = \sqrt{1.25} = 1.1180$$
Sample Variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1} = \frac{5}{4 - 1} = 1.6667$$
Sample S.D.:
$$s = \sqrt{s^2} = \sqrt{1.6667} = 1.2910$$
2.
x | x - x̄ | (x - x̄)²
1 | -2 | 4
2 | -1 | 1
3 | 0 | 0
4 | 1 | 1
5 | 2 | 4
n = 5, Σx = 15, Σ(x - x̄)² = 10
Mean:
$$\bar{x} = \frac{15}{5} = 3$$
Population Variance:
$$\sigma^2 = \frac{10}{5} = 2$$
Population S.D.:
$$\sigma = \sqrt{2} = 1.4142$$
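3.
Population = 10, 11, 12, 13
Sample size n = 2 (all 4² = 16 ordered samples, drawn with replacement)
Sample | x̄
10, 10 | 10
10, 11 | 10.5
10, 12 | 11
10, 13 | 11.5
11, 10 | 10.5
11, 11 | 11
11, 12 | 11.5
11, 13 | 12
12, 10 | 11
12, 11 | 11.5
12, 12 | 12
12, 13 | 12.5
13, 10 | 11.5
13, 11 | 12
13, 12 | 12.5
13, 13 | 13
Sampling distribution of the sample means:
x̄ | f | p(x̄) | x̄·p(x̄) | x̄ - µx̄ | (x̄ - µx̄)² | (x̄ - µx̄)²·p(x̄)
10 | 1 | 1/16 | 10/16 | -1.5 | 2.25 | 2.25/16
10.5 | 2 | 2/16 | 21/16 | -1 | 1 | 2/16
11 | 3 | 3/16 | 33/16 | -0.5 | 0.25 | 0.75/16
11.5 | 4 | 4/16 | 46/16 | 0 | 0 | 0
12 | 3 | 3/16 | 36/16 | 0.5 | 0.25 | 0.75/16
12.5 | 2 | 2/16 | 25/16 | 1 | 1 | 2/16
13 | 1 | 1/16 | 13/16 | 1.5 | 2.25 | 2.25/16
Total | 16 | 1 | 184/16 = 11.5 | | | 10/16 = 0.625
$$\mu_{\bar{x}} = \sum \bar{x} \, p(\bar{x}) = 11.5$$
$$\sigma_{\bar{x}}^2 = \sum (\bar{x} - \mu_{\bar{x}})^2 \, p(\bar{x}) = 0.625$$
$$\sigma_{\bar{x}} = \sqrt{\sigma_{\bar{x}}^2} = \sqrt{0.625} = 0.7906$$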
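The sampling distribution in Seatwork 3 can be generated by enumeration. Like the table, this sketch assumes that all 16 ordered samples of size 2 are drawn with replacement. It recovers µx̄ = 11.5 and σx̄² = 0.625, which also equals σ²/n = 1.25/2. The variable names are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

population = [10, 11, 12, 13]
n = 2

# All ordered samples of size n drawn WITH replacement: 4 ** 2 = 16 samples
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Frequency f of each sample mean and its probability p = f / 16
freq = Counter(means)
dist = {m: Fraction(f, len(samples)) for m, f in sorted(freq.items())}

mu_xbar = sum(m * p for m, p in dist.items())                    # mean of the sample means
var_xbar = sum((m - mu_xbar) ** 2 * p for m, p in dist.items())  # variance of the sample means
sd_xbar = math.sqrt(float(var_xbar))

for m, p in dist.items():
    print(float(m), freq[m], p)
print(float(mu_xbar), float(var_xbar), round(sd_xbar, 4))
```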
CENTRAL LIMIT THEOREM

The central limit theorem (CLT) states that, given a sufficiently large sample size from a
population with a finite variance, the sampling distribution of the sample mean is
approximately normal, with mean equal to the population mean µ and standard deviation
σ/√n. In particular, the mean of all sample means from the same population equals the
mean of the population.
FORMULA:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Example 1.
1. µ = 80, σ = 5, n = 20; find P(x̄ > 83)
$$z = \frac{83 - 80}{5 / \sqrt{20}} = 2.68$$
Area from 0 to z = 2.68: A = 0.4963
A = 0.5 - 0.4963
A = 0.0037 or 0.37%
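2. µ = 40000, σ = 4000, n = 20; find P(38000 < x̄ < 41500)
$$z = \frac{38000 - 40000}{4000 / \sqrt{20}} = -2.24$$
A = 0.4875
$$z = \frac{41500 - 40000}{4000 / \sqrt{20}} = 1.68$$
A = 0.4535
A = 0.4875 + 0.4535 (the two values lie on opposite sides of the mean, so the areas are added)
A = 0.941 or 94.1%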
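3. µ = 100, σ = 9, n = 20; find P(95 < x̄ < 97)
Solution:
$$z = \frac{95 - 100}{9 / \sqrt{20}} = -2.48$$
A = 0.4934
$$z = \frac{97 - 100}{9 / \sqrt{20}} = -1.49$$
A = 0.4319
A = 0.4934 - 0.4319 (both values lie below the mean, so the smaller area is subtracted)
A = 0.0615 or 6.15%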
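This sketch cross-checks the CLT examples. It models x̄ as normal with mean µ and standard deviation σ/√n, using Python's `statistics.NormalDist` (Python 3.8+). The function name is hypothetical. The notes round z to two decimals before reading the table, so the printed values can differ from the table answers in the fourth decimal.

```python
import math
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low=-math.inf, high=math.inf):
    """P(low < xbar < high), treating xbar as normal with mean mu and SD sigma / sqrt(n) (CLT)."""
    sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

p1 = prob_mean_between(80, 5, 20, low=83)              # Example 1: P(xbar > 83)
p2 = prob_mean_between(40000, 4000, 20, 38000, 41500)  # Example 2
p3 = prob_mean_between(100, 9, 20, 95, 97)             # Example 3
print(round(p1, 4), round(p2, 4), round(p3, 4))
```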
Assignment:
1. µ = 45, σ = 15, n = 10; find P(36 < x̄ < 50)
$$z = \frac{36 - 45}{15 / \sqrt{10}} = -1.90 \qquad z = \frac{50 - 45}{15 / \sqrt{10}} = 1.05$$
A = 0.4713 and A = 0.3531
A = 0.4713 + 0.3531
A = 0.8244 or 82.44%
2. µ = 496, σ = 20, n = 25; find P(x̄ > 485)
$$z = \frac{485 - 496}{20 / \sqrt{25}} = -2.75$$
A = 0.4970
A = 0.5 + 0.4970
A = 0.9970 or 99.70%
ESTIMATION OF POPULATION PARAMETER

Estimation deals with generalizations about population parameters that are made from
sample data in inferential statistics. These generalizations can be made in the form of
hypothesis tests or confidence interval estimates.
Each is made with a stated degree of certainty (and, therefore, of uncertainty).
Two forms of Estimation
Point Estimate – A single number used to estimate a parameter of a population.
Interval Estimate – A range of values, bounded by a lower and an upper limit, used to estimate the population parameter.
Level of Confidence (1 - α)
It is the probability that the constructed interval contains the parameter. The
common confidence levels are:
Level of Confidence | z-Score
90% or 0.90 | 1.645
95% or 0.95 | 1.96
99% or 0.99 | 2.576
CONFIDENCE INTERVAL ESTIMATOR

Population variance is known (population standard deviation σ is given):
x̄ – Point estimate and center of the confidence interval
$z_{\alpha/2}$ – Multiple of the standard error used to formulate the interval estimate for the level of confidence (1 - α)
$\sigma / \sqrt{n}$ – Standard error of the mean, i.e., the standard deviation of the sampling distribution of the sample means
$z_{\alpha/2}\left(\sigma / \sqrt{n}\right)$ – Product of the confidence coefficient and the standard error; the maximum error of estimate
$\bar{x} - z_{\alpha/2}\left(\sigma / \sqrt{n}\right)$ – Lower confidence limit
$\bar{x} + z_{\alpha/2}\left(\sigma / \sqrt{n}\right)$ – Upper confidence limit of the confidence interval
STEPS IN CONSTRUCTING THE CONFIDENCE INTERVAL

1. Determine the contents of the confidence interval
• Check the assumptions
• Identify the probability distribution and formula to use
• State the level of confidence
2. Write down the sample information
3. Specify the confidence interval
• Confidence coefficient
• Maximum error of estimate
Distinction between t-value and z-value
If the population standard deviation is not known (and the sample is small), the t-distribution can be
used. If the population standard deviation is known, the normal distribution and z-score
may be used.
Confidence interval estimates for population proportion

• Population Proportion – The fraction or part of the population that has a certain trait or
characteristic in a binomial experiment, denoted by p; the value of the parameter p is
usually unknown.
• Sample Proportion – The fraction or part of the sample that has the characteristic,
denoted by p̂.
• Point Proportion – A single number used to estimate the population proportion.
• Interval Proportion – An interval of values used to estimate the population proportion.
• Confidence Level – The most commonly used are 0.90, 0.95, and 0.99.
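Formula for estimation of Population Parameter
$$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
Example 1.
n = 60, x̄ = 24500, σ = 2350
Level of confidence (1 - α) = 95%, so $z_{\alpha/2}$ = 1.96
$$24500 - (1.96)\left(\frac{2350}{\sqrt{60}}\right) < \mu < 24500 + (1.96)\left(\frac{2350}{\sqrt{60}}\right)$$
= 23905.3680 to 25094.6320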
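Example 2.
n = 132, x̄ = 30, σ = 9.45
Level of confidence (1 - α) = 99%, so $z_{\alpha/2}$ = 2.576
$$30 - (2.576)\left(\frac{9.45}{\sqrt{132}}\right) < \mu < 30 + (2.576)\left(\frac{9.45}{\sqrt{132}}\right)$$
= 27.8812 to 32.1188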
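Here is a minimal sketch of the interval formula. It assumes σ is known, or that n is large enough to use s in its place. `z_interval` and `z_critical` are hypothetical helper names. `z_critical` derives z_(α/2) from `statistics.NormalDist().inv_cdf`, which gives 1.960 for 95% and 2.576 for 99%.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, z):
    """Confidence interval xbar +/- z * sigma / sqrt(n) when sigma is known (or n is large)."""
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error          # maximum error of estimate
    return xbar - margin, xbar + margin

def z_critical(confidence):
    """Two-sided critical value z_(alpha/2) for a confidence level, e.g. 0.95 -> about 1.96."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Example 1 (95%, table value 1.96) and Example 2 (99%, table value 2.576)
print(z_interval(24500, 2350, 60, 1.96))
print(z_interval(30, 9.45, 132, 2.576))
print(round(z_critical(0.95), 3), round(z_critical(0.99), 3))
```

Passing the table value for `z` reproduces the hand calculation exactly. Using `z_critical` instead gives unrounded critical values.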
STEPS IN CONSTRUCTING THE CONFIDENCE INTERVAL

1. Check the assumptions
2. Assumptions for the Confidence Interval
a. The probability formula to be used is considered
b. The level of confidence 1 - α
3. Sample Information
a. Confidence coefficient $z_{\alpha/2}$
b. Standard error $\sigma / \sqrt{n}$; multiplied by the confidence coefficient, it gives the maximum error of estimation
c. Upper and lower confidence limits
4. Check the result.
Example 3.
A survey of 200 entering freshmen at WKU found that the average number of
credit hours enrolled was 16.58, with a sample standard deviation of 2.46. Find a 90% confidence
interval for the average number of hours enrolled for all freshmen.
1. Population: freshmen at WKU
2. Assumptions for the confidence interval
a. The standard normal z-distribution is used to determine the confidence
coefficient; since n = 200 is large, s = 2.46 is used for σ.
b. The 90% confidence level (1 - α = 0.90) is used.
3. Sample information: n = 200, x̄ = 16.58
a. $z_{\alpha/2}$ = 1.645
b. Standard error:
$$\frac{\sigma}{\sqrt{n}} = \frac{2.46}{\sqrt{200}} = 0.1739$$
c. Upper and lower confidence limits:
$$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
$$16.58 - (1.645)(0.1739) < \mu < 16.58 + (1.645)(0.1739)$$
= 16.2939 to 16.8661
4. Check the result
Therefore, 16.29 to 16.87 is the 90% confidence interval for the parameter µ.
Example 4.
A random sample of 100 ACT scores of students applying to Western yields
x̄ = 21.2 with a sample standard deviation of 4.46. Find a 90% confidence interval for the
true mean ACT score of all applicants.
1. Population: ACT scores of students applying to Western
2. Assumptions for the confidence interval
a. The standard normal z-distribution is used to determine the confidence
coefficient; since n = 100 is large, s = 4.46 is used for σ.
b. The 90% confidence level (1 - α = 0.90) is used.
3. Sample information: n = 100, x̄ = 21.2
a. $z_{\alpha/2}$ = 1.645
b. Standard error:
$$\frac{\sigma}{\sqrt{n}} = \frac{4.46}{\sqrt{100}} = 0.446$$
c. Upper and lower confidence limits:
$$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
$$21.2 - (1.645)(0.446) < \mu < 21.2 + (1.645)(0.446)$$
= 20.4663 to 21.9337
4. Check the result
Therefore, 20.4663 to 21.9337 is the 90% confidence interval for the parameter µ.
Hypothesis Testing
• A statistical procedure for drawing conclusions about the characteristic(s) of a
population from sample data.
• Making decisions or drawing conclusions is a crucial part of the statistical process.
STEPS IN PERFORMING HYPOTHESIS TESTING

1. Formulation of Null and Alternative Hypotheses
2. Select the Level of Significance (α)
3. Determine the Test Statistic to Use
4. Define the Area of Rejection or Critical Region
5. Compute the Value of the Test Statistic
6. Decision: Reject Ho or do not reject Ho, then cite the level of significance used in the
study
7. Interpretation or Conclusion
Step 1: Formulation of Null and Alternative Hypotheses
• Null Hypothesis
- Denoted by the symbol Ho
- Refers to a claim or assertion about a parameter of the population.
- The hypothesis that the test is designed to reject (or fail to reject).
- It is called the "null" hypothesis because it states that there is no effect or no
difference; it is retained unless there is sufficient statistical evidence against it.
• Alternative Hypothesis
- Denoted by the symbol Ha
- An assertion or claim that contradicts the null hypothesis.
Original claim | Equal | Not equal | Greater than | Less than | At least | At most
Symbolic form:
Ho | µ = A | µ ≠ A | µ > A | µ < A | µ ≥ A | µ ≤ A
Ha | µ ≠ A | µ = A | µ ≤ A | µ ≥ A | µ < A | µ > A
Accepted symbolic form:
Ho | µ = A | µ = A | µ ≤ A | µ ≥ A | µ ≥ A | µ ≤ A
Ha | µ ≠ A | µ ≠ A | µ > A | µ < A | µ < A | µ > A
Table 1. The formulation of the symbolic forms of Ho and Ha
Table 2. The equality symbols for Ho and Ha

A. For One-mean Test
If Ho is | Then Ha is | Type of test
µ = A | µ ≠ A (meaning µ can be < A or > A) | Two-tailed (non-directional) test
µ ≤ A | µ > A | One-tailed (directional) test
µ ≥ A | µ < A | One-tailed (directional) test
B. For Two-mean Test
If Ho is | Then Ha is | Type of test
µ1 = µ2 | µ1 ≠ µ2 (meaning µ1 can be > µ2 or < µ2) | Two-tailed (non-directional) test
µ1 ≤ µ2 | µ1 > µ2 | One-tailed (directional) test
µ1 ≥ µ2 | µ1 < µ2 | One-tailed (directional) test
Note:
• Ho and Ha are always complementary (opposite) statements. Switch the symbols of the
two hypotheses if the symbolic form is not in accepted form.
• Always remember that Ho must contain an equal sign (=, ≤, or ≥) in every symbolic form.
• If the claim written as Ho does not contain an equal sign, interchange the Ho and Ha
symbolic forms.
Step 2. Select the level of significance
• Level of Confidence – Degree of assurance (belief) that a particular statistical
statement is correct under specified conditions
- (1 - α)
• Level of Significance (α) – The degree of uncertainty (doubt) about the statistical
statement under the same conditions used to determine the confidence level; it is the
probability of rejecting Ho when Ho is true.
- α can be 0.01, 0.05, or 0.10
Table 3. Types of Error
Decision | Ho is TRUE | Ho is FALSE
Reject Ho | Type I error | Correct decision
Fail to reject Ho | Correct decision | Type II error
Note:
The most commonly used values of α are 0.01 (1%), 0.05 (5%), and 0.10 (10%).
Choosing the 0.01 level of significance means that the researcher is 99% confident and has
a 1% chance of committing a Type I error.
Critical values of z:
α | One-tailed | Two-tailed
0.10 | 1.28 | 1.645
0.05 | 1.645 | 1.96
0.01 | 2.33 | 2.58
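The critical values in this table can be recovered from the inverse of the standard normal CDF. In this sketch, which uses a hypothetical helper `critical_z`, a two-tailed test places α/2 in each tail. The unrounded results 1.282, 2.326, and 2.576 round to the table's 1.28, 2.33, and 2.58.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Critical z: the point with alpha / tails of the standard normal area beyond it."""
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.10, 0.05, 0.01):
    one = critical_z(alpha, tails=1)
    two = critical_z(alpha, tails=2)
    print(alpha, round(one, 3), round(two, 3))
```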
Step 3. Determine the Test Statistic to Use

What type of test are we going to use to test our hypothesis?
General Categories of Statistical Tests

A. Parametric tests are used when the following hold:
- The given data are quantitative and usually measured on an interval or ratio scale.
- Assumptions are made concerning the parameters of the population.
- The sample is randomly selected.
- The population from which the sample was drawn is normally distributed.
If the test concerns means, some parametric tests of choice are:
1. z-test
2. t-test
3. Paired t-test
4. Analysis of Variance (ANOVA)
B. Non-parametric tests are used when the following hold:

- No assumptions are made about the population's parameters.
- The given data are qualitative and usually measured on an ordinal or nominal scale.
- Researchers doubt the validity of the parametric test.
Some commonly used non-parametric tests are:
1. Sign test
2. Wilcoxon signed-rank test
3. Wilcoxon rank-sum test
4. Kruskal-Wallis test
5. Chi-square test
THE Z – TEST
The z-test is a parametric test concerning means (one or two population means). It is
used under the following assumptions:
• The probability distribution of the random variable is normal, and the population SD is
known or assumed.
• If the population SD is unknown, it may be estimated from the sample SD.
• n > 30
a. z-test for One-sample Mean
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
b. z-test for Two Independent Means (σ1 and σ2 known, or n1 > 30 and n2 > 30)
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Note:
If the population SD is unknown, the sample SD can be used. This is acceptable in the
z-test because the sample size (n) is large enough to represent the population.
THE T – TEST
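The t-test is similar to the z-test. It is used under the following assumptions:
• The probability distribution of the random variable is approximately normal, and the
population SD is unknown.
• n < 30
a. t-test for One-sample Mean (df = n - 1)
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
b. t-test between Two Independent Means
Case 1: σ1² = σ2² but unknown (pooled standard deviation s_p)
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Case 2: σ1² ≠ σ2² and unknown
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$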
STEP 4. DEFINE THE AREA OF REJECTION OR THE CRITICAL REGION

• Area of rejection – Also known as the critical region. It is the area under the normal
curve in which the null hypothesis is rejected based on the set condition (decision rule).
• Critical Value (CV) – Separates the area of rejection from the area in which the null
hypothesis is not rejected under the normal curve.
a. For a Two-tailed Test (Non-directional) – The critical regions are the left and right
tails of the normal curve, and there is a negative and a positive critical value.
Decision Rule:
Reject Ho if the computed value is > +CV or < -CV. Otherwise,
do not reject Ho.
b. For a One-tailed Test (Directional) – The critical value is either negative or positive.
• For a left-tailed test (Ha has <)
Decision Rule:
Reject Ho if the computed value is < -CV. Otherwise, do not reject Ho.
• For a right-tailed test (Ha has >)
Decision Rule:
Reject Ho if the computed value is > +CV. Otherwise, do not reject Ho.
STEP 5. COMPUTE FOR THE VALUE OF STATISTICAL TEST

This step is the computation of the test statistic that will be used to test the
hypothesis.
STEP 6. DECISION
Reject Ho or do not reject Ho, then cite the level of significance used in the study.
Format:
Since the computed ___-value (___) is ___ (greater than / less than) the tabular value (___),
therefore, ___ (reject / do not reject) the null hypothesis (Ho) at the ___ level of significance.
STEP 7. INTERPRETATION
Rejection of the null hypothesis leads to the conclusion that the alternative (research)
hypothesis is supported. Conversely, non-rejection of Ho means that there is not
sufficient evidence to support the alternative hypothesis, so the claim stated in Ho is
retained. In addition, the conclusion should be stated in the context of the problem, and
the level of significance and sample size used must be stated.
Format:
(Rejection or Non-rejection) of the null hypothesis (Ho) means that (state the symbolic
form in Step 1 in sentence form; if "Rejection of the Ho," state Ha, and if "Non-rejection
of the Ho," state Ho) based on the sample of (n) using the (0.01, 0.05, 0.10) level of
significance. Therefore, the claim of (who is claiming?) is (true or not true).
Example #1
A Barangay Captain from a certain barangay in Valenzuela City claims that the average
monthly income of families with 5 members in his vicinity is P12,000. But when the
City Statistics Office (CSO) conducted a survey of 100 randomly selected families with 5
members in his barangay, they found that these families have an average monthly income of
only P10,800, with a standard deviation of P1,500. With this information, the CSO asserts that
the claim is not true. Using the 0.05 level of significance, test the claim of the Brgy. Captain.
Let µ = the average monthly income of families with 5 members.

Step 1.
Ho: µ = 12,000 – The average monthly income of families with 5 members is 12,000
Ha: µ ≠ 12,000 – The average monthly income of families with 5 members is not 12,000
Step 2. Level of Significance
α = 0.05
Step 3. Test Statistics

n = 100
z-test; 1 mean; 2-tailed test
Step 4. Define the area of rejection
Critical values: -1.96 and 1.96
Decision Rule:
Reject Ho if the computed value is less than -1.96 or greater than 1.96. Otherwise, do not
reject Ho.
Step 5. Compute the z-value.
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{10{,}800 - 12{,}000}{1{,}500 / \sqrt{100}} = -8$$
Step 6. Decision
Since the computed z-value (-8) is less than the tabular value (-1.96), reject the null
hypothesis (Ho) at the 0.05 level of significance.
Step 7. Interpretation
Rejection of the null hypothesis (Ho) means that the average monthly income of families
with 5 members is not 12,000, based on the sample of 100 using the 0.05 level of significance.
Therefore, the claim of the Brgy. Captain is not true.
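Here is a compact sketch of Steps 4 to 6 for this example: compute z and apply the two-tailed decision rule. The helper names are hypothetical. The value 1.96 is the table value for α = 0.05, entered by hand rather than looked up by the code.

```python
import math

def one_sample_z(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)); the sample SD may stand in for sigma when n > 30."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

def two_tailed_decision(stat, critical):
    """Reject Ho when the statistic falls below -critical or above +critical."""
    return "reject Ho" if abs(stat) > critical else "do not reject Ho"

z = one_sample_z(10800, 12000, 1500, 100)
print(z, two_tailed_decision(z, 1.96))
```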
Example #2
Suppose that the Barangay Captain asserted that the average weekly income of
families with 5 members in his locality is greater than P12,000. Considering the data
gathered by the CSO, what conclusion can be drawn, assuming that the Brgy.
Captain is 99% confident about his claim?
Let µ = the average weekly income of families with 5 members.

Step 1.
Ho: µ ≤ 12,000 – The average weekly income of families with 5 members is at most
12,000
Ha: µ > 12,000 – The average weekly income of families with 5 members is greater than
12,000
Step 2. Level of Significance
α = 0.01
Step 3. Test Statistics
n = 100
z-test; 1 mean; 1-tailed (right-tailed) test
Step 4. Define the area of rejection
Critical value: 2.33
Decision Rule:
Reject Ho if the computed value is greater than 2.33. Otherwise, do not reject Ho.
Step 5. Compute the z-value.
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{10{,}800 - 12{,}000}{1{,}500 / \sqrt{100}} = -8$$
Step 6. Decision
Since the computed z-value (-8) is less than the tabular value (2.33), do not reject
the null hypothesis (Ho) at the 0.01 level of significance.
Step 7. Interpretation
Non-rejection of the null hypothesis (Ho) means that the average weekly income of families
with 5 members is at most 12,000, based on the sample of 100 using the 0.01 level of
significance. Therefore, the claim of the Brgy. Captain is not true.
Example #3
A Physics professor claims that there is no significant difference between the mean
scores obtained by students in the afternoon and morning sessions. If the professor is 95%
confident of his claim, perform the hypothesis test.
Session | Morning | Afternoon
Mean | 85 | 83
SD | 15 | 10
n | 40 | 40
Let µ1 = mean score obtained by the morning session

Let µ2 = mean score obtained by the afternoon session
Step 1.
Ho: µ1 = µ2 – There is no significant difference between the mean scores obtained by
students in the morning and afternoon sessions
Ha: µ1 ≠ µ2 – There is a significant difference between the mean scores obtained by
students in the morning and afternoon sessions
Step 2. Level of Significance
α = 0.05
Step 3. Test Statistics
n1 = 40, n2 = 40
z-test; 2 means; 2-tailed test
Step 4. Define the area of rejection
Critical values: -1.96 and 1.96
Decision Rule:
Reject Ho if the computed value is less than -1.96 or greater than 1.96. Otherwise, do not
reject Ho.
Step 5. Compute the z-value.
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{85 - 83}{\sqrt{\dfrac{15^2}{40} + \dfrac{10^2}{40}}} = 0.7016$$
Step 6. Decision
Since the computed z-value (0.7016) is less than the tabular value (1.96), do
not reject the null hypothesis (Ho) at the 0.05 level of significance.
Step 7. Interpretation
Non-rejection of the null hypothesis (Ho) means that there is no significant difference
between the mean scores obtained by students in the morning (85) and afternoon (83)
sessions, based on samples of 40 each using the 0.05 level of significance. Therefore, the
claim of the professor is true.
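The same two-mean z statistic can be written as a short Python sketch. The function name `two_sample_z` is hypothetical. The sketch assumes independent samples and, as the example does, uses the sample SDs in place of σ1 and σ2.

```python
import math

def two_sample_z(x1, x2, s1, s2, n1, n2):
    """z = (x1 - x2) / sqrt(s1^2/n1 + s2^2/n2) for two independent large samples."""
    return (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

z = two_sample_z(85, 83, 15, 10, 40, 40)
print(round(z, 4), "reject Ho" if abs(z) > 1.96 else "do not reject Ho")
```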
Example #4
The mean score obtained by OLFU students in the entrance examination is 87. A group of 25
freshmen scored an average of 85 with a standard deviation of 5. Based on the
result, the admission office asserts that the group's average score is lower than 87. If you
were one of those students, would you agree?
Make the necessary statistical analysis to support your answer, using the 0.05 level of
significance.
Let µ = the mean entrance examination score of the group of freshmen.
Step 1.
Ho: µ ≥ 87 – The average score is at least 87
Ha: µ < 87 – The average score is lower than 87
Step 2. Level of Significance
α = 0.05
Step 3. Test Statistics
n = 25
t-test; 1 mean; 1-tailed (left-tailed) test
df = n - 1 = 24
Step 4. Define the area of rejection
Critical value: -1.711
Decision Rule:
Reject Ho if the computed value is less than -1.711. Otherwise, do not reject Ho.
Step 5. Compute the t-value.
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{85 - 87}{5 / \sqrt{25}} = -2$$
Step 6. Decision
Since the computed t-value (-2) is less than the tabular value (-1.711), reject the null
hypothesis (Ho) at the 0.05 level of significance.
Step 7. Interpretation
Rejection of the null hypothesis (Ho) means that the mean score of the group is lower than
87, based on the sample of 25 using the 0.05 level of significance.
Therefore, the claim of the admission office is true.
Example #5
A rice dealer claims that the average mass of his sacks of rice is 50 kg. A sample of 20
sacks was taken and found to have a mean mass of 49.3 kg with a standard deviation of 1
kg. Are you going to agree with the claim of the dealer? (Use the 0.05 level of significance.)
Let µ = the average mass of the rice dealer's sacks of rice.
Step 1.
Ho: µ = 50 kg – The average mass of a sack of rice is 50 kg
Ha: µ ≠ 50 kg – The average mass of a sack of rice is not 50 kg
Step 2. Level of Significance
α = 0.05
Step 3. Test Statistics
n = 20
t-test; 1 mean; 2-tailed test
df = n - 1 = 20 - 1 = 19
Step 4. Define the area of rejection
Critical values: -2.093 and 2.093
Decision Rule:
Reject Ho if the computed value is less than -2.093 or greater than 2.093. Otherwise, do not
reject Ho.
Step 5. Compute the t-value.
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{49.3 - 50}{1 / \sqrt{20}} = -3.1305$$
Step 6. Decision
Since the computed t-value (-3.1305) is less than the tabular value (-2.093), reject the
null hypothesis (Ho) at the 0.05 level of significance.
Step 7. Interpretation
Rejection of the null hypothesis (Ho) means that the average mass of the rice dealer's sacks
of rice is not 50 kg, based on the sample of 20 using the 0.05 level of significance. Therefore,
the claim of the rice dealer is not true.
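Here is a sketch of the one-sample t statistic used in Examples #4 and #5. The helper name `one_sample_t` is hypothetical. Python's standard library has no t-distribution, so the critical values below are the t-table values quoted in the notes, entered by hand.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """Return t = (xbar - mu0) / (s / sqrt(n)) and its degrees of freedom n - 1."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Example #5: two-tailed, alpha = 0.05, df = 19 -> table value 2.093 (from the notes)
t5, df5 = one_sample_t(49.3, 50, 1, 20)
print(round(t5, 4), df5, "reject Ho" if abs(t5) > 2.093 else "do not reject Ho")

# Example #4: left-tailed, alpha = 0.05, df = 24 -> table value -1.711 (from the notes)
t4, df4 = one_sample_t(85, 87, 5, 25)
print(t4, df4, "reject Ho" if t4 < -1.711 else "do not reject Ho")
```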
Example #6
A group of nursing student selected two brands of pain reliever and test the average time
of each to take effect. For each brand the following was determined.
Brand x s n
A 5.2 mins 1 min 15 trials
B 4.7 mins 1.6 mins 14 trials
Assume unequal variances (σ 12≠ σ 22) and unknown. Test the hypothesis that there is no
significant difference between the average time of brand A and B to take effect, using
0.01 as the level of significance.
Let µ1 = the average time of brand A to take effect
Let µ2 = the average time of brand B to take effect
Step 1.
Ho: µ1 = µ2 – There is no significant difference between the time of brand average A and
B to take effect
Ha: µ1 ≠ µ2 - There is significant difference between the time of brand average A and B to
take effect
α = 0.01
n1 = 15 n2 = 14
n-1 = 13
3.012
-3.012 3.012
Decision Rule:
reject Ho.
x́ 1−x́ 2
t= s21 s 22
Sp
√ +
n1 n2
5.2−4.7
t= 12 1.6 2
√ +
15 14
t = 1.0010
Step 6. Decision
Since the computed t-value (1.0010) is less than the tabular value (3.012), do not reject
the null hypothesis (Ho) at the 0.01 level of significance.
Non-rejection of the null hypothesis (Ho) means that there is no significant difference
between the average times of brands A and B to take effect (5.2 mins and 4.7 mins), based
on samples of 15 and 14 trials at the 0.01 level of significance. Therefore, the data do
not show that one brand takes effect faster than the other.
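The computation in Example #6 can be sketched in Python from the summary statistics alone. The critical value 3.012 is copied from the notes' t-table (df = smaller n - 1 = 13). For comparison only, the sketch also computes the Welch-Satterthwaite degrees of freedom, a different and commonly used df approximation that is not the rule these notes follow; treat it as an illustration.

```python
import math

# Summary statistics for the two pain-reliever brands
x1, s1, n1 = 5.2, 1.0, 15   # brand A: mean (mins), SD (mins), trials
x2, s2, n2 = 4.7, 1.6, 14   # brand B

# Unpooled t statistic: (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)
v1, v2 = s1**2 / n1, s2**2 / n2
t = (x1 - x2) / math.sqrt(v1 + v2)

# df rule used in the notes: smaller sample size minus 1
df = min(n1, n2) - 1
t_crit = 3.012  # two-tailed, α = 0.01, df = 13 (from the t-table)
reject_h0 = abs(t) > t_crit

# Welch-Satterthwaite df, shown only for comparison
welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t = {t:.4f}, df = {df}, Welch df ≈ {welch_df:.1f}, reject Ho: {reject_h0}")
# t = 1.0010, df = 13, Welch df ≈ 21.5, reject Ho: False
```

With either df choice, t ≈ 1.0 is well inside the non-rejection region, so the conclusion does not change.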
Pearson Product-Moment Correlation (r)

- The most familiar statistical tool for quantifying the linear relationship between two
random variables x and y.
- The data are parametric (numerical measurements describing a characteristic of the
sample).
Formula:
$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$
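The computational formula translates directly into code. The sketch below is a minimal, hypothetical helper (`pearson_r` is not from the notes) that builds the same column sums as the worksheet tables (Σx, Σy, Σx², Σy², Σxy) and then applies the formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson r via r = (NΣxy - ΣxΣy) / √([NΣx² - (Σx)²][NΣy² - (Σy)²])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy column
    sum_x2 = sum(x * x for x in xs)               # Σx² column
    sum_y2 = sum(y * y for y in ys)               # Σy² column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0 (perfect positive line)
```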
Steps in solving correlation:
1. State the null hypothesis (H0) and the alternative hypothesis (Ha).
2. Determine the tabular value (TV), df = n-2
3. Determine the computed value (CV).
4. State the conclusion.
a. Decision
i. If |rc| > rt, reject H0 and accept Ha.
ii. If |rc| < rt, do not reject H0 (accept H0).
r Verbal Interpretation
0.00 No Correlation
± 0.01 to ± 0.20 Slight Correlation
± 0.21 to ± 0.40 Low Correlation
± 0.41 to ± 0.70 Moderate Correlation
± 0.71 to ± 0.80 High Correlation
± 0.81 to ± 0.99 Very High Correlation
± 1.00 Perfect Correlation
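The decision rule and the verbal interpretation table can be combined into a small lookup. This is a sketch under one assumption not stated in the notes: |r| is rounded to two decimals before classification, so that values falling between the listed ranges are still assigned a bracket. The function names `interpret_r` and `decide` are hypothetical.

```python
def interpret_r(r):
    """Map r to the verbal interpretation table (uses |r| rounded to 2 decimals)."""
    a = round(abs(r), 2)
    if a == 0:
        return "No Correlation"
    if a <= 0.20:
        return "Slight Correlation"
    if a <= 0.40:
        return "Low Correlation"
    if a <= 0.70:
        return "Moderate Correlation"
    if a <= 0.80:
        return "High Correlation"
    if a < 1.00:
        return "Very High Correlation"
    return "Perfect Correlation"

def decide(r_computed, r_tabular):
    """Decision rule: reject H0 when |r| exceeds the tabular value."""
    return "reject H0" if abs(r_computed) > r_tabular else "do not reject H0"

print(interpret_r(-0.5687), "|", decide(-0.5687, 0.959))
# Moderate Correlation | do not reject H0
```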
Example #1
The researcher wants to determine if there is any substantial relationship between
the height (cm) and weight (kg) of 5 female students of OLFU, using a 0.01 level of
significance.
Step 1: Ho: There is no correlation between the height and weight of female students.
Ha: There is a correlation between the height and weight of female students.
Step 2: α = 0.01; df = n - 2 = 5 - 2 = 3; TV = 0.959
Step 3:
Female   x (cm)   y (kg)   x²        y²       xy
1        155      68       24,025    4,624    10,540
2        160      44       25,600    1,936    7,040
3        130      70       16,900    4,900    9,100
4        150      55       22,500    3,025    8,250
5        145      50       21,025    2,500    7,250
N = 5    740      287      110,050   16,985   42,180
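$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$
$$ r = \frac{5(42{,}180) - (740)(287)}{\sqrt{\left[5(110{,}050) - (740)^2\right]\left[5(16{,}985) - (287)^2\right]}} $$
r ≈ -0.5687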
Step 4:
Based on the result, |r| = 0.57 (r = -0.57) is less than the tabular value of 0.959, so
accept Ho and reject Ha. There is no significant relationship between the height and
weight of the female students. The value of r indicates a moderate (negative) correlation.
Example #2
The researcher wants to know if there is a correlation between the sleeping hours and
the quiz scores of 7 male students, using a 0.05 level of significance.
Step 1: Ho: There is no correlation between the sleeping hours and score in a quiz of 7
male students.
Ha: There is a correlation between the sleeping hours and score in a quiz of 7
male students.
Step 2: α = 0.05; df = n - 2 = 7 - 2 = 5; TV = 0.754
Step 3:
Male     x (hours)   y (score)   x²    y²    xy
1        3           0           9     0     0
2        6           7           36    49    42
3        4           4           16    16    16
4        8           10          64    100   80
5        4           3           16    9     12
6        7           8           49    64    56
7        9           10          81    100   90
N = 7    41          42          271   338   296
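$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$
$$ r = \frac{7(296) - (41)(42)}{\sqrt{\left[7(271) - (41)^2\right]\left[7(338) - (42)^2\right]}} $$
r ≈ 0.9706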
Step 4:
Based on the result, r = 0.97 is greater than the tabular value of 0.754, so reject Ho
and accept Ha. There is a significant relationship between the sleeping hours and the quiz
scores of the male students. The value of r indicates a very high correlation.
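The worksheet totals are the only inputs the formula needs. The sketch below rebuilds those column sums from the raw data and then applies the computational formula, as a plain-Python arithmetic check (no external libraries; names are illustrative).

```python
import math

sleep = [3, 6, 4, 8, 4, 7, 9]      # x, sleeping hours
score = [0, 7, 4, 10, 3, 8, 10]    # y, quiz score
n = len(sleep)

# Column totals of the worksheet: Σx, Σy, Σx², Σy², Σxy
sx, sy = sum(sleep), sum(score)
sx2 = sum(x * x for x in sleep)
sy2 = sum(y * y for y in score)
sxy = sum(x * y for x, y in zip(sleep, score))

r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))
print(sx, sy, sx2, sy2, sxy)  # 41 42 271 338 296
print(round(r, 4))            # 0.9706
```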
Example #3
The tour guide wants to determine if there is any substantial relationship between the
number of tourists in the morning and in the afternoon at 5 tourist destinations. Use
α = 0.01 as the level of significance.
Step 1: Ho: There is no correlation between the number of tourists in the morning and in
the afternoon at the tourist destinations.
Ha: There is a correlation between the number of tourists in the morning and in the
afternoon at the tourist destinations.
Step 2: α = 0.01; df = n - 2 = 5 - 2 = 3; TV = 0.959
Step 3:
Destination   Morning (x)   Afternoon (y)   x²      y²      xy
1             25            30              625     900     750
2             20            70              400     4,900   1,400
3             25            20              625     400     500
4             35            10              1,225   100     350
5             40            30              1,600   900     1,200
N = 5         145           160             4,475   7,200   4,200
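$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$
$$ r = \frac{5(4{,}200) - (145)(160)}{\sqrt{\left[5(4{,}475) - (145)^2\right]\left[5(7{,}200) - (160)^2\right]}} = \frac{-2{,}200}{\sqrt{(1{,}350)(10{,}400)}} $$
r ≈ -0.5871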
Step 4:
Based on the result, |r| = 0.59 (r = -0.59) is less than the T.V. of 0.959, so accept Ho
and reject Ha. There is no significant relationship between the number of tourists in the
morning and in the afternoon at the tourist destinations. The value of r indicates a
moderate (negative) correlation.
Example #4
After recording the number of buyers of 10 products for 10 days, a businessman wants to
determine if there is a correlation between the number of buyers and the price of the
products, using the level of significance α = 0.01.
Step 1: Ho: There is no correlation between the number of buyers and the price of the products.
Ha: There is a correlation between the number of buyers and the price of the products.
Step 2: α = 0.01; df = n - 2 = 10 - 2 = 8; TV = 0.765
Step 3:
Product   x    y    x²      y²      xy
1         12   25   144     625     300
2         10   30   100     900     300
3         7    30   49      900     210
4         15   40   225     1,600   600
5         30   10   900     100     300
6         25   12   625     144     300
7         17   20   289     400     340
8         19   20   361     400     380
9         55   10   3,025   100     550
10        70   5    4,900   25      350
N = 10    260  202  10,618  5,194   3,630
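$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$
$$ r = \frac{10(3{,}630) - (260)(202)}{\sqrt{\left[10(10{,}618) - (260)^2\right]\left[10(5{,}194) - (202)^2\right]}} = \frac{-16{,}220}{\sqrt{(38{,}580)(11{,}136)}} $$
r ≈ -0.7825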
Step 4:
Based on the result, |r| = 0.78 (r = -0.78) is greater than the T.V. of 0.765, so reject Ho
and accept Ha. There is a significant relationship between the number of buyers and the
price of the products. The value of r indicates a high (negative) correlation.
Example #5
The following data were obtained in a study of the relationship between the
weight and chest size of infants at birth using a level of significance α = 0.05.
Step 1: Ho: There is no correlation between the weight and chest size of infants at birth.
Ha: There is a correlation between the weight and chest size of infants at birth.
Step 2: α = 0.05; df = n - 2 = 9 - 2 = 7; TV = 0.666
Step 3:
Infant   Weight (x)   Chest Size (y)   x²         y²         xy
1        5.64         29.5             31.8096    870.25     166.380
2        4.41         26.3             19.4481    691.69     115.983
3        9.00         32.2             81.0000    1,036.84   289.800
4        11.32        36.5             128.1424   1,332.25   413.180
5        7.08         27.2             50.1264    739.84     192.576
6        8.86         27.7             78.4996    767.29     245.422
7        4.74         28.3             22.4676    800.89     134.142
8        8.82         30.3             77.7924    918.09     267.246
9        7.61         28.7             57.9121    823.69     218.407
N = 9    67.48        266.7            547.1982   7,980.83   2,043.136
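$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$
$$ r = \frac{9(2{,}043.136) - (67.48)(266.7)}{\sqrt{\left[9(547.1982) - (67.48)^2\right]\left[9(7{,}980.83) - (266.7)^2\right]}} $$
r ≈ 0.7684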
Step 4:
Based on the result, r = 0.77 is greater than the T.V. of 0.666, so reject Ho and accept
Ha. There is a significant relationship between the weight and chest size of infants at
birth. The value of r indicates a high correlation.
