( STATISTICAL METHODS )
INTRODUCTION
Decision making is an important aspect of our lives. We make decisions based on the
information we have, our attitudes, and our values. Statistical methods help us examine information.
Moreover, statistics can be used for making decisions when we are faced with uncertainties. For
instance, if we wish to estimate the proportion of people who will have a severe reaction to a
dengue or flu shot without giving a shot to everyone who wants it, statistics provides appropriate
methods. Statistical methods enable us to look at information from a small collection of people or
items and make inferences about a larger collection of people or items.
* Definition of Terms
Statistic - a numerical value computed from data, such as the mean, median, or standard deviation
Statistics - the study of the systematic collection, presentation, analysis, and interpretation
of numerical data
Collection - process of obtaining data
Presentation - organization of data into graphs, charts, or tables
Analysis - process of extracting relevant information from the organized data
Interpretation - the task of drawing conclusions and making predictions from the analyzed data
Critical thinking is essential in understanding and evaluating information. There are more than
a few situations in statistics in which the lack of critical thinking can lead to conclusions that are
misleading or incorrect.
Statistical literacy is fundamental for applying and interpreting statistical results. Students need
to know correct statistical terminology. The knowledge of correct terminology helps the students
focus on correct analysis and processes.
Calculators and computers are very good at providing the numerical results of statistical
processes. It is up to the user of statistics to interpret the results in the context of the application.
Even when the correct process is used to analyze the data, the user must still ask: what do the results mean?
The general prerequisite of statistical decision making is the gathering of data. We need to
identify the individuals or objects to be included in the study and the characteristics or features of
the individuals that are of interest.
* Individuals are the people or objects included in the study.
* Variable is a characteristic of the individual to be measured or observed
* Quantitative variable has a value or numerical measurement for which operations such as addition
or averaging make sense
* Qualitative variable describes an individual by placing the individual into a category or group,
such as male or female; it is a non-numerical observation
Parameter is a numerical measure that describes an aspect of a population
Statistic is a numerical measure that describes an aspect of a sample
Levels of Measurement
Nominal level - applies to data that consist of names, labels, or categories. There are no criteria by
which the data can be ordered
Ordinal level - applies to data that can be arranged in order. Differences between
data values either cannot be determined or are not meaningful
Interval level - applies to data that can be arranged in order. In addition, differences
between data values are meaningful
Ratio level - applies to data that can be arranged in order. In addition, both differences
between data values and ratios of data values are meaningful.
1. Kim, James, and Peter are names of three students from the population of students in a
university. Solution: These data are at the nominal level. These data values are simply
names. We cannot determine if one name is “greater than” or “less than” another. Any
ordering of the names would be numerically meaningless.
2. In a high school graduating class of 320 students, John ranked 25th, James ranked 19th,
Walter ranked 10th, and Joe ranked 4th.
3. Water temperatures in degrees Celsius of a milkfish pond in Victorias City vary over
1 – 2, 3 – 4, and 5 – 6
4. Height of basketball players. A is ¾ taller than B
1. Financial analyst who develops stock portfolios based on historical rates of return
2. Economist who uses statistical models to help explain and predict variables such as
inflation rates, unemployment rates, and changes in GDP
3. Market researcher who surveys consumers and converts the responses into useful
information, say consumer preference for dairy products, medicine for cough, etc.
Statisticians are also statistics practitioners, frequently conducting empirical research and serving as
consultants
Kinds of Sampling
Methods of Sampling
*Pitfalls of Survey:
1. Nonresponse: Individuals either cannot be contacted or refuse to
participate. Nonresponse can result in significant undercoverage of
a population.
2. Truthfulness of response: Respondents may lie intentionally or
inadvertently.
3. Faulty recall: Respondents may not accurately remember when or
whether an event took place.
4. Hidden bias: The order of questions might lead to biased responses.
Also, the number of responses on a Likert scale may force responses
that do not reflect the respondent’s feelings or experience.
5. Vague wording: Words such as “often”, “seldom”, and “occasionally”
mean different things to different people.
6. Interviewer influence: Factors such as tone of voice, body language,
dress, gender, authority, and ethnicity of the interviewer might
influence responses.
7. Voluntary response: Individuals with strong feelings about the subject
are more likely than others to respond. Such a study is interesting
but not reflective of the population.
Statistical Measures
3. Mode, Mo, is the value which appears most often. It may or may not exist
Measures of Variability
The three measures of central location do not by themselves give an
adequate description of the data. We also need to know how the
observations spread out from the average.
Variance is the average of the squares of the deviations of individual
values from the mean
Standard Deviation is a special form of the average deviation from
the mean. It is the square root of the variance.
Range is the difference between the largest and the smallest measure
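The three definitions above can be checked with a short Python sketch. The sample values are illustrative and do not come from the text, and the variance divides by N because the definition above speaks of the "average" of the squared deviations (a sample variance would divide by N - 1 instead).

```python
import math

def spread_measures(values):
    """Return mean, variance, standard deviation and range of a list."""
    n = len(values)
    mean = sum(values) / n
    # Variance: average of the squared deviations from the mean (divisor N).
    variance = sum((x - mean) ** 2 for x in values) / n
    std_dev = math.sqrt(variance)          # square root of the variance
    value_range = max(values) - min(values)  # largest minus smallest
    return mean, variance, std_dev, value_range

# Illustrative data only (assumption, not taken from the handout).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean, variance, std_dev, value_range = spread_measures(sample)
print(mean, variance, std_dev, value_range)
```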
Chebyshev’s Theorem
The two values most often used are the mean and standard
deviation. If the distribution has a small SD, we would expect most of the
values to be grouped closely around the mean. However, a large value of
the SD indicates greater variability, in which case we would expect the
observations to be more spread out from the mean.
Chebyshev, a Russian mathematician, discovered that the fraction of the
measurements falling within K standard deviations of the mean is at least $1 - 1/K^2$.
Problem Exercise: If the IQ scores of a random sample of 1080 college students have a
mean score of 120 and std. dev. of 8, determine the interval that contains at least 810 of the IQ scores.
Solution: $1 - 1/K^2 = 810/1080 = 3/4$, so $1/K^2 = 1 - 3/4 = 1/4$ and $K = 2$. Then
$\bar{X} \pm 2(8) = 120 \pm 16$, that is, 104 to 136. Thus at least 3/4 of 1080, or
810, of the IQ scores are found in the interval 104 - 136.
Note: The variable X will take on a value within KS of the mean, thus $\bar{X} \pm KS$, where K refers to the
number of standard deviations. If K = 2 and S = 8, then KS = 16
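As a check on the IQ exercise, the following Python sketch solves $1 - 1/K^2$ for K and builds the interval. The function name and argument names are illustrative choices, not part of the handout.

```python
import math

def chebyshev_interval(mean, sd, n_total, n_inside):
    """Solve 1 - 1/K^2 = n_inside/n_total for K, then return mean +/- K*sd."""
    fraction = n_inside / n_total
    k = math.sqrt(1 / (1 - fraction))
    return k, mean - k * sd, mean + k * sd

# IQ exercise: 1080 students, mean 120, SD 8, at least 810 scores inside.
k, low, high = chebyshev_interval(mean=120, sd=8, n_total=1080, n_inside=810)
print(k, low, high)
```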
Solution: If the number of passengers is within 2.5 SD for any data set, at
2. Z-score measures the number of standard deviations the variable X is from the
mean. It is a measure of the relative location of an observation in a data set.
Observations in two different data sets with the same Z-score can be said
to have the same relative location, in terms of the same number of standard deviations
from the mean. The Z-score is used to compare two observations from two different
populations or samples in order to determine their relative rank. A method of ranking
these two observations is to convert the individual observations into standard deviation
units known as Z-scores or Z-values.
$Z = \dfrac{X - \mu}{\sigma}$ or $Z = \dfrac{X - \bar{X}}{S}$
Problem Exercise:
Let us assess the accomplishment of a student in Math 1 and Physics 1. The
student’s score in Math 1 is 82 and 89 in Physics 1. Can we conclude that the
student is a better student in Physics 1?
Soln. We should consider how the student performed relative to the other students
in each class. For Math 1 the mean score of the class is 68 with SD of 8, while in
Physics 1 the mean score of the class is 80 with SD of 6.
Math 1: $\mu = 68$, $\sigma = 8$, $X = 82$; $Z = (X - \mu)/\sigma = (82 - 68)/8 = +1.75$
Physics 1: $\mu = 80$, $\sigma = 6$, $X = 89$; $Z = (X - \mu)/\sigma = (89 - 80)/6 = +1.50$
Since the Z-score of the student in Math 1 is higher than his Z-score in Physics 1, the
student’s relative performance in Math 1 is better than his performance in Physics 1.
Conclusion: The student is a better student in Math 1.
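The comparison in this exercise can be reproduced with a few lines of Python. This is only a sketch of the Z-score formula given earlier; the function name is an illustrative choice.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Scores and class statistics from the Math 1 / Physics 1 exercise.
z_math = z_score(82, mean=68, sd=8)
z_physics = z_score(89, mean=80, sd=6)

better_subject = "Math 1" if z_math > z_physics else "Physics 1"
print(z_math, z_physics, better_subject)
```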
Problem Exercises:
1. Let us assess the encoding speed of an applicant whether he is suited for Dean’s
Office, Bus. Office, or Personnel Office.
$CV = \dfrac{\text{Std. Deviation}}{\text{Mean}}$
The smaller the coefficient of variation, the more uniform the
distribution
$Z = \dfrac{X - \mu}{\sigma}$ or $Z = \dfrac{X - \bar{X}}{s}$
Where: Z is the normal deviate and X is some specified value of the random variable.
Z measures the number of standard deviations an observation is from the mean.
After the conversion process, the mean of the distribution is 0 and the SD is 1
Illustration: Telecom, a telephone answering service for business executives in Metro Bacolod,
has found that the average telephone message is 150 sec. with SD of 15 sec. The
length of messages is normally distributed. A particular phone message took 180
seconds. How much longer than the average is it?
Solution: $Z = \dfrac{X - \mu}{\sigma} = \dfrac{180 - 150}{15} = 2$, that is, 2 SDs or 30 seconds above the mean
What is the probability that a single message takes between 150 sec. and 180 sec.?
Solution: $Z = \dfrac{180 - 150}{15} = 2$. From the normal curve table, the area between the mean and Z = +/- 2.0 is .4772
Telecom concludes that there is a 47.72% chance that any single telephone message will
last 150 sec. to 180 sec.
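The table lookup in the Telecom illustration can be cross-checked in Python. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of the printed normal curve table, so the result carries more decimals than the table value .4772.

```python
from statistics import NormalDist

# Message length model from the illustration: mean 150 sec, SD 15 sec.
message_length = NormalDist(mu=150, sigma=15)

z = (180 - 150) / 15  # number of SDs above the mean
# Area under the curve between 150 and 180 seconds.
p_between = message_length.cdf(180) - message_length.cdf(150)
print(z, round(p_between, 4))
```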
Exercises: Find the area under the normal curve. Use the normal curve table
Problem Exercises:
1. A random sample of 1000 construction workers gave their average daily wage as
P420 with SD of P35. Assuming daily wages to be normally distributed,
a. what is the probability that a worker selected at random earns between
P380 and P490 a day?
b. how many workers earn less than P450 a day?
c. if workers who earn P480 and above are asked to contribute P70 for a sick
co-worker, how much is the expected contributions?
2. A study of prevailing market prices for one day shows that the average price of rice
per kilo is P40.00 with SD of P1.50. Assuming that prices are normally distributed,
a. What percentage of rice sells at higher than P43.00 per kilo?
b. If in a particular market, 1000 sacks of rice were sold, how many kilos were sold
at less than P42.00 per kilo? (1 sack = 50 kilos)
c. What average price per kilo should the government try to maintain so that
80% of rice sells at not more than P42.50?
3. The average life of a certain type of motor is 10 years. The manufacturer replaces
free all motors that fail while under guarantee. If he is willing to replace only 3%
of the motors that fail, how long a guarantee should he offer? SD is 2 years.
4. Assume that heights of women in a population follow a normal curve with mean
of 64.3 inches and SD of 2.6 inches.
a. What proportion of women stand between 60 inches and 66 inches?
b. A certain woman has a height of 0.5 SD above the mean. What proportion
of women are taller than she?
5. A distribution of test scores in Statistics follow a normal distribution with mean of
80 and std. deviation of 12. There are 120 students who took the test.
a. How many scores do you expect to find above 100
b. How many scores do you expect to find between 90 and 110
6. A soft drink machine is regulated so that it discharges an average of 210 ml
per cup. If the amount of drink is normally distributed with standard deviation
of 18 ml,
a. What fraction of the cups will contain more than 230 ml?
b. How many cups will likely overflow if 245 ml cups are used for the next
1000 drinks?
c. Below what value do we get the smallest 20 % of the drinks?
7. The IQs of 600 applicants to the College of Engineering are approximately
normally distributed with a mean of 115 and standard deviation of 12. If the
college requires an IQ of at least 95, how many of these students will be rejected
on this basis regardless of their other qualifications?
The values of the coefficient of correlation are between -1.0 and +1.0.
If r is +1.0, it indicates that the two variables are perfectly related in
a positive sense, which means that if X increases, Y also increases; if X decreases,
Y does likewise. If r is negative, it indicates that X and Y are inversely related,
meaning that if X increases, Y decreases.
r = 0.00 No correlation
r from +/- .01 to +/- .19 Negligible correlation
r from +/- .20 to +/- .39 Low Correlation
r from +/- .40 to +/- .59 Moderate Correlation
r from +/- .60 to +/- .79 Moderately High Correlation
r from +/- .80 to +/- 1.00 High Correlation
* Strong correlation does not imply any cause and effect relationship
*Causation: Changes in X causes changes in Y
*Common response: Changes in both X and Y are caused by changes in the
lurking variable Z
*Confounding: The effect (if any) of X on Y is confounded with the effect on Y of a
lurking variable Z
Association Between X and Y
1. Causation: Changes in X cause changes in Y
Example: Quitting smoking may reduce a person’s chance of
getting lung cancer if causation holds.
2. Common response: Both X and Y respond to changes in some unobserved
variables. Y can sometimes be predicted from X, but intervening
to change X would not bring about change in Y. The genetic
hypothesis claims that both smoking behavior and lung cancer are
responses to genetic predisposition; quitting smoking does not
3. Confounding: The effect on Y of the explanatory variable X is hopelessly
mixed up with the effects on Y of other variables. The “sloppy
lifestyle” hypothesis claims that smoking is confounded with other
types of behavior, so that we have no information about the effect
of smoking alone on health.
Techniques of Correlation
$r = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$
Where: 6 = constant
N = no. of pairs
d = difference between ranks
Example: Relationship between Height and Weight of persons
$r = \dfrac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]}}$
3. Deviation from mean
$r = \dfrac{\sum(d_x \cdot d_y)}{\sqrt{(\sum d_x^2)(\sum d_y^2)}}$
Where: $d_x = X - \bar{X}$
$d_y = Y - \bar{Y}$
Illustration:
(1) The relationship between AGE of machines and its REPAIR costs
X                Y
Age in Years     Repair costs (P 10^3)
2                1.0
4                2.2
5                2.5
7                3.0
8                4.5
10               5.0
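A minimal Python sketch of the Pearson product-moment formula, applied to the six age and repair-cost pairs in this illustration. The function and variable names are illustrative, and the value of r is computed here rather than quoted from the handout.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment r from raw sums (N, sum X, sum Y, sum XY, ...)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

age = [2, 4, 5, 7, 8, 10]                    # X, age in years
repair_cost = [1.0, 2.2, 2.5, 3.0, 4.5, 5.0]  # Y, repair cost
r = pearson_r(age, repair_cost)
print(round(r, 3))
```

On the interpretation scale given earlier, a value of r this close to +1 falls in the high correlation band.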
3. Ten candidates for Miss Philippines were ranked by Judge X and Judge Y
according to beauty and talent. Find if the choice of Judge X is consistent
with the choice of Judge Y
Candidates: A B C D E F G H I J
Judge X : 2 3 1 6 4 5 10 7 8 9
Judge Y : 2 1 3 5 6 4 8 10 9 7
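The rank-order formula can be applied to the two judges' rankings with the sketch below. The function name is an illustrative choice, and the code assumes there are no tied ranks, as is the case in this table.

```python
def rank_correlation(ranks_x, ranks_y):
    """Rank-order r = 1 - 6*sum(d^2) / (N*(N^2 - 1)), assuming no tied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Rankings of candidates A to J by the two judges.
judge_x = [2, 3, 1, 6, 4, 5, 10, 7, 8, 9]
judge_y = [2, 1, 3, 5, 6, 4, 8, 10, 9, 7]
rho = rank_correlation(judge_x, judge_y)
print(round(rho, 3))
```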
Advantages:
1. Rank Order – provides a convenient way of estimating coef. of correlation if
N is small
2. Pearson Product-Moment(PPM) – takes into account the absolute size of
the measures and not merely their rank position
Coefficient of Determination, $r^2$, expresses the proportion of total variation in Y that
can be accounted for or explained by the independent variable X.
Thus, if r = .60, then $r^2$ = .36, meaning 36% of the variation in Y is accounted for by X
REGRESSION ANALYSIS
$\hat{Y} = a + bX$
Where: $\hat{Y}$ = estimated value of Y
a = y-intercept, that is, the value of Y when X = 0
b = slope, i.e., the change in Y for every unit change in X
X = independent variable
where: $b = \dfrac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$ and $a = \bar{Y} - b\bar{X}$
thus, $\hat{Y} = a + bX \pm SE_{est}$
$SE_{est}$ is the standard error of estimate. It measures the dispersion about an average
line called the regression line.
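The slope and intercept formulas can be sketched in Python as follows. The data are the six age and repair-cost pairs from the earlier correlation illustration, reused here as an assumption for demonstration. The standard error of estimate is not computed because its formula is not given in these notes.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y-hat = a + b*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = Y-bar - b * X-bar
    return a, b

age = [2, 4, 5, 7, 8, 10]                    # X, age in years
repair_cost = [1.0, 2.2, 2.5, 3.0, 4.5, 5.0]  # Y, repair cost
a, b = fit_line(age, repair_cost)

# Point estimate of repair cost for a 6-year-old machine.
estimate = a + b * 6
print(round(a, 4), round(b, 4), round(estimate, 4))
```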
Exercises:
1. An assembly plant wanted to study the relationship between the age of machine
and its repair cost. The following data represent the Age in years and Repair costs
of a random sample of 8 machines.
Age (years)    Repair cost
5              3.5
6              5.0
7              5.0
9              5.2
12             6.0
13             6.0
15             6.2
16             7.1
a. Determine the coefficient of correlation, r
b. Estimate the repair cost of ten- year old machine
2. The data below represent the electrical energy consumption in KWH and the
amount due over a period of 6 months.
3. Data below represent the supply of product A and its price per unit
Price per unit (P)    Supply (10^3 units)
25                    100
20                    120
30                    80
25                    110
35                    90
30                    100
40                    75
4. The marketing research dept. of A-1, Inc. wanted to study the relationship
between the Advertising expenditures and Sales volume of a certain product.
5 40
7 45
10 60
12 65
15 75
20 80
25 95
(a) Relationship between the teaching performance and tenure in years of faculty
Teaching Tenure in
Performance, % Years
84 4
86 6
90 14
87 8
92 15
94 12
95 16
80 5
85 7
88 10
5a. Calculate the coefficient of correlation by PPM
5b . Estimate the performance rating of a teacher who has been in the
profession for 9 years
Problem illustration:
Mr. de Los Santos has been concerned for some time with the overhead
costs in his furniture shop. For the last 7 months he has kept a record not only of the
direct labor hours in the shop but also of the total cost of lumber used in the operation.
The data are found in the following table:
Kinds of Hypothesis
4. Non directional hypothesis does not specify the direction of relationship between
variables. It merely states the presence or absence of a relationship between two
variables or that one variable influences the other variable.
Level of Significance,𝜶
The significance level of a test is the maximum value of the probability of
rejecting the null hypothesis when in fact it is true.
5% level of significance implies that you are 95% confident that you have
made the right decision of accepting or rejecting the hypothesis.
1. Z-test is used when the population standard deviation is known and the sample size is 30
or more (n ≥ 30) (Ronald Walpole, Intro. to Statistics).
2. t-test is used when the population standard deviation is unknown and only the sample
standard deviation is available. Sample size is less than 30.
Note: The normal approximation of the sample mean is good when n ≥ 30.
If the sample size is small (n < 30), the values of $s^2$ fluctuate considerably from
sample to sample and the distribution of the standardized mean is no longer a standard
normal distribution.
Uses of Z-test:
$Z = \dfrac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$
µ = population mean          $\bar{X}$ = sample mean
n = sample size              $\sigma$ = population standard deviation
Note: For Z-test, use the table of critical values of Z based on the area under the
normal curve.
$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
$n_1$ = first sample size          $n_2$ = second sample size
Accept Ho if the absolute value of the computed Z is less than the table value.
Reject Ha.
Reject Ho if the absolute value of the computed Z is equal to or greater than the table
value. Accept Ha.
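The one-sample Z formula and the decision rule can be sketched in Python as below. The numbers are those of the electrical consumption exercise in the problem set (population mean 150 KWH, SD 18 KWH, n = 70, sample mean 190 KWH, 2.5% alpha). Treating that test as one-tailed is an assumption made here, and `NormalDist().inv_cdf` from the standard library stands in for the printed table of critical values.

```python
import math
from statistics import NormalDist

def z_one_sample(sample_mean, pop_mean, pop_sd, n):
    """Z = (X-bar - mu) * sqrt(n) / sigma."""
    return (sample_mean - pop_mean) * math.sqrt(n) / pop_sd

def critical_z(alpha, two_tailed=True):
    """Table value of Z for the given alpha (replaces the printed table)."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

z = z_one_sample(sample_mean=190, pop_mean=150, pop_sd=18, n=70)
# Assumption: "consume significantly more" is read as a one-tailed test.
z_table = critical_z(alpha=0.025, two_tailed=False)

# Decision rule from the notes: reject Ho if |Z| >= table value.
reject_h0 = abs(z) >= z_table
print(round(z, 2), round(z_table, 2), reject_h0)
```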
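Uses of t-test:
$t = \dfrac{(\bar{X} - \mu)\sqrt{n - 1}}{s}$          s = sample standard deviation
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
3. Test of two samples, n1 = n2
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum d^2}{n(n - 1)}}}$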
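The first two t formulas can be sketched in Python as follows, using the figures of the registration-time and thread-strength exercises in the problem set. Only the computed t is shown. The critical value must still be read from a t table at the appropriate degrees of freedom, since the Python standard library has no t distribution. Function names are illustrative.

```python
import math

def t_one_sample(sample_mean, pop_mean, s, n):
    """t = (X-bar - mu) * sqrt(n - 1) / s, as written in these notes."""
    return (sample_mean - pop_mean) * math.sqrt(n - 1) / s

def t_two_sample(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t with the pooled variance (n1*s1^2 + n2*s2^2)/(n1+n2-2)."""
    pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Registration exercise: mu = 50 min, n = 28, X-bar = 40 min, s = 12 min.
t_registration = t_one_sample(sample_mean=40, pop_mean=50, s=12, n=28)

# Thread exercise: 25 pieces each, X: 78.3 N (s 5.6), Y: 87.2 N (s 6.2).
t_thread = t_two_sample(78.3, 5.6, 25, 87.2, 6.2, 25)

print(round(t_registration, 3), round(t_thread, 3))
```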
Problem Exercises:
1. Two types of wire are being compared for strength. Fifty pcs. of each type of wire
are tested under similar conditions. Type A has an average tensile strength of 78.3 N,
while type B has an average tensile strength of 87.2 N. The combined standard deviation of the wires
is 5.6 N. Which type of wire is stronger? Test at 1% alpha.
2. A sample survey of TV programs in Metro Bacolod shows that 80 of 200 men prefer
watching NBA. From another group, 75 of 250 prefer watching PBA. What is
the preference of men? Test at 5% alpha.
3. A cigarette manufacturer claims that his cigarettes have an average nicotine content of
1.83 mg. and sample SD of .11 mg. If a random sample of 28 cigarettes of this type
has an average nicotine content of 1.90 mg, will you agree with the claim of the
manufacturer? Test at 1% alpha.
1 2.0 1.9
2 2.0 1.9
3 2.3 2.0
4 2.1 2.1
5 2.4 2.3
6. Data from the subdivision survey show that the average monthly electrical consumption
of residential homes is 150 KWH with standard deviation of 18 KWH. A sample of 70
residential homes was selected randomly and was found to consume on the
average 190 KWH. Do the 70 residences consume significantly more than the rest?
Test at 2.5% alpha.
7. Alpha Company manufactures steel cable with an average tensile strength of 150 N.
The laboratory tested 30 pcs. and found them to have an average tensile strength of 145 N
and standard deviation of 6.5 N. Is the result of the laboratory in accordance with
what the manufacturer claims? Test at 5% level of significance.
8. Two types of thread are compared for strength. Twenty-five pieces of each type of
thread are tested under similar conditions. Type X has an average strength of 78.3
N and standard deviation of 5.6 N. Type Y has an average strength of 87.2 N
and standard deviation of 6.2 N. Which thread is stronger? Test at 5% level of
significance.
9. The average length of time for students to register at certain college has been 50
minutes. A new registration procedure using modern computing machine is being
tried. If a random sample of 28 students had an average registration time of 40
minutes with standard deviation of 12 minutes under the new system, can we
conclude that the new system had reduced significantly the registration time? Test
at 1% level of significance.
Chi Square is used as a test of significance when data are expressed in frequencies,
or in terms of percentages or proportions that can be reduced to
frequencies.
Chi square applies to discrete data; however, any continuous data may be
reduced to categories and tabulated so that chi square may be applied.
Example: Scores on a test of mental ability and dexterity test could be tabulated into
a contingency table.
Dexterity Test Score
To use the Chi square statistic, the data must be independent, i.e., no response
is related to any other response. Also, the categories into which data are placed must be
mutually exclusive, i.e., each observation must be placed in one and only one category.
Finally, all the observed data must be used in a chi square
problem.
Classification of Data
1. to test the “goodness of fit” to a normal curve, i.e., to find out whether or not a
sample distribution conforms to hypothetical/ideal distribution
Example: Tossing of a coin 10 times
fo fe
Head 4 5
Tail 6 5
fo fe
Reduced in weight 18 22
Did not reduce in weight 12 8
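For the two small fo/fe tables above, the statistic can be computed with the sketch below. It assumes the usual chi square formula, the sum of $(f_o - f_e)^2 / f_e$ over all categories, which does not appear in the surviving text of these notes. The function name is illustrative.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories (assumed standard formula)."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Coin tossed 10 times: fo = 4 heads, 6 tails; fe = 5 each.
chi_coin = chi_square([4, 6], [5, 5])

# Weight table: fo = 18 reduced, 12 did not; fe = 22 and 8.
chi_weight = chi_square([18, 12], [22, 8])

print(round(chi_coin, 3), round(chi_weight, 3))
```

Each value would then be compared with the table value of chi square at df = 1 (two categories) for the chosen alpha.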
Status
School Hired Not hired Total
Problem Examples:
1. A public opinion poll was conducted on attitudes towards women in the military.
Some 113 subjects were interviewed. The question asked was
“Do you favor women in the military?” Test at 2% alpha if sex is independent
of response.
RESPONSE
Male       30    20    5     55
Female     38    18    2     58
__________________________________________________
Total      68    38    7    113
3. In 100 tosses of a coin, 63 heads and 37 tails are observed. Is this a balanced coin?
Test at 1% alpha.
With hypertension 21 36 30 87
No hypertension 48 26 19 93
_______________________________________________________
Ho:
Ha:
Note: Chi Square for a 2x2 table with df = (2-1)(2-1) = 1 can be computed without computing fe
a    c    k
b    d    l
m    n    N
$\chi^2 = \dfrac{N(ad - bc)^2}{klmn}$
Solution: $\chi^2 = \dfrac{N(ad - bc)^2}{klmn} = \dfrac{500(175 \times 60 - 140 \times 125)^2}{300 \times 200 \times 315 \times 185} = 7.007 > 3.841$
Ho: Rejected
Ha: Accepted
df = (2-1)(2-1) = 1, $\alpha$ = 5%; Table, App. D: $\chi^2_{.05} = 3.841$
Note: For other types of table, say 2x3, 3x3, 3x4, 4x5, 5x5
df = (row - 1)(column - 1)
fe = (R x C)/N, where R is the row total and C is the column total of the cell
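The 2x2 shortcut and the general fe = (R x C)/N route can be compared with the Python sketch below, using the cell counts of the solved example (a = 175, c = 125, b = 140, d = 60). The general routine assumes the usual sum of $(f_o - f_e)^2 / f_e$; both function names are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut N(ad - bc)^2 / (k*l*m*n) for the layout a c | k, b d | l."""
    k, l = a + c, b + d   # row totals
    m, n = a + b, c + d   # column totals
    total = k + l
    return total * (a * d - b * c) ** 2 / (k * l * m * n)

def chi_square_table(table):
    """General r x c case: fe = (row total * column total) / N for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / total
            stat += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

shortcut = chi_square_2x2(a=175, b=140, c=125, d=60)
general, df = chi_square_table([[175, 125], [140, 60]])
print(round(shortcut, 3), round(general, 3), df)
```

Both routes give the same statistic for a 2x2 table, so the shortcut is a convenience rather than a different test.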
Problem Exercise
A study was conducted to determine the relationship between sales and location
of fast food.
Sales
ANALYSIS OF VARIANCE
(ANOVA)
Analysis of Variance is a parametric test. It is a widely used and highly developed statistical
method for comparing three or more means
1. The individuals in the group and subgroups are selected randomly from a normally
distributed population
2. The samples that constitute the groups are independent
Sources of Variation:
No.      X1     X2     X3     X1^2    X2^2    X3^2
1        12     18      6      144     324      36
2        18     17      4      324     289      16
3        16     16     14      256     256     196
4         8     18      4       64     324      16
5         6     12      6       36     144      36
6        12     17     12      144     289     144
7        10     10     14      100     100     196
_________________________________________________________________
Total    82    108     60     1068    1726     640
Mean  11.71  15.43   8.57
Ho: No significant difference between the distance traveled by cars with one liter of
gasoline
Ha: There is a significant difference between the distance traveled by cars with one
liter of gasoline
Steps: 1. $TSS = \sum X^2 - (\sum X)^2/N$, where the correction factor is $C.F. = (\sum X)^2/N = (250)^2/21 = 2976.19$
$TSS = 3434 - (250)^2/21 = 457.81$
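The TSS step can be verified, and the table completed, with the Python sketch below. Only the TSS step survives in these notes. The between-groups and within-groups sums of squares and the F ratio follow the usual one-way ANOVA computation, which is an assumption about the missing steps, and the names used are illustrative.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares and F ratio from raw group data."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total        # C.F. = (sum X)^2 / N
    tss = sum(x * x for x in all_values) - correction  # total SS
    # Between-groups SS: sum of (group total)^2 / group size, minus C.F.
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    wss = tss - bss                                    # within-groups SS
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (bss / df_between) / (wss / df_within)
    return {"cf": correction, "tss": tss, "bss": bss, "wss": wss,
            "df": (df_between, df_within), "f": f_ratio}

# Distances traveled per liter, columns X1, X2, X3 of the data table.
x1 = [12, 18, 16, 8, 6, 12, 10]
x2 = [18, 17, 16, 18, 12, 17, 10]
x3 = [6, 4, 14, 4, 6, 12, 14]
result = one_way_anova([x1, x2, x3])
print({k: (round(v, 2) if isinstance(v, float) else v)
       for k, v in result.items()})
```

The computed F would then be compared with the table value of F at (2, 18) degrees of freedom for the 5% and 1% conclusions.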
ANOVA TABLE
Source of Variation Sum of Squares df Mean Square
Conclusion at 5% alpha:
Conclusion at 1% alpha:
Problem Exercises:
Trainees Raters
A B C D
1 10 6 8 7
2 4 5 3 4
3 8 4 7 4
4 3 4 2 2
5 6 8 6 7
6 9 7 8 7
(2) In an experiment designed to compare the effects of coaching on the scores obtained
on an aptitude test used for entrance to a professional school, three levels of
coaching were used : none, 4 hours, and 12 hours. A random sample of 18
applicants was chosen from the population of applicants. Their scores are given
below: Test at 5% alpha.
Scores Obtained
Group None 4 Hours 12 Hours
1 30 32 35
2 27 30 33
3 26 29 32
4 24 27 30
5 22 25 27
6 20 24 26
State: Ho:
Ha: