Quarter4 Statistics Lecture Notes
NAME: ____________________________________
JUST REFLECT
• Sometimes we hear claims on social media that we find unbelievable, such as a
whitening product advertisement stating that if you use the product, your
skin will become as white as snow.
• The weatherman stating that there is a 90% chance of rain tomorrow.
We might feel compelled to challenge such claims. To challenge a claim, we must run a
research study on a sample (since surveying the entire population would be
impossible). To test a claim, you must write two hypotheses.
Hypothesis testing is a decision-making process for evaluating claims about a
population.
• The null hypothesis (Ho) is basically, “The population is like this.” It states, in formal
terms, that the population is no different than usual.
• The alternative hypothesis (Ha) is, “The population is like something else.” It states that
the population is different than usual, that something has happened to this
population, and as a result it has a different mean or a different shape than in the usual
case.
In order to state the hypothesis correctly, the researcher must translate the claim into
mathematical symbols. There are three possible sets of statistical hypotheses.
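As a sketch of what the three sets usually look like for a claim about a population mean, the hypotheses can be written as below. The symbol k is a placeholder for the claimed value and is an assumption made here for illustration; some textbooks write the null hypothesis of the one-tailed tests with ≤ or ≥ instead of =.

```latex
\begin{aligned}
\text{Two-tailed test:}   &\quad H_o : \mu = k, \qquad H_a : \mu \neq k \\
\text{Right-tailed test:} &\quad H_o : \mu = k, \qquad H_a : \mu > k \\
\text{Left-tailed test:}  &\quad H_o : \mu = k, \qquad H_a : \mu < k
\end{aligned}
```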
• A type I error occurs if one rejects the null hypothesis when it is true.
• A type II error occurs if one does not reject the null hypothesis when it is false.
Generally, statisticians agree on using three arbitrary significance levels: the 0.10,
0.05, and 0.01 levels. The significance level, α, is the probability of committing a type I
error, that is, of rejecting the null hypothesis when it is actually true. So if the null
hypothesis is true, the probability of a type I error is 10%, 5%, or 1%, and the probability of
correctly not rejecting it is 90%, 95%, or 99%, depending on which level of significance is
used. In other words, when α = 0.05, there is a 5% chance of rejecting a true null hypothesis.
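The meaning of α can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it draws many samples from a population for which Ho is true (mean 0, standard deviation 1), runs a two-tailed z-test on each, and counts how often Ho is wrongly rejected. The function name, sample size, number of trials, and seed are all hypothetical choices, not part of the lesson.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=4000, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    # Two-tailed critical value: alpha is split equally between the two tails.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Ho is true by construction: population mean 0, standard deviation 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - 0) / (1 / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # a type I error: rejecting a true Ho
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated type I error rate: {rate:.3f}")
```

The printed rate should land close to 0.05, though it will not equal it exactly because the simulation is random.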
• You can reflect on these figures, which are commonly used in hypothesis testing
in research:
After a significance level is chosen, a critical value is selected from a table for the
appropriate test.
If the test is two-tailed, there are two critical values, one positive and one negative. If the
test is left-tailed, the critical value will be negative. If the test is right-tailed, the critical
value will be positive.
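For tests that use the standard normal (z) distribution, the table lookup described above can be reproduced in a few lines. This is a minimal sketch assuming a z-test; the function name `z_critical_values` is made up for illustration, and other tests (such as the t-test) need their own tables.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail):
    """Return the critical z value(s) for a 'left', 'right', or 'two' tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)        # negative value
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)    # positive value
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between tails
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    values = ", ".join(f"{z:.3f}" for z in z_critical_values(0.05, tail))
    print(f"alpha = 0.05, {tail}-tailed: {values}")
```

At α = 0.05 this gives about -1.645 for a left-tailed test, 1.645 for a right-tailed test, and ±1.960 for a two-tailed test, which agrees with the usual z-table.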
JUST LEARN
A hypothesis is essentially an idea about the population that you think might be true, but
which you cannot prove to be true. While you usually have good reasons to think it is true,
and you often hope that it is true, you need to show that the sample data support your idea.
JUST REFLECT
• You can reflect on these statements which are commonly used in research.
• The symbol ≠ in the alternative hypothesis suggests either a greater than ( > )
relation or a less than ( < ) relation.
• When the alternative hypothesis utilizes the ≠ symbol, the test is said to be
non-directional. It is also called a two-tailed test.
• When the alternative hypothesis utilizes the > or the < symbol, the test is said
to be directional. It is called right-tailed ( > ) or left-tailed ( < ).
These are the graphical representations of the two-tailed test and the one-tailed tests:
JUST LEARN:
JUST EVALUATE
4th QUARTER – Week 3:
Directions: Choose the letter that corresponds to your answer. Write your answer on a
separate sheet.
2. If the hypothesis contains the greater than symbol (>) the rejection region is
______.
3. If the hypothesis contains the less than symbol (<) the rejection region is ____.
• If the computed value of the test statistic falls in the rejection region (RR), we reject the
null hypothesis (Ho) in favor of the alternative hypothesis (Ha).
• If the value of the test statistic does not fall in the rejection (critical) region, we do not
reject Ho. The region other than the rejection region is the acceptance region.
• Typical values for α are 0.01, 0.05 and 0.1. It is a value that we select based on the
certainty we need. In most cases, the choice of α is determined by the context we are
operating in, but 0.05 is the most commonly used value.
JUST LEARN:
DO IT IN A GROUP:
PROBLEM 1. Professor Balenciaga has recorded her students’ grades for several semesters,
and the average of all these grades is 83. Her new class of 28 students seems to be of
higher than average ability, and she wants to demonstrate that the current class is superior
to the previous classes according to their average. The current class has an average of 86.2
and a standard deviation of 12. Using the 0.05 significance level, is there sufficient evidence
to support her argument that the current class is superior?
PROBLEM 2. Professor Balenciaga has recorded her students’ grades for several semesters,
and the average of all these grades is 83. Her new class of 30 students seems to be of
higher than average ability, and she wants to demonstrate that the current class is superior
to the previous classes according to their average. The current class has an average of 86.2
and a standard deviation of 12. Using the 0.05 significance level, is there sufficient evidence
to support her argument that the current class is superior?
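One way to check a group's work on the two problems above is to compute the test statistic directly. The sketch below assumes the usual one-sample formula, (sample mean − claimed mean) ÷ (standard deviation ÷ √n), and it assumes that a z-test is used when n = 30 while a t-table with 27 degrees of freedom is consulted when n = 28. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_statistic(sample_mean, claimed_mean, sd, n):
    """Standardized distance between the sample mean and the claimed mean."""
    standard_error = sd / sqrt(n)
    return (sample_mean - claimed_mean) / standard_error

# Values taken from Problems 1 and 2 (same data, different class sizes).
stat_n28 = one_sample_statistic(86.2, 83, 12, 28)
stat_n30 = one_sample_statistic(86.2, 83, 12, 30)

# Assumption: for n = 30 a z-test is used, so the right-tailed critical value
# at alpha = 0.05 comes from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - 0.05)
reject_n30 = stat_n30 > z_crit

print(f"n = 28: test statistic = {stat_n28:.3f} (compare with a t-table, df = 27)")
print(f"n = 30: test statistic = {stat_n30:.3f}, critical z = {z_crit:.3f}, "
      f"reject Ho: {reject_n30}")
```

Because the claim is that the class is superior (greater than 83), the test is right-tailed, so Ho is rejected only if the statistic exceeds the positive critical value.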
JUST EVALUATE:
Directions: Choose the letter that corresponds to your answer. Write your answer on a
separate sheet.
1. Null hypothesis is rejected as direct evidence that the alternative hypothesis is:
A test statistic is a value computed from the sample data. The test statistic is used to
assess the evidence for rejecting or not rejecting the null hypothesis. Each kind of test has
its own test statistic.
JUST LEARN
HYPOTHESIS TESTING ON A POPULATION MEAN
(a) There is enough evidence to conclude that the mean number of hours is more
than 4.75.
(b) There is enough evidence to conclude that the mean number of hours is more
than 4.5.
(c) There is not enough evidence to conclude that the mean number of hours is
more than 4.5.
(d) There is not enough evidence to conclude that the mean number of hours is
more than 4.75.
4th QUARTER – Week 5
JUST LEARN
6. Decision
• If we reject 𝐻0, we can conclude that 𝐻𝐴 is true.
• If, however, we do not reject 𝐻0, we conclude that 𝐻0 may be true.
1. Data:
2. Assumption:
3. Hypothesis:
4. Test Statistics:
5. Decision Rule:
6. Decision:
4th QUARTER – Week 6
Test Statistic for Population Proportion
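A commonly used large-sample test statistic for a population proportion, assumed here to be the one this lesson refers to, is shown below. The symbols are not defined elsewhere in these notes and are assumptions: p̂ is the sample proportion, p₀ is the proportion stated in the null hypothesis, q₀ = 1 − p₀, and n is the sample size.

```latex
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 \, q_0}{n}}}, \qquad q_0 = 1 - p_0
```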
JUST LEARN:
Have you ever wondered whether tall people have longer arms than short people?
We’ll explore this question by collecting data on two variables — height and arm span
(measured from left fingertip to right fingertip).
Directions: Study the table given and answer the questions that follow.
Person Number 1 2 3 4 5 6 7 8 9 10 11 12
Arm Span 156 157 159 160 161 161 162 165 170 170 173 173
Height 162 160 162 155 160 162 170 166 170 167 185 176
The methods we use to study the relationship between two variables depend on the type
of variables we are dealing with; that is, they depend on whether the data are numerical or
categorical. There are ways of measuring the relationship between each of the following
pairs of variables:
• a numerical variable and categorical variable (for example, height and nationality)
• two categorical variables (for example, gender and religious denomination)
• two numerical variables (for example, height and weight)
In a relationship between two variables, if the values of one variable ‘depend’ on the
values of another variable, then the former variable is referred to as the dependent variable
and the latter variable is referred to as the independent variable.
BIVARIATE DATA - consist of two (2) variables, which can be dependent or independent.
The independent variable is the variable that can cause the dependent variable to change.
The dependent variable is the variable that is influenced or affected by the independent
variable.
It is useful to identify the independent and dependent
variables where possible, since the usual practice when
displaying data on a graph is to place the independent variable on the
horizontal axis and the dependent variable on the vertical axis.
EXAMPLE 1.
You want to test a new dosage of drug that supposedly prevents sneezing in people
allergic to flowers.
• Variable on the x-axis: new dosage of drug
• Variable on the y-axis: sneezing
EXAMPLE 2.
A soap manufacturer wants to prove that a small amount of detergent can remove a
greater amount of stain.
• Variable on the x-axis: amount of detergent
• Variable on the y-axis: amount of stain removed
FORM OF AN ASSOCIATION
• Linear form – when the points tend to follow a straight line.
• Non-linear form – when the points tend to follow a curved line.
2. TREND (DIRECTION) - Refers to the direction of change in variable y when variable x gets
bigger. If y also gets bigger, the slope is positive; but if y gets smaller, the
slope is negative.
POSITIVE
Positive association exists between the variables if the gradient of the line is
positive, that is, the dots on the scatterplot tend to go up as we go from left to right.
NEGATIVE
Negative association exists between the variables if the gradient of the line is
negative, that is, the dots on the scatterplot tend to go down as we go from left to right.
DIRECTION OF AN ASSOCIATION
• Positive – gradient of the line is positive.
• Negative – gradient of the line is negative.
3. VARIATION (STRENGTH) - Refers to the degree of “scatter” in the plot. If the dots are
widely spread, the relationship between variables is weak. If the dots are concentrated
around a line, the relationship is strong.
STRENGTH OF AN ASSOCIATION
Strong – small amount of scatter in the plot.
Moderate – modest amount of scatter in the plot.
Weak – large amount of scatter in the plot.
EXAMPLE 3.
Determine the relationship between height and arm span. The data
collected on these variables are shown in the table of ordered pairs.
Height (cm)    172 159 178 162 156 174 151 162 165 185 186 176 166 180 158
Arm Span (cm)  172 162 182 164 159 180 151 165 168 189 188 184 167 184 161
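The direction of the association in Example 3 can also be checked numerically by computing the gradient of a straight line fitted through the points. The sketch below uses the ordinary least-squares gradient, which is an assumption about which line is meant, and it treats height as the variable on the horizontal axis. The function name is hypothetical. A scatterplot is still needed to judge form and strength.

```python
def least_squares_gradient(xs, ys):
    """Gradient (slope) of the least-squares line of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Ordered pairs from Example 3 (both in cm).
height = [172, 159, 178, 162, 156, 174, 151, 162, 165, 185, 186, 176, 166, 180, 158]
arm_span = [172, 162, 182, 164, 159, 180, 151, 165, 168, 189, 188, 184, 167, 184, 161]

gradient = least_squares_gradient(height, arm_span)
direction = "positive" if gradient > 0 else "negative" if gradient < 0 else "none"
print(f"gradient = {gradient:.2f}, direction: {direction}")
```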
JUST EVALUATE
Directions: Construct a scatterplot using the tables and describe the a. shape (form), b.
trend (direction), and c. strength (variation).
4th QUARTER – Week 8
THE PEARSON PRODUCT-MOMENT CORRELATION
Directions: Identify the direction and the strength of the following correlation given. Choose
your answer from the box.
TASK: Research the life of Karl Pearson and his important contributions to the field of
statistics. Do not forget to copy and study the formula he proposed for computing the
coefficient of correlation (r).
The correlation coefficient, computed from the sample data, measures the strength and
direction of a linear relationship between two variables. There are several coefficients of
correlation. The one most commonly used in linear correlation is the Pearson Product-Moment
coefficient of correlation, symbolized by r, named in honor of Karl Pearson, the statistician
who did a lot of research in this area.
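In the computational form that matches the column sums used in these notes (ΣX, ΣY, ΣXY, ΣX², ΣY²), the coefficient can be written as follows. This is the standard textbook form, given here as a sketch because the notes do not show the formula itself.

```latex
r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}
         {\sqrt{\left[\,n\sum X^{2} - \left(\sum X\right)^{2}\right]
                \left[\,n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}
```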
Where r is the Pearson correlation coefficient, which indicates the degree of relationship
between the two variables,
X is the values in the first set of data,
Y is the values in the second set of data, and
n is the total number of data pairs.
The Pearson correlation coefficient, r, can take a range of values from +1 to -1.
• A value greater than 0 indicates a positive correlation; that is, as the value of one
variable increases, so does the value of the other variable.
• A value less than 0 indicates a negative correlation; that is, as the value of one variable
increases, the value of the other variable decreases.
• A value of 0 indicates that there is no linear correlation between the two variables.
• The direction of the points scattered tells the direction of correlation that exists between
the variables.
The stronger the association of the two variables, the closer the Pearson correlation
coefficient, r, will be to either +1 or -1 depending on whether the relationship is positive or
negative, respectively. See table below (Table of range of values).
• Achieving a value of +1 or -1 means that all your data points are included
on the line of best fit – there are no data points that show any variation
away from this line. Values for r between +1 and -1 (for example, r = 0.7
or -0.3) indicate that there is variation around the line of best fit.
• The closer the value of r to 0 the greater the variation around the line of
best fit. It indicates the closeness of the point to the trend line.
• The closer the points are to the trend line, the stronger the relationship is.
LESSON 2
Meaning
✓ A correlation coefficient of 1 means that for every increase in one variable, there is an
increase of a fixed proportion in the other. For example, shoe sizes go up in
(almost) perfect correlation with foot length.
✓ A correlation coefficient of -1 means that for every increase in one variable,
there is a decrease of a fixed proportion in the other. For example, the amount
of gas in a tank decreases in (almost) perfect correlation with speed.
✓ Zero means that an increase in one variable is not accompanied by a consistent increase
or decrease in the other. The two just aren’t linearly related.
Let’s find the value of the correlation coefficient from the table below.
STEP 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
STEP 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 =
4,257.
STEP 3: Take the square of the numbers in the x column and put the results in the x² column.
STEP 4: Take the square of the numbers in the y column and put the results in the y² column.
Subject   Age (x)   Glucose level (y)   xy     x²     y²
1         43        99                  4257   1849   9801
2         21        65                  1365   441    4225
3         25        79                  1975   625    6241
4         42        75                  3150   1764   5625
5         57        87                  4959   3249   7569
6         59        81                  4779   3481   6561
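The remaining work is to add up each column and substitute the sums into the computational formula r = [nΣxy − ΣxΣy] ÷ √([nΣx² − (Σx)²][nΣy² − (Σy)²]). The sketch below does this for the table above. The variable names are illustrative only.

```python
from math import sqrt

# Data from the table: age (x) and glucose level (y) for six subjects.
age = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

n = len(age)
sum_x = sum(age)
sum_y = sum(glucose)
sum_xy = sum(x * y for x, y in zip(age, glucose))  # total of the xy column
sum_x2 = sum(x * x for x in age)                   # total of the x² column
sum_y2 = sum(y * y for y in glucose)               # total of the y² column

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sums: x = {sum_x}, y = {sum_y}, xy = {sum_xy}, x2 = {sum_x2}, y2 = {sum_y2}")
print(f"r = {r:.4f}")
```

The column totals are Σx = 247, Σy = 486, Σxy = 20485, Σx² = 11409, and Σy² = 40022, which give r ≈ 0.5298.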
The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which
means the two variables have a moderate positive correlation.
Assumptions
For the Pearson r correlation, both variables should be normally distributed (normally
distributed variables have a bell-shaped curve). Other assumptions include linearity and
homoscedasticity. Linearity assumes a straight-line relationship between the two
variables, and homoscedasticity assumes that the data are equally spread about the
regression line.
JUST EVALUATE
I. Directions: Calculate r and make a generalization regarding the information that you get
from the computed correlation coefficient for each of the following:
a. ∑X = 225      b. ∑X = 32         c. ∑X = 180
   ∑Y = 22          ∑Y = 1105          ∑Y = 147
   ∑X² = 9653       ∑X² = 220          ∑X² = 6914
   ∑Y² = 143        ∑Y² = 364525       ∑Y² = 5273
   ∑XY = 651        ∑XY = 3402         ∑XY = 4013
   n = 6            n = 6              n = 7
II. Directions: Solve the Problem.
The following are the heights of fathers and their eldest sons, in inches:
Heights of the Father      71 69 67 68 68 66 70 72 65 60
Heights of the Eldest Son  71 69 69 65 66 63 68 70 60 58
QUESTION: Do the data support the hypothesis that height is hereditary? Explain.
Accompany your explanation with statistical computations.
III. Directions: Read the statement carefully and choose the best answer.
For items 1 – 5. Complete the table below.
Consider the scores obtained in Math(X) and Statistics (Y) subjects by 10 students.
Observation Math Score (X) Stat Score (Y) X² Y² XY
1 5 2 25 4 10
2 8 7 64 49 56
3 10 8 100 64 80
4 12 9 144 81 108
Sum