Materials in
STAT 20053
STATISTICAL ANALYSIS
WITH SOFTWARE
APPLICATION
For the sole noncommercial use of the
Faculty of the Department of Mathematics and Statistics
Polytechnic University of the Philippines
2020
Contributors:
Elizon, Katrina
Baccay, Edcon
Bautista, Lincoln A.
Aranas, Peter John
Usona, Laurence P.
Republic of the Philippines
POLYTECHNIC UNIVERSITY OF THE PHILIPPINES
COLLEGE OF SCIENCE
Department of Mathematics and Statistics
The final grade will be based on the weighted average of the student’s
scores on each test assigned at the end of each lesson. The final SIS
grade equivalent will be based on the following table according to the
approved University Student Handbook.
Prepared by:
Katrina D. Elizon
Faculty Member, Department of Mathematics and Statistics
College of Science
Contents
INTRODUCTION TO THE STATISTICAL
Statistics plays a major role in many aspects of our lives. It is used in sports, for example, to help a general manager decide which player might be the best fit for a team. It is used in politics to help candidates understand how the public feels about
• Determine the level of measurement of a variable.
Let’s break this definition into four parts. The first part states that statistics involves the collection of information. The second refers to the organization and summarization of information. The third states that the information is analyzed to draw conclusions or answer specific questions. The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
• Statistics is important because it enables people to make decisions based on empirical evidence.
• Statistics provides us with tools needed to convert massive data into pertinent information that can be used in decision making.
• Statistics can provide us information that we can use to make sensible decisions.
What information is referred to in the definition?
The information referred to in the definition is the data. According to the Merriam-Webster dictionary, data are “factual information used as a basis for reasoning, discussion, or calculation”.
Data can be numerical, as in height, or nonnumerical, as in gender. In either case, data describe characteristics of an individual.
Field of Statistics
A. Mathematical Statistics - The study and development of statistical theory and methods in the abstract.
B. Applied Statistics - The application of statistical methods to solve real problems involving randomly generated data and the development of new statistical methodology motivated by real problems. Example branches of Applied Statistics: psychometrics, econometrics, and biostatistics.
Limitation of Statistics
1. Statistics is not suitable to the study of qualitative phenomena.
2. Statistics does not study individuals.
4. Statistics tables may be misused.
5. Statistics is only one of the methods of studying a problem.
Definitions:
• Universe is the set of all entities under study.
• A Population is the total or entire group of individuals or observations from which information is desired by a researcher. Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.
• An individual is a person or object that is a member of the population being studied.
• A statistic is a numerical summary of a sample.
• A sample is a subset of the population.
• Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
• Inferential statistics uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result.
• A parameter is a numerical summary of a population.
Example: Consider the Scenario.
You are walking down the street and notice that a person walking in front of you drops PHP100. Nobody seems to notice the PHP100 except you. Since you could keep the money without anyone knowing, would you keep the money or return it to the owner?
In the PHP100 study presented, the population is all the students at the school. Each student is an individual. The sample is the 50 students selected to participate in the study.
Suppose 39 of the 50 students stated that they would return the money to the owner. We could present this result by saying that the percent of students in the survey who would return the money to the owner is 78%. This is an example of a descriptive statistic because it describes the results of the sample without making any general conclusions about the population. So 78% is a statistic because it is a numerical summary based on a sample. Descriptive statistics make it easier to get an overview of what the data are telling us.
If we extend the results of our sample to the population, we are performing inferential statistics. The generalization contains uncertainty because a sample cannot tell us everything about a population. Therefore, inferential statistics includes a level of confidence in the results. So rather than saying that 78% of all students would return the money, we might say that we are 95% confident that between 74% and 82% of all students would return the money. Notice how this inferential statement includes a level of confidence (measure of reliability) in our results. It also includes a range of values to
2. Collect the information needed to answer the questions.
Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.
Example:
A research objective is presented. For each research objective, identify the population and sample in the study.
1. The Philippine Mental Health Association contacts 1,028 teenagers who are 13 to 17 years of age and live in Antipolo City and asks whether or not they have been prescribed medications for any mental disorders, such as depression or anxiety.
Population: Teenagers 13 to 17 years of age who live in Antipolo City
Sample: 1,028 teenagers 13 to 17 years of age who live in Antipolo City
2. A farmer wanted to learn about the weight of his soybean crop. He randomly sampled 100 plants and weighed the soybeans on each plant.
Population: Entire soybean crop
Sample: 100 selected soybean plants
3. Organize and summarize the information.
Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.
4. Draw conclusion from the information.
In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.
Take Note!
If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.
Example:
For each of the following statements, decide whether it belongs to the field of descriptive statistics or inferential statistics.
1. A badminton player wants to know his average score for the past 10 games. (Descriptive Statistics)
2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries. (Inferential Statistics)
3. Janine wants to determine the variability of her six exam scores in Algebra. (Descriptive Statistics)
4. A shipping company wishes to estimate the number of passengers traveling via their ships next year using their data on the number of passengers in the past three years. (Inferential Statistics)
5. A politician wants to determine the total number of votes his rival obtained in the past election based on his copies of the tally sheet of electoral returns. (Descriptive Statistics)
DISTINCTION BETWEEN QUALITATIVE AND QUANTITATIVE VARIABLES
Variables are the characteristics of the individuals within the population. For example, recently my mother and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My mom noted that the tomatoes had different weights even though they came from the same plant. She discovered that variables such as weight may vary.
If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability so that we can learn to grow plants that yield the best tomatoes.
It is helpful to divide variables into different types, as different statistical methods are applicable to each. The main division is into qualitative (or categorical) and quantitative (or numerical) variables.
Variables can be classified into two groups:
1. Qualitative variables (Categorical) are variables that yield categorical responses. A value is a word or a code that represents a class or category.
2. Quantitative variables (Numeric) take on numerical values representing an amount or quantity.
Example:
Determine whether the following variables are qualitative or quantitative.
possible values. If you count to get the value of a quantitative variable, it is discrete.
2. A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. If you measure to get the value of a quantitative variable, it is continuous.
Example:
Determine whether the following quantitative variables are discrete or continuous.
1. The number of heads obtained after flipping a coin five times. (Discrete)
2. The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M. and 1:00 P.M. (Discrete)
- Food Preferences
- Stage of Disease
- Social Economic Class (First, Middle, Lower)
- Severity of Pain
Both interval and ratio data involve measurement. Most data analysis techniques that apply to ratio data also apply to interval data. Therefore, in most practical aspects, these types of data (interval and ratio) are grouped under metric data. In some other instances, these types of data are also known as numerical discrete and numerical continuous.
Example:
Categorize each of the following as nominal, ordinal, interval or ratio measurement.
1. Ranking of college athletic teams. (Ordinal)
2. Employee number. (Nominal)
3. Number of vehicles registered. (Ratio)
4. Brands of soft drinks. (Nominal)
5. Number of car passers along C5 on a given day. (Ratio)
6. Zip code (Nominal)
7. Degree of pain (Ordinal)
ACTIVITIES/ASSESSMENTS:
Read each item carefully. Write the answer on the yellow paper. Answers Only.
I. A research objective is presented. For each, identify the (A) population and (B) sample in the study.
1. A polling organization contacts 2141 male university graduates who have a white-collar job and asks whether or not they had received a raise at work during the past 4 months.
A. ______________________________
B. ______________________________
2. Every year the PSA releases the Current Population Report based on a survey of 50,000 households. The goal of this report is to learn the demographic characteristics, such as income, of all households within the Philippines.
A. ______________________________
B. ______________________________
3. Researchers want to determine whether or not higher folate intake is associated with a lower risk of hypertension (high blood pressure) in women (27 to 44 years of age). To make this determination, they look at 7373 cases of hypertension in these women and find that those who consume at least 1000 micrograms per day of total folate had a decreased risk of hypertension compared with those who consume less than 200.
A. ______________________________
B. ______________________________
II. Indicate whether the following statements require the use of descriptive or inferential statistics.
______________1. A teacher wants to know the attitudes of all students towards abortion.
______________2. A market analyst of a sales firm draws a chart showing the sales figures of a given product for the period 2006-2007.
______________3. A forecaster predicts the results of an election using the number of votes cast in 15 out of 25 barangays.
______________4. Men are better in math than women.
_____________5. Forty percent of the employees of an organization were recorded tardy for at least 15 working days.
______________10. Brands of soft drinks
______________11. Socioeconomic status
7. Write special instructions for interviewers or respondents.
9. Always test your questions before taking the survey. (Pre-test)
An open-ended question is a type of question that does not include response categories. The respondent is not given any possible answers to choose from. This type of question is usually appropriate for collecting subjective data. It permits free responses that should be recorded in the respondent’s own words.
Question wording and question order have a large effect on the responses obtained.
Two surveys were taken in late 1993/early 1994 about Elvis Presley.
One survey asked: “In the past few years, there have been a lot of rumors and stories about whether Elvis Presley is really dead. How do you feel about this? Do you think there is any possibility that these rumors are true and that Elvis Presley is still alive, or don’t you think so?”
Second survey asked: “A recent television show examined various theories about Elvis Presley’s death. Do you think it is possible that Elvis is alive or not?”
8% of the respondents to the first question said it is possible that Elvis is still alive and 16% of respondents to the second question said it is possible that Elvis is still alive.
3. A focus group is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.
4. Experiment is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.
Bear in mind that the experimental method has several limitations that you should be aware of.
- Unrealistic Controlled Environments
- Inability to Control for All Variables
5. Observation is a technique that involves systematically selecting, watching and recording behaviors of people or other phenomena and aspects of the setting in which they occur, for the purpose of getting (gaining) specified information. It includes all methods from simple visual observations to the use of high level machines and measurements, sophisticated equipment or facilities such as:
- Radiographic
- biochemical
- X-ray machines
- Microscope
- Clinical examinations
- Microbiological examinations
The secondary data can be collected by the following five methods:
1. Published report on newspaper and periodicals.
2. Financial Data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of the government departments.
5. Information from official publications.
Take Note!
• Always investigate the validity and reliability of the data by examining the collection method employed by your source.
The sample size is typically denoted by n and it is always a positive integer. No exact sample size can be mentioned here and it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.
Take Note!
- Representativeness, not size, is the more important consideration.
- Use no less than 30 subjects if possible.
- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).
SAMPLE SIZE
Three criteria need to be specified to determine the appropriate sample size:
1. Level of Precision
Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.
3. Degree of Variability
$$n \ge \left(\frac{Z\sigma}{e}\right)^2$$
where:
Z is the z-score corresponding to the level of confidence.
• Estimating Proportion (Infinite Population)
The sample size required to obtain a confidence interval for p with specified margin of error e is given by
$$n \ge \left(\frac{Z}{e}\right)^2 p(1 - p)$$
The conservative formula using the strong law of large numbers:
$$n \ge \frac{1}{4}\left(\frac{Z}{e}\right)^2 \approx 385$$
Where:
Confidence level is 95%.
$$n \ge \frac{N}{1 + Ne^2}$$
Where:
Example:
The researcher needs to survey 286 BS Stat students.
• Finite Population Correction
If the population is small then the sample size can be reduced slightly
$$n \ge \frac{n_0}{1 + \frac{n_0 - 1}{N}}$$
- It is important that the individuals included in a sample represent a cross section of individuals in the population.
- If a sample is not representative, it is biased. You cannot generalize to the population from your statistical data.
Some definitions are needed to make the notion of a good sample more precise.
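The sample-size formulas of this section can be checked numerically. The sketch below is only an illustration: the function names are made up for this example, and the margin of error e = 0.05, the population size N = 1000, and the values sigma = 10 and e = 2 in the last call are assumed values that the module does not state. The z-score 1.96 is the usual value for 95% confidence.

```python
import math

def sample_size_mean(z, sigma, e):
    """n >= (z * sigma / e)^2, rounded up to a whole subject."""
    return math.ceil((z * sigma / e) ** 2)

def sample_size_proportion(z, e, p=0.5):
    """n >= (z / e)^2 * p * (1 - p); p = 0.5 gives the conservative 1/4 bound."""
    return math.ceil((z / e) ** 2 * p * (1 - p))

def slovin(N, e):
    """n >= N / (1 + N * e^2) for a population of size N."""
    return math.ceil(N / (1 + N * e ** 2))

def finite_population_correction(n0, N):
    """n >= n0 / (1 + (n0 - 1) / N): shrink n0 when the population is small."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

z = 1.96   # z-score for a 95% confidence level
e = 0.05   # assumed margin of error (not given in the module)
N = 1000   # assumed population size (not given in the module)

n0 = sample_size_proportion(z, e)
print(n0)                                    # 385
print(slovin(N, e))                          # 286
print(finite_population_correction(n0, N))   # 279
print(sample_size_mean(z, sigma=10, e=2))    # 97
```

Under these assumed inputs the conservative formula reproduces the value 385 quoted above, and the N / (1 + Ne^2) formula gives 286, which matches the figure in the example only if N = 1000 and e = 0.05 were indeed the values used there.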
Definitions:
• Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.
• Target population - The complete collection of observations we want to study.
• Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.
• Sample - A subset of a population.
• Sampling unit - A unit that can be selected for a sample. We may want to study individuals, but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.
• Sampling frame - A list, map, or other specification of sampling units in the population from which a sample may be selected. For a survey using in-person interviews, the sampling frame might be a list of all street addresses.
• Sampling technique/Sampling Strategies - It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.
• Sampling Bias - This involves problems in your sampling, which reveal that your sample is not representative of your population.
The following examples indicate some ways in which selection bias can occur:
- Deliberately or purposively selecting a “representative” sample.
- Misspecifying the target population.
- Failing to include all of the target population in the sampling frame, called undercoverage.
- Including population units in the sampling frame that are not in the target population, called overcoverage.
- Having multiplicity of listings in the sampling frame.
- Substituting a convenient member of a population for a designated member who is not readily available.
- Failing to obtain responses from all of the chosen sample. (Nonresponse)
- Allowing the sample to consist entirely of volunteers.
Advantage of Sampling Over Complete Enumeration
- Less Labor
- Reduced Cost
- Greater Speed
- Greater Scope
- Greater Efficiency and Accuracy
- Convenience
- Ethical Considerations
Two Types of Samples
1. Probability Sample
- Samples are obtained using some objective chance mechanism, thus involving randomization.
- They require the use of a complete listing of the elements of the universe called the sampling frame.
- The probabilities of selection are known.
- Most basic method of drawing a probability sample.
- Assigns equal probabilities of selection to each possible sample.
Sampling Procedure
$$k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}}$$
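The ratio k = N / n is the sampling interval used in systematic sampling; the module does not name the method at this point, so reading it that way is an assumption. A minimal sketch, with a hypothetical function name and an illustrative frame of 500 numbered units:

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every k-th unit after a random start, where k = N // n."""
    N = len(population)
    k = N // n                      # sampling interval, rounded down
    rng = random.Random(seed)       # seeded generator for a repeatable draw
    start = rng.randrange(k)        # random start among the first k units
    return [population[start + j * k] for j in range(n)]

units = list(range(1, 501))         # illustrative frame: units numbered 1..500
sample = systematic_sample(units, 50, seed=1)
print(len(sample))                  # 50
print(sample[1] - sample[0])        # 10, the interval k = 500 / 50
```

Only the starting unit is random; every later unit is fixed by the interval, which is why the order of the sampling frame matters for this design.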
Given:
$$n_1 = \left(\frac{n}{N}\right) N_1 = \left(\frac{50}{500}\right) 200 = 20$$
$$n_2 = \left(\frac{n}{N}\right) N_2 = \left(\frac{50}{500}\right) 300 = 30$$
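The allocation above gives each stratum a share of the sample equal to its share of the population. A short sketch of that computation, using the figures shown (strata of 200 and 300, total sample of 50); the function name and stratum labels are illustrative:

```python
def proportional_allocation(strata_sizes, n):
    """Return n_h = (n / N) * N_h for every stratum h."""
    N = sum(strata_sizes.values())          # total population size
    return {name: n * size / N for name, size in strata_sizes.items()}

allocation = proportional_allocation({"stratum 1": 200, "stratum 2": 300}, n=50)
print(allocation)   # {'stratum 1': 20.0, 'stratum 2': 30.0}
```

When the shares are not whole numbers they have to be rounded, and the rounded values should be checked so that they still add up to n.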
Example:
Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households distantly apart.
1. Organize the sampling process into stages where the unit of analysis is systematically grouped.
Example:
Use probability sampling if the main objective of the sample survey is making inferences about the characteristics of the population under study.
• Purposive Sampling - It is based on certain criteria laid down by the researcher. People who satisfy the criteria are interviewed. It is used to determine the target population of those who will be taken for the study.
• Judgement Sampling - selects sample in accordance with an expert’s judgment.
Cases wherein Non-Probability Sampling is Useful
- Only few are willing to be interviewed
- Extreme difficulties in locating or identifying subjects
ACTIVITIES/ASSESSMENTS:
I. Determine if the source would be a primary or a secondary source.
______________1. Government Records
______________2. Dictionary
______________3. Artifact
REFERENCES:
https://data36.com/statistical-bias-types-explained/
MODULE 3: DESCRIPTIVE STATISTICS
OBJECTIVES:
After successful completion of this module, you should be
able to:
✦ Distinguish the three main forms of data presentation.
✦ Know the different parts of the table.
✦ Choose appropriate diagrams/graphs to present a given set of
data.
✦ Organize qualitative and quantitative data in tables.
✦ Compute measures of central tendency, measures of variation and
measures of relative position of grouped and ungrouped data.
✦ Describe the shape of a distribution.
✦ Identify regions under the normal curve corresponding to
different standard normal values.
✦ Compute probabilities using the standard normal table and Excel.
Data Presentation
Data are usually collected in a raw format and thus
the inherent information is difficult to understand.
Therefore, raw data need to be summarized,
processed, and analyzed to usefully derive
information from them. However, no matter how well
manipulated, the information derived from the raw
data should be presented in an effective format,
otherwise, it would be a great loss for both authors
and readers. Planning how the data will be presented
is essential before appropriately processing raw data.
Presentation of Data
Presentation of data refers to an exhibition
or putting up data in an attractive and useful
manner such that it can be easily interpreted.
The three main forms of presentation of data
are:
Textual Presentation
Tabular Presentation
Graphical Presentation
Textual Presentation
• All the data is presented in the form of text,
phrases, or paragraphs.
• It involves enumerating important
characteristics, emphasizing significant figures
and identifying important features of data.
• Text is the principal method for explaining
findings, outlining trends, and providing
contextual information.
Example:
A researcher is asked to present the performance of a section in
the statistics test. The following are the test scores:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46
The data presented in textual form would be like this:
In the statistics class of 40 students, 3 obtained the perfect
score of 50. Sixteen students got a score of 40 and above,
while only 3 got 19 and below. Generally, the students
performed well in the test with 23 or 57.5% getting a passing
score of 38 and above.
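Figures quoted in a textual presentation can be verified directly from the raw scores. The short check below uses the 40 test scores listed in this example; the variable names are illustrative.

```python
scores = [
    34, 42, 20, 50, 17, 9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]

n = len(scores)                                  # class size
perfect = sum(s == 50 for s in scores)           # perfect scores
forty_up = sum(s >= 40 for s in scores)          # scores of 40 and above
nineteen_down = sum(s <= 19 for s in scores)     # scores of 19 and below
passing = sum(s >= 38 for s in scores)           # passing scores (38 and above)
passing_pct = 100 * passing / n                  # passing rate in percent

print(n, perfect, forty_up, nineteen_down)       # 40 3 16 3
print(passing, passing_pct)                      # 23 57.5
```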
Remember!
✦ Keep your paragraphs simple and short.
Advantage of Tabular
Presentation
✦ More information may be presented.
✦ Exact values can be read from a table to
retain precision.
✦ Flexibility is maintained without
distortion of data.
✦ Less work and less cost are required in
the preparation.
Preparing Tables
The making of a compact table itself is an art. This should
contain all the information needed within the smallest possible
space. What the purpose of tabulation is and how the tabulated
information is to be used are the main points to be kept in mind
while preparing a statistical table. An ideal table should
consist of the following main parts:
A. Title: The title must tell as simply as possible what is in the
table. It should answer the questions:
✦ Who? White females with breast cancer, black males with
lung cancer.
✦ What are the data? Counts, percentage distributions, rates.
https://byjus.com/commerce/tabular-presentation-of-data/
Solution:
To answer this question we need to construct a frequency
distribution to determine how many female and male
respondents participated in the study.
Procedure in Constructing
Frequency Table
✦ If the data is in the form of qualitative data
To construct the frequency distribution using
Excel, use the command:
=frequency(data_array,bins_array)
Then press Ctrl + Shift + Enter to enter it as an array formula:
{=frequency(data_array,bins_array)}
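For readers who want to see what the array formula computes, the sketch below mimics the counting rule of Excel's FREQUENCY in Python: each bin counts the values greater than the previous bin limit and less than or equal to its own limit, and one extra cell counts everything above the last limit. The function name, data, and bin limits are illustrative and do not come from the module.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin; `bins` holds ascending upper limits."""
    counts = [0] * (len(bins) + 1)      # one extra cell for values above the last limit
    for value in data:
        # bisect_left returns the index of the first limit that is >= value
        counts[bisect_left(bins, value)] += 1
    return counts

data = [12, 15, 21, 25, 25, 30, 38, 41]  # illustrative values
bins = [19, 29, 39]                      # upper limits: <=19, 20-29, 30-39, then >39
print(frequency(data, bins))             # [2, 3, 2, 1]
```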
Procedure in Constructing
Frequency Table
✦If the data is in the form of quantitative data
Steps
1. Set an interval or range for your data. It is
needed for the “BIN RANGE”.
2. Click “DATA” on the menu bar and Click
“DATA ANALYSIS” on the tool bar
3. The dialog box “DATA ANALYSIS” will appear
and choose “HISTOGRAM” on the dialog box
then click OK.
4. Highlight your data for the “INPUT RANGE”.
5. Highlight the intervals set in Step 1 for the “BIN RANGE”.
6. Click the box of “LABELS IN FIRST ROW”
then click “OK”.
7. The result will appear on a new worksheet of
the Excel file. Compute the percentages and the total.
Answer:
✦ Useless Information – Don’t show decimals if they are not
needed.
✦ Poor Alignment – Make sure alignment makes sense.
• Don’t center numbers, always right justify – try to align
decimal points.
• Consider the appropriate placement of row titles.
✦ Difficult to Read – Use commas when the number exceeds
a thousand.
Graphical Presentation
✦ A graph is a very effective visual tool as it displays data at
a glance, facilitates comparison, and can reveal trends and
relationships within the data such as changes over time,
and correlation or relative share of a whole.
✦ It is considered an important medium of communication
because we are able to create a pictorial representation of
the numerical figures.
✦ Suited when we need to show the results of the study to
nonprofessionals or to people who dislike numbers and
lengthy texts.
Bar Graph
✦ It is constructed by labeling each category
of data on either the horizontal or vertical
axis and the frequency or relative frequency
of the category on the other axis. Rectangles
of equal width are drawn for each category.
The height of each rectangle represents the
category’s frequency or relative frequency.
✦ It is used to organize discrete data.
Remember!
• Bar graphs may also be drawn with horizontal
bars. Horizontal bars are preferable when
category names are lengthy.
• In bar graphs, the order of the categories does
not usually matter. However, bar graphs that
have categories arranged in decreasing order
of frequency help prioritize categories for
decision-making purposes in areas such as
quality control, human resources, and
marketing.
Histogram
✦ It is constructed by drawing rectangles for each class of
data. The height of each rectangle is the frequency or
relative frequency of the class. The width of each rectangle
is the same and the rectangles touch each other.
✦ It is a graph used to present quantitative data and is similar to
the bar graph.
✦ It is used to organize continuous data.
Line Graph
✦ A graph that shows information that is
connected in some way (such as change over
time)
✦ Line segments are then drawn connecting the
points. It is used to organize continuous data.
✦ Very useful in identifying trends in the data
over time.
✦ It is rigidly defined.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where:
x_i = data values
n = no. of sample observations
$$\bar{x} = \frac{\sum_{i=1}^{r} f x_i}{n}$$
where:
x_i = data values
f = frequency
n = no. of sample observations
Population Mean
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where:
x_i = data values
N = no. of observations
$$\mu = \frac{\sum_{i=1}^{r} f x_i}{N}$$
where:
x_i = data values
f = frequency
N = no. of observations
1. Arrange the data from lowest to highest (or highest to lowest).
2. For an odd number of data, the median of a data set is the “middle observation”. When the number of data is even, the median is the “average of the two middle scores”.
$$\tilde{x} = LB + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i$$
where:
LB = lower boundary of the median class
i = class width
n = no. of observations
<cf = less than cumulative frequency of the class preceding the median class
f = frequency of the median class
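For ungrouped data, the two steps above (sort, then take the middle observation or the average of the two middle scores) can be sketched as follows. The function name and the sample values are illustrative only.

```python
def median_ungrouped(values):
    """Median of raw data: middle value, or mean of the two middle values."""
    ordered = sorted(values)            # step 1: arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd count: the middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of the two middle scores

print(median_ungrouped([7, 3, 9]))      # 7
print(median_ungrouped([7, 3, 9, 5]))   # 6.0
```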
Measures of Central Tendency:
MODE
• It is the most frequently occurring value in a list of data.
• It is sometimes called nominal average.
• It is an appropriate measure of average for data using the
nominal scale of measurement.
• It is the only measure of central tendency used in both
quantitative and qualitative data.
Advantage of Mode
✦ The mode is easy to understand.
✦ Like the median, it is not greatly affected by extreme
values.
✦ Like the median, it can be computed even when the
frequency distribution contains “open-ended” intervals.
1. Obtain a frequency distribution of the distinct values of the data.
2. The mode is the most frequently occurring data (if there is one).
$$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i$$
where:
LB = lower boundary of the modal class
i = class width
d_1 = difference between the frequency of the modal class and the class preceding it
d_2 = difference between the frequency of the modal class and the class following it
Remember!
• Whenever you hear the word average, be aware that
the word may not always be referring to the mean.
One average could be used to support one position,
while another average could be used to support a
different position.
• Mode is not always present in the data sets unlike
mean and median.
Notice how the mean of the second data set has been
influenced by the presence of an unusual case/outlier in the
data set. If we were to say the mean is equal to 132.5 for the
second data set and it represents a typical case, this will not
make much sense because the majority of data values are less
than 120. Therefore, the mean should not be used when
unusual, or outlying, data values are present in the data set, as
the mean tends to be extremely sensitive to the unusual
values. Rather, the median should be reported in this case.
Solution:
To compute the mean of grouped data, first you need to fill out this table.

Class Interval   Frequency (f)   x    fx
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =                  $\sum_{i=1}^{7} f x_i =$

x is the midpoint of every class interval. To compute this:
$x = \frac{LC + UP}{2}$
Ex:
$x = \frac{55 + 59}{2} = 57$
$x = \frac{50 + 54}{2} = 52$
Solution:
Class Interval   Frequency (f)   x    fx
55 - 59          3               57   171
50 - 54          6               52   312
45 - 49          7               47   329
40 - 44          9               42   378
35 - 39          6               37   222
30 - 34          4               32   128
25 - 29          5               27   135
Total            n = 40               $\sum_{i=1}^{7} f x_i = 1{,}675$

$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1{,}675}{40} = 41.88$
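The table arithmetic for the grouped mean can be cross-checked with a short script. This is only a minimal sketch: the function name `grouped_mean` and the (lower limit, upper limit, frequency) tuple layout are illustrative choices, not part of the original module.

```python
# Class limits and frequencies from the grouped-mean example.
# Each row is (lower class limit, upper class limit, frequency).
classes = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
           (35, 39, 6), (30, 34, 4), (25, 29, 5)]


def grouped_mean(rows):
    """Return sum(f * x) / n, where x is the midpoint of each class."""
    n = sum(f for _, _, f in rows)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in rows)
    return total_fx / n


mean = grouped_mean(classes)
print(f"{mean:.2f}")  # 41.88
```

The unrounded value is 41.875; the slide reports it to two decimal places.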
Solution:
First, compute $\frac{n}{2}$; it will help us determine the median class and the < cf.
$\frac{n}{2} = \frac{40}{2} = 20$

Class Interval   f    LB     < cf
55 - 59          3    54.5   40
50 - 54          6    49.5   37
45 - 49          7    44.5   31
40 - 44          9    39.5   24
35 - 39          6    34.5   15
30 - 34          4    29.5   9
25 - 29          5    24.5   5
Total            n = 40

The median class is the class containing the 20th item. Hence, the median class is 40 - 44.
$\tilde{x} = LB + \left( \frac{\frac{n}{2} - {<}cf}{f} \right) i$
$\tilde{x} = 39.5 + \frac{(20 - 15)5}{9} = 42.28$
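The search for the median class and the interpolation step can be sketched in Python as follows. The function name `grouped_median` is hypothetical, and the sketch assumes integer class limits with a gap of 1 between classes, so that LB = lower limit − 0.5 and i = upper limit − lower limit + 1.

```python
# Rows are (lower limit, upper limit, frequency), listed from the lowest class up.
classes = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
           (45, 49, 7), (50, 54, 6), (55, 59, 3)]


def grouped_median(rows):
    """Median of grouped data: LB + ((n/2 - <cf) / f) * i."""
    n = sum(f for _, _, f in rows)
    target = n / 2
    cf_before = 0  # "<cf": cumulative frequency before the current class
    for lo, hi, f in rows:
        if cf_before + f >= target:      # this class contains the (n/2)th item
            lb = lo - 0.5                # assumption: unit gap between classes
            width = hi - lo + 1
            return lb + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("frequency table is empty")


print(f"{grouped_median(classes):.2f}")  # 42.28
```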
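The modal class is 40 - 44, since it has the highest frequency (9).
d1 = 9 − 6 = 3
d2 = 9 − 7 = 2
$\hat{x} = LB + \left( \frac{d_1}{d_1 + d_2} \right) i$
$\hat{x} = 39.5 + \left( \frac{3}{3 + 2} \right) 5 = 42.5$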
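A hedged sketch of the grouped-mode computation is given below. The name `grouped_mode` is illustrative; treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, and it does not handle ties for the highest frequency or the case d1 + d2 = 0.

```python
# Rows are (lower limit, upper limit, frequency), listed from the lowest class up.
classes = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
           (45, 49, 7), (50, 54, 6), (55, 59, 3)]


def grouped_mode(rows):
    """Mode of grouped data: LB + (d1 / (d1 + d2)) * i."""
    freqs = [f for _, _, f in rows]
    m = freqs.index(max(freqs))                      # first class with the highest frequency
    lo, hi, f = rows[m]
    f_prev = freqs[m - 1] if m > 0 else 0            # assumption: no neighbour means 0
    f_next = freqs[m + 1] if m < len(rows) - 1 else 0
    d1 = f - f_prev                                  # modal class minus the class before it
    d2 = f - f_next                                  # modal class minus the class after it
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)


print(grouped_mode(classes))  # 42.5
```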
Quartiles - split the ordered data into four quarters.
Deciles - split the ordered data into ten equal parts.
Percentiles - split the ordered data into 100 equal parts.
Formula for Quartile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$Q_{class} = \frac{nk}{4} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the quartile. If not, use interpolation.

✦ For Grouped Data
$Q_k = LB + \left( \frac{\frac{nk}{4} - {<}cf}{f} \right) i$
where:
LB = lower boundary of the quartile class
i = class width
n = no. of observations
k = quartile position
< cf = less than the cumulative frequency of the class preceding the quartile class
f = frequency of the quartile class
Formula for Decile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$D_{class} = \frac{nk}{10} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the decile. If not, use interpolation.

✦ For Grouped Data
$D_k = LB + \left( \frac{\frac{nk}{10} - {<}cf}{f} \right) i$
where:
LB = lower boundary of the decile class
i = class width
n = no. of observations
k = decile position
< cf = less than the cumulative frequency of the class preceding the decile class
f = frequency of the decile class
Formula for Percentile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$P_{class} = \frac{nk}{100} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the percentile. If not, use interpolation.

✦ For Grouped Data
$P_k = LB + \left( \frac{\frac{nk}{100} - {<}cf}{f} \right) i$
where:
LB = lower boundary of the percentile class
i = class width
n = no. of observations
k = percentile position
< cf = less than the cumulative frequency of the class preceding the percentile class
f = frequency of the percentile class
Example 1:
The data given below is the total number of hours lost due to tardiness and absences of employees in a company in a given year. Find Q3, D4 and P55.

Month       Hours Lost (x)
January     55
February    23
March       37
April       37
May         48
June        42
July        27
August      20
September   30
October     32
November    24
December    40
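$Q_{class} = \frac{(12)(3)}{4} + 0.5 = 9.5$
2. Use interpolation since the computed Qclass is not an integer.
Value:      20  23  24  27  30  32  37  37  40  42  48  55
Position:    1   2   3   4   5   6   7   8   9  10  11  12
Q3 = 40 + 0.5(42 − 40)
= 41
For D4, $D_{class} = \frac{(12)(4)}{10} + 0.5 = 5.3$, so interpolate between the 5th and 6th values:
D4 = 30 + 0.3(32 − 30)
= 30.6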
Solution: To compute P55 of ungrouped data:
$P_{class} = \frac{(12)(55)}{100} + 0.5 = 7.1$
2. Use interpolation since the computed Pclass is not an integer.
Value:      20  23  24  27  30  32  37  37  40  42  48  55
Position:    1   2   3   4   5   6   7   8   9  10  11  12
P55 = 37 + 0.1(37 − 37)
= 37
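The positioning-point rule for ungrouped data (position = nk/parts + 0.5, then linear interpolation between neighbouring ordered values) can be sketched as below. The function name `quantile_by_position` and the clamping at the two ends of the data are assumptions of this sketch; statistical packages often use slightly different quantile conventions, so their answers may not match this rule exactly.

```python
hours_lost = [55, 23, 37, 37, 48, 42, 27, 20, 30, 32, 24, 40]


def quantile_by_position(data, k, parts):
    """k-th quantile when the ordered data are split into `parts` parts.

    Uses the positioning point n*k/parts + 0.5 (1-based) and interpolates
    linearly when the point is not an integer.
    """
    xs = sorted(data)
    n = len(xs)
    pos = n * k / parts + 0.5
    whole = int(pos)           # position of the lower neighbour (1-based)
    frac = pos - whole
    if whole < 1:              # assumption: clamp below the first value
        return xs[0]
    if whole >= n:             # assumption: clamp above the last value
        return xs[-1]
    lower, upper = xs[whole - 1], xs[whole]
    return lower + frac * (upper - lower)


print(round(quantile_by_position(hours_lost, 3, 4), 2))     # Q3  -> 41.0
print(round(quantile_by_position(hours_lost, 4, 10), 2))    # D4  -> 30.6
print(round(quantile_by_position(hours_lost, 55, 100), 2))  # P55 -> 37.0
```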
Example 2:
The data given below is the age of the residents in Barangay 634, Sta. Mesa, Manila. Compute Q1, D7, and P10.

Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:
To compute Q1, D7, and P10 of grouped data, first you need to fill out this table.

Class Interval   f    LB     < cf
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =

To compute the lower boundary, always subtract 0.5 from the lower class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5
45 − 0.5 = 44.5
Solution:
First, compute $\frac{nk}{4}$; it will help us determine the quartile class and the < cf.
$\frac{nk}{4} = \frac{(40)(1)}{4} = 10$

Class Interval   f    LB     < cf
55 - 59          3    54.5   40
50 - 54          6    49.5   37
45 - 49          7    44.5   31
40 - 44          9    39.5   24
35 - 39          6    34.5   15
30 - 34          4    29.5   9
25 - 29          5    24.5   5
Total            n = 40

The quartile class is the class containing the 10th item. Hence, the quartile class is 35 - 39.
$Q_k = LB + \left( \frac{\frac{nk}{4} - {<}cf}{f} \right) i$
$Q_1 = 34.5 + \frac{(10 - 9)5}{6} = 35.33$
Solution:
First, compute $\frac{nk}{10}$; it will help us determine the decile class and the < cf.
$\frac{nk}{10} = \frac{(40)(7)}{10} = 28$

Class Interval   f    LB     < cf
55 - 59          3    54.5   40
50 - 54          6    49.5   37
45 - 49          7    44.5   31
40 - 44          9    39.5   24
35 - 39          6    34.5   15
30 - 34          4    29.5   9
25 - 29          5    24.5   5
Total            n = 40

The decile class is the class containing the 28th item. Hence, the decile class is 45 - 49.
$D_k = LB + \left( \frac{\frac{nk}{10} - {<}cf}{f} \right) i$
$D_7 = 44.5 + \frac{(28 - 24)5}{7} = 47.36$
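For P10, $\frac{nk}{100} = \frac{(40)(10)}{100} = 4$. The percentile class is the class containing the 4th item, which is 25 - 29, and its < cf is 0.
$P_k = LB + \left( \frac{\frac{nk}{100} - {<}cf}{f} \right) i$
$P_{10} = 24.5 + \frac{(4 - 0)5}{5} = 28.5$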
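Because the quartile, decile and percentile formulas for grouped data differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below uses a hypothetical name, `grouped_quantile`, and assumes integer class limits with a gap of 1 between classes. For P10 the target position is nk/100 = (40)(10)/100 = 4, which falls in the class 25 - 29.

```python
# Rows are (lower limit, upper limit, frequency), listed from the lowest class up.
ages = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
        (45, 49, 7), (50, 54, 6), (55, 59, 3)]


def grouped_quantile(rows, k, parts):
    """LB + ((n*k/parts - <cf) / f) * i for the class holding the target item."""
    n = sum(f for _, _, f in rows)
    target = n * k / parts
    cf_before = 0                       # "<cf" of the class being examined
    for lo, hi, f in rows:
        if cf_before + f >= target:     # target item lies in this class
            return (lo - 0.5) + (target - cf_before) / f * (hi - lo + 1)
        cf_before += f
    raise ValueError("k must not exceed parts")


print(f"{grouped_quantile(ages, 1, 4):.2f}")     # Q1  -> 35.33
print(f"{grouped_quantile(ages, 7, 10):.2f}")    # D7  -> 47.36
print(f"{grouped_quantile(ages, 10, 100):.2f}")  # P10 -> 28.50
```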
Example 3:
The ages of the townspeople in a certain community are as follows. Compute Q2, D5, and P50.

Class Interval   Frequency
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3
Solution:
To compute Q2, D5, and P50 of grouped data, first you need to fill out this table.

Class Interval   f     LB     < cf
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3
Total            n =

To compute the lower boundary, always subtract 0.5 from the lower class limit (LC).
Ex:
18 − 0.5 = 17.5
25 − 0.5 = 24.5
32 − 0.5 = 31.5
Solution:
First, compute $\frac{nk}{4}$; it will help us determine the quartile class and the < cf.
$\frac{nk}{4} = \frac{(160)(2)}{4} = 80$

Class Interval   f     LB     < cf
18 - 24          28    17.5   28
25 - 31          54    24.5   82
32 - 38          38    31.5   120
39 - 45          20    38.5   140
46 - 52          17    45.5   157
53 - 59          3     52.5   160
Total            n = 160

The quartile class is the class containing the 80th item. Hence, the quartile class is 25 - 31.
$Q_k = LB + \left( \frac{\frac{nk}{4} - {<}cf}{f} \right) i$
$Q_2 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
Solution:
First, compute $\frac{nk}{10}$; it will help us determine the decile class and the < cf.
$\frac{nk}{10} = \frac{(160)(5)}{10} = 80$

Class Interval   f     LB     < cf
18 - 24          28    17.5   28
25 - 31          54    24.5   82
32 - 38          38    31.5   120
39 - 45          20    38.5   140
46 - 52          17    45.5   157
53 - 59          3     52.5   160
Total            n = 160

The decile class is the class containing the 80th item. Hence, the decile class is 25 - 31.
$D_k = LB + \left( \frac{\frac{nk}{10} - {<}cf}{f} \right) i$
$D_5 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
$P_k = LB + \left( \frac{\frac{nk}{100} - {<}cf}{f} \right) i$
$P_{50} = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
Sample Interpretation:
1. Jennifer just received the results of her SAT exam. Her
SAT Mathematics score of 600 is in the 74th percentile. What
does this mean?
A percentile rank of 74% means that 74% of SAT
Mathematics scores are less than or equal to 600 and 26%
of the scores are greater. So 26% of the students who took
the exam scored better than Jennifer.
Measures of Dispersion/Variability
Based on the figures below, which of the two scatter diagrams illustrates larger variability?
[Figure 1 and Figure 2: two scatter diagrams]
Measures of Dispersion/Variability:
STANDARD DEVIATION
• It is a measure of how far away items in a data set are from
the mean.
• The larger the standard deviation, the more variation there
is in the data set.
• The standard deviation can never be a negative number,
due to the way it’s calculated and the fact that it measures a
distance (distances are never negative numbers).
• The smallest possible value for the standard deviation is 0,
and that happens only in contrived situations where every
single number in the data set is exactly the same (no
deviation).
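Sample Standard Deviation
✦ For Ungrouped Data
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
where:
xi = data values
x̄ = mean
n = no. of sample observations

✦ For Grouped Data
$s = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \bar{x})^2}{n - 1}}$
where:
xi = data values
x̄ = mean
f = frequency
n = no. of sample observations

Population Standard Deviation
✦ For Ungrouped Data
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
where:
xi = data values
μ = mean
N = no. of observations

✦ For Grouped Data
$\sigma = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \mu)^2}{N}}$
where:
xi = data values
μ = mean
f = frequency
N = no. of observations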
Measures of Dispersion/Variability:
VARIANCE
It takes every data point in a set into account and is calculated
by averaging the squared deviations of the data points from the mean.
Example 1:
The data given below is the age of the residents in Barangay 634, Sta. Mesa, Manila. Compute the sample standard deviation and the sample variance.

Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:
Class Interval   f    x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3    57   171   228.61
50 - 54          6    52   312   102.41
45 - 49          7    47   329   26.21
40 - 44          9    42   378   0.01
35 - 39          6    37   222   23.81
30 - 34          4    32   128   97.61
25 - 29          5    27   135   221.41
Total            n = 40    $\sum_{i=1}^{7} f x_i = 1{,}675$    $\sum_{i=1}^{7} f (x_i - \bar{x})^2 =$
Solution:
Class Interval   f    x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3    57   171   228.61       685.83
50 - 54          6    52   312   102.41       614.46
45 - 49          7    47   329   26.21        183.47
40 - 44          9    42   378   0.01         0.09
35 - 39          6    37   222   23.81        142.86
30 - 34          4    32   128   97.61        390.44
25 - 29          5    27   135   221.41       1107.05
Total            n = 40    $\sum_{i=1}^{7} f x_i = 1{,}675$    $\sum_{i=1}^{7} f (x_i - \bar{x})^2 = 3{,}124.20$

f(x1 − x̄)² = 3(228.61) = 685.83
f(x2 − x̄)² = 6(102.41) = 614.46
f(x3 − x̄)² = 7(26.21) = 183.47
Solution:
Class Interval   (xi − x̄)²   f(xi − x̄)²
55 - 59          228.61       685.83
50 - 54          102.41       614.46
45 - 49          26.21        183.47
40 - 44          0.01         0.09
35 - 39          23.81        142.86
30 - 34          97.61        390.44
25 - 29          221.41       1107.05
Total                         $\sum_{i=1}^{7} f (x_i - \bar{x})^2 = 3{,}124.20$

$s = \sqrt{\frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3{,}124.20}{40 - 1}} = 8.95$

$s^2 = \frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1} = \frac{3{,}124.20}{40 - 1} = 80.11$
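The grouped sample variance and standard deviation can be checked with a short script. This is a sketch with an illustrative function name, `grouped_sample_variance`. It keeps the mean unrounded (41.875), so its sum of squares is 3,124.375 rather than the 3,124.20 obtained with the rounded mean 41.88; both give the same results to two decimal places.

```python
import math

# Rows are (lower limit, upper limit, frequency).
ages = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
        (35, 39, 6), (30, 34, 4), (25, 29, 5)]


def grouped_sample_variance(rows):
    """Sum of f * (x - mean)^2 over classes, divided by n - 1 (x = class midpoint)."""
    mids = [((lo + hi) / 2, f) for lo, hi, f in rows]
    n = sum(f for _, f in mids)
    mean = sum(f * x for x, f in mids) / n
    ss = sum(f * (x - mean) ** 2 for x, f in mids)
    return ss / (n - 1)


variance = grouped_sample_variance(ages)
std_dev = math.sqrt(variance)
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")  # s^2 = 80.11, s = 8.95
```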
Shape of Distribution
These two statistics give you insights into the shape of
the distribution.
✦ Skewness is the degree of distortion from the
symmetrical bell curve or the normal distribution. It
measures the lack of symmetry in data distribution.
✦ Kurtosis is a measure of the combined sizes of the
two tails. It tells you how tall and sharp the central
peak is, relative to a standard bell curve.
$Sk = \frac{3(\bar{x} - \tilde{x})}{s}$
where:
x̄ is the mean
x̃ is the median
s is the sample standard deviation
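As an illustration only, the skewness formula can be applied to the grouped age data used in the earlier worked examples, taking the rounded values reported there (mean 41.88, median 42.28, s = 8.95). The function name `pearson_skewness` is an assumption, and the module itself does not report this result.

```python
def pearson_skewness(mean, median, std_dev):
    """Sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev


sk = pearson_skewness(41.88, 42.28, 8.95)
print(f"{sk:.3f}")  # -0.134
```

A value this close to 0 suggests only a slight negative (left) skew, since the mean is a little below the median.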
Kurtosis
It is actually the measure of outliers present in the
distribution. The outliers in a sample, therefore, have
even more effect on the kurtosis than they do on the
skewness.
Higher kurtosis means more of the variance is the
result of infrequent extreme deviations, as opposed to
frequent modestly sized deviations. In other words, it’s
the tails that mostly account for kurtosis, not the
central peak.
The kurtosis decreases as the tails become lighter. It
increases as the tails become heavier.
Normal Curve
[Figure: a red bell-shaped curve; horizontal axis marked 50, 100, 150]
The red curve is a model called the normal curve,
which is used to describe continuous random variables
that are said to be normally distributed.
A continuous random variable is normally distributed,
or has a normal probability distribution, if its relative
frequency histogram has the shape of a normal curve.
and μ + σ.
Mean:
✦ Changing the mean shifts the entire curve left or right on the X-axis.
[Figure: two normal curves with μ1 < μ2, σ1 = σ2]
Standard Deviation:
✦ Changing the standard deviation either tightens or spreads out the width of the distribution along the X-axis.
Larger standard deviations produce distributions that are more spread out.
Remember!
Positive values of z-score indicate how far above
the mean a score falls and negative values
indicate how far below the mean a score falls.
Patterns for Finding Areas under a Standard Normal Curve
Using Table 1
D. Area to the right of a positive z value or to the left of a
negative z value.
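[Figure: the required area shown as the total area under the curve (Area = 1) minus the area read from Table 1]
[Figure: a second pattern shown as a difference involving Area = 0.50]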
Using Table 2
A. Area to the right of a positive z value or to the left of a
negative z value.
Use Table 2 directly
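[Figure: shaded tail area beyond z1, for a negative z1 and for a positive z1]
B. Area between z values on same side of 0.
[Figure: the area between z1 and z2 shown as the difference of two areas]
[Figure: the area between z1 and z2 on opposite sides of 0, shown as the sum (0.50 − Area) + (0.50 − Area)]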
Using Table 2
D. Area to the right of a negative z value or to the left of a
positive z value.
[Figure: the required area shown as the sum (0.50 − Area) + 0.50]
E. Area between a given z value and 0.
[Figure: the required area shown as 0.50 − Area]
Example 1:
Scores on a standardized college entrance examination (CEE)
are normally distributed with mean 510 and standard
deviation 60. A selective university considers for admission
only applicants with CEE scores over 560. Find the proportion of
all individuals who took the CEE who meet the university's
CEE requirement for consideration for admission.
Solution:
Given: μ = 510, σ = 60 and x = 560
Step 1: Draw a normal curve and shade the desired area.
Area = P(X > 560)
[Figure: normal curve with X-axis marks at 450, 510 and 570; the area to the right of 560 is shaded]
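The shaded area in Example 1 can also be obtained in software. The sketch below uses `statistics.NormalDist` from the Python standard library instead of a printed table; this is a cross-check under that choice of tool, not the method prescribed by the module.

```python
from statistics import NormalDist

mu, sigma, x = 510, 60, 560

z = (x - mu) / sigma                       # z-score of the cutoff
p_over = 1 - NormalDist(mu, sigma).cdf(x)  # area to the right of 560

print(f"z = {z:.2f}, P(X > 560) = {p_over:.4f}")
```

A printed table with z rounded to 0.83 gives a slightly different fourth decimal place than the unrounded computation.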
Example 2:
A pediatrician obtains the heights of her three-year-old female
patients. The heights are approximately normally distributed,
with mean 38.72 inches and standard deviation 3.17 inches.
Determine the proportion of the three-year-old females that
have a height less than 35 inches.
Solution:
Given: μ = 38.72, σ = 3.17 and x = 35
Step 1: Draw a normal curve and shade the desired area.
Area = P(X < 35)
[Figure: normal curve with X-axis marks at 35.55, 38.72 and 41.89; the area to the left of 35 (z = −1.17) is shaded]
Use “TRUE” for cumulative since we want the area under the normal curve.
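A cumulative normal probability is the area to the left of x, which is what a cumulative distribution function (CDF) returns. As a hedged sketch, Python's `statistics.NormalDist.cdf` plays that role here; the comparison with the rounded z-score is added only to show why table-based answers can differ slightly.

```python
from statistics import NormalDist

mu, sigma, x = 38.72, 3.17, 35

z = (x - mu) / sigma
p_exact = NormalDist(mu, sigma).cdf(x)      # area to the left of 35, unrounded z
p_table = NormalDist().cdf(round(z, 2))     # same area with z rounded to -1.17

print(f"z = {z:.2f}")
print(f"P(X < 35) = {p_exact:.4f} (unrounded z), {p_table:.4f} (z = -1.17)")
```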
Example 3:
A pediatrician obtains the heights of her three-year-old female
patients. The heights are approximately normally distributed,
with mean 38.72 inches and standard deviation 3.17 inches.
Determine the probability that a randomly selected three-year-
old girl is between 35 and 40 inches tall, inclusive.
Solution:
Given: μ = 38.72, σ = 3.17, and 35 ≤ X ≤ 40
Step 1: Draw a normal curve and shade the desired area.
Area = P(35 ≤ X ≤ 40)
[Figure: normal curve with X-axis marks at 35.55, 38.72 and 41.89; the area between 35 and 40 is shaded]
Using Table 1: By-hand Approach!
Step 2: Convert the values of x to z-scores.
$P(35 \le X \le 40) = P(z_1 \le Z \le z_2)$
$= P\left( \frac{35 - 38.72}{3.17} \le Z \le \frac{40 - 38.72}{3.17} \right)$
$= P(-1.17 \le Z \le 0.40)$
$= P(Z \le 0.40) - [1 - P(Z \ge -1.17)]$
$= 0.6554 - [1 - 0.8790]$
$= 0.6554 - 0.1210$
$= 0.5344$
[Figure: standard normal curve with the area between −1.17 and 0.40 shaded]
The probability that a randomly selected three-year-old female is between 35 and 40 inches tall is 0.5344.
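The by-hand result can be reproduced by subtracting two cumulative areas. This sketch rounds the z-scores to two decimals, as a table lookup would, and uses the standard library's `NormalDist` as a stand-in for Table 1.

```python
from statistics import NormalDist

mu, sigma = 38.72, 3.17
std_normal = NormalDist()                  # mean 0, standard deviation 1

z1 = round((35 - mu) / sigma, 2)           # -1.17
z2 = round((40 - mu) / sigma, 2)           # 0.40

# P(z1 <= Z <= z2) = P(Z <= z2) - P(Z <= z1)
p_between = std_normal.cdf(z2) - std_normal.cdf(z1)
print(f"P({z1} <= Z <= {z2}) = {p_between:.4f}")
```

Without rounding the z-scores, the probability comes out a little higher (about 0.54), which is the usual small gap between table and software answers.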
ACTIVITIES/ASSESSMENTS:
2. What features of the ‘Good Presentation’ make it better than the ‘Bad Presentation’?
A.
B.
ACTIVITIES/ASSESSMENTS:
3. Review the table and consider questions such as the following.

Origin / Rating   Poor   Needs Improvement   Satisfactory   V Good   Excellent   Total
External          0%     2%                  12%            19%      9%          41%
Internal          4%     8%                  15%            23%      9%          59%
Grand Total       4%     10%                 27%            41%      17%         100%
1. What percentage of the employees originated from within the
organization?
2. What percentage of the employees are both internal and rated
‘Very Good’?
3. What percentage of the employees received ‘Needs Improvement’
or ‘Poor’?
4. What category contains the greatest number of employees?
5. Do you see any notable differences in the percentage by category?
ACTIVITIES/ASSESSMENTS:
4. Consider the following Frequency Distribution of
Salaries.
Salary Frequency Percentage
41,000 - 50,000 1 1%
51,000 - 60,000 20 13%
61,000 - 70,000 53 35%
71,000 - 80,000 43 29%
81,000 - 90,000 26 17%
91,000 - 100,000 6 4%
101,000 - 110,000 1 1%
Total 150 100%
1. What percentage of the employees earns less than or equal to 80,000?
2. What is the salary range of values?
3. What salary categories have a percentage less than 5?
4. What salary category includes the most employees?
ACTIVITIES/ASSESSMENTS:
5. The length of life of an instrument produced by a machine has a normal
distribution with a mean of 12 months and standard deviation of 2 months.
Find the probability that an instrument produced by this machine will last
A. less than 7 months.
B. between 7 and 12 months.
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
6. The lengths of human pregnancies are approximately normally distributed,
with mean μ = 266 days and standard deviation σ = 16 days.
A. What proportion of pregnancies lasts more than 270 days?
B. What proportion of pregnancies lasts less than 250 days?
C. What proportion of pregnancies lasts between 240 and 280 days?
D. What is the probability that a randomly selected pregnancy lasts more than 280 days?
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
ACTIVITIES/ASSESSMENTS:
7. Construct frequency distribution table based on the
scores of 75 randomly selected students.
37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31
Scores Frequency Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total
ACTIVITIES/ASSESSMENTS:
A. Based on the frequency distribution, compute measures of
central tendency, measures of variation, Q1, D9, P10 , Skewness
and kurtosis.
B. Based on the raw data, compute measures of central
tendency, measures of variation, Skewness and kurtosis using
Excel.
C. Compute Skewness and kurtosis of grouped and ungrouped
data. Make sure to describe the shape of the distribution
D. Do you think that computed value for grouped and
ungrouped data are the same?
8. Begin with the following set of data, call it Data Set I.
5, −2, 6, 14, −3, 0, 1, 4, 3, 2, 5
A. Compute the sample standard deviation and sample mean of
Data Set I.
ACTIVITIES/ASSESSMENTS:
B. Form a new data set, Data Set II, by adding 3 to each
number in Data Set I. Calculate the sample standard deviation
and sample mean of Data Set II.
C. Form a new data set, Data Set III, by subtracting 6 from
each number in Data Set I. Calculate the sample standard
deviation and sample mean of Data Set III.
D. Comparing the answers to parts (a), (b), and (c), can you
guess the pattern? State the general principle that you expect
to be true.
References
https://prezi.com/rirrca9ckuiz/textual-presentation-of-data/
https://www.toppr.com/guides/economics/presentation-of-data/textual-and-tabular-presentation-of-data/
Sullivan, Michael, III. Statistics: Informed Decisions Using Data. Fifth Edition.
What is HYPOTHESIS?
•A statement or claim regarding a characteristic of
one or more populations.
•A preconceived idea, assumed to be true but has to
be tested for its truth or falsity.
Reminders:
If you are conducting a research study and you want
to use a hypothesis test to support your claim, the
claim must be stated in such a way that it becomes
the alternative hypothesis, so it cannot contain the
condition of equality.
✦ Right tailed
Example:
H0: The defendant is innocent.
Ha: The defendant is not innocent.
Answer:
A type I error is like putting an innocent person in
jail.
A type II error is like letting a guilty person go free.
Reminders:
It is important to note that we want to set (α) before we start our study because the Type I error is the more ‘grievous’ error to make.
The smaller (α) is, the smaller the region of rejection.
Decision Rule:
✦ Using Confidence Interval
Rejection region or critical region is the set of all values of the test statistic which will lead to the rejection of H0.
Acceptance Region is the set of all values of the test statistic that lead the researcher to retain H0.
[Figure: two-tailed test, Ha : μ1 ≠ μ2, with a rejection region in each tail]
STEP 1: Rearrange the data in ascending order.
Use the “=DEVSQ( )” function in Excel.
STEP 3: Calculate b as follows:
$b = \sum_{i=1}^{m} a_i (x_{n+1-i} - x_i)$
n is the number of observations.
If n is even: $m = \frac{n}{2}$
If n is odd: $m = \frac{n - 1}{2}$
Since n is even in this example, m = 8. That’s why we used a1 to a8, taking the ai weights from the table of Shapiro-Wilk (based on the value of n).
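The sum in STEP 3 pairs the largest value with the smallest, the second largest with the second smallest, and so on. A minimal sketch of that pairing is shown below. The function name `shapiro_wilk_b` is hypothetical, and the weights in the usage example are placeholders chosen for easy arithmetic, NOT real Shapiro-Wilk coefficients; real ai values must be read from the Shapiro-Wilk table for the given n.

```python
def shapiro_wilk_b(data, weights):
    """b = sum over i = 1..m of a_i * (x_(n+1-i) - x_(i)), data sorted ascending.

    `weights` holds a_1..a_m, where m = n // 2 (n/2 if n is even, (n-1)/2 if odd).
    """
    xs = sorted(data)
    n = len(xs)
    m = n // 2
    if len(weights) != m:
        raise ValueError(f"expected {m} weights for n = {n}")
    # 0-based index i corresponds to the slide's i + 1.
    return sum(a * (xs[n - 1 - i] - xs[i]) for i, a in enumerate(weights))


# Toy usage with placeholder weights (not real table coefficients):
toy_data = [3, 1, 4, 2]
toy_weights = [0.5, 0.25]
print(shapiro_wilk_b(toy_data, toy_weights))  # 0.5*(4-1) + 0.25*(3-2) = 1.75
```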
We choose this interval in the table of Shapiro-Wilk because our n = 16 and our test statistic (W = 0.955) is within this interval.
Result
Inferential Statistics
1. Parametric Tests
✦ Assume underlying statistical distributions in the data.
Therefore, several conditions of validity must be met
so that the result of a parametric test is reliable.
✦ Apply to data in ratio scale, and some apply to data in
interval scale.
2. Non Parametric Test
✦ Refer to a statistical method in which the data is not
Example:
Determine whether the sample is independent or dependent.
1. An urban economist believes that commute times to
work in the South are less than commute times to work
in the Midwest. He randomly selects 40 employed
individuals in the south and 45 employed individuals in
the Midwest and determines their commute times.
Answer: Independent
2. In an experiment conducted in biology class, Prof.
Rhea measured the time required for 12 students to
catch a falling meter stick using their dominant hand
and nondominant hand. The goal of the study was to
determine whether the reaction time in an individual’s
dominant hand is different from the reaction time in
the nondominant hand. Answer: Dependent
Example:
Determine whether the sample is independent or
dependent.
3. A researcher wants to know if the mean
length of stay in for-profit hospitals is different
from the mean length of stay in not-for-profit
hospitals. He randomly selected 20 individuals in
the for-profit hospital and matched them with 20
individuals in the not-for-profit by diagnosis.
Answer:
Dependent
Dependent Sample t - Test
The dependent sample t-test (also called
the paired t-test or paired-samples t-test)
compares the means of two related groups
to determine whether there is a statistically
significant difference between these
means.
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2
Assumptions
1. Your dependent variable should be measured at
the interval or ratio level (i.e., they are
continuous).
2. Your independent variable should consist of two
categorical, "related groups" or "matched pairs”.
3. There should be no significant outliers in the
differences between the two related groups.
4. The distribution of the differences in the
dependent variable between the two related
groups should be approximately normally
distributed.
Example:
A teacher is interested to know if the new learning program will help to increase the number of correctly remembered words. Ten subjects learn a list of 50 words. Learning performance is measured using a recall test.
After the first test all subjects are instructed how to use the learning program and then learn a second list of 50 words. Learning performance is again measured with the recall test. In the following table the number of correctly remembered words is listed for both tests.
1. State the Null and Alternative
Hypothesis
Null hypothesis: Ho : μ1 ≥ μ2
The new learning program will not help to increase
the number of correctly remembered words.
Alternative hypothesis: Ha : μ1 < μ2
The new learning program will help to increase the
number of correctly remembered words.
2. Set the Level of Significance or Alpha
Level (α)
α = 0.05
Reject Ho
6. Draw Conclusion
There is sufficient evidence to support that the new
learning program helps to increase the number of
correctly remembered words.
Result
Example:
Researchers wanted to know whether there was a difference in
comprehension among students learning a computer program
based on the style of the text. They randomly divided 18
students into two groups of 9 each. The researchers verified
that the 18 students were similar in terms of educational level,
age, and so on. Group 1 individuals learned the software using
a visual manual (multimodal instruction), while Group 2
individuals learned the software using a textual manual (unimodal
instruction). The following data represent scores the students
received on an exam given to them after they studied from the manuals.
Determine if the
variances are equal
or not equal.
Failed to
Reject Ho
Since we failed to reject Ho, we will proceed to t-test: Two
Sample Assuming Equal Variances.
Failed to Reject Ho
6. Draw Conclusion
There is not enough evidence to support that
there is a difference in comprehension among
students learning a computer program based on
the style of the text.
Proper Presentation of Results
Assumptions
1. Your dependent variable should be measured at the
interval or ratio level (i.e., they are continuous).
2. Your independent variable should consist of two or more
categorical, independent groups.
3. You should have independence of observations, which
means that there is no relationship between the
observations in each group or between the groups
themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately
normally distributed for each category of the independent
variable.
6. There needs to be homogeneity of variances.
Example:
Researchers wanted to compare math test scores of
students at the end of secondary school from various cities.
Eight randomly selected students from Makati, Manila,
and Quezon City each were administered the same exam;
the results are presented in the following table. Can the
researchers conclude
that the distribution of
exam scores is different
for each city at the
level of significance?
Determine if the
variances are equal
or not equal.
Failed to Reject Ho
Equal Variances Assumed
Failed to Reject Ho
Equal Variances Assumed
Failed to Reject Ho
Equal Variances Assumed
Reject Ho
6. Draw Conclusion
There is enough evidence to support that the
distribution of exam scores of students in
mathematics is different for each city.
Result
Features of r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative
linear relationship.
• The closer to 1, the stronger the positive
linear relationship.
• The closer to 0, the weaker the linear
relationship.
[Figure: scatter plots of Y against X illustrating r = −1, r = −0.6, r = 0, r = 0.6 and r = 1]
Reminders:
• Correlation does not imply causation.
• Watch out for hidden (lurking) variables.
Lurking Variable
• A variable that is not included as an explanatory
or response variable in the analysis but can affect
the interpretation of relationships between
variables.
• Can falsely identify a strong relationship between
variables or it can hide the true relationship.
Assumptions
1. Your two variables should be measured at the
interval or ratio level (i.e., they are
continuous).
2. There is a linear relationship between your
two variables.
3. There should be no significant outliers.
4. Your variables should be approximately
normally distributed.
Test Statistic:
$t = r \sqrt{\frac{df}{1 - r^2}}$
where:
df = degrees of freedom
r = Pearson correlation coefficient
Note:
df = n − 2
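The test statistic is a one-line computation once r and n are known. The sketch below uses an illustrative function name, `t_from_r`, and hypothetical input values (r = 0.6, n = 18) chosen only because they give a round answer; they are not taken from the module's example.

```python
import math


def t_from_r(r, n):
    """t = r * sqrt(df / (1 - r^2)), with df = n - 2."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2))


# Hypothetical values for illustration: r = 0.6 from n = 18 pairs (df = 16).
t = t_from_r(0.6, 18)
print(round(t, 4))  # 3.0
```

The resulting t is then compared with the critical value of the t distribution with n − 2 degrees of freedom.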
Example:
A dietetics student wanted to look at the
relationship between calcium intake and
knowledge about calcium in sports
science students. The table shows the data
she collected. Is there a relationship
between calcium intake and knowledge
about calcium in sports science
students?
Result
Exercises:
Apply the procedure in testing the hypothesis.
Result
Assumptions
1. There are 2 variables, and both are measured as
categories, usually at the nominal level.
2. The two variables should consist of two or more
categorical, independent groups.
3. The data in the cells should be frequencies, or counts
of cases rather than percentages or some other
transformation of the data.
4. For a 2 by 2 table, all expected frequencies > 5.
5. For a larger table, all expected frequencies > 1 and
no more than 20% of all cells may have expected
frequencies < 5.
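Assumptions 4 and 5 can be checked before running the test. The sketch below assumes the observed counts are stored as a list of rows; the function names `expected_counts` and `meets_expected_count_rules` and both tables are hypothetical and serve only as illustration.

```python
def expected_counts(observed):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def meets_expected_count_rules(observed):
    """Check the expected-frequency assumptions stated in the list above."""
    cells = [e for row in expected_counts(observed) for e in row]
    is_two_by_two = len(observed) == 2 and len(observed[0]) == 2
    if is_two_by_two:
        # 2 by 2 table: every expected frequency must exceed 5
        return all(e > 5 for e in cells)
    # Larger table: every expected frequency must exceed 1, and at most
    # 20% of the cells may have an expected frequency below 5
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return all(e > 1 for e in cells) and share_below_5 <= 0.20


# Hypothetical observed counts (illustration only)
large_sample = [[30, 20], [10, 40]]
small_sample = [[3, 2], [1, 4]]

print(expected_counts(large_sample), meets_expected_count_rules(large_sample))
print(expected_counts(small_sample), meets_expected_count_rules(small_sample))
```

The check works on counts, in line with assumption 3; percentages would have to be converted back to frequencies first.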
Example:
1. A doctor who knows that hypertension depends
on smoking habits can tell his smoking patients what
they should do.
2. If the traffic condition (light, moderate, heavy,
standstill) is found to be dependent on vehicle plate
numbers (odd, even) a traffic officer may decide to
revise traffic law enforcement.
Reminders:
The word contingency refers to
dependence, but this is only a
statistical dependence and cannot be
used to establish a direct cause-and-
effect link between the two variables in
question.
Example:
Educators are always looking for novel ways in
which to teach statistics to undergraduates as part
of a non-statistics degree course (e.g., psychology).
With current technology, it is possible to present
how-to guides for statistical programs online
instead of in a book. However, different people
learn in different ways. An educator would like to
know whether gender (male/female) is associated
with the preferred type of learning medium (online
vs. books). Use “Data_Example and Exercises file”.
1. State the Null and Alternative
Hypothesis
Null hypothesis:
Gender is independent of the preferred type of
learning medium.
Alternative hypothesis:
Gender is dependent on the preferred type of
learning medium.
2. Set the Level of Significance or Alpha
Level (α)
α = 0.05
$\text{Expected Frequency} = \dfrac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}}$
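The chi-square statistic compares each observed count with its expected frequency, which is the row total times the column total divided by the grand total. The sketch below is one way to carry out the arithmetic by hand in Python; the function name `chi_square_independence` and the 2 by 2 counts are hypothetical and are not the values in the module's data file.

```python
import math


def chi_square_independence(observed):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical counts (illustration only):
# rows = gender, columns = preferred learning medium
observed = [[30, 20], [10, 40]]
statistic, df = chi_square_independence(observed)

# With df = 1 the chi-square variable is the square of a standard normal
# variable, so the p-value can be obtained from the complementary error function.
p_value = math.erfc(math.sqrt(statistic / 2))

alpha = 0.05
reject_null = p_value <= alpha
print(f"chi-square = {statistic:.3f}, df = {df}, p = {p_value:.5f}")
```

The closed-form p-value used here holds only for df = 1; for larger tables a chi-square table or statistical software is needed.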
6. Draw Conclusion
There is sufficient evidence to conclude that
gender is associated with the preferred type of
learning medium.
Proper Presentation of Results
Result
ACTIVITIES/ASSESSMENTS:
Determine whether the sampling is dependent or independent.
________1. A researcher wishes to compare academic
aptitudes of married mathematicians and their spouses. She
obtains a random sample of 287 such couples who take an
academic aptitude test and determines each spouse’s academic
aptitude.
________2. A political scientist wants to know how a random
sample of 18- to 25-year-olds feel about Democrats and
Republicans in Congress. She obtains a random sample of
1030 registered voters 18 to 25 years of age and asks, “Do you
have a favorable/unfavorable opinion of the Democratic/
Republican party?” Each individual was asked to disclose his
or her opinion about each party.
________3. An educator wants to determine whether a new
curriculum significantly improves standardized test scores for third
grade students. She randomly divides 80 third-graders into two
groups. Group 1 is taught using the new curriculum, while group 2 is
taught using the traditional curriculum. At the end of the school year,
both groups are given the standardized test and the mean scores are
compared.
________4. A stock analyst wants to know if there is a difference
between the mean rate of return from energy stocks and that from
financial stocks. He randomly selects 13 energy stocks and computes
the rate of return for the past year. He randomly selects 13 financial
stocks and computes the rate of return for the past year.
________5. An urban economist believes that commute times to work
in the South are less than commute times to work in the Midwest. He
randomly selects 40 employed individuals in the South and 45
employed individuals in the Midwest and determines their commute
times.
Solve the following problems. Make sure to follow the six-step
procedure.
1. A study is designed to test whether there is a difference in mean daily
calcium intake in adults with normal bone density, adults with
osteopenia (a low bone density which may lead to osteoporosis) and
adults with osteoporosis. Adults 60 years of age with normal bone
density, osteopenia and osteoporosis are selected at random from
hospital records and invited to participate in the study. Each
participant's daily calcium intake is measured based on reported food
intake and supplements. The data are shown below. Is there a
statistically significant difference in mean calcium intake in patients
with normal bone density as compared to patients with osteopenia and
osteoporosis?
Normal Bone Density    Osteopenia    Osteoporosis
1200                   1000          890
1000                   1100          650
980                    700           1100
900                    800           900
750                    500           400
800                    700           350
2. Some studies have shown that in the United States, men spend more
than women buying gifts and cards on Valentine’s Day. Suppose a
researcher wants to test this hypothesis by randomly sampling nine men
and 10 women with comparable demographic characteristics from various
large cities across the United States to be in a study. Each study
participant is asked to keep a log beginning one month before
Valentine’s Day and record all purchases made for Valentine’s Day
during that one-month period. The resulting data are shown below. Use
these data and a 1% level of significance to determine if, on average,
men actually do spend significantly more than women on Valentine’s Day.
Assume that such spending is normally distributed in the population and
that the population variances are equal.
Men (in $)    Women (in $)
107.48        125.98
143.61        45.53
90.19         56.35
125.53        80.62
70.7          46.37
83            44.34
129.63        75.21
154.22        68.48
93.8          85.82
              126.11
3. A researcher is interested in whether a training course increases
the teaching performance of the teachers who attended the
training course. Test at the 10% level of significance. The data are
shown below:
Case    Before    After    Case    Before    After
1       85        95       11      89        97
2       84        98       12      87        98
3       86        97       13      82        95
4       87        92       14      81        95
5       89        96       15      86        92
6       82        93       16      89        91
7       80        94       17      89        94
8       84        95       18      84        95
9       86        90       19      85        96
10      82        82       20      88        97
4. A pediatrician wants to determine the relation that may exist
between a child’s height and head circumference. She randomly selects
eleven 3-year-old children from her practice, measures their heights
and head circumferences, and obtains the data shown in the table below.
Height (inches)    Head Circumference (inches)
27.75              17.5
24.5               17.1
25.5               17.1
26                 17.3
25                 16.9
27.75              17.6
26.5               17.3
27                 17.5
26.75              17.3
26.75              17.5
27.5               17.5
5. The following data represent the smoking status from a
random sample of 1054 U.S. residents 18 years or older by
level of education.
No. of Years       Smoking Status
of Education       Current    Former    Never
Less than 12       178        88        208
12                 137        69        143
13 - 15            44         25        44
16 or more         34         33        51
References
https://wolfweb.unr.edu/homepage/ania/stat352f12lectures/352lecture21f12.pdf
Sullivan, Michael, III. Statistics: Informed Decisions Using Data. Fifth Edition.
http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/shapiro-wilk-test/
Directions: Read each item carefully. For each item, write the letter corresponding to the best answer on yellow
paper. Write NONE if no correct choice is given. Make sure to also write your solutions.
1. A bank surveyed all of its 60 employees to determine the proportion who participate in volunteer activities.
Which of the following statements is true?
(a) The bank should not use the data from this survey because this is an observational study.
(b) The bank does not need to use an inference procedure to determine the proportion of employees who
participate in volunteer activities because the survey was a census of all employees.
(c) The bank can use the result of this survey to prove that working for the bank causes employees to
participate in volunteer activities.
(d) The bank did not select a random sample of employees, so the survey will not provide the bank with useful
information.
2. In the design of a survey, which of the following best explains how to minimize response bias?
(a) Increase the sample size (c) Randomly select the sample
(b) Carefully word and field-test survey questions (d) Increase the number of questions in the survey
3. A body of principle, which deals with collection, analysis, interpretation and presentation of numerical facts or
data.
5. Which of the following statements regarding a researcher's use of inferential statistics is true?
(a) It is best to measure every member of a population if possible.
(b) A random sample provides a perfect estimate of the population values.
(c) Descriptive statistics from a sample are used to estimate the characteristics of the population.
(d) We usually need to take several samples to obtain a good estimate of the population values.
7. What sampling technique is used when the respondents are chosen on the basis of pre-determined criteria set
by the researchers?
(a) cluster sampling (b) systematic sampling (c) purposive sampling (d) convenience sampling
(a) Normal (b) Unimodal (c) Negatively Skewed (d) Positively Skewed
(a) Increase (b) Decrease (c) stay the same (d) None of the above
11. If the statistics grades of Karen are 87, 85, 91, 89 and X, what must be the value of X so that the average is
89?
15. Mr. Martin had seven students in his after-school statistics tutorial. The scores they received on their last quiz
were as follows: 81, 73, 84, 78, 89, 82, 81. What was the mean score?
(a) First Quartile (b) Fiftieth Percentile (c) Sixth decile (d) Third quartile
19. If 5 is subtracted from each observation of a set, then the mean of the observations is reduced by
20. The standard deviation of 10 observations is 15. If 5 is added to each observation, the value of the new standard
deviation is
21. If the minimum value in a set is 9 and its range is 57, the maximum value of the set is
22. Which of the following situations exhibits the function of Inferential Statistics?
(a) The highest score obtained by BSS section 1 in their first quiz is 48.
(b) All the ten scores are closely scattered around the average value.
(c) Mathematical anxiety of the students will be related with their academic performance.
(d) Line graphs will be used to exhibit the fluctuating trend of monthly consumption of electricity.
23. Which of the following situations exhibits the function of Descriptive Statistics?
(a) Determining the most favored characteristics of the ideal teacher as perceived by students.
(b) Relating the number of absences committed by students with their academic performance.
(c) Citing the differences in perception of the male and female students towards NO ID-NO ENTRY policy.
(d) Comparing the course grades in Statistics of every section who are taking the subject during the first
semester.
For items 24 to 27, consider this situation. There were 200 students of PUP San Juan enrolled in General
Statistics in the first semester. A periodic examination was given and it was found that the average score
is 93. When a random section with 50 students is chosen, it was found that 89 is the average score of the
section.
24. What do we call the number 200?
(a) statistic (b) sample size (c) parameter (d) population size
(a) statistic (b) sample size (c) parameter (d) population size
(a) statistic (b) sample size (c) parameter (d) population size
(a) statistic (b) sample size (c) parameter (d) population size
For items 28 to 30, consider this situation. A group of undergraduate researchers aims to execute stratified
random sampling among 63 Section 1 students, 52 Section 2 students, 48 Section 3 students and 37 Section 4
students. The margin of error is 5%.
28. What is the sample size?
(a) 124 students (b) 134 students (c) 144 students (d) 154 students
(a) TV station (b) encyclopedias (c) living organisms (d) scientific journals
32. A marketing team specializing in food products sets up stands in a mall to determine the preference of the mall-goers
in choosing and consuming finger foods. What sampling technique is appropriate in doing this?
(a) cluster sampling (b) purposive sampling (c) convenience sampling (d) systematic sampling
33. A market research company asks a sample of students to rate the taste of a new soft drink. The response scale
is really yummy, yummy, ok, yuck, really yuck. This is an example of a
(a) Nominal Level (b) Ordinal Level (c) Interval Level (d) Ratio Level
34. A researcher is studying students in college in PUP. She takes a sample of 400 students from 10 colleges. The
average age of selected college students in PUP is
35. A coffee shop wants to know the temperature of coffee that most people prefer. They brew coffee at the typical
temperature for the shop and then ask customers “Do you prefer coffee to be at this temperature?” and record
a yes or no answer for each customer. What is the level of measurement of the way they measured preferred
temperature?
36. The same coffee shop later repeats the study but this time they ask “Do you prefer coffee to be a lot colder, a
little cooler, this temperature, a little warmer or a lot hotter?” and record the person's response. Now, what is
the level of measurement of the way they measured preferred temperature?
(a) I, II and III (b) I, II, III and IV (c) II, III and IV (d) I, III and IV
38. Given a normal distribution, find the area under the curve which lies to the right of z = 1.96.
(a) 0.9750 (b) 0.0196 (c) 0.4750 (d) 0.0250
For items 39 to 43, consider this situation. A researcher has collected the following sample data: 5, 12, 6, 8, 5,
6, 7, 5, 12, 4
39. Find the median.
43. Find the Pearson coefficient of skewness using the value of median.
Problem Solving
A. The PUPCET scores for the math portion of the test were normally distributed, with a mean of 23.4 and a
standard deviation of 4.8. Find the probability that a randomly selected student who took the math portion
of the PUPCET has a score that is
(a) less than 18.
(g) D1
(h) D9
(i) P10
(j) P90
(k) Karl Pearson's Measure of Skewness
(l) Kurtosis
C. Construct a frequency distribution table.
(a) What percentage of couples married seven years has two children?
(b) What percentage of couples married seven years has at least two children?
Republic of the Philippines
Polytechnic University of the Philippines
Directions: Read each item carefully. For each item, write the letter corresponding to the best answer on yellow
paper. Write NONE if no correct choice is given. Make sure to also write your solutions.
4. If a researcher conducts a study in which the reading ability of a class of 20 second graders is tested at the
beginning and at the end of the year, the appropriate statistical procedure to analyze the results would be
5. Suppose a researcher is conducting a study in which five groups of adults, each group having a distinct life
situation, are assessed on a measure of stress. The appropriate statistical procedure to compare the groups is
a(n)
6. When the value of the x variable increases and the value of the y variable also increases, the relationship is known as ________.
7. If the computed correlation coefficient of two continuous variables is 0.967, then describe the relationship.
(a) Weak Negative and Inverse Relationship
(b) Strong Negative and Inverse Relationship
(c) Strong Positive and Direct Relationship
(d) Weak Positive and Direct Relationship
8. If the computed value for Pearson r is negative, this implies that there is a/an ________ relationship between
variables x and y.
9. You find children who take vitamins have higher health index scores than children who do not take vitamins
(p < 0.05). You have found that these two groups of children are
(a) significantly different
(b) different because of chance
(c) positively correlated
(d) negatively correlated
10. A conclusion in a research study on Science Teaching in selected Quezon City high schools states, “Most schools
lack adequate facilities.” Which of the following is a proper recommendation for this conclusion?
(a) School administrators should be pro-active and skillful in acquiring adequate facilities.
(b) School administrators should conduct Science achievement tests that are centralized and uniform
(c) School administrators should hire more competent Science teachers for proper handling of the facilities.
(d) School administrators should work on the revision of the Science curricula so that lessons may adapt with
the facilities.
11. Which of the following is a positive correlation?
(a) Gas mileage decreases as vehicle weight increases
(b) As study time decreases, students achieve lower grades
(c) As levels of self-esteem decline, levels of depression increase
(d) People who exercise regularly are less likely to be obese
12. A friend of mine studies the effects of praise on happiness. She believes that children who receive praise are
happier overall than children who do not receive praise. She measures happiness by counting the number of
times a child smiles in a one-hour period. She knows that children in the population who do not receive praise
smile an average of 4 times per hour with a standard deviation of 0.5, and that these data are normally distributed.
She selects a sample of 100 children whom she knows receive praise and finds that they smile an average of 3.5
times per hour.
An appropriate null hypothesis for this study is:
(a) Children who receive praise smile more than children who do not.
(b) Children who receive praise smile the same amount as children who do not.
(c) Children who receive praise are happier than children who do not.
(d) Children who receive praise do not smile more than children who do not.
13. What is the criterion for rejecting the null hypothesis using p value approach?
(a) If p value is less than or equal to the level of significance retain Ho, otherwise Reject Ho.
(b) If p value is less than or equal to the level of significance reject Ho, otherwise retain Ho.
(c) If p value is greater than or equal to the level of significance reject Ho, otherwise retain Ho.
(d) If p value is greater than or equal to the level of significance retain Ho, otherwise Reject Ho.
14. The alternative hypothesis of the Shapiro-Wilk test is ________.
(a) Equal variances assumed (c) Data follows a Normal Distribution
(b) Equal variances not assumed (d) Data does not follow a Normal Distribution
15. An inspector needs to learn if customers are getting fewer ounces of a soft drink than the 28 ounces stated on
the label. After she collects data from a sample of bottles, she is going to conduct a test of a hypothesis. She
should use
(a) A two tailed test.
(b) A one tailed test with an alternative to the right.
(c) A one tailed test with an alternative to the left.
(d) Either a one or a two tailed test because they are equivalent.
16. A hypothesis test is done in which the alternative hypothesis is that more than 10% of a population is left-
handed. The computed p value is 0.25. Which statement is correct?
(a) We can conclude that more than 10% of the population is left-handed.
(b) We can conclude that more than 25% of the population is left-handed.
(c) We can conclude that exactly 25% of the population is left-handed.
(d) We cannot conclude that more than 10% of the population is left-handed.
17. If there is a negative correlation between the number of absences students have and their grades, what can we
conclude from this research finding?
(a) That being absent leads to lower grades
(b) That students that are absent more often are likely to have lower grades
(c) That low grades lead to people being absent
(d) That this is an illusory correlation
18. It is a procedure based on sample evidence and probability, used to test claims regarding a characteristic of one or
more populations.
19. If the computed p-value is 0.0001 and the level of significance is 0.01, what do you think will be the decision
of the researcher?
20. Which of the following statistical tests is not used for testing significant difference?
Problem Solving
A. The ACT is a college entrance exam. ACT has determined that a score of 22 on the mathematics portion of
the ACT suggests that a student is ready for college-level mathematics. To achieve this goal, ACT recommends that
students take a core curriculum of math courses: Algebra I, Algebra II, and Geometry. Suppose a random sample
of 200 students who completed this core set of courses results in a mean ACT math score of 22.6 with a standard
deviation of 3.9. Do these results suggest that students who complete the core curriculum are ready for college-level
mathematics? That is, are they scoring above 22 on the math portion of the ACT?
1. State the appropriate null and alternative hypotheses.
B. A corporation owns a chain of several hundred gasoline stations on the eastern seaboard. The marketing
director wants to test a proposed marketing campaign by running ads on some local television stations and
determining whether gasoline sales at a sample of the company's stations increase after the advertising. The following
data represent gasoline sales for a day before and a day after the advertising campaign. Determine whether sales
increased significantly after the advertising campaign. Use an alpha of 0.05.
1. Step 1:
2. Step 2:
3. Step 3:
Check the assumptions.
4. Step 4:
5. Step 5:
6. Step 6: