
Instructional Materials in
STAT 20053
STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION
For the sole noncommercial use of the
Faculty of the Department of Mathematics and Statistics
Polytechnic University of the Philippines
2020















Contributors:


Elizon, Katrina
Baccay, Edcon
Bautista, Lincoln A.
Aranas, Peter John
Usona, Laurence P.



Republic of the Philippines
POLYTECHNIC UNIVERSITY OF THE PHILIPPINES
COLLEGE OF SCIENCE
Department of Mathematics and Statistics

Course Title : STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION


Course Code : STAT 20053
Course Credit : 3 UNITS
Pre-Requisite :
Course Description : This course focuses on conceptual understanding of everyday
statistics and basic statistical procedures. Topics include basic
concepts of statistics, descriptive statistics, and inferential statistics,
especially parametric estimation and hypothesis testing, all
illustrated and applied to practical situations. It also gives students
competence in basic computer technology by generating
descriptive statistics and performing statistical analysis using
EXCEL, JASP, and SPSS.
Week      Dates            Topics and Subtopics
Week 1    9/14 – 9/20      • Definitions and Terminology
                           • Process of Statistics
                           • Qualitative and Quantitative
                           • Discrete and Continuous
                           • Levels of Measurement
Week 2    9/21 – 9/27      • Data Collection
                           • Sources of Data
                           • Experimental and Observational Study Design
Week 3    9/28 – 10/4      • Determining the Sample Size
Week 4    10/5 – 10/11     • Basic Sampling Design
                           • Sources of Error in Sampling
Week 5    10/12 – 10/18    • Textual Presentation
                           • Tabular Presentation
                           • Graphical Presentation
Week 6    10/19 – 10/25    • Measures of Central Tendency
                           • Measures of Relative Position
                           • Measures of Variation
Week 7    10/26 – 10/31    • Skewness and Kurtosis
Week 8    11/3 – 11/8      • Normal Distribution
                           • Areas Under Standard Normal Curve
Week 9    11/9 – 11/15     • Procedure for Hypothesis Testing
Week 10   11/16 – 11/22    • Assessing and Testing Normality of the Data
Week 11   11/23 – 11/27    • Inference about Two Means (Dependent and Independent Sample T-Test)
Week 12   12/1 – 12/6      • One-Way Analysis of Variance
Week 13   12/7 – 12/13     • Pearson Product Moment Correlation
Week 14   12/14 – 12/20    • Chi-Square Test
COURSE GRADING SYSTEM

The final grade will be based on the weighted average of the student’s
scores on each test assigned at the end of each lesson. The final SIS
grade equivalent will be based on the following table according to the
approved University Student Handbook.

Class Standing (CS) = ((Weighted Average of all the Activities) x 50) + 50

Midterm and/or Final Exam (MFE) = ((Weighted Average of the Midterm and/or Final Tests) x 50) + 50

Final Grade = (70% x CS) + (30% x MFE)

SIS Grade    Final Grade Equivalent    Description
1.00         97.00-100                 Excellent
1.25         94.00-96.99               Excellent
1.50         91.00-93.99               Very Good
1.75         88.00-90.99               Very Good
2.00         85.00-87.99               Good
2.25         82.00-84.99               Good
2.50         79.00-81.99               Satisfactory
2.75         77.00-78.99               Satisfactory
3.00         75.00-76.99               Passing
5.00         65.00-74.99               Failure
INC                                    Incomplete
W                                      Withdrawn
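
The grading formulas and the SIS table can be checked with a short calculation. The sketch below is only an illustration: it assumes that each "weighted average" is expressed as a proportion between 0 and 1 (so that multiplying by 50 and adding 50 gives a grade between 50 and 100), and the function names are hypothetical, not part of any university system.

```python
# Lower bound of each grade band, taken from the SIS table.
SIS_TABLE = [
    (97.00, "1.00", "Excellent"),
    (94.00, "1.25", "Excellent"),
    (91.00, "1.50", "Very Good"),
    (88.00, "1.75", "Very Good"),
    (85.00, "2.00", "Good"),
    (82.00, "2.25", "Good"),
    (79.00, "2.50", "Satisfactory"),
    (77.00, "2.75", "Satisfactory"),
    (75.00, "3.00", "Passing"),
    (65.00, "5.00", "Failure"),
]


def transmute(weighted_average):
    """Apply (weighted average x 50) + 50.

    Assumption: weighted_average is a proportion between 0 and 1.
    """
    return weighted_average * 50 + 50


def final_grade(activities_average, exams_average):
    """Final Grade = (70% x CS) + (30% x MFE)."""
    cs = transmute(activities_average)
    mfe = transmute(exams_average)
    return 0.70 * cs + 0.30 * mfe


def sis_equivalent(grade):
    """Return (SIS grade, description), or None if the table has no row."""
    for lower_bound, sis, description in SIS_TABLE:
        if grade >= lower_bound:
            return sis, description
    return None  # grades below 65.00 are not listed in the table


grade = final_grade(0.90, 0.80)
print(grade, sis_equivalent(grade))
```

For a student who averages 90% on the activities and 80% on the exams, CS is 95 and MFE is 90, so the final grade is 0.70 x 95 + 0.30 x 90 = 93.5, which falls in the 1.50 (Very Good) band.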

Prepared by:

Katrina D. Elizon
Faculty Member, Department of Mathematics and Statistics
College of Science
Contents

1 Introduction to Statistical Concepts
1.1 Definitions and Terminology
1.2 Process of Statistics
1.3 Qualitative and Quantitative Variables
1.4 Discrete and Continuous Variables
1.5 Levels of Measurement
2 Data Collection and Basic Concepts in Sampling Design
2.1 Data Collection
2.2 Sources of Data
2.3 Methods of Collecting Primary and Secondary Data
2.4 Sample Size Determination
2.5 Basic Sampling Design
2.6 Sources of Errors in Sampling
3 Descriptive Statistics
3.1 Textual Presentation
3.2 Tabular Presentation
3.3 Graphical Presentation
3.4 Measures of Central Tendency
3.5 Measures of Relative Position
3.6 Measures of Variation or Dispersion
3.7 Karl Pearson’s Measure of Skewness
3.8 Percentile Coefficient of Kurtosis
3.9 Normal Distribution
3.10 Areas Under a Standard Normal Curve
4 Inferential Statistics
4.1 Procedures for Hypothesis Testing
4.2 Assessing and Testing Normality of the Data
4.3 Inference about Two Means (Dependent and Independent Sample T-Test)
4.4 One-Way Analysis of Variance
4.5 Pearson Product Moment Correlation
4.6 Chi-Square Test
MODULE 1: DEFINITION OF STATISTICS

INTRODUCTION TO STATISTICAL CONCEPTS

Objectives:
After successful completion of this module, you should be able to:

• Define statistics.
• Enumerate the importance and limitations of statistics.
• Explain the process of statistics.
• Know the difference between descriptive and inferential statistics.
• Distinguish between qualitative and quantitative variables.
• Distinguish between discrete and continuous variables.
• Determine the level of measurement of a variable.

Statistics plays a major role in many aspects of our lives. It is used in sports, for example, to help a general manager decide which player might be the best fit for a team. It is used in politics to help candidates understand how the public feels about various policies. And statistics is used in medicine to help determine the effectiveness of new drugs. Used appropriately, statistics can enhance our understanding of the world around us. Used inappropriately, it can lend support to inaccurate beliefs. Understanding statistical methods will provide you with the ability to analyze and critique studies and the opportunity to become an informed consumer of information. Understanding statistical methods will also enable you to distinguish solid analysis from bogus “facts.”

Many people say that statistics is numbers. After all, we are bombarded by numbers that supposedly represent how we feel and who we are. Certainly, statistics has a lot to do with numbers, but this definition is only partially correct. Statistics is also about where the numbers come from (that is, how they were obtained) and how closely the numbers reflect reality.

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

Let’s break this definition into four parts. The first part states that statistics involves the collection of information. The second refers to the organization and summarization of information. The third states that the information is analyzed to draw conclusions or answer specific questions. The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
• Statistics is important because it enables people to make decisions based on empirical evidence.
• Statistics provides us with the tools needed to convert massive data into pertinent information that can be used in decision making.
• Statistics can provide us with information that we can use to make sensible decisions.

What information is referred to in the definition?

The information referred to in the definition is data. According to the Merriam-Webster dictionary, data are “factual information used as a basis for reasoning, discussion, or calculation”.

Data can be numerical, as in height, or nonnumerical, as in gender. In either case, data describe characteristics of an individual.

Fields of Statistics

A. Mathematical Statistics - The study and development of statistical theory and methods in the abstract.

B. Applied Statistics - The application of statistical methods to solve real problems involving randomly generated data, and the development of new statistical methodology motivated by real problems. Example branches of Applied Statistics: psychometrics, econometrics, and biostatistics.

Limitations of Statistics

1. Statistics is not suitable for the study of qualitative phenomena.
2. Statistics does not study individuals.
3. Statistical laws are not exact.
4. Statistical tables may be misused.
5. Statistics is only one of the methods of studying a problem.

Definitions:

• The universe is the set of all entities under study.
• A population is the total or entire group of individuals or observations from which information is desired by a researcher. Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.
• An individual is a person or object that is a member of the population being studied.
• A sample is a subset of the population.
• A statistic is a numerical summary of a sample.
• A parameter is a numerical summary of a population.
• Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
• Inferential statistics uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result.

Example: Consider the following scenario.

You are walking down the street and notice that a person walking in front of you drops PHP100. Nobody seems to notice the PHP100 except you. Since you could keep the money without anyone knowing, would you keep the money or return it to the owner?
Suppose you wanted to use this scenario as a gauge of the morality of students at your school by determining the percent of students who would return the money. How might you do this? You could attempt to present the scenario to every student at the school, but this would be difficult or impossible if the student body is large. A second possibility is to present the scenario to 50 students and use the results to make a statement about all the students at the school.

In the PHP100 study presented, the population is all the students at the school. Each student is an individual. The sample is the 50 students selected to participate in the study.

Suppose 39 of the 50 students stated that they would return the money to the owner. We could present this result by saying that the percent of students in the survey who would return the money to the owner is 78%. This is an example of a descriptive statistic because it describes the results of the sample without making any general conclusions about the population. So 78% is a statistic because it is a numerical summary based on a sample. Descriptive statistics make it easier to get an overview of what the data are telling us.

If we extend the results of our sample to the population, we are performing inferential statistics. The generalization contains uncertainty because a sample cannot tell us everything about a population. Therefore, inferential statistics includes a level of confidence in the results. So rather than saying that 78% of all students would return the money, we might say that we are 95% confident that between 74% and 82% of all students would return the money. Notice how this inferential statement includes a level of confidence (measure of reliability) in our results. It also includes a range of values to account for the variability in our results. One goal of inferential statistics is to use statistics to estimate parameters.

PROCESS OF STATISTICS

1. Identify the research objective.

A researcher must determine the question(s) he or she wants answered. The question(s) must clearly identify the population that is to be studied.

2. Collect the information needed to answer the questions.

Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.

Example:

A research objective is presented. For each research objective, identify the population and sample in the study.

1. The Philippine Mental Health Association contacts 1,028 teenagers who are 13 to 17 years of age and live in Antipolo City and asks whether or not they have been prescribed medications for any mental disorders, such as depression or anxiety.

Population: Teenagers 13 to 17 years of age who live in Antipolo City

Sample: 1,028 teenagers 13 to 17 years of age who live in Antipolo City
2. A farmer wanted to learn about the weight of his soybean crop. He randomly sampled 100 plants and weighed the soybeans on each plant.

Population: Entire soybean crop

Sample: The 100 selected soybean plants

3. Organize and summarize the information.

Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.

4. Draw conclusions from the information.

In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.

Take Note!

If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

Example:

For each of the following statements, decide whether it belongs to the field of descriptive statistics or inferential statistics.

1. A badminton player wants to know his average score for the past 10 games. (Descriptive Statistics)

2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries. (Inferential Statistics)

3. Janine wants to determine the variability of her six exam scores in Algebra. (Descriptive Statistics)

4. A shipping company wishes to estimate the number of passengers traveling via their ships next year using their data on the number of passengers in the past three years. (Inferential Statistics)

5. A politician wants to determine the total number of votes his rival obtained in the past election based on his copies of the tally sheet of electoral returns. (Descriptive Statistics)

DISTINCTION BETWEEN QUALITATIVE AND QUANTITATIVE VARIABLES

Variables are the characteristics of the individuals within the population. For example, recently my mother and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My mom noted that the tomatoes had different weights even though they came from the same plant. She discovered that variables such as weight may vary.

If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability so that we can learn to grow plants that yield the best tomatoes.
It is helpful to divide variables into different types, as different statistical methods are applicable to each. The main division is into qualitative (or categorical) and quantitative (or numerical) variables.

Variables can be classified into two groups:

1. A qualitative (categorical) variable is a variable that yields categorical responses. Its value is a word or a code that represents a class or category.

2. A quantitative (numeric) variable takes on numerical values representing an amount or quantity.

Example:

Determine whether the following variables are qualitative or quantitative.

1. Hair color (Qualitative)
2. Temperature (Quantitative)
3. Stages of breast cancer (Qualitative)
4. Number of hamburgers sold (Quantitative)
5. Number of children (Quantitative)
6. Zip code (Qualitative)
7. Place of birth (Qualitative)
8. Degree of pain (Qualitative)

DISTINCTION BETWEEN DISCRETE AND CONTINUOUS

Quantitative variables may be further classified into:

1. A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values. If you count to get the value of a quantitative variable, it is discrete.

2. A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. If you measure to get the value of a quantitative variable, it is continuous.

Example:

Determine whether the following quantitative variables are discrete or continuous.

1. The number of heads obtained after flipping a coin five times. (Discrete)
2. The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M. and 1:00 P.M. (Discrete)
3. The distance a 2005 Toyota Prius can travel in city conditions with a full tank of gas. (Continuous)
4. Number of words correctly spelled. (Discrete)
5. Time of a runner to finish one lap. (Continuous)

LEVELS OF MEASUREMENT
It is important to know which type of scale is represented by your data, since different statistics are appropriate for different scales of measurement. A characteristic may be measured using nominal, ordinal, interval, and ratio scales.

1. Nominal Level - Nominal scales are sometimes called categorical scales or categorical data. Such a scale classifies persons or objects into two or more categories. Whatever the basis for classification, a person can only be in one category, and members of a given category have a common set of characteristics.

Example:

- Method of payment (cash, check, debit card, credit card)
- Type of school (public vs. private)
- Eye Color (Blue, Green, Brown)

2. Ordinal Level - This involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. An ordinal scale not only classifies subjects but also ranks them in terms of the degree to which they possess a characteristic of interest. In other words, an ordinal scale puts the subjects in order from highest to lowest, from most to least. Although ordinal scales indicate that some subjects are higher or lower than others, they do not indicate how much higher or how much better.

Example:

- Food Preferences
- Stage of Disease
- Social Economic Class (First, Middle, Lower)
- Severity of Pain

3. Interval Level - This measurement level not only classifies and orders the measurements, but also specifies that the distances between each interval on the scale are equivalent along the scale from low interval to high interval. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.

Example:

- Temperature on Fahrenheit/Celsius Thermometer
- Trait anxiety (e.g., high anxious vs. low anxious)
- IQ (e.g., high IQ vs. average IQ vs. low IQ)

4. Ratio Level - A ratio scale represents the highest, most precise level of measurement. It has the properties of the interval level of measurement, and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

Example:

- Height and weight
- Time
- Time until death

Operations that make sense for variables of different scales.
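
The operations that are meaningful at each level of measurement can be summarized as a small lookup. The sketch below is only an illustration built from the descriptions of the four levels in this module; the names `LEVELS`, `NEW_OPERATIONS`, and `allowed_operations` are hypothetical, and the operation labels are informal wording rather than standard terminology.

```python
# Levels ordered from lowest to highest. Each level keeps the
# operations of the levels below it and adds new ones.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

NEW_OPERATIONS = {
    "nominal": ["count categories", "test equality (=, !=)"],
    "ordinal": ["rank and compare (<, >)"],
    "interval": ["add and subtract (+, -)"],
    "ratio": ["multiply and divide (*, /)"],
}


def allowed_operations(level):
    """Return every operation that is meaningful at the given level."""
    position = LEVELS.index(level.lower())
    operations = []
    for name in LEVELS[: position + 1]:
        operations.extend(NEW_OPERATIONS[name])
    return operations


for level in LEVELS:
    print(f"{level:<9} {', '.join(allowed_operations(level))}")


# Why ratios are not meaningful at the interval level: the same two
# temperatures give different ratios on the Celsius and Fahrenheit scales.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(ratio_celsius, ratio_fahrenheit)
```

The last two lines show that 20°C is "twice" 10°C only in Celsius numbers: converted to Fahrenheit (68°F and 50°F) the ratio becomes about 1.36. Because zero on these scales does not mean the absence of temperature, only differences are meaningful, not ratios.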
Both interval and ratio data involve measurement. Most data analysis techniques that apply to ratio data also apply to interval data. Therefore, in most practical aspects, these types of data (interval and ratio) are grouped under metric data. In some other instances, these types of data are also known as numerical discrete and numerical continuous.

Example:

Categorize each of the following as nominal, ordinal, interval or ratio measurement.

1. Ranking of college athletic teams. (Ordinal)
2. Employee number. (Nominal)
3. Number of vehicles registered. (Ratio)
4. Brands of soft drinks. (Nominal)
5. Number of cars passing along C5 on a given day. (Ratio)
6. Zip code (Nominal)
7. Degree of pain (Ordinal)

ACTIVITIES/ASSESSMENTS:

Read each item carefully. Write the answer on the yellow paper. Answers Only.

I. A research objective is presented. For each, identify the (A) population and (B) sample in the study.

1. A polling organization contacts 2141 male university graduates who have a white-collar job and asks whether or not they had received a raise at work during the past 4 months.

A. ______________________________
B. ______________________________

2. Every year the PSA releases the Current Population Report based on a survey of 50,000 households. The goal of this report is to learn the demographic characteristics, such as income, of all households within the Philippines.

A. ______________________________
B. ______________________________

3. Researchers want to determine whether or not higher folate intake is associated with a lower risk of hypertension (high blood pressure) in women (27 to 44 years of age). To make this determination, they look at 7373 cases of hypertension in these women and find that those who consume at least 1000 micrograms per day of total folate have a decreased risk of hypertension compared with those who consume less than 200.

A. ______________________________
B. ______________________________

II. Indicate whether the following statements require the use of descriptive or inferential statistics.

______________1. A teacher wants to know the attitudes of all students towards abortion.
______________2. A market analyst of a sales firm draws a chart showing the sales figures of a given product for the period 2006-2007.
______________3. A forecaster predicts the results of an election using the number of votes cast in 15 out of 25 barangays.
______________4. Men are better in math than women.
______________5. Forty percent of the employees of an organization were recorded tardy for at least 15 working days.
______________6. There are very few gender-related occupations.
______________7. An accountant predicts the accuracy rate of a client’s financial resources.
______________8. A quality control manager wishes to check production output.
______________9. Records indicated that 75% of the faculty in the graduate school are doctoral degree holders.
______________10. There is no relationship between educational qualification of parents and academic achievement of their children.

III. Identify the qualitative and quantitative variables and indicate the highest level of measurement required in each. If quantitative, classify whether discrete or continuous.

______________1. Occupation
______________2. Number of government officials
______________3. Favorite color
______________4. Temperature in Celsius degrees
______________5. Type of school
______________6. Volume of mineral water sold daily
______________7. Employee number
______________8. Civil status
______________9. Equity accounts
______________10. Brands of soft drinks
______________11. Socioeconomic status
______________12. Employment status
______________13. Number of missing teeth
______________14. Number of vehicles registered
______________15. Jersey Number
______________16. Number of employees collecting retirement benefits from GSIS
______________17. Duration of a seizure
______________18. Cause of death
______________19. Dividends
______________20. Current assets list
______________21. Number of heart attacks
______________22. Accounts receivable
______________23. Clothing size
______________24. Blood type
______________25. Ethnic group

REFERENCES:

Statistics: Informed Decisions Using Data by Michael Sullivan III, Fifth Edition

Sampling: Design and Analysis by Sharon L. Lohr, Second Edition

MODULE 2: DATA COLLECTION

DATA COLLECTION AND BASIC CONCEPTS IN SAMPLING DESIGN

Objectives:
After successful completion of this module, you should be able to:

• Determine the sources of data (primary and secondary data).
• Distinguish the different methods of data collection under primary and secondary data.
• Determine the appropriate sample size.
• Differentiate various sampling techniques.
• Know the sources of errors in sampling.

Everybody collects, interprets and uses information, much of it in numerical or statistical form, in day-to-day life. People commonly receive large quantities of information every day through conversations, television, computers, radio, newspapers, posters, notices and instructions. It is precisely because there is so much information available that people need to be able to absorb, select and reject it. In everyday life, in business and industry, certain statistical information is necessary, and it is important to know where to find it and how to collect it.

Analysis of data can lead to powerful results. Data can be used to offset anecdotal claims, such as the suggestion that cellular telephones cause brain cancer. Anecdotal means that the information being conveyed is based on casual observation, not scientific research. Because data are powerful, they can be dangerous when misused. The misuse of data usually occurs when data are incorrectly obtained or analyzed. For example, radio or television talk shows regularly ask poll questions for which respondents must call in or use the Internet to supply their vote. Most likely, the individuals who are going to call in are those who have a strong opinion about the topic. This group is not likely to be representative of people in general, so the results of the poll are not meaningful. Whenever we look at data, we should be mindful of where the data come from.

Even when data tell us that a relation exists, we need to investigate. For example, a study showed that breast-fed children have higher IQs than those who were not breast-fed. Does this study mean that a mother who breast-feeds her child will increase the child’s IQ? Not necessarily. It may be that some factor other than breast-feeding contributes to the IQ of the children. In this case, it turns out that mothers who breast-feed generally have higher IQs than those who do not. Therefore, it may be genetics that leads to the higher IQ, not breast-feeding.
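
The reasoning in the breast-feeding example can be illustrated with a small simulation. The sketch below uses entirely made-up numbers (the IQ distribution, the breast-feeding probabilities of 0.7 and 0.3, and the 0.5 inheritance factor are assumptions chosen for illustration, not estimates from any study). In the simulated data, breast-feeding has no effect at all on the child's IQ, yet the breast-fed group still shows a higher average IQ because both variables depend on the mother's IQ.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

breastfed = []
not_breastfed = []

for _ in range(10_000):
    mother_iq = random.gauss(100, 15)

    # Assumption: mothers with above-average IQ are more likely
    # to breast-feed (0.7 versus 0.3).
    probability = 0.7 if mother_iq > 100 else 0.3
    is_breastfed = random.random() < probability

    # Assumption: the child's IQ depends on the mother's IQ plus
    # random noise, and NOT on breast-feeding.
    child_iq = 100 + 0.5 * (mother_iq - 100) + random.gauss(0, 10)

    if is_breastfed:
        breastfed.append(child_iq)
    else:
        not_breastfed.append(child_iq)

gap = statistics.mean(breastfed) - statistics.mean(not_breastfed)
print(f"Mean IQ gap (breast-fed minus not breast-fed): {gap:.1f} points")
```

The printed gap is positive even though the simulation gives breast-feeding no causal role, which is why an observed relation alone does not establish cause and effect.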
Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Without proper planning for data collection, a number of problems can occur. If the data collection steps and processes are not properly planned, the research project can ultimately end up with a data set that does not serve the purpose for which it was intended. For example, if more than one person is involved in the data collection, but data collectors do not follow consistent data collection practices, they can end up with data with different units, collection processes, and variable names.

Consequences from Improperly Collected Data

• Inability to answer research questions accurately.
• Inability to repeat and validate the study.
• Distorted findings resulting in wasted resources.
• Misleading other researchers to pursue fruitless avenues of investigation.
• Compromising decisions for public policy.
• Causing harm to human participants and animal subjects.

Steps in Data Gathering

1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design data gathering forms to be used.
5. Collect data.

Choosing a Method of Data Collection

Decision-makers need information that is relevant, timely, accurate and usable. The cost of obtaining, processing and analyzing these data is high. The challenge is to find ways that lead to information that is cost-effective, relevant, timely and important for immediate use. Some methods pay attention to timeliness and reduction in cost. Others pay attention to accuracy and the strength of the method in using scientific approaches.

Statistical data may be classified under two categories, depending upon the sources: Primary Data and Secondary Data.

SOURCES OF DATA

Whether conducting research in the social sciences, humanities, arts, or natural sciences, the ability to distinguish between primary and secondary sources is essential.

Primary Sources - Provide a first-hand account of an event or time period and are considered to be authoritative. They represent original thinking, reports on discoveries or events, or they can share new information. Often these sources are created at the time the events occurred, but they can also include sources that are created later. They are usually the first formal appearance of original research.
Primary Data - are data documented by the primary source. The data collectors documented the data themselves.

The first-hand information obtained by the investigator is more reliable and accurate, since the investigator can extract the correct information by removing doubts, if any, in the minds of the respondents regarding certain questions. High response rates might be obtained since the answers to various questions are obtained on the spot. It permits explanation of questions concerning difficult subject matter.

Secondary Sources - offer an analysis, interpretation or a restatement of primary sources and are considered to be persuasive. They often involve generalisation, synthesis, interpretation, commentary or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to describe or explain primary sources.

Secondary Data - are data documented by a secondary source. The data collectors had the data documented by other sources.

In secondary data, the data are primary data for the agency that collected them, and become secondary for someone else who uses these data for his own purposes.

Secondary data are less expensive to collect, both in money and time. These data can also be better utilized, and sometimes the quality of such data may be better because they might have been collected by persons who were specially trained for that purpose.

On the other hand, such data must be used with great care, because they may also be full of errors. First, the purpose of the collection of the data by the primary agency may have been different from the purpose of the user of these secondary data. Secondly, there may have been bias introduced, the size of the sample may have been inadequate, or there may have been arithmetic or definition errors. Hence, it is necessary to critically investigate the validity of the secondary data.

The primary data can be collected by the following five methods:

1. Direct personal interviews - The researcher has direct contact with the interviewee. The researcher gathers information by asking questions to the interviewee.

2. Indirect/Questionnaire Method - This method of data collection involves sourcing and accessing existing data that were originally collected for the purpose of the study.

Designing good “questioning tools” forms an important and time-consuming phase in the development of most research proposals. Once the decision has been made to use these techniques, the following questions should be considered before designing our tools:

• What exactly do we want to know, according to the objectives and variables we identified earlier? Is questioning the right technique to obtain all answers, or do we need additional techniques, such as observations or analysis of records?

• Of whom will we ask questions and what techniques will we use? Do we understand the topic sufficiently to design a questionnaire, or do we need some loosely structured interviews with key informants or a focus group discussion first to orient ourselves?
• Are our informants mainly literate or illiterate? If illiterate, the use of self-administered questionnaires is not an option.

• How large is the sample that will be interviewed? Studies with many respondents often use shorter, highly structured questionnaires, whereas smaller studies allow more flexibility and may use questionnaires with a number of open-ended questions.

Key Design Principles of a Good Questionnaire

1. Keep the questionnaire as short as possible.
2. Decide on the type of questionnaire (Open-Ended or Closed-Ended).
3. Write the questions properly.
4. Order the questions appropriately.
5. Avoid questions that prompt or motivate the respondent to say what you would like to hear.
6. Write an introductory letter or an introduction.
7. Write special instructions for interviewers or respondents.
8. Translate the questions if necessary.
9. Always test your questions before taking the survey. (Pre-test)

An open-ended question is a type of question that does not include response categories. The respondent is not given any possible answers to choose from. This type of question is usually appropriate for collecting subjective data. It permits free responses that should be recorded in the respondent’s own words.

Example:
- Can you describe exactly what the traditional birth attendant did when your labor started?
- What do you think are the reasons for a high drop-out rate of village health committee members?

A closed-ended question is a type of question that includes a list of response categories from which the respondent will select his answer. It is useful if the range of possible responses is known. This type of question is usually appropriate for collecting objective data.

Example:
Did you eat any of the following foods yesterday?
• Fish or meat      Yes   No
• Eggs              Yes   No
• Milk or cheese    Yes   No

Take Note!
Question wording and question order have a large effect on the responses obtained.

Example:
Two surveys were taken in late 1993/early 1994 about Elvis Presley.

One survey asked: “In the past few years, there have been a lot of rumors and stories about whether Elvis Presley is really dead. How do you feel about this? Do you think there is any possibility that these rumors are true and that Elvis Presley is still alive, or don’t you think so?”
The second survey asked: “A recent television show examined various theories about Elvis Presley’s death. Do you think it is possible that Elvis is alive or not?”

8% of the respondents to the first question said it is possible that Elvis is still alive, and 16% of respondents to the second question said it is possible that Elvis is still alive.

3. A focus group is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.

4. Experiment is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.

Bear in mind that the experimental method has several limitations that you should be aware of.
- Ethical, moral, and legal concerns
- Unrealistic controlled environments
- Inability to control for all variables

5. Observation is a technique that involves systematically selecting, watching, and recording behaviors of people or other phenomena and aspects of the setting in which they occur, for the purpose of gaining specified information. It includes all methods from simple visual observations to the use of high-level machines and measurements, sophisticated equipment or facilities such as:
- Radiographic
- Biochemical
- X-ray machines
- Microscope
- Clinical examinations
- Microbiological examinations


Observation gives relatively more accurate data on behavior and activities, but it is subject to the investigator's or observer’s own biases, prejudices, and desires, and it needs more resources and skilled manpower when high-level machines are used.

The secondary data can be collected by the following five methods:
1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of the government departments.
5. Information from official publications.

Take Note!
• Always investigate the validity and reliability of the data by examining the collection method employed by your source.
• Do not use inappropriate data for your research.
• The choice of methods of data collection is largely based on the accuracy of the information they yield.

SAMPLE SIZE

“How many participants should be chosen for a survey?”

One of the most frequent problems in statistical analysis is the determination of the appropriate sample size. One may ask why sample size is so important. The answer is that an appropriate sample size is required for validity. If the sample size is too small, it will not yield valid results. An appropriate sample size can produce accurate results. Moreover, the results from a small sample size will be questionable. A sample size that is too large will result in wasting money and time, because a sufficient sample will normally give an accurate result.

The sample size is typically denoted by n and it is always a positive integer. No exact sample size can be mentioned here, and it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note!
- Representativeness, not size, is the more important consideration.
- Use no less than 30 subjects if possible.
- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).

[Figure: Representative Sample]
Choosing a sample size depends on non-statistical considerations and statistical considerations.

• Non-statistical considerations - These may include the availability of resources, manpower, budget, ethics, and the sampling frame.
• Statistical considerations - These include the desired precision of the estimate.

Three criteria need to be specified to determine the appropriate sample size:

1. Level of Precision
Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.

2. Confidence Level
It is a statistical measure of the number of times out of 100 that results can be expected to be within a specified range. For example, a confidence level of 90% means that results of an action will probably meet expectations 90% of the time.

To find the right z-score to use, refer to the table:

Desired Confidence Level    Z-Score
80%                         1.28
85%                         1.44
90%                         1.65
95%                         1.96
99%                         2.58

3. Degree of Variability
Depending upon the target population and attributes under consideration, the degree of variability varies considerably. The more heterogeneous a population is, the larger the sample size required to get an optimum level of precision.

Methods in Determining the Sample Size

• Estimating the Mean or Average
The sample size required to estimate the population mean µ at a given level of confidence with a specified margin of error e is given by

\[ n \geq \left( \frac{Z\sigma}{e} \right)^2 \]

where:
Z is the z-score corresponding to the level of confidence.
σ is the population standard deviation.
e is the level of precision.

Take Note:
When σ is unknown, it is common practice to conduct a preliminary survey to determine s and use it as an estimate of σ, or to use results from previous studies to obtain an estimate of σ. When using this approach, the size of the sample should be at least 30. The formula for the sample standard deviation s is
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

Example:
A soft drink machine is regulated so that the amount of drink dispensed is approximately normally distributed with a standard deviation equal to 0.5 ounce. Determine the sample size needed if we wish to be 95% confident that our sample mean will be within 0.03 ounce from the true mean.

Solution: The z-score for confidence level 95% in the z-table is 1.96.

\[ n \geq \left( \frac{1.96(0.5)}{0.03} \right)^2 = 1067.11 \]

We need a sample of 1068 for our study.

• Estimating Proportion (Infinite Population)
The sample size required to obtain a confidence interval for p with specified margin of error e is given by

\[ n \geq \left( \frac{Z}{e} \right)^2 p(1 - p) \]

Where:
Z is the z-score corresponding to the level of confidence.
e is the level of precision.
p is the population proportion.

There is a dilemma in this formula: it depends on the proportion p, estimated by p̂ = x/n, which we know only after we have taken the sample.

There are two ways to solve this dilemma:

1. We could determine a preliminary value for p based on a pilot study or an earlier study.

Example:
If last month 37% of all voters thought that state taxes are too high, then it is likely that the proportion with that opinion this month will not be dramatically different, and we would use the value 0.37 for p in the formula.

2. Simply replace p in the formula by 0.5.

When p = 0.5, p(1 - p) attains its maximum value of 0.25. This is called the most conservative estimate, since it gives the largest possible estimate of n.

The conservative formula, using the strong law of large numbers, is

\[ n \geq \frac{1}{4} \left( \frac{Z}{e} \right)^2 \]

For example, n ≈ 385 when the confidence level is 95% (Z = 1.96) and the level of precision is 0.05.

Example:
Suppose we are doing a study on the inhabitants of a large town, and want to find out how many households serve breakfast in the mornings. We don’t have much information on the subject to begin with, so we’re going to assume that half of the families serve breakfast: this gives us maximum variability. So p = 0.5. We want 99% confidence and at least 1% precision.
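As a cross-check on the two formulas given above (sample size for a mean and for a proportion), here is a minimal Python sketch. The function names, the `Z_TABLE` dictionary, and the small rounding guard are illustrative assumptions and not part of the course material; the z-scores are copied from the confidence-level table in this module.

```python
import math

# z-scores as listed in the confidence-level table of this module
Z_TABLE = {80: 1.28, 85: 1.44, 90: 1.65, 95: 1.96, 99: 2.58}


def _round_up(value):
    """Round up to the next whole unit, ignoring tiny floating-point noise."""
    return math.ceil(round(value, 8))


def sample_size_mean(confidence, sigma, e):
    """n >= (Z * sigma / e)^2 for estimating a population mean."""
    z = Z_TABLE[confidence]
    return _round_up((z * sigma / e) ** 2)


def sample_size_proportion(confidence, e, p=0.5):
    """n >= (Z / e)^2 * p * (1 - p); p = 0.5 gives the conservative estimate."""
    z = Z_TABLE[confidence]
    return _round_up((z / e) ** 2 * p * (1 - p))


print(sample_size_mean(95, sigma=0.5, e=0.03))   # 1068
print(sample_size_proportion(95, e=0.05))        # 385
print(sample_size_proportion(99, e=0.01))        # 16641
```

The first call repeats the soft drink example, the second the conservative case at 95% confidence, and the third uses the figures of the breakfast example (99% confidence, 1% precision, p = 0.5).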
Solution: The z-score for confidence level 99% in the z-table is 2.58.

\[ n \geq \left( \frac{2.58}{0.01} \right)^2 (0.5)(1 - 0.5) = 16{,}641 \]

We need a sample of 16,641 for our study.

• Slovin’s Formula
Slovin’s formula is used to calculate the sample size n given the population size and error. It is computed as

\[ n \geq \frac{N}{1 + Ne^2} \]

Where:
N is the total population.
e is the level of precision.

Example:
A researcher plans to conduct a survey about the food preference of BS Stat students. If the population of students is 1000, find the sample size if the error is 5%.

Solution:

\[ n \geq \frac{1000}{1 + 1000(0.05)^2} = 285.71 \]

The researcher needs to survey 286 BS Stat students.

• Finite Population Correction
If the population is small, then the sample size can be reduced slightly:

\[ n \geq \frac{n_0}{1 + \frac{n_0 - 1}{N}} \]

Where:
n_0 is Cochran’s sample size recommendation.
N is the population size.

These are links to online sample size calculators:
https://select-statistics.co.uk/calculators/sample-size-calculator-population-proportion/
https://www.calculator.net/sample-size-calculator.html

BASIC SAMPLING DESIGN

The goal in sampling is to obtain individuals for a study in such a way that accurate information about the population can be obtained.

Reason for Sampling
- It is important that the individuals included in a sample represent a cross section of individuals in the population.
- If a sample is not representative, it is biased. You cannot generalize to the population from your statistical data.

Some definitions are needed to make the notion of a good sample more precise.
Definitions:

• Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.
• Target population - The complete collection of observations we want to study.
• Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.
• Sample - A subset of a population.
• Sampling unit - A unit that can be selected for a sample. We may want to study individuals, but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.
• Sampling frame - A list, map, or other specification of sampling units in the population from which a sample may be selected. For a survey using in-person interviews, the sampling frame might be a list of all street addresses.
• Sampling technique/Sampling strategy - It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.
• Sampling bias - This involves problems in your sampling, which reveal that your sample is not representative of your population.

The following examples indicate some ways in which selection bias can occur:
- Deliberately or purposively selecting a “representative” sample.
- Misspecifying the target population.
- Failing to include all of the target population in the sampling frame, called undercoverage.
- Including population units in the sampling frame that are not in the target population, called overcoverage.
- Having multiplicity of listings in the sampling frame.
- Substituting a convenient member of a population for a designated member who is not readily available.
- Failing to obtain responses from all of the chosen sample (nonresponse).
- Allowing the sample to consist entirely of volunteers.

Advantages of Sampling Over Complete Enumeration
- Less Labor
- Reduced Cost
- Greater Speed
- Greater Scope
- Greater Efficiency and Accuracy
- Convenience
- Ethical Considerations

Two Types of Samples

1. Probability Sample
- Samples are obtained using some objective chance mechanism, thus involving randomization.
- They require the use of a complete listing of the elements of the universe, called the sampling frame.
- The probabilities of selection are known.
- They are generally referred to as random samples.
- They allow drawing of valid generalizations about the universe/population.

2. Non-probability Sample
- Samples are obtained haphazardly, selected purposively, or taken as volunteers.
- The probabilities of selection are unknown.
- They should not be used for statistical inference.

Sampling Procedure
- Identify the population.
- Determine if the population is accessible.
- Select a sampling method.
- Choose a sample that is representative of the population.
- Ask the question: can I generalize to the general population from the accessible population?

Sampling techniques can be grouped according to how the selection of items is made, into probability sampling and non-probability sampling.

Basic Sampling Techniques of Probability Sampling

• Simple Random Sampling
- Most basic method of drawing a probability sample.
- Assigns equal probabilities of selection to each possible sample.
- Results in a simple random sample.

Advantage: It is very simple and easy to use.

Disadvantage: The sample chosen may be distributed over a wide geographic area.

When to use: This is preferable if the population is not widely spread geographically. Also, this is more appropriate if the population is more or less homogeneous with respect to the characteristics of the population.

[Figure: Simple Random Sampling]

• Systematic Random Sampling
- It is obtained by selecting every kth individual from the population.
- The first individual selected corresponds to a random number between 1 and k.
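As a rough illustration of systematic random sampling (every kth individual, with a random start between 1 and k), here is a minimal Python sketch. The function name `systematic_sample`, its parameters, and the use of integer division for the interval k are assumptions made for this sketch; the population is a hypothetical list of 500 serial numbers.

```python
import random


def systematic_sample(population, n, start=None, rng=None):
    """Pick every k-th element of `population`, beginning at a start in 1..k."""
    N = len(population)
    k = N // n  # sampling interval; integer division is an assumption here
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)  # random number between 1 and k, inclusive
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Serial numbers are 1-based, Python indices are 0-based.
    positions = [start + j * k for j in range(n)]
    return [population[p - 1] for p in positions]


students = list(range(1, 501))            # hypothetical serial numbers 1..500
sample = systematic_sample(students, 50, start=6)
print(sample[:5], len(sample))            # [6, 16, 26, 36, 46] 50
```

Passing `start` explicitly makes the result reproducible; leaving it out draws the starting point at random.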


Obtaining a Systematic Random Sample

1. Decide on a method of assigning a unique serial number, from 1 to N, to each one of the elements in the population.

2. Compute the sampling interval

\[ k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}} \]

3. Select a number i, from 1 to k, using a randomization mechanism. The element in the population assigned to this number is the first element of the sample. The other elements of the sample are those assigned to the numbers i + k, i + 2k, and so on until you get a sample of size n.

Example:
We want to select a sample of 50 students from 500 students. Under this method, every kth item is picked from the sampling frame.

Solution:

\[ k = \frac{500}{50} = 10 \]

We start from a random number i and take every kth unit after it. Suppose the random number i is 6; then we select units 6, 16, 26, 36, 46, ...

Advantage: Drawing of the sample is easy. It is easy to administer in the field, and the sample is spread evenly over the population.

Disadvantage: May give poor precision when unsuspected periodicity is present in the population.

When to use: This is advisable to use if the ordering of the population is essentially random and when stratification with numerous data is used.

[Figure: Systematic Random Sampling]

• Stratified Random Sampling
- It is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum.
- The individuals within each stratum should be homogeneous (or similar) in some way.

Example:
A sample of 50 students is to be drawn from a population consisting of 500 students belonging to two institutions A and B. The number of students in institution A is 200 and in institution B is 300. How will you draw the sample using proportional allocation?
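Proportional allocation can also be sketched in a few lines of Python. The sketch assumes the usual rule that each stratum receives a share of the sample equal to its share of the population, n_h = (n / N) × N_h; the function name `proportional_allocation` and the dictionary layout are illustrative choices, not part of the course material.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of n across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    # n_h = (n / N) * N_h, rounded to the nearest whole unit
    return {name: round(n * size / N) for name, size in strata_sizes.items()}


allocation = proportional_allocation({"A": 200, "B": 300}, 50)
print(allocation)   # {'A': 20, 'B': 30}
```

When the shares are not whole numbers, rounding each stratum separately may make the total differ slightly from n, so the allocation should be checked and adjusted by hand.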
Solution:

There are two strata in this case.

Given: N1 = 200, N2 = 300, N = 500, n = 50

\[ n_1 = \left( \frac{n}{N} \right) N_1 = \left( \frac{50}{500} \right) 200 = 20 \]

\[ n_2 = \left( \frac{n}{N} \right) N_2 = \left( \frac{50}{500} \right) 300 = 30 \]

The sample sizes are 20 from A and 30 from B. Then the units from each institution are to be selected by simple random sampling.

Advantage: Stratification of respondents is advantageous in terms of precision of the estimates of the characteristics of the population. Sampling designs may vary by stratum to adjust for the differences in the conditions across strata. It is easy to use as a random sampling design.

Disadvantage: Values of the stratification variable may not be easily available for all units in the population, especially if the characteristic of interest is homogeneous. It is possible that there are no representatives in one or two strata. Also, transportation costs can be high if the population covers a wide geographic area.

When to use: If the population is such that the distribution of the characteristics of the respondents under consideration is concentrated in small and spread segments of the population. Thus, this is preferred if precise estimates are desired for stratified parts of the population and if sampling problems differ in the various strata of the population.

[Figure: Stratified Random Sampling]

• Cluster Sampling
- You take the sample from naturally occurring groups in your population.
- The clusters are constructed such that the sampling units are heterogeneous within the cluster and homogeneous among the clusters.
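A minimal Python sketch of cluster sampling follows. It assumes that whole clusters are chosen at random and that every element of a chosen cluster enters the sample; the function name `cluster_sample` and the four schools are hypothetical and serve only as an illustration.

```python
import random


def cluster_sample(clusters, n_clusters, rng=None):
    """Randomly choose whole clusters and return every element inside them."""
    rng = rng or random.Random()
    # Choose n distinct clusters; each cluster has the same chance of selection.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Assumption: all elements of a chosen cluster enter the sample.
    sample = [element for name in chosen for element in clusters[name]]
    return chosen, sample


# Hypothetical population: four schools acting as naturally occurring groups.
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1", "D2"],
}
chosen, sample = cluster_sample(schools, 2, rng=random.Random(2020))
print(chosen)
print(sample)
```

Only a list of clusters is needed to draw the sample, which is why no element-level list appears in the sketch until after the clusters are chosen.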
Obtaining a Cluster Sample

1. Divide the population into non-overlapping clusters.
2. Number the clusters in the population from 1 to N.
3. Select n distinct numbers from 1 to N using a randomization mechanism. The selected clusters are the clusters associated with the selected numbers.
4. The sample will consist of all the elements in the selected clusters.

Example:
A researcher wants to survey the academic performance of high school students in MIMAROPA.

1. He/She can divide the entire population into different clusters.
2. Then the researcher selects a number of clusters, depending on his research, through simple or systematic random sampling.
3. Then, from the selected clusters, the researcher can either include all the high school students as subjects or select a number of subjects from each cluster through simple or systematic random sampling.

Advantage: There is no need to come up with a list of units in the population; all that is needed is a list of the clusters. It is also less costly since the elements are physically closer together.

Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households far apart.

When to use: If the population can be grouped into clusters where individual population elements are known to be different with respect to the characteristics under study, this is preferable to use.

[Figure: Cluster Sampling]

• Multi-Stage Sampling
- Selection of the sample is done in two or more steps or stages, with sampling units varying in each stage.
- The population is first divided into a number of first-stage sampling units from which a sample is drawn. Smaller units, called the secondary sampling units, comprising the selected first-stage units then serve as the sampling units for the next stage. If needed, additional stages may be added until the units of observation for the survey are clearly identified. The units comprising the samples selected from the previous stage constitute the frame for the stages.

Obtaining a Multi-Stage Sample

1. Organize the sampling process into stages where the unit of analysis is systematically grouped.
2. Select a sampling technique for each stage.
3. Systematically apply the sampling technique to each stage until the unit of analysis has been selected.

Example:
Suppose we wish to study the expenditure patterns of households in NCR. We can select a sample of households for this study using simple three-stage sampling.

- First, divide NCR into smaller cities/municipalities, and collect a random sample of these cities/municipalities.
- Second, a random sample of smaller areas such as barangays is taken from within each of the cities/municipalities chosen in the first stage.
- Third, a random sample of even smaller areas such as households is taken from within each of the areas chosen in the second stage.

Advantage: It is easier to generate adequate sampling frames. Transportation costs are greatly reduced since there is some form of clustering among the ultimate or final samples; i.e., they are in the same lower-stage units.

Disadvantage: Its complexity in theory may be difficult to apply in the field. Estimation procedures may be difficult for non-statisticians to follow.

When to use: If no population list is available and if the population covers a wide area.

[Figure: Multi-Stage Sampling]

Take Note!
Use probability sampling if the main objective of the sample survey is making inferences about the characteristics of the population under study.

Basic Sampling Techniques of Non-Probability Sampling

• Accidental Sampling - There is no system of selection; the sample consists only of those whom the researcher or interviewer meets by chance.

• Quota Sampling - A specified number of persons of certain types is included in the sample. The researcher is aware of categories within the population and draws samples from each category. The size of each categorical sample is proportional to the proportion of the population that belongs in that category.

• Convenience Sampling - It is a process of picking out people in the most convenient and fastest way to get reactions immediately. This method can be done by telephone interview to get the immediate reactions of a certain group of the sample to a certain issue.

• Purposive Sampling - It is based on certain criteria laid down by the researcher. People who satisfy the criteria are interviewed. It is used to determine the target population of those who will be taken for the study.
• Judgement Sampling - selects the sample in accordance with an expert’s judgment.

Cases wherein Non-Probability Sampling is Useful
- Only a few are willing to be interviewed.
- Extreme difficulties in locating or identifying subjects.
- Probability sampling is more expensive to implement.
- Cannot enumerate the population elements.

Sources of Errors in Sampling

1. Non-sampling Error
- Errors that result from the survey process.
- Any errors that cannot be attributed to the sample-to-sample variability.

Sources of Non-Sampling Error
1. Non-responses
2. Interviewer Error
3. Misrepresented Answers
4. Data entry errors
5. Questionnaire Design
6. Wording of Questions
7. Selection Bias

2. Sampling Error
- Error that results from taking one sample instead of examining the whole population.
- Error that results from using sampling to estimate information regarding a population.

ACTIVITIES/ASSESSMENTS:

I. Determine if the source would be a primary or a secondary source.

______________1. Government Records
______________2. Dictionary
______________3. Artifact
______________4. A TV show explaining what happened in the Philippines.
______________5. Autobiography about Rodrigo Duterte.
______________6. Enrile's diary describing what he thought about World War II.
______________7. Audio and video recordings
______________8. Speeches
______________9. Newspaper
______________10. Review Articles

II. Determine the sample size of the following problems. Show your solution.

1. A dermatologist wishes to estimate the proportion of young adults who apply sunscreen regularly before going out in the sun in the summer. Find the minimum sample size required to estimate the proportion with precision of 3%, and 90% confidence.

2. The administration at a college wishes to estimate the proportion of all its entering freshmen who graduate within four years, with 95% confidence. Estimate the minimum size sample required. Assume
that the population standard deviation is σ = 1.3 and precision level is 0.05.

3. A government agency wishes to estimate the proportion of drivers aged 16–24 who have been involved in a traffic accident in the last year. It wishes to make the estimate to within 1% error and at 90% confidence. Find the minimum sample size required, using the information that several years ago the proportion was 0.12.

4. An internet service provider wishes to estimate, to within one percentage point of error, the current proportion of all email that is spam, with 85% confidence. Last year the proportion that was spam was 71%. Estimate the minimum size sample required if the total email that is spam is 10,000.

III. Determine the type of sampling. (ex. Simple Random Sampling, Purposive Sampling)

______________1. To determine customer opinion of its boarding policy, Southwest Airlines randomly selects 60 flights during a certain week and surveys all passengers on the flights.

______________2. A member of Congress wishes to determine her constituency’s opinion regarding estate taxes. She divides her constituency into three income classes: low-income households, middle-income households, and upper-income households. She then takes a simple random sample of households from each income class.

______________3. The presider of a guest-lecture series at a university stands outside the auditorium before a lecture begins and hands every fifth person who arrives, beginning with the third, a speaker evaluation survey to be completed and returned at the end of the program.

______________4. 24 Hour Fitness wants to administer a satisfaction survey to its current members. Using its membership roster, the club randomly selects 40 club members and asks them about their level of satisfaction with the club.

______________5. A radio station asks its listeners to call in their opinion regarding the use of U.S. forces in peacekeeping missions.

______________6. A tax auditor selects every 1000th income tax return that is received.

______________7. For a survey, a sample of municipalities was selected from every province in the country and included all child laborers in the selected municipalities.

______________8. To determine his DSL Internet connection speed, Shawn divides up the day into four parts: morning, midday, evening, and late night. He then measures his Internet connection speed at 5 randomly selected times during each part of the day.

______________9. A college official divides the student population into five classes: freshman, sophomore, junior, senior, and graduate student. The official takes a simple random sample from each class and asks the members' opinions regarding student services.

______________10. In the game of lotto, 6 balls are selected from a container with 42 balls.

IV. Using proportional allocation, determine the sample size needed for every school. The total population of students is 10,679, and the minimum sample is 2,450.
School                               Population per School    Sample
Antipolo National High School        3,360
Bagong Nayon National High School    2,540
Dela Paz National High School        2,122
Sta. Cruz National High School       1,290
Tubigan National High School         1,367
Total                                10,679

REFERENCES:

Statistics: Informed Decisions Using Data by Michael Sullivan III, Fifth Edition
Sampling: Design and Analysis by Sharon L. Lohr, Second Edition
http://www.economicsdiscussion.net/statistics/sampling/advantages-of-sampling-over-completeenumeration-in-statistics/11980
http://www.natco1.org/research/files/SamplingStrategies.pdf
https://data36.com/statistical-bias-types-explained/
MODULE 3: DESCRIPTIVE STATISTICS
OBJECTIVES:
After successful completion of this module, you should be
able to:
✦ Distinguish the three main forms of data presentation.
✦ Know the different parts of the table.
✦ Choose appropriate diagrams/graphs to present a given set of
data.
✦ Organize qualitative and quantitative data in tables.
✦ Compute measures of central tendency, measures of variation and
measures of relative position of grouped and ungrouped data.
✦ Describe the shape of a distribution.
✦ Identify regions under the normal curve corresponding to
different standard normal values.
✦ Compute probabilities using the standard normal table and Excel.
Polytechnic University of the Philippines
College of Science
Department of Mathematics and Statistics

Data Presentation
Data are usually collected in a raw format and thus
the inherent information is difficult to understand.
Therefore, raw data need to be summarized,
processed, and analyzed to usefully derive
information from them. However, no matter how well
manipulated, the information derived from the raw
data should be presented in an effective format,
otherwise, it would be a great loss for both authors
and readers. Planning how the data will be presented
is essential before appropriately processing raw data.

Presentation of Data
Presentation of data refers to an exhibition
or putting up data in an attractive and useful
manner such that it can be easily interpreted.
The three main forms of presentation of data
are:
Textual Presentation
Tabular Presentation
Graphical Presentation
Textual Presentation
• All the data is presented in the form of text,
phrases, or paragraphs.
• It involves enumerating important
characteristics, emphasizing significant figures
and identifying important features of data.
• Text is the principal method for explaining
findings, outlining trends, and providing
contextual information.

Example:
A researcher is asked to present the performance of a section in
the statistics test. The following are the test scores:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46
The data presented in textual form would be like this:
In the statistics class of 40 students, 3 obtained the perfect
score of 50. Sixteen students got a score of 40 and above,
while only 3 got 19 and below. Generally, the students
performed well in the test with 23 or 57.5% getting a passing
score of 38 and above.
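The figures quoted in a textual presentation can be checked directly against the raw scores. The following is a minimal Python sketch using the 40 test scores listed in the example; the variable names are illustrative only.

```python
scores = [
    34, 42, 20, 50, 17, 9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]

total = len(scores)
perfect = sum(1 for s in scores if s == 50)
forty_up = sum(1 for s in scores if s >= 40)
nineteen_down = sum(1 for s in scores if s <= 19)
passed = sum(1 for s in scores if s >= 38)   # passing score of 38, as in the example
pass_rate = 100 * passed / total

print(total, perfect, forty_up, nineteen_down, passed, pass_rate)
# 40 3 16 3 23 57.5
```

Recomputing the counts this way is a quick safeguard against quoting a percentage that does not match the data.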

Advantage of Textual Presentation


✦ The data can be explained and interpreted more fully.
✦ Can help in emphasizing some important points
in data.
✦ Small sets of data can be easily presented.

Remember!
✦ Keep your paragraphs simple and short.

✦ Always make sure that the readers are provided
with additional explanations about the relevance
of the figures and their implications.
Tabular Presentation:
• It is a systematic and logical arrangement of
data in the form of Rows and Columns with
respect to the characteristics of data.
• A table is best suited for presenting individual
information and can represent both quantitative
and qualitative information.

Advantages of Tabular Presentation
✦ More information may be presented.
✦ Exact values can be read from a table to
retain precision.
✦ Flexibility is maintained without
distortion of data.
✦ Less work and less cost are required in
the preparation.

Preparing Tables
Making a compact table is itself an art. A table should
contain all the information needed within the smallest possible
space. The purpose of the tabulation and how the tabulated
information is to be used are the main points to keep in mind
while preparing a statistical table. An ideal table should
consist of the following main parts:
A. Title: The title must tell as simply as possible what is in the
table. It should answer the questions:
✦ Who? Example: White females with breast cancer, black males with lung cancer.
✦ What are the data? Example: Counts, percentage distributions, rates.
✦ Where are the data from? Example: One hospital, or the entire population covered by your registry.
✦ When? Example: A particular year, time period.

B. Boxhead: The boxhead contains the captions or
column headings. The heading of each column
should contain as few words as possible, yet
explain exactly what the data in the columns
represent.

C. Stubs: The row captions are known as the stub.
Items in the stub should be grouped to facilitate
interpretation of the data. For example, rows may
stand for score classes and columns for data
related to the sex of students. In that case, there will
be many rows for the score classes but only two
columns, for male and female students.

D. Footnotes: Footnotes are given at the foot of the
table for explanation of any fact or information
included in the table which needs some explanation.
Thus, they are meant for explaining or providing
further details about the data that have not been
covered in title, captions and stubs.
E. Sources of Data: We should also mention the source
of information from which data are taken. This may
preferably include the name of the author, volume,
page and the year of publication. This should also
state whether the data contained in the table are
primary or secondary in nature.


Parts of the Table

https://byjus.com/commerce/tabular-presentation-of-data/

Construction of Data Tables
✦ The title should be in accordance with the
objective of study
✦ Comparison
✦ Alternative location of stubs
✦ Headings
✦ Footnote
✦ Size of columns
✦ Use of abbreviations
✦ Units

Example: Simple or One-Way Table


Optionally, the table may also include totals or
percentages.


Example: Compound Table


A compound table is just an extension of a simple table in which
more than one variable is distributed among its
attributes (subvariables). An attribute is just a quality, property
or component of a variable according to which it can be
differentiated with respect to other variables.
We may refer to a compound table as a cross tabulation or
even to a contingency table depending on the context in which
it is used.

Organize Quantitative Variable in Table
Classes are categories into which data are grouped. When a
data set consists of a large number of different discrete data
values or when a data set consists of continuous data, we create
classes by using intervals of numbers.
Make sure that the classes do not overlap. This is necessary to
avoid confusion as to which class a data value belongs. Also,
make sure that the class widths are equal for all classes.
The class width is the difference between consecutive lower class limits.

Age        Number (in thousands)
25 - 34    14,482
35 - 44    14,156
45 - 54    13,801
55 - 64    12,123
65 - 74    7,010

In each class, the first number is the Lower Class Limit (LC) and the second number is the Upper Class Limit (UC).

One exception to the requirement of equal class widths occurs in open-ended tables. A table is open ended if the first class has no lower class limit or the last class has no upper class limit.

Scores        Frequency
10 - 19       25
20 - 29       36
30 - 39       40
40 and over   12

Guidelines for Determining the Lower Class Limit of the First
Class and Class Width
Choosing the Lower Class Limit of the First Class:
Choose the smallest observation in the data set or a
convenient number slightly lower than the smallest
observation in the data set.
For example, the smallest observation is 10.2. A convenient
lower class limit of the first class is 10.
Determining the Class Width:
• Decide on the number of classes. Generally, there should be
between 5 and 20 classes. The smaller the data set, the fewer
classes you should have.
• Determine the class width by computing:
$cw = \frac{x_{max} - x_{min}}{nc}$
where cw is the class width and nc is the number of classes.
Round this value up to a convenient number.
Remember!
Creating the classes for summarizing continuous data is an art
form. There is no such thing as the correct frequency distribution.
However, there can be less desirable frequency distributions. The
larger the class width, the fewer classes a frequency distribution
will have.
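
The class-width guideline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "a convenient number" is taken to be the next whole number, the data are assumed to be recorded in whole units, and the function names `class_width` and `class_limits` are hypothetical.

```python
import math

def class_width(x_min, x_max, n_classes):
    """Return the raw width (x_max - x_min) / nc and the width rounded up."""
    raw = (x_max - x_min) / n_classes
    # Assumption: "a convenient number" means the next whole number.
    return raw, math.ceil(raw)

def class_limits(lower_first, width, n_classes):
    """List (lower limit, upper limit) pairs for whole-number data."""
    limits = []
    for k in range(n_classes):
        lower = lower_first + k * width
        upper = lower + width - 1  # assumes data recorded in whole units
        limits.append((lower, upper))
    return limits

# Illustrative values: smallest observation 9, largest 50, five classes.
raw, width = class_width(9, 50, 5)
print(raw, width)                 # prints: 8.2 9
print(class_limits(9, width, 5))
# prints: [(9, 17), (18, 26), (27, 35), (36, 44), (45, 53)]
```

If the raw width is already a whole number, check that the last class still contains the largest observation; a slightly larger width or one more class may be needed.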
How to Construct a Frequency Distribution Table?
A frequency distribution lists each
category of data and the number of
occurrences for each category of data.


Example: Use the “Sample Data file”.

Solution:
To answer this question we need to construct a frequency
distribution to determine how many female and male
respondents participated in the study.

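Procedure in Constructing Frequency Table
✦ If the data is in the form of qualitative data
To construct the frequency distribution using
Excel, use the command:
=FREQUENCY(data_array,bins_array)
Then press Ctrl + Shift + Enter so that Excel enters it as an array formula:
{=FREQUENCY(data_array,bins_array)}
FREQUENCY counts numeric values only, so the categories must first be coded as numbers.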
Final Output

Table 1 shows the frequency and percentage distribution of
the respondents in terms of sex. It can be gleaned from the
table that, out of 128 respondents considered in the study,
65 or 50.8% are male and 63 or 49.2% are female.
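
A frequency and percentage distribution like the one described above can also be sketched in Python. The list `responses` below is not the actual Sample Data file; it is rebuilt from the totals quoted in the text (65 male, 63 female), and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (category, count, round(100 * count / total, 1))
        for category, count in counts.most_common()
    ]

# Assumption: stand-in data rebuilt from the totals reported in the text.
responses = ["Male"] * 65 + ["Female"] * 63
table = frequency_table(responses)

for category, count, percent in table:
    print(category, count, percent)
# prints: Male 65 50.8
# prints: Female 63 49.2
```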

Example: Use the “Sample Data file”.


Procedure in Constructing
Frequency Table
✦If the data is in the form of quantitative data
Steps
1. Set an interval or range for your data. It is
needed for the “BIN RANGE”.
2. Click “DATA” on the menu bar and Click
“DATA ANALYSIS” on the tool bar
3. The dialog box “DATA ANALYSIS” will appear
and choose “HISTOGRAM” on the dialog box
then click OK.
4. Highlight your data for the “INPUT RANGE”.
5. Highlight the intervals you set in Step 1 for the “BIN RANGE”.
6. Click the box of “LABELS IN FIRST ROW”
then click “OK”.
7. The result will appear on a new worksheet of
the Excel file. Compute the percentage and total.
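
The counting behind the steps above can be sketched in Python. The sketch assumes that each bin value is an upper class limit and that a value is counted in the first bin whose limit it does not exceed, with an extra "More" slot for anything larger. The data values and the function name `bin_counts` are hypothetical.

```python
from bisect import bisect_left

def bin_counts(values, upper_limits):
    """Count values per bin; upper_limits must be sorted in ascending order."""
    counts = [0] * (len(upper_limits) + 1)  # last slot plays the role of "More"
    for v in values:
        # Index of the first upper limit that is >= v.
        counts[bisect_left(upper_limits, v)] += 1
    return counts

# Hypothetical scores and the classes 10-19, 20-29, 30-39, 40-49.
values = [12, 25, 31, 18, 40, 22, 35, 29, 10, 47]
upper_limits = [19, 29, 39, 49]

counts = bin_counts(values, upper_limits)
percentages = [100 * c / len(values) for c in counts]
print(counts)       # prints: [3, 3, 2, 2, 0]
print(percentages)  # prints: [30.0, 30.0, 20.0, 20.0, 0.0]
```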
Final Output


Example: Identify problems with the following table.

Answer:
✦ Useless Information – Don’t show decimals if they are not
needed.
✦ Poor Alignment – Make sure alignment makes sense.
• Don’t center numbers; always right justify and try to align
decimal points.
• Consider the appropriate placement of row titles.
✦ Difficult to Read – Use commas when the number exceeds
a thousand.

Graphical Presentation
✦ A graph is a very effective visual tool as it displays data at
a glance, facilitates comparison, and can reveal trends and
relationships within the data such as changes over time,
and correlation or relative share of a whole.
✦ It is considered an important medium of communication
because we are able to create a pictorial representation of
the numerical figures.
✦ It is suited when we need to show the results of the study to
nonprofessionals and/or people who dislike numbers and
lengthy texts.
Bar Graph
✦ It is constructed by labeling each category
of data on either the horizontal or vertical
axis and the frequency or relative frequency
of the category on the other axis. Rectangles
of equal width are drawn for each category.
The height of each rectangle represents the
category’s frequency or relative frequency.
✦ It is used to organize discrete data.

Example: Simple Bar Graph


The simple bar chart is used for the case of one
variable only.


Example: Multiple Bar Graph / Grouped Column Chart
The multiple bar chart is an extension of a simple bar chart
when there are quantities of several variables to be
displayed. The bars representing the quantities for the
different variables are placed next to one another for each
attribute. The figure becomes very cumbersome when there
are too many variables and components.

Example: Component Bar Graph / Subdivided Column Chart
In this type of bar chart, the components (quantities) of each
variable are piled on top of one another. It saves space as
compared to a multiple bar chart. One disadvantage
of this graph is that it is not always easy to compare the sizes of
the components, or parts. It is used to represent data in
which the total magnitude is divided into different parts or
components.


Remember!
• Bar graphs may also be drawn with horizontal
bars. Horizontal bars are preferable when
category names are lengthy.
• In bar graphs, the order of the categories does
not usually matter. However, bar graphs that
have categories arranged in decreasing order
of frequency help prioritize categories for
decision-making purposes in areas such as
quality control, human resources, and
marketing.

Histogram
✦ It is constructed by drawing rectangles for each class of
data. The height of each rectangle is the frequency or
relative frequency of the class. The width of each rectangle
is the same and the rectangles touch each other.
✦ It is a graph used to present quantitative data and is similar to
the bar graph.
✦ It is used to organize continuous data.

https://newonlinecourses.science.psu.edu/stat500/lesson/1/1.6/1.6.2
Pie Chart
✦ It is a circle divided into sectors. Each sector represents a
category of data. The area of each sector is proportional to
the frequency of the category.
✦ Pie charts are typically used to present the relative
frequency of qualitative data. In most cases the data are
nominal, but ordinal data can also be displayed in a pie
chart.

When should a bar graph or a pie chart be used?
✦ Pie charts are useful for showing the
division of all possible values of a
qualitative variable into its parts.
✦ Bar graphs are useful when we want to
compare the different parts, not necessarily
the parts to the whole.


Line Graph
✦ A graph that shows information that is
connected in some way (such as change over
time).
✦ The data points are plotted, and line segments are then drawn connecting the
points. It is used to organize continuous data.
✦ Very useful in identifying trends in the data
over time.

Example: Simple Line Graph
The simplest of line graphs is the single line graph, so
called because it displays information concerning one
variable only, in terms of its frequencies.


Example: Multiple Line Graph


Multiple line graphs illustrate information on
several variables so that comparison is possible
between them.

Guidelines for Constructing Good Graphics
✦ Title and label the graphic axes clearly,
providing explanations if needed. Include units
of measurement and a data source when
appropriate.
✦ Avoid distortion.
✦ Minimize the amount of white space in the
graph. Use the available space to let the data
stand out. If you truncate the scales, clearly
indicate this to the reader.
✦ Avoid clutter, such as excessive gridlines and
unnecessary backgrounds or pictures.
✦ Don’t distract the reader.
✦ Avoid three dimensions.
✦ Do not use more than one design in the same
graphic. Let the data speak for themselves.


Grouped and Ungrouped Data
Data is often described as ungrouped or grouped.

Grouped data is the type of data which is classified into groups after collection.

Scores    Frequency
1 - 10    5
11 - 20   9
21 - 30   10
31 - 40   12
41 - 50   24
Total     60

Ungrouped data, which is also known as raw data, is data that has not been placed in any group or category after collection.

Ungrouped data without a frequency distribution
1, 5, 4, 7, 2, 4, 1, 3, 8, 2, 2, 9

Ungrouped data with a frequency distribution
No. of Television Sets   Frequency
0                        7
1                        15
2                        12
3                        4
4                        5
5                        2
Total                    45

Measures of Central Tendency: MEAN
• It is the sum of the data values divided by the number of
data values.
• It is also called the average.
• It is appropriate only for data under interval and ratio scale
measurement.
Advantages of Mean
✦ Simple to understand and easy to calculate.
✦ It is rigidly defined.
✦ It is least affected by fluctuations of sampling.
✦ It takes into account all the values in the series.

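Formula for Mean:
Sample Mean
✦ For Ungrouped Data
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
where:
x_i = data values
n = no. of sample observations
✦ For Grouped Data
$\bar{x} = \frac{\sum_{i=1}^{r} f x_i}{n}$
where:
x_i = data values (class midpoints when the data are in class intervals)
f = frequency
r = no. of classes
n = no. of sample observations
Population Mean
✦ For Ungrouped Data
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
where:
x_i = data values
N = no. of observations
✦ For Grouped Data
$\mu = \frac{\sum_{i=1}^{r} f x_i}{N}$
where:
x_i = data values (class midpoints when the data are in class intervals)
f = frequency
r = no. of classes
N = no. of observations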

Measures of Central Tendency: MEDIAN
• It is the “middle observation” when the data set is sorted (in
either increasing or decreasing order).
• The median divides the distribution into two equal parts.
Advantages of Median
✦ The median is not affected by the size of extreme values but
by the number of observations.
✦ The median can be calculated even when the frequency
distribution contains “open-ended” intervals.
✦ It can also be used to define the middle of a number of
objects, properties, or quantities which are not really
quantitative in nature.
✦ It can be easily interpreted.

Formula for Median:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest (or highest to lowest).
2. For an odd number of data, the median of a data set is the “middle observation”. When the number of data is even, the median is the “average of the two middle scores”.
✦ For Grouped Data
$\tilde{x} = LB + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i$
where:
LB = lower boundary of the median class
i = class width
n = no. of observations
<cf = less-than cumulative frequency of the class preceding the median class
f = frequency of the median class
Measures of Central Tendency:
MODE
• It is the most frequently occurring value in a list of data.
• It is sometimes called nominal average.
• It is an appropriate measure of average for data using the
nominal scale of measurement.
• It is the only measure of central tendency used in both
quantitative and qualitative data.
Advantage of Mode
✦ The mode is easy to understand.
✦ Like the median, it is not greatly affected by extreme
values.
✦ Like the median, it can be computed even when the
frequency distribution contains “open-ended” intervals.

Formula for Mode:
✦ For Ungrouped Data
1. Obtain a frequency distribution of the distinct values of the data.
2. The mode is the most frequently occurring data value (if there is one).
✦ For Grouped Data
$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i$
where:
LB = lower boundary of the modal class
i = class width
d1 = difference between the frequency of the modal class and the class preceding it
d2 = difference between the frequency of the modal class and the class following it

Remember!
• Whenever you hear the word average, be aware that
the word may not always be referring to the mean.
One average could be used to support one position,
while another average could be used to support a
different position.
• Unlike the mean and the median, a mode is not always
present in a data set.
• If you are interested in the “center of gravity” of your
data, then use the mean; if you are interested in the
“middle value” within your data, then use the median.
Choosing a Measure of Central Tendency:
We have discussed three measures of central tendency (the
mode, the mean, and the median) and examined how they
differ in terms of finding the center of a data distribution.
The next legitimate question to ask may be “When do we
use which measure?”
Consider the following data sets:

Data Set I 108 112 116 120 124


Data Set II 108 112 116 120 205

Determine the mean, median and mode.


In both data sets, the median is 116, as it is the number that
divides the data set into two exact halves. However, you will
notice that the mean is not identical in both data sets. For the
first data set, the mean is equal to 580/5 = 116, whereas the mean of the
second data set is equal to 661/5 = 132.2.

Notice how the mean of the second data set has been
influenced by the presence of an unusual case/outlier in the
data set. If we were to say the mean is equal to 132.2 for the
second data set and it represents a typical case, this will not
make much sense because four of the five data values are
120 or less. Therefore, the mean should not be used when
unusual, or outlying, data values are present in the data set, as
the mean tends to be extremely sensitive to the unusual
values. Rather, the median should be reported in this case.
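
The effect of the outlier can be checked directly. The following minimal Python sketch uses the standard `statistics` module on the two data sets given above; the variable names are only illustrative.

```python
from statistics import mean, median

data_set_1 = [108, 112, 116, 120, 124]
data_set_2 = [108, 112, 116, 120, 205]  # 205 is the unusual value

for name, data in (("Data Set I", data_set_1), ("Data Set II", data_set_2)):
    print(name, "mean =", mean(data), "median =", median(data))

# Data Set I:  mean 116,   median 116
# Data Set II: mean 132.2, median 116
```

Only the last value differs between the two sets, yet the mean moves by more than 16 units while the median does not move at all.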

• The mode is simply the most frequently occurring data
value in the data set. Therefore, it is mainly useful for the
nominal level of measurement. Both the median and the mean are
useful when the variable being measured can be quantified.
Also, neither data set has a mode because no value occurs more than once, which is why the mode is not
an appropriate measure to use for these data sets.

• It is better to use the median than the mean when
the sample is small or asymmetrical (i.e., skewed) or when
unusual cases/outliers are present in the data set. This is
why housing prices are often reported using the
median, since even a single very expensive house can distort the
average housing price when most of the houses are in the
Php500,000–Php650,000 range.

Example:
The data given below are the ages of the residents in
Barangay 634, Sta. Mesa, Manila. Compute the mean,
median and mode.
Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5


Solution:
To compute the mean of grouped data, first you need to
fill out this table.
Class Interval   Frequency (f)   x   fx
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =                 $\sum_{i=1}^{7} f x_i =$

Here x is the midpoint of every class interval. To compute it, use the lower class limit (LC) and the upper class limit (UC):
$x = \frac{LC + UC}{2}$
Ex:
$x = \frac{55 + 59}{2} = 57$
$x = \frac{50 + 54}{2} = 52$

Solution:
Class Interval   Frequency (f)   x    fx
55 - 59          3               57   171
50 - 54          6               52   312
45 - 49          7               47   329
40 - 44          9               42   378
35 - 39          6               37   222
30 - 34          4               32   128
25 - 29          5               27   135
Total            n = 40               $\sum_{i=1}^{7} f x_i = 1,675$

$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1,675}{40} = 41.88$

The average age is 41.88.
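
The grouped-mean computation can be reproduced with a short Python sketch. The class limits and frequencies are copied from the solution table; the variable names are only illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the solution table.
classes = [
    (55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
    (35, 39, 6), (30, 34, 4), (25, 29, 5),
]

n = sum(f for _, _, f in classes)
# Midpoint x = (LC + UC) / 2, multiplied by the frequency to get fx.
total_fx = sum(f * (lc + uc) / 2 for lc, uc, f in classes)
grouped_mean = total_fx / n

print(n, total_fx, round(grouped_mean, 2))  # prints: 40 1675.0 41.88
```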

Solution:
To compute the median and mode of grouped data, first
you need to fill out this table.
Class Interval   f   LB   <cf
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =

To compute the lower boundary (LB), always subtract 0.5 from the lower class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5
45 − 0.5 = 44.5


Solution:
Class Interval   f   LB     <cf
55 - 59          3   54.5
50 - 54          6   49.5
45 - 49          7   44.5
40 - 44          9   39.5
35 - 39          6   34.5
30 - 34          4   29.5
25 - 29          5   24.5   5
Total            n = 40

If the class intervals are arranged in descending order, always start the <cf column at the bottom. Copy the frequency of the lowest class interval, then keep adding the frequency of the next class going up:
5, 5 + 4 = 9, 9 + 6 = 15, 15 + 9 = 24, 24 + 7 = 31, 31 + 6 = 37, 37 + 3 = 40


Solution:
First, compute n/2; it will help us determine the median class and the <cf.
$\frac{n}{2} = \frac{40}{2} = 20$

Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40

The median class is the class containing the 20th item. Hence, the median class is 40 - 44.

$\tilde{x} = LB + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i$
$\tilde{x} = 39.5 + \frac{(20 - 15)5}{9} = 42.28$
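
The search for the median class and the interpolation step can be sketched in Python. The sketch assumes equal class widths, at least two classes, and class limits in whole numbers (so that the lower boundary is the lower limit minus 0.5); the function name `grouped_median` is hypothetical.

```python
def grouped_median(classes):
    """classes: (lower limit, upper limit, frequency) triples with equal widths."""
    ordered = sorted(classes)              # ascending by lower class limit
    n = sum(f for _, _, f in ordered)
    width = ordered[1][0] - ordered[0][0]  # difference of consecutive lower limits
    cum = 0                                # <cf of the classes already passed
    for lower, _, f in ordered:
        if f > 0 and cum + f >= n / 2:     # this class contains the (n/2)th item
            return (lower - 0.5) + (n / 2 - cum) * width / f
        cum += f
    raise ValueError("the table has no observations")

classes = [
    (55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
    (35, 39, 6), (30, 34, 4), (25, 29, 5),
]
print(round(grouped_median(classes), 2))  # prints: 42.28
```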

Solution:
Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5

The modal class is the class interval with the highest frequency. The modal class is 40 - 44.
If two class intervals share the highest frequency, always choose the highest class interval.

d1 = 9 − 6 = 3
d2 = 9 − 7 = 2
$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i$
$\hat{x} = 39.5 + \left(\frac{3}{3 + 2}\right) 5 = 42.5$
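
The grouped-mode formula can be sketched in Python in the same way. The sketch assumes equal class widths and whole-number class limits. Treating a missing neighbouring class as having frequency 0 is an added assumption, and the function name `grouped_mode` is hypothetical.

```python
def grouped_mode(classes):
    """classes: (lower limit, upper limit, frequency) triples with equal widths."""
    ordered = sorted(classes)              # ascending by lower class limit
    freqs = [f for _, _, f in ordered]
    width = ordered[1][0] - ordered[0][0]
    # Highest frequency wins; a tie goes to the highest class interval.
    m = max(range(len(freqs)), key=lambda i: (freqs[i], i))
    # Assumption: a missing neighbour (modal class at either end) counts as 0.
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - before                 # modal class vs. class preceding it
    d2 = freqs[m] - after                  # modal class vs. class following it
    return (ordered[m][0] - 0.5) + d1 * width / (d1 + d2)

classes = [
    (55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
    (35, 39, 6), (30, 34, 4), (25, 29, 5),
]
print(grouped_mode(classes))  # prints: 42.5
```

The sketch does not handle the case d1 + d2 = 0, which happens when the modal class and both of its neighbours have the same frequency.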

Measures of Relative Position


Quantiles are statistics that describe
various subdivisions of a frequency
distribution into equal proportions.
Three special Quantiles:
1. Quartiles
2. Deciles
3. Percentiles

Quartiles - split the ordered data into four equal parts.
Deciles - split the ordered data into ten equal parts.
Percentiles - split the ordered data into 100 equal parts.
Formula for Quartile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$Q_{class} = \frac{nk}{4} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the quartile. If not, use interpolation.
✦ For Grouped Data
$Q_k = LB + \left(\frac{\frac{nk}{4} - {<}cf}{f}\right) i$
where:
LB = lower boundary of the quartile class
i = class width
n = no. of observations
k = quartile position
<cf = less than the cumulative frequency of the class preceding the quartile class
f = frequency of the quartile class

Formula for Decile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$D_{class} = \frac{nk}{10} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the decile. If not, use interpolation.
✦ For Grouped Data
$D_k = LB + \left(\frac{\frac{nk}{10} - {<}cf}{f}\right) i$
where:
LB = lower boundary of the decile class
i = class width
n = no. of observations
k = decile position
<cf = less than the cumulative frequency of the class preceding the decile class
f = frequency of the decile class

Formula for Percentile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$P_{class} = \frac{nk}{100} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the percentile. If not, use interpolation.
✦ For Grouped Data
$P_k = LB + \left(\frac{\frac{nk}{100} - {<}cf}{f}\right) i$
where:
LB = lower boundary of the percentile class
i = class width
n = no. of observations
k = percentile position
<cf = less than the cumulative frequency of the class preceding the percentile class
f = frequency of the percentile class
Example 1:
The data given below are the total number of hours
lost due to tardiness and absences of employees in a
company in a given year. Find Q3, D4 and P55.
Month       Hours Lost (x)
January     55
February    23
March       37
April       37
May         48
June        42
July        27
August      20
September   30
October     32
November    24
December    40

Solution: To compute Q3 of ungrouped data:
1. Arrange the data from lowest to highest.
Value:     20  23  24  27  30  32  37  37  40  42  48  55
Position:   1   2   3   4   5   6   7   8   9  10  11  12

$Q_{class} = \frac{(12)(3)}{4} + 0.5 = 9.5$
2. Use interpolation since the computed Qclass is not an integer. The 9th observation is 40 and the 10th observation is 42.

$Q_3 = 40 + 0.5(42 - 40) = 41$


Solution: To compute D4 of ungrouped data:
1. Arrange the data from lowest to highest.
Value:     20  23  24  27  30  32  37  37  40  42  48  55
Position:   1   2   3   4   5   6   7   8   9  10  11  12

$D_{class} = \frac{(12)(4)}{10} + 0.5 = 5.3$
2. Use interpolation since the computed Dclass is not an integer. The 5th observation is 30 and the 6th observation is 32.

$D_4 = 30 + 0.3(32 - 30) = 30.6$
Solution: To compute P55 of ungrouped data:
1. Arrange the data from lowest to highest.
Value:     20  23  24  27  30  32  37  37  40  42  48  55
Position:   1   2   3   4   5   6   7   8   9  10  11  12

$P_{class} = \frac{(12)(55)}{100} + 0.5 = 7.1$
2. Use interpolation since the computed Pclass is not an integer. The 7th and 8th observations are both 37.

$P_{55} = 37 + 0.1(37 - 37) = 37$
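
The positioning rule used in these three solutions (position = nk/parts + 0.5, followed by linear interpolation) can be sketched in Python. The function name `quantile` is hypothetical, and clamping the position to the first or last observation is an added assumption for very small or very large k. Statistical software uses several different quantile conventions, so built-in functions may return slightly different values from this rule.

```python
def quantile(data, k, parts):
    """k-th quantile when the ordered data are split into `parts` equal parts."""
    xs = sorted(data)
    n = len(xs)
    pos = n * k / parts + 0.5   # 1-based position from the rule above
    pos = min(max(pos, 1), n)   # assumption: clamp to the first/last observation
    whole = int(pos)            # position of the lower neighbouring observation
    frac = pos - whole
    if frac == 0:
        return xs[whole - 1]
    # Interpolate between the two neighbouring observations.
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

hours_lost = [55, 23, 37, 37, 48, 42, 27, 20, 30, 32, 24, 40]
print(quantile(hours_lost, 3, 4))             # Q3, prints: 41.0
print(round(quantile(hours_lost, 4, 10), 2))  # D4, prints: 30.6
print(quantile(hours_lost, 55, 100))          # P55, prints: 37.0
```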

Example 2:
The data given below are the ages of the residents in
Barangay 634, Sta. Mesa, Manila. Compute Q1, D7, and
P10.
Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5


Solution:
To compute Q1, D7, and P10 of grouped data, first you
need to fill out this table.
Class Interval   f   LB   <cf
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =

To compute the lower boundary (LB), always subtract 0.5 from the lower class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5
45 − 0.5 = 44.5
Solution:
Class Interval   f   LB     <cf
55 - 59          3   54.5
50 - 54          6   49.5
45 - 49          7   44.5
40 - 44          9   39.5
35 - 39          6   34.5
30 - 34          4   29.5
25 - 29          5   24.5   5
Total            n = 40

If the class intervals are arranged in descending order, always start the <cf column at the bottom. Copy the frequency of the lowest class interval, then keep adding the frequency of the next class going up:
5, 5 + 4 = 9, 9 + 6 = 15, 15 + 9 = 24, 24 + 7 = 31, 31 + 6 = 37, 37 + 3 = 40

Solution:
Class nk
First, compute , it will help us to
4
f LB < cf
Interval
55 - 59 3 54.5 40 determine the quartile class and the
50 - 54 6 49.5 37
< cf. nk (40)(1)
= = 10
45 - 49 7 44.5 31
40 - 44 9 39.5 24 4 4
35 - 39 6 34.5 15
30 - 34 4 29.5 9
The quartile class is the class
25 - 29 5 24.5 5 containing the 10th item. Hence, the
Total n = 40 quartile class is 35 - 39.

(4 )
nk
− < cf i
(10 − 9)5
Qk = LB + Q1 = 34.5 + = 35.33
f 6

Solution:
First, compute nk/10; it will help us determine the decile
class and the <cf.

\[ \frac{nk}{10} = \frac{(40)(7)}{10} = 28 \]

Class Interval    f    LB      <cf
55 - 59           3    54.5    40
50 - 54           6    49.5    37
45 - 49           7    44.5    31
40 - 44           9    39.5    24
35 - 39           6    34.5    15
30 - 34           4    29.5    9
25 - 29           5    24.5    5
Total             n = 40

The decile class is the class containing the 28th item. Hence, the
decile class is 45 - 49.

\[ D_k = LB + \frac{\left(\frac{nk}{10} - {<}cf\right) i}{f} \]

\[ D_7 = 44.5 + \frac{(28 - 24)5}{7} = 47.36 \]

Solution:
First, compute nk/100; it will help us determine the percentile
class and the <cf.

\[ \frac{nk}{100} = \frac{(40)(10)}{100} = 4 \]

Class Interval    f    LB      <cf
55 - 59           3    54.5    40
50 - 54           6    49.5    37
45 - 49           7    44.5    31
40 - 44           9    39.5    24
35 - 39           6    34.5    15
30 - 34           4    29.5    9
25 - 29           5    24.5    5
Total             n = 40

The percentile class is the class containing the 4th item. Hence, the
percentile class is 25 - 29. Since there is no class below it, <cf = 0.

\[ P_k = LB + \frac{\left(\frac{nk}{100} - {<}cf\right) i}{f} \]

\[ P_{10} = 24.5 + \frac{(4 - 0)5}{5} = 28.5 \]


Example 3:
The ages of the townspeople in a certain community
are as follows:
Class Interval Frequency

18 - 24 28
25 - 31 54
32 - 38 38
39 - 45 20
46 - 52 17
53 - 59 3

Find Q2, D5, and P50.

Solution:
To compute Q2, D5, and P50 of grouped data, first you
need to fill out this table.

Class Interval    f     LB    <cf
18 - 24           28
25 - 31           54
32 - 38           38
39 - 45           20
46 - 52           17
53 - 59           3
Total             n =

To compute the lower boundary (LB), always subtract 0.5 from the
lower class limit (LC).
Ex: 18 − 0.5 = 17.5
    25 − 0.5 = 24.5
    32 − 0.5 = 31.5

Solution:

Class Interval    f     LB      <cf
18 - 24           28    17.5    28
25 - 31           54    24.5
32 - 38           38    31.5
39 - 45           20    38.5
46 - 52           17    45.5
53 - 59           3     52.5
Total             n = 160

If the class intervals are arranged in ascending order, always
start the cumulative frequency (<cf) at the top. Copy the frequency
of the lowest class interval, then keep adding the next frequency
going downward:

28, 28 + 54 = 82, 82 + 38 = 120, 120 + 20 = 140, 140 + 17 = 157, 157 + 3 = 160

Solution:
First, compute nk/4; it will help us determine the quartile
class and the <cf.

\[ \frac{nk}{4} = \frac{(160)(2)}{4} = 80 \]

Class Interval    f     LB      <cf
18 - 24           28    17.5    28
25 - 31           54    24.5    82
32 - 38           38    31.5    120
39 - 45           20    38.5    140
46 - 52           17    45.5    157
53 - 59           3     52.5    160
Total             n = 160

The quartile class is the class containing the 80th item. Hence, the
quartile class is 25 - 31.

\[ Q_k = LB + \frac{\left(\frac{nk}{4} - {<}cf\right) i}{f} \]

\[ Q_2 = 24.5 + \frac{(80 - 28)7}{54} = 31.24 \]

Solution:
First, compute nk/10; it will help us determine the decile
class and the <cf.

\[ \frac{nk}{10} = \frac{(160)(5)}{10} = 80 \]

Class Interval    f     LB      <cf
18 - 24           28    17.5    28
25 - 31           54    24.5    82
32 - 38           38    31.5    120
39 - 45           20    38.5    140
46 - 52           17    45.5    157
53 - 59           3     52.5    160
Total             n = 160

The decile class is the class containing the 80th item. Hence, the
decile class is 25 - 31.

\[ D_k = LB + \frac{\left(\frac{nk}{10} - {<}cf\right) i}{f} \]

\[ D_5 = 24.5 + \frac{(80 - 28)7}{54} = 31.24 \]

Solution:
First, compute nk/100; it will help us determine the percentile
class and the <cf.

\[ \frac{nk}{100} = \frac{(160)(50)}{100} = 80 \]

Class Interval    f     LB      <cf
18 - 24           28    17.5    28
25 - 31           54    24.5    82
32 - 38           38    31.5    120
39 - 45           20    38.5    140
46 - 52           17    45.5    157
53 - 59           3     52.5    160
Total             n = 160

The percentile class is the class containing the 80th item. Hence, the
percentile class is 25 - 31.

\[ P_k = LB + \frac{\left(\frac{nk}{100} - {<}cf\right) i}{f} \]

\[ P_{50} = 24.5 + \frac{(80 - 28)7}{54} = 31.24 \]
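
The grouped-data formula used for Qk, Dk, and Pk can be sketched as a single Python function. This is a minimal sketch, not the course's own procedure: the name `grouped_quantile` and the tuple layout are assumptions, and it assumes integer class limits so that the class size is upper − lower + 1 and the lower boundary is the lower limit minus 0.5.

```python
def grouped_quantile(classes, k, parts):
    """classes: list of (lower_limit, upper_limit, frequency), lowest class first.
    parts: 4 for quartiles, 10 for deciles, 100 for percentiles."""
    n = sum(f for _, _, f in classes)
    target = n * k / parts               # nk/4, nk/10 or nk/100
    cum_below = 0                        # <cf of the class below the current one
    for lower, upper, f in classes:
        if f > 0 and cum_below + f >= target:
            lb = lower - 0.5             # lower class boundary
            i = upper - lower + 1        # class size
            return lb + (target - cum_below) * i / f
        cum_below += f
    raise ValueError("k must not exceed parts")

# Age data from the two grouped examples, listed from the lowest class upward
barangay = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
            (45, 49, 7), (50, 54, 6), (55, 59, 3)]
community = [(18, 24, 28), (25, 31, 54), (32, 38, 38),
             (39, 45, 20), (46, 52, 17), (53, 59, 3)]

q1 = grouped_quantile(barangay, 1, 4)
d7 = grouped_quantile(barangay, 7, 10)
q2 = grouped_quantile(community, 2, 4)
print(round(q1, 2), round(d7, 2), round(q2, 2))
```

The three values agree with the worked solutions for Q1 (35.33), D7 (47.36), and Q2 (31.24). Because Q2, D5, and P50 all use the same target nk/parts = 80, the function returns the same value for each.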


Sample Interpretation:
1. Jennifer just received the results of her SAT exam. Her
SAT Mathematics score of 600 is in the 74th percentile. What
does this mean?
A percentile rank of 74% means that 74% of SAT
Mathematics scores are less than or equal to 600 and 26%
of the scores are greater. So 26% of the students who took
the exam scored better than Jennifer.

2. Time taken to finish a test is 35 minutes. This time was the
first quartile. What does this mean?
25% of the learners finished the exam in 35 minutes or
less, and 75% of the learners finished the exam in more
than 35 minutes.

Measures of Dispersion/Variability
Based on the figures below, determine which of the two
scatter diagrams illustrates larger variability.
[Figure 1] [Figure 2]

Since the data points in Figure 2 are more scattered than the
data points in Figure 1, the data set depicted in Figure 2
is more varied.
Measures of Dispersion/Variability:
RANGE
It is the difference between the largest and the smallest
observations or items in a set of data.

\[ R = X_{max} - X_{min} \]

Range is simple to calculate. However, we should be
cautious about using range as a measure of variability.
Range is a very crude measure of variability, as it uses only
the highest and lowest values in its computation.
Therefore, it does not accurately capture how the data values
in the set differ, especially if the data set contains
unusual cases (outliers).

Measures of Dispersion/Variability:
STANDARD DEVIATION
• It is a measure of how far away items in a data set are from
the mean.
• The larger the standard deviation, the more variation there
is in the data set.
• The standard deviation can never be a negative number,
due to the way it’s calculated and the fact that it measures a
distance (distances are never negative numbers).
• The smallest possible value for the standard deviation is 0,
and that happens only in contrived situations where every
single number in the data set is exactly the same (no
deviation).

Formula for Standard Deviation:

Sample Standard Deviation
✦ For Ungrouped Data
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
where:
x_i = data values
x̄ = mean
n = no. of sample observations

✦ For Grouped Data
\[ s = \sqrt{\frac{\sum_{i=1}^{r} f_i (x_i - \bar{x})^2}{n - 1}} \]
where:
x_i = data values (class marks)
x̄ = mean
f_i = frequency
r = no. of classes
n = no. of sample observations

Population Standard Deviation
✦ For Ungrouped Data
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
where:
x_i = data values
μ = mean
N = no. of observations

✦ For Grouped Data
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{r} f_i (x_i - \mu)^2}{N}} \]
where:
x_i = data values (class marks)
μ = mean
f_i = frequency
r = no. of classes
N = no. of observations
Measures of Dispersion/Variability:
VARIANCE
It takes all data points in a set into account and is calculated
by averaging the squared deviations of the values from the mean.

Variance is not easy to interpret because it is expressed in
squared units. The standard deviation, however, is in the same
units as the mean, so it lets us understand the spread of the
data more easily.

Formula for Variance:

Sample Variance
✦ For Ungrouped Data
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
where:
x_i = data values
x̄ = mean
n = no. of sample observations

✦ For Grouped Data
\[ s^2 = \frac{\sum_{i=1}^{r} f_i (x_i - \bar{x})^2}{n - 1} \]
where:
x_i = data values (class marks)
x̄ = mean
f_i = frequency
r = no. of classes
n = no. of sample observations

Population Variance
✦ For Ungrouped Data
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
where:
x_i = data values
μ = mean
N = no. of observations

✦ For Grouped Data
\[ \sigma^2 = \frac{\sum_{i=1}^{r} f_i (x_i - \mu)^2}{N} \]
where:
x_i = data values (class marks)
μ = mean
f_i = frequency
r = no. of classes
N = no. of observations

Example 1:
The data given below are the ages of the residents in
Barangay 634, Sta. Mesa, Manila. Compute the sample
standard deviation and sample variance.

Class Interval    Frequency
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5

Solution:
To compute the SD and Var of grouped data, first you
need to fill out this table.

Class Interval    f    x    fx    (xi − x̄)²    f(xi − x̄)²
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5
Total             n =       Σ fxi =             Σ f(xi − x̄)² =

Solution:

Class Interval    f    x     fx     (xi − x̄)²    f(xi − x̄)²
55 - 59           3    57    171    228.61
50 - 54           6    52    312    102.41
45 - 49           7    47    329    26.21
40 - 44           9    42    378    0.01
35 - 39           6    37    222    23.81
30 - 34           4    32    128    97.61
25 - 29           5    27    135    221.41
Total             n = 40     Σ fxi = 1,675        Σ f(xi − x̄)² =

\[ \bar{x} = \frac{1{,}675}{40} = 41.88 \]

(x1 − x̄)² = (57 − 41.88)² = 228.61
(x2 − x̄)² = (52 − 41.88)² = 102.41
(x3 − x̄)² = (47 − 41.88)² = 26.21

Solution:

Class Interval    f    x     fx     (xi − x̄)²    f(xi − x̄)²
55 - 59           3    57    171    228.61        685.83
50 - 54           6    52    312    102.41        614.46
45 - 49           7    47    329    26.21         183.47
40 - 44           9    42    378    0.01          0.09
35 - 39           6    37    222    23.81         142.86
30 - 34           4    32    128    97.61         390.44
25 - 29           5    27    135    221.41        1107.05
Total             n = 40     Σ fxi = 1,675        Σ f(xi − x̄)² = 3,124.20

f(x1 − x̄)² = 3(228.61) = 685.83
f(x2 − x̄)² = 6(102.41) = 614.46
f(x3 − x̄)² = 7(26.21) = 183.47
Solution:

Class Interval    (xi − x̄)²    f(xi − x̄)²
55 - 59           228.61        685.83
50 - 54           102.41        614.46
45 - 49           26.21         183.47
40 - 44           0.01          0.09
35 - 39           23.81         142.86
30 - 34           97.61         390.44
25 - 29           221.41        1107.05
Total                           Σ f(xi − x̄)² = 3,124.20

\[ s = \sqrt{\frac{\sum_{i=1}^{7} f_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3{,}124.20}{40 - 1}} = 8.95 \]

\[ s^2 = \frac{\sum_{i=1}^{7} f_i (x_i - \bar{x})^2}{n - 1} = \frac{3{,}124.20}{40 - 1} = 80.11 \]
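
The table-based computation above can be checked with a short Python sketch. The function name `grouped_variance` and the tuple layout are assumptions made for illustration. Unlike the worked solution, the sketch does not round the mean to 41.88, so the sum of squares is 3,124.375 instead of 3,124.20; the final answers still round to the same values.

```python
import math

# (lower limit, upper limit, frequency) for each class in the example
classes = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
           (35, 39, 6), (30, 34, 4), (25, 29, 5)]

def grouped_variance(classes):
    """Sample variance of grouped data using class marks (midpoints)."""
    n = sum(f for _, _, f in classes)
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]   # (class mark, frequency)
    mean = sum(x * f for x, f in marks) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in marks)
    return sum_sq / (n - 1)

variance = grouped_variance(classes)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```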


How to interpret variance and standard deviation?
Consider the following data set of toddler
weights in an outpatient clinic, assuming that the
following data values were taken:

Data Set: 15 13 20 19 14

Computed variance for this data set is 9.7.
Computed standard deviation for this data set is 3.11.
What does this mean?

Variance is difficult to interpret directly as a measure of
variability. Let us assume that the values represent weights
measured in pounds taken from five toddlers. Because the deviation
of each observation from the mean has been squared, the unit for the
variance is now (pound)². What does (pound)² mean? If we
were to say that data values differ from the mean on average by
about 9.7 (pound)², would this claim make sense? Probably not,
since there is no such unit as a (pound)².
Why do we then take the square of the deviation if the (unit)²
will not make sense to interpret at the end? The answer is
simple: if you do not square the deviations and simply sum them,
they will always add up to zero no matter what data
set you work with.

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \quad \text{whereas} \quad \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0 \ \text{(unless all values are equal)} \]
How can we then talk about variability if the measure of
variability comes out to be equal to zero? This is why we
square the deviations to compute the variance first and
then take the square root of it to compute the standard
deviation, bringing us back to the original unit of
measurement.
We get the standard deviation of 3.11 by taking the square root of
9.7; we can then say that the data values differ from the mean
(16.2 lbs.) by about 3.11 pounds on average. We can
interpret this finding to mean that typical weights fall within
one standard deviation of the mean, that is, between 13.09 and
19.31 pounds. This makes more sense when you look at the data
set, compared to the variance. Note that the mean and standard
deviation should always be reported together!
16.2 − 3.11 = 13.09
16.2 + 3.11 = 19.31
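
The argument above can be verified numerically. The sketch below uses Python's standard `statistics` module on the five toddler weights; the variable names are chosen only for illustration.

```python
import statistics

weights = [15, 13, 20, 19, 14]            # toddler weights in pounds
mean = statistics.mean(weights)           # 16.2
deviations = [x - mean for x in weights]

sum_dev = sum(deviations)                 # zero, apart from floating-point rounding
variance = statistics.variance(weights)   # sample variance, divides by n - 1
std_dev = statistics.stdev(weights)       # square root of the sample variance

print(round(variance, 2), round(std_dev, 2))
print(round(mean - std_dev, 2), round(mean + std_dev, 2))
```

The raw deviations cancel out, while the squared deviations give the variance of 9.7 (pound)² and the standard deviation of about 3.11 pounds quoted in the text.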

Choosing a Measure of Dispersion/Variability:

We have discussed four measures of dispersion/variability (the
range, the interquartile range, the variance, and the
standard deviation) and examined how they differ. The next
legitimate question to ask may be “When do we use which
measure?”
You should use the range only as a crude measure, since it
is extremely sensitive to unusual values in the data set.
The interquartile range is not as sensitive to unusual data values,
whereas the standard deviation is very sensitive to unusual values.
Therefore, the interquartile range should be used with the
median when the data contain unusual data values.
However, the standard deviation should be used with the
mean when the data are free of unusual data values.

Shape of Distribution
These two statistics give you insights into the shape of
the distribution.
✦ Skewness is the degree of distortion from the
symmetrical bell curve or the normal distribution. It
measures the lack of symmetry in the data distribution.
✦ Kurtosis is a measure of the combined sizes of the
two tails. It is often described in terms of how tall and
sharp the central peak is relative to a standard bell curve,
but it is driven mainly by the tails.
Skewness
A symmetrical distribution will have a skewness of 0.
So, a normal distribution will have a skewness of 0.
In a symmetrical distribution, the Mean, Median and
Mode are equal to each other and the ordinate at
mean divides the distribution into two equal parts.


There are two types of Skewness:

• Negatively Skewed/Skewed Left means the tail on the left
side of the distribution is longer or fatter than the tail on the
right side. The mean and median will typically be less than the mode.
• Positively Skewed/Skewed Right means the tail on the
right side of the distribution is longer or fatter than the tail on
the left side. The mean and median will typically be greater than
the mode.

[Figure: three curves labeled Skewness < 0, Skewness > 0, Skewness = 0]

Karl Pearson’s Measure of Skewness
Notice that the mean, median and mode are not
equal in a skewed distribution.
Karl Pearson's measure of skewness is based
upon the divergence of the mean from the mode in a skewed
distribution. Karl Pearson’s Coefficient of Skewness
(Sk) is given by

\[ Sk = \frac{\bar{x} - \hat{x}}{s} \]

where:
x̄ is the mean
x̂ is the mode
s is the sample standard deviation
So far we have seen that Sk depends on the mode. If the
mode is not defined for a distribution, we cannot find Sk.
But the empirical relation between mean, median and mode
states that, for a moderately skewed distribution, we have
Mean − Mode ≈ 3(Mean − Median)
Hence Karl Pearson's coefficient of skewness is
defined in terms of the median as

\[ Sk = \frac{3(\bar{x} - \tilde{x})}{s} \]

where:
x̄ is the mean
x̃ is the median
s is the sample standard deviation
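
As an illustration of the median-based coefficient, the sketch below applies it to the toddler weights used earlier in this module. The function name `pearson_median_skewness` is an assumption; the data set is reused only as a convenient example.

```python
import statistics

def pearson_median_skewness(data):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)        # sample standard deviation (n - 1)
    return 3 * (mean - median) / s

weights = [15, 13, 20, 19, 14]        # mean 16.2, median 15
sk = pearson_median_skewness(weights)
print(round(sk, 2))
```

The coefficient is positive here because the mean (16.2) exceeds the median (15), which points to a distribution skewed to the right.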

Kurtosis
It is actually the measure of outliers present in the
distribution. The outliers in a sample, therefore, have
even more effect on the kurtosis than they do on the
skewness.
Higher kurtosis means more of the variance is the
result of infrequent extreme deviations, as opposed to
frequent modestly sized deviations. In other words, it’s
the tails that mostly account for kurtosis, not the
central peak.
The kurtosis decreases as the tails become lighter. It
increases as the tails become heavier.

• Mesokurtic (Kurtosis = 3): This distribution has a
kurtosis statistic similar to that of the normal
distribution.
• Leptokurtic (Kurtosis > 3): The peak is higher and
sharper than that of a normal distribution, and the data
are heavy-tailed, with a profusion of outliers.
• Platykurtic (Kurtosis < 3): Compared to a normal
distribution, its tails are shorter and thinner, and often
its central peak is lower and broader.
Percentile Coefficient of Kurtosis
A measure of kurtosis based on quartiles and
percentiles is

\[ k = \frac{QD}{P_{90} - P_{10}} \]

where QD is the semi-interquartile range:

\[ QD = \frac{Q_3 - Q_1}{2} \]
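
A possible way to evaluate this coefficient is sketched below in Python for the twelve ungrouped values used in the percentile examples. The helper names are assumptions, and the sketch assumes that quartiles and percentiles are located with the same position rule (nk/parts + 0.5) and interpolation used for ungrouped data earlier in this module.

```python
def quantile(data, k, parts):
    """Ungrouped quantile using position n*k/parts + 0.5 with interpolation."""
    values = sorted(data)
    n = len(values)
    position = n * k / parts + 0.5
    lower = int(position)
    if lower < 1:
        return float(values[0])
    if lower >= n:
        return float(values[-1])
    fraction = position - lower
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])

def percentile_kurtosis(data):
    """k = QD / (P90 - P10), where QD = (Q3 - Q1) / 2."""
    qd = (quantile(data, 3, 4) - quantile(data, 1, 4)) / 2
    return qd / (quantile(data, 90, 100) - quantile(data, 10, 100))

data = [20, 23, 24, 27, 30, 32, 37, 37, 40, 42, 48, 55]
k = percentile_kurtosis(data)
print(round(k, 4))
```

With this rule, Q1 = 25.5, Q3 = 41, P10 = 22.1, and P90 = 50.1, so QD = 7.75 and k is about 0.277.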


How to Calculate Measures of Central Tendency,
Measures of Variation, Skewness and Kurtosis for
Ungrouped and Sample Data Using Excel?
Example:
The data given below are the scores of randomly
selected applied statistics undergraduate students in
Section A and Section B. Compare the scores of Section
A and Section B based on measures of central tendency
and measures of variation, and determine which section
performed better in their final examination. Also,
describe the shape of the distribution of these two data
sets using skewness and kurtosis.

Data Set A: 40 38 42 40 39 39 43 40 39 40
Data Set B: 46 37 40 33 42 36 40 47 34 45

1. Click “DATA” on the menu bar and click “DATA
ANALYSIS” on the toolbar. The dialog box will appear.
2. Select “Descriptive Statistics” then click “OK”.
3. Highlight your data for the “INPUT RANGE” and check
the box for “LABELS IN FIRST ROW”.
4. Check “Summary statistics” and then click “OK”. Repeat the
process for Data Set B.

When comparing distributions, it is better to use a measure of
variation/dispersion in addition to a measure of central tendency.
Because in this example Data Set A and Data Set B have the
same values for the measures of central tendency, we will just use
the measures of variation/dispersion to compare these two data sets.

Based on the result, Data Set B has larger variability, since it
has larger values on the different measures of
variation. This means that Data Set B is much more spread
out than Data Set A.
In this example, we want a data set with a large mean value
and a small standard deviation, so that we can say that this is the
section that performed better. Section A and Section B have
the same mean value, but Section A has a smaller standard
deviation than Section B; therefore, Section A performed better
(more consistently) in their final examination.
In terms of the shape of the distribution, both data sets have
positive skewness, so both are skewed to the right. For kurtosis,
note that Excel reports excess kurtosis (kurtosis minus 3), so the
reported value is compared with 0 rather than 3: a negative value
indicates a platykurtic shape and a positive value a leptokurtic shape.
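
The same summary statistics can be cross-checked outside Excel. The Python sketch below is only an illustration: the function name `describe` is an assumption, and the skewness and kurtosis lines follow the sample formulas documented for Excel's SKEW and KURT functions, on the assumption that the Descriptive Statistics tool reports the same quantities. The kurtosis computed this way is an excess kurtosis, which is 0 rather than 3 for a normal distribution.

```python
import statistics

def describe(data):
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)                    # sample standard deviation
    z = [(x - mean) / s for x in data]            # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "std_dev": s,
        "variance": statistics.variance(data),
        "range": max(data) - min(data),
        "skewness": skew,
        "excess_kurtosis": kurt,
    }

set_a = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
set_b = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]
stats_a = describe(set_a)
stats_b = describe(set_b)
for name, stats in (("A", stats_a), ("B", stats_b)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Both sets have a mean, median, and mode of 40, while the standard deviation of Data Set B (about 4.99) is more than three times that of Data Set A (about 1.49).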
Normal Distribution
✦ The normal distribution is sometimes called the bell curve
because the graph of its probability density looks like a
bell.

✦ It is also known as the Gaussian distribution, named after the
German mathematician Carl Friedrich Gauss.

✦ It is a probability function that describes how the values
of a variable are distributed.

Normal Curve

[Figure: histogram with a normal curve overlaid; horizontal axis marked 50, 100, 150]
The curve in the figure is a model called the normal curve,
which is used to describe continuous random variables
that are said to be normally distributed.
A continuous random variable is normally distributed,
or has a normal probability distribution, if its relative
frequency histogram has the shape of a normal curve.

No data will ever be exactly/perfectly normally
distributed in reality. If so, how do we know
whether or not a collected data set is normally
distributed?
We can begin with a visual display of the data in a
histogram to see if the data set is normally
distributed. However, a visual check alone may not
be sufficient to know whether the data are normally
distributed. There are statistical measures,
skewness and kurtosis, which, along with a
histogram, allow us to determine whether the set is
normally distributed.
Why is it important to know if the data follow
a normal distribution?

The most important reason is that many human
characteristics fall into an approximately normal
distribution and that the measurement scores are
assumed to be normally distributed when
running most statistical analyses. Therefore, the
statistical results you get at the end may not be
trustworthy if the variable is not normally
distributed.

Properties of Normal Curve

1. The normal curve is bell-shaped and symmetric
about the mean, μ.
2. Because the mean, median and mode are equal, the
normal curve has a single peak and the highest
point occurs at x = μ.
3. The normal curve has inflection points at μ − σ
and μ + σ.
4. The area under the normal curve is 1.
5. The area under the normal curve to the right
of μ equals the area under the curve to the
left of μ, which equals 0.50.
6. The normal curve approaches, but never touches,
the x-axis as it extends farther and
farther away from the mean.

[Figures: normal curve with inflection points marked at μ − σ and μ + σ; normal curve with total area = 1 split into 0.50 on each side of μ]
[Figure: three pairs of normal curves labeled μ1 = μ2, σ1 < σ2; μ1 < μ2, σ1 < σ2; and μ1 < μ2, σ1 = σ2]

Mean:
✦ Changing the mean shifts the entire
curve left or right on the X-axis.
Standard Deviation:
✦ Changing the standard deviation
either tightens or spreads out the
width of the distribution along the X-axis.
Larger standard deviations produce distributions that are more
spread out.

Determine whether each graph represents a normal
curve.

[Figures A, B, C, and D]

None of them represents a normal curve.

Role of Area under a Normal Curve
Suppose that a random variable X is normally
distributed with mean μ and standard deviation σ. The
area under the normal curve for any interval of values of
the random variable X represents either
✦ the proportion of the population with the characteristic
described by the interval of values or
✦ the probability that a randomly selected individual
from the population will have the characteristic
described by the interval of values.
Standard Normal Distribution
A normal random variable having mean
value μ = 0 and standard deviation σ = 1 is
called a standard normal random variable,
and its density curve is called the standard
normal curve.

It will always be denoted by the letter Z.


Standardizing a Normal Random Variable

The normal random variable of a standard
normal distribution is called a standard
score or a z-score. Every normal random
variable X can be transformed into a z-score
via the following equation:

\[ z = \frac{x - \mu}{\sigma} \]

where X is a normal random variable, μ is the mean of X, and
σ is the standard deviation of X.
Probabilities for a standard normal
random variable are computed
using the Standard Normal
Distribution Table, which shows
a cumulative probability associated
with a particular z-score.

Remember!
Positive values of the z-score indicate how far above
the mean a score falls, and negative values
indicate how far below the mean a score falls.

Whether positive or negative, z-scores that are larger in
absolute value mean that scores are far away from the mean, and
z-scores that are smaller in absolute value mean that scores are
close to the mean.
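
The standardizing step and the table lookup can be sketched in Python with the standard library's `statistics.NormalDist`. The mean of 100, the standard deviation of 15, and the two scores below are hypothetical values chosen only for illustration; `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: number of standard deviations above (+) or below (-) the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0, sigma=1)

# Hypothetical example: a variable with mean 100 and standard deviation 15
z_high = z_score(130, 100, 15)    # above the mean, so z is positive
z_low = z_score(85, 100, 15)      # below the mean, so z is negative

# cdf(z) plays the role of the table: the cumulative probability P(Z <= z)
p_high = standard_normal.cdf(z_high)
p_low = standard_normal.cdf(z_low)
print(z_high, round(p_high, 4))
print(z_low, round(p_low, 4))
```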

Standard Normal Distribution Table 1 (Positive Side P(Z < z))


Standard Normal Distribution Table 2 (Negative Side P(Z < − z))


Patterns for Finding Areas under a Standard Normal Curve

Using Table 1
A. Area to the right of a negative z value or to the left of a
positive z value.
   Use Table 1 directly.
B. Area between z values on either side of 0.
   Area = (Table 1 area for z2) − (1 − Table 1 area for |z1|)
C. Area between z values on the same side of 0.
   Area = (larger area) − (smaller area); for negative z values,
   each area is 1 − the Table 1 area.
D. Area to the right of a positive z value or to the left of a
negative z value.
   Area = 1 − Table 1 area
E. Area between a given z value and 0.
   Area = Table 1 area − 0.50

Patterns for Finding Areas under a Standard Normal Curve

Using Table 2
A. Area to the right of a positive z value or to the left of a
negative z value.
   Use Table 2 directly.
B. Area between z values on the same side of 0.
   Area = (larger Table 2 area) − (smaller Table 2 area)
C. Area between z values on either side of 0.
   Area = (0.50 − Table 2 area for z1) + (0.50 − Table 2 area for z2)
D. Area to the right of a negative z value or to the left of a
positive z value.
   Area = (0.50 − Table 2 area) + 0.50
E. Area between a given z value and 0.
   Area = 0.50 − Table 2 area
Example 1:
Scores on a standardized college entrance examination (CEE)
are normally distributed with mean 510 and standard
deviation 60. A selective university considers for admission
only applicants with CEE scores over 560. Find the proportion of
all individuals who took the CEE who meet the university's
CEE requirement for consideration for admission.
Solution:
Given: μ = 510, σ = 60, and x = 560
Area = P(X > 560)
Step 1: Draw a normal curve and
shade the desired area.

[Figure: normal curve with X-axis marked 450, 510, 570; the area to the right of 560 is shaded]

Using Table 1 (By-hand Approach)

Step 2: Convert the value of x to a z-score.

\[ P(X > 560) = P\left(Z > \frac{560 - 510}{60}\right) = P(Z > 0.83) \]

Use the Complement Rule and determine one minus the area.

\[ P(Z > 0.83) = 1 - P(Z \le 0.83) = 1 - 0.7967 = 0.2033 \]

[Figure: standard normal curve, Z-axis from −2 to 2, area to the right of 0.83 shaded; Area = P(Z > 0.83) = 0.2033]

The proportion of all CEE scores that exceed 560 is
0.2033 or 20.33%.

Using Table 2 (By-hand Approach)

Step 2: Convert the value of x to a z-score.

\[ P(X > 560) = P\left(Z > \frac{560 - 510}{60}\right) = P(Z > 0.83) = 0.2033 \]

[Figure: standard normal curve, Z-axis from −2 to 2, area to the right of 0.83 shaded; Area = P(Z > 0.83) = 0.2033]

The proportion of all CEE
scores that exceed 560 is
0.2033 or 20.33%.
Step 2: Use Excel to determine the area under
any normal curve. (Technology Approach)
Use “TRUE” for cumulative since we
want the area under the normal curve.

[Figure: Excel screenshot]

The proportion of all CEE
scores that exceed 560 is
0.2033 or 20.33%.
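
The same area can also be obtained in Python, shown here only as a cross-check of the table and spreadsheet approaches. The sketch uses `statistics.NormalDist` from the standard library; the variable names are illustrative. Rounding z to two decimals mirrors the table lookup, while the unrounded calculation gives a slightly smaller value.

```python
from statistics import NormalDist

mu, sigma, x = 510, 60, 560

z = round((x - mu) / sigma, 2)                 # 0.83, rounded as for the z table
area_table = 1 - NormalDist().cdf(z)           # complement rule: 1 - P(Z <= 0.83)
area_exact = 1 - NormalDist(mu, sigma).cdf(x)  # no rounding of z

print(round(area_table, 4), round(area_exact, 4))
```

With z rounded to 0.83 the area is about 0.2033, as in the by-hand solution; without rounding it is about 0.2023, so small differences between table-based and software-based answers are to be expected.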

Example 2:
A pediatrician obtains the heights of her three-year-old female
patients. The heights are approximately normally distributed,
with mean 38.72 inches and standard deviation 3.17 inches.
Determine the proportion of the three-year-old females that
have a height less than 35 inches.
Solution:
Given: μ = 38.72, σ = 3.17, and x = 35
Area = P(X < 35)
Step 1: Draw a normal curve and shade
the desired area.

[Figure: normal curve with X-axis marked 35.55, 38.72, 41.89; the area to the left of 35 is shaded]

Using Table 1 (By-hand Approach)

Step 2: Convert the value of x to a z-score.

\[ P(X < 35) = P\left(Z < \frac{35 - 38.72}{3.17}\right) = P(Z < -1.17) \]

Use the Complement Rule and determine one minus the area.

\[ P(Z < -1.17) = 1 - P(Z \ge -1.17) = 1 - 0.8790 = 0.1210 \]

[Figure: standard normal curve, Z-axis from −2 to 2, area to the left of −1.17 shaded; Area = P(Z < −1.17) = 0.1210]

The proportion of the pediatrician’s three-year-old
females who are less than 35 inches tall is 0.1210 or
12.10%.
Using Table 2 (By-hand Approach)
Step 2: Convert the value of x to a z-score.

\[ P(X < 35) = P\left(Z < \frac{35 - 38.72}{3.17}\right) = P(Z < -1.17) = 0.1210 \]

[Figure: standard normal curve, Z-axis from −2 to 2, area to the left of −1.17 shaded; Area = P(Z < −1.17) = 0.1210]

The proportion of the pediatrician’s three-year-old
females who are less than 35 inches tall is 0.1210 or
12.10%.

Step 2: Use Excel to determine the area under
any normal curve. (Technology Approach)

Use “TRUE” for cumulative since we want
the area under the normal curve.

[Figure: Excel screenshot]

The proportion of the pediatrician’s three-year-old females who
are less than 35 inches tall is 0.1210 or 12.10%.

Example 3:
A pediatrician obtains the heights of her three-year-old female
patients. The heights are approximately normally distributed,
with mean 38.72 inches and standard deviation 3.17 inches.
Determine the probability that a randomly selected three-year-old
girl is between 35 and 40 inches tall, inclusive.
Solution:
Given: μ = 38.72, σ = 3.17, and 35 ≤ X ≤ 40
Area = P(35 ≤ X ≤ 40)
Step 1: Draw a normal curve and
shade the desired area.

[Figure: normal curve with X-axis marked 35.55, 38.72, 41.89; the area between 35 and 40 is shaded]
Using Table 1 (By-hand Approach)
Step 2: Convert the values of x to z-scores.

\[ P(35 \le X \le 40) = P\left(\frac{35 - 38.72}{3.17} \le Z \le \frac{40 - 38.72}{3.17}\right) = P(-1.17 \le Z \le 0.40) \]

\[ P(-1.17 \le Z \le 0.40) = P(Z \le 0.40) - [1 - P(Z \ge -1.17)] \]
\[ = 0.6554 - [1 - 0.8790] = 0.6554 - 0.1210 = 0.5344 \]

[Figure: standard normal curve, Z-axis from −2 to 2, area between −1.17 and 0.40 shaded; Area = P(−1.17 ≤ Z ≤ 0.40)]

The probability a randomly
selected three-year-old female
is between 35 and 40 inches tall
is 0.5344.

Using Table 2 (By-hand Approach)
Step 2: Convert the values of x to z-scores.
P(35 ≤ X ≤ 40) = P(z1 ≤ Z ≤ z2)
= P((35 − 38.72) / 3.17 ≤ Z ≤ (40 − 38.72) / 3.17)
= P(−1.17 ≤ Z ≤ 0.40)
= [0.50 − P(Z ≥ 0.40)] + [0.50 − P(Z ≤ −1.17)]
= [0.50 − 0.3446] + [0.50 − 0.1210]
= 0.1554 + 0.3790
= 0.5344
[Figure: standard normal curve with the area P(−1.17 ≤ Z ≤ 0.40) shaded between z = −1.17 and z = 0.40]
The probability that a randomly selected three-year-old female is between 35 and 40 inches tall is 0.5344.

Step 2: Use Excel to determine the area under any normal curve. (Technology Approach)
Use “TRUE” for cumulative since we want the area under the normal curve.
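An area between two values is the difference of two cumulative areas. The sketch below shows that subtraction in Python, assuming the standard-library `statistics.NormalDist` class in place of the spreadsheet function; the names are illustrative. The z-scores are not rounded here, so the result is close to, but not exactly, the by-hand value 0.5344.

```python
from statistics import NormalDist

# Heights of three-year-old girls: mean 38.72 in, standard deviation 3.17 in
heights = NormalDist(mu=38.72, sigma=3.17)

# P(35 <= X <= 40) = (area to the left of 40) - (area to the left of 35)
area_left_of_40 = heights.cdf(40)
area_left_of_35 = heights.cdf(35)
p_between = area_left_of_40 - area_left_of_35

print(f"P(35 <= X <= 40) = {p_between:.4f}")
```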

ACTIVITIES/ASSESSMENTS:

1. Which one do you think is more informative?


Why?

2. What features of the ‘Good Presentation’ make it better than the ‘Bad Presentation’?
A.
B.
3. Review the table and consider questions such as the
following.
Needs
Origin / Rating Poor Satisfactory V Good Excellent Total
Improvement
External 0% 2% 12% 19% 9% 41%
Internal 4% 8% 15% 23% 9% 59%
Grand Total 4% 10% 27% 41% 17% 100%
1. What percentage of the employees originated from within the
organization?
2. What percentage of the employees are both internal and rated
‘Very Good’?
3. What percentage of the employees received ‘Needs Improvement’
or ‘Poor’?
4. What category contains the greatest number of employees?
5. Do you see any notable differences in the percentage by category?
4. Consider the following frequency distribution of salaries.
Salary Frequency Percentage
41,000 - 50,000 1 1%
51,000 - 60,000 20 13%
61,000 - 70,000 53 35%
71,000 - 80,000 43 29%
81,000 - 90,000 26 17%
91,000 - 100,000 6 4%
101,000 - 110,000 1 1%
Total 150 100%
1. What percentage of the employees earn less than or equal to 80,000?
2. What is the range of salary values?
3. Which salary categories have a percentage less than 5?
4. Which salary category includes the most employees?
5. The length of life of an instrument produced by a machine has a normal
distribution with a mean of 12 months and standard deviation of 2 months.
Find the probability that an instrument produced by this machine will last
A. less than 7 months.
B. between 7 and 12 months.
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
6. The lengths of human pregnancies are approximately normally distributed,
with mean μ = 266 days and standard deviation σ = 16 days.
A. What proportion of pregnancies lasts more than 270 days?
B. What proportion of pregnancies lasts less than 250 days?
C. What proportion of pregnancies lasts between 240 and 280 days?
D. What is the probability that a randomly selected pregnancy lasts more than 280 days?
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
7. Construct a frequency distribution table based on the scores of 75 randomly selected students.
37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31
Scores Frequency Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total
A. Based on the frequency distribution, compute the measures of central tendency, measures of variation, Q1, D9, P10, skewness, and kurtosis.
B. Based on the raw data, compute the measures of central tendency, measures of variation, skewness, and kurtosis using Excel.
C. Compute the skewness and kurtosis of the grouped and ungrouped data. Make sure to describe the shape of the distribution.
D. Do you think that the computed values for the grouped and ungrouped data are the same?
8. Begin with the following set of data, call it Data Set I.
5, −2, 6, 14, −3, 0, 1, 4, 3, 2, 5
A. Compute the sample standard deviation and sample mean of
Data Set I.
B. Form a new data set, Data Set II, by adding 3 to each
number in Data Set I. Calculate the sample standard deviation
and sample mean of Data Set II.
C. Form a new data set, Data Set III, by subtracting 6 from
each number in Data Set I. Calculate the sample standard
deviation and sample mean of Data Set III.
D. Comparing the answers to parts A, B, and C, can you guess the pattern? State the general principle that you expect to be true.

9. Using the “Encoded Data file”, construct a frequency distribution table for age, sex, marital status, and educational attainment, and interpret the table.


References
https://prezi.com/rirrca9ckuiz/textual-presentation-of-data/
https://www.toppr.com/guides/economics/presentation-of-data/textual-and-tabular-presentation-of-data/
Sullivan, Michael, III. Statistics: Informed Decisions Using Data. Fifth Edition.

MODULE 4: INFERENTIAL STATISTICS
OBJECTIVES:
After successful completion of this module, you should be
able to:
✦ Differentiate the null and alternative hypotheses.
✦ Formulate the appropriate null and alternative hypotheses.
✦ Explain the logic of hypothesis testing.
✦ Assess and test whether the data follow a normal distribution.
✦ Distinguish between independent and dependent sampling.
✦ Identify the appropriate test statistic for normally distributed data.
✦ Conduct a test for two categorical variables.


What is HYPOTHESIS TESTING?


Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations.

What is HYPOTHESIS?
• A statement or claim regarding a characteristic of one or more populations.
• A preconceived idea, assumed to be true, that has to be tested for its truth or falsity.


Procedures for Testing Hypothesis
1. State the null and alternative hypotheses.
2. Set the level of significance or alpha level (α).
3. Determine the test distribution to use.
4. Calculate the test statistic or p-value.
5. Make the statistical decision.
6. Draw the conclusion.
1. State the Null and Alternative Hypothesis
Two Types of Hypothesis
1. Null Hypothesis
• Denoted by H0
• The statement being tested.
• Assumed true until evidence indicates otherwise.
• Must contain the condition of equality and must be written with the symbol =, ≤, or ≥.
2. Alternative Hypothesis
• Denoted by Ha
• The statement that must be true if the null hypothesis is false.
• Sometimes referred to as the research hypothesis.
• Must not contain the condition of equality and must be written with the symbol ≠, <, or >.

Example Hypotheses
Null Hypothesis:
✦ Students who eat breakfast and students who do not eat breakfast will perform the same on a math exam.
✦ Students who experience test anxiety prior to an English exam and students who do not will get the same scores.
✦ Motorists who talk on the phone while driving and motorists who do not will make the same number of errors on a driving course.
Alternative Hypothesis:
✦ Students who eat breakfast will perform better on a math exam
than students who do not eat breakfast.
✦ Students who experience test anxiety prior to an English exam
will get higher scores than students who do not experience test
anxiety.
✦ Motorists who talk on the phone while driving will be more likely
to make errors on a driving course than those who do not talk on
the phone.

Reminders:
If you are conducting a research study and you want
to use a hypothesis test to support your claim, the
claim must be stated in such a way that it becomes
the alternative hypothesis, so it cannot contain the
condition of equality.

Two Types of Alternative Test
1. One-tailed test
✦ Left-tailed
✦ Right-tailed
2. Two-tailed test


2. Set the Level of Significance or Alpha Level (α)
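• You should establish a predetermined level of significance; you will reject the null hypothesis when the p-value is at or below this level.
• The generally accepted levels are 0.10, 0.05, and 0.01.
• Be as rigorous as possible.
Two Types of Error
• Type I error: rejecting H0 when H0 is true. Its probability is α.
• Type II error: failing to reject H0 when H0 is false.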


Example:
H0: The defendant is innocent.
Ha: The defendant is not innocent.

What happens to the defendant if the jury makes a Type I or a Type II error?

Answer:
A Type I error is like putting an innocent person in jail.
A Type II error is like letting a guilty person go free.


Reminders:
It is important to note that we want to set α before we start our study because the Type I error is the more ‘grievous’ error to make.
The smaller α is, the smaller the region of rejection.

3. Determine the Test Distribution to Use.

Determine the appropriate statistical test to be used.
✦ Dependent Sample t-Test
✦ Independent Sample t-Test
✦ One-Way Analysis of Variance (ANOVA) Test
✦ Pearson r
✦ Chi-Square Test

4. Calculate Test Statistic or p-value.
Perform the statistical analysis using statistical software such as Excel, SPSS, R, Minitab, or SAS.

5. Make Statistical Decision
✦ Using confidence interval
✦ Using p-value approach
✦ Using traditional method

Decision Rule:
✦ Using Confidence Interval
Reject the null hypothesis if the hypothesized value of the parameter is not within the range specified by the confidence interval.
✦ Using Traditional Approach
Reject Ho if the computed value of the test statistic falls in the region of rejection.
✦ Using P-value Approach
Reject the null hypothesis if the computed p-value is less than or equal to the set significance level; otherwise, do not reject the null hypothesis.
Example: If the level of significance is α = 0.05:
P-value    Decision
0.01       Reject H0
0.05       Reject H0
0.10       Fail to reject H0
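The p-value rule is a single comparison, so it is easy to express in code. The following is a minimal sketch in Python; the function name `decide` is hypothetical and only reproduces the rule and the three example rows stated above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the statistical decision under the p-value approach."""
    # Reject H0 when the p-value is less than or equal to alpha
    if p_value <= alpha:
        return "Reject H0"
    return "Fail to reject H0"


# The three p-values from the example, tested at alpha = 0.05
for p in (0.01, 0.05, 0.10):
    print(f"p-value = {p:.2f}: {decide(p)}")
```

Note that a p-value exactly equal to α leads to rejection, which matches the "less than or equal to" wording of the rule.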
Traditional Approach
The rejection region, or critical region, is the set of all values of the test statistic that lead to the rejection of H0.
The acceptance region is the set of all values of the test statistic that lead the researcher to retain H0.

One-tailed, left-tailed: Ha : μ1 < μ2
[Figure: curve with the rejection region shaded in the left tail]
One-tailed, right-tailed: Ha : μ1 > μ2
[Figure: curve with the rejection region shaded in the right tail]
Two-tailed: Ha : μ1 ≠ μ2
[Figure: curve with rejection regions shaded in both tails]
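To make the rejection regions concrete, the sketch below computes where each region begins. It assumes, for illustration only, a test statistic that follows the standard normal distribution and α = 0.05; the t-tests later in this module would use critical values from the t distribution instead. The function name and the tail labels are hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Critical values that mark where each rejection region begins
left_critical = z.inv_cdf(alpha)                 # left-tailed test
right_critical = z.inv_cdf(1 - alpha)            # right-tailed test
two_tailed_critical = z.inv_cdf(1 - alpha / 2)   # two-tailed test (use +/- this value)


def in_rejection_region(statistic: float, tail: str) -> bool:
    """Return True when the test statistic falls in the rejection region."""
    if tail == "left":
        return statistic < left_critical
    if tail == "right":
        return statistic > right_critical
    if tail == "two":
        return abs(statistic) > two_tailed_critical
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(f"left: {left_critical:.3f}, right: {right_critical:.3f}, "
      f"two-tailed: +/-{two_tailed_critical:.3f}")
```

A smaller α pushes these critical values farther into the tails, which shrinks the rejection region.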

In stating your decision you can use:
✦ Fail to reject the null hypothesis / Do not reject the null hypothesis / Retain the null hypothesis
✦ Reject the null hypothesis.
It is important to recognize that we never accept the null hypothesis. We are merely saying that the sample evidence is not strong enough to warrant rejection of the null hypothesis.
6. Draw Conclusion
Record conclusions and recommendations in a report,
and associate interpretations to justify your
conclusion or recommendations.
Assessing and Testing Normality of the Data
To determine whether the data follow a normal distribution, we can use a graphical or a numerical method.
Graphical:
Normal Q-Q Plot
Histogram
Numerical:
Shapiro-Wilk Test
Kolmogorov-Smirnov Test

How to Check Normality?


A histogram plots the observed values against their frequency and gives a visual indication of whether the distribution is bell-shaped.



Q-Q probability plots display the observed values
against normally distributed data (represented by the
line).

Reminders:
Graphical methods are typically not
very useful when the sample size is
small.


Hypotheses of Normality Test


The hypotheses used are:
Ho: The sample data follows a normal distribution.
Ha: The sample data does not follow a normal
distribution.

When we are testing normality:
• If p-value > α, we fail to reject Ho; the data can be treated as normal.
• If p-value ≤ α, we reject Ho; the data are NOT normal.

How to Calculate the Shapiro-Wilk Test in Excel?
Sample Data
STEP 1: Rearrange the data in ascending order.

STEP 2: Calculate SS as follows:
SS = Σ_{i=1}^{n} (x_i − x̄)²
Use the “=DEVSQ( )” function in Excel.
SS means sum of squares.
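For readers who want to verify the spreadsheet result, the sum of squares can be reproduced in a few lines. This is a minimal sketch in Python; the function name `devsq` simply mirrors the Excel function, and the sample values are hypothetical, not the course's sample data.

```python
from statistics import fmean


def devsq(values):
    """Sum of squared deviations from the mean (the quantity called SS above)."""
    mean = fmean(values)
    return sum((x - mean) ** 2 for x in values)


# Hypothetical data chosen so the arithmetic is easy to check by hand
sample = [2, 4, 4, 4, 5, 5, 7, 9]
ss = devsq(sample)
print(f"mean = {fmean(sample)}, SS = {ss}")
```

Here the mean is 5 and the squared deviations are 9, 1, 1, 1, 0, 0, 4, and 16, which add up to 32.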

STEP 3: Calculate b as follows:
b = Σ_{i=1}^{m} a_i (x_{n+1−i} − x_i)
where n is the number of observations and
m = n / 2 if n is even,
m = (n − 1) / 2 if n is odd.
Since n is even in this example (n = 16), m = 8. That is why we used a1 to a8, taking the a_i weights from the Shapiro-Wilk table (based on the value of n).

Shapiro-Wilk Table

Note that if n is odd, the median data value is not used in the calculation of b.

STEP 4: Calculate the test statistic:
W = b² / SS
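Steps 1 to 4 can be combined into one short routine. The sketch below is an illustration in Python under two assumptions: the a_i weights are supplied by the user from a Shapiro-Wilk coefficient table (they are not computed here), and the function name `shapiro_wilk_w` is hypothetical. The three-value sample is made up, and 0.7071 is the a_1 weight commonly tabulated for n = 3; check it against the course's table before relying on it.

```python
from statistics import fmean


def shapiro_wilk_w(values, weights):
    """Compute W = b^2 / SS from data and tabulated a_i weights.

    `weights` must hold a_1..a_m for the sample size n, copied from a
    Shapiro-Wilk coefficient table.
    """
    x = sorted(values)                       # STEP 1: ascending order
    n = len(x)
    m = n // 2                               # n/2 if n is even, (n-1)/2 if n is odd
    if len(weights) != m:
        raise ValueError(f"expected {m} weights for n = {n}")
    mean = fmean(x)
    ss = sum((xi - mean) ** 2 for xi in x)   # STEP 2: sum of squares
    # STEP 3: pair the i-th smallest value with the i-th largest value
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(weights))
    return b ** 2 / ss                       # STEP 4: test statistic


# Hypothetical, evenly spaced sample of size 3
w = shapiro_wilk_w([3.0, 1.0, 2.0], [0.7071])
print(f"W = {w:.4f}")
```

With an odd n, integer division leaves the median out of the pairing, which agrees with the note that the median value is not used in the calculation of b.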

STEP 5:
Find the value in the Shapiro-Wilk table (for the given value of n) that is closest to W, interpolating if necessary. This gives the p-value for the test.
We choose this interval in the Shapiro-Wilk table because our n = 16 and our test statistic (W = 0.955) is within this interval.

We used interpolation to get the p-value of the Shapiro-Wilk test.
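Linear interpolation between two table entries can be sketched as follows. The function name and, importantly, the two bracketing table entries in the usage line are hypothetical placeholders, not values taken from the Shapiro-Wilk table; only W = 0.955 comes from the example. Replace the placeholders with the entries read from the table row for the actual n.

```python
def interpolate_p_value(w, w_low, p_low, w_high, p_high):
    """Linearly interpolate the p-value for W between two table entries."""
    if not (w_low <= w <= w_high):
        raise ValueError("W must lie between the two table entries")
    # Fraction of the way from the lower entry to the upper entry
    fraction = (w - w_low) / (w_high - w_low)
    return p_low + fraction * (p_high - p_low)


# Placeholder table entries (hypothetical): W = 0.950 -> p = 0.40, W = 0.960 -> p = 0.60
p = interpolate_p_value(0.955, w_low=0.950, p_low=0.40, w_high=0.960, p_high=0.60)
print(f"interpolated p-value = {p:.3f}")
```

Because 0.955 lies halfway between the two placeholder W values, the interpolated p-value lies halfway between the two placeholder p-values.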

Result
Since the computed p-value is greater than the set level of significance, we fail to reject the null hypothesis. Therefore, the sample data can be treated as following a normal distribution.

Inferential Statistics
1. Parametric Tests
✦ Assume underlying statistical distributions in the data. Therefore, several conditions of validity must be met so that the result of a parametric test is reliable.
✦ Apply to data in ratio scale, and some apply to data in interval scale.
2. Non-Parametric Tests
✦ Refer to statistical methods in which the data are not required to fit a normal distribution.
✦ Most non-parametric tests apply to data in an ordinal scale, and some apply to data in nominal scale.
Inference About Two Means
To perform inference on the difference of two
population means, we must first determine whether the
data come from an independent or dependent sample.

Distinguish between Independent and Dependent Sample
✦ A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample.
✦ A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample.

Example:
Determine whether the sample is independent or dependent.
1. An urban economist believes that commute times to work in the South are less than commute times to work in the Midwest. He randomly selects 40 employed individuals in the South and 45 employed individuals in the Midwest and determines their commute times.
Answer: Independent
2. In an experiment conducted in biology class, Prof. Rhea measured the time required for 12 students to catch a falling meter stick using their dominant hand and nondominant hand. The goal of the study was to determine whether the reaction time in an individual’s dominant hand is different from the reaction time in the nondominant hand.
Answer: Dependent
3. A researcher wants to know if the mean
length of stay in for-profit hospitals is different
from the mean length of stay in not-for-profit
hospitals. He randomly selected 20 individuals in
the for-profit hospital and matched them with 20
individuals in the not-for-profit by diagnosis.
Answer: Dependent
Dependent Sample t - Test
The dependent sample t-test (also called
the paired t-test or paired-samples t-test)
compares the means of two related groups
to determine whether there is a statistically
significant difference between these
means.
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2

Assumptions
1. Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous).
2. Your independent variable should consist of two categorical, “related groups” or “matched pairs”.
3. There should be no significant outliers in the
differences between the two related groups.
4. The distribution of the differences in the
dependent variable between the two related
groups should be approximately normally
distributed.

Example:
A teacher is interested to know if the new learning program will help to increase the number of correctly remembered words. Ten subjects learn a list of 50 words. Learning performance is measured using a recall test.
After the first test, all subjects are instructed how to use the learning program and then learn a second list of 50 words. Learning performance is again measured with the recall test. In the following table, the numbers of correctly remembered words are listed for both tests.
1. State the Null and Alternative Hypothesis
Null hypothesis: Ho : μ1 ≥ μ2
The new learning program will not help to increase the number of correctly remembered words.
Alternative hypothesis: Ha : μ1 < μ2
The new learning program will help to increase the number of correctly remembered words.
2. Set the Level of Significance or Alpha Level (α)
α = 0.05

3. Determine the Test Distribution to Use.
Dependent Variable:
Number of correctly remembered words
Independent Variable:
Treatment (Before and After)
Since we are comparing the means of two related groups, we will use the dependent sample t-test.
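The dependent sample t-test works on the differences within each pair. The sketch below computes the test statistic t = d̄ / (s_d / √n) with n − 1 degrees of freedom, where d = x1 − x2. It is an illustration only: the function name `paired_t` is hypothetical, the scores are made-up values (not the table from this example), and the sketch stops at the test statistic because the Python standard library has no t distribution for the p-value.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(sample1, sample2):
    """Return (t, df) for a dependent-sample t-test on paired observations."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample1, sample2)]  # d = x1 - x2 for each pair
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)            # s_d / sqrt(n)
    return fmean(diffs) / standard_error, n - 1


# Hypothetical recall scores for five subjects (not the course data)
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 15]
t, df = paired_t(before, after)
print(f"t = {t:.4f}, df = {df}")
```

With sample 1 as "before" and sample 2 as "after", a large negative t is the kind of evidence that supports Ha : μ1 < μ2.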

4. Calculate Test Statistic or p-value.
Click “Data”, then click “Data Analysis”.


5. Make Statistical Decision
Using the p-value approach: if p-value ≤ α, reject Ho; otherwise, fail to reject Ho.
Decision: Reject Ho


6. Draw Conclusion
There is sufficient evidence to support the claim that the new learning program helps to increase the number of correctly remembered words.

Proper Presentation of Results

Exercises:
Apply the procedure in testing the hypothesis.
Professor Rhea measured the time (in seconds) required to catch a falling meter stick for 10 randomly selected students' dominant hand and non-dominant hand. Professor Rhea claims that the reaction time in an individual's dominant hand is less than the reaction time in their non-dominant hand. Test the claim at the level of significance. The data obtained are presented:


Result


Independent Sample t - Test


The independent sample t - test allows
researchers to evaluate or to compare the mean
difference between two populations using the data
from two separate samples. It is used to test
whether population means are significantly
different from each other, using the means from
randomly drawn samples.
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2
Assumptions
1. Your dependent variable should be measured on a
continuous scale (i.e., it is measured at the interval or
ratio level).
2. Your independent variable should consist of two
categorical, independent groups.
3. You should have independence of observations, which
means that there is no relationship between the
observations in each group or between the groups
themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately
normally distributed for each group of the independent
variable.
6. There needs to be homogeneity of variances.

Example:
Researchers wanted to know whether there was a difference in comprehension among students learning a computer program based on the style of the text. They randomly divided 18 students into two groups of 9 each. The researchers verified that the 18 students were similar in terms of educational level, age, and so on. Group 1 individuals learned the software using a visual manual (multimodal instruction), while Group 2 individuals learned the software using a textual manual (unimodal instruction). The following data represent scores the students received on an exam given to them after they studied from the manuals.

1. State the Null and Alternative Hypothesis
Null hypothesis: Ho : μ1 = μ2
There is no significant difference between the scores of the students learning the computer program using the textual and visual styles.
Alternative hypothesis: Ha : μ1 ≠ μ2
There is a significant difference between the scores of the students learning the computer program using the textual and visual styles.
2. Set the Level of Significance or Alpha Level (α)
α = 0.05
3. Determine the Test Distribution to Use.
Dependent Variable:
Scores
Independent Variable:
Style of the Text (Visual and Textual)
Since we are comparing the means of two independent groups, we will use the independent sample t-test.
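When equal variances are assumed, the independent sample t-test pools the two sample variances. The sketch below computes t = (x̄1 − x̄2) / √(s_p² (1/n1 + 1/n2)) with n1 + n2 − 2 degrees of freedom, where s_p² is the pooled variance. It is an illustration only: the function name `pooled_t` is hypothetical, the two groups are made-up values (not the exam scores from this example), and the p-value is left to statistical software because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(group1, group2):
    """Return (t, df) for an independent-sample t-test assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (fmean(group1) - fmean(group2)) / standard_error, df


# Hypothetical scores for two independent groups (not the course data)
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t, df = pooled_t(group_a, group_b)
print(f"t = {t:.4f}, df = {df}")
```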

Click “Data”, then click “Data Analysis”

Determine if the
variances are equal
or not equal.

Using the p-value approach: if p-value ≤ α, reject Ho; otherwise, fail to reject Ho.
Ho: Equal variances assumed
Ha: Equal variances not assumed
Decision: Fail to reject Ho
Since we failed to reject Ho, we will proceed to “t-Test: Two-Sample Assuming Equal Variances”.
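The check for equal variances rests on the ratio of the two sample variances. The sketch below computes that ratio and its degrees of freedom. It is an illustration only: the function name `variance_ratio` is hypothetical, the groups are made-up values, and the p-value that accompanies F is left to statistical software because the Python standard library has no F distribution.

```python
from statistics import variance


def variance_ratio(group1, group2):
    """Return (F, df1, df2), where F = s1^2 / s2^2."""
    f = variance(group1) / variance(group2)  # ratio of the sample variances
    return f, len(group1) - 1, len(group2) - 1


# Hypothetical scores for two independent groups (not the course data)
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
f, df1, df2 = variance_ratio(group_a, group_b)
print(f"F = {f:.4f}, df = ({df1}, {df2})")
```

An F value far from 1 suggests unequal variances; the decision itself still follows the p-value rule stated above.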

4. Calculate Test Statistic or p-value.
Click “Data”, then click “Data Analysis”.

Result


5. Make Statistical Decision
Using the p-value approach: if p-value ≤ α, reject Ho; otherwise, fail to reject Ho.
Decision: Fail to reject Ho

6. Draw Conclusion
There is not enough evidence to support the claim that there is a difference in comprehension among students learning a computer program based on the style of the text.
Proper Presentation of Results

Exercises:
Apply the procedure in testing the hypothesis.
Twenty participants were given a list of 20 words to
process. The 20 participants were randomly assigned to
one of two treatment conditions. Half were instructed to
count the number of vowels in each word (shallow
processing). Half were instructed to judge whether the
object described by each word would be useful if one
were stranded on a desert island (deep processing).
After a brief distractor task, all subjects were given a surprise free recall task. Did the instruction affect the level of recall? The number of words correctly recalled was recorded for each subject. Here are the data:

Since the result of the F-test indicates that the variances of the two groups are equal, we will apply “Assuming Equal Variances”.
Result


One-Way Analysis of Variance (ANOVA)
One-way analysis of variance (ANOVA) is a method of testing the equality of three or more population means by analyzing sample variances.
Ho : μ1 = μ2 = . . . = μk
Ha : At least one of the population means is different from the others.


Assumptions
1. Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous).
2. Your independent variable should consist of two or more
categorical, independent groups.
3. You should have independence of observations, which
means that there is no relationship between the
observations in each group or between the groups
themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately
normally distributed for each category of the independent
variable.
6. There needs to be homogeneity of variances.
Example:
Researchers wanted to compare math test scores of students at the end of secondary school from various cities. Eight randomly selected students from each of Makati, Manila, and Quezon City were administered the same exam; the results are presented in the following table. Can the researchers conclude that the distribution of exam scores is different for each city at the α = 0.10 level of significance?


1. State the Null and Alternative Hypothesis
Null hypothesis:
There is no significant difference between the mathematics scores of students in the various cities.
Alternative hypothesis:
There is a significant difference between the mathematics scores of students in the various cities.
2. Set the Level of Significance or Alpha Level (α)
α = 0.10

3. Determine the Test


Distribution to Use.
Dependent Variable:
Mathematics Scores
Independent Variable:
Cities (Makati, Manila, Quezon City)
Since we are comparing the means of one
independent variable that consists of two
or more categorical groups, we will use
the one-way ANOVA.
Click “Data”, then click “Data Analysis”

Determine if the
variances are equal
or not equal.

Using the p-value approach: If p-value ≤ α, reject Ho;
otherwise, fail to reject Ho.
Ho: Equal Variances Assumed
Ha: Equal Variances Not Assumed

Decision: Fail to reject Ho. Equal variances are assumed.
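The equal-variance check compares two sample variances through their ratio. The following is a minimal sketch, assuming hypothetical data and the common convention of placing the larger sample variance in the numerator; the function name `variance_ratio` is illustrative only.

```python
from statistics import variance

def variance_ratio(sample_1, sample_2):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_1, var_2 = variance(sample_1), variance(sample_2)  # sample variances (n - 1)
    if var_1 >= var_2:
        return var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1
    return var_2 / var_1, len(sample_2) - 1, len(sample_1) - 1

# Hypothetical samples (illustration only)
x = [2, 4, 6, 8]
y = [1, 2, 3, 4]

f_ratio, df_num, df_den = variance_ratio(x, y)
print(f"F = {f_ratio:.2f}, df = ({df_num}, {df_den})")
```

A ratio close to 1 is consistent with equal variances. The decision itself still rests on comparing the p-value of this F ratio with α.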

4. Calculate Test Statistic or p-value.
Click “Data”, then click “Data Analysis”

Result


5. Make Statistical Decision
Using the p-value approach: If p-value ≤ α, reject Ho;
otherwise, fail to reject Ho.

Decision: Reject Ho
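The p-value decision rule can be sketched as a small Python function. The p-values below are hypothetical and serve only to show both outcomes at the α = 0.10 level used in this example; `decide` is an illustrative name.

```python
def decide(p_value, alpha):
    """Apply the p-value approach: reject Ho when p-value <= alpha."""
    return "Reject Ho" if p_value <= alpha else "Fail to reject Ho"

alpha = 0.10  # level of significance set in step 2

# Hypothetical p-values (illustration only)
print(decide(0.03, alpha))  # 0.03 <= 0.10
print(decide(0.25, alpha))  # 0.25 > 0.10
```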

6. Draw Conclusion
There is enough evidence to conclude that the
mean mathematics exam score of students
differs in at least one city.

Proper Presentation of Results

Exercises:
Apply the procedure in testing the hypothesis.
A teacher is concerned about the level of
knowledge possessed by PUP students regarding
Philippine history. Students completed a high
school senior level standardized history exam.
Academic major of the students was also recorded.
Data in terms of percent correct is recorded below
for 24 students. Is there a significant difference
between the levels of knowledge possessed by PUP
students regarding Philippine history when
grouped according to their academic major?

Result

Pearson Product Moment
Correlation
The Pearson product moment correlation
coefficient (Pearson r) is a measure of the
strength of a linear association between
two variables and is denoted by r.
Ho: There is no significant relationship
between two continuous variables.
Ha: There is a significant relationship between
two continuous variables.

Features of r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative
linear relationship.
• The closer to 1, the stronger the positive
linear relationship.
• The closer to 0, the weaker the linear
relationship.

Pearson Product Moment Correlation
If r is positive, the correlation is direct.
If r is negative, the correlation is inverse.
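The features of r described above can be checked with a short Python sketch that computes r from its definition. The paired values are hypothetical, and `pearson_r` is an illustrative name rather than a function from any of the software used in this course.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product moment correlation coefficient of paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired observations (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
direction = "direct" if r > 0 else "inverse" if r < 0 else "none"
print(f"r = {r:.2f} ({direction})")
```

Because the deviations are divided by their own spread, r carries no unit and always falls between -1 and 1.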
Sample of Observations from Various r Values
[Figure: five scatter plots of Y against X illustrating r = -1, r = -0.6, r = 0, r = 0.6, and r = 1]

Reminders:
• Correlation does not imply causation.
• Watch out for hidden (lurking) variables.

Lurking Variable
• A variable that is not included as an explanatory
or response variable in the analysis but can affect
the interpretation of relationships between
variables.
• Can falsely identify a strong relationship between
variables or it can hide the true relationship.

Assumptions
1. Your two variables should be measured at the
interval or ratio level (i.e., they are
continuous).
2. There is a linear relationship between your
two variables.
3. There should be no significant outliers.
4. Your variables should be approximately
normally distributed.

Significance Testing of Pearson r

Test Statistic:
$t = r\sqrt{\dfrac{df}{1 - r^2}}$
where:
df = degrees of freedom
r = correlation coefficient of Pearson r
Note:
df = n − 2
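A small Python sketch of this test statistic follows. The values r = 0.8 and n = 5 are hypothetical, and `t_from_r` is an illustrative name.

```python
from math import sqrt

def t_from_r(r, n):
    """Return (t, df) for testing the significance of Pearson r with n pairs."""
    df = n - 2
    t_stat = r * sqrt(df / (1 - r ** 2))
    return t_stat, df

# Hypothetical values (illustration only)
t_stat, df = t_from_r(0.8, 5)
print(f"t = {t_stat:.3f}, df = {df}")
```

The same r yields a larger t as n grows, which is why a moderate correlation can be significant in a large sample but not in a small one.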

Example:
A dietetics student wanted to look at the
relationship between calcium intake and
knowledge about calcium in sports
science students. The table shows the data
she collected. Is there a relationship
between calcium intake and knowledge
about calcium in sports science
students?
1. State the Null and Alternative
Hypothesis
Null hypothesis:
There is no significant relationship between
calcium intake and knowledge about calcium in sports
science students.
Alternative hypothesis:
There is a significant relationship between calcium
intake and knowledge about calcium in sports science
students.
2. Set the Level of Significance or Alpha
Level (α)
α = 0.05

3. Determine the Test Distribution to Use.
Dependent Variable:
Calcium Intake
Independent Variable:
Knowledge about Calcium

Since we are testing for a significant
relationship between two variables, we will use
Pearson r.

4. Calculate Test Statistic or p-value.
$t = r\sqrt{\dfrac{df}{1 - r^2}}$
df = n − 2

Result

5. Make Statistical Decision
Using the p-value approach: If p-value ≤ α,
reject Ho; otherwise, fail to reject Ho.

Result: Strong and direct correlation
Decision: Reject Ho
6. Draw Conclusion
There is sufficient evidence to conclude that there
is a significant relationship between calcium
intake and knowledge about calcium in sports
science students.
Proper Presentation of Results


Exercises:
Apply the procedure in testing the hypothesis.

A group of twelve children participated in a
psychological study designed to assess the
relationship, if any, between age (years)
and average total sleep time (minutes). To
obtain a measure for average total sleep
time, recordings were taken on each child
on five consecutive nights and then
averaged. The results obtained are shown in
the table.

Result

Chi-Square Distribution
Definition:
The chi-square distribution is
written as the χ² distribution.
The symbol χ is the Greek letter
“chi”, pronounced as “ki”.

Chi-Square: Test for Independence
✦ Used to discover if there is association
between two categorical variables.
✦ Used when you want to decide whether
two variables are independent or
dependent.
✦ A contingency table will be constructed.

Chi-Square: Test for Independence
H0: The two categorical variables are independent.
Ha: The two categorical variables are dependent.
Chi-Square: Test for Independence
The test statistic for a test of independence is given
by
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
where:
O is the observed frequency for a category
E is the expected frequency for a category
$E = \dfrac{(\text{row total})(\text{column total})}{\text{grand total}}$
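The two formulas above can be sketched together in Python. The 2 by 2 table of observed counts is hypothetical, and the function name `chi_square_independence` is an assumption made for illustration.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # E = (row total)(column total) / grand total, for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over all cells
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Hypothetical observed counts (illustration only)
observed = [[10, 20],
            [30, 40]]

chi2, df, expected = chi_square_independence(observed)
print(f"chi-square = {chi2:.4f}, df = {df}")
print("expected:", expected)
```

The degrees of freedom used here, (rows − 1)(columns − 1), is the standard value for a test of independence on a contingency table.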

Observed and Expected Frequencies


The frequencies obtained from the performance of an
experiment are called the observed frequencies and are
denoted by O.
The expected frequencies, denoted by E, are the
frequencies that we expect to obtain if the null hypothesis is
true.
Example of Contingency Table:
Observed Values Low Medium High Row Total
Some College 20 35 20 80
Bachelor's Degree 17 33 25 70
Masters Degree 11 18 21 50
Column Total 48 86 66 200

Assumptions
1. There are 2 variables, and both are measured as
categories, usually at the nominal level.
2. The two variables should consist of two or more
categorical, independent groups.
3. The data in the cells should be frequencies, or counts
of cases rather than percentages or some other
transformation of the data.
4. For a 2 by 2 table, all expected frequencies > 5.
5. For a larger table, all expected frequencies > 1 and
no more than 20% of all cells may have expected
frequencies < 5.
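Assumptions 4 and 5 can be checked mechanically once the expected frequencies are known. The sketch below is one possible reading of those two rules; the expected values are hypothetical and `expected_counts_ok` is an illustrative name.

```python
def expected_counts_ok(expected):
    """Check the expected-frequency rules stated in assumptions 4 and 5."""
    cells = [e for row in expected for e in row]
    is_2_by_2 = len(expected) == 2 and all(len(row) == 2 for row in expected)
    if is_2_by_2:
        # Assumption 4: every expected frequency must exceed 5
        return all(e > 5 for e in cells)
    # Assumption 5: every expected frequency exceeds 1, and at most
    # 20% of the cells have an expected frequency below 5
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return all(e > 1 for e in cells) and share_below_5 <= 0.20

# Hypothetical expected frequencies (illustration only)
ok_small = expected_counts_ok([[12.0, 18.0], [28.0, 42.0]])
ok_large = expected_counts_ok([[4.0, 10.0, 12.0], [8.0, 9.0, 3.0]])
print(ok_small, ok_large)
```

In the second table, two of the six cells fall below 5, which is more than 20% of the cells, so the rule is not met.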
Example:
1. A doctor who knows that hypertension depends
on smoking habits can tell his smoking patients what
they should do.
2. If the traffic condition (light, moderate, heavy,
standstill) is found to be dependent on vehicle plate
numbers (odd, even) a traffic officer may decide to
revise traffic law enforcement.

3. If poverty status of households is found to be
correlated with family size, government ought to
adopt a viable poverty management program.

Reminders:
The word contingency refers to
dependence, but this is only a
statistical dependence and cannot be
used to establish a direct cause-and-
effect link between the two variables in
question.


Example:
Educators are always looking for novel ways in
which to teach statistics to undergraduates as part
of a non-statistics degree course (e.g., psychology).
With current technology, it is possible to present
how-to guides for statistical programs online
instead of in a book. However, different people
learn in different ways. An educator would like to
know whether gender (male/female) is associated
with the preferred type of learning medium (online
vs. books). Use “Data_Example and Exercises file”.
1. State the Null and Alternative
Hypothesis
Null hypothesis:
Gender is independent of the preferred type of
learning medium.
Alternative hypothesis:
Gender is associated with the preferred type of
learning medium.
2. Set the Level of Significance or Alpha
Level (α)
α = 0.05

3. Determine the Test Distribution to Use.
Two Categorical Variables
Gender (Male and Female)
Preferred type of learning medium
(online vs. books)

Since we are testing for a significant
relationship between two categorical variables,
we will use the Chi-square test.

4. Calculate Test Statistic or p-value.
Click “Insert”, then click “Pivot Table”


Row Total, Column Total, Grand Total
$E = \dfrac{(\text{row total})(\text{column total})}{\text{grand total}}$

5. Make Statistical Decision
Using the p-value approach: If p-value ≤ α, reject Ho;
otherwise, fail to reject Ho.

Decision: Reject Ho

6. Draw Conclusion
There is sufficient evidence to conclude that
gender is associated with the preferred type of
learning medium.
Proper Presentation of Results

Exercises:
Apply the procedure in testing the hypothesis.
A survey was conducted at a community college of 102
randomly selected students who dropped a course in the
current semester to learn why students drop courses.
Personal drop reasons include financial, transportation,
family issues, health issues, and lack of child care. Course
drop reasons include reducing one's load, being unprepared
for the course, the course was not what was expected,
dissatisfaction with teaching, and not getting the desired
grade. Work drop reasons include an increase in hours, a
change in shift, and obtaining full-time employment. Test
whether gender is independent of drop reason at the 1%
level of significance. Use “Data_Example and Exercises
file”.

Result


ACTIVITIES/ASSESSMENTS:
Determine whether the sampling is dependent or independent.
________1. A researcher wishes to compare academic
aptitudes of married mathematicians and their spouses. She
obtains a random sample of 287 such couples who take an
academic aptitude test and determines each spouse's academic
aptitude.
________2. A political scientist wants to know how a random
sample of 18- to 25-year-olds feel about Democrats and
Republicans in Congress. She obtains a random sample of
1030 registered voters 18 to 25 years of age and asks, “Do you
have a favorable/unfavorable opinion of the Democratic/
Republican party?” Each individual was asked to disclose his
or her opinion about each party.
ACTIVITIES/ASSESSMENTS:
________3. An educator wants to determine whether a new
curriculum significantly improves standardized test scores for third
grade students. She randomly divides 80 third-graders into two
groups. Group 1 is taught using the new curriculum, while group 2 is
taught using the traditional curriculum. At the end of the school year,
both groups are given the standardized test and the mean scores are
compared.
________4. A stock analyst wants to know if there is a difference
between the mean rate of return from energy stocks and that from
financial stocks. He randomly selects 13 energy stocks and computes
the rate of return for the past year. He randomly selects 13 financial
stocks and computes the rate of return for the past year.
________5. An urban economist believes that commute times to work
in the South are less than commute times to work in the Midwest. He
randomly selects 40 employed individuals in the South and 45
employed individuals in the Midwest and determines their commute
times.

ACTIVITIES/ASSESSMENTS:
Solve the following problems. Make sure to follow the six-step
procedure.
1. A study is designed to test whether there is a difference in mean daily
calcium intake in adults with normal bone density, adults with
osteopenia (a low bone density which may lead to osteoporosis) and
adults with osteoporosis. Adults 60 years of age with normal bone
density, osteopenia and osteoporosis are selected at random from
hospital records and invited to participate in the study. Each
participant's daily calcium intake is measured based on reported food
intake and supplements. The data are shown below.
Is there a statistically significant difference in mean
calcium intake in patients with normal bone density as
compared to patients with osteopenia and osteoporosis?

Normal Bone Density   Osteopenia   Osteoporosis
1200                  1000         890
1000                  1100         650
980                   700          1100
900                   800          900
750                   500          400
800                   700          350

ACTIVITIES/ASSESSMENTS:
2. Some studies have shown that in the United
States, men spend more than women buying gifts
and cards on Valentine’s Day. Suppose a researcher
wants to test this hypothesis by randomly sampling
nine men and 10 women with comparable
demographic characteristics from various large cities
across the United States to be in a study. Each study
participant is asked to keep a log beginning one
month before Valentine’s Day and record all
purchases made for Valentine’s Day during that one-month
period. The resulting data are shown below.
Use these data and a 1% level of significance to test
to determine if, on average, men actually do spend
significantly more than women on Valentine’s Day.
Assume that such spending is normally distributed
in the population and that the population variances
are equal.

Men (in $)   Women (in $)
107.48       125.98
143.61       45.53
90.19        56.35
125.53       80.62
70.7         46.37
83           44.34
129.63       75.21
154.22       68.48
93.8         85.82
             126.11
ACTIVITIES/ASSESSMENTS:
3. A researcher is interested in whether a training course increases
the teaching performance of the teachers who attended the
training course. Test at the 10% level of significance. The data are
shown below:
Case  Before  After    Case  Before  After
1     85      95       11    89      97
2     84      98       12    87      98
3     86      97       13    82      95
4     87      92       14    81      95
5     89      96       15    86      92
6     82      93       16    89      91
7     80      94       17    89      94
8     84      95       18    84      95
9     86      90       19    85      96
10    82      82       20    88      97

ACTIVITIES/ASSESSMENTS:
4. A pediatrician wants to determine the relation that may
exist between a child’s height and head circumference. She
randomly selects eleven 3-year-old children from her
practice, measures their heights and head circumference, and
obtains the data shown in the table below.

Height (inches)   Head Circumference (inches)
27.75             17.5
24.5              17.1
25.5              17.1
26                17.3
25                16.9
27.75             17.6
26.5              17.3
27                17.5
26.75             17.3
26.75             17.5
27.5              17.5

ACTIVITIES/ASSESSMENTS:
5. The following data represent the smoking status from a
random sample of 1054 U.S. residents 18 years or older by
level of education.
No. of Years        Smoking Status
of Education    Current   Former   Never
Less than 12    178       88       208
12              137       69       143
13 - 15         44        25       44
16 or more      34        33       51

Test whether smoking status and level of education are
independent at the α = 0.05 level of significance.

References
https://wolfweb.unr.edu/homepage/ania/stat352f12lectures/352lecture21f12.pdf
Statistics: Informed Decisions Using Data by Michael Sullivan, III. Fifth Edition.
http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/shapiro-wilk-test/

Republic of the Philippines
Polytechnic University of the Philippines

STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION


MIDTERM EXAMINATION

Name: Course & Section:

Directions: Read each item carefully. Write the letter corresponding to the best answer on a yellow paper on each
item. Write NONE if no correct choice is given. Make sure to write also your solutions.

1. A bank surveyed all of its 60 employees to determine the proportion who participate in volunteer activities.
Which of the following statements is true?
(a) The bank should not use the data from this survey because this is an observational study.
(b) The bank does not need to use an inference procedure to determine the proportion of employees who
participate in volunteer activities because the survey was a census of all employees.
(c) The bank can use the result of this survey to prove that working for the bank causes employees to
participate in volunteer activities.
(d) The bank did not select a random sample of employees, so the survey will not provide the bank with useful
information.
2. In the design of a survey, which of the following best explains how to minimize response bias?

(a) Increase the sample size (c) Randomly select the sample
(b) Carefully word and field-test survey questions (d) Increase the number of questions in the survey

3. A body of principle, which deals with collection, analysis, interpretation and presentation of numerical facts or
data.

(a) Statistic (b) Descriptive (c) Inferential (d) Statistics

4. Cluster sampling is an example of:

(a) Simple Random Sampling (c) Nonprobability Sampling


(b) Probability Sampling (d) Stratified Sampling

5. Which of the following statements regarding a researcher's use of inferential statistics is true?
(a) It is best to measure every member of a population if possible.
(b) A random sample provides a perfect estimate of the population values.
(c) Descriptive statistics from a sample are used to estimate the characteristics of the population.
(d) We usually need to take several samples to obtain a good estimate of the population values.

6. The ________ divides the distribution into ten equal parts.

(a) Decile (b) Percentile (c) Median (d) Quartile

7. What sampling technique is used when the respondents are chosen on the basis of pre-determined criteria set
by the researchers?
(a) cluster sampling (b) systematic sampling (c) purposive sampling (d) convenience sampling

8. In a ________ distribution, the mean < median < mode.

(a) Normal (b) Unimodal (c) Negatively Skewed (d) Positively Skewed

9. Which one of the following variables is not categorical?


(a) score on the exam.
(b) Educational Attainment: elementary graduate, high school graduate, college graduate.
(c) Color: blue, red, white.
(d) Subject: algebra, calculus, trigonometry
10. Given the data set 40, 50, 70, 70, 60, 90, 80, 80, 90. If we replace the data value 90 in the
data set by 5, the standard deviation will ________.

(a) Increase (b) Decrease (c) stay the same (d) None of the above

11. If the statistics grades of Karen are 87, 85, 91, 89 and X, what must be the value of X so that the average is
89?

(a) 92 (b) 95 (c) 93 (d) 91

12. In descriptive statistics, we study


(a) The description of decision making process
(b) The methods for organizing, displaying, and describing data
(c) How to describe the probability distribution
(d) None of the above
13. In statistics, conducting a survey means
(a) Collecting information from elements
(b) Making mathematical calculations
(c) Drawing graphs and pictures
(d) None of the above
14. Which of the following represents the middle point in a set of numbers arranged in order of magnitude?

(a) Mean (b) Median (c) Mode (d) Variance

15. Mr. Martin had seven students in his after-school statistics tutorial. The scores they received on their last quiz
were as follows: 81, 73, 84, 78, 89, 82, 81. What was the mean score?

(a) 81.14 (b) 78.5 (c) 82 (d) 79.5

16. If all the units of a population are surveyed it is called

(a) Survey (b) Population (c) Census (d) Sample

17. For percentiles, the total number of partition values are

(a) 10 (b) 25 (c) 99 (d) 100

18. Which of the following represents the median?

(a) First Quartile (b) Fiftieth Percentile (c) Sixth decile (d) Third quartile

19. If 5 is subtracted from each observation of a set, then the mean of the observations is reduced by

(a) 5 (b) 1 (c) 0 (d) 15

20. The standard deviation of 10 observations is 15. If 5 is added to each observation, the value of the new standard
deviation is

(a) 5 (b) 1 (c) 0 (d) 15

21. If the minimum value in a set is 9 and its range is 57, the maximum value of the set is

(a) 33 (b) 66 (c) 48 (d) 24

22. Which of the following situations exhibit the function of Inferential Statistics?
(a) The highest score obtained by BSS section 1 in their first quiz is 48.
(b) All the ten scores are closely scattered around the average value.
(c) Mathematical anxiety of the students will be related with their academic performance.
(d) Line graphs will be used to exhibit the fluctuating trend of monthly consumption of electricity.
23. Which of the following situations exhibit the function of Descriptive Statistics?
(a) Determining the characteristics of the ideal teacher most favored by students.
(b) Relating the number of absences committed by students with their academic performance.
(c) Citing the differences in perception of the male and female students towards NO ID-NO ENTRY policy.
(d) Comparing the course grades in Statistics of every section who are taking the subject during the first
semester.
For items 24 to 27, consider this situation. There were 200 students of PUP San Juan enrolled in General
Statistics in the first semester. A periodic examination was given and it was found out that the average score
is 93. When a random section with 50 students is chosen, it was found out that 89 is the average score of the
section.
24. What do we call the number 200?

(a) statistic (b) sample size (c) parameter (d) population size

25. What do we call the number 93?

(a) statistic (b) sample size (c) parameter (d) population size

26. What do we call the number 50?

(a) statistic (b) sample size (c) parameter (d) population size

27. What do we call the number 89?

(a) statistic (b) sample size (c) parameter (d) population size

For items 28 to 30, consider this situation. A group of undergraduate researchers aims to execute stratified
random sampling among 63 Section 1 students, 52 Section 2 students, 48 Section 3 students and 37 Section 4
students. The margin of error is 5%.
28. What is the sample size?

(a) 124 students (b) 134 students (c) 144 students (d) 154 students

29. How many students of Section 2 will be included in the sample?

(a) 15 students (b) 25 students (c) 35 students (d) 45 students

30. How many students of Section 4 will be included in the sample?

(a) 13 students (b) 17 students (c) 21 students (d) 25 students

31. Which of the following is an example of a primary source of data?

(a) TV station (b) encyclopedias (c) living organisms (d) scientific journals

32. A marketing team specializing in food products set stands in a mall to determine the preference of the mall-goers
in choosing and consuming finger-foods. What sampling technique is appropriate in doing this?

(a) cluster sampling (b) purposive sampling (c) convenience sampling (d) systematic sampling

33. A market research company asks a sample of students to rate the taste of a new soft drink. The response scale
is really yummy, yummy, ok, yuck, really yuck. This is an example of a

(a) Nominal Level (b) Ordinal Level (c) Interval Level (d) Ratio Level

34. A researcher is studying students in college in PUP. She takes a sample of 400 students from 10 colleges. The
average age of selected college students in PUP is

(a) statistic. (b) parameter. (c) the median. (d) a population.

35. A coffee shop wants to know the temperature of coffee that most people prefer. They brew coffee at the typical
temperature for the shop and then ask customers “Do you prefer coffee to be at this temperature?” and record
a yes or no answer for each customer. What is the level of measurement of the way they measured preferred
temperature?

(a) Nominal (b) Ordinal (c) Interval (d) Ratio

36. The same coffee shop later repeats the study but this time they ask “Do you prefer coffee to be a lot colder, a
little cooler, this temperature, a little warmer or a lot hotter?” and record the person's response. Now, what is
the level of measurement of the way they measured preferred temperature?

(a) Nominal (b) Ordinal (c) Interval (d) Ratio

37. Determine the characteristics of a Normal Curve.


I. The normal curve is bell-shaped and symmetric about the mean.
II. The mean, median and mode are not equal.
III. The total area under the curve is equal to one.
IV. The normal curve approaches, but never touches the x-axis as it extends farther and farther away from the
mean.

(a) I, II and III (b) I, II, III and IV (c) II, III and IV (d) I, III and IV

38. Given a normal distribution, find the area under the curve which lies to the right of z = 1.96.

(a) 0.9750 (b) 0.0196 (c) 0.4750 (d) 0.0250

For items 39 to 43, consider this situation. A researcher has collected the following sample data. 5, 12, 6, 8, 5,
6, 7, 5, 12, 4
39. Find the median.

(a) 5 (b) 6 (c) 7 (d) 8

40. Find the mode.

(a) 5 (b) 6 (c) 7 (d) 8

41. Find the mean.

(a) 5 (b) 6 (c) 7 (d) 8

42. Find the standard deviation.

(a) 1.2 (b) 2.2 (c) 3.2 (d) 4.2

43. Find the Pearson coefficient of skewness using the value of median.

(a) 1.2 (b) 2.2 (c) 3.2 (d) 4.2
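Software check for the sample data above (a minimal Python sketch, not part of the original exam). It uses only the `statistics` module. Two assumptions are made here: the standard deviation is the sample version (divisor n - 1), and the median-based Pearson coefficient of skewness is taken as 3(mean - median)/s. The exam does not say which standard deviation it intends, so the population version is printed as well.

```python
import statistics

data = [5, 12, 6, 8, 5, 6, 7, 5, 12, 4]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)               # most frequent value
sample_sd = statistics.stdev(data)         # divisor n - 1 (assumed convention)
population_sd = statistics.pstdev(data)    # divisor n, shown for comparison

# Median-based Pearson coefficient of skewness: 3 * (mean - median) / s
pearson_skewness = 3 * (mean - median) / sample_sd

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("sample SD:", round(sample_sd, 3))
print("population SD:", round(population_sd, 3))
print("Pearson skewness (median):", round(pearson_skewness, 3))
```

If a computed value does not match any listed choice, the exam directions on writing NONE apply.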

Problem Solving
A. The PUPCET scores for the math portion of the test were normally distributed, with a mean of 23.4 and a
standard deviation of 4.8. Find the probability that a randomly selected student who took the math portion
of the PUPCET has a score that is
(a) less than 18.

(b) between 21 and 26.
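Software check for Problem A (a minimal Python sketch, not part of the original exam). It models the scores as a normal distribution with the stated mean 23.4 and standard deviation 4.8, using `statistics.NormalDist`. Hand solutions that round z to two decimals before using the table may differ slightly in the last digit.

```python
from statistics import NormalDist

# Score distribution as stated in the problem.
scores = NormalDist(mu=23.4, sigma=4.8)

# (a) P(X < 18)
p_less_than_18 = scores.cdf(18)

# (b) P(21 < X < 26) = P(X < 26) - P(X < 21)
p_between_21_and_26 = scores.cdf(26) - scores.cdf(21)

# z-scores, for comparison with a standard normal table
z_18 = (18 - 23.4) / 4.8
z_21 = (21 - 23.4) / 4.8
z_26 = (26 - 23.4) / 4.8

print(f"z(18) = {z_18:.4f}, P(X < 18) = {p_less_than_18:.4f}")
print(f"z(21) = {z_21:.4f}, z(26) = {z_26:.4f}, "
      f"P(21 < X < 26) = {p_between_21_and_26:.4f}")
```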

B. Given the following frequency distribution.

Class Interval Frequency


240 - 259 5
220 - 239 5
200 - 219 12
180 - 199 13
160 - 179 5
140 - 159 10

Compute the following:


(a) Mean
(b) Median
(c) Mode
(d) Standard Deviation
(e) Q1
(f) Q3

(g) D1
(h) D9
(i) P10
(j) P90
(k) Karl Pearson's Measure of Skewness
(l) Kurtosis
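Software check for part of Problem B (a minimal Python sketch, not part of the original exam). It covers the grouped mean, standard deviation, median, quartiles, deciles and percentiles; mode, skewness and kurtosis are left out. Assumptions made here: class boundaries lie 0.5 below each lower limit, the class width is 20, quantiles use the usual interpolation L + ((kN - cf)/f) * width, and the standard deviation uses the divisor N - 1. The helper names are hypothetical.

```python
import math

# (lower limit, upper limit, frequency), listed from lowest to highest class
classes = [
    (140, 159, 10),
    (160, 179, 5),
    (180, 199, 13),
    (200, 219, 12),
    (220, 239, 5),
    (240, 259, 5),
]

def grouped_mean(table):
    """Mean of grouped data using class midpoints."""
    total = sum(f for _, _, f in table)
    return sum(((lo + hi) / 2) * f for lo, hi, f in table) / total

def grouped_sd(table):
    """Standard deviation from midpoints; divisor N - 1 is an assumption."""
    total = sum(f for _, _, f in table)
    mean = grouped_mean(table)
    ss = sum(f * (((lo + hi) / 2) - mean) ** 2 for lo, hi, f in table)
    return math.sqrt(ss / (total - 1))

def grouped_quantile(table, fraction):
    """Interpolated quantile: L + ((fraction * N - cf) / f) * width."""
    total = sum(f for _, _, f in table)
    target = fraction * total
    cumulative = 0
    for lo, hi, f in table:
        if cumulative + f >= target:
            lower_boundary = lo - 0.5
            width = hi - lo + 1
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

mean = grouped_mean(classes)
sd = grouped_sd(classes)
median = grouped_quantile(classes, 0.50)
q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
p10 = grouped_quantile(classes, 0.10)   # same as D1
p90 = grouped_quantile(classes, 0.90)   # same as D9

print(f"mean = {mean:.2f}, SD = {sd:.2f}, median = {median:.2f}")
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, P10 = D1 = {p10:.2f}, P90 = D9 = {p90:.2f}")
```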
C. Construct a frequency distribution table.

No. of Children Frequency Percentage (%)


0
1
2
3
4
5
Total

(a) What percentage of couples married seven years have two children?

(b) What percentage of couples married seven years have at least two children?

Republic of the Philippines
Polytechnic University of the Philippines

STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION


FINAL EXAMINATION

Name: Course & Section:

Directions: Read each item carefully. For each item, write the letter corresponding to the best answer on yellow
paper. Write NONE if no correct choice is given. Make sure to also write your solutions.

1. Which of the following is an alternative hypothesis?


(a) There will be a significant difference between the length of time taken to complete a test online and the
time taken to complete a test on paper.
(b) There are no significant factors.
(c) There will be no difference between the length of time taken to complete tests online and tests completed
on paper, and if there is it is due to chance.
(d) None of the above
2. The alternative hypothesis of the F-test is _____.

(a) Equal variances assumed
(b) Equal variances not assumed
(c) Data follow a normal distribution
(d) Data do not follow a normal distribution

3. The two forms of t-tests are

(a) One-way and two-way
(b) Independent and dependent
(c) Chi-square - Independent
(d) Pearson r and chi-square

4. If a researcher conducts a study in which the reading ability of a class of 20 second graders is tested at the
beginning and at the end of the year, the appropriate statistical procedure to analyze the results would be

(a) One-way ANOVA
(b) Independent sample t-test
(c) Dependent sample t-test
(d) Pearson r

5. Suppose a researcher is conducting a study in which five groups of adults, each group having a distinct life
situation, are assessed on a measure of stress. The appropriate statistical procedure to compare the groups is
a(n)

(a) One-way ANOVA
(b) Independent sample t-test
(c) Dependent sample t-test
(d) Pearson r

6. When the value of the x variable increases and the value of the y variable also increases, this is known as _____.

(a) No Relationship
(b) Direct Relationship
(c) Inverse Relationship
(d) None of the above

7. If the computed correlation coefficient of two continuous variables is 0.967, then describe the relationship.
(a) Weak Negative and Inverse Relationship
(b) Strong Negative and Inverse Relationship
(c) Strong Positive and Direct Relationship
(d) Weak Positive and Direct Relationship
8. If the computed value for Pearson r is negative, this implies that there is a/an _____ relationship between
variables x and y.

(a) No Relationship
(b) Direct Relationship
(c) Inverse Relationship
(d) Undefined

9. You find children who take vitamins have higher health index scores than children who do not take vitamins
(p < 0.05). You have found that these two groups of children are
(a) significantly different
(b) different because of chance
(c) positively correlated
(d) negatively correlated
10. A conclusion in a research study on Science Teaching in selected Quezon City high schools states, “Most schools
lack adequate facilities.” Which of the following is a proper recommendation for this conclusion?
(a) School administrators should be pro-active and skillful in acquiring adequate facilities.
(b) School administrators should conduct Science achievement tests that are centralized and uniform.
(c) School administrators should hire more competent Science teachers for proper handling of the facilities.
(d) School administrators should work on the revision of the Science curricula so that lessons may adapt to
the facilities.
11. Which of the following is a positive correlation?
(a) Gas mileage decreases as vehicle weight increases
(b) As study time decreases, students achieve lower grades
(c) As levels of self-esteem decline, levels of depression increase
(d) People who exercise regularly are less likely to be obese
12. A friend of mine studies the effects of praise on happiness. She believes that children who receive praise are
happier overall than children who do not receive praise. She measures happiness by counting the number of
times a child smiles in a one-hour period. She knows that children in the population who do not receive praise
smile an average of 4 times per hour with a standard deviation of 0.5, and that these data are normally distributed.
She selects a sample of 100 children whom she knows receive praise and finds that they smile an average of 3.5
times per hour.
An appropriate null hypothesis for this study is:
(a) Children who receive praise smile more than children who do not.
(b) Children who receive praise smile the same amount as children who do not.
(c) Children who receive praise are happier than children who do not.
(d) Children who receive praise do not smile more than children who do not.
13. What is the criterion for rejecting the null hypothesis using the p-value approach?
(a) If the p-value is less than or equal to the level of significance, retain Ho; otherwise, reject Ho.
(b) If the p-value is less than or equal to the level of significance, reject Ho; otherwise, retain Ho.
(c) If the p-value is greater than or equal to the level of significance, reject Ho; otherwise, retain Ho.
(d) If the p-value is greater than or equal to the level of significance, retain Ho; otherwise, reject Ho.
14. The alternative hypothesis of the Shapiro-Wilk test is _____.

(a) Equal variances assumed
(b) Equal variances not assumed
(c) Data follow a normal distribution
(d) Data do not follow a normal distribution

15. An inspector needs to learn if customers are getting fewer ounces of a soft drink than the 28 ounces stated on
the label. After she collects data from a sample of bottles, she is going to conduct a test of a hypothesis. She
should use
(a) A two tailed test.
(b) A one tailed test with an alternative to the right.
(c) A one tailed test with an alternative to the left.
(d) Either a one or a two tailed test because they are equivalent.
16. A hypothesis test is done in which the alternative hypothesis is that more than 10% of a population is
left-handed. The computed p-value is 0.25. Which statement is correct?

(a) We can conclude that more than 10% of the population is left-handed.
(b) We can conclude that more than 25% of the population is left-handed.
(c) We can conclude that exactly 25% of the population is left-handed.
(d) We cannot conclude that more than 10% of the population is left-handed.

17. Suppose there is a negative correlation between the number of absences students have and their grades. What
can we conclude from this research finding?
(a) That being absent leads to lower grades
(b) That students who are absent more often are likely to have lower grades
(c) That low grades lead to people being absent
(d) That this is an illusory correlation
18. It is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of
one or more populations.

(a) Parametric Statistics
(b) Non-Parametric Statistics
(c) Hypothesis
(d) Hypothesis Testing

19. If the computed p-value is 0.0001 and the level of significance is 0.01, what do you think will be the decision
of the researcher?

(a) Reject Ho
(b) Fail to reject Ho
(c) Reject Ha
(d) Fail to reject Ha

20. Which of the following statistical tests is not used for testing significant difference?

(a) One-way ANOVA
(b) Independent sample t-test
(c) Dependent sample t-test
(d) Pearson r

Problem Solving

A. The ACT is a college entrance exam. ACT has determined that a score of 22 on the mathematics portion of
the ACT suggests that a student is ready for college-level mathematics. To achieve this goal, ACT recommends that
students take a core curriculum of math courses: Algebra I, Algebra II, and Geometry. Suppose a random sample
of 200 students who completed this core set of courses results in a mean ACT math score of 22.6 with a standard
deviation of 3.9. Do these results suggest that students who complete the core curriculum are ready for college-level
mathematics? That is, are they scoring above 22 on the math portion of the ACT?

1. State the appropriate null and alternative hypotheses.

2. If the p-value is 0.001, write your decision and conclusion.
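Software check for Problem A (a minimal Python sketch, not part of the original exam). It assumes a right-tailed one-sample test of Ho: mu = 22 against Ha: mu > 22, which follows from the question "are they scoring above 22". The test statistic is (sample mean - 22) / (s / sqrt(n)). The p-value below uses a standard normal approximation, an assumption that is reasonable for n = 200; item 2 supplies its own p-value, which should be used for that item.

```python
import math
from statistics import NormalDist

n = 200               # sample size
sample_mean = 22.6    # mean ACT math score in the sample
sample_sd = 3.9       # sample standard deviation
mu_0 = 22             # hypothesized mean under Ho

standard_error = sample_sd / math.sqrt(n)
test_statistic = (sample_mean - mu_0) / standard_error

# Right-tailed p-value using the normal approximation (assumption: large n).
p_value_approx = 1 - NormalDist().cdf(test_statistic)

alpha = 0.05          # assumed level of significance; the problem does not state one
decision = "Reject Ho" if p_value_approx <= alpha else "Fail to reject Ho"

print(f"standard error = {standard_error:.4f}")
print(f"test statistic = {test_statistic:.4f}")
print(f"approximate right-tailed p-value = {p_value_approx:.4f}")
print("decision at alpha = 0.05:", decision)
```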

B. A corporation owns a chain of several hundred gasoline stations on the eastern seaboard. The marketing
director wants to test a proposed marketing campaign by running ads on some local television stations and
determining whether gasoline sales at a sample of the company's stations increase after the advertising. The
following data represent gasoline sales for a day before and a day after the advertising campaign. Determine
whether sales increased significantly after the advertising campaign. Use an alpha of 0.05.

Station   Before    After

1         10,500    12,600
2          8,870    10,660
3         12,300    11,890
4         10,510    14,630
5          5,570     8,580
6          9,150    10,115
7         11,980    14,350
8          6,740     6,900
9          7,340     8,890
10        13,400    16,540
11        12,200    11,300
12        10,570    13,330
13         9,880     9,990
14        12,100    14,050
15         9,000     9,500
16        11,800    12,450
17        10,500    13,450

1. Step 1:

2. Step 2:

3. Step 3:
Check the assumptions.

4. Step 4:

5. Step 5:

6. Step 6:
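Software check for Problem B (a minimal Python sketch, not part of the original exam). It assumes a dependent (paired) sample t-test on the differences After - Before, right-tailed, with Ho: mean difference = 0 and Ha: mean difference > 0. The critical value 1.746 is the t-table value for 16 degrees of freedom at a one-tailed alpha of 0.05. The normality check on the differences (Step 3) is not performed here.

```python
import math
import statistics

before = [10500, 8870, 12300, 10510, 5570, 9150, 11980, 6740, 7340,
          13400, 12200, 10570, 9880, 12100, 9000, 11800, 10500]
after = [12600, 10660, 11890, 14630, 8580, 10115, 14350, 6900, 8890,
         16540, 11300, 13330, 9990, 14050, 9500, 12450, 13450]

# Paired differences: positive values mean sales went up after the campaign.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)       # sample SD of the differences
standard_error = sd_diff / math.sqrt(n)
t_statistic = mean_diff / standard_error
degrees_of_freedom = n - 1

critical_value = 1.746   # t table: df = 16, one-tailed alpha = 0.05
reject_null = t_statistic > critical_value

print(f"n = {n}, df = {degrees_of_freedom}")
print(f"mean difference = {mean_diff:.2f}, SD of differences = {sd_diff:.2f}")
print(f"t = {t_statistic:.3f}, critical value = {critical_value}")
print("Reject Ho" if reject_null else "Fail to reject Ho")
```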
