Statistical Analysis With Software Application
INTRODUCTION TO THE STATISTICAL
Statistics plays a major role in many aspects of our lives. It is used in sports, for example, to help a general manager decide which player might be the best fit for a team. It is used in politics to help candidates understand how the public feels about
• Determine the level of measurement of a variable.

Let’s break this definition into four parts. The first part states that statistics involves the collection of information. The second refers to the organization and summarization of information. The third states that the information is analyzed to draw conclusions or answer specific questions. The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
• Statistics is important because it enables people to make decisions based on empirical evidence.
• Statistics provides us with tools needed to convert massive data into pertinent information that can be used in decision making.
• Statistics can provide us information that we can use to make sensible decisions.

What information is referred to in the definition?

The information referred to in the definition is the data. According to the Merriam-Webster dictionary, data are “factual information used as a basis for reasoning, discussion, or calculation”.

Data can be numerical, as in height, or nonnumerical, as in gender. In either case, data describe characteristics of an individual.

Field of Statistics

A. Mathematical Statistics - The study and development of statistical theory and methods in the abstract.
B. Applied Statistics - The application of statistical methods to solve real problems involving randomly generated data and the development of new statistical methodology motivated by real problems. Example branches of Applied Statistics: psychometrics, econometrics, and biostatistics.

Limitation of Statistics

1. Statistics is not suitable to the study of qualitative phenomena.
2. Statistics does not study individuals.
4. Statistical tables may be misused.
5. Statistics is only one of the methods of studying a problem.

Definitions:

• Universe is the set of all entities under study.
• A Population is the total or entire group of individuals or observations from which information is desired by a researcher. Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.
• An individual is a person or object that is a member of the population being studied.
• A statistic is a numerical summary of a sample.
• Sample is the subset of the population.
• Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
• Inferential statistics uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result.
• A parameter is a numerical summary of a population.

Example: Consider the Scenario.

You are walking down the street and notice that a person walking in front of you drops PHP100. Nobody seems to notice the PHP100 except you. Since you could keep the money without anyone knowing, would you keep the money or return it to the owner?
In the PHP100 study presented, the population is all the students at the school. Each student is an individual. The sample is the 50 students selected to participate in the study.

Suppose 39 of the 50 students stated that they would return the money to the owner. We could present this result by saying that the percent of students in the survey who would return the money to the owner is 78%. This is an example of a descriptive statistic because it describes the results of the sample without making any general conclusions about the population. So 78% is a statistic because it is a numerical summary based on a sample. Descriptive statistics make it easier to get an overview of what the data are telling us.

If we extend the results of our sample to the population, we are performing inferential statistics. The generalization contains uncertainty because a sample cannot tell us everything about a population. Therefore, inferential statistics includes a level of confidence in the results. So rather than saying that 78% of all students would return the money, we might say that we are 95% confident that between 74% and 82% of all students would return the money. Notice how this inferential statement includes a level of confidence (measure of reliability) in our results. It also includes a range of values to

2. Collect the information needed to answer the questions.

Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.

Example:

A research objective is presented. For each research objective, identify the population and sample in the study.

1. The Philippine Mental Health Association contacts 1,028 teenagers who are 13 to 17 years of age and live in Antipolo City and asks whether or not they have been prescribed medications for any mental disorders, such as depression or anxiety.

Population: Teenagers 13 to 17 years of age who live in Antipolo City

Sample: The 1,028 teenagers 13 to 17 years of age who live in Antipolo City
2. A farmer wanted to learn about the weight of his soybean crop. He randomly sampled 100 plants and weighed the soybeans on each plant.

Population: Entire soybean crop

Sample: The 100 selected soybean plants

3. Organize and summarize the information.

Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.

4. Draw conclusion from the information.

In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.

Take Note!

If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

Example:

For each of the following statements, decide whether it belongs to the field of descriptive statistics or inferential statistics.

1. A badminton player wants to know his average score for the past 10 games. (Descriptive Statistics)
2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries. (Inferential Statistics)
3. Janine wants to determine the variability of her six exam scores in Algebra. (Descriptive Statistics)
4. A shipping company wishes to estimate the number of passengers traveling via their ships next year using their data on the number of passengers in the past three years. (Inferential Statistics)
5. A politician wants to determine the total number of votes his rival obtained in the past election based on his copies of the tally sheet of electoral returns. (Descriptive Statistics)

DISTINCTION BETWEEN QUALITATIVE AND QUANTITATIVE VARIABLES

Variables are the characteristics of the individuals within the population. For example, recently my mother and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My mom noted that the tomatoes had different weights even though they came from the same plant. She discovered that variables such as weight may vary.

If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability so that we can learn to grow plants that yield the best tomatoes.
It is helpful to divide variables into different types, as different statistical methods are applicable to each. The main division is into qualitative (or categorical) and quantitative (or numerical) variables.

Variables can be classified into two groups:

1. A qualitative (categorical) variable is a variable that yields categorical responses. Its value is a word or a code that represents a class or category.
2. A quantitative (numeric) variable takes on numerical values representing an amount or quantity.

Example:

Determine whether the following variables are qualitative or quantitative.

possible values. If you count to get the value of a quantitative variable, it is discrete.

2. A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. If you measure to get the value of a quantitative variable, it is continuous.

Example:

Determine whether the following quantitative variables are discrete or continuous.

1. The number of heads obtained after flipping a coin five times. (Discrete)
2. The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M. and 1:00 P.M. (Discrete)
- Food Preferences
- Stage of Disease
- Social Economic Class (First, Middle, Lower)
- Severity of Pain
Both interval and ratio data involve measurement. Most data analysis techniques that apply to ratio data also apply to interval data. Therefore, in most practical aspects, these types of data (interval and ratio) are grouped under metric data. In some other instances, these types of data are also known as numerical discrete and numerical continuous.

Example:

Categorize each of the following as nominal, ordinal, interval or ratio measurement.

1. Ranking of college athletic teams. (Ordinal)
2. Employee number. (Nominal)
3. Number of vehicles registered. (Ratio)
4. Brands of soft drinks. (Nominal)
5. Number of car passers along C5 on a given day. (Ratio)
6. Zip code (Nominal)
7. Degree of pain (Ordinal)

ACTIVITIES/ASSESSMENTS:

Read each item carefully. Write the answer on the yellow paper. Answers Only.

I. A research objective is presented. For each, identify the (A) population and (B) sample in the study.

1. A polling organization contacts 2141 male university graduates who have a white-collar job and asks whether or not they had received a raise at work during the past 4 months.

A. ______________________________
B. ______________________________

2. Every year the PSA releases the Current Population Report based on a survey of 50,000 households. The goal of this report is to learn the demographic characteristics, such as income, of all households within the Philippines.

A. ______________________________
B. ______________________________

3. Researchers want to determine whether or not higher folate intake is associated with a lower risk of hypertension (high blood pressure) in women (27 to 44 years of age). To make this determination, they look at 7373 cases of hypertension in these women and find that those who consume at least 1000 micrograms per day of total folate had a decreased risk of hypertension compared with those who consume less than 200.

A. ______________________________
B. ______________________________

II. Indicate whether the following statements require the use of descriptive or inferential statistics.

______________1. A teacher wants to know the attitudes of all students towards abortion.
______________2. A market analyst of a sales firm draws a chart showing the sales figures of a given product for the period 2006-2007.
______________3. A forecaster predicts the results of an election using the number of votes cast in 15 out of 25 barangays.
______________4. Men are better in math than women.
______________5. Forty percent of the employees of an organization were recorded tardy for at least 15 working days.

______________10. Brands of soft drinks
______________11. Socioeconomic status
7. Write special instructions for interviewers or respondents.

9. Always test your questions before taking the survey. (Pre-test)

An open-ended question is a type of question that does not include response categories. The respondent is not given any possible answers to choose from. This type of question is usually appropriate for collecting subjective data. It permits free responses that should be recorded in the respondent’s own words.

Question wording and question order have a large effect on the responses obtained.

Two surveys were taken in late 1993/early 1994 about Elvis Presley.

One survey asked: “In the past few years, there have been a lot of rumors and stories about whether Elvis Presley is really dead. How do you feel about this? Do you think there is any possibility that these rumors are true and that Elvis Presley is still alive, or don’t you think so?”

Second survey asked: “A recent television show examined various theories about Elvis Presley’s death. Do you think it is possible that Elvis is alive or not?”

8% of the respondents to the first question said it is possible that Elvis is still alive and 16% of respondents to the second question said it is possible that Elvis is still alive.

3. A focus group is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.

4. Experiment is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.

Bear in mind that the experimental method has several limitations that you should be aware of.
- Unrealistic Controlled Environments
- Inability to Control for All Variables

5. Observation is a technique that involves systematically selecting, watching and recording behaviors of people or other phenomena and aspects of the setting in which they occur, for the purpose of gaining specified information. It includes all methods from simple visual observations to the use of high level machines and measurements, sophisticated equipment or facilities such as:
- Radiographic
- Biochemical
- X-ray machines
- Microscope
- Clinical examinations
- Microbiological examinations
The secondary data can be collected by the following five methods:

1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of the government departments.
5. Information from official publications.

Take Note!

• Always investigate the validity and reliability of the data by examining the collection method employed by your source.

The sample size is typically denoted by n and it is always a positive integer. No exact sample size can be mentioned here and it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note!

- Representativeness, not size, is the more important consideration.
- Use no less than 30 subjects if possible.
- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).
SAMPLE SIZE

Three criteria need to be specified to determine the appropriate sample size:

1. Level of Precision

Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.

3. Degree of Variability

$$ n \ge \left( \frac{Z\sigma}{e} \right)^{2} $$

where:
Z is the z-score corresponding to level of confidence.
• Estimating Proportion (Infinite Population)

The sample size required to obtain a confidence interval for p with specified margin of error e is given by

$$ n \ge \left( \frac{Z}{e} \right)^{2} p(1 - p) $$

The conservative formula using the strong law of large numbers:

$$ n \ge \frac{1}{4} \left( \frac{Z}{e} \right)^{2} \approx 385 $$

Where:
Confidence level is 95%.

$$ n \ge \frac{N}{1 + Ne^{2}} $$

Where:

Example:
The researcher needs to survey 286 BS stat students.

• Finite Population Correction

If the population is small then the sample size can be reduced slightly:

$$ n \ge \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} $$

- It is important that the individuals included in a sample represent a cross section of individuals in the population.
- If a sample is not representative it is biased. You cannot generalize to the population from your statistical data.

Some definitions are needed to make the notion of a good sample more precise.
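The sample size formulas in this part can be sketched in Python as follows. This is only an illustration: the function names are made up for this sketch, and the inputs Z = 1.96, e = 0.05, N = 1000 and sigma = 15 are assumed values that the module does not state (they happen to reproduce the figures 385 and 286 quoted in the text).

```python
import math

def n_for_mean(z, sigma, e):
    """n >= (Z * sigma / e)^2, rounded up to the next whole unit."""
    return math.ceil((z * sigma / e) ** 2)

def n_for_proportion(z, e, p=0.5):
    """n >= (Z / e)^2 * p(1 - p); p = 0.5 gives the conservative (1/4)(Z/e)^2."""
    return math.ceil((z / e) ** 2 * p * (1 - p))

def n_from_population(N, e):
    """n >= N / (1 + N e^2)."""
    return math.ceil(N / (1 + N * e ** 2))

def finite_population_correction(n0, N):
    """n >= n0 / (1 + (n0 - 1) / N)."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

# Assumed inputs (not given in the module): 95% confidence, 5% margin of error
Z, e = 1.96, 0.05
print(n_for_proportion(Z, e))                 # conservative sample size
print(n_from_population(1000, e))             # assumed population of 1000
print(finite_population_correction(385, 1000))
print(n_for_mean(Z, sigma=15, e=3))           # assumed sigma and margin
```

Rounding up with `math.ceil` reflects the "n ≥" in each formula: a fractional result means the next whole respondent is needed.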
Definitions:

• Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.
• Target population - The complete collection of observations we want to study.
• Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.
• Sample - A subset of a population.
• Sampling unit - A unit that can be selected for a sample. We may want to study individuals, but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.
• Sampling frame - A list, map, or other specification of sampling units in the population from which a sample may be selected. For a survey using in-person interviews, the sampling frame might be a list of all street addresses.
• Sampling technique/Sampling Strategies - It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.
• Sampling Bias - This involves problems in your sampling, which reveals that your sample is not representative of your population.

The following examples indicate some ways in which selection bias can occur:

- Deliberately or purposively selecting a “representative” sample.
- Misspecifying the target population.
- Failing to include all of the target population in the sampling frame, called undercoverage.
- Including population units in the sampling frame that are not in the target population, called overcoverage.
- Having multiplicity of listings in the sampling frame.
- Substituting a convenient member of a population for a designated member who is not readily available.
- Failing to obtain responses from all of the chosen sample. (Nonresponse)
- Allowing the sample to consist entirely of volunteers.

Advantage of Sampling Over Complete Enumeration

- Less Labor
- Reduced Cost
- Greater Speed
- Greater Scope
- Greater Efficiency and Accuracy
- Convenience
- Ethical Considerations

Two Types of Samples

1. Probability Sample

- Samples are obtained using some objective chance mechanism, thus involving randomization.
- They require the use of a complete listing of the elements of the universe called the sampling frame.
- The probabilities of selection are known.

- Most basic method of drawing a probability sample.
- Assigns equal probabilities of selection to each possible sample.
Sampling Procedure

$$ k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}} $$
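The ratio k = N/n is commonly used as the sampling interval in systematic sampling. Assuming that is the procedure meant here, a minimal Python sketch picks a random start among the first k units and then takes every k-th unit. The function name, the numbered population of 500 and the seed are illustrative assumptions, not part of the module.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select n units by taking every k-th unit after a random start."""
    N = len(population)
    k = N // n                      # sampling interval k = N / n (integer part)
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start among the first k units (0-based)
    return [population[start + j * k] for j in range(n)]

# Hypothetical frame: units numbered 1 to 500, sample of 50, so k = 10
frame = list(range(1, 501))
sample = systematic_sample(frame, n=50, seed=1)
print(sample[:5])
```

When N is not an exact multiple of n, the integer interval leaves a few units at the end of the list that can never be selected; other variants handle that case differently.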
Given:

$$ n_1 = N_1 \left( \frac{n}{N} \right) = 200 \left( \frac{50}{500} \right) = 20 $$

$$ n_2 = N_2 \left( \frac{n}{N} \right) = 300 \left( \frac{50}{500} \right) = 30 $$
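The allocation above gives each group a share of the sample proportional to its size. A short Python sketch of that arithmetic, with a hypothetical function name, reproduces the two values:

```python
def proportional_allocation(group_sizes, n):
    """Allocate a total sample of n across groups in proportion to their sizes."""
    N = sum(group_sizes)
    return [round(N_h * n / N) for N_h in group_sizes]

# Values from the computation above: N1 = 200, N2 = 300, n = 50
allocation = proportional_allocation([200, 300], 50)
print(allocation)
```

In general the rounded shares may not add up to exactly n, so an adjustment step would be needed; that is outside what the text covers.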
Example:
Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households distantly apart.

1. Organize the sampling process into stages where the unit of analysis is systematically grouped.

Example:
Use probability sampling if the main objective of the sample survey is making inferences about the characteristics of the population under study.

• Judgement Sampling - selects sample in accordance with an expert’s judgment.

Cases wherein Non-Probability Sampling is Useful

- Only few are willing to be interviewed
- Extreme difficulties in locating or identifying subjects

• Purposive Sampling - It is based on certain criteria laid down by the researcher. People who satisfy the criteria are interviewed. It is used to determine the target population of those who will be taken for the study.

ACTIVITIES/ASSESSMENTS:

I. Determine if the source would be a primary or a secondary source.

______________1. Government Records
______________2. Dictionary
______________3. Artifact
REFERENCES:
https://data36.com/statistical-bias-types-explained/
MODULE 3: DESCRIPTIVE STATISTICS
OBJECTIVES:
After successful completion of this module, you should be
able to:
✦ Distinguish the three main forms of data presentation.
✦ Know the different parts of the table.
✦ Choose appropriate diagrams/graphs to present a given set of
data.
✦ Organize qualitative and quantitative data in tables.
✦ Compute measures of central tendency, measures of variation and
measures of relative position of grouped and ungrouped data.
✦ Describe the shape of a distribution.
✦ Identify regions under the normal curve corresponding to
different standard normal values.
✦ Compute probabilities using the standard normal table and Excel.
Polytechnic University of the Philippines
College of Science
Department of Mathematics and Statistics
Data Presentation
Data are usually collected in a raw format and thus
the inherent information is difficult to understand.
Therefore, raw data need to be summarized,
processed, and analyzed to usefully derive
information from them. However, no matter how well
manipulated, the information derived from the raw
data should be presented in an effective format,
otherwise, it would be a great loss for both authors
and readers. Planning how the data will be presented
is essential before appropriately processing raw data.
Presentation of Data
Presentation of data refers to an exhibition
or putting up data in an attractive and useful
manner such that it can be easily interpreted.
The three main forms of presentation of data
are:
Textual Presentation
Tabular Presentation
Graphical Presentation
Textual Presentation
• All the data is presented in the form of text,
phrases, or paragraphs.
• It involves enumerating important
characteristics, emphasizing significant figures
and identifying important features of data.
• Text is the principal method for explaining
findings, outlining trends, and providing
contextual information.
Example:
A researcher is asked to present the performance of a section in
the statistics test. The following are the test scores:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46
The data presented in textual form would be like this:
In the statistics class of 40 students, 3 obtained the perfect
score of 50. Sixteen students got a score of 40 and above,
while only 3 got 19 and below. Generally, the students
performed well in the test, with 23 students (57.5%) getting a
passing score of 38 and above.
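The figures used in a textual presentation can be checked directly against the raw scores. The following Python sketch recounts each quantity from the 40 test scores listed in the example; the variable names are only illustrative.

```python
scores = [
    34, 42, 20, 50, 17,  9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]

n = len(scores)                              # number of students
perfect = sum(s == 50 for s in scores)       # perfect score of 50
forty_up = sum(s >= 40 for s in scores)      # 40 and above
low = sum(s <= 19 for s in scores)           # 19 and below
passing = sum(s >= 38 for s in scores)       # passing score of 38 and above
passing_pct = 100 * passing / n

print(n, perfect, forty_up, low, passing, passing_pct)
```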
Remember!
✦ Keep your paragraphs simple and short.
Advantage of Tabular
Presentation
✦ More information may be presented.
✦ Exact values can be read from a table to
retain precision.
✦ Flexibility is maintained without
distortion of data.
✦ Less work and less cost are required in
the preparation.
Preparing Tables
The making of a compact table itself is an art. This should
contain all the information needed within the smallest possible
space. What the purpose of tabulation is and how the tabulated
information is to be used are the main points to be kept in mind
while preparing a statistical table. An ideal table should
consist of the following main parts:
A. Title: The title must tell as simply as possible what is in the
table. It should answer the questions:
✦ Who? White females with breast cancer, black males with
lung cancer.
✦ What are the data? Counts, percentage distributions, rates.
https://byjus.com/commerce/tabular-presentation-of-data/
Solution:
To answer this question we need to construct a frequency
distribution to determine how many female and male
respondents participated in the study.
Procedure in Constructing
Frequency Table
✦ If the data is in the form of qualitative data
To construct the frequency distribution using
Excel, use the command:
=FREQUENCY(data_array,bins_array)
Then press Ctrl + Shift + Enter to enter it as an array formula:
{=FREQUENCY(data_array,bins_array)}
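For readers working outside Excel, the same kind of tally for qualitative data can be sketched in Python with `collections.Counter`. The list of responses below is hypothetical, since the respondents' data are not reproduced in this module.

```python
from collections import Counter

# Hypothetical sex of eight respondents (not the module's data)
responses = ["Female", "Male", "Female", "Female",
             "Male", "Female", "Male", "Female"]

counts = Counter(responses)            # frequency of each category
total = sum(counts.values())

# category -> (frequency, percentage)
table = {cat: (c, 100 * c / total) for cat, c in counts.items()}

for cat, (freq, pct) in table.items():
    print(f"{cat:<8} {freq:>3} {pct:6.1f}%")
print(f"{'Total':<8} {total:>3}")
```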
Procedure in Constructing
Frequency Table
✦If the data is in the form of quantitative data
Steps
1. Set an interval or range for your data. It is
needed for the “BIN RANGE”.
2. Click “DATA” on the menu bar and Click
“DATA ANALYSIS” on the tool bar
3. The dialog box “DATA ANALYSIS” will appear
and choose “HISTOGRAM” on the dialog box
then click OK.
4. Highlight your data for the “INPUT RANGE”.
5. Highlight your bin values for the “BIN RANGE”.
6. Click the box of “LABELS IN FIRST ROW”
then click “OK”.
7. The result will appear on the new worksheet of
the excel file. Get the Percentage and total.
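The steps above can be mirrored in Python. This sketch treats each bin value as an inclusive upper limit, which is how the Excel bin range is usually described, and adds a final "More" row for values above the last bin. It reuses the 40 test scores from the textual presentation example; the bin limits 19, 29, 39, 49 and 59 are an assumption made for illustration.

```python
from bisect import bisect_left

def frequency_table(data, bins):
    """Count values per bin; each bin value is an inclusive upper limit.

    Returns len(bins) + 1 counts; the last one holds values above the final bin.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts

scores = [
    34, 42, 20, 50, 17,  9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]
bins = [19, 29, 39, 49, 59]            # assumed upper class limits (sorted)

counts = frequency_table(scores, bins)
total = sum(counts)
percentages = [100 * c / total for c in counts]

for upper, c, pct in zip(bins + ["More"], counts, percentages):
    print(f"{upper!s:>5} {c:>3} {pct:6.1f}%")
print(f"Total {total:>3}")
```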
Answer:
✦ Useless Information – Don’t show decimals if they are not
needed.
✦ Poor Alignment – Make sure alignment makes sense.
• Don’t center numbers, always right justify – try to align
decimal points.
• Consider the appropriate placement of row titles.
✦ Difficult to Read – Use commas when the number exceeds
a thousand.
Graphical Presentation
✦ A graph is a very effective visual tool as it displays data at
a glance, facilitates comparison, and can reveal trends and
relationships within the data such as changes over time,
and correlation or relative share of a whole.
✦ It is considered an important medium of communication
because we are able to create a pictorial representation of
the numerical figures.
✦ Suited when we need to show the results of the study to
nonprofessionals and/or people who dislike numbers and
overly lengthy text.
Bar Graph
✦ It is constructed by labeling each category
of data on either the horizontal or vertical
axis and the frequency or relative frequency
of the category on the other axis. Rectangles
of equal width are drawn for each category.
The height of each rectangle represents the
category’s frequency or relative frequency.
✦ It is used to organize discrete data.
Remember!
• Bar graphs may also be drawn with horizontal
bars. Horizontal bars are preferable when
category names are lengthy.
• In bar graphs, the order of the categories does
not usually matter. However, bar graphs that
have categories arranged in decreasing order
of frequency help prioritize categories for
decision-making purposes in areas such as
quality control, human resources, and
marketing.
Histogram
✦ It is constructed by drawing rectangles for each class of
data. The height of each rectangle is the frequency or
relative frequency of the class. The width of each rectangle
is the same and the rectangles touch each other.
✦ It is a graph used to present quantitative data and is similar to
the bar graph.
✦ It is used to organize continuous data.
Line Graph
✦ A graph that shows information that is
connected in some way (such as change over
time)
✦ Line segments are then drawn connecting the
points. It is used to organize continuous data.
✦ Very useful in identifying trends in the data
over time.
✦ It is rigidly defined.
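Sample Mean

Ungrouped data:
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$
where:
x_i = data values
n = no. of sample observations

Grouped data:
$$ \bar{x} = \frac{\sum_{i=1}^{r} f x_i}{n} $$
where:
x_i = data values
f = frequency
n = no. of sample observations

Population Mean

Ungrouped data:
$$ \mu = \frac{\sum_{i=1}^{N} x_i}{N} $$
where:
x_i = data values
N = no. of observations

Grouped data:
$$ \mu = \frac{\sum_{i=1}^{r} f x_i}{N} $$
where:
x_i = data values
f = frequency
N = no. of observations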
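As a quick illustration of the two forms of the mean, the sketch below computes the mean from raw values and from a list of distinct values with frequencies, and shows that the two agree. The function names and the small data set are made up for this example.

```python
def mean_ungrouped(values):
    """Sum of the data values divided by the number of observations."""
    return sum(values) / len(values)

def mean_grouped(values, freqs):
    """Sum of f * x over the distinct values, divided by n = sum of f."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

raw = [2, 4, 4, 6]                 # hypothetical raw data
distinct = [2, 4, 6]               # the same data as distinct values ...
freqs = [1, 2, 1]                  # ... with their frequencies

print(mean_ungrouped(raw))
print(mean_grouped(distinct, freqs))
```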
1. Arrange the data from lowest to highest (or highest to lowest).
2. For an odd number of data, the median of a data set is the “middle observation”. When the number of data is even, the median is the “average of the two middle scores”.

$$ \tilde{x} = LB + \left( \frac{\frac{n}{2} - {<}cf}{f} \right) i $$

where:
LB = lower boundary of the median class
i = class width
n = no. of observations
< cf = less than cumulative frequency of the class preceding the median class
f = frequency of the median class
Measures of Central Tendency:
MODE
• It is the most frequently occurring value in a list of data.
• It is sometimes called nominal average.
• It is an appropriate measure of average for data using the
nominal scale of measurement.
• It is the only measure of central tendency used in both
quantitative and qualitative data.
Advantage of Mode
✦ The mode is easy to understand.
✦ Like the median, it is not greatly affected by extreme
values.
✦ Like the median, it can be computed even when the
frequency distribution contains “open-ended” intervals.
1. Obtain a frequency distribution of the distinct values of the data.
2. The mode is the most frequently occurring data (if there is one).

$$ \hat{x} = LB + \left( \frac{d_1}{d_1 + d_2} \right) i $$

where:
LB = lower boundary of the modal class
i = class width
d1 = difference between the frequency of the modal class and the class preceding it
d2 = difference between the frequency of the modal class and the class following it
Remember!
• Whenever you hear the word average, be aware that
the word may not always be referring to the mean.
One average could be used to support one position,
while another average could be used to support a
different position.
• Mode is not always present in the data sets unlike
mean and median.
Notice how the mean of the second data set has been
influenced by the presence of an unusual case/outlier in the
data set. If we were to say the mean is equal to 132.5 for the
second data set and it represents a typical case, this will not
make much sense because the majority of data values are less
than 120. Therefore, the mean should not be used when
unusual, or outlying, data values are present in the data set, as
the mean tends to be extremely sensitive to the unusual
values. Rather, the median should be reported in this case.
Solution:
To compute the mean of grouped data, first you need to
fill out this table.

Class Interval   Frequency (f)   x     fx
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =                   $\sum_{i=1}^{7} f x_i =$

x is the midpoint of every class interval. To compute this:

$$ x = \frac{LC + UP}{2} $$

Ex:
$$ x = \frac{55 + 59}{2} = 57 $$
$$ x = \frac{50 + 54}{2} = 52 $$
Solution:

Class Interval   Frequency (f)   x    fx
55 - 59          3               57   171
50 - 54          6               52   312
45 - 49          7               47   329
40 - 44          9               42   378
35 - 39          6               37   222
30 - 34          4               32   128
25 - 29          5               27   135
Total            n = 40               $\sum_{i=1}^{7} f x_i = 1,675$

$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1,675}{40} = 41.88$
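The grouped-mean computation can be sketched in Python as a quick check. This is a minimal illustration, not part of the original slides; the function name `grouped_mean` and the tuple layout are assumptions made for the example.

```python
# Classes from the worked example: (lower class limit, upper class limit, frequency).
classes = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
           (35, 39, 6), (30, 34, 4), (25, 29, 5)]

def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint, divided by n."""
    n = sum(f for _, _, f in classes)
    # The midpoint of each class is (lower limit + upper limit) / 2.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / n

mean = grouped_mean(classes)
print(mean)
```

The unrounded result is 1,675 / 40, which the table above reports as 41.88 after rounding to two decimals.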
Solution:

Class Interval   f   LB     < cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40

First, compute $\frac{n}{2}$; it will help us to determine the median class and the < cf.
$\frac{n}{2} = \frac{40}{2} = 20$
The median class is the class containing the 20th item. Hence, the
median class is 40 - 44.

$\tilde{x} = LB + \frac{\left( \frac{n}{2} - {<}cf \right) i}{f}$
$\tilde{x} = 39.5 + \frac{(20 - 15)5}{9} = 42.28$
$\hat{x} = LB + \left( \frac{d_1}{d_1 + d_2} \right) i$
$d_1 = 9 - 6 = 3$
$d_2 = 9 - 7 = 2$
$\hat{x} = 39.5 + \left( \frac{3}{3 + 2} \right) 5 = 42.5$
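The grouped median and mode formulas can be checked with a short Python sketch. The helper names are hypothetical, the classes are listed from lowest to highest, and the class width is assumed to be upper limit minus lower limit plus 1 (integer class limits, as in this example).

```python
# Classes from lowest to highest: (lower class limit, upper class limit, frequency).
classes = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
           (45, 49, 7), (50, 54, 6), (55, 59, 3)]

def grouped_median(classes):
    """Median = LB + ((n/2 - <cf) * i) / f for the median class."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # <cf of the classes before the current one
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= n / 2:
            width = upper - lower + 1
            return (lower - 0.5) + (n / 2 - cumulative) * width / f
        cumulative += f
    raise ValueError("no median class found")

def grouped_mode(classes):
    """Mode = LB + (d1 / (d1 + d2)) * i for the modal class.

    Assumes a single modal class that is strictly higher than
    at least one of its neighbours, so d1 + d2 is not zero.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))
    lower, upper, f = classes[k]
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1, d2 = f - before, f - after
    return (lower - 0.5) + d1 / (d1 + d2) * (upper - lower + 1)

median = grouped_median(classes)
mode = grouped_mode(classes)
print(round(median, 2), round(mode, 2))
```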
Quartiles - split the ordered data into four quarters.
Percentiles - split the ordered data into 100 equal parts.
Formula for Quartile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$Q_{class} = \frac{nk}{4} + 0.5$
2. If the resulting positioning point is an integer, the particular
numerical observation corresponding to that point is chosen for
the quartile. If not, use interpolation.
✦ For Grouped Data
$Q_k = LB + \frac{\left( \frac{nk}{4} - {<}cf \right) i}{f}$
where:
LB = lower boundary of the quartile class
i = class width
n = no. of observations
k = quartile position
< cf = less than cumulative frequency of the class preceding the quartile class
f = frequency of the quartile class
Formula for Decile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$D_{class} = \frac{nk}{10} + 0.5$
2. If the resulting positioning point is an integer, the particular
numerical observation corresponding to that point is chosen for
the decile. If not, use interpolation.
✦ For Grouped Data
$D_k = LB + \frac{\left( \frac{nk}{10} - {<}cf \right) i}{f}$
where:
LB = lower boundary of the decile class
i = class width
n = no. of observations
k = decile position
< cf = less than cumulative frequency of the class preceding the decile class
f = frequency of the decile class
Formula for Percentile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$P_{class} = \frac{nk}{100} + 0.5$
2. If the resulting positioning point is an integer, the particular
numerical observation corresponding to that point is chosen for
the percentile. If not, use interpolation.
✦ For Grouped Data
$P_k = LB + \frac{\left( \frac{nk}{100} - {<}cf \right) i}{f}$
where:
LB = lower boundary of the percentile class
i = class width
n = no. of observations
k = percentile position
< cf = less than cumulative frequency of the class preceding the percentile class
f = frequency of the percentile class
Example 1:
The data given below are the total number of hours
lost due to tardiness and absences of employees in a
company in a given year. Find Q3, D4 and P55.

Month       Hours Lost (x)
January     55
February    23
March       37
April       37
May         48
June        42
July        27
August      20
September   30
October     32
November    24
December    40
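Solution: To compute Q3 and D4 of ungrouped data:
1. Arrange the data from lowest to highest and compute the position.
$Q_{class} = \frac{(12)(3)}{4} + 0.5 = 9.5$
2. Use interpolation since the computed Qclass is not an integer.
Data:     20 23 24 27 30 32 37 37 40 42 48 55
Position: 1 2 3 4 5 6 7 8 9 10 11 12
Q3 = 40 + 0.5(42 − 40)
= 41
For D4, the position is $D_{class} = \frac{(12)(4)}{10} + 0.5 = 5.3$, so
D4 = 30 + 0.3(32 − 30)
= 30.6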
Solution: To compute P55 of ungrouped data:
$P_{class} = \frac{(12)(55)}{100} + 0.5 = 7.1$
2. Use interpolation since the computed Pclass is not an integer.
Data:     20 23 24 27 30 32 37 37 40 42 48 55
Position: 1 2 3 4 5 6 7 8 9 10 11 12
P55 = 37 + 0.1(37 − 37)
= 37
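The positioning-point rule for ungrouped data (nk divided by 4, 10, or 100, plus 0.5, then interpolation) can be sketched as one Python function. The name `positional_quantile` and the `parts` parameter are assumptions for this illustration; statistical packages often use slightly different position rules, so their results can differ.

```python
def positional_quantile(data, k, parts):
    """k-th quantile of `parts` equal parts (4, 10, or 100).

    The position is n*k/parts + 0.5 (1-based); a fractional
    position is resolved by linear interpolation.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = n * k / parts + 0.5
    if position <= 1:          # guard: position before the first item
        return ordered[0]
    if position >= n:          # guard: position at or past the last item
        return ordered[-1]
    index = int(position)      # 1-based position of the lower neighbour
    fraction = position - index
    low, high = ordered[index - 1], ordered[index]
    return low + fraction * (high - low)

hours_lost = [55, 23, 37, 37, 48, 42, 27, 20, 30, 32, 24, 40]
q3 = positional_quantile(hours_lost, 3, 4)
d4 = positional_quantile(hours_lost, 4, 10)
p55 = positional_quantile(hours_lost, 55, 100)
print(round(q3, 2), round(d4, 2), round(p55, 2))
```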
Example 2:
The data given below are the ages of the residents in
Barangay 634, Sta. Mesa, Manila. Compute Q1, D7, and
P10.

Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:
To compute Q1, D7, and P10 of grouped data, first you
need to fill out this table.

Class Interval   f   LB   < cf
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =

To compute the lower boundary, always subtract 0.5 from the lower
class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5
45 − 0.5 = 44.5
Solution:

Class Interval   f   LB     < cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40

First, compute $\frac{nk}{4}$; it will help us to determine the quartile class and the < cf.
$\frac{nk}{4} = \frac{(40)(1)}{4} = 10$
The quartile class is the class containing the 10th item. Hence, the
quartile class is 35 - 39.

$Q_k = LB + \frac{\left( \frac{nk}{4} - {<}cf \right) i}{f}$
$Q_1 = 34.5 + \frac{(10 - 9)5}{6} = 35.33$
Solution:

Class Interval   f   LB     < cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40

First, compute $\frac{nk}{10}$; it will help us to determine the decile class and the < cf.
$\frac{nk}{10} = \frac{(40)(7)}{10} = 28$
The decile class is the class containing the 28th item. Hence, the
decile class is 45 - 49.

$D_k = LB + \frac{\left( \frac{nk}{10} - {<}cf \right) i}{f}$
$D_7 = 44.5 + \frac{(28 - 24)5}{7} = 47.36$
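$P_k = LB + \frac{\left( \frac{nk}{100} - {<}cf \right) i}{f}$
$\frac{nk}{100} = \frac{(40)(10)}{100} = 4$
The percentile class is the class containing the 4th item. Hence, the
percentile class is 25 - 29, and no class precedes it, so < cf = 0.
$P_{10} = 24.5 + \frac{(4 - 0)5}{5} = 28.5$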
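Because the quartile, decile, and percentile formulas for grouped data differ only in the divisor (4, 10, or 100), one function can cover all three. The sketch below is illustrative; the name `grouped_quantile` is hypothetical, classes are listed from lowest to highest, and integer class limits are assumed so that the class width is upper minus lower plus 1.

```python
# Classes from lowest to highest: (lower class limit, upper class limit, frequency).
ages = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
        (45, 49, 7), (50, 54, 6), (55, 59, 3)]

def grouped_quantile(classes, k, parts):
    """LB + ((n*k/parts - <cf) * i) / f for the class holding item n*k/parts."""
    n = sum(f for _, _, f in classes)
    target = n * k / parts
    cumulative = 0  # <cf: cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            width = upper - lower + 1
            return (lower - 0.5) + (target - cumulative) * width / f
        cumulative += f
    raise ValueError("quantile class not found")

q1 = grouped_quantile(ages, 1, 4)
d7 = grouped_quantile(ages, 7, 10)
p10 = grouped_quantile(ages, 10, 100)
print(round(q1, 2), round(d7, 2), round(p10, 2))
```

Calling `grouped_quantile(ages, 2, 4)` gives the grouped median, since Q2, D5, and P50 all use the same target item n/2.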
Example 3:
The ages of the townspeople in a certain community
are as follows. Compute Q2, D5, and P50.

Class Interval   Frequency
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3
Solution:
To compute Q2, D5, and P50 of grouped data, first you
need to fill out this table.

Class Interval   f    LB   < cf
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3
Total            n =

To compute the lower boundary, always subtract 0.5 from the lower
class limit (LC).
Ex:
18 − 0.5 = 17.5
25 − 0.5 = 24.5
32 − 0.5 = 31.5
Solution:

Class Interval   f    LB     < cf
18 - 24          28   17.5   28
25 - 31          54   24.5   82
32 - 38          38   31.5   120
39 - 45          20   38.5   140
46 - 52          17   45.5   157
53 - 59          3    52.5   160
Total            n = 160

First, compute $\frac{nk}{4}$; it will help us to determine the quartile class and the < cf.
$\frac{nk}{4} = \frac{(160)(2)}{4} = 80$
The quartile class is the class containing the 80th item. Hence, the
quartile class is 25 - 31.

$Q_k = LB + \frac{\left( \frac{nk}{4} - {<}cf \right) i}{f}$
$Q_2 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
Solution:

Class Interval   f    LB     < cf
18 - 24          28   17.5   28
25 - 31          54   24.5   82
32 - 38          38   31.5   120
39 - 45          20   38.5   140
46 - 52          17   45.5   157
53 - 59          3    52.5   160
Total            n = 160

First, compute $\frac{nk}{10}$; it will help us to determine the decile class and the < cf.
$\frac{nk}{10} = \frac{(160)(5)}{10} = 80$
The decile class is the class containing the 80th item. Hence, the
decile class is 25 - 31.

$D_k = LB + \frac{\left( \frac{nk}{10} - {<}cf \right) i}{f}$
$D_5 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
$P_k = LB + \frac{\left( \frac{nk}{100} - {<}cf \right) i}{f}$
$P_{50} = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
Sample Interpretation:
1. Jennifer just received the results of her SAT exam. Her
SAT Mathematics score of 600 is in the 74th percentile. What
does this mean?
A percentile rank of 74% means that 74% of SAT
Mathematics scores are less than or equal to 600 and 26%
of the scores are greater. So 26% of the students who took
the exam scored better than Jennifer.
Measures of Dispersion/Variability
Based on the figures below, determine which of the
two scatter diagrams illustrates larger variability.
[Figures: scatter diagrams labeled Figure 1 and Figure 2]
Measures of Dispersion/Variability:
STANDARD DEVIATION
• It is a measure of how far away items in a data set are from
the mean.
• The larger the standard deviation, the more variation there
is in the data set.
• The standard deviation can never be a negative number,
due to the way it’s calculated and the fact that it measures a
distance (distances are never negative numbers).
• The smallest possible value for the standard deviation is 0,
and that happens only in contrived situations where every
single number in the data set is exactly the same (no
deviation).
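Sample Standard Deviation
Ungrouped data:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
where:
xi = data values
x̄ = mean
n = no. of sample observations
Grouped data:
$s = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \bar{x})^2}{n - 1}}$
where:
xi = data values
x̄ = mean
f = frequency
n = no. of sample observations

Population Standard Deviation
Ungrouped data:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
where:
xi = data values
μ = mean
N = no. of observations
Grouped data:
$\sigma = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \mu)^2}{N}}$
where:
xi = data values
μ = mean
f = frequency
N = no. of observations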
Measures of Dispersion/Variability:
VARIANCE
It takes all data points in a set into account and is calculated
by averaging the squared deviations of the data points from the mean.
Example 1:
The data given below are the ages of the residents in
Barangay 634, Sta. Mesa, Manila. Compute the sample
standard deviation and sample variance.

Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:

Class Interval   f   x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3   57   171   228.61
50 - 54          6   52   312   102.41
45 - 49          7   47   329   26.21
40 - 44          9   42   378   0.01
35 - 39          6   37   222   23.81
30 - 34          4   32   128   97.61
25 - 29          5   27   135   221.41
Total            n = 40   $\sum_{i=1}^{7} f x_i = 1,675$   $\sum_{i=1}^{7} f (x_i - \bar{x})^2 =$
Solution:

Class Interval   f   x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3   57   171   228.61      685.83
50 - 54          6   52   312   102.41      614.46
45 - 49          7   47   329   26.21       183.47
40 - 44          9   42   378   0.01        0.09
35 - 39          6   37   222   23.81       142.86
30 - 34          4   32   128   97.61       390.44
25 - 29          5   27   135   221.41      1107.05
Total            n = 40   $\sum_{i=1}^{7} f x_i = 1,675$   $\sum_{i=1}^{7} f (x_i - \bar{x})^2 = 3,124.20$

$f(x_1 - \bar{x})^2 = 3(228.61) = 685.83$
$f(x_2 - \bar{x})^2 = 6(102.41) = 614.46$
$f(x_3 - \bar{x})^2 = 7(26.21) = 183.47$
Solution:

Class Interval   (xi − x̄)²   f(xi − x̄)²
55 - 59          228.61      685.83
50 - 54          102.41      614.46
45 - 49          26.21       183.47
40 - 44          0.01        0.09
35 - 39          23.81       142.86
30 - 34          97.61       390.44
25 - 29          221.41      1107.05
Total            $\sum_{i=1}^{7} f (x_i - \bar{x})^2 = 3,124.20$

$s = \sqrt{\frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3,124.20}{40 - 1}} = 8.95$

$s^2 = \frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1} = \frac{3,124.20}{40 - 1} = 80.11$
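The grouped sample variance and standard deviation can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the worked solution (class midpoints stand in for the data values); the function name is hypothetical. It uses the unrounded mean 41.875, so the sum of squares comes out near 3,124.38 rather than 3,124.20, but the final answers agree to two decimals.

```python
import math

# (lower class limit, upper class limit, frequency)
classes = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
           (35, 39, 6), (30, 34, 4), (25, 29, 5)]

def grouped_sample_variance(classes):
    """s^2 = sum of f * (midpoint - mean)^2, divided by n - 1."""
    n = sum(f for _, _, f in classes)
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    sum_sq = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes)
    return sum_sq / (n - 1)

variance = grouped_sample_variance(classes)
std_dev = math.sqrt(variance)  # the standard deviation is the square root of the variance
print(round(variance, 2), round(std_dev, 2))
```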
Shape of Distribution
These two statistics give you insights into the shape of
the distribution.
✦ Skewness is the degree of distortion from the
symmetrical bell curve or the normal distribution. It
measures the lack of symmetry in data distribution.
✦ Kurtosis is a measure of the combined sizes of the
two tails relative to a standard bell curve. It is often
described as how tall and sharp the central peak is, but
the tails account for most of its value.
$Sk = \frac{3(\bar{x} - \tilde{x})}{s}$
where:
x̄ is the mean
x̃ is the median
s is the sample standard deviation
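As a quick illustration, the skewness formula can be applied to the grouped age data used earlier (mean 41.88, median 42.28, sample standard deviation 8.95). The function name below is an assumption made for this sketch.

```python
def pearson_skewness(mean, median, std_dev):
    """Sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev

# Values taken from the grouped-data examples in this module.
sk = pearson_skewness(41.88, 42.28, 8.95)
print(round(sk, 3))
```

A value close to zero, as here, suggests an approximately symmetric distribution; the negative sign comes from the mean lying slightly below the median.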
Kurtosis
It is actually the measure of outliers present in the
distribution. The outliers in a sample, therefore, have
even more effect on the kurtosis than they do on the
skewness.
Higher kurtosis means more of the variance is the
result of infrequent extreme deviations, as opposed to
frequent modestly sized deviations. In other words, it’s
the tails that mostly account for kurtosis, not the
central peak.
The kurtosis decreases as the tails become lighter. It
increases as the tails become heavier.
Normal Curve
[Figure: a red normal curve over a horizontal axis marked 50, 100, and 150]
The red curve is a model called the normal curve,
which is used to describe continuous random variables
that are said to be normally distributed.
A continuous random variable is normally distributed,
or has a normal probability distribution, if its relative
frequency histogram has the shape of a normal curve.
and μ + σ.
Mean:
✦ Changing the mean shifts the entire curve left or right on the X-axis.
Standard Deviation:
✦ Changing the standard deviation either tightens or spreads out the
width of the distribution along the X-axis.
[Figure: two normal curves with μ1 < μ2, σ1 = σ2]
Larger standard deviations produce distributions that are more
spread out.
Remember!
Positive values of z-score indicate how far above
the mean a score falls and negative values
indicate how far below the mean a score falls.
Patterns for Finding Areas under a Standard Normal Curve
Using Table 1
D. Area to the right of a positive z value or to the left of a
negative z value.
[Figures: the required area shown as a difference of shaded areas under the curve; labels: Area = 1, Area = 0.50]
Using Table 2
A. Area to the right of a positive z value or to the left of a
negative z value.
Use Table 2 directly
B. Area between z values on same side of 0.
[Figures: shaded-area diagrams for these patterns, shown as differences and sums of areas; labels: 0.50 − Area]
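Using Table 2
D. Area to the right of a negative z value or to the left of a
positive z value.
[Figure: the required area shown as a sum of shaded areas; labels: 0.50 − Area, Area = 0.50]
E. Area between a given z value and 0.
[Figure: the required area shown as a difference of shaded areas; label: Area = 0.50]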
Example 1:
Scores on a standardized college entrance examination (CEE)
are normally distributed with mean 510 and standard
deviation 60. A selective university considers for admission
only applicants with CEE scores over 560. Find the proportion of
all individuals who took the CEE who meet the university's
CEE requirement for consideration for admission.
Solution:
Given: μ = 510, σ = 60, and x = 560
Area = P(X > 560)
Step 1: Draw a normal curve and shade the desired area.
[Figure: normal curve with the X-axis marked at 450, 510, and 570; the area to the right of 560 is shaded]
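For readers who want to check the shaded area without a z table, the probability can be computed with Python's standard library. This is a sketch rather than the slides' own method (the slides use printed tables and Excel); `NormalDist` is part of the `statistics` module in Python 3.8 and later.

```python
from statistics import NormalDist

# CEE scores: normally distributed with mean 510 and standard deviation 60.
cee = NormalDist(mu=510, sigma=60)

# P(X > 560) is 1 minus the cumulative area to the left of 560.
p_over_560 = 1 - cee.cdf(560)
print(round(p_over_560, 4))
```

The result is roughly 0.20, so about one in five test takers would meet the university's requirement.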
Example 2:
A pediatrician obtains the heights of her three-year-old female
patients. The heights are approximately normally distributed,
with mean 38.72 inches and standard deviation 3.17 inches.
Determine the proportion of the three-year-old females that
have a height less than 35 inches.
Solution:
Given: μ = 38.72, σ = 3.17, and x = 35
Area = P(X < 35)
Step 1: Draw a normal curve and shade the desired area.
[Figure: normal curve with the X-axis marked at 35.55, 38.72, and 41.89; the area to the left of 35 is shaded. The corresponding z-score is −1.17.]
Use “TRUE” for cumulative since we want the area under the normal curve.
Example 3:
A pediatrician obtains the heights of her three-year-old female
patients. The heights are approximately normally distributed,
with mean 38.72 inches and standard deviation 3.17 inches.
Determine the probability that a randomly selected three-year-
old girl is between 35 and 40 inches tall, inclusive.
Solution:
Given: μ = 38.72, σ = 3.17, and 35 ≤ X ≤ 40
Area = P(35 ≤ X ≤ 40)
Step 1: Draw a normal curve and shade the desired area.
[Figure: normal curve with the X-axis marked at 35.55, 38.72, and 41.89; the area between 35 and 40 is shaded]
Using Table 1 By-hand Approach!
Step 2: Convert the value of x to a z-score.
$P(35 \le X \le 40) = P(z_1 \le Z \le z_2)$
$= P\left( \frac{35 - 38.72}{3.17} \le Z \le \frac{40 - 38.72}{3.17} \right)$
$= P(-1.17 \le Z \le 0.40)$
$= P(Z \le 0.40) - [1 - P(Z \ge -1.17)]$
$= 0.6554 - [1 - 0.8790]$
$= 0.6554 - 0.1210$
$= 0.5344$
The probability a randomly selected three-year-old female
is between 35 and 40 inches tall is 0.5344.
[Figure: standard normal curve with the area between z = −1.17 and z = 0.40 shaded]
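The by-hand result can be cross-checked in Python. The sketch below computes the area twice: once with z-scores rounded to two decimals, as a printed z table requires, and once without rounding. The variable names are illustrative assumptions.

```python
from statistics import NormalDist

mu, sigma = 38.72, 3.17
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Table-style approach: round each z-score to two decimals first.
z_low = round((35 - mu) / sigma, 2)    # -1.17
z_high = round((40 - mu) / sigma, 2)   # 0.40
by_table = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Direct approach: no rounding of the z-scores.
heights = NormalDist(mu, sigma)
exact = heights.cdf(40) - heights.cdf(35)

print(round(by_table, 4), round(exact, 4))
```

The two answers differ slightly in the third decimal place because of the rounding of z, which is why software output and table-based answers do not always match exactly.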
ACTIVITIES/ASSESSMENTS:
2. What features of the ‘Good Presentation’ make it better
than the ‘Bad Presentation’?
[Figures: presentations A and B]
ACTIVITIES/ASSESSMENTS:
3. Review the table and consider questions such as the
following.

Origin / Rating   Poor   Needs Improvement   Satisfactory   V Good   Excellent   Total
External          0%     2%                  12%            19%      9%          41%
Internal          4%     8%                  15%            23%      9%          59%
Grand Total       4%     10%                 27%            41%      17%         100%
1. What percentage of the employees originated from within the
organization?
2. What percentage of the employees are both internal and rated
‘Very Good’?
3. What percentage of the employees received ‘Needs Improvement’
or ‘Poor’?
4. What category contains the greatest number of employees?
5. Do you see any notable differences in the percentage by category?
ACTIVITIES/ASSESSMENTS:
4. Consider the following Frequency Distribution of
Salaries.
Salary Frequency Percentage
41,000 - 50,000 1 1%
51,000 - 60,000 20 13%
61,000 - 70,000 53 35%
71,000 - 80,000 43 29%
81,000 - 90,000 26 17%
91,000 - 100,000 6 4%
101,000 - 110,000 1 1%
Total 150 100%
1. What percentage of the employees earns less than or
equal to 80,000?
2. What is the salary range of values?
3. What salary categories have a percentage less than 5?
4. What salary category includes the most employees?
ACTIVITIES/ASSESSMENTS:
5. The length of life of an instrument produced by a machine has a normal
distribution with a mean of 12 months and standard deviation of 2 months.
Find the probability that an instrument produced by this machine will last
A. less than 7 months.
B. between 7 and 12 months.
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
6. The lengths of human pregnancies are approximately normally distributed,
with mean μ = 266 days and standard deviation σ = 16 days.
A. What proportion of pregnancies lasts more than 270 days?
B. What proportion of pregnancies lasts less than 250 days?
C. What proportion of pregnancies lasts between 240 and 280 days?
D. What is the probability that a randomly selected pregnancy
lasts more than 280 days?
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
ACTIVITIES/ASSESSMENTS:
7. Construct a frequency distribution table based on the
scores of 75 randomly selected students.
37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31
Scores Frequency Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total
ACTIVITIES/ASSESSMENTS:
A. Based on the frequency distribution, compute measures of
central tendency, measures of variation, Q1, D9, P10 , Skewness
and kurtosis.
B. Based on the raw data, compute measures of central
tendency, measures of variation, Skewness and kurtosis using
Excel.
C. Compute Skewness and kurtosis of grouped and ungrouped
data. Make sure to describe the shape of the distribution.
D. Do you think that the computed values for grouped and
ungrouped data are the same?
8. Begin with the following set of data, call it Data Set I.
5, −2, 6, 14, −3, 0, 1, 4, 3, 2, 5
A. Compute the sample standard deviation and sample mean of
Data Set I.
ACTIVITIES/ASSESSMENTS:
B. Form a new data set, Data Set II, by adding 3 to each
number in Data Set I. Calculate the sample standard deviation
and sample mean of Data Set II.
C. Form a new data set, Data Set III, by subtracting 6 from
each number in Data Set I. Calculate the sample standard
deviation and sample mean of Data Set III.
D. Comparing the answers to parts (a), (b), and (c), can you
guess the pattern? State the general principle that you expect
to be true.
What is HYPOTHESIS?
• A statement or claim regarding a characteristic of
one or more populations.
• A preconceived idea, assumed to be true but has to
be tested for its truth or falsity.
Reminders:
If you are conducting a research study and you want
to use a hypothesis test to support your claim, the
claim must be stated in such a way that it becomes
the alternative hypothesis, so it cannot contain the
condition of equality.
✦ Right tailed
Example:
H0: The defendant is innocent.
Ha: The defendant is not innocent.
Answer:
A type I error is like putting an innocent person in
jail.
A type II error is like letting a guilty person go free.
Reminders:
It is important to note that we want to set
(α) before we start our study because the
Type I error is the more ‘grievous’ error to
make.
The smaller (α) is, the smaller the region
of rejection.
Decision Rule:
✦ Using Confidence Interval
Rejection region or critical region is the set of all values of
the test statistic which will lead to the rejection of H0.
Acceptance region is the set of all values of the test statistic
that leads the researcher to retain H0.
Two-tailed
Ha : μ1 ≠ μ2
[Figure: curve over an axis marked −2, 0, 2, with a Rejection Region in each tail]
STEP 1: Rearrange the data in ascending order.
Use the "=DEVSQ( )" function in Excel.
STEP 3: Calculate b as follows:
$b = \sum_{i=1}^{m} a_i \left( x_{n+1-i} - x_i \right)$
n is the number of observations.
If n is even:
$m = \frac{n}{2}$
If n is odd:
$m = \frac{n - 1}{2}$
Since n is even in this example, m = 8. That’s why we used a1 to a8.
Taking the ai weights from the table of Shapiro-Wilk
(based on the value of n)
We choose this interval in the table of Shapiro-Wilk
because our n = 16 and our test statistic (W = 0.955) is within
this interval.
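The Shapiro-Wilk steps can be sketched as a small Python function. Two things here are assumptions rather than content of the slides: the final step W = b² / SS is the standard definition of the statistic and is not spelled out in the extracted text, and the function name and signature are hypothetical. The ai weights are not computed; they must be supplied from the published Shapiro-Wilk table for the sample size n.

```python
def shapiro_wilk_w(data, weights):
    """Shapiro-Wilk W from data and the table weights a_1..a_m.

    `weights` must come from the Shapiro-Wilk coefficient table
    for n = len(data); this function does not derive them.
    Assumes the data are not all identical (otherwise SS = 0).
    """
    x = sorted(data)                          # STEP 1: ascending order
    n = len(x)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)      # same quantity Excel's DEVSQ returns
    m = n // 2                                # n/2 if n is even, (n-1)/2 if n is odd
    if len(weights) != m:
        raise ValueError("expected exactly m weights from the table")
    # STEP 3: b = sum of a_i * (x_(n+1-i) - x_i), written with 0-based indexes.
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(weights))
    return b * b / ss

# Tiny check with n = 3, where the table gives a single weight a_1 = 0.7071.
w = shapiro_wilk_w([3, 1, 2], [0.7071])
print(round(w, 4))
```

The computed W is then compared with the tabled values for the same n, as described above, to decide whether normality is rejected.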
Result
Inferential Statistics
1. Parametric Tests
✦ Assume underlying statistical distributions in the data.
Therefore, several conditions of validity must be met
so that the result of a parametric test is reliable.
✦ Apply to data in ratio scale, and some apply to data in
interval scale.
2. Non Parametric Test
✦ Refer to a statistical method in which the data is not
Example:
Determine whether the sample is independent or dependent.
1. An urban economist believes that commute times to
work in the South are less than commute times to work
in the Midwest. He randomly selects 40 employed
individuals in the South and 45 employed individuals in
the Midwest and determines their commute times.
Answer: Independent
2. In an experiment conducted in biology class, Prof.
Rhea measured the time required for 12 students to
catch a falling meter stick using their dominant hand
and nondominant hand. The goal of the study was to
determine whether the reaction time in an individual’s
dominant hand is different from the reaction time in
the nondominant hand.
Answer: Dependent
Example:
Determine whether the sample is independent or
dependent.
3. A researcher wants to know if the mean
length of stay in for-profit hospitals is different
from the mean length of stay in not-for-profit
hospitals. He randomly selected 20 individuals in
the for-profit hospital and matched them with 20
individuals in the not-for-profit by diagnosis.
Answer: Dependent
Dependent Sample t - Test
The dependent sample t-test (also called
the paired t-test or paired-samples t-test)
compares the means of two related groups
to determine whether there is a statistically
significant difference between these
means.
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2
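To make the comparison concrete, the paired t statistic can be computed from the differences within each pair, using the usual formula t = d̄ / (s_d / √n) with n − 1 degrees of freedom. That formula is the standard one and is not written out in the slides; the function name and the small data set below are hypothetical and only illustrate the arithmetic.

```python
import math

def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for paired samples.

    Differences are taken as first - second, matching hypotheses
    stated in terms of mu1 and mu2.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length (at least 2 pairs)")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1

# Hypothetical scores for four subjects measured twice (not data from the slides).
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t_stat, df = paired_t_statistic(before, after)
print(round(t_stat, 2), df)
```

A strongly negative t, as in this made-up data, points toward Ha : μ1 < μ2; the decision still depends on comparing t with the critical value (or the p-value) at the chosen α.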
Assumptions
1. Your dependent variable should be measured at
the interval or ratio level (i.e., they are
continuous).
2. Your independent variable should consist of two
categorical, "related groups" or "matched pairs".
3. There should be no significant outliers in the
differences between the two related groups.
4. The distribution of the differences in the
dependent variable between the two related
groups should be approximately normally
distributed.
Example:
A teacher is interested to know if the new learning program
will help to increase the number of correctly remembered
words. Ten subjects learn a list of 50 words. Learning
performance is measured using a recall test.
After the first test, all subjects are instructed how to use the
learning program and then learn a second list of 50 words.
Learning performance is again measured with the recall test. In
the following table, the number of correctly remembered words
is listed for both tests.
1. State the Null and Alternative
Hypothesis
Null hypothesis: Ho : μ1 ≥ μ2
The new learning program will not help to increase
the number of correctly remembered words.
Alternative hypothesis: Ha : μ1 < μ2
The new learning program will help to increase the
number of correctly remembered words.
2. Set the Level of Significance or Alpha
Level (α)
α = 0.05
Reject Ho
6. Draw Conclusion
There is sufficient evidence to support that the new
learning program helps to increase the number of
correctly remembered words.
Result
Example:
Researchers wanted to know whether there was a difference in
comprehension among students learning a computer program
based on the style of the text. They randomly divided 18
students into two groups of 9 each. The researchers verified
that the 18 students were similar in terms of educational level,
age, and so on. Group 1 individuals learned the software using
a visual manual (multimodal instruction), while Group 2
individuals learned the software using a textual manual (unimodal
instruction). The following data represent scores the students
received on an exam given to them after they studied from the manuals.
Determine if the variances are equal or not equal.
Failed to Reject Ho
Since we failed to reject Ho, we will proceed to t-test: Two
Sample Assuming Equal Variances.
Failed to Reject Ho
6. Draw Conclusion
There is not enough evidence to support that
there is a difference in comprehension among
students learning a computer program based on
the style of the text.
Proper Presentation of Results
Assumptions
1. Your dependent variable should be measured at the
interval or ratio level (i.e., they are continuous).
2. Your independent variable should consist of two or more
categorical, independent groups.
3. You should have independence of observations, which
means that there is no relationship between the
observations in each group or between the groups
themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately
normally distributed for each category of the independent
variable.
6. There needs to be homogeneity of variances.
Example:
Researchers wanted to compare math test scores of
students at the end of secondary school from various cities.
Eight randomly selected students from Makati, Manila,
and Quezon City each were administered the same exam;
the results are presented in the following table. Can the
researchers conclude that the distribution of exam scores is
different for each city at the level of significance?
Determine if the variances are equal or not equal.
Failed to Reject Ho
Equal Variances Assumed
Reject Ho
6. Draw Conclusion
There is enough evidence to support that the
distribution of exam scores of students in
mathematics is different for each city.
Result
Features of r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative
linear relationship.
• The closer to 1, the stronger the positive
linear relationship.
• The closer to 0, the weaker the linear
relationship.
[Figure: scatter plots of Y against X illustrating r = −1, r = −0.6, r = 0, r = 0.6, and r = 1]
Reminders:
• Correlation does not imply causation.
• Watch out for hidden (lurking) variables.
Lurking Variable
• A variable that is not included as an explanatory
or response variable in the analysis but can affect
the interpretation of relationships between
variables.
• Can falsely identify a strong relationship between
variables or it can hide the true relationship.
Assumptions
1. Your two variables should be measured at the
interval or ratio level (i.e., they are
continuous).
2. There is a linear relationship between your
two variables.
3. There should be no significant outliers.
4. Your variables should be approximately
normally distributed.
Test Statistic:
$t = r \sqrt{\frac{df}{1 - r^2}}$
where:
df = degrees of freedom
r = correlation coefficient of Pearson r
Note:
df = n − 2
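The test statistic can be illustrated with a short Python sketch that first computes Pearson r from paired observations and then applies t = r√(df / (1 − r²)). The function names and the five data pairs are hypothetical and are not taken from the module's data file.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def correlation_t(r, n):
    """t = r * sqrt(df / (1 - r^2)) with df = n - 2 (assumes |r| < 1)."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t_stat, df = correlation_t(r, len(x))
print(round(r, 4), round(t_stat, 4), df)
```

The resulting t is compared with the critical t value for df = n − 2 at the chosen α to decide whether the correlation is statistically significant.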
Example:
A dietetics student wanted to look at the
relationship between calcium intake and
knowledge about calcium in sports
science students. The table shows the data
she collected. Is there a relationship
between calcium intake and knowledge
about calcium in sports science
students?
df = n − 2
Result
Exercises:
Apply the procedure in testing the hypothesis.
Result
Assumptions
1. There are 2 variables, and both are measured as
categories, usually at the nominal level.
2. The two variables should consist of two or more
categorical, independent groups.
3. The data in the cells should be frequencies, or counts
of cases rather than percentages or some other
transformation of the data.
4. For a 2 by 2 table, all expected frequencies > 5.
5. For a larger table, all expected frequencies > 1 and
no more than 20% of all cells may have expected
frequencies < 5.
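The expected-frequency conditions in items 4 and 5 can be checked mechanically once the table of observed counts is available. The sketch below is one possible way to do it in plain Python. The helper names `expected_counts` and `meets_expected_count_rules` and both example tables are assumptions made for illustration; the expected count of a cell is computed as row total × column total / grand total.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def meets_expected_count_rules(table):
    """Apply the 2-by-2 rule or the larger-table rule listed above."""
    cells = [e for row in expected_counts(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        # 2 by 2 table: every expected frequency must exceed 5.
        return all(e > 5 for e in cells)
    # Larger table: every expected frequency must exceed 1, and at most
    # 20% of the cells may have an expected frequency below 5.
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return all(e > 1 for e in cells) and share_below_5 <= 0.20


# Hypothetical observed counts, for illustration only.
ok_table = [[20, 30], [25, 25]]
sparse_table = [[3, 2], [4, 21]]

print(meets_expected_count_rules(ok_table))
print(meets_expected_count_rules(sparse_table))
```

When the check fails, the usual options are to combine categories or to collect more data before applying the test.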
Example:
1. A doctor who knows that hypertension depends
on smoking habits can tell his smoking patients what
they should do.
2. If the traffic condition (light, moderate, heavy,
standstill) is found to be dependent on vehicle plate
numbers (odd, even), a traffic officer may decide to
revise traffic law enforcement.
Reminders:
The word contingency refers to
dependence, but this is only a
statistical dependence and cannot be
used to establish a direct cause-and-
effect link between the two variables in
question.
Example:
Educators are always looking for novel ways in
which to teach statistics to undergraduates as part
of a non-statistics degree course (e.g., psychology).
With current technology, it is possible to present
how-to guides for statistical programs online
instead of in a book. However, different people
learn in different ways. An educator would like to
know whether gender (male/female) is associated
with the preferred type of learning medium (online
vs. books). Use the “Data_Example and Exercises” file.
1. State the Null and Alternative
Hypotheses
Null hypothesis:
Gender is independent of the preferred type of
learning medium.
Alternative hypothesis:
Gender is associated with the preferred type of
learning medium.
2. Set the Level of Significance or Alpha
Level (α)
α = 0.05
Expected frequency of a cell:
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
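As a rough sketch of how the computation could go, the Python below builds the expected count of each cell as row total × column total / grand total and then sums (O − E)² / E over all cells, which is the standard chi-square statistic for a test of independence. The function name `chi_square_independence` and the 2 by 2 counts are hypothetical; they are not the values in the course data file.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence for cell (i, j).
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are the two genders, columns are online vs. books.
observed = [[20, 30], [30, 20]]

chi2, df = chi_square_independence(observed)
# 3.841 is the chi-square critical value for df = 1 at alpha = 0.05.
critical_value = 3.841
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 > critical_value}")
```

For tables larger than 2 by 2, the critical value must be looked up for the corresponding degrees of freedom.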
6. Draw Conclusion
There is sufficient evidence to conclude that
gender is associated with the preferred type of
learning medium.
Proper Presentation of Results
Result
ACTIVITIES/ASSESSMENTS:
Determine whether the sampling is dependent or independent.
________1. A researcher wishes to compare the academic
aptitudes of married mathematicians and their spouses. She
obtains a random sample of 287 such couples who take an
academic aptitude test and determines each spouse's academic
aptitude.
________2. A political scientist wants to know how a random
sample of 18- to 25-year-olds feel about Democrats and
Republicans in Congress. She obtains a random sample of
1030 registered voters 18 to 25 years of age and asks, "Do you
have a favorable/unfavorable opinion of the Democratic/Republican
party?" Each individual was asked to disclose his
or her opinion about each party.
________3. An educator wants to determine whether a new
curriculum significantly improves standardized test scores for
third-grade students. She randomly divides 80 third-graders into two
groups. Group 1 is taught using the new curriculum, while group 2 is
taught using the traditional curriculum. At the end of the school year,
both groups are given the standardized test and the mean scores are
compared.
________4. A stock analyst wants to know if there is a difference
between the mean rate of return from energy stocks and that from
financial stocks. He randomly selects 13 energy stocks and computes
the rate of return for the past year. He randomly selects 13 financial
stocks and computes the rate of return for the past year.
________5. An urban economist believes that commute times to work
in the South are less than commute times to work in the Midwest. He
randomly selects 40 employed individuals in the South and 45
employed individuals in the Midwest and determines their commute
times.
ACTIVITIES/ASSESSMENTS:
Solve the following problems. Make sure to follow the six-step
procedure.
1. A study is designed to test whether there is a difference in mean daily
calcium intake in adults with normal bone density, adults with
osteopenia (a low bone density which may lead to osteoporosis) and
adults with osteoporosis. Adults 60 years of age with normal bone
density, osteopenia and osteoporosis are selected at random from
hospital records and invited to participate in the study. Each
participant's daily calcium intake is measured based on reported food
intake and supplements. The data are shown below. Is there a
statistically significant difference in mean calcium intake in patients
with normal bone density as compared to patients with osteopenia and
osteoporosis?

Normal Bone Density   Osteopenia   Osteoporosis
1200                  1000         890
1000                  1100         650
980                   700          1100
900                   800          900
750                   500          400
800                   700          350
2. Some studies have shown that in the United States, men spend more
than women buying gifts and cards on Valentine’s Day. Suppose a
researcher wants to test this hypothesis by randomly sampling nine men
and 10 women with comparable demographic characteristics from various
large cities across the United States to be in a study. Each study
participant is asked to keep a log beginning one month before
Valentine’s Day and record all purchases made for Valentine’s Day
during that one-month period. The resulting data are shown below. Use
these data and a 1% level of significance to determine whether, on
average, men actually do spend significantly more than women on
Valentine’s Day. Assume that such spending is normally distributed in
the population and that the population variances are equal.

Men (in $)   Women (in $)
107.48       125.98
143.61       45.53
90.19        56.35
125.53       80.62
70.70        46.37
83.00        44.34
129.63       75.21
154.22       68.48
93.80        85.82
             126.11
3. A researcher is interested in whether a training course improves
the teaching performance of the teachers who attended it. Test at
the 10% level of significance. The data are shown below:

Case   Before   After      Case   Before   After
1      85       95         11     89       97
2      84       98         12     87       98
3      86       97         13     82       95
4      87       92         14     81       95
5      89       96         15     86       92
6      82       93         16     89       91
7      80       94         17     89       94
8      84       95         18     84       95
9      86       90         19     85       96
10     82       82         20     88       97
4. A pediatrician wants to determine the relation that may exist
between a child’s height and head circumference. She randomly selects
eleven 3-year-old children from her practice, measures their heights
and head circumferences, and obtains the data shown in the table below.

Height (inches)   Head Circumference (inches)
27.75             17.5
24.5              17.1
25.5              17.1
26                17.3
25                16.9
27.75             17.6
26.5              17.3
27                17.5
26.75             17.3
26.75             17.5
27.5              17.5
5. The following data represent the smoking status from a
random sample of 1054 U.S. residents 18 years or older by
level of education.
No. of Years        Smoking Status
of Education        Current   Former   Never
Less than 12        178       88       208
12                  137       69       143
13 - 15             44        25       44
16 or more          34        33       51
References
https://wolfweb.unr.edu/homepage/ania/stat352f12lectures/352lecture21f12.pdf
Statistics: Informed Decisions Using Data by
Michael Sullivan III, Fifth Edition
http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/shapiro-wilk-test/