
DESCRIPTIVE STATISTICS

•Deals with the methods of organizing, summarizing, and
presenting a mass of data so as to
yield meaningful information.
INFERENTIAL STATISTICS
•Involves taking a sample and analyzing
it to make judgments or
claims about a population.
POPULATION

•The total collection of things of interest.
•Set of all individuals under
study.
SAMPLE

•Refers to a small part of the
population that is used for study.
DESCRIPTIVE STATISTICS

•A university professor wants to analyze
the performance of students in a
particular course over the past semester.
They collect data on the grades
obtained by students in the final exam.
INFERENTIAL STATISTICS
•In a small town of 10,000 residents, an
upcoming election for mayor is
generating significant interest. A local
news outlet decides to conduct a survey to
infer the preferences of the voters. They
randomly select 500 residents to
participate in the survey.
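The survey scenario above can be sketched in code. This is only an illustration under stated assumptions: the 55% level of support for a candidate called "A" is invented (the slide gives no preferences), and the seed is fixed so the run is repeatable.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 residents, each preferring candidate "A" or "B".
# The 55% share for "A" is made up for illustration; it is not from the slide.
population = ["A"] * 5500 + ["B"] * 4500

# Inferential step: survey a simple random sample of 500 residents.
sample = random.sample(population, 500)

sample_share_a = sample.count("A") / len(sample)
population_share_a = population.count("A") / len(population)

print(f"Sample estimate of support for A: {sample_share_a:.3f}")
print(f"Population share for A:           {population_share_a:.3f}")
```

The sample share is the statistic the news outlet would actually observe; the population share is what it is trying to infer, and in practice it would stay unknown.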
DESCRIPTIVE OR INFERENTIAL?
• 1. A retail company wants to assess the satisfaction level of
its customers regarding a new product line they recently
launched. They decide to conduct a survey among a sample
of 500 customers to infer the satisfaction level of the entire
customer base.

• 2. A financial analyst wants to understand the distribution of
monthly household expenses among families in a
neighborhood. They decide to collect data from a sample of
households over a one-month period.
ACTIVITY
IDENTIFY WHETHER EACH GIVEN
PROBLEM INVOLVES DESCRIPTIVE OR
INFERENTIAL STATISTICS, AND FIND
ITS SAMPLE AND POPULATION.
•1. A group of students taking
Statistics conducted a study on the
effect of boy-girl relationships on
the academic performance of the
students.
•2. From all students registered this
semester, the Mathematics Department
would like to know how many students
like mathematics.
•3. Information will be collected from
new voters in the 2020 election to
identify their opinions regarding
politics in the Philippines.
•4. A supermarket chain wants to
analyze the sales data from the past year
to identify trends and make predictions
for future sales.
•5. A hospital is conducting a study to
evaluate the effectiveness of a new
treatment method for a specific medical
condition.
•6. A company wants to assess customer
satisfaction levels with their products and
services.
•7. A school district wants to evaluate the
impact of a new teaching method on student
performance in standardized tests.
QUIZ. ONE HALF CROSSWISE
Identification
A. Identify the word/s referred to by each of the following:
1. It refers to a small part of the population that is used for study.
2. It refers to a characteristic of interest measurable on each and every
individual in the universe.
3. It refers to the collection and interpretation of data; used to measure and
analyze variability.
4. It refers to the set of all individuals under study.
5. It refers to taking a sample and analyzing the sample to make judgment
or claims about a population.
QUIZ. ONE HALF CROSSWISE
B. Identify whether each given problem involves descriptive or
inferential statistics, and identify the population,
sample, and variable from the given data.
1. A company conducts a survey to assess employee
satisfaction and identify areas for improvement in the
workplace.
2. A restaurant wants to analyze customer feedback to
improve its service quality.
MEASURES OF CENTRAL TENDENCY
• A measure of central tendency is a descriptive
statistic that describes the average, or typical value
of a set of scores
• There are three common measures of central
tendency:
• the mode
• the median
• the mean
THE MODE
•The mode is the score that occurs most
frequently in a set of data
[Figure: histogram of Frequency against Score on Exam 1 (scores 75 to 95)]
BIMODAL DISTRIBUTIONS
•When a distribution has two “modes,” it
is called bimodal
[Figure: histogram of Frequency against Score on Exam 1 (scores 75 to 95)]
MULTIMODAL DISTRIBUTIONS
•If a distribution has more than 2
“modes,” it is called multimodal
[Figure: histogram of Frequency against Score on Exam 1 (scores 75 to 95)]
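As a minimal sketch of finding the mode, including the bimodal and multimodal cases, the helper below returns every score that ties for the highest frequency. The function name `modes` and the exam scores are hypothetical, made up for illustration.

```python
from collections import Counter


def modes(scores):
    """Return every score that attains the highest frequency, sorted."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)


# Hypothetical exam scores, invented for illustration.
unimodal = [75, 80, 85, 85, 85, 90, 95]
bimodal = [75, 80, 80, 80, 85, 90, 90, 90, 95]
multimodal = [75, 75, 80, 85, 85, 90, 95, 95]

print(modes(unimodal))    # [85]
print(modes(bimodal))     # [80, 90]
print(modes(multimodal))  # [75, 85, 95]
```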
WHEN TO USE THE MODE
• The mode is not a very useful measure of central tendency
• It is insensitive to large changes in the data set
• That is, two data sets that are very different from each other can have the same
mode
[Figure: two histograms on different scales (x-axis 1 to 10 with frequencies up to 7; x-axis 10 to 100 with frequencies up to 120)]
WHEN TO USE THE MODE
•The mode is primarily used with nominally scaled
data
• It is the only measure of central tendency that is
appropriate for nominally scaled data

THE MEDIAN
• The median is simply another name for the 50th
percentile
• It is the score in the middle; half of the scores are
larger than the median and half of the scores are
smaller than the median
HOW TO CALCULATE THE MEDIAN
• Conceptually, it is easy to calculate the median
• Sort the data from highest to lowest
• Find the score in the middle
• middle position = (N + 1) / 2
• If N, the number of scores, is even, the median is the average of the
middle two scores
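The steps above can be sketched as a short function. This is one possible implementation, not the only one; the function name and the sample scores are made up for illustration.

```python
def median(scores):
    """Median via the steps above: sort, then take the middle score(s)."""
    ordered = sorted(scores, reverse=True)  # highest to lowest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd N: position (N + 1) / 2, which is index N // 2 counting from 0.
        return ordered[middle]
    # Even N: average the two scores on either side of position (N + 1) / 2.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([90, 75, 85, 95, 80]))      # 85
print(median([90, 75, 85, 95, 80, 70]))  # 82.5
```

Sorting from lowest to highest would give the same result, since the middle of the list does not depend on the sort direction.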

WHEN TO USE THE MEDIAN
•The median is often used when the
distribution of scores is either positively or
negatively skewed
• The few really large scores (positively skewed)
or really small scores (negatively skewed) will
not overly influence the median
THE MEAN
• The mean is the arithmetic average of all the scores: $\sum X / N$
• The mean of a population is represented by the
Greek letter $\mu$; the mean of a sample is represented
by $\bar{X}$
WHEN TO USE THE MEAN
• You should use the mean when
• the data are interval or ratio scaled
• Many people will use the mean with ordinally scaled data too
• and the data are not skewed
• The mean is preferred because it is sensitive to every
score
• If you change one score in the data set, the mean will
change
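A small sketch of this sensitivity, using Python's standard `statistics` module. The scores are hypothetical; only one of them differs between the two lists.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration.
scores = [70, 75, 80, 85, 90]
changed = [70, 75, 80, 85, 190]  # only the last score differs

print(mean(scores), median(scores))    # mean 80, median 80
print(mean(changed), median(changed))  # mean 100, median 80
```

Changing a single score moves the mean from 80 to 100, while the median stays at 80.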
RELATIONS BETWEEN THE MEASURES OF CENTRAL TENDENCY
• In symmetrical distributions, the median
and mean are equal
• For normal distributions, mean = median =
mode
• In positively skewed distributions, the
mean is greater than the median

• In negatively skewed distributions,
the mean is smaller than the median
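These relations can be checked on three tiny data sets. The numbers are invented for illustration: one set is symmetric, one has a single unusually large score, and one has a single unusually small score.

```python
from statistics import mean, median

# Hypothetical data sets, invented for illustration.
symmetric = [1, 2, 3, 4, 5]
positively_skewed = [1, 2, 3, 4, 15]  # one unusually large score
negatively_skewed = [-9, 2, 3, 4, 5]  # one unusually small score

examples = [
    ("symmetric", symmetric),
    ("positive skew", positively_skewed),
    ("negative skew", negatively_skewed),
]
for name, data in examples:
    print(f"{name}: mean={mean(data)}, median={median(data)}")
```

All three sets have a median of 3; the mean equals it in the symmetric set, exceeds it in the positively skewed set, and falls below it in the negatively skewed set.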
MEASURES OF DISPERSION
• Measures of dispersion are descriptive statistics that
describe how similar a set of scores are to each other
• The more similar the scores are to each other, the lower
the measure of dispersion will be
• The less similar the scores are to each other, the higher the
measure of dispersion will be
• In general, the more spread out a distribution is, the larger
the measure of dispersion will be
MEASURES OF DISPERSION
• Which of the distributions of scores has the larger
dispersion?
The upper distribution has more dispersion because
the scores are more spread out
That is, they are less similar to each other
[Figure: two bar charts, one above the other, each with x-axis 1 to 10 and y-axis 0 to 125]
•There are three main measures of
dispersion:
•The range
•Variance
•Standard Deviation
THE RANGE
•The range is defined as the difference
between the largest score in the set of
data and the smallest score in the set of
data, $X_L - X_S$
WHEN TO USE THE RANGE
• The range is used when
• you have ordinal data or
• you are presenting your results to people with little or no
knowledge of statistics
• The range is rarely used in scientific work as it is
fairly insensitive
• It depends on only two scores in the set of data, $X_L$ and $X_S$
• Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
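A minimal sketch of the range, applied to the two data sets above. The function name `value_range` is a hypothetical choice, used to avoid shadowing Python's built-in `range`.

```python
def value_range(scores):
    """Range: largest score minus smallest score."""
    return max(scores) - min(scores)


clustered = [1, 1, 1, 1, 9]
spread = [1, 3, 5, 7, 9]

print(value_range(clustered))  # 8
print(value_range(spread))     # 8
```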
VARIANCE
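•Variance is defined as the average of the squared
deviations:

Population:
$$\sigma^2 = \frac{\sum f (X - \mu)^2}{N}$$
$\sigma^2$ is the population variance, $X$ is a score, $\mu$ is
the population mean, and $N$ is the number of scores

Sample:
$$s^2 = \frac{\sum f (X - \bar{X})^2}{N - 1}$$
$s^2$ is the sample variance, $X$ is a score, $\bar{X}$ is the sample mean, and $N$ is
the number of scores

In both formulas, $f$ is the frequency of each distinct score ($f = 1$ when every score is listed individually)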
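A sketch of both variance definitions, assuming every score is listed individually (so any frequency weight is 1). The function names are hypothetical; the data set 1 3 5 7 9 is reused from the range slide.

```python
def population_variance(scores):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n


def sample_variance(scores):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    x_bar = sum(scores) / n
    return sum((x - x_bar) ** 2 for x in scores) / (n - 1)


scores = [1, 3, 5, 7, 9]
print(population_variance(scores))  # 8.0
print(sample_variance(scores))      # 10.0
```

The squared deviations sum to 40 in both cases; only the divisor (5 versus 4) differs.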
STANDARD DEVIATION
•The standard deviation is defined as the square root of the variance:

Population:
$$\sigma = \sqrt{\frac{\sum f (X - \mu)^2}{N}}$$
$\sigma^2$ is the population variance, $X$ is a score, $\mu$ is
the population mean, and $N$ is the number of scores

Sample:
$$s = \sqrt{\frac{\sum f (X - \bar{X})^2}{N - 1}}$$
$s^2$ is the sample variance, $X$ is a score, $\bar{X}$ is the sample mean,
and $N$ is the number of scores


STANDARD DEVIATION

•Standard deviation = variance


•Variance = standard deviation 2

55
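The square-root relation can be checked with Python's standard `statistics` module, where `pvariance` and `pstdev` divide by N (population) and `variance` and `stdev` divide by N - 1 (sample). The scores are illustrative only.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

scores = [1, 3, 5, 7, 9]

population_sd = pstdev(scores)  # square root of the population variance
sample_sd = stdev(scores)       # square root of the sample variance

# Standard deviation = square root of variance
print(math.isclose(population_sd, math.sqrt(pvariance(scores))))  # True
# Variance = standard deviation squared
print(math.isclose(sample_sd ** 2, variance(scores)))             # True
```

`math.isclose` is used instead of `==` because squaring a square root can differ from the original value by floating-point rounding.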
WHAT DO THE VARIANCE AND
STANDARD DEVIATION MEAN?
• Variance is the mean of the squared deviation scores
• The larger the variance and standard deviation are, the
more the scores deviate, on average, from the
mean
• The smaller the variance and standard deviation are, the
less the scores deviate, on average, from the mean
MEASURE OF SKEW
• Skew is a measure of symmetry in the distribution of
scores
[Figure: three distributions labeled Normal (skew = 0), Positive Skew, and Negative Skew]
•The following formula can be used to determine skew:

$$\text{Skewness} = \frac{\sum (X - \bar{X})^3}{s^3 (N - 1)}$$

where $s$ is the sample standard deviation
MEASURE OF SKEW

• If skewness < 0, then the distribution has a negative skew
• If skewness > 0, then the distribution has a positive skew
• If skewness = 0, then the distribution is symmetrical
• The more skewness differs from 0, the greater the
skew in the distribution
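A sketch of a skewness calculation: the sum of cubed deviations from the mean, divided by the cubed standard deviation times N - 1. It assumes the standard deviation is the sample one (N - 1 divisor); the function names and data sets are hypothetical.

```python
import math


def sample_std(scores):
    """Sample standard deviation (N - 1 in the denominator)."""
    n = len(scores)
    x_bar = sum(scores) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in scores) / (n - 1))


def skewness(scores):
    """Sum of cubed deviations divided by s^3 * (N - 1)."""
    n = len(scores)
    x_bar = sum(scores) / n
    s = sample_std(scores)
    if s == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return sum((x - x_bar) ** 3 for x in scores) / (s ** 3 * (n - 1))


# Hypothetical data sets, invented for illustration.
print(skewness([1, 2, 3, 4, 5]))   # symmetric: zero
print(skewness([1, 2, 3, 4, 15]))  # long right tail: positive
print(skewness([-9, 2, 3, 4, 5]))  # long left tail: negative
```

Cubing keeps the sign of each deviation, so a few large deviations on one side dominate the sum and set the sign of the result.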
KURTOSIS
(NOT RELATED TO HALITOSIS)
• Kurtosis measures whether the scores are spread out
more or less than they would be in a normal (Gaussian)
distribution

MEASURE OF KURTOSIS
• The measure of kurtosis is given by:

$$\text{Kurtosis} = \frac{\sum (X - \bar{X})^4}{s^4 (N - 1)}$$

where $s$ is the sample standard deviation
• When the distribution is normally distributed, its
kurtosis equals 3 and it is said to be mesokurtic
• When the distribution is less spread out than
normal, its kurtosis is greater than 3 and it is said to
be leptokurtic
• When the distribution is more spread out than
normal, its kurtosis is less than 3 and it is said to be
platykurtic
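A sketch of a kurtosis calculation: the sum of fourth-power deviations from the mean, divided by the fourth power of the standard deviation times N - 1. It assumes the standard deviation is the sample one (N - 1 divisor); the function name and both data sets are hypothetical.

```python
def kurtosis(scores):
    """Sum of fourth-power deviations divided by s^4 * (N - 1)."""
    n = len(scores)
    x_bar = sum(scores) / n
    # Sample variance s^2, so s^4 is its square.
    s_squared = sum((x - x_bar) ** 2 for x in scores) / (n - 1)
    if s_squared == 0:
        raise ValueError("kurtosis is undefined when all scores are equal")
    return sum((x - x_bar) ** 4 for x in scores) / (s_squared ** 2 * (n - 1))


# Hypothetical data sets, invented for illustration.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # evenly spread scores
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]  # clustered, with two extreme scores

print(kurtosis(flat))    # below 3 (platykurtic)
print(kurtosis(peaked))  # above 3 (leptokurtic)
```

With samples this small the values are rough; the comparison against 3 is most meaningful for larger data sets.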
•Collectively, the variance ($s^2$), skew
($s^3$), and kurtosis ($s^4$) describe the
shape of the distribution
