
ADVANCED STATISTICS

DR. SYLVINO V. TUPAS


University of St. La Salle-Bacolod
Learning Objectives

At the end of the course, students should be able to:


1. define statistics in its simplest sense;
2. become familiar with the four levels of measurement;
3. distinguish one level of measurement from another;
4. differentiate between descriptive and inferential statistics;
5. discuss normally distributed data; and
6. differentiate between parametric and non-parametric statistics.
COVERAGE
• Statistics Defined
• Types of Statistics
• Random Variables: Discrete and Continuous Data
• Levels of Measurement
• Measures of Central Tendency: Mean, Median, Mode, Mid-range
• Frequency Distribution
• Histograms, Pareto Chart
• Measures of Dispersion: Variance, Standard Deviation
• Measures of Symmetry
• Parametric and Non-Parametric Statistics
Statistics defined
A science dealing with the collection,
presentation, analysis, and interpretation
of numerical data.

Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances (Davidian, M. and Louis, T. A.).
Kinds of Statistics (Enumerative)
Kinds of Statistics (Analytical)
Random Variables
Continuous Random Variables
Discrete Random Variables
Activity

• Identify at least ten random variables.
• State whether each is continuous or discrete.
Levels of Measurement
Examples
• Number of white, black, and red cars in the parking lot
• Results of the recently crowned Miss Mass Kara 2015
• Weight of undernourished children in Purok 8
• Entrance test scores
• Amount of cash in my pocket
• Level of understanding in math
• Results of the Cebu Triathlon 2015
• Body temperature in degrees Celsius
• Temperature of methane gas in kelvins
• Gender
• Willingness to teach in senior high school
• Number of students enrolled in the maritime strand
• Extent of understanding about global warming
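The Celsius and Kelvin items in this list show why the level of measurement matters. The following is a minimal Python sketch. It assumes the usual four levels (nominal, ordinal, interval, ratio) and the standard conversion K = °C + 273.15. Differences are meaningful on both scales, but ratios are meaningful only on the Kelvin (ratio) scale, which has a true zero. `PERMISSIBLE_CENTER` is a hypothetical helper that lists the measure of center commonly recommended at each level.

```python
# Hypothetical helper: measure of center commonly recommended at each level.
PERMISSIBLE_CENTER = {
    "nominal": "mode",     # categories only, e.g. gender
    "ordinal": "median",   # ordered ranks, e.g. pageant results
    "interval": "mean",    # equal intervals but no true zero, e.g. degrees Celsius
    "ratio": "mean",       # equal intervals and a true zero, e.g. kelvins
}

def celsius_to_kelvin(c):
    """Convert a Celsius reading to kelvins."""
    return c + 273.15

c_low, c_high = 20.0, 40.0
k_low, k_high = celsius_to_kelvin(c_low), celsius_to_kelvin(c_high)

# Differences agree on both scales: a 20-degree change is a 20-kelvin change.
print(round(c_high - c_low, 2), round(k_high - k_low, 2))  # 20.0 20.0
# Ratios do not: 40 degrees Celsius is not "twice as hot" as 20 degrees Celsius.
print(c_high / c_low)            # 2.0
print(round(k_high / k_low, 3))  # 1.068
```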
Measures of Central Tendency
A measure of central tendency (also referred to as
measures of centre or central location) is a
summary measure that attempts to describe a
whole set of data with a single value that
represents the middle or centre of its distribution.
Mean
Properties of the Mean
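The formula on the Mean slides did not survive extraction. The following minimal sketch assumes the usual arithmetic mean, x̄ = Σx / n. It also checks one commonly cited property of the mean: the deviations from the mean always sum to zero. The scores are hypothetical.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

scores = [12, 15, 11, 18, 14]  # hypothetical data
m = mean(scores)
print(m)  # 14.0

# The deviations from the mean always sum to zero.
deviations = [x - m for x in scores]
print(sum(deviations))  # 0.0
```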
Median

The median is the value of the middle term in a data set that has been arranged in increasing order.
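This definition covers a data set with an odd number of values. When the count is even, there are two middle terms, and the usual convention is to take their mean. The following is a minimal sketch of both cases; the sample lists are hypothetical.

```python
def median(values):
    """Middle term of the data arranged in increasing order."""
    ordered = sorted(values)  # arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle term
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle terms

print(median([7, 3, 9]))     # 7
print(median([7, 3, 9, 5]))  # 6.0
```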
Properties of the Median

1. The median is less sensitive than the mean to the presence of a few extreme scores.

2. In a distribution that is strongly asymmetrical, the median may be the better choice of measure of central tendency.
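The following quick check of property 1 uses Python's standard `statistics` module. The income figures (in thousands) are hypothetical. Replacing a single value with an extreme score moves the mean a long way, while the median stays where it was.

```python
import statistics

incomes = [18, 20, 21, 22, 24]  # hypothetical, roughly symmetric
skewed = [18, 20, 21, 22, 240]  # same data with the top value replaced by an extreme score

print(statistics.mean(incomes), statistics.median(incomes))  # mean and median agree
print(statistics.mean(skewed), statistics.median(skewed))    # mean jumps, median unchanged
```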
Mode

The mode is the item that is most popular or common, that is, the value that occurs most often. A distribution may also be bimodal (two modes) or multimodal (more than two modes).

The mode is easy to obtain but is not very stable from sample to sample.
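Because a distribution can have more than one mode, the following sketch returns every value that ties for the highest count. The input lists are hypothetical. Python 3.8 and later also provides `statistics.multimode`, which behaves the same way.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency (one or more modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([2, 3, 3, 5, 5, 7]))               # [3, 5], a bimodal data set
```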
Mid-Range

Another measure of center, less popular than the mean, median, and mode, is called the midrange.

The midrange is the mean of the maximum and minimum values of the data set.
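The definition translates directly into code. The following sketch uses a hypothetical data set. Because the midrange depends only on the two extreme values, a single outlier shifts it directly.

```python
def midrange(values):
    """Mean of the maximum and minimum values."""
    return (max(values) + min(values)) / 2

data = [4, 8, 15, 16, 23, 42]  # hypothetical
print(midrange(data))  # 23.0
```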
Example
Frequency Distribution
Arrangement of gathered data by
categories plus their corresponding
frequencies and class marks or midpoints
(Punsalan, 1989).

A tabular arrangement of data in which the data are grouped into intervals and the number of observations belonging to each interval is determined.
Table 1. Distribution of the Respondents

Gender Frequency
Male 23
Female 107
Total 130

$$\text{Relative Frequency} = \frac{\text{Class Frequency}}{\text{Total Frequency}}$$
Table 2. Preferred Color of Respondents

Color     Frequency   Relative Frequency   Percentage Frequency
Yellow    23          0.3286               32.86%
Green     7           0.1000               10.00%
Blue      14          0.2000               20.00%
Red       26          0.3714               37.14%
Total     70          1.0000               100.00%
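The relative-frequency formula can be checked against Table 2. The following minimal sketch divides each class frequency by the total frequency and reproduces the table's rounded relative and percentage frequencies.

```python
def relative_frequencies(counts):
    """Relative frequency = class frequency / total frequency."""
    total = sum(counts.values())
    return {category: freq / total for category, freq in counts.items()}

colors = {"Yellow": 23, "Green": 7, "Blue": 14, "Red": 26}  # Table 2 frequencies
rel = relative_frequencies(colors)
for color, r in rel.items():
    print(f"{color:<7}{colors[color]:>4}{r:>9.4f}{r * 100:>9.2f}%")
total_rel = sum(rel.values())
print(f"{'Total':<7}{sum(colors.values()):>4}{total_rel:>9.4f}{total_rel * 100:>9.2f}%")
```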


[Figure: Frequency Distribution, bar chart of frequency by preferred color (Yellow, Green, Blue, Red)]
How to Make a Frequency Distribution Table for Grouped Data

Step 1: Compute the range.
Range = highest score - lowest score

Step 2: Determine the number of class intervals.

Step 3: Organize the class intervals.

Step 4: Tally each score into the class interval it belongs to.

Step 6: Compute the midpoint of each interval.

Step 7: Compute the 'less than' and 'greater than' cumulative frequencies (used to draw an ogive).

Step 8: Compute the relative frequency distribution.
Activity
• Draw a frequency distribution table and a histogram for the following scores.
27 95 63 69 43 75 31 50 74
82 79 65 52 60 49 80 50 65
83 58 47 60 70 59 61 41 71
66 59 50 49 58 63 52 60 59
59 61 58 70 63 78 59 53 49
49 51 70 63 59 49 61 72 68
66 55 61 49 53 65 50 89 75
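The following is a minimal Python sketch of the grouped-data procedure, applied to these 63 scores. The slides do not say how to choose the number of class intervals, so Sturges' rule, k = 1 + 3.322 log n (rounded up), is an assumption here. Integer class limits starting at the lowest score are also assumed. The last column prints a text histogram.

```python
import math

# Scores from the activity (63 values).
scores = [
    27, 95, 63, 69, 43, 75, 31, 50, 74,
    82, 79, 65, 52, 60, 49, 80, 50, 65,
    83, 58, 47, 60, 70, 59, 61, 41, 71,
    66, 59, 50, 49, 58, 63, 52, 60, 59,
    59, 61, 58, 70, 63, 78, 59, 53, 49,
    49, 51, 70, 63, 59, 49, 61, 72, 68,
    66, 55, 61, 49, 53, 65, 50, 89, 75,
]

def frequency_table(data):
    """Grouped frequency table with integer class limits (sketch)."""
    n = len(data)
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                           # range = highest - lowest
    k = math.ceil(1 + 3.322 * math.log10(n))                # number of classes: Sturges' rule (assumption)
    width = max(1, math.ceil(data_range / k))               # class width, rounded up
    rows, less_than, lower = [], 0, lowest
    while lower <= highest:                                 # add intervals until the top score is covered
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)       # tally the scores in this interval
        less_than += freq
        rows.append({
            "interval": (lower, upper),
            "freq": freq,
            "midpoint": (lower + upper) / 2,                # class mark
            "cf_less": less_than,                           # 'less than' cumulative frequency
            "cf_greater": n - less_than + freq,             # 'greater than' cumulative frequency
            "rel_freq": freq / n,                           # relative frequency
        })
        lower += width
    return rows

table = frequency_table(scores)
for row in table:
    lo, hi = row["interval"]
    # Columns: interval, f, midpoint, <cf, >cf, relative f, then a text histogram bar.
    print(f"{lo:>3}-{hi:<3}{row['freq']:>4}{row['midpoint']:>7}"
          f"{row['cf_less']:>5}{row['cf_greater']:>5}{row['rel_freq']:>8.4f}  "
          + "*" * row["freq"])
```

With these scores, the range is 95 - 27 = 68. Sturges' rule gives 7 classes, and 68 / 7 rounded up gives a class width of 10, so the first interval is 27-36.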
Pareto Chart
• A Pareto chart, named after Vilfredo Pareto, is a type of chart that contains both bars and a line graph: individual values are represented in descending order by bars, and the cumulative total is represented by the line.
• The left vertical axis is the frequency of occurrence. The right vertical axis is the cumulative percentage of the total number of occurrences.
• The chart reflects the Pareto principle, which postulates that 80 percent of the trouble comes from 20 percent of the problems.

The Pareto chart is one of the seven basic tools of quality control.
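The data behind a Pareto chart come from two steps: sort the categories by frequency in descending order, then accumulate their percentage of the total. The following sketch uses hypothetical complaint categories and counts, not the Symphony Hotel survey data, which did not survive extraction. The `#` bars stand in for the chart's bars, and the cumulative percentage stands in for its line.

```python
def pareto_table(counts):
    """Sort categories by frequency (descending) and add cumulative percentages."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, freq in ordered:
        running += freq
        rows.append((category, freq, 100 * running / total))
    return rows

# Hypothetical complaint counts (not the Symphony Hotel survey data).
complaints = {
    "Slow check-in": 12,
    "Noisy room": 45,
    "Dirty bathroom": 30,
    "Wi-Fi problems": 8,
    "Rude staff": 5,
}

for category, freq, cum_pct in pareto_table(complaints):
    # Bar length tracks frequency (left axis); cumulative % is the line (right axis).
    print(f"{category:<15}{freq:>4}  {'#' * (freq // 5):<9}  {cum_pct:6.1f}%")
```

In this made-up example, the top two categories account for 75% of all complaints.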
How to Construct a Pareto Chart
How to Construct a Pareto Chart (continued)
Construct a Pareto Chart
• Here is a list of complaints from a survey about common problems at Symphony Hotel during the first quarter of the year.
Analysis Sheet
Pareto Chart
Analysis
Variations
A measure of variation describes how spread out or scattered a set of data is. It is also known as a measure of dispersion or a measure of spread.

Two measures of variation are discussed here: the range and the standard deviation.
Range

- simplest of all the measures of variation
- difference between the highest and lowest values in a set of data
- symbol R
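The range needs only the two extreme values. The following is a one-line sketch on hypothetical data.

```python
def data_range(values):
    """R = highest value - lowest value."""
    return max(values) - min(values)

print(data_range([12, 15, 11, 18, 14]))  # 7
```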
Standard Deviation
- the most popular measure of variation
- square root of the variance
- denoted by SD
The standard deviation measures how
concentrated the data are around the mean; the
more concentrated, the smaller the standard
deviation.

• The standard deviation can never be a negative number.
• The smallest possible value for the standard deviation is 0, which occurs only when every number in the data set is exactly the same.
• The standard deviation is affected by outliers (extremely low or extremely high numbers in the data set).
• The standard deviation has the same units as the original data.
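The slides do not say whether the variance divides by n (population) or by n - 1 (sample), so the following sketch shows both. In each case, the standard deviation is the square root of the variance. It also checks two of the properties above: the standard deviation is 0 when all values are equal, and a single outlier inflates it. The data are hypothetical. The standard library's `statistics.pstdev` and `statistics.stdev` compute the same two quantities.

```python
import math

def population_sd(values):
    """Square root of the population variance (divide by n)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def sample_sd(values):
    """Square root of the sample variance (divide by n - 1)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical
print(population_sd(data))        # 2.0
print(round(sample_sd(data), 4))  # 2.1381
print(population_sd([5, 5, 5, 5]))  # 0.0, since every value is the same
print(population_sd([2, 4, 4, 4, 5, 5, 7, 90]))  # much larger, because of one outlier
```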
Chebyshev's Theorem
• developed by the Russian mathematician Pafnuty Chebyshev
• at least ¾ (75%) of the data fall within 2 SD of the mean
• at least 8/9 (88.89%) of the data fall within 3 SD of the mean
• this theorem can be applied to any distribution regardless of its shape
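Both bullet values are cases of the general bound: at least 1 - 1/k² of the data lie within k SD of the mean, for any k > 1. With k = 2 the bound is 3/4, and with k = 3 it is 8/9. The following sketch checks that bound on a hypothetical, strongly skewed data set, using the population standard deviation.

```python
import math

def chebyshev_bound(k):
    """Minimum proportion within k SD of the mean, for any distribution (k > 1)."""
    return 1 - 1 / k ** 2

def proportion_within(values, k):
    """Observed proportion of values strictly within k population SDs of the mean."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / n)
    return sum(abs(x - m) < k * sd for x in values) / n

skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 30]  # hypothetical, strongly right-skewed
for k in (2, 3):
    print(k, round(chebyshev_bound(k), 4), proportion_within(skewed, k))
```

The observed proportions may be well above the bound. Chebyshev's theorem guarantees only a minimum.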
The Empirical Rule
For a normal (bell-shaped) distribution, the proportions are much higher than the minimums guaranteed by Chebyshev's theorem. The empirical rule states that:
- approximately 68% of the data values fall within 1 SD of the mean
- approximately 95% of the data values fall within 2 SD of the mean
- approximately 99.7% of the data values fall within 3 SD of the mean
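For a normal distribution, the exact proportion within k SD of the mean is erf(k/√2). The following sketch computes it with `math.erf` from the Python standard library and compares it with Chebyshev's distribution-free minimum, 1 - 1/k².

```python
import math

def normal_within(k):
    """Exact proportion of a normal distribution within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    cheb = max(0.0, 1 - 1 / k ** 2)  # Chebyshev's guaranteed minimum (0 when k = 1)
    print(f"k={k}: normal {normal_within(k):.4f}, Chebyshev at least {cheb:.4f}")
```

The normal proportions print as 0.6827, 0.9545 and 0.9973, which round to the 68%, 95% and 99.7% of the empirical rule.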
Parametric and Non-Parametric

• Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn. Often this is the assumption that the population data are normally distributed. Non-parametric tests are “distribution-free” and, as such, can be used for non-normal variables.
Parametric Test       Non-Parametric Test          Remarks
Mean                  Median, Mode                 Descriptive statistics
Paired t-test         Wilcoxon Signed-Rank test    Inferential statistics; dependent samples
Unpaired t-test       Mann-Whitney U test          Inferential statistics; independent samples
Pearson correlation   Spearman correlation         Relationship/correlation
One-way ANOVA         Kruskal-Wallis test          Inferential statistics; 3 or more groups
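The Pearson/Spearman row can be illustrated with a from-scratch sketch. Spearman's rank correlation is Pearson's r computed on the ranks of the data, with tied values receiving the average of their ranks. The x and y values are hypothetical: y always increases with x but not along a straight line. Spearman therefore reports a perfect monotonic relationship, while Pearson, which measures linear association, is lower.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]     # hypothetical
y = [1, 4, 9, 16, 100]  # increases with x, but not linearly
print(round(pearson(x, y), 3))  # 0.795
print(spearman(x, y))           # 1.0
```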
Thank You!
