Oblicon Case Digests

© All Rights Reserved



Dante V. Partosa

Mathematics Department

College of Science and Information Technology

Ateneo de Zamboanga University

Preliminaries

Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

Data are the values (measurements or observations) that the variables can assume.

Variables whose values are determined by chance are called random variables.

A collection of data values forms a data set. Each value in the data set is called a data value or a datum.

Descriptive statistics consists of the collection, organization, summarization, and presentation of data.

A population consists of all subjects (human or otherwise) that are being studied.

A sample is a subgroup of the population.

Inferential statistics consists of generalizing from samples to populations, performing hypothesis testing, determining relationships among variables, and making predictions.

Variables and Types of Data

Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or attribute. For example, gender (male or female).

Quantitative variables are numerical in nature and can be ordered or ranked. Example: age is numerical, and the values can be ranked.

Discrete variables assume values that can be counted.

Continuous variables can assume all values between any two specific values. They are obtained by measuring.

The nominal level of measurement classifies data into mutually exclusive (nonoverlapping), exhaustive categories in which no order or ranking can be imposed on the data.

The ordinal level of measurement classifies data into categories that can be ranked; precise differences between the ranks do not exist.

The interval level of measurement ranks data; precise differences between units of measure do exist; there is no meaningful zero.

The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist for the same variable.

Data Collection and Sampling Techniques

One of the most common methods of collecting data is through the use of surveys.

Surveys can be done by using a variety of methods. Examples are telephone surveys, mail questionnaires, personal interviews, surveying records, and direct observations.

Statisticians use four methods of sampling: random, systematic, stratified, and cluster sampling.

Random samples are selected by using chance methods or random numbers.

Systematic samples are obtained by numbering each value in the population and then selecting every kth value.

Stratified samples are obtained by dividing the population into groups (strata) according to some characteristic and then taking samples from each group.

Cluster samples are obtained by dividing the population into groups (clusters) and then taking samples of the groups.
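The four sampling methods can be sketched in Python with the standard `random` module. This is an illustrative sketch, not part of the original slides: the population labelled 1 to 100, the two strata, the ten clusters, the seed, and the helper names (`simple_random_sample`, `systematic_sample`, `stratified_sample`, `cluster_sample`) are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    # Chance method: every member has the same chance of being chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, start):
    # Number the population, then take every kth member beginning at index `start`.
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping a characteristic (stratum name) to its members.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Pick whole groups at random and keep every member of each chosen group.
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(1)               # fixed seed so runs are reproducible
population = list(range(1, 101))     # hypothetical population labelled 1..100
print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, k=20, start=3))   # [4, 24, 44, 64, 84]
strata = {"group A": population[:50], "group B": population[50:]}
print(stratified_sample(strata, 3, rng))
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
print(cluster_sample(clusters, 2, rng))
```

The stratified version guarantees that every stratum is represented, while the cluster version keeps whole groups together.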

Computers and Calculators

Computers and calculators make numerical computation easier.

Many statistical packages are available. Examples are SPSS, MINITAB, PHStat, and Excel. The TI-83 calculator can also be used to do statistical calculations.

Data must still be understood and interpreted.

Organizing Data

When data are collected in original form, they are called raw data.

When the raw data are organized into a frequency distribution, the frequency is the number of values in a specific class of the distribution.

A frequency distribution is the organizing of raw data in table form, using classes and frequencies.

Examples of frequency distributions follow.

Three Types of Frequency Distributions

Categorical frequency distributions can be used for data that can be placed in specific categories, such as nominal- or ordinal-level data.

Examples: political affiliation, religious affiliation, blood type, etc.

Blood Type Frequency Distribution - Example

Class   Frequency   Percent
A       5           20
B       7           28
O       9           36
AB      4           16
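A categorical frequency distribution with a percent column can be computed with `collections.Counter`, using percent = (f / n) · 100. A minimal sketch, assuming the raw observations are rebuilt from the counts in the table (the original raw data are not given); the helper name `categorical_distribution` is hypothetical.

```python
from collections import Counter

def categorical_distribution(values):
    counts = Counter(values)
    n = sum(counts.values())
    # Percent = (f / n) * 100 for each category.
    return {category: (f, 100 * f / n) for category, f in counts.items()}

# Raw data rebuilt from the table's frequencies (n = 25).
blood_types = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4
for category, (f, pct) in categorical_distribution(blood_types).items():
    print(f"{category:<3} {f:>3} {pct:>6.1f}")
```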

Ungrouped Frequency Distributions

Ungrouped frequency distributions can be used for data that can be enumerated and when the range of values in the data set is not large.

Examples: number of miles your instructors have to travel from home to campus, number of girls in a 4-child family, etc.

Number of Miles Traveled - Example

Class   Frequency
5       24
10      16
15      10

Grouped Frequency Distributions

Grouped frequency distributions can be used when the range of values in the data set is very large. The data must be grouped into classes that are more than one unit in width.

Example: the life of boat batteries in hours.

Lifetimes of Boat Batteries - Example

Class limits   Class boundaries   Frequency   Cumulative frequency
24 - 37        23.5 - 37.5        4           4
38 - 51        37.5 - 51.5        14          18
52 - 65        51.5 - 65.5        7           25

Terms Associated with a Grouped Frequency Distribution

Class limits represent the smallest and largest data values that can be included in a class.

In the lifetimes of boat batteries example, the values 24 and 37 of the first class are the class limits. The lower class limit is 24 and the upper class limit is 37.

Class boundaries are used to separate the classes so that there are no gaps in the frequency distribution.

The class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. In the boat batteries example, the class width is 38 - 24 = 14.

Guidelines for Constructing a Frequency Distribution

There should be between 5 and 20 classes.

It is preferable, though not required, for the class width to be an odd number.

The classes must be mutually exclusive.

The classes must be exhaustive.

The classes must be equal in width.

Procedure for Constructing a Grouped Frequency Distribution

Step 1: Find the highest and lowest values.

Step 2: Find the range.

Step 3: Select the number of classes desired.

Step 4: Find the width by dividing the range by the number of classes and rounding up.

Step 5: Select a starting point (usually the lowest value); add the width to get the lower limits.

Step 6: Find the upper class limits.

Step 7: Find the boundaries.

Step 8: Tally the data, find the frequencies, and find the cumulative frequency.

Grouped Frequency Distribution - Example

Construct a grouped frequency distribution for the following 20 data values.

10   8   6  14
22  13  17  19
11   9  18  14
13  12  15  15
 5  11  16  11

Step 1: Find the highest and lowest values: H = 22 and L = 5.

Step 2: Find the range: R = H - L = 22 - 5 = 17.

Step 3: Select the number of classes desired. In this case it is equal to 6.

Step 4: Find the class width by dividing the range by the number of classes. Width = 17/6 = 2.83. This value is rounded up to 3.

Step 5: Select a starting point for the lowest class limit. For convenience, this value is chosen to be 5, the smallest data value. The lower class limits will be 5, 8, 11, 14, 17, and 20.

Step 6: Find the upper class limits: 7, 10, 13, 16, 19, and 22. For example, the upper limit for the first class is computed as 8 - 1 = 7, and so on.

Step 7: Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit.

Step 8: Tally the data, write the numerical values for the tallies in the frequency column, and find the cumulative frequencies.

The resulting grouped frequency distribution is shown below. In the boundaries column, the dash - means "to".

Class limits   Class boundaries   Frequency   Cumulative frequency
5 to 7         4.5 - 7.5          2           2
8 to 10        7.5 - 10.5         3           5
11 to 13       10.5 - 13.5        6           11
14 to 16       13.5 - 16.5        5           16
17 to 19       16.5 - 19.5        3           19
20 to 22       19.5 - 22.5        1           20
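The eight-step procedure can be sketched in Python and checked against the example. This is a hedged sketch assuming whole-number data and a starting point equal to the smallest value. The function name `grouped_distribution` is hypothetical, and the extra widening check (for the case where the range divides evenly by the number of classes, which would otherwise leave the highest value outside the last class) is an addition, not part of the slides.

```python
import math

def grouped_distribution(data, num_classes):
    low, high = min(data), max(data)                     # Step 1
    width = math.ceil((high - low) / num_classes)        # Steps 2-4: range / classes, rounded up
    if low + num_classes * width - 1 < high:
        width += 1  # assumption: widen so the highest value still fits in the last class
    rows, cumulative = [], 0
    for i in range(num_classes):
        lower = low + i * width                          # Step 5: lower class limits
        upper = lower + width - 1                        # Step 6: upper class limits
        freq = sum(lower <= x <= upper for x in data)    # Step 8: tally
        cumulative += freq
        # Step 7: boundaries are the limits minus/plus 0.5
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq, cumulative))
    return rows

data = [10, 8, 6, 14, 22, 13, 17, 19, 11, 9, 18, 14, 13, 12, 15, 15, 5, 11, 16, 11]
for lower, upper, lb, ub, f, cf in grouped_distribution(data, 6):
    print(f"{lower:>2} to {upper:<2}  {lb:>4} - {ub:<4}  {f:>2}  {cf:>2}")
```

Running it reproduces the frequencies 2, 3, 6, 5, 3, 1 and the cumulative frequencies 2, 5, 11, 16, 19, 20 shown in the table.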

Histograms, Frequency Polygons, and Ogives

The three most commonly used graphs in research are:

The histogram.

The frequency polygon.

The cumulative frequency graph, or ogive (pronounced o-jive).

The histogram is a graph that displays the data by using vertical bars of various heights to represent the frequencies.

Example of a Histogram

(Figure: histogram; horizontal axis: Number of Cigarettes Smoked per Day; vertical axis: Frequency.)

The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.

Example of a Frequency Polygon

(Figure: frequency polygon; vertical axis: Frequency.)

The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.

Example of an Ogive

(Figure: ogive; vertical axis: Cumulative Frequency.)
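The points behind a frequency polygon and an ogive can be computed directly from a grouped frequency distribution. A minimal sketch, assuming the table from the grouped frequency distribution example (classes 5 to 7 through 20 to 22) rather than the data in the figures, and following a common convention (an assumption here) of closing the polygon with zero-frequency points one class width beyond each end and starting the ogive at zero at the first lower boundary. The helper names are hypothetical.

```python
# Class boundaries and frequencies taken from the grouped example (classes 5 to 7, ..., 20 to 22).
boundaries = [4.5, 7.5, 10.5, 13.5, 16.5, 19.5, 22.5]
freqs = [2, 3, 6, 5, 3, 1]

def polygon_points(boundaries, freqs):
    # Class midpoints are the averages of consecutive boundaries.
    mids = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
    width = boundaries[1] - boundaries[0]
    # Close the polygon with a zero-frequency point one class width beyond each end.
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + freqs + [0]
    return list(zip(xs, ys))

def ogive_points(boundaries, freqs):
    # Cumulative frequency plotted at each upper class boundary, starting from 0.
    points, running = [(boundaries[0], 0)], 0
    for upper, f in zip(boundaries[1:], freqs):
        running += f
        points.append((upper, running))
    return points

print(polygon_points(boundaries, freqs))
print(ogive_points(boundaries, freqs))
```

Any plotting tool can then draw straight lines through these points.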

Other Types of Graphs

A Pareto chart is used to represent a frequency distribution for a categorical variable.

Other Types of Graphs - Pareto Chart

When constructing a Pareto chart:

Make the bars the same width.

Arrange the data from largest to smallest according to frequencies.

Make the units that are used for the frequency equal in size.

Example of a Pareto Chart

Pareto chart for the number of crimes investigated by law enforcement officers in U.S. National Parks during 1995. (Figure: bars show the count on the left axis, 0 to 250; a cumulative percentage line uses the right axis, 0 to 100%.) The plotted values are:

Crime      Count   Percent   Cum %
Assaults   164     68.3      68.3
Rape       34      14.2      82.5
Robbery    29      12.1      94.6
Homicide   13      5.4       100.0
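The table behind the Pareto chart (categories sorted from largest to smallest, with percent and cumulative percent) can be reproduced with a short sketch. The helper name `pareto_table` is hypothetical; rounding to one decimal is chosen to match the chart's labels.

```python
def pareto_table(counts):
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    # Arrange categories from largest to smallest frequency.
    for name, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        percent = 100 * count / total
        cumulative += percent
        rows.append((name, count, round(percent, 1), round(cumulative, 1)))
    return rows

crimes = {"Homicide": 13, "Rape": 34, "Assaults": 164, "Robbery": 29}
for row in pareto_table(crimes):
    print(row)
```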

Other Types of Graphs

A time series graph represents data that occur over a specific period of time.

Other Types of Graphs - Time Series Graph

(Figure: time series graph titled "Port Authority Transit Ridership"; horizontal axis: Year, 1990 to 1994; vertical axis: Ridership (in millions), 75 to 89.)

Other Types of Graphs

A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.

Other Types of Graphs - Pie Graph

(Figure: pie chart of the number of crimes investigated by law enforcement officers in U.S. National Parks during 1995. Wedges: Assaults (164, 68.3%), Rape (34, 14.2%), Robbery (29, 12.1%), Homicide (13, 5.4%).)
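Each wedge's angle is its share of the frequencies times 360 degrees. A minimal sketch using the same crime counts; the function name `pie_wedges` is hypothetical.

```python
def pie_wedges(counts):
    total = sum(counts.values())
    # Percent of the total and the matching angle (out of 360 degrees) for each category.
    return {name: (100 * f / total, 360 * f / total) for name, f in counts.items()}

crimes = {"Assaults": 164, "Rape": 34, "Robbery": 29, "Homicide": 13}
for name, (pct, degrees) in pie_wedges(crimes).items():
    print(f"{name:<9} {pct:5.1f}%  {degrees:6.1f} degrees")
```

For example, Assaults receives 164/240 of the circle, or 246 degrees.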

Organizing Data

Describing Data

Measures of Central Tendency

A statistic is a characteristic or measure obtained by using the data values from a sample.

A parameter is a characteristic or measure obtained by using the data values from a specific population.

The Mean (arithmetic average)

The mean is defined to be the sum of the data values divided by the total number of values.

We will compute two means: one for the sample and one for a finite population of values.

The mean, in most cases, is not an actual data value.

The Sample Mean

$\bar{X}$ is read as "X-bar". The Greek symbol $\Sigma$ is read as "sigma" and it means "to sum".

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum X}{n}$$

The Sample Mean - Example

The ages in weeks of a random sample of six kittens at an animal shelter are 3, 8, 5, 12, 14, and 12. Find the average age of this sample.

The sample mean is

$$\bar{X} = \frac{\sum X}{n} = \frac{3 + 8 + 5 + 12 + 14 + 12}{6} = \frac{54}{6} = 9 \text{ weeks}.$$

The Population Mean

The Greek symbol $\mu$ is used to represent the population mean. The symbol $\mu$ is read as "mu". $N$ is the size of the finite population.

$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N}$$

The Population Mean - Example

A small company has five employees, including the salesperson and two technicians. The salaries are 50,000, 20,000, 12,000, 9,000, and 9,000 dollars, respectively. (Assume this is the population.)

Then the population mean will be

$$\mu = \frac{\sum X}{N} = \frac{50{,}000 + 20{,}000 + 12{,}000 + 9{,}000 + 9{,}000}{5} = 20{,}000 \text{ dollars}.$$
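The sample mean and the population mean are the same computation; only the notation ($\bar{X}$ and $n$ versus $\mu$ and $N$) and the interpretation differ. A minimal sketch checking both examples; the helper name `mean` is hypothetical, and Python's standard `statistics.mean` is shown alongside for comparison.

```python
import statistics

def mean(values):
    # Sum of the data values divided by the number of values.
    return sum(values) / len(values)

kitten_ages = [3, 8, 5, 12, 14, 12]                 # sample, n = 6
salaries = [50_000, 20_000, 12_000, 9_000, 9_000]   # population, N = 5
print(mean(kitten_ages))   # 9.0
print(mean(salaries))      # 20000.0
print(statistics.mean(kitten_ages), statistics.mean(salaries))
```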

The Sample Mean for an Ungrouped Frequency Distribution

The mean for an ungrouped frequency distribution is given by

$$\bar{X} = \frac{\sum (f \cdot X)}{n}.$$

Here $f$ is the frequency for the corresponding value of $X$, and $n = \sum f$.

The Sample Mean for an Ungrouped Frequency Distribution - Example

The scores of 25 students on a quiz are given in the table. Find the mean score.

Score, X   Frequency, f   f·X
0          2              0
1          4              4
2          12             24
3          4              12
4          3              12

$$\bar{X} = \frac{\sum f \cdot X}{n} = \frac{52}{25} = 2.08.$$
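The formula $\bar{X} = \sum (f \cdot X) / n$ translates directly into code. A minimal sketch, assuming the table is given as (value, frequency) pairs; the function name `mean_from_frequencies` is hypothetical.

```python
def mean_from_frequencies(table):
    # table: list of (value X, frequency f); mean = sum(f * X) / sum(f)
    n = sum(f for _, f in table)
    return sum(f * x for x, f in table) / n

quiz = [(0, 2), (1, 4), (2, 12), (3, 4), (4, 3)]
print(mean_from_frequencies(quiz))  # 2.08
```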

The Sample Mean for a Grouped Frequency Distribution

The mean for a grouped frequency distribution is given by

$$\bar{X} = \frac{\sum (f \cdot X_m)}{n}.$$

Here $X_m$ is the corresponding class midpoint.

The Sample Mean for a Grouped Frequency Distribution - Example

Class          Frequency, f   Midpoint, X_m   f·X_m
15.5 - 20.5    3              18              54
20.5 - 25.5    5              23              115
25.5 - 30.5    4              28              112
30.5 - 35.5    3              33              99
35.5 - 40.5    2              38              76

$$\sum f \cdot X_m = 54 + 115 + 112 + 99 + 76 = 456$$

and $n = 17$. So

$$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{456}{17} \approx 26.82.$$
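For grouped data the class midpoint $X_m$ stands in for every value in the class. A minimal sketch, assuming each class is given by its lower boundary, upper boundary, and frequency; the function name `grouped_mean` is hypothetical.

```python
def grouped_mean(classes):
    # classes: list of (lower boundary, upper boundary, frequency)
    n = sum(f for _, _, f in classes)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f * Xm
    return total / n

classes = [(15.5, 20.5, 3), (20.5, 25.5, 5), (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2)]
print(round(grouped_mean(classes), 2))  # 26.82
```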

The Median

When a data set is ordered, it is called a data array.

The median is defined to be the midpoint of the data array.

The symbol used to denote the median is MD.

The Median - Example

The weights (in pounds) of seven army recruits are 180, 201, 220, 191, 219, 209, and 186. Find the median.

Arrange the data in order and select the middle point.

Data array: 180, 186, 191, 201, 209, 219, 220.

The median, MD = 201.

The Median

The median will be one of the data values if there is an odd number of values in the data set. In this case it is easy to select the middle number in the data array.

When there is an even number of values in the data set, the median is obtained by taking the average of the two middle numbers.

The Median - Example

Six customers purchased the following number of magazines: 1, 7, 3, 2, 3, 4. Find the median.

Arrange the data in order and compute the middle point.

Data array: 1, 2, 3, 3, 4, 7.

The median, MD = (3 + 3)/2 = 3.

The Median - Example

The ages of 10 college students are: 18, 24, 20, 35, 19, 23, 26, 23, 19, 20. Find the median.

Arrange the data in order and compute the middle point.

Data array: 18, 19, 19, 20, 20, 23, 23, 24, 26, 35.

The median, MD = (20 + 23)/2 = 21.5.
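The odd and even cases differ only in whether one or two middle values are used. A minimal sketch that reproduces the three examples; the function name `median` is hypothetical (Python's `statistics.median` follows the same rule).

```python
def median(values):
    data = sorted(values)                    # the data array
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # odd n: the middle value
    return (data[mid - 1] + data[mid]) / 2   # even n: average of the two middle values

print(median([180, 201, 220, 191, 219, 209, 186]))        # 201
print(median([1, 7, 3, 2, 3, 4]))                         # 3.0
print(median([18, 24, 20, 35, 19, 23, 26, 23, 19, 20]))   # 21.5
```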

The Median - Ungrouped Frequency Distribution

For an ungrouped frequency distribution, find the median by examining the cumulative frequencies to locate the middle value.

If n is the sample size, compute n/2. Locate the data point where n/2 values fall below and n/2 values fall above.

The Median - Ungrouped Frequency Distribution - Example

The number of VCRs sold per week over a one-year period was recorded. The data are given below.

No. of Sets Sold   Frequency   Cumulative Frequency
1                  4           4
2                  9           13
3                  6           19
4                  2           21
5                  3           24

Compute n/2 = 24/2 = 12.

Locate the point where 12 values would fall below and 12 values would fall above.

Consider the cumulative distribution. Class 2 contains the 5th through the 13th values, so the 12th and 13th values both fall in class 2.

Hence MD = 2.
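Locating the middle position through the cumulative frequencies can be sketched as follows. A minimal sketch, assuming the table is given as (value, frequency) pairs sorted by value; the helper names `value_at` and `median_from_frequencies` are hypothetical.

```python
def value_at(table, position):
    # Walk the cumulative frequencies until they reach the requested 1-based position.
    cumulative = 0
    for value, f in table:
        cumulative += f
        if cumulative >= position:
            return value
    raise ValueError("position exceeds n")

def median_from_frequencies(table):
    n = sum(f for _, f in table)
    if n % 2 == 1:
        return value_at(table, (n + 1) // 2)
    # Even n: average the values at positions n/2 and n/2 + 1.
    return (value_at(table, n // 2) + value_at(table, n // 2 + 1)) / 2

vcrs = [(1, 4), (2, 9), (3, 6), (4, 2), (5, 3)]
print(median_from_frequencies(vcrs))  # 2.0
```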

The Median for a Grouped Frequency Distribution

$$MD = \frac{(n/2) - cf}{f}(w) + L_m$$

where

$n$ = sum of the frequencies
$cf$ = cumulative frequency of the class immediately preceding the median class
$f$ = frequency of the median class
$w$ = width of the median class
$L_m$ = lower boundary of the median class

The Median for a Grouped Frequency Distribution - Example

Class          Frequency, f   Cumulative Frequency
15.5 - 20.5    3              3
20.5 - 25.5    5              8
25.5 - 30.5    4              12
30.5 - 35.5    3              15
35.5 - 40.5    2              17

Compute n/2 = 17/2 = 8.5, which rounds up to 9.

Find the class that contains the 9th value. This will be the median class.

Consider the cumulative distribution. The median class will then be 25.5 - 30.5.

Here $n = 17$, $cf = 8$, $f = 4$, $w = 30.5 - 25.5 = 5$, and $L_m = 25.5$, so

$$MD = \frac{(n/2) - cf}{f}(w) + L_m = \frac{(17/2) - 8}{4}(5) + 25.5 = 26.125.$$
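The interpolation formula can be sketched by scanning the cumulative frequencies for the median class. This sketch takes the median class to be the first class whose cumulative frequency reaches n/2, which for this example is the same class the slides find by rounding 8.5 up to the 9th value. It assumes classes are (lower boundary, upper boundary, frequency) in increasing order; the function name `grouped_median` is hypothetical.

```python
def grouped_median(classes):
    # classes: (lower boundary, upper boundary, frequency) in increasing order
    n = sum(f for _, _, f in classes)
    cf = 0  # cumulative frequency of the classes before the median class
    for lower, upper, f in classes:
        if cf + f >= n / 2:
            w = upper - lower
            return ((n / 2 - cf) / f) * w + lower   # MD = ((n/2) - cf)/f * w + Lm
        cf += f
    raise ValueError("empty distribution")

classes = [(15.5, 20.5, 3), (20.5, 25.5, 5), (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2)]
print(grouped_median(classes))  # 26.125
```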

The Mode

The mode is defined to be the value that occurs most often in a data set.

A data set can have more than one mode.

A data set is said to have no mode if all values occur with equal frequency.

The Mode - Examples

The following data represent the duration (in days) of U.S. space shuttle voyages for the years 1992-94. Find the mode.

Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11.

Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14. Mode = 8.

The Mode - Examples

Six strains of bacteria were tested to see how long they could remain alive outside their normal environment. The time, in minutes, is given below. Find the mode.

Data set: 2, 3, 5, 7, 8, 10.

There is no mode, since each data value occurs with the same frequency (one).

The Mode - Examples

Eleven different automobiles were tested at a speed of 15 mph for stopping distances. The distance, in feet, is given below. Find the mode.

Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26.

There are two modes (bimodal): 18 and 24. Each of these values occurs three times, more often than any other value.
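The three mode cases (one mode, no mode, several modes) can be handled with `collections.Counter`. A minimal sketch; the function name `modes` and the convention of returning an empty list for "no mode" are assumptions for illustration.

```python
from collections import Counter

def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]))  # [8]
print(modes([2, 3, 5, 7, 8, 10]))                                         # []
print(modes([15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]))                # [18, 24]
```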

The Mode for an Ungrouped Frequency Distribution - Example

Values   Frequency, f
15       3
20       5
25       8
30       3
35       2

Mode = 25, the value with the largest frequency (8).

The Mode - Grouped Frequency Distribution

The mode for grouped data is the modal class.

The modal class is the class with the largest frequency.

Sometimes the midpoint of the class is used rather than the boundaries.

The Mode for a Grouped Frequency Distribution - Example

Class          Frequency, f
15.5 - 20.5    3
20.5 - 25.5    5
25.5 - 30.5    7
30.5 - 35.5    3
35.5 - 40.5    2

Modal class: 25.5 - 30.5, the class with the largest frequency (7). Its midpoint, 28, is sometimes reported as the mode.

The Midrange

The midrange is found by adding the lowest and highest values in the data set and dividing by 2.

The midrange is a rough estimate of the middle value of the data.

The symbol that is used to represent the midrange is MR.

The Midrange - Example

A city in Minnesota reported the following number of water-line breaks per month: 2, 3, 6, 8, 4, 1. Find the midrange.

MR = (1 + 8)/2 = 4.5.

Note: Extreme values influence the midrange, so it may not be a typical description of the middle.

The Weighted Mean

The weighted mean is used when the values in a data set are not all equally represented.

The weighted mean of a variable X is found by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights.

$$\bar{X} = \frac{w_1X_1 + w_2X_2 + \cdots + w_nX_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wX}{\sum w}$$

where $w_1, w_2, \ldots, w_n$ are the weights and $X_1, X_2, \ldots, X_n$ are the values.
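A minimal weighted-mean sketch. The grade points and credit-hour weights below are hypothetical, chosen only to illustrate the formula; the function name `weighted_mean` is also hypothetical. The second call shows that the mean of an ungrouped frequency distribution is a weighted mean with the frequencies as weights.

```python
def weighted_mean(values, weights):
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # Sum of (weight * value) divided by the sum of the weights.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 2]   # hypothetical course grades
credits = [3, 3, 4]        # hypothetical weights (credit hours)
print(weighted_mean(grade_points, credits))  # 2.9

# Frequencies as weights reproduce the quiz-score mean from the ungrouped example.
print(weighted_mean([0, 1, 2, 3, 4], [2, 4, 12, 4, 3]))  # 2.08
```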

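Distribution Shapes

Frequency distributions can assume many shapes.

The three most important shapes are positively skewed, symmetrical, and negatively skewed.

Positively Skewed

(Figure: a positively skewed distribution; the long tail is on the right.)

Mode < Median < Mean

Symmetrical

(Figure: a symmetrical distribution.)

Mean = Median = Mode

Negatively Skewed

(Figure: a negatively skewed distribution; the long tail is on the left.)

Mean < Median < Mode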

Measures of Variation - Range

The range is defined to be the highest value minus the lowest value. The symbol R is used for the range.

R = highest value - lowest value.

Extremely large or extremely small data values can drastically affect the range.

Measures of Variation - Population Variance

The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is $\sigma^2$ ($\sigma$ is the Greek lowercase letter sigma).

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N},$$

where

$X$ = individual value
$\mu$ = population mean
$N$ = population size

Measures of Variation - Population Standard Deviation

The standard deviation is the square root of the variance.

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}.$$

Measures of Variation - Example

Consider the following data to be the population: 10, 60, 50, 30, 40, 20. Find the mean and variance.

The mean is $\mu = (10 + 60 + 50 + 30 + 40 + 20)/6 = 210/6 = 35$.

The variance is $\sigma^2 = 1750/6 = 291.67$. The computations are shown below.

X      X - μ    (X - μ)^2
10     -25      625
60     +25      625
50     +15      225
30     -5       25
40     +5       25
20     -15      225
Sum:   210      0        1750
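The population variance divides the sum of squared deviations by N. A minimal sketch reproducing the example; the function name `population_variance` is hypothetical, and Python's standard `statistics.pvariance` is printed as a cross-check.

```python
import math
import statistics

def population_variance(values):
    mu = sum(values) / len(values)
    # Average of the squared distances from the population mean (divide by N).
    return sum((x - mu) ** 2 for x in values) / len(values)

population = [10, 60, 50, 30, 40, 20]
var = population_variance(population)
print(round(var, 2), round(math.sqrt(var), 2))   # 291.67 17.08
print(statistics.pvariance(population))          # standard-library cross-check
```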

Measures of Variation - Sample Variance

The unbiased estimator of the population variance, or the sample variance, is a statistic used to estimate the population variance; with the divisor n - 1, its expected value equals the population variance. It is denoted by $s^2$, where

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1},$$

and

$\bar{X}$ = sample mean
$n$ = sample size

Measures of Variation - Sample Standard Deviation

The sample standard deviation, s, is the square root of the sample variance.

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}.$$

Shortcut Formula for the Sample Variance and the Standard Deviation

$$s^2 = \frac{\sum X^2 - \left(\sum X\right)^2 / n}{n - 1}$$

$$s = \sqrt{\frac{\sum X^2 - \left(\sum X\right)^2 / n}{n - 1}}$$

Sample Variance - Example

Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14.

$\sum X = 16 + 19 + 15 + 15 + 14 = 79$.

$\sum X^2 = 16^2 + 19^2 + 15^2 + 15^2 + 14^2 = 1263$.

$$s^2 = \frac{\sum X^2 - \left(\sum X\right)^2 / n}{n - 1} = \frac{1263 - (79)^2/5}{4} = 3.7$$

$$s = \sqrt{3.7} \approx 1.9.$$
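The definitional formula and the shortcut formula give the same sample variance. A minimal sketch comparing both on the example, with Python's `statistics.variance` as a cross-check; the helper names are hypothetical. The shortcut is convenient by hand but can lose precision in floating point when the values are large relative to their spread.

```python
import math
import statistics

def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)   # definition, divisor n - 1

def sample_variance_shortcut(values):
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)             # shortcut formula

sample = [16, 19, 15, 15, 14]
print(round(sample_variance(sample), 2), round(sample_variance_shortcut(sample), 2))  # 3.7 3.7
print(round(math.sqrt(sample_variance(sample)), 1))                                  # 1.9
print(statistics.variance(sample))   # standard-library cross-check
```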

Sample Variance for Grouped and Ungrouped Data

For grouped data, use the class midpoints for the observed values in the different classes:

$$s^2 = \frac{\sum f X_m^2 - \left[\left(\sum f X_m\right)^2 / n\right]}{n - 1}.$$

For ungrouped data, use the same formula with the class midpoints, $X_m$, replaced with the actual observed $X$ values.

Sample Variance for Ungrouped Data - Example

X    f    f·X    f·X^2
5    2    10     50
6    3    18     108
7    8    56     392
8    1    8      64
9    6    54     486
10   4    40     400

$n = \sum f = 24$, $\sum f X = 186$, $\sum f X^2 = 1500$.

$$s^2 = \frac{\sum f X^2 - \left[\left(\sum f X\right)^2 / n\right]}{n - 1} = \frac{1500 - \left[(186)^2 / 24\right]}{23} \approx 2.54$$

$$s = \sqrt{2.54} \approx 1.6.$$
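The frequency-table version of the shortcut formula can be sketched as below. It works for grouped data as well if each row holds a class midpoint instead of an observed value. The function name `variance_from_frequencies` is hypothetical.

```python
import math

def variance_from_frequencies(table):
    # table: (X or class midpoint Xm, frequency f)
    n = sum(f for _, f in table)
    sum_fx = sum(f * x for x, f in table)
    sum_fx2 = sum(f * x * x for x, f in table)
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

table = [(5, 2), (6, 3), (7, 8), (8, 1), (9, 6), (10, 4)]
s2 = variance_from_frequencies(table)
print(round(s2, 2), round(math.sqrt(s2), 1))  # 2.54 1.6
```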

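Coefficient of Variation

The coefficient of variation is defined to be the standard deviation divided by the mean. The result is expressed as a percentage.

$$CVar = \frac{s}{\bar{X}} \cdot 100\% \quad \text{or} \quad CVar = \frac{\sigma}{\mu} \cdot 100\%.$$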
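A minimal sketch of the sample version of the coefficient of variation, applied (as an illustration only) to the sample 16, 19, 15, 15, 14 from the sample variance example. It uses Python's `statistics.stdev` and `statistics.mean`; the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    # Sample version: s / X-bar * 100%
    return statistics.stdev(values) / statistics.mean(values) * 100

print(round(coefficient_of_variation([16, 19, 15, 15, 14]), 1))  # 12.2
```

Because the result is a percentage, it can compare the spread of variables measured in different units.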

Chebyshev's Theorem

The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - 1/k^2$, where k is any number greater than 1.

For k = 2, at least 75% of the values will lie within 2 standard deviations of the mean. For k = 3, at least approximately 89% will lie within 3 standard deviations.
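Chebyshev's bound holds for any data set, whatever its shape. A minimal sketch that computes the bound and checks it on the space shuttle data from the mode examples (an illustrative choice); the helper names are hypothetical.

```python
import statistics

def chebyshev_bound(k):
    # Minimum proportion of values within k standard deviations of the mean (k > 1).
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def proportion_within(values, k):
    # Observed proportion of values within k population standard deviations of the mean.
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)

print(chebyshev_bound(2), round(chebyshev_bound(3), 3))  # 0.75 0.889
shuttle_days = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
print(proportion_within(shuttle_days, 2) >= chebyshev_bound(2))  # True
```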

The Empirical (Normal) Rule

For a bell-shaped (normal) distribution:

Approximately 68% of the data values will fall within one standard deviation of the mean.

Approximately 95% will fall within two standard deviations of the mean.

Approximately 99.7% will fall within three standard deviations of the mean.

(Figure: normal curve with the 68%, 95%, and 99.7% regions marked on an axis running from μ - 3σ to μ + 3σ.)
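The empirical rule's percentages are rounded standard normal probabilities. A minimal sketch using `math.erf` and the identity P(|Z| ≤ k) = erf(k/√2) for a standard normal variable Z; unlike Chebyshev's bound, these figures apply only to bell-shaped data. The function name `normal_within` is hypothetical.

```python
import math

def normal_within(k):
    # P(|Z| <= k) for a standard normal variable Z, via the error function.
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```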

Measures of Position - z-score

A z-score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation.

The symbol z is used for the z-score.

The z-score represents the number of standard deviations a data value falls above or below the mean.

For samples:

$$z = \frac{X - \bar{X}}{s}.$$

For populations:

$$z = \frac{X - \mu}{\sigma}.$$

z-score - Example

A student scored 65 on an exam that had a mean of 50 and a standard deviation of 10. Compute the z-score.

z = (65 - 50)/10 = 1.5.

That is, the score of 65 is 1.5 standard deviations above the mean. It is above the mean because the z-score is positive.
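A minimal z-score sketch; the function name `z_score` is hypothetical. The same function serves for samples ($\bar{X}$, s) and populations ($\mu$, $\sigma$), since only the inputs differ.

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean.
    return (x - mean) / sd

print(z_score(65, 50, 10))  # 1.5
```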

Measures of Position - Percentiles

Percentiles divide the data set into 100 equal groups.

The $P_k$ percentile is defined to be that numerical value such that at most k% of the values are smaller than $P_k$ and at most (100 - k)% are larger than $P_k$ in an ordered data set.

Percentile Formula

The percentile corresponding to a given value (X) is computed by using the formula:

$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

Percentiles - Example

Find the percentile rank of a score of 12.

Scores: 18, 15, 12, 6, 8, 2, 3, 5, 20, 10.

Ordered set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20.

Six values lie below 12, so Percentile = [(6 + 0.5)/10](100%) = 65th percentile. The student did better than 65% of the class.
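The percentile-rank formula can be written as a one-line count. A minimal sketch; the function name `percentile_rank` is hypothetical, and multiplying before dividing keeps the arithmetic exact for this example.

```python
def percentile_rank(data, x):
    # (number of values below x + 0.5) / (total number of values) * 100
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # 65.0
```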

Percentiles - Finding the Value Corresponding to a Given Percentile

Let p be the percentile and n the sample size.

Step 1: Arrange the data in order.

Step 2: Compute c = (np)/100.

Step 3: If c is not a whole number, round up to the next whole number. If c is a whole number, use the value halfway between the cth and (c+1)th values.

Step 4: Starting at the lowest value, count over to the position found in Step 3 to locate the required percentile.

Example: Find the value of the 25th percentile for the following data set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20.

Note: the data set is already ordered.

n = 10, p = 25, so c = (10 · 25)/100 = 2.5. Hence round up to c = 3.

Thus, the value of the 25th percentile is the 3rd value, X = 5.

Find the 80th percentile.

c = (10 · 80)/100 = 8. Thus the value of the 80th percentile is the average of the 8th and 9th values. Thus, the 80th percentile for the data set is (15 + 18)/2 = 16.5.
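The four-step procedure can be sketched with integer arithmetic, so that "c is a whole number" is tested exactly. A minimal sketch assuming a whole-number percentile with 0 < p < 100; the function name `percentile_value` is hypothetical. Other software (for example spreadsheet or NumPy percentile functions) may use different interpolation rules and give slightly different values.

```python
def percentile_value(data, p):
    # Assumes a whole-number percentile 0 < p < 100.
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)                          # Step 1
    c, remainder = divmod(len(ordered) * p, 100)    # Step 2: c = n*p/100
    if remainder:
        # Step 3: c is not whole, so round up to c + 1; Step 4: count from the lowest value.
        return ordered[c]
    # c is whole: halfway between the cth and (c+1)th values (1-based positions).
    return (ordered[c - 1] + ordered[c]) / 2

data = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
print(percentile_value(data, 25))  # 5
print(percentile_value(data, 80))  # 16.5
```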

Special Percentiles - Deciles and Quartiles

Deciles divide the data set into 10 groups.

Deciles are denoted by D1, D2, ..., D9 with the corresponding percentiles being P10, P20, ..., P90.

Quartiles divide the data set into 4 groups.

Quartiles are denoted by Q1, Q2, and Q3 with the corresponding percentiles being P25, P50, and P75.

The median is the same as P50 or Q2.

Outliers and the Interquartile Range (IQR)

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

The interquartile range is IQR = Q3 - Q1.

To determine whether a data value can be considered as an outlier:

Step 1: Compute Q1 and Q3.

Step 2: Find the IQR = Q3 - Q1.

Step 3: Compute (1.5)(IQR).

Step 4: Compute Q1 - (1.5)(IQR) and Q3 + (1.5)(IQR).

Step 5: Compare the data value (say X) with Q1 - (1.5)(IQR) and Q3 + (1.5)(IQR). If X < Q1 - (1.5)(IQR) or if X > Q3 + (1.5)(IQR), then X is considered an outlier.

Outliers and the Interquartile Range (IQR) - Example

Given the data set 5, 6, 12, 13, 15, 18, 22, 50, can the value of 50 be considered as an outlier?

Q1 = 9, Q3 = 20, IQR = 11. (Verify with the percentile procedure: for p = 25, c = 2, so Q1 = (6 + 12)/2 = 9; for p = 75, c = 6, so Q3 = (18 + 22)/2 = 20.)

(1.5)(IQR) = (1.5)(11) = 16.5.

9 - 16.5 = -7.5 and 20 + 16.5 = 36.5.

The value of 50 is outside the range -7.5 to 36.5, hence 50 is an outlier.
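The five-step outlier check can be sketched by combining the percentile procedure with the 1.5 × IQR fences. A self-contained sketch assuming quartiles are found with this section's percentile procedure (other quartile conventions can move the fences slightly); the helper names are hypothetical.

```python
def percentile_value(data, p):
    # Percentile procedure from this section, for whole-number p with 0 < p < 100.
    ordered = sorted(data)
    c, remainder = divmod(len(ordered) * p, 100)
    if remainder:
        return ordered[c]                          # c rounded up, counted from 1
    return (ordered[c - 1] + ordered[c]) / 2       # halfway between cth and (c+1)th

def outlier_fences(data):
    q1, q3 = percentile_value(data, 25), percentile_value(data, 75)   # Step 1
    step = 1.5 * (q3 - q1)                                            # Steps 2-3
    return q1 - step, q3 + step                                       # Step 4

def outliers(data):
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]                   # Step 5

data = [5, 6, 12, 13, 15, 18, 22, 50]
print(outlier_fences(data))  # (-7.5, 36.5)
print(outliers(data))        # [50]
```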

Exploratory Data Analysis - Stem and Leaf Plot

A stem and leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes.

Exploratory Data Analysis - Stem and Leaf Plot - Example

A sample of 20 days showed the following number of cardiograms done each day:

25, 31, 20, 32, 13, 14, 43, 02, 57, 23, 36, 32, 33, 32, 44, 32, 52, 44, 51, 45.

Construct a stem and leaf plot for the data.

Stem | Leaves
0 | 2
1 | 3 4
2 | 0 3 5
3 | 1 2 2 2 2 3 6
4 | 3 4 4 5
5 | 1 2 7
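For two-digit data, the tens digit is the stem and the units digit is the leaf. A minimal sketch that rebuilds the plot above; the function name `stem_and_leaf` is hypothetical, and the choice to print empty stems (so gaps stay visible) is an assumption.

```python
from collections import defaultdict

def stem_and_leaf(values):
    # Two-digit data: stem = tens digit, leaf = units digit.
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    lines = []
    for stem in range(min(plot), max(plot) + 1):   # keep empty stems visible
        lines.append((f"{stem} | " + " ".join(str(leaf) for leaf in plot[stem])).rstrip())
    return lines

cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23, 36, 32, 33, 32, 44, 32, 52, 44, 51, 45]
print("\n".join(stem_and_leaf(cardiograms)))
```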

Exploratory Data Analysis - Box Plot

When the data set contains a small number of values, a box plot is used to graphically represent the data set.

These plots involve five values: the minimum value, the lower hinge, the median, the upper hinge, and the maximum value.

The lower hinge is the median of all values less than or equal to the median when the data set has an odd number of values, or the median of all values less than the median when the data set has an even number of values. The symbol for the lower hinge is LH.

The upper hinge is the median of all values greater than or equal to the median when the data set has an odd number of values, or the median of all values greater than the median when the data set has an even number of values. The symbol for the upper hinge is UH.

Exploratory Data Analysis - Box Plot - Example (Cardiograms data)

(Figure: box plot of the cardiograms data drawn over a scale from 0 to 60, marking the minimum, LH, median, UH, and maximum.)
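The five values behind the box plot can be computed from the ordered data. A minimal sketch for the cardiograms data. It assumes that, for an even number of values, "values less than (or greater than) the median" means the lower (or upper) half of the ordered data by position; with the tied 32s in this data set, a strictly value-based reading would give different hinges. The helper names are hypothetical.

```python
def median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # odd n: each half includes the median; even n: exactly n/2 values
    lower, upper = data[:half], data[n - half:]
    # minimum, lower hinge LH, median MD, upper hinge UH, maximum
    return data[0], median(lower), median(data), median(upper), data[-1]

cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23, 36, 32, 33, 32, 44, 32, 52, 44, 51, 45]
print(five_number_summary(cardiograms))
```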

Information Obtained from a Box Plot

If the median is near the center of the box, the distribution is approximately symmetric.

If the median falls to the left of the center of the box, the distribution is positively skewed.

If the median falls to the right of the center of the box, the distribution is negatively skewed.

If the lines (whiskers) are about the same length, the distribution is approximately symmetric.

If the right line is longer than the left line, the distribution is positively skewed.

If the left line is longer than the right line, the distribution is negatively skewed.
