
**Measures of Relative Location and Detecting Outliers**

Exploratory Data Analysis

Slide 1

**Measures of Relative Location and Detecting Outliers**

z-Scores

Chebyshev’s Theorem

Empirical Rule

Detecting Outliers

Slide 2

z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

A data value less than the sample mean will have a z-score less than zero.

A data value greater than the sample mean will have a z-score greater than zero.

A data value equal to the sample mean will have a z-score of zero.
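As a minimal sketch of the definition above, the z-score can be computed with a small helper. The function name `z_score` is an illustrative choice, and the mean and standard deviation are the values quoted for the apartment rent example in these slides.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Sample statistics quoted for the apartment rent example.
mean_rent = 490.80
std_rent = 54.74

# A value below the mean, the mean itself, and a value above the mean.
for rent in (425, 490.80, 615):
    print(rent, round(z_score(rent, mean_rent, std_rent), 2))
```

The sign of each result shows on which side of the mean the value lies, and the magnitude shows how far away it is in units of s.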

Slide 3

Raw data on Apartment Rents

Mean = 490.8

Standard deviation = 54.74

425 440 450 465 480 510 575

430 440 450 470 485 515 575

430 440 450 470 490 525 580

435 445 450 472 490 525 590

435 445 450 475 490 525 600

435 445 460 475 500 535 600

435 445 460 475 500 549 600

435 445 460 480 500 550 600

440 450 465 480 500 570 615

440 450 465 480 510 570 615

Slide 4

**Example: Apartment Rents**

z-Score of Smallest Value (425)

$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents

-1.20 -0.93 -0.75 -0.47 -0.20 0.35 1.54

-1.11 -0.93 -0.75 -0.38 -0.11 0.44 1.54

-1.11 -0.93 -0.75 -0.38 -0.01 0.62 1.63

-1.02 -0.84 -0.75 -0.34 -0.01 0.62 1.81

-1.02 -0.84 -0.75 -0.29 -0.01 0.62 1.99

-1.02 -0.84 -0.56 -0.29 0.17 0.81 1.99

-1.02 -0.84 -0.56 -0.29 0.17 1.06 1.99

-1.02 -0.84 -0.56 -0.20 0.17 1.08 1.99

-0.93 -0.75 -0.47 -0.20 0.17 1.45 2.27

-0.93 -0.75 -0.47 -0.20 0.35 1.45 2.27

Slide 5

Chebyshev’s Theorem

At least $(1 - 1/k^2)$ of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.

• At least 75% of the items must be within k = 2 standard deviations of the mean.

• At least 89% of the items must be within k = 3 standard deviations of the mean.

• At least 94% of the items must be within k = 4 standard deviations of the mean.
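The percentages listed above follow directly from the bound 1 - 1/k². A short sketch, with `chebyshev_bound` as an illustrative helper name:

```python
def chebyshev_bound(k):
    """Minimum fraction of items within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


# Reproduce the guaranteed minimum percentages for k = 2, 3, 4.
for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_bound(k):.0%}")
```

Because the bound holds for any data set, the true share of items within k standard deviations is often considerably higher.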

Slide 6

**Example: Apartment Rents**

Chebyshev’s Theorem

Let k = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.

At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between

$\bar{x} - k(s) = 490.80 - 1.5(54.74) = 409$

and

$\bar{x} + k(s) = 490.80 + 1.5(54.74) = 573$

Slide 7

**Example: Apartment Rents**

**Chebyshev’s Theorem (continued)**

Actually, 86% of the rent values are between 409 and 573.
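The observed share can be compared with the guaranteed minimum by counting the rents inside the interval. This is a sketch only: it uses the 70 rent values from the raw data slide and the mean and standard deviation quoted there, and the variable names are illustrative.

```python
# The 70 apartment rents from the raw data slide.
rents = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]

mean_rent, std_rent, k = 490.80, 54.74, 1.5

# Interval of k standard deviations around the mean.
low = mean_rent - k * std_rent
high = mean_rent + k * std_rent

inside = [r for r in rents if low <= r <= high]
share = len(inside) / len(rents)
guaranteed = 1 - 1 / k**2

print(f"interval: {low:.0f} to {high:.0f}")
print(f"observed share: {share:.0%}, guaranteed minimum: {guaranteed:.0%}")
```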


Slide 8

Empirical Rule

For data having a bell-shaped distribution:

• Approximately 68% of the data values will be within one standard deviation of the mean.

• Approximately 95% of the data values will be within two standard deviations of the mean.

• Almost all (99.7%) of the items will be within three standard deviations of the mean.
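These three percentages match the coverage of an exactly normal distribution, which is one common model for "bell-shaped" data. As a sketch under that assumption, the coverage within k standard deviations can be computed with the error function from Python's standard library (`normal_coverage` is an illustrative name):

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k std devs of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_coverage(k):.1%}")
```

Real data are only approximately bell-shaped, so observed percentages will usually differ somewhat from these values.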

Slide 9

**Example: Apartment Rents**

Empirical Rule

| Range | Interval | % in Interval |
|---|---|---|
| Within +/- 1s | 436.06 to 545.54 | 48/70 = 69% |
| Within +/- 2s | 381.32 to 600.28 | 68/70 = 97% |
| Within +/- 3s | 326.58 to 655.02 | 70/70 = 100% |
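The interval counts for the rent data can be checked with a short script. This sketch uses the 70 rents from the raw data slide and the quoted mean and standard deviation. The helper `count_within` is an illustrative name.

```python
# The 70 apartment rents from the raw data slide.
rents = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]

mean_rent, std_rent = 490.80, 54.74


def count_within(values, center, spread, k):
    """Count the values lying within k * spread of center (inclusive)."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= v <= high for v in values)


counts = {k: count_within(rents, mean_rent, std_rent, k) for k in (1, 2, 3)}
for k, count in counts.items():
    print(f"within +/- {k}s: {count}/{len(rents)} = {count / len(rents):.0%}")
```

The observed shares sit close to the 68%, 95%, and 99.7% suggested by the empirical rule.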

Slide 10

Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

It might be an incorrectly recorded data value.

It might be a data value that was incorrectly included in the data set.

It might be a correctly recorded data value that belongs in the data set!

Slide 11

**Example: Apartment Rents**

Detecting Outliers

The most extreme z-scores are -1.20 and 2.27.

Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
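A sketch of this check in Python, using the standard `statistics` module to recompute the sample mean and sample standard deviation from the 70 rents on the raw data slide (variable names are illustrative):

```python
import statistics

# The 70 apartment rents from the raw data slide.
rents = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]

mean_rent = statistics.mean(rents)
std_rent = statistics.stdev(rents)  # sample standard deviation (n - 1 divisor)

# Standardize every rent, then flag values with |z| > 3.
z_scores = [(r - mean_rent) / std_rent for r in rents]
outliers = [r for r, z in zip(rents, z_scores) if abs(z) > 3]

print(f"most extreme z-scores: {min(z_scores):.2f} and {max(z_scores):.2f}")
print("outliers:", outliers)
```

A flagged value is only a candidate for review. As noted on the previous slide, it may still be a correctly recorded value that belongs in the data set.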


Slide 12

**Exploratory Data Analysis**

Five-Number Summary

Box Plot

Slide 13

**Five-Number Summary**

Smallest Value

First Quartile

Median

Third Quartile

Largest Value

Slide 14

**Example: Apartment Rents**

Five-Number Summary

Lowest Value = 425

First Quartile = 450

Median = 475

Third Quartile = 525

Largest Value = 615


Slide 15

Box Plot

A box is drawn with its ends located at the first and third quartiles.

A vertical line is drawn in the box at the location of the median.

Limits are located (not drawn) using the interquartile range (IQR).

• The lower limit is located 1.5(IQR) below Q1.

• The upper limit is located 1.5(IQR) above Q3.

• Data outside these limits are considered outliers.


Slide 16

**Box Plot (Continued)**

Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

The location of each outlier is shown with the symbol *.

Slide 17

**Example: Apartment Rents**

Box Plot

Lower Limit: Q1 - 1.5(IQR) = 450 - 1.5(75) = 337.5

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(75) = 637.5

There are no outliers.
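The limit arithmetic can be sketched as follows. Quartile conventions differ between textbooks and software, so this sketch takes Q1 = 450 and Q3 = 525 from the five-number summary as given inputs rather than recomputing them. The rents are the 70 values from the raw data slide, and the variable names are illustrative.

```python
# The 70 apartment rents from the raw data slide.
rents = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]

# Quartiles as stated in the five-number summary (assumed inputs).
q1, q3 = 450, 525
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are treated as outliers.
outliers = [r for r in rents if r < lower_limit or r > upper_limit]

# Whiskers extend to the most extreme values still inside the limits.
inside = [r for r in rents if lower_limit <= r <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(f"IQR = {iqr}, limits = {lower_limit} to {upper_limit}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```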

[Figure: box plot of the apartment rents on a horizontal axis running from 375 to 625 in steps of 25]

Slide 18
