Uploaded by Tian Ze

Descriptive Statistics

Part 2: Descriptive Measures

Learning Objectives

Measures of centre

Measures of dispersion (spread)

Standardized (Z) scores

Identifying potential outliers

Box & whisker plots (boxplots)

Goal 2 of Descriptive Statistics

In part 1, we looked at grouping and graphing data. Methods depended on data type.

In part 2, we look at summary statistics for data:

Measures of centre

Measures of spread

Measures of Centre

Most common measure of centre is the

MEAN (average).

Mean =

(sum of numbers)/(# of numbers)

Sigma Notation

Mathematical short-hand notation for writing long sums.

If you have, say, n data points, called $x_1, x_2, x_3, \ldots, x_n$

Then:

$\sum x_i = x_1 + x_2 + \cdots + x_n$

Suppose you have 4 data points (n = 4):

5, 9, 13.5, 18.2

Then:

$x_1 = 5,\ x_2 = 9,\ x_3 = 13.5,\ x_4 = 18.2$

$\sum x_i = x_1 + x_2 + x_3 + x_4 = 5 + 9 + 13.5 + 18.2 = 45.7$

Sigma Notation

In other words, sigma notation just means add them up.

Words:

mean = (sum of numbers)/(# of numbers)

Sigma Notation:

$\bar{x} = \frac{\sum x_i}{n}$
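The sum-then-divide recipe can be sketched in a few lines of Python. This is only an illustration using the four data points from the earlier example; the variable names are not from the slides.

```python
# Data points from the sigma notation example.
data = [5, 9, 13.5, 18.2]

total = sum(data)   # sigma notation: x1 + x2 + ... + xn
n = len(data)       # number of data points
mean = total / n    # mean = (sum of numbers) / (# of numbers)

print(total, mean)
```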

Median

The value that separates the top and

bottom halves of ORDERED data.

To find the median ($\tilde{x}$):

Arrange data into increasing order.

Odd # of data points: Take middle number.

Even # of data points: Take average of the

two middle numbers.

Median: Example

Suppose you have the following data:

4, 6, 0, 3.1, 5.5, 7, 4

Step 1: Arrange.

0, 3.1, 4, 4, 5.5, 6, 7

Step 2: Cross off numbers from the left and right simultaneously until you get down to only 1 or 2 numbers.

$\tilde{x} = 4$

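Median: Example

Suppose you now have this data:

5, 1, 8, 9, 0, 7, 2, 1

Step 1: Arrange.

0, 1, 1, 2, 5, 7, 8, 9

Step 2: There is an even number of data points, so take the average of the two middle numbers, 2 and 5.

$\tilde{x} = (2 + 5)/2 = 3.5$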
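The procedure for the median (arrange, then take the middle number or the average of the two middle numbers) might be sketched as follows. The function name `median` is an illustrative choice, and the two calls use the data sets from the median examples.

```python
def median(values):
    """Median by the by-hand method: sort, then take the middle value(s)."""
    ordered = sorted(values)          # arrange data into increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of points: middle number
        return ordered[mid]
    # even number of points: average of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([4, 6, 0, 3.1, 5.5, 7, 4]))
print(median([5, 1, 8, 9, 0, 7, 2, 1]))
```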

Mode

Most frequent data point.

A data set can have any number of

modes.

0 modes: All data points occur once.

1 mode: One observation occurs more

than the others.

2 modes: Two observations occur equally often, and more often than the others.

Etc.

Mode

Example: For the following data sets,

find the mode.

(a) 5, 9, 1, 4, 9, 4, 9, 6.

(b) 5, 9, 1, 4, 9, 4, 9, 6, 4.

(c) 1, 2, 3, 4, 5, 6, 7.
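One way to check answers to this exercise is to count how often each value occurs. The sketch below assumes the convention stated earlier (a data set where every value occurs once has 0 modes); the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all modes, or an empty list if every value occurs only once."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:
        return []                         # 0 modes: all data points occur once
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 9, 1, 4, 9, 4, 9, 6]))
print(modes([5, 9, 1, 4, 9, 4, 9, 6, 4]))
print(modes([1, 2, 3, 4, 5, 6, 7]))
```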

Sensitivity to Outliers

Consider the following data:

20, 21, 25, 25, 26, 27, 28, 29.

Calculate the mean.

Calculate the median.

Find the mode.

Sensitivity to Outliers

Now consider the same data set, but

with an outlier:

20, 21, 25, 25, 26, 27, 28, 29, 331.

Calculate the mean, median, and mode.

Which of the mean, median, or mode is

most sensitive to (most affected by)

the outlier?
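A quick way to explore this question is to compute all three measures with and without the outlier. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, mode

original = [20, 21, 25, 25, 26, 27, 28, 29]
with_outlier = original + [331]       # same data plus the outlier

# Compare each measure of centre before and after adding the outlier.
for label, data in (("original", original), ("with outlier", with_outlier)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

Comparing the two printed lines shows how far each measure moves when the single value 331 is added.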

Robustness

Measures that are NOT sensitive to

outliers are called Robust.

Therefore, the following measures of centre are robust: the MEDIAN and the MODE.

Best?

Depends on your data.

If you have a few outliers: Generally use the MEDIAN.

If you have no outliers, then the MEAN is best.

Qualitative data: Only the MODE is possible.

continuous data.

Measures of Centre:

Graphically

Where are the measures of centre on

the following distributions?

[Figure: example distributions on which to locate the measures of centre]

Using Minitab

Minitab quickly calculates measures of

centre for you (seen in section 1 for the

average of the circles).

Measures of Spread

Does the mean or median tell you how

your data is spread out (dispersed)?

NO! For example:

Consider the following data:

49, 50, 51

Mean = 50, median = 50.

0, 50, 100

Mean = 50, median = 50.

Measures of Spread

It is very important in statistical

analyses to be able to describe how the

data is spread out.

Three main ways of doing this:

Range

Standard Deviation

Interquartile Range

Range

Range: Max value - Min value of the data.

Example: 34, 10, 49, 28, 51, 19.

Range = 51 - 10 = 41.

Is it sensitive to outliers?
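The range calculation is short enough to sketch directly. The extra value 500 below is a hypothetical outlier added only to probe the sensitivity question; it is not part of the slide data.

```python
data = [34, 10, 49, 28, 51, 19]
data_range = max(data) - min(data)    # max value - min value
print(data_range)

# Hypothetical extra value (not from the slides) to see how the range reacts.
with_outlier = data + [500]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)
```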

Standard Deviation

Range only involves the maximum and

minimum observations.

It therefore ignores ALMOST ALL of

your data!

Standard deviation takes ALL

observations into account.

Measures how much, on average, each

value differs from the mean.

Calculating Standard Deviation

1. Find the mean of the data.

2. Find the DIFFERENCE between each data point and the mean. These are called RESIDUALS.

3. Square ALL residuals from step 2.

4. ADD all the numbers from step 3.

5. DIVIDE by n - 1. This gives SAMPLE VARIANCE.

6. Finally, take the SQUARE ROOT to get the sample standard deviation.

$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Example Calculation

A company conducted a survey to

determine how long it takes their

employees to get to work. The data is

recorded in minutes. Find the variance

and standard deviation of the data.

Include units.

13.0, 17.5, 24.6, 18.0, 20.4, 17.7.

Example Calculation

Step 1: Find the mean. DO NOT round intermediate calculations, but round your final answer (in this case, the standard deviation) to two decimal places.

$\bar{x} = \frac{\sum x_i}{n} = \frac{111.2}{6}$

$\bar{x} = 18.53333333$ minutes

Example Calculation

(Data was 13, 17.5, 24.6, 18, 20.4, 17.7 with mean 18.53333333).

For the rest of the calculation, use a table:

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 13 | -5.53333333 | 30.61777774 |
| 17.5 | -1.03333333 | 1.06777777 |
| 24.6 | 6.06666667 | 36.80444449 |
| 18 | -0.53333333 | 0.28444444 |
| 20.4 | 1.86666667 | 3.48444446 |
| 17.7 | -0.83333333 | 0.69444444 |

TOTAL = 72.95333334

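Example Calculation

Divide that total by n - 1. This gives variance.

Variance: $S^2 = \frac{72.95333334}{5} = 14.59066667$ minutes$^2$

Take the square root to get the standard deviation. (As specified earlier, round to 2 decimal places).

Standard Deviation: $S = \sqrt{14.59066667} = 3.82$ minutes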
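The six steps for the sample standard deviation can be followed literally in code. This is a sketch using the commute-time data from the example; the variable names are illustrative, and the final line cross-checks against Python's standard `statistics` module, which also divides by n - 1.

```python
import math
import statistics

times = [13.0, 17.5, 24.6, 18.0, 20.4, 17.7]   # commute times in minutes

n = len(times)
mean = sum(times) / n                          # step 1: mean
residuals = [x - mean for x in times]          # step 2: differences from the mean
squared = [r ** 2 for r in residuals]          # step 3: square each residual
total = sum(squared)                           # step 4: add them up
variance = total / (n - 1)                     # step 5: sample variance (minutes squared)
std_dev = math.sqrt(variance)                  # step 6: sample standard deviation (minutes)

print(round(variance, 2), round(std_dev, 2))
print(statistics.variance(times), statistics.stdev(times))  # library cross-check
```

Rounding happens only at the end, in line with the instruction not to round intermediate calculations.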

The height data from 5 UPEI students

are (in inches): 65, 75, 71, 68, 66.

Calculate the variance and standard

deviation. Include units in each.

$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Standard Deviation:

Graphically

The mean measures where the

CENTRE of your data is.

Standard deviation measures how

SPREAD OUT your data is.

Large S => lots of spread, and vice

versa.

Standard Deviation:

Graphically

Example: The datasets graphed on the

next slide have the same mean, and

are graphed on the same scales.

Which one has the larger standard

deviation?

[Figure: two datasets with the same mean, graphed on the same scales]

Why Square?

Recall that to calculate S, we have to square the residuals:

$(x_i - \bar{x})^2$

Let's see what would happen if we DIDN'T square them.

Why Square?

Consider the data: 2, 5, 10, 12, 17.

Mean = 9.2

| $x_i$ | $x_i - \bar{x}$ |
|---|---|
| 2 | -7.2 |
| 5 | -4.2 |
| 10 | 0.8 |
| 12 | 2.8 |
| 17 | 7.8 |

Total = 0. This ALWAYS happens!

The total (and therefore, the

MEAN) of a set of RESIDUALS

is ALWAYS 0!
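A short derivation shows why the residuals always total zero. It uses only the definition of the mean, $\bar{x} = \frac{1}{n}\sum x_i$, so that $\sum x_i = n\bar{x}$:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

Because the unsquared residuals always cancel, their total says nothing about spread, which is why they are squared first.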

As you can see, calculating S is fairly

tedious by hand.

Minitab can do this quickly!

It's one of the calculations done using Stat -> Basic Statistics (the same way we found the mean in section 1).

We use descriptive measures (mean,

standard deviation, etc.) of samples to

ESTIMATE the descriptive measure of a

population.

Statistic: A descriptive measure for a

SAMPLE.

Parameter: A descriptive measure for

a POPULATION.

Notation

The SAMPLE mean is $\bar{x}$ (a statistic).

The POPULATION mean is $\mu$ (a parameter).

Notation

The SAMPLE variance and standard deviation are $S^2$ and $S$ (Statistics).

POPULATION variance and standard deviation: $\sigma^2$ and $\sigma$ (Parameters).

Population Parameters

The MEAN of a POPULATION is

calculated in the same way as for a

SAMPLE.

The STANDARD DEVIATION of a

POPULATION is slightly different than

that of a sample.

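Sample Standard Deviation vs. Population Standard Deviation

Sample standard deviation:

$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Population standard deviation:

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Slight Difference: sample mean vs. population mean.

The sample formula divides by $n - 1$, while the population formula divides by the population size $N$. Reason is given in section 4.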

Suppose you somehow have population data, with the following parameters:

$\mu = 17$, $\sigma = 2$

Consider the observation x = 25. How many standard deviations away from the mean is this value?

To do this, first think about the DISTANCE between the observation and the mean:

$x - \mu = 25 - 17 = 8$

Next, figure out how many standard deviations this is:

$8 / \sigma = 8 / 2 = 4$ standard deviations

A Z score gives a formula to

determine this information.

Z = Number of standard deviations an

observation is from the mean.

From our thought process,

Z = (distance from mean)/(St.Dev.), or:

$Z = \frac{x - \mu}{\sigma}$

Example

Suppose $\mu = 28.1$ and $\sigma = 5.83$ for a population.

Calculate the Z scores of the data

points: x = 39.4, x = 13.6, x = 28.1

What is the significance of the SIGN

(+, -, or zero) of your Z score?
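The Z score formula translates directly into a one-line function. The sketch below assumes that 28.1 and 5.83 in this example are the population mean and standard deviation; the function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

mu, sigma = 28.1, 5.83        # assumed population mean and standard deviation

for x in (39.4, 13.6, 28.1):
    print(x, round(z_score(x, mu, sigma), 2))
```

The sign of each printed value shows whether the observation lies above, below, or at the mean.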

Z Scores

If a Z score is POSITIVE, then the

observation it came from was ABOVE

the mean.

If a Z score is NEGATIVE, then the

observation was BELOW the mean.

If a Z score is ZERO, then the

observation EQUALLED the mean.

For the following population data:

6.6, 10.4, 11.7, 15.3

$\mu = 11$, $\sigma = 3.1$

(a) Calculate the Z scores of ALL the

observations (round each to 1 decimal).

(b) Find the mean and population standard

deviation of these Z scores (1 decimal).

The MEAN of a set of Z scores is 0.

The STANDARD DEVIATION of a set of Z scores is 1.

This information will be used in section 4 and for the rest of the course!
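The behaviour of a full set of Z scores can be checked numerically. This sketch standardizes the four population values from the preceding exercise using their unrounded mean and population standard deviation (via `fmean` and `pstdev` from Python's standard `statistics` module), then computes the mean and population standard deviation of the resulting Z scores.

```python
from statistics import fmean, pstdev

data = [6.6, 10.4, 11.7, 15.3]        # population data from the exercise

mu = fmean(data)                      # population mean
sigma = pstdev(data)                  # population standard deviation (divides by N)
z = [(x - mu) / sigma for x in data]  # Z score of every observation

print(round(mu, 1), round(sigma, 1))
print([round(value, 1) for value in z])
print(fmean(z), pstdev(z))            # mean and standard deviation of the Z scores
```

With Z scores rounded to 1 decimal, as the exercise asks, the results may differ slightly from the unrounded ones.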

Interquartile Range

Standard deviation uses the MEAN to

determine spread.

Interquartile Range uses the MEDIAN.

Also provides a method of determining

potential outliers.

Quartiles: Definitions

A common practice in statistics is to

divide your data into QUARTERS.

First Quartile (Q1): The observation

below which is the bottom 25% of the

data.

Second Quartile (Q2): Same, but 50%.

Third Quartile (Q3): Same, but 75%.

Note that Q2 = the median.

Find Q2 (the median) first.

Q1 = median of data to the LEFT of Q2. Ignore Q2 and everything to its right.

Q3 = median of data to the RIGHT of Q2. Ignore Q2 and everything to its left.

Quartiles: Example

Find the quartiles for the following data

sets:

(a) 12, 34, 21, 9, 20, 16, 80, 45, 32.

(b) 105, 51, 142, 88, 100, 97.

First: Arrange

(a) 9, 12, 16, 20, 21, 32, 34, 45, 80.

(b) 51, 88, 97, 100, 105, 142.
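The by-hand quartile rule (find Q2, then take the median of each half, leaving Q2 out) might be sketched as follows. The function names are hypothetical, and other software may use different quartile conventions.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)              # arrange first
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # data to the left of Q2
    upper = ordered[half + n % 2:]        # data to the right of Q2 (skips Q2 when n is odd)
    return median(lower), median(ordered), median(upper)

print(quartiles([12, 34, 21, 9, 20, 16, 80, 45, 32]))
print(quartiles([105, 51, 142, 88, 100, 97]))
```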

5 Number Summary

The following five values of a data set

make up its 5 number summary:

Minimum

Q1

Q2

Q3

Maximum

5 Number Summary

Example: Find the 5 number

summary for the data set (a) of our

quartile example.

You must arrange the data first, which was:

9, 12, 16, 20, 21, 32, 34, 45, 80.

min = 9, Q1 = 14, Q2 = 21, Q3 = 39.5, max = 80.

5 number summary written in curly brackets: {9, 14, 21, 39.5, 80}.

Interquartile Range

IQR = Q3 - Q1

Gives an idea of the spread of the inner half of the data.

Example: For our dataset (a) of the quartile example,

IQR = 39.5 - 14 = 25.5

The upper and lower limits use the IQR to give a method of determining potential outliers.

Lower Limit (LL) = Q1 - 1.5(IQR)

Upper Limit (UL) = Q3 + 1.5(IQR)

Potential Outliers

Data that falls WITHIN the upper and

lower limits is considered OK.

The following data points are

considered POTENTIAL OUTLIERS:

Higher than UL.

Lower than LL.

For dataset (a) of the quartile example,

find the potential outliers, if any.

(Ordered) data was:

9, 12, 16, 20, 21, 32, 34, 45, 80.

(Continued)

5 number summary was

{9, 14, 21, 39.5, 80}

IQR = Q3 - Q1 = 25.5.

LL = Q1 - 1.5(IQR) = 14 - 1.5(25.5) = -24.25

UL = Q3 + 1.5(IQR) = 39.5 + 1.5(25.5) = 77.75

(Continued)

Therefore, all data points BETWEEN -24.25 and 77.75 are OK. Data points below LL = -24.25 or above UL = 77.75 are potential outliers.

(Continued)

Therefore, the only potential outlier for that dataset is 80.
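The limit calculation and the outlier check can be sketched together. The function name `limits` is illustrative; the quartiles 14 and 39.5 are the by-hand values from the 5 number summary.

```python
def limits(q1, q3):
    """Lower and upper limits based on the interquartile range."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [9, 12, 16, 20, 21, 32, 34, 45, 80]
ll, ul = limits(14, 39.5)

# Potential outliers: lower than LL or higher than UL.
outliers = [x for x in data if x < ll or x > ul]
print(ll, ul, outliers)
```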

Adjacent Values

Adjacent values are the two values

WITHIN the LL and UL, but CLOSEST

to them.

Example: In our quartile example:

9, 12, 16, 20, 21, 32, 34, 45, 80:

LL was -24.25 and UL was 77.75.

The adjacent values are therefore 9 and 45.
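Finding the adjacent values amounts to keeping the data within the limits and taking the smallest and largest of what remains. A minimal sketch, with illustrative variable names:

```python
data = [9, 12, 16, 20, 21, 32, 34, 45, 80]
ll, ul = -24.25, 77.75                # limits from the worked example

inside = [x for x in data if ll <= x <= ul]   # values within LL and UL
adjacent = (min(inside), max(inside))         # closest values to each limit
print(adjacent)
```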

Modified Boxplots

A way to picture the 5 number

summary, adjacent values, and

potential outliers.

Modified boxplot:

Make a box from the three quartiles.

The whiskers (lines) are drawn from the

box to the ADJACENT values.

Mark the potential outliers as *.

Modified Boxplots

[Figure: modified boxplot of the data on a vertical axis running from 0 to 80]

Data (arranged):

9, 12, 16, 20, 21, 32, 34, 45, 80.

5 number summary was {9, 14, 21, 39.5, 80}

Adjacent values: 9, 45

Potential outlier: 80

2. Mark a small horizontal line for each quartile and connect to make a box.

3. Mark the adjacent values and connect.

4. Mark a * for each potential outlier (don't connect).

For the following dataset:

105, 10, 205, 88, 100, 97, 60, 127

Find:

5 number summary.

Lower limit and upper limit.

Adjacent values.

Potential outliers.

Boxplots in Minitab

Minitab makes modified boxplots.

Warning: Its mechanism for finding Q1

and Q3 is a bit different from the way

we do it by hand.

They will be fairly close to what you

would find by hand.
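The effect of different quartile conventions can be seen with Python's standard `statistics.quantiles`, which offers two interpolation methods. This is only an analogy: Minitab's own rule is not described in these slides, and the sketch does not attempt to reproduce it.

```python
from statistics import quantiles

data = [9, 12, 16, 20, 21, 32, 34, 45, 80]   # dataset (a) from the quartile example

# Two conventions offered by the Python standard library.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)
print(inclusive)
```

For this dataset, the two methods agree on Q2 but differ on Q1 and Q3, which is the kind of discrepancy the warning describes.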
