Engineering Descriptive Stats

Descriptive Statistics

populations vs. samples

• we want to describe both samples and populations

• the latter is a matter of inference…

“outliers”

• minority cases, so different from the majority that they merit separate consideration

– are they errors?

– are they indicative of a different pattern?

• think about possible outliers with care, but beware of mechanical treatments…

• significance of outliers depends on your research interests

summaries of distributions

• graphic vs. numeric

– graphic may be better for visualization

– numeric are better for statistical/inferential purposes

• resistance to outliers is usually an advantage in either case

general characteristics

• kurtosis [“peakedness”]

[figure: two curves plotted on axes X and D (X from -5 to 5), one labeled ‘leptokurtic’ and one labeled ‘platykurtic’]

• skew (skewness)

[figure: two curves plotted on axes X and D (X from 0.0 to 1.2), one labeled right (positive) skew and one labeled left (negative) skew]
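The slides show skew and kurtosis only as pictures. As a rough numeric companion, the sketch below computes moment-based coefficients in plain Python. The formulas (third and fourth central moments scaled by the second) are one common convention and are an assumption here, not something the slides specify; the data are the unit 1 rim diameters tabulated later in these slides.

```python
# Moment-based skewness and excess kurtosis (one common convention; assumed,
# not taken from the slides).
unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]

def central_moment(xs, k):
    """k-th central moment, dividing by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Positive for a long right tail, negative for a long left tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Zero for a normal distribution under this convention."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

print(skewness(unit1), excess_kurtosis(unit1))
```

The single large value (26.9 cm) pulls the right tail out, so the skewness of unit 1 comes out positive.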

central tendency

• measures of central tendency

– provide a sense of the value expressed by multiple cases, overall…

• mean

• median

• mode

mean

• center of gravity

• evenly partitions the sum of all measurements among all cases; average of all measures

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

mean – pro and con

• crucial for inferential statistics

• not resistant to outliers; a trimmed mean may be better for descriptive purposes

mean

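rim diameter (cm)

            unit 1    unit 2
            12.6      16.2
            11.6      16.4
            16.3      13.8
            13.1      13.2
            12.1      11.3
            26.9      14.0
             9.7       9.0
            11.5      12.5
            14.8      15.6
            13.5      11.2
            12.4      12.2
            13.6      15.5
                      11.7

n           12        13
total       168.1     172.6
total/n     14.0      13.3

back-to-back stem-and-leaf plot (stem = whole cm, leaf = tenths of a cm; == marks each mean)

         unit 1 |stem| unit 2
              9 | 26 |
                | 25 |
                | 24 |
                | 23 |
                | 22 |
                | 21 |
                | 20 |
                | 19 |
                | 18 |
                | 17 |
              3 | 16 | 24
                | 15 | 56
    14.0 ==   8 | 14 | 0
            651 | 13 | 28    == 13.3
            641 | 12 | 25
             65 | 11 | 237
                | 10 |
              7 |  9 | 0

R: mean(x)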
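As a cross-check on the table, here is a minimal Python sketch of the same calculation (total divided by n), playing the role of the slide's R call mean(x). The variable names are illustrative; the values are the rim diameters listed for the two units.

```python
# Arithmetic mean: total of the measurements divided by the number of cases.
unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
unit2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]

def mean(xs):
    return sum(xs) / len(xs)

mean1 = mean(unit1)
mean2 = mean(unit2)
print(f"unit 1: n={len(unit1)}, mean={mean1:.1f}")
print(f"unit 2: n={len(unit2)}, mean={mean2:.1f}")
```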

trimmed mean

rim diameter (cm), sorted; * = case trimmed from the tail

            unit 1    unit 2
             9.7*      9.0*
            11.5      11.2
            11.6      11.3
            12.1      11.7
            12.4      12.2
            12.6      12.5
            13.1      13.2
            13.5      13.8
            13.6      14.0
            14.8      15.5
            16.3      15.6
            26.9*     16.2
                      16.4*

n           10        11
total       131.5     147.2
total/n     13.2      13.4

[figure: back-to-back stem-and-leaf plot of the rim diameters, with the trimmed means marked at 13.2 (unit 1) and 13.4 (unit 2)]

R: mean(x, trim=.1)
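The trimming step can be sketched in plain Python as follows. The function name is hypothetical, and the rule for how many cases to drop (the whole-number part of n × trim from each tail) is an assumption chosen to match how R's mean(x, trim=.1) behaves for trim below 0.5; with 12 and 13 cases it drops one case from each end, as in the table.

```python
import math

unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
unit2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]

def trimmed_mean(values, trim=0.1):
    """Mean after dropping floor(n * trim) cases from each tail (trim < 0.5)."""
    xs = sorted(values)
    k = math.floor(len(xs) * trim)   # cases removed from each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

print(trimmed_mean(unit1), trimmed_mean(unit2))
```

Dropping 26.9 cm moves the unit 1 mean from about 14.0 to about 13.2, while unit 2 barely changes.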

median

• 50th percentile…

• more resistant to effects of outliers…

median

rim diameter (cm), sorted; <-- marks the median

            unit 1    unit 2
             9.7       9.0
            11.5      11.2
            11.6      11.3
            12.1      11.7
            12.4      12.2
            12.6      12.5
   12.9 <--           13.2 <--
            13.1      13.8
            13.5      14.0
            13.6      15.5
            14.8      15.6
            16.3      16.2
            26.9      16.4

median      12.85     13.20

[figure: back-to-back stem-and-leaf plot of the rim diameters, with the medians marked at 12.85 (unit 1) and 13.20 (unit 2)]
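A short Python sketch of the median for the same two units, using the standard library's statistics.median. With an even number of cases (unit 1, n = 12) it averages the two middle values; with an odd number (unit 2, n = 13) it returns the middle value itself.

```python
import statistics

unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
unit2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]

median1 = statistics.median(unit1)   # average of 12.6 and 13.1
median2 = statistics.median(unit2)   # the 7th of 13 sorted values

# Replacing the outlier 26.9 with an even larger value leaves the median alone.
unit1_extreme = [x if x != 26.9 else 99.0 for x in unit1]
median1_extreme = statistics.median(unit1_extreme)

print(median1, median2, median1_extreme)
```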

mode

• the most numerous category

• for ratio data, often implies that data have been grouped in some way

• can be more or less created by the grouping procedure

• for theoretical distributions, simply the location of the peak on the frequency distribution

[figure: bar chart of settlement categories (isolated scatters, hamlets, villages, regional centers); modal class = ‘hamlets’]
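To illustrate how a mode can be "created by the grouping procedure", the sketch below groups the unit 1 rim diameters from these slides into 2 cm classes twice, shifting only where the classes start. The function name, class width, and origins are illustrative assumptions.

```python
import math
from collections import Counter

unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]

def modal_class(values, width, origin):
    """Return ((lower, upper), count) for the most numerous class."""
    counts = Counter(math.floor((v - origin) / width) for v in values)
    index, count = counts.most_common(1)[0]
    lower = origin + index * width
    return (lower, lower + width), count

print(modal_class(unit1, width=2, origin=10))  # classes 10-12, 12-14, ...
print(modal_class(unit1, width=2, origin=9))   # classes 9-11, 11-13, ...
```

With classes starting at 10 cm the modal class is 12 to 14 cm (6 cases); starting at 9 cm it becomes 11 to 13 cm (5 cases), although the data are unchanged.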

dispersion

• measures of dispersion

– summarize degree of clustering of cases, esp. with respect to central tendency…

• range

• variance

• standard deviation

range

rim diameter (cm), sorted

            unit 1    unit 2
             9.7       9.0
            11.5      11.2
            11.6      11.3
            12.1      11.7
            12.4      12.2
            12.6      12.5
            13.1      13.2
            13.5      13.8
            13.6      14.0
            14.8      15.5
            16.3      15.6
            26.9      16.2
                      16.4

[figure: back-to-back stem-and-leaf plot of the rim diameters, with a vertical bar marking each unit's range: 9.7 to 26.9 (unit 1) and 9.0 to 16.4 (unit 2)]
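A minimal Python sketch of the range for the two units, taken here as the largest value minus the smallest. Because it depends only on the two most extreme cases, the single 26.9 cm rim more than doubles the range of unit 1 relative to unit 2.

```python
unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
unit2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]

def value_range(xs):
    """Difference between the largest and smallest case."""
    return max(xs) - min(xs)

range1 = value_range(unit1)
range2 = value_range(unit2)
print(f"unit 1: {min(unit1)} to {max(unit1)} (range {range1:.1f} cm)")
print(f"unit 2: {min(unit2)} to {max(unit2)} (range {range2:.1f} cm)")
```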

variance

• analogous to average deviation of cases from the mean

• in fact, based on the sum of squared deviations from the mean (“sum-of-squares”)

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

R: var(x)

variance

• computational form:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$

• note: units of variance are squared…

mean = 22.6 mm

variance = 38 mm²
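The two forms of the variance (sum of squared deviations, and the computational form) can be checked against each other with a short Python sketch. The function names are illustrative; the data are the rim diameters from these slides, so the results are in cm².

```python
unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
unit2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]

def variance_definitional(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def variance_computational(xs):
    """Same quantity from the sum of squares and the squared sum."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

for name, xs in (("unit 1", unit1), ("unit 2", unit2)):
    print(name, variance_definitional(xs), variance_computational(xs))
```

The computational form avoids a separate pass to compute deviations, but it subtracts two large numbers, so with floating-point arithmetic the definitional form is usually the safer choice.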

standard deviation

• square root of variance:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$$

standard deviation

• units are in same units as base measurements

mean = 22.6 mm

standard deviation = 6.2 mm

– should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers

rim diameter (cm)

           measurement        deviation from mean    squared deviation
         unit 1    unit 2      unit 1    unit 2       unit 1    unit 2
         12.6      16.2         -1.4       2.9          1.98      8.54
         11.6      16.4         -2.4       3.1          5.80      9.75
         16.3      13.8          2.3       0.5          5.25      0.27
         13.1      13.2         -0.9      -0.1          0.83      0.01
         12.1      11.3         -1.9      -2.0          3.64      3.91
         26.9      14.0         12.9       0.7        166.20      0.52
          9.7       9.0         -4.3      -4.3         18.56     18.29
         11.5      12.5         -2.5      -0.8          6.29      0.60
         14.8      15.6          0.8       2.3          0.63      5.40
         13.5      11.2         -0.5      -2.1          0.26      4.31
         12.4      12.2         -1.6      -1.1          2.59      1.16
         13.6      15.5         -0.4       2.2          0.17      4.94
                   11.7                   -1.6                    2.49

               unit 1    unit 2
n:             12        13
variance:      19.29     5.02
stand. dev.:   4.39      2.24
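To make the "where most of the cases lie" reading concrete, the sketch below computes the standard deviation of each unit with Python's statistics.stdev (which divides by n − 1) and counts the cases within one standard deviation of the mean. The helper name is illustrative.

```python
import statistics

unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
unit2 = [16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7]

def within_one_sd(xs):
    """Count the cases lying within mean +/- one standard deviation."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return sum(1 for x in xs if m - s <= x <= m + s)

sd1 = statistics.stdev(unit1)
sd2 = statistics.stdev(unit2)
print(f"unit 1: sd={sd1:.2f}, within one sd: {within_one_sd(unit1)} of {len(unit1)}")
print(f"unit 2: sd={sd2:.2f}, within one sd: {within_one_sd(unit2)} of {len(unit2)}")
```

In unit 1 the outlier inflates the standard deviation so much that every other case falls inside the one-sd band, which is the kind of distortion the slide's caveat about outliers points to.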

trimmed dispersion measures

• variance and sd are even more sensitive to extreme values (outliers) than the mean…

• why??

• a trimmed variance can be calculated simply by eliminating cases from the tails, and calculating the variance in the normal way…
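One way to explore the "why??" is to drop the single extreme rim diameter (26.9 cm) from unit 1 and compare how far the mean and the variance move. The sketch below does this in plain Python. Because deviations are squared before they are summed, one distant case can dominate the sum of squares: in the deviation table for unit 1, that case alone contributes 166.20 of about 212.2.

```python
import statistics

unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]
without_outlier = [x for x in unit1 if x != 26.9]

# Shift in the mean caused by the one extreme case (cm).
mean_change = statistics.mean(unit1) - statistics.mean(without_outlier)

# Factor by which the same case inflates the variance.
var_ratio = statistics.variance(unit1) / statistics.variance(without_outlier)

print(f"mean shifts by {mean_change:.2f} cm")
print(f"variance is {var_ratio:.1f} times larger with the outlier")
```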

trimmed standard deviation

• trimmed sd is calculated differently

$$s_T = \sqrt{\frac{(n - 1)\, s_W^2}{n_T - 1}}$$

• $s_T$ = trimmed standard deviation

$n$ = number of cases in untrimmed batch

$s_W^2$ = variance of trimmed (winsorized) batch

$n_T$ = number of cases in the trimmed batch
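A hedged Python sketch of the trimmed standard deviation follows. It rests on two assumptions that the slide does not spell out: that the winsorized batch is formed by replacing each trimmed case with the nearest retained value (the usual meaning of winsorizing), and that the result is the square root of (n − 1) times the winsorized variance divided by (n_T − 1). The function name and the choice of one case per tail are illustrative.

```python
import math
import statistics

unit1 = [12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6]

def trimmed_sd(values, k):
    """Trimmed sd with k cases trimmed from each tail (assumes 0 <= k < n/2)."""
    xs = sorted(values)
    n = len(xs)                       # cases in the untrimmed batch
    kept = xs[k:n - k]                # the trimmed batch
    n_t = len(kept)                   # cases in the trimmed batch
    # Winsorize: trimmed cases take the nearest retained value (assumption).
    winsorized = [kept[0]] * k + kept + [kept[-1]] * k
    s2_w = statistics.variance(winsorized)
    return math.sqrt((n - 1) * s2_w / (n_t - 1))

print(trimmed_sd(unit1, k=1), statistics.stdev(unit1))
```

With no trimming (k = 0) the function reduces to the ordinary standard deviation, which is a useful sanity check on the formula.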

