Problems on Counting Techniques

Averages

Averages can be tricky.

Consider:

                 Year 1   Year 2   Year 3   Year 4   Year 5
Rate of Return   0.07     0.10     0.12     0.30     0.15

What is the average rate of return over the five-year period?

Arithmetic average = .148

Correct average = .145321, the geometric average of the yearly growth factors, because returns compound from year to year.
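The two figures above can be reproduced with a short sketch. It is written in Python rather than EXCEL, and the variable names are illustrative only. It assumes the "correct" average is the geometric average of the growth factors (1 + rate), which agrees with the value quoted above.

```python
import math

# Annual rates of return for Years 1 through 5, taken from the table above.
rates = [0.07, 0.10, 0.12, 0.30, 0.15]

# Arithmetic average: add the rates and divide by the number of years.
arithmetic_avg = sum(rates) / len(rates)

# Geometric average: compound the growth factors (1 + r), take the n-th root,
# then subtract 1 to convert the average growth factor back into a rate.
total_growth = math.prod(1 + r for r in rates)
geometric_avg = total_growth ** (1 / len(rates)) - 1

print(f"arithmetic average = {arithmetic_avg:.3f}")
print(f"geometric average  = {geometric_avg:.6f}")
```

The geometric average is the single constant rate that, compounded for five years, gives the same ending value as the five actual rates.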

Consider:

Dallas and Fort Worth are approximately 30 miles apart. On a round trip from Dallas to Fort Worth and back, you average 30 mph on the first leg from Dallas to Fort Worth. How fast do you have to travel on the return leg from Fort Worth to Dallas so that you average 60 mph for the round trip?

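Usual answer: 90 mph

Correct answer: it is impossible. Averaging 60 mph over the 60-mile round trip allows one hour in total, and the first leg at 30 mph has already used the full hour.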
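The reasoning behind the round-trip answer can be checked with a small sketch. The function names below are hypothetical helpers introduced only for illustration. Average speed is total distance divided by total time, so the sketch works in terms of time rather than averaging the two speeds.

```python
def round_trip_average(distance, speed_out, speed_back):
    """Average speed over a round trip: total distance / total time."""
    total_time = distance / speed_out + distance / speed_back
    return 2 * distance / total_time


def required_return_speed(distance, speed_out, target_average):
    """Return-leg speed needed to hit the target average, or None if impossible."""
    time_allowed = 2 * distance / target_average   # hours available for the whole trip
    time_used = distance / speed_out               # hours already spent on the first leg
    time_left = time_allowed - time_used
    if time_left <= 0:
        return None  # no time remains, so no finite speed can reach the target
    return distance / time_left


# Returning at 90 mph does not give a 60 mph average.
print(round_trip_average(30, 30, 90))

# The target of 60 mph cannot be reached after a 30 mph first leg.
print(required_return_speed(30, 30, 60))
```

The "usual answer" treats the round-trip average as the arithmetic average of 30 and 90, but each leg covers the same distance, not the same time, so the slow leg counts for more.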

Measures of Location

The Arithmetic Average

The arithmetic average of a set of values is the sum of the values divided by the number of values.

If \( x_1, x_2, \ldots, x_n \) represent the n numerical values from a random sample, then the formula for the sample mean is:

\[ \bar{x} = \frac{\sum_i x_i}{n} \]

To find the average (when I use this term subsequently, I will mean the arithmetic average) using EXCEL, one uses the function average. It is used just like the median function.

Specifically, one types =average(range of data). For the data on steel thickness, closing the parentheses gives the average for the data as 354.55.

From Grouped Data

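If we do not have the raw data but only the frequency distribution of the data, the formula for the sample mean becomes:

\[ \bar{x} = \frac{\sum_i f_i m_i}{n} \]

where \( m_i \) is the midpoint and \( f_i \) the frequency of interval i.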

EXCEL does not compute this formula directly. To compute this in EXCEL for the steel thickness data, one can use the following procedure:

Interval          m(i) Midpoint   f(i) Freq   f(i)*m(i)
341.5 - 344.5     343             1           343
344.5 - 347.5     346             3           1038
347.5 - 350.5     349             8           2792
350.5 - 353.5     352             8           2816
353.5 - 356.5     355             20          7100
356.5 - 359.5     358             13          4654
359.5 - 362.5     361             5           1805
362.5 - 365.5     364             2           728
Sum                               60          21276
Average                                       354.6
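The grouped-data procedure can also be sketched outside EXCEL. The Python below is an illustration only, using the midpoints and frequencies from the steel thickness table; the variable names are assumptions made for the example.

```python
# Interval midpoints m(i) and frequencies f(i) from the steel thickness table.
midpoints = [343, 346, 349, 352, 355, 358, 361, 364]
freqs = [1, 3, 8, 8, 20, 13, 5, 2]

# n is the total number of observations (the sum of the frequencies).
n = sum(freqs)

# Sum of f(i)*m(i): each midpoint stands in for every observation in its interval.
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))

grouped_mean = sum_fm / n
print(n, sum_fm, grouped_mean)
```

The grouped mean of 354.6 differs slightly from the raw-data average of 354.55 because every observation is replaced by the midpoint of its interval.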

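If we define the proportion of observations in each interval as \( p_i = f_i / n \), then the formula for the mean from grouped data (and also the formula for a discrete probability distribution) is:

\[ \bar{x} = \sum_i p_i m_i \]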

Using the above, it is then possible to generalize the definition of the mean for data from a continuous distribution with probability density function f(x) as:

\[ \mu = \int x f(x)\,dx \]

Consider the problem of having two groups of people: 50 people in Group 1 with an average hourly wage of $15.00 and 100 people in Group 2 with an average hourly wage of $17.00. Can I find the mean of the pooled group of 150 people?

The average of the pooled group is just the total hourly wages of all 150 people divided by the 150 people. Using the formula for the arithmetic average, one can show that:

\[ n \bar{x} = \sum_i x_i \]

Therefore the sum of the hourly wages in the first group is 50 x 15 = 750. The sum of the hourly wages in the second group is 100 x 17 = 1700. Finally, the mean of the pooled group is:

pooled average = (750 + 1700)/(50 + 100) = $16.33

This can be written in formula terms as:

\[ \bar{x}_{pooled} = \frac{\sum_i n_i \bar{x}_i}{\sum_i n_i} \]
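As a quick check of the pooled-average calculation, the sketch below weights each group mean by its group size. The function name pooled_mean is a hypothetical helper, not something from the notes or from EXCEL.

```python
def pooled_mean(sizes, means):
    """Weighted average of group means, with group sizes as the weights."""
    # n_i * xbar_i recovers the total for group i; the totals are then pooled.
    grand_total = sum(n * m for n, m in zip(sizes, means))
    return grand_total / sum(sizes)


# Group 1: 50 people at $15.00; Group 2: 100 people at $17.00.
pooled = pooled_mean([50, 100], [15.00, 17.00])
print(f"pooled average = ${pooled:.2f}")
```

The pooled average lies closer to $17.00 than to $15.00 because Group 2 has twice as many people.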

with the median:

          Group 1   Group 2   Change
          5         4         -1
          10        12        2
          15        18        3
          20        19        -1
          25        23        -2
Average   15        15.2      0.2

Notice that the change in the means is the same as the mean of the changes.
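The property just noted can be verified directly, and the same data suggest that the median does not behave this way. The sketch below is illustrative Python using the Group 1 and Group 2 values from the table; the variable names are assumptions for the example.

```python
from statistics import mean, median

group1 = [5, 10, 15, 20, 25]
group2 = [4, 12, 18, 19, 23]

# Change for each paired observation (Group 2 minus Group 1).
changes = [b - a for a, b in zip(group1, group2)]

# For the mean, the two routes agree (up to floating-point rounding).
change_in_means = mean(group2) - mean(group1)
mean_of_changes = mean(changes)

# For the median, the two routes need not agree.
change_in_medians = median(group2) - median(group1)
median_of_changes = median(changes)

print(change_in_means, mean_of_changes)
print(change_in_medians, median_of_changes)
```

With these values the medians move from 15 to 18, a change of 3, while the median of the individual changes is -1.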

Summary

Criterion               Median     Mean
Ease of Understanding   High       Reasonable
Computation             Moderate   Easy
Effect of Outliers      None       High

None

Easy

Population for fixed sample

of size n

Baseline

Simpson's Paradox

Consider the following data found in the file meandemo.xls:

             Males   Male Average   Females   Female Average
Prof         35      60,000         5         65,000
Assoc Prof   25      50,000         20        55,000
Asst Prof    15      40,000         15        45,000
Average              52,667                   52,500
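The paradox in the salary data can be reproduced with a short sketch: within every rank the female average is higher, yet the overall male average is higher. The Python below is an illustration only. The count of 5 female Profs is treated as an assumption; it is the count that reproduces the stated overall female average of 52,500. The function name overall_average is hypothetical.

```python
# (head count, average salary) for each rank.
males = {"Prof": (35, 60_000), "Assoc Prof": (25, 50_000), "Asst Prof": (15, 40_000)}
# Assumption: 5 female Profs, the count consistent with the 52,500 overall average.
females = {"Prof": (5, 65_000), "Assoc Prof": (20, 55_000), "Asst Prof": (15, 45_000)}


def overall_average(groups):
    """Pooled average salary: total pay divided by total head count."""
    total_pay = sum(count * avg for count, avg in groups.values())
    head_count = sum(count for count, _ in groups.values())
    return total_pay / head_count


male_avg = overall_average(males)
female_avg = overall_average(females)

# Rank by rank, compare the female average with the male average.
female_higher_by_rank = {rank: females[rank][1] > males[rank][1] for rank in males}

print(female_higher_by_rank)
print(round(male_avg), round(female_avg))
```

The reversal comes from the weights: most of the men are in the highest-paid rank, while most of the women are in the two lower-paid ranks.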

             Time 1       Time 1 Median   Time 2       Time 2 Median   Median Change
Group 1      30, 35, 48   35              31, 32, 75   32              -3
Group 2      14, 85, 98   85              60, 83, 85   83              -2
Group 3      60, 63, 65   63              61, 62, 98   62              -1
All Groups                60                           62
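The group medians and the all-groups medians can be recomputed from the listed values. The sketch below is illustrative Python, and the dictionary layout is an assumption made for the example. It shows each group median falling while the median of the pooled data rises.

```python
from statistics import median

time1 = {"Group 1": [30, 35, 48], "Group 2": [14, 85, 98], "Group 3": [60, 63, 65]}
time2 = {"Group 1": [31, 32, 75], "Group 2": [60, 83, 85], "Group 3": [61, 62, 98]}

# Change in the median within each group (Time 2 minus Time 1).
group_changes = {g: median(time2[g]) - median(time1[g]) for g in time1}

# Pool all nine observations at each time and take the overall medians.
all_time1 = [x for values in time1.values() for x in values]
all_time2 = [x for values in time2.values() for x in values]
overall_change = median(all_time2) - median(all_time1)

print(group_changes)
print(median(all_time1), median(all_time2), overall_change)
```

Unlike the mean, the median of the pooled data cannot be obtained from the group medians, so the group changes and the overall change can point in opposite directions.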

Measures of Scale

The simplest way to measure scale is to find the average distance of each data point from the measure of location (in our case the arithmetic mean). However, the deviations from the mean always sum to zero:

\[ \sum_i (x_i - \bar{x}) = 0 \]

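The fact that some deviations are positive and some negative can be corrected in one of two ways:

1) Use the absolute value to compute the mean absolute deviation (MAD), which in formula terms is:

\[ MAD = \frac{\sum_i |x_i - \bar{x}|}{n} \]

2) Square the deviations to compute the sample variance and the sample standard deviation:

\[ s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1} \]

and,

\[ s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} \]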

In EXCEL, the function stdev uses the above formula for computing the sample standard deviation. For the steel thickness data, you would type =stdev(range).
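The measures of scale discussed in this section can be sketched in a few lines. The raw steel thickness data are not reproduced in these notes, so the illustration below uses the five Group 1 values from the earlier table (5, 10, 15, 20, 25). It is written in Python rather than EXCEL, and the standard deviation uses the n - 1 divisor, which is also what Python's statistics.stdev uses.

```python
import math
from statistics import stdev

data = [5, 10, 15, 20, 25]  # Group 1 values from the earlier table
n = len(data)
xbar = sum(data) / n

# Deviations from the mean; positive and negative values cancel.
deviations = [x - xbar for x in data]
sum_of_deviations = sum(deviations)

# Option 1: mean absolute deviation (divide by n).
mad = sum(abs(d) for d in deviations) / n

# Option 2: sample variance and standard deviation (divide by n - 1).
variance = sum(d ** 2 for d in deviations) / (n - 1)
s = math.sqrt(variance)

print(sum_of_deviations, mad, variance, s)
print(stdev(data))  # library cross-check of the hand computation
```

For these values the deviations are -10, -5, 0, 5 and 10, so the MAD is 6 and the sample variance is 62.5.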

EXCEL does not automatically compute the standard deviation if the data is grouped. The computing formula to use in this case is given by:

\[ s^2 = \frac{\sum_i f_i m_i^2 - n \bar{x}^2}{n - 1} \]

The necessary terms can be computed in EXCEL as shown in the following table for the steel data:

Interval          m(i) Midpoint   f(i) Freq   f(i)*m(i)   f(i)*m(i)*m(i)
341.5 - 344.5     343             1           343         117,649
344.5 - 347.5     346             3           1,038       359,148
347.5 - 350.5     349             8           2,792       974,408
350.5 - 353.5     352             8           2,816       991,232
353.5 - 356.5     355             20          7,100       2,520,500
356.5 - 359.5     358             13          4,654       1,666,132
359.5 - 362.5     361             5           1,805       651,605
362.5 - 365.5     364             2           728         264,992
Sum                               60          21,276      7,545,666
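The column sums in the table are the inputs to the grouped-data standard deviation. The sketch below is an illustration in Python, not the EXCEL procedure itself, and the variable names are assumptions. It rebuilds the sums from the midpoints and frequencies and then applies the computing formula: the sum of f(i)*m(i)*m(i), minus n times the squared mean, divided by n - 1.

```python
import math

# Interval midpoints m(i) and frequencies f(i) from the steel data table.
midpoints = [343, 346, 349, 352, 355, 358, 361, 364]
freqs = [1, 3, 8, 8, 20, 13, 5, 2]

n = sum(freqs)                                                # 60 observations
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))         # sum of f(i)*m(i)
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))    # sum of f(i)*m(i)*m(i)

xbar = sum_fm / n

# Computing formula for the grouped sample variance, then its square root.
grouped_variance = (sum_fm2 - n * xbar ** 2) / (n - 1)
grouped_sd = math.sqrt(grouped_variance)

print(n, sum_fm, sum_fm2)
print(round(grouped_variance, 3), round(grouped_sd, 3))
```

The result is an approximation to the raw-data standard deviation, since each observation is treated as if it sat at its interval midpoint.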

If only the proportion of observations in each bin is available, then the following approximate formula may be used:

\[ s^2 \approx \sum_i p_i m_i^2 - \bar{x}^2 \]

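The standard deviation for data following a theoretical distribution function f(x) can also be defined as:

\[ \sigma^2 = \int (x - \mu)^2 f(x)\,dx \]

and,

\[ \sigma = \sqrt{\sigma^2} \]

where \( \mu \) is the mean of the distribution.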

Region          Percent of Data
mean +/- 1 sd   68%
mean +/- 2 sd   95%
mean +/- 3 sd   99.7%

For the steel thickness data (which is mound shaped) the exact results are:

Region          Values           %
mean +/- 1 sd   350.1 to 359.0   73.0%
mean +/- 2 sd   345.6 to 363.5   96.7%
mean +/- 3 sd   341.1 to 368.0   100.0%

Chebyshev's Inequality

For any distribution, at least 100(1 - 1/k^2)% of the data must lie in the region defined by the mean +/- k standard deviations.

Specifically, for k=2, at least 75% of the data must lie in the range mean +/- 2 standard deviations.

For k=3, at least 88.9% of the data must lie in the range mean +/- 3 standard deviations.
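The two special cases above follow from evaluating the bound 100(1 - 1/k^2) at k = 2 and k = 3. A minimal Python sketch is given below; the function name chebyshev_lower_bound is a hypothetical helper, and the restriction to k > 1 is a choice made here because the bound is zero or negative otherwise.

```python
def chebyshev_lower_bound(k):
    """Minimum percent of data within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the formula gives 0% or less, which says nothing useful.
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k ** 2)


for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 1))
```

These are guaranteed minimums for any distribution, which is why they are lower than the 95% and 99.7% quoted for mound-shaped data.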

Class       Mean   Standard Deviation
Monday      85     6
Wednesday   90     8

A student from the Monday night class takes the Wednesday exam and scores 92. To what score in the Monday night class does this score correspond?

Define:

\[ t = (x - \bar{x}) / s \]

and

\[ x = \bar{x} + t s \]

For the Wednesday exam, t = (92 - 90)/8 = .25, so

xMonday = 85 + .25 x 6 = 86.5
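The conversion between the two classes can be sketched as two small functions: one turns a score into a standardized value t, the other turns t back into a score on a different scale. The function names are hypothetical. The Monday standard deviation of 6 is taken from the worked line above; the Wednesday standard deviation of 8 is an assumption, chosen because it reproduces the t = .25 used there.

```python
def standard_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd


def from_standard_score(t, mean, sd):
    """Score located t standard deviations above the mean."""
    return mean + t * sd


# Wednesday exam: mean 90, standard deviation 8 (assumed, see the note above).
t = standard_score(92, 90, 8)

# Monday class: mean 85, standard deviation 6.
monday_equivalent = from_standard_score(t, 85, 6)

print(t, monday_equivalent)
```

The comparison rests on relative position: a 92 on Wednesday and an 86.5 on Monday are both a quarter of a standard deviation above their class means.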

