MATH 1050Y

A Non-Calculus Based Introduction to Probability & Statistical Methods

Section A FW 2012-13 Instructor: Jaclyn Semple

Tuesday 12-1pm SC 203
Tuesday 1-2pm SC 203
Tuesday 4-5pm GCS 115

Remember:
Weekly quizzes (iClicker)
Weekly assignments due


2-1 Overview
2-2 Summarizing Data with Frequency Tables
2-3 Graphs of Data
2-4 Measures of Central Tendency
2-5 Measures of Variation
2-6 Measures of Position
2-7 Exploratory Data Analysis

MATH 1050Y-A (FW 2012-13)

A measure of central tendency is a value at the centre or the middle of a data set.

We will consider the following four measures of central tendency: mean, median, mode, and midrange.


The Mean

The (arithmetic) mean of a set of values is the number obtained by adding the values and dividing the total by the number of values. For a sample of n observations, the mean is referred to as the sample mean, $\bar{x}$. If all N values of the population are available, it is referred to as the population mean, $\mu$.

Sample mean: $\bar{x} = \frac{\sum x}{n}$

Population mean: $\mu = \frac{\sum x}{N}$

The Median

The median of a data set is the middle value when the values are arranged in order of increasing magnitude. For a sample of n observations, the median is denoted by $\tilde{x}$ (x-tilde). To find the median, first arrange the values in order. If the number of values is odd, the median is the number located in the exact middle of the list. If the number of values is even, the median is found by computing the mean of the two middle numbers.
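The definitions of the mean and the median can be sketched in Python as follows. This is a minimal illustration, not part of the course material: the function names `sample_mean` and `median` are chosen here for illustration, and the two lists are the ordered values used in the median examples of this section.

```python
def sample_mean(values):
    """Add the values and divide the total by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # arrange the values in increasing order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd number of values: exact middle
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_list = [0.42, 0.48, 0.66, 0.73, 1.10, 1.10, 5.40]
even_list = [0.42, 0.48, 0.73, 1.10, 1.10, 5.40]

print(median(odd_list))       # the 4th of the 7 ordered values
print(median(even_list))      # mean of 0.73 and 1.10
print(sample_mean(odd_list))  # sum of the 7 values divided by 7
```

For the seven-value list, the mean (about 1.41) is noticeably larger than the median (0.73) because of the single large value 5.40.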


Odd Number of Values (Ordered)

0.42 0.48 0.66 0.73 1.10 1.10 5.40

$\tilde{x} = 0.73$

Even Number of Values (Ordered)

0.42 0.48 0.73 1.10 1.10 5.40

$\tilde{x} = \frac{0.73 + 1.10}{2} = 0.915$

The Mode

The mode of a data set is the value that occurs most frequently. When two values occur with the same greatest frequency, each one is a mode and the data set is said to be bimodal. When more than two values occur with the same greatest frequency, each one is a mode and the data set is said to be multimodal. When no value is repeated, we say that there is no mode. The mode is often denoted by M. It is the only measure of central tendency that can be used with nominal data.


1) 5.40 1.10 0.42 0.73 0.48 1.10 (Mode is 1.10)
2) 27 27 27 55 55 55 88 88 99 (Bimodal: 27 & 55)
3) 1 2 3 6 7 8 9 10 (No mode)
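The three cases above (one mode, bimodal, no mode) can be reproduced with a short Python sketch. The helper name `modes` is an assumption made for this illustration, and the function assumes a non-empty data set.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the greatest frequency; [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, so there is no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))  # [1.1]
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))  # [27, 55]
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))             # []
```

Returning a list rather than a single number keeps the bimodal and multimodal cases explicit.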

The Midrange

The midrange is the value midway between the highest and lowest values in the data set. The midrange is found by adding the highest value to the lowest value and then dividing the sum by 2. That is,

$\text{midrange} = \frac{\text{highest value} + \text{lowest value}}{2}$


Rounding Rule

A simple rule for rounding calculations of measures of central tendency is this: Carry one more decimal place than is present in the original data set. Note: Round only the final answer, never in the middle of a calculation.

A potato chip packaging plant selects 10 bags for a quality control check. The weights in grams are listed below. Find the mean, median, mode, and midrange for this data set. 454.1 454.4 455.0 455.1 454.2 454.6 454.9 454.4 454.7 455.2
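One way to check a hand calculation for this example is with Python's standard `statistics` module. This is a sketch only: the variable names are chosen for illustration, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Weights in grams of the 10 bags from the example
weights_g = [454.1, 454.4, 455.0, 455.1, 454.2,
             454.6, 454.9, 454.4, 454.7, 455.2]

mean = statistics.mean(weights_g)
median = statistics.median(weights_g)        # mean of the two middle values (n is even)
mode_list = statistics.multimode(weights_g)  # list of the most frequent value(s)
midrange = (max(weights_g) + min(weights_g)) / 2

# Rounding rule: the data have one decimal place, so report two.
print("mean    :", round(mean, 2))
print("median  :", round(median, 2))
print("mode(s) :", mode_list)
print("midrange:", round(midrange, 2))
```

Worked by hand, the values sum to 4546.6, giving a mean of 454.66 g; the median and the midrange are both 454.65 g, and the mode is 454.4 g.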


When sample data are summarized in a frequency table, we can approximate the mean by replacing class limits with class midpoints and assuming that each class midpoint is repeated a number of times equal to the class frequency, f. We then use the following formula to approximate the sample mean.

$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$, where $x$ is the class midpoint and $f$ is the class frequency.

Approximating the mean from the axial load data.
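The frequency-table formula can be sketched in Python. Since the axial load table is not reproduced in these notes, the sketch uses the classes and frequencies of the statistics marks example from this section; the function name `grouped_mean` and the tuple layout are assumptions made for illustration.

```python
# (lower class limit, upper class limit, frequency f) for the statistics marks
mark_classes = [
    (30, 39, 2), (40, 49, 3), (50, 59, 6), (60, 69, 12),
    (70, 79, 13), (80, 89, 9), (90, 99, 5),
]


def grouped_mean(classes):
    """Approximate the mean as sum(f * x) / sum(f), with x the class midpoint."""
    total_fx = 0.0
    total_f = 0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2   # replaces the class limits
        total_fx += f * midpoint         # midpoint counted f times
        total_f += f
    return total_fx / total_f


print(grouped_mean(mark_classes))  # 3505 / 50, approximately 70.1
```

The result is only an approximation of the sample mean, because every mark in a class is treated as if it were equal to the class midpoint.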


From our previous example of statistics marks, approximate the mean by completing the table:

Marks (%)    f                 $f \cdot x$
30 - 39      2
40 - 49      3
50 - 59      6
60 - 69      12
70 - 79      13
80 - 89      9
90 - 99      5
Total        $\sum f = 50$

Weighted Mean

In some situations, the values vary in their degree of importance. In this situation, we may wish to compute a weighted mean.

The weighted mean takes frequency or importance into account (e.g. the mean crop yield from 3 farms of different sizes). A weighted mean is a mean computed with different values assigned different weights, w. We use the following formula to compute a weighted mean:

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$


Three assessment results (quiz, test, and final exam) in a course for a particular student are 65, 70, and 85. Find the student's average mark in the course if the quiz is worth 25%, the test 45%, and the final exam 30%.
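A short Python sketch of the weighted mean for this example follows. The function name `weighted_mean` is an assumption made for illustration, and the weights are entered as whole-number percentages so that the arithmetic stays exact.

```python
def weighted_mean(values, weights):
    """Compute sum(w * x) / sum(w)."""
    total_wx = sum(w * x for x, w in zip(values, weights))
    return total_wx / sum(weights)


marks = [65, 70, 85]       # quiz, test, final exam
weights = [25, 45, 30]     # percentage weight of each assessment

print(weighted_mean(marks, weights))  # (1625 + 3150 + 2550) / 100 = 73.25
```

For comparison, the unweighted mean of the three marks is 220 / 3, about 73.33.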

Unfortunately, there is no single best measure of central tendency, because the best measure depends largely on the data set being analyzed. One disadvantage of the mean is that it is sensitive to every data value, so even one unusually large or small value can affect the mean dramatically. The median largely overcomes this disadvantage. For a more complete comparison of the mean, median, mode, and midrange, refer to Table 2-6 in the textbook (p. 64).



A distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half. A distribution of data is skewed if it is not symmetric and extends more to one side than the other. Data skewed to the left are said to be negatively skewed; the mean and median are to the left of the mode. Data skewed to the right are said to be positively skewed; the mean and median are to the right of the mode.


NOTE: The mean and median cannot always be used to identify the shape of a distribution of data values.


Returning to the histogram of the ages of faculty members' cars, what is the shape of the distribution?

Look at the following histogram for salaries of baseball players. What shape would you say the data take?


A. Symmetric B. Left-skewed


Coming up

Reminders:
Assignment #2 due Tuesday in seminar
Quiz #2 Tuesday in seminar

For next class:
Do practice questions from 2-4
Read Section 2-5
Bring iClicker


