
Key Terms

arithmetic mean: See mean.

average: It is better to avoid this sometimes vague term. It usually refers to the (arithmetic) mean, but it can also signify the median, the mode, the geometric mean, or a weighted mean, among other things.

histogram: A bar graph such that the area over each class interval is proportional to the relative frequency of data within this interval.

mean: The sum of a list of numbers, divided by the total number of numbers in the list. Also called arithmetic mean.

median: "Middle value" of a list. The smallest number such that at least half the numbers in the list are no greater than it. If the list has an odd number of entries, the median is the middle entry in the list after sorting the list into increasing order. If the list has an even number of entries, the median is equal to the sum of the two middle (after sorting) numbers divided by two. The median can be estimated from a histogram by finding the smallest number such that the area under the histogram to the left of that number is 50% of the total area.

mode: For lists, the mode is the most common (frequent) value. A list can have more than one mode. For histograms, a mode is a relative maximum ("bump"). A data set has no mode when all the numbers appear in the data with the same frequency. A data set has multiple modes when two or more values share the highest frequency.

multimodal distribution: A distribution with more than one mode. The histogram of a multimodal distribution has more than one "bump".

range: The range of a set of numbers is the largest value in the set minus the smallest value in the set. Note that the range is a single number, not many numbers.


total: The overall sum of a set of numbers or quantities.

Mean, Mode, Median

Given the data set {13, 3, 10, 9, 7, 10, 12, 8, 6, 3, 9, 6, 11, 5, 9, 13, 8, 7, 7}, find the mean, median and mode.

Enter the data into a list. (See Basic Commands for entering data.)

Find the Mean and Median: Method 1 (fast and easy)

Press 2nd MODE (QUIT) to return to the home screen.
Press 2nd STAT (LIST).
Arrow to the right to MATH.
Choose option #3: mean( if you want the mean.
Choose option #4: median( if you want the median.
Your choice will appear on the home screen, waiting for you to tell it which list to use. Remember that the list names appear on the face of the calculator above the number keys 1-6.

Find the Mean and Median: Method 2 (a bit more sophisticated)

Press STAT.
Arrow to the right to CALC.
Choose option #1: 1-Var Stats.
When 1-Var Stats appears on the home screen, tell the calculator the name of the list you are using (such as: 1-Var Stats L1).
Press ENTER.
Arrow up and down the screen to see the statistical information about the data.
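As a cross-check on the calculator's output, the same mean and median can be computed off the calculator. The following is a minimal sketch in Python using the standard statistics module; Python itself and the variable names are illustrative assumptions, since the original works only on the calculator.

```python
import statistics

# Data set from the example above (19 values).
data = [13, 3, 10, 9, 7, 10, 12, 8, 6, 3, 9, 6, 11, 5, 9, 13, 8, 7, 7]

mean_value = statistics.mean(data)      # sum of the values / number of values
median_value = statistics.median(data)  # middle value after sorting

print(f"n = {len(data)}, sum = {sum(data)}")
print(f"mean = {mean_value:.4f}")
print(f"median = {median_value}")
```

The printed mean and median should agree with the values shown by mean(, median( and 1-Var Stats, up to the calculator's display rounding.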

Find the Mode

(While there is no specific calculator function to find the mode, the calculator is helpful in ordering the data so that you can find the mode easily.)

Sort the data into ascending or descending order to help find the mode: press STAT, choose #2 SortA(, and specify L1 or whichever list you are using.
Look at the list (STAT, #1 EDIT).
Examine the data to see which value(s) appear(s) most often.

This data set has two modes, 7 and 9. Each of these values appears 3 times in the data set.

Mode of a probability distribution

The mode of a discrete probability distribution is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled. The mode of a continuous probability distribution is the value x at which its probability density function attains its maximum value, so, informally speaking, the mode is at the peak.

As noted above, the mode is not necessarily unique, since the probability mass function or probability density function may achieve its maximum value at several points x1, x2, etc.

The above definition tells us that only global maxima are modes. Slightly confusingly, when a probability density function has multiple local maxima it is common to refer to all of the local maxima as modes of the distribution. Such a continuous distribution is called multimodal (as opposed to unimodal).

In symmetric unimodal distributions, such as the normal (or Gaussian) distribution (the distribution whose density function, when graphed, gives the famous "bell curve"), the mean (if defined), median and mode all coincide. For samples, if it is known that they are drawn from a symmetric unimodal distribution, the sample mean can be used as an estimate of the population mode.

Example: in the list 1, 2, 2, 3, 4, 5, 5, 6, 5, 1 the value 5 occurs three times, more often than any other value, so 5 is the mode. A list can have more than one mode when several values tie for the highest count.

Mode of a sample

The mode of a data sample is the element that occurs most often in the collection. For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] is 6. Given the list of data [1, 1, 2, 4, 4] the mode is not unique. Such a dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal.

For a sample from a continuous distribution, such as [0.935..., 1.211..., 2.430..., 3.668..., 3.874...], the concept is unusable in its raw form, since each value will occur precisely once. The usual practice is to discretize the data by assigning frequency values to intervals of equal distance, as for making a histogram, effectively replacing the values by the midpoints of the intervals they are assigned to. The mode is then the value where the histogram reaches its peak. For small or middle-sized samples the outcome of this procedure is sensitive to the choice of interval width if chosen too narrow or too wide; typically one should have a sizable fraction of the data concentrated in a relatively small number of intervals (5 to 10), while the fraction of the data falling outside these intervals is also sizable. An alternate approach is kernel density estimation, which essentially blurs point samples to produce a continuous estimate of the probability density function, which in turn can provide an estimate of the mode.

The following MATLAB code example computes the mode of a sample:

    X = sort(x);                                % x is assumed to be a column vector
    indices = find(diff([X; realmax]) > 0);     % indices where repeated values change
    [modeL, i] = max(diff([0; indices]));       % longest persistence length of repeated values
    mode = X(indices(i));                       % value of the longest run

The algorithm first sorts the sample in ascending order. It then computes the discrete derivative of the sorted list and finds the indices where this derivative is positive. Next it computes the discrete derivative of this set of indices, locates the maximum of this derivative of indices, and finally evaluates the sorted sample at the point where that maximum occurs, which corresponds to the last member of the stretch of repeated values.
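The sort-and-longest-run idea described above can also be sketched in Python. This is an illustrative translation under assumptions, not part of the original: the function name sample_mode is hypothetical, and, like MATLAB's max, it reports only the first of several tied modes.

```python
def sample_mode(values):
    """Return the value at the end of the longest run in the sorted sample."""
    if not values:
        raise ValueError("sample_mode() needs at least one value")
    ordered = sorted(values)
    last = len(ordered) - 1
    # Indices where a run of equal values ends (the final index always ends a run).
    run_ends = [i for i in range(len(ordered))
                if i == last or ordered[i + 1] > ordered[i]]
    # Run lengths are the gaps between consecutive run ends.
    run_lengths = [end - start
                   for start, end in zip([-1] + run_ends[:-1], run_ends)]
    longest = run_lengths.index(max(run_lengths))  # first longest run wins ties
    return ordered[run_ends[longest]]


print(sample_mode([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]))
print(sample_mode([1, 1, 2, 4, 4]))
```

For the bimodal sample the sketch returns only one of the two modes, which is a limitation of this approach; a routine that returns every tied value would be needed to report both.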
Comparison of mean, median and mode

See also: mean and median

Comparison of common averages of values { 1, 2, 2, 3, 4, 7, 9 }

Type              Description                                                            Example                 Result
Arithmetic mean   Sum divided by number of values                                        (1+2+2+3+4+7+9) / 7     4
Median            Middle value separating the greater and lesser halves of a data set    1, 2, 2, 3, 4, 7, 9     3
Mode              Most frequent value in a data set                                      1, 2, 2, 3, 4, 7, 9     2

When do these measures make sense?

Unlike mean and median, the concept of mode also makes sense for "nominal data" (i.e., not consisting of numerical values). For example, taking a sample of Korean family names, one might find that "Kim" occurs more often than any other name. Then "Kim" would be the mode of the sample. In any voting system where a plurality determines victory, a single modal value determines the victor, while a multi-modal outcome would require some tie-breaking procedure to take place.

Unlike median, the concept of mean makes sense for any random variable assuming values from a vector space, including the real numbers (a one-dimensional vector space) and the integers (which can be considered embedded in the reals). For example, a distribution of points in the plane will typically have a mean and a mode, but the concept of median does not apply. The median makes sense when there is a linear order on the possible values. Generalizations of the concept of median to higher-dimensional spaces are the geometric median and the centerpoint.

Uniqueness and definedness

For the remainder, the assumption is that we have (a sample of) a real-valued random variable.

For some probability distributions, the expected value may be infinite or undefined, but if defined, it is unique. The mean of a (finite) sample is always defined. The median is the value such that the fractions not exceeding it and not falling below it are both at least 1/2. It is not necessarily unique, but never infinite or totally undefined. For a data sample it is the "halfway" value when the list of values is ordered in increasing value, where usually for a list of even length the numerical average is taken of the two values closest to "halfway". Finally, as said before, the mode is not necessarily unique. Certain pathological distributions (for example, the Cantor distribution) have no defined mode at all. For a finite data sample, the mode is one (or more) of the values in the sample.

Properties

Assuming definedness, and for simplicity uniqueness, the following are some of the most interesting properties.

All three measures have the following property: if the random variable (or each value from the sample) is subjected to the linear or affine transformation which replaces X by aX+b, so are the mean, median and mode.

However, if there is an arbitrary monotonic transformation, only the median follows; for example, if X is replaced by exp(X), the median changes from m to exp(m), but the mean and (for a continuous density) the mode generally do not transform this way.

Except for extremely small samples, the mode is insensitive to "outliers" (such as occasional, rare, false experimental readings). The median is also very robust in the presence of outliers, while the mean is rather sensitive.

In continuous unimodal distributions the median lies, as a rule of thumb, between the mean and the mode, about one third of the way going from mean to mode. In a formula, median ≈ (2 × mean + mode)/3. This rule, due to Karl Pearson, often applies to slightly non-symmetric distributions that resemble a normal distribution, but it is not always true and in general the three statistics can appear in any order.[3][4]

For unimodal distributions, the mode is within √3 standard deviations of the mean, and the root mean square deviation about the mode is between the standard deviation and twice the standard deviation.[5]
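The transformation properties can be checked numerically on a small sample. The sketch below is illustrative only: the sample and the coefficients a = 2, b = 1 are arbitrary choices, not values from the text. It tests only the mean and median under exp, because the mode of a finite list is simply relabelled by any one-to-one map; the caveat about the mode concerns continuous densities.

```python
import math
import statistics

sample = [1, 2, 2, 3, 4, 7, 9]
a, b = 2, 1  # arbitrary illustrative coefficients for X -> aX + b

affine = [a * x + b for x in sample]
# Mean, median and mode all follow the affine map.
assert statistics.mean(affine) == a * statistics.mean(sample) + b
assert statistics.median(affine) == a * statistics.median(sample) + b
assert statistics.mode(affine) == a * statistics.mode(sample) + b

# A monotonic but non-affine map: X -> exp(X).
exponentiated = [math.exp(x) for x in sample]
median_follows = math.isclose(statistics.median(exponentiated),
                              math.exp(statistics.median(sample)))
mean_follows = math.isclose(statistics.mean(exponentiated),
                            math.exp(statistics.mean(sample)))
print("median follows exp:", median_follows)
print("mean follows exp:", mean_follows)
```

The sample has an odd number of entries so that the median is an actual data value; with an even count the median is an average of two values and would follow exp only approximately.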

Example for a skewed distribution

An example of a skewed distribution is personal wealth: few people are very rich, but among those some are extremely rich. However, many are rather poor.

[Figure: Comparison of mean, median and mode of two log-normal distributions with different skewness.]

A well-known class of distributions that can be arbitrarily skewed is given by the log-normal distribution. It is obtained by transforming a random variable X having a normal distribution into the random variable Y = e^X. Then the logarithm of the random variable Y is normally distributed, hence the name.

Taking the mean μ of X to be 0, the median of Y will be 1, independent of the standard deviation σ of X. This is so because X has a symmetric distribution, so its median is also 0. The transformation from X to Y is monotonic, and so we find the median e^0 = 1 for Y.

When X has standard deviation σ = 0.25, the distribution of Y is weakly skewed. Using formulas for the log-normal distribution, we find:

    mean   = e^(μ + σ²/2) = e^(0 + 0.25²/2) ≈ 1.032
    mode   = e^(μ − σ²)   = e^(0 − 0.25²)   ≈ 0.939
    median = e^μ          = e^0             = 1

Indeed, the median is about one third of the way from mean to mode.

When X has a larger standard deviation, σ = 1, the distribution of Y is strongly skewed. Now

    mean   = e^(μ + σ²/2) = e^(0 + 1²/2) ≈ 1.649
    mode   = e^(μ − σ²)   = e^(0 − 1²)   ≈ 0.368
    median = e^μ          = e^0          = 1

Here, Pearson's rule of thumb fails.
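The two cases can be reproduced with a few lines of Python. This sketch relies on the standard closed forms for a log-normal variable Y = e^X with X normal (mean e^(μ + σ²/2), median e^μ, mode e^(μ − σ²)); the function name lognormal_summary is a hypothetical helper introduced for illustration.

```python
import math


def lognormal_summary(mu, sigma):
    """Mean, median and mode of Y = exp(X) for X ~ Normal(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    median = math.exp(mu)
    mode = math.exp(mu - sigma ** 2)
    return mean, median, mode


for sigma in (0.25, 1.0):
    mean, median, mode = lognormal_summary(0.0, sigma)
    pearson = (2 * mean + mode) / 3  # Pearson's rule-of-thumb estimate of the median
    print(f"sigma={sigma}: mean={mean:.3f} median={median:.3f} "
          f"mode={mode:.3f} pearson={pearson:.3f}")
```

Comparing the pearson column with the true median of 1 shows how close the rule of thumb is in the weakly skewed case and how far off it is in the strongly skewed one.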

Univariate frequency tables

Univariate frequency distributions are often presented as lists ordered by quantity showing the number of times each value appears. For example, if 100 people rate a five-point Likert scale assessing their agreement with a statement on a scale on which 1 denotes strong agreement and 5 strong disagreement, the frequency distribution of their responses might look like:

Rank   Degree of agreement   Number
1      Strongly agree        20
2      Agree somewhat        30
3      Not sure              20
4      Disagree somewhat     15
5      Strongly disagree     15
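Summary statistics can be computed directly from a table like this one without expanding it back into 100 raw responses. The sketch below is an illustration under assumptions: it treats the ranks 1 to 5 as numbers (debatable for ordinal Likert data), and the helper name value_at is hypothetical.

```python
# Likert frequency table from the example: rank -> number of respondents.
frequency = {1: 20, 2: 30, 3: 20, 4: 15, 5: 15}

n = sum(frequency.values())
mean = sum(rank * count for rank, count in frequency.items()) / n
mode = max(frequency, key=frequency.get)  # first rank with the highest count


def value_at(position):
    """Return the rank held by the response at a 1-based sorted position."""
    running = 0
    for rank in sorted(frequency):
        running += frequency[rank]
        if position <= running:
            return rank
    raise IndexError("position is larger than the number of responses")


# Even n: average the two middle positions; odd n: both positions coincide.
median = (value_at((n + 1) // 2) + value_at(n // 2 + 1)) / 2
print(n, mean, median, mode)
```

The median lookup walks the cumulative counts, which is the same running total used later for cumulative frequency polygons.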

A different tabulation scheme aggregates values into bins such that each bin encompasses a range of values. For example, the heights of the students in a class could be organized into the following frequency table.

Height range    Number of students   Cumulative number
4.5-5.0 feet    25                   25
5.0-5.5 feet    35                   60
5.5-6 feet      20                   80
6.0-6.5 feet



A frequency distribution shows us a summarized grouping of data divided into mutually exclusive classes and the number of occurrences in each class. It is a way of organizing raw data, e.g. to show the results of an election, the income of people in a certain region, sales of a product within a certain period, student loan amounts of graduates, etc. Some of the graphs that can be used with frequency distributions are histograms, line graphs, bar charts and pie charts. Frequency distributions are used for both qualitative and quantitative data.

Joint frequency distributions

Bivariate joint frequency distributions are often presented as (two-way) contingency tables:

Two-way contingency table with marginal frequencies

         Dance   Sports   TV   Total
Men      2       10       8    20
Women    16      6        8    30
Total    18      16       16   50

The total row and total column report the marginal frequencies or marginal distribution, while the body of the table reports the joint frequencies.[1]
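The marginal frequencies are just row and column sums of the joint frequencies. A minimal Python sketch follows, reading the table with Men and Women as rows and Dance, Sports and TV as columns; the nested-dictionary layout and variable names are illustrative assumptions.

```python
# Joint frequencies from the two-way table: row = sex, column = activity.
joint = {
    "Men":   {"Dance": 2,  "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6,  "TV": 8},
}

# Marginal distribution over rows: sum across each row.
row_totals = {sex: sum(cells.values()) for sex, cells in joint.items()}

# Marginal distribution over columns: sum each activity across all rows.
activities = list(next(iter(joint.values())))
column_totals = {act: sum(joint[sex][act] for sex in joint) for act in activities}

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

Both sets of marginal totals must add up to the same grand total, which gives a quick consistency check on the table.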

Applications

Managing and operating on frequency-tabulated data is much simpler than operating on raw data. There are simple algorithms to calculate the median, mean, standard deviation, etc. from these tables.

Statistical hypothesis testing is founded on the assessment of differences and similarities between frequency distributions. This assessment involves measures of central tendency or averages, such as the mean and median, and measures of variability or statistical dispersion, such as the standard deviation or variance.

A frequency distribution is said to be skewed when its mean and median are different. The kurtosis of a frequency distribution is the concentration of scores at the mean, or how peaked the distribution appears if depicted graphically, for example in a histogram. If the distribution is more peaked than the normal distribution it is said to be leptokurtic; if less peaked it is said to be platykurtic.

Letter frequency distributions are also used in frequency analysis to crack codes, where they are compared with the relative frequency of letters in different languages.

How to Draw a Frequency Distribution Table

A frequency distribution table is one way you can organize data so that it makes more sense. For example, let's say you have a list of IQ scores for a gifted classroom in a particular elementary school. The IQ scores are: 118, 123, 124, 125, 127, 128, 129, 130, 130, 133, 136, 138, 141, 142, 149, 150, 154. That list doesn't tell you much about anything. You could draw a frequency distribution table, which will give a better picture of your data than a simple list.

Step 1: Figure out how many classes (categories) you need. There are no hard rules about how many classes to pick, but there are a couple of general guidelines:
    Pick between 5 and 20 classes. For the list of IQs above, we picked 5 classes.
    Make sure you have a few items in each category. For example, if you have 20 items, choose 5 classes (4 items per category), not 20 classes (which would give you only 1 item per category).

Step 2: Subtract the minimum data value from the maximum data value. For example, the IQ list above had a minimum value of 118 and a maximum value of 154, so: 154 - 118 = 36

Step 3: Divide your answer in Step 2 by the number of classes you chose in Step 1. 36 / 5 = 7.2

Step 4: Round the number from Step 3 up to a whole number to get the class width. Rounded up, 7.2 becomes 8.

Step 5: Write down your lowest value as the first lower class limit. The lowest value is 118.

Step 6: Add the class width from Step 4 to the value from Step 5 to get the next lower class limit: 118 + 8 = 126

Step 7: Repeat Step 6 for the other lower class limits (in other words, keep on adding your class width to the previous lower class limit) until you have created the number of classes you chose in Step 1. We chose 5 classes, so our 5 lower class limits are:
    118
    126 (118 + 8)
    134 (126 + 8)
    142 (134 + 8)
    150 (142 + 8)

Step 8: Write down the upper class limits. These are the highest values that can be in the category, so in most cases you can subtract 1 from the class width and add that to the lower class limit. For example: 118 + (8 - 1) = 125
    118-125
    126-133
    134-141
    142-149
    150-157

Step 9: Add a second column for the number of items in each class, and label the columns with appropriate headings:

IQ        Number
118-125
126-133
134-141
142-149
150-157

Step 10: Count the number of items in each class, and put the total in the second column. The list of IQ scores is: 118, 123, 124, 125, 127, 128, 129, 130, 130, 133, 136, 138, 141, 142, 149, 150, 154.

IQ        Number
118-125   4
126-133   6
134-141   3
142-149   2
150-157   2
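The ten steps can be sketched as a short Python script. It applies Step 8 literally (upper limit = lower limit + class width - 1), so every class spans the same number of whole scores; the variable names are illustrative, and the class count of 5 is the hand-picked value from Step 1.

```python
import math

iq_scores = [118, 123, 124, 125, 127, 128, 129, 130, 130, 133,
             136, 138, 141, 142, 149, 150, 154]
n_classes = 5  # Step 1: chosen by hand

# Steps 2-4: range divided by the class count, rounded up to a whole number.
class_width = math.ceil((max(iq_scores) - min(iq_scores)) / n_classes)

table = []
for k in range(n_classes):
    lower = min(iq_scores) + k * class_width  # Steps 5-7: lower class limits
    upper = lower + class_width - 1           # Step 8: upper class limit
    count = sum(lower <= score <= upper for score in iq_scores)  # Step 10
    table.append((lower, upper, count))

for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
print("total:", sum(count for _, _, count in table))
```

The printed total should equal the number of scores in the list; if it does not, some score fell outside every class and the class width or class count needs adjusting.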

Tip: If you are working with large numbers (like hundreds or thousands), round Step 4 up to a large whole number that's easy to make into classes, like 100, 1000, or 10,000.

FREQUENCY DISTRIBUTIONS

As discussed earlier, there are two major means of summarizing a set of numbers: pictures and summary numbers. Each method has advantages and disadvantages, and use of one method need not exclude the use of the other. This chapter describes drawing pictures of data, which are called frequency distributions.

FREQUENCY TABLES

The first step in drawing a frequency distribution is to construct a frequency table. A frequency table is a way of organizing the data by listing every possible score (including those not actually obtained in the sample) as a column of numbers and the frequency of occurrence of each score as another. Computing the frequency of a score is simply a matter of counting the number of times that score appears in the set of data. It is necessary to include scores with zero frequency in order to draw the frequency polygons correctly.

For example, consider the following set of 15 scores, which were obtained by asking a class of students their shoe size, shoe width, and sex (male or female).

Example Data

Shoe Size   Shoe Width   Gender
10.5        B            M
6.0         B            F
9.5         D            M
8.5         A            F
7.0         B            F
10.5        C            M
7.0         C            F
8.5         D            M
6.5         B            F
9.5         C            M
7.0         B
7.5         B
9.0         D
6.5         A
7.5         B


The same data entered into a data file in SPSS appears as follows:

To construct a frequency table, start with the smallest shoe size and list all shoe sizes as a column of numbers. The frequency of occurrence of that shoe size is written to the right.

Frequency Table of Example Data

Shoe Size   Absolute Frequency
6.0         1
6.5         2
7.0         3
7.5         2
8.0         0
8.5         2
9.0         1
9.5         2
10.0        0
10.5        2
Total       15
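Building such a table by program mainly requires remembering the scores that never occur. The following Python sketch lists every half size between the smallest and largest observed shoe size and counts each one; the shoe sizes are as read from the example data, and the half-size step of 0.5 is an assumption based on the sizes that appear.

```python
from collections import Counter

# Shoe sizes from the 15-student example, in the order listed.
shoe_sizes = [10.5, 6.0, 9.5, 8.5, 7.0, 10.5, 7.0, 8.5, 6.5, 9.5,
              7.0, 7.5, 9.0, 6.5, 7.5]
step = 0.5  # assumed spacing between possible shoe sizes

counts = Counter(shoe_sizes)
n_steps = round((max(shoe_sizes) - min(shoe_sizes)) / step)

# List every possible size from smallest to largest, including unused ones.
all_sizes = [min(shoe_sizes) + i * step for i in range(n_steps + 1)]
frequency_table = [(size, counts.get(size, 0)) for size in all_sizes]

for size, freq in frequency_table:
    print(f"{size:>5.1f}  {freq}")
print("total", sum(freq for _, freq in frequency_table))
```

Counter alone would omit sizes 8.0 and 10.0 because nobody reported them, which is why the full list of sizes is generated first.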

Note that the sum of the column of frequencies is equal to the number of scores or size of the sample (N = 15). This is a necessary, but not sufficient, property to ensure that the frequency table has been correctly calculated. It is not sufficient because two errors could have been made, canceling each other out.

While people think of their shoe size as a discrete unit, a shoe size is actually an interval of sizes. A given shoe size may be considered the midpoint of the interval. The real limits of the interval, the two points which function as cut-off points for a given shoe size, are the midpoints between the given shoe sizes. For example, a shoe size of 8.0 is really an interval of shoe sizes ranging from 7.75 to 8.25. The smaller value is called the lower real limit, while the larger is called the upper real limit. In each case, the limit is found by taking the midpoint between the nearest score values. For example, the lower limit of 7.75 was found as the average (midpoint) of 7.5 and 8.0 by adding the values together and dividing by two: (7.5 + 8.0) / 2 = 15.5 / 2 = 7.75. A similar operation was performed to find the upper real limit of 8.25, that is, the midpoint of 8.0 and 8.5.

To generate a frequency table using the SPSS package, select STATISTICS and FREQUENCIES as illustrated below:

In the frequencies box, select the variable name used for shoe size and the following choices:

The listing of the results of the analysis should contain the following:

FREQUENCY DISTRIBUTIONS

The information contained in the frequency table may be transformed to a graphical or pictorial form. No information is gained or lost in this transformation, but the human information processing system often finds the graphical or pictorial presentation easier to comprehend. There are two major means of drawing a graph, histograms and frequency polygons. The choice of method is often a matter of convention, although there are times when one or the other is clearly the appropriate choice.

Histograms

A histogram is drawn by plotting the scores (midpoints) on the X-axis and the frequencies on the Y-axis. A bar is drawn for each score value, the width of the bar corresponding to the real limits of the interval and the height corresponding to the frequency of the occurrence of the score value. An example histogram is presented below for the book example. Note that although there were no individuals in the example with shoe sizes of 8.0 or 10.0, those values are still included on the X-axis, with the bar for these values having no height.

The figure above was drawn using the SPSS computer package. Included in the output from the frequencies command described above was a histogram of shoe size. Unfortunately, the program automatically groups the data into intervals as described in Chapter 9. In order to generate a figure like the one above, the figure on the listing must be edited. To edit a figure in the listing file, place the cursor (arrow) on the figure and hit the right mouse button. When a menu appears, select the last entry on the list as follows:

Edit the graph selecting the following options:

If the data are nominal-categorical in form, the histogram is similar, except that the bars do not touch. The example below presents the data for shoe width, assuming that it is not interval in nature. The example was drawn using the example SPSS data file and the Bar Graph command.

When the data are nominal-categorical in form, the histogram is the only appropriate form for the picture of the data. When the data may be assumed to be interval, then the histogram can sometimes have a large number of lines, called data ink, which make the comprehension of the graph difficult. A frequency polygon is often preferred in these cases because much less ink is needed to present the same amount of information.

In some instances artists attempt to "enhance" a histogram by adding extraneous data ink. Two examples of this sort of excess were taken from the local newspaper. In the first, the arm and building add no information to the illustration. San Francisco is practically hidden, and no building is presented for Honolulu. In the second, the later date is presented spatially before the earlier date, and the size of the "bar" (or window, in this case) has no relationship to the number being portrayed. These types of renderings should be avoided at all costs by anyone who in the slightest stretch of imagination might call themselves "statistically sophisticated." An excellent source of information about the visual display of quantitative information is presented in Tufte (1983).

Absolute Frequency Polygons

An absolute frequency polygon is drawn exactly like a histogram except that points are drawn rather than bars. The X-axis begins with the midpoint of the interval immediately lower than the lowest interval, and ends with the midpoint of the interval immediately higher than the highest interval. In the example, this would mean that the score values of 5.5 and 11.0 would appear on the X-axis. The frequency polygon is drawn by plotting a point on the graph at the intersection of the midpoint of the interval and the height of the frequency. When the points are plotted, the dots are connected with lines, resulting in a frequency polygon. An absolute frequency polygon of the data in the book example is presented below.
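The vertices of such a polygon can be listed with a short Python sketch. It pads the shoe-size frequency table with one empty interval on each side so that the polygon begins and ends on the X-axis; the list layout and the 0.5 spacing between scores are illustrative assumptions, and no plotting library is used.

```python
# Frequency table from the shoe-size example: (score, absolute frequency).
frequency_table = [(6.0, 1), (6.5, 2), (7.0, 3), (7.5, 2), (8.0, 0),
                   (8.5, 2), (9.0, 1), (9.5, 2), (10.0, 0), (10.5, 2)]
step = 0.5  # distance between adjacent score values

first_score = frequency_table[0][0]
last_score = frequency_table[-1][0]

# One zero-frequency point below the lowest score and one above the highest.
vertices = [(first_score - step, 0)] + frequency_table + [(last_score + step, 0)]

for x, y in vertices:
    print(f"({x}, {y})")
```

Connecting these points in order with straight lines gives the polygon; the interior zero-frequency scores stay in the list, so the line touches the X-axis there as well.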

Note that when the frequency for a score is zero, as is the case for the shoe sizes of 8.0 and 10.0, the line goes down to the X-axis. Failing to go down to the X-axis when the frequency is zero is the most common error students make in drawing non-cumulative frequency polygons. As of yet, I have been unable to find a means to directly draw a frequency polygon using the SPSS graphics commands. It was not possible to instruct the computer package to include the points on the X-axis where the frequency goes down to zero. (I might be willing to reward the student who discovers a direct method extra credit.) The absolute frequency polygon drawn above used an indirect method in SPSS. A new data set was constructed from the frequency table as follows:

The graph was drawn by selecting graphics and then line as follows (note that the case button is selected):

The next screen selects the columns to use in the display. All the following graphs will be created in a similar manner by selecting different variables as rows and columns.

Relative Frequency Polygon

In order to draw a relative frequency polygon, the relative frequency of each score interval must first be calculated and placed in the appropriate column in the frequency table. The relative frequency of a score is another name for the proportion of scores that have a particular value. The relative frequency is computed by dividing the frequency of a score by the number of scores (N). The additional column of relative frequencies is presented below for the data in the book example.

Frequency Table of Example Data

Shoe Size   Absolute Frequency   Relative Frequency
6.0         1                    .07
6.5         2                    .13
7.0         3                    .20
7.5         2                    .13
8.0         0                    .00
8.5         2                    .13
9.0         1                    .07
9.5         2                    .13
10.0        0                    .00
10.5        2                    .13
Total       15                   .99
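The relative frequency column can be reproduced with a few lines of Python. This sketch assumes the values are rounded to two decimal places, which would also explain why the column adds up to .99 rather than 1.00; the dictionary layout is an illustrative choice.

```python
# Absolute frequencies from the shoe-size example: score -> count.
absolute = {6.0: 1, 6.5: 2, 7.0: 3, 7.5: 2, 8.0: 0,
            8.5: 2, 9.0: 1, 9.5: 2, 10.0: 0, 10.5: 2}
n = sum(absolute.values())  # number of scores, N

# Relative frequency = absolute frequency / N, rounded to two decimals.
relative = {size: round(freq / n, 2) for size, freq in absolute.items()}

exact_total = sum(freq / n for freq in absolute.values())
rounded_total = round(sum(relative.values()), 2)

for size, value in relative.items():
    print(f"{size:>5.1f}  {value:.2f}")
print(f"sum of rounded values: {rounded_total:.2f}")
print(f"sum of exact values:   {exact_total:.2f}")
```

Multiplying an unrounded relative frequency by N recovers the absolute frequency exactly, whereas the rounded values recover it only approximately.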

The relative frequency polygon is drawn exactly like the absolute frequency polygon except the Y-axis is labeled and incremented with relative frequency rather than absolute frequency. The frequency distribution pictured below is a relative frequency polygon. Note that it appears almost identical to the absolute frequency polygon.

A relative frequency may be transformed into an absolute frequency by using an opposite transformation; that is, multiplying by the number of scores (N). For this reason the size of the sample on which the relative frequency is based is usually presented somewhere on the graph. Generally speaking, relative frequency is more useful than absolute frequency, because the size of the sample has been taken into account.

Absolute Cumulative Frequency Polygons

An absolute cumulative frequency is the number of scores which fall at or below a given score value. It is computed by adding up the number of scores which are equal to or less than a given score value. The cumulative frequency may be found from the absolute frequency by either adding up the absolute frequencies of all scores smaller than or equal to the score of interest, or by adding the absolute frequency of a score value to the cumulative frequency of the score value immediately below it. The following is presented in tabular form.

Frequency Table of Example Data

Shoe Size   Absolute Frequency   Absolute Cumulative Freq
6.0         1                    1
6.5         2                    3
7.0         3                    6
7.5         2                    8
8.0         0                    8
8.5         2                    10
9.0         1                    11
9.5         2                    13
10.0        0                    13
10.5        2                    15
Total       15

Note that the cumulative frequency of the largest score (10.5) is equal to the number of scores (N = 15). This will always be the case if the cumulative frequency is computed correctly. The computation of the cumulative frequency for the score value of 7.5 could be done by either adding up the absolute frequencies for the scores of 7.5, 7.0, 6.5, and 6.0, respectively 2 + 3 + 2 + 1 = 8, or adding the absolute frequency of 7.5, which is 2, to the absolute cumulative frequency of 7.0, which is 6, to get a value of 8. Plotting scores on the X-axis and the absolute cumulative frequency on the Y-axis draws the cumulative frequency polygon. The points are plotted at the intersection of the upper real limit of the interval and the absolute cumulative frequency. The upper real limit is used in all cumulative frequency polygons because of the assumption that not all of the scores in an interval are accounted for until the upper real limit is reached. The book example of an absolute cumulative frequency polygon is presented below.
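The running total and the plotting positions can be sketched together in Python. The sketch assumes evenly spaced scores 0.5 apart, so each upper real limit is the score plus half that spacing (for example 8.0 + 0.25 = 8.25); the variable names are illustrative.

```python
from itertools import accumulate

sizes = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5]
absolute = [1, 2, 3, 2, 0, 2, 1, 2, 0, 2]
step = 0.5  # assumed spacing between adjacent scores

# Each entry adds the score's own frequency to the cumulative frequency below it.
cumulative = list(accumulate(absolute))

# Upper real limit: midpoint between a score and the next higher score.
upper_real_limits = [size + step / 2 for size in sizes]

# Points of the cumulative polygon: (upper real limit, cumulative frequency).
points = list(zip(upper_real_limits, cumulative))

for x, y in points:
    print(f"({x}, {y})")
```

The last cumulative value should equal N, and the sequence should never decrease; either check failing points to an arithmetic slip in the table.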

A cumulative frequency polygon will always be monotonically increasing, a mathematician's way of saying that the line will never go down, but that it will either stay at the same level or increase. The line will be horizontal when the absolute frequency of the score is zero, as is the case for the score value of 8.0 in the book example. When the highest score is reached, i.e. at 10.5, the line continues horizontally forever from that point.

The cumulative frequency polygon, while displaying exactly the same amount of information as the absolute frequency distribution, expresses the information as a rate of change. The steeper the slope of the cumulative frequency polygon, the greater the rate of change. The slope of the example cumulative polygon is steepest between the values of 6.75 and 7.25, indicating the greatest number of scores between those values.

Rate of change information may be easier to comprehend if the score values involve a measure of time. The graphs of rats' bar-pressing rates drawn by behavioral psychologists are absolute cumulative polygons, as are some of the graphs in developmental psychology, such as the cumulative vocabulary of children.

Relative Cumulative Polygon

The first step in drawing the relative cumulative polygon is computing the relative cumulative frequency; that is, dividing the absolute cumulative frequency by the number of scores (N). The result is the proportion of scores that fall at or below a given score. The relative cumulative frequency becomes:

Frequency Table of Example Data

Shoe Size   Absolute Frequency   Absolute Cum Freq   Relative Cum Freq
6.0         1                    1                   .07
6.5         2                    3                   .20
7.0         3                    6                   .40
7.5         2                    8                   .53
8.0         0                    8                   .53
8.5         2                    10                  .67
9.0         1                    11                  .73
9.5         2                    13                  .87
10.0        0                    13                  .87
10.5        2                    15                  1.00
Total       15

Drawing the X-axis as before and the relative cumulative frequency on the Y-axis draws the relative cumulative frequency polygon directly from the preceding table. Points are plotted at the intersection of the upper real limit and the relative cumulative frequency. The graph that results from the book example is presented below.

Note that the absolute and relative cumulative frequency polygons are identical except for the Y-axis. Note also that the value of 1.000 is the largest relative cumulative frequency, and the highest point on the polygon.