Topic #6
The central tendency of age among Berkeley faculty members clearly increased from 1969-70 to 1988-89. On the other hand, the central tendency of Democratic House vote by CD did not change much from 1948 to 1970 (though other characteristics of its frequency distribution certainly did change).
The Mode
The mode (or modal value) of a variable in a set of data is the value of the variable that is observed most frequently in that data (or, given a continuous frequency curve, the value at the point of greatest density). Note: the mode is the value that is observed most frequently, not the frequency itself. (I see this error too frequently on tests.) The mode is defined for every type of variable [i.e., nominal, ordinal, interval, or ratio]. However, the mode is used as a measure of central tendency primarily for nominal variables.
Given a list of observed values (raw data): Construct a frequency table (see next slide).
Notice that the modal number of problem sets turned in is 5, even though most students turned in fewer than 5.
So if we recoded the variable to create just two categories:
(a) turned in all 5 (b) did not turn in all 5
the modal category would be (b), did not turn in all 5.
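The frequency-table procedure can be sketched in a few lines of Python. The data below are hypothetical (20 invented students, not the class data from the slides), chosen only so that 5 is the modal value even though most students turned in fewer than 5.

```python
from collections import Counter

# Hypothetical raw data: number of problem sets (1-5) turned in by 20 students.
problem_sets = [5, 5, 4, 3, 5, 2, 4, 5, 1, 3, 5, 4, 1, 5, 3, 4, 5, 2, 4, 5]

# Frequency table: each observed value and the number of cases with that value.
freq = Counter(problem_sets)
for value in sorted(freq):
    print(f"{value}: {freq[value]}")

# The mode is the VALUE with the highest frequency, not the frequency itself.
mode_value, mode_freq = freq.most_common(1)[0]
print("mode:", mode_value, "(observed", mode_freq, "times)")

# Recode into two categories and find the modal category again.
recoded = Counter("all 5" if x == 5 else "not all 5" for x in problem_sets)
recoded_mode = recoded.most_common(1)[0][0]
print("modal category after recoding:", recoded_mode)
```

With these invented numbers the mode of the original variable is 5, but the modal category of the recoded variable is "not all 5", which is why the mode can be a poor guide to central tendency.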
The Median
The median (or median value) of a variable in a data set is the value in the middle of the observations, in the sense that no more than half of the cases have lower values and no more than half of the cases have higher values. More generally, no more than half of the cases have values that lie on either side of the median value.
Given a precisely measured continuous variable and a very large number of cases, we can in practice say that half the cases have lower values and half have higher values (e.g., LEVEL OF INCOME, SAT scores). Equivalently, the median value is the value of the case at the 50th percentile of the distribution.
The median should be clearly distinguished from another (infrequently used) measure of central tendency that is defined only for variables that are at least interval in nature.
This is the midrange (or midpoint) value of the distribution, which is in the middle of the observations in the (different) sense that it lies exactly halfway between the minimum (lowest) and maximum (highest) observed values, i.e., at the midpoint of the range of values: midrange = (min + max) / 2. For example, for Problem Sets, (1 + 5) / 2 = 3.
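The difference between the two "middles" can be seen in a short sketch. The data are hypothetical, and the rule of averaging the two middle values when the number of cases is even is a common convention assumed here, not something stated in the slides.

```python
def median(values):
    """Middle value of the sorted observations.

    With an even number of cases there are two middle values; this sketch
    follows the usual convention of taking their average.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def midrange(values):
    """Point halfway between the minimum and maximum observed values."""
    return (min(values) + max(values)) / 2


# Hypothetical data: number of problem sets (1-5) turned in by 20 students.
problem_sets = [5, 5, 4, 3, 5, 2, 4, 5, 1, 3, 5, 4, 1, 5, 3, 4, 5, 2, 4, 5]

median_value = median(problem_sets)
midrange_value = midrange(problem_sets)
print("median:", median_value)
print("midrange:", midrange_value)
```

Here the midrange is (1 + 5) / 2 = 3, while the median of these invented observations is 4: the midrange depends only on the two extreme cases, the median on where the bulk of the cases lie.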
TABLE 1 PERCENT OF POPULATION AGED 65 OR HIGHER IN THE 50 STATES (UNIVARIATE DATA ARRAY)

State             %       State             %
Alabama          12.4     Montana          12.5
Alaska            3.6     Nebraska         13.8
Arizona          12.7     Nevada           10.6
Arkansas         14.6     New Hampshire    11.5
California       10.6     New Jersey       13.0
Colorado          9.2     New Mexico       10.0
Connecticut      13.4     New York         13.0
Delaware         11.6     North Carolina   11.8
Florida          17.8     North Dakota     13.3
Georgia          10.0     Ohio             12.5
Hawaii           10.1     Oklahoma         12.8
Idaho            11.5     Oregon           13.7
Illinois         12.1     Pennsylvania     14.8
Indiana          12.1     Rhode Island     14.7
Iowa             14.8     South Carolina   10.7
Kansas           13.6     South Dakota     14.0
Kentucky         12.3     Tennessee        12.4
Louisiana        10.8     Texas             9.7
Maine            13.4     Utah              8.2
Maryland         10.7     Vermont          11.9
Massachusetts    13.7     Virginia         10.6
Michigan         11.5     Washington       11.8
Minnesota        12.6     West Virginia    13.9
Mississippi      12.1     Wisconsin        13.2
Missouri         13.8     Wyoming           8.9
Table columns: (1) rank of state; (2) name of state; (3) % over 65 in state (value of variable); (4) percentile rank of state.
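As an illustration, the Table 1 values can be put in rank order and the middle of the ranking located. This is only a sketch: the direction of the ranking (lowest to highest) is an assumption, and the percentile-rank column is left out because the slides do not say how it was computed.

```python
import statistics

# Percent of population aged 65 or higher, transcribed from Table 1.
pct_65_plus = {
    "Alabama": 12.4, "Alaska": 3.6, "Arizona": 12.7, "Arkansas": 14.6,
    "California": 10.6, "Colorado": 9.2, "Connecticut": 13.4, "Delaware": 11.6,
    "Florida": 17.8, "Georgia": 10.0, "Hawaii": 10.1, "Idaho": 11.5,
    "Illinois": 12.1, "Indiana": 12.1, "Iowa": 14.8, "Kansas": 13.6,
    "Kentucky": 12.3, "Louisiana": 10.8, "Maine": 13.4, "Maryland": 10.7,
    "Massachusetts": 13.7, "Michigan": 11.5, "Minnesota": 12.6,
    "Mississippi": 12.1, "Missouri": 13.8, "Montana": 12.5, "Nebraska": 13.8,
    "Nevada": 10.6, "New Hampshire": 11.5, "New Jersey": 13.0,
    "New Mexico": 10.0, "New York": 13.0, "North Carolina": 11.8,
    "North Dakota": 13.3, "Ohio": 12.5, "Oklahoma": 12.8, "Oregon": 13.7,
    "Pennsylvania": 14.8, "Rhode Island": 14.7, "South Carolina": 10.7,
    "South Dakota": 14.0, "Tennessee": 12.4, "Texas": 9.7, "Utah": 8.2,
    "Vermont": 11.9, "Virginia": 10.6, "Washington": 11.8,
    "West Virginia": 13.9, "Wisconsin": 13.2, "Wyoming": 8.9,
}

# Ranked array: states ordered from lowest to highest value (assumed direction).
ranked = sorted(pct_65_plus.items(), key=lambda item: item[1])
for rank, (state, pct) in enumerate(ranked, start=1):
    print(f"{rank:2d}  {state:<15} {pct:5.1f}")

# With 50 cases there is no single middle case; the median lies between
# the cases ranked 25th and 26th.
n = len(ranked)
lower_middle = ranked[n // 2 - 1]
upper_middle = ranked[n // 2]
median_pct = statistics.median(pct_65_plus.values())
print("middle cases:", lower_middle, upper_middle)
print("median:", median_pct)
```

The two middle cases have values 12.3 and 12.4, so the median is about 12.35, whereas the midrange, (3.6 + 17.8) / 2, is about 10.7 because it is pulled down by the single very low value for Alaska.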
The Mean
The mean (or mean value) of a variable in a set of data is the result of adding up all the observed values of the variable and dividing by the number of cases (i.e., the average as the term is most commonly used). The mean is defined if and only if the variable is at least interval in nature [i.e., interval or ratio]. Suppose we have a variable X and a set of cases numbered 1, 2, ..., n. Let the observed value of the variable in each case be designated x1, x2, ..., xn. Thus:
mean of X = (x1 + x2 + ... + xn) / n
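The definition of the mean translates directly into code. In this minimal sketch the function name and the five observed values are hypothetical; the standard library's statistics.mean is used only as a cross-check.

```python
import statistics


def mean(values):
    """Add up all observed values and divide by the number of cases."""
    values = list(values)
    return sum(values) / len(values)


# Hypothetical observed values x1, ..., x5 of a variable X.
x = [2, 4, 4, 5, 10]

x_bar = mean(x)  # (2 + 4 + 4 + 5 + 10) / 5
print("mean:", x_bar)
print("library result:", statistics.mean(x))
```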
Average Values
For a quantitative variable, the range of observed values in a set of data is the interval extending from the minimum observed value to the maximum observed value. Whatever measure of central tendency is used, the average value of the observations lies somewhere in this range, often (but certainly not always) somewhere near the middle of it.
The modal or median observed value can equal the minimum or (like the modal number of problem sets turned in) the maximum value, but the mean can do so only in the very special case in which all cases have identical observed values, so that there is no dispersion [next topic] in the data and min value = max value. Once again, remember that the median is not (necessarily) the midpoint of the range.
The sum (or mean) of the squared deviations from the mean is less than the sum (or mean) of the squared deviations from any other value.
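This property can be checked numerically. The sketch below uses hypothetical data and a handful of arbitrarily chosen alternative values; it illustrates the claim rather than proving it.

```python
def sum_sq_dev(values, c):
    """Sum of squared deviations of the observations from the value c."""
    return sum((x - c) ** 2 for x in values)


# Hypothetical observed values.
x = [2, 4, 4, 5, 10]
x_bar = sum(x) / len(x)  # the mean, 5.0

at_mean = sum_sq_dev(x, x_bar)

# Compare with a few other candidate values (4.0 is the median of x).
candidates = [3.0, 4.0, 4.5, 5.5, 6.0]
others = {c: sum_sq_dev(x, c) for c in candidates}

print("around the mean:", at_mean)
for c, total in others.items():
    print(f"around {c}: {total}")
```

For these numbers the sum of squared deviations around the mean is 36, and every other candidate gives a larger total (for example, 41 around the median of 4).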
While the median and the mean are both proper and useful measures of central tendency for quantitative variables, they have different definitions and properties and may (depending on the distribution of the data) give very different answers to the question "what is the average value in this data?"
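A small sketch shows how far apart the two answers can be. The incomes below are invented for illustration; the point is only that one extreme case moves the mean a great deal and the median not at all.

```python
import statistics

# Hypothetical annual incomes for nine cases; the last one is an extreme value.
incomes = [22_000, 25_000, 28_000, 30_000, 32_000,
           35_000, 40_000, 45_000, 1_000_000]

median_income = statistics.median(incomes)
mean_income = statistics.mean(incomes)

print("median:", median_income)
print("mean:", round(mean_income, 2))

# Dropping the extreme case barely changes the median but changes the mean a lot.
trimmed = incomes[:-1]
median_trimmed = statistics.median(trimmed)
mean_trimmed = statistics.mean(trimmed)
print("without the extreme case -> median:", median_trimmed,
      "mean:", mean_trimmed)
```

With these numbers the median is 32,000 while the mean is roughly 139,667, a value larger than every income but one.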