Analyzing and Summarizing Data
Summarizing Data
Most sets of data show a distinct tendency to group around a central value; this tendency is called central tendency.
The purpose of a measure of central tendency is to find a single value that best represents an entire distribution of scores.
When people talk about an average value, the middle value, or the most frequent value, they are talking informally about the mean, the median, and the mode, three measures of central tendency.
TERMINOLOGY
Central Tendency
- the extent to which the data values group around a typical or central value
Variation
- the amount of dispersion, or scattering, of values away from a central value
Shape
- the pattern of the distribution of values from the lowest value to the highest value
IMPORTANCE
1) To find a representative value
It gives us a single value for the distribution, and this value represents the entire distribution.
2) To condense data
An average converts the whole set of figures into just one figure and thus condenses the data.
3) To make comparisons
To compare two or more distributions, we need to find the representative value of each distribution.
Mean
- The average with which you are probably most familiar.
- The sample mean is represented by $\bar{x}$ (read "x-bar" or "sample mean").
- The mean is found by adding all the values of the variable x (this sum of x values is symbolized $\sum x$) and dividing the sum by the number of these values, n (the sample size):
$$\bar{x} = \frac{\sum x}{n}$$
Find the mean of the following times (in minutes) recorded on 10 consecutive days.
Day:         1   2   3   4   5   6   7   8   9   10
Time (min):  39  29  43  52  39  44  40  31  44  35
Answer: $\bar{x} = \frac{\sum x}{n} = \frac{396}{10} = 39.6$ minutes
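What if, on Day 4, the time spent was 102 minutes instead of 52 minutes?
Day:         1   2   3   4    5   6   7   8   9   10
Time (min):  39  29  43  102  39  44  40  31  44  35
Answer: $\bar{x} = \frac{446}{10} = 44.6$ minutes, so a single extreme value raises the mean by 5 minutes.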
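The formula x-bar = (sum of x) / n translates directly into code. The sketch below uses Python (our choice; the slides use no programming language) and a hypothetical helper named `sample_mean`. It computes the mean of the ten daily times, once for the original data and once with Day 4 changed to 102 minutes.

```python
def sample_mean(values):
    """Hypothetical helper: sample mean x-bar = (sum of x) / n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Daily times (minutes) for Days 1-10, from the example data
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
# Same data, but Day 4 (index 3) is the extreme value 102 instead of 52
times_with_outlier = [39, 29, 43, 102, 39, 44, 40, 31, 44, 35]

print(sample_mean(times))               # 39.6
print(sample_mean(times_with_outlier))  # 44.6
```

Only one value changed, yet the mean moves by (102 - 52) / 10 = 5 minutes, because every value contributes to the sum.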
Mean
Use the mean to describe the middle of a set of data that does not have an outlier (extreme values).
Advantages:
- Most popular measure in fields such as business, engineering and computer science
- It is unique: there is only one answer
- Useful when comparing sets of data
Disadvantages:
- Affected by extreme values (outliers)
Median
- The value of the data that occupies the middle position when the data are ranked in order according to size.
- The sample median is represented by $\tilde{x}$ (read "x-tilde" or "sample median").
- The median is not affected by extreme values, so you can use the median when extreme values are present.
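To find the median:
1) Rank the data.
2) Determine the depth (position) of the median: $d(\tilde{x}) = \frac{n + 1}{2}$.
3) Determine the value of the median by counting its rank as given by the depth.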
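The ranking-and-counting procedure can be sketched in Python as follows. It assumes the depth of the median is (n + 1) / 2, which matches the activity answers (3rd value for n = 5, 3.5th value for n = 6); a fractional depth such as 3.5 is read as "midway between the 3rd and 4th ranked values". The name `median_by_depth` is hypothetical.

```python
def median_by_depth(values):
    """Median found by ranking the data and counting to the depth (n + 1) / 2."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ranked = sorted(values)          # rank the data from smallest to largest
    n = len(ranked)
    depth = (n + 1) / 2              # 1-based position of the median
    lower = ranked[int(depth) - 1]   # value at the whole-number part of the depth
    if depth.is_integer():           # odd n: the median is a single data value
        return lower
    upper = ranked[int(depth)]       # even n: the next ranked value
    return (lower + upper) / 2       # midway between the two middle values

print(median_by_depth([6, 3, 8, 5, 3]))      # 5
print(median_by_depth([9, 6, 7, 9, 10, 8]))  # 8.5
```

Sorting is the expensive step (O(n log n)), which is why ranking a large data set by hand is tedious.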
Activity:
A) Find the median for the set of data {6, 3, 8, 5, 3}. Median = 5 (3rd value)
B) Find the median of the sample 9, 6, 7, 9, 10, 8. Median = 8.5 (3.5th value)
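Ranked times (original data): 29 31 35 39 39 40 43 44 44 52; Median = 39.5 (5.5th value)
Ranked times (Day 4 = 102): 29 31 35 39 39 40 43 44 44 102; Median = 39.5 (5.5th value)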
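To see the contrast between the two measures on the ranked daily times, the sketch below uses Python's standard `statistics` module (our choice of tool, not part of the slides). `statistics.median` averages the two middle values when n is even, which is the same as counting to the 5.5th value here.

```python
import statistics

# Ranked daily times (minutes): original data and data with the Day 4 outlier
ranked_times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
ranked_with_outlier = [29, 31, 35, 39, 39, 40, 43, 44, 44, 102]

for label, data in [("original", ranked_times), ("with outlier", ranked_with_outlier)]:
    # mean uses every value; median depends only on the middle of the ranking
    print(label, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

Replacing 52 with 102 moves the mean from 39.6 to 44.6, while the median stays at 39.5 because the two middle values (39 and 40) do not change.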
Median
Use the median to describe the middle of a set of data that does have an outlier.
Advantages:
- Extreme values (outliers) do not affect the median as strongly as they do the mean
- Easy to calculate and, in some cases, can be obtained by inspection
- It is unique: there is only one answer
Disadvantages:
- Not capable of further algebraic treatment
- Ranking a large number of data values can be tedious
Mode
- The value of x that occurs most frequently
- Can be used with categorical data
- Like the median, the mode is not affected by extreme values
- Often, there is no mode or there are several modes in a set of data
- Distributions can be unimodal, bimodal, or multimodal
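Because a data set can have one mode, several modes, or none, a mode routine should return a list rather than a single value. The Python sketch below (hypothetical function `modes`, built on the standard `collections.Counter`) assumes the convention stated in these notes: if no value repeats, there is no mode. It works for numbers and for categories alike.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:          # no value repeats, so (by this convention) there is no mode
        return []
    return [v for v, c in counts.items() if c == top]

print(modes([6, 3, 8, 5, 3]))                           # [3] (unimodal)
print(modes([39, 29, 43, 52, 39, 44, 40, 31, 44, 35]))  # [39, 44] (bimodal)
print(modes([1, 2, 3]))                                 # [] (hypothetical data, no mode)
```

The daily-time data turn out to be bimodal: 39 and 44 each occur twice.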
Activity (categorical data): Find the mode.
Flavor          f
Vanilla         28
Chocolate       22
Strawberry      15
Neapolitan       8
Butter Pecan    12
Rocky Road       9
Fudge Ripple     6
Mode: Vanilla
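When categorical data are already summarized in a frequency table, the mode is simply the category with the largest frequency f. A minimal Python sketch, with the table stored in a dictionary (our representation choice):

```python
# Frequency table from the activity: flavor -> f
flavor_counts = {
    "Vanilla": 28, "Chocolate": 22, "Strawberry": 15, "Neapolitan": 8,
    "Butter Pecan": 12, "Rocky Road": 9, "Fudge Ripple": 6,
}

highest = max(flavor_counts.values())
# Keep every category that reaches the highest frequency, in case of ties
mode_flavors = [flavor for flavor, f in flavor_counts.items() if f == highest]
print(mode_flavors)  # ['Vanilla']
```

Collecting all categories that reach the maximum, rather than calling `max` with a key, keeps ties visible instead of silently picking one.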
Mode
Use the mode when the data are non-numeric or when asked to choose the most popular item.
Advantages:
- Extreme values (outliers) do not affect the mode
Disadvantages:
- Not necessarily unique: there may be more than one answer
- When no values repeat in the data set, there is no mode, and the mode may seem useless
- When there is more than one mode, the result is difficult to interpret and/or compare
Midrange
- The number exactly midway between the lowest data value, L, and the highest data value, H:
$$\text{midrange} = \frac{L + H}{2}$$
Activity:
Find the mean, median, mode and midrange of {6, 7, 8, 9, 9, 10}.
Mean = 8.17
Median = 8.5
Mode = 9
Midrange = 8
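The four activity answers can be checked with a short Python sketch. It assumes Python 3.8+ for `statistics.multimode`; the midrange is not in the `statistics` module, so it is computed directly as (L + H) / 2. The function name `summarize` is hypothetical.

```python
import statistics

def summarize(values):
    """Mean, median, mode(s) and midrange of a numeric data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),        # all most-frequent values
        "midrange": (min(values) + max(values)) / 2,  # (L + H) / 2
    }

result = summarize([6, 7, 8, 9, 9, 10])
print(round(result["mean"], 2), result["median"], result["modes"], result["midrange"])
```

This reproduces the answers above (mean 49/6, about 8.17; median 8.5; mode 9; midrange 8.0). Note that `statistics.multimode` returns every value when none repeat, whereas these notes say such a data set has no mode.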
2) Identify the circumstances in which the median, rather than the mean, is the preferred measure of central tendency.
3) Under what circumstances will the mean, the median, and the mode all have the same value?
4) Under what circumstances is the mode the preferred measure of central tendency?
5) Explain why the mean is often not a good measure of central tendency for a skewed distribution.
6) Draw and determine the shape of the distribution when:
a) The mean, median and mode are equal
b) The mode is lowest, followed by the median and the mean
c) The mean is lowest, followed by the median and the mode