Professional Documents
Culture Documents
Descriptive Statistics
print all
Prev
Next
1
| 2
| 3
| 4
| 5
| 6
| 7
| 8
| 9
| 10
Contents
All Modules
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
------
Tukey Fences
There are several methods for determining outliers in a sample. A very popular method is based on the following:
Outliers are values below Q1-1.5(Q3-Q1) or above Q3+1.5(Q3-Q1) or equivalently, values below Q1-1.5 IQR or above Q
These are referred to as Tukey fences.6 For the diastolic blood pressures, the lower limit is 64 - 1.5(77-64) = 44.5 and
is 77 + 1.5(77-64) = 96.5. The diastolic blood pressures range from 62 to 81. Therefore there are no outliers. The bes
typical diastolic blood pressure is the mean (in this case 71.3) and the best summary of variability is given by the stand
(s=7.2).
Table 13 displays the means, standard deviations, medians, quartiles and interquartile ranges
for each of the continuous variables in the subsample of n=10 participants who attended the
seventh examination of the Framingham Offspring Study.
Table 14 displays the observed minimum and maximum values along with the limits to
determine outliers using the quartile rule for each of the variables in the subsample of n=10
participants. Are there outliers in any of the variables? Which statistics are most appropriate to
summarize the average or typical value and the dispersion?
1
Determined byQ1-1.5(Q3-Q1)
2
Determined by Q3+1.5(Q3-Q1)
Since there are no suspected outliers in the subsample of n=10 participants, the mean and
standard deviation are the most appropriate statistics to summarize average values and
dispersion, respectively, of each of these characteristics.
Based solely on a comparison of the means and medians in Table 15 above, there is evidence that there
was one or more characteristics with values that were outliers?
True
False
Table 16 displays the observed minimum and maximum values along with the limits to
determine outliers using the quartile rule for each of the variables in the full sample (n=3,539).
Tukey Fences
Characteristic Minimum Maximum Lower Limit1 Upper Limit2
1
Determined byQ1-1.5(Q3-Q1)
2
Determined by Q3+1.5(Q3-Q1)
Complete the following steps to interpret descriptive statistics. Key output includes N, the
mean, the median, the standard deviation, and several graphs.
In This Topic
Statistics
Key Result: N
In these results, you have 68 observations.
The median and the mean both measure central tendency. But unusual values, called
outliers, affect the median less than they affect the mean. When you have unusual values,
you can compare the mean and the median to decide which is the better measure to use. If
your data are symmetric, the mean and median are similar.
Statistics
Statistics
Examine the shape of your data to determine whether your data appear to be skewed
When data are skewed, the majority of the data are located on the high or low side of the
graph. Often, skewness is easiest to detect with a histogram or boxplot.
Right-skewed
Left-skewed
The histogram with right-skewed data shows wait times. Most of the wait times are relatively short, and
only a few wait times are long. The histogram with left-skewed data shows failure time data. A few items
fail immediately, and many more items fail later.
This individual plot shows that the data on the right has more variation than the data on the left.
If you have additional information that allows you to classify the observations into groups,
you can create a group variable with this information. Then, you can create the graph with
groups to determine whether the group variable accounts for the peaks in the data.
Simple
With Groups
For example, a manager at a bank collects wait time data and creates a simple histogram. The histogram
appears to have two peaks. After further investigation, the manager determines that the wait times for
customers who are cashing checks is shorter than the wait time for customers who are applying for home
equity loans. The manager adds a group variable for customer task, and then creates a histogram with
groups.
Statistics
SE
Varia Machi N Mea StDe Minim Medi Maxim
ble ne N * Mean n v um Q1 an Q3 um
In these results, the summary statistics are calculated separately by machine. You can easily see the
differences in the center and spread of the data for each machine. For example, Machine 1 has a lower
mean torque and less variation than Machine 2. To determine whether the difference in means is
significant, you can perform a 2-sample t-test.
Minitab.com
License Port