A summary of the material covered in this session is available below:
We discuss three measures of centrality:
o Mean: the average value, measured in the same units as the data. The sample mean is computed from all data in the sample and is therefore sensitive to outliers; even a single outlier can shift the sample mean noticeably.
o Median: the middle data point; about half of the data lie below the median and half above it. The median is measured in the same units as the data. The sample median is computed from the middle data point (or the average of the middle two data points) and is not sensitive to one (or a few) outliers.
o Mode: the most frequently occurring data point. The mode is not necessarily unique and is rarely used.
We discuss the following measures of spread:
o Range: defined as maximum value - minimum value and measured in the same units as the data. We only need two data points to compute the range. The range tells us about the spread of the data in the entire sample.
o Interquartile Range (IQR): defined as Q3 - Q1. The IQR describes the spread of the middle 50% of the data and is measured in the same units as the data. The IQR is computed from two values only (the two quartiles).
o Variance/Standard deviation (SD): the variance/SD tells us how close the data are, on average, to the mean. A small variance/SD implies the data are, on average, concentrated around the mean. A large variance/SD implies the data are spread out around the mean. Variance is measured in units of the data squared, which makes it difficult to interpret. SD is measured in the same units as the data and is preferred for interpretation. To interpret the SD, we use the following guidelines:
    - For a symmetric, bell shaped distribution, approximately 68% (roughly 70%) of the data in the sample will lie within one standard deviation of the mean.
    - For a symmetric, bell shaped distribution, approximately 95% of the data in the sample will lie within two standard deviations of the mean.
For symmetric, bell shaped data: we use the mean and standard deviation as measures of centre and spread.
For skewed data: we use the median and the IQR as measures of centre and spread.
Standard error: the standard error of the sample mean tells us how good the sample mean ($\bar{x}$) is as an estimate of the unknown population mean ($\mu$). A small standard error of the sample mean implies that the sample mean is likely to be close to the unknown population mean (i.e., our estimate is accurate). A large standard error of the sample mean implies that the sample mean is possibly far away from the unknown population mean (i.e., our estimate is not accurate). The standard error of the sample mean depends on the size of the sample and the standard deviation. We know this from the formula for the standard error of the sample mean: $s/\sqrt{n}$, where $s$ is the sample standard deviation and $n$ is the sample size. The standard error therefore shrinks as the sample size grows and increases with the standard deviation.
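As a rough illustration of the measures summarised above, the sketch below computes each of them with Python's standard `statistics` module. The function name `summarize` and the two small data sets are made up for this example and are not part of the session material. Note that software packages use slightly different conventions for quartiles, so the IQR reported here (Python's default method) may differ a little from a hand calculation or from another package.

```python
import math
import statistics


def summarize(data):
    """Return the measures of centre and spread discussed above (hypothetical helper)."""
    n = len(data)
    # Quartiles: Q1, median, Q3. Conventions vary between packages;
    # this uses the default method of statistics.quantiles.
    q1, _, q3 = statistics.quantiles(data, n=4)
    sd = statistics.stdev(data)  # sample SD (divides by n - 1)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # a list, since the mode need not be unique
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": statistics.variance(data),  # sample variance, in squared units
        "sd": sd,
        "se_mean": sd / math.sqrt(n),  # standard error of the sample mean: s / sqrt(n)
    }


# Made-up data: the second sample replaces the largest value with an outlier.
sample = [2, 4, 4, 5, 7, 9, 11]
with_outlier = sample[:-1] + [110]

for name, data in [("sample", sample), ("with outlier", with_outlier)]:
    print(name)
    for measure, value in summarize(data).items():
        print(f"  {measure}: {value}")
```

In this example the single outlier moves the mean from 6 to about 20.1 and inflates the range, SD and standard error, while the median and the IQR both stay at 5. Because the sample size enters the standard error through a square root, quadrupling the sample size halves the standard error when the standard deviation is unchanged.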