
Definitions

Bimodal Distribution: A distribution of data points in which two values occur more frequently than
the rest of the values in the data set.

Chebyshev’s Theorem: No matter what the shape of a distribution, at least 75 percent of the values
in the population will fall within 2 standard deviations of the mean and at least 89 percent will fall
within 3 standard deviations.
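
The 75 and 89 percent figures can be checked with a short sketch. It assumes the usual statement of the theorem, that at least 1 - 1/k^2 of the values lie within k standard deviations of the mean for k > 1; the function name chebyshev_lower_bound is only illustrative.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean.

    Assumes the standard form of the bound, 1 - 1/k^2, which is only
    informative for k > 1.
    """
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


print(chebyshev_lower_bound(2))            # 0.75
print(round(chebyshev_lower_bound(3), 3))  # 0.889
```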

Coding: A method of calculating the mean for grouped data by recoding values of class midpoints to
simpler values.

Coefficient of Variation: A relative measure of dispersion, comparable across distributions, that
expresses the standard deviation as a percentage of the mean.
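
As a rough illustration, the calculation might look like the sketch below. It assumes the population standard deviation is the one intended (a sample version would use statistics.stdev instead); the function name and the data are made up for the example.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean.

    Assumption: the population standard deviation is used.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("undefined when the mean is zero")
    return 100 * statistics.pstdev(values) / mean


# Illustrative data: mean 5, population standard deviation 2.
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))  # 40.0
```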

Deciles: Fractiles that divide the data into 10 equal parts.

Dispersion: The spread or variability in a set of data.

Distance Measure: A measure of dispersion in terms of the difference between two values in the
data set.

Exploratory Data Analysis (EDA): Methods for analyzing data that require very few prior
assumptions.

Fractile: In a frequency distribution, the location of a value at or above a given fraction of the data.

Geometric Mean: A measure of central tendency used to measure the average rate of change or
growth for some quantity, computed by taking the nth root of the product of n values representing
change.
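
A minimal sketch of the nth-root calculation follows. It assumes each value is a growth factor (for example, 1.10 for a 10 percent increase); the function name and the sample factors are illustrative only.

```python
import math


def geometric_mean(factors):
    """nth root of the product of n positive growth factors."""
    if not factors or any(f <= 0 for f in factors):
        raise ValueError("needs at least one value, all positive")
    return math.prod(factors) ** (1 / len(factors))


# Hypothetical growth of +10%, +20%, then -10% over three periods.
average_factor = geometric_mean([1.10, 1.20, 0.90])
print(f"average growth factor per period: {average_factor:.4f}")
```

The standard library also provides statistics.geometric_mean, which could be used in place of the hand-written version.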

Interfractile Range: A measure of the spread between two fractiles in a distribution, that is, the
difference between the values of two fractiles.

Interquartile Range: The difference between the values of the first and the third quartiles; this
difference indicates the range of the middle half of the data set.
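
One way to compute it is sketched below. Several conventions exist for locating quartiles; this example assumes the "inclusive" interpolation method of Python's statistics.quantiles, so other methods may give slightly different values for the same data.

```python
import statistics


def interquartile_range(values):
    """Third quartile minus first quartile.

    Assumption: quartiles are located with the 'inclusive' method of
    statistics.quantiles; other conventions can differ slightly.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 4.0
```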

Kurtosis: The degree of peakedness of a distribution of points.

Mean: A central tendency measure representing the arithmetic average of a set of observations.

Measure of Central Tendency: A measure indicating the value to be expected of a typical or middle
data point.

Measure of Dispersion: A measure describing how the observations in a data set are scattered or
spread out.

Median: The middle point of a data set, a measure of location that divides the data set into halves.

Median Class: The class in a frequency distribution that contains the median value for a data set.

Mode: The value most often repeated in the data set. It is represented by the highest point in the
distribution curve of a data set.

Parameters: Numerical values that describe the characteristics of a whole population, commonly
represented by Greek letters.

Percentiles: Fractiles that divide the data into 100 equal parts.

Quartiles: Fractiles that divide the data into four equal parts.

Range: The distance between the highest and lowest values in a data set.

Skewness: The extent to which a distribution of data points is concentrated at one end or the other;
the lack of symmetry.

Standard Deviation: The positive square root of the variance; a measure of dispersion in the same
units as the original data, rather than in the squared units of the variance.

Standard Score: Expressing an observation in terms of standard deviation units above or below the
mean; that is, the transformation of an observation by subtracting the mean and dividing by the
standard deviation.
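
The transformation can be sketched as follows; the function name and the numbers are illustrative.

```python
def standard_score(x, mean, std_dev):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# An observation of 70 in a distribution with mean 60 and standard deviation 5.
print(standard_score(70, 60, 5))  # 2.0
```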

Statistics: Numerical measures describing the characteristics of a sample. Represented by Roman
letters.

Stem and Leaf Display: A histogram-like display used in EDA to group data, while still displaying all
the original values.

Summary Statistics: Single numbers that describe certain characteristics of a data set.

Symmetrical: A characteristic of a distribution in which each half is the mirror image of the other
half.

Variance: A measure of the average squared distance between the mean and each item in the
population.
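
A small sketch of the population form described here, together with the standard deviation obtained as its positive square root. The function name and the data are illustrative; a sample variance would divide by n - 1 rather than n.

```python
import math


def population_variance(values):
    """Average squared distance between each item and the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative data, mean 5
variance = population_variance(data)
print(variance)             # 4.0
print(math.sqrt(variance))  # 2.0, the standard deviation in the original units
```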

Weighted Mean: An average calculated to take into account the importance of each value to the
overall total, that is, an average in which each observation value is weighted by some index of its
importance.
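
For illustration, a weighted mean might be computed as below. The function name, scores, and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# A score of 80 with weight 1 and a score of 90 with weight 3.
print(weighted_mean([80, 90], [1, 3]))  # 87.5
```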
