A

Absolute measures of variability – deal with the dispersion of the data points.
Analysis of data (or interpretation) – the process of systematically applying statistical
and/or logical techniques to describe and illustrate, condense and recap, and evaluate
data.

B
Bar chart (Bar graph) – is a pictorial representation of data (generally grouped) in
the form of vertical or horizontal rectangular bars, where the length of each bar is
proportional to the measure of the data. A bar chart is similar to a histogram. The bases
of the rectangles are arbitrary intervals whose centers are the codes. The height of each
rectangle represents the frequency (f) of that category. It is also applicable to
categorical (or nominal-level) data.
Bimodal – describes a distribution that has two modes.
Box plot (box and whisker plot) – displays the five-number summary of a set of data.
It is a graph of a data set obtained by drawing a horizontal line from the minimum data
value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a
box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box
passing through the median (Q2).
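As a rough illustration of the five-number summary that a box plot displays, the Python sketch below computes the minimum, Q1, median, Q3, and maximum. The function name `five_number_summary` and the sample data are hypothetical, and the choice of the "inclusive" quartile method is an assumption; other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # Assumption: the "inclusive" method, which treats the data as the
    # whole population. Textbooks use several quartile conventions.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical sample data
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# → (1, 3.0, 5.0, 7.0, 9)
```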

C
Categorical Frequency distribution – a table to organize data that can be placed in
specific categories, such as nominal- or ordinal-level data.
Central tendency – the number that is the most typical or most representative of a set
of data, like the mean, median or mode.
Chebyshev’s Inequality – in probability theory, a theorem that bounds the
dispersion of data away from its mean (average). For any K > 1, no more than 1/K² of a
distribution’s values can be K or more standard deviations away from the mean of the
distribution; equivalently, at least 1 − 1/K² of the values lie within K standard
deviations of the mean.
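The bound can be checked numerically. The sketch below is only an illustration: the function names and the sample data are hypothetical, and it uses the population standard deviation of the data set itself (it assumes the values are not all identical).

```python
import statistics

def chebyshev_bound(k):
    """Upper bound on the fraction of values at least k standard deviations from the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)

def fraction_beyond(values, k):
    """Observed fraction of values at least k population standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    far = [x for x in values if abs(x - mean) >= k * sd]
    return len(far) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(chebyshev_bound(2))        # → 0.25
print(fraction_beyond(data, 2) <= chebyshev_bound(2))  # → True
```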
Class boundaries (or Real Limits) – are the upper and lower values of a class in a grouped
frequency distribution; they have one more decimal place than the class limits and end
with the digit 5.
Class limits (or Apparent Limits) – are the highest and lowest values describing a class.
Coefficient of variation – is a statistical measure of the relative dispersion of data
points in a data series around the mean.
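A minimal sketch of the coefficient of variation as the standard deviation divided by the mean. The function name and sample data are hypothetical, and using the population standard deviation (rather than the sample one) is an assumption made for the example.

```python
import statistics

def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return statistics.pstdev(values) / mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, standard deviation 2
print(f"{coefficient_of_variation(data):.0%}")  # → 40%
```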
Cumulative frequency – is the sum of the frequencies accumulated up to the upper
boundary of a class in a frequency distribution.
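For example, cumulative frequencies are running totals of the class frequencies. The frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequencies for four consecutive classes
class_frequencies = [3, 5, 8, 4]

# Each entry is the total frequency up to the upper boundary of that class
cumulative = list(accumulate(class_frequencies))
print(cumulative)  # → [3, 8, 16, 20]
```

The last cumulative frequency always equals the total number of values.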
Cumulative frequency polygon (ogive) – is a graph that shows the cumulative
frequencies for the classes in a frequency distribution. The vertical axis represents the
cumulative frequency of the distribution while the horizontal axis represents the
class boundaries of the frequency distribution.

D
Decile – is a measure of position that divides the data set into ten (10) equal parts.
Dispersion (spread) – refers to the extent to which the data values of a numeric
random variable are scattered about their central location value.
Dot plot – is a simple form of data visualization that consists of data points plotted as
dots on a graph with an x- and y-axis. These types of charts are used to depict
graphically certain data trends or groupings.

F
Frequency – (1) It is the number of values in a specific class of a frequency distribution.
(2) Frequency is how often something repeats.
Frequency distribution – is a grouping of the data into categories showing the number
of observations in each of the non-overlapping classes.
Frequency polygon – A frequency polygon is a graph that displays the data by using
lines that connect points plotted for the frequencies at the midpoints of the classes. The
vertical axis represents the frequency of the distribution while the horizontal axis
represents the midpoints of the frequency distribution.

G
Grouped frequency distribution – is used when the range of the data set is large and
the data must be grouped into several class intervals.

H
Histogram – A histogram is a graph consisting of bars of equal width drawn adjacent to
each other (without gaps). The horizontal axis (x-axis) and vertical axis (y-axis)
represent the classes and class frequencies respectively.

I
Interquartile range – is the distance between the first and third quartiles. This
corresponds to the spread of the middle fifty (50) percent of the data values.
Interval – It is the distance between the class lower boundary and the class upper
boundary and it is denoted by the symbol “i”.

K
Kurtosis – (1) the degree of peakedness of a distribution, usually taken relative to a
normal distribution. (2) Basically, it describes the shape of the bell curve: kurtosis
measures whether the data are sharply peaked or flat relative to a normal distribution.
It focuses on how values are distributed around the mean.
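One common way to put a number on this is the moment-based excess kurtosis, which subtracts 3 so that a normal distribution scores 0, matching the convention used in this glossary (kurtosis = 0 for mesokurtic). The sketch below is an assumption about which formula is intended; the function name and data are hypothetical.

```python
import statistics

def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance squared, minus 3."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2**2 - 3

# Evenly spread hypothetical data gives a negative value (platykurtic)
print(round(excess_kurtosis([1, 2, 3, 4, 5]), 2))  # → -1.3
```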

L
Leptokurtic – are distributions where values cluster heavily or pile up in the center.
They are tall distributions with narrow humps and long, heavy tails. Their kurtosis is
positive (kurtosis > 0), which denotes a high degree of peakedness.
Lower class limit – is the smallest data value that can go into the class.

M
Mean – is the average of all scores in a distribution. It is the most frequently used
measure of central tendency.
Mean deviation – measures the average deviation of the scores from arithmetic mean.
It gives equal weight to the deviation of every score in the distribution.
Measures of variation – define how spread out the values are in a dataset. They are
also referred to as measures of dispersion/spread.
Median – is a point in a distribution of scores at which half of the scores fall below and
half are above the point. It is the midpoint of the data set/ middle scores or the 50th
percentile.
Mesokurtic – are intermediate distributions, which are neither too peaked nor too flat.
The values are moderately distributed about the center. Its kurtosis is zero
(kurtosis = 0).
Midhinge – is the average of the first and third quartiles, (Q1 + Q3)/2. It is used to
overcome potential problems introduced by extreme values (or outliers) in the data set.
Midpoint or class mark – is the point halfway between the class limits of each class
and it is representative of the data within that class.
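The relationship between class limits, class boundaries, the midpoint, and the interval can be sketched as follows. The function names and the class 10 to 19 are hypothetical, and the `unit` parameter (the precision of the recorded data, 1 for whole numbers) is an assumption of this example.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Extend the class limits by half a unit on each side."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

def class_midpoint(lower_limit, upper_limit):
    """Point halfway between the class limits."""
    return (lower_limit + upper_limit) / 2

def class_interval(lower_limit, upper_limit, unit=1):
    """Distance between the lower and upper class boundaries (i)."""
    lower_boundary, upper_boundary = class_boundaries(lower_limit, upper_limit, unit)
    return upper_boundary - lower_boundary

# Hypothetical class with limits 10 and 19 (whole-number data)
print(class_boundaries(10, 19))  # → (9.5, 19.5)
print(class_midpoint(10, 19))    # → 14.5
print(class_interval(10, 19))    # → 10.0
```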
Mode – is the most frequent score/data in the distribution.
Multimodal – a frequency distribution with two or more modes.
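As an illustration of the mode and of unimodal, bimodal, and multimodal data, the sketch below uses `statistics.multimode` from the Python standard library. The function name `describe_modality` and the labels it returns are hypothetical. Note that when every value occurs equally often, `multimode` returns all of them, so the label should be read with care.

```python
import statistics

def describe_modality(values):
    """Return the list of modes and a label based on how many there are."""
    modes = statistics.multimode(values)  # all values tied for highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

print(describe_modality([1, 2, 2, 3]))     # → ([2], 'unimodal')
print(describe_modality([1, 2, 2, 3, 3]))  # → ([2, 3], 'bimodal')
```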

N
Nominal level (or nominal variable) – is a type of data that is used to label variables
without providing any quantitative value. In nominal variables, the numerical values just
"name" the attribute uniquely. In this case, numerical value is simply a label.
Non-overlapping – (1) not occupying the same area in part. (2) entirely separate or
distinct.

O
Ordinal level (or ordinal variable) – is categorical data that has a naturally occurring
order, but the differences between values are unknown. It can be named, grouped, and ranked.
Original – (1) a term used for its first recorded data points. (2) Not secondary,
derivative, or imitative. (3) Being the first instance or source from which a copy,
reproduction, or translation is or can be made.
Outlier – is a data entry that is far removed from the other entries in the data set. It is
an extremely high or an extremely low data value when compared with the rest of the
data values.

P
Pareto chart – is a graph used to represent a frequency distribution for categorical
(or nominal-level) data. Frequencies are displayed by the heights of vertical bars,
which are arranged in order from highest to lowest.
Percentage – simply means "per hundred" and it is obtained by multiplying the relative
frequency by 100%.
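For instance, relative frequencies and percentages can be derived from raw categorical data as sketched below. The function names and the blood-type data are hypothetical.

```python
from collections import Counter

def relative_frequencies(values):
    """Frequency of each category divided by the total number of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

def percentages(values):
    """Relative frequencies multiplied by 100."""
    return {category: rf * 100 for category, rf in relative_frequencies(values).items()}

blood_types = ["A", "B", "A", "O", "A", "O", "AB", "A"]  # hypothetical raw data
print(relative_frequencies(blood_types))  # → {'A': 0.5, 'B': 0.125, 'O': 0.25, 'AB': 0.125}
print(percentages(blood_types))           # → {'A': 50.0, 'B': 12.5, 'O': 25.0, 'AB': 12.5}
```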
Percentile – is a measure of position that divides the data set into 100 equal parts.
Pearson’s coefficient of skewness – is a measure of the skewness of a
distribution computed from the mean, the median or mode, and the standard deviation.
Its sign indicates whether the distribution is positively or negatively skewed.
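The glossary does not say which variant is meant. The sketch below assumes the median-based form, 3 × (mean − median) / standard deviation, and uses the sample standard deviation; the function name and data are hypothetical.

```python
import statistics

def pearson_median_skewness(values):
    """Median-based Pearson skewness: 3 * (mean - median) / sample standard deviation."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return 3 * (mean - median) / sd

print(pearson_median_skewness([1, 2, 3, 4, 5]))       # → 0.0 (symmetric)
print(pearson_median_skewness([1, 2, 3, 4, 10]) > 0)  # → True (skewed right)
print(pearson_median_skewness([1, 7, 8, 9, 10]) < 0)  # → True (skewed left)
```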
Pictograph (pictogram) – immediately suggests the nature of the data being shown. It
combines the attention-getting quality of pictures with the accuracy of the bar chart.
Appropriate pictures arranged in a row (sometimes in a column) present the quantities
for comparison.
Pie Chart (Circle graph) – is a circle divided into portions that represent the relative
frequencies (or percentage) of the data belonging to different categories. The data in a
pie chart should be categorical or nominal-level.
Platykurtic – are flat distributions with values more evenly distributed about the center
with broad humps and short tails. Its kurtosis is negative (kurtosis < 0) and it denotes a
low degree of peakedness.

Q
Quantiles – each of any set of values of a variate, which divide a frequency distribution
into equal groups, each containing the same fraction of the total population.
Quartile – is a measure of position that divides the data set into four (4) equal parts.
Quartile deviation – defined mathematically as half of the difference between the
upper and lower quartile. It indicates the distance we need to go above and below the
midhinge (approximately the median) to include the middle 50% of the scores.
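The quartile-based measures in this glossary (interquartile range, quartile deviation, and midhinge) can be computed together, as in the sketch below. The function name and data are hypothetical, and the "inclusive" quartile method is an assumption; other conventions can give slightly different quartiles.

```python
import statistics

def quartile_measures(values):
    """Return the interquartile range, quartile deviation, and midhinge."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "iqr": iqr,                     # distance between Q1 and Q3
        "quartile_deviation": iqr / 2,  # half of that distance
        "midhinge": (q1 + q3) / 2,      # average of Q1 and Q3
    }

print(quartile_measures([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# → {'iqr': 4.0, 'quartile_deviation': 2.0, 'midhinge': 5.0}
```

Here the midhinge plus or minus the quartile deviation gives 3 and 7, which are Q1 and Q3.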

R
Range – is the difference between the highest (maximum) value and the lowest (minimum)
value in a data set or distribution.
Raw data – is the data gathered in original form.
Relative frequency – It is the value obtained when the frequency of each class of the
frequency distribution is divided by the total number of values.

S
Skewed left distribution – results when the “tail” of the graph elongates more to one
side than to the other. A distribution is skewed left (negatively skewed) when its tail
extends to the left.
Skewed right distribution – is a type of distribution in which most values are clustered
around the left tail of the distribution while the right tail of the distribution is longer.
Skewness – refers to the symmetry or asymmetry of the frequency distribution. It is
the measure of how much the probability distribution of a random variable deviates from
a symmetric distribution such as the normal distribution.
Standard deviation – is the most commonly used indicator of the degree of dispersion
and is the most dependable measure for estimating the variability in a total population.
The standard deviation is also the square root of the variance.
Stem-and-Leaf plot – is a special table where each data value is split into a "stem" (the
first digit or digits) and a "leaf" (usually the last digit).
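A minimal sketch of how the split into stems and leaves might be done, assuming non-negative whole numbers with the last digit as the leaf. The function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all but the last digit) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

for stem, leaves in stem_and_leaf([23, 12, 30, 15, 21, 23]).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# → 1 | 2 5
# → 2 | 1 3 3
# → 3 | 0
```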
Symmetric distribution – results when a vertical line can be drawn through the middle
of the graph of the distribution and the resulting halves are approximately mirror images.

T
Time series graph – represents data that occur over a specific period under observation.
In addition, it shows a trend or pattern of increase or decrease over the period.

U
Uniform distribution – results when all entries, or classes, in the distribution have
equal or approximately equal frequencies. A uniform distribution is also symmetric.
Unimodal – is a distribution with a single mode.
Upper class limit – is the largest data value that can go into the class.

V
Variance – the mean of the squares of the deviations from the arithmetic mean in a data
set. This shows how much the data measured varies from the average value.
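The definition translates directly into code. The sketch below computes the population variance (dividing by the number of values, as the definition states) and takes its square root for the standard deviation. The function name and data are hypothetical.

```python
import math
import statistics

def population_variance(values):
    """Mean of the squared deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
variance = population_variance(data)
print(variance)             # → 4.0
print(math.sqrt(variance))  # → 2.0 (standard deviation)

# The standard library gives the same population variance
print(statistics.pvariance(data) == variance)  # → True
```

A sample variance would divide by one less than the number of values instead; `statistics.variance` does that.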
