
DESCRIPTIVE STATISTICS

Statistics is concerned with describing, interpreting, and analyzing data. It is, therefore, an essential element in any improvement process. Statistics is often categorized into descriptive and inferential statistics. It uses analytical methods, which provide the math to model and predict variation, and graphical methods, which help make numbers visible for communication purposes.

Why do we Need Statistics?
To find why a process behaves the way it does.
To find why it produces defective goods or services.
To center our processes on ‘Target’ or ‘Nominal’.
To check the accuracy and precision of the process.
To prevent problems caused by assignable causes of variation.
To reduce variability and improve process capability.
To know the truth about the real world.

Descriptive Statistics:
A method of describing the characteristics of a data set.
They are useful because they allow you to make sense of the data.
They help in exploring the data and drawing conclusions about it in order to make rational decisions.
They include calculating things such as the average of the data, its spread, and the shape it produces.
For example, we may be concerned with describing:
The weight of a product in a production line.
The time taken to process an application.
Descriptive statistics involve describing, summarizing, and organizing the data so it can be easily understood. Graphical displays are often used along with the quantitative measures to enable clarity of communication.

When analyzing a graphical display, you can draw conclusions based on several characteristics of the graph. You may ask questions such as:
Where is the approximate middle, or center, of the graph?
How spread out are the data values on the graph?
What is the overall shape of the graph?
Does it have any interesting patterns?

Outlier:
A data point that is significantly greater or smaller than the other data points in a data set. It is useful to identify outliers when analyzing data because they may affect the calculation of descriptive statistics. Outliers can occur in any given data set and in any distribution. The easiest way to detect them is by graphing the data or using graphical methods such as:
Histograms.
Boxplots.
Normal probability plots.

Outliers may indicate an experimental error or incorrect recording of data. They may also occur by chance. It may be normal to have high or low data points. You need to decide whether to exclude them before carrying out your analysis. An outlier should be excluded if it is due to measurement or human error.

The following measures are used to describe a data set:
Measures of position (also referred to as central tendency or location measures).
Measures of spread (also referred to as variability or dispersion measures).
Measures of shape.
If assignable causes of variation are affecting the process, we will see changes in:
Position.
Spread.
Shape.
Any combination of the three.

Measures of Position:
Position statistics measure the central tendency of the data.
Central tendency refers to where the data is centered. You may have calculated an average of some kind. Despite the common use of the word average, there are different statistics by which we can describe the average of a data set:
Mean
Median
Mode

Mean:
The total of all the values divided by the size of the data set. It is the most commonly used statistic of position. It is easy to understand and calculate. It works well when the distribution is symmetric and there are no outliers.
The mean of a sample is denoted by ‘x-bar’.
The mean of a population is denoted by ‘μ’.

Median:
The middle value, where exactly half of the data values are above it and half are below it. A useful statistic due to its robustness. It can reduce the effect of outliers. Often used when the data is nonsymmetrical. Ensure that the values are ordered before calculation. With an even number of values, the median is the mean of the two middle values.

Mode:
The value that occurs the most often in a data set.
It is rarely used as a central tendency measure.
It is more useful for distinguishing between unimodal and multimodal distributions, that is, when the data has more than one peak.

Measures of Spread:
The spread refers to how the data deviates from the position measure. It gives an indication of the amount of variation in the process. It is an important indicator of quality and is used to control process variability and improve quality. All manufacturing and transactional processes are variable to some degree. There are different statistics by which we can describe the spread of a data set:
Range.
Standard deviation.

Range:
The difference between the highest and the lowest values. The simplest measure of variability. Often denoted by ‘R’. It is good enough in many practical cases. It does not make full use of the available data. It can be misleading when the data is skewed or in the presence of outliers.

Standard Deviation:
The average distance of the data points from their own mean. It is a measure of how dispersed the data is in relation to the mean, and it can never be negative.
A low standard deviation indicates that the data points are clustered around the mean. A large standard deviation indicates that they are widely scattered around the mean.
The standard deviation of a sample is denoted by ‘s’.
The standard deviation of a population is denoted by ‘σ’.
It is perceived as difficult to understand because it is not easy to picture what it is. It is, however, a more robust measure of variability than the range.
Standard deviation is computed as follows:
s = sqrt( sum of (x - x-bar)^2 / (n - 1) ), for a sample of n values
σ = sqrt( sum of (x - μ)^2 / N ), for a population of N values

Importance of standard deviation:
Standard deviation is important because it helps in understanding how spread out the measurements are. The more spread out the data is, the greater its standard deviation will be.
The standard deviation is used in finance by business owners to understand risk management and make better business decisions.
It helps in calculating the margins of error that occur in survey reports.

Variance:
A measure of the variation around the mean.
It is the square of the standard deviation.

Inter Quartile Range:
Also used to measure variability.
Quartiles divide an ordered data set into 4 parts.
Each part contains 25% of the data.
The interquartile range contains the middle 50% of the data, i.e. Q3 - Q1.
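
The position and spread measures above can be computed with Python's standard `statistics` module. The following is a minimal sketch rather than a definitive implementation: the `describe` helper and the sample values (hypothetical processing times in minutes) are assumptions made for illustration, and quartile values depend on the interpolation convention used (here the default of `statistics.quantiles`).

```python
import statistics

# Hypothetical sample: processing times in minutes (illustrative values only).
times = [12, 15, 11, 15, 14, 13, 15, 12, 16, 14]


def describe(values):
    """Return the measures of position and spread for a list of numbers."""
    ordered = sorted(values)  # the median and quartiles need ordered data
    # Quartile cut points; other conventions give slightly different values.
    q1, _q2, q3 = statistics.quantiles(ordered, n=4)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.mode(ordered),
        "range": ordered[-1] - ordered[0],
        "sample_sd": statistics.stdev(ordered),        # divides by n - 1
        "population_sd": statistics.pstdev(ordered),   # divides by N
        "sample_variance": statistics.variance(ordered),
        "iqr": q3 - q1,
    }


summary = describe(times)

# Adding one extreme value shows how an outlier affects each measure.
with_outlier = describe(times + [60])

for name in summary:
    print(f"{name:16} {summary[name]:8.2f} {with_outlier[name]:8.2f}")
```

In this sample the single extreme value pulls the mean and the range up sharply while the median stays where it was, which is the robustness property described under Median.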

Frequency Distributions,
Histograms, and Related Topics

ORGANIZING DATA
Frequency Tables
» A frequency table organizes data into classes
or intervals and shows how many data values
are in each class. The classes or intervals are
constructed so that each data value falls into
exactly one class. The number of classes should
be determined by the spread of the data and
the purpose of the frequency table (the number
of classes is often, but not always, given to you).
1. Find the class width.
To find the Class Width (Integer Data):
Class width = (largest data value - smallest data value) / desired number of classes

2. Create distinct classes.


The lower class limit is the lowest data value
that can fit in a class.
The upper class limit is the highest data value
that can fit in a class.
The class width is the difference between the lower class limit of one class and the lower class limit of the next class.
To create the classes:
Use the smallest data value as the lower class limit of the first class.
Find the lower class limit of the second class by adding the class width.
Continue to find all lower class limits by following this pattern.

3. Fill in upper class limits to create distinct classes that accommodate all possible data values from the data set.
Note: there should be NO overlap.

4. Tally the data into classes. Find the class frequency.
To tally data:
Examine each data value.
Determine which class contains the data value. Each data value should fall into exactly one class.
Make a tally mark in that class’s tally column.

To find the class frequency: add up the tallies and put the total in the class frequency column.

5. Compute the midpoint (class mark) for each class.
The center of each class is called the midpoint or class mark. It is often used as a representative value of the entire class.

To find the midpoint:
Midpoint = (lower class limit + upper class limit) / 2

6. Determine the class boundaries.
The halfway points between the upper limit of one class and the lower limit of the next class are called class boundaries.

To find the Class Boundaries (Integer Data):
To find the upper class boundaries, add 0.5 unit to the upper class limits. To find the lower class boundaries, subtract 0.5 unit from the lower class limits.

7. Find the relative frequency of each class.
The relative frequency of a class is the proportion of all data values that fall into that class.

The total of the relative frequencies should be 1, but rounded results may make the total slightly higher or lower than 1.

To find the relative frequency:
Relative frequency = class frequency / total number of data values

8. Find the cumulative frequency for each class.
The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes.

To find the cumulative frequency: add the frequency of each class and the frequencies of all classes before it.

STATISTICS
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.

Data is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted.

A datum is an individual value in a collection of data.

Data is usually organized into structures such as tables.
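
The frequency-table procedure described in this section (class width, class limits, tallies, midpoints, class boundaries, relative and cumulative frequencies) can be sketched in Python for integer data. This is a minimal sketch, not a definitive implementation: the function name `frequency_table`, the sample values, and the choice to raise the computed class width to the next whole number (so that the largest value always fits in the last class) are assumptions introduced for illustration.

```python
def frequency_table(data, num_classes):
    """Build a frequency table for integer data, one dict per class."""
    smallest, largest = min(data), max(data)
    # Class width: (largest - smallest) / number of classes, raised to the
    # next whole number (assumed convention for integer data).
    width = (largest - smallest) // num_classes + 1
    total = len(data)

    rows = []
    cumulative = 0
    for k in range(num_classes):
        lower = smallest + k * width   # lower class limit
        upper = lower + width - 1      # upper class limit, no overlap
        # Tally: count the values that fall inside this class.
        frequency = sum(1 for x in data if lower <= x <= upper)
        cumulative += frequency
        rows.append({
            "lower_limit": lower,
            "upper_limit": upper,
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": frequency,
            "relative_frequency": frequency / total,
            "cumulative_frequency": cumulative,
        })
    return rows


# Hypothetical integer data set (illustrative values only).
data = [1, 3, 4, 7, 8, 8, 10, 12, 13, 15,
        16, 18, 19, 21, 22, 25, 26, 27, 29, 30]
table = frequency_table(data, num_classes=5)

for row in table:
    print(f'{row["lower_limit"]:>2}-{row["upper_limit"]:<2} '
          f'mid={row["midpoint"]:<5} f={row["frequency"]} '
          f'rel={row["relative_frequency"]:.2f} '
          f'cum={row["cumulative_frequency"]}')
```

Because every value falls into exactly one class, the frequencies add up to the size of the data set and the last cumulative frequency equals that total.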


Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
* Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains.
* In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

A descriptive statistic is a summary statistic that quantitatively describes or summarizes features from a collection of information, while descriptive statistics (in the mass noun sense) is the process of using and analyzing those statistics. Descriptive statistics is distinguished from inferential statistics (or inductive statistics) by its aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent.

This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory, and descriptive statistics are frequently nonparametric statistics.

Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

LEVELS OF MEASUREMENT
* Levels of measurement, also called scales of measurement, tell you how precisely variables are recorded. In scientific research, a variable is anything that can take on different values across your data set (e.g., height or test scores).
* There are 4 levels of measurement:
* Nominal: the data can only be categorized
* Ordinal: the data can be categorized and ranked
* Interval: the data can be categorized, ranked, and evenly spaced
* Ratio: the data can be categorized, ranked, evenly spaced, and has a natural zero.

Nominal, ordinal, interval, and ratio data
- Going from lowest to highest, the 4 levels of measurement are cumulative. This means that they each take on the properties of lower levels and add new properties.

NOMINAL LEVEL
Categorize data by labelling.
Ex. City of birth, gender, car brands

ORDINAL LEVEL
Categorize by rank.
Ex. Top 5 Olympic medalists

INTERVAL LEVEL
Can be categorized, ranked, and evenly spaced, but there is no natural zero point.
Ex. Test scores, temperature in Fahrenheit

RATIO LEVEL
Can be categorized, ranked, evenly spaced, and has a natural zero.
Ex. Height, age, weight
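
The cumulative nature of the four levels can be illustrated with a small Python sketch. The names `Level`, `ADDED_PROPERTY`, and `properties` are hypothetical and introduced only for this example; the property labels are taken from the level descriptions in these notes.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The one property each level adds on top of the levels below it.
ADDED_PROPERTY = {
    Level.NOMINAL: "categorized",
    Level.ORDINAL: "ranked",
    Level.INTERVAL: "evenly spaced",
    Level.RATIO: "natural zero",
}


def properties(level):
    """Return a level's properties: its own plus those of every lower level."""
    return [ADDED_PROPERTY[lower] for lower in Level if lower <= level]


for level in Level:
    print(f"{level.name:8} {', '.join(properties(level))}")
```

Using an ordered enumeration makes the "lowest to highest" ranking explicit, so each level's property list is simply everything at or below it.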
