

Week 01, Part 1
Statistics, symbols and
19th century scientific philosophy described
a clockwork universe
A small number of mathematical laws could be
used to describe reality and predict the future.
All that was needed were the laws and
sufficiently precise measurements to put into the
An error function was part of the laws and
any error was attributed to measurement

A little history
more accurate measurements were
accompanied by MORE error.
attempts to discover the laws of
biology and sociology had failed.
well-known laws of physics and
chemistry were proving to be rough
approximations.

By the end of the 19th century

almost all sciences had shifted to using
statistical models.
Such models make random variability endogenous.
Data recording and storing has been simplified
by the computer.
There is a LOT of data out there, and a single
observation is not enough.
At the simplest level, statistics is all about
making sense out of data, which we have in
abundance in our world.

By the end of the 20th century . . .

An academic discipline
The science of uncertainty.
A branch of mathematics that consists of a set of
analytical techniques that can be applied to data to
help make judgments and decisions in problems
involving uncertainty.
A set of tools and methods that allow one to get
information from data.
The science of collecting, organizing, presenting,
analyzing and interpreting data.

What is Statistics?
Specific numbers that summarize or
describe samples.
The sample of adult males was 67.5 inches
tall on average.
Workers in a sample from New York and
California typically spend 35.22 minutes
commuting to work.

Specific numbers . . .
Lower case Greek letters, or UPPER CASE
LATIN: μ, σ, ρ, N
Lower case Latin letters: x̄, s, r, n

Groups of data and their symbols
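
As a rough illustration of the lower case Latin symbols, the sketch below computes n, the sample mean, s and r for a small sample. The heights and weights are hypothetical values made up for this example, and the helper `sample_correlation` is an illustrative name, not something from the course.

```python
import math
import statistics

# Hypothetical sample: heights (inches) and weights (pounds) of five adults.
heights = [64.0, 66.5, 67.5, 69.0, 70.5]
weights = [130.0, 150.0, 155.0, 170.0, 185.0]

n = len(heights)                   # n: sample size
x_bar = statistics.mean(heights)   # sample mean
s = statistics.stdev(heights)      # s: sample standard deviation (divides by n - 1)


def sample_correlation(xs, ys):
    """Pearson correlation r between two equal-length samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = sample_correlation(heights, weights)  # r: sample correlation

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.3f}, r = {r:.3f}")
```

The same formulas applied to every member of a population would give the corresponding population values, which in practice are usually unknown and have to be estimated from a sample.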

A distinction is made between two
branches of statistics. The difference
lies in their purpose.
Descriptive Statistics
Methods that display, organize and
summarize data in an informative way.
They describe a fixed data set.

Inferential Statistics
Methods that draw conclusions, make
inferences (guesses) and make predictions
about populations based on a sample of data.
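
The difference between the two branches can be sketched in a few lines. The population of commute times below is simulated and entirely hypothetical (the mean of 35 minutes and spread of 10 minutes are assumptions chosen for illustration); in real problems the population is not available, which is why inference is needed.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: commute times (minutes) for 10,000 workers.
population = [random.gauss(35, 10) for _ in range(10_000)]

# In practice we only observe a sample, here 50 workers chosen at random.
sample = random.sample(population, 50)

# Descriptive statistics: summarize the fixed data set we actually have.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: use the sample mean as a guess about the
# population mean. Only a simulation lets us check the guess directly.
population_mean = statistics.mean(population)
estimation_error = sample_mean - population_mean

print(f"sample mean = {sample_mean:.2f}, sample sd = {sample_sd:.2f}")
print(f"population mean = {population_mean:.2f}, error = {estimation_error:.2f}")
```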
A variable is some characteristic of a
population or sample that will likely
change from item to item in the sample or
population.
The individual values of a variable are
called observations.
Data or data set
Data are the observed values of a variable.

Components of data
What distinguishes different types of data?
An inherent order based on
The amount of information one observation provides.
The complexity of the information.
The number of statistical techniques that can be used on that data.
The Hierarchy of Data
Like any hierarchy, the higher in the system you get, the more
information, complexity and usable techniques the
data has.
Characteristics of a lower data level are passed along to a
higher level, but not the reverse.

Data Differences
Data, Variables, Observations

Qualitative or Categorical: Generally, the presence or absence of a quality or characteristic in the observation. They may be represented with text or may be coded into numerical values, which is for convenience only.
Nominal: Nominal data tells only the name of the characteristic or quality and an indication of its presence or absence.
Ordinal: Ordinal data is Nominal data that has an order that adds information.

Quantitative, Numerical or Interval: Real numbers which are the result of a count or measurement, which are always reported numerically and have a numerical meaning.
Discrete: Discrete data is the result of a count and is represented by integers.
Continuous: Continuous data is the result of a measurement and is represented by any real number.

Categories of Data--Levels
Categorical or qualitative data
The presence or absence of a specific characteristic in an
item, for example, the characteristic of gender. A person is
either female or not female.
The simplest type or the lowest level of data.
It contains the least information, only the name of the
characteristic or quality and an indication of its presence or
absence.
The fewest statistical methods are available for use on nominal
data.
A 2 category variable is often called binary.
Variables with more than 2 categories exist.
Categories may be represented by numbers, but the numbers
are assigned arbitrarily, and thus have no numerical
meaning.

Types of Data-Nominal
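
A small sketch of why numeric codes for nominal categories carry no numerical meaning. The eye colour data and both codings are hypothetical, invented for this illustration. Counts and the most common category do not depend on the coding, while an average of the codes changes whenever the arbitrary numbers change.

```python
from collections import Counter
import statistics

# Hypothetical observations of a nominal variable (eye colour).
observations = ["brown", "blue", "brown", "green", "brown", "blue"]

# Counting and finding the most frequent category are valid for nominal data.
counts = Counter(observations)
mode = counts.most_common(1)[0][0]

# Two equally arbitrary ways of coding the categories as numbers.
coding_a = {"brown": 1, "blue": 2, "green": 3}
coding_b = {"brown": 3, "blue": 1, "green": 2}

# The "mean" of the codes depends entirely on which coding was chosen,
# so it says nothing about the data themselves.
mean_a = statistics.mean([coding_a[v] for v in observations])
mean_b = statistics.mean([coding_b[v] for v in observations])

print(counts, mode)
print(mean_a, mean_b)
```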
Categorical or qualitative data
A ranked multi-category nominal variable, for example, an individual's
response to the request, "Rank the quality of your meal." Categories
for response are: low quality, poor quality, average quality, good
quality and excellent quality.
The ranking provides more information than simply the presence of a
characteristic.
It is understood that as the categories are moved through in order, the
amount of some quality is increasing or decreasing.
The difference between levels cannot be measured.
Each individual's perception of the quality and its level is likely different.
This data is more complex than nominal and is of a higher level.
The ranking may be represented by ordered numbers, but the numbers
are assigned arbitrarily, and thus have no numerical meaning.

Types of Data-Ordinal
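
The meal-quality example can be sketched as follows. The five responses and the alternative codes are hypothetical. The point illustrated is that any coding that preserves the order gives the same middle category, so order-based summaries such as the median are usable, while the sizes of the gaps between codes are arbitrary.

```python
import statistics

# Categories from the meal-quality example, listed from lowest to highest.
LEVELS = ["low quality", "poor quality", "average quality",
          "good quality", "excellent quality"]

# Hypothetical responses from five diners.
responses = ["good quality", "average quality", "excellent quality",
             "poor quality", "good quality"]

# Coding 1: consecutive ranks 1..5.
rank = {label: i for i, label in enumerate(LEVELS, start=1)}
# Coding 2: different numbers with the same order (gaps are arbitrary).
alt_rank = dict(zip(LEVELS, [10, 20, 50, 51, 100]))


def median_label(coding):
    """Return the middle category of the responses under a given coding."""
    codes = [coding[r] for r in responses]
    middle_code = statistics.median_low(codes)  # always an observed code
    decode = {code: label for label, code in coding.items()}
    return decode[middle_code]


print(median_label(rank))
print(median_label(alt_rank))
```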
Quantitative, numerical or interval data
Values are the result of a count, thus only integers are
possible observations, for example, the number of seats in a
row, the number of stocks that increased in value, etc.
It can assume only a countable number of different values.
It is possible to list the complete set of values such a
variable can take on.
It is considered to have gaps between values along the
real number line, although no intervening number is
They are always represented by numbers, the numbers are
always ordered, and no values are skipped even if there are
no observations there.

Types of Data-Discrete
Quantitative, numerical or interval data
Values are the result of a measurement, thus any real
number is a potential observation, for example, individual
height and weight are both continuous.
The inability to measure to an infinite level of detail does not
make the variable other than continuous. If an infinite
number of values are possible, the variable is continuous.
It is not possible to list the complete set of values such a
variable can take on.
Values are always represented by numbers but the numbers
are generally rounded due to circumstances or for
convenience. They may be represented by integers only.

Types of Data-Continuous
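
A minimal sketch of the rounding point, using hypothetical heights. The underlying variable is continuous even when the recorded values happen to be whole numbers; only the precision of the recording changes.

```python
# Hypothetical heights in inches; the underlying variable is continuous.
heights = [67.4312, 70.1187, 64.8861]

# The same measurements recorded at two different levels of precision.
recorded_tenths = [round(h, 1) for h in heights]
recorded_whole = [round(h) for h in heights]   # integers only

# Recording to the nearest inch yields integers, but any value between
# two recorded heights was still possible, so the variable stays continuous.
print(recorded_tenths)
print(recorded_whole)
```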
The techniques we will use in a given situation
will be determined by the type or level of data
we have.
Since qualities of data levels carry over as one moves up
the hierarchy of data, but not down, in general the
lowest level of data dictates the statistical methods
available for use.
Statisticians have developed many clever ways to
define their way out of this problem, as we will
learn over the semester.

The tyranny of the data type
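
One way to picture the rule that the lowest level dictates the methods is the sketch below. The pairing of summaries with levels (counts and mode for nominal, median and percentiles for ordinal, mean and standard deviation for quantitative) is a common textbook convention assumed here for illustration, not a list taken from these slides, and the function names are hypothetical.

```python
# Levels of data from lowest to highest, as in the hierarchy of data.
HIERARCHY = ["nominal", "ordinal", "quantitative"]

# Summaries that first become usable at each level (assumed pairing).
SUMMARIES_ADDED = {
    "nominal": ["counts", "mode"],
    "ordinal": ["median", "percentiles"],
    "quantitative": ["mean", "standard deviation"],
}


def available_summaries(level):
    """Summaries usable at a level: its own plus those of every lower level."""
    position = HIERARCHY.index(level)
    usable = []
    for lower_level in HIERARCHY[:position + 1]:
        usable.extend(SUMMARIES_ADDED[lower_level])
    return usable


def limiting_level(levels):
    """The lowest level among several variables, which dictates the methods."""
    return min(levels, key=HIERARCHY.index)


# Example: an analysis that combines a quantitative and an ordinal variable.
lowest = limiting_level(["quantitative", "ordinal"])
print(lowest, available_summaries(lowest))
```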