1/9/2017

Week 01, Part 1

Statistics, symbols and

data

19th century scientific philosophy described

a clockwork universe

A small number of mathematical laws could be

used to describe reality and predict the future.

All that was needed were the laws and

sufficiently precise measurements to put into the

laws.

An error function was part of the laws and

any error was attributed to measurement

inaccuracies.

A little history

more accurate measurements were

accompanied by MORE error.

attempts to discover the laws of

biology and sociology had failed.

well-known laws of physics and

chemistry were proving to be rough

approximations.

almost all sciences had shifted to using

statistical models.

Such models make random variability endogenous.

Data recording and storing has been simplified

by the computer.

There is a LOT of data out there, and a single

observation is not enough.

At the simplest level, statistics is all about

making sense out of data, which we have in

abundance in our world.

An academic discipline

The science of uncertainty.

A branch of mathematics that consists of a set of

analytical techniques that can be applied to data to

help make judgments and decisions in problems

involving uncertainty.

A set of tools and methods that allow one to get

information from data.

The science of collecting, organizing, presenting,

analyzing and interpreting data.

What is Statistics?

Specific numbers that summarize or

describe samples.

The sample of adult males was 67.5 inches

tall on average.

Commuters in California typically spend 35.22 minutes

commuting to work.

Specific numbers . . .

Populations

Parameters

Lower case Greek letters, or UPPER CASE

LATIN: μ, σ, ρ, N

Samples

Statistics

Lower case Latin letters: x̄, s, r, n
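
The parameter/statistic distinction can be sketched in a few lines of Python. The data below are made up for illustration, and the variable names simply mirror the conventional symbols (mu, sigma, N for the population; x_bar, s, n for the sample). The correlation r is left out because it needs a second variable.

```python
import statistics

# Hypothetical population: heights in inches of every member of a small group.
population = [64.0, 66.5, 67.0, 68.5, 70.0, 71.5, 65.0, 69.0]

# Hypothetical sample: a subset drawn from that population.
sample = [66.5, 68.5, 70.0, 65.0]

# Parameters describe the whole population.
N = len(population)
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population form: divides by N

# Statistics describe only the sample.
n = len(sample)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)           # sample form: divides by n - 1

print(f"Population: N={N}, mu={mu:.4f}, sigma={sigma:.4f}")
print(f"Sample:     n={n}, x_bar={x_bar:.4f}, s={s:.4f}")
```

Note that the sample mean and the population mean differ here; a statistic is an estimate of the corresponding parameter, not a copy of it.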

A distinction is made between two

branches of statistics. The difference

lies in their purpose.

Descriptive Statistics

Methods that display, organize and

summarize data in an informative way.

They describe a fixed data set.

Inferential Statistics

Methods that draw conclusions, make

inferences (guesses) and make predictions

about populations based on a sample of data.
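
As a rough illustration of the two branches, the sketch below first describes a fixed sample and then makes an inference about the population it came from. The data are hypothetical, and the interval uses a normal approximation chosen here for simplicity; it is an assumption of this example, not a method prescribed by these slides.

```python
import math
import statistics

# Hypothetical sample of adult heights in inches.
sample = [66.5, 68.5, 70.0, 65.0, 67.0, 69.5, 66.0, 67.5]

# Descriptive statistics: summarize the fixed data set in hand.
n = len(sample)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

# Inferential statistics: an approximate 95% interval for the unknown
# population mean, using the normal approximation x_bar +/- z * s / sqrt(n).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * s / math.sqrt(n)
interval = (x_bar - margin, x_bar + margin)

print(f"Descriptive: mean={x_bar:.2f}, sd={s:.2f}, n={n}")
print(f"Inferential: approx. 95% interval = ({interval[0]:.2f}, {interval[1]:.2f})")
```

The descriptive numbers are exact statements about these eight values; the interval is a guess about the population and carries uncertainty.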

Variable

A variable is some characteristic of a

population or sample that will likely

change from item to item in the sample or

population.

Observation

The individual values of a variable are

observations.

Data or data set

Data are the observed values of a variable.

Components of data

What distinguishes different types of data?

An inherent order based on

The amount of information one observation provides.

The complexity of the information.

The number of statistical techniques that can be used on that

data.

The Hierarchy of Data

As in any hierarchy, the higher you go in the system, the more

information an observation carries, the more complex that information is, and

the more statistical techniques can be used on the data.

Characteristics of a lower data level are passed along to a

higher level, but not the reverse.

Data Differences

Qualitative or Categorical

Generally, the presence or absence of a quality or characteristic in the observation. Values may be represented with text or may be coded into numerical values, which is for convenience only.

Nominal data is only the name of the characteristic or quality and an indication of its presence or absence.

Ordinal data has an order that adds information.

Quantitative, Numerical or Interval

A count or measurement; values are always reported numerically and have a numerical meaning.

Discrete data is the result of a count and is represented by integers.

Continuous data is the result of a measurement and is represented by any real number.

Categories of Data--Levels

Categorical or qualitative data

Nominal

The presence or absence of a specific characteristic in an

item, for example, the characteristic of gender. A person is

either female or not female.

The simplest type or the lowest level of data.

It contains the least information, only the name of the

characteristic or quality and an indication of its presence or

absence.

The fewest statistical methods are available for use on nominal

data.

A 2 category variable is often called binary.

Variables with more than 2 categories exist.

Categories may be represented by numbers, but the numbers

are assigned arbitrarily, and thus have no numerical

meaning.
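
A small sketch of why numeric codes on nominal data carry no numerical meaning: the same observations are coded two different, equally arbitrary ways. The observations and both codings are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical nominal observations.
colors = ["red", "blue", "green", "blue", "red", "blue"]

# Two equally valid, arbitrary numeric codings of the same categories.
coding_a = {"red": 1, "blue": 2, "green": 3}
coding_b = {"red": 2, "blue": 3, "green": 1}

# The "mean" changes with the coding, so it tells us nothing about the data.
mean_a = statistics.fmean([coding_a[c] for c in colors])
mean_b = statistics.fmean([coding_b[c] for c in colors])

# Counts and the most frequent category do not depend on the coding.
counts = Counter(colors)
mode = counts.most_common(1)[0][0]

print(f"mean under coding A: {mean_a:.3f}, under coding B: {mean_b:.3f}")
print(f"counts: {dict(counts)}, mode: {mode}")
```

Counting and finding the most common category stay valid because they never use the codes as quantities.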

Types of Data-Nominal

Categorical or qualitative data

Ordinal:

A ranked multi-category nominal variable, for example, an individual's

response to the request, "Rank the quality of your meal." Categories

for response are: low quality, poor quality, average quality, good

quality and excellent quality.

The ranking provides more information than simply the presence of a

characteristic.

It is understood that as the categories are moved through in order, the

amount of some quality is increasing or decreasing.

The difference between levels cannot be measured.

Each individual's perception of the quality and its level is likely different.

This data is more complex than nominal and is of a higher level.

The ranking may be represented by ordered numbers, but the numbers

are assigned arbitrarily, and thus have no numerical meaning.
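
The meal-quality example can be sketched as follows. The responses are invented, and using the middle response as a summary is an assumption of this sketch: it relies only on the order of the categories, never on the distance between them.

```python
# Ordered categories from the meal-quality example, lowest to highest.
levels = ["low", "poor", "average", "good", "excellent"]

# Rank numbers record position in the order only; they are not measurements.
rank = {name: position for position, name in enumerate(levels)}

# Hypothetical responses from seven diners.
responses = ["good", "average", "excellent", "poor", "good", "average", "good"]

# Sorting uses the order, which is meaningful for ordinal data.
ordered = sorted(responses, key=lambda r: rank[r])

# With an odd number of responses, the middle one is the median category.
median_response = ordered[len(ordered) // 2]

print("ordered responses:", ordered)
print("median response:", median_response)
```

Subtracting two rank numbers would not be meaningful here, because the gap between "poor" and "average" need not equal the gap between "good" and "excellent".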

Types of Data-Ordinal

Quantitative, numerical or interval data

Discrete

Values are the result of a count, thus only integers are

possible observations, for example, the number of seats in a

row, the number of stocks that increased in value, etc.

It can assume only a countable number of different values.

It is possible to list the complete set of values such a

variable can take on.

It is considered to have gaps between values along the

real number line, although no intervening integer is

skipped.

They are always represented by numbers, the numbers are

always ordered, and no values are skipped even if there are

no observations there.

Types of Data-Discrete

Quantitative, numerical or interval data

Continuous

Values are the result of a measurement, thus any real

number is a potential observation, for example, individual

height and weight are both continuous.

The inability to measure to an infinite level of detail does not

make the variable other than continuous. If an infinite

number of values are possible, the variable is continuous.

It is not possible to list the complete set of values such a

variable can take on.

Values are always represented by numbers but the numbers

are generally rounded due to circumstances or for

convenience. They may be represented by integers only.

Types of Data-Continuous

The techniques we will use in a given situation

will be determined by the type or level of data

we have.

Since qualities of data levels carry over as one moves up

the hierarchy of data, but not down, in general the

lowest level of data dictates the statistical methods

available for use.

Statisticians have developed many clever ways to

define their way out of this problem, as we will

learn over the semester.
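
The rule that the lowest level of data dictates the available methods can be sketched as a lookup. The `Level` enum, the `SUMMARIES` table and the `usable_summaries` function are hypothetical names invented for this example, and the pairing of summaries with levels follows a common textbook convention rather than anything stated on these slides.

```python
from enum import IntEnum


class Level(IntEnum):
    """Data levels in hierarchy order (hypothetical encoding)."""
    NOMINAL = 1
    ORDINAL = 2
    QUANTITATIVE = 3


# Each level keeps the summaries of the levels below it and adds its own.
SUMMARIES = {
    Level.NOMINAL: ["mode"],
    Level.ORDINAL: ["mode", "median"],
    Level.QUANTITATIVE: ["mode", "median", "mean"],
}


def usable_summaries(levels):
    """Return the summaries valid for every variable in a mixed data set."""
    lowest = min(levels)  # the lowest level present is the binding constraint
    return SUMMARIES[lowest]


# A data set mixing a quantitative and a nominal variable is limited
# to what the nominal variable allows.
print(usable_summaries([Level.QUANTITATIVE, Level.NOMINAL]))
print(usable_summaries([Level.ORDINAL, Level.QUANTITATIVE]))
```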

