You are on page 1of 24

Chapter 2

Data

1
© 2010 Pearson Education
2.1 What Are Data?
• Data values or observations are information collected
regarding some subject.
• Data can be numbers, names, etc., and tells us the “Who
and What”.
• Data are useless without their context.
• Data are often organized into a data table like that below.

2
© 2010 Pearson Education
2.1 What Are Data?
The rows of a data table correspond to individual cases
about Whom we record some characteristics.
These characteristics may be collected on or about …
• respondents – individuals who answer a survey
• subjects or participants – people in an experiment
• experimental units – animals, plants, websites, or other
inanimate objects


Cases

3
© 2010 Pearson Education
2.1 What Are Data?
The characteristics recorded about each individual or case
are called variables.
These are usually shown as the columns of a data table
and identify What has been measured.

         Variables
            

4
© 2010 Pearson Education
2.1 What Are Data?
Data tables are cumbersome for complex data sets, so
often two or more separate data tables are linked together
in a relational database.

Each data table included in the database is a relation


because it is about a specific set of cases with information
about each of these cases for all the variables.

5
© 2010 Pearson Education
2.1 What Are Data?
Example: A typical relational database is provided
consisting of three relations: customer data, item data, and
transaction data.

For example, we
can look up a
customer to see
what items they
purchased, or we
may look up an
item to see who
purchased it.

6
© 2010 Pearson Education
2.2 Variable Types
When a variable names categories and answers questions
about how cases fall into those categories, it is called a
categorical variable.

When a variable has measured numerical values with units


and the variable tells us about the quantity of what is
measured, it is called a quantitative variable.

7
© 2010 Pearson Education
2.2 Variable Types
Categorical variables …

• arise from descriptive responses to questions like


“What kind of advertising do you use?”.

• may only have two possible values (like “yes” or “no”).

• may be a number like a zip code.

8
© 2010 Pearson Education
2.2 Variable Types
Quantitative variables must have units. The units indicate …

• how each value has been measured.

• the corresponding scale of measurement.

• how much of something we have.

• how far apart two values are.

9
© 2010 Pearson Education
2.2 Variable Types
Some variables can be both categorical and quantitative.
How data are classified depends on Why we are collecting
the data.

For example, variable Age is obviously the quantitative


value, measured in years, that may be used for finding the
average age of customers.

Age can also be the categorical value used to classify books


for an internet store.

10
© 2010 Pearson Education
2.2 Variable Types
Counts

In gathering data, we often count things.

Counts can be used in two different ways.


1) We can count the cases in each category to summarize
what the variable tells us.
2) We can count how many of something we have.

11
© 2010 Pearson Education
2.2 Variable Types
Counts

A company may count the number of purchases shipped by


different means to summarize the categorical variable
Shipping Method. Here the count is used categorically.

12
© 2010 Pearson Education
2.2 Variable Types
Counts

A company may want to know how many teenage customers


they have. Here the count is being used quantitatively.

13
© 2010 Pearson Education
2.2 Variable Types
Identifiers

An identifier variable is a unique identifier assigned to each


individual or item in a group.

For example, social security numbers, student ID numbers,


tracking numbers, transactions numbers, etc. are all
identifier variables for people or items.

14
© 2010 Pearson Education
2.2 Variable Types
Identifiers

Identifier variables …
• do not have units.
• are a special kind of categorical variable.
• are useful in combining data from different sources to
avoid duplication.
• are not variables to be analyzed.

15
© 2010 Pearson Education
2.2 Variable Types
Other Data Types

Categorical variables used only to name categories are


sometimes called nominal variables.

When data values can be ordered, we say that the variable


has ordinal values. For example, employees can be ranked
according to the number of months employed.

16
© 2010 Pearson Education
2.2 Variable Types
Other Data Types

We may differentiate quantitative variables according to


whether their measured values have a defined value for
zero.

For example, 80oF is not twice 40oF since 0o is an arbitrarily


chosen value. However, 80 mph is twice 40 mph since we
know that 0 mph corresponds to when an object is still.

Data without a defined value for zero are said to be on an


interval scale, otherwise, they are said to be on a ratio scale.

17
© 2010 Pearson Education
2.2 Variable Types
Cross-Sectional and Time Series Data

Variables that are measured at regular intervals over time


are called a time series. For example, determining total
costs each month of a year.

When several variables are all measured at the same time


point, the data is called cross-sectional data. For example,
determining sales revenue, number of customers, and
expenses for the last month of business.

18
© 2010 Pearson Education
2.3 Where, How, and When
When data are collected can be important. Data that are
decades old may mean something different than similar
values recorded last year.

Where data are collected can be important. Data collected in


Mexico may differ in meaning than data collected in the
United States.

19
© 2010 Pearson Education
2.3 Where, How, and When
How data are collected can make the difference between
insight and nonsense.

For example, data that come


from a voluntary survey on
the Internet are almost always
worthless.

However data provided by


agencies and businesses on
websites can be extremely
useful.

20
© 2010 Pearson Education
2.3 Where, How, and When
Data can be found …

• by performing an experiment and actively manipulating


variables.

• in information collected by public or private agencies.

• on internet sites.

21
© 2010 Pearson Education
What Can Go Wrong?
• Don’t label variables as categorical or quantitative without
thinking about what they represent.

• Don’t automatically assume variables that are numbers are


quantitative.

• Always be skeptical and look for bias in how the data was
collected.

22
© 2010 Pearson Education
What Have We Learned?
• The five W’s (Who, What, Why, Where, and When) help
nail down the context of the data.

• We must know at least the Who, What, and Why to be


able to say anything useful based on the data.
• The Who are the cases.
• The What are the variables.
• The Why help us decide which way to treat the data.

23
© 2010 Pearson Education
What Have We Learned?
• Categorical variables identify a category for each case.

• Identifier variables are categorical variables that name


each case.

• Quantitative variables record measurements or amounts of


something and must include units.

• Variables can be categorical or quantitative depending on


what we want to learn from them.

24
© 2010 Pearson Education

You might also like