
Exploring data Types of data

Outcomes
• Know the definitions
• Identify the four types of data (nominal, ordinal, discrete, continuous) in a data set


E.g.:

Population of cars: N = 96
Sample of cars: n = 12
Sampled without replacement (no car can be selected more than once)
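
The draw described above can be sketched in Python. This is only an illustration under assumptions: the cars are labelled 1 to N (hypothetical labels, not the slide's actual selection), and the seed is arbitrary.

```python
import random

N = 96  # population size (from the slide)
n = 12  # sample size (from the slide)

# Hypothetical labels: number the cars in the population 1..N.
population = list(range(1, N + 1))

rng = random.Random(0)  # arbitrary fixed seed so the draw is reproducible

# random.sample draws WITHOUT replacement: no car can be picked twice.
sample = rng.sample(population, n)

# For contrast, random.choices draws WITH replacement: repeats are possible.
sample_with_replacement = rng.choices(population, k=n)

print(sorted(sample))
```

Sampling without replacement guarantees n distinct cars, which is why every case in the sample refers to a different car.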


E.g.: Cases / sample points / elements / observations

Each row represents a case (sample point, element, observation).

# Car: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12

Note:
This is only a list of labels: Car 1, Car 2, …, Car 12.
This is not a data set, because there are no columns containing information that describes the observations.

E.g.: Raw data (no specific order)

# Car   Colour
01      Red
02      Yellow
03      Blue
04      Red
05      Blue
06      Grey
07      Green
08      Green
09      Blue
10      Green
11      Green
12      Yellow

Now we have a data set, because we have listed something that describes the observations (cars).

E.g.: Grouped data (arranged)

The raw Colour data (listed in no specific order) can be arranged into counts per colour:

#Red   #Yellow   #Blue   #Green   #Grey
2      2         3       4        1
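
Going from raw to grouped data is a counting exercise. A minimal sketch, assuming the twelve colours are stored in case order in a plain list (the variable names are illustrative):

```python
from collections import Counter

# Raw data: colour of each of the 12 sampled cars, in case order (from the slide).
colours = ["Red", "Yellow", "Blue", "Red", "Blue", "Grey",
           "Green", "Green", "Blue", "Green", "Green", "Yellow"]

# Grouped data: number of cars per colour.
grouped = Counter(colours)

for colour, count in grouped.items():
    print(f"#{colour}: {count}")

# The group counts must add up to the sample size n = 12.
assert sum(grouped.values()) == len(colours)
```

Checking that the counts sum to the sample size is a quick way to catch a miscounted group.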

Variable / characteristic = thing of interest. It can assume many different values.

E.g.: in the data set with columns # Car and Colour:
• Only one variable ⇒ univariate data
• ‘Colour’ is a category ⇒ qualitative variable
• ‘Colour’ has no natural order ⇒ nominal variable

E.g.:

# Car   Colour   Age
01      Red      5
02      Yellow   9
03      Blue     3
04      Red      2
05      Blue     19
06      Grey     8
07      Green    15
08      Green    7
09      Blue     10
10      Green    6
11      Green    3
12      Yellow   4

• Two variables ⇒ bivariate data
• ‘Age’ is a measurement ⇒ quantitative variable
• ‘Age’ (in years) takes only whole-number values ⇒ discrete variable

E.g.:

# Car   Colour   Age   Distance
01      Red      5     50016.30
02      Yellow   9     127694.00
03      Blue     3     36011.50
04      Red      2     27558.20
05      Blue     19    240321.20
06      Grey     8     93483.60
07      Green    15    211027.70
08      Green    7     92577.30
09      Blue     10    125465.60
10      Green    6     88640.00
11      Green    3     38401.60
12      Yellow   4     43605.20

• Three variables ⇒ multivariate data
• ‘Distance’ is a measurement ⇒ quantitative variable
• ‘Distance’ can take any value in an interval (hence the decimals) ⇒ continuous variable

E.g.:

# Car   Colour   Age   Distance    Risk
01      Red      5     50016.30    High
02      Yellow   9     127694.00   Medium
03      Blue     3     36011.50    High
04      Red      2     27558.20    High
05      Blue     19    240321.20   Low
06      Grey     8     93483.60    Medium
07      Green    15    211027.70   Low
08      Green    7     92577.30    Medium
09      Blue     10    125465.60   Low
10      Green    6     88640.00    Medium
11      Green    3     38401.60    High
12      Yellow   4     43605.20    High

• Four variables ⇒ multivariate data
• ‘Risk’ is not a measurement ⇒ qualitative variable
• ‘Risk’ has a natural order (Low < Medium < High) ⇒ ordinal variable
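
The four variable types behave differently once the data are in a program. The sketch below uses the first three cases only; the dictionary keys, the `VARIABLE_TYPES` labels and the `RISK_ORDER` ranking are assumptions made for illustration, not part of the slides.

```python
# First three cases of the sample (values from the slide).
cars = [
    {"colour": "Red",    "age": 5, "distance": 50016.30,  "risk": "High"},
    {"colour": "Yellow", "age": 9, "distance": 127694.00, "risk": "Medium"},
    {"colour": "Blue",   "age": 3, "distance": 36011.50,  "risk": "High"},
]

# Assumed labels summarising how the slides classify each variable.
VARIABLE_TYPES = {
    "colour":   ("qualitative", "nominal"),
    "age":      ("quantitative", "discrete"),
    "distance": ("quantitative", "continuous"),
    "risk":     ("qualitative", "ordinal"),
}

# An ordinal variable has a natural order, so its categories can be ranked.
RISK_ORDER = {"Low": 0, "Medium": 1, "High": 2}

by_risk = sorted(cars, key=lambda car: RISK_ORDER[car["risk"]])
print([car["risk"] for car in by_risk])

# A nominal variable has no such order; only equality and counting make sense.
n_red = sum(car["colour"] == "Red" for car in cars)
print(n_red)
```

Sorting by Risk is meaningful because the categories are ranked; sorting by Colour would only give alphabetical order, which carries no information about the cars.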

E.g.: Sample of cars, n = 12

#    Car   Colour   Age   Distance    Risk
01   04    Red      5     50016.30    High
02   54    Yellow   9     127694.00   Medium
03   14    Blue     3     36011.50    High
04   18    Red      2     27558.20    High
05   09    Blue     19    240321.20   Low
06   45    Grey     8     93483.60    Medium
07   68    Green    15    211027.70   Low
08   07    Green    7     92577.30    Medium
09   43    Blue     10    125465.60   Low
10   66    Green    6     88640.00    Medium
11   27    Green    3     38401.60    High
12   08    Yellow   4     43605.20    High

• New data collected by the researcher through experimentation, observation or a survey ⇒ Primary data
• Otherwise ⇒ Secondary data

Exploring data Types of data – Notation

Variable names
• Always single capitals
• Usually (but not always) the last letters of the alphabet, like X, Y, Z

E.g.: in the car data, the columns Colour, Age, Distance and Risk are named C, A, D and R.

Variable values
• Always the lower-case version of the variable name
• Must have a subscript (index) equal to the case number

E.g.:

#    Car   C     A     D     R
01   04    c₀₁   a₀₁   d₀₁   r₀₁
02   54    c₀₂   a₀₂   d₀₂   r₀₂
03   14    c₀₃   a₀₃   d₀₃   r₀₃
04   18    c₀₄   a₀₄   d₀₄   r₀₄
05   09    c₀₅   a₀₅   d₀₅   r₀₅
06   45    c₀₆   a₀₆   d₀₆   r₀₆
07   68    c₀₇   a₀₇   d₀₇   r₀₇
08   07    c₀₈   a₀₈   d₀₈   r₀₈
09   43    c₀₉   a₀₉   d₀₉   r₀₉
10   66    c₁₀   a₁₀   d₁₀   r₁₀
11   27    c₁₁   a₁₁   d₁₁   r₁₁
12   08    c₁₂   a₁₂   d₁₂   r₁₂
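
The subscript convention maps onto list indexing with one caveat: case numbers start at 1, while Python lists start at 0. A small sketch, where the helper `value` is a hypothetical name introduced only for this illustration:

```python
# Age (variable A) of the 12 sampled cars, in case order (values from the slides).
A = [5, 9, 3, 2, 19, 8, 15, 7, 10, 6, 3, 4]


def value(variable, case_number):
    """Hypothetical helper: return the value whose subscript is `case_number`."""
    # Cases are numbered from 1, but list positions start at 0.
    return variable[case_number - 1]


a_05 = value(A, 5)    # a₀₅, the age of case 05
a_12 = value(A, 12)   # a₁₂, the age of case 12
n = len(A)            # sample size = number of cases

print(a_05, a_12, n)
```

Keeping the offset inside one helper avoids scattering `- 1` corrections wherever a subscripted value is looked up.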

Exploring data Types of data – Summary

In general:

Data

#    X     Y     Z
1    x₁    y₁    z₁
2    x₂    y₂    z₂
3    x₃    y₃    z₃
⋮    ⋮     ⋮     ⋮
n    xₙ    yₙ    zₙ

Sample size = #Cases

• Sampling with/without replacement
• Raw/grouped data
• Primary/secondary data
• Univariate/bivariate/multivariate data
• Variables
  • Names are single capitals
  • Values are lower case with case# index
  • Qualitative
    • Ordinal
    • Nominal
  • Quantitative
    • Discrete
    • Continuous
