04 Descriptive Statistics
PART I:
Descriptive Statistics
• A. Role of statistics
  • Description of data
  • Inference / association
  • Explanation
  • Decision-making
9/6/2016
[Table: State | Region | Total Population | Urban Population | 1999 Median Household Income ($)]
Hour  Temperature (°C)  Relative Humidity (%)  Cloudiness     Rain
0     23                60                     No Clouds      no
1     22.5              65                     No Clouds      no
2     22                68                     No Clouds      no
3     21.5              70                     No Clouds      no
4     21                73                     No Clouds      no
5     21.5              69                     No Clouds      no
6     22.5              68                     No Clouds      no
7     24                62                     No Clouds      no
8     25                60                     No Clouds      no
9     28                58                     No Clouds      no
10    30                55                     No Clouds      no
11    32                54                     Partly Cloudy  no
12    33                50                     Partly Cloudy  no
13    34                48                     Partly Cloudy  no
14    34                48                     Partly Cloudy  no
15    33                48                     Cloudy         no
16    25                49                     Cloudy         yes
17    24                50                     Cloudy         yes
18    27                52                     Cloudy         no
19    26                53                     Partly Cloudy  no
20    25.5              52                     Partly Cloudy  no
21    25                56                     Partly Cloudy  no
22    24.5              58                     No Clouds      no
23    24                59                     No Clouds      no
Types of variables
• 1) Nominal
Describes, names, or labels categories
Notes:
- Categories must be mutually exclusive
- A variable with only two outcomes (yes/no, present/absent) is called binary
• 2) Ordinal
Outcomes sorted into ordered categories, e.g., poor/medium/good or dark/medium/light
Weather data: ‘degree of cloudiness’ (no clouds / partly cloudy / cloudy)
• 3) Interval variable
= continuous; can take any numerical value in a range, e.g., (-∞, +∞) or [0, 100]
E.g., ‘relative humidity’ = 55.3575% or 55%; also dollars, age, minutes, years
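The three variable types can be made explicit when preparing data for analysis. The minimal sketch below tags each weather-table column with its type. The column names (TEMP_C, REL_HUM, CLOUDINESS, RAIN) and the helper functions are hypothetical and only illustrate the definitions above. It also shows why an ordinal variable needs an explicit category order rather than alphabetical sorting.

```python
from enum import Enum


class VariableType(Enum):
    NOMINAL = "nominal"    # names/labels mutually exclusive categories
    ORDINAL = "ordinal"    # categories with a meaningful order
    INTERVAL = "interval"  # continuous numerical values


# Hypothetical column names for the weather table, written in the
# CAPITALIZED, abbreviated style used for variable names.
WEATHER_VARIABLES = {
    "TEMP_C": VariableType.INTERVAL,
    "REL_HUM": VariableType.INTERVAL,
    "CLOUDINESS": VariableType.ORDINAL,
    "RAIN": VariableType.NOMINAL,  # binary: yes/no
}

# For an ordinal variable the order of the categories matters,
# so we rank them explicitly instead of sorting alphabetically.
CLOUDINESS_ORDER = ["No Clouds", "Partly Cloudy", "Cloudy"]


def cloudiness_rank(category):
    """Return the position of a cloudiness category in its natural order."""
    return CLOUDINESS_ORDER.index(category)


def is_binary(values):
    """A nominal variable with exactly two distinct outcomes is binary."""
    return len(set(values)) == 2


# Hourly RAIN values from the weather table (hours 0-23).
rain = ["no"] * 16 + ["yes"] * 2 + ["no"] * 6
```

For example, `sorted(["Cloudy", "No Clouds", "Partly Cloudy"], key=cloudiness_rank)` gives `["No Clouds", "Partly Cloudy", "Cloudy"]`, while a plain alphabetical sort would put "Cloudy" first.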
• Sex (male/female)
• Year when house built
• Housing Quality (very high/ high/ low/ very low)
• Temperature
• Amount of carbon monoxide in air
• Amount of precipitation
• Race
• Age
• Income
• Large datasets
• Seeing patterns or regularities is nearly impossible unless we do “something” with the data
• -> 2 common procedures help us understand the data and reveal regularities:
  • graphical display of data
  • statistical summary measures
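As an illustration of statistical summary measures, this minimal sketch computes a few common ones for the hourly temperatures in the weather table: mean, median, mode(s), and range. The choice of measures is an assumption and is not taken from the slides. It uses Python's standard `statistics` module.

```python
import statistics

# Hourly temperatures (°C) for hours 0-23, copied from the weather table.
temperature_c = [23, 22.5, 22, 21.5, 21, 21.5, 22.5, 24, 25, 28, 30, 32,
                 33, 34, 34, 33, 25, 24, 27, 26, 25.5, 25, 24.5, 24]

summary = {
    "mean": statistics.mean(temperature_c),
    "median": statistics.median(temperature_c),
    # multimode returns every most-frequent value (this data has a tie).
    "modes": statistics.multimode(temperature_c),
    "range": max(temperature_c) - min(temperature_c),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

`multimode` (Python 3.8+) is used instead of `mode` because two temperatures occur equally often in this data. A single "mode" would hide that tie.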
A few rules
1) Variable names: usually CAPITALIZED and abbreviated, e.g.,
POP90, POP_URB_90
C. Description of data
Hair color  Frequency
Black       3
Brown       25
Blond       12
Red         4
Total       44

Age         Frequency
15-19       3
20-24       15
25-29       12
30-34       11
Total       41
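A frequency table is often extended with relative frequencies, where each count is expressed as a percentage of the total. That column is not on the slide. The sketch below computes it for the hair-color counts; the helper name `relative_frequencies` is hypothetical.

```python
# Frequencies from the hair-color table above.
hair_color_counts = {"Black": 3, "Brown": 25, "Blond": 12, "Red": 4}


def relative_frequencies(counts):
    """Convert absolute frequencies to percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


total = sum(hair_color_counts.values())
percent = relative_frequencies(hair_color_counts)

# Print a small table: category, frequency, relative frequency (%).
for category, n in hair_color_counts.items():
    print(f"{category:<6} {n:>3} {percent[category]:5.1f}%")
print(f"{'Total':<6} {total:>3} {sum(percent.values()):5.1f}%")
```

The same function applies unchanged to the age table. Relative frequencies are useful for comparing groups of different total size.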
Cumulative distribution
= number of observations at value x or higher (the more common convention instead counts observations at value x or lower)
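A cumulative distribution can be accumulated from the top class down (observations at x or higher) or from the bottom class up (observations at x or lower). This sketch computes both directions for the age table using `itertools.accumulate`.

```python
from itertools import accumulate

# Age classes and frequencies from the age table above.
age_classes = ["15-19", "20-24", "25-29", "30-34"]
frequencies = [3, 15, 12, 11]

# Observations at class x or higher: accumulate from the top class down,
# then reverse back into ascending class order.
cum_at_or_above = list(accumulate(reversed(frequencies)))[::-1]

# Observations at class x or lower: accumulate from the bottom class up.
cum_at_or_below = list(accumulate(frequencies))

for cls, low, high in zip(age_classes, cum_at_or_below, cum_at_or_above):
    print(f"{cls}: {low:>2} at or below, {high:>2} at or above")
```

In both directions the last accumulated value equals the total number of observations (41).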
Graphs
• Extremely useful, especially at the beginning of every data analysis
• No limits to your creativity
• Should be self-explanatory:
  • Clear title
  • Clear labels (axes, title, legend)
Frequency distributions
1) Kurtosis (peakedness)
[Figure: flat, normal, and peaked distributions; a symmetrical distribution]
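The shape of a frequency distribution can also be summarized numerically. This minimal sketch uses the moment-based coefficients, which are one common convention; statistical packages often add small-sample corrections, so their values can differ slightly. Skewness is 0 for a perfectly symmetrical distribution. Excess kurtosis is about 0 for a normal distribution. In the peakedness reading used here, it is negative for flatter distributions and positive for more peaked ones. The function names are illustrative.

```python
def central_moment(data, k):
    """k-th central moment: mean of (x - mean)**k, dividing by n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def skewness(data):
    """Moment coefficient of skewness; 0 for a symmetrical distribution."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment kurtosis minus 3, so that a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Hourly temperatures (°C) from the weather table.
temperature_c = [23, 22.5, 22, 21.5, 21, 21.5, 22.5, 24, 25, 28, 30, 32,
                 33, 34, 34, 33, 25, 24, 27, 26, 25.5, 25, 24.5, 24]

print(f"skewness        = {skewness(temperature_c):.2f}")
print(f"excess kurtosis = {excess_kurtosis(temperature_c):.2f}")
```

A quick check: the evenly spaced values 1 to 5 are symmetrical (skewness 0) and flatter than a normal curve (excess kurtosis -1.3).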
Scatter diagrams
• 2 variables in 1 graph (= x-y graph in Excel)
[Figure: Relationship between Temperature and Relative Humidity; x-axis: Temperature (Celsius), 19 to 35; y-axis: Relative Humidity (%), 40 to 75]
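A scatter diagram shows the relationship visually. One common numerical companion, not covered on the slide, is Pearson's correlation coefficient r, which ranges from -1 to +1 and measures the strength of a linear relationship. This sketch computes r for temperature and relative humidity from the weather table; the function name `pearson_r` is illustrative.

```python
import math

# Hourly values from the weather table (hours 0-23).
temperature_c = [23, 22.5, 22, 21.5, 21, 21.5, 22.5, 24, 25, 28, 30, 32,
                 33, 34, 34, 33, 25, 24, 27, 26, 25.5, 25, 24.5, 24]
humidity_pct = [60, 65, 68, 70, 73, 69, 68, 62, 60, 58, 55, 54,
                50, 48, 48, 48, 49, 50, 52, 53, 52, 56, 58, 59]


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature_c, humidity_pct)
print(f"r = {r:.2f}")
```

A negative r here matches the downward pattern in the scatter diagram: warmer hours tend to have lower relative humidity.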
[Figure: bar charts of the number of hours by cloudiness (no clouds, partly cloudy, cloudy) and by rain (no rain, rain); pie charts: no clouds 54%, partly cloudy 29%; no rain 92%]
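The counts and percentages behind these charts can be computed directly from the weather table. This sketch uses `collections.Counter` on the hourly cloudiness and rain values. Percentages are rounded to whole numbers, which is assumed to match the rounding used in the pie charts.

```python
from collections import Counter

# Hourly cloudiness for hours 0-23, in table order.
cloudiness = (["No Clouds"] * 11 + ["Partly Cloudy"] * 4 + ["Cloudy"] * 4
              + ["Partly Cloudy"] * 3 + ["No Clouds"] * 2)
# Hourly rain for hours 0-23 (rain only at hours 16 and 17).
rain = ["no"] * 16 + ["yes"] * 2 + ["no"] * 6


def percentages(counts):
    """Share of each category in percent, rounded to a whole number."""
    total = sum(counts.values())
    return {category: round(100 * n / total) for category, n in counts.items()}


cloud_counts = Counter(cloudiness)  # heights of the bars
rain_counts = Counter(rain)
cloud_pct = percentages(cloud_counts)  # slices of the pie chart
rain_pct = percentages(rain_counts)

print(cloud_counts, cloud_pct)
print(rain_counts, rain_pct)
```

Bar charts display the counts (hours), while pie charts display the percentage shares of the same categories.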