Professional Documents
Culture Documents
Probability &
Statistics
2DI90 – Chapter 6 of MR
What is Statistics?
According to the Encyclopedia Britannica:
“Statistics is the art and science of gathering, analyzing, and
making inferences from data”
The data in this example is loosely based on a recent poll, as described in http://www.rasmussenreports.com/
public_content/politics/elections/election_2012/election_2012_presidential_election/wisconsin/ 3
election_2012_wisconsin_president.
Probability AND Statistics
Probability and Statistics are NOT the same thing!!!
Probabilistic Model
(explains how data was created) Sample
Population (World)
5
Descriptive Statistics
Typically we can only say something sensible about the world
if we assume a statistical model for it. Nevertheless, a good
start is to summarize the contents of a dataset, or
represent them in a palatable way.
For the golf ball dataset we can easily compute this (with a
computer) and see that
Notice the
In our example
units are
squared !!!
always non-negative
12
The Sample Range
Another way to assess variability:
In our example
13
The Sample Range
Suppose there was an element in the dataset that
corresponded to a defective ball, and the corresponding
distance was only 150 meters. The sample range would change
to 135.3m (before was 58.3m), and the sample standard
deviation would change to 17.36m (before was 13.41m).
261.3 ; 258.1 ; 254.2 ; 257.7 ; 237.9 ; 255.8 ; 241.4 ; 256.8 ; 255.3 ; 255 ; 259.4 ; 270.5 ; 270.7 ; 272.6 ;
272.2 ; 251.1 ; 232.1 ; 273 ; 266.6 ; 273.2 ; 265.7 ; 264.5 ; 245.5 ; 280.3 ; 248.3 ; 267 ; 271.5 ; 240.8 ;
268.9 ; 263.5 ; 262.2 ; 244.8 ; 279.6 ; 272.7 ; 278.7 ; 255.8 ; 276.1 ; 274.2 ; 267.4 ; 244.5 ; 252 ; 264 ;
247.7 ; 273.6 ; 264.5 ; 285.3 ; 277.8 ; 261.4 ; 253.6 ; 278.5 ; 260 ; 271.2 ; 254.8 ; 256.1 ; 264.5 ; 255.4 ;
259.5 ; 274.9 ; 272.1 ; 273.3 ; 279.3 ; 279.8 ; 272.8 ; 268.5 ; 283.7 ; 263.2 ; 257.5 ; 233.7 ; 260.2 ; 263.7 ;
244.3 ; 241.2 ; 254.4 ; 274 ; 260.7 ; 260.6 ; 255.1 ; 233.7 ; 253.7 ; 250.2 ; 251.4 ; 270.6 ; 273.4 ; 242.9 ;
276.6 ; 237.8 ; 261 ; 236 ; 251.8 ; 280.3 ; 268.3 ; 266.8 ; 254.5 ; 234.3 ; 251.6 ; 226.8 ; 240.5 ; 252.1 ;
245.6 ; 270.5
14
Other Numerical Summaries
There are many other numerical summaries that are important
(we’ll encounter them later, in the context of graphical
representations of data)
17
230 240 250 260 270 280
Histograms
Scatterplots are still a bit difficult to read – a way we can get
a better view is by aggregating data into bins
Histogram of x
15
Frequency
5
0
20
0
Histogram of x
220 240 260 280 300
x
15
Frequency
Just right!!!
5
0
Histogram of x
230 240 250 260 270 280 290
4
x
Frequency
3
2
5
0
5
0
70 80 90 100 110
20 40 60 80
Frequency
6000
x
4500
0
0 50 100 150 200 250 4000 4500 5000 5500 6000 6500 7000 7500
Index x
80
60
Frequency
0.00
40
r
20
-0.04
0
0 50 100 150 200 250 -0.06 -0.04 -0.02 0.00 0.02 0.04
Index r
There seems to be much less of
a temporal trend on the returns,
so histograms and box-plots are
potentially valid representations
of the data…
27
Quantile-Quantile Plots
We can compare the order statistics, to the values we would
expect for some distributions (e.g. a normal distribution).
30
-0.04 0.00
Density
20
10
0
-3 -2 -1 0 1 2 3 -0.04 0.00
Theoretical Quantiles r
0.4
Sample Quantiles
Density
0
0.2
-1
-2
0.0
-3
-3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3
Theoretical Quantiles r
30
Normal Quantile-Quantile Plots
Example: Synthetic data – exponential distribution
Normal Q-Q Plot Histogram of r
line
0.8
3
Density
2
0.4
1
0.0
0
-3 -2 -1 0 1 2 3 0 1 2 3 4
Theoretical Quantiles r
280
Sample Quantiles
30
270
Sample Quantiles
-0.04 0.00
Density
20
260
10
250
0 line
230
-3 -2 -1 0 1 2 3 -0.04 0.00
-2 -1 0 1 2 32
Theoretical Quantiles r
Theoretical Quantiles
What’s Next
Now that we can summarize and represent data in nice ways we
would like to make meaningful statements about the processes
that generated the data.