Descriptive Statistics and Confidence Limits
Dr. Charlie Wu
September 11, 2019
Content
1. Statistical data analysis of geoscientific data
2. Descriptive statistics
central tendency
dispersion, skewness and kurtosis
SD & SE example
3. Theoretical probability distribution
4. Z table
5. Confidence limits for population data
6. Confidence limits for sample mean
7. t table
8. Class example
9. Homework 2
Statistical analysis of geoscientific data
Descriptive statistics: data summary and presentation
Data presentation
Central tendency (mean, …), dispersion (SD, …)
Shape
Dispersion/variability:
Standard deviation (SD) = spread of data; variance
Descriptive Statistics of REE sample
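Sample variable x: 31.40, 0.00, 0.99, 0.00, 0.02 (sample size n = 5)

Sample mean:
$\bar{x} = \frac{31.4 + 0 + 0.99 + 0 + 0.02}{5} = 6.48$

Sample standard deviation (sum of squared deviations divided by sample size - 1):
$s = \sqrt{\frac{(24.92)^2 + (-6.48)^2 + (-5.49)^2 + (-6.48)^2 + (-6.46)^2}{5 - 1}} = 13.93$

Standard error of the mean:
$SE = \frac{s}{\sqrt{n}} = \frac{13.93}{\sqrt{5}} = 6.23$

The “standard error of the mean”, also called the “standard deviation of the mean”, is an estimate of the standard deviation of the sampling distribution of the mean.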
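The worked example above can be checked with a few lines of Python. This is only a sketch using the five values from the slide; the variable names are illustrative. Computed from unrounded deviations, s comes out near 13.94 rather than the slide's 13.93, which is a rounding difference only.

```python
import math

# The five sample values from the worked example on the slide.
x = [31.4, 0.0, 0.99, 0.0, 0.02]
n = len(x)

mean = sum(x) / n
# Sample standard deviation: squared deviations divided by n - 1.
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
# Standard error of the mean: s divided by the square root of n.
se = s / math.sqrt(n)

print(f"mean = {mean:.2f}, s = {s:.2f}, SE = {se:.2f}")
```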
SD & SE example
Sample size N=15; repeated sampling 5 times

Sample  Values
A       27 20 12 0 11 0 0 6 0 5 21 19 0 0 7
B       0 0 1 11 8 150 3 9 0 0 16 17 12 20 5
C       32 20 10 6 6 0 37 14 17 17 141 9 0 5 8
D       56 25 43 0 0 0 2 0 0 37 12 91 13 11 2
E       0 17 0 0 4 39 0 2 0 9 9 60 30 0 6
F, G, H, I, J, ...

Sampling Stat (SE = SD of mean, estimated from each sample)
Sample  N=   mean   SD     SE
A       15   8.5    9.3    2.4
B       15   16.8   37.5   9.7
C       15   21.5   34.7   9.0
D       15   19.5   26.7   6.9
E       15   11.7   17.9   4.6

Stat of the sample means A~E
N=5, mean = 15.6, SD of mean = 5.4 (SE estimated from 5 samples)
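The repeated-sampling table can be reproduced as follows. This is a sketch: the grouping of the 15 values per sample is reconstructed from the slide layout (it reproduces the listed means), and the dictionary and variable names are illustrative.

```python
import math
import statistics

# Five samples of N = 15 values each, as read from the slide.
samples = {
    "A": [27, 20, 12, 0, 11, 0, 0, 6, 0, 5, 21, 19, 0, 0, 7],
    "B": [0, 0, 1, 11, 8, 150, 3, 9, 0, 0, 16, 17, 12, 20, 5],
    "C": [32, 20, 10, 6, 6, 0, 37, 14, 17, 17, 141, 9, 0, 5, 8],
    "D": [56, 25, 43, 0, 0, 0, 2, 0, 0, 37, 12, 91, 13, 11, 2],
    "E": [0, 17, 0, 0, 4, 39, 0, 2, 0, 9, 9, 60, 30, 0, 6],
}

stats = {}
for name, values in samples.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample SD (n - 1 denominator)
    se = sd / math.sqrt(len(values))     # SE estimated from this one sample
    stats[name] = (mean, sd, se)
    print(f"{name}: N={len(values)} mean={mean:.1f} SD={sd:.1f} SE={se:.1f}")

# SD of the five sample means: the SE estimated from repeated sampling.
sample_means = [m for m, _, _ in stats.values()]
sd_of_means = statistics.stdev(sample_means)
print(f"A~E: N={len(sample_means)} mean={statistics.mean(sample_means):.1f} "
      f"SD of mean={sd_of_means:.1f}")
```

Each single-sample SE (2.4 to 9.7) is an estimate of the same quantity that the SD of the five means (5.4) measures directly.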
MEASURES OF CENTRAL TENDENCY
• Mode
• Median
• Arithmetic mean
http://en.wikipedia.org/wiki/Mean_(statistics)
1 Arithmetic mean
2 Geometric mean
3 Harmonic mean
4 Power mean
5 Weighted mean
6 Interquartile mean
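MEASURES OF DISPERSION
• Range
• Standard deviation
  Population: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}$
  Sample: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$
• Variance
  Population: $\sigma^2$
  Sample: $s^2$
• Coefficient of variation
  Population: $\sigma/\mu$
  Sample: $s/\bar{x}$
• Skewness (symmetry)
  Population: $\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^3$
  Sample: $\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^3$
• Kurtosis (peakedness)
  Sample: $\frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$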
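The sample skewness and kurtosis formulas can be sketched directly in Python. The function names and the two test data sets below are hypothetical and only illustrate the formulas; they are not data from the lecture.

```python
import math

def sample_sd(x):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

def sample_skewness(x):
    """n / ((n-1)(n-2)) * sum of cubed standardized deviations (needs n >= 3)."""
    n = len(x)
    mean = sum(x) / n
    s = sample_sd(x)
    return n / ((n - 1) * (n - 2)) * sum(((xi - mean) / s) ** 3 for xi in x)

def sample_kurtosis(x):
    """Excess kurtosis with the small-sample correction term (needs n >= 4)."""
    n = len(x)
    mean = sum(x) / n
    s = sample_sd(x)
    fourth = sum(((xi - mean) / s) ** 4 for xi in x)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction

symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data
right_tailed = [0, 0, 0, 0, 10]   # hypothetical data with one high value

print(sample_skewness(symmetric), sample_kurtosis(symmetric))
print(sample_skewness(right_tailed))
```

A symmetric sample gives zero skewness, while a single high outlier, as is common in geochemical grades, gives positive skewness.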
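Normal/Gaussian/Z Distribution
[Figure: normal curve. Area between the mean and ±1 SD: 34.13% on each side; between ±1 SD and ±2 SD: 13.59% on each side. Within ±1 SD: 68.3%; within ±2 SD: 95.5%; within ±3 SD: 99.7%.]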
One standard deviation ("one sigma", red area) on either side of the mean accounts for about 68 percent of the data points.
Two standard deviations (the red and green areas) account for roughly 95 percent of the data points.
Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data points.
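These percentages follow from the normal distribution itself: the fraction of data within k standard deviations of the mean is erf(k/√2). A minimal sketch, where the function name is illustrative:

```python
import math

def coverage_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {100 * coverage_within(k):.2f}%")
```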
Theoretical probability distribution
Probability density function (pdf).
Has 2 parameters: µ and σ.
[Figure: normal pdf with the central 50%, 90%, 95% and 99% areas marked]
Theoretical distributions
Pdf: probability density function
Cdf: cumulative distribution function
[Figure: pdf and cdf curves with -SD and +SD marked]
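The link between the two curves is that the cdf at x is the area under the pdf to the left of x. The sketch below writes out the normal pdf with its two parameters µ and σ, integrates it numerically with the trapezoid rule, and compares the result with `statistics.NormalDist` from the Python standard library. The helper names and the integration settings (start at µ - 10σ, 20000 steps) are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def cdf_by_integration(x, mu=0.0, sigma=1.0, steps=20000):
    """Approximate the cdf as the trapezoid-rule area under the pdf up to x."""
    lo = mu - 10.0 * sigma          # far enough left that the missing tail is negligible
    h = (x - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

for x in (-1.0, 0.0, 1.0):
    print(x, round(cdf_by_integration(x), 4), round(NormalDist().cdf(x), 4))
```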
Z table (for population)
Standard deviation from mean (Z) and cumulative probability

     Z   Cumulative        Z   Cumulative
         probability           probability
 -3.00   0.0014         0.00   0.5000
 -2.90   0.0019         0.10   0.5398
 -2.80   0.0026         0.20   0.5793
 -2.70   0.0035         0.30   0.6179
 -2.60   0.0047         0.40   0.6554
 -2.50   0.0062         0.50   0.6915
 -2.40   0.0082         0.60   0.7257
 -2.30   0.0107         0.70   0.7580
 -2.20   0.0139         0.80   0.7881
 -2.10   0.0179         0.90   0.8159
 -2.00   0.0228         1.00   0.8413
 -1.90   0.0287         1.10   0.8643
 -1.80   0.0359         1.20   0.8849
 -1.70   0.0446         1.30   0.9032
 -1.60   0.0548         1.40   0.9192
 -1.50   0.0668         1.50   0.9332
 -1.40   0.0808         1.60   0.9452
 -1.30   0.0968         1.70   0.9554
 -1.20   0.1151         1.80   0.9641
 -1.10   0.1357         1.90   0.9713
 -1.00   0.1587         2.00   0.9772
 -0.90   0.1841         2.10   0.9821
 -0.80   0.2119         2.20   0.9861
 -0.70   0.2420         2.30   0.9893
 -0.60   0.2743         2.40   0.9918
 -0.50   0.3085         2.50   0.9938
 -0.40   0.3446         2.60   0.9953
 -0.30   0.3821         2.70   0.9965
 -0.20   0.4207         2.80   0.9974
 -0.10   0.4602         2.90   0.9981
                        3.00   0.9987

To five decimals, the cumulative probability is 0.00135 at Z = -3.00 and 0.99865 at Z = 3.00.

Z values bounding the central probability: 50%: ±0.67; 90%: ±1.645; 95%: ±1.96; 99%: ±2.576
Areas under the curve: within ±1 SD 68.26%, within ±2 SD 95.45%, within ±3 SD 99.73%; mean to 1 SD 34.13%; 1 SD to 2 SD 13.59%; 2 SD to 3 SD 2.15%; beyond 3 SD 0.13%
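Any entry of the Z table can be recomputed from the error function, since the cumulative probability is Φ(z) = ½[1 + erf(z/√2)]. A minimal sketch; the function name `phi` is an assumption for illustration:

```python
import math

def phi(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Recompute a few table rows, including the annotated critical values.
for z in (-3.0, -1.96, -1.0, -0.67, 0.0, 0.67, 1.0, 1.96, 3.0):
    print(f"{z:6.2f}  {phi(z):.4f}")
```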
Standard Normal Distribution
(Use “distcalc” program provided in the class)
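                            2 sided   1 sided   1 sided
Level of significance (α)   0.05      0.05      0.025
1 side (α)                  0.025     0.05      0.025
Probability                 0.95      0.95      0.975
Z value                     1.96      1.645     1.96
[Figure: standard normal curves with shaded tails. 2 sided, α = 0.05: 0.025 in each tail beyond Z = -1.96 and Z = 1.96, 0.95 in between. 1 sided, α = 0.05: 0.05 beyond Z = 1.645, 0.95 below. 1 sided, α = 0.025: 0.025 beyond Z = 1.96, 0.975 below.]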
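The Z values in this table can be obtained from the inverse cumulative distribution. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library as a stand-in for the class "distcalc" program; the function name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, two_sided=True):
    """Z value that leaves alpha (split over both tails if two_sided) outside."""
    tail = alpha / 2.0 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail)

print(f"2 sided, alpha=0.05 : {z_critical(0.05, two_sided=True):.3f}")
print(f"1 sided, alpha=0.05 : {z_critical(0.05, two_sided=False):.3f}")
print(f"1 sided, alpha=0.025: {z_critical(0.025, two_sided=False):.3f}")
```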
Confidence limits for population data
(If your sample is truly representative)
Confidence limits: X = Mean ± Z × Standard deviation
Confidence limits for population data
Q: In a normal distribution with mean 4 and variance 25, what are the
upper and lower limit scores for the middle 50% of the data?
Solution:
The middle 50% leaves 25% in each tail, so from the Z table, Z=0.67
Sx=√25=5, Mean=4
Upper limit
Xup=4+0.67*5=7.35
Lower limit
Xlow=4-0.67*5=0.65
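The same calculation can be sketched in Python. Here Z is taken from `statistics.NormalDist.inv_cdf` instead of the table, so it is 0.6745 rather than the rounded 0.67 and the limits differ slightly in the second decimal; the variable names are illustrative.

```python
from statistics import NormalDist

mean = 4.0
variance = 25.0
sd = variance ** 0.5                 # standard deviation = sqrt(variance) = 5

# The middle 50% leaves 25% in each tail, so the upper limit sits at the 75th percentile.
z = NormalDist().inv_cdf(0.75)

lower = mean - z * sd
upper = mean + z * sd
print(f"Z = {z:.4f}, lower = {lower:.2f}, upper = {upper:.2f}")
```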
t distribution (for N<30)
The t-distribution is a family of distributions, with a slightly different distribution for each sample size (degrees of freedom).
It is flatter and more spread out than the normal z-distribution.
As sample size increases, the t-distribution approaches a normal distribution.
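The approach to the normal distribution can be seen from the critical values themselves. The sketch below hard-codes a few one-tail p = 0.025 entries from the t table of this lecture (the "inf" row is the Z value); the names `T_95` and `Z_95` are illustrative.

```python
# One-tail p = 0.025 (two-tailed 95%) critical values from the lecture's t table.
Z_95 = 1.95996                      # "inf" row: the normal (Z) value
T_95 = {1: 12.70, 3: 3.182, 10: 2.228, 30: 2.04227}

# How much wider the t limits are than the Z limits, by degrees of freedom.
excess = {df: t - Z_95 for df, t in T_95.items()}
for df, extra in excess.items():
    print(f"df={df:>2}: t={T_95[df]:.3f}, excess over Z={extra:.3f}")
```

The excess shrinks steadily as the degrees of freedom grow, which is why Z is commonly used in place of t for larger samples.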
Confidence limits for sample mean (μ or x̄)
For sample size N ≥ 30 & representative sampling
Confidence limits = Mean ± Z × Standard deviation / √N
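[Figure: t distribution with two-tailed limits at t = ±3.182 and t = ±5.840, the df = 3 values for one-tail p = 0.025 and p = 0.005]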
t Table (1 sided/tail)
Example: 1 tail = 0.025 corresponds to 2 tails = 0.05, so probability = 1 - 0.05 = 95%
df\p 0.40 0.25 0.10 0.05 0.025 0.01 0.005 0.0005
1 0.324 1.000 3.077 6.313 12.70 31.82 63.65 636.6
2 0.288 0.816 1.885 2.919 4.302 6.964 9.924 31.59
3 0.276 0.764 1.637 2.353 3.182 4.540 5.840 12.92
4 0.270 0.740 1.533 2.131 2.776 3.746 4.604 8.610
5 0.267 0.726 1.475 2.015 2.570 3.364 4.032 6.868
6 0.264 0.717 1.439 1.943 2.446 3.142 3.707 5.958
7 0.263 0.711 1.414 1.894 2.364 2.997 3.499 5.407
8 0.261 0.706 1.396 1.859 2.306 2.896 3.355 5.041
9 0.260 0.702 1.383 1.833 2.262 2.821 3.249 4.780
10 0.260 0.699 1.372 1.812 2.228 2.763 3.169 4.586
11 0.259 0.697 1.363 1.795 2.200 2.718 3.105 4.437
12 0.259 0.695 1.356 1.782 2.178 2.681 3.054 4.317
13 0.258 0.693 1.350 1.770 2.160 2.650 3.012 4.220
14 0.258 0.692 1.345 1.761 2.144 2.624 2.976 4.140
15 0.257 0.691 1.340 1.753 2.131 2.602 2.946 4.072
t Table (1 sided)
df\p 0.40 0.25 0.10 0.05 0.025 0.01 0.005 0.0005
16 0.257599 0.690132 1.336757 1.745884 2.11991 2.58349 2.92078 4.0150
17 0.257347 0.689195 1.333379 1.739607 2.10982 2.56693 2.89823 3.9651
18 0.257123 0.688364 1.330391 1.734064 2.10092 2.55238 2.87844 3.9216
19 0.256923 0.687621 1.327728 1.729133 2.09302 2.53948 2.86093 3.8834
20 0.256743 0.686954 1.325341 1.724718 2.08596 2.52798 2.84534 3.8495
21 0.256580 0.686352 1.323188 1.720743 2.07961 2.51765 2.83136 3.8193
22 0.256432 0.685805 1.321237 1.717144 2.07387 2.50832 2.81876 3.7921
23 0.256297 0.685306 1.319460 1.713872 2.06866 2.49987 2.80734 3.7676
24 0.256173 0.684850 1.317836 1.710882 2.06390 2.49216 2.79694 3.7454
25 0.256060 0.684430 1.316345 1.708141 2.05954 2.48511 2.78744 3.7251
26 0.255955 0.684043 1.314972 1.705618 2.05553 2.47863 2.77871 3.7066
27 0.255858 0.683685 1.313703 1.703288 2.05183 2.47266 2.77068 3.6896
28 0.255768 0.683353 1.312527 1.701131 2.04841 2.46714 2.76326 3.6739
29 0.255684 0.683044 1.311434 1.699127 2.04523 2.46202 2.75639 3.6594
30 0.255605 0.682756 1.310415 1.697261 2.04227 2.45726 2.75000 3.6460
inf 0.253347 0.674490 1.281552 1.644854 1.95996 2.32635 2.57583 3.2905
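Because the table is one sided, a two-tailed confidence level has to be converted to a one-tail p before looking up t: p = (1 - confidence) / 2. The sketch below does this for a small excerpt of the table; `T_TABLE` and `t_two_tailed` are hypothetical names, and only three rows and three columns are copied in.

```python
# Excerpt of the one-sided t table: {df: {one-tail p: t}}.
T_TABLE = {
    3:  {0.05: 2.353,    0.025: 3.182,   0.005: 5.840},
    10: {0.05: 1.812,    0.025: 2.228,   0.005: 3.169},
    30: {0.05: 1.697261, 0.025: 2.04227, 0.005: 2.75000},
}

def t_two_tailed(df, confidence):
    """Look up t for a two-tailed confidence level, e.g. 0.95 -> one-tail p 0.025."""
    one_tail_p = round((1.0 - confidence) / 2.0, 4)
    return T_TABLE[df][one_tail_p]

print(t_two_tailed(3, 0.95))
print(t_two_tailed(3, 0.99))
print(t_two_tailed(10, 0.90))
```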
Class example (Sample data)
t distribution, N=4, 95% confidence intervals, 2 tailed

Depth    Perm   Porosity
1541.7   --     29.8
1544.1   --     28.9
1619.0   2329   29.2
1622.0   312    26.3

Confidence limits?
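One way to work this example is sketched below. It assumes the question refers to the porosity column, since that is the only column with N = 4 values; the slide does not state this explicitly. The critical value t = 3.182 is the df = N - 1 = 3 entry for one-tail p = 0.025 in the t table.

```python
import math
import statistics

# Assumption: the class example uses the four porosity values.
porosity = [29.8, 28.9, 29.2, 26.3]
n = len(porosity)

t_crit = 3.182                       # t table: df = 3, one-tail p = 0.025 (95%, 2 tailed)

mean = statistics.mean(porosity)
s = statistics.stdev(porosity)       # sample SD (n - 1 denominator)
se = s / math.sqrt(n)                # standard error of the mean

lower = mean - t_crit * se
upper = mean + t_crit * se
print(f"mean = {mean:.2f}, s = {s:.2f}, SE = {se:.2f}")
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```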
2 0.116 18.40 1.80
3 0.038 22.50 16.10
4 0.005 20.70 16.20
5 0.005 99.00 82.60
6 0.005 42.10 45.10
7 0.005 32.70 51.10
8 0.005 19.50 47.50
9 0.005 31.80 70.60
10 0.005 54.30 28.60
11 0.005 48.80 42.10
Summarized above are grades (ppm, g/ton) of gold (Au), zircon (Zr) and total REE (rare earth element) of 32 data points/specimens from the Central Kalimantan area. Please select Au or Zr and perform the following tasks: