Statistics Template
# DATA INPUT        STD DEV's from MEAN

Descriptive Statistics

Count: -                        Mean: -          Enter Confidence Level: 90.0%
Minimum: -                      Median: -        Minimum: -
Maximum: -                      Mode: -          Mean: -
Standard Deviation: -           Skewness: -      Maximum: -
Coefficient of Variation: -     Kurtosis: -      (+/-): -

[Chart: Distribution of Data Points around the Mean]
MEASURE OF DISPERSION
The mean of a data set is the value around which the other values tend to cluster, but it conveys no indication of the closeness of the clustering (that is, the dispersion). All observations could be close to the mean or far away. If you want an indication of how closely the other values are clustered around the mean, you must look beyond measures of central tendency to measures of dispersion.
RANGE = the difference between the highest (H) and lowest (L) observations: R = H - L. The larger the range, the greater the amount of variation in the data set.
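As a minimal sketch in Python (the data values are invented for illustration):

    data = [4, 8, 15, 16, 23, 42]  # hypothetical observations

    # R = H - L: highest observation minus the lowest
    data_range = max(data) - min(data)
    print(data_range)  # 38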
VARIANCE = the average of the squared deviations between each observation and the mean. However, statisticians have determined that when you have a relatively small sample, you get a better estimate of the true population variance if you calculate variance by dividing the sum of the squared deviations by n - 1 instead of n.
The term n - 1 is known as the number of degrees of freedom that can be used to estimate the population variance.
This adjustment is necessary because samples are usually more alike than the populations from which they are taken. Without this adjustment, the sample variance is likely to underestimate the true variation in the population. Division by n - 1 in a sense artificially inflates the sample variance, but in so doing it makes the sample variance a better estimator of the population variance. As the sample size increases, the relative effect of this adjustment decreases (e.g., dividing by four rather than five has a greater effect on the quotient than dividing by 29 instead of 30).
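A short Python sketch of the n versus n - 1 distinction, using the standard library's statistics module on a hypothetical sample:

    import statistics

    data = [4, 8, 15, 16, 23, 42]  # hypothetical sample

    # Population form: sum of squared deviations divided by n
    pop_var = statistics.pvariance(data)

    # Sample form: divided by n - 1 degrees of freedom, the adjustment
    # described above for estimating the population variance
    samp_var = statistics.variance(data)

    print(pop_var, samp_var)  # the n - 1 form is always the larger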
STANDARD DEVIATION = a widely used measure of variability or diversity in statistics and probability theory. It shows how much variation or "dispersion" there is from the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values.
Standard deviation is a statistical measurement that sheds light on historical volatility: it measures the dispersion of a set of data from its mean, and the more spread apart the data, the higher the deviation. Standard deviation is calculated as the square root of the variance.
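The square-root relationship can be checked directly in Python (same hypothetical sample as above):

    import math
    import statistics

    data = [4, 8, 15, 16, 23, 42]  # hypothetical sample

    sd = statistics.stdev(data)  # sample standard deviation (n - 1 form)

    # Standard deviation is the square root of the variance
    assert math.isclose(sd, math.sqrt(statistics.variance(data)))
    print(sd)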
COEFFICIENT OF VARIATION = a statistical measure of the dispersion of data points in a data series around the mean. The coefficient of variation is the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.
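A sketch of why the ratio makes otherwise incomparable series comparable (both series are invented; the second is simply the first scaled by 100):

    import statistics

    def coefficient_of_variation(data):
        # Ratio of the standard deviation to the mean
        return statistics.stdev(data) / statistics.mean(data)

    # Drastically different means, identical relative spread,
    # therefore identical coefficients of variation.
    print(coefficient_of_variation([9, 10, 11, 12]))
    print(coefficient_of_variation([900, 1000, 1100, 1200]))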
STANDARD ERROR OF THE MEAN = (in statistics) an indication of how well the mean of a sample estimates the mean of a population. It is measured by the standard deviation of the means of randomly drawn samples of the same size as the sample in question; equivalently, it is an estimate of the amount by which an obtained mean may be expected to differ by chance from the true mean.
For example, the sample mean is the usual estimator of a population mean.
However, different samples drawn from that same population would in general
have different values of the sample mean. The standard error of the mean (i.e.,
of using the sample mean as a method of estimating the population mean) is
the standard deviation of those sample means over all possible samples (of a
given size) drawn from the population. The term can also refer to an estimate of that standard deviation, computed from the sample of data being analyzed at the time.
When you take a sample of observations from a population, the mean of the
sample is an estimate of the parametric mean, or mean of all of the
observations in the population. If your sample size is small, your estimate of the
mean won't be as good as an estimate based on a larger sample size.
You'd often like to give some indication of how close your sample mean is likely
to be to the parametric mean. One way to do this is with the standard error of
the mean. If you take many random samples from a population, the standard
error of the mean is the standard deviation of the different sample means.
About two-thirds (68.3%) of the sample means would be within one standard
error of the parametric mean, 95.4% would be within two standard errors, and
almost all (99.7%) would be within three standard errors.
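The usual computed estimate, the sample standard deviation divided by the square root of n, can be sketched as follows (sample values are hypothetical):

    import math
    import statistics

    def standard_error_of_mean(data):
        # Estimated SEM: sample standard deviation over sqrt(n)
        return statistics.stdev(data) / math.sqrt(len(data))

    data = [4, 8, 15, 16, 23, 42]  # hypothetical sample

    # Per the rule above: about 68.3% of sample means fall within one
    # standard error of the parametric mean, 95.4% within two,
    # and 99.7% within three.
    print(standard_error_of_mean(data))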
t DISTRIBUTION = the distribution used in place of the normal distribution when constructing a confidence interval for a mean from a small sample; its critical values depend on the degrees of freedom, n - 1.
CONFIDENCE LEVEL / CONFIDENCE INTERVAL
A confidence interval gives an estimated range of values which is likely to
include an unknown population parameter, the estimated range being
calculated from a given set of sample data.
If independent samples are taken repeatedly from the same population, and a
confidence interval calculated for each sample, then a certain percentage
(confidence level) of the intervals will include the unknown population
parameter. Confidence intervals are usually calculated so that this percentage
is 95%, but we can produce 90%, 99%, 99.9% (or whatever) confidence
intervals for the unknown parameter.
The width of the confidence interval gives us some idea about how uncertain
we are about the unknown parameter (see precision). A very wide interval may
indicate that more data should be collected before anything very definite can be
said about the parameter.
Confidence intervals are more informative than the simple results of hypothesis
tests (where we decide "reject H0" or "don't reject H0") since they provide a
range of plausible values for the unknown parameter.
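Tying this to the t distribution above: a confidence interval for a mean estimated from a small sample is the sample mean plus or minus a t critical value (with n - 1 degrees of freedom) times the standard error of the mean. A sketch, assuming SciPy is available for the critical values (the data are hypothetical):

    import math
    import statistics
    from scipy import stats  # assumed available for t critical values

    def confidence_interval(data, confidence=0.95):
        n = len(data)
        mean = statistics.mean(data)
        sem = statistics.stdev(data) / math.sqrt(n)
        # Two-tailed critical value from the t distribution, d.f. = n - 1
        t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
        return mean - t_crit * sem, mean + t_crit * sem

    data = [4, 8, 15, 16, 23, 42]  # hypothetical sample
    print(confidence_interval(data, 0.90))  # the template's default 90% level
    print(confidence_interval(data, 0.99))  # higher confidence, wider interval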
OUTLIERS
Outliers are often easy to spot in histograms, where they appear as isolated points far from the body of the distribution. Outliers can also occur when comparing relationships between two sets of data, and outliers of this type can likewise be easy to identify.
If your data set contains hundreds of observations, an outlier or two may not be cause
for alarm. But outliers can spell trouble for models fitted to small data sets: since the
sum of squares of the residuals is the basis for estimating parameters and calculating
error statistics and confidence intervals, one or two bad outliers in a small data set
can badly skew the results. When outliers are found, two questions should be asked:
(i) are they merely "flukes" of some kind (e.g., data entry errors, or the result of
exceptional conditions that are not expected to recur), or do they represent a real
effect that you might wish to include in your model; and
(ii) how much have the coefficients, error statistics, and predictions, etc., been
affected?
An outlier may or may not have a dramatic effect on a model, depending on the amount of "leverage" it has. Its leverage depends on the values of the independent variables at the point where it occurred: if the independent variables were all relatively close to their mean values, the outlier has little leverage and will mainly affect the value of the estimated CONSTANT term and the SEE (standard error of the estimate). However, if one or more of the independent variables had relatively extreme values at that point, the outlier may have a large influence on the estimates of the corresponding coefficients: e.g., it may cause an otherwise insignificant variable to appear significant, or vice versa.
The best way to determine how much leverage an outlier (or group of outliers) has is to exclude it when fitting the model and compare the results with those originally obtained.
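A small sketch of that exclude-and-compare check, fitting a straight line with NumPy on invented data in which the last point has an extreme x value and therefore high leverage:

    import numpy as np

    # Hypothetical data: the first five points follow y = 2x exactly;
    # the sixth has an extreme x value and an unusual y value.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
    y = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 90.0])

    slope_all, intercept_all = np.polyfit(x, y, 1)              # with the outlier
    slope_trim, intercept_trim = np.polyfit(x[:-1], y[:-1], 1)  # without it

    print(slope_all, intercept_all)    # slope dragged well away from 2 by one point
    print(slope_trim, intercept_trim)  # recovers roughly y = 2x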
µ = Population Mean
α = Significance Level
d.f. = Degrees of Freedom [n - 1]