
Enter File Name Here

[Spreadsheet template: a "# DATA INPUT" column for up to 50 observations feeds a Descriptive Statistics panel reporting Count, Minimum, Maximum, Mean, Median, Mode, Standard Deviation, Coefficient of Variation, Skewness, Kurtosis, and a user-entered Confidence Level (default 90.0%) with minimum/mean/maximum confidence bounds, plus a chart titled "Distribution of Data Points around the Mean" plotting the data against the Mean, Plus 2 Std Dev, and Minus 2 Std Dev lines. With no data entered, the statistics cells show error values.]

N = Population Size
n = Sample Size, number of observations
x = Value of the observation

MEASURE OF CENTRAL TENDENCY


The central value around which data observations (e.g., historical prices) tend
to cluster.
MEAN = The arithmetic mean (or simply the mean or average) is the measure
of central tendency most commonly used in contract pricing.

MEDIAN = The median is a measure of central tendency that is often used
when a few observations might pull the measure away from the center of the
remaining data. The median is the middle value of a data set when the
observations are arrayed from the lowest to the highest (or from the highest to
the lowest). If the data set contains an even number of observations, the median is
the arithmetic mean of the two middle observations.
MODE = Occasionally, you may only want to know which value occurs most
often. The mode is the value of the observation that occurs most often in the
data set. A distribution containing two values occurring an equal number of times
is called bimodal. One with more than two values occurring an equal number of
times is called multimodal.
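The three measures above can be sketched with Python's standard-library statistics module; the data below are illustrative, not from the workbook.

```python
import statistics

prices = [10, 12, 12, 13, 14, 15, 90]  # 90 is an extreme value pulling the mean up

mean = statistics.mean(prices)      # arithmetic mean
median = statistics.median(prices)  # middle value of the sorted data (7 observations)
mode = statistics.mode(prices)      # most frequent value

print(mean)    # about 23.71 -- dragged upward by the 90
print(median)  # 13 -- resistant to the extreme value
print(mode)    # 12 -- occurs twice
```

Note how the extreme value of 90 pulls the mean well above the bulk of the data while the median stays in the center, which is exactly why the median is preferred in that situation.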

MEASURE OF DISPERSION
Although the mean for a data set is the value around which the other values tend to cluster, it
conveys no indication of the closeness of the clustering (that is, the
dispersion). All observations could be close to the mean or far away. If you
want an indication of how closely these other values are clustered around the
mean, you must look beyond measures of central tendency to measures of
dispersion.
RANGE = The difference between the highest (H) and lowest (L) observations.
The higher the range, the greater the amount of variation in a data set.
R = H - L
VARIANCE = The average of the squared deviations between each
observation and the mean. However, statisticians have determined that when you
have a relatively small sample, you can get a better estimate of the true
population variance if you calculate variance by dividing the sum of the squared
deviations by n-1 instead of n.
The term n-1 is known as the number of degrees of freedom that can be used
to estimate population variance.
This adjustment is necessary because samples are usually more alike than the
populations from which they are taken. Without this adjustment, the sample
variance is likely to underestimate the true variation in the population. Division
by n-1 in a sense artificially inflates the sample variance, but in so doing it
makes the sample variance a better estimator of the population variance. As
the sample size increases, the relative effect of this adjustment decreases (e.g.,
dividing by four rather than five has a greater effect on the quotient than
dividing by 29 instead of 30).
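The range and the two divisors for variance can be sketched directly; the numbers below are illustrative.

```python
# Range plus the two variance estimators discussed above:
# divide by n for the population, by n-1 for a small sample.

data = [4.0, 8.0, 6.0, 5.0, 3.0]
n = len(data)
mean = sum(data) / n

r = max(data) - min(data)                # range: R = H - L
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

pop_var = ss / n         # population variance (divide by n)
samp_var = ss / (n - 1)  # sample variance (divide by n-1, the degrees of freedom)

print(r)         # 5.0
print(pop_var)   # 2.96
print(samp_var)  # 3.7 -- slightly inflated, a better estimate of the population variance
```

With only five observations, dividing by 4 instead of 5 inflates the estimate noticeably, which matches the point above about small samples.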
STANDARD DEVIATION = A widely used measure of variability or
diversity in statistics and probability theory. It shows how much variation
or "dispersion" there is from the average (mean, or expected value). A low
standard deviation indicates that the data points tend to be very close to the
mean, whereas a high standard deviation indicates that the data are spread out
over a large range of values.
Standard deviation is a statistical measurement that sheds light on historical
volatility. The more spread apart the data, the higher the deviation. Standard
deviation is calculated as the square root of the variance.
COEFFICIENT OF VARIATION = A statistical measure of the dispersion of
data points in a data series around the mean.
The coefficient of variation represents the ratio of the standard deviation to the
mean, and it is a useful statistic for comparing the degree of variation from one
data series to another, even if the means are drastically different from each
other.
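A short sketch of why the CV is useful for comparing series with very different means; the two series below are illustrative and share the same absolute spread.

```python
import statistics

low_priced = [9, 10, 11, 10, 10]             # mean 10
high_priced = [999, 1000, 1001, 1000, 1000]  # mean 1000, same absolute spread

def cv(series):
    """Coefficient of variation: standard deviation relative to the mean."""
    return statistics.stdev(series) / statistics.mean(series)

print(statistics.stdev(low_priced), statistics.stdev(high_priced))  # identical SDs
print(cv(low_priced), cv(high_priced))  # CVs differ by a factor of 100
```

The standard deviations are identical, but relative to its mean the low-priced series is 100 times more variable, which only the CV reveals.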

OTHER INDICATORS OF DISPERSION


SKEWNESS = A measure of symmetry, or more precisely, the lack of
symmetry. A distribution, or data set, is symmetric if it looks the same to the left
and right of the center point.
KURTOSIS = A measure of whether the data are peaked or flat relative to a
normal distribution. That is, data sets with high kurtosis tend to have a distinct
peak near the mean, decline rather rapidly, and have heavy tails. Data sets
with low kurtosis tend to have a flat top near the mean rather than a sharp
peak. A uniform distribution would be the extreme case.

Some of the bell-shaped normal distribution curves peak in a sharper curve


than do others. This is a direct consequence of the size of the standard
deviation. The smaller the standard deviation, the sharper the peak at the
mean; the greater the standard deviation, the flatter the curve near the mean.
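Skewness and kurtosis can be sketched from the standardized third and fourth moments. Note this uses the simple population-moment formulas, not the sample-adjusted ones that some spreadsheet functions apply; the data are illustrative.

```python
import math

def moments(data):
    """Skewness and excess kurtosis from standardized moments (population form)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    skew = sum(((x - mean) / sd) ** 3 for x in data) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in data) / n - 3  # 0 for a normal curve
    return skew, kurt

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 10]

print(moments(symmetric))     # skewness 0 -- symmetric about the mean
print(moments(right_skewed))  # positive skewness -- long right tail
```

The symmetric set has zero skewness and negative excess kurtosis (flat, uniform-like), while the extreme value of 10 produces strong positive skewness.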

ESTABLISHING A CONFIDENCE INTERVAL


A probability statement about an interval that is likely to contain the true
population mean.

STANDARD ERROR OF THE MEAN
= (in statistics) an indication of how well the mean of a sample estimates the
mean of a population. It is measured by the standard deviation of the means of
randomly drawn samples of the same size as the sample in question.
= an estimate of the amount that an obtained mean may be expected to
differ by chance from the true mean.

For example, the sample mean is the usual estimator of a population mean.
However, different samples drawn from that same population would in general
have different values of the sample mean. The standard error of the mean (i.e.,
of using the sample mean as a method of estimating the population mean) is
the standard deviation of those sample means over all possible samples (of a
given size) drawn from the population. Secondly, the standard error of the
mean can refer to an estimate of that standard deviation, computed from the
sample of data being analyzed at the time.

When you take a sample of observations from a population, the mean of the
sample is an estimate of the parametric mean, or mean of all of the
observations in the population. If your sample size is small, your estimate of the
mean won't be as good as an estimate based on a larger sample size.

You'd often like to give some indication of how close your sample mean is likely
to be to the parametric mean. One way to do this is with the standard error of
the mean. If you take many random samples from a population, the standard
error of the mean is the standard deviation of the different sample means.
About two-thirds (68.3%) of the sample means would be within one standard
error of the parametric mean, 95.4% would be within two standard errors, and
almost all (99.7%) would be within three standard errors.
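From a single sample, the standard error of the mean is estimated as the sample standard deviation divided by the square root of the sample size; the data below are illustrative.

```python
import math
import statistics

sample = [23, 25, 21, 24, 22, 26, 24, 23]
n = len(sample)

s = statistics.stdev(sample)  # sample standard deviation (n-1 divisor)
sem = s / math.sqrt(n)        # standard error of the mean

print(statistics.mean(sample), sem)
# Per the empirical rule above: about 68.3% of sample means fall within one
# SEM of the population mean, 95.4% within two, and 99.7% within three.
```

Doubling the sample size shrinks the SEM by a factor of sqrt(2), which is why larger samples give better estimates of the population mean.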
t DISTRIBUTION

In probability and statistics, Student's t-distribution (or simply the t-distribution)
is a continuous probability distribution that arises when estimating the mean of
a normally distributed population in situations where the sample size is small. It
plays a role in a number of widely-used statistical analyses, including the
Student’s t-test for assessing the statistical significance of the difference
between two sample means, the construction of confidence intervals for the
difference between two population means, and in linear regression analysis.
The t-distribution is symmetric and bell-shaped, like the normal distribution, but
has heavier tails, meaning that it is more prone to producing values that fall far
from its mean. This makes it useful for understanding the statistical behavior of
certain types of ratios of random quantities, in which variation in the
denominator is amplified and may produce outlying values when the
denominator of the ratio falls close to zero.
A t-distribution is a theoretical probability distribution that resembles a normal
distribution. It differs from the normal distribution by its degrees of
freedom: the higher the degrees of freedom, the more closely the distribution
resembles a standard normal distribution with a mean of 0 and a standard
deviation of 1.
A t-distribution is used when the standard deviation of the
population is unknown; it allows the analyst to
approximate probabilities based on the mean of the sample,
the standard deviation of the sample, and the sample's degrees of freedom. As
the sample's degrees of freedom approach 50, the t-distribution becomes virtually
identical to the normal distribution.

CONFIDENCE LEVEL
A confidence interval gives an estimated range of values which is likely to
include an unknown population parameter, the estimated range being
calculated from a given set of sample data.
If independent samples are taken repeatedly from the same population, and a
confidence interval calculated for each sample, then a certain percentage
(confidence level) of the intervals will include the unknown population
parameter. Confidence intervals are usually calculated so that this percentage
is 95%, but we can produce 90%, 99%, 99.9% (or whatever) confidence
intervals for the unknown parameter.
The width of the confidence interval gives us some idea about how uncertain
we are about the unknown parameter (see precision). A very wide interval may
indicate that more data should be collected before anything very definite can be
said about the parameter.
Confidence intervals are more informative than the simple results of hypothesis
tests (where we decide "reject H0" or "don't reject H0") since they provide a
range of plausible values for the unknown parameter.
CONFIDENCE INTERVAL

In statistics, a confidence interval (CI) is a particular kind of interval estimate of
a population parameter and is used to indicate the reliability of an estimate. It is
an observed interval (i.e., it is calculated from the observations), in principle
different from sample to sample, that frequently includes the parameter of
interest, if the experiment is repeated. How frequently the observed interval
contains the parameter is determined by the confidence level or confidence
coefficient.
A confidence interval with a particular confidence level is intended to give the
assurance that, if the statistical model is correct, then taken over all the data
that might have been obtained, the procedure for constructing the interval
would deliver a confidence interval that included the true value of the parameter
the proportion of the time set by the confidence level. More specifically, the
meaning of the term "confidence level" is that, if confidence intervals are
constructed across many separate data analyses of repeated (and possibly
different) experiments, the proportion of such intervals that contain the true
value of the parameter will approximately match the confidence level; this is
guaranteed by the reasoning underlying the construction of confidence
intervals.
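Putting the pieces together, a confidence interval for the mean is constructed as mean ± t × SEM. The critical value 1.833, a standard two-sided 90% t-table value for 9 degrees of freedom, is hardcoded here to stay within the standard library; the data are illustrative.

```python
import math
import statistics

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
n = len(sample)                                 # 10 observations, so d.f. = n - 1 = 9
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

t_crit = 1.833                                  # two-sided 90% t critical value, d.f. = 9
low, high = mean - t_crit * sem, mean + t_crit * sem
print(f"90% CI: ({low:.3f}, {high:.3f})")
```

A 95% or 99% interval would simply use a larger critical value from the same t-table row, producing a wider and therefore more conservative interval.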
OUTLIER
What are outliers in the data?
An outlier, in statistics, is an observation that is numerically distant from, and
deviates markedly from, the rest of the data.

There is no rigid mathematical definition of what constitutes an outlier; determining
whether or not an observation is an outlier is ultimately a subjective exercise.

Outliers are often easy to spot in histograms (for example, a point far to the left
of the rest of the data). Outliers can also occur when comparing relationships
between two sets of data, and outliers of this type can be easily identified.

Outliers should be investigated carefully. Often they contain valuable information
about the process under investigation or the data gathering and recording process.
Before considering the possible elimination of these points from the data, one should
try to understand why they appeared and whether it is likely similar values will
continue to appear.
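Although the text stresses that there is no rigid definition of an outlier, one common rule of thumb (an assumption here, not prescribed by the text) flags points more than 1.5 interquartile ranges beyond the quartiles:

```python
import statistics

def iqr_outliers(data):
    """Flag points beyond 1.5 IQR fences -- one common convention, not a rigid rule."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (exclusive method)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lo or x > hi]

print(iqr_outliers([10, 12, 12, 13, 14, 15, 90]))  # [90]
```

Any point the rule flags should still be investigated as the text advises, not automatically discarded.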

DEALING WITH OUTLIERS

If your data set contains hundreds of observations, an outlier or two may not be cause
for alarm. But outliers can spell trouble for models fitted to small data sets: since the
sum of squares of the residuals is the basis for estimating parameters and calculating
error statistics and confidence intervals, one or two bad outliers in a small data set
can badly skew the results. When outliers are found, two questions should be asked:
(i) are they merely "flukes" of some kind (e.g., data entry errors, or the result of
exceptional conditions that are not expected to recur), or do they represent a real
effect that you might wish to include in your model; and

(ii) how much have the coefficients, error statistics, predictions, etc., been
affected?

An outlier may or may not have a dramatic effect on a model, depending on the
amount of "leverage" that it has. Its leverage depends on the values of the
independent variables at the point where it occurred: if the independent variables
were all relatively close to their mean values, then the outlier has little leverage and
will mainly affect the value of the estimated CONSTANT term and the SEE. However, if
one or more of the independent variables had relatively extreme values at that point,
the outlier may have a large influence on the estimates of the corresponding
coefficients: e.g., it may cause an otherwise insignificant variable to appear significant,
or vice versa.
The best way to determine how much leverage an outlier (or group of outliers) has is
to exclude it from fitting the model and compare the results with those originally
obtained.
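The comparison suggested above can be sketched with a simple least-squares line fitted with and without a suspected high-leverage point; the data are illustrative, and the helper function is hypothetical, not from the text.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return slope, my - slope * mx  # (slope, intercept)

# The last point is extreme in x, far from the mean of the independent variable,
# so it carries high leverage.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (10, 40.0)]

print(fit_line(points))       # coefficients with the outlier included
print(fit_line(points[:-1]))  # slope/intercept shift sharply once it is excluded
```

The slope roughly doubles when the high-leverage point is included, illustrating how a single extreme observation can dominate the coefficient estimates in a small data set.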
µ = Population Mean
α = Significance Level
d.f. = Degrees of Freedom [n-1]
