Normal Distribution
Standard Normal Distribution
Other Continuous Distributions
Sampling Distribution
Sample:
A subset of a population.
Random Variable:
A random variable is a function that maps the simple events (outcomes) of a sample space to real numbers.
Sample Statistic:
A function of sample values is called a statistic.
Example -
For the set of observations 12, 15, 11, 11, 7, 13:
Mean = 69/6 = 11.5, Median = (11 + 12)/2 = 11.5, Mode = 11.
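These sample statistics can be checked with Python's standard statistics module; a minimal sketch:

```python
import statistics

# Sample observations from the example above
data = [12, 15, 11, 11, 7, 13]

mean = statistics.mean(data)      # 69 / 6 = 11.5
median = statistics.median(data)  # average of the two middle sorted values: (11 + 12) / 2 = 11.5
mode = statistics.mode(data)      # most frequent value: 11

print(mean, median, mode)  # 11.5 11.5 11
```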
SAS command:
data Rand;
  do i = 1 to 500;
    /* Gamma random number with shape parameter 4, rescaled by 1/0.5 */
    Gamma_RN = rangam(int(datetime()), 4) / 0.5;
    /* Standard normal random number */
    Normal_RN = rannor(int(datetime()));
    output;
  end;
run;
A rule that assigns probabilities to the values of a random variable is called the probability function or probability mass function (pmf). It is denoted by
f(x) = P(X = x)
This is also called the probability distribution of the discrete random variable. The probability at a point X = x corresponds to the event A such that
A = {s ∈ S : X(s) = x}
Example:
We toss three coins and observe the “number of heads” visible. The random variable X is the number of heads
observed and may take on integer values 0 to 3.
Sample space = {TTT, HHT, HTH, THH, HTT, THT, TTH, HHH}
Probability Distribution
X    f(x) = P(X = x)
0    1/8
1    3/8
2    3/8
3    1/8
Properties are satisfied:
1. 0 ≤ P(xi) ≤ 1 for each xi
2. Σ P(x) = 1, summed over all x
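The pmf of the coin-toss example can be verified by enumerating the sample space directly; a sketch in Python:

```python
from itertools import product
from fractions import Fraction

# All 8 equally likely outcomes of three coin tosses
sample_space = list(product("HT", repeat=3))

# pmf of X = number of heads observed
pmf = {x: Fraction(sum(1 for s in sample_space if s.count("H") == x),
                   len(sample_space))
       for x in range(4)}

print(pmf)  # {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
assert sum(pmf.values()) == 1  # property 2: probabilities sum to 1
```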
Figure 3: Density curve over x from 150 to 210, with the points X = a and X = b marked.
The graph of the probability density function is called the density curve.
• When a pmf is constant on the space, we say that the distribution is uniform over that space.
• Example: If X has a discrete uniform distribution on S = {1, 2, 3, 4, 5, 6}, its pmf is f(x) = 1/6 for x = 1, 2, …, 6.
• We can generalize this by letting X have a discrete uniform distribution over the first m positive integers, so that its pmf is
f(x) = 1/m, x = 1, 2, 3, …, m
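A short sketch of this pmf, checking its basic properties for the fair-die case m = 6:

```python
from fractions import Fraction

def uniform_pmf(x, m):
    """pmf of a discrete uniform distribution on {1, 2, ..., m}."""
    return Fraction(1, m) if 1 <= x <= m else Fraction(0)

# Fair die: m = 6
pmf = [uniform_pmf(x, 6) for x in range(1, 7)]
assert sum(pmf) == 1  # probabilities sum to 1

# Expected value of the discrete uniform distribution is (m + 1) / 2
mean = sum(x * uniform_pmf(x, 6) for x in range(1, 7))
print(mean)  # 7/2
```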
Figure: Probability mass function of the binomial distribution with parameters n = 10 and p = 0.5, plotting P(X) against X = 0, 1, …, 10.
Figure: PMF P(X = x) and CDF F(x) of the geometric distribution with probability of head 0.7, plotted against the trial number at which the first tail appears (x = 1, 2, …, 20).
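Both distributions can be computed numerically with only the standard library; a sketch (here the "success" of the geometric distribution is a tail, so its probability is 1 − 0.7 = 0.3):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k), k = 1, 2, ...
    With P(head) = 0.7, a 'success' is a tail, so p = 0.3."""
    return (1 - p)**(k - 1) * p

# Binomial(n = 10, p = 0.5): symmetric, peaking at k = 5
binom = [binomial_pmf(k, 10, 0.5) for k in range(11)]
assert abs(sum(binom) - 1.0) < 1e-12

# Geometric with p = 0.3: CDF is F(k) = 1 - (1 - p)^k
geom_pmf = [geometric_pmf(k, 0.3) for k in range(1, 21)]
geom_cdf = [1 - 0.7**k for k in range(1, 21)]
assert abs(sum(geom_pmf) - geom_cdf[-1]) < 1e-12
```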
Figure 1: Relative frequency distribution of the measured variable, plotted over x from 166 to 200.
Suppose we measured the variable more accurately, then for a larger sample from the population, we would
get a relative frequency distribution as shown below.
Figure 2: Larger sample and more accurate measurement; relative frequency (y) plotted over x from 160 to 200.
Two important areas under the normal curve are emphasized below:
• Approximately 95% of the data lie within two standard deviations of the mean, i.e. in the interval (µ − 2σ, µ + 2σ).
• Nearly all (99.7%) of the values lie within three standard deviations, i.e. in the interval (µ − 3σ, µ + 3σ).
Figure 5: Standard normal density curve over x from −4 to 4.
For the standard normal distribution, approximately 95% of the data lie within two units of 0, i.e. in the interval (−2, 2), and approximately all (99.7%) of the values lie within three units of 0, i.e. in the interval (−3, 3).
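These two percentages can be verified from the standard normal CDF, which the standard library expresses through the error function; a sketch:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(-2 < Z < 2) ~ 0.9545 and P(-3 < Z < 3) ~ 0.9973
within_2 = std_normal_cdf(2) - std_normal_cdf(-2)
within_3 = std_normal_cdf(3) - std_normal_cdf(-3)
print(round(within_2, 4), round(within_3, 4))  # 0.9545 0.9973
```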
The distribution of values of a sample statistic obtained from repeated samples, all of the same size and all
drawn from the same population is called the sampling distribution of the statistic. The sampling distribution
is nothing but the probability distribution of the statistic.
Example:
A box contains 10 counters; each counter is yellow, green, red, or blue. Suppose there are 2 yellow, 2 green, 3 red, and 3 blue counters in the box. Each color is associated with a "score" written on the counter. A person is asked to draw 2 counters from this box. Determine the sampling distribution of the mean score.
All the possible ways in which two counters can be drawn:
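Since the counter scores themselves are not listed above, a sketch with hypothetical scores (yellow = 1, green = 2, red = 3, blue = 4) shows how the sampling distribution of the mean is enumerated:

```python
from itertools import combinations
from fractions import Fraction
from collections import Counter

# Hypothetical scores, one entry per counter:
# yellow = 1 (x2), green = 2 (x2), red = 3 (x3), blue = 4 (x3)
counters = [1, 1, 2, 2, 3, 3, 3, 4, 4, 4]

# All C(10, 2) = 45 equally likely draws of two counters
means = [Fraction(a + b, 2) for a, b in combinations(counters, 2)]

# Sampling distribution: probability of each possible mean score
sampling_dist = {m: Fraction(c, len(means)) for m, c in Counter(means).items()}
assert sum(sampling_dist.values()) == 1
```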
There can be infinitely many functions of sample values, i.e. statistics, that may be proposed as estimators. The best estimator is one whose value falls nearest to the true value of the parameter.
We can estimate in two ways:
Point estimation: The estimate is a single value, or point.
Interval estimation: It gives a range of values or an interval in which the true value of the parameter may
be expected to lie with some definite probability or degree of confidence. This interval is called a
Confidence Interval.
Level of Confidence:
The probability that the sample yields an interval that includes the parameter being estimated is the level of
confidence for that interval. It is denoted by (1-α) where α is the probability that the parameter lies outside the
confidence interval. It is often expressed as a percentage.
e.g. for α = 0.05, the confidence coefficient = 100(1 − 0.05)% = 95%.
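A sketch of a 95% confidence interval for a population mean with known σ; the numbers below are illustrative, not from the text:

```python
from math import sqrt

def mean_ci(xbar, sigma, n, z=1.96):
    """Confidence interval xbar +/- z * sigma / sqrt(n) for the mean,
    assuming the population standard deviation sigma is known.
    z = 1.96 corresponds to a 95% level of confidence (alpha = 0.05)."""
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Illustrative values: sample mean 22000, sigma 1500, n = 36
lo, hi = mean_ci(xbar=22000, sigma=1500, n=36)
print(round(lo), round(hi))  # 21510 22490
```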
Hypothesis testing: the process by which an engineer decides, on the basis of sample data, whether the true average lifetime of a certain kind of tire is 22000 miles. The test statistics follow some distribution. Methods used:
1. Classical approach
2. P-value approach
© ExlService Holdings, Inc. 2006-2011 Confidential 22
Type 1 and Type 2 error
If we observe that in the sample, average lifetime of the tire is 22000 then we do not have sufficient evidence
to reject the null hypothesis.
But since we are not sure about the population, we might be committing an error: it may be that µ ≠ 22000 for the population, and the observed average held only for that particular sample.
Since it appears reasonable to accept the null hypothesis when our estimate of µ is close to 22000 and to
reject it when our estimate is much larger or much smaller than 22000, it would be logical to let the critical
region consist of both tails of the sampling distribution of our test statistic. Such a test is referred to as a two-
tailed test.
Instead of comparing the observed value of X̄ with the boundary of the critical region or with the value of zα, we compare the shaded region of the figure above with α. In other words, we reject the null hypothesis if the shaded area is less than or equal to α. This shaded area is referred to as the P-value, or the observed level of significance, corresponding to x̄, the observed value of X̄. In fact, it is the probability P(X̄ ≥ x̄) when the null hypothesis is true.
Figure: Standard normal curve over x from −4 to 4, with the P-value shown as a shaded tail area.
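The P-value computation for a two-tailed z-test can be sketched as follows; the sample values are illustrative, not from the text:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_tailed_p_value(xbar, mu0, sigma, n):
    """P-value for H0: mu = mu0 against a two-tailed alternative,
    with known sigma and sample mean xbar from a sample of size n."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - std_normal_cdf(abs(z)))

# Illustrative numbers: xbar = 21500, sigma = 1500, n = 36
p = two_tailed_p_value(21500, 22000, 1500, 36)
# z = -2.0, p ~ 0.0455: reject H0 at alpha = 0.05 since p <= alpha
```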
Is ‘n’ large?
Samples of size n > 30 may be considered large enough for the central limit theorem to hold if the sample data are unimodal, nearly symmetrical, short-tailed, and free of outliers. Distributions that are not symmetrical require larger sample sizes, with n ≈ 50 usually sufficing except for extremely skewed samples.
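The rule of thumb above can be checked with a small simulation; a sketch using a right-skewed (exponential) population:

```python
import random
import statistics

random.seed(1)

# A right-skewed population: exponential with true mean 1.0.
# The CLT says means of samples of size n = 30 should cluster
# approximately normally around the population mean.
sample_means = [statistics.mean(random.expovariate(1.0) for _ in range(30))
                for _ in range(2000)]

# The sampling distribution centers on the population mean...
assert abs(statistics.mean(sample_means) - 1.0) < 0.05
# ...with spread close to sigma / sqrt(n) = 1 / sqrt(30), about 0.18
print(round(statistics.stdev(sample_means), 2))
```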