Syllabus
• UNIT-I: Probability Distributions
• The manager of a departmental store informs that the probability that a customer
who is just browsing will eventually buy some items is 0.4. During the pre-lunch
session on a day, 7 customers are seen to browse in the department. Find:
• We know: P(0) = 0.0280, P(1) = 0.1306, P(2) = 0.2613, P(3) = 0.2903, P(4) =
0.1935, P(5) = 0.0774, P(6) = 0.0172, P(7) = 0.0016
Expected value and standard deviation
Expected value = n*p
Variance = n*p*(1-p)
Standard deviation = sqrt(n*p*(1-p))
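As a check on the browsing-customer example (n = 7, p = 0.4), the short Python sketch below recomputes the binomial probabilities and the mean, variance and standard deviation. The function name `binomial_pmf` is only an illustrative choice, not something from the slides.

```python
from math import comb, sqrt

n, p = 7, 0.4  # 7 browsing customers, each buys with probability 0.4


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(probabilities):
    print(f"P({x}) = {prob:.4f}")

mean = n * p                    # expected value
variance = n * p * (1 - p)      # variance
std_dev = sqrt(variance)        # standard deviation
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {std_dev:.4f}")
```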
Poisson distribution
• In many cases, the number of trials “n” and probability “p” are not given
• There are many discrete phenomena that are represented by a Poisson process.
• A Poisson distribution is said to exist when we can observe discrete events in some
area of interest (which may be a continuous interval of time, space, length, etc.)
• The only condition for Poisson distribution is that the expected number of successes (or events) must be
known, which is represented by λ. [Example: Average customers visiting a site is 5 per minute]
• If we know λ, we can find the probabilities of getting exactly 0 successes in future, 1 success, 2, 3, 4, 5,
and so on up to infinity [Example: Probability of exactly 0 customers per minute, 1 customer per minute,
exactly 2 customers per minute, 3 customers, and so on]
• So, x is the random variable denoting number of successes or number of arrivals, x = {0, 1, 2, 3, 4, ..., ∞}
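The probabilities described above can be sketched in Python using the standard Poisson formula P(x) = e^(-λ) λ^x / x!. The formula itself does not appear in the text as extracted, so treat it as the usual textbook form; the rate of 5 customers per minute is the example from the slide, and `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with expected count lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 5  # average of 5 customers per minute (example from the slide)
for x in range(6):
    print(f"P(exactly {x} customers in a minute) = {poisson_pmf(x, lam):.4f}")
```

Because x has no upper limit, the probabilities only sum to 1 over all of x = 0, 1, 2, ...; in practice the terms become negligible quickly.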
• (a) The minimum number of successes in a Poisson distribution is zero while there is no upper limit.
• (b) In calculating probabilities, the value of λ should be defined carefully. To illustrate, it is given
that, on an average, 12 accidents occur in a quarter of a year on a certain crossing. In this case, for
calculating probabilities,
(ii) A certain number of accidents to occur over a two-month period, we should take λ = 8.
(iii) A certain number of accidents to occur over a three-month period, we should take λ = 12.
(iv) A certain number of accidents to occur over a one-and-a-half month period, we should take λ = 6.
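The rescaling of the expected count to the period of interest can be sketched as follows. The helper `scaled_lambda` is a hypothetical name; it simply applies the proportionality used in the accident illustration (12 accidents per 3 months), and the probability of zero accidents is shown only as one possible use.

```python
from math import exp

ACCIDENTS_PER_QUARTER = 12   # given average for a quarter of a year
MONTHS_PER_QUARTER = 3


def scaled_lambda(months):
    """Expected number of accidents over a period of the given length in months."""
    return ACCIDENTS_PER_QUARTER * months / MONTHS_PER_QUARTER


for months in (2, 3, 1.5):
    lam = scaled_lambda(months)
    # P(0 accidents) = e^(-lam) for a Poisson variable with mean lam
    print(f"{months} months: expected count = {lam:g}, P(no accidents) = {exp(-lam):.6f}")
```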
EXAMPLE:
• If, on an average, 2 customers arrive at a shopping mall per minute, what is the
probability that
• There are several phenomena which seem to follow this distribution very closely or
can be approximated by it
• When the data is very large, then in most of the cases, the random variable follows
the normal distribution
• where, e = 2.7183
• μ = expected value
• σ = standard deviation
[standard deviation = square root of the variance]
• x = a particular value of the random variable,
• y(x) = density for x
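The symbols listed above belong to the normal density, whose standard form is y(x) = [1 / (σ √(2π))] e^(-(x - μ)² / (2σ²)). The formula is not shown in the text as extracted, so the sketch below uses the usual textbook form; `normal_density` is an illustrative name.

```python
from math import exp, pi, sqrt


def normal_density(x, mu, sigma):
    """Density y(x) of a normal distribution with mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma**2)
    return exp(exponent) / (sigma * sqrt(2 * pi))


# Density of the standard normal curve (mu = 0, sigma = 1) at a few points
for x in (-1.96, 0.0, 1.96):
    print(f"y({x}) = {normal_density(x, 0.0, 1.0):.4f}")
```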
• Samples are taken and analysed not just for their sake but to learn about the
populations from which they are drawn.
• Economic: Sampling is done mainly for the economic reasons as it may be too
expensive or too time-consuming to attempt either a complete or a nearly
complete coverage in a statistical study.
• Destructive nature of tests: Where testing destroys the elements being
examined, only a sample can be tested
• Very large populations: When the population in question is very large in size or
is infinite, then sampling is the only choice
Types of Sampling
1. Simple Random Sampling
2. Systematic Sampling
• Then, the ratio of the population size, N, to the sample size, n, is calculated and
represented by k. Thus, k = N/n.
• Note that only the integer part of k is considered here, ignoring the fractional part, if any.
• After this, an element is chosen randomly from the first k elements. This is the first
element selected in the sample.
• It is followed by choosing every kth element from the element chosen, for inclusion in
the sample.
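The selection steps above can be sketched in Python as follows. The function name and the population of 100 numbered elements are assumptions for illustration; the sketch stops after n elements, which always fits because k is the integer part of N/n.

```python
import random


def systematic_sample(population, n, rng=random):
    """Pick a random start among the first k elements, then every k-th element after it."""
    N = len(population)
    k = N // n                    # integer part of N/n
    start = rng.randrange(k)      # random position within the first k elements
    return [population[start + i * k] for i in range(n)]


population = list(range(1, 101))  # hypothetical population of 100 numbered elements
sample = systematic_sample(population, 10, random.Random(1))
print(sample)
```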
3. Stratified Sampling
• In stratified sampling, the N elements of the population are first sub-divided into distinct and
mutually exclusive sub-populations, also called strata, according to some common
characteristic.
• For example, the employees of a large company can be divided by their rank, gender,
department, and so forth.
• After a population is divided into appropriate strata, a simple random sample is taken within
each stratum.
• Stratified sampling is more efficient than simple random sampling or systematic sampling
because such sampling ensures representation of individuals or items across the total population.
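A minimal sketch of stratified sampling, assuming a hypothetical list of employees with a rank attribute: the records are first grouped into strata, and a simple random sample is then drawn within each stratum. All names and data here are made up for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, per_stratum, rng=random):
    """Group records into strata, then draw a simple random sample from each stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }


# Hypothetical employees as (name, rank) pairs
employees = [
    ("Asha", "Manager"), ("Ravi", "Manager"), ("Meera", "Manager"),
    ("Kiran", "Clerk"), ("Sunil", "Clerk"), ("Divya", "Clerk"), ("Arjun", "Clerk"),
]
sample = stratified_sample(employees, stratum_of=lambda e: e[1],
                           per_stratum=2, rng=random.Random(0))
print(sample)
```

Taking the same number from each stratum is only one design choice; sampling in proportion to stratum size is another common option.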
4. Cluster Sampling
• In this type of sampling, the investigator or the investigator's team has the freedom to
choose whomever they find convenient.
• For example, sample mean and standard deviation are represented by x̄ and s
respectively, while the population parameters are represented by μ and σ.
• Since a sample is only a part of the population, we do not expect a statistic value
to match exactly the corresponding parameter, except only by chance.
• Such an error is likely to occur due to the fact that a sample is only a subset of
the population.
• However, this is not the only reason for errors. There are other reasons
that also cause errors.
• The sampling errors arise only for the reason of sampling and result from
the chance selection of sampling units
• This type of error occurs simply because only a part of the population is
observed and is expected to disappear when a census study is undertaken.
• They may arise because of bias, vague definitions used in the data
collection, defective methods of data collection, incomplete coverage of the
population, wrong entries made in the questionnaire, etc.
(a) Take all possible samples of size n from a population of size N, having mean μ and standard
deviation σ.
(c) Tabulate the mean values and calculate the relative frequency of each value of mean by
dividing the frequency with which it appears by the total frequency (equal to the number of
samples). The relative frequency of each value indicates its probability.
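The construction of a sampling distribution of the mean can be sketched for a tiny population. The population values and the choice of sampling without replacement are assumptions made for illustration; exact fractions are used so the relative frequencies are not blurred by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]   # hypothetical population, N = 4
n = 2                       # sample size

# All possible samples of size n (assumed: drawn without replacement)
samples = list(combinations(population, n))

# Mean of each sample
means = [Fraction(sum(s), n) for s in samples]

# Relative frequency of each mean value = frequency / number of samples
frequency = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(frequency.items())}

for m, prob in distribution.items():
    print(f"sample mean = {m}, probability = {prob}")

expected_mean = sum(m * prob for m, prob in distribution.items())
print("expected value of the sample mean:", expected_mean)
```

For this population the expected value of the sample mean equals the population mean of 5.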
Example
• Central Limit Theorem (CLT): If random samples of size n are drawn from any
population with mean μ and standard deviation σ, and if n is sufficiently large, then
the distribution of possible mean values will be approximately normal with expected
value μ, and standard error σ/√n, regardless of the population distribution.
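A small simulation can illustrate the theorem. The exponential population (mean 1, standard deviation 1), the sample size of 50, and the number of repetitions are all assumptions chosen for the sketch; the point is only that the sample means centre on μ with spread close to σ/√n even though the population is far from normal.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(42)      # fixed seed so the run is repeatable
n = 50                       # size of each sample
num_samples = 5000           # number of samples drawn

# Exponential population with rate 1: mu = 1 and sigma = 1 (clearly non-normal)
mu, sigma = 1.0, 1.0

sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

observed_mean = mean(sample_means)
observed_se = pstdev(sample_means)
print(f"mean of sample means = {observed_mean:.4f} (mu = {mu})")
print(f"sd of sample means   = {observed_se:.4f} (sigma/sqrt(n) = {sigma / sqrt(n):.4f})")
```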
2. When population is not normally distributed but sample size n is large enough.
• We can use normal area table to calculate probabilities involving the sample
mean.
Example
(a) A random sample of 10 batteries will have a mean life of 412 hours or
greater.
(b) A sample of 100 batteries selected randomly will have a mean life of at least
412 hours.
Example
• Since the number of successes involved here is a discrete variable, we need to apply a
continuity correction factor (CCF) of +0.5 or -0.5.
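The effect of the continuity correction can be sketched by comparing the normal approximation with the exact binomial probability. The figures n = 100, p = 0.4 and the event "at least 45 successes" are hypothetical, chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.4          # hypothetical number of trials and success probability
threshold = 45           # event of interest: X >= 45 successes

mu = n * p
sigma = sqrt(n * p * (1 - p))

# Exact binomial probability P(X >= 45)
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(threshold, n + 1))

std_normal = NormalDist()
# "At least 45" starts at 44.5 on the continuous scale, so the CCF here is -0.5
with_ccf = 1 - std_normal.cdf((threshold - 0.5 - mu) / sigma)
without_ccf = 1 - std_normal.cdf((threshold - mu) / sigma)

print(f"exact = {exact:.4f}, with CCF = {with_ccf:.4f}, without CCF = {without_ccf:.4f}")
```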
Sampling Distribution of Number of Successes
• It, thus, involves using sample statistics to predict the values of the
population parameters.
Theory of Estimation
• Using a point estimate, we might say that average height of Pune citizens is 160 cm.
• An interval estimate would say that there is 95% probability that average height of
Pune citizens falls in the interval 150 cm to 170 cm
Point Estimates
• When a point estimate is found, the sample value or statistic used is called
an estimator and the specific number obtained is called an estimate.
• So, interval estimate is obtained by adding and subtracting some quantity to the point estimate.
• The interval coefficient or confidence coefficient depends on the level of confidence and the shape of the sampling
distribution.
• A level of confidence equal to 95 percent means that the probability is 0.95 that the parameter value being estimated is
contained, and 0.05 that it is not contained, within the interval we obtain.
• A level of confidence is designated as 1 - α. Hence, if 95 percent confidence level is required, then α = 0.05.
• α is the probability of error indicating that the parameter will not be included in the interval estimate.
CONFIDENCE INTERVAL FOR POPULATION MEAN OF LARGE SAMPLES:
When the Population Standard Deviation is Known
• If the sampling distribution is normally distributed, an interval estimate of population mean may
be constructed as follows: x̄ ± z·σ/√n, where z is the normal deviate for the chosen level of confidence.
• When the level of confidence is 95 percent, we have α = 0.05, and α/2 = 0.025. Now an area equal to
0.025 lies under the normal curve beyond z = 1.96. Thus, for 95 percent level of confidence, we
have z = 1.96.
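The z value for any level of confidence can be obtained in the same way, by finding the point beyond which an area of α/2 lies. The sketch below uses Python's standard-library `statistics.NormalDist` in place of the normal area table; `z_for_confidence` is an illustrative name.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """z such that an area of alpha/2 lies beyond it, where alpha = 1 - level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```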
• Example: A marketing research firm has contracted with an advertising agency to provide
information on the average time spent on watching TV per week by families in a city. The firm
selected a random sample of 100 families and found the mean time spent by them watching TV
to be equal to 32 hours. It is known that the standard deviation of the TV watching time is 12
hours, which has been constant over the past few years.
(b) a 99 percent confidence interval for mean time spent by families watching TV. Help the firm.
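A sketch of the computation for this example, using the interval x̄ ± z·σ/√n with n = 100, x̄ = 32 hours and σ = 12 hours. Part (a) is not shown in the text as extracted; a 95 percent level is assumed for it here.

```python
from math import sqrt
from statistics import NormalDist

n = 100          # families sampled
x_bar = 32.0     # sample mean, hours per week
sigma = 12.0     # known population standard deviation, hours

standard_error = sigma / sqrt(n)


def confidence_interval(level):
    """Interval x_bar +/- z * sigma / sqrt(n) for the given level of confidence."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin


for level in (0.95, 0.99):   # 0.95 is an assumed level for the missing part (a)
    lower, upper = confidence_interval(level)
    print(f"{level:.0%} interval: {lower:.2f} to {upper:.2f} hours")
```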
CONFIDENCE INTERVAL FOR POPULATION MEAN OF LARGE SAMPLES:
When the Population Standard Deviation is not Known
• In most of the business applications, neither population mean is known, nor is the population standard
deviation.
• To make interval estimates in such cases, we use sample standard deviation, s, as an
estimator of the population standard deviation, where s = √(Σ(x - x̄)²/(n - 1)).
• Thus, if the sample size is large, the confidence interval estimate of the population mean is approximated by the
following expression: x̄ ± z·s/√n
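A minimal sketch of the large-sample interval when σ is unknown, assuming the usual form x̄ ± z·s/√n with s computed using the n - 1 divisor. The data are simulated and purely hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(7)
data = [rng.gauss(50, 8) for _ in range(40)]   # hypothetical large sample (n = 40)

n = len(data)
x_bar = mean(data)
s = stdev(data)                    # sample standard deviation (divisor n - 1)

z = NormalDist().inv_cdf(0.975)    # 95 percent level of confidence
margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```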
• This can be interpreted as: there is (1-α) probability that the value of sample mean will
provide a sampling error of z·σ/√n or less
• Given the extent of error acceptable (E), the level of confidence (1-α), and the population
standard deviation (σ), we can determine the sample size required.
• Oil India Corporation has a bottle-filling machine which can be adjusted to fill oil
to any given average level, but it fills oil with a standard deviation of 0.010 litres.
The machine has recently been reset to a new filling level and the manager wants
an estimate of the mean amount of fill to be within ±0.001 litres with a 99 percent
level of confidence. How many bottles should the manager sample?
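A sketch of the sample-size calculation for the bottle-filling problem. It assumes the standard relation E = z·σ/√n, which rearranges to n = (z·σ/E)², rounded up to a whole bottle; `required_sample_size` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, error, level):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the acceptable error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sigma / error) ** 2)


# sigma = 0.010 litres, acceptable error E = 0.001 litres, 99 percent confidence
n_required = required_sample_size(sigma=0.010, error=0.001, level=0.99)
print("bottles to sample:", n_required)
```

With the unrounded z of about 2.576 this gives 664 bottles; using the table value z = 2.58 instead gives (25.8)² = 665.64, that is, 666 bottles.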
CONFIDENCE INTERVAL FOR POPULATION MEAN: SMALL SAMPLES
When the population is normally distributed and the population
standard deviation is known
If the population is normally distributed and the population standard deviation is known, then the
sampling distribution of the mean is also normally distributed irrespective of the size of the sample
involved.
This implies that even if the sample is small in size, we can use the z-table to obtain the interval
estimate in the same manner as in the case of large samples using the interval: x̄ ± z·σ/√n
Example:
The solution to the problem of small-sample interval estimation lies with a statistic called 't' rather
than with z
• For a sample of size n, the number of degrees of freedom is one less than the sample size, that is,
v = n - 1
• As in the case of z, the value of t depends upon the level of confidence. In addition, the t-value is
dependent on the degrees of freedom.
• The chance of μ not being included in the confidence interval is represented by α and, therefore,
reference should be made to the column head represented by α if the table gives areas in both tails of the
distribution and α /2 if the table provides areas in one tail of it.
• Values under area (α) = 0.05 in a two-tail table would match with the values under area (α /2) = 0.025 in
the case of one-tail table.
• If the level of confidence is given to be 90 percent, we focus on the column headed 0.10 in the case of a
two-tail table and 0.05 in the case of a one-tail table.
• Another element relevant in consulting the t-table is the number of degrees of freedom, v, which is
equal to n-1
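The rule for choosing the column and row of the t-table can be sketched as below. The function names are illustrative, and `T_EXCERPT` holds only a few standard t-table values (for 9 and 10 degrees of freedom), included as an assumed excerpt rather than the full table.

```python
def t_table_column(confidence_percent, two_tail_table):
    """Column heading to consult: alpha for a two-tail table, alpha/2 for a one-tail table."""
    alpha_percent = 100 - confidence_percent
    return alpha_percent / 100 if two_tail_table else alpha_percent / 200


# Small excerpt of standard t values, keyed by (degrees of freedom, confidence percent)
T_EXCERPT = {
    (9, 90): 1.833, (9, 95): 2.262, (9, 99): 3.250,
    (10, 90): 1.812, (10, 95): 2.228, (10, 99): 3.169,
}


def t_value(confidence_percent, n):
    """t for a sample of size n, using v = n - 1 degrees of freedom."""
    v = n - 1
    return T_EXCERPT[(v, confidence_percent)]


print(t_table_column(90, two_tail_table=True))    # column headed 0.10
print(t_table_column(90, two_tail_table=False))   # column headed 0.05
print(t_value(95, n=10))                          # v = 9 degrees of freedom
```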
t-table
Example: