Distributions and Sampling - Wednesday

Probability Distributions
Syllabus
• UNIT-I: Probability Distributions
• UNIT-II: Sampling & Sampling Distribution, Estimation and Confidence Interval
• UNIT-III: Hypothesis Testing for Means & Proportions
• UNIT-IV: Chi-Square Test, Non-Parametric Test & ANOVA
• UNIT-V: Correlation and Regression Analysis

Binomial Distribution
A binomial distribution has the following essential properties:
1. The experiment consists of “n” trials.
2. All the trials are independent of each other.
3. Each trial has only 2 outcomes. One of them is denoted as “success” and the other as “failure”.
The probability of “success” is “p” and the probability of “failure” is “(1-p)”; both are the same for every trial.
4. The random variable “x” is the number of successes obtained in the “n” trials.
So, x = {0, 1, 2, 3, 4, …, n}; if n = 3 then x = {0, 1, 2, 3}:
x = 0 means exactly 0 successes in 3 trials, x = 1 means exactly 1 success in 3 trials,
x = 2 means exactly 2 successes in 3 trials, x = 3 means exactly 3 successes in 3 trials.
5. Then, the probability of “x” successes, denoted by P(x), is given as:
$$P(x) = \binom{n}{x}\, p^{x} (1-p)^{n-x}$$
where $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$, in which n! = n*(n-1)*(n-2)*…*1. Also, 0! = 1.
Example
A coin is tossed 3 times. Assume Head to be success. Calculate:
• a. Probability of getting no successes
• b. Probability of getting exactly 1 success
• c. Probability of getting exactly 2 successes
• d. Probability of getting exactly 3 successes
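The coin example can be checked with a few lines of Python. This is a minimal sketch that assumes a fair coin (p = 0.5), which the example does not state explicitly; the helper name `binomial_pmf` is illustrative only.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Assumption: the coin is fair, so p = 0.5; n = 3 tosses as in the example.
coin_probs = [binomial_pmf(x, 3, 0.5) for x in range(4)]

for x, prob in enumerate(coin_probs):
    print(f"P({x}) = {prob}")
# Prints 0.125, 0.375, 0.375 and 0.125 for x = 0, 1, 2, 3.
```

The four probabilities add up to 1, as they must for a complete distribution.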

Example
• The manager of a departmental store informs that the probability that a
customer who is just browsing will eventually buy some items is 0.4.
During the pre-lunch session on a day, 7 customers are seen to browse in
the department. Using this information, answer the following questions:
(a) What is the probability of 0, 1, 2, ..., 7 customers buying some items?
(b) What is the probability that at least 4 customers will buy
something?
(c) What is the probability that none of the browsing customers will buy
anything?
(d) What is the probability that not more than 4 customers will
buy something?
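One way to work through parts (a) to (d) is to build the whole binomial distribution for n = 7 and p = 0.4 and then add up the relevant terms. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from math import comb

n, p = 7, 0.4  # 7 browsing customers, each buys with probability 0.4

# (a) P(x) for x = 0, 1, ..., 7
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
for x, prob in enumerate(pmf):
    print(f"P({x}) = {prob:.4f}")

# (b) at least 4 buyers: P(4) + P(5) + P(6) + P(7), about 0.2898
p_at_least_4 = sum(pmf[4:])

# (c) no buyers: P(0), about 0.0280
p_none = pmf[0]

# (d) not more than 4 buyers: P(0) + ... + P(4), about 0.9037
p_at_most_4 = sum(pmf[:5])

print(p_at_least_4, p_none, p_at_most_4)
```

Note that "at least 4" and "not more than 4" both include x = 4, so the two answers do not add up to 1.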
Example
• The manager of a departmental store informs that the probability that a customer
who is just browsing will eventually buy some items is 0.4. During the pre-lunch
session on a day, 7 customers are seen to browse in the department. Find:
• A. Expected number of customers buying some items.
• B. Variance of the number of customers buying some items.
• We know: P(0) = 0.0280, P(1) = 0.1306, P(2) = 0.2613, P(3) = 0.2903, P(4) =
0.1935, P(5) = 0.0774, P(6) = 0.0172, P(7) = 0.0016
Expected value and standard deviation
• We have already discussed the calculation of expected value and
standard deviation of a probability distribution.
• It can be shown mathematically that in the case of binomial
distribution, the formulae for these measures simplify to the following:
Expected value or mean E(x) = n*p
Variance = n*p*(1-p)
Standard deviation = √(n*p*(1-p))
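As a check on these shortcuts, the sketch below computes the mean and variance of the browsing-customer example (n = 7, p = 0.4) directly from the definitions E(x) = Σ x·P(x) and Var(x) = Σ (x − E(x))²·P(x), and compares them with n*p and n*p*(1-p). Variable names are illustrative.

```python
from math import comb, sqrt

n, p = 7, 0.4
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Mean and variance from the general definitions
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
std_dev = sqrt(variance)

# Binomial shortcuts
shortcut_mean = n * p                 # 2.8 customers
shortcut_variance = n * p * (1 - p)   # 1.68

print(mean, shortcut_mean)
print(variance, shortcut_variance)
print(std_dev)
```

Both routes agree up to floating-point rounding, so for a binomial variable the shortcut formulas are all that is needed.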
Poisson distribution
• In many cases, the number of trials “n” and the probability “p” are not given.
• If we are interested in:
• the number of events occurring in a future interval, and
• the probability of a given number of events occurring in that interval,
then we can use the Poisson distribution.

Examples
• Number of customer arrivals per minute at a retail store
• Number of visitors in one hour on an e-commerce website
• The number of telephone calls received per minute
• The number of arrivals at a bank teller booth in every 5-minute period
• Number of accidents per day

• The Poisson probability distribution, named after the French mathematician S.D.
Poisson, is another discrete probability distribution with important practical
applications.
• There are many discrete phenomena that are represented by a Poisson process.
• A Poisson distribution is said to exist when we can observe discrete events in some
area of interest (which may be a continuous interval of time, space, length, etc.)
• The only requirement for using the Poisson distribution is that the expected number of successes (or events)
must be known; it is represented by λ. [Example: the average number of customers visiting a site is 5 per minute]
• If we know λ, we can find the probabilities of getting exactly 0 successes, 1 success, 2, 3, 4, 5, …
and so on without upper limit. [Example: probability of exactly 0 customers per minute, 1 customer per minute,
2 customers per minute, 3 customers per minute, …]
• So, x is the random variable denoting the number of successes or number of arrivals: x = {0, 1, 2, 3, 4, …}
• Then, the formula for the probability of exactly x successes is given by
$$P(x) = \dfrac{e^{-\lambda}\, \lambda^{x}}{x!}$$
• Here, e = 2.7183 (a mathematical constant)
• λ = the expected number of successes (the mean)
• x = number of successes and P(x) = the probability of x successes

The following points may be noted:
• (a) The minimum number of successes in a Poisson distribution is zero while there is no upper limit.
• (b) In calculating probabilities, the value of λ should be defined carefully. To illustrate, it is given
that, on an average, 12 accidents occur in a quarter of a year on a certain crossing. In this case, for
calculating the probability of
(i) a certain number of accidents occurring over a month, we should take λ = 4;
(ii) a certain number of accidents occurring over a two-month period, we should take λ = 8;
(iii) a certain number of accidents occurring over a three-month period, we should take λ = 12;
(iv) a certain number of accidents occurring over a one-and-a-half-month period, we should take λ = 6.
EXAMPLE:
• If, on an average, 2 customers arrive at a shopping mall per minute, what is the
probability that
(a) In a given minute, exactly 3 customers will arrive?
(b) In a given minute, exactly 4 customers will arrive?
(c) In a given minute, no customer will arrive?
(d) In a given minute, more than 2 customers will arrive?
(e) In a 5-minute period, exactly 10 customers will arrive?
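The shopping-mall questions can be worked through with the Poisson formula P(x) = e^(−λ)·λ^x / x!. The sketch below is one possible way to do it in Python; `poisson_pmf` is an illustrative helper, not a library function. For part (e) the mean is rescaled to the 5-minute interval, so λ = 2 × 5 = 10.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number is lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2  # average arrivals per minute

p_a = poisson_pmf(3, lam)    # exactly 3 in a minute, about 0.1804
p_b = poisson_pmf(4, lam)    # exactly 4 in a minute, about 0.0902
p_c = poisson_pmf(0, lam)    # no arrivals in a minute, about 0.1353

# More than 2 arrivals: complement of P(0) + P(1) + P(2), about 0.3233
p_d = 1 - sum(poisson_pmf(x, lam) for x in range(3))

# 5-minute period: the mean scales with the interval, so lam = 10
p_e = poisson_pmf(10, 5 * lam)  # about 0.1251

print(p_a, p_b, p_c, p_d, p_e)
```

Part (d) uses the complement because the Poisson variable has no upper limit, so the terms for x > 2 cannot be summed directly.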

Expected Value and Standard Deviation
• The expected value (mean) and the variance of a Poisson distribution
are both numerically equal to λ. Thus, the standard deviation of a Poisson
distribution is equal to √λ (since standard deviation = √variance).
Normal Distribution
• The normal distribution is defined for a continuous random variable.
• The normal distribution holds a very significant place in statistics.
• There are several phenomena which seem to follow this distribution very closely or
can be approximated by it.
• When the number of observations is very large, many random variables are found to
follow the normal distribution approximately.
• Examples: height and weight in a large population

Normal Distribution
• A variable x has a normal distribution if its curve, as shown in
the figure, is given by the following equation:
$$y(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$
• where e = 2.7183 and π = 3.1416
• μ = expected value
• σ = standard deviation
[standard deviation = √variance]
• x = a particular value of the random variable,
• y(x) = density for x
Note: The normal distribution is a symmetric distribution with the mean
(expected value) located in the middle.
Calculation of Probabilities
• We cannot define a probability for a single point of a
continuous random variable, since there are infinitely many
points. Instead, we define probability for an interval.
• The probability that the variable will lie
in a given range, say between x1 and x2, is equal to the ratio of
the area under the normal curve between x1 and x2
to the total area under the curve.
• The total area under the curve is equal to 1, so this probability is
simply the area between x1 and x2.

Calculation of Probabilities
• z-transformation: Convert any normal random variable x into a
standardized normal variable z. This is called z-transformation. The
transformation from x to z is done by the following formula:
$$z = \dfrac{x - \mu}{\sigma}$$
• Then, use the standard z table to calculate the area under the curve.

Example
A machine fills coffee powder in packets, with an average of 200 gm and a standard deviation
of 4 gm. Assuming that the coffee weight is normally distributed, find the probability that a
coffee packet selected at random will contain the following quantity of coffee.
(a) Between 200 and 206 gm,
(b) Between 206 and 210 gm,
(c) Between 190 and 195 gm,
(d) At least 200 gm, and
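Parts (a) to (d) can be checked without a z table. The sketch below standardizes with z = (x − μ)/σ and evaluates the standard normal cumulative area through the error function in Python's `math` module; `normal_cdf` is an illustrative helper name.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    z = (x - mu) / sigma                  # z-transformation
    return 0.5 * (1 + erf(z / sqrt(2)))   # standard normal area up to z

mu, sigma = 200, 4  # mean and standard deviation of packet weight in gm

# (a) between 200 and 206 gm: z from 0 to 1.5, about 0.4332
p_a = normal_cdf(206, mu, sigma) - normal_cdf(200, mu, sigma)

# (b) between 206 and 210 gm: z from 1.5 to 2.5, about 0.0606
p_b = normal_cdf(210, mu, sigma) - normal_cdf(206, mu, sigma)

# (c) between 190 and 195 gm: z from -2.5 to -1.25, about 0.0994
p_c = normal_cdf(195, mu, sigma) - normal_cdf(190, mu, sigma)

# (d) at least 200 gm: the area to the right of the mean, 0.5
p_d = 1 - normal_cdf(200, mu, sigma)

print(p_a, p_b, p_c, p_d)
```

Each interval probability is a difference of two cumulative areas, which is the same subtraction one performs with table values.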

Sampling-Introduction
• Decision makers cannot always measure an entire population and therefore
they have to rely on the information gained by sampling from the population.
• Samples are taken and analysed not just for their sake but to learn about the
populations from which they are drawn.
• Sampling is an integral part of what is known as inferential statistics, which
allows us to draw conclusions regarding large populations from a relatively
small number of observations drawn from such populations.
Reasons For Sampling
• Economic: Sampling is done mainly for economic reasons, as it may be too
expensive or too time-consuming to attempt either a complete or a nearly
complete coverage in a statistical study.
• Destructive nature of tests: When testing destroys the
elements being examined, only a sample can be tested.
• Very large populations: When the population in question is very large in size or
is infinite, sampling is the only choice.
Types of Sampling
1. Simple Random Sampling
• A simple random sample is one in which every element of the parent
population has an equal chance of being included in the sample.
• For a population comprising N elements, the chance of a particular
element being selected in a single draw is 1/N.
2. Systematic Sampling
• Firstly, the elements of the population are sequenced randomly.
• Then, the ratio of the population size, N, to the sample size, n, is calculated and
represented by k. Thus, k = N/n.
• Note that only the integer part of k is considered here, ignoring the fractional part, if any.
• After this, an element is chosen randomly from the first k elements. This is the first
element selected in the sample.
• Every kth element after the one chosen is then included in
the sample.
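The systematic sampling steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a standard library routine: the function name `systematic_sample` and the `seed` parameter are hypothetical, and the result is truncated to n elements when N is not an exact multiple of n.

```python
import random

def systematic_sample(population, n, seed=None):
    """Draw a systematic sample of size n (illustrative sketch)."""
    rng = random.Random(seed)
    items = list(population)
    if not 0 < n <= len(items):
        raise ValueError("sample size must be between 1 and N")
    rng.shuffle(items)           # sequence the population randomly
    k = len(items) // n          # integer part of N/n
    start = rng.randrange(k)     # random pick among the first k elements
    return items[start::k][:n]   # every k-th element from the start

# Hypothetical population: 100 elements labelled 1..100, sample size 10
sample = systematic_sample(range(1, 101), 10, seed=42)
print(sample)
```

Here k = 100 // 10 = 10, so after the random start every 10th element of the shuffled sequence enters the sample.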
3. Stratified Sampling
• In stratified sampling, the N elements of the population are first sub-divided into distinct and
mutually exclusive sub-populations, also called strata, according to some common
characteristic.
• For example, the employees of a large company can be divided by their rank, gender,
department, and so forth.
• After a population is divided into appropriate strata, a simple random sample is taken within
each stratum.
• Stratified sampling is often more efficient than simple random sampling or systematic sampling
because it ensures that every stratum of the population is represented in the sample.
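A small Python sketch of stratified sampling is given below. It assumes proportional allocation (the same sampling fraction in every stratum), which is one common choice and is not prescribed by the slides; the function name, the `fraction` parameter and the employee data are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=None):
    """Simple random sample of the given fraction within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:                      # split the population into strata
        strata[stratum_of(rec)].append(rec)
    sample = []
    for members in strata.values():
        size = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, size))       # sample without replacement
    return sample

# Hypothetical company: 60 employees in Sales, 40 in HR, 10% sample
employees = [(i, "Sales") for i in range(60)] + [(i, "HR") for i in range(60, 100)]
sample = stratified_sample(employees, lambda rec: rec[1], 0.1, seed=0)
print(sample)
```

With these numbers the sample contains 6 Sales and 4 HR employees, so both departments are represented in proportion to their size.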
4. Cluster Sampling
• In cluster sampling, the elements of a population are divided into
several clusters so that each cluster is nearly representative of the
entire population.
• A random sample of clusters is then taken and all elements of each
selected cluster are studied.
5. Convenience Sampling
• Convenience sampling is a non-probability sampling procedure, involving
no restrictions.
• In this type of sampling, the investigator or their team have the freedom to
choose whomever they find convenient.
• Convenience sampling is convenient and relatively cheap to
undertake, but it does not ensure precision since the sample can be biased.
Statistics and Parameters
• In the context of sampling, it is important to understand the difference
between statistics and parameters.
• A statistic refers to a quantitative characteristic of a sample while a
parameter is a quantitative characteristic of a population.
• Example: The average of a population is a parameter whereas the average of a
sample is a statistic.
Statistics and Parameters
• The sample statistics are distinguished from parameters by using different symbolic
representations.
• For example, the sample mean and standard deviation are represented by $\bar{x}$ and s
respectively, while the corresponding population parameters are represented by μ and σ.
• Since a sample is only a part of the population, we do not expect a statistic value
to match exactly the corresponding parameter, except only by chance.
• Since different values of a statistic are possible, a statistic is a random variable
whereas a population parameter is a fixed value.
Sampling and Non Sampling Errors
• In statistics, an error refers to the difference between the value of a sample
statistic and the corresponding population parameter.
• Such an error is likely to occur due to the fact that a sample is only a subset of
the population.
• However, this is not the only source of error. Other causes
can also produce errors.
• As such, a distinction is made between sampling errors and non-sampling
errors.
Sampling Errors
• Sampling errors arise solely because a sample, rather than the whole population,
is observed; they result from the chance selection of sampling units.
• This type of error occurs simply because only a part of the population is
observed and is expected to disappear when a census study is undertaken.
• Example: Difference between a population mean and a sample mean

Non Sampling Errors
• Non-sampling errors arise because of reasons other than sampling.
• They may arise because of bias, vague definitions used in the data
collection, defective methods of data collection, incomplete coverage of the
population, wrong entries made in the questionnaire, etc.
• While non-sampling errors can be controlled by careful planning and
execution, we cannot avoid sampling errors and have to deal with them.
Sampling Distribution
• Since a “statistic” is a random variable, we can study how the values of
the statistic behave and how they are distributed.
• The distribution of a statistic is called a sampling distribution.
• A sampling distribution is used to understand how the data from
samples can be used in the decision-making process.
Sampling Distribution
• If all possible samples of size n are selected from a population and
the same statistic is calculated for each sample, the distribution of
these values of the statistic is called the sampling distribution of
that statistic.
• Example: Sampling distribution of the mean

Standard Deviation of a Sampling Distribution: Standard Error
• A sampling distribution, like any other distribution, is described by its
expected value and standard deviation.
• However, the standard deviation of the sampling distribution is called the
standard error.
• Thus, the standard deviation of the sampling distribution of the mean is
known as the standard error of the mean ($SE_{\bar{x}}$).
Sampling Distribution of Mean
• The sampling distribution of the mean is the probability distribution of all sample means obtained
from all possible samples of a certain fixed size from a given population.
• The following steps are used to obtain the sampling distribution of the mean:
(a) Take all possible samples of size n from a population of size N, having mean μ and standard
deviation σ.
(b) Calculate the mean for each sample obtained.
(c) Tabulate the mean values and calculate the relative frequency of each value of the mean by
dividing the frequency with which it appears by the total frequency (equal to the number of
samples). The relative frequency of each value indicates its probability.
Example
• A population consists of five elements, viz. 2, 3, 4, 6 and 10. Obtain the
sampling distribution of the mean taking sample size equal to. Also,
calculate the mean and variance of the sampling distribution.
Sampling with replacement
• In sampling with replacement, repetition of elements within a sample is permitted.
• If sampling is done with replacement from a population of size N,
taking samples of size n each, then a total of $N^{n}$ samples can be drawn.
Sampling without replacement
• In sampling without replacement, repetition of elements within a sample is not
permitted.
• If sampling is done without replacement from a population of size N,
taking samples of size n each, then a total of $\binom{N}{n} = \dfrac{N!}{n!\,(N-n)!}$ samples can be drawn.
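The two sample counts can be checked by enumeration. The sketch below assumes the usual counting conventions, which the slides do not spell out: with replacement, samples are ordered and number N^n; without replacement, samples are unordered and number N!/(n!(N−n)!). It uses the five-element population from the examples with n = 2.

```python
from itertools import combinations, product
from math import comb

population = [2, 3, 4, 6, 10]   # N = 5
n = 2                           # sample size

# With replacement (ordered samples, repeats allowed): N**n samples
with_replacement = list(product(population, repeat=n))

# Without replacement (unordered samples, no repeats): C(N, n) samples
without_replacement = list(combinations(population, n))

print(len(with_replacement), len(population) ** n)            # 25 25
print(len(without_replacement), comb(len(population), n))     # 10 10
```

Enumerating the samples this way is also the first step in tabulating a sampling distribution.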
Example
• A population consists of five elements, viz. 2, 3, 4, 6 and 10. Obtain the
sampling distribution of the mean taking sample size equal to 2, assuming
that sampling is done without replacement. Also, calculate the mean
and variance of the sampling distribution.
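This example can be worked through by listing every possible sample, as in the steps for obtaining a sampling distribution. The sketch below is one way to do it in Python; it uses exact fractions so the results are free of rounding, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 3, 4, 6, 10]
n = 2

# (a) all possible samples of size 2 without replacement (10 samples)
samples = list(combinations(population, n))

# (b) mean of each sample
means = [Fraction(sum(s), n) for s in samples]

# (c) relative frequency of each mean value = its probability
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Mean and variance of the sampling distribution
mean_of_means = sum(m * p for m, p in distribution.items())
variance_of_means = sum((m - mean_of_means) ** 2 * p for m, p in distribution.items())

for m, p in distribution.items():
    print(f"mean = {float(m)}, probability = {p}")
print(mean_of_means, variance_of_means)   # 5 3
```

The ten sample means are all different, so each has probability 1/10. The mean of the sampling distribution is 5, equal to the population mean, and its variance is 3.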
