
4/24/2020

Good morning

Probability and Statistics


 Topics for the session:

 Probability
 Axioms of Probability
 Theorems on probability
 Conditional probability
 Bayes’ Theorem


Probability
 Random Experiment
An experiment with uncertain outcomes, i.e., one with more than one possible outcome on every trial.
 Sample space
The set of all possible outcomes of an experiment is called the sample space.

 Event
 Every subset of a sample space is an event.

 Probability
 A measure of uncertainty defined from the sample space to [0,1].

Probability
 Random Experiment
 An experiment with uncertain outcomes, i.e., more than one possible outcome on every trial. Example: toss a coin.
 Sample space
 The set of all possible outcomes of an experiment is called the sample space: S = {H, T}.
 Event
 Every subset of a sample space is an event: A = event of heads = {H}.

 Probability
 A measure of uncertainty defined from the sample space to [0,1]: here P(A) = 1/2, but why?
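One way to see why P(A) = 1/2 is reasonable is the relative-frequency idea covered under the approaches that follow; a small illustrative sketch (not from the slides) that estimates P(heads) empirically:

```python
import random

# Estimate P(heads) by relative frequency and compare with the classical 1/2.
random.seed(42)  # fixed seed so the run is reproducible
trials = 100_000
heads = sum(random.choice("HT") == "H" for _ in range(trials))
estimate = heads / trials
print(estimate)  # close to 0.5
```

As the number of trials grows, the estimate settles near the classical value 1/2.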


Probability Approaches
 Classical approach
 If 'S' is the sample space, then the probability of occurrence of an event 'E' is defined as:
 P(E) = n(E)/n(S) = (number of outcomes favouring 'E') / (number of outcomes in sample space 'S')

 Frequency approach
 Subjective approach
 Axiomatic approach
 A function P from the events of S to R satisfying the following axioms:
 P(A) >= 0
 P(S) = 1
 P(A ∪ B) = P(A) + P(B) whenever A and B are mutually exclusive

Types of events
 Simple
 Compound
 Equally Likely
 Mutually exclusive
 Independent/Dependent
 Exhaustive
 Complementary



Counting-Probability
 Selection problems
 Single event, single element selected: Probability P[A] = n/m
 Single event, multiple (r) elements selected: Probability P[A] = nCr / mCr

Probability-Selection problems
 A coin is tossed 3 times. What is the probability that at least one head is obtained?
 Find the probability of getting a numbered card when a card is drawn from a pack of 52 cards.
 There are 5 green and 7 red balls. Two balls are selected one by one without replacement. Find the probability that the first is green and the second is red.
 What is the probability of getting a sum of 7 when two dice are thrown?

Probability-Selection problems
 There are 5 green and 7 red balls. Two balls are randomly selected. Find the probability that both are red.
 Solution:
 Probability P[A] = nCr / mCr
 Single event, multiple elements selected
 n(red) = 7, m(total) = 12, r(selected) = 2

 P(A) = 7C2 / 12C2 = [(7 × 6)/(1 × 2)] / [(12 × 11)/(1 × 2)] = 21/66 = 7/22
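The combination count above can be checked directly with Python's `math.comb`:

```python
from math import comb

# Choose 2 red from the 7 red balls, out of all ways to choose 2 from 12.
p = comb(7, 2) / comb(12, 2)
print(p)  # 21/66 = 7/22 ≈ 0.318
```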



Addition Theorem
 If 'A' and 'B' be any two events, then the probability of occurrence of at least one of the events 'A' and 'B' is given by:
 P(A or B) = P(A) + P(B) − P(A and B)
 P(A ∪ B) = P(A) + P(B) − P(A ∩ B)


Addition Theorem
 Ex.: The probability that a contractor will get a contract is 2/3 and the probability that he will get another contract is 5/9. If the probability of getting at least one contract is 4/5, what is the probability that he will get both contracts?

 Solution:
 A = event of getting the first contract; B = event of getting the other contract.
 Here P(A) = 2/3, P(B) = 5/9
 P(A or B) = 4/5, P(A and B) = ?
 By the addition theorem of probability:
 P(A or B) = P(A) + P(B) − P(A and B)
 4/5 = 2/3 + 5/9 − P(A and B)
 or 4/5 = 11/9 − P(A ∩ B)
 or P(A ∩ B) = 11/9 − 4/5 = (55 − 36)/45
 P(A ∩ B) = 19/45
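The contractor example can be verified with exact rational arithmetic:

```python
from fractions import Fraction

# Rearranged addition theorem: P(A and B) = P(A) + P(B) - P(A or B)
p_a, p_b, p_a_or_b = Fraction(2, 3), Fraction(5, 9), Fraction(4, 5)
p_a_and_b = p_a + p_b - p_a_or_b
print(p_a_and_b)  # 19/45
```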

Multiplication Theorem
 Let A and B be two independent events. Then
multiplication theorem states that,
 P[AB]= P[A]. P[B].
 Note: P[AB] can also be represented by P[A and B] or
P[A∩B].


Multiplication Theorem
 Example:
 Let a problem in statistics be given to two students whose probability of
solving it are 1/5 and 5/7.
 What is the probability that both solve the problem?
 Solution:
 Let A= event that the first person solves the problem.
 B= event that the second person solves the problem.
 It is given that P[A]=1/ 5; P[B]=5/7.
 Since A and B are independent, using multiplication theorem
 P[AB]= P[A]. P[B].
 = 1/5*5/7= 1/7



Conditional Probability
 Probability of dependent events is termed conditional
probability.
 Let A and B be 2 events, A depending on B. Then,
P[A/B] = P[AB]/P[B]

Conditional Probability
 Example:
 Let a file contain 10 papers numbered 1 to 10. A paper is selected at random.
What is the probability that it is 10 given that it is at least 5.
 Solution:
 From the problem we can see that,
 Sample space S ={1,2,3,4,5,6,7,8,9,10}
 Event that number is 10 : A ={10}.
 Event that number is at least 5: B ={5,6,7,8,9,10}.
 A and B={10}.
 P[A]= 1/10; P[B] =6/10; P[ AB] =1/10.
 Therefore,
 P[A/B] = P[AB]/P[B] = (1/10)/(6/10) = 1/6


Complement
 Complement probability
 P(A^c) = 1 − P(A)

Bayes Theorem
 Statement:
 Let E1, E2, …, En be n mutually exclusive and exhaustive events and B be any event. Then

 P(Ei / B) = P(B / Ei) P(Ei) / Σ_{i=1}^{n} P(B / Ei) P(Ei)


5/15/2020


Recall from previous session:
 Probability
 Theorems: Addition / Multiplication
 Conditional Probability
 Bayes' Theorem

Today's sessions (four things to discuss/learn):
 Random variables: Types
 Probability Distributions: Measures
 Standard Discrete distributions: Binomial / Poisson / Geometric


Probability
 Single event, single element: Probability P[A] = n/m
 Single event, r elements: Probability P[A] = nCr / mCr
 Addition theorem (two events; either one is required): P[A or B] = P[A] + P[B] − P[A and B]
 Multiplication theorem (two independent events; both are required): P[A and B] = P[A] · P[B]
 Conditional probability (two dependent events; B depends on A): P[B/A] = P[A and B] / P[A]
 Bayes' theorem (events [An] mutually exclusive and exhaustive; B depends on An): P[An/B] = P[An and B] / P[B]


Probability
 A speaks truth in 75% of cases and B in 80% of cases.
In what percentage of cases are they likely to
contradict each other, narrating the same incident?
 Solution:
 A and B contradict each other =
[A speaks truth and B lies] or [A lies and B speaks truth]
= P(A-true)·P(B-lie) + P(A-lie)·P(B-true)
[Note that we add at the place of OR]
= (3/4 × 1/5) + (1/4 × 4/5) = 3/20 + 4/20 = 7/20
= (7/20 × 100)% = 35%






Probability
 What is the probability of getting 53 Mondays in a leap year?
 1/7 or 2/7 or 3/7 or 1
 Solution:
 1 year = 365 days; a leap year has 366 days.
 A year has 52 complete weeks, so there will be 52 Mondays for sure.
 52 weeks = 52 × 7 = 364 days, which leaves
 366 − 364 = 2 days to account for.
 In a leap year there will be 52 Mondays and 2 extra days.
 These 2 days can be:
 1. Sunday, Monday; 2. Monday, Tuesday;
 3. Tuesday, Wednesday; 4. Wednesday, Thursday;
 5. Thursday, Friday; 6. Friday, Saturday; 7. Saturday, Sunday
 Of these 7 equally likely outcomes, the favourable outcomes (those containing a Monday) are 2.
 Hence the probability of getting 53 Mondays = 2/7


Probability -Example
 A person sells cone ice creams from a moving vehicle. Any customer can buy 1 or more as per the following distribution, based on past data for 400 customers:

 Number of ice creams:  1       2       3       4      5
 Customers:             175     115     63      32     15
 Prob:                  0.4375  0.2875  0.1575  0.08   0.0375

 How many of his next 100 customers will buy 2 ice creams?
 P(X=2) = 115/400 = 0.2875
 So, 100 × 0.2875 ≈ 29 customers.




Random Variables
 We can answer questions regarding a specified event through
probability, like how frequently that event is expected to
occur…
 And we can handle any event discussion using probability.
 But do we have any idea about the random experiment as a
whole?


Random Variables
 Understanding a Random experiment is through random
variables.
 A random variable is a function that maps the outcomes in the sample space to R (or Rn).

 For example, in the example of tossing two dice together,


 define the random variable X to be the sum of the two dice.
For every element in the sample space, we can specify the
value of X.

RANDOM VARIABLE
 So, a r.v is a variable, because it takes many values and is random
because the values taken depend on which of the outcomes the
experiment results in.

Random Variable
 For example, if we're tossing two six-sided dice,
 S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
       (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
       (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
       (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
       (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
       (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }


Random Variable
 If we know the probabilities of a set of events, we can calculate the probabilities that a random variable defined on that set of events takes on certain values. For example,
 P(X = 2) = P((1,1)) = 1/36
 P(X = 5) = P({(1,4), (2,3), (3,2), (4,1)}) = 4/36 = 1/9
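These probabilities can be checked by enumerating all 36 equally likely outcomes:

```python
from itertools import product

# Tabulate X = sum of two dice over every (d1, d2) outcome.
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1

p = {x: c / 36 for x, c in counts.items()}
print(p[2], p[5])  # 1/36 and 4/36 = 1/9
```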

Random Variables-Types

• Number of students joined today (discrete)
• Number of students sleeping now (discrete)
• Number of minutes to conclude the session (continuous)
• Amount of water you've taken since morning (continuous)
• Number of minutes you were attentive so far (continuous)




Random Variable
 Probability Distributions:
 The probability that X takes a specific value x is p(x); that is, P(X = x) = p(x) = p_x
 p(x) is non-negative for all real x: p(x) >= 0 for all x
 The sum of p(x) over all possible values of x is 1: Σ_x p(x) = 1

 In the case of continuous distributions, the summation is replaced by integration.
 So, a probability distribution is a pair (X, p(x)) or (X, f(x)).

 Outcome:      1    2    3    4
 Probability:  0.1  0.3  0.4  0.2


Random Variable-Measures
 Expectation: E[X] = Σ_x x p(x) (discrete) or ∫ x f(x) dx (continuous)

 Variance: V[X] = Σ_x (x − µ)^2 p(x) or ∫ (x − µ)^2 f(x) dx

 For computations, V(X) = E[X^2] − {E[X]}^2

 where E[X^n] = Σ_x x^n p(x)
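The two computational formulas can be applied to the outcome table used in the examples that follow (outcomes 1..4 with probabilities 0.1, 0.3, 0.4, 0.2):

```python
# E[X] and V[X] = E[X^2] - (E[X])^2 for a discrete probability table.
outcomes = [1, 2, 3, 4]
probs = [0.1, 0.3, 0.4, 0.2]

mean = sum(x * p for x, p in zip(outcomes, probs))
second_moment = sum(x**2 * p for x, p in zip(outcomes, probs))
variance = second_moment - mean**2
print(mean, variance)  # 2.7 and 0.81
```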

Random Variable-Example 1
 Discrete r.v: probability mass function : p(x)=P(X=x)
 Ex: Toss a coin twice S={HH,HT,TH,TT}
 X=Number of heads={0,1,2}
 p(0) = P{TT} = 1/4
 p(1) = P{TH, HT} = 2/4
 p(2) = P{HH} = 1/4
 E[X] = 0·(1/4) + 1·(2/4) + 2·(1/4) = 1
 V[X] = E[X^2] − {E[X]}^2 = (0 + 2/4 + 4/4) − 1 = 1/2

Random Variable -Example 2


 Suppose a variable X can take the values 1, 2, 3, or 4.
The probabilities associated with each outcome are described
by the following table:
 Outcome:      1    2    3    4
 Probability:  0.1  0.3  0.4  0.2

 P(X = 2 or X = 3) = 0.3 + 0.4 = 0.7

 The probability that X is greater than 1 is equal to 1 − P(X = 1) = 1 − 0.1 = 0.9


Random Variable-Example 3
 Suppose a variable X can take the values 1, 2, 3, or 4.
The probabilities associated with each outcome are described
by the following table:
 Outcome:      1    2    3    4
 Probability:  0.1  0.3  0.4  0.2

 The cumulative probability distribution at 3 is
 F(3) = P(X <= 3) = Σ_{x<=3} p(x) = 0.1 + 0.3 + 0.4 = 0.8

Random Variables-Example 4
 For the following data, find the measures and form the histograms of the pmf and cdf.




Discrete Random Variables

 Bernoulli: p(x) = p if x = 1, q if x = 0
 Binomial: p(x) = nCx p^x q^(n−x)
 Poisson: p(x) = e^(−λ) λ^x / x!
 Geometric: p(x) = p q^(x−1)



Discrete Distributions
 A bag contains 50 balls, 20 blue and 30 red. Four balls are taken one after another with replacement. Probability of getting 2 from each category?
 Binomial
 A bag contains 500 balls, 200 blue and 300 red. Four balls are taken one after another with replacement. Probability of getting 2 from each category?
 Poisson
 A bag contains 50 balls, 20 blue and 30 red. Balls are taken one after another with replacement until we get a blue one. Probability of getting blue in the fourth selection?
 Geometric
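The first (Binomial) case above can be computed directly: 4 draws with replacement, P(blue) = 20/50, probability of exactly 2 blue (and hence 2 red):

```python
from math import comb

# Binomial pmf: p(x) = nCx * p^x * q^(n-x)
n, x, p = 4, 2, 20 / 50
prob = comb(n, x) * p**x * (1 - p) ** (n - x)
print(prob)  # 6 * 0.4^2 * 0.6^2 = 0.3456
```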

Binomial Distribution
 Binomial.

 Suppose the probability of purchasing a defective computer is 0.02. What is the probability of purchasing 2 defective computers among 10 purchased, and what is the expected number of defective computers in the 10?


Poisson Distribution
 Mean and Variance of a Poisson r.v. are equal: µ = σ^2 = λ = np

 Births in a hospital occur randomly at an average rate of 1.8


births per hour. What is the probability of observing 4 births
in a given hour at the hospital?
 Solution:
 Let X = No. of births in a given hour; X is Poisson with λ = 1.8, so P(X = 4) = e^(−1.8) (1.8)^4 / 4! ≈ 0.0723
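The Poisson pmf for the births example evaluates as:

```python
from math import exp, factorial

# P(X = 4) for a Poisson r.v. with rate lambda = 1.8 births per hour.
lam, x = 1.8, 4
prob = exp(-lam) * lam**x / factorial(x)
print(round(prob, 4))  # ≈ 0.0723
```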

Geometric Distribution
 Possible values are Y=1,2,3,…

 Mean=1/p
 Variance = (1 − p)/p^2

Geometric Distribution
 A coin has been weighted so that it has a 0.9 chance of
landing on heads when flipped. What is the probability that
the first time the coin lands on heads is the 3rd flip?
 Solution:
 X = number of Bernoulli trials (flips) needed to get the first head.
 p = 0.9
 P(X = 3) = P(T, T, H) = (0.1)^2 × (0.9) = 0.009


Geometric Distribution
 A coin has been weighted so that it has a 0.9 chance of
landing on heads when flipped. What is the probability that
the first time the coin lands on heads is after the 3rd flip?
 Solution:
 X = number of Bernoulli trials (flips) needed to get the first head.
 p = 0.9
 P(X > 3) = 1 − P(X <= 3) = 1 − [P(X=1) + P(X=2) + P(X=3)]
 = 1 − [(0.1)^0 (0.9) + (0.1)^1 (0.9) + (0.1)^2 (0.9)]
 = 1 − [0.9 + 0.09 + 0.009]
 = 0.001




Previous session:
 Random variables: Types
 Probability distributions: Measures
 Binomial, Poisson, Geometric

This session:
 Binomial, Poisson, Geometric
 Hypergeometric
 Poisson Process
 Chebyshev's theorem
 Simulation


Standard distributions
 Binomial:
 Fixed number (n) of independent Bernoulli trials.
 Probability of success (p).
 Probability of failure (q).
 Required number of successes (x).

 p(x) = nCx p^x q^(n−x)

Standard distributions
 Poisson:
 Fixed number (n) of independent Bernoulli trials.
 Probability of success (p).
 Required number of successes (x).
 n large, p small, and np = λ, constant.

 p(x) = e^(−λ) λ^x / x!

Standard distributions
 Geometric:
 Repeated independent Bernoulli trials until the first success.
 Probability of success (p).
 Probability of failure (q).
 Required number of trials (x).

 p(x) = p q^(x−1)


Standard distributions
 Hypergeometric:
 Fixed number (n) of dependent trials / number of draws.
 The number of successes (A) in the population.
 The number of observed successes (a).
 The number of elements (N) in the population.

 P(X = a) = [C(A, a) · C(N − A, n − a)] / C(N, n)


Hyper-Geometric Distribution
 This is similar to the Binomial distribution except that here the trials are without replacement.
 Recall that Binomial covers, for example, the probability of taking 5 blue balls from a bag with 10 blue and 15 white balls, where the trials were independent, meaning the drawing was with replacement. If we don't replace the balls selected, then the trials are dependent, and the probability of choosing 5 blue balls now comes from the Hypergeometric distribution.

 Given number of draws, with replacement: Binomial distribution
 Given number of draws, without replacement: Hypergeometric distribution

Hyper-Geometric Distribution
 Its pmf:
 P(X = a) = [C(A, a) · C(N − A, n − a)] / C(N, n)
 Where:
 A is the number of successes in the population
 a is the number of observed successes
 N is the population size
 n is the number of draws

                 drawn    not drawn            total
 green marbles   a        A − a                A
 red marbles     n − a    (N − n) − (A − a)    N − A
 total           n        N − n                N


Hyper-Geometric Distribution-Example
 A deck of cards contains 20 cards: 6 red cards and 14 black
cards. 5 cards are drawn randomly without replacement.
What is the probability that exactly 4 red cards are drawn?

 6 14 
 4  1 
P( X  4)    
 20 
 5 
 

Hyper-Geometric Distribution-Example
 A small voting district has 101 female voters and 95 male
voters. A random sample of 10 voters is drawn. What is
the probability exactly 7 of the voters will be female?

101 95 
  
7  3 
P ( X  7)  
196 
 
 10 

Hyper geometric
 If a hypergeometric distribution is represented by X ~ H(P, Q, n), where N = P + Q,
 its mean is nP / (P + Q)

 and its standard deviation is
 sqrt[ nPQ (P + Q − n) / ( (P + Q)^2 (P + Q − 1) ) ]



Poisson Process
It is a counting process with independent increments and stationary increments.
A stochastic process {N(t), t >= 0} is a counting process if N(t) represents the total number of events that have occurred in [0, t].
Then {N(t), t >= 0} must satisfy:
N(t) >= 0; N(t) is an integer for all t
If s < t, then N(s) <= N(t)
For s < t, N(t) − N(s) is the number of events that occur in the interval (s, t].

Poisson Process
It is a counting process with independent and stationary increments.
 A counting process has independent increments if, for any 0 <= s <= t <= u <= v, N(t) − N(s) is independent of N(v) − N(u).

 That is, the numbers of events that occur in non-overlapping intervals are independent random variables.

 A counting process has stationary increments if, for any s < t, the distribution of N(t) − N(s) depends only on the length of the time interval, t − s.


Poisson Process
A counting process {N(t), t  0} is a Poisson process with rate ,
 > 0, if
N(0) = 0
The process has independent increments
The number of events in any interval of length t follows a
Poisson distribution with mean t (therefore, it has stationary
increments), i.e.,

e  t  t 
n

P  N  t  s   N  s   n  , n  0,1,...
n!




Chebyshev's theorem
 Chebyshev's theorem shows how to use the mean and the standard deviation to find the percentage of the total observations that fall within a given interval about the mean.
 The fraction of any set of numbers lying within k standard deviations of the mean of those numbers is at least 1 − [1/k^2], where
 k = (the within number) / (the standard deviation)
 and k must be greater than 1.

Chebyshev's theorem - Example
 Use Chebyshev's theorem to find what percent of the values will fall between 123 and 179 for a data set with mean 151 and standard deviation 14.
 Solution:
 We subtract 151 − 123 and get 28, which tells us that 123 is 28 units below the mean.
 We subtract 179 − 151 and also get 28, which tells us that 179 is 28 units above the mean.
 Those two together tell us that the values between 123 and 179 are all within 28 units of the mean. Therefore the "within number" is 28.
 So we find the number of standard deviations, k, that the "within number", 28, amounts to by dividing it by the standard deviation:
 k = the within number / the standard deviation = 28/14 = 2

So now we know that the values between 123 and 179 are all within 28 units of the mean, which is the same as within k = 2 standard deviations of the mean. Since k > 1, we can use Chebyshev's formula to find the fraction of the data within k = 2 standard deviations of the mean. Substituting k = 2 we have:
 1 − [1/k^2] = 1 − [1/2^2] = 1 − 1/4 = 3/4
 So 3/4 of the data (75%) lie between 123 and 179.
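The same steps, written out:

```python
# Chebyshev bound for the worked example: mean 151, sd 14, interval [123, 179].
mean, sd = 151, 14
within = 179 - mean      # the "within number" (the interval is symmetric)
k = within / sd          # 28 / 14 = 2
bound = 1 - 1 / k**2     # at least this fraction lies within k sd's of the mean
print(k, bound)  # 2.0 0.75
```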

Chebyshev's theorem - Method 2


 Suppose µ = 39 and σ = 5, find the percentage of values that
lie within 29 and 49 of the mean.
29 ——————– 39 ——————– 49

 We just need to find k, and there are 2 ways to do so.
 First, we notice the distance between the mean (39) and 49 is 10, and it equals kσ. Therefore kσ = 10 implies k = 2.
 For k = 2, we find that the interval contains at least 75% of the values.



Discrete Simulation
Simulation is basically about mimicking a system in order to understand it. We observe the system through a set of variables called state variables; when the state variables change their values only at a countable number of points in time, it is called discrete simulation.
Most business processes can be described as a sequence of separate, discrete events. For example, a truck arrives at a warehouse, goes to an unloading gate, unloads, and then departs. To simulate this, discrete event modeling is often chosen.

Discrete Simulation
 Steps in a simulation study:


Discrete Simulation
 In discrete systems, the changes in the system state are discontinuous, and each change in the state of the system is called an event. The model used in a discrete system simulation has a set of numbers to represent the state of the system, called a state descriptor. We will also learn about queuing simulation, which is a very important aspect of discrete event simulation, along with simulation of time-sharing systems.
 Following is the graphical representation of the behaviour of a discrete system simulation.

Discrete Simulation
 Key Aspects:
 Entities − These are the representation of real elements like the parts of
machines.
 Relationships − It means to link entities together.
 Simulation Executive − It is responsible for controlling the advance time
and executing discrete events.
 Random Number Generator − It helps to simulate different data
coming into the simulation model.
 Results & Statistics − It validates the model and provides its performance
measures.


Discrete Simulation-Queuing System


 A queue is the combination of all entities in the system being served and
those waiting for their turn.

 Arrival Int | Cust. Arr | Serv. Begin | Serv. Dur | Serv. Comp | Waiting Time | Idle Time
 5           | 5         | 5           | 2         | 7          | 0            | 5
 1           | 6         | 7           | 4         | 11         | 1            | 0
 3           | 9         | 11          | 3         | 14         | 2            | 0
 3           | 12        | 14          | 1         | 15         | 2            | 0

 Average waiting time for a customer: 1.25 min
 Total idle time of the server: 5 min
 Prob(customer has to wait) = 0.75
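The table's bookkeeping can be replayed as a tiny discrete-event simulation using the interarrival times and service durations as inputs:

```python
# Single-server queue replay: data from the table above.
interarrivals = [5, 1, 3, 3]
services = [2, 4, 3, 1]

clock = 0          # running arrival time
free_at = 0        # time at which the server next becomes free
waits, idle = [], 0
for gap, dur in zip(interarrivals, services):
    clock += gap                      # customer arrives
    idle += max(0, clock - free_at)   # server idle until this arrival
    begin = max(clock, free_at)       # service starts when both are ready
    waits.append(begin - clock)       # time spent waiting in the queue
    free_at = begin + dur             # service completion time

avg_wait = sum(waits) / len(waits)
p_wait = sum(w > 0 for w in waits) / len(waits)
print(avg_wait, idle, p_wait)  # 1.25 5 0.75
```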


5/29/2020


Previous session:
 Binomial, Poisson, Geometric
 Hypergeometric
 Poisson Process
 Chebyshev's theorem
 Simulation

Today's topics:
 Continuous r.v.'s: Measures
 Standardization
 Normal Distribution
 Problems


Continuous random variables

 Random variables which take real values: all possible values between two real numbers, not necessarily integers.
 (i) Let X be the length of a randomly selected telephone call.
 (ii) Let X be the volume of water in a can marketed as 1 litre.

 A pdf f(x) must satisfy:
 (i) ∫ f(x) dx = 1 (over the whole real line)
 (ii) f(x) >= 0

Continuous random variables

 When we are handling continuous r.v.'s,
 P(a <= X <= b) = ∫_a^b f(x) dx
 E(X) = ∫ x f(x) dx
 E(X^n) = ∫ x^n f(x) dx
 (the last two integrals taken over the support of X)

Continuous Random variables

 Let X be a random variable with PDF given by
 f(x) = c x^2 for |x| <= 1, and 0 elsewhere.

 (a) Find the constant c.
 (b) Find EX and Var(X).
 (c) Find P(X >= 1/2).


Continuous Random variables

 Let X be a random variable with PDF f(x) = c x^2 for |x| <= 1, 0 elsewhere.
 Find the constant c.

 (a) For f to be a pdf, ∫ f(x) dx = 1, which for the given function is
 ∫_{−1}^{1} c x^2 dx = 1
 (2/3) c = 1
 giving c = 3/2.

Continuous Random variables

 Let X be a random variable with PDF f(x) = (3/2) x^2 for |x| <= 1, 0 elsewhere.

 (b) E(X) = ∫_{−1}^{1} x f(x) dx = ∫_{−1}^{1} x (3/2) x^2 dx = (3/2) [x^4/4]_{−1}^{1} = 0

 E(X^2) = ∫_{−1}^{1} x^2 (3/2) x^2 dx = (3/2) [x^5/5]_{−1}^{1} = 3/5

 Var(X) = E(X^2) − [E(X)]^2 = 3/5 − 0 = 3/5

Continuous Random variables

 Let X be a random variable with PDF f(x) = (3/2) x^2 for |x| <= 1, 0 elsewhere.

 (c) P(X >= 1/2) = ∫_{1/2}^{1} f(x) dx = ∫_{1/2}^{1} (3/2) x^2 dx = 7/16


Continuous Random variables-2

 Let X be a random variable with PDF given by
 f(x) = (3/4)(1 − x^2) for −1 <= x <= 1, and 0 elsewhere.

 Find its expected value.


Normal Distribution
 A normal variable X with mean µ (−∞ < µ < ∞) and variance σ^2 > 0 has a normal distribution if its pdf is

 f(x) = [1 / (σ √(2π))] exp[ −(x − µ)^2 / (2σ^2) ]  for −∞ < x < ∞


Normal Distribution
 A normal variable X with equal mean and differing
variance,

Normal Distribution
 Differing parameters

Standard Normal Distribution


 A normal r.v X with mean µ variance σ2 > 0, can be
standardized using x
z


Standard Normal Distribution

 When we standardize,
 z = (x − µ) / σ

Standard Normal Distribution


 Problems,

 Normal Table value for 1.45


Standard Normal Problem-1


 For a normal(300,50), find the probability that it
assumes a value greater than 362.
362  300
 Standardize: z  50
 1.24

 Prob=1-area from Table


 =1-0.8925=0.1075

Standard Normal Problem-2


 For a normal(50,10), find the probability that it assumes
a value between 45 and 62.

45  50
 Standardize: z1   0.5
10
62  50
z2   1.2
10
 Prob=area from Table(1.2)
 -area from Table(-0.5)
 =0.8849-(1-0.6915)
 =0.8849-0.3085=0.5764

Standard Normal Problem-3


 Find the area under normal curve that lies [i] to the
right of z=1.84 and [ii] between z=-1.97 and
z=0.86.

 (a)Area=1-area from Table


 =1-0.9671=0.0329
 (b)Area=area from Table(0.86)
 -area from Table(-1.97)
 =0.8051-0.0244
 =0.7807



Standard Normal Problem-4


 Let us assume that heights of students in II Year is normally
distributed with an average of 165 cm and a standard
deviation of 10 cms. What is the probability that a student’s
height is less than 175 cms.
 Solution:
 Let, X= Height of students in II Year.
 It is normal with, mean µ= 165; standard deviation σ
=10.
 P[ a student’s height is less than 175 cms]
 =P[-∞<X<175]

Standard Normal Problem-4


 First, we should convert X into Z by
Z= x- µ/ σ.
 We have x=175, µ= 165; σ =10.
Z= 175- 165/ 10 =1.
So when X=175; Z=1 and so
 P[-∞<X<175] = P[-∞<Z<1]=
P[-∞<Z<0]+ P[0<Z<1].
=0.5+0.34 = 0.84.



Standard Normal Problem-5


 Suppose the diameter of a certain car component follows the
normal distribution with X N(10; 3). Find the proportion of
these components that have diameter larger than 13.4 mm. Or, if
we randomly select one of these components and the probability
that its diameter will be larger than 13.4 mm.
 Solution: Suppose the diameter of a certain car component
follows the normal distribution with X N(10; 3).
 P(X>13.4)=P(X-10>13.4-10)=
 X  10   13.4  10 
P   P 
 3   3 
 P( Z  1.13)  1  0.8708  0.1292

Normal Distribution property


 68-95-99.7 rule/ 3SD rule



6/5/2020


Previous session:
 Continuous r.v.'s: Measures
 Standardization
 Normal Distribution
 Problems

Today's Session:
 Normal approximation to Binomial: Continuity correction
 Uniform Distribution
 Exponential Distribution
 Gamma Distribution
 Beta distribution


Normal Approximation to Binomial


 What is it?
 When is it possible?
 How can we do it?

 As the Normal is continuous and the Binomial is discrete, we need to use a continuity correction factor to balance the two.

Normal Approximation to Binomial


 The parameters of Binomial.
 N-Sample size
 p-Prob of success
 q-Prob of Failure
 When n * p and n * q are greater than 5, we can use the
normal approximation to the binomial to solve a problem
 This is called non-Skewness condition.

Normal Approximation to Binomial


 There are Two parts:
 Part1: Getting it ready with measures(steps1-5).
 Part2: Using continuity correction factor(steps6-9).
 Step 1: Find p,q, and n:
 The probability p is given in the question as 62%, or 0.62
 To find q, subtract p from 1: 1 – 0.62 = 0.38
 The sample size n is given in the question as 500.


Normal Approximation to Binomial


 Step 2: Deciding if you can use the normal approximation to
the binomial.
 If n * p and n * q are greater than 5, then you can use the
approximation:
n * p = 310 and n * q = 190.
These are both larger than 5, so you can use the normal
approximation to the binomial for this question.

Normal Approximation to Binomial


 Step 3: Find the mean, μ by multiplying n and p:
n * p = 310.
 Step 4: Multiply step 3 by q :
310 * 0.38 = 117.8.
 Step 5: Take the square root of step 4 to get the standard
deviation, σ:
√(117.8)=10.85

Normal Approximation to Binomial


 Step 6: Write the problem using correct notation. The
question stated that we need to “find the probability that at
least 290 are actually enrolled in school”. So:
P(X ≥ 290)


Normal Approximation to Binomial


 Step 7: Rewrite the problem using the continuity correction
factor:
P (X ≥ 290-0.5) = P (X ≥ 289.5)

Normal Approximation to Binomial


 Step 8: Draw a diagram with the mean in the center. Shade
the area that corresponds to the probability you are looking
for. We’re looking for X ≥ 289.5, so:

Normal Approximation to Binomial


 Step 9: Find z value, use table to find the probability.
 Z=(289.5-310)/10.85=-1.89
 Area from z table for -1.89 is 0.4706.
 Adding 0.5, we get 0.5+0.4706=0.9706.
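The nine steps above can be sketched in code (a minimal sketch: the normal CDF is computed from the error function rather than looked up in a z table, so the last decimal may differ slightly from the table value):

```python
import math

def normal_approx_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), via the normal
    approximation with a continuity correction."""
    q = 1 - p
    assert n * p > 5 and n * q > 5        # non-skewness condition
    mu = n * p                            # Step 3: mean
    sigma = math.sqrt(n * p * q)          # Steps 4-5: standard deviation
    z = (k - 0.5 - mu) / sigma            # Steps 6-7: continuity correction
    # Steps 8-9: upper-tail area from the standard normal CDF
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(normal_approx_at_least(500, 0.62, 290))  # ~0.9705 (z table above: 0.9706)
```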


Normal Approximation to Binomial


Example 2
 All of us receive calls trying to sell unwanted products,
called spam. Orion, a call-protection agency, issued a report
suggesting that 45% of all cell phone calls in 2019 would be
spam. Suppose 500 cell phone calls are selected at random. Use
the normal approximation to the binomial distribution to answer
the following questions.
 (a) Find the probability that at least 240 of the cell phone calls
will be spam.
 (b) Find the probability that exactly 245 of the cell phone calls
will be spam.

Normal Approximation to Binomial


Example 2
 Soln:
 (a) Let X be the number of cell phone calls that are spam. X is a
binomial random variable with n = 500 and p = 0.45, i.e.
X ~ B(500, 0.45). Find the mean and variance of the binomial
random variable X.
 µ = np = (500)(0.45) = 225
 σ² = npq = (500)(0.45)(0.55) = 123.75, so σ = √123.75 ≈ 11.12

Normal Approximation to Binomial


Example 2
 Soln: (a)
 Check the non-skewness conditions: np = 225 and nq = 275 are
both greater than 5, so the approximation is fine.
 P(X >= 240) [Binomial] ≈ P(X >= 239.5) [Normal]
 Going through the Normal procedure,
 P(X >= 239.5) = 0.0968


Normal Approximation to Binomial


Example2
 Soln:
 (b) P(exactly 245) = P(244.5 <= X <= 245.5) = 0.0072
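Both parts can be checked with the same approach (a sketch; the erf-based CDF gives slightly different last decimals than the z-table lookups on the slides):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 500, 0.45
mu = n * p                           # 225
sigma = math.sqrt(n * p * (1 - p))   # sqrt(123.75) ~ 11.12

# (a) P(X >= 240) -> P(X >= 239.5) under the normal approximation
a = 1 - phi((239.5 - mu) / sigma)
# (b) P(X = 245) -> P(244.5 <= X <= 245.5)
b = phi((245.5 - mu) / sigma) - phi((244.5 - mu) / sigma)
print(round(a, 4), round(b, 4))  # ~0.0962 (z table: 0.0968), ~0.0071 (z table: 0.0072)
```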


Continuous distribution
 1. Suppose in a quiz there are 30 participants. A question is
given to all 30 participants and the time allowed to answer it
is 25 seconds. Find the probability that a participant responds
within 6 seconds.
 2. Suppose a flight is about to land and the
announcement says that the expected time to land is 30
minutes. Find the probability that the flight lands between
25 and 30 minutes.
 3. Suppose a train is delayed by approximately 60 minutes.
What is the probability that the train arrives between 57 and
60 minutes?


Continuous distribution
 4. The data that follows are 55 smiling times, in seconds, of
an eight-week old baby.

 What is the probability that a randomly chosen eight-week


old baby smiles between 2 and 18 seconds?

Uniform Distribution
 Let us consider the baby smiling example. From the data we
can calculate the mean and standard deviation of the smiling
times to be 11.49 and 6.23 seconds respectively.
 Since this is an entirely spontaneous activity, which could be
termed completely random, we can use the Uniform distribution
to approximate it.

Uniform Distribution
 A random variable is uniformly distributed in the
interval(a,b) if its pdf is defined by

 1
 a xb
f ( x)   b  a
0 else

 Its mean is E[X]=(a+b)/2.


 Variance V(X)=(b-a)2/12


Uniform Distribution
 So, we need to form the uniform distribution.
 From the table, the smallest value is 0.7 and the largest is
22.8. So, if we assume that smiling times, in seconds, is
uniformly distributed between(0,23), then by the definition
of the uniform distribution

 1
 a xb
f ( x)   b  a
0 else
1
f ( x) 
23

Uniform Distribution
 So, P(2<X<18)=

18 18
1 18  2 16
P(2  X  18)   f ( x)dx   23 dx  23

23
2 2

Uniform Distribution
 Can we solve this in any other logical way


Uniform Distribution
 Can we solve this in any other logical way?
 Yes. Consider the rectangle of height 1/23 over (0,23).

Then the required probability is

P(2 < X < 18) = (base) * (height) = (18 - 2) * (1/23) = 16/23

Uniform
 The amount of time, in minutes, that a person must wait for
a bus is uniformly distributed between 0 and 15 minutes,
inclusive.

X ~ U(0,15)

 What is the probability that a person waits fewer than 12.5
minutes?
 Draw the rectangle. Then
 P(X < 12.5) = (base) * (height)
 = (12.5 - 0) * (1/15) = 0.8333



Exponential
 Suppose the number of hits to your website follow a Poisson
distribution at a rate of 2 per day. Let T be the time (in days)
between hits.
 Suppose that messages arrive to a computer server following
a Poisson distribution at the rate of 6 per hour. Let T be the
time in hours that elapses between messages.

 In both examples above, T follows an exponential distribution.

Exponential
 A random variable X follows the exponential distribution with
rate parameter λ if its pdf is defined by

         λe^(-λx)   x >= 0
f(x) =
         0          else

E[X] = 1/λ ;  V[X] = 1/λ²

Exponential
 If jobs arrive every 15 seconds on average, i.e. λ = 4
per minute, what is the probability of waiting less than or
equal to 30 seconds, i.e. 0.5 min?


Exponential

 If X = waiting time between arrivals,

 then P(waiting less than or equal to 0.5 min) =

P(X <= 0.5) = ∫ 4e^(-4x) dx over (0, 0.5) = 1 - e^(-2) ≈ 0.86

Example
 Accidents occur with a Poisson distribution at an average of 4
per week. i.e.λ=4
 1. Calculate the probability of more than 5 accidents in any
one week
 2. What is the probability that at least two weeks will elapse
between accidents?

Solution
 (i) X = number of accidents in a week,
 Poisson with mean 4
 P(X > 5) = 1 - P(X <= 5)
 = 1 - {P(X=0) + P(X=1) + ... + P(X=5)}

P(X = 0) = e^(-4) 4⁰ / 0! = e^(-4)

 (ii) T = time between occurrences,
 Exponential with rate λ = 4 per week (mean 1/4 week)

P(T >= 2) = ∫ 4e^(-4t) dt over (2, ∞) = e^(-8) ≈ 0.00034
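Both parts can be checked numerically (a sketch using only the standard library):

```python
import math

lam = 4  # accidents per week

# (i) P(X > 5) for X ~ Poisson(4)
p_le_5 = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(6))
print(round(1 - p_le_5, 4))          # ~0.2149

# (ii) P(T >= 2) for T ~ Exponential(rate=4)
print(round(math.exp(-lam * 2), 5))  # e^-8 ~ 0.00034
```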


Exponential
Memoryless property.
 This distribution has a memoryless property, which means it
"forgets" what has come before it.
 In other words, if you continue to wait, the length of time
you have already waited neither increases nor decreases the
probability of the event happening.
 Let's say a hurricane hits your island. The probability of
another hurricane hitting in one week, one month, or ten
years from that point are all equal. The exponential is
the only continuous distribution with the memoryless property.

P(X > s + t | X > t) = P(X > s)  for all s, t >= 0

Exponential
Applications
 The exponential often models waiting times, time
between events, and lifetimes of objects.
 “How much time will go by before a major hurricane hits the
Atlantic Seaboard?” or
 “How long will the transmission in my car last before it
breaks?”.

Exponential
Applications
 Young and Young (198) give an example of how the
exponential distribution can also model space between
events (instead of time between events). Say that grass seeds
are randomly dispersed over a field; Each point in the field
has an equal chance of a seed landing there. Now select a
point P, then measure the distance to the nearest seed. This
distance becomes the radius for a circle with point P at the
center. The circle’s area has an exponential distribution.


Exponential Family
 The exponential distribution is one member of a very large class
of probability distributions called the exponential families, or
exponential classes. Some of the more well-known members of this
family include:
 the Bernoulli distribution; the beta distribution;
 the binomial distribution (for a fixed number of trials);
 the categorical distribution;
 the chi-squared distribution; the Dirichlet distribution;
 the gamma distribution;
 the geometric distribution; the inverse Gaussian distribution;
 the lognormal distribution;
 the negative binomial distribution (for a fixed number of failures);
 the normal distribution; the Poisson distribution;
 the von Mises distribution; the von Mises-Fisher distribution.


Exponential, Gamma and Beta Dist

 If X1, ..., Xk are i.i.d. Exponential(λ) random
variables, then X1 + ... + Xk is a Gamma G(k, λ)
random variable.

 A ratio of Gamma random variables is a Beta random variable: if
X and Y are independent Gammas (with the same rate), then
X/(X+Y) is Beta.
 Beta(1, 1) = Uniform(0, 1).


Gamma Distribution
 A random variable is Gamma distributed With a shape
parameter k and a scale parameter θ,if its pdf is given by
 Gamma dist r e x x r 1
f ( x)  x0
gamma(r)

 Gamma Function, Г(x) gamma( x)   x n 1e  x dx
0

 Note: Г(x)=(x-1)!
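Python's `math.gamma` implements Γ, so the density is easy to evaluate (a sketch; the crude midpoint-rule integral just confirms the pdf integrates to 1):

```python
import math

def gamma_pdf(x, r, lam):
    """Gamma(shape=r, rate=lam) density."""
    return lam**r * math.exp(-lam * x) * x**(r - 1) / math.gamma(r)

print(math.gamma(5) == math.factorial(4))  # True: Γ(n) = (n-1)!

# crude check that the Gamma(2, 1) density integrates to ~1
dx = 0.001
total = sum(gamma_pdf((i + 0.5) * dx, 2, 1) * dx for i in range(40000))
print(round(total, 3))  # ~1.0
```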

Gamma Distribution
 Gamma(shape,scale)

Beta Distribution
 A random variable is Beta distributed with parameters α, β
(both shape parameters) if its pdf is given by

f(x) = x^(α-1) (1-x)^(β-1) / B(α,β),  0 <= x <= 1,

where B(α,β) = Γ(α)Γ(β)/Γ(α+β).


Gamma-Beta
Examples
 Suppose that on average 1 customer per minute arrives at a
shop. What is the probability that the shopkeeper will wait
more than 5 minutes before
 (i) both of the first two customers arrive, and
 (ii) the first customer arrives?
 Soln: X is Gamma.
 Let X denote the waiting time in minutes until the second
customer arrives; then X has a gamma distribution with r = 2
(as the waiting time is to be considered up to the 2nd customer)
and λ = 1.
 P(X > 5) = (1 + 5)e^(-5) = 6e^(-5) ≈ 0.040.
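The closed form P(X > 5) = (1 + 5)e^(-5) for Gamma(2, 1) can be cross-checked by numerically integrating the density x·e^(-x) over the tail (a sketch):

```python
import math

# closed form: P(X > 5) = (1 + 5) * e^-5 for X ~ Gamma(r=2, rate=1)
print(round(6 * math.exp(-5), 4))  # 0.0404

# numeric cross-check: midpoint rule on the density x * e^-x over (5, 40)
dx = 0.001
tail = 0.0
for i in range(35000):
    x = 5 + (i + 0.5) * dx
    tail += x * math.exp(-x) * dx
print(round(tail, 4))  # 0.0404
```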

Beta Distribution
Properties
 The difference between the binomial and the beta is that the
former models the number of successes (x), while
the latter models the probability (p) of success.

 Interpretation of α, β
 You can think of α-1 as the number of successes and β-1
as the number of failures, just like the x and n-x terms in the
binomial.

 If α = 1 and β = 1,

 then the Beta reduces to a uniform distribution.

Exponential, Gamma and Beta


Example
 Waiting for a bus. Buses arrive as a Poisson process of rate λ =
4 per hour after 5pm. I start waiting for a bus at 5pm, and,
knowing about the exponential distribution, expect to wait for
about 1/λ hours = 15 minutes for a bus. By 5.30pm, no bus has
turned up. How much longer do I expect to wait?


Exponential, Gamma and Beta Example


 Solution: We are working with X1, the time until the first
arrival. Now X1 is exponential with parameter λ = 4, as we
have seen. But we are told that X1 > 0.5 (already waited half an
hour). So we need a conditional distribution. We can calculate
the probability of waiting a further time t > 0, given that we
have already waited 0.5, as follows:
 P(X1 > 0.5 + t | X1 > 0.5) = P(X1 > 0.5 + t) / P(X1 > 0.5)
 = exp{-4(0.5+t)} / exp{-4(0.5)} = exp{-4t},
which is another exponential random time with parameter 4.
 So I am no better off now than when I started! The
(conditional) waiting time has the same distribution as the
original, and the expected waiting time is still 15 (more)
minutes.

Problems
Uniform, Exponential, Gamma
 The amount of weight gained in winter is uniformly
distributed, U(0,30). What is P(10 < X < 15)?
 In a quiz, there are 30 participants. The time allowed to answer
is 25 seconds. What is the probability of answering within 6 secs?
 Suppose sending a money order is a random event at a
particular post office and on average a money order is sent
every 15 minutes. What is the probability that a total of 10
money orders are sent in < 3 hours?



Summary - five sessions
 Probability.
 Approaches.
 Theorems - Addition, Multiplication.
 Conditional Probability - Bayes' Theorem.
 Random Variables - Distributions - Measures.
 Discrete RV - Binomial, Poisson, Geometric, Hyper-Geometric.
 Continuous RV - Normal, Exponential, Uniform, Gamma, Beta.
 Chebychev's theorem.
 Poisson process.
 Simulation.


6/19/2020

Five sessions - so far

 Probability
 Bayes' Theorem
 Discrete rv - Standard Types
 Continuous rv - Standard types
 Others
 Poisson Process
 Chebychev's theorem
 Simulation


This session
 Joint Distributions
 Joint discrete and joint continuous
 Marginal and conditional densities
 Independence of random variables
 Mean and variance of the sample mean

Random Vector
 Let us play a game:
 You toss a coin; I'll toss a die.
 You win if H and an odd number are the outcomes.
 I win if T and an even number are the outcomes.

              Toss a die
          1     2     3     4     5     6
 Coin H 1/12  1/12  1/12  1/12  1/12  1/12
      T 1/12  1/12  1/12  1/12  1/12  1/12

 Make them a pair (vector) of rvs and get their joint
distribution, to get our chances to win.

Another Example of Joint p.f.

 Throw a die once, let X be the outcome. Throw a fair coin X


times and let Y be the number of heads. Find P(x,y)?
Solution:
 The possible X values range from 1 to 6.
 The possible Y values range from 0 to 6.
 When X = 2 (e.g.) Y = 0,1 or 2.
 P(2,0) = P(X=2,Y=0) = P(X=2)P(Y=0|X=2)
=(1/6)(1/2)2=1/24.
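The whole joint pmf can be tabulated the same way (a sketch): P(x,y) = P(X=x)·P(Y=y|X=x), where Y|X=x ~ Binomial(x, 1/2).

```python
from math import comb

def joint_pmf(x, y):
    """P(X=x, Y=y): die outcome x, then y heads in x fair-coin tosses."""
    if not (1 <= x <= 6 and 0 <= y <= x):
        return 0.0
    return (1 / 6) * comb(x, y) * 0.5**x

print(joint_pmf(2, 0))  # (1/6)(1/2)^2 = 1/24 ~ 0.0417
# sanity check: all probabilities sum to 1
total = sum(joint_pmf(x, y) for x in range(1, 7) for y in range(0, 7))
print(round(total, 10))  # 1.0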


Jointly Distributed Random Variables

 Joint Probability Distributions

 Discrete:

P(X = xᵢ, Y = yⱼ) = pᵢⱼ >= 0,  satisfying Σᵢ Σⱼ pᵢⱼ = 1

 Continuous:

f(x,y) >= 0,  satisfying ∫∫ f(x,y) dx dy = 1 over the state space

Jointly Distributed Random Variables

 Joint Cumulative Distribution Function

F(x,y) = P(X <= x, Y <= y)

 Discrete:   F(x,y) = Σ over i: xᵢ <= x and j: yⱼ <= y of pᵢⱼ

 Continuous: F(x,y) = ∫ from -∞ to x ∫ from -∞ to y f(w,z) dz dw

Example-Joint Distribution
 Air Conditioner Maintenance
 A company that services air conditioner units in residences and
office blocks is interested in how to schedule its technicians in
the most efficient manner
 The random variable X, taking the values 1,2,3 and 4, is the
service time in hours
 The random variable Y, taking the values 1,2 and 3, is the
number of air conditioner units


Example-Joint Distribution
 Joint p.m.f. (X = service time in hours, Y = number of units):

              X = 1     2      3      4
 Y = 1        0.12   0.08   0.07   0.05
     2        0.08   0.15   0.21   0.13
     3        0.01   0.01   0.02   0.07

 Σᵢ Σⱼ pᵢⱼ = 0.12 + 0.08 + ... + 0.07 = 1.00

 Joint cumulative distribution function, e.g.:
 F(2,2) = p₁₁ + p₁₂ + p₂₁ + p₂₂
 = 0.12 + 0.08 + 0.08 + 0.15 = 0.43
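The table can be checked in code (a sketch; the dictionary transcribes the joint pmf above):

```python
# Joint pmf from the air-conditioner example: p[(x, y)]
p = {(1, 1): 0.12, (2, 1): 0.08, (3, 1): 0.07, (4, 1): 0.05,
     (1, 2): 0.08, (2, 2): 0.15, (3, 2): 0.21, (4, 2): 0.13,
     (1, 3): 0.01, (2, 3): 0.01, (3, 3): 0.02, (4, 3): 0.07}

def F(x, y):
    """Joint CDF: P(X <= x, Y <= y)."""
    return sum(v for (a, b), v in p.items() if a <= x and b <= y)

# marginal pmf of X (sum over y)
p_X = {x: sum(v for (a, _), v in p.items() if a == x) for x in range(1, 5)}

print(round(F(2, 2), 2))  # 0.43
print(round(p_X[3], 2))   # P(X=3) = 0.07 + 0.21 + 0.02 = 0.30
```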

Bivariate
 BIVARIATE DISCRETE: joint pmf p(x,y), marginal pmfs p_X(x) and
p_Y(y), conditional pmf p_{X|Y}(x|y).



BIVARIATE CONTINUOUS RV’S


 Definition: Two random variables are said to have
joint probability density function f(x,y) if f(x,y) >= 0 and
∫∫ f(x,y) dx dy = 1.

 Definition: Let X and Y denote two random variables
with joint probability density function f(x,y). Then

 the marginal density of X is

f_X(x) = ∫ f(x,y) dy

 the marginal density of Y is

f_Y(y) = ∫ f(x,y) dx


Conditional density
 Let X and Y denote two random variables with
joint probability density function f(x,y) and
marginal densities f_X(x), f_Y(y). Then
 the conditional density of Y given X = x is

f_{Y|X}(y|x) = f(x,y) / f_X(x)

 the conditional density of X given Y = y is

f_{X|Y}(x|y) = f(x,y) / f_Y(y)


Bivariate Normal

f(x₁,x₂) = 1 / (2π σ₁ σ₂ √(1-ρ²)) · exp{ -Q(x₁,x₂)/2 }

where

Q(x₁,x₂) = 1/(1-ρ²) [ ((x₁-μ₁)/σ₁)² - 2ρ ((x₁-μ₁)/σ₁)((x₂-μ₂)/σ₂) + ((x₂-μ₂)/σ₂)² ]

 This distribution is called the bivariate Normal distribution.
 The parameters are μ₁, μ₂, σ₁, σ₂ and ρ.

Bivariate-Continuous-Example
 If X and Y are jointly distributed as given by the pdf shown on
the slide, find c and their marginal pdfs.
 [Worked solution shown on the slides: set the double integral of
the joint pdf equal to 1 to find c, then integrate out y to get
the pdf of X and, similarly, integrate out x to get the pdf of Y.]



Covariance
 The covariance of two random variables is a statistic that
tells you how "correlated" two random variables are. If two
random variables are independent, then their covariance is
zero. If their covariance is nonzero, then the value gives you
an indication of "how dependent they are".
 For example, how height and weight of a person co-vary?
 Will an increase in height result in increase of weight? If so,
by how much?

Covariance
 Covariance is a measure of how much two random variables
vary together. It's similar to variance, but where variance
tells you how a single variable varies, covariance tells you
how two variables vary together.

Covariance
 The covariance of two random variables X and Y is

σ_XY = E[(X - μ_X)(Y - μ_Y)]

 which for application purposes can be simplified to

σ_XY = E[XY] - μ_X μ_Y


Covariance
 Two ball pens are selected at random from a bag containing 3
blue, 2 red and 3 green pens. If X is the number of blue pens
selected and Y is the number of red pens selected, find
 (i)the joint distribution of X and Y
 (ii) P[(X,Y) ∈ A] where A is the region {(x,y): x+y <= 1}
 (iii) Covariance of X and Y

Covariance
 The possible pair values of X and Y are
 (0,0), (0,1), (0,2), (1,0), (1,1), (2,0).
 Their joint pmf is given by

f(x,y) = 3Cx * 2Cy * 3C(2-x-y) / 8C2

 giving the table

 f(x,y)             X = 0     1     2    Row total p(y)
 Y = 0               3/28   9/28  3/28      15/28
     1               6/28   6/28    0       12/28
     2               1/28     0     0        1/28
 Column total p(x)  10/28  15/28  3/28         1

Covariance

E[XY] = Σ Σ xy f(x,y) = (0*0)(3/28) + (0*1)(9/28) + ... + (1*1)(6/28) = 6/28 = 3/14

μ_X = Σ x f_X(x) = 0*(10/28) + 1*(15/28) + 2*(3/28) = 21/28 = 3/4

μ_Y = Σ y f_Y(y) = 0*(15/28) + 1*(12/28) + 2*(1/28) = 14/28 = 1/2

σ_XY = E[XY] - μ_X μ_Y = 3/14 - (3/4)(1/2) = -9/56
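The computation can be checked with exact fractions (a sketch transcribing the joint pmf table):

```python
from fractions import Fraction as Frac

# joint pmf f[(x, y)] from the ball-pen example (all over 28)
f = {(0, 0): 3, (1, 0): 9, (2, 0): 3,
     (0, 1): 6, (1, 1): 6, (0, 2): 1}
f = {k: Frac(v, 28) for k, v in f.items()}

e_xy = sum(x * y * p for (x, y), p in f.items())
mu_x = sum(x * p for (x, _), p in f.items())
mu_y = sum(y * p for (_, y), p in f.items())
cov = e_xy - mu_x * mu_y

print(e_xy, mu_x, mu_y, cov)  # 3/14 3/4 1/2 -9/56
```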


Covariance
Example
 If two random variables X and Y have the joint pdf f(x,y)=3x
0<=y<=x<=1,
 Find the covariance between X and Y.
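The slides carry the worked integrals as images; evaluating them gives E[X] = ∫∫ x·3x dy dx = 3/4, E[Y] = 3/8, E[XY] = 3/10, hence Cov(X,Y) = 3/10 - (3/4)(3/8) = 3/160 (worked here, not read from the slides). A numeric sketch to cross-check, doing the inner y-integrals in closed form and the outer x-integral by the midpoint rule:

```python
# Cov(X, Y) for f(x,y) = 3x on the triangle 0 <= y <= x <= 1
n = 10000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]

e_x = sum(x * 3 * x * x * h for x in xs)          # E[X]  = ∫ x·3x·x dx
e_y = sum(3 * x * x**2 / 2 * h for x in xs)       # E[Y]  = ∫ 3x·(x²/2) dx
e_xy = sum(x * 3 * x * x**2 / 2 * h for x in xs)  # E[XY] = ∫ 3x²·(x²/2) dx
cov = e_xy - e_x * e_y

print(round(cov, 5))  # 0.01875 (= 3/160)
```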



Independent Random variables

Example
 If we toss a die once, find P(X | observed number is < 5).
 Solution
 Let A = {X < 5}. P(A) = 4/6.
 P_{X|A}(X=1) = P(X=1 | A) = P(X=1 and A)/P(A)
 = P(X=1)/P(A)
 = (1/6)/(4/6) = 1/4.

 The extension of the above is the conditional pmf of X given
Y, denoted by p_{X|Y}(x|y).

Independent Random variables


 Two random variables are said to be independent when
information regarding one of them has no influence on that
of the other.
 P(X=x and Y=y) = P(X=x) * P(Y=y).
 In another way,
 P(X=x | Y=y) = P(X=x).

Independent Random variables


Example
 Suppose we toss a coin four times. Let X be the random
variable denoting the number of heads in the first two tosses
and Y the same for the next two. Are they independent?
Also find P(X < 2 and Y > 1).
 Solution
 They are independent: P(X=x and Y=y) = P(X=x) * P(Y=y) for all x, y.
 P(X < 2 and Y > 1) = P(X < 2) * P(Y > 1) = (3/4)(1/4) = 3/16.
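Brute-force enumeration over the 16 equally likely outcomes confirms the answer (a sketch):

```python
from itertools import product

# all 16 equally likely sequences of four fair-coin tosses
count = 0
for seq in product("HT", repeat=4):
    x = seq[:2].count("H")  # heads in the first two tosses
    y = seq[2:].count("H")  # heads in the last two tosses
    if x < 2 and y > 1:
        count += 1

print(count, "/ 16")  # 3 / 16
```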



Properties of Mean and SD


 If a random variable X is adjusted by multiplying by the
value b and adding the value a, then the mean is affected as
follows:

μ_{a+bX} = a + b μ_X

 The mean of the sum/difference of two random variables X
and Y is the sum/difference of their means:
 μ_{X±Y} = μ_X ± μ_Y.

Properties of Mean and SD


 If a random variable X is adjusted by multiplying by the
value b and adding the value a, then the variance is affected as
follows:

σ²_{a+bX} = b² σ²_X

 Similarly, for independent X and Y,

σ²_{X+Y} = σ²_X + σ²_Y
σ²_{X-Y} = σ²_X + σ²_Y



Properties of Mean and SD


Example
 Suppose an individual plays a gambling game where it is
possible to lose Rs1.00, break even, win Rs3.00, or win
Rs5.00 each time she plays. The probability distribution for
each outcome is provided by the following table:
 Outcome      -Rs1.00  Rs0.00  Rs3.00  Rs5.00
 Probability    0.30    0.40    0.20    0.10
 The mean outcome for this game is calculated as follows:
 μ = (-1*0.3) + (0*0.4) + (3*0.2) + (5*0.1)
 = -0.3 + 0 + 0.6 + 0.5 = 0.8.
 In the long run, then, the player can expect to win about 80
paise playing this game -- the odds are in her favor.

Properties of Mean and SD


Example
 Suppose the casino realizes that it is losing money in the long
term and decides to adjust the payout levels by subtracting
Rs1.00 from each prize. The new probability distribution for
each outcome is provided by the following table:
 Outcome      -Rs2.00  -Rs1.00  Rs2.00  Rs4.00
 Probability    0.30    0.40    0.20    0.10
 The new mean is (-2*0.3) + (-1*0.4) + (2*0.2) + (4*0.1)
 = -0.6 - 0.4 + 0.4 + 0.4 = -0.2.
 This is equivalent to subtracting Rs1.00 from the original
value of the mean: 0.8 - 1.00 = -0.2. With the new payouts,
the casino can expect to win 20 paise per game in the long run.
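The shift property μ_{X-1} = μ_X - 1 behind this example can be checked directly (a sketch):

```python
def mean(outcomes, probs):
    """Expected value of a discrete random variable."""
    return sum(o * p for o, p in zip(outcomes, probs))

probs = [0.30, 0.40, 0.20, 0.10]
old = mean([-1, 0, 3, 5], probs)
new = mean([o - 1 for o in [-1, 0, 3, 5]], probs)

print(round(old, 2), round(new, 2))  # 0.8 -0.2
# subtracting 1 from every outcome subtracts 1 from the mean
print(abs(new - (old - 1)) < 1e-12)  # True
```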

7/3/2020

Previous session
 Joint Distributions
 Joint discrete and joint continuous
 Marginal and conditional densities
 Independence of random variables
 Mean and variance of the sample mean

This session
 Population and sample, random sample, parameters and
statistics
 Null and Alternate Hypothesis, level of significance
 One sided and two sided tests of hypothesis on mean
 t-distribution


Sampling
 Population
 Sampling

 Sampling Techniques

Sampling
 Types of sampling

Sampling
 Types of sampling-Multistage


Statistical Inference
 Statistical inference refers to the process of selecting and using a
sample statistic to draw inferences about a population parameter. It
is concerned with using probability concepts to deal with
uncertainty in decision making.
 Statistical inference treats two different classes of problems,
namely hypothesis testing and estimation.
 Hypothesis Testing:-
 Hypothesis testing is to test some hypothesis about the
parent population from which the sample is drawn. It must be
noted that test of hypothesis also includes test of significance.
 Estimation:-
 The estimation theory deals with defining estimators for
unknown population parameters on the basis of sample study.

Test of Hypothesis
 Parameter and Statistic:-
 The statistical constants of the population, namely the mean
µ and variance σ², are usually referred to as parameters.
 Statistical measures computed from sample
observations alone, e.g. the mean (X̄), variance (S²) etc., are
usually referred to as statistics.
 Hypothesis: Null and Alternate
 Null hypothesis H0: Veg and non-veg people are equally
populated in the village.
 Alternative hypothesis H1: Veg and non-veg people are
not equally populated in the village.

Test of Hypothesis
 Errors in sampling:-
 The main objective in sampling theory is to draw valid inferences about the
population parameters on the basis of the sample results. In practice we decide to
accept (or) to reject the lot after examining a sample from it. As such we have two types
of errors:
 (i) type I error and (ii) type II error
 Type I Error:-
 A type I error is committed by rejecting the null hypothesis when it is true. The
probability of committing a type I error is denoted by α, where
 α = prob. (type I error)
 = prob. (rejecting H0 when H0 is true)
 Type II Error:-
 A type II error is committed by accepting the null hypothesis when it is false.
The probability of committing a type II error is denoted by β,
 where β = prob. (type II error)
 = prob. (accepting H0 when H0 is false)


t distribution
 The t distribution has the following properties:
 The mean of the distribution is equal to 0 .
 The variance is equal to v / ( v - 2 ), where v is the degrees of freedom
(see last section) and v >2.
 The variance is always greater than 1, although it is close to 1 when
there are many degrees of freedom. With infinite degrees of freedom,
the t distribution is the same as the standard normal distribution.
 The following are the important Applications of the t-
distribution:
 Test of the Hypothesis of the population mean.
 Test of Hypothesis of the difference between the two means.
 Test of Hypothesis of the difference between two means with dependent
samples.
 Test of Hypothesis about the coefficient of correlation.

t distribution
 Student's t-distribution (or simply the t-distribution) is any
member of a family of continuous probability distributions that
arises when estimating the mean of a normally
distributed population in situations where the sample size is small
and the population standard deviation is unknown.

t = (X̄ - μ) / (S/√n)

 Definition. If Z ~ N(0, 1) and U ~ χ²(r) are independent, then
the random variable

T = Z / √(U/r)

follows a t-distribution with r degrees of freedom. We
write T ~ t(r).


Sampling
 Test of Significance
 Let us now discuss the various situations where we have
to apply different tests of significance. For the sake of
convenience and clarity these situations may be summed up
under the following 3 heads:
 test of significance for attributes
 test of significance for variables (large samples)
 test of significance for variables (small samples)

Sampling
 SMALL SAMPLES

 Defn:
 When the size of the sample (n) is less than 30, then the
sample is called a small sample.
 The following are some important tests for small
samples:
 Student's t-test
 F-test
 χ²-test

Sampling
 Degrees of Freedom:-
 Degrees of freedom is the no. of independent
observations in a set.
 By degrees of freedom we mean the no. of classes in
which the values can be assigned arbitrarily (or) at will
without violating the restrictions (or) limitations placed.
 Degrees of freedom = no. of groups – no. of constraints


t TEST
 The t test is based on the assumption that we are comparing
means.
 The test statistic is defined by

t = |X̄ - μ| / (S/√n),  where

S = √( Σ(X - X̄)² / (n-1) )

t TEST-Example
 An outbreak of Salmonella-related illness was attributed to
ice cream produced at a certain factory. Scientists measured
the level of Salmonella in 9 randomly sampled batches of ice
cream. The levels (in MPN/g) were:
 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
 Is there evidence that the mean level of Salmonella in the ice
cream is greater than 0.3 MPN/g?

t TEST-Example

 Let be the mean level of Salmonella in all batches of ice


cream. Here the hypothesis of interest can be expressed as:
 H0:   0.3

 Ha:   0.3


t Table
 [t table shown on slide]

t TEST-Example
 t = 2.2051,  CALCULATED t VALUE

 t(0.05, 8) = 1.86,  TABULATED t VALUE

 Alternative hypothesis: true mean is greater than 0.3.
 Since 2.2051 > 1.86, reject H0: there is evidence that the mean
level of Salmonella exceeds 0.3 MPN/g.
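The calculated t can be reproduced with the standard library (a sketch; `stdev` uses the n-1 divisor, matching S above):

```python
from statistics import mean, stdev
from math import sqrt

levels = [0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418]
mu0 = 0.3  # hypothesized mean level (MPN/g)

t = (mean(levels) - mu0) / (stdev(levels) / sqrt(len(levels)))
print(round(t, 4))  # 2.2051
```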

t test
 Example:
 A manufacturer of a kind of bulb claims that his bulbs have a
mean life of 25 months with a standard deviation of 5
months. A random sample of 6 bulbs gave the following
lifetimes (in months). Is the claim valid?
 24, 26, 30, 20, 20, 18


t test
 Step 1: Ho: There is no significant difference between the
sample mean and the population mean.
 Step 2: Dof = n-1 = 6-1 = 5; LOS = 5% =0.05.

t test

t = (X̄ - μ) / (S/√n)

S = √( Σ(x - x̄)² / (n-1) )

t test

  X     x = X - X̄    x²
  24        1          1
  26        3          9
  30        7         49
  20       -3          9
  20       -3          9
  18       -5         25
 138                 102


t test

X̄ = ΣX/n = 138/6 = 23

S = √( Σx²/(n-1) ) = √(102/5) = 4.517

t = |23 - 25| / (4.517/√6) = 1.084

t test
 Calculated value t = 1.084
 Tabulated value t5,0.05 = 2.015
 Since CV < TV, accept Ho.
 Therefore, There is no significant difference
between the sample mean and the
population mean
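The same calculation in code (a sketch; `stdev` computes S with the n-1 divisor, so the result matches the hand calculation up to rounding):

```python
from statistics import mean, stdev
from math import sqrt

life = [24, 26, 30, 20, 20, 18]
mu0 = 25  # claimed mean life in months

t = abs(mean(life) - mu0) / (stdev(life) / sqrt(len(life)))
print(round(t, 3))  # ~1.085 (the slide rounds to 1.084)
```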



Independent t Test
 Compares the difference between the means of two
independent groups.
 The comparison distribution is a distribution of differences
between means.
 Population of measures for Group 1 and Group 2
 Sample means from Group 1 and Group 2
 Population of differences between sample means of Group 1 and
Group 2

Independent t Test
Paired-Sample vs Independent
 Paired-sample: two observations from each participant.
 The second observation is dependent upon the first,
since they come from the same person.
 Compares a mean difference to a distribution of mean
difference scores.
 Independent: a single observation from each participant,
from two independent groups.
 The observation from the second group is independent of the
first, since they come from different subjects.
 Compares the difference between two means to a distribution
of differences between mean scores.

Student's t-test

The Student's t-test compares the averages and standard deviations of two samples to see if there is a
significant difference between them.

We start by calculating a number, t.

t can be calculated using the equation:

t = (x̄1 - x̄2) / √( (s1)²/n1 + (s2)²/n2 )

Where:
 x̄1 is the mean of sample 1
 s1 is the standard deviation of sample 1
 n1 is the number of individuals in sample 1
 x̄2 is the mean of sample 2
 s2 is the standard deviation of sample 2
 n2 is the number of individuals in sample 2


Paired t test
 The t statistic for the paired t test is

t = d̄ / (S_d/√n)

where
 d = X1 - X2,
 d̄ is the average of the differences, and
 S_d is the standard deviation of the differences.

Two sample t test-example


 6 subjects were given a drug (treatment group) and an
additional 6 subjects a placebo (control group). Their
reaction time to a stimulus was measured (in ms). We want to
perform a two-sample t-test for comparing the means of the
treatment and control groups.
 Control = (91, 87, 99, 77, 88, 91)
 Treat = (101, 110, 103, 93, 99, 104)

Two sample t test-example


 A study was performed to test whether cars get better
mileage on premium gas than on regular gas. Each of 10 cars
was first filled with either regular or premium gas, decided
by a coin toss, and the mileage for that tank was recorded.
The mileage was recorded again for the same cars using the
other kind of gasoline. We use a paired t-test to determine
whether cars get significantly better mileage with premium
gas.
 reg = (16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
 prem = (19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
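Both examples can be computed with the standard library (a sketch; `two_sample_t` uses the unpooled formula given above, and `stdev` uses the n-1 divisor):

```python
from statistics import mean, stdev
from math import sqrt

def two_sample_t(a, b):
    """Unpooled two-sample t statistic, as in the formula above."""
    return (mean(a) - mean(b)) / sqrt(stdev(a)**2 / len(a) +
                                      stdev(b)**2 / len(b))

def paired_t(a, b):
    """Paired t statistic on the differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

control = [91, 87, 99, 77, 88, 91]
treat = [101, 110, 103, 93, 99, 104]
print(round(two_sample_t(treat, control), 3))  # ~3.446

reg = [16, 20, 21, 22, 23, 22, 27, 25, 27, 28]
prem = [19, 22, 24, 24, 25, 25, 26, 26, 28, 32]
print(round(paired_t(prem, reg), 3))  # ~4.472
```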

7/10/2020

Previous session - Statistics

 Population and sample
 Sampling techniques, parameters and statistics
 Null and Alternate Hypothesis, level of significance
 One sided and two sided tests of hypothesis on mean
 t-distribution

This session
Statistics - Continued

 Type I and II errors
 Central limit theorem
 Sampling distribution of mean and variance
 χ²-distribution


Recall
 Sampling

 Population -> Enumeration / Sampling
 Sampling -> Estimation / Test of Hypotheses
 Estimation -> Point Estimation / Interval Estimation

Statistical Inference
 Statistical Inferences refers to the process of selecting and using a
sample statistic to draw inference about a population parameter. It
is concerned with using probability concept to deal with
uncertainly in decision making.
 Statistical Inference treats two different classes of problems
namely hypothesis testing and estimation.
 Hypothesis Testing:-
 Hypothesis Testing is to test some hypothesis about the
parent population from which the sample is drawn. It must be
noted test of hypothesis also includes test of significance.
 Estimation:-
 The estimation theory deals with defining estimators for
unknown population parameters on the basis of sample study.

Test of Hypothesis
 Parameter and Statistics:-
 The statistical constants of the population, namely mean
µ, variance σ2 which are usually referred to as parameters.
 Statistical measures computed from sample
observations alone eg. mean (X), variance (S2) etc. are usually
referred to as statistic.
 Hypothesis: Null and Alternate
 Null hypothesis H0: Veg and non-veg people are equally
populated in the village.
 Alternative hypothesis H1 : Veg and non-veg people are
not equally populated in the village.


Test of Hypothesis
 Errors in sampling:-
 The main objective in sampling theory is to draw valid inferences about the
population parameters on the basis of the samples results. In practice we decide to
accept (or) to reject the lot after examining a sample from it. As such we have two types
of errors.
 (i) type I error and (ii) type II error
 Type I Error:-
 A type I error is committed by rejecting the null hypothesis when it is true. The
probability of committing a type I error is denoted by α,where
 α = prob (type I error)
 = prob. (rejecting H0/when H0 is true)
 Type II Error:-
 A type II error is committed by accepting the null hypothesis when it is false.
The probability of committing a type II error is denoted by β,
 where β = prob. (type II error)
 = prob. (accepting H0/when H0 is false)

Sampling
 Errors in sampling:-

Sampling
 Critical Region:-
 A region corresponding to a statistic in the sample space S
which leads to the rejection of H0 is called the Critical Region (Or)
Rejection Region.
 The region which leads to the acceptance of H0 is called the
acceptance region.
 Level of Significance:-
 The probability α that a random value of the statistic ‘t'
belongs to the critical region is known as the level of significance.
In other words, the level of significance is the size of the type I error.
The levels of significance usually employed in testing of hypothesis
are 5% and 1%.


Sampling
 One tailed test:-
 A test of any statistical hypothesis where the alternative
hypothesis is one tailed (right tailed (or) left tailed) is called a
one tailed test.
 Thus in a one tailed test, the rejection region will be
located in only one tail which may be depending upon the
alternative hypothesis formulated.
 We assume the null hypothesis
 H0 : μ = μ0 against the alternative hypothesis
 H1 : μ > μ0 (right tailed)
 H1 : μ < μ0 (left tailed); this is called a one tailed test

Sampling
 Two tailed test:-
 In a two tailed test the rejection region is located in
both the tails.
 In a test of statistical hypothesis where the alternative
hypothesis is two tailed, we assume that the null hypothesis.

 H0 : μ = μ0
 H1 : μ ≠ μ0 [ μ > μ0 (or) μ < μ0 ]

Sampling
 Procedure for testing of hypothesis :-
 Set up the null hypothesis.
 Choose the appropriate level of significance (either 5% or
1% level) and find the Degree of freedom.
 Compute the test statistic and find the Table value
 We compare the calculated value and tabulated value.
 If C.V. < T.V, H0 is accepted at 5% or 1%
 C.V. > T.V, H0 is rejected at 5% or 1%


Sampling
 Test of Significance
 Let us now discuss the various situations where we have
to apply different tests of significance. For the sake of
convenience and clarity these situations may be summed up
under the following 3 heads:
 test of significance for attributes
 test of significance for variables (large samples)
 test of significance for variables (small samples)

Sampling
 SMALL SAMPLES

 Defn:
 When the size of the sample (n) is less than 30, then the
sample is called a small sample.
 The following are some important tests for small
samples.
 Student’s t-test
 F-test
 χ²-test

Sampling
 Degrees of Freedom:-
 Degrees of freedom is the no. of independent
observations in a set.
 By degrees of freedom we mean the no. of classes in
which the values can be assigned arbitrarily (or) at will
without violating the restrictions (or) limitations placed.
 Degrees of freedom = no. of groups – no. of constraints


Sampling distribution
 The distribution of a statistic (measure) obtained from
repeated random sampling from a population is a sampling
distribution.
 If we draw 10 random samples from a population,
calculate the mean for each sample, then the sequence of the 10
means is a sampling distribution of means. Likewise for other
measures.

Sampling distribution
 Consider tossing a die.
The mean of a single throw
is (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.
When we increase the sample size
to 2, 3, …, there is movement
towards normality.


Sampling distribution
 The Sampling distribution of mean

Pumpkin              A    B    C    D    E    F
Weight (in pounds)   19   14   15   9    10   17
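The idea can be made concrete with these six weights: a stdlib-only sketch that enumerates every sample of size 2 drawn without replacement and checks that the mean of the resulting sample means equals the population mean:

```python
# Sampling distribution of the mean for the pumpkin weights.
from itertools import combinations
from statistics import mean

weights = [19, 14, 15, 9, 10, 17]   # pumpkins A..F, in pounds
sample_means = [mean(s) for s in combinations(weights, 2)]

print(mean(weights))        # population mean: 14
print(len(sample_means))    # 15 possible samples of size 2
print(mean(sample_means))   # also 14: the sample mean is an unbiased estimator
```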


Chi-square χ2 Distribution
 A standard normal deviate is a random sample from the standard
normal distribution. The Chi Square distribution is the distribution
of the sum of squared standard normal deviates. The degrees of
freedom of the distribution is equal to the number of standard
normal deviates being summed. Therefore, Chi Square with one
degree of freedom, written as χ2(1), is simply the distribution of a
single normal deviate squared.
 Consider the following problem: you sample two scores from a
standard normal distribution, square each score, and sum the
squares. What is the probability that the sum of these two squares
will be six or higher? Since two scores are sampled, the answer can
be found using the Chi Square distribution with two degrees of
freedom.
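For this two-degrees-of-freedom case the tail probability has a closed form, since χ²(2) is the exponential distribution with mean 2 (a sketch; with SciPy, `scipy.stats.chi2.sf(6, 2)` gives the same value):

```python
# P(Z1^2 + Z2^2 >= 6) for two squared standard normal deviates.
from math import exp

p = exp(-6 / 2)       # chi-square(2) tail: P(X >= x) = e^(-x/2)
print(round(p, 4))    # 0.0498
```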


Chi-square χ2 Distribution
 The mean of a Chi Square distribution is its degrees of
freedom. As the degrees of freedom increase, the Chi Square
distribution approaches a normal distribution.

Chi square test


 The χ²-test is one of the simplest and most widely used
non-parametric tests in statistical work. The quantity χ²
describes the magnitude of the discrepancy between
theory and observation. It is defined as

    χ² = Σ [(O − E)² / E]

where O = observed frequency and E = expected frequency.
 It is generally applied to check uniformity and
independence.

Chi square test


Goodness of fit
 The numbers of mistakes on a page for a sample of 6 random
pages are 5, 8, 6, 7, 9, 7. Are they uniformly distributed?
 Solution:
 Step 1: H0: The mistakes are uniformly distributed, i.e., there
is no significant difference between the observed frequency
and the expected frequency.
 Step 2: Dof = n-1 = 6-1 = 5; LOS = 5% = 0.05.


Chi square test


Goodness of fit
 Step 3: The calculated value.
 The expected frequency is E = [5+8+6+7+9+7] / 6 = 7

 O   E   O-E   [O-E]²   [O-E]²/E
 5   7   -2    4        4/7
 8   7    1    1        1/7
 6   7   -1    1        1/7
 7   7    0    0        0
 9   7    2    4        4/7
 7   7    0    0        0

 c.v. χ² = 10/7

Chi square test


Goodness of fit
 Step4: The tabulated value χ2=11.07(dof=5,los=0.05)

Chi square test


Goodness of fit
 Step 5: Since the calculated value χ² = 10/7 ≈ 1.43 is less than
the tabulated value χ² = 11.07, accept H0.
 Therefore, the mistakes are uniformly distributed.
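The five steps can be reproduced in a few stdlib lines (a sketch; `scipy.stats.chisquare([5, 8, 6, 7, 9, 7])` returns the same statistic):

```python
# Chi-square goodness-of-fit for the mistakes-per-page data.
observed = [5, 8, 6, 7, 9, 7]
expected = sum(observed) / len(observed)   # 7 under the uniformity hypothesis

chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi2, 2))   # 1.43 = 10/7, below the table value 11.07 -> accept H0
```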


Chi square
Independence of Attributes
 Chi-square test for independence of attributes:-
 Defn: Literally, an attribute means a quality (or)
characteristic.
 Ex: Sincerity, honesty etc.
 An attribute may be marked by its presence (or)
absence in a member of a given population.

Chi square
Independence of Attributes
 When two characters A and B are considered, we will
have a 2 x 2 contingency table of observed frequencies:

 B \ A   a1             a2             Total
 b1      a              b              Row total
 b2      c              d              Row total
 Total   Column total   Column total   Grand total

 Expected frequency = [row total x column total] / N

Chi square
Independence of Attributes
 Null Hypothesis H0: Attributes are independent
 Alternative Hypothesis
 H1: Attributes are not independent
 D.o.f. = (c-1) * (r-1)
 where
 c = no. of columns
 r = no. of rows


Chi square
Independence of Attributes-Example
 In an anti-malarial campaign in an area, quinine was
administered to 812 persons out of a total population of
3248. The number of fever cases is given below. Discuss the
usefulness of quinine in the campaign.

 Treatment    Fever   No Fever   Total
 Quinine      20      792        812
 No Quinine   220     2216       2436
 Total        240     3008       3248

Chi square
Independence of Attributes-Example
 Step 1: Ho: Quinine is not effective.
 Step 2: Dof = [r-1][c-1]
 =[2-1][2-1] = 1;
 LOS = 5% =0.05.
 Step 3: Calculated Value.
 The expected frequencies are

 Treatment    Fever                     No Fever                    Total
 Quinine      [812*240]/3248 = 60       [812*3008]/3248 = 752       812
 No Quinine   [2436*240]/3248 = 180     [2436*3008]/3248 = 2256     2436
 Total        240                       3008                        3248

Chi square
Independence of Attributes-Example
 The chi-square table

 O      E      O-E    [O-E]²   [O-E]²/E
 20     60     -40    1600     1600/60
 220    180     40    1600     1600/180
 792    752     40    1600     1600/752
 2216   2256   -40    1600     1600/2256

 c.v. χ² = 38.39


Chi square
Independence of Attributes-Example
 Calculated value χ² = 38.39

 Tabulated value χ²(1, 0.05) = 3.84

 Since CV > TV, reject H0.

 Therefore,
 Quinine is effective.
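The same test can be sketched in Python from the observed 2 x 2 table (stdlib only; `scipy.stats.chi2_contingency(table, correction=False)` reproduces the statistic):

```python
# Chi-square test of independence for the quinine data.
table = [[20, 792],     # quinine:    fever, no fever
         [220, 2216]]   # no quinine: fever, no fever

row = [sum(r) for r in table]         # 812, 2436
col = [sum(c) for c in zip(*table)]   # 240, 3008
N = sum(row)                          # 3248

chi2 = sum((table[i][j] - row[i] * col[j] / N) ** 2 / (row[i] * col[j] / N)
           for i in range(2) for j in range(2))
print(round(chi2, 2))   # 38.39 > 3.84 = chi2(1, 0.05) -> reject H0
```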


Central Limit Theorem


 Given a sequence of i.i.d. random variables with a mean and a
variance, the CLT says that the sample average has a
distribution which is approximately Normal, and gives the
new mean and variance:

 X1, …, Xn with mean μ and variance σ²  ⇒  X̄n ~ N(μ, σ²/n) approximately

 Notice that nothing at all need be assumed about the P, CDF,
or PDF associated with X, which could have any distribution
from which a mean and variance can be derived.


Central Limit Theorem


 The lifetime of a certain type of bulb for each plant of a company
is a random variable with mean 1200 hrs and standard deviation
250 hrs. Using the central limit theorem, find the probability that the
average lifetime of 60 bulbs of the company exceeds 1250 hrs.
 Let Xi = lifetime of bulb i from the plant.
 Then X̄ is the mean of the lifetimes of the 60 bulbs of the
company. By CLT,

 X̄ follows N(1200, 250/√60), i.e. mean 1200 and standard error 250/√60.

 P[X̄ > 1250] = P[ (X̄ − 1200)/(250/√60) > (1250 − 1200)/(250/√60) ]
             = P[z > √60/5] = P[z > 1.55] ≈ 0.06
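The computation can be checked with the standard-library normal CDF, Φ(z) = (1 + erf(z/√2))/2 (a sketch; the slide's 0.06 uses the table value for z = 1.55):

```python
# P(average lifetime of 60 bulbs > 1250) via the CLT.
from math import erf, sqrt

mu, sigma, n = 1200, 250, 60
z = (1250 - mu) / (sigma / sqrt(n))       # = sqrt(60)/5 = 1.55
p = 1 - (1 + erf(z / sqrt(2))) / 2        # upper-tail normal probability

print(round(z, 2), round(p, 2))   # 1.55 0.06
```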

Probability and Statistics

Dr.J.Vijayarangam
BITS Pilani jvijayarangam@wilp.bits-pilani.ac.in
Pilani|Dubai|Goa|Hyderabad
Previous Session

 Types of errors
 Sampling Distributions-Mean and Variance
 Chi-Square Distribution
 Central Limit Theorem
This Session
 Point estimates of mean and variance, maximum likelihood
 Interval estimation, confidence intervals
 Test for Proportions
Estimation
In studying a population, estimation is about estimating the
parameter values of the population using the statistic values of
a sample.

There are two types of Estimations


Point Estimation and Interval Estimation.

Point Estimates try to get a single value for the parameter


whereas the Interval estimation gets a Confidence Interval.

Methods: Least squares and Maximum likelihood.


Estimation
Properties of Estimators:

Unbiasedness: the expected value of the estimator equals the parameter.

Consistency: the estimator approaches the parameter as n gets larger.

Efficiency: of two unbiased estimators, the one with smaller variance is
more efficient.

Sufficiency: the estimator conveys as much information as possible about
the parameter from the sample.
Estimation
Point Estimation-Methods
Method of Moments
Method of Maximum Likelihood Estimation
Method of Minimum Variance
Method of Least Squares
Point Estimator
A point estimator is any function T(Y1, Y2, .., YN) of a sample.
Any statistic is a point estimator.

Assume that Y1, Y2, .., YN are i.i.d. N(μ, σ²) random variables.

The sample mean (or average)

    Ȳ = (1/N) Σ_{i=1}^{N} Yi

is a point estimator (or an estimator) of μ.


Point Estimation
Sample mean X̄ is the point estimator for the mean.
The point estimator for the variance is the mean of the random
variable Y = (X − μ)², given by

    σ̂² = (1/n) Σ_{k=1}^{n} (Xk − μ)²
Point Estimation
One measure for the quality of an estimator X̂ is its bias, or how far
off its estimate is on average from the true value X:
bias(X̂) = E[X̂] − X
where the expected value is over the randomness involved in X̂.

However, even if we have an unbiased estimator, its individual
estimates can still be far off from the true value. To quantify how
consistently an estimator is close to the true value, another statistic
is required. Commonly, the mean squared error of the estimator is
considered here:
MSE[X̂] = E[(X̂ − X)²]
It is defined as the mean squared distance between the estimate and
the value to be estimated; for an unbiased estimator it equals the
variance of the estimator.
Point Estimation
Bias-Variance Tradeoff
Statistical models can be seen as estimators. They use
observations, or data, to make predictions.

If the model has a high bias, its predictions are off, which
corresponds to underfitting.

If overfitting occurred, i.e. the data is matched too well, the


estimates have a high variance.
Point Estimation
Bias-Variance Tradeoff
In other words, we accept a certain bias of the model to keep its variance
low. A good tradeoff between the two needs to be achieved.
Point Estimation
Example
Let T be the time that is needed for a specific task in a factory to be
completed. In order to estimate the mean and variance of T, we observe a
random sample T1, T2, ⋯, T6. Thus, the Ti's are i.i.d. and have the same
distribution as T. We obtain the following values (in minutes):
18, 21, 17, 16, 24, 20.
Find the values of the sample mean, the sample variance, and the sample
standard deviation for the observed sample.

    T̄ = (18 + 21 + 17 + 16 + 24 + 20) / 6 = 19.33

    S² = (1/(6 − 1)) Σ_{k=1}^{6} (Tk − 19.33)² = 8.67

    S = √8.67 = 2.94
Maximum Likelihood
The likelihood of a set of data is the probability of obtaining
that particular set of data, given the chosen probability
distribution model.
This expression contains the unknown model parameters. The
values of these parameters that maximize the sample
likelihood are known as the Maximum Likelihood
Estimates or MLEs.
Likelihood(θ )= probability of observing the given data as a
function of ‘θ ’.
Maximum Likelihood
The maximum likelihood estimate (mle) of θ is that value of θ
that maximises likelihood(θ).
It is defined as

    L(θ) = ∏_{i=1}^{n} f(xi | θ)

    log L(θ) = Σ_{i=1}^{n} log f(xi | θ)
Maximum Likelihood
Example 1
MLE: P(x | μ, σ) = L(μ, σ | x)
Suppose we have x = 32. If we assume mean = 28 and SD = 2, then the
normal density gives L ≈ 0.03.
Maximum Likelihood
Example2
Consider a sample 0,1,0,0,1,0 from a binomial distribution, with the form
P[X=0]=(1-p), P[X=1]=p. Find the maximum likelihood estimate of p.

Soln :
L(p) = P[X=0] P[X=1] P[X=0] P[X=0] P[X=1] P[X=0]
     = (1-p) p (1-p) (1-p) p (1-p)
     = (1-p)³ p².
Log L(p) = log[(1-p)³ p²] = 3 log(1-p) + 2 log p
∂LogL(p)/∂p = 0 means −3/(1−p) + 2/p = 0, i.e. (2 − 5p)/(p(1−p)) = 0, so p = 2/5.

That is, p = 2/5 is the value of the success probability under which
this sample is most likely, if we believe the population to be
Binomially distributed.
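The analytic answer can be verified numerically by evaluating L(p) = (1 − p)³p² on a grid and picking the maximiser (a minimal sketch):

```python
# Numerical check of the binomial MLE for the sample 0,1,0,0,1,0.
likelihood = lambda p: (1 - p) ** 3 * p ** 2

p_grid = [i / 10000 for i in range(1, 10000)]   # 0.0001 .. 0.9999
p_hat = max(p_grid, key=likelihood)

print(p_hat)   # 0.4, matching the analytic p = 2/5
```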
Interval Estimation
In brief , we can say
Interval estimate = Point estimate ± Margin of Error
Margin of Error = Critical Value * Standard Error of the statistic

 Let xi, i = 1, 2, … n be a random sample of size n from f(x, θ). If T1(x) and
T2(x) are any two statistics such that T1(x) ≤ T2(x) and
P(T1(x) < θ < T2(x)) = 1 – α,
where α is the level of significance, then the random interval (T1(x), T2(x)) is
called a 100(1-α)% confidence interval for θ.
Here, T1 is called the lower confidence limit and T2 the upper confidence
limit. (1-α) is called the confidence coefficient.
Confidence Interval
A confidence interval is a type of estimate computed from the
statistics of the observed data.
Confidence Interval
Example
Consider a sample 3.7, 5.3, 4.7, 3.3, 5.3, 5.1, 5.1, 4.9, 4.2, 5.7.
The average is 4.73 and SD = 0.766.
If we go for a 90% confidence interval, then α = 0.10. The confidence
interval for the average is given by

    X̄ − t_{α/2,dof} S/√n ≤ μ ≤ X̄ + t_{α/2,dof} S/√n

So,
    4.73 − 1.83 × 0.766/√10 ≤ μ ≤ 4.73 + 1.83 × 0.766/√10
    4.73 − 0.4433 ≤ μ ≤ 4.73 + 0.4433
    4.2867 ≤ μ ≤ 5.1733
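The interval can be recomputed in Python (stdlib only; the critical value t(0.05, 9) = 1.833 is taken from tables, which the slide rounds to 1.83):

```python
# 90% confidence interval for the mean of the sample.
from math import sqrt
from statistics import mean, stdev

x = [3.7, 5.3, 4.7, 3.3, 5.3, 5.1, 5.1, 4.9, 4.2, 5.7]
t_crit = 1.833                                # t_{alpha/2, dof=9}, alpha = 0.10
margin = t_crit * stdev(x) / sqrt(len(x))

lo, hi = mean(x) - margin, mean(x) + margin
print(round(lo, 2), round(hi, 2))   # 4.29 5.17
```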
BITS Pilani
Pilani Campus

Tests for Attributes


Test For Attributes
Based on attributes(qualities) we may have three types of tests
– Test for number of successes
– Test for proportion of successes
– Test for difference between proportions
Test For Attributes
Number of successes
Procedure:
Step1: Form the Hypothesis
Step2: Find the standard error
Step3: Find Difference/S.E
Step4:
If step3 value <1.96, Accept H0, else reject H0 at 5% LOS.
If step3 value <2.58, Accept H0, else reject H0 at 1% LOS.
Test For Attributes
Number of successes-Example
A coin was tossed 400 times and head appeared 216 times.
Test the hypothesis that the coin is unbiased.
Solution:
Step1: Coin is unbiased.
Step2: SE=sqrt(npq)[ as we are in Binomial distribution domain]
= sqrt(400*1/2*1/2)=10
Step3: Diff/S.E=(216-200)/10=1.6
Step 4: As Diff/S.E is <1.96, accept H0 at 5% LOS.
Tests for proportions
Just think about the questions,
Is the proportion of babies born male different from .50?
Are more than 80% of Indians right handed?
Is the percentage of customers who prefer chocolate ice cream
over vanilla less than 80%?
Tests for proportions
z test
Recalling Hypotheses,
Tests for proportions
z test
Test Statistic:

    z = (p̂ − p0) / √( p0 (1 − p0) / n )

Where
p̂ = sample proportion
p0 = population proportion
n = sample size
Tests for proportions
z test-Example 1
Is the proportion of babies born male different from .50? In a
sample of 200 babies, 96 were male.
H0: p = .50
Ha: p ≠ .50

p̂ = 96/200 = .48, p0 = .50, n = 200

z = (.48 − .50) / √(.50 × .50 / 200) = −0.566
Tests for proportions
z test-Example 1
Using the standard normal distribution, we want to find the
probability of obtaining a z score of -0.566 or more extreme
(i.e., less than -0.566).
Tests for proportions
z test-Example 1
P(z < −0.566) = .2843
Because this is a two-tailed test we must take into account both
the left and right tails. To do so, we multiply the value above
by two (p = .2843 + .2843 = .5686).
Our p-value is .5686.
Since our p-value [.5686] is > 0.05, accept
the null hypothesis.
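The whole test fits in a few stdlib lines, using Φ(z) = (1 + erf(z/√2))/2 for the normal CDF (a sketch; the exact p-value differs slightly from the slide's table-based .5686):

```python
# Two-tailed one-proportion z-test: are male births different from 50%?
from math import erf, sqrt

p0, n, successes = 0.50, 200, 96
p_hat = successes / n                          # 0.48

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # about -0.566
phi = lambda t: (1 + erf(t / sqrt(2))) / 2     # standard normal CDF
p_value = 2 * phi(-abs(z))                     # two-tailed

print(round(z, 3), round(p_value, 2))   # -0.566 0.57 > 0.05 -> accept H0
```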
Tests for proportions
z test-Example 2
Are more than 80% of Indians right handed?
In a sample of 100 Indians, 87 were right handed.
H0: p = .80
Ha: p > .80

p0 = .80, n = 100; p̂ = 87/100 = .87

z = (.87 − .80) / √(.80 × .20 / 100) = 1.75
Tests for proportions
z test-Example 2
According to the table,
P(z<1.75)=.9599.
Therefore,
P(z≥1.75)=1−.9599
=.0401
Tests for proportions
z test-Example 2
Since p = .0401 ≤ 0.05, reject the null hypothesis.

Tests for Two proportions


Tests for two proportions
For a right or left-tailed test, a minimum of 10 successes and 10 failures in
each group are necessary (i.e., np≥10 and n(1−p)≥10).
Two-tailed tests are more robust and require only a minimum of 5 successes
and 5 failures in each group.
The two groups that are being compared must be unpaired and unrelated
(i.e., independent).
Tests for two proportions
Pooled sample proportion.
Since the null hypothesis states that P1=P2, we use a pooled
sample proportion (p) to compute the standard error of the
sampling distribution.
p = (p1 * n1 + p2 * n2) / (n1 + n2)
where p1 is the sample proportion from population 1,
p2 is the sample proportion from population 2,
n1 is the size of sample 1, and
n2 is the size of sample 2.
Tests for two proportions
Standard error. Compute the standard error (SE) of the
sampling distribution difference between two proportions.

SE = sqrt{ p * ( 1 - p ) * [ (1/n1)+ (1/n2) ] }


where
p is the pooled sample proportion,
n1 is the size of sample 1, and
n2 is the size of sample 2.
Tests for two proportions
Test statistic. The test statistic is a z-score (z) defined by the
following equation.
Z=(p1-p2)/SE
where
p1 is the proportion from sample 1,
p2 is the proportion from sample 2, and
SE is the standard error of the sampling distribution.

P-value. The P-value is the probability of observing a sample


statistic as extreme as the test statistic. Since the test statistic
is a z-score, use the Normal Distribution table to assess the
probability associated with the z-score.
Tests for two proportions
Example 1
Suppose the Acme Drug Company develops a new drug,
designed to prevent colds. The company states that the drug
is equally effective for men and women. To test this claim,
they choose a simple random sample of 100 women and 200
men from a population of 100,000 volunteers.
At the end of the study, 38% of the women caught a cold; and
51% of the men caught a cold. Based on these findings, can
we reject the company's claim that the drug is equally
effective for men and women? Use a 0.05 level of
significance.
Tests for two proportions
Example 1
State the hypotheses. The first step is to state the null
hypothesis and an alternative hypothesis.
Null hypothesis: P1 = P2
Alternative hypothesis: P1 ≠ P2
Note that these hypotheses constitute a two-tailed test. The null
hypothesis will be rejected if the proportion from population 1
is too big or if it is too small.
Tests for two proportions
Example 1
Using sample data, we calculate the pooled sample proportion (p) and
the standard error (SE). Using those measures, we compute the z-
score test statistic (z).
p = (p1 * n1 + p2 * n2) / (n1 + n2)
= [(0.38 * 100) + (0.51 * 200)] / (100 + 200) = 140/300 = 0.467
SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }
SE = sqrt [ 0.467 * 0.533 * ( 1/100 + 1/200 ) ]
= sqrt [0.003733] = 0.061
z = (p1 - p2) / SE = (0.38 - 0.51)/0.061 = -2.13
where p1 is the sample proportion in sample 1, where p2 is the sample
proportion in sample 2, n1 is the size of sample 1, and n2 is the size
of sample 2.
.
Tests for two proportions
Example 1
Since we have a two-tailed test, the P-value is the probability
that the z-score is less than -2.13 or greater than 2.13.
We use the Normal Distribution table to find
P(z < -2.13) = 0.017, and P(z > 2.13) = 0.017.
Thus, the P-value = 0.017 + 0.017 = 0.034.
Since the P-value (0.034) is less than the significance level
(0.05), reject the null hypothesis.

.
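The arithmetic above can be sketched directly with the standard library:

```python
# Pooled two-proportion z-test for the Acme drug example.
from math import erf, sqrt

p1, n1 = 0.38, 100   # women
p2, n2 = 0.51, 200   # men

p = (p1 * n1 + p2 * n2) / (n1 + n2)            # pooled proportion, 0.467
se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # standard error, about 0.061
z = (p1 - p2) / se                             # about -2.13

phi = lambda t: (1 + erf(t / sqrt(2))) / 2     # standard normal CDF
p_value = 2 * phi(-abs(z))                     # two-tailed, about 0.033

print(round(z, 2), round(p_value, 3))   # -2.13 0.033 < 0.05 -> reject H0
```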
Tests for two proportions
Example 2
Suppose the previous example is stated a little bit differently.
Suppose the Acme Drug Company develops a new drug,
designed to prevent colds. The company states that the drug
is more effective for women than for men. To test this claim,
they choose a simple random sample of 100 women and
200 men from a population of 100,000 volunteers.
At the end of the study, 38% of the women caught a cold; and
51% of the men caught a cold. Based on these findings, can
we conclude that the drug is more effective for women than
for men? Use a 0.01 level of significance.
Tests for two proportions
Example 2
The first step is to state the null hypothesis and an alternative
hypothesis.
Null hypothesis: P1 >= P2
Alternative hypothesis: P1 < P2
Note that these hypotheses constitute a one-tailed test. The null
hypothesis will be rejected if the proportion of women
catching cold (p1) is sufficiently smaller than the proportion of
men catching cold (p2).
Tests for two proportions
Example 2
Using sample data, we calculate the pooled sample proportion
(p) and the standard error (SE). Using those measures, we
compute the z-score test statistic (z).
p = (p1 * n1 + p2 * n2) / (n1 + n2) = [(0.38 * 100) + (0.51 * 200)] /
(100 + 200) = 140/300 = 0.467

SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] }


SE = sqrt [ 0.467 * 0.533 * ( 1/100 + 1/200 ) ] = sqrt
[0.003733] = 0.061

z = (p1 - p2) / SE = (0.38 - 0.51)/0.061 = -2.13


where p1 is the sample proportion in sample 1, where p 2 is the
sample proportion in sample 2, n1 is the size of sample 1, and
n2 is the size of sample 2.
Tests for two proportions
Example 2
Since we have a one-tailed test, the P-value is the probability
that the z-score is less than -2.13. We use the Normal
Distribution table to find P(z < -2.13) = 0.017.
Thus, the P-value = 0.017.

Since the P-value (0.017) is greater than the significance level


(0.01), accept the null hypothesis.
Exercise
A Manufacturer claims only 4% of apples of his are defective.
From a random sample of 600, 36 are defective. Is his claim
true?

What test shall we apply?


Exercise
A Manufacturer claims only 4% of apples of his are defective. From a
random sample of 600, 36 are defective. Is his claim true?

Solution:
Step 1: H0: 4% of apples are defective.
Step 2: Standard error

    S.E. = √(pq/n) = √(0.96 × 0.04 / 600) = 0.008

Step 3: The 95% CI for the proportion of good apples is
    0.96 ± 1.96 × S.E. = 0.96 ± 1.96 × 0.008
    = 0.9443 to 0.9757
Step 4: With n = 600, the boundary [for good ones] is
[0.9443 × 600, 0.9757 × 600] = [567, 585]
Since the corresponding range for defectives is [15, 33] and 36 is
outside it, reject H0.
THANKS
Previous Session
 Point estimates of mean and variance, maximum likelihood
 Interval estimation, confidence intervals
 Test for Attributes
This Session
Correlation
Regression
Tests of Hypothesis for correlation

Correlation
Correlation
Measures the degree of association between two interval-scaled
variables: the analysis of the relationship between two
quantitative outcomes, e.g., height and weight.
Correlation
Graphically, we plot the two variables in a scatter plot to look for
correlation.
Correlation
Karl Pearson’s correlation coefficient
Karl Pearson’s

    r = [ N ΣXY − ΣX ΣY ] / √( [N ΣX² − (ΣX)²] [N ΣY² − (ΣY)²] )

r lies between -1 and 1.
Values near 0 mean no (linear)
correlation and values near ±1 mean very strong
correlation.
Karl Pearson’s correlation coefficient
Example
Find the correlation between X and Y using

    r = [ N ΣXY − ΣX ΣY ] / √( [N ΣX² − (ΣX)²] [N ΣY² − (ΣY)²] )

 X   1   2   3   4   5
 Y   2   5   3   8   7

 X    Y    XY   X²   Y²
 1    2    2    1    4
 2    5    10   4    25
 3    3    9    9    9
 4    8    32   16   64
 5    7    35   25   49
 15   25   88   55   151

    r = (5 × 88 − 15 × 25) / √( (5 × 55 − 15²)(5 × 151 − 25²) ) = 0.8062

As the r value is positive, they are positively correlated.
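The same r follows from the column sums in a few lines (stdlib only; `numpy.corrcoef(X, Y)[0, 1]` agrees):

```python
# Karl Pearson's r from the raw sums of the tabulated data.
from math import sqrt

X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]
n = len(X)

sx, sy = sum(X), sum(Y)                  # 15, 25
sxy = sum(a * b for a, b in zip(X, Y))   # 88
sxx = sum(a * a for a in X)              # 55
syy = sum(b * b for b in Y)              # 151

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r, 4))   # 0.8062
```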
Correlation
Spearman’s correlation coefficient(ranks)
Spearman's rank correlation coefficient or Spearman's rho, is
a measure of statistical dependence between two variables
based on ranks or relative values
Spearman’s correlation coefficient(ranks)
Example
Find Spearman's rank correlation between IQ (X) and hours
spent watching TV (Y):

r = -0.1757
Correlation
Assumptions
Assumption 1: The correlation coefficient r assumes that the two
variables measured form a bivariate normal distribution
population.
Assumption 2: The correlation coefficient r measures only linear
associations: how nearly the data falls on a straight line. It is
not a good summary of the association if the scatterplot has a
nonlinear (curved) pattern.
Assumption 3: The correlation coefficient r is not a good
summary of association if the data are heteroscedastic, i.e. the
variance is not constant. (Its opposite, homoscedasticity, is also
known as homogeneity of variance.)
Assumption 4: The correlation coefficient r is not a good
summary of association if the data have outliers.

Regression
Regression
Regression follows correlation in identifying the causal
relationship between the two correlated variables.
The dependence of dependent variable Y on the independent
variable X.
Relationship is summarized by a regression equation.
y = a + bx
a=intercept at y axis
b=regression coefficient
Regression
The line of regression is the line which gives the best estimate
of the value of one variable for any specific value of the other
variable. Thus the line of regression is the line of “best fit” and
is obtained by the principle of least squares.
This principle consists in minimizing the sum of the squares of
the deviations of the actual values of y from their estimated
values given by the line of best fit.
Regression
The principle of least squares.
This principle consists in minimizing the sum of the squares of
the deviations of the actual values of y from their estimate
values given by the line of best fit
Regression
Procedure
Step 1: Write the normal equations for the regression line
y = mx + c as

    Σy = m Σx + n c
    Σxy = m Σx² + c Σx

Step 2: Form the regression table to get the values.

Step 3: Substitute the values in the normal equations and solve them
to find ‘m’ and ‘c’ to fit the line.
Regression
Example
Find the regression line for

 X   1   2   3   4   5
 Y   2   5   3   8   7

Step 1: Write the normal equations for the regression line.

Step 2: Form the computation table.

 X    Y    XY   X²
 1    2    2    1
 2    5    10   4
 3    3    9    9
 4    8    32   16
 5    7    35   25
 15   25   88   55
Regression
Example
Step 3: Substitute the values in the normal equations and solve them
to find ‘m’ and ‘c’ to fit the line:

    Σy = m Σx + n c  ⇒  25 = 15m + 5c
    Σxy = m Σx² + c Σx  ⇒  88 = 55m + 15c

Solving them simultaneously, we get the regression line of “y on
x” as
y = 1.1 + 1.3x
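Solving the two normal equations in closed form gives the same line (a stdlib sketch; `numpy.polyfit(X, Y, 1)` returns the same slope and intercept):

```python
# Least-squares line y = m*x + c from the normal equations.
X = [1, 2, 3, 4, 5]
Y = [2, 5, 3, 8, 7]
n = len(X)

sx, sy = sum(X), sum(Y)                  # 15, 25
sxy = sum(a * b for a, b in zip(X, Y))   # 88
sxx = sum(a * a for a in X)              # 55

m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)    # slope = 65/50 = 1.3
c = (sxx * sy - sx * sxy) / (n * sxx - sx ** 2)  # intercept = 55/50 = 1.1

print(c, m)   # 1.1 1.3 -> y = 1.1 + 1.3x
```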

Test for Correlation


Test for correlation
The test for correlation is a t-test with

    t = r √(n − 2) / √(1 − r²)

where r is the correlation and DOF = n − 2.
Test for correlation
Example
A study of the heights of 18 pairs husbands and wives in a
company has a correlation of 0.52. Apply t test to check
whether it is significant.
Solution:
H0: There is no significant difference in the correlation
coefficient (the population correlation is zero).

    t = r √(n − 2) / √(1 − r²) = 0.52 × √(18 − 2) / √(1 − 0.52²) = 2.44

For DOF = n − 2 = 18 − 2 = 16 and LOS = 5%, the table value is
t(16, 0.05) = 2.12.

Since CV > TV, reject H0.
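The computation is one line in Python (a stdlib sketch; the critical value t(16, 0.05) = 2.12 comes from t-tables):

```python
# t statistic for testing the significance of r = 0.52 with n = 18 pairs.
from math import sqrt

r, n = 0.52, 18
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

print(round(t, 2))   # 2.44 > 2.12 = t(16, 0.05) -> reject H0
```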

Correlation- Other methods


Correlation-Other Methods
Covariance/Concurrent deviation
Covariance based

Concurrent Deviation Method

    r_c = ± √( ± (2C − m) / m )

where C is the number of positive signs
in the dx·dy column and
m is the number of pairs of observations.
Correlation
Covariance Method (R)

x <- c(25, 27, 29); y <- c(5, 15, 9)
xdev <- x - mean(x); ydev <- y - mean(y)
xdev_ydev <- xdev * ydev
sum_xdev_ydev <- sum(xdev_ydev)
cov_xy <- sum_xdev_ydev / (3 - 1)   # cov_xy = 4
stnd.dev <- sd(x) * sd(y)
Corr <- cov_xy / stnd.dev           # Corr = 0.39
Correlation
Concurrent Deviation Method
Example

    r_c = −√( −(2 × 2 − 7) / 7 ) = −0.65

The Syllabus
Probability and Statistics
Probability
Random Variable
Probability distributions
Joint Distributions(Bi-variate)
Sampling
Tests of Hypotheses
Estimation
Correlation
Regression
Others:
Chebyshev’s Theorem, Poisson Processes, Simulation,
Central limit theorem
THANKS
