# Lecture Notes by Dr. Abhijit Kar Gupta, kg.abhi@gmail.com

Probability theory and applications

For randomly occurring events, we would like to know how many times we get a desired result out of the total number of trials. That is, we would like to know the fraction of favourable events or trials. Suppose we flip a coin a number of times. We may count how many times there is a "Head" or a "Tail" out of all the flips. Let $n$ = no. of favourable events and $N$ = total no. of events.

$n/N$ = fraction of favourable events. This is the relative frequency in the usual language of Statistics. Now, if we do the trials a large number of times, this ratio tends to a fixed value specific to the event. This is the concept of probability.
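The way the relative frequency settles down as the number of trials grows can be seen in a small simulation. The following is only a sketch: the function name `relative_frequency`, the fixed seed and the trial counts are illustrative choices, not part of the notes, and a pseudo-random generator stands in for a real coin.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Fraction of Heads in num_trials simulated flips of a fair coin."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(1 for _ in range(num_trials) if rng.random() < 0.5)
    return heads / num_trials

if __name__ == "__main__":
    # The ratio n/N should drift towards 1/2 as N increases.
    for n in (10, 100, 10_000, 100_000):
        print(n, relative_frequency(n))
```

For small $N$ the ratio can be far from 1/2; only for large $N$ does it stay close to the fixed value.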

Note: The total no. of trials is also called the "sample space" when we are drawing samples out of the total "population". As the no. of trials is increased, the sample space becomes bigger.

Definition of Probability: Probability is the ratio of the number of favourable events to the total number of events, provided the total number of events is very large (actually infinite):

$$P = \frac{n}{N}, \quad \text{when } N \to \infty \text{ (infinity)}.$$

So by definition, $P$ is a fraction between 0 and 1: $0 \le P \le 1$.

$P = 0$: no favourable outcome. $P = 1$: all the outcomes are in favour.

We can also think in the following way: $p$ = probability of an event occurring, $q$ = probability of the event not occurring. Since the event will either occur or not occur, we must write:

$$p + q = 1.$$

Therefore, we have $q = 1 - p$.

Example #1: In a coin toss, we know from our experience that $P(\text{Head}) = 1/2$. So $p = 1/2$ and $q = 1 - p = 1/2$.


Example #2: In a throw of a die, we know that the probability of the die facing "1" up, "2" up, "3" up etc. will be $1/6$, $1/6$, $1/6$ and so on. Here, the probability of "1" not occurring = $1 - 1/6 = 5/6$.

Note: The condition that the total probability of all the events has to be 1 is called normalization of probabilities.

Rules of Probability: When more than one event takes place, we need to calculate the joint probability for all the events.

Mutually Exclusive Events

Two events are mutually exclusive (or disjoint) when they cannot occur at the same time. Suppose the two events are A and B and their individual probabilities are designated $P(A)$ and $P(B)$. Mutually exclusive means $P(A \cap B) = 0$.

Addition Rule: $P(A \cup B) = P(A) + P(B)$

Example #1: The probability of either Head or Tail occurring in a coin toss, $P(H \cup T) = P(H) + P(T) = 1/2 + 1/2 = 1$.

Example #2: The probability of either "1" or "6" occurring in a die throw, $P(1 \cup 6) = P(1) + P(6) = 1/6 + 1/6 = 1/3$.

Independent Events

When the occurrence of one event does not influence the other, but they can occur at the same time, they are called independent. For example, rainfall today and Manchester United winning a match.

Multiplication Rule: $P(A \cap B) = P(A)\,P(B)$

Example #1: What is the probability that two Heads will occur when we toss two coins together? $P(H) = 1/2$ for the first coin and $P(H) = 1/2$ for the second coin, so $P(H \cap H) = 1/2 \times 1/2 = 1/4$. Note that if we flip a single coin two times and ask for the probability of getting Heads twice, we get the same answer.

Example #2: Now we ask: what is the probability of getting one Head and one Tail when flipping two coins together? The probability of obtaining Head on the first coin and Tail on the second is $P(H \cap T) = 1/2 \times 1/2 = 1/4$. The probability of obtaining Tail on the first and Head on the second is $P(T \cap H) = 1/2 \times 1/2 = 1/4$. These two events are mutually exclusive, so the total probability is $1/4 + 1/4 = 1/2$. Note that in the flipping of two coins together there are 4 types of events: HH, HT, TH, TT. Out of these, the relative occurrence of one Head and one Tail is $2/4 = 1/2$.

Events that are not Mutually Exclusive

If the events are not mutually exclusive, there is some overlap. Suppose we designate an area A corresponding to the probability of some event A and an area B to the probability of another event B. The overlap between the two areas then represents the joint probability $P(A \cap B)$. Note that for two mutually exclusive events the overlap would be zero.

Fig.

Addition Rule in this case: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
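The two-coin results and the addition rule for overlapping events can be checked by listing the four equally likely outcomes HH, HT, TH, TT and counting. This is a minimal sketch; the helper `prob` and the event names are hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes for two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on one outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def first_head(o):
    return o[0] == "H"

def second_head(o):
    return o[1] == "H"

# Independent events: P(A and B) = P(A) * P(B).
p_both = prob(lambda o: first_head(o) and second_head(o))

# Overlapping events: P(A or B) = P(A) + P(B) - P(A and B).
p_either = prob(lambda o: first_head(o) or second_head(o))

# One Head and one Tail in either order (HT or TH).
p_one_each = prob(lambda o: sorted(o) == ["H", "T"])

print(p_both, p_either, p_one_each)
```

Here A = "first coin shows Head" and B = "second coin shows Head" are independent but not mutually exclusive, since the outcome HH belongs to both.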

Events that are not Independent

Multiplication rule: $P(A \cap B) = P(A)\,P(B|A)$

$P(B|A)$ is "the probability of B given A". This is a conditional probability, i.e., the probability of B occurring provided A occurs first. Similarly, $P(A|B)$ is "the probability of A given B".

Note here that $P(B|A) = P(B)$ when B does not depend on A, and $P(A|B) = P(A)$ when A does not depend on B. In both cases A and B are independent.

So we can write the formula for conditional probability:

$$P(B|A) = \frac{P(A \cap B)}{P(A)}$$

Now, as examples, consider the following table. In a survey of 100 people, each person was asked whether they are a graduate or not.

| | Male | Female | Total |
|---|---|---|---|
| Graduate | 40 | 10 | 50 |
| Non-graduate | 20 | 30 | 50 |
| Total | 60 | 40 | 100 |

Q.1 What is the probability that a randomly selected person is a male?
Ans. $60/100 = 0.6$.

Q.2 What is the probability that a randomly selected person is a female?
Ans. $40/100 = 0.4$.

Q.3 What is the probability that a randomly selected person is a male who is a graduate?
Ans. $40/100 = 0.4$.

Q.4 What is the probability that a randomly selected person is a female who is a non-graduate?
Ans. $30/100 = 0.3$.

Q.5 What is the probability that the randomly selected person is either a male graduate or a female non-graduate?
Ans. These two events are mutually exclusive, and by the law of addition, $0.4 + 0.3 = 0.7$.

Q.6 If we now select two persons, what is the probability that one of them is a male graduate and another is a female non-graduate?
Ans. Two independent events are occurring together. So by the law of multiplication of probabilities, $0.4 \times 0.3 = 0.12$.

Q.7 What is the probability that a randomly selected non-graduate is a female?
Ans. This is the no. of females out of the total non-graduates, $30/50 = 0.6$.

Q.8 What is the probability that a randomly selected graduate is a male?
Ans. This is the no. of males out of the total graduates, $40/50 = 0.8$.

Note: In Q.7 and Q.8, each probability is a conditional probability. However, we gave the answers by looking at the table directly. Now we answer them in terms of the law of conditional probability.

Ans. to Q.8: Suppose A = graduate, B = male, and $P(B|A)$ = probability of male given that they are graduates. We use the formula:

$$P(B|A) = \frac{P(A \cap B)}{P(A)}$$

Here, $P(A \cap B)$ = prob. of male graduates = $40/100 = 0.4$ and $P(A)$ = prob. of graduates = $50/100 = 0.5$, so $P(B|A) = 0.4/0.5 = 0.8$.

Exercise: Q.7 can also be answered in terms of the conditional probability formula. Do this and check yourself.

Q.9 What is the probability that the selected person is either male or graduate?
Ans. Here the two events are not mutually exclusive, since a person can be both male and a graduate. So we use the formula: $P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.5 + 0.6 - 0.4 = 0.7$.
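The answers read off the survey table can be reproduced by counting. The sketch below stores the four cell counts from the table and evaluates a few of the probabilities; the dictionary layout and the helper names (`prob`, `is_male`, `is_graduate`) are assumptions made for illustration only.

```python
from fractions import Fraction

# Survey counts copied from the table: (sex, education) -> number of people.
counts = {
    ("male", "graduate"): 40,
    ("female", "graduate"): 10,
    ("male", "non-graduate"): 20,
    ("female", "non-graduate"): 30,
}
total = sum(counts.values())  # 100 people in all

def prob(event):
    """P(event), where event is a true/false test on a (sex, education) pair."""
    return Fraction(sum(n for key, n in counts.items() if event(key)), total)

def is_male(key):
    return key[0] == "male"

def is_graduate(key):
    return key[1] == "graduate"

p_male = prob(is_male)                                   # Q.1
p_graduate = prob(is_graduate)
p_male_and_graduate = prob(lambda k: is_male(k) and is_graduate(k))  # Q.3

# Conditional probability, Q.8: P(male | graduate) = P(male and graduate) / P(graduate)
p_male_given_graduate = p_male_and_graduate / p_graduate

# General addition rule, Q.9: P(male or graduate)
p_male_or_graduate = p_male + p_graduate - p_male_and_graduate

print(p_male, p_male_given_graduate, p_male_or_graduate)
```

Exact fractions are used so that the results can be compared with the table ratios without rounding.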

Probability Distributions

Let us think of the probabilities for a number of events marked 1, 2, 3, ... and so on. For each event $i$ we can have a probability $P_i$, and for all the events together, $\sum_i P_i = 1$. So we have a set of probabilities corresponding to a set of events. This collection is a probability distribution for all those discrete events.

Fig.

Suppose, instead of discrete events, we think that $x$ is a variable which can take continuous values and there is a probability $P(x)$ for each value of $x$. Now if we plot $P(x)$ against $x$, we get a continuous curve, which is the continuous probability distribution curve (commonly referred to as the probability distribution curve).

Fig.

Here the area under the curve is the following definite integral:

$$\int_a^b P(x)\,dx$$

This is the total probability for all the values of $x$ between the two limits $a$ and $b$. That is why $P(x)$ is often referred to as the probability density. So $P(x)\,dx$ is the probability of lying between $x$ and $x + dx$, where $dx$ is an infinitesimally small range. Note that for the discrete case, the above is the sum over all the mutually exclusive events between the two limits [the sum $\sum_i P_i$].

Also, $\int_{-\infty}^{\infty} P(x)\,dx = 1$ (Normalization)

The above means that the total area under the curve (extended from negative infinity to positive infinity, that is, over the entire stretch of the curve) is unity. This is true because in the discrete case we know that the sum of the probabilities of all the events should be 1.

For discrete events, we calculated the relative frequencies and then drew the bar diagram from them. For the continuous case, the bars merge together to form a continuous spectrum, and that is the probability distribution. The relative frequencies tend to the probabilities for the corresponding values of the variable for a large number of events.

Now, given the probability distribution curve, we would like to know about the shape and size of the curve through some specific quantities that are representative of the character of the event. For any discrete data set, we measure the central tendency of the data set. We calculate the mean, mean of square and variance.

Mathematical Expectation and Mean

If the probabilities $p_1$, $p_2$, $p_3$ etc. are known for the values $x_1$, $x_2$, $x_3$ and so on, we write

Expectation: $E(x) = \sum_i p_i x_i$, where $\sum_i p_i = 1$.

However, when instead of probabilities we are given the frequencies $f_1$, $f_2$, $f_3$, ... for the quantities $x_1$, $x_2$, $x_3$, ... that appear in a data set, we calculate

Mean: $\langle x \rangle = \dfrac{1}{N} \sum_i f_i x_i$,

where $f_i$ is the frequency of occurrence of the event $x_i$ and $N = \sum_i f_i$ is the total frequency.

Mean of Square: $\langle x^2 \rangle = \dfrac{1}{N} \sum_i f_i x_i^2$

Variance: $\mathrm{Var}(x) = \sigma^2 = \dfrac{1}{N} \sum_i f_i \,(x_i - \langle x \rangle)^2 = \langle x^2 \rangle - \langle x \rangle^2$

Standard deviation $\sigma$ is the square root of the variance.

Now, for a large number of events, the ratio $f_i/N$ in each of the above formulas becomes the corresponding probability:

$$\frac{f_i}{N} \to p_i \quad \text{as } N \text{ tends to very large.}$$

Therefore, we write the above quantities in terms of probabilities:

Mean, $\langle x \rangle = \sum_i p_i x_i$

Mean of Square, $\langle x^2 \rangle = \sum_i p_i x_i^2$

Variance, $\sigma^2 = \langle x^2 \rangle - \langle x \rangle^2$

Standard deviation, $\sigma = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$

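Now we calculate the above quantities from the following dice-throwing experiments.

Example #1 (Throwing of a single die)

| $x$ | $P(x)$ | $x\,P(x)$ | $x^2 P(x)$ |
|---|---|---|---|
| 1 | 1/6 | 1/6 | 1/6 |
| 2 | 1/6 | 2/6 | 4/6 |
| 3 | 1/6 | 3/6 | 9/6 |
| 4 | 1/6 | 4/6 | 16/6 |
| 5 | 1/6 | 5/6 | 25/6 |
| 6 | 1/6 | 6/6 | 36/6 |
| Total | 1 | 21/6 | 91/6 |

From the table, we can calculate the mean, $\langle x \rangle = 21/6 = 3.5$, the variance, $\sigma^2 = 91/6 - (21/6)^2 = 35/12 \approx 2.92$, and the standard deviation, $\sigma \approx 1.71$.

If we plot $P(x)$ against $x$, we obtain the probability distribution for this case. This distribution is uninteresting, as the probabilities for all values of $x$ are the same! The curve obtained by joining the points will be a horizontal straight line.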
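The column totals for the single die can be verified directly from the definitions of the mean, the mean of square and the variance. This is a minimal sketch using exact fractions; the function name `moments` is a hypothetical helper, not something from the notes.

```python
from fractions import Fraction
import math

def moments(dist):
    """Return (mean, mean of square, variance) for a dict {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    mean_sq = sum(x * x * p for x, p in dist.items())
    return mean, mean_sq, mean_sq - mean ** 2

# A fair die: each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean, mean_sq, variance = moments(die)
std_dev = math.sqrt(variance)

print(mean, mean_sq, variance, round(std_dev, 2))
```

The exact values are 7/2 for the mean, 91/6 for the mean of square and 35/12 for the variance.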

Fig.

Now we do a similar experiment with two dice.

Example #2 (Two Dice)

We look for the value of $x$, which is the sum of the two numbers on the top faces of the two dice. Here we have $6 \times 6 = 36$ possible combinations of events, and $x$ can have a minimum value $x = 2$ and a maximum value $x = 12$.

| $x$ | $P(x)$ | $x\,P(x)$ | $x^2 P(x)$ |
|---|---|---|---|
| 2 | 1/36 | 2/36 | 4/36 |
| 3 | 2/36 | 6/36 | 18/36 |
| 4 | 3/36 | 12/36 | 48/36 |
| 5 | 4/36 | 20/36 | 100/36 |
| 6 | 5/36 | 30/36 | 180/36 |
| 7 | 6/36 | 42/36 | 294/36 |
| 8 | 5/36 | 40/36 | 320/36 |
| 9 | 4/36 | 36/36 | 324/36 |
| 10 | 3/36 | 30/36 | 300/36 |
| 11 | 2/36 | 22/36 | 242/36 |
| 12 | 1/36 | 12/36 | 144/36 |
| Total | 1 | 252/36 | 1974/36 |

Mean, $\langle x \rangle = 252/36 = 7$. Variance, $\sigma^2 = 1974/36 - 7^2 = 35/6 \approx 5.83$.

Now if we plot $P(x)$ against $x$, taking the values from the above table, we get an interesting symmetric distribution around a peak! The peak is at $x = 7$ (the mean value).
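The two-dice table can be generated by enumerating all 36 ordered pairs of faces and counting how often each sum appears. The sketch below does this with exact fractions; the variable names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 ordered pairs (a, b) give each sum a + b.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each sum x = 2, ..., 12.
dist = {x: Fraction(count, 36) for x, count in sorted(ways.items())}

mean = sum(x * p for x, p in dist.items())
mean_sq = sum(x * x * p for x, p in dist.items())
variance = mean_sq - mean ** 2

for x, p in dist.items():
    print(x, p)
print("mean =", mean, "variance =", variance)
```

Note that `Fraction` prints reduced values (for example 1/18 instead of 2/36), so the printed probabilities look different from the table but are equal to it.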

Fig.

We can go on doing such experiments with 3 or more dice together, asking for the sum of the values occurring on all the dice and calculating the corresponding probabilities as above. The distribution becomes smoother, retaining the symmetry with the peak at the mean. In fact, the envelope of the probability values at different $x$ (joining the tops of the bars) of the discrete distribution slowly assumes the form of a continuous symmetric curve! In the limit of a large number of dice thrown together, we tend to get a continuous bell-shaped symmetric distribution. This is the normal distribution.

Fig.

When a quantity is the sum of a large number of independent random events, its probability distribution tends to the normal distribution. This is called the Central Limit Theorem.

For many naturally occurring quantities, and for random measurements of a value in an experiment, the distribution that occurs is approximately Normal. The bell-shaped symmetric curve is called the Normal curve. If we calculate the height distribution or age distribution among a population, the probability distribution often turns out to be close to Normal. The name "normal" is given as it occurs normally.

Properties of Normal Distribution:

- Symmetric about the mean (the mean is at the centre, i.e., at the peak position).
- Approximate area under the curve:
  A = 68% within one standard deviation ($\sigma$) from the mean ($\mu$) on both sides,
  A = 95% within two standard deviations,
  A = 99.7% within three standard deviations.

Fig.

Mathematical Expression for Normal distribution:

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad (1)$$

where $\mu$ = mean, $\sigma$ = standard deviation.

- The above expression is symmetric around the mean, $x = \mu$. [The value of the exponential depends only on $(x-\mu)^2$.]
- Normal distributions are often referred to by the symbol $N(\mu, \sigma^2)$.

The total area under the curve, $\int_{-\infty}^{\infty} P(x)\,dx = 1$. If we put $z = (x-\mu)/\sigma$, we get $dx = \sigma\,dz$.

Thus we can write the rescaled probability,

$$P(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}. \qquad (2)$$

Now the above is a symmetric distribution around $z = 0$.

So the Normal distribution (1) has become a "Z-distribution" in (2). This is nothing but a normal distribution with mean = 0 and standard deviation = 1.

We have to remember that the area under the curve between two values of $z$ gives us the total probability:

Area = $\int_{z_1}^{z_2} P(z)\,dz$.

Now, instead of actually doing the integration over $z$, we are supplied with the $z$-score and we find the area under the curve (hence the total probability) between two limits from the table. (See the z-score table.)
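When a printed table is not at hand, the same areas can be computed from the error function, since the area under the standard normal curve between 0 and $z$ equals $\tfrac{1}{2}\,\mathrm{erf}(z/\sqrt{2})$. The sketch below uses Python's standard `math.erf`; the function names `area_zero_to` and `area_between` are hypothetical and chosen only for this illustration.

```python
import math

def area_zero_to(z):
    """Area under the standard normal curve between 0 and z (always positive)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def area_between(z1, z2):
    """Probability that z lies between z1 and z2 (in either order)."""
    return abs(standard_normal_cdf(z2) - standard_normal_cdf(z1))

if __name__ == "__main__":
    # These reproduce typical z-table entries to four decimal places.
    print(round(area_zero_to(1.94), 4))
    print(round(area_zero_to(0.81), 4))
    print(round(area_between(0.81, 1.94), 4))
```

As with the table, zero is the reference point in `area_zero_to`, while `area_between` does the adding or subtracting automatically.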
Consider the following typical situations where we have to calculate the areas from the z-distribution:

- Fig. (Total area under the curve = 1)
- Fig. (Area between $z = -\infty$ and $z = 0$ is 0.5, or area between $z = 0$ and $z = +\infty$ is 0.5, because of symmetry)
- Fig. (Area between $z = 0$ and any other value of $z$)
- Fig. (Area between two positive values of $z$ or between two negative values)
- Fig. (Area between a negative value and a positive value)
- Fig. (Area less than a negative or greater than a positive value)

Important:

In the z-score table we always look for the area between zero and any other value (as the integral is actually done that way). So zero is always the reference point. Finally, the area between any two values of $z$ is obtained by adding or subtracting the areas involving zero. This will be clear from the following examples.

Examples: (Follow the z-score table)

#1. In the Geography examination, the marks distribution is known to be Normal, where the mean is 52 and the standard deviation is 15. Determine the z-scores of students receiving marks: (i) 40, (ii) 95, (iii) 52.

Solution: Here $\mu = 52$, $\sigma = 15$ and $z = (x - \mu)/\sigma$.
(i) $z = (40 - 52)/15 = -0.8$
(ii) $z = (95 - 52)/15 \approx 2.87$
(iii) $z = (52 - 52)/15 = 0$

So we see the z-scores can be negative, positive or zero.

#2. Find the area under the normal curve in each of the following cases:

(i) Area between $z = 0$ and the given value of $z$.
Fig.
Area = 0.0349 from the table.

(ii) Area between $z = 0$ and the given negative value of $z$.
Fig.
Area = 0.2518. (Note: The area is equal to the area between $z = 0$ and the corresponding positive value, as the curve is symmetric.)

(iii) Area between $z = -0.46$ and $z = 2.21$.
Fig.
Area = (area between $z = 0$ and 2.21) + (area between $z = 0$ and $-0.46$) = 0.4864 + 0.1772 = 0.6636. (Note: The areas are added as they are on both sides of $z = 0$.)

(iv) Area between $z = 0.81$ and $z = 1.94$.
Fig.
Required area = (area between $z = 0$ and 1.94) $-$ (area between $z = 0$ and 0.81) = 0.4738 $-$ 0.2910 = 0.1828. (Note: There is a subtraction as the two areas are on the same side of $z = 0$.)

(v) To the left of $z = -0.6$.
Fig.
Required area = 0.5 $-$ (area between $z = 0$ and $-0.6$) = 0.5 $-$ 0.2257 = 0.2743.

(vi) To the right of $z = -1.28$.
Fig.
Required area = (area between $z = -1.28$ and $z = 0$) + 0.5 = (area between $z = 0$ and $z = 1.28$) + 0.5 = 0.3997 + 0.5 = 0.8997.

#3. Among 1000 students, the mean score in the final examination is 25 and the standard deviation is 4.0. Assume the distribution is Normal. Find the following.

(a) How many students score between 22 and 27?
Here $\mu = 25$, $\sigma = 4.0$, so $z_1 = (22 - 25)/4 = -0.75$ and $z_2 = (27 - 25)/4 = 0.5$.
Fig.
So the probability is the area under the curve between $-0.75$ and 0.5 = (area between 0 and $-0.75$) + (area between 0 and 0.5) = 0.2734 + 0.1915 = 0.4649.
The number of students in this marks range = $1000 \times 0.4649 \approx 465$.

(b) How many students score above 30?
$z = (30 - 25)/4 = 1.25$.
Fig.
Probability = area to the right of $z = 1.25$ = 0.5 $-$ (area between 0 and 1.25) = 0.5 $-$ 0.3944 = 0.1056.
The number of students = $1000 \times 0.1056 \approx 106$.

(c) How many students score below 15?
$z = (15 - 25)/4 = -2.5$.
Fig.
Area = 0.5 $-$ (area between 0 and $-2.5$) = 0.5 $-$ 0.4938 = 0.0062.
The number of students = $1000 \times 0.0062 \approx 6$.

(d) How many score 24?
Here we have to calculate the area between 23.5 and 24.5: $z_1 = (23.5 - 25)/4 = -0.375 \approx -0.38$ and $z_2 = (24.5 - 25)/4 = -0.125 \approx -0.13$.
Area between $z_1$ and $z_2$ = (area between 0 and $-0.38$) $-$ (area between 0 and $-0.13$) = 0.1480 $-$ 0.0517 = 0.0963.
The number of students = $1000 \times 0.0963 \approx 96$.
Fig.
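The four parts of the 1000-student example can be cross-checked numerically without the z-table. This is a sketch only: it evaluates the Normal cumulative probability with `math.erf` instead of looking up rounded z-values, so the areas differ slightly in the fourth decimal place from the table-based working, and the names `normal_cdf` and `expected_count` are assumptions introduced here.

```python
import math

N_STUDENTS = 1000
MEAN = 25.0
STD_DEV = 4.0

def normal_cdf(x, mean=MEAN, std_dev=STD_DEV):
    """Probability that a Normal variable with the given mean and s.d. is below x."""
    z = (x - mean) / std_dev
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def expected_count(low, high):
    """Expected number of students with a score between low and high."""
    return N_STUDENTS * (normal_cdf(high) - normal_cdf(low))

between_22_and_27 = expected_count(22, 27)               # part (a)
above_30 = N_STUDENTS * (1.0 - normal_cdf(30))           # part (b)
below_15 = N_STUDENTS * normal_cdf(15)                   # part (c)
score_24 = expected_count(23.5, 24.5)                    # part (d)

for label, value in [("22 to 27", between_22_and_27), ("above 30", above_30),
                     ("below 15", below_15), ("score 24", score_24)]:
    print(label, round(value))
```

In part (d) the single score 24 is treated as the interval 23.5 to 24.5, exactly as in the worked solution.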

Symmetry of Distribution, Skewness

We have seen that a Normal distribution is symmetric around its peak (the most probable value, i.e., the value for which the probability is the highest). In a symmetric distribution the mean, median and mode are at the same position. Skewness is any deviation from symmetry, or, we can say, lack of symmetry.

Coefficient of skewness = $\dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$

The above coefficient can be positive or negative. For a symmetric distribution, skewness is zero. Below are two figures demonstrating negative and positive skewness; the distributions are correspondingly called negatively skewed and positively skewed distributions.

Figs. (Negative Skewness: Mean < Mode) (Positive Skewness: Mean > Mode)

Note: The distribution we are discussing is a unimodal distribution, that is, a distribution which has a single mode or one peak. But in many practical cases, we can have a distribution with many peaks or many modes. For example, a distribution with two peaks (in fig.) is called a bimodal distribution.

Figs. (Bimodal distribution)


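Combination Rules:

When we scale a variable, that is, multiply it by a number or add a number to it, we need to know how the scaled variable behaves. Does it have the same statistical measures? Does it follow the same kind of distribution? We ask the same questions for two or more variables when they are scaled and added together to form a combined variable.

When $Y = aX + b$:
- Mean: $E(Y) = a\,E(X) + b$
- Variance: $\mathrm{Var}(Y) = a^2\,\mathrm{Var}(X)$
- If $X$ has a Normal distribution, $Y$ is also a Normal distribution.

When $Z = aX + bY$, with $X$ and $Y$ independent:
- Mean: $E(Z) = a\,E(X) + b\,E(Y)$
- Variance: $\mathrm{Var}(Z) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$
- If $X$ and $Y$ are separately Normal distributions, $Z$ is then also a Normal distribution.

When $Z = X_1 + X_2 + \dots + X_n$, with the $X_i$ independent and each having mean $\mu$ and variance $\sigma^2$:
- Mean: $E(Z) = n\mu$
- Variance: $\mathrm{Var}(Z) = n\sigma^2$
- If each $X_i$ has a Normal distribution, $Z$ is also a Normal distribution.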
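The standard rules for a scaled sum of two independent variables, $E(aX + bY) = a\,E(X) + b\,E(Y)$ and $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$, can be checked exactly on a small example. The sketch below takes $X$ and $Y$ to be two independent fair dice and the multipliers $a = 2$, $b = 3$; these choices and the helper `mean_and_variance` are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
P_FACE = Fraction(1, 6)   # probability of each face of a fair die
A, B = 2, 3               # illustrative multipliers (an assumption)

def mean_and_variance(weighted_values):
    """Mean and variance from an iterable of (value, probability) pairs."""
    pairs = list(weighted_values)
    mean = sum(v * p for v, p in pairs)
    variance = sum(v * v * p for v, p in pairs) - mean ** 2
    return mean, variance

# One die on its own.
mean_x, var_x = mean_and_variance((x, P_FACE) for x in FACES)

# Z = A*X + B*Y for two independent dice: each ordered pair has probability 1/36.
mean_z, var_z = mean_and_variance(
    (A * x + B * y, P_FACE * P_FACE) for x, y in product(FACES, repeat=2)
)

print(mean_z, A * mean_x + B * mean_x)
print(var_z, (A ** 2 + B ** 2) * var_x)
```

The enumeration treats the two dice as independent by giving every ordered pair the product probability; the variance rule would not hold in this form for dependent variables.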

Following the combination rules in the above box, we can solve the following problem.

Example: The weight of individual people follows a Normal distribution. What will be the probability distribution of the weight of 10 people taken together?

Ans. The weights of the 10 people are added together, so their means add and their variances add.
Mean weight of 10 people = 40.
Variance = 500.

The probability distribution of the weight of 10 people taken together is therefore a Normal distribution with this mean and variance.

Hypothesis Testing

What is Hypothesis?

On the basis of sample information, we make certain decisions about the population. In taking such decisions we make certain assumptions. These assumptions are known as statistical hypotheses. [Note: A collected set of data points which is a part of the population (a small number of data points) is called a sample. The process of selection is called sampling. When all the data are considered for a study, this is called the population.]

How to test Hypothesis?

Assuming the hypothesis is correct, we calculate the probability of getting the observed sample. If this probability is less than a certain assigned value, the hypothesis is rejected. The hypothesis that there is no significant difference between the observed value and the expected value is called the Null Hypothesis.

Test of significance:

The tests which enable us to decide whether to accept or to reject the null hypothesis are called tests of significance. If the differences between the sample values and the population values are significantly large, the null hypothesis is to be rejected.

Fig.


Student t-test:

Let $x_1, x_2, \dots, x_n$ be the elements of a set of data from a random sampling. The sample is drawn from a population that is assumed to obey a Normal distribution. Let $\mu$ = the actual mean of the distribution, $\bar{x}$ = the sample mean. A parameter $t$ is calculated as follows:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$

where $s$ = standard deviation and $n$ = sample size.

Example:
Q. The average life span of a citizen of India is 70. The average value obtained from a sample of 100 people is 75. The standard deviation is 40. Find if the claim is accepted using the level of significance of 0.05.

Ans. Here $\mu = 70$, $\bar{x} = 75$, $s = 40$ and $n = 100$, so

$$t = \frac{75 - 70}{40/\sqrt{100}} = 1.25.$$

Now at the level of significance 0.05, we look up the tabular value of $t$ (from the standard table). The calculated value of $t$ is less than the tabular value of $t$. Thus the claim is accepted at the 5% level of significance.
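The arithmetic of the life-span example can be sketched in a few lines, assuming the usual form of the statistic, $t = (\bar{x} - \mu)/(s/\sqrt{n})$. The critical value 1.96 used below is an assumption (the large-sample two-sided value at the 0.05 level); it is not taken from the notes and should be replaced by the value read from the t-table for the actual degrees of freedom. The function name `t_statistic` is likewise hypothetical.

```python
import math

def t_statistic(sample_mean, population_mean, std_dev, sample_size):
    """t = (sample mean - population mean) / (std_dev / sqrt(sample size))."""
    return (sample_mean - population_mean) / (std_dev / math.sqrt(sample_size))

# Numbers from the life-span example.
t_value = t_statistic(sample_mean=75, population_mean=70, std_dev=40, sample_size=100)

# ASSUMPTION: large-sample two-sided critical value at the 0.05 level.
T_CRITICAL = 1.96

claim_accepted = abs(t_value) < T_CRITICAL
print(t_value, claim_accepted)
```

The claim is accepted when the calculated value lies below the tabular value, which is the comparison made in the worked answer.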
The $\chi^2$-test (Chi-square test):

Here we evaluate the following quantity:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$

where $O_i$ = observed frequency and $E_i$ = expected frequency.

Now let us define another parameter called "degree of freedom". Degree of freedom = no. of independent observations = no. of observations $-$ no. of independent constraints. In practical calculations, we often estimate the degree of freedom from the number of columns ($c$) and the number of rows ($r$) in a data table: Degree of freedom = $(c - 1)(r - 1)$.

Example:
Q. Given are the amounts of rainfall (in mm) on different days in a week. Check if the rainfall is uniformly distributed over the week. Given that the values of $\chi^2$ significant at 5, 6, 7 degrees of freedom are respectively 11.07, 12.59, 14.07 at the 5% level of significance.

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Rainfall (in mm) | 14 | 16 | 8 | 12 | 11 | 9 | 14 |

If the distribution is to be uniform, the expected frequency has to be $E = (14 + 16 + 8 + 12 + 11 + 9 + 14)/7 = 84/7 = 12$.

$$\chi^2 = \frac{(14-12)^2 + (16-12)^2 + (8-12)^2 + (12-12)^2 + (11-12)^2 + (9-12)^2 + (14-12)^2}{12} = \frac{50}{12} \approx 4.17$$

Here the degrees of freedom = $7 - 1 = 6$, and the tabulated value of $\chi^2$ for 6 degrees of freedom is 12.59. As the calculated value 4.17 < 12.59, we can accept the claim. We then say that the Null Hypothesis is accepted.
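The rainfall calculation can be reproduced with a short script. It uses only the data and the tabulated critical values quoted in the question; the variable names are illustrative assumptions.

```python
# Rainfall (in mm) on days 1 to 7, as given in the question.
rainfall = [14, 16, 8, 12, 11, 9, 14]

# Under the hypothesis of uniform rainfall, every day has the same expected value.
expected = sum(rainfall) / len(rainfall)

# Chi-square: sum over days of (observed - expected)^2 / expected.
chi_square = sum((observed - expected) ** 2 / expected for observed in rainfall)

# One constraint (the fixed total), so degrees of freedom = number of days - 1.
degrees_of_freedom = len(rainfall) - 1

# Critical values at the 5% level, as quoted in the question.
critical_values = {5: 11.07, 6: 12.59, 7: 14.07}

hypothesis_accepted = chi_square < critical_values[degrees_of_freedom]
print(expected, round(chi_square, 2), degrees_of_freedom, hypothesis_accepted)
```

A calculated value below the tabulated one means the observed departures from uniform rainfall are not significant at the 5% level.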
