
A review of basic probability and statistics, random variables and their properties, and estimation of means, variances and correlation.





Probability is the chance that something will happen, that is, how likely it is that some event will happen.

Sometimes you can measure a probability with a number: "10% chance of rain", or you can use words such as impossible, unlikely, possible, even chance, likely and certain.
Example: "It is unlikely to rain tomorrow".

Outcome: The end result of an experiment. For example, if the experiment consists of throwing a die, the outcome would be any one of the six faces, F1, ..., F6.
Random experiment: An experiment whose outcomes are not known in advance (e.g. tossing a coin, throwing a die, measuring the noise voltage at the terminals of a resistor).

Random event: A random event is an outcome or set of outcomes of a random experiment that share a common attribute. For example, considering the experiment of throwing a die, an event could be 'face F1' or 'even-indexed faces' (F2, F4, F6). We denote events by upper-case letters such as A, B or A1, A2, and so on.

Sample space: The sample space of a random experiment is a mathematical abstraction used to represent all possible outcomes of the experiment. We denote the sample space by S.

Mutually exclusive (disjoint) events: Two events A and B are said to be mutually exclusive if they have no common elements (or outcomes). Hence, if A and B are mutually exclusive, they cannot occur together.

Union of events: The union of two events A and B, denoted $A \cup B$ (also written as A + B or "A or B"), is the set of all outcomes which belong to A or B or both. This concept can be generalized to the union of more than two events.

Intersection of events: The intersection of two events A and B is the set of all outcomes which belong to A as well as B. The intersection of A and B is denoted by $A \cap B$ or simply AB, and is also referred to as the joint event A and B. This concept can be generalized to the intersection of three or more events.

Occurrence of an event: An event A of a random experiment is said to have occurred if the experiment terminates in an outcome that belongs to A.

Complement of an event: The complement of an event A, denoted by $\bar{A}$, is the event containing all points in S but not in A.

Null event: The null event, denoted $\emptyset$, is an event with no sample points. Thus $\emptyset = \bar{S}$ (note that if A and B are disjoint events, then $AB = \emptyset$, and vice versa).
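Events are sets of outcomes, so the definitions above map directly onto set operations. The following is a minimal Python sketch. It assumes the die experiment, with face Fi represented by the integer i; the variable names are illustrative and not from the original.

```python
# Sample space S for one throw of a die; face Fi is represented by the integer i.
S = frozenset({1, 2, 3, 4, 5, 6})

A = frozenset({1})        # event "face F1"
B = frozenset({2, 4, 6})  # event "even-indexed faces"

union = A | B             # A ∪ B: outcomes in A or B or both
intersection = A & B      # A ∩ B (the joint event AB): outcomes in both A and B
complement_B = S - B      # complement of B: outcomes in S but not in B
null_event = S - S        # complement of S: the null event, with no sample points

# A and B are mutually exclusive exactly when AB is the null event.
mutually_exclusive = intersection == null_event

print(sorted(union), sorted(intersection), sorted(complement_B), mutually_exclusive)
# prints: [1, 2, 4, 6] [] [1, 3, 5] True
```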

There are two approaches to the probability of an event E:

1. Classical Approach
2. Frequency Approach

Classical Approach:

$$P(E) = \frac{n(E)}{n(S)}$$

i.e. the probability of an event is the ratio of the number of ways the event can happen to the number of outcomes in the sample space. The classical approach assumes that all outcomes are equally likely.

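Example: If, out of all possible jumbles of the word BIRD, a random word is picked, what is the probability that this word will start with B?

n(S) = number of all possible jumbles of BIRD = 4! = 24
n(E) = number of jumbles starting with B = 3! = 6 (B is fixed in the first position and the remaining three letters can be arranged in 3! ways)

$$P(E) = \frac{n(E)}{n(S)} = \frac{3!}{4!} = \frac{6}{24} = \frac{1}{4}$$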
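For a sample space this small, the classical answer can be checked by listing every outcome. The following is a minimal Python sketch that uses the standard library's `itertools.permutations`. Like the classical approach, it assumes that every jumble is equally likely.

```python
from fractions import Fraction
from itertools import permutations

word = "BIRD"

# Sample space: every arrangement (jumble) of the four distinct letters, 4! of them.
sample_space = ["".join(p) for p in permutations(word)]

# Event E: the arrangements whose first letter is B, 3! of them.
event = [w for w in sample_space if w.startswith("B")]

# Classical approach: P(E) = n(E) / n(S), valid because outcomes are equally likely.
p_event = Fraction(len(event), len(sample_space))
print(len(sample_space), len(event), p_event)  # prints 24 6 1/4
```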

Frequency Approach: Since all outcomes may not always be equally likely, a more general approach is the frequency approach, where probability is defined as the relative frequency of occurrence of E:

$$P(E) = \lim_{N \to \infty} \frac{n(E)}{N}$$

where N is the number of times the experiment is performed and n(E) is the number of times the event E occurs.
Example: From the following table, find the probability of obtaining an A grade in the exam.

Grade              A    B    C    D
No. of students    10   20   30   40

Solution: N = total number of students = 10 + 20 + 30 + 40 = 100. By the frequency approach, P(A grade) = n(A grade)/N = 10/100 = 0.1.
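The frequency approach can be illustrated by simulation: repeat an experiment N times and watch n(E)/N settle near a value. The following is a minimal sketch. It assumes a fair die simulated with Python's `random` module and takes the event E = "face 6"; both are illustrative choices, not from the original. Because N is always finite, each result is only an estimate of the limit. The last lines apply the same ratio to the grade table above.

```python
import random

def relative_frequency(n_trials, seed=1):
    """Estimate P(face 6) for a fair die as n(E) / N over n_trials simulated throws."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    n_event = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return n_event / n_trials

# As N grows, n(E)/N should settle near 1/6 (about 0.1667).
for n_trials in (100, 1_000, 100_000):
    print(n_trials, relative_frequency(n_trials))

# Grade example: the count for grade A plays the role of n(E); the total is N.
students = {"A": 10, "B": 20, "C": 30, "D": 40}
p_a_grade = students["A"] / sum(students.values())
print(p_a_grade)  # prints 0.1
```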

A random variable is a rule that assigns a number to each of the outcomes of a random process.
ex: roll a die; assign the numbers 1, 2, 3, 4, 5, 6 as the random variable D
ex: students attending class on a given day; assign the number of people who actually attend as the random variable C
ex: a person walking in the door; assign 0 if the person is male, 1 if the person is female as the random variable G

Once we have defined a random variable, we can examine the random process through its properties. The pattern of probabilities assigned to the values of the random variable is called the probability distribution of the random variable.
ex: roll a die; P(D = 1) = 1/6, P(D = 2) = 1/6, etc.
ex: students attending class on a given day; P(C = 0) = . . .
ex: a person walking in the door; P(G = 0) = 0.7, P(G = 1) = 0.3 (the two probabilities must sum to 1)

Types Of Random Variables:

A random variable may be discrete or continuous.

Discrete Random Variable: A variable that can take one value from a discrete set of values. Example: Let x denote the sum of two dice. Then x is a discrete random variable, as it can take one value from the set {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, since the sum of two dice can only be one of these values.

Continuous Random Variable: A variable that can take one value from a continuous range of values. Example: Let x denote the volume of Pepsi in a 500 ml cup. Then x may be any number from 0 to 500 (ml).
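The probability distribution of the discrete random variable "sum of two dice" can be tabulated by enumerating every outcome. The following is a minimal Python sketch. It assumes two fair dice, so that all 36 ordered outcomes are equally likely; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of throwing two dice.
outcomes = list(product(range(1, 7), repeat=2))

# The random variable x assigns to each outcome the sum of its two faces.
counts = Counter(a + b for a, b in outcomes)

# Probability distribution of x: P(x = k) = (number of outcomes with sum k) / 36.
distribution = {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}

for k, p in distribution.items():
    print(k, p)
```

The keys of `distribution` are exactly the discrete set {2, ..., 12} described above, and the probabilities sum to 1.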

Statistics is a branch of mathematics which gives us the tools to deal with large quantities of data and draw meaningful conclusions about the data. To do this, statistics uses numbers, or measures, that describe the general features contained in the data. In other words, using statistics, we can summarise large quantities of data by a few descriptive measures. Two kinds of descriptive measures are often used to summarise data sets:
1. Measures of Central Tendency
2. Measures of Dispersion

The central tendency measure indicates the average value of the data, where "average" is a generic term used to indicate a representative value that describes the general centre of the data.

The dispersion measure characterises the extent to which data items differ from the central tendency value. In other words, dispersion measures quantify the variation in the data: the larger this number, the more the data items vary. Mean, median and mode are examples of central tendency measures. Standard deviation, variance and the coefficient of variation are examples of dispersion measures.
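Python's standard `statistics` module provides most of these measures directly. The following is a minimal sketch on a small hypothetical data set; the numbers are illustrative, not from the original. It treats the data as a whole population (`pvariance`, `pstdev`) and computes the coefficient of variation by hand as standard deviation divided by mean.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical data set, for illustration only

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)  # average of the two middle values for an even count
mode = statistics.mode(data)      # most frequent value

# Measures of dispersion, treating the data as the whole population.
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)
coeff_of_variation = std_dev / mean  # spread relative to the mean, without units

print(mean, median, mode, variance, std_dev, coeff_of_variation)
```

If the data are a sample from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.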

In statistics, mean has three related meanings:
1. The arithmetic mean of a sample (distinguished from the geometric mean or harmonic mean).
2. The expected value of a random variable.
3. The mean of a probability distribution.

Examples of means:
1. Arithmetic mean (AM)
2. Geometric mean (GM)
3. Harmonic mean (HM)

Arithmetic mean (AM):

The arithmetic mean is the "standard" average, often simply called the "mean".

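For n values $x_1, \ldots, x_n$:

$$AM = \frac{1}{n}\sum_{i=1}^{n} x_i$$

For example, the arithmetic mean of the five values 4, 36, 45, 50, 75 is

$$\frac{4 + 36 + 45 + 50 + 75}{5} = \frac{210}{5} = 42$$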

Geometric mean (GM):

The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product and not their sum (as is the case with the arithmetic mean) e.g. rates of growth.

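For n positive values, $GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$. For example, the geometric mean of the five values 4, 36, 45, 50, 75 is

$$\sqrt[5]{4 \times 36 \times 45 \times 50 \times 75} = \sqrt[5]{24\,300\,000} = 30$$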

Harmonic mean (HM) :

The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, for example speed (distance per unit of time).

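For n positive values, $HM = \frac{n}{\sum_{i=1}^{n} 1/x_i}$. For example, the harmonic mean of the five values 4, 36, 45, 50, 75 is

$$\frac{5}{\frac{1}{4} + \frac{1}{36} + \frac{1}{45} + \frac{1}{50} + \frac{1}{75}} = \frac{5}{1/3} = 15$$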

Relationship between AM, GM, and HM:

For positive values, AM, GM and HM satisfy these inequalities:

$$AM \ge GM \ge HM$$

For the five values above, $42 \ge 30 \ge 15$.

Equality holds only when all the elements of the given sample are equal.
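The three means can be computed directly from their formulas. The following is a minimal Python sketch; the function names are illustrative. The standard library also offers `statistics.mean`, `statistics.geometric_mean` and `statistics.harmonic_mean` in recent Python versions. Floating-point rounding may make the printed GM and HM differ from 30 and 15 in the last digits.

```python
import math

def arithmetic_mean(xs):
    """Sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product of n positive values."""
    if any(x <= 0 for x in xs):
        raise ValueError("the geometric mean is defined here only for positive values")
    # Averaging logarithms avoids overflowing the raw product on long inputs.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    """n divided by the sum of the reciprocals of n positive values."""
    if any(x <= 0 for x in xs):
        raise ValueError("the harmonic mean is defined here only for positive values")
    return len(xs) / sum(1 / x for x in xs)

values = [4, 36, 45, 50, 75]
am = arithmetic_mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)
print(am, gm, hm)  # roughly 42, 30 and 15, up to floating-point rounding
assert am >= gm >= hm  # AM >= GM >= HM for positive values
```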

In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). In particular, the variance is one of the moments of a distribution. In that context, it forms part of a systematic approach to distinguishing between probability distributions. While other such approaches have been developed, those based on moments are advantageous in terms of mathematical and computational simplicity. The variance is a parameter describing in part either the actual probability distribution of an observed population of numbers, or the theoretical probability distribution of a sample (a not-fully-observed population) of numbers. In the latter case a sample of data from such a distribution can be used to construct an estimate of its variance: in the simplest cases this estimate can be the sample variance.
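A sample drawn from a distribution can be used to estimate both its mean and its variance. The following is a minimal Python sketch. It assumes simulated throws of a fair die, whose true mean and variance are 3.5 and 35/12 ≈ 2.92. `statistics.variance` computes the sample variance with the n - 1 divisor; the seed and sample size are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible
rolls = [rng.randint(1, 6) for _ in range(10_000)]  # simulated throws of a fair die

sample_mean = statistics.mean(rolls)          # estimate of E[X] = 3.5
sample_variance = statistics.variance(rolls)  # n - 1 divisor; estimate of Var(X) = 35/12

print(sample_mean, sample_variance, 35 / 12)
```

Larger samples tend to give estimates closer to the true values, but any single estimate carries sampling error.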

The variance of a random variable or distribution is the expectation, or mean, of the squared deviation of that variable from its expected value or mean. Thus the variance is a measure of the amount of variation of the values of that variable, taking account of all possible values and their probabilities or weightings (not just the extremes, which give the range). For example, a perfect six-sided die, when thrown, has expected value

$$\frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5$$

Its expected absolute deviation (the mean of the equally likely absolute deviations from the mean) is

$$\frac{1}{6}(|1-3.5| + |2-3.5| + |3-3.5| + |4-3.5| + |5-3.5| + |6-3.5|) = \frac{1}{6}(2.5 + 1.5 + 0.5 + 0.5 + 1.5 + 2.5) = 1.5$$

But its expected squared deviation, its variance (the mean of the equally likely squared deviations), is

$$\frac{1}{6}(2.5^2 + 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 + 2.5^2) = \frac{17.5}{6} \approx 2.92$$

As another example, if a coin is tossed twice, the number of heads is 0 with probability 0.25, 1 with probability 0.5 and 2 with probability 0.25. Thus the expected value of the number of heads is

$$0 \times 0.25 + 1 \times 0.5 + 2 \times 0.25 = 1$$

and the variance is

$$0.25 \times (0-1)^2 + 0.5 \times (1-1)^2 + 0.25 \times (2-1)^2 = 0.25 + 0 + 0.25 = 0.5$$
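The die and coin calculations can be reproduced exactly with rational arithmetic. The following is a minimal Python sketch. It assumes each discrete distribution is given as a dict that maps values to probabilities summing to 1; the helper name `moments` is hypothetical. It also prints the standard deviation, the square root of the variance.

```python
import math
from fractions import Fraction

def moments(distribution):
    """Return (mean, expected absolute deviation, variance) of a discrete distribution.

    `distribution` maps each value to its probability; probabilities must sum to 1.
    """
    mean = sum(p * x for x, p in distribution.items())
    abs_dev = sum(p * abs(x - mean) for x, p in distribution.items())
    variance = sum(p * (x - mean) ** 2 for x, p in distribution.items())
    return mean, abs_dev, variance

# Fair six-sided die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Number of heads in two coin tosses.
two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

for name, dist in (("die", die), ("two tosses", two_tosses)):
    mean, abs_dev, var = moments(dist)
    # The standard deviation is the square root of the variance, in the same units as X.
    print(name, mean, abs_dev, var, math.sqrt(var))
```

For the die, the exact variance is 35/12 (the 17.5/6 above), and its square root is about 1.71.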

Units Of Measurement:
Unlike the expected absolute deviation, the variance of a variable has units that are the square of the units of the variable itself. For example, a variable measured in inches will have a variance measured in square inches. For this reason, describing data sets via their standard deviation or root mean square deviation is often preferred over using the variance. In the dice example the standard deviation is $\sqrt{2.9} \approx 1.7$, slightly larger than the expected absolute deviation of 1.5. The standard deviation and the expected absolute deviation can both be used as an indicator of the "spread" of a distribution. The standard deviation is more amenable to algebraic manipulation than the expected absolute deviation and, together with the variance and its generalization, the covariance, is used frequently in theoretical statistics. However, the expected absolute deviation tends to be more robust, as it is less sensitive to outliers.

Definition: If a random variable X has the expected value (mean) $\mu = E[X]$, then the variance of X is the covariance of X with itself, given by:

$$\operatorname{Var}(X) = \operatorname{Cov}(X, X) = E\left[(X - \mu)^2\right]$$

That is, the variance is the expected value of the squared difference between the variable's realization and the variable's mean. This definition encompasses random variables that are discrete, continuous, neither, or mixed. From the corresponding expression for covariance, it can be expanded:

$$\operatorname{Var}(X) = E\left[X^2 - 2\mu X + \mu^2\right] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - (E[X])^2$$

A mnemonic for the above expression is "mean of square minus square of mean". The variance of a random variable X is typically designated as $\operatorname{Var}(X)$, $\sigma_X^2$, or simply $\sigma^2$ (pronounced "sigma squared").
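The definition $E[(X - \mu)^2]$ and the shortcut "mean of square minus square of mean" can be checked against each other on the die and coin distributions. The following is a minimal Python sketch using exact fractions; the function names are hypothetical.

```python
from fractions import Fraction

def variance_by_definition(distribution):
    """Var(X) = E[(X - mu)^2] for a dict mapping values to probabilities."""
    mu = sum(p * x for x, p in distribution.items())
    return sum(p * (x - mu) ** 2 for x, p in distribution.items())

def variance_by_shortcut(distribution):
    """Var(X) = E[X^2] - (E[X])^2, i.e. mean of square minus square of mean."""
    mu = sum(p * x for x, p in distribution.items())
    mean_of_square = sum(p * x * x for x, p in distribution.items())
    return mean_of_square - mu ** 2

die = {face: Fraction(1, 6) for face in range(1, 7)}
two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

print(variance_by_definition(die), variance_by_shortcut(die))  # prints 35/12 35/12
print(variance_by_definition(two_tosses), variance_by_shortcut(two_tosses))  # prints 1/2 1/2
```

With exact fractions the two forms always agree. With floating-point data whose mean is large compared with its spread, the shortcut form can lose precision through cancellation. This is one reason numerical code often prefers the definition form.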

In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence.
1. Correlation is positive when the values increase together, and
2. Correlation is negative when one value decreases as the other increases.

Correlation can have a value between -1 and 1:

1 is a perfect positive correlation
0 is no correlation (the values don't seem linked at all)
-1 is a perfect negative correlation

The value shows how strong the correlation is (not how steep the line is), and whether it is positive or negative.

Example: Ice Cream Sales

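The local ice cream shop keeps track of how much ice cream it sells versus the temperature on that day. Here are its figures for the last 12 days:

[Table: temperature and ice cream sales for the last 12 days]

And here is the same data as a scatter plot:

[Figure: scatter plot of ice cream sales versus temperature]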

You can easily see that warmer weather leads to more sales; the relationship is good but not perfect. In fact the correlation is 0.9575; the method used to calculate it is described below.

How To Calculate:
Use Pearson's correlation coefficient. Software can calculate it, such as the CORREL() function in Excel or OpenOffice Calc.

How to calculate it without software:

Let us call the two sets of data "x" and "y" (in our case temperature is x and ice cream sales is y):

Step 1: Find the mean of x and the mean of y.
Step 2: Subtract the mean of x from every x value (call them "a"), and do the same for y (call them "b").
Step 3: Calculate a × b, a² and b² for every value.
Step 4: Sum up a × b, sum up a² and sum up b².
Step 5: Divide the sum of a × b by the square root of [(sum of a²) × (sum of b²)]:

$$r = \frac{\sum ab}{\sqrt{\sum a^2 \, \sum b^2}}$$
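The five steps above translate line for line into code. The following is a minimal Python sketch; the function name `pearson_r` is hypothetical. The ice cream shop's own figures are not reproduced here, so the usage lines run on hypothetical temperature and sales values chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient computed with Steps 1 to 5."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    # Step 1: the mean of x and the mean of y.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Step 2: deviations from the means ("a" for x, "b" for y).
    a = [xi - mean_x for xi in x]
    b = [yi - mean_y for yi in y]
    # Steps 3 and 4: products a*b and squares a^2, b^2, each summed.
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    sum_a2 = sum(ai * ai for ai in a)
    sum_b2 = sum(bi * bi for bi in b)
    if sum_a2 == 0 or sum_b2 == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # Step 5: divide by the square root of the product of the sums of squares.
    return sum_ab / math.sqrt(sum_a2 * sum_b2)

temperature = [14, 16, 18, 20, 22]  # hypothetical readings, not the shop's data
sales = [200, 260, 300, 380, 420]   # hypothetical sales figures
r = pearson_r(temperature, sales)
print(r)
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient and can serve as a cross-check.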