Probability of events:
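$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(\bar{A}) = 1 - P(A)$
If $P(A \cap B) = 0$ we say A & B are mutually exclusive
Law of Total Probability:
$P(A) = P(A \cap B) + P(A \cap \bar{B}) = P(A \mid B)P(B) + P(A \mid \bar{B})P(\bar{B})$
If $B_1, \dots, B_k$ are mutually exclusive and collectively exhaustive then:
$P(A) = \sum_{i=1}^{k} P(A \mid B_i)P(B_i)$
Conditional probability:
$P(A \mid B)$ is the probability of A given that B has occurred
Multiplication rule:
$P(A \cap B) = P(A \mid B)P(B)$
Quotient rule:
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Bayes rule:
$P(B \mid A) = \dfrac{P(A \mid B)P(B)}{P(A \mid B)P(B) + P(A \mid \bar{B})P(\bar{B})}$
Independence:
A and B are independent $\iff P(A \cap B) = P(A)P(B)$
X,Y are independent if $P(X = x, Y = y) = P(X = x)P(Y = y)$ for every x and y
Cumulative Distribution Function:
$F(x) = P(X \le x)$
Probability Tree:
First-level branches carry $P(B)$ and $P(\bar{B})$; second-level branches carry the conditional probabilities $P(A \mid B)$, $P(\bar{A} \mid B)$, $P(A \mid \bar{B})$, $P(\bar{A} \mid \bar{B})$. Multiplying along a path gives the joint probability.
Probability Table:
            $B$                      $\bar{B}$                      Total
$A$         $P(A \cap B)$            $P(A \cap \bar{B})$            $P(A)$
$\bar{A}$   $P(\bar{A} \cap B)$      $P(\bar{A} \cap \bar{B})$      $P(\bar{A})$
Total       $P(B)$                   $P(\bar{B})$                   1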
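A minimal sketch of the law of total probability and Bayes rule, assuming the partition is passed as two parallel lists. The function names and the numbers (a 1% base rate and a test with a 10% false-positive rate) are hypothetical and chosen only for illustration.

```python
def total_probability(p_a_given_b, p_b):
    """P(A) = sum_i P(A | B_i) * P(B_i) for a partition B_1..B_k."""
    if abs(sum(p_b) - 1.0) > 1e-9:
        raise ValueError("P(B_i) must sum to 1 (collectively exhaustive)")
    return sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))


def bayes(p_a_given_b, p_b, i):
    """P(B_i | A) = P(A | B_i) * P(B_i) / P(A)."""
    return p_a_given_b[i] * p_b[i] / total_probability(p_a_given_b, p_b)


# Hypothetical numbers: B = "has the condition", A = "test is positive".
p_b = [0.01, 0.99]           # P(B), P(not B)
p_a_given_b = [0.95, 0.10]   # P(A | B), P(A | not B)

p_a = total_probability(p_a_given_b, p_b)   # 0.95*0.01 + 0.10*0.99
posterior = bayes(p_a_given_b, p_b, 0)      # P(B | A)
print(p_a, posterior)
```

With these inputs the posterior is well below the test's 95% sensitivity, because the denominator is dominated by false positives from the much larger group $\bar{B}$.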
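Expected Value:
$E(X) = \mu_X = \sum_x x \, P(X = x)$
Variance:
$Var(X) = \sigma_X^2 = E\big((X - \mu_X)^2\big) = \sum_x (x - \mu_X)^2 \, P(X = x)$
Standard deviation: $\sigma_X = \sqrt{Var(X)}$ (Positive)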
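As an illustration of the expected value and variance definitions, the sketch below evaluates both sums for a discrete distribution stored as a dictionary. The representation and the fair-die example are assumptions made for this example only.

```python
from math import sqrt


def expected_value(dist):
    """E(X) = sum of x * P(X = x); dist maps each value x to P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Hypothetical example: a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}
mu = expected_value(die)
var = variance(die)
sd = sqrt(var)  # the standard deviation is the positive root
print(mu, var, sd)
```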
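Joint Distribution:
$P(x, y) = P(X = x \cap Y = y)$
Marginal Distribution:
$P(x) = \sum_y P(x, y)$
Covariance:
$Cov(X, Y) = E\big((X - \mu_X)(Y - \mu_Y)\big) = E(XY) - \mu_X \mu_Y$
Correlation Coefficient:
$\rho = Corr(X, Y) = \dfrac{Cov(X, Y)}{\sigma_X \sigma_Y}$, $-1 \le \rho \le 1$
- $\rho < 0$: negative linear relationship, -1 is perfect negative
- $\rho = 0$: no linear relationship
- $\rho > 0$: positive linear relationship, 1 is perfect positive
Corr is a measure of linear association, it does not imply causality!
If $W = aX + bY$ then $E(W) = a\mu_X + b\mu_Y$ and $Var(W) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab \, Cov(X, Y)$
If X,Y independent $\Rightarrow Cov(X, Y) = 0$
If $Cov(X, Y) = 0$, X,Y are not necessarily independent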
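A short sketch of how marginal distributions, covariance and correlation follow from a joint distribution. The joint table below is hypothetical, and storing it as a dictionary keyed by (x, y) is just one convenient choice.

```python
from math import sqrt


def marginals(joint):
    """Sum the joint probabilities over the other variable."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def mean_var(dist):
    mu = sum(v * p for v, p in dist.items())
    return mu, sum((v - mu) ** 2 * p for v, p in dist.items())


def covariance(joint):
    """Cov(X, Y) = E((X - mu_X)(Y - mu_Y))."""
    px, py = marginals(joint)
    mu_x, _ = mean_var(px)
    mu_y, _ = mean_var(py)
    return sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())


def correlation(joint):
    """Corr(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y)."""
    px, py = marginals(joint)
    _, var_x = mean_var(px)
    _, var_y = mean_var(py)
    return covariance(joint) / sqrt(var_x * var_y)


# Hypothetical joint distribution of two 0/1 variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
cov = covariance(joint)
rho = correlation(joint)
print(cov, rho)
```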
Continuous Random variables:
Density function: $f(x)$
The area under $f(x)$ between a and b is $P(a \le X \le b)$
Total area under $f(x)$ is 1
$P(X = x) = 0$ for every x
Uniform Distribution:
X~U[a,b]
X can get any value between a and b
$f(x) = \dfrac{1}{b - a}$ for $a \le x \le b$
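Normal Distribution:
$X \sim N(\mu, \sigma^2)$
If $Z \sim N(0, 1)$, the probability $P(Z \le z)$ is given in a table
If $X \sim N(\mu, \sigma^2)$ and $Z = \dfrac{X - \mu}{\sigma}$ then $Z \sim N(0, 1)$
Steps for solution:
1. Look for $P(X \le x)$
2. Transform to $P\left(Z \le \dfrac{x - \mu}{\sigma}\right)$
3. Look for the answer in the table
- If X,Y are from normal distribution then $aX + bY$ is also Normal
Binomial Distribution:
$X \sim B(n, p)$
X is the number of successes in n independent trials; probability for success is p (for failure 1-p)
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, values in the table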
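The three solution steps for a normal probability, and the binomial probability formula, can be sketched as follows. The standard library's `NormalDist` stands in for the printed normal table; the function names and the example numbers are hypothetical.

```python
from math import comb
from statistics import NormalDist


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via standardization."""
    z = (x - mu) / sigma          # step 2: transform to Z
    return NormalDist().cdf(z)    # step 3: "look up" P(Z <= z)


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Hypothetical examples.
prob_normal = normal_cdf(110, mu=100, sigma=10)   # P(Z <= 1)
prob_binom = binomial_pmf(2, n=4, p=0.5)          # 6 / 16
print(prob_normal, prob_binom)
```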
A Simple Random Sample:
A sample of size n from a population of N objects, where
- every object has an equal probability of being selected
- objects are selected independently
The observations $X_1, \dots, X_n$
- follow the same probability distribution
- are independent
Two ways to think about it:
- Sample with replacement
- Sample only a small fraction of the whole population
Sample Proportion Distribution:
$p$ - The proportion of the population that has the characteristic of interest
$\hat{p}$ - The proportion of the Sample that has the characteristic of interest
$E(\hat{p}) = p$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
For large n, $\hat{p} \sim N\left(p, \dfrac{p(1 - p)}{n}\right)$
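A minimal sketch of the large-sample normal approximation for the sample proportion. The helper name and the values p = 0.5, n = 100 are assumptions for illustration; the sheet's own threshold for "large n" is not recoverable here, so none is enforced.

```python
from math import sqrt
from statistics import NormalDist


def sample_proportion_dist(p, n):
    """Approximate distribution of p-hat: N(p, p(1 - p) / n)."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# Hypothetical example: population proportion 0.5, sample of 100.
dist = sample_proportion_dist(0.5, 100)
prob = dist.cdf(0.55)   # P(p-hat <= 0.55), one standard error above p
print(dist.stdev, prob)
```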
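Sample Mean Distribution:
Population Size N, mean $\mu$, standard deviation $\sigma$
Sample Size n
Sample Mean $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
$E(\bar{X}) = \mu$, $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
for large n (n>30) $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$, so $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$
Confidence interval for the population mean when $\sigma$ is known:
Need to make sure that $X_i$ are Normal or n is large (n>30)
z-values: $z_{\alpha/2}$ (look for $1 - \alpha/2$ in the normal table)
The margin of error: $ME = z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
$(1 - \alpha)$ confidence interval: $\bar{X} \pm \underbrace{z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}}_{ME}$
We are $(1 - \alpha)$ confident that: $\bar{X} - z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
For a given ME, $\alpha$, and $\sigma$ we need a sample of size: $n = \left(\dfrac{z_{\alpha/2} \, \sigma}{ME}\right)^2$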
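The confidence interval with known $\sigma$ and the sample-size formula can be sketched as below. `NormalDist().inv_cdf` replaces the normal table lookup; function names and inputs are hypothetical, and the sample size is rounded up because n must be a whole number.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence):
    """z_{alpha/2}: look up 1 - alpha/2 in the standard normal table."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    me = z_value(confidence) * sigma / sqrt(n)   # margin of error
    return xbar - me, xbar + me


def required_sample_size(me, sigma, confidence=0.95):
    """n = (z_{alpha/2} * sigma / ME)^2, rounded up."""
    return ceil((z_value(confidence) * sigma / me) ** 2)


# Hypothetical inputs.
low, high = mean_ci_known_sigma(xbar=50.0, sigma=10.0, n=100)
n_needed = required_sample_size(me=2.0, sigma=10.0)
print(low, high, n_needed)
```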
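Confidence interval for the population proportion:
Need to make sure that n is large
The $(1 - \alpha)$ confidence interval: $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
The margin of error: $ME = z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
We are $(1 - \alpha)$ confident that: $\hat{p} - z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
For a given ME, $\alpha$, and $\hat{p}$ we need a sample of size: $n = \dfrac{z_{\alpha/2}^2 \, \hat{p}(1 - \hat{p})}{ME^2}$
Confidence interval for the population mean when $\sigma$ is unknown:
Need to make sure that $X_i$ are Normal or n is large
If the population is normally distributed, then $t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$ has a t distribution with n-1 degrees of freedom (values in table)
$(1 - \alpha)$ confidence interval: $\bar{X} \pm t_{n-1, \alpha/2} \dfrac{s}{\sqrt{n}}$
We are $(1 - \alpha)$ confident that: $\bar{X} - t_{n-1, \alpha/2} \dfrac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1, \alpha/2} \dfrac{s}{\sqrt{n}}$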
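A sketch of the proportion interval and of the t-based interval for a mean. The Python standard library has no t distribution, so the critical value is passed in as read from a t table; the value 2.131 used below is the tabulated $t_{15, 0.025}$. Names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


def mean_ci_unknown_sigma(xbar, s, n, t_crit):
    """xbar +/- t_{n-1, alpha/2} * s / sqrt(n); t_crit comes from a t table."""
    me = t_crit * s / sqrt(n)
    return xbar - me, xbar + me


# Hypothetical inputs.
p_low, p_high = proportion_ci(p_hat=0.5, n=100)
m_low, m_high = mean_ci_unknown_sigma(xbar=10.0, s=2.0, n=16, t_crit=2.131)
print(p_low, p_high, m_low, m_high)
```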
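Hypothesis testing (mean):
1. Formulate the hypothesis
   (i) $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$
   (ii) $H_0: \mu \ge \mu_0$ vs. $H_1: \mu < \mu_0$
   (iii) $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$
2. Calculate $\bar{X}$ (and $s_{\bar{X}} = s / \sqrt{n}$), decide on $\alpha$
3. Assume $H_0$ is true: $\bar{X} \sim N\left(\mu_0, \dfrac{\sigma^2}{n}\right)$
4. Decide on hypothesis by
(a) Rejection area:
$z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$ ($\sigma$ known) or $t = \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}}$ ($\sigma$ unknown, n-1 degrees of freedom)
reject $H_0$, if (i) $z > z_{\alpha}$ or $t > t_{n-1, \alpha}$; (ii) $z < -z_{\alpha}$ or $t < -t_{n-1, \alpha}$; (iii) $|z| > z_{\alpha/2}$ or $|t| > t_{n-1, \alpha/2}$
(b) p-value:
$\sigma$ known:
(i) p-value $= P\left(Z > \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}\right)$
(ii) p-value $= P\left(Z < \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}\right)$
(iii) p-value $= 2 P\left(Z > \left|\dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}\right|\right)$
$\sigma$ unknown:
(i) p-value $= P\left(t_{n-1} > \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}}\right)$
(ii) p-value $= P\left(t_{n-1} < \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}}\right)$
(iii) p-value $= 2 P\left(t_{n-1} > \left|\dfrac{\bar{X} - \mu_0}{s / \sqrt{n}}\right|\right)$
reject $H_0$ if p-value $< \alpha$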
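The p-value route for a test on the mean with known $\sigma$ can be sketched as follows. The function name, the `alternative` keyword and the example numbers are assumptions for illustration; the t-based version would need a t table or a statistics package for the tail probability.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p-value) for H0 about mu with sigma known.

    alternative: "greater" (H1: mu > mu0), "less" (H1: mu < mu0),
    or "two-sided" (H1: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("unknown alternative: " + alternative)
    return z, p_value


# Hypothetical data: xbar = 52 from n = 100, testing mu0 = 50, sigma = 10.
alpha = 0.05
z, p_value = z_test_mean(52.0, 50.0, 10.0, 100)
reject_h0 = p_value < alpha
print(z, p_value, reject_h0)
```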
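Hypothesis testing (proportion):
1. Formulate the hypothesis
   (i) $H_0: p \le p_0$ vs. $H_1: p > p_0$
   (ii) $H_0: p \ge p_0$ vs. $H_1: p < p_0$
   (iii) $H_0: p = p_0$ vs. $H_1: p \ne p_0$
2. Calculate $\hat{p}$, decide on $\alpha$
3. Assume $H_0$ is true: $\hat{p} \sim N\left(p_0, \dfrac{p_0(1 - p_0)}{n}\right)$
4. Decide on hypothesis by
(a) Rejection area:
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
reject $H_0$, if (i) $z > z_{\alpha}$; (ii) $z < -z_{\alpha}$; (iii) $|z| > z_{\alpha/2}$
(b) p-value:
(i) p-value $= P\left(Z > \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}\right)$
(ii) p-value $= P\left(Z < \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}\right)$
(iii) p-value $= 2 P\left(Z > \left|\dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}\right|\right)$
reject $H_0$ if p-value $< \alpha$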
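A matching sketch for the proportion test. Note that the standard error uses the hypothesized $p_0$, not $\hat{p}$, because the test statistic is computed assuming $H_0$ is true. Names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(p_hat, p0, n, alternative="two-sided"):
    """Return (z, p-value) for a test of H0 about a population proportion."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error under H0
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("unknown alternative: " + alternative)
    return z, p_value


# Hypothetical data: 60 of 100 sampled items have the characteristic.
z_prop, p_prop = z_test_proportion(p_hat=0.6, p0=0.5, n=100,
                                   alternative="greater")
print(z_prop, p_prop)
```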
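Linear regression:
Assumptions:
1. Y is linearly related to X
2. The variance of the error term $\varepsilon$, $\sigma_\varepsilon^2$, does not depend on the x-values.
3. The error terms are all uncorrelated.
Least squares method:
The relation is represented as a line: $\hat{y} = b_0 + b_1 x$
Choose $b_0, b_1$ such that: $SSE = \sum_i (y_i - \hat{y}_i)^2$ is minimized
Estimation of model:
There is a "real" linear relation: $y = \beta_0 + \beta_1 x + \varepsilon$
The regression provides estimators for the coefficients and for the standard error:
$b_1 = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$, $b_0 = \bar{y} - b_1 \bar{x}$, $s_e = \sqrt{\dfrac{SSE}{n - k - 1}}$
The coefficient of determination:
The percentage of explained variation: $R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$, with $SSR = \sum_i (\hat{y}_i - \bar{y})^2$ and $SST = \sum_i (y_i - \bar{y})^2$
R is the correlation coefficient between y and x
For multiple regression we adjust the R-squares by the number of independent variables: $\bar{R}^2 = 1 - \dfrac{SSE / (n - k - 1)}{SST / (n - 1)}$
Confidence interval for coefficients (k independent variables):
$b_j \pm t_{n-k-1, \alpha/2} \, s_{b_j}$
Test statistic for $H_0: \beta_j = 0$ (k independent variables):
$t = \dfrac{b_j}{s_{b_j}}$ with n-k-1 degrees of freedom
Prediction of y given $x$:
$\hat{y} = b_0 + b_1 x$
Standard error of single Y-values given X:
$s_{forecast} = s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
Prediction interval for single Y-values given X:
$\hat{y} \pm t_{n-k-1, \alpha/2} \, s_{forecast}$ (k independent variables)
Standard error of the mean Y given X:
$s_{\hat{y}} = s_e \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
Confidence interval for the mean Y given X:
$\hat{y} \pm t_{n-k-1, \alpha/2} \, s_{\hat{y}}$ (k independent variables)
Testing the linear relationship in multiple regression:
$H_0: \beta_1 = \dots = \beta_k = 0$ vs. $H_1: \beta_j \ne 0$ at least for one j
Check the p-value given in the regression output and compare to the desired $\alpha$
reject $H_0$ if p-value $< \alpha$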
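A minimal sketch of least squares for simple regression (k = 1), returning the coefficients, $R^2$ and the standard error of the regression, plus the standard error for predicting a single Y-value. The function names and the five data points are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Least squares fit of y-hat = b0 + b1 * x; returns b0, b1, R^2, s_e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r_squared = 1 - sse / sst
    s_e = sqrt(sse / (n - 2))   # n - k - 1 degrees of freedom with k = 1
    return b0, b1, r_squared, s_e


def forecast_se(x0, x, s_e):
    """Standard error of a single Y-value predicted at x0."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return s_e * sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r2, s_e = fit_line(xs, ys)
s_forecast = forecast_se(3, xs, s_e)   # prediction at the mean of x
print(b0, b1, r2, s_e, s_forecast)
```

A prediction interval then follows as $\hat{y} \pm t \cdot s_{forecast}$ with the t value read from a table for n-2 degrees of freedom.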
Testing the linear relationship in simple regression:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \ne 0$
Check the p-value for $b_1$ and compare to the desired $\alpha$
reject $H_0$ if p-value $< \alpha$

Categorical variables:
If a qualitative variable can have only one of two values, introduce an independent variable that is 0 for the one value and 1 for the other.
For k categories we need k-1 variables (base category is when all variables are 0).

Non linear relationships:
1. Logarithmic transformation of the dependent variable: $\ln(\hat{y}) = b_0 + b_1 x$, that is $\hat{y} = e^{b_0 + b_1 x}$
   Use when y grows or depreciates at a certain rate throughout.
2. Transformation of the independent variables
   Replace $x$ by a transformed version (for example $x^2$, $\sqrt{x}$, or $1/x$), or any other transformation that makes sense.
   Try when the relation is not linear and 1 does not make sense.

Trends - if the dependent variable increases or decreases in time, use a time counter or de-trend the data first.
Seasonal behavior - if there are seasonal patterns, use dummies or seasonally adjusted data.
Time Lags - if an independent variable influences the dependent variable over a number of time periods, include past values in the model.

Multicollinearity - a high correlation between two or more independent variables.
Effects:
- The standard errors of the $b_j$s are inflated
- The magnitude (or even the signs) of the $b_j$s may be different from what we expect
- Adding or removing variables produces large changes in the coefficient estimates or their signs
- The p-value of the F-test is small, but none of the t-values is significant

Decision Theory:
How to build a decision tree:
1. List all decisions and uncertainties
2. Arrange them in a tree using decision and chance nodes, in time order
3. Label the tree with probabilities (for chance nodes) and payoffs (at least at the end of each branch).
4. Solve the tree / fold back by taking the maximum of payoffs / minimum of costs for decision nodes and the expected value for chance nodes to get the EMV (expected monetary value)
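The fold-back step for a decision tree can be sketched recursively: payoff leaves return their value, chance nodes return the probability-weighted average, and decision nodes return the best option. The nested-dictionary tree format, the node names and the payoffs are hypothetical; the sketch maximizes payoffs, so a cost-minimizing tree would use `min` instead.

```python
def fold_back(node):
    """Return the EMV of a decision tree given as nested dictionaries."""
    kind = node["type"]
    if kind == "payoff":
        return node["value"]
    if kind == "chance":
        # Expected value over the branches: sum of probability * EMV.
        return sum(p * fold_back(child) for p, child in node["branches"])
    if kind == "decision":
        # Take the option with the maximum EMV.
        return max(fold_back(child) for _, child in node["options"])
    raise ValueError("unknown node type: " + str(kind))


# Hypothetical tree: launch a product (uncertain demand) or do nothing.
tree = {
    "type": "decision",
    "options": [
        ("launch", {
            "type": "chance",
            "branches": [
                (0.6, {"type": "payoff", "value": 100}),
                (0.4, {"type": "payoff", "value": -50}),
            ],
        }),
        ("do nothing", {"type": "payoff", "value": 0}),
    ],
}
emv = fold_back(tree)   # max(0.6 * 100 + 0.4 * (-50), 0)
print(emv)
```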

Regression examples:

*Obtain 90% CI for expected change in TASTE if the concentration of LACTIC ACID increased by .01:
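One way to set this up, as a sketch: the expected change in TASTE for a 0.01 increase equals 0.01 times the LACTIC ACID coefficient, so the interval for the coefficient is scaled by 0.01. The symbols $b_{\text{LACTIC}}$ and $s_{b_{\text{LACTIC}}}$ denote the coefficient estimate and its standard error from the regression output, which is not reproduced here; n and k are the number of observations and independent variables in that output.

```latex
\[
0.01 \cdot \left( b_{\text{LACTIC}} \pm t_{n-k-1,\,0.05} \; s_{b_{\text{LACTIC}}} \right)
\]
```

The subscript 0.05 is $\alpha/2$ for a 90% interval.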

*What is the expected increase in peak load for a 3-degree increase in high temperature:

Prediction: Use the output to estimate a 90% lower bound for the predicted BAC of an individual who had 4 beers 30 minutes ago: First calculate $\widehat{BAC}$. To obtain a lower bound on the blood alcohol level, we need to estimate $s_{forecast}$ (see formula): $s_{forecast} = 0.01844$. So a lower bound =

Find 90% confidence interval for the expected weight of a car full of ore: