
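Population (basic)

$\mu = \frac{\sum x_i}{N}$

$\sigma_x^2 = \frac{N \sum x_i^2 - \left(\sum x_i\right)^2}{N^2}$

Sample (basic)

$\bar{x} = \frac{\sum x_i}{n}$

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$

Population (frequency), with $f_i$ the relative frequency of value $x_i$ ($\sum f_i = 1$)

$\mu = \sum f_i x_i$

$\sigma_x^2 = \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2$

Sample (frequency)

$\bar{x} = \sum f_i x_i$

$s = \sqrt{\frac{n \sum f_i \left(x_i - \sum f_i x_i\right)^2}{n - 1}} = \sqrt{\frac{n}{n - 1}\left(\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2\right)}$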
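The shortcut and frequency forms of the sample standard deviation can be checked numerically. The sketch below is only an illustration: the function names and the data set are made up for the example, and the frequency version assumes that `rel_freqs` are relative frequencies summing to 1.

```python
import math
import statistics


def sample_std_shortcut(xs):
    """Sample std dev via sqrt((n*sum(x^2) - (sum x)^2) / (n*(n-1)))."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


def sample_std_from_frequencies(values, rel_freqs, n):
    """Sample std dev from distinct values and their relative frequencies.

    Assumes rel_freqs sum to 1 and n is the total number of observations.
    """
    mean = sum(f * x for f, x in zip(rel_freqs, values))
    mean_sq = sum(f * x * x for f, x in zip(rel_freqs, values))
    return math.sqrt(n / (n - 1) * (mean_sq - mean ** 2))


# Hypothetical data set with 8 observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s_basic = sample_std_shortcut(data)

# The same data expressed as distinct values with relative frequencies.
values = [2, 4, 5, 7, 9]
rel_freqs = [1 / 8, 3 / 8, 2 / 8, 1 / 8, 1 / 8]
s_freq = sample_std_from_frequencies(values, rel_freqs, n=len(data))

print(s_basic, s_freq, statistics.stdev(data))
```

All three printed values should agree up to floating-point rounding, since they are algebraically the same quantity.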

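Binomial (specific number of samples)

Check table or formula (n samples with x successes):

$P(x) = \frac{n!}{x!\,(n - x)!}\, p^x q^{n - x}$, with $q = 1 - p$

$\mu = np$

$\sigma = \sqrt{npq}$

Becomes Normal when np>5 and nq>5:

$z = \frac{(x \pm 0.5) - \mu}{\sigma}$ (test statistic; $\pm 0.5$ corrects for the discrete distribution!)

Poisson (specific duration of test period)

Use formula ($\lambda$ is the expected rate per period; x = amount tested):

$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$

$\mu = \sigma^2 = \lambda$, with $\lambda = np$ when the Poisson approximates the Binomial (go to tables if you approximate Binomial with Poisson!)

Becomes Normal when $\lambda$>5:

$z = \frac{(x \pm 0.5) - \lambda}{\sqrt{\lambda}}$ (test statistic; $\pm 0.5$ corrects for the discrete distribution!)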
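A short sketch of the two probability formulas and of the normal approximation with the 0.5 continuity correction. The function names and the example numbers (n = 20, p = 0.5) are illustrative assumptions, not part of the sheet.

```python
import math


def binomial_pmf(x, n, p):
    """P(x) = n! / (x! (n-x)!) * p^x * q^(n-x), with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def binomial_cdf_normal_approx(x, n, p):
    """Approximate P(X <= x) using z = (x + 0.5 - mu) / sigma."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf((x + 0.5 - mu) / sigma)


# Example values: np = 10 > 5 and nq = 10 > 5, so the approximation applies.
n, p = 20, 0.5
exact = sum(binomial_pmf(k, n, p) for k in range(0, 9))  # exact P(X <= 8)
approx = binomial_cdf_normal_approx(8, n, p)
print(exact, approx)
```

For a lower-tail probability P(X <= x) the correction is +0.5; for an upper tail P(X >= x) it would be -0.5.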

Estimating the Standard Deviation

Chebyshev's theorem (always holds): the proportion of data within $k\sigma$ of the mean is at least $1 - \frac{1}{k^2}$; outside: at most $\frac{1}{k^2}$.

Empirical rule of thumb (random, i.e. normal, sample): the proportion of data within $k\sigma$ of the mean is 68% for $1\sigma$, 95% for $2\sigma$, 99.7% for $3\sigma$.

Estimating Risk

To measure risk, one often takes the coefficient of variation (CV) to compare, for example, stocks: $CV = \sigma / \mu$ (higher = riskier).

An alternative way to measure risk is to plot $\sigma$ (risk) versus r (return) to find the efficient frontier (the upper-left boundary of all data points in the graph, representing the highest return for the lowest risk).

Functions of Random Variables

What is the behaviour of W when it is a function of random variables X and Y?

$W = a + bX + cY$

Then E(W) is: $\mu_w = a + b\mu_x + c\mu_y$

And VAR(W) is: $\sigma_w^2 = b^2\sigma_x^2 + c^2\sigma_y^2 + 2bc\,\sigma_{xy}$

Where:

$\sigma_{xy} = \rho_{xy}\,\sigma_x\sigma_y$ (a higher covariance does not mean a closer relationship, since it is influenced by units; Cov(X,Y) < 0 means that X and Y move in opposite directions; > 0 means same direction; = 0 means X and Y are uncorrelated, which is always the case when they are independent)

$\rho_{xy} = CORR[X, Y] = \frac{COV[X, Y]}{STDDEV[X] \cdot STDDEV[Y]} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}$ (if $|\rho_{xy}| \ge 0.7$ -> strong)
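As an illustration of E(W) and VAR(W), the sketch below evaluates both for a hypothetical two-stock portfolio and then takes the coefficient of variation of the result. All numbers and the function name are assumptions made for the example.

```python
import math


def linear_combination_moments(a, b, c, mu_x, mu_y, sigma_x, sigma_y, rho_xy):
    """Mean and variance of W = a + b*X + c*Y."""
    cov_xy = rho_xy * sigma_x * sigma_y  # sigma_xy = rho_xy * sigma_x * sigma_y
    mean_w = a + b * mu_x + c * mu_y
    var_w = b ** 2 * sigma_x ** 2 + c ** 2 * sigma_y ** 2 + 2 * b * c * cov_xy
    return mean_w, var_w


# Hypothetical portfolio: half in stock X, half in stock Y.
mean_w, var_w = linear_combination_moments(
    a=0.0, b=0.5, c=0.5,
    mu_x=0.08, mu_y=0.12,
    sigma_x=0.10, sigma_y=0.20,
    rho_xy=0.3,
)
sigma_w = math.sqrt(var_w)
cv_w = sigma_w / mean_w  # coefficient of variation: higher = riskier
print(mean_w, var_w, cv_w)
```

With a correlation below 1 the portfolio standard deviation comes out lower than the weighted average of the two individual standard deviations, which is the diversification effect behind the efficient frontier.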

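Critical z values (one-tailed $z_\alpha$; two-tailed $z_{\alpha/2}$):
$\alpha = 0.05$: $z_\alpha = 1.645$; $z_{\alpha/2} = 1.960$
$\alpha = 0.01$: $z_\alpha = 2.326$; $z_{\alpha/2} = 2.576$
$\alpha = 0.02$: $z_\alpha = 2.054$; $z_{\alpha/2} = 2.326$
$\alpha = 0.1$: $z_\alpha = 1.28$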

The formulas for E(W) and VAR(W) hold for all distributions. A normal distribution (~N) has to be assumed for the combined variable only when probabilities are calculated.

Central Limit Theorem: If the sample size is large (n>30), the sampling distribution of the sample mean $\bar{x}$ is approximately a normal distribution. The standard deviation of the sampling distribution of the sample mean is $\sigma / \sqrt{n}$ and is called the standard error of the mean (it becomes smaller with bigger sample size).
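A small simulation can make the standard error concrete. The sketch below assumes a Uniform(0, 1) population (clearly not normal), a sample size of 36 and 5000 repetitions; these choices and the function name are illustrative only.

```python
import math
import random
import statistics


def simulate_sample_means(n, trials, rng):
    """Draw `trials` samples of size n from Uniform(0, 1); return their means."""
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 36
means = simulate_sample_means(n, trials=5000, rng=rng)

sigma = math.sqrt(1 / 12)              # population std dev of Uniform(0, 1)
theoretical_se = sigma / math.sqrt(n)  # sigma / sqrt(n)
observed_se = statistics.pstdev(means)
print(theoretical_se, observed_se)
```

The spread of the simulated sample means should land close to $\sigma / \sqrt{n}$, and a histogram of `means` would look roughly bell-shaped even though the population is flat.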

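Hypothesis testing

Null hypothesis: $H_0: \mu = \mu_0$

Alternative hypothesis, and reject H0 if:
$H_A: \mu < \mu_0$: $z < -z_\alpha$
$H_A: \mu > \mu_0$: $z > z_\alpha$
$H_A: \mu \ne \mu_0$: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

One-tailed critical values: for Alpha 5%, Z = +/-1.645; for Alpha 1%, Z = +/-2.326

Outcomes:
Accept H0, H0 is true: OK
Accept H0, H0 is false: Type 2 error, p = $\beta$
Reject H0, H0 is true: Type 1 error, p-value = F(Z)
Reject H0, H0 is false: OK

Confidence Intervals

$P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

Interval:
n>30 and $\sigma$ known: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
n>30 and $\sigma$ unknown: $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
n<30 and $\sigma$ known: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
n<30 and $\sigma$ unknown: $\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
For proportions: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
Regression coefficients: $b \pm t_{\alpha/2,\,n-k-1}\,SE_b$
Regression forecast: $Y_f \pm t_{\alpha/2,\,n-k-1}\,s_f$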
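The z-based interval for a mean can be sketched as follows. The function name and the example numbers are assumptions; the critical value comes from the standard library's `NormalDist.inv_cdf`, and the t-based case (n<30, $\sigma$ unknown) is not covered because it needs a t table or an external library.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, std_dev, n, confidence=0.95):
    """Return (lower, upper) for x_bar +/- z_{alpha/2} * std_dev / sqrt(n)."""
    alpha = 1 - confidence
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    half_width = z_half_alpha * std_dev / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample: n = 64 observations, mean 50, standard deviation 8.
lower, upper = z_confidence_interval(x_bar=50.0, std_dev=8.0, n=64)
print(lower, upper)
```

Here the standard error is 8 / sqrt(64) = 1, so the half-width is just the critical value 1.960 and the interval is roughly [48.04; 51.96].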

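Standard error = $\frac{\sigma}{\sqrt{n}}$ (or the other term to the right of the critical value in the interval)

Error bound: sample size for $e \le E$ or x% $\le E$ (if p)

$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}$; for proportions: $n = \frac{z_{\alpha/2}^2\,p(1 - p)}{E^2}$

$n_{max}$ (p = 0.5) $= \frac{z_{\alpha/2}^2}{4E^2}$

A xx% confidence interval for xx is [0.5; 0.7]: 0.5 <= xx <= 0.7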
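The sample-size formulas can be evaluated directly. In the sketch below the function names and the example inputs are assumptions; results are rounded up because a sample size must be a whole number that still meets the error bound.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma, error, confidence=0.95):
    """n = z_{alpha/2}^2 * sigma^2 / E^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * sigma ** 2 / error ** 2)


def sample_size_proportion(error, confidence=0.95, p=0.5):
    """n = z_{alpha/2}^2 * p * (1 - p) / E^2, rounded up.

    The default p = 0.5 gives the largest (most conservative) n.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)


# Hypothetical requirements: 95% confidence in both cases.
n_mean = sample_size_mean(sigma=15.0, error=2.0)  # mean within +/- 2
n_prop = sample_size_proportion(error=0.03)       # proportion within +/- 3 points
print(n_mean, n_prop)
```

Using p = 0.5 when nothing is known about the proportion is the worst case, since p(1 - p) is largest there.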

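Test statistic:
n>30 and $\sigma$ known: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
n>30 and $\sigma$ unknown: $z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
n<30 and $\sigma$ known: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
n<30 and $\sigma$ unknown: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
For proportions: $z = \frac{x/n - p}{\sqrt{pq/n}}$
Regression coefficients: $t = \frac{b - b_0}{SE_b}$
Regression forecast: $t = \frac{Y_f - Y_0}{s_f}$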
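A minimal sketch of a z-test for a mean, combining the test statistic with the rejection rules. The function name, the `alternative` labels and the example numbers are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def z_test(x_bar, mu_0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (z, p_value, reject_h0) for H0: mu = mu_0.

    alternative is "less", "greater" or "two-sided".
    """
    std_normal = NormalDist()
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    if alternative == "less":
        p_value = std_normal.cdf(z)
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, p_value, reject


# Hypothetical sample: mean 52 from n = 64, tested against mu_0 = 50, sigma = 8.
z, p_value, reject = z_test(x_bar=52.0, mu_0=50.0, sigma=8.0, n=64)
print(z, p_value, reject)
```

With these numbers z = 2.0, which exceeds 1.960 and so rejects H0 at the 5% level (two-sided), but it would not exceed 2.576 at the 1% level.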

Building a Regression Model
1) Select variables for regression analysis and decide whether there should be a constant or whether the line should go through the origin
2) Study scatter plots and remove extreme points
3) Study the correlation matrix. For multicollinearity, check the variables in the regression analysis (neither significant => remove the variable with the highest p-value; one is significant => take out the other; both are significant => leave both in, unless $\rho$ > 0.99 or the signs of the coefficients are distorted)
4) Run regression (again)
5) Take out variables one by one using t-test or p-value significance and run the regression again
6) Check the signs of the coefficients against the theoretical outcome

Regression Analysis

Postulated: $Y_i = a + bX_i$

Real: $Y_i = a + bX_i + u_i$

$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\frac{\sum XY}{n} - \bar{X}\bar{Y}}{\sigma_x^2}$

$a = \bar{Y} - b\bar{X}$

Regression Residuals
1) Check autocorrelation (do Durbin-Watson test) (DW: 0 = pos. autocorr. < ?? < OK < ?? < neg. autocorr. = 4) => transformation/missing variable/measurement error?
2) Check homoskedasticity (constant residual variance) versus heteroskedasticity (increasing residual error) => transformation/isolate the source/a significant variable may look insignificant?
3) Check if residuals are normally distributed (histogram) => not so important (get more observations)
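The slope and intercept formulas, together with the Durbin-Watson statistic for the residuals, can be sketched as below. The function names and the five data points are invented for the example; the DW statistic is computed with the usual definition, the sum of squared successive residual differences divided by the sum of squared residuals.

```python
def fit_line(xs, ys):
    """Least-squares a and b for Y = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


def durbin_watson(residuals):
    """DW statistic: near 0 = positive, near 4 = negative autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
dw = durbin_watson(residuals)
print(a, b, dw)
```

The statistic is undefined when every residual is exactly zero (a perfect fit), so a real implementation would guard against that case.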

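Regression error measures

$r = \frac{\frac{\sum XY}{n} - \bar{X}\bar{Y}}{\sigma_x\sigma_y}$

$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$

$SST = SSR + SSE$

$R^2_{adj} = 1 - \frac{n - 1}{n - k - 1} \cdot \frac{SSE}{SST}$

$F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}$

Standard error of regression (for n>100, $s_e$ approximates $s_f$)

$s_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - k}}$

Standard error of forecast

$s_f = s_e\sqrt{1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$

n = number of samples
k = number of independent variables in the terms with $n - k - 1$ (1 for simple regression); in $s_e$ the divisor $n - k$ uses k = number of estimated coefficients (2 for simple reg.)
t-value = coefficient/standard error
p-value = $2\,(1 - F_z(|t\text{-value}|))$
SS_ = Sum of squares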
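The error measures can be computed together for a simple regression. The sketch below assumes one independent variable (so $n - k - 1 = n - 2$ in F and adjusted R², and the divisor in $s_e$ is also $n - 2$); the function name, data and forecast point are illustrative.

```python
import math


def regression_error_measures(xs, ys, x_f):
    """Fit Y = a + b*X by least squares and return the error measures."""
    n = len(xs)
    k = 1  # one independent variable (simple regression)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)               # total
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # error
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # regression

    r2 = 1 - sse / sst
    r2_adj = 1 - (n - 1) / (n - k - 1) * sse / sst
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    s_e = math.sqrt(sse / (n - 2))  # two estimated coefficients: a and b
    s_f = s_e * math.sqrt(1 + 1 / n + (x_f - x_bar) ** 2 / sxx)
    return {"SST": sst, "SSE": sse, "SSR": ssr, "R2": r2,
            "R2_adj": r2_adj, "F": f_stat, "s_e": s_e, "s_f": s_f}


# Hypothetical observations and a forecast at X_f = 6.
m = regression_error_measures([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], x_f=6)
print(m)
```

The forecast error $s_f$ is always larger than $s_e$ and grows as $X_f$ moves away from $\bar{X}$, which is why forecasts far outside the observed range are less reliable.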