
SIMULATION AND MODELING

STANDARD NORMAL DISTRIBUTION


NORMAL DISTRIBUTION VS THE
STANDARD NORMAL DISTRIBUTION
All normal distributions, like the standard normal distribution, are unimodal and symmetrically distributed with a bell-shaped curve. However, a normal distribution can take on any value as its mean and standard deviation. In the standard normal distribution, the mean and standard deviation are always fixed: the mean is 0 and the standard deviation is 1.
Every normal distribution is a version of the standard normal distribution that has been stretched or squeezed and moved horizontally right or left.
The mean determines where the curve is centered. Increasing the mean moves the curve right, while decreasing it moves the curve left.
The standard deviation stretches or squeezes the curve. A small standard deviation results in a narrow curve, while a large standard deviation leads to a wide curve.
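
To make the relationship concrete, here is a minimal Python sketch (NumPy assumed, illustrative numbers) that standardizes draws from an arbitrary normal distribution into standard normal z-scores:

import numpy as np

rng = np.random.default_rng(seed=0)

# Draw samples from a normal distribution with an arbitrary mean and sd
mu, sigma = 10.0, 2.5
x = rng.normal(loc=mu, scale=sigma, size=100_000)

# Standardize: subtracting the mean re-centres the curve at 0,
# dividing by the standard deviation squeezes it to unit width.
z = (x - mu) / sigma

print(round(z.mean(), 3), round(z.std(), 3))  # close to 0 and 1

The same shift-and-scale in reverse (x = mu + sigma * z) is how every normal distribution is obtained from the standard one.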
SAMPLE MEAN
The sample mean is a measure of the center of the data, and the mean of any population is estimated using it. In many situations we need to estimate what the whole population is doing without surveying everyone in the population; in such cases the sample mean is useful. The average value found in a sample is termed the sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The sample mean so calculated is then used to find the variance, and thereby the standard deviation.
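
As a quick illustration, a minimal pure-Python sketch (hypothetical sample values) that computes the sample mean and then uses it for the sample variance and standard deviation:

import math

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # a small hypothetical sample

n = len(data)
sample_mean = sum(data) / n  # the average value found in the sample

# The sample variance is built from deviations around the sample mean,
# here with n - 1 degrees of freedom (see the note on samples below)
sample_var = sum((x - sample_mean) ** 2 for x in data) / (n - 1)
sample_sd = math.sqrt(sample_var)

print(sample_mean, sample_var, sample_sd)  # 4.4  3.3  1.816...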
CORRELATION AND REGRESSION
• Correlation: is there a relationship between two variables?
• Regression: how well does a given independent variable predict the dependent variable?
• CORRELATION ≠ CAUSATION
• To infer causality: manipulate the independent variable and observe the effect on the dependent variable.
SCATTERGRAMS
[Three scattergrams of Y against X: positive correlation, negative correlation, and no correlation.]

VARIANCE VS COVARIANCE
• First, a note on your sample:
• If you wish to assume that your sample is representative of the general population (RANDOM EFFECTS MODEL), use the degrees of freedom (n − 1) in your calculations of variance or covariance.
• But if you simply want to assess your current sample (FIXED EFFECTS MODEL), substitute n for the degrees of freedom.
VARIANCE VS COVARIANCE

• Do two variables change together?

Variance:
• Gives information on the variability of a single variable.

$$S_x^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

Covariance:
• Gives information on the degree to which two variables vary together.
• Note how similar the covariance is to variance: the equation simply multiplies x's error scores by y's error scores, as opposed to squaring x's error scores.

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
COVARIANCE
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

• When X increases and Y increases: cov(x, y) = positive.
• When X increases and Y decreases: cov(x, y) = negative.
• When there is no constant relationship: cov(x, y) = 0.
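
A minimal Python sketch (NumPy assumed) of the sample covariance formula above, checked against the three sign cases with made-up data:

import numpy as np

def sample_cov(x, y):
    # Sample covariance with n - 1 degrees of freedom
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

print(sample_cov(x, 2 * x + 1))        # positive: y rises with x
print(sample_cov(x, -2 * x + 1))       # negative: y falls as x rises
print(sample_cov(x, [1, 3, 1, 3, 1]))  # zero: no consistent relationship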
EXAMPLE COVARIANCE
[Scatter plot of the five (x, y) points below, both axes running from 0 to 7.]

x    y    xᵢ − x̄    yᵢ − ȳ    (xᵢ − x̄)(yᵢ − ȳ)
0    3    −3         0          0
2    2    −1        −1          1
3    4     0         1          0
4    0     1        −3         −3
6    6     3         3          9

x̄ = 3, ȳ = 3, Σ(xᵢ − x̄)(yᵢ − ȳ) = 7

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{7}{4} = 1.75$$

What does this number tell us?
PROBLEM WITH COVARIANCE

• The value obtained by covariance depends on the size of the data's standard deviations: if they are large, the value will be greater than if they are small, even if the relationship between x and y is exactly the same in the large versus small standard deviation datasets.
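
A quick demonstration of this scale dependence, reusing the sample_cov sketch from above: rescaling the data leaves the relationship identical but inflates the covariance, while dividing by both standard deviations removes the dependence (giving the correlation coefficient):

import numpy as np

x = np.array([0.0, 2.0, 3.0, 4.0, 6.0])
y = np.array([3.0, 2.0, 4.0, 0.0, 6.0])

print(sample_cov(x, y))            # 1.75
print(sample_cov(10 * x, 10 * y))  # 175.0 -- same relationship, larger spread

# Standardizing by the two standard deviations is scale-free:
r = sample_cov(x, y) / (x.std(ddof=1) * y.std(ddof=1))
print(r, np.corrcoef(x, y)[0, 1])  # 0.35  0.35 -- identical either way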
REGRESSION

• Correlation tells you if there is an association between x and y, but it doesn't describe the relationship or allow you to predict one variable from the other.

• To do this we need REGRESSION!


BEST-FIT LINE
• The aim of linear regression is to fit a straight line, ŷ = ax + b, to data that gives the best prediction of y for any value of x.

• This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals.

[Scatter plot with fitted line ŷ = ax + b, where a = slope and b = intercept; ŷ = predicted value, yᵢ = true value, ε = residual error.]
LEAST SQUARES REGRESSION

• To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line).

Model line: ŷ = ax + b   (a = slope, b = intercept)
Residual: ε = y − ŷ
Sum of squares of residuals: Σ(y − ŷ)²

• So we must find the values of a and b that minimise Σ(y − ŷ)², as in the sketch below.
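
A small Python sketch of this criterion (NumPy assumed, reusing the example data from the covariance pages): compute the sum of squared residuals for two candidate lines and prefer the smaller.

import numpy as np

x = np.array([0.0, 2.0, 3.0, 4.0, 6.0])
y = np.array([3.0, 2.0, 4.0, 0.0, 6.0])

def sum_sq_residuals(a, b):
    # Σ(y - ŷ)² for the candidate line ŷ = ax + b
    y_hat = a * x + b
    return np.sum((y - y_hat) ** 2)

print(sum_sq_residuals(1.0, 0.0))    # 26.0  -- a poor candidate line
print(sum_sq_residuals(0.35, 1.95))  # 17.55 -- the least-squares line

# Cross-check: NumPy's built-in degree-1 fit recovers a ≈ 0.35, b ≈ 1.95
print(np.polyfit(x, y, deg=1))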
FINDING B
• First we find the value of b that gives the minimum sum of squares.

[Three plots of the same data with lines of equal slope but different intercepts b, each showing the residuals ε.]

• Trying different values of b is equivalent to shifting the line up and down the scatter plot.
FINDING A
• Now we find the value of a that gives the minimum sum of squares.

[Three plots of the same data with lines of different slopes a but the same intercept b.]

• Trying out different values of a is equivalent to changing the slope of the line, while b stays constant.
MINIMISING SUMS OF SQUARES
• Need to minimise Σ(y − ŷ)²
• ŷ = ax + b
• So we need to minimise the sum of squares S = Σ(y − ax − b)²

• If we plot the sums of squares for all different values of a and b we get a parabola, because it is a squared term.

[Plot of S against the values of a and b: a parabola whose minimum, min S, is where the gradient = 0.]

• So the minimum sum of squares is at the bottom of the curve, where the gradient is zero.
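
Setting both gradients to zero can be carried out explicitly; a sketch of the standard least-squares algebra (not shown on the slide):

$$\frac{\partial S}{\partial a} = 0, \quad \frac{\partial S}{\partial b} = 0
\;\;\Longrightarrow\;\;
a = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\mathrm{cov}(x, y)}{S_x^2}, \qquad b = \bar{y} - a\bar{x}$$

On the covariance example data above this gives a = 7/20 = 0.35 and b = 3 − 0.35 × 3 = 1.95, matching the least-squares line checked in the earlier sketch.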
