ECON 231 Contents - Saiprakash Ragi - N448U365
There are four key properties associated with the normal distribution:
1. It is bell shaped (and thus symmetrical) in appearance.
2. Its measures of central tendency (mean, median and mode) are all identical.
3. Its middle spread is equal to 1.33 standard deviations. This means the interquartile range is
contained within an interval of two-thirds of a standard deviation below the mean to two-
thirds of a standard deviation above the mean.
4. Its associated random variable has an infinite range (-∞ < X < ∞)
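The third property can be checked numerically. The sketch below uses the standard normal distribution from Python's standard library to locate the quartiles; it suggests that "two-thirds" and "1.33" in the list above are rounded figures, the more precise values being roughly 0.674 and 1.35 standard deviations.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# First and third quartiles, measured in standard deviations from the mean.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)

# The "middle spread" (interquartile range) in standard deviations.
iqr_in_sd = q3 - q1

print(f"Q1 = {q1:.3f} sd, Q3 = {q3:.3f} sd, IQR = {iqr_in_sd:.3f} sd")
```

Because the distribution is symmetrical, Q1 and Q3 lie the same distance below and above the mean.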
The sample size n is given by
n = Z²σ² / e²
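As a rough illustration of the sample size formula n = Z²σ² / e², the sketch below assumes the usual reading of the symbols (Z is the critical value for the chosen confidence level, σ is the population standard deviation, and e is the acceptable sampling error). The function name and the input values are hypothetical and are not taken from the notes.

```python
import math


def required_sample_size(z: float, sigma: float, e: float) -> int:
    """Return n = Z^2 * sigma^2 / e^2, rounded up to a whole observation."""
    if e <= 0:
        raise ValueError("the sampling error e must be positive")
    return math.ceil((z ** 2) * (sigma ** 2) / (e ** 2))


# Hypothetical inputs: Z = 1.96, sigma = 15 grams, e = 5 grams.
n = required_sample_size(z=1.96, sigma=15.0, e=5.0)
print(n)
```

Rounding up rather than to the nearest integer keeps the sampling error at or below the requested value.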
Hypothesis-Testing methodology
The Null and alternative hypotheses
Hypothesis testing typically begins with some theory, claim, or assertion about a particular
parameter of a population. For example, for purposes of statistical analysis, your initial
hypothesis about the cereal example is that the process is working properly, meaning that the
mean fill is 368 grams, and no corrective action is needed.
The hypothesis that the population parameter is equal to the company specification is referred
to as the null hypothesis. A null hypothesis is always one of the status quo, and is identified by
the symbol H0.
Here the null hypothesis is that the filling process is working properly, that the mean fill per
box is the 368-gram specification. This can be stated as
H0 : μ = 368
Note that even though information is available only from the sample, the null hypothesis is
written in terms of the population parameter. This is because the parameter of interest is
focussed on the entire filling process (the population) of all the cereal boxes being filled. The
sample statistic will be used to make inference about the entire filling process. One inference
may be that the results observed from the sample data indicate that the null hypothesis is
false. If the null hypothesis is considered false, something else must be true.
Whenever a null hypothesis is specified, an alternative hypothesis must also be specified, one
that must be true if the null hypothesis is found false. The alternative hypothesis H1 is the
opposite of the null hypothesis H0. This is stated in the cereal example as
H1 : μ ≠ 368
On the other hand, the Type II error occurs if the conclusion (based on sample information) is
that the average population fill amount is 368 when in fact it is not 368.
Type I Error
A Type I error occurs if the null hypothesis H0 is rejected when in fact it is true and should
not be rejected. The probability of a Type I error occurring is α.
Type II Error
A Type II error occurs if the null hypothesis H0 is not rejected when in fact it is false and
should be rejected. The probability of a Type II error occurring is β.
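The two error probabilities can be made concrete with a small simulation. This is only a sketch under stated assumptions: the population standard deviation (15 grams), the sample size (25), the alternative true mean (374 grams), and the function name are all hypothetical, and a two-tailed Z test at the 0.05 level is assumed.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, mu0=368.0, sigma=15.0, n=25,
                   alpha=0.05, trials=4000, seed=0):
    """Fraction of simulated samples for which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical Z value
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a Type I error (rate estimates alpha).
type_i_rate = rejection_rate(true_mean=368.0)

# H0 is false here, so every non-rejection is a Type II error (rate estimates beta).
type_ii_rate = 1 - rejection_rate(true_mean=374.0)

print(f"estimated alpha = {type_i_rate:.3f}, estimated beta = {type_ii_rate:.3f}")
```

The estimated Type I rate should land near the chosen level of significance, while the Type II rate depends on how far the true mean is from 368.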
In recent years, with the advent of widely available statistical and spreadsheet software, the
concept of the p-value as an approach to hypothesis testing has increasingly gained
acceptance. The p-value is often referred to as the observed level of significance, which is the
smallest level at which H0 can be rejected for a given set of data. The decision rules for
rejecting H0 in the p-value approach follow.
If the p-value is greater than or equal to α, the null hypothesis is not rejected.
If the p-value is less than α, the null hypothesis is rejected.
To understand the p-value approach, consider the cereal-filling-process example. You tested
whether or not the mean fill amount was equal to 368 grams. A Z value of +1.50 was
obtained and the null hypothesis was not rejected because +1.50 was less than the upper
critical value of +1.96 and more than the lower critical value of -1.96.
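The same decision can be reached through the p-value. The sketch below assumes a two-tailed Z test at the 0.05 level; the sample mean (372.5), standard deviation (15), and sample size (25) are hypothetical values chosen only so that the Z value comes out to +1.50 as in the example.

```python
import math
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, upper critical value, p-value, reject H0?) for H0: mu = mu0."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-tailed p-value: probability of a Z at least this far from 0 in either tail.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    critical = std_normal.inv_cdf(1 - alpha / 2)
    return z, critical, p_value, p_value < alpha


# Hypothetical sample values that give Z = +1.50.
z, critical, p_value, reject = two_tailed_z_test(
    sample_mean=372.5, mu0=368.0, sigma=15.0, n=25
)
print(f"Z = {z:.2f}, critical = +/-{critical:.2f}, p-value = {p_value:.4f}, reject = {reject}")
```

With Z = +1.50 the p-value is about 0.13, which is greater than 0.05, so the null hypothesis is not rejected. This agrees with the critical-value comparison.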
SIMPLE LINEAR REGRESSION MODEL
Yi = β0 + β1Xi + εi
where β0 is the Y intercept, β1 is the slope, and εi is the random error in Y for observation i.
In this model, the slope of the line, β1, represents the expected change in Y per unit change in
X. It represents the average amount that Y changes (either positively or negatively) for a
particular unit change in X. The intercept β0 represents the average value of Y when X equals
0. The last component of the model, εi, represents the random error in Y for each observation i
that occurs. In other words, εi is the vertical distance Yi is above or below the line of
regression.
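In practice the intercept and slope are estimated from sample data. The sketch below uses ordinary least squares, a standard method that the passage above does not spell out, to compute sample estimates b0 and b1. The function name and the five data points are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return least-squares estimates (b0, b1) for the line Y = b0 + b1 * X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of X, and sum of cross-products of X and Y.
    ssx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / ssx           # estimated slope: change in Y per unit change in X
    b0 = y_bar - b1 * x_bar  # estimated intercept: predicted Y when X equals 0
    return b0, b1


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

For this data the slope estimate is close to 2, meaning Y rises by about two units for each one-unit increase in X.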
When using a regression model for prediction purposes, it is important that you consider only
the relevant range of the independent variable in making predictions. This relevant range
includes all values from the smallest to the largest X used in developing the regression model.
Hence, when predicting Y for a given value of X, you can interpolate within this relevant
range of the X values, but you should not extrapolate beyond the range of X values. When
you use the square footage to predict annual sales, note from Table 1 that the square footage (in
thousands of square feet) varies from 1.1 to 5.8. Therefore, predictions of annual sales should
be made only for stores whose size is between 1.1 and 5.8 thousand square feet. Any
prediction of annual sales for stores whose size is outside this range presumes that the fitted
relationship holds outside the range of 1.1 to 5.8 thousand square feet. For example, you
cannot extrapolate the linear relationship beyond 5,800 square feet in Example 1. It would be
improper to use the regression equation to forecast the sales for a new store containing 8,000
square feet. It is quite possible that store size has a point of diminishing returns. If that were
true as square footage increases beyond 5,800 square feet, the effect on sales might become
smaller and smaller.
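One way to enforce the relevant range is to have the prediction routine refuse values of X outside it. In the sketch below the range limits of 1.1 and 5.8 come from the passage above, while the function name and the coefficients b0 = 1.0 and b1 = 1.5 are hypothetical placeholders, not the fitted values for the store data.

```python
# Relevant range of store size, in thousands of square feet (from the text).
X_MIN, X_MAX = 1.1, 5.8


def predict_annual_sales(size, b0, b1):
    """Predict annual sales for a store size inside the relevant range only."""
    if not (X_MIN <= size <= X_MAX):
        raise ValueError(
            f"size {size} is outside the relevant range {X_MIN} to {X_MAX}; "
            "the fitted line should not be extrapolated"
        )
    return b0 + b1 * size


# Hypothetical coefficients, used only to show the range check.
inside = predict_annual_sales(4.0, b0=1.0, b1=1.5)
print(inside)

try:
    predict_annual_sales(8.0, b0=1.0, b1=1.5)  # an 8,000 square foot store
except ValueError as err:
    print("refused:", err)
```

Raising an error is a deliberately strict choice; a warning would also be reasonable if extrapolated values are sometimes wanted with a caveat.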
RESIDUAL ANALYSIS
In the preceding discussion of the site selection data, a simple linear regression model has
been used.
In this section, a graphical approach called residual analysis is developed to evaluate whether
the regression model that has been fitted to the data is an appropriate model. In addition,
residual analysis also enables potential violations of the assumptions of the regression model
to be evaluated.
Evaluating the Aptness of the Fitted Model
The residual or estimated error value ei is defined as the difference between the observed (Yi)
and predicted (Ŷi) values of the dependent variable for a given value of Xi. Graphically, a
residual is depicted on a scatter diagram as the vertical distance between an observed value of Y
and the line defined by the simple regression equation. Numerically the residual is defined in
Equation (9).
The residual is equal to the difference between the observed value of Y and the predicted
value of Y.
ei = Yi - Ŷi --- (9)
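Equation (9) translates directly into code. The sketch below computes ei = Yi - Ŷi for each observation. The data points are made up, and the coefficients 1.09 and 1.97 are the least-squares values for that made-up data, not values from the site selection example.

```python
def compute_residuals(xs, ys, b0, b1):
    """Return the residuals e_i = Y_i - Y_hat_i, where Y_hat_i = b0 + b1 * X_i."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Made-up data with its least-squares coefficients (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
residuals = compute_residuals(xs, ys, b0=1.09, b1=1.97)

for x, e in zip(xs, residuals):
    # A positive residual means the observed Y lies above the fitted line.
    print(f"X = {x:.1f}, residual = {e:+.2f}")
```

Plotting these residuals against X is the usual next step: a patternless scatter around zero suggests the straight-line model is adequate, while a curve or funnel shape suggests it is not.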
MEASURING AUTOCORRELATION:
One of the basic assumptions of the regression model is the independence of the errors. This
assumption is often violated when data are collected over sequential periods of time because
a residual at any one point in time may tend to be similar to residuals at adjacent points in
time. Such a pattern in the residuals is called autocorrelation. When substantial
autocorrelation is present in a set of data, the validity of a regression model can be in serious
doubt.
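One widely used numerical measure of autocorrelation is the Durbin-Watson statistic, D = Σ(ei - ei-1)² / Σei². The passage above does not name it, so it is offered here as one possible measure rather than the one the notes intend. The sketch below computes D for two made-up residual series listed in time order.

```python
def durbin_watson(residuals):
    """Return D = sum((e_i - e_{i-1})^2) / sum(e_i^2) for time-ordered residuals."""
    denominator = sum(e ** 2 for e in residuals)
    if len(residuals) < 2 or denominator == 0:
        raise ValueError("need at least two residuals that are not all zero")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    return numerator / denominator


# Made-up residuals: adjacent values are similar (positive autocorrelation).
drifting = [1.0, 0.8, 0.5, 0.1, -0.3, -0.7, -1.0]
# Made-up residuals: adjacent values flip sign (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_drifting = durbin_watson(drifting)
d_alternating = durbin_watson(alternating)
print(f"drifting: D = {d_drifting:.2f}, alternating: D = {d_alternating:.2f}")
```

D ranges from 0 to 4. Values near 2 suggest little autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.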