
RESEARCH METHODOLOGY-MBA-333B

CIA-3

Discriminant Analysis & Multiple Linear Regression

To: Dr. Ramanatha H R
By: Saqib Ibrahim Khan (1827121), M2
DISCRIMINANT ANALYSIS
Data Set: Resort Visit Data
Variables: The categorical variable is Resort Visit with two levels;
1) 1 time 2) 2 times
Data Set Description: The dataset has 244 observations on four variables. The psychological
variables are Family Income, Attitude towards Travel, Household Size, Age of the Head of
Household and Importance attached to Family Holiday.
Linear discriminant function analysis: It performs a multivariate test of differences between
groups. In addition, discriminant analysis is used to determine the minimum number of
dimensions needed to describe these differences. A distinction is sometimes made between
descriptive discriminant analysis and predictive discriminant analysis.

Analysis:

• The Standardized Canonical Discriminant Function Coefficients table and the Structure
Matrix table are listed in different orders.
• The number of discriminant dimensions is the number of groups minus 1. However,
some discriminant dimensions may not be statistically significant.
• With two groups, there is one discriminant dimension, and it is statistically significant.
• The canonical correlation for the dimension is .801.
• The standardized discriminant coefficients function in a manner analogous to
standardized regression coefficients in OLS regression. For example, a one standard
deviation increase on the outdoor variable will result in a 1.785 standard deviation
increase in the predicted values on discriminant function 1.
• The canonical structure, also known as canonical loadings or discriminant loadings,
represents the correlations between the observed variables and the unobserved discriminant
functions (dimensions).
• The discriminant functions are a kind of latent variable and the correlations are loadings
analogous to factor loadings.

Group centroids are the class (i.e., group) means of the canonical variables.
The discriminant function is:

Discriminant score = 0.742 * Family Income + 0.103 * Attitude towards Travel + 0.237 *
Importance attached to Family Holiday + 0.473 * Household Size + 0.209 * Age of the Head
of Household.
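
As a minimal sketch of how the discriminant function above is applied, the Python snippet
below computes a discriminant score for one respondent and the group centroids as the mean
score within each group. The coefficients are those reported above, with the Family Income
coefficient taken as 0.742. No constant term is used because none is reported, and the report
does not say whether the inputs should be standardized first. The respondent values and the
per-group scores are hypothetical and serve only as an illustration.

```python
from statistics import mean

# Coefficients as reported in the discriminant function above
# (Family Income coefficient taken as 0.742; no constant term is reported).
COEFFICIENTS = {
    "family_income": 0.742,
    "attitude_towards_travel": 0.103,
    "importance_of_family_holiday": 0.237,
    "household_size": 0.473,
    "age_of_head_of_household": 0.209,
}


def discriminant_score(respondent):
    """Weighted sum of the predictor values for one respondent."""
    return sum(COEFFICIENTS[name] * respondent[name] for name in COEFFICIENTS)


def group_centroids(scores_by_group):
    """Group centroid = mean discriminant score within each group."""
    return {group: mean(scores) for group, scores in scores_by_group.items()}


# Hypothetical respondent (illustrative values only, not taken from the dataset).
respondent = {
    "family_income": 1.2,
    "attitude_towards_travel": 0.5,
    "importance_of_family_holiday": -0.3,
    "household_size": 0.8,
    "age_of_head_of_household": 0.1,
}
print(discriminant_score(respondent))

# Hypothetical discriminant scores for the two Resort Visit groups.
print(group_centroids({"1 time": [-1.2, -0.8, -1.0], "2 times": [0.9, 1.1, 1.3]}))
```

A respondent would then typically be assigned to the group whose centroid lies closest to
their score.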

MULTIPLE LINEAR REGRESSION


Data Set: House Prices: Advanced Regression Techniques
Competition Description:
• Ask a home buyer to describe their dream house, and they probably won't begin with
the height of the basement ceiling or the proximity to an east-west railroad. But this
playground competition's dataset proves that much more influences price negotiations
than the number of bedrooms or a white-picket fence.
• Housing markets are unlike any other market. The heterogeneity of the housing market
makes it difficult to categorise: each house feature is different, and housing markets
contain many varying factors, so that no two houses are identical (taking into account
the interior and exterior of the property).
• A housing market is difficult to measure because of the heterogeneity of the attributes that
define particular houses. Housing markets are also imperfect for many reasons,
including recessions that cause delayed movements in house prices and stagnation.
Objectives of the analysis:

• To predict the selling price of the house given the area of the house and the number of rooms
available.
• To construct a multiple linear regression model for the above objective and check the
accuracy of the model.
• To test the null hypothesis of the case.
Data Set Dictionary:
S. No   Variable                            Type
1       SALES PRICE                         Continuous, Dependent
2       ID                                  Continuous, Independent
3       Base Area Per Square Metre          Continuous, Independent
4       Ground Area per Square metre        Continuous, Independent
5       Number of rooms on Ground floor     Continuous, Independent
6       First Floor Area per sq. metre      Continuous, Independent
7       Number of rooms on First Floor      Continuous, Independent

Model Summary:
R       R2      Adjusted R2   Std. Error of the Estimate   Durbin-Watson
0.840   0.705   0.698         0.650                        1.959

• The R value of 0.840 is the multiple correlation coefficient, i.e. the correlation between
the observed and the predicted values of the dependent variable. It indicates a high degree of
correlation.
• The R2 value indicates how much of the total variation in the dependent variable
(House Price) can be explained by the independent variables (Basement Area per
square metre, Ground Area per square metre, Number of rooms on Ground floor, First
Floor Area per sq. metre and Number of rooms on First Floor).
• Here 70.5% of the variation in House Price is explained by the predictors, i.e. the
explained variance. The remaining 29.5% is the unexplained variance.
• Adjusted R2 penalises R2 for each additional independent variable. It increases only if a
newly added independent variable improves the model more than would be expected by
chance. Here Adjusted R2 is 0.698.
• Durbin Watson Statistic: The Durbin Watson statistic indicates the autocorrelation
among the residuals. Durbin Watson ≈ 2(1 - ρ), where ρ is the autocorrelation coefficient
of the residuals. Here the Durbin Watson statistic is 1.959, which is approximately 2, so ρ is
approximately 0. This indicates no first-order autocorrelation in the residuals, which
supports the assumption that the errors are independent.
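
The arithmetic behind the model summary can be sketched in a few lines of Python. This is an
illustration under stated assumptions: the Durbin Watson value (1.959) and R2 (0.705) come
from the table above, but the report does not state the number of observations used in the
regression, so the n_obs value passed to adjusted_r2 below is hypothetical.

```python
def unexplained_share(r2):
    """Share of variance not explained by the predictors."""
    return 1.0 - r2


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def implied_autocorrelation(durbin_watson):
    """Rearranges DW ≈ 2 * (1 - rho) to rho ≈ 1 - DW / 2."""
    return 1.0 - durbin_watson / 2.0


# Values from the model summary table.
print(unexplained_share(0.705))        # roughly 0.295, i.e. 29.5% unexplained
print(implied_autocorrelation(1.959))  # roughly 0.02, i.e. close to zero

# n_obs=100 is a hypothetical sample size, used only to show the penalty.
print(adjusted_r2(0.705, n_obs=100, n_predictors=5))
```

The penalty grows as predictors are added without a matching rise in R2, which is why
Adjusted R2 is the safer figure to compare across models of different sizes.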
Model Hypothesis:
H0 (Null Hypothesis): There is no relation between the house selling price and the dimensions
of the house and the number of rooms available.
Ha (Alternate Hypothesis): There is a relation between the house selling price and the
dimensions of the house and the number of rooms available.
All the coefficients were significant and hence no coefficient was eliminated.
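The fitted regression equation is:
Sale Price = 10904 + (61.2 * Basement Area per sqm) + (40.8 * Ground Area per sqm)
- (21222.4 * Rooms on ground floor) + (101.1 * First Floor Area per sqm)
+ (4323.7 * Rooms on first floor)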

Term                     Symbol   Beta Coefficient
Constant                          10904
Basement Area per sqm    B1       61.2
Ground Area per sqm      B2       40.8
Room- ground floor       B3       -21222.4
First floor per sqm      B4       101.1
Rooms- first floor       B5       4323.7
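
As a hedged illustration of how these coefficients produce a prediction, the Python sketch
below multiplies each coefficient by the corresponding house attribute and adds the constant.
The coefficient values are those in the table above; the dictionary keys and the example house
are hypothetical, and the units of the predicted price follow whatever units the original
data used.

```python
# Constant and coefficients from the coefficient table above.
INTERCEPT = 10904.0
COEFFS = {
    "basement_area_sqm": 61.2,       # B1
    "ground_area_sqm": 40.8,         # B2
    "rooms_ground_floor": -21222.4,  # B3
    "first_floor_area_sqm": 101.1,   # B4
    "rooms_first_floor": 4323.7,     # B5
}


def predict_sale_price(house):
    """Constant plus the sum of coefficient * predictor value."""
    return INTERCEPT + sum(COEFFS[name] * house[name] for name in COEFFS)


# Hypothetical house (illustrative values only, not a row from the dataset).
example_house = {
    "basement_area_sqm": 900.0,
    "ground_area_sqm": 1500.0,
    "rooms_ground_floor": 1.0,
    "first_floor_area_sqm": 1000.0,
    "rooms_first_floor": 6.0,
}
print(predict_sale_price(example_house))
```

Each coefficient is the expected change in Sale Price for a one-unit increase in that
predictor with the other predictors held constant.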

Analysis of Variance (ANOVA):

• Sum of Squares: These are associated with the three sources of variance: Total,
Regression and Residual (error). The total variance is partitioned into the variance
explained by the independent variables (Regression) and the variance left unexplained by
them (Residual). R2 is the ratio of the Regression sum of squares to the Total sum of squares.
• The F statistic, i.e. Mean Square (Regression) / Mean Square (Residual), is
91.414.
• The Significance value is reported as 0.000 (i.e. p < 0.001), which is less than the alpha
value; hence we reject the null hypothesis. It also indicates that the regression model as a
whole statistically significantly predicts the dependent variable.
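
The relationship between the ANOVA quantities can be sketched as follows. The report gives
only the resulting F value (91.414), not the sums of squares or degrees of freedom, so the
inputs below are hypothetical numbers chosen to show the calculation, and they do not
reproduce the reported F.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / degrees_of_freedom


def f_statistic(ss_regression, ss_residual, df_regression, df_residual):
    """F = Mean Square (Regression) / Mean Square (Residual)."""
    return mean_square(ss_regression, df_regression) / mean_square(ss_residual, df_residual)


def r_squared(ss_regression, ss_residual):
    """R2 = Regression sum of squares / Total sum of squares."""
    return ss_regression / (ss_regression + ss_residual)


# Hypothetical ANOVA inputs (not the values from the report's ANOVA table).
ss_reg, ss_res = 70.5, 29.5
df_reg, df_res = 5, 100  # df_reg = number of predictors, df_res = n - k - 1

print(r_squared(ss_reg, ss_res))
print(f_statistic(ss_reg, ss_res, df_reg, df_res))
```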

Histogram:

The histogram is a nearly bell-shaped, symmetrical curve with mean = 0, showing that the
predictions will usually fall near the average but will occasionally deviate by large
amounts.
Actual Price v/s Predicted price:

[Figure: Actual Price vs Price Predicted. Series: Sale Price (thousand) and Sale Price
(Predicted); x-axis: observations 1 to 63; y-axis: 0 to 450000.]

The purpose is to detect a value, or group of values, that is not easily predicted by the
model. Here, in the graph, the actual and predicted sale prices are more or less the same,
which means the Sale Price of the house is predicted well by the model.
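
One simple way to make this check explicit, instead of reading it off the chart, is to flag
observations whose residual (actual minus predicted) is unusually large. The Python sketch
below uses a plain z-score of the residuals with a cut-off of 2; both the cut-off and the data
are assumptions for illustration, not values from the report.

```python
from statistics import mean, stdev


def flag_poorly_predicted(actual, predicted, threshold=2.0):
    """Return indices whose residual lies more than `threshold` sample
    standard deviations away from the mean residual."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    centre = mean(residuals)
    spread = stdev(residuals)
    if spread == 0:
        return []  # every residual is identical, nothing stands out
    return [
        index
        for index, residual in enumerate(residuals)
        if abs(residual - centre) / spread > threshold
    ]


# Hypothetical actual and predicted prices; the last house is badly predicted.
predicted = [200.0, 210.0, 190.0, 250.0, 230.0, 220.0, 240.0, 205.0, 215.0, 225.0]
errors = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0, 0.0, 1.0, -1.0, 50.0]
actual = [p + e for p, e in zip(predicted, errors)]

print(flag_poorly_predicted(actual, predicted))
```

An empty result would correspond to the situation described above, where no observation
stands out as poorly predicted.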
