RM Sik PDF
CIA-3
To: Dr. Ramanatha H R
By: Saqib Ibrahim Khan (1827121), M2
DISCRIMINANT ANALYSIS
Data Set: Resort Visit Data
Variables: The categorical (grouping) variable is Resort Visit, with two levels:
1) 1 time, 2) 2 times.
Data Set Description: The dataset has 244 observations on five predictor variables: Family Income, Attitude towards Travel, Household Size, Age of the Head of Household and Importance attached to Family Holiday.
Linear discriminant function analysis: It performs a multivariate test of differences between
groups. In addition, discriminant analysis is used to determine the minimum number of
dimensions needed to describe these differences. A distinction is sometimes made between
descriptive discriminant analysis and predictive discriminant analysis.
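As a rough illustration of the two-group case, the sketch below fits Fisher's linear discriminant with NumPy. The resort-visit data are not reproduced in this report, so the generated groups, sizes and means are assumptions, and `fisher_lda` is a hypothetical helper rather than the procedure used to produce the output discussed here. Following the usual convention, the raw weights are scaled so the discriminant scores have pooled within-group variance 1, standardized coefficients are the raw weights times each predictor's pooled within-group standard deviation, and structure loadings are pooled within-group correlations between each predictor and the scores.

```python
import numpy as np

def fisher_lda(X, y):
    """Two-group Fisher linear discriminant (groups coded 0 and 1).

    Returns raw weights, standardized coefficients, structure loadings,
    discriminant scores and the two group centroids.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    X0, X1 = X[y == 0], X[y == 1]
    d0, d1 = X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)
    # Pooled within-group covariance matrix (n - 2 degrees of freedom)
    Sw = (d0.T @ d0 + d1.T @ d1) / (n - 2)
    # Fisher direction Sw^-1 (mean1 - mean0), scaled so that the scores
    # have pooled within-group variance 1
    w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
    w = w / np.sqrt(w @ Sw @ w)
    scores = (X - X.mean(axis=0)) @ w        # centred at the grand mean
    std_coef = w * np.sqrt(np.diag(Sw))      # standardized coefficients
    # Structure loadings: pooled within-group correlations between each
    # predictor and the scores (group means removed before correlating)
    Xw, sw = X.copy(), scores.copy()
    for g in (0, 1):
        Xw[y == g] -= Xw[y == g].mean(axis=0)
        sw[y == g] -= sw[y == g].mean()
    loadings = np.array([np.corrcoef(Xw[:, j], sw)[0, 1] for j in range(p)])
    centroids = np.array([scores[y == 0].mean(), scores[y == 1].mean()])
    return w, std_coef, loadings, scores, centroids

# Hypothetical data: two groups of 100, three predictors, group 1 shifted by +1.5
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(1.5, 1.0, (100, 3))])
y = np.repeat([0, 1], 100)

w, std_coef, loadings, scores, centroids = fisher_lda(X, y)
cut = centroids.mean()                 # midpoint rule, i.e. equal priors assumed
pred = (scores > cut).astype(int)
print("standardized coefficients:", np.round(std_coef, 3))
print("structure loadings:", np.round(loadings, 3))
print("accuracy:", (pred == y).mean())
```

The midpoint cut-off is only appropriate when both groups are treated as equally likely; unequal priors would shift it.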
Analysis:
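• The Standardized Canonical Discriminant Function Coefficients table and the Structure
Matrix table list the variables in different orders: in the Structure Matrix the variables are
ordered by the absolute size of their correlation with the function.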
• The number of discriminant dimensions is the number of groups minus 1 (or the number of
predictors, if that is smaller). However, some discriminant dimensions may not be statistically
significant.
• Since Resort Visit has two groups, there is only one discriminant dimension (2 - 1 = 1), and it
is statistically significant.
• The canonical correlation for the dimension is .801.
• The standardized discriminant coefficients function in a manner analogous to
standardized regression coefficients in OLS regression. For example, if a predictor has a
standardized coefficient of 1.785, a one standard deviation increase in that predictor, holding
the other predictors constant, results in a 1.785 standard deviation increase in the predicted
values on discriminant function 1.
• The canonical structure, also known as canonical loadings or discriminant loadings,
represents the correlations between the observed variables and the unobserved discriminant
functions (dimensions).
• The discriminant functions are a kind of latent variable and the correlations are loadings
analogous to factor loadings.
• Group centroids are the class (i.e., group) means of the canonical variables (the discriminant
scores).
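For a two-group analysis with a single function, the canonical correlation r is tied to the other usual summary statistics by simple identities: r² is the share of the variance in the discriminant scores accounted for by group membership, the eigenvalue is λ = r²/(1 - r²), and Wilks' lambda is 1 - r² = 1/(1 + λ). The sketch below only applies these identities to the reported r = .801; the printed values are arithmetic, not output copied from the original analysis, and `summary_from_canonical` is a hypothetical helper.

```python
def summary_from_canonical(r):
    """Identities for a single discriminant function with canonical correlation r."""
    r2 = r ** 2                    # share of score variance explained by the groups
    eigenvalue = r2 / (1 - r2)     # between-group SS / within-group SS of the scores
    wilks = 1 - r2                 # Wilks' lambda = 1 / (1 + eigenvalue)
    return r2, eigenvalue, wilks

r2, eigenvalue, wilks = summary_from_canonical(0.801)
print(f"r^2 = {r2:.3f}, eigenvalue = {eigenvalue:.3f}, Wilks' lambda = {wilks:.3f}")
# r^2 = 0.642, eigenvalue = 1.790, Wilks' lambda = 0.358
```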
The discriminant function is:
Discriminant score = 0.742 * Family Income + 0.103 * Attitude towards Travel + 0.237 *
Importance attached to Family Holiday + 0.473 * Household Size + 0.209 * Age of the Head
of Household.
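The discriminant function above can be evaluated directly for a household. This minimal sketch assumes the coefficients are standardized (so each predictor is entered as a z-score) and that no constant term is needed; both are assumptions, since the coefficient table itself is not shown. The dictionary keys are just the variable names used in the text, and `discriminant_score` is a hypothetical helper.

```python
# Coefficients copied from the discriminant function above
# (assumed to be standardized coefficients applied to z-scores)
COEFFICIENTS = {
    "Family Income": 0.742,
    "Attitude towards Travel": 0.103,
    "Importance attached to Family Holiday": 0.237,
    "Household Size": 0.473,
    "Age of the Head of Household": 0.209,
}

def discriminant_score(z):
    """Weighted sum of standardized predictor values z (dict: variable -> z-score)."""
    return sum(coef * z[name] for name, coef in COEFFICIENTS.items())

# Hypothetical household that is one SD above the mean on every predictor
household = {name: 1.0 for name in COEFFICIENTS}
print(round(discriminant_score(household), 3))  # 1.764
```

The resulting score would then be compared with the two group centroids (or the cut-off between them) to assign the household to the 1-visit or 2-visit group.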
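MULTIPLE LINEAR REGRESSION
Objectives:
• To predict the selling price of a house given the area of the house and the number of rooms
available.
• To construct a multiple linear regression model for the above objective and check the
accuracy of the model.
• To test the null hypothesis that the selling price is not related to the dimensions of the house
and the number of rooms available.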
Data Set Dictionary:
S. No | Variable | Type
2 | ID | Continuous, Independent
Model Summary (columns: R, R², Adjusted R², Std. Error of the Estimate, Durbin-Watson):
• The R value of 0.840 is the multiple correlation coefficient, i.e. the correlation between the
observed and predicted House Price. It indicates a high degree of correlation.
• The R² value indicates how much of the total variation in the dependent variable
(House Price) can be explained by the independent variables (Basement Area per
square metre, Ground Area per square metre, Number of rooms on Ground Floor, First
Floor Area per square metre and Number of rooms on First Floor).
• Here 70.5% of the variation in House Price is explained by the predictors (explained
variance); the remaining 29.5% is unexplained variance.
• Adjusted R² penalises R² for each additional independent variable. Adjusted R² increases
only when a newly added independent variable improves the model by more than would be
expected by chance.
• Durbin-Watson Statistic: The Durbin-Watson statistic tests for first-order autocorrelation
among the residuals. Durbin-Watson ≈ 2(1 - ρ), where ρ is the lag-1 autocorrelation coefficient
of the residuals.
Here the Durbin-Watson statistic is 1.959, which is approximately 2. This implies that the
autocorrelation coefficient is close to 0 (ρ ≈ 1 - 1.959/2 ≈ 0.02), so there is no evidence of
first-order autocorrelation and the errors can be treated as independent.
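The Durbin-Watson statistic is computed from the residuals in observation order as DW = Σ(e_t - e_(t-1))² / Σ e_t², and the approximation DW ≈ 2(1 - ρ) links it to the lag-1 autocorrelation ρ. A minimal pure-Python sketch; the helper names and the short residual lists are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def approx_rho(dw):
    """Approximate lag-1 autocorrelation implied by DW ~= 2(1 - rho)."""
    return 1 - dw / 2

print(durbin_watson([1, -1, 1, -1]))   # 3.0 (negative autocorrelation)
print(durbin_watson([1, 1, 1, 1]))     # 0.0 (positive autocorrelation)
print(round(approx_rho(1.959), 4))     # 0.0205 for DW = 1.959 reported above
```

Values near 2 therefore point to little first-order autocorrelation, while values towards 0 or 4 point to positive or negative autocorrelation respectively.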
Model Hypothesis:
Ho (Null Hypothesis): There is no relationship between house selling price and the dimensions
of the house and the number of rooms available.
Ha (Alternate Hypothesis): There is a relationship between house selling price and the
dimensions of the house and the number of rooms available.
All the coefficients were significant and hence no coefficient was eliminated.
The regression equation formed (Constant = 10904):
Sale Price = 10904 + (B1 * 61.2) + (B2 * 40.8) + (B3 * -21222.4) + (B4 * 101.1) + (B5 * 4323.7)
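Plugging values into the fitted equation gives a predicted sale price. This sketch keeps the document's B1 to B5 labels and assumes they stand for the five predictor values that multiply the printed coefficients; which predictor each label refers to cannot be recovered from the table shown, so the inputs below are purely hypothetical, as is the helper name `predict_sale_price`.

```python
CONSTANT = 10904
# Multipliers of B1..B5 as printed in the regression equation
COEFS = [61.2, 40.8, -21222.4, 101.1, 4323.7]

def predict_sale_price(b):
    """Predicted Sale Price for the five predictor values b = [B1, ..., B5]."""
    if len(b) != len(COEFS):
        raise ValueError("expected exactly five predictor values")
    return CONSTANT + sum(c * x for c, x in zip(COEFS, b))

# Hypothetical inputs, only to show how the equation is evaluated
print(predict_sale_price([0, 0, 0, 0, 0]))   # 10904.0: only the constant remains
print(predict_sale_price([1, 0, 0, 0, 0]))
```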
• Sum of Squares: These are associated with the three sources of variance: Total,
Regression and Residual (errors). The total sum of squares is partitioned into the part
explained by the independent variables (Regression) and the part left unexplained (Residual);
R² is the ratio of the Regression sum of squares to the Total sum of squares.
• The F statistic, Mean Square (Regression) / Mean Square (Residual), is 91.414.
• The significance value is 0.000 (p < 0.001), which is less than the alpha value, hence we
reject the null hypothesis. This indicates that the regression model predicts the dependent
variable significantly better than a model with no predictors; together with the R² of 0.705, it
suggests the model fits the data well.
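The Model Summary and ANOVA quantities (R, R², adjusted R², the three sums of squares and the F statistic) can all be reproduced from any OLS fit. The sketch below uses NumPy's least-squares solver on synthetic data with five predictors, since the house-price data are not included here; the sample size, coefficients, noise level and the helper name `ols_summary` are assumptions.

```python
import numpy as np

def ols_summary(X, y):
    """Fit y = b0 + X b by least squares and return the usual summary statistics."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    sst = ((y - y.mean()) ** 2).sum()             # Total sum of squares
    sse = ((y - fitted) ** 2).sum()               # Residual sum of squares
    ssr = sst - sse                               # Regression (explained) sum of squares
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    f_stat = (ssr / p) / (sse / (n - p - 1))      # MS(Regression) / MS(Residual)
    r = np.corrcoef(y, fitted)[0, 1]              # multiple R = corr(observed, fitted)
    return dict(beta=beta, sst=sst, ssr=ssr, sse=sse, r=r, r2=r2,
                adj_r2=adj_r2, f=f_stat, fitted=fitted)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                      # hypothetical predictors
y = 10 + X @ np.array([3.0, 2.0, -1.5, 1.0, 0.5]) + rng.normal(scale=2.0, size=200)
s = ols_summary(X, y)
print(f"R = {s['r']:.3f}, R^2 = {s['r2']:.3f}, adj R^2 = {s['adj_r2']:.3f}, F = {s['f']:.1f}")
```

The significance value would then be the upper tail of an F distribution with p and n - p - 1 degrees of freedom, for example `scipy.stats.f.sf(F, p, n - p - 1)` if SciPy is available.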
Histogram:
The histogram of the residuals is nearly bell-shaped with mean = 0. The curve is roughly
symmetrical, which suggests the residuals are approximately normally distributed: most
predictions are close to the actual values, and large errors occur only occasionally.
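A histogram is a visual check; two numeric summaries of the same shape are the skewness and excess kurtosis of the residuals, both close to 0 for normally distributed errors. A minimal pure-Python sketch using population moments; the helper name and the residual list are made up for illustration.

```python
def shape_summary(residuals):
    """Mean, skewness and excess kurtosis (population moments) of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5              # 0 for a symmetric distribution
    excess_kurt = m4 / m2 ** 2 - 3     # 0 for a normal distribution
    return mean, skew, excess_kurt

mean, skew, kurt = shape_summary([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
print(round(mean, 3), round(skew, 3))   # 0.0 0.0 (symmetric list)
```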
Actual Price vs. Predicted Price:
The purpose of this plot is to detect a value, or group of values, that the model does not
predict well. Here, in the graph below, the actual and predicted sale prices lie close to each
other, which indicates that the model predicts the Sale Price of the house accurately.