
BUSINESS STATISTICS - IBM SPSS EXPERIENTIAL LEARNING-1
GROUP MEMBERS:
PARTH RANPARIYA - 45257
ABHINAV GROVER - 45106
KARTIK BHATEJA - 45194
SARTHAK GUPTA - 45285
SARTHAK GARG - 45284
SHANTANU SACHDEVA - 45287
SIDDHARTHA MALVIYA - 45310
SHASHANK HALDIA - 45288
SIDDHARTH BASU - 45309
ANANY JAUHARI - 45127
DATA IMPORTING AND SAMPLING
1) Data Importing from Excel to SPSS
● Click on the File option in the menu bar, click ‘Import Data’, and then select the file of type Excel.
2) Data Sampling (5000 rows)
● Click on the Data option in the menu bar. Then click on ‘Select Cases’ and generate a random sample of cases.
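The sampling step can also be sketched outside SPSS. The Python snippet below is only an illustration, not the procedure SPSS runs internally: the function name `sample_cases` and the `seed` parameter are assumptions made for this example, and the rows are assumed to be already loaded into a list.

```python
import random


def sample_cases(rows, k, seed=None):
    """Return k randomly chosen rows, kept in their original order.

    Hypothetical helper for illustration; SPSS's own sampler may differ.
    """
    rng = random.Random(seed)      # a fixed seed makes the sample reproducible
    if k >= len(rows):             # fewer rows than requested: keep everything
        return list(rows)
    chosen = sorted(rng.sample(range(len(rows)), k))  # sample without replacement
    return [rows[i] for i in chosen]


# Example: draw 5 cases out of 100 placeholder rows.
rows = list(range(100))
print(sample_cases(rows, 5, seed=1))
```

For the sample described on this slide, `k` would be 5000.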
DATA CLEANING AND SELECTING
● Look for the household type and select only those houses of type 1. Further, clean the data by removing values that do not exist, i.e. where the data is not known. To do this, use the ‘Select Cases’ option, choose ‘If condition is satisfied’, and enter the equation TYPE = 1 & AGE1 >= 0 & PER >= 0 & VALUE >= 0. This keeps only the cases that satisfy the condition; the remaining cases are filtered out and are not considered while creating the model.
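The selection condition above can be expressed as a simple predicate. The Python sketch below assumes, as the condition implies, that negative values mark unknown data. The helper name `is_selected` and the three sample cases are made up for illustration.

```python
def is_selected(case):
    """Mirror of the condition TYPE = 1 & AGE1 >= 0 & PER >= 0 & VALUE >= 0."""
    return (case["TYPE"] == 1 and case["AGE1"] >= 0
            and case["PER"] >= 0 and case["VALUE"] >= 0)


# Made-up cases: the second has an unknown VALUE, the third is not of type 1.
cases = [
    {"TYPE": 1, "AGE1": 40, "PER": 3, "VALUE": 250000},
    {"TYPE": 1, "AGE1": 35, "PER": 2, "VALUE": -6},
    {"TYPE": 2, "AGE1": 50, "PER": 1, "VALUE": 180000},
]

selected = [c for c in cases if is_selected(c)]   # cases kept for the model
filtered_out = [c for c in cases if not is_selected(c)]
print(len(selected), len(filtered_out))
```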
STRING TO NUMERICAL CONVERSION

● Convert the string values in the table to numerical values so that we can include them while designing the Machine Learning model. We assume that these are important parameters that could influence the value of the house (the dependent variable).
● For this purpose, we use the ‘Automatic Recode’ function, available under the ‘Transform’ option in the menu bar, to convert string values to numerical values.
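The idea behind automatic recoding can be sketched in a few lines of Python. This is an approximation, assuming the distinct string values are sorted in ascending order and numbered from 1. The function name `auto_recode` and the sample categories are hypothetical, and the exact ordering SPSS applies to mixed-case strings may differ from Python's.

```python
def auto_recode(values):
    """Map each distinct string to a consecutive integer code, starting at 1.

    Returns the recoded list and the string-to-code lookup table.
    """
    categories = sorted(set(values))                       # ascending order
    codes = {cat: i for i, cat in enumerate(categories, start=1)}
    return [codes[v] for v in values], codes


# Made-up string column for illustration.
recoded, lookup = auto_recode(["Urban", "Rural", "Urban", "Suburban"])
print(recoded)
print(lookup)
```

The lookup table plays the role of the value labels that map each new code back to its original string.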
STRING TO NUMERICAL CONVERSION (OUTPUTS)
MULTIPLE REGRESSION MODEL
● To develop the multiple regression model, click on ‘Analyze’, then on ‘Regression’, and then on ‘Linear’. Choose ‘Value’ as the dependent variable and the other variables as independent variables. Next, click OK.
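For readers who want to see what such a model computes, the following Python sketch fits an ordinary least-squares regression with an intercept by solving the normal equations. It is a minimal illustration using only the standard library, not SPSS's implementation. The function name `fit_ols`, the fixed tolerance `1e-12`, and the tiny dataset are all assumptions made for the example.

```python
def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    n = len(y)
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        if abs(A[pivot][c]) < 1e-12:   # illustrative tolerance, not scale-aware
            raise ValueError("constant or collinear predictor")
        A[c], A[pivot] = A[pivot], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]
coefficients, r_squared = fit_ols(X, y)
print(coefficients, r_squared)
```

A predictor that takes the same value in every case makes the system singular, so the sketch raises an error for it; a constant variable cannot be used in the model for the same reason.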
MULTIPLE REGRESSION MODEL (OUTPUTS)
CORRELATION COEFFICIENTS
STATISTICS (MEAN, SD, VARIANCE)
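The three statistics named on this slide can be computed as in the Python sketch below. It uses the sample (n - 1) formulas from the standard library, on the assumption that these match the figures SPSS reports. The function name `describe` and the input values are made up for illustration.

```python
import statistics


def describe(values):
    """Return the mean, sample standard deviation and sample variance."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),            # divides by n - 1
        "variance": statistics.variance(values),   # equals sd squared
    }


# Made-up values standing in for the house 'Value' column.
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```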
CONCLUSIONS
● For the generated sample of 5000 cases in which the data is known, we obtained a mean house value of 248580.95 and a standard deviation of 271092.556.
● While developing the model, the variables ‘Vacancy’ and ‘Status’ were removed, as they were found either to be constant or to have a missing correlation. Variables that were in string form were converted to numerical form. In multiple regression, there is one dependent variable (DV) and two or more independent variables (IVs).
● We obtain an R^2 value of 0.084 for the model fitted to our generated dataset. This means that approximately 8.4% of the variance in the ‘Value’ of the house is explained by the independent variables as a group.
● The ANOVA tests whether the value of R^2 is significantly greater than zero. We look at the last column of the table, i.e. Sig. Since the value (p < 0.001) is less than 0.05, we conclude that our model is significant: F = 70.438, p < 0.001, R^2 = 0.084.
● From the individual correlation table, we check for variables whose significance is less than 0.05. We therefore find that Region, Metro, Structure Type, Rooms, number of bedrooms, Built, and area median income are significant predictors of the value of the house.
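The link between R^2 and the ANOVA F statistic mentioned above can be illustrated with the standard formula F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where n is the number of cases and k the number of independent variables. In the Python sketch below, the function name `f_statistic` and the example inputs are hypothetical. The exact n and k of the model in these slides are not stated, so the reported F value is not reproduced here.

```python
def f_statistic(r_squared, n_cases, n_predictors):
    """Overall F statistic of a regression with an intercept."""
    df_model = n_predictors
    df_residual = n_cases - n_predictors - 1
    if df_model <= 0 or df_residual <= 0:
        raise ValueError("need at least one predictor and n > k + 1")
    return (r_squared / df_model) / ((1 - r_squared) / df_residual)


# Made-up example: R^2 = 0.5 with 12 cases and 1 predictor.
print(f_statistic(0.5, 12, 1))
```

With a large number of cases, even a small R^2 can produce a large F, which is why a model can be statistically significant while explaining only a modest share of the variance.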
