
Median Housing Price Prediction Model for D. M. Pan National Real Estate Company

Report: Housing Price Prediction Model for D. M. Pan National Real Estate Company

[Your Name]

Southern New Hampshire University



Introduction

This assignment works through the steps of a real-world linear regression problem: developing a research question, completing a statistical analysis, and summarizing the results of that analysis. I act as the analyst hired by the D. M. Pan National Real Estate Company to design a model that predicts housing prices for homes sold in 2019. The firm's CEO will use the information to help the company's real estate agents judge how well square footage works as a benchmark for home listing prices. The question the research tries to answer is:

Is square footage a significant predictor of house listing price?

The hypothesis that will be tested is:

The square footage of houses is a significant predictor of their listing price – the smaller

the square footage, the lower the listing price, and the larger the square footage, the higher the

listing price.

Linear regression will be used to test the above hypothesis. According to Jan and Shieh (2019), linear regression is used for several purposes, the most important being to determine the association between two variables and to estimate the value of the dependent variable at a specific value of the independent variable. Given the hypothesis, I would expect the scatterplot to rise from bottom left to top right, showing a positive correlation. The variables of interest in a study, which are measured or observed, are called dependent (response) variables. The variables that affect the response and can be set or measured by the experimenter are called predictor (independent) variables (Bartlett et al., 2020).

Data Collection

Square footage is the predictor variable, while the listing price is the response variable. A random sample of 50 data points was drawn for the analysis using the Sampling function in the Data Analysis ToolPak in Excel. The scatterplot for the two variables is shown in Figure 1 below.

[Chart: scatterplot of listing price (vertical axis, 0 to 450,000) against square feet (horizontal axis, 1,000 to 2,600)]

Figure 1: Scatterplot; listing price vs. square feet

A close look at the graph shows an increasing trend that appears roughly linear, which indicates that a linear regression model is appropriate for the data. The trend of the data points shows a positive association between square footage and listing price.
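The sampling step described in this section can be sketched in Python. This is only an illustration under stated assumptions: the size of the full data set is not given in the report, so `population_rows` is a hypothetical stand-in of 1,000 row numbers, and the function name `draw_sample` and the seed are made up for the example. The sketch draws without replacement, which may differ from how the Excel tool behaves.

```python
import random


def draw_sample(population_rows, sample_size=50, seed=2019):
    """Return sample_size rows chosen at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(population_rows, sample_size)


# Hypothetical stand-in for the full data set: row numbers 1..1000.
# The real population size is not stated in the report.
population_rows = list(range(1, 1001))

sample_rows = draw_sample(population_rows)
print(len(sample_rows))
```

Fixing the seed makes the sample reproducible, so the same 50 rows can be recovered if the analysis has to be rerun.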

Data Analysis

The histograms for the variables are included in Figures 2 and 3 below.

Figure 2: Square Footage

Figure 3: Listing price

Generally, the shape of the histograms shows that the variables come from an

approximately normally distributed population. The summary statistics are included below:

Statistic             Square feet      Listing price
Mean                  1887.6           276766
Standard Error        44.861           8336.954
Median                1833.5           287250
Mode                  #N/A             #N/A
Standard Deviation    317.2133126      58951.16
Sample Variance       100624.2857      3.48E+09
Kurtosis              -0.571831114     -0.2076
Skewness              -0.09663486      -0.45928
Range                 1252             250700
Minimum               1224             145100
Maximum               2476             395800
Sum                   94380            13838300
Count                 50               50

The mean square footage and listing price are 1,887.6 sq ft and $276,766, respectively. The standard deviations of square footage and listing price are 317.21 sq ft and $58,951.16, respectively. The large sample variances show that the data points of both variables are widely spread around their means. The histograms also indicate that the variables come from an approximately normally distributed population. There are no outliers.
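The summary statistics can be cross-checked against one another. The sketch below uses only values reported in the summary table (sum, count, and standard deviation) and two standard identities: the mean is the sum divided by the count, and the standard error of the mean is the standard deviation divided by the square root of the count. The variable names are chosen for this illustration.

```python
import math

# Values copied from the summary statistics table.
n = 50
sum_sqft = 94380
sum_price = 13838300
sd_sqft = 317.2133126
sd_price = 58951.16

# Mean = sum / count
mean_sqft = sum_sqft / n
mean_price = sum_price / n

# Standard error of the mean = standard deviation / sqrt(count)
se_sqft = sd_sqft / math.sqrt(n)
se_price = sd_price / math.sqrt(n)

print(f"mean sq ft = {mean_sqft:.1f}, standard error = {se_sqft:.3f}")
print(f"mean price = {mean_price:.0f}, standard error = {se_price:.3f}")
```

The recomputed means and standard errors agree with the table to within rounding.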

The national mean square footage and listing price are 2,111 sq ft and $342,365, with standard deviations of 921 sq ft and $125,914, respectively. The histogram for the national population is positively skewed, since it has a long right tail. The national means and standard deviations are higher than those of the selected sample. The selected sample is taken to represent national housing market sales; in most cases, a sample of at least 30 points is considered representative of the population.

The Regression Model

The scatterplot is shown in Figure 4 below.



[Chart: scatterplot of listing price (vertical axis, 0 to 450,000) against square feet (horizontal axis, 1,000 to 2,600), with trend line f(x) = 110.087255277766x + 68965.2969376888 and R² = 0.35090706803066]

Figure 4: The scatterplot

A regression model can be developed for the variables because the data points show a linear trend. The association between square footage and listing price is positive, since the line of best fit has a positive slope. Because the data points are scattered fairly widely around the line, the association is not very strong; it is medium (Kumari & Yadav, 2018).

The value of R can be obtained from that of R². The value of R² is 0.3509, as shown in the scatterplot, and R is its square root. Because the slope of the line of best fit is positive, R takes the positive root:

R = √0.3509 = 0.5924

This implies that the strength of the association between the variables is medium.
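The same calculation can be sketched in Python. The inputs are the rounded R² and slope from the trend line; taking the sign of R from the slope is the standard convention for simple linear regression, and the variable names are chosen for this illustration.

```python
import math

r_squared = 0.3509  # R² from the trend line, rounded to four places
slope = 110.09      # slope of the trend line, rounded

# In simple linear regression, R is the square root of R² and
# carries the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r, 4))  # 0.5924
```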

The Line of Best Fit

The regression equation, as shown in the graph, is:

y = 68,965 + 110.09x

where y is the listing price in dollars and x is the square footage.

There is a positive relationship between square footage and listing price. The slope is 110.09, while the constant (intercept) is 68,965. Each additional square foot is associated with an increase of about $110.09 in the predicted listing price, with other factors held constant. The intercept ($68,965) is the predicted listing price when the square footage is zero; since no house in the sample is smaller than 1,224 sq ft, it has no practical interpretation on its own.

According to Schmidt and Finan (2018), a regression equation quantifies the direction and strength of the association between two numerical variables (here, square footage and listing price). The value of R² quantifies the strength of the association: it is the proportion of the variation in y that is explained by x. The value of R² is 0.3509, which implies that square footage explains 35.09% of the variation in the listing price of the houses.
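For readers who want to see where a slope, an intercept, and an R² come from, the ordinary least-squares formulas can be sketched as below. The report's 50 sampled rows are not reproduced in the text, so the four data points here are made-up toy values used only to exercise the formulas; they are not the report's data and do not reproduce its coefficients. The function name `fit_line` is also chosen for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return slope, intercept, r_squared


# Toy values for illustration only (NOT the report's sample).
toy_sqft = [1000, 1500, 2000, 2500]
toy_price = [200000, 230000, 300000, 310000]

slope, intercept, r_squared = fit_line(toy_sqft, toy_price)
print(slope, intercept, round(r_squared, 4))
```

Applied to the actual 50 sampled rows, the same formulas would be expected to give the slope, intercept, and R² shown on the chart.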

The regression model can be used to estimate the listing price for a given square footage. I will use the equation to predict the price at which I should list my house, given that its square footage is 1,930 sq ft:

y = 68,965 + 110.09x, with x = 1,930

y = 68,965 + 110.09(1,930)

y = 68,965 + 212,473.70

y = $281,438.70

Therefore, given its square footage, I can list my house at about $281,438.70.
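The prediction can also be written as a small Python function. The coefficients are the rounded intercept and slope from the regression equation; the function and constant names are chosen for this illustration.

```python
INTERCEPT = 68965.0  # dollars, rounded intercept from the regression equation
SLOPE = 110.09       # dollars per square foot, rounded slope


def predict_listing_price(square_feet):
    """Predicted listing price in dollars for a given square footage."""
    return INTERCEPT + SLOPE * square_feet


price = predict_listing_price(1930)
print(f"${price:,.2f}")  # $281,438.70
```

A house of 1,930 sq ft lies inside the sampled range of 1,224 to 2,476 sq ft, so this prediction does not extrapolate beyond the data used to fit the line.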

Conclusions

When operating, businesses accumulate large amounts of data, such as sales figures, client information, and profits. Insights are often needed so that the collected data can be used to improve business decisions. Linear regression is a statistical method that businesses use to draw insight from their data and improve their decisions (Maulud & Abdulazeez, 2020). In this assignment, linear regression was conducted to determine whether square footage is an important predictor of house listing prices. The two variables have a positive linear relationship: as the square footage of a house increases, the listing price also increases, and as square footage decreases, the listing price decreases. The strength of the association is medium. The hypothesis that square footage is a significant predictor of listing price is supported. Square footage partly determines the prices of the houses in the studied market, and the business can use the regression equation to estimate listing prices. However, the business needs to note that factors other than square footage also affect listing prices, since square footage explains only 35.09% of the variation in listing prices. While the business should use square footage to estimate listing prices, it should also consider factors that the model does not capture, such as house location, accessibility to towns and roads, furnishings, and house design.

References

Bartlett, P. L., Long, P. M., Lugosi, G., & Tsigler, A. (2020). Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48), 30063-30070.

Jan, S. L., & Shieh, G. (2019). Sample size calculations for model validation in linear regression analysis. BMC Medical Research Methodology, 19(1), 1-9.

Kumari, K., & Yadav, S. (2018). Linear regression analysis study. Journal of the Practice of Cardiovascular Sciences, 4(1), 33.

Maulud, D., & Abdulazeez, A. M. (2020). A review on linear regression comprehensive in machine learning. Journal of Applied Science and Technology Trends, 1(4), 140-147.

Schmidt, A. F., & Finan, C. (2018). Linear regression and the normality assumption. Journal of Clinical Epidemiology, 98, 146-151.
