(DATA SCIENCE)
Submitted for
LOVELY PROFESSIONAL UNIVERSITY
From 06/05/23
to 06/28/23
SUBMITTED
BY
NAME: Datla EswarsaiKrishnamraju
Registration Number: 12104806
Student Declaration
original work for the partial fulfillment of the requirements for the
Engineering.
Abstract
Airbnb listing prices are more than meets the eye: they closely reflect
local real estate trends and also directly influence real estate prices. Across the
many studies conducted on Airbnb's effect on regional real estate trajectories,
the positive correlation between the number of listings and rental and home
prices is well established; one study reports a 0.018% increase in rental prices
and a 0.026% increase in home prices for every 1% increase in total listings.
With a narrowed focus on Airbnb in Boston, MA, the aim of this analysis is
threefold. First, I explore surface-level insights regarding the quantity of listings, trends
in pricing, and guest traffic in the various neighbourhoods across the city. Second, I
validate the relationship between rental and housing prices and the
number of listings within the timeframe of the Airbnb dataset used. Third,
I use a predictive model to develop a better understanding of the
listing features that contribute most to listing price.
Introduction
Airbnb has become the first choice for many travellers, not only when looking for
accommodation across the world, but also when exploring a city before getting there.
People want a “home away from home” when thinking about staying overnight at their
destination. Airbnb has therefore created a unique and inspiring blue ocean in the market:
“Blue Ocean strategy is the simultaneous pursuit of differentiation and low cost to open up a
new market space and create new demand.” In blue oceans, demand is created rather than
fought over. Airbnb provides a win-win situation for both customers and hosts:
customers can get accommodation at lower prices, and hosts can make money by renting out their
properties.
To look further at how Airbnb is used in Boston, I have analysed Boston Airbnb
listings. The datasets used in this analysis were acquired from the Inside Airbnb database under
a Creative Commons CC0 1.0 Universal “Public Domain Dedication” license. The dataset
reports the listing activity of homestays in Boston. It incorporates over 6150
property listings, including but not limited to host information, prices, neighbourhoods, and amenities.
The analysis and its findings are only observational and not the result of a formal study.
General business questions guide the analysis, whose goal is a
model that can predict the rental price based on some features.
Related Work
As the Airbnb platform has become popular over the last few years, several papers have
already addressed the Airbnb price prediction task, benefiting from the publicly available
datasets on InsideAirbnb.com. In this section, literature dealing with Airbnb in general is
presented, followed by a detailed summary of the existing literature on price prediction and
Airbnb price determinants.
Although multiple projects have been carried out on predicting listing prices, none of
them has been performed across different cities. In this work, we focus on the following three
tasks. First, we would like to explore different features via feature extraction and engineering.
Second, we would like to experiment and compare different machine learning techniques in
price prediction. Finally, we want to train a more generalized model and perform transfer
learning.
Dataset
The Boston Airbnb dataset was retrieved from Kaggle and has shape (3585, 95), that is,
3585 rows and 95 columns. It is a
collection of property listings with their key features, such as property type, host type,
neighbourhood, reviews and much more.
Features
We further investigate how the features above compare with the price
column and identify relationships and inferences from the results. The dataset
was divided into two parts: one subset with the room_id, longitude, and latitude of each
room, used to derive new characteristics for each listing, and the
rest of the dataset, which is merged back at the end of that process.
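The split-and-merge step described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the analysis. Only `room_id`, `latitude`, and `longitude` come from the text; the sample rows, the helper names, the reference point `CITY_CENTER`, and the derived feature `dist_to_center_km` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical reference point (assumption): approximate coordinates of central Boston.
CITY_CENTER = (42.3601, -71.0589)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def split_listings(rows):
    """Split rows into a geo subset and the remaining columns, both keyed by room_id."""
    geo = {r["room_id"]: (r["latitude"], r["longitude"]) for r in rows}
    rest = {
        r["room_id"]: {k: v for k, v in r.items() if k not in ("latitude", "longitude")}
        for r in rows
    }
    return geo, rest


def derive_features(geo):
    """Compute a new location-based characteristic for each listing."""
    return {
        rid: {"dist_to_center_km": haversine_km(lat, lon, *CITY_CENTER)}
        for rid, (lat, lon) in geo.items()
    }


def merge_parts(rest, derived):
    """Merge the derived characteristics back onto the rest of the dataset."""
    return [{**rest[rid], **derived[rid]} for rid in rest]


# Hypothetical sample rows, for illustration only.
rows = [
    {"room_id": 1, "latitude": 42.3601, "longitude": -71.0589, "price": 150.0},
    {"room_id": 2, "latitude": 42.3467, "longitude": -71.0972, "price": 95.0},
]
geo, rest = split_listings(rows)
merged = merge_parts(rest, derive_features(geo))
```

Keying both parts by `room_id` keeps the merge a simple lookup and makes it easy to check that no listing is lost or duplicated along the way.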
• Label: The ground-truth label is the listing price. Because the datasets contain abnormally high
prices, we used two approaches, data thresholding and label transformation, to
alleviate this problem. For data thresholding, we cut off data with a price over 500 dollars per
night, which eliminates approximately 1% of the total listings. For label transformation, we
tested different power transformations as well as a logarithmic transformation, and found
that the square root and logarithmic transformations work well for price
prediction. Figure 1 compares the performance of a regression model with and without label
transformation.
• Categorical features: For most of the categorical features, we directly performed one-hot
encoding, while for a small fraction of list features, like amenities and host verifications, we
encoded them into vectors via dictionary building and mapping. Altogether, we obtained 20
encoded features.
• Text features: To utilize the text features, such as summary, transit, and neighbourhood
overview, we computed TF-IDF on unigrams and bigrams. We then performed truncated singular
value decomposition (SVD) to reduce the dimension of each text feature to 50, which turns
the 12 text features into a 600-dimensional vector.
• Date features: We have 3 date features (host since, first review, last review), and we
converted them into continuous values by filling the null value with the mean date, and
subtracting the earliest date value from all date values.
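Three of the steps in the list above (price thresholding with a log transform, dictionary-based encoding of list features, and date conversion) can be sketched in plain Python. This is an illustration under stated assumptions, not the original implementation: the function names and sample values are hypothetical, `log1p` is assumed as the logarithmic transform, and dates are assumed to be converted to days since the earliest date.

```python
import math
from datetime import date

PRICE_CAP = 500.0  # threshold stated in the text (dollars per night)


def threshold_and_log(prices, cap=PRICE_CAP):
    """Drop prices above the cap, then log-transform the rest (log1p is an assumption)."""
    kept = [p for p in prices if p <= cap]
    return kept, [math.log1p(p) for p in kept]


def build_vocab(list_values):
    """Dictionary building: map each distinct item (e.g. an amenity) to a column index."""
    items = sorted({item for values in list_values for item in values})
    return {item: index for index, item in enumerate(items)}


def multi_hot(values, vocab):
    """Mapping: encode one listing's items as a 0/1 vector over the vocabulary."""
    vector = [0] * len(vocab)
    for item in values:
        if item in vocab:
            vector[vocab[item]] = 1
    return vector


def dates_to_days(dates):
    """Fill missing dates with the mean date, then count days since the earliest date."""
    known = [d.toordinal() for d in dates if d is not None]
    mean_ordinal = round(sum(known) / len(known))
    filled = [d.toordinal() if d is not None else mean_ordinal for d in dates]
    earliest = min(filled)
    return [ordinal - earliest for ordinal in filled]


# Hypothetical sample values, for illustration only.
prices = [80.0, 150.0, 499.0, 1200.0]
kept, log_prices = threshold_and_log(prices)

amenities = [["Wifi", "Kitchen"], ["Wifi"], ["Heating", "Wifi"]]
vocab = build_vocab(amenities)
encoded = [multi_hot(values, vocab) for values in amenities]

host_since = [date(2015, 1, 1), None, date(2015, 1, 11)]
days = dates_to_days(host_since)
```

Predictions made on the log scale can be mapped back to dollars with `math.expm1`, the inverse of `math.log1p`.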
Code
Entire home/apt    1825
Private room       1353
Shared room          76
Name: room_type, dtype: int64

Apartment          2325
House               547
Condominium         220
Townhouse            50
Bed & Breakfast      39
Loft                 32
Other                14
Boat                 12
Villa                 6
Entire Floor          4
Dorm                  2
Guesthouse            1
Name: property_type, dtype: int64

room_type
Entire home/apt    239.097039
Private room        96.356509
Shared room         81.065789
Name: price, dtype: float64
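The output above has the form of pandas `value_counts()` and `groupby(...).mean()` results, but the code that produced it is not preserved here. As a hedged sketch of the same two aggregations (listings per room type and mean price per room type), the following uses only the Python standard library on hypothetical sample rows; the numbers are illustrative and do not reproduce the figures above.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows, for illustration only.
listings = [
    {"room_type": "Entire home/apt", "price": 250.0},
    {"room_type": "Entire home/apt", "price": 200.0},
    {"room_type": "Private room", "price": 90.0},
    {"room_type": "Private room", "price": 110.0},
    {"room_type": "Shared room", "price": 60.0},
]

# Count listings per room type (analogous to a value count on the column).
room_type_counts = Counter(row["room_type"] for row in listings)

# Group prices by room type, then average each group.
prices_by_type = defaultdict(list)
for row in listings:
    prices_by_type[row["room_type"]].append(row["price"])
mean_price = {
    room_type: sum(values) / len(values)
    for room_type, values in prices_by_type.items()
}

# Print both summaries, most common room type first.
for room_type, count in room_type_counts.most_common():
    print(f"{room_type:<18}{count:>5}{mean_price[room_type]:>12.2f}")
```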
Conclusion
In this analysis, we conducted an exploratory data analysis (EDA) and built
predictive models to understand and predict Airbnb listing prices in Boston. By
analyzing a comprehensive dataset comprising property details, host information,
location, amenities, reviews, and pricing, we gained valuable insights into the factors
influencing listing prices and developed models to estimate prices for new listings.
During the EDA phase, we observed several trends and patterns specific to the
Boston Airbnb market. Factors such as location proximity to popular landmarks, the
number of bedrooms, and the presence of specific amenities emerged as significant
determinants of listing prices. Additionally, we identified seasonal variations in pricing
and observed higher demand during peak tourist seasons.
The insights derived from this analysis can assist hosts in optimizing their pricing
strategies and maximizing rental income. Hosts can leverage the developed models
to set competitive prices based on property characteristics, location, and market
demand. For guests, the models offer a valuable tool to estimate listing prices,
evaluate affordability, and plan their accommodation budgets accordingly.
It is worth noting that the accuracy and generalizability of the models are subject to
the quality and relevance of the data. Additionally, external factors such as changes
in the economy, local events, or regulatory policies may influence listing prices
beyond the captured features. Therefore, continuous monitoring, retraining, and
updating of the models with fresh data are recommended to ensure their
performance remains robust over time.
In conclusion, this analysis provides valuable insights into the Boston Airbnb market,
offering both hosts and guests a data-driven approach to understanding and
predicting listing prices. The combination of exploratory data analysis and predictive
modeling in R programming enables stakeholders to make informed decisions,
optimize pricing strategies, and enhance the overall Airbnb experience in Boston.
References
Kaggle. Airbnb price prediction.
Boston Airbnb analysis.
Airbnb. (2020a). About Us. Retrieved 2020-09-27, from https://news.airbnb.com/about-us/
Airbnb. (2020b). How should I choose my listing's price? Retrieved 2020-09-30, from
https://www.airbnb.com/help/article/52/how-should-i-choose-my-listings-price