Predicting the prices for breakfasts and beds
This work is dedicated to my niece Nguyen Le Tue An (Mochi), who has
brought a great source of joy to me and my family recently.
Abstract
Setting and predicting the right prices is vital for both hosts and renters on internet-based
home-sharing platforms. To contribute to the growing interest and immense literature on
applying Artificial Intelligence to predicting rental prices, this paper attempts to build
machine learning models for that purpose using the Luxstay listings in Hanoi. The R2 score is
used as the main criterion for model performance, and the results show that Extreme Gradient
Boosting (XGB) is the model with the best performance with R2 = 0.62, beating the most
sophisticated machine learning model considered, the Neural Network.
Contents

Abstract
1 Introduction
2 Literature Review
3 Experimental Design
  3.1 Dataset
  3.2 K-Fold Cross Validation
  3.3 Measuring Model Accuracy
4 Methods
  4.1 LASSO
    4.1.1 FISTA
  4.2 Random Forest
  4.3 Gradient Boosting
  4.4 Extreme Gradient Boosting
  4.5 LightGBM
    4.5.1 Gradient-based One-sided Sampling
    4.5.2 Exclusive Feature Bundling
  4.6 Neural Networks
    4.6.1 Adam Algorithm
    4.6.2 Backpropagation
References
Chapter 1
Introduction
Since its establishment in 2016, Luxstay has become one of the most popular home-sharing
platforms in Vietnam, alongside Airbnb, with a network of more than 15,000 listings. The
platform connects guests who want to rent villas, houses, apartments and other accommodation
with hosts, and vice versa. Hence, a reasonable price helps hosts earn a high and stable
income while guests get great experiences in new places. Therefore, working on a sensible
predictor and suggestion of Luxstay prices can generate real-life value and practical
application.
Hanoi is the capital of Vietnam and has the second most listings on Luxstay.
The city has also been ranked among the top 10 destinations to visit by TripAdvisor. As a
dynamic city with active bookings and listings, Hanoi can be a great example for
the study of Luxstay Pricing.
In this paper, we build a price prediction model and compare the performance
of different methods using R2 as the main measure. The input of our models is
the data scraped from the Hanoi page of the website, which includes continuous and
categorical records about listings. We then apply a number of methods, including traditional
Machine Learning models (LASSO, random forest, gradient boosting), Extreme
Gradient Boosting, LightGBM and a neural network, to predict the prices of listings.
Chapter 2
Literature Review
There is a growing body of work on rental price prediction on the leading home-sharing platforms. The two trends for this
topic are hedonic-based regression and artificial intelligence techniques.
The term hedonic is used to describe "the weighting of the relative importance
of various components among others in constructing an index of usefulness
and desirability" (Goodman 1998). In other words, hedonic pricing identifies the
factors and characteristics affecting an item's price (Investopedia.com). Wang &
Nicolau (2017) aimed to design a system to understand which features are important
inputs for an automated price suggestion on Airbnb using a hedonic-based
regression approach. The functional forms used were Ordinary Least Squares and
Quantile Regression, applied to 25 variables of 180,533 listings in 33 cities. The
result shows that features related to host attributes, such as the number of their
listings and their profile pictures, are the most important features. Among those,
superhost status, which reveals experienced hosts on the platform, is the best one.
However, the authors also discussed the limitations of this analysis. The approach
relies on some economic assumptions that need to be examined. The assumption of
hosts' rationality requires a qualitative check which is skipped in the study. Generally,
the effectiveness of hedonic-based regression for price prediction is restricted
by the model assumptions and estimation (Selim 2009).
Another approach to price prediction is to apply artificial intelligence techniques,
which mainly include machine learning and neural network models. Tang
& Sangani (2015) produced a model for price prediction for San Francisco listings.
To reduce the complexity of the task, they turned the regression problem
into a classification task that predicts both the neighbourhood and the price range of
a listing, and a Support Vector Machine was the main model to be tuned. Uniquely,
they included images as inputs for the model by creating a visual dictionary to
categorise the images of a listing. The result shows that while the price prediction
achieves a high accuracy of 81.2% on the test set, the neighbourhood prediction
suffers from overfitting, with a big gap between the train and test sets. Alternatively,
Cai & Han (2019) attempted to work on the regression problem using the
listings in Melbourne. The study implemented l1 regularisation as feature selection
for all traditional machine learning methods and then compared them to models
without it. The result shows that the latter perform better overall and the gradient
boosting algorithm produces the best precision with R2 = 0.6914 in the test set.
Recently, another study of the listings in New York achieved an interesting result,
with the highest R2 of 0.7768 (Kalehbasti et al. 2019). To obtain that score, they
performed a logarithmic transformation of the prices and then trained their models.
Additionally, they also compared three feature selection methods,
namely manual selection, p-values and LASSO. The analysis shows that p-values
and LASSO outperformed manual selection, and the best method applied in
the paper is LASSO.
In this paper, we applied the knowledge of the last three studies to build our
price predictor for the listings on Luxstay. Apart from widely used traditional
machine learning methods and neural networks, we also attempted to code an
algorithm to compute LASSO regression ourselves and used two recent gradient
boosting techniques, Extreme Gradient Boosting and LightGBM. The project
worked on the original rental prices to produce a price prediction without any
logarithmic transformation.
Chapter 3
Experimental Design
3.1 Dataset
The features of each listing include its district, type of home, the name of its building,
and the numbers of guests allowed, bedrooms and bathrooms.
In order to make the dataset usable as input for machine learning
models, we went through a few pre-processing steps. Firstly, we dropped features that
are not related to the prices, such as the listing id, listing name and listing link.
Secondly, we used dummy variable encoding to deal with categorical
features, which some machine learning algorithms cannot work with directly. A
categorical variable is a variable that assigns an observation to a specific group or
nominal category on the basis of some qualitative property (Yates et al. 2003). A
dummy variable is a binary variable that stores values of 0 and 1, where the former
represents the absence of a category and the latter its presence (James
H. Stock 2020, p. 186). The number of dummy variables depends on the number
of distinct categories: a feature with K categories requires K-1 dummy variables
so that the data matrix remains invertible, avoiding
the dummy variable trap (James H. Stock 2020, p. 230).
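As an illustration of this encoding step, the snippet below is a minimal sketch using pandas; the column names and values are hypothetical stand-ins for our scraped features, not the exact names in our dataset.

```python
import pandas as pd

# Hypothetical slice of the scraped listings; column names are illustrative only.
listings = pd.DataFrame({
    "district": ["Ba Dinh", "Hoan Kiem", "Ba Dinh"],
    "home_type": ["Apartment", "Villa", "Studio"],
    "guests": [2, 6, 2],
    "price": [35.0, 120.0, 28.0],
})

# drop_first=True keeps K-1 dummies per categorical feature,
# which avoids the dummy variable trap described above.
encoded = pd.get_dummies(listings, columns=["district", "home_type"], drop_first=True)
print(encoded.columns.tolist())
```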
We ended up with 78 explanatory features. As we have a limited number of
listings in our dataset, we attempted to mitigate this problem by using K-Fold Cross
Validation for model selection, since this method is considered useful when
the number of records is low (Bishop 2006, p. 32).
Figure 3.2: The technique of K-Fold Cross Validation with K=4 (Bishop 2006, p. 33)
The method involves splitting the dataset into K different groups. K-1
groups are then used to train a specific model, which is evaluated on the remaining
group. This step is repeated K times until each of the K groups has been used as the
test set once. Finally, the performance score of a model, which is discussed in the
section below, is the average of the scores from the K runs.
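A minimal sketch of this procedure with scikit-learn is shown below; the synthetic data is a placeholder for our encoded listings matrix and price vector.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the encoded feature matrix and the prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 50 + 30 * X[:, 0] + rng.normal(scale=5, size=200)

# Each run trains on K-1 folds and evaluates on the held-out fold;
# the model's score is the average over the K runs.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=kfold, scoring="r2")
print(scores.mean())
```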
A major drawback of this technique is that it is computationally expensive, as
a model has to be trained and tested K times. This issue is critical in our case
because some machine learning algorithms have a large number of hyper-parameters
whose combinations need to be tested. For instance, there
are more than 10 hyper-parameters to be tuned in Extreme Gradient Boosting,
which makes it infeasible to use K-Fold Cross Validation for all of the combinations.
Therefore, we only tuned the parameters supposed to have a vital impact on
model performance while leaving the others at the default values set by their packages.
3.3 Measuring Model Accuracy

A common measure of accuracy for regression models is the mean squared error (MSE), given by

MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{f}(x_i) \right)^2 \quad (3.1)
However, the MSE is measured in the squared units of the target, so there is no obvious
benchmark for how small we should expect it to be in reality. Therefore, by choosing the
smallest MSE among our models, we do not know whether that model can become a practical
tool for price suggestion. This is where the R2 statistic comes in as an alternative measure.
The R2 statistic shows the fraction of variance in the target that can be predicted
using the features (James H. Stock 2020, p. 153). This metric always takes
on a value between 0 and 1, where an R2 near 0 indicates a model with poor
accuracy while an R2 close to 1 indicates a model that is good at predicting the target.
The formula of this metric is given by
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (3.2)

where \bar{y} is the mean of the target that we try to predict. Additionally, dividing both the
numerator and the denominator by n, the formula can be rewritten as

R^2 = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2}{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\text{MSE of a model}}{\text{MSE of the mean of the data}} \quad (3.3)
As the MSE of a model shrinks toward 0, its R2 grows toward 1. Therefore, we
can interpret the R2 as a rescaling of the MSE. This is the reason we chose
the R2 as the main metric for model selection, as its scale is more intuitive to describe.
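To make the relation in (3.3) concrete, the short sketch below computes R2 directly from the two MSE terms; the numbers are made up for illustration.

```python
import numpy as np

def r2_from_mse(y_true, y_pred):
    """R2 computed as 1 - MSE(model) / MSE(mean predictor), as in equation (3.3)."""
    mse_model = np.mean((y_true - y_pred) ** 2)
    mse_mean = np.mean((y_true - y_true.mean()) ** 2)
    return 1.0 - mse_model / mse_mean

y_true = np.array([30.0, 55.0, 80.0, 120.0])   # illustrative prices
y_pred = np.array([35.0, 50.0, 85.0, 110.0])   # illustrative predictions
print(r2_from_mse(y_true, y_pred))             # agrees with sklearn.metrics.r2_score
```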
Chapter 4
Methods
4.1 LASSO
Least absolute shrinkage and selection operator (LASSO) (Tibshirani 1996) is a
regression analysis technique that was introduced to improve prediction accuracy
and to perform feature selection for regression models. LASSO seeks
the solution of the following problem:

\min_{w} \; \|Xw - Y\|_2^2 + \lambda \|w\|_1 \quad (4.1)
4.1.1 FISTA
Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) is an iterative algorithm
based on proximal operators for solving non-differentiable convex
optimisation problems. In particular, the general optimisation problem is

\min_{w} \; F(w) = f(w) + g(w) \quad (4.2)

where f is a smooth convex function whose gradient is Lipschitz continuous with constant L(f), and g is a convex but possibly non-smooth function.
FISTA can be used for many problems of the form 4.2, and LASSO is among the best
known. Hence, we can apply this algorithm to the following LASSO loss function:

\text{Loss} = \min_{w} \; \frac{1}{2}\|Xw - Y\|_2^2 + \lambda \|w\|_1

This is a slightly modified version of 4.1, where we add a factor of 1/2 to the first
term for mathematical convenience. We then put the loss into the form of 4.2 by taking
f(w) = \frac{1}{2}\|Xw - Y\|_2^2 and g(w) = \lambda\|w\|_1.
For this problem, our job is to find two quantities, \nabla f(w) and L(f), in order to run
the algorithm. Firstly, we compute the gradient.
Applying the chain rule (B), we get the partial derivative with respect to the weights:

\nabla f(w) = X^T (Xw - Y)

We now find the Lipschitz constant through \|\nabla f(a) - \nabla f(b)\|. Expanding the argument gives

\|X^T (Xa - Y) - X^T (Xb - Y)\|

Factorising the common term X^T X gives \|X^T X (a - b)\|. Applying the norm
inequality \|A(a - b)\| \le \|A\| \, \|a - b\| (Benning 2019), we obtain the Lipschitz constant
L(f) = \|X^T X\|.
FISTA is a refined version of the Iterative Shrinkage-Thresholding Algorithm (ISTA);
both methods seek the solution of the following proximal problem (Beck &
Teboulle 2009):

p_L(v) = \arg\min_{w} \; g(w) + \frac{L}{2} \left\| w - \left( v - \frac{1}{L}\nabla f(v) \right) \right\|^2

For our problem, we substitute g(w) = \lambda\|w\|_1 and set z = v - \frac{1}{L}\nabla f(v). Then
it becomes

p_L(z) = \arg\min_{w} \; \lambda\|w\|_1 + \frac{L}{2}\|w - z\|^2
With some calculus, we obtain the soft-thresholding operator

S_{\lambda/L}(z) =
\begin{cases}
z - \lambda/L, & z > \lambda/L \\
0, & |z| \le \lambda/L \\
z + \lambda/L, & z < -\lambda/L
\end{cases} \quad (4.3)
As a consequence, we have all the ingredients needed to run FISTA. Since the Lipschitz
constant is easy to compute in this case, we follow the algorithm with constant step
size from the original paper (Beck & Teboulle 2009). The algorithm proceeds
as below.
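A minimal NumPy sketch of FISTA with constant step size for the LASSO loss above is given below; it is our own illustration of the procedure, with convergence checks omitted and variable names chosen freely.

```python
import numpy as np

def soft_threshold(z, thr):
    # Elementwise soft-thresholding operator S_thr(z) from equation (4.3).
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def fista_lasso(X, Y, lam, n_iter=500):
    """Minimise 0.5*||Xw - Y||^2 + lam*||w||_1 with constant step size 1/L."""
    L = np.linalg.norm(X.T @ X, 2)            # Lipschitz constant ||X^T X||
    w = np.zeros(X.shape[1])
    v, t = w.copy(), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ v - Y)              # gradient of f at the extrapolated point
        w_next = soft_threshold(v - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        v = w_next + ((t - 1.0) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w
```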
4.2 Random Forest

The Random Forest Regression in our study operates through the following
algorithm (Trevor Hastie 2009, p. 588):
For the construction of the underlying Decision Trees, we refer to the original paper for more details
(Breiman et al. 1984). Nonetheless, the Random Forest algorithm only considers a
limited number of features, selected at random and smaller than the total amount,
when deciding the candidate split at a node. This prevents the ensemble from
over-relying on an individual feature and makes a fairer use of all features,
making the model more robust.
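In our experiments the forest itself was fit with scikit-learn; a sketch with the hyper-parameter values we report in Chapter 5 (max features = 10, max depth = 50, 900 trees) is shown below, using synthetic data as a stand-in for the real listings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the 78 encoded features and the prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 78))
y = 60 + 40 * X[:, 0] + rng.normal(scale=10, size=300)

# Each tree is fit on a bootstrap sample and, at every split, only
# `max_features` randomly chosen features are considered as candidates.
forest = RandomForestRegressor(n_estimators=900, max_features=10,
                               max_depth=50, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))   # in-sample R2
```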
4.3 Gradient Boosting

Our Gradient Boosting regressor is built by the following procedure (Algorithm 3).
Starting from an initial model f_0(x), for m = 1, ..., M:

1. Compute the pseudo-residuals r_{im} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f = f_{m-1}} for i = 1, ..., n.

2. Fit a regression tree to the targets r_{im}, giving terminal regions R_{jm},
   where j = 1, ..., J_m.

3. For j = 1, ..., J_m compute

\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, f_{m-1}(x_i) + \gamma)

4. Update f_m(x) = f_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})

end for
return \hat{f}(x) = f_M(x)
Algorithm 3 is specified by the choice of the loss function L(y, f(x)). In this study,
our loss criterion is the least-squares loss L(y, f(x)) = \frac{1}{2}(y - f(x))^2. By
optimising this function, we find the first model, which is a single terminal-node tree
predicting the mean of the target y in the training set. Moreover, the negative gradient
of the loss function computed in each iteration is called the pseudo-residual r. The
succeeding trees are built following (Breiman et al. 1984). Thus, each tree corrects the
mistakes of the previous trees. The corrections are scaled by the learning rate \nu to
avoid the problem of high variance, increasing the robustness.
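A compact sketch of this loop for the least-squares loss is shown below; it uses scikit-learn decision trees as the base learners and is only an illustration of the procedure, not the tuned implementation used in our experiments.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_ls(X, y, n_trees=50, nu=0.2, max_depth=7):
    """Least-squares gradient boosting: each tree is fit to the pseudo-residuals."""
    f0 = y.mean()                        # initial model: a single-node tree, the mean of y
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residual = y - pred              # negative gradient of 0.5*(y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred = pred + nu * tree.predict(X)   # correction scaled by the learning rate
        trees.append(tree)
    return f0, trees

def gb_predict(f0, trees, X, nu=0.2):
    return f0 + nu * sum(t.predict(X) for t in trees)
```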
4.4 Extreme Gradient Boosting

Extreme Gradient Boosting (XGBoost) (Chen & Guestrin 2016) works with the following objective function at iteration t:

L^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t-1)} + w_j) + \Omega(w_j)

A second-order Taylor approximation of the loss gives

l(y_i, \hat{y}_i^{(t-1)} + w_j) \approx l(y_i, \hat{y}_i^{(t-1)}) + g_i w_j + \frac{1}{2} h_i w_j^2

where g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)}) and h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)}) are the first and second order
derivatives of the loss function. The notations g and h are inspired by the fact that
the first-order derivative of a function is often called the gradient and the second-order
derivative the Hessian.
Removing the constant l(y_i, \hat{y}_i^{(t-1)}) from the approximation, the objective function
becomes:

L^{(t)} = \sum_{i=1}^{n} \left( g_i w_j + \frac{1}{2} h_i w_j^2 \right) + \Omega(w_j)
The regularisation term is defined as

\Omega(w) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

Here \gamma is the pruning parameter and \lambda is the l2 regularisation parameter. Hence,
our objective function is given as:

L^{(t)} = \sum_{i=1}^{n} \left( g_i w_j + \frac{1}{2} h_i w_j^2 \right) + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
        = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T

L^{(t)} = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T \quad (4.4)

where G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i.
We then optimise equation 4.4 by using the first-order condition with respect to w_j:

w_j^* = -\frac{G_j}{H_j + \lambda}
Hence, the corresponding objective function is:

L^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \quad (4.5)
Equation 4.5 can be used as a metric to measure the quality of a tree structure
q. However, it is impossible to enumerate all possible tree structures q. Thus, a greedy
algorithm that starts from a single leaf and iteratively adds branches to the
tree is used instead. Starting from a single leaf node, we split it into two nodes
such that I_L and I_R are the instance sets of the left and right nodes after splitting.
The gain of such a split is

L_{split} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \quad (4.6)
Equation 4.6 can be decomposed into four parts: the score on the new
left leaf, the score on the new right leaf, the score on the original leaf, and the
regularisation put on the additional leaf. We can observe that if the bracket in
the equation is smaller than \gamma, we would not get a better result from adding that
branch. Thus, the branch is removed, and this is how the pruning technique in
tree-based methods works.
In order to find the best candidate split, we need an algorithm to perform
the search. In this work, we applied the so-called exact greedy algorithm, which is
defined as "a split finding algorithm [that] enumerates over all the possible splits on all
the features" (Chen & Guestrin 2016). Enumerating every possible split makes this
algorithm demanding in terms of memory and computations.
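As a small illustration of equations 4.5-4.6, the helper below computes the gain of a candidate split from the gradient and Hessian sums of the two children; it is a sketch with our own variable names, not code from the XGBoost library.

```python
def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of a split, following equation (4.6); a negative gain means pruning."""
    def leaf_score(G, H):
        return G ** 2 / (H + lam)            # structure score of one leaf, as in (4.5)
    G, H = g_left + g_right, h_left + h_right
    return 0.5 * (leaf_score(g_left, h_left)
                  + leaf_score(g_right, h_right)
                  - leaf_score(G, H)) - gamma

# Example: gradients/Hessians summed over the instances falling into each child.
print(split_gain(g_left=-12.0, h_left=8.0, g_right=9.0, h_right=6.0, lam=1.0, gamma=0.5))
```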
4.5 LightGBM
Figure 4.2: How LightGBM and other boosting algorithms work
LightGBM (Ke et al. 2017) is another variant of the Gradient Boosting method that
mainly focuses on speeding up training. Instead of growing trees
level-wise, LightGBM grows them leaf-wise, adding a new tree
leaf at each iteration. Figure 4.2 displays the implementations of LightGBM
and other Gradient Boosting techniques.
The computational efficiency of LightGBM comes from the fact that this
method combines two techniques, Gradient-based One-sided Sampling (GOSS)
and Exclusive Feature Bundling (EFB). These two techniques are described
in the two sections below.
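A short sketch of fitting LightGBM with the hyper-parameter values we report in Chapter 5 is given below, again on synthetic stand-in data; in leaf-wise growth the number of leaves is the main capacity control.

```python
import numpy as np
from lightgbm import LGBMRegressor

# Synthetic stand-in for the encoded listings and their prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 78))
y = 40 + 25 * X[:, 1] + rng.normal(scale=8, size=300)

# num_leaves bounds the leaf-wise growth; max_depth adds an extra safeguard.
model = LGBMRegressor(learning_rate=0.081, max_depth=8,
                      n_estimators=100, num_leaves=10)
model.fit(X, y)
print(model.score(X, y))   # in-sample R2
```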
4.6 Neural Networks

Neural Networks are biologically inspired by how humans process information
in their brains. Figure 4.3 displays the architecture of a Neural Network.
An Artificial Neural Network consists of nodes, also called neurons, and
activation functions. Each node has a collection of weights and a bias, which are
learned through the training process. An activation function produces
the output of a node from its input values. Every Neural Network includes
three kinds of layers: the input layer receives information from the records
of a dataset, the output layer produces the network's prediction, and hidden layers
connect the input and the output with one another. Mathematically, a model of
a Neural Network is defined as follows (Benning 2020, p. 28):
f_w(x) = \varphi_L(\varphi_{L-1}(\ldots \varphi_1(\varphi_0(x, w_1, b_1), w_2, b_2) \ldots, w_{L-1}, b_{L-1}), w_L, b_L) \quad (4.7)

\varphi(x, W, b) = W^T x + b \quad (4.8)
The weights and biases of the network are found by solving the optimisation problem

\arg\min_{w, b} \left\{ \frac{1}{s} \sum_{i=1}^{s} l_i(f_w(x_i), y_i) \right\} \quad (4.10)
where i \in \{1, ..., s\} and \{l_i\}_{i=1}^{s} is a family of loss functions. For the regression
problem of this study, we choose the least-squares loss l_i(f_w(x_i), y_i) = \frac{1}{2}(f_w(x_i) - y_i)^2,
so that the problem 4.10 becomes:

\arg\min_{w, b} \left\{ \frac{1}{2s} \sum_{i=1}^{s} (f_w(x_i) - y_i)^2 \right\}
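A minimal PyTorch sketch of such a network trained on this least-squares objective is shown below; the layer sizes, data and iteration count are placeholders rather than the exact architecture of our experiments.

```python
import torch
from torch import nn

# Illustrative fully connected network for the 78 encoded features.
model = nn.Sequential(
    nn.Linear(78, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

X = torch.randn(200, 78)            # stand-in for the encoded feature matrix
y = 60 + 50 * torch.randn(200, 1)   # stand-in for the prices

loss_fn = nn.MSELoss()              # least-squares objective (up to a constant factor)
optimiser = torch.optim.Adam(model.parameters())

for _ in range(500):                # full-batch training, as described in Chapter 5
    optimiser.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()                 # backpropagation computes the gradients
    optimiser.step()                # Adam updates the weights and biases
```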
4.6.1 Adam Algorithm

The Adam algorithm, following the original paper (Kingma & Ba 2014), is as follows:
Algorithm 8 Adam
Specify: f(\theta): stochastic objective function with parameters \theta
Specify: \alpha, \beta_1, \beta_2 \in [0, 1), \epsilon
Initialise: \theta_0, m_0 = 0, v_0 = 0
for t = 1, ..., T do
    compute g_t = \nabla_\theta f_t(\theta_{t-1})
    compute g_t^2 = g_t \odot g_t \quad (\odot: Hadamard product (A))
    compute m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
    compute v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
    compute \hat{m}_t = m_t / (1 - \beta_1^t)
    compute \hat{v}_t = v_t / (1 - \beta_2^t)
    compute \theta_t = \theta_{t-1} - \alpha \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)
end for
return \theta_T
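The following NumPy sketch implements the loop of Algorithm 8 for a generic gradient function; it is an illustration with the default hyper-parameter values from the Adam paper, not our training code.

```python
import numpy as np

def adam(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=1000):
    """Minimal Adam loop following Algorithm 8; `grad(theta)` returns the gradient."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g      # elementwise (Hadamard) square
        m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Tiny example: minimise f(theta) = ||theta - 3||^2, whose gradient is 2*(theta - 3).
print(adam(lambda th: 2.0 * (th - 3.0), np.zeros(2), alpha=0.1, n_steps=500))
```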
4.6.2 Backpropagation
Backward propagation of errors (backpropagation) is the practice of readjusting
the weights and biases of a Neural Network based on the error obtained in the
previous epoch of the training process. The term "backwards" means that the
algorithm goes through the network backwards: it computes the gradient of the
final layer first and that of the first layer last. The backpropagation algorithm
is given as (Benning 2020, p. 32):
Algorithm 9 Backpropagation
Specify: activation function \varphi, sample \{(x_i, y_i)\}_{i=1}^{s}, weight and bias dimensions and number of layers L
Iterate:
for i = 1, ..., s do
    for l = 1, ..., L do
        Forward pass: compute z_i^l = W_l^T x_i^{l-1} + b_l
        Forward pass: compute x_i^l = \varphi(z_i^l)
    end for
end for
for i = 1, ..., s do
    for l = L, ..., 1 do
        Backward pass: compute
        \delta_i^l =
        \begin{cases}
        \varphi'(z_i^l) \odot \frac{1}{s} \nabla_1 l_i(x_i^L, y_i), & l = L \\
        \varphi'(z_i^l) \odot W_{l+1} \delta_i^{l+1}, & l \in \{1, ..., L - 1\}
        \end{cases}
    end for
end for
Partial derivatives: compute
\frac{\partial L}{\partial b_j^l} = \delta_j^l, \quad j \in \{1, ..., n_l\}
\frac{\partial L}{\partial w_{jk}^l} = \delta_j^l x_k^{l-1}, \quad j \in \{1, ..., n_l\} \text{ and } k \in \{1, ..., n_{l-1}\}
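To make the forward and backward passes concrete, here is a hand-worked NumPy sketch for a single sample and one hidden layer; the shapes, activation and initial weights are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer, least-squares loss on a single sample; a sketch of Algorithm 9.
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), np.array([1.0])
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

# Forward pass: z_l = W_l^T x_{l-1} + b_l, then x_l = phi(z_l).
z1 = W1.T @ x + b1
x1 = sigmoid(z1)
z2 = W2.T @ x1 + b2
x2 = z2                                   # identity activation on the output layer

# Backward pass: delta at the output layer, then propagate through W_{l+1}.
delta2 = x2 - y                           # gradient of 0.5*(x2 - y)^2 w.r.t. z2
delta1 = sigmoid(z1) * (1 - sigmoid(z1)) * (W2 @ delta2)

# Partial derivatives: dL/db_l = delta_l, dL/dW_l = outer(x_{l-1}, delta_l).
grad_W2, grad_b2 = np.outer(x1, delta2), delta2
grad_W1, grad_b1 = np.outer(x, delta1), delta1
```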
Chapter 5

Experiments and Results

Figure 4.1 shows the results of our models. For the Random Forest and Gradient
Boosting algorithms, we used the Scikit-learn library (Pedregosa et al. 2011). In this
package, we used grid search, a method that tries every combination of a
selected list of parameter values, for hyper-parameter tuning with 5-fold cross validation
on the whole dataset. For LASSO, we chose the regularisation parameter λ = 8.
For Random Forest, we chose max features = 10, max depth = 50, 900 estimators
and left the other parameters at their default values¹. For Gradient Boosting, we chose
learning rate = 0.2, max depth = 7, max features = 8, 50 estimators and the defaults
otherwise². For Extreme Gradient Boosting, we used the XGBoost library (Chen & Guestrin
2016) and chose learning rate = 0.1, max depth = 4, ridge parameter = 0.01, 100
estimators and the defaults otherwise³. For LightGBM, we used the lightgbm library
(Ke et al. 2017) and chose learning rate = 0.081, max depth = 8, 100 estimators,
10 leaves and the defaults otherwise⁴. For the Neural Network, we used the PyTorch
library (Paszke et al. 2019) to build the network architecture. As we had a small
number of records in the dataset, we set the batch size to the size of the training
set and trained for 500 iterations. The Adam optimiser (Kingma & Ba 2014) was
chosen to optimise the model, with the parameters of this algorithm left at the default
values set in the package⁵.
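The tuning procedure described above can be sketched as follows with scikit-learn's GridSearchCV; the grid shown is a small hypothetical subset of the values we searched, run on synthetic stand-in data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the encoded listings and prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 78))
y = 50 + 30 * X[:, 2] + rng.normal(scale=10, size=300)

param_grid = {
    "learning_rate": [0.1, 0.2],
    "max_depth": [5, 7],
    "max_features": [8, 10],
    "n_estimators": [50, 100],
}

# Every combination in the grid is scored with 5-fold cross validation on R2.
search = GridSearchCV(GradientBoostingRegressor(random_state=0), param_grid,
                      cv=KFold(n_splits=5, shuffle=True, random_state=0),
                      scoring="r2")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```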
In this study, Extreme Gradient Boosting achieves the best performance among all
models, scoring R2 = 0.6252. As for LightGBM, this model is only better
than the baseline model, LASSO. This can be explained by our limited
number of records: the insufficient number of training examples may cause the
model to overfit, since leaf-wise tree growth can be sensitive to overfitting on
small data.
1. Random Forest Regressor
2. Gradient Boosting Regressor
3. XGBRegressor
4. LGBMRegressor
5. Adam Optimiser
Chapter 6

Conclusion and Outlook
This paper attempted to build an Artificial Intelligence tool for predicting the
prices of Luxstay listings based on scraped data with a limited set of features,
including the number of guests allowed, the numbers of bathrooms and bedrooms, the
type of home, and the name of the building and the district of a listing. The Machine
Learning models used are LASSO regression, Random Forest, Gradient Boosting,
Extreme Gradient Boosting and Neural Networks. The best model is chosen using
K-Fold cross validation with K = 5 and the results are assessed in terms of the Mean
Squared Error and the R2 statistic. Among the models trained and tested, Extreme
Gradient Boosting achieved the best performance with an R2 of 0.6252 and an MSE
of 1328.01 through the cross validation.
Nevertheless, we are aware that there are limitations that affect our study
negatively, and we believe the following three are the major ones. Firstly, the price
in the dataset is the sticker price (or advertised price). This is
the price advertised to potential guests rather than the actual price paid by
previous ones. Hence, some prices may not reflect the real situation, which might
introduce noise into our model performance (Lewis 2019). Secondly, we
skipped outlier analysis, which consists of techniques for treating data points that are
significantly different from other observations.
Figure 6.1: Predicted Values vs Actual Prices for Extreme Gradient Boosting
Figure 5.1 shows the actual prices against the predictions of Extreme Gradient
Boosting, where the red line displays a flawless prediction. From this figure we see
that as the actual price goes up, the precision of our best model decreases, especially
for the records above $500. Thus, we believe that applying some special treatment to
those outliers could improve our model performance remarkably. Thirdly, as we mentioned
several times above, our study is restricted by data limitations. We acknowledge
that we need to include more listings as well as more explanatory features, such as
review scores and the services included in an accommodation.
As for future work, we will attempt to address the limitations above. We will
also run some further experiments with the neural network architecture, tuning
different numbers of layers and adding other techniques such as early
stopping and batch normalisation. We would also like to integrate text and
image data as inputs to our models. Lastly, we would like to add listings in other
cities and attempt to produce a price suggestion tool for all Vietnamese cities on
Luxstay.
Appendix A

Some Special Mathematical Notations
The lp-norm of a vector x = (x_1, ..., x_n) is defined as

\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}
Example:
• p = 1: The l1-norm

\|x\|_1 = |x_1| + |x_2| + ... + |x_n|
• p = 2: The l2-norm
\|x\|_2 = \sqrt{x_1^2 + x_2^2 + ... + x_n^2}
Suppose we have two differentiable functions f (x) and g(x). Then to differentiate
y = f (g(x)), let u = g(x) and then y = f (u). We have
\frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx}
Bibliography
Breiman, L., Friedman, J., Olshen, R. & Stone, C. (1984), Classification and
regression trees.
Efron, B., Hastie, T., Johnstone, I. & Tibshirani, R. (2004), ‘Least angle regression’, The Annals of Statistics 32(2), 407–451.
URL: http://www.jstor.org/stable/3448465
Gibbs, C., Guttentag, D., Gretzel, U., Yao, L. & Morton, J. (2017), ‘Use of dynamic pricing strategies by Airbnb hosts’, International Journal of Contemporary Hospitality Management 30, 00–00.
Goodman, A. C. (1998), ‘Andrew Court and the invention of hedonic price analysis’, Journal of Urban Economics 44(2), 291–298.
URL: http://www.sciencedirect.com/science/article/pii/S0094119097920714
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. & Liu,
T.-Y. (2017), Lightgbm: A highly efficient gradient boosting decision tree, in
I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan
& R. Garnett, eds, ‘Advances in Neural Information Processing Systems 30’,
Curran Associates, Inc., pp. 3146–3154.
URL: http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf
Kingma, D. & Ba, J. (2014), ‘Adam: A method for stochastic optimization’, International Conference on Learning Representations.
Lewis, L. (2019), ‘Predicting Airbnb prices with machine learning and deep learning’.
URL: https://towardsdatascience.com/predicting-airbnb-prices-with-machine-learning-and-deep-learning-f46d44afb8a6 (accessed: 30.05.2020)
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen,
T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E.,
DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai,
J. & Chintala, S. (2019), Pytorch: An imperative style, high-performance deep
learning library, in H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc,
E. Fox & R. Garnett, eds, ‘Advances in Neural Information Processing Systems
32’, Curran Associates, Inc., pp. 8024–8035.
URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, E. (2011), ‘Scikit-learn: Machine learning in Python’, Journal of Machine Learning Research 12, 2825–2830.
Tang, E. & Sangani, K. (2015), ‘Neighborhood and price prediction for San Francisco Airbnb listings’, CS 229 Final Project Report.
Tibshirani, R. (1996), ‘Regression shrinkage and selection via the lasso’, Journal
of the Royal Statistical Society. Series B (Methodological) 58(1), 267–288.
URL: http://www.jstor.org/stable/2346178
Tibshirani, R., Hastie, T. & Friedman, J. (2010), ‘Regularized paths for generalized
linear models via coordinate descent’, Journal of Statistical Software 33.