
Today I'll talk a little bit about Bighead, which is our end-to-end machine learning platform, and I'll cover the different components. For the agenda, I intend to talk about the background, design goals, an architecture deep dive, and then a little bit about open source plans. I tend to move around a little bit, so I feel kind of bad for the camera man. Okay, you're gonna be okay? Cool.

So I guess overall I just wanted to start: how many people have heard of Airbnb's Bighead? Okay, that's not too bad, a couple of hands. So some of this might be a little bit repetitive, but if you do have questions, I think we have office hours afterwards and I can dive a lot deeper. This is going to be a little bit of a high-level view of exactly what we decided to do with Bighead. So, background information: how many people have stayed in an Airbnb? Okay, it would have made me very sad if there were only a couple of hands. So overall, as you are well aware, Airbnb's product is a kind of global travel community that offers magical end-to-end

trips including where you stay what you

do and the people you meet and machine


learning is imbued within the product

it's been there for quite a long time

And so historically, the teams that have built their own machine learning infrastructure are the search ranking team, smart pricing, and fraud detection. As you can imagine, for as long as Airbnb has existed there has been search, and with search came search ranking. These are the teams that have staffed their own ML infrastructure teams; they are the ones that have invested the effort and the engineers to actually maintain their own infra. But there were

significantly more opportunities to take

advantage of at Airbnb and so some of

the use cases were paid growth: how do we know exactly where to invest our money in terms of advertising, and is there any smarter way we can do it with ML? Classifying and categorizing listings: how do we make sure that we give a better experience to the users by classifying their listings? Experience ranking. Room type categorization, which is a very interesting one: if we have photos of homes, is there any way we can tell which room is the bathroom, the living room, the bedroom? The use case sounds silly when I just say it, but if we do automated ad generation we really don't want to show you five pictures of bathrooms or five pictures of a garage, and you can imagine that really tanks our conversion rate. And so there are so many opportunities for ML, but it's really, really difficult to staff an entire ML infra team for each of

these use cases and so I'm gonna dive

into kind of the two components of how

we viewed ML, and this really led to the inception of the ML infrastructure team. The intrinsic complexity is the complexity of understanding the business domain; these are complexities that are fundamental to machine learning, and you often encounter

these every time you go into a machine

learning problem. And so that's understanding the business domain, selecting the appropriate model, selecting the appropriate features, and fine-tuning and hyperparameter tuning your model. What are the incidental complexities, the complexities that can really be solved and simplified? Some of the incidental complexities are integrating with the Airbnb data warehouse, scaling model training and serving, keeping consistency between prototyping versus production and training versus inference (inference, if people are not familiar, is more like scoring), keeping track of multiple models, versions, and experiments, and supporting the iteration of ML models. These incidental complexities come up often, and the time spent on them is precious time that's not spent on your model. And so

this actually led to a lot of bloat. So ML models would take on average eight to twelve weeks to build, and ML workflows tended to be slow, fragmented, and extremely brittle. And so we've... yeah? Yes, human time, from first thought to... yeah, cool. I don't know what the policy is for questions, but we can take them at the end. Great question though, no worries.

Yeah, and it tended to be extremely brittle. And so you can imagine that as you're deploying models into production, especially when they're going on the Airbnb website or the application,

we want to make sure that these are very

robust and they're resilient to failure

and so the ML infrastructure team was


formed to address these challenges the

vision was that Airbnb routinely ships ML-powered features throughout the product, and the mission is equipping Airbnb with shared technology to build production-ready machine learning applications with no incidental complexity. Obviously the intrinsic complexity will always be there, understanding the business domain as well as building the model; however, we really want to simplify the incidental complexities. Connecting to the data warehouse doesn't need to be reinvented over and over again; we can really simplify that. So

this is kind of the pinwheel effect for ML. This was built by my predecessor, and it's basically: you start with your data management, then you go into your prototyping, you go into your model lifecycle management, and then you go into productionization, and then there's this constant iteration; the more you iterate, the better you can build your ML model. We want to make this as streamlined as possible because, as you can imagine, if we streamline that iteration process there's a lot more potential for development and for new models to be created.


so our design goals these are the four

goals that we had in mind when building our infrastructure. We started about two and a half years ago building out this infra, and the four goals were keeping it seamless, versatile, consistent, and scalable; I'll talk about each one and how it plays a role in each of our components. Defining seamless a little bit further: easy to prototype, easy to productionize, and the same workflow across different frameworks. Making it versatile: it supports all major ML frameworks and meets various requirements around online and offline use, data size, SLAs (service level agreements), GPU training, as well as scheduled and ad-hoc training. Consistency: a consistent environment across the entire stack and consistent data transformation, whether that's your prototyping environment or production, and whether it's an online or offline use case. And keeping it scalable: horizontally scalable and very

elastic as you can imagine Airbnb is a

very seasonal type of business and so we

do see spikes in traffic here and there

and so we want to make sure that we do


have a very elastic infrastructure for

that. So we'll go into the architecture deep dive. Bighead is primarily made up of these seven components. It starts off with Redspot, goes into the Bighead service, and then goes into Deep Thought for real-time inference; then it goes into ML Automator for your batch training as well as your batch inference. Environment management is through the Docker image service, another component we own; execution management is through our Bighead library; and then we have feature data management with Zipline. As you can see, the three services we have underneath, which are the Docker image service, the Bighead library, and Zipline, span across these other services and work in conjunction with them. I'll talk about each one individually: exactly why we built it, why we felt like we needed to build it, what problems it solves, and describe it a little bit further. So diving into Redspot: how many people have at least used Jupyter notebooks before? Awesome, okay, I think that's more than the people who raised their hands for Airbnb here. So Jupyter notebooks, everybody's pretty familiar with them.
We found that Jupyter notebooks are ideal for machine learning because a lot of the machine learning models we create are almost like research papers and projects: it's not just that the end result is important, it's also the development, the ideation of it, the things that you've tested, the things that you've seen. And so Jupyter notebooks really do a good job of persisting all that intermediate state of experiments and trial and error, so you can actually justify why this model is a good one to put in production and the thought process behind it. So what makes an ideal machine learning development environment? It's that interactivity and feedback, making sure that you can execute different cells and that you have visualization; it's access to very, very powerful hardware; and then it's access to the data. So what is Redspot? Redspot is our supercharged Jupyter notebook service. It's a fork of JupyterHub, for those of you who are familiar with JupyterHub. It's integrated with our data warehouse to make sure that individuals don't have to set up the


access themselves it has access to

specialized hardware, so you can actually run on GPUs if you'd like to. It has file sharing between users via AWS EFS; we greatly promote collaboration within Airbnb, and this is a big part of most of our infrastructure: the ML community at Airbnb loves to share their work as well as their ideas, so we make sure that even in Redspot we have built-in functionality so people can share their work with one another. And it's all packaged in that familiar JupyterHub UI. This is a snippet of the Redspot home page: you get to choose your own Docker image, so you get to choose what libraries you have within your Docker container, and then from there you get to choose your actual instance type. If you need something very small, we do support t2.mediums, all within AWS. We also support GPUs; for the really, really expensive instances we usually ask people to check with our team beforehand, just because we are cost conscious and we want to make sure that people aren't spinning up x1 instances on AWS, because that would be very, very expensive for us. But it's nice to have the support where we can have a suite of different types of hardware for different users and different types of use cases.

So doubling back to some of the themes that I've mentioned before. You have consistency: this promotes prototyping in the exact same environment that your model will be used in, through that Docker image; we make sure that you're in the same image that you're going to be deploying into production. Versatility: you get to use customized hardware, whatever you need for your ML use case, and then customized dependencies; the Docker images can support Python 2 and Python 3 (we're slowly deprecating Python 2, just because Python 2 is being deprecated across the industry), but we do allow you to have a suite of different types of libraries within your images. And then seamless: it's integrated nicely with the Docker image service via the Bighead API as well as the UI widgets, and so when you're interacting with the Bighead service you can actually see the exact same visualizations that you would see within your Redspot environment, and you can make sure that you have that consistency across the

board. So the next step is diving into the Docker image service. The Docker image service is the environment customization built into Bighead. So why do we need a Docker image service? ML users have a diverse, heterogeneous set of dependencies; I'm sure everyone who's dabbled in ML here knows that there are many, many different frameworks as well as many different libraries, and the entire industry is very dynamic, changing very frequently in terms of new libraries and new support for things. So users need an easy way to bootstrap their own runtime environment so that they have support for the libraries they need, and it also needs to be consistent with the rest of Airbnb's infrastructure, which is why we've moved towards Docker environments. Dependency customization: we allow people to choose their own customization for their Docker images and create their own images; we've built on top of the Docker API to simplify this for users so that they can create all their own custom images. It also promotes that consistency and versatility, in the fact that users can choose what they want while we make sure that this same Docker image is going to be used within production. This is something we've seen pretty often, where individuals would have different versions of libraries when they're prototyping and when they deploy into production, which can lead to a lot of strange effects in production, and it's something that's very difficult to debug as well.
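To give a rough sense of the kind of thing such a service does under the hood, here is a minimal sketch using the open-source Docker Python SDK. It is an illustration under assumed names (the base image, package pins, and tag are made up), not Bighead's actual Docker image service API.

```python
# Minimal sketch: building a per-user runtime image with the Docker Python SDK.
# Bighead's Docker image service wraps this kind of logic behind its own API;
# the image contents and tag below are hypothetical.
import io
import docker

client = docker.from_env()

# Start from a shared base so the environment stays consistent with the rest of
# the infrastructure, then layer on the user's chosen ML dependencies.
dockerfile = """
FROM python:3.7-slim
RUN pip install --no-cache-dir scikit-learn==0.20.3 xgboost==0.82 pandas==0.24.2
"""

image, build_logs = client.images.build(
    fileobj=io.BytesIO(dockerfile.encode("utf-8")),
    tag="ml-user/custom-env:0.1",  # hypothetical tag
)
for chunk in build_logs:
    print(chunk.get("stream", ""), end="")
```

The same image would then be used in the notebook environment, in training, and in serving, which is exactly the prototype-to-production consistency described above.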

Next we'll talk about the Bighead service. The Bighead service is the model lifecycle management. So why is this needed, why is model lifecycle management needed? Tracking ML model changes is just as important as tracking code changes: it's not just code anymore, it's also the model weights that you have to track, and you have to make sure that you fully encapsulate what you've actually put into this model; we would argue it's a little bit more complex than just tracking code. ML model work needs to be reproducible as well as sustainable: if there is any type of rollback, how do you actually roll back to a previous version of the model, and how do you guarantee that that model is identical to what was trained before? And then how do you compare experiments before you launch models into production? This is also critical: how do you know whether the model that you've built now is better than the previous one? These are all questions that we had in mind, and we went out and built the Bighead service for this exact purpose. The Bighead service does the model versioning; this is the UI

component of it. You can see here you have your model, you have the different trained models, and you also have the versioned git SHAs for exactly what code was used to train that model. It also has timestamps for the date that it was trained, so that you can actually correlate it to exactly the date the features were generated as well, and then you can download the artifacts and actually port them over if you'd like and debug them. We do understand that with going into production there are incidents that always occur, and so we do have links to your Kubernetes cluster so that you can actually debug it if your inference is going wrong in production. And then the qualities again that I mentioned. Consistency: it's a central model management service, and so we keep track of the git SHA to version the code, and we make sure that we have the artifacts, basically all the trained weights, saved within S3, and those are also versioned so that you can roll back. We also want to support, in the future, experimentation to compare different versions of the same model, so that if you trained it on a week's worth of data versus a month, you can actually compare to see if one performs better than the other. It's a single source of truth: it's a nice place where Bighead users can go to see exactly what has been deployed to production, just because models can be fairly complex in terms of dependencies as well as when they were actually last trained. And then seamlessness: the visualizations carry over from Redspot all the way to the Bighead service, to make sure that if you use certain visualizations you have a consistent experience across the board.
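To make the bookkeeping concrete, here is a small, hypothetical sketch of the kind of record a model lifecycle service keeps per trained model; the class and field names are illustrative, not the Bighead service's real schema.

```python
# Hypothetical sketch of model-version bookkeeping: each trained model carries
# its git SHA, training date, and an artifact location, so rollback means
# re-pointing production at an older, fully reproducible version.
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass
class ModelVersion:
    name: str             # e.g. "room_type_classifier" (made-up example)
    version: int
    git_sha: str          # exact code used to train this version
    trained_at: datetime  # lets you line the model up with the feature data used
    artifact_uri: str     # serialized weights, e.g. an s3:// path

registry: Dict[str, List[ModelVersion]] = {}

def register(mv: ModelVersion) -> None:
    registry.setdefault(mv.name, []).append(mv)

def rollback(name: str, to_version: int) -> ModelVersion:
    """Return the exact older version (code SHA plus weights) to redeploy."""
    return next(v for v in registry[name] if v.version == to_version)
```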

The Bighead library. So with ML models, as probably everyone here is very familiar, it's highly heterogeneous: there are a lot of different ML frameworks out there, and Airbnb is full of employees who have experience with different ML frameworks. Some of the frameworks that we've seen are TensorFlow, PyTorch, Keras, MXNet, XGBoost, and scikit-learn; it's just wide across the board. In terms of the heterogeneity of the data as well, the data quality is very, very different across the board, and you have your structured data and your unstructured data. And then in terms of environments, the needs range from GPUs to many, many CPUs to a single CPU, and the dependencies are also very different across the board. So to make it concise: data in production is different from data in training, the offline pipeline is different from your online pipeline, and everyone does everything in a very different way; ML is not standardized yet. And so we've built the Bighead library. It essentially is this framework we've built where you can build wrappers around common ML frameworks, just so that people can use what they're already familiar with. We don't really dictate whether you should use TensorFlow over PyTorch; we say that we have this wrapper so that we can serialize your model no matter what you use,

and now we can make sure that the code deployed in production is the same exact code you're using in the prototyping environment. So basically we have pipelines, and a pipeline is a compute graph for pre-processing, your inference, your training, your evaluation, and visualization. These are composable, reusable, and shareable. It supports popular frameworks like TensorFlow, PyTorch, Keras, MXNet, scikit-learn, and XGBoost. We've built some of the pre-processing steps in C++ just so that we can share some of our best practices, and we've seen a 30x boost in performance compared to the Python implementation. We also have metadata for the trained models persisted within the Bighead library. For consistency, it has a uniform API to make sure that we can actually process data in a very similar way in your prototype as well as your production environments, and it's serializable, which means that we can actually port it over into production and have the online and offline be the same exact pipeline. So this is actually the config. Over here you can see that we've specified the categorical features and then the numeric features, and then we instantiate a pipeline; we say that for the numeric features we want to impute missing values with the mean (NaN-to-mean), and for the categorical features we do a one-hot label encoder, and then we attach an XGBoost classifier at the end and set the hyperparameters. And now we can port this over into offline training or offline inference or even online inference, and it's the same exact pipeline, and we can make sure that we have that consistency of code across all the different environments.
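To make that concrete, here is an equivalent of the pipeline just described, sketched with scikit-learn and xgboost as stand-ins for the Bighead library's wrappers (whose API isn't shown here); the feature names and hyperparameter values are made up.

```python
# Sketch of the described pipeline using scikit-learn/xgboost as a stand-in for
# the Bighead library's wrappers. Column names and hyperparameters are made up.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBClassifier

numeric_features = ["price", "num_reviews"]
categorical_features = ["room_type", "neighborhood"]

preprocess = ColumnTransformer([
    # numeric features: fill missing values with the column mean (NaN-to-mean)
    ("numeric", SimpleImputer(strategy="mean"), numeric_features),
    # categorical features: one-hot encode
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

# attach an XGBoost classifier at the end and set its hyperparameters
p = Pipeline([
    ("preprocess", preprocess),
    ("model", XGBClassifier(max_depth=6, n_estimators=200, learning_rate=0.1)),
])

# The same fitted object is then reused for offline training, batch inference,
# or online inference:
#   p.fit(train_df, train_labels)
#   predictions = p.predict(serve_df)
```

In the Bighead library the same idea ends with the serialization call mentioned next, so the identical compute graph can be uploaded and served.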

This is just the visualization of the pipeline. This visualization you can have both in that prototyping environment, Redspot, as well as inside the Bighead service, to make sure that you're looking at the same visualizations. And then the nice thing about it is you can serialize it very quickly, just p.serialize(), and this means that you can manually upload the model into the Bighead service if you want, or we can automatically upload it for you when you're ready to deploy into production. So these are some of the visualizations that we have in the Bighead service, and some more visualizations; this is just a simple feature importance. Cool.

Next we'll talk about Deep Thought. Deep Thought is our online inference service. So what's really difficult about making a model serve traffic online? Well, consistency: staying consistent with training is very, very difficult, just because your training environment, where you've trained your model, is going to be fairly different from what you're doing in production, and so it's different data, usually a different pipeline, different code, and different dependencies. It's very difficult for data scientists to launch models without an engineering team, just because it does need to plug into the Airbnb application. It also is very difficult for engineers to rebuild models, just because there's no prior knowledge of how the model was built when porting it over from data science into production. And then it also needs to be very scalable and robust, just because this is going into production where it can affect the critical path; there are resource requirements that vary across the different models, to make sure that, let's say, you get inference in 50 milliseconds so as not to slow down the Airbnb website. So there are a lot of requirements and scalability requirements, and throughput fluctuates quite a bit across time as well; as you can imagine, seasonality is quite a big thing for Airbnb as a B2C business.

So how did Deep Thought solve this? Well, Deep Thought solved the consistency aspect through Docker and the Bighead library, making sure that it's the same exact data source, the pipeline is identical, and the environment is identical to training. We want to make sure that exactly what you did in training and prototyping is identical to what you get in production, so that there's no confusion like "the version of my dependency has changed over time from what was deployed, and it's operating very differently." It's seamless: it integrates with event logging and dashboards, and it integrates with Zipline, which I'll talk about a little bit later, which is our feature management framework. And it's also highly scalable: it's built on Kubernetes, the model pods can scale very easily, and there's resource segregation across models so that we have no noisy neighbor problem where a single model can take down the other models. This is the architecture of how we've built Deep Thought: the client traffic goes through our REST API, it goes into our model manager, which checks with the Bighead service client to see which models are registered, and then it goes through routing, where it routes the traffic to the different model pods that are actually hosting the artifacts. Users get to choose how many pods they want to be on, and we've deployed autoscaling to make sure that if there is a spike in usage we can deploy more pods for that model as well.
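For flavor, here is a hedged sketch of what an online scoring call against a service shaped like this might look like from a product service; the host, route, payload fields, and latency budget are all invented, not Deep Thought's actual contract.

```python
# Hypothetical client-side sketch of an online inference request.
# URL, model name, payload fields, and timeout are made up for illustration.
import requests

DEEP_THOUGHT_URL = "https://deep-thought.example.internal"  # hypothetical host

payload = {
    "features": {
        "room_type": "entire_home",
        "price": 120.0,
        "num_reviews": 42,
    }
}

# The routing layer would look up the registered model via the Bighead service
# client and forward the request to that model's pods.
resp = requests.post(
    f"{DEEP_THOUGHT_URL}/models/room_type_classifier/score",
    json=payload,
    timeout=0.05,  # e.g. a 50 ms budget so the website isn't slowed down
)
resp.raise_for_status()
print(resp.json())
```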

ML Automator. So ML Automator is our offline training and batch inference. Why is this needed? Automating training, inference, and evaluation is necessary because you need to do scheduling, you need to do resource allocation for these jobs, you need to save the results somewhere, you need dashboards and alerts, and you need to do the entire orchestration for this to run smoothly. And so with ML Automator, once again with those themes: it's consistent, because we have that same Docker and Bighead library, the same ones we used for Deep Thought; this is especially nice because users can now do offline inference and online inference with just a simple config change on whether they want to deploy it online or offline. It's seamless, so you can automate tasks via Airflow. How many people use Airflow here? So you can actually automate the tasks via Airflow: you can generate your DAGs for training and inference with the appropriate amount of resources, including whether you want to use distributed Spark for training, and it has tight integration with Zipline for training and scoring data. And it's highly scalable: we use Spark for distributed computing across large data sets. This is just a picture of the Airflow UI and the DAG that's automatically generated through ML Automator.
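As a small illustration of the kind of scheduled workflow being generated, here is a minimal hand-written Airflow DAG with a train step followed by a batch-inference step; the DAG id, schedule, and task callables are hypothetical, since ML Automator generates its DAGs from a model's config rather than by hand.

```python
# Minimal Airflow sketch of a daily train -> batch-score workflow.
# Names and callables are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def train_model(**context):
    # e.g. fit the serialized Bighead pipeline on the latest Zipline features
    pass

def batch_inference(**context):
    # e.g. score the latest partition with the freshly trained model
    pass

with DAG(
    dag_id="room_type_classifier_daily",  # hypothetical model name
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    train = PythonOperator(task_id="train", python_callable=train_model)
    score = PythonOperator(task_id="batch_inference", python_callable=batch_inference)
    train >> score
```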

And then lastly we will talk about Zipline. Feature management is incredibly difficult; we've allocated about 50% of our team (we're a team of about 12 now) to Zipline, because feature management is what we found to be one of the hardest things to do for ML. We've also found that a lot of data scientists are spending most of their time on feature engineering; the model management and model lifecycle is another portion of it, but a lot of the hard work is spent on feature engineering. So why is it incredibly difficult to do feature engineering correctly? Inconsistency between your offline and online data sets. What does that mean? It means that the data that you have when you're running things in production might not be the data you're seeing when you're training. And I do have a nice tie-in; maybe I can just show the diagram quickly. So I have this diagram here, which is the essence of why this is so difficult. In most companies, previously, before Zipline was the middle layer, you had your production data stores that would be used for model scoring, and then it would go into a data warehouse and through ETL jobs that aggregated daily, and that's the data you're using to train.

And so it turns out that data can be very, very different from the data that you'll see live in production. So how do you make sure there is no label leakage, right? How do you make sure that you're not training on data that looks like a good proxy for how your model should predict scores, but then when you deploy into production the accuracy drops tremendously because of it? Which is kind of the difficulty with the offline/online data sets. It's also very, very tricky to generate training sets that depend on time correctly, and so one of the most difficult things is getting that point-in-time correctness, especially, as you can imagine, if it goes through multiple ETL jobs that aggregate it daily. You need to make sure that you persist the exact, precise timestamps: this user clicked on this button here at this timestamp, and these are the sequence of events, and now I can do a personalized search result saying, hey, this person is interested in this Airbnb listing. Other problems that make it really, really difficult are training set backfills; it really takes a lot of expertise to do backfills correctly and in a reasonable amount of time, and it usually takes a number of years of practice and painful understanding to realize that backfills are no joke, and making them efficient is very difficult as well.

Inadequate data quality checks or monitoring: we've seen that feature drift does happen over time, your features do change, and it's best practice to keep retraining your models depending on how stale they get; but over time there are models that don't have this hygiene put into place. So how do you know if your features are drifting slowly over time, and how do you know whether to retrain your model or not? And then the other hard thing about it, especially at Airbnb, is unclear feature ownership: who owns what feature, especially because these features are shared across the entire company, and so if the feature breaks, if an upstream pipeline breaks, who's responsible for

fixing it? These are fundamentally some of the problems that made it extremely difficult for us, and so we went out and built Zipline. Zipline maintains that consistency across data for training and scoring; it maintains that consistency between the data you use in development and in production. It also has that point-in-time correctness, to make sure that we don't have any label leakage, and to make sure that if you do rely on an intraday sequence of events, we persist that as well. It's seamless: it integrates with Deep Thought and ML Automator, so that you can make sure you're using the same exact features in prototyping as well as in production. And then it's highly scalable: it leverages Spark for batch and it uses Flink for streaming workloads. The way we've solved it is through this kind of architecture, where we have Zipline in the middle as a middle layer, and Zipline is the one that maintains that consistency between your data stores and your data warehouse. As the data comes into the data warehouse, whether it's through events or through database mutations, we ingest it raw, and we actually persist all of these mutations and events so that we can recreate your data at any point in time. This is amazing, because now if you choose to have a snapshot of your data at the end of the day, we can actually recreate it from the transactions and the mutations, to make sure that all the data that you see there will be the same data that's available to you when you're actually scoring online as well. So this helps with preventing data leakage and label leakage, and makes sure that if you're relying on a sequence of events, we persist those sequences as well. And this is nice because now we have a single store where we can train our models, and we can use that training data for actually scoring, making sure that consistency is in place; it's the same exact data across the board.
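To illustrate the point-in-time correctness idea (not Zipline's implementation, just a small PySpark sketch over assumed tables and column names), the rule is that a training row may only see the latest feature value whose timestamp is at or before that row's label timestamp:

```python
# PySpark sketch of a point-in-time ("as-of") feature join. Assumed tables:
# labels(user_id, label_ts, label) and feature_events(user_id, feature_ts, clicks_7d).
# Illustrative only; this is not Zipline's actual implementation.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
labels = spark.table("labels")
feature_events = spark.table("feature_events")

joined = labels.join(feature_events, on="user_id").where(
    # only feature values observed at or before the label timestamp,
    # which is what prevents leaking information from the future
    F.col("feature_ts") <= F.col("label_ts")
)

# keep the most recent qualifying feature value per training row
w = Window.partitionBy("user_id", "label_ts").orderBy(F.col("feature_ts").desc())
training_set = (
    joined.withColumn("rn", F.row_number().over(w))
          .where(F.col("rn") == 1)
          .drop("rn", "feature_ts")
)
```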

So, the overall summary: we've built this end-to-end platform to build and deploy machine learning models to production that is seamless, versatile, consistent, and scalable. Model lifecycle management has been a huge part of that effort as well; this is one of those things that won't bite you the first time around, because you've just built your model, it bites you on the second and third time, and if you ever need to roll back to a previous model it really, really is painful if you don't have good model lifecycle management. Feature generation and management was also a key point, with Zipline; this is also one of the most painful areas, where feature engineering could be simpler, and we can really scale this out to the company and abstract away a lot of the pain points for users. It turns out that a lot of the features being used across the board at the company are shared, so if we solve it once, we solve it for quite a number of people who would otherwise be rebuilding the same features over and over again. Keeping that consistency between online and offline inference: we do that through ML Automator, the Bighead library, as well as Deep Thought. And then the pipeline library supports the major frameworks; reiterating the fact that Airbnb is full of engineers who come from many different companies, the Google engineers really like using TensorFlow, and the data scientists who come from Facebook really like to use PyTorch, as you can imagine, so we really don't want to enforce that everybody has to use TensorFlow or everybody has to use PyTorch. This is a stance we've taken that's a little different from a lot of other companies: if you look at TFX or a lot of the other frameworks, they've built on top of one single framework, and they're not really agnostic to whichever ML framework you want to go with. Our stance is, if you want to use a different library we're open to it, and we have a flexible Bighead wrapper so you can actually build these wrappers around your library and use it however you want.

And then we have the Docker image customization service; Docker has been a huge saver for us in making sure that we have this consistent environment from prototyping to production. And then multi-tenant training environments as well as multi-tenant serving environments, just making sure that we can scale both in training and in scoring, because some of these cases get very, very massive, and more and more of the deep learning models, with more layers coming into them, require more and more resources to run properly, so we want to make sure that we're highly flexible there. Bighead is built on a lot of open-source technologies: TensorFlow, PyTorch, Keras, MXNet, scikit-learn, XGBoost, Spark, Jupyter, Kubernetes, Docker, Airflow. And so we're kind of following suit and we're looking to open source it as well. Those of you who are already familiar with Bighead probably know that we've been saying this for quite some time, but we have made concrete steps towards it; we had to delay a little bit as everybody gets ready for the IPO and there's a lot of work to be done still, but we are going to open source it. We're right now in the phase of selecting our first couple of private collaborators; essentially we want to dip our toe into the open source space and see exactly how much work it is. So if you're interested, please email me; we're looking for partners, just a couple, to really test the waters and see how much work open source is going to be, and then we'll be moving more towards broad open source where it'll be open to everyone. Cool, any quick questions?

Hi, can you describe a little bit how the teams came together to build this and maintain it? You know, did Airbnb have to restructure teams to make it happen?

Yeah, so the truth is the trust team at one point was maintaining a lot of their own infra for machine learning, and it turns out that there were two other teams doing this as well, and from that point it became pretty clear that building and rebuilding ML infrastructure was not the goal of the company. And so this team came together with a couple of engineers who thought we should really standardize this across the board, and from there it's been growing and growing in terms of the number of use cases that have been onboarded. This is actually the third time we've built ML infra, so we're quite experienced at this: we've had a v1, and even before that we had something called Aerosolve, which I don't know if anyone's familiar with (there's a nod over there), but it was a learning experience across the board. And so, yeah, it's more that we've been bitten enough times that we keep rebuilding it, and it's a lot of work for engineers to maintain ML infra over and over again, so standardizing it made a lot of sense. We chartered a team, got together, and started building this out, but it started not so much top-down, more bottom-up. Yeah.
