Build A Google Search Autocomplete
System Requirements
Considering the scale of Google, the factors we need to keep in mind are latency, consistency, and availability. Latency should be very low: suggestions should appear and change with each letter you type. Next, the system needs to be available all the time; however, consistency can be compromised here. Each time you type something, it changes the frequency of the previously stored query, which in turn affects the suggestions. A slight delay here is acceptable, so eventual consistency would work.
The Concept
The natural starting point is a trie in which each node stores the search frequency of the prefix it represents. For example, node "h" stores the search frequency of "h," its child node "a" stores the search frequency of "ha," and so on. Now, say you typed "h" and we want to show the top N recommendations, such as "harry potter" or "harry styles." We would need to scan every query in the subtree below "h," sort them by frequency, and return the top N. At this scale, that would mean scanning terabytes of data, and since low latency is our goal, this scanning approach would not work.
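The naive approach above can be sketched as follows. This is a minimal illustration, not a production design; the class and method names are my own, and the full-subtree DFS in `top_n` is exactly the expensive scan the text describes.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0  # search frequency of the query ending at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query):
        """Record one search for `query`, creating nodes as needed."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.freq += 1

    def top_n(self, prefix, n):
        """Walk to the prefix node, then scan its ENTIRE subtree.

        This full scan is what makes the approach too slow at scale.
        """
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []

        def dfs(node, path):
            if node.freq:
                results.append((node.freq, path))
            for ch, child in node.children.items():
                dfs(child, path + ch)

        dfs(node, prefix)
        results.sort(reverse=True)
        return [query for _, query in results[:n]]
```

For example, after inserting "harry potter" three times and "harry styles" twice, `top_n("h", 2)` walks the whole "h" subtree before it can answer.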
Approach #2
To make this approach more efficient, we can store more data on each node along with the search frequency. Let's store the top N queries from the subtree below each node. This means that the node "h" would have queries like "harry potter," "harley davidson," etc. stored. If you traversed down the tree to "harl" (i.e., you typed "harl"), the node "l" would have queries like "harley davidson," "harley quinn," etc.
This approach is better than the previous one, as reads are quite efficient. Anytime a node's frequency gets updated, we traverse back from that node through its parents until we reach the root. For every parent, we check if the current query is part of its top N. If so, we replace the corresponding frequency with the updated one. If not, we check whether the current query's frequency is now high enough to be part of the top N, and if so, we insert it.
Though this approach works, it does affect our read efficiency: we need to put a lock on a node each time we write/update it so that the user won't get stale values. But if we accept eventual consistency, this might not be much of an issue. The user might get stale values for a while, but the data would eventually become consistent. Still, we will look at an extension of this approach.
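The update path described above can be sketched like this. It's a simplified, single-machine illustration (names are my own, and the lock-per-node concern is omitted): each search walks down to the query's node, then refreshes the cached top-N list on every ancestor on the way back up.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0
        self.top = []  # cached list of (frequency, query), highest first

class Trie:
    def __init__(self, n=3):
        self.root = TrieNode()
        self.n = n  # how many suggestions each node caches

    def record_search(self, query):
        # Walk down, remembering the path so we can update ancestors.
        node = self.root
        path = [self.root]
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            path.append(node)
        node.freq += 1
        new_freq = node.freq
        # Propagate back up: on each ancestor, replace this query's old
        # entry (if present), re-rank, and keep only the top N.
        for ancestor in reversed(path):
            entries = [e for e in ancestor.top if e[1] != query]
            entries.append((new_freq, query))
            entries.sort(reverse=True)
            ancestor.top = entries[:self.n]

    def suggest(self, prefix):
        """Read path: just return the pre-computed list at the prefix node."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [query for _, query in node.top]
```

Note how `suggest` never scans a subtree; all the work has been shifted to `record_search`, which is exactly the read/write trade-off the approach makes.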
Approach #3
As an extension of the previous approach, we can aggregate the update data offline. We can keep a hashmap of each query to its frequency, and once the frequency reaches a set threshold value, we can then apply it to the servers.
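A minimal sketch of this offline aggregation, assuming a hypothetical `OfflineAggregator` sitting between the raw search logs and the trie servers (the threshold value and the flush mechanism here are illustrative stand-ins):

```python
from collections import defaultdict

class OfflineAggregator:
    """Buffer raw search counts and only push a query to the trie
    servers once its local count crosses a threshold, so the trie
    is not locked and updated on every single keystroke."""

    def __init__(self, threshold=3):  # threshold is an illustrative value
        self.counts = defaultdict(int)
        self.threshold = threshold
        self.flushed = []  # stand-in for "send this update to the trie servers"

    def record(self, query):
        self.counts[query] += 1
        if self.counts[query] >= self.threshold:
            self.flushed.append((query, self.counts[query]))
            self.counts[query] = 0  # reset the local buffer after flushing
```

The effect is to batch many cheap hashmap increments into one comparatively expensive trie update.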
Scaling
Now, there wouldn't be just one big server storing all the petabytes of data; we can't keep vertically scaling forever, and there is a better approach. We can distribute the data (sharding) by prefix across various servers. For example, prefixes like "a," "aa," "aab," etc. would go to server #1, and so on. A load balancer can keep the map of prefix to server number.
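The routing table the load balancer keeps might look like the sketch below. The map contents are hypothetical; a real system would derive the assignments from observed load rather than hard-coding them, but the longest-prefix lookup is the core idea.

```python
# Hypothetical prefix-to-shard map kept by the load balancer.
PREFIX_TO_SERVER = {
    "a": 1,   # heavy prefixes get their own shard
    "h": 2,
    "x": 3,   # light prefixes can share a shard
    "y": 3,
}
DEFAULT_SERVER = 4  # fallback shard for unmapped prefixes

def route(prefix):
    """Match on the longest configured prefix, falling back to a default."""
    for i in range(len(prefix), 0, -1):
        server = PREFIX_TO_SERVER.get(prefix[:i])
        if server is not None:
            return server
    return DEFAULT_SERVER
```

With this map, a query starting with "harl" lands on server #2, while anything unmapped falls through to the default shard.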
But consider this: servers holding data like "x," "xa," "yy," etc. would get far less traffic than the one holding "a." So there can be a threshold check on each server; if the load surpasses that threshold, the data can be redistributed among the other shards.
If you are concerned about a single point of failure, there can be multiple servers acting as load balancers, so if any load balancer goes down, it is replaced by another one. You can use ZooKeeper to continuously health-check the load balancers and act accordingly.