
4. Patterns and Data Modeling

Each numbered section below lists a Behavior, the Assumed Knowledge behind it, and Practice and Study resources.

4.1 Behavior: Choose the most efficient data structure for a given use case…

Sample question: Suppose you're tracking active users on your site. You need to know exactly how many unique, active users have visited in the past seven days. Which Redis data structure would you use to most efficiently answer this question?

1. String
2. HyperLogLog
3. Set
4. Bitfield
5. List

Assumed knowledge: Understanding the basic features of all Redis types and data structures. These include strings, hashes, sets, lists, sorted sets, pub/sub, HyperLogLog, and streams.

Practice and study: https://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/

4.2 Behavior: Choose the most appropriate and space- and time-efficient Redis data structure for storing a domain object. How can the object be represented using a string, a hash, or ReJSON? What are the pros and cons of each approach?

Assumed knowledge: The pros and cons of modeling a domain object using a Redis string, a hash, and the ReJSON Redis module.

4.3 Behavior: Create, read, and update a domain object stored as a string.

Assumed knowledge: Objects can be stored as a string as long as the client knows how to serialize them. Possible serialization formats include CSV, JSON, and MsgPack, but since Redis strings can store binary data, any serialization format is possible.
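Here is a minimal sketch of the string approach using Python and the redis-py client (the key name and user fields are illustrative, not from the guide):

import json
import redis

r = redis.Redis(decode_responses=True)

# Create: serialize the object (JSON here) and store it under one key.
user = {"id": 300, "name": "Fred", "email": "fred@example.com"}
r.set("users:300", json.dumps(user))

# Read: fetch and deserialize the whole object.
user = json.loads(r.get("users:300"))

# Update: a string value must be rewritten wholesale, even for one field.
user["email"] = "fred@example.net"
r.set("users:300", json.dumps(user))

The read-modify-write on update is the main trade-off versus a hash, which can update a single field in place.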

4.4 Behavior: Create, read, and update a domain object stored as a hash.

Assumed knowledge: HMSET vs. HSET; HMGET vs. HGETALL vs. HGET.
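For contrast, a sketch of the same object as a hash in redis-py (field names are illustrative). Variadic HSET supersedes the deprecated HMSET:

import redis

r = redis.Redis(decode_responses=True)

# Create: HSET with a mapping writes several fields at once.
r.hset("users:300", mapping={"name": "Fred", "email": "fred@example.com"})

# Read: HGET for one field, HGETALL for the entire object.
email = r.hget("users:300", "email")
whole_user = r.hgetall("users:300")

# Update: rewrite only the field that changed.
r.hset("users:300", "email", "fred@example.net")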

4.5 Behavior: Generate a unique ID and key for an object.

Assumed knowledge: How to create a sequence using a counter (i.e., the INCR or HINCRBY command). How to generate unique IDs using streams. The difference between an ID generated using a counter and an auto-generated stream entry ID.

There are two general techniques:

1. Define a virtual sequence using a counter.
2. Use a stream's ID-generation capability.

For example, suppose we need to create a new user object. For that, we'll need a unique key. We might define the key like so:

users:[user_id]

But where does "user_id" come from? That's where a sequence or a stream ID can help.

If we use a sequence, then we define a key whose value is the latest integer used as an ID. Something like this:

INCR sequence:users

The return value of this command will be an integer that we can use as the value of "user_id." If the command returns the integer 300, then our user key becomes:

users:300

Alternatively, we could use a stream to generate our unique ID. Suppose a new user enters our system at time T and from IP address IPADDR. In this case, we create a stream entry like so:

XADD stream:users * ts T ip IPADDR

The response to this command will be the ID of the stream entry (e.g., "1572147367625-0"). We can use this ID to create our unique user key:

users:1572147367625-0

It's also useful to know that the part of the ID preceding the "-" is a Unix timestamp with millisecond granularity.
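A brief redis-py sketch of both ID-generation techniques, using the key names from the examples above (the ts and ip field values are placeholders):

import redis

r = redis.Redis(decode_responses=True)

# Technique 1: a counter-backed sequence. INCR is atomic, so
# concurrent clients always receive distinct IDs.
user_id = r.incr("sequence:users")       # e.g., 300
key = f"users:{user_id}"                 # "users:300"

# Technique 2: let a stream mint the ID. Passing '*' (redis-py's
# default) asks Redis to auto-generate a <ms-timestamp>-<seq> ID.
entry_id = r.xadd("stream:users", {"ts": "T", "ip": "IPADDR"})
key = f"users:{entry_id}"                # e.g., "users:1572147367625-0"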

Indexing

4.6 Behavior: Understand when and how to use a set to index data.

Assumed knowledge: Sets are best for assigning data to a category. For instance, we could store the IDs of all active users in a set called status:users:active.

We could also store each user preference in a set. For example, this would mean storing the IDs of all users who would like to receive notifications in a set called preferences:users:notifications.

To find the set of all active users who also want to receive notifications, we perform a set intersection:

SINTER status:users:active preferences:users:notifications

If we wanted to store this set for a day, we'd use SINTERSTORE plus EXPIRE, like so:

SINTERSTORE notification:daily status:users:active preferences:users:notifications

EXPIRE notification:daily 86400

(Note that 86,400 is the number of seconds in a day.)

Practice and study: https://redis.io/topics/indexes

4.7 Behavior: Understand when and how to use a sorted set to index data.

Assumed knowledge: Sorted sets allow us to index data by numerical attributes. A sorted set orders its members by numerical score. We can store the ID of each object to index as the member, and then perform range queries against the sorted set.

For example, suppose we're storing stock values in Redis. We can store all the latest values for a stock in a hash (e.g., symbol, market cap, volume, current price, etc.), but if we want to be able to see all stocks within a given price range, then we'll need to create a sorted set to index that data.

Practice and study: https://redis.io/topics/indexes
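A sketch of that stock-price index in redis-py (the ticker symbol, key names, and prices are hypothetical):

import redis

r = redis.Redis(decode_responses=True)

# The full record lives in a hash, keyed by symbol.
r.hset("stocks:ABC", mapping={"symbol": "ABC", "price": 42.5, "volume": 10000})

# The secondary index: member = symbol, score = current price.
r.zadd("stocks:by-price", {"ABC": 42.5})

# Range query: all symbols priced between 40 and 50, then fetch
# each full record from its hash.
symbols = r.zrangebyscore("stocks:by-price", 40, 50)
records = [r.hgetall(f"stocks:{s}") for s in symbols]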
Relationship Mapping

4.8 Behavior: Understand how to map a one-to-many association using a set.

Assumed knowledge: Use a set. For instance, a user in a social network may have many followers. In this case, create a set for each user's followers. The set is named like so:

user:[user_id]:followers

The set then contains one user ID for each of the user's followers.
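A minimal redis-py sketch of the followers set (the user IDs are illustrative):

import redis

r = redis.Redis(decode_responses=True)

# User 300 gains three followers.
r.sadd("user:300:followers", "41", "87", "129")

# How many followers does user 300 have?
count = r.scard("user:300:followers")

# Is user 41 among them?
is_follower = r.sismember("user:300:followers", "41")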

Counting

4.9 Behavior: Record the number of events in a given time window using a counter.

Assumed knowledge: INCR, EXPIRE, and including a date window in a key name.

4.10 Behavior: Record the number of unique events in a given time window using a set.

Assumed knowledge: SADD, EXPIRE, and including a date window in a key name.
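A sketch of both counting patterns in redis-py, scoping the key name to a daily window (the event and key names are illustrative):

import time
import redis

r = redis.Redis(decode_responses=True)
day = time.strftime("%Y-%m-%d")   # the date window baked into the key

# 4.9: total events today (duplicates counted), expiring with the window.
r.incr(f"logins:{day}")
r.expire(f"logins:{day}", 86400)

# 4.10: unique events today; the set ignores repeat logins per user.
r.sadd(f"logins:unique:{day}", "user:300")
r.expire(f"logins:unique:{day}", 86400)
unique_logins = r.scard(f"logins:unique:{day}")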

Queues

4.11 Behavior: Construct a queue using a list.

Assumed knowledge: LPUSH, RPOP.

Practice and study: https://redislabs.com/ebook/part-2-core-concepts/chapter-6-application-components-in-redis/6-4-task-queues/6-4-1-first-in-first-out-queues/

4.12 Behavior: Construct a blocking queue using a list.

Assumed knowledge: LPUSH, BRPOP.

Practice and study: https://redislabs.com/redis-best-practices/communication-patterns/event-queue/

4.13 Behavior: Construct a reliable blocking queue using multiple lists.

Assumed knowledge: LPUSH, BRPOP, RPOPLPUSH.

Practice and study: https://redis.io/commands/rpoplpush
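Sketches of the simple and blocking variants in redis-py (queue and job names are illustrative; the reliable variant is covered with the RPOPLPUSH material later in this document):

import redis

r = redis.Redis(decode_responses=True)

# 4.11: simple FIFO queue; producers push left, consumers pop right.
r.lpush("queue:emails", "job-1")
job = r.rpop("queue:emails")                 # None if the queue is empty

# 4.12: blocking variant. BRPOP parks the consumer until a job arrives
# or the timeout (in seconds) elapses; a timeout of 0 waits forever.
r.lpush("queue:emails", "job-2")
result = r.brpop("queue:emails", timeout=5)  # (key, value) or None
if result:
    _, job = result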

Other Use Cases

4.14 Behavior: Represent a series of events (activity stream, message log, metrics history) in Redis.

Assumed knowledge: Events may be modeled with a list, but streams are generally the better option. Consider the pros and cons of each of these approaches.

4.15 Behavior: Understand what a leaderboard is and how to implement one using Redis.

Assumed knowledge: Sorted sets.

Practice and study: https://redislabs.com/redis-enterprise/use-cases/leaderboards/ This is also covered in Redis University courses RU102J and RU102JS.

4.16 Behavior: Understand how to represent time-series data using a sorted set. Know that the RedisTimeSeries module is a custom-built solution for the time-series data problem. Note that knowledge of how to use RedisTimeSeries and its specific features is not required.

Practice and study: https://redislabs.com/redis-best-practices/time-series/sorted-set-time-series/ This is also covered in Redis University courses RU102J and RU102JS.

4.17 Behavior: Understand two techniques for implementing a rate limiter in Redis, and the pros and cons of each technique.

Assumed knowledge: These techniques are the fixed-window rate limiter and the sliding-window rate limiter.

The fixed-window technique stores counts in a string using INCR. The key pointing to the counter is scoped to a timestamp, and the key is expired. Relevant commands: INCR, EXPIRE, and scoping a key name to a particular time bucket.

The sliding-window rate limiter uses a sorted set, where the scores are timestamps and the members are some other unique value (perhaps a timestamp plus an IP address). Relevant commands: ZADD, ZCARD, ZREMRANGEBYSCORE.

Note that in both cases, the access token being rate-limited needs to appear in the key that points to the data structure being used.

Practice and study: https://redislabs.com/redis-best-practices/basic-rate-limiting/ and https://engagor.github.io/blog/2017/05/02/sliding-window-rate-limiter-redis/ Both of these techniques are also discussed in Redis University courses RU102J: Redis for Java Developers and RU102JS: Redis for JavaScript Developers.

A simple example: Daily Active Users

To count unique users that logged in today, we set up a bitmap where each user is identified by an offset value. When a user visits a page or performs an action that warrants counting, we set the bit to 1 at the offset representing the user ID. The key for the bitmap is a function of the name of the action the user performed and the timestamp.

In this simple example, every time a user logs in we perform a redis.setbit(daily_active_users, user_id, 1). This flips the appropriate offset in the daily_active_users bitmap to 1. This is an O(1) operation. Doing a population count on this results in 9 unique users that logged in today. The key is daily_active_users and the value is 1011110100100101.

Of course, since the daily active users will change every day, we need a way to create a new bitmap every day. We do this by simply appending the date to the bitmap key. For example, if we want to calculate the daily unique users who have played at least 1 song in a music app for a given day, we can set the key name to be play:yyyy-mm-dd. If we want to calculate the number of unique users playing a song each hour, we can name the key play:yyyy-mm-dd-hh. For the rest of the discussion, we will stick with daily unique users that played a song. To collect daily metrics, we will simply set the user's bit to 1 in the play:yyyy-mm-dd key whenever a user plays a song. This is an O(1) operation.
redis.setbit(play:yyyy-mm-dd, user_id, 1)
The unique users that played a song today is the population count of the bitmap stored as the value for the play:yyyy-mm-dd key. To calculate weekly or monthly metrics, we can simply compute the union of all the daily bitmaps over the week or the month, and then calculate the population count of the resulting bitmap.

You can also extract more complex metrics very easily. For example, the premium account holders who played a song in November would be:

(play:2011-11-01 ∪ play:2011-11-02 ∪ … ∪ play:2011-11-30) ∩ premium:2011-11
Performance comparison using 128 million users

The table below shows a comparison of daily unique action calculations over 1 day, 7 days, and 30 days for 128 million users. The 7- and 30-day metrics are calculated by combining daily bitmaps.

Period   Time (ms)
Daily    50.2
Weekly   392.0
Monthly  1624.8
Optimizations

In the above example, we can optimize the weekly and monthly computations by caching the calculated daily, weekly, and monthly counts in Redis.

This is a very flexible approach. An added bonus of caching is that it allows fast cohort analysis, such as weekly unique users who are also mobile users (the intersection of a mobile-users bitmap with a weekly-active-users bitmap). Or, if we want to compute rolling unique users over the last n days, having cached daily unique counts makes this easy: simply grab the previous n-1 days from your cache and union them with the real-time daily count, which only takes 50 ms.
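A redis-py sketch of the bitmap pattern described above, following the article's play:yyyy-mm-dd key convention (the user ID and dates are illustrative):

import time
import redis

r = redis.Redis()
day = time.strftime("%Y-%m-%d")

# A user with integer ID 123 plays a song: flip their bit on. O(1).
r.setbit(f"play:{day}", 123, 1)

# Daily unique players = population count of the day's bitmap.
daily_unique = r.bitcount(f"play:{day}")

# Weekly uniques: OR the daily bitmaps together, then count.
days = ["play:2011-11-01", "play:2011-11-02", "play:2011-11-03"]  # etc.
r.bitop("OR", "play:week", *days)
weekly_unique = r.bitcount("play:week")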

RPOPLPUSH source destination

Time complexity: O(1)

Atomically returns and removes the last element (tail) of the list stored at source, and pushes the element as the first element (head) of the list stored at destination. For example: consider source holding the list a,b,c and destination holding the list x,y,z. Executing RPOPLPUSH results in source holding a,b and destination holding c,x,y,z. If source does not exist, the value nil is returned and no operation is performed. If source and destination are the same, the operation is equivalent to removing the last element from the list and pushing it as the first element of the list, so it can be considered a list rotation command.

As of Redis 6.2.0, RPOPLPUSH is considered deprecated. Please prefer LMOVE in new code.

Return value: Bulk string reply: the element being popped and pushed.

Examples:
redis> RPUSH mylist "one"
(integer) 1
redis> RPUSH mylist "two"
(integer) 2
redis> RPUSH mylist "three"
(integer) 3
redis> RPOPLPUSH mylist myotherlist
"three"
redis> LRANGE mylist 0 -1
1) "one"
2) "two"
redis> LRANGE myotherlist 0 -1
1) "three"
Pattern: Reliable queue

Redis is often used as a messaging server to implement processing of background jobs or other kinds of messaging tasks. A simple form of queue is often obtained by pushing values into a list on the producer side and waiting for those values on the consumer side using RPOP (with polling), or BRPOP if the client is better served by a blocking operation. However, in this context the resulting queue is not reliable, as messages can be lost: for example, if there is a network problem, or if the consumer crashes just after the message is received but before it can be processed. RPOPLPUSH (or BRPOPLPUSH for the blocking variant) offers a way to avoid this problem: the consumer fetches the message and at the same time pushes it onto a processing list. It then uses the LREM command to remove the message from the processing list once the message has been processed. An additional client may monitor the processing list for items that remain there too long, pushing timed-out items into the queue again if needed.
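A sketch of that reliable consumer loop in redis-py (queue names and the work function are illustrative; on Redis 6.2+ you would use BLMOVE instead of the deprecated BRPOPLPUSH):

import redis

r = redis.Redis(decode_responses=True)

def work(message: str) -> None:
    print("processing", message)   # stand-in for real processing

while True:
    # Atomically move the next message onto a processing list.
    message = r.brpoplpush("queue:jobs", "queue:jobs:processing", timeout=30)
    if message is None:
        continue                   # timed out; wait again
    work(message)
    # Only after successful processing is the backup copy removed.
    r.lrem("queue:jobs:processing", 1, message)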

Basic Rate Limiting

Building a rate limiter with Redis is easy because of two commands: INCR and EXPIRE. The basic concept is that you want to limit requests to a particular service in a given time period. Let's say we have a service whose users are identified by an API key. This service states that it is limited to 20 requests in any given minute. To achieve this, we want to create a Redis key for every minute per API key. To make sure we don't fill up our entire database with junk, we expire that key after one minute as well. Visualize it like this:

User API Key = zA21X31. (In the original figure, green cells represent unlimited requests and red cells represent limited ones.)

Time   Redis Key   Value   Expires at
12:00  zA21X31:0   3       Latest 12:02
12:01  zA21X31:1   8       Latest 12:03
12:02  zA21X31:2   20      Latest 12:04
12:03  zA21X31:3   2       Latest 12:05
12:04  zA21X31:4   20      Latest 12:06

The key is derived from the user API key concatenated with the minute number, separated by a colon. Since we're always expiring the keys, we only need to keep track of the minute; when the hour rolls around from 59 to 00, we can be certain that another key for minute 59 doesn't exist (it would have expired 58 minutes prior). With pseudocode, let's see how this would work.
1 > GET [user-api-key]:[current minute number]
2 If the result from line 1 is less than 20 (or unset), go to line 4; otherwise go to line 3.
3 Show error message and end connection. Exit.
4 > MULTI
  OK
  > INCR [user-api-key]:[current minute number]
  QUEUED
  > EXPIRE [user-api-key]:[current minute number] 59
  QUEUED
  > EXEC
  OK
5 Do service stuff.

Two key points to understand in this routine:

1) INCR on a non-existent key will always return 1. So the first call of the minute will result in the value 1.
2) EXPIRE is inside a MULTI transaction along with the INCR, which means they form a single atomic operation.

The worst-case failure situation is if, for some very strange and unlikely reason, the Redis server dies between the INCR and the EXPIRE. When restoring either from an AOF or via in-memory replication, the INCR will not be restored since the transaction was not complete. With this pattern, it's possible that any one user will have two limiting keys, one that is currently being used and one that will expire during the same minute window, but it is otherwise very efficient.
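The routine above, sketched in Python with redis-py (the 20-per-minute limit and key scheme follow the article):

import time
import redis

r = redis.Redis(decode_responses=True)

def allow_request(api_key: str, limit: int = 20) -> bool:
    key = f"{api_key}:{time.strftime('%M')}"   # scoped to the current minute
    current = r.get(key)
    if current is not None and int(current) >= limit:
        return False                           # over the limit: reject
    # INCR and EXPIRE queued together as one MULTI/EXEC transaction.
    pipe = r.pipeline(transaction=True)
    pipe.incr(key)
    pipe.expire(key, 59)
    pipe.execute()
    return True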

All rate limits on the Instagram Platform are controlled separately for each access token, and on a sliding
1-hour window. We're allowed to do 5000 API calls per access token each hour.

------x------x---------x---------------x-----x---x--------x----→ Time

Every point on the axis represents an API call. The sliding window is an hour in this case. Every time we
do an API call we add the timestamp (in microseconds) of the API call to a list. In pseudocode:

timestamps.push(Date.now());

When we're about to do an API call in our crawler we need to check if we're allowed to do one:
1. How many calls did we do in the last hour?
callsInTheLastHour = timestamps.filter(timestamp => timestamp > now - slidingWindow);
count = callsInTheLastHour.length;

2. How many can we still do?

After we've calculated the number of API calls we did in the last hour, we can calculate the remaining API calls:

remaining = maxCallsPerHour - count;

Let's say we did 4413 API calls in the last hour; then we're allowed to do 587 more at this moment. Great, we've got our algorithm. Now we need some kind of database to store a list of timestamps grouped per access token. Maybe we could use MySQL or PostgreSQL? Yes, we could, but then we would need a system that periodically removes outdated timestamps, since neither MySQL nor PostgreSQL allows us to set a time to live on a row. What about Memcached? Yes, that's also an option. Sadly, Memcached doesn't have the concept of an array or list (we could serialize an array using our favourite programming language). We can do better. What about Redis? Yes, I like where this is going. Redis is a key/value store that supports lists, sets, sorted sets, and more. We're ready to translate our algorithm to Redis commands. Assuming you already have Redis installed, start a server: $ redis-server. If you're on a Mac, you can just $ brew install redis to get it. We're going to use a sorted set to hold our timestamps because it fits our needs.
> MULTI
> ZREMRANGEBYSCORE $accessToken 0 ($now - $slidingWindow)
> ZRANGE $accessToken 0 -1
> ZADD $accessToken $now $now
> EXPIRE $accessToken $slidingWindow
> EXEC

Let's break it down:


• MULTI to mark the start of a transaction block. Subsequent commands will be queued for atomic
execution using EXEC.
• ZREMRANGEBYSCORE $accessToken 0 ($now - $slidingWindow) to remove API call
timestamps that were done before the start of the window.
• ZRANGE $accessToken 0 -1 to get a list of all API call timestamps that happened during the
window.
• ZADD $accessToken $now $now to add a log for the current API call that we're about to do.
• EXPIRE $accessToken $slidingWindow to reset the expiry date for this sorted set of
timestamps (for the current OAuth Token).
• EXEC will execute all previously queued commands and restore the connection state to normal.
Instead of using the actual OAuth access tokens (and duplicating them to Redis), you might want to use an
identifier or hash of the token instead as $accessToken. It serves as the key for our Redis sorted set. Also
note that, in the same transaction as reading the list of timestamps, we add a new timestamp to the list (the ZADD
command). We do this because this is being used in a distributed context (we have many workers performing
API calls), and we don't want to write when we already exceeded our limits.
In PHP this might look like this:

// composer require predis/predis
require_once __DIR__ . '/vendor/autoload.php';

$maxCallsPerHour = 5000;
$slidingWindow = 3600;
$now = microtime(true);
$accessToken = md5('access-token');

$client = new Predis\Client();
$client->multi();
// Prune timestamps older than the window (the ZREMRANGEBYSCORE
// from the transaction above).
$client->zremrangebyscore($accessToken, 0, $now - $slidingWindow);
$client->zrange($accessToken, 0, -1);
$client->zadd($accessToken, $now, $now);
$client->expire($accessToken, $slidingWindow);
$result = $client->exec();

// The second command inside the transaction was ZRANGE,
// which returns a list of timestamps within the last hour.
$timestamps = $result[1];
$remaining = max(0, $maxCallsPerHour - count($timestamps));

if ($remaining > 0) {
    echo sprintf('%s: Allowed and %d remaining', $now, $remaining) . PHP_EOL;
} else {
    echo sprintf('%s: Not allowed', $now) . PHP_EOL;
}

To conclude all of this, and to make this work within our codebase, we put
this all nicely in a class, behind an interface RateLimiter:

<?php

namespace CXSocial\RateLimiter;

interface RateLimiter
{
    /**
     * Request the remaining rate-limit points
     * @param RateLimitedResource $rateLimitedResource
     * @throws SorryRateLimitUnavailable
     * @return int
     */
    public function remaining(RateLimitedResource $rateLimitedResource);
}

This allows us to write code that doesn't couple too tightly to the implementation. This has been working like a charm for us! We're huge fans.
Sorted Set Time Series

Time series with Sorted Sets (zsets) are the typical way of modeling time-series data in Redis. Sorted Sets are made up of unique members with a score, all stored under a single key. Using this data type for time series means having the score act as some sort of indication of time (often a millisecond-precision timestamp, although not always) and the member being the data recorded. The one catch is that, since this is a form of Set, only unique members are allowed, and trying to record a time-series entry with the same value as a previous member will result in only updating the score. To illustrate this problem, take the following example of recording temperature over time:
Timestamp      Temperature (C)
1511533205001  21
1511533206001  22
1511533207001  21
If you just added this directly as a Sorted Set using ZADD, you would miss some data points:
ANTI-PATTERN
> ZADD temperature 1511533205001 21
(integer) 1
> ZADD temperature 1511533206001 22
(integer) 1
> ZADD temperature 1511533207001 21
(integer) 0
> ZRANGEBYSCORE temperature -inf +inf WITHSCORES
1) "22"
2) "1511533206001"
3) "21"
4) "1511533207001"
Notice how the third ZADD returns a 0 – this indicates that a new member was not added to the sorted set. Then, in the ZRANGEBYSCORE, we can see that the sorted set only has two entries, ..7001 and ..6001, with ..5001 missing. Why? Because ..7001 and ..5001 share the same member (21), we only updated the score for that member. Not good!
There are several ways of approaching this problem. The first is to include some sort of random data with
sufficient entropy to ensure uniqueness. Let’s examine this method. First, we’ll create a pseudo-random floating
point number between 0 (inclusive) and 1 (exclusive) then we’ll add this to our timestamp. For our example,
we’ll leave it in decimal form for readability (in a real workload, it would be smart to just convert it back to a
raw 8 bytes to save storage space).
> ZADD temperature2 1511533205001 21:1511533205001.2583
(integer) 1
> ZADD temperature2 1511533206001 22:1511533206001.941678
(integer) 1
> ZADD temperature2 1511533207001 21:1511533207001.732015
(integer) 1
> ZRANGEBYSCORE temperature2 -inf +inf WITHSCORES
1) "21:1511533205001.2583"
2) "1511533205001"
3) "22:1511533206001.941678"
4) "1511533206001"
5) "21:1511533207001.732015"
6) "1511533207001"

As you can see, all the ZADDs return 1, indicating new additions, and the ZRANGEBYSCORE returns all the values. This is a workable method; however, it is not very efficient, since the bytes wasted to ensure uniqueness add storage overhead, and for most use cases the uniqueness will just be discarded by your application. It should be noted that adding uniqueness would obviously not be needed if your data were already unique (for example, data that includes a UUID).
With this method you have access to all the sorted set commands for analysis and manipulation:

• ZRANGEBYSCORE allows you to get a specific slice between two timestamps (ZREVRANGEBYSCORE for descending order).
• ZREMRANGEBYSCORE allows for removal of a specific range of timestamps.
• ZCOUNT returns the number of items between a range of timestamps.
• ZINTERSTORE allows you to intersect two time-series data sets and save the result in a new key.
• ZUNIONSTORE allows you to combine two time-series data sets and save the result in a new key. It can also be used to duplicate a sorted set.

ZINTERSTORE and ZUNIONSTORE are multi-key operations. Care should be taken when working in a sharded environment to make sure that your new key ends up on the same shard; otherwise these commands will end in an error.
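A redis-py sketch of the uniqueness trick described above (the member format value:timestamp.random mirrors the example):

import random
import redis

r = redis.Redis(decode_responses=True)

def record(key: str, timestamp_ms: int, value: float) -> None:
    # Appending a random fraction keeps equal readings distinct members.
    member = f"{value}:{timestamp_ms + random.random()}"
    r.zadd(key, {member: timestamp_ms})

record("temperature2", 1511533205001, 21)
record("temperature2", 1511533206001, 22)
record("temperature2", 1511533207001, 21)   # no longer collides

# Slice a time range; callers take the value before the ':'.
for member in r.zrangebyscore("temperature2", 1511533205001, 1511533207001):
    value = member.split(":", 1)[0]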

Secondary indexing with Redis Redis is not exactly a key-value store, since values can be complex data
structures. However it has an external key-value shell: at API level data is addressed by the key name. It is fair to
say that, natively, Redis only offers primary key access. However since Redis is a data structures server, its
capabilities can be used for indexing, in order to create secondary indexes of different kinds, including
composite (multi-column) indexes.

This document explains how it is possible to create indexes in Redis using the following data structures:

• Sorted sets to create secondary indexes by ID or other numerical fields.
• Sorted sets with lexicographical ranges for creating more advanced secondary indexes, composite indexes, and graph-traversal indexes.
• Sets for creating random indexes.
• Lists for creating simple iterable indexes and last-N-items indexes.
Implementing and maintaining indexes with Redis is an advanced topic, so most users who need to perform complex queries on data should consider whether they are better served by a relational store. However, often, especially in caching scenarios, there is an explicit need to store indexed data in Redis in order to speed up common queries that require some form of indexing in order to be executed.

Simple numerical indexes with sorted sets

The simplest secondary index you can create with Redis uses the sorted set data type, a data structure representing a set of elements ordered by a floating-point number, the score of each element. Elements are ordered from the smallest to the highest score.

Since the score is a double-precision float, indexes you can build with vanilla sorted sets are limited to things where the indexing field is a number within a given range.

The two commands used to build these kinds of indexes are ZADD and ZRANGEBYSCORE, to respectively add items and retrieve items within a specified range.

For instance, it is possible to index a set of person names by age by adding elements to a sorted set. The element will be the name of the person and the score will be the age.
ZADD myindex 25 Manuel
ZADD myindex 18 Anna
ZADD myindex 35 Jon
ZADD myindex 67 Helen
In order to retrieve all persons with an age between 20 and 40, the following command can be used:
ZRANGEBYSCORE myindex 20 40
1) "Manuel"
2) "Jon"
By using the WITHSCORES option of ZRANGEBYSCORE it is also possible to obtain the scores associated
with the returned elements.
The ZCOUNT command can be used in order to retrieve the number of elements within a given range, without
actually fetching the elements, which is also useful, especially given the fact the operation is executed in
logarithmic time regardless of the size of the range.
Ranges can be inclusive or exclusive, please refer to the ZRANGEBYSCORE command documentation for
more information.
Note: Using the ZREVRANGEBYSCORE it is possible to query a range in reversed order, which is often useful
when data is indexed in a given direction (ascending or descending) but we want to retrieve information the
other way around.

Using object IDs as associated values

In the above example we associated names with ages. However, in general we may want to index some field of an object which is stored elsewhere. Instead of using the sorted set value directly to store the data associated with the indexed field, it is possible to store just the ID of the object.

For example, I may have Redis hashes representing users. Each user is represented by a single key, directly accessible by ID:
HMSET user:1 id 1 username antirez ctime 1444809424 age 38
HMSET user:2 id 2 username maria ctime 1444808132 age 42
HMSET user:3 id 3 username jballard ctime 1443246218 age 33
If I want to create an index in order to query users by their age, I could do:
ZADD user.age.index 38 1
ZADD user.age.index 42 2
ZADD user.age.index 33 3
This time the value associated with the score in the sorted set is the ID of the object. So once I query the index with ZRANGEBYSCORE, I'll also have to retrieve the information I need with HGETALL or similar commands. The obvious advantage is that objects can change without touching the index, as long as we don't change the indexed field.

In the next examples we'll almost always use IDs as the values associated with the index, since this is usually the sounder design, with a few exceptions.
Updating simple sorted set indexes

Often we index things which change over time. In the above example, the age of the user changes every year. In such a case it would make sense to use the birth date as the index instead of the age itself, but there are other cases where we simply want some field to change from time to time, and the index to reflect this change.

The ZADD command makes updating simple indexes a trivial operation, since re-adding an element with a different score and the same value simply updates the score and moves the element to the right position. So if the user antirez turned 39 years old, in order to update the data in the hash representing the user, and in the index as well, we need to execute the following two commands:

HSET user:1 age 39
ZADD user.age.index 39 1

The operation may be wrapped in a MULTI/EXEC transaction in order to make sure both fields are updated or neither is.
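That two-command update, sketched atomically with a redis-py transaction pipeline:

import redis

r = redis.Redis(decode_responses=True)

# Update the hash and the age index together, or not at all.
pipe = r.pipeline(transaction=True)      # wraps the commands in MULTI/EXEC
pipe.hset("user:1", "age", 39)
pipe.zadd("user.age.index", {"1": 39})   # re-adding moves the member
pipe.execute()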
Turning multi-dimensional data into linear data

Indexes created with sorted sets are able to index only a single numerical value. Because of this, you may think it is impossible to index something which has multiple dimensions using this kind of index, but actually this is not always true. If you can efficiently represent something multi-dimensional in a linear way, then it is often possible to use a simple sorted set for indexing.

For example, the Redis geo indexing API uses a sorted set to index places by latitude and longitude using a technique called geohash. The sorted set score represents alternating bits of longitude and latitude, so that we map the linear score of a sorted set to many small squares on the earth's surface. By doing an 8+1-style center-plus-neighborhoods search, it is possible to retrieve elements by radius.
6.4.1 First-in, first-out queues

The queue that we'll write only needs to send emails out in a first-come, first-served manner, and will log both successes and failures. As we talked about in chapters 3 and 5, Redis LISTs let us push and pop items from both ends with RPUSH/LPUSH and RPOP/LPOP. For our email queue, we'll push emails to send onto the right end of the queue with RPUSH and pop them off the left end of the queue with LPOP. (We do this because it makes sense visually for readers of left-to-right languages.) Because our worker processes are only going to be performing this emailing operation, we'll use the blocking version of our list pop, BLPOP, with a timeout of 30 seconds. We'll only handle item-sold messages in this version for the sake of simplicity, but adding support for sending timeout emails is also easy.
Task Priorities

Sometimes when working with queues, it's necessary to prioritize certain operations before others. In our case, maybe we want to send emails about sales that completed before we send emails about sales that expired. Or maybe we want to send password-reset emails before we send out emails for an upcoming special event. Remember that with the BLPOP/BRPOP commands, we can provide multiple LISTs from which to pop an item; the first LIST to have any items in it will have its first item popped (or last if we're using BRPOP).

Let's say that we want to have three priority levels: high, medium, and low. High-priority items should be executed if they're available. If there are no high-priority items, then items in the medium-priority level should be executed. If there are neither high- nor medium-priority items, then items in the low-priority level should be executed. Looking at our earlier code, we can change two lines to make that possible in the updated listing.
Event Queue

Redis Lists are an ordered list of strings, very much akin to the linked lists with which you may be familiar. Pushing (adding a value to a list) and popping (removing a value from a list, from either the left/head/start or right/tail/end) are very lightweight operations. As you might imagine, this is a very good structure for managing a queue: add items to the head of the list and read items out of the tail for a first-in-first-out (FIFO) queue.

Redis also provides additional features that make this type of pattern more efficient, reliable, and easy to use. There is a subset of the list commands that allows for 'blocking' behavior. The term 'blocking' applies only to a single connected client; in effect, these commands stop the client from doing anything until a list has a value (or a timeout has elapsed). This removes the need to poll Redis for results. Since the client can't do anything while waiting for values, we'll need two open clients to illustrate this:
Row  Sending Client       Blocking Client
1                         > BRPOP my-q 0
                          [waits]
2    > LPUSH my-q hello   1) "my-q"
     (integer) 1          2) "hello"
                          [ready for commands]
3                         > BRPOP my-q 0
                          [waits]

In this example, in row 1, we see that the blocking client does not immediately return anything, as the list (my-q) does not have any values in it. The last argument is the timeout; in this case the zero means that it will never time out and will wait forever. In the second row, the sending client issues an LPUSH to the my-q key, and immediately the other client ends its blocking. On the third row we can issue another BRPOP command (usually accomplished with a loop in your client language) and wait for any further list values. You can get out of blocking in redis-cli with ctrl-c.

Let’s invert this example and see how BRPOP works with a non-empty list:
Row  Sending Client        Blocking Client
1    > LPUSH my-q hello
     (integer) 1
2    > LPUSH my-q hej
     (integer) 2
3    > LPUSH my-q bonjour
     (integer) 3
4                          > BRPOP my-q 0
                           1) "my-q"
                           2) "hello"
5                          > BRPOP my-q 0
                           1) "my-q"
                           2) "hej"
6                          > BRPOP my-q 0
                           1) "my-q"
                           2) "bonjour"
7                          > BRPOP my-q 0
                           [waits]
In rows 1-3 we push three values into the list, and we can see the response growing (representing the number of items in the list). Row 4, despite issuing a BRPOP command, returns the value immediately. Why? Because the blocking behavior applies only if there are no items in the queue. We can see the same immediate response in rows 5-6 because the command is working through each item in the queue. In row 7, BRPOP encounters an empty queue and blocks until items are added to the queue.
Queues often represent some sort of job that needs to be accomplished in another process (a worker). Critical in this type of workload is that if a worker has a fault and dies for some reason during processing, the job is not lost. Redis can support this type of queue as well. Instead of using BRPOP, substitute BRPOPLPUSH. BRPOPLPUSH (what a mouthful) waits for a value in one list and, once it's got one, pushes it to another list. This is all accomplished atomically, so it's not possible for two workers to remove/take the same value. Let's take a look at how this would work:
Row  Sending Client       Blocking Client
1                         > LINDEX worker-q 0
                          (nil)
2                         [If the result of row 1 is not nil, do something with it; otherwise jump to row 4]
3                         > LREM worker-q -1 [value from row 1]
                          (integer) 1
                          [loop back to row 1]
4                         > BRPOPLPUSH my-q worker-q 0
                          [waits]
5    > LPUSH my-q hello   "hello"
                          [ready for commands]
6                         [do something with "hello"]
7                         > LREM worker-q -1 hello
                          (integer) 1
8                         [loop back to row 1]

In rows 1 and 2, we're not doing anything yet, since worker-q is empty. If something came out of worker-q, we would process it and remove it, then jump back to row 1 to see if anything else is in the queue. This way we clear the worker queue first, doing whatever jobs already exist. In row 4 we wait until a value is added to my-q, and when we get a value, it's atomically added to worker-q. Next, we do some type of non-Redis operation on "hello"; when we're done, we remove one instance from worker-q with LREM and loop back to row 1.

The real key is that if the process dies during the operation in row 6, we still have the job in worker-q. Upon restart of the process, we'll immediately clear any jobs that haven't been removed by row 7. This pattern greatly reduces the possibility that jobs will be lost. It is possible, however, that a job could be processed twice, but only if the worker dies after processing a job and before removing it from worker-q (between rows 2 and 3, or rows 6 and 7), which is unlikely, but it would be a best practice to account for this circumstance in your worker logic.

What is a leaderboard?

The concept of a leaderboard, a scoreboard showing the ranked names and current scores (or other data points) of the leading competitors, is essential to the world of computer gaming, but leaderboards are now about more than just games. They are about gamification, a broader implementation that can include any group of people with a common goal (coworkers, students, sales groups, fitness groups, volunteers, and so on).

Leaderboards can encourage healthy competition in a group by openly displaying the current ranking of each group member. They also provide a clear way to view the ongoing achievements of the entire team as members move towards a goal.

There are two types of leaderboards:

Absolute leaderboards rank all competitors by some global measure. Typically these display the top-ranked members of the group, such as a Top 10.

Relative leaderboards rank participants in relation to different facets of the data, in such a way that members are grouped according to more narrow or relative criteria. This may require complex calculations to slice the data in numerous ways. A common gaming scenario, for example, is a view that shows the ranking of a given competitor and the competitors just above and below them.
Technical challenges posed by leaderboards include:

• Massive scale across millions of users
• Mathematical computations on a large number of attributes (analyzing the data in numerous ways to obtain different views of the data)
• Providing real-time leaderboard access with high availability
• Allowing users to share their leaderboard stats across social media
• Allowing users to receive notifications as the attributes they are interested in on the leaderboard change
• Allowing applications to update leaderboards in a fully distributed manner across the globe, where the actions are taking place, while also delivering a global view of the leaderboard's status from any location
Providing this data in real time and keeping the system available is beyond the scope of many web technologies.
However, this is a challenge that Redis Enterprise solves with data structures built for use cases like these, and
with the variety of deployment options that Redis Enterprise provides.

How to create a leaderboard Let’s take a quick high-level look at how to implement a leaderboard in Node.js
alongside a previously existing web app. With Node Package Manager (NPM), it’s easy to add Redis to your
web app using the simple command npm install redis.
Once the Redis Node packages are installed into your web app project, you can access Redis functionality via a
JavaScript API. (The official docs at the node_redis Github repository can help you get started.)
To demonstrate, let’s create a simple in-memory database using the Sorted Set. We will create members named
player:[uniqueId], where uniqueId is an integer value that could easily be generated by your JavaScript or
Python code when a user joins the competition.
The score can be any numeric data you want to use to rank the players (daily steps in a company health program,
aliens shot down in a computer game and so on).
[The original article shows the sample player data and a Node.js snippet for displaying it here.]
Use a Hash to store multiple values

You can create a dataset that can be sliced by numerous variables. To do so, it's helpful to store data in a structure that represents each competitor. Redis provides just such a structure, called a Hash. A Hash can hold numerous name-value pairs that are associated with one key.
You can use a simple numeric key as a unique identifier in your hash, and then associate that unique key with a
Sorted Set, which will contain the score and the key. In this way, you can quickly obtain the score for your top
competitors or a range of competitors. Then, if you want more data, you can easily obtain it from the Hash using
the key value stored in the Sorted Set. Creating a Redis Hash is easy. This Hash, named allPlayers, uses the
following format:
hset [ unique id (to identify the hash)] [property name] [property value] ...
Next, create a new Hash named with a key of player:100 and add a screenName property that has the value Fred.
You could just make the hash key 100, but using the format of [stringID:IntegerID] makes it a bit more readable.
When you add another player, you’ll create a new Hash key, like player:101.
hset player:100 screenName Fred
If you want to retrieve all the properties and values (name-value pairs) stored for a particular hash, simply use
this command:
hgetall player:100
You can see that there is one name-value pair at this time.
1) "screenName"
2) "Fred"
The Hash is a flexible structure and it is easy to add properties and values dynamically.
Imagine that you want to save the date the player last logged in:
hset player:100 lastLoggedIn 2019-07-30
Now when you call hgetall again you see:
1) "screenName"
2) "Fred"
3) "lastLoggedIn"
4) "2019-07-30"
It is just a matter of adding each user to your allPlayers Hash with its own unique ID. Then you can associate those with a Sorted Set that contains each player's score. Once you add the Hashes (player:NNN), you have your list, and you can leverage those player data keys by using them when you add data to the Sorted Set. This is how you leverage the power of the Redis in-memory database to work with huge datasets (millions of players!) that track the ranking of each player while staying amazingly fast.
Now you can easily implement a solution that pulls the data using Node and the node_redis package so that you
can keep the leaderboard fresh on your web app. This work is easy using the node_redis package API, which
allows you to pull back the Sorted Set by name (playerRank).
Redis Enterprise is essential for keeping your leaderboard fresh and your users coming back to see their
rankings.
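A compact Python sketch of the hash-plus-sorted-set pairing described above, using redis-py (the playerRank key follows the article; the scores and second player are illustrative):

import redis

r = redis.Redis(decode_responses=True)

# One hash per player for profile data...
r.hset("player:100", mapping={"screenName": "Fred", "lastLoggedIn": "2019-07-30"})
r.hset("player:101", mapping={"screenName": "Cathy"})

# ...and one sorted set ranking everyone by score.
r.zadd("playerRank", {"player:100": 4200, "player:101": 5100})

# Top 10, highest score first, enriched from the hashes.
for key, score in r.zrevrange("playerRank", 0, 9, withscores=True):
    profile = r.hgetall(key)
    print(profile.get("screenName"), int(score))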
