Backporting into Redis 2.4 and other news


Monday, 18 April 11

I think I should write more about Redis development... lately I was so focused on writing code and
the Redis Book that finding the time to blog about Redis was really hard, but I'll try to improve in the
next weeks. Today, however, I want to share some fresh news with Redis users: some insight into the
near future of a project can be very useful for developers planning to start a new project
with Redis.

Currently we have three development branches: 2.2, 2.4, and Redis Cluster (unstable).

2.2 is a bugfix only development line, so we'll continue to ship 2.2.x versions only to fix bugs.

2.4 is our new branch, it is just a few days old. Our old development model with a stable branch and an
unstable branch did not work well, we needed something in the middle. There is simply a lot of stuff
that can be backported from the unstable branch.

The unstable branch, where Redis Cluster development is happening, will take time to reach stability as
the cluster is a big project (our idea is to release a first stable version of Redis Cluster later this
summer). 2.4 is a way to put something better than 2.2 in the hands of our users ASAP.

We hope to ship 2.4 in an estimated time frame of 6 weeks. It will include the following changes
compared to 2.2:

- Memory optimized sorted sets. This means that small sorted sets will take little memory, like small
  hashes, lists, and sets composed of integers already do.
- Variadic versions of [LR]PUSH, SADD, ZADD, ... so you can, for instance, push multiple values
  inside a list with a single command. I measured the difference with a few benchmarks and it
  is really dramatic compared to pipelining many LPUSH commands.
- Big improvements in .rdb persistence. Now specially encoded types are saved directly as they
  are. Just to give you an example, if you have a dataset composed of lists with an average of 100
  elements you can expect 50x faster .rdb persistence.
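
To get a feeling for where part of the variadic saving comes from, here is a minimal Python sketch that encodes commands in the Redis request protocol and compares the bytes sent by 100 pipelined LPUSH commands with a single variadic LPUSH. The helper name encode_command and the key mylist are made up for the example, and bytes on the wire are only one part of the cost (the server also has fewer commands to dispatch and fewer replies to send), so this is not a substitute for a real benchmark.

```python
def encode_command(*args):
    """Encode one command in the Redis multi-bulk request format."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

# Hypothetical payload: 100 small values for a list called "mylist".
values = ["value:%d" % i for i in range(100)]

# 100 separate LPUSH commands sent back to back in one pipeline.
pipelined = b"".join(encode_command("LPUSH", "mylist", v) for v in values)

# A single variadic LPUSH carrying all 100 values.
variadic = encode_command("LPUSH", "mylist", *values)

print(len(pipelined), len(variadic))
```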

All the above stuff is already inside Redis unstable of course, but with 2.4 it will be readily available to all
the users in a short time. The current 2.4 branch only includes the first two changes, I'm working on
merging the last one.

How to play with Redis Cluster

We also have some news about Redis Cluster. You can try with your own hands what we already have. The
following is a howto about testing Redis Cluster. Note: Redis Cluster is not complete, it is currently
an alpha with a lot of missing features, and it is not stable. Here the goal is just to provide a preview.

To play with Redis Cluster fire up three instances with the following configuration:

port 6379
cluster-enabled yes
cluster-config-file nodes-1.conf

Use port 6379 for the first instance, 6380 and 6381 for the other two. Also make sure to use a different
cluster-config-file name for each one: nodes-1.conf, nodes-2.conf, nodes-3.conf. The cluster config file is
not something you should change by hand, it is a file where a cluster node saves its current configuration
in order to reload the state at restart.
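
If you prefer to generate the three configuration files instead of writing them by hand, a small Python sketch could look like the following. The file names redis-1.conf, redis-2.conf, redis-3.conf and the function name are my own choice for the example; only the three directives come from the configuration above.

```python
import os
import tempfile

def write_cluster_configs(directory, first_port=6379, count=3):
    """Write one minimal config file per instance and return the paths."""
    paths = []
    for i in range(count):
        # File name is an assumption made for this example.
        path = os.path.join(directory, "redis-%d.conf" % (i + 1))
        with open(path, "w") as f:
            f.write("port %d\n" % (first_port + i))
            f.write("cluster-enabled yes\n")
            f.write("cluster-config-file nodes-%d.conf\n" % (i + 1))
        paths.append(path)
    return paths

# Example: write the files into a scratch directory and show them.
paths = write_cluster_configs(tempfile.mkdtemp())
for path in paths:
    print(path)
    with open(path) as f:
        print(f.read())
```

Each instance can then be started by passing its own file to redis-server.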

Now that you have three instances running you can start running some commands:

redis> connect 6379

redis> cluster info
redis> cluster nodes
0c2f029a52bec8d17c43b137c74205fade1b1921 :0 myself - 0 0 disconnected

As you can see this node only knows about a single node, that is, itself. You can see this from the
"myself" flag in the cluster nodes output. The cluster info output instead shows that, out of the 4096
hash slots in which the key space is divided, none is assigned. This is why this node will not be
happy to reply to queries:

redis> get foo

(error) ERR The cluster is down. Check with CLUSTER INFO for more information

So the first thing to do is to join the cluster, that is, make nodes aware that there are other nodes
around, as this is a completely new cluster.

As a first step we join the instance running at 6379 with the instance running at 6380:

redis> connect 6379

redis> cluster meet 6380
redis> cluster nodes
0c2f029a52bec8d17c43b137c74205fade1b1921 :0 myself - 0 0 disconnected
96fad8c3b4df5f86ac4abe6205a253c640c751ef master - 1303136527 1303136527 connected

As you can see now 6379 knows about 6380, and this is true for 6380 as well of course, as the two
nodes did a handshake:

redis> connect 6380

redis> cluster nodes
0c2f029a52bec8d17c43b137c74205fade1b1921 master - 1303136590 1303136590 connected
96fad8c3b4df5f86ac4abe6205a253c640c751ef :0 myself - 0 0 disconnected

I can already see on your face the "WTF do these fields mean" expression... so every line of 'cluster nodes'
is composed of the following fields, from left to right:


Every node has an ID that will be used for the whole life of the node. All this info is saved in the
nodes.conf file. The format of this file is exactly the same as the cluster nodes output, as I was too lazy
to invent something new, but this actually turned out to be an advantage (less code, more descriptive info).
Now Redis Cluster nodes are like bored old ladies, they gossip a lot about other nodes. But the good
thing is that at least cluster nodes are very well informed, and only report information they are pretty
sure about ;)

Every second every node sends a PING packet to another node. Actually this node is not selected
completely at random, but among the nodes that are believed to be OK, picking the one with the oldest
pong_received field in the node structure, so we tend to ping the nodes we have not chatted with for
the longest time.
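
The selection rule just described can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual cluster code: the Node class, the failing flag, and the function name are hypothetical, and only the pong_received field name comes from the text above.

```python
import time
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    pong_received: float   # unix time of the last PONG we got from this node
    failing: bool = False  # True if we currently believe the node is down

def pick_ping_target(nodes):
    """Return the healthy node we have not heard from for the longest time."""
    candidates = [n for n in nodes if not n.failing]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.pong_received)

now = time.time()
nodes = [
    Node("a", pong_received=now - 1),
    Node("b", pong_received=now - 30, failing=True),  # skipped: not OK
    Node("c", pong_received=now - 5),
]
print(pick_ping_target(nodes).name)  # prints: c
```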

In every PING packet, and in the PONG reply, there is a gossip section where we tell the other node
what we know about other nodes. Also, when a node pings or pongs another node, the packet carries a lot of
detailed information about the node sending it.

For a node to be marked as failing we need both to detect that it did not reply to our pings for some
time, AND to learn from the gossip section that another node has trouble with this node. When this
happens the node marks this other node as failing, and sends a "mark-as-failed"
message to all the other known nodes.
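
As a rough Python sketch of this rule, under the assumption that "some time" is a fixed timeout in seconds (the value 15 below is invented for the example, as are all the function and parameter names):

```python
def should_mark_failing(now, last_pong, timeout, gossip_reports_trouble):
    """Both conditions must hold: our own timeout AND a report from a peer."""
    no_reply_for_too_long = (now - last_pong) > timeout
    return no_reply_for_too_long and gossip_reports_trouble

def on_check(node_id, now, last_pong, timeout, gossip_reports_trouble, peers):
    """Return the messages to send, as (peer, message, node_id) tuples."""
    if not should_mark_failing(now, last_pong, timeout, gossip_reports_trouble):
        return []
    return [(peer, "mark-as-failed", node_id) for peer in peers]

# Our pings timed out and a peer agrees: broadcast to every known node.
print(on_check("n3", now=100, last_pong=80, timeout=15,
               gossip_reports_trouble=True, peers=["n1", "n2"]))

# Our pings timed out but nobody else complained: do nothing yet.
print(on_check("n3", now=100, last_pong=80, timeout=15,
               gossip_reports_trouble=False, peers=["n1", "n2"]))
```

Requiring a second opinion means a single node with a bad link cannot declare a healthy node as failing on its own.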

Let's test gossip in practice. Now we have 6379 joined with 6380. What happens if we join 6380 with
6381 is that 6379 and 6381 will also meet. But Redis nodes are like girls from good families, they only
trust and meet with other nodes that are either already trusted (in their nodes table) or trusted by their
friends. The only way to make a Redis node talk with another node that is neither in its known nodes list
nor in the known nodes of another trusted node is via the CLUSTER MEET command.
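
A toy Python model of this trust rule may help. It is a sketch under heavy simplification: nodes are identified by port only, a gossip round is a plain function call, and learning about a node stands in for the handshake that follows. The class and method names are hypothetical.

```python
class Node:
    def __init__(self, port):
        self.port = port
        self.known = {port}          # every node knows itself

    def meet(self, other):
        """CLUSTER MEET: the admin explicitly introduces two nodes."""
        self.known.add(other.port)
        other.known.add(self.port)

    def receive_gossip(self, sender):
        """Accept gossip only from a sender already in our nodes table."""
        if sender.port not in self.known:
            return                   # unknown sender: ignore its gossip
        self.known |= sender.known

def gossip_round(nodes):
    """Every node gossips once with every other node."""
    for sender in nodes:
        for receiver in nodes:
            if receiver is not sender:
                receiver.receive_gossip(sender)

a, b, c = Node(6379), Node(6380), Node(6381)
a.meet(b)            # 6379 and 6380 are already joined
c.meet(b)            # now 6381 meets 6380
gossip_round([a, b, c])
print(sorted(a.known), sorted(c.known))

# A stranger nobody was introduced to learns nothing, and nobody learns it.
d = Node(7000)
gossip_round([a, d])
print(sorted(d.known))
```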

redis> connect 6381

redis> cluster meet 6380
redis> connect 6379
redis> cluster nodes
8f1e863160f2627108451d0a0155127e8b1b4597 noflags - 1303137505 1303137505 connected
0c2f029a52bec8d17c43b137c74205fade1b1921 :0 myself - 0 1303137500 disconnected
96fad8c3b4df5f86ac4abe6205a253c640c751ef master - 1303137505 1303137505 connected

Now all three nodes are connected and aware of their friends... however the nodes are still not able
to reply to queries as hash slots are not assigned at all. To assign hash slots we need to send
"CLUSTER ADDSLOTS" commands. We assign part of the 4096 slots to each node, so that all the
slots will be covered:

$ echo '(0..1000).each{|x| puts "CLUSTER ADDSLOTS "+x.to_s}' | ruby | redis-cli -p 6379 > /dev/null
$ echo '(1001..2500).each{|x| puts "CLUSTER ADDSLOTS "+x.to_s}' | ruby | redis-cli -p 6380 > /dev/null
$ echo '(2501..4095).each{|x| puts "CLUSTER ADDSLOTS "+x.to_s}' | ruby | redis-cli -p 6381 > /dev/null

(note: actually CLUSTER ADDSLOTS can accept any number of hash slots as parameters, but redis-cli
does not work well with huge command lines, so we send a command for every hash slot).
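
For comparison, here is a Python sketch that splits the 4096 slots into nearly equal contiguous ranges and emits ADDSLOTS commands carrying several slots each. The chunk size of 100 slots per command is an arbitrary assumption I did not test against redis-cli, and the function names are made up for the example.

```python
TOTAL_SLOTS = 4096

def split_slots(node_count, total=TOTAL_SLOTS):
    """Split 0..total-1 into node_count contiguous, nearly equal ranges."""
    ranges = []
    start = 0
    for i in range(node_count):
        # The first (total % node_count) nodes get one extra slot.
        size = total // node_count + (1 if i < total % node_count else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

def addslots_commands(slots, per_command=100):
    """Yield CLUSTER ADDSLOTS lines carrying at most per_command slots each."""
    slots = list(slots)
    for i in range(0, len(slots), per_command):
        chunk = slots[i:i + per_command]
        yield "CLUSTER ADDSLOTS " + " ".join(str(s) for s in chunk)

for port, slots in zip((6379, 6380, 6381), split_slots(3)):
    commands = list(addslots_commands(slots))
    print(port, slots[0], slots[-1], len(commands))
```

The generated lines for each node could then be piped into redis-cli -p with that node's port, in the same way as the Ruby one-liners above.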

Ok now we should have a much more interesting cluster. Let's try to ask some node about how things
are going:

redis> connect 6379

redis> cluster nodes
8f1e863160f2627108451d0a0155127e8b1b4597 master - 1303138327 1303138327 connected 2501-4095
0c2f029a52bec8d17c43b137c74205fade1b1921 :0 myself - 0 1303138326 disconnected 0-1000
96fad8c3b4df5f86ac4abe6205a253c640c751ef master - 1303138326 1303138326 connected 1001-2500
redis> cluster info

Yes! Now our cluster state is OK. As you can see, at the end of every line of the cluster nodes output there
is the list of assigned slots. This information is all propagated thanks to the gossip section of PING/PONG
packets. We are ready to try some actual queries:

redis> get foo

(error) MOVED 3990
redis> get bar

Now the nodes finally accept our requests. The first request was about hash slot 3990, as the key 'foo'
hashes to that slot, so we got routed to the right node. A good client will remember this and will
directly hit the right node the next time.
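
To make the "good client" idea concrete, here is a minimal Python sketch. It assumes that the hash slot is crc16(key) % 4096 with the XMODEM variant of CRC16 (this is what the comments to this post use), and that a redirection has the form MOVED <slot> <host>:<port>. The SlotCache class and the addresses are hypothetical.

```python
def crc16_xmodem(data):
    """Bitwise CRC16, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key, slots=4096):
    return crc16_xmodem(key.encode()) % slots

class SlotCache:
    """Remembers which node owns a slot after a MOVED redirection."""
    def __init__(self):
        self.owner = {}              # slot number -> "host:port"

    def handle_error(self, message):
        """Assumed reply format: 'MOVED <slot> <host>:<port>'."""
        parts = message.split()
        if len(parts) == 3 and parts[0] == "MOVED":
            self.owner[int(parts[1])] = parts[2]
            return True
        return False

    def node_for(self, key, default):
        """Return the remembered owner of the key's slot, else the default."""
        return self.owner.get(hash_slot(key), default)

cache = SlotCache()
print(hash_slot("foo"))                                 # prints: 3990
cache.handle_error("MOVED 3990 127.0.0.1:6381")         # address is made up
print(cache.node_for("foo", default="127.0.0.1:6379"))  # prints: 127.0.0.1:6381
```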

Ok, that's all for now. I hope that, while I can't show a full solution yet, this journey through the status
of Redis Cluster was more interesting than just reading my tweets about "I'm working at cluster".

Also note that to operate on a cluster you'll actually never do this kind of stuff by hand. The redis-trib
program will do all this for you, but my thought was that it is a lot less instructive to just type 'redis-trib
create ...'. I wanted to show a bit more of the inner workings.
Posted at 14:55:43 | 6 comments


Matthew Frazier writes: 1

18 Apr 11, 15:49:52

How is progress on Diskstore coming? Will that be in 2.4? I know there are a lot of people
(myself included) who are interested in it.

antirez writes: 2
18 Apr 11, 15:51:42

@Matthew: not in 2.4, probably not even in 3.0 (redis cluster stable release number) as we
consider cluster higher priority. Basically diskstore is just an experimental project, it will hit
a stable release only if/when we think it rocks. I'm a bit skeptical about mixing Redis and disk
as primary storage (not just for persistence) but we'll keep trying new solutions.

ariso writes: 3
19 Apr 11, 18:51:07

diskstore is really useful for embedded/desktop applications. Could you please consider giving it
higher priority?

Dean writes: 4
19 Apr 11, 20:04:26

I am very excited to start testing with Cluster. Thank you for your time working on it, and for
posting an update on your progress! Since Hiredis is the "official" client library for Redis, are
there plans to evolve it from being a 'naïve' client to a 'full-featured' client as far as Cluster
support is concerned? (Referencing previous Cluster terminology where a naïve client
requires two round trips for a lookup (using the MOVED response to find the key) and a
full-featured client will maintain (and update) a map of keys to hash slots.)

Willp writes: 5
19 Apr 11, 21:08:08

Salvatore, thank you! You are doing terrific work! Redis should have been born decades ago.

Willp writes: 6
19 Apr 11, 21:26:31

One small improvement in assigning hash buckets would be to round-robin the hash buckets
across the nodes in your cluster. Using the sample hash bucket division of hash slots will end
up with a pretty uneven distribution of keys.

The distribution is not very uniform for values of crc16() % 4096 on strings that only differ by
one or two ascii values. Strings that are generated sequentially will tend to have similar
values. If instead the hash slots are initially assigned in round-robin style (give hashslot mod
TotalNodes == NodeNumber to node numbered as NodeNumber), then there will be better
distribution of keys across nodes. If your keys end in the same string and differ more on the
leftmost characters, then the distribution is a lot better. Even better distribution would of course
be a random shuffle, though harder to manage/recover. I know, it's still early days for Redis
Cluster, and I am hoping for the best!

Python code to demonstrate:

>>> from crc16pure import * # from

>>> for x in range(10): print 'crc16(%s): %d' % ('abc:%d' % x,
(crc16xmodem('abc:' + str(x)) % 4096))
crc16(abc:0): 2305
crc16(abc:1): 2336
crc16(abc:2): 2371
crc16(abc:3): 2402
crc16(abc:4): 2437
crc16(abc:5): 2468
crc16(abc:6): 2503
crc16(abc:7): 2534
crc16(abc:8): 2057
crc16(abc:9): 2088

(All but two of these consecutive strings would hash to slots assigned to the middle node on
port 6380. CRC16 is cheap but doesn't perturb the bits much on sequential input.)

Round robining the slots (mod 3) in this case would give better (but not great) distribution:
>>> for x in range(10): print 'crc16("%s"): %d and mod 3 is: %d' % ( ('abc:%d' % x),
(crc16xmodem('abc:' + str(x)) % 4096), (crc16xmodem('abc:' + str(x)) % 4096) % 3 )
crc16("abc:0"): 2305 and mod 3 is: 1
crc16("abc:1"): 2336 and mod 3 is: 2
crc16("abc:2"): 2371 and mod 3 is: 1
crc16("abc:3"): 2402 and mod 3 is: 2
crc16("abc:4"): 2437 and mod 3 is: 1
crc16("abc:5"): 2468 and mod 3 is: 2
crc16("abc:6"): 2503 and mod 3 is: 1
crc16("abc:7"): 2534 and mod 3 is: 2
crc16("abc:8"): 2057 and mod 3 is: 2
crc16("abc:9"): 2088 and mod 3 is: 0

So, 1 slot went to node 0, 4 slots to node 1, and 5 slots to node 2. Better to have (1,4,5) than
(0, 8, 2).
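
The comparison in this comment can be reproduced with a short Python 3 sketch. It takes the ten slot numbers exactly as listed in the output above (it does not recompute the CRC) and counts how many land on each node under the contiguous ranges used in the post and under a round-robin assignment. The function names are made up for the example.

```python
from collections import Counter

# Slot numbers for 'abc:0' .. 'abc:9', copied from the output above.
slots = [2305, 2336, 2371, 2402, 2437, 2468, 2503, 2534, 2057, 2088]

# Contiguous ranges used in the post, for node 0, node 1, and node 2.
ranges = [range(0, 1001), range(1001, 2501), range(2501, 4096)]

def node_by_range(slot):
    for node, r in enumerate(ranges):
        if slot in r:
            return node
    raise ValueError("slot %d is not assigned" % slot)

def node_round_robin(slot, node_count=3):
    return slot % node_count

def distribution(assign):
    """Count keys per node, returned as a tuple indexed by node number."""
    counts = Counter(assign(s) for s in slots)
    return tuple(counts.get(node, 0) for node in range(3))

print(distribution(node_by_range))      # prints: (0, 8, 2)
print(distribution(node_round_robin))   # prints: (1, 4, 5)
```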



