Scaling Realtime at DISQUS

Adam Hitchcock @NorthIsUp
Sunday, 17 March, 13

If this is interesting to you... we’re hiring: disqus.com/jobs

what is DISQUS?


why do realtime?
- getting new data to the user asap
- for increased engagement
- and it looks awesome
- and we can sell (or trade) it

http://map.labs.disqus.com
http://github.com/NorthIsUp/orbital2

DISQUS sees a lot of traffic
(Google Analytics: Feb 2013 - March 2012)

realertime
- currently active on all DISQUS sites
- tested ‘dark’ on our existing network
- during testing:
  - 1.5 million concurrently connected users
  - 45 thousand new connections per second
  - 165 thousand messages/second
  - <.2 seconds latency end to end

so, how did we do it?

Node.js and MongoDB!


This is PyCon. We used Python.

and some other Technology You Know™

- thoonk redis queue
- some python glue
- nginx push stream
- and long(er) polling

architecture overview


old-june
Diagram: DISQUS -> New Posts -> memcache
DISQUS embed clients poll memcache every 5 seconds

june-july
Diagram: DISQUS -> New Posts -> redis pub/sub -> Flask FE cluster -> HA Proxy -> DISQUS embed clients

july-october
Diagram: DISQUS -> New Posts -> redis queue -> “python glue” Gevent servers -> redis pub/sub -> Flask FE cluster -> HA Proxy -> DISQUS embed clients

august-october
Diagram: DISQUS -> New Posts -> redis queue -> “python glue” Gevent servers (2 servers) -> redis pub/sub (6 servers) -> Flask FE BIG cluster (14 servers) -> HA Proxy (5 servers) -> DISQUS embed clients

august-october
lots of servers. we can do better

october-now
Diagram: DISQUS -> New Posts -> redis queue -> “python glue” Gevent server -> http post -> nginx pub endpoint -> nginx + push stream module -> DISQUS embed clients

october-now
Server counts: 2 for the “python glue” Gevent server, 5 for nginx + push stream module.
Why still 5 for this? Network memory restriction, we can’t fix this without kernel hacking, tweaking, etc. (if you know how, tell us, then apply for a job, then fix it for us)

october-now
Diagram: django -> New Posts -> thoonk queue -> Formatter -> Publishers -> http post -> nginx pub endpoint -> nginx + push stream module -> DISQUS embed clients
Publishers -> other realtime stuff


the thoonk queue
- django post_save and post_delete hooks
- thoonk is a queue on top of redis
- implemented as a DFA
- provides job semantics
  - useful for end to end acking
  - reliable job processing in distributed system
- uses zset to store items == ranged queries
- did I mention it’s on top of redis?
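To make "job semantics" concrete, here is a minimal in-memory sketch of a queue whose jobs move between states: available, claimed, then finished or retried. It is not the thoonk API. The class name, method names, and the idea of keying claimed jobs by claim time are assumptions for illustration only.

```python
import time


class JobQueue:
    """In-memory stand-in for a queue with job semantics (NOT the thoonk API)."""

    def __init__(self):
        self._next_id = 0
        self.payloads = {}   # job id -> payload
        self.available = []  # job ids waiting for a worker
        self.claimed = {}    # job id -> claim time (in redis this could be a zset score)

    def put(self, payload):
        """Publish a job: it enters the 'available' state."""
        self._next_id += 1
        job_id = self._next_id
        self.payloads[job_id] = payload
        self.available.append(job_id)
        return job_id

    def get(self, now=None):
        """Claim the oldest available job: available -> claimed."""
        if not self.available:
            return None
        job_id = self.available.pop(0)
        self.claimed[job_id] = time.time() if now is None else now
        return job_id, self.payloads[job_id]

    def finish(self, job_id):
        """Ack a job end to end: claimed -> finished (removed)."""
        del self.claimed[job_id]
        del self.payloads[job_id]

    def retry_stalled(self, older_than):
        """Ranged query by claim time: stalled claimed jobs go back to available."""
        stalled = sorted(j for j, ts in self.claimed.items() if ts <= older_than)
        for job_id in stalled:
            del self.claimed[job_id]
            self.available.append(job_id)
        return stalled
```

A worker that crashes between `get` and `finish` leaves its job in `claimed`, so a periodic `retry_stalled` call can hand it to another worker. That is the property that makes end to end acking possible.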


the python glue
Formatter
- listens to a thoonk queue
- cleans & formats message
  - this is the final format for end clients
  - compress data now
Publishers
- publish message to nginx and other firehoses
  - forum:id, thread:id, user:id, post:id
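The channel naming can be sketched as follows. The field names on the post (`forum`, `thread`, `user`, `id`) are hypothetical, and compact JSON serialization stands in for whatever "compress data now" meant in production.

```python
import json


def channels_for(post):
    """Return the channel names a new post would be published on."""
    return [
        "forum:%s" % post["forum"],
        "thread:%s" % post["thread"],
        "user:%s" % post["user"],
        "post:%s" % post["id"],
    ]


def format_message(post):
    """Serialize once, compactly, in the final client-facing format."""
    return json.dumps(post, sort_keys=True, separators=(",", ":"))


# Hypothetical post record, using identifiers from the curl examples in this deck.
post = {"id": 42, "forum": "cnn", "thread": 907824578, "user": "northisup"}
```

Formatting once in the glue layer means every publisher sends the same bytes, and the front end never has to transform a message per subscriber.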

gevent is nice

    # the code is too big to show here, so just import it
    # http://bitly.com/geventspawn
    from realertime.lib.spawn import Watchdog
    from realertime.lib.spawn import TimeSensitiveBackoff

data pipelines

    class Pipeline(object):
        # Each stage is supplied by a mixin; the base class only wires them together.
        def parse_data(self, data):
            raise NotImplementedError('No ParserMixin used')

        def compute_data(self, data, parsed_data):
            raise NotImplementedError('No ComputeMixin used')

        def publish_data(self, data, parsed_data, computed_data):
            raise NotImplementedError('No PublisherMixin used')

        def handle(self, data):
            parsed_data = self.parse_data(data)
            computed_data = self.compute_data(data, parsed_data)
            return self.publish_data(data, parsed_data, computed_data)

Example Mixins

    import json
    import urllib2

    class JSONParserMixin(Pipeline):
        def parse_data(self, data):
            return json.loads(data)

    class AnnomizeDataMixin(Pipeline):
        # compute step: drop everything
        def compute_data(self, data, parsed_data):
            return {}

    class SuperSecureEncryptDataMixin(Pipeline):
        # compute step: rot13 (Python 2 codec)
        def compute_data(self, data, parsed_data):
            return parsed_data.encode('rot13')

    class HTTPPublisherMixin(Pipeline):
        def publish_data(self, data, parsed_data, computed_data):
            u = urllib2.urlopen(self.dat_url, computed_data)
            return u

    class FilePublisherMixin(Pipeline):
        def publish_data(self, data, parsed_data, computed_data):
            with open(self.output, 'a') as f:
                f.write(computed_data)

Finished Pipeline

    class JSONAnnonHTTPPipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONSecureHTTPPipeline(
            JSONParserMixin,
            SuperSecureEncryptDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONAnnonFilePipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            FilePublisherMixin):
        pass
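A self-contained Python 3 sketch of the same composition idea, runnable without a network. `DropAuthorMixin`, `ListPublisherMixin` and the message fields are hypothetical stand-ins, not DISQUS code. The in-memory publisher is there only so the result can be inspected.

```python
import json


class Pipeline:
    """Base class: wires parse -> compute -> publish; mixins fill in each stage."""

    def parse_data(self, data):
        raise NotImplementedError("No ParserMixin used")

    def compute_data(self, data, parsed_data):
        raise NotImplementedError("No ComputeMixin used")

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError("No PublisherMixin used")

    def handle(self, data):
        parsed_data = self.parse_data(data)
        computed_data = self.compute_data(data, parsed_data)
        return self.publish_data(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)


class DropAuthorMixin(Pipeline):
    """Hypothetical compute step: strip the 'author' field."""

    def compute_data(self, data, parsed_data):
        return {k: v for k, v in parsed_data.items() if k != "author"}


class ListPublisherMixin(Pipeline):
    """Hypothetical publisher: collect messages in memory instead of sending them."""

    def __init__(self):
        self.published = []

    def publish_data(self, data, parsed_data, computed_data):
        self.published.append(computed_data)
        return computed_data


class JSONDropAuthorListPipeline(JSONParserMixin, DropAuthorMixin, ListPublisherMixin):
    pass


pipeline = JSONDropAuthorListPipeline()
result = pipeline.handle('{"author": "adam", "text": "hi"}')
```

Because every mixin subclasses `Pipeline`, Python's method resolution order picks the first mixin that overrides each stage, and a missing stage fails loudly with `NotImplementedError`.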

real live DISQUS code

    class FEOrbitalNginxMultiplexer(
            SchemaTransformerMixin,
            JSONFormatterMixin,
            SelfChannelsMixin,
            HTTPPublisherMixin):
        def __init__(self, domains, api_version=1):
            schema_namespace = 'orbital'
            self.channels = ('orbital', )
            super(FEOrbitalNginxMultiplexer, self).__init__(domains=domain

    class FEPublicAckingMultiplexer(
            PublicTransformerMixin,
            JSONFormatterMixin,
            FEChannelsMixin,
            ThoonkQueuePubSubPublisherMixin):
        def __init__(self, domains, api_version):
            schema_namespace = 'general'
            super(FEPublicAckingMultiplexer, self).__init__(domains=domain


nginx push stream
- follow John Watson (@wizputer) for updated #humblebrags as we ramp up traffic
- an example config can be found here: http://bit.ly/disqus-nginx-push-stream
http://wiki.nginx.org/HttpPushStreamModule

nginx push stream
- Replaced webservers and Redis Pub/Sub
- But starting with Pub/Sub was important for us
  - Encouraged us to over publish on keys

nginx push stream
- Turned on for 70% of our network...
  - ~950K subscribers (peak single machine)
  - peak 40 MBytes/second (per machine)
  - CPU usage is still well under 15%
- 99.845% active writes (the socket is written to often enough to come up as ACTIVE)

config push stream

    location = /pub {
        allow 127.0.0.1;
        deny all;
        push_stream_publisher admin;
        set $push_stream_channel_id $arg_channel;
    }

    location ^~ /sub/ {
        # to maintain api compatibility we need this
        location ~ /sub/(.*)/(.*)$ {
            # Url encoding things? $1%3A2$2
            set $push_stream_channels_path $1:$2;
            push_stream_subscriber streaming;
            push_stream_content_type application/json;
        }
    }

examples

    # Subs
    curl -s 'localhost/sub/forum/cnn'
    curl -s 'localhost/sub/thread/907824578'
    curl -s 'localhost/sub/user/northisup'

    # Pubs
    curl -s -X POST 'localhost/pub?channel=forum:cnn' \
        -d '{"some sort": "of json data"}'
    curl -s -X POST 'localhost/pub?channel=thread:907824578' \
        -d '{"more": "json data"}'
    curl -s -X POST 'localhost/pub?channel=user:northisup' \
        -d '{"the idea": "I think you get it by now"}'
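The same publish call from Python, as a sketch using only the standard library. `BASE_URL`, `sub_url` and `pub_request` are hypothetical names. The request is built but not sent, so the snippet runs without an nginx instance.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost"  # assumption: nginx listening as in the curl examples


def sub_url(kind, ident):
    """URL a client would subscribe to, e.g. /sub/forum/cnn."""
    return "%s/sub/%s/%s" % (BASE_URL, kind, urllib.parse.quote(str(ident), safe=""))


def pub_request(kind, ident, message):
    """Build (but do not send) the POST that publishes to channel kind:ident."""
    query = urllib.parse.urlencode({"channel": "%s:%s" % (kind, ident)}, safe=":")
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        "%s/pub?%s" % (BASE_URL, query),
        data=body,
        method="POST",
    )


req = pub_request("forum", "cnn", {"some sort": "of json data"})
# urllib.request.urlopen(req) would deliver it to a running nginx.
```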

measure nginx

    location = /push-stream-status {
        allow 127.0.0.1;
        deny all;
        push_stream_channels_statistics;
        set $push_stream_channel_id $arg_channel;
    }


long(er) polling

    onProgress: function () {
        var self = this;
        var resp = self.xhr.responseText;
        var advance = 0;
        var rows;

        // If server didn't push anything new, do nothing.
        if (!resp || self.len === resp.length) return;

        // Server returns JSON objects, one per line.
        rows = resp.slice(self.len).split('\n');
        _.each(rows, function (obj) {
            advance += (obj.length + 1);
            obj = JSON.parse(obj);
            self.trigger('progress', obj);
        });
        self.len += advance;
    }
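The idea of reading a growing response one JSON line at a time can be sketched in Python. This is an illustration of the technique, not a port of the client code. `LineBuffer` is a hypothetical name, and unlike the handler on the slide it waits for the trailing newline before parsing a line.

```python
import json


class LineBuffer:
    """Track how much of a growing, cumulative response has been consumed."""

    def __init__(self):
        self.consumed = 0  # number of characters already parsed

    def feed(self, response_text):
        """Return the JSON objects completed since the previous call."""
        end = response_text.rfind("\n") + 1  # 0 when there is no newline yet
        if end <= self.consumed:
            return []  # nothing new, or only a partial line so far
        chunk = response_text[self.consumed:end]
        self.consumed = end
        return [json.loads(line) for line in chunk.split("\n") if line]
```

Each call receives the full response text so far, the way `responseText` grows during a streaming request, and only complete lines past the last consumed offset are parsed.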

Soon... EventSource

    // Currently EventSource has CORS issues
    ev = new EventSource(dat_url);
    ev.addEventListener("Post", handlePostEvent);

test, measure, repeat

test
- Darktime
  - use existing network to load test
  - (user complaints when it didn’t work...)
- Darkesttime
  - load testing a single thread
  - have knobs you can twiddle

measure
- measure all the things!
- especially when the numbers don’t line up
- measuring is hard in distributed systems
- try to express things as +1 and -1 if you can
- Sentry for measuring exceptions
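One way to read the "+1 and -1" advice: never report an absolute value from a single process, only emit increments and decrements, and let the sum be the gauge. A minimal sketch, with a hypothetical in-process `Stats` class standing in for a real metrics client:

```python
from collections import Counter


class Stats:
    """Hypothetical in-process counters; a real setup would ship these elsewhere."""

    def __init__(self):
        self.counts = Counter()

    def incr(self, name, delta=1):
        self.counts[name] += delta

    def decr(self, name, delta=1):
        self.incr(name, -delta)


stats = Stats()


def on_connect():
    stats.incr("connections.opened")
    stats.incr("connections.current")  # +1


def on_disconnect():
    stats.incr("connections.closed")
    stats.decr("connections.current")  # -1
```

Since opened minus closed should always equal current, a mismatch between the three is a cheap signal that events are being dropped somewhere.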

pretty graphs

how does it really scale?
(graph annotations: POPE, white smoke, francis announced)

maths

it’s been a busy few weeks

wha?
- People do weird stuff with your stuff
- turned off this server in Oct 2012
- Still getting 100 req/sec

lessons
- do hard (computation) work early
- end-to-end acks are good, but expensive
- redis/nginx pubsub is effectively free

If this was interesting to you...
psst, we’re hiring: disqus.com/jobs

special thanks
- the team at DISQUS
  - like jeff a.k.a. @nfluxx who had to review all my code
- and especially our dev-ops guys
  - like john watson a.k.a. @wizputer who found the nginx-push-stream module
psst, we’re hiring: disqus.com/jobs

slide full o’ links
- Nginx push stream module: http://wiki.nginx.org/HttpPushStreamModule
- Thoonk (redis queue): http://github.com/andyet/thoonk.py
- Sentry (distributed traceback aggregation): http://github.com/dcramer/sentry
- Gevent (python coroutines and greenlets): http://gevent.org/
- Scales (in-app metrics): http://github.com/Greplin/scales
- code.disqus.com

Come find me here!
(floor plan of booth spaces)

we are still hiring
psst, we’re hiring: disqus.com/jobs

Questions I have
- What is the best kernel config for webscale concurrency, Nginx?
- I <3 gevent, but what if I want to pypy?
- Nginx + lua? Seems kind of awesome.
- Composing data pipelines: good or bad?

I didn’t have time to mention:
- Kafka, what is it good for? Seriously, why not RabbitMQ?

DISQUSsion?
Adam Hitchcock @NorthIsUp