
Introducing Riak, Part 2: Integrating Riak as a heavy-duty caching server for web applications

Using Riak as a caching server to help alleviate the load on application and database servers
Simon Buckle, Independent Consultant, Freelance
Skill Level: Intermediate
Date: 15 May 2012

This article is Part 2 of a two-part series about Riak, a highly scalable, distributed data store written in Erlang and based on Dynamo, Amazon's high-availability key-value store. For websites with heavy loads, a scalable caching solution can lighten the load on the application and database servers. This particularly applies to data that is read often but updated only occasionally. Explore an in-depth example of an online betting site and how you can use Riak to implement a caching solution. You will also learn to integrate Riak with an existing website and look at other Riak features, such as search and how to use Riak to serve user requests directly. You will need a working Riak cluster if you want to follow along with the examples; you can find the steps for setting up a cluster locally in Part 1 of this series.

Introduction
Certain types of data exhibit access patterns that lend themselves to caching. For example, online betting sites have an interesting load characteristic: odds and bet slips are requested often but are updated relatively infrequently.

To cope with the demands of high loads, these situations need a highly scalable system with the following characteristics:
The system acts as a reliable cache to reduce demand on the application servers and database.
Cached items are searchable so you can update or invalidate them.
Any solution is easily integrated into an existing site.

Riak is a good choice for such a solution, but it is not the only candidate; many different caches are available. A popular one is memcached; however, unlike Riak, memcached doesn't provide any kind of data replication, meaning that if the server holding a particular item goes down, that item becomes unavailable. Redis, another popular key/value store that could be used as a cache, supports replication through a master-slave configuration; Riak has no concept of a master node, which makes the system resilient to failure.

Website integration
Any solution needs to be easily integrated into an existing website. This is important because it might not be possible, or even desirable, to migrate all of your existing data into Riak. As mentioned previously, certain types of data lend themselves to caching; for a key/value store, this is particularly true of data that you access by primary key. That is the kind of data that is most suitable to migrate to Riak.

As mentioned in Part 1 of this series, a number of client libraries are available in languages such as PHP, Ruby, and Java; the libraries provide an API that makes integrating with Riak very simple. In this example, I use the PHP library to show how to integrate Riak with an existing website.

Figure 1 shows the set-up to consider for this example. I left out details such as load balancing, firewalls, and so on. The servers themselves, in this case, are just simple front-end boxes with a LAMP stack installed. I will assume that Riak is only used internally (it's not accessible from the outside) and that it runs in a non-hostile environment, so there are no security-related issues such as authentication. This is not as bad an assumption as it might seem, because Riak does not have any built-in authorization anyway; you really should delegate authentication and the like to the application.

Figure 1. A simple website integration


What follows is a basic example of how you might integrate Riak into your existing website. You will create a simple form that, when submitted, uses the PHP client to store an object in Riak based on the values entered in the form.

Figure 2 shows an example of a simple form that an administrator might use to create a bet entry in the system. Create this form in HTML and have it do a POST to the PHP script in Listing 1; you can use a similar form in the source code that accompanies this article as a starting point. The "key" field entered in the form is used as the key under which the object is stored in the bucket.

Figure 2. Example form for creating a bet

Listing 1 has example PHP code that shows how to use the PHP client library to integrate with Riak. Change the path to the PHP client library (specified in require_once) to wherever you have installed it. In this case, I just put it in the same directory as the PHP script. By default, all the client libraries expect Riak to be available on port 8098.

Listing 1. Example PHP code for integrating with Riak
<?php
require_once('./riak.php');

# Could do check here to see if the current user has the
# appropriate credentials - delegated to application.
$client = new RiakClient('192.168.1.1', 8098);
$bucket = $client->bucket('odds');

# The "key" form field becomes the key of the new object
$bet = $bucket->newObject($_POST['key']);
$data = array(
    'odds' => $_POST['odds'],
    'description' => $_POST['description']
);
$bet->setData($data);

# Save the object to Riak
$bet->store();

echo "Thanks!";
?>

Save the code to a PHP file (call it whatever you like) and upload it and the form to some location on your website, for example, http://www.yoursite.com/riak-test.php.

Fill out the example form and submit it. To check that it worked, try to retrieve the item directly from Riak using the key you entered in the form (see Listing 2).

Listing 2. Retrieving the item from Riak
$ curl -i http://localhost:8098/riak/odds/<key>
...
{ "odds":"", "description":"" }

Although this integration example used the PHP client, the approach is similar for other languages or application frameworks such as Java or Ruby on Rails.
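Because the key comes straight from a form field, it may contain characters (spaces, slashes) that are not safe in a URL path. The following is a minimal sketch, not part of the article's source code, of how you might build the object URL from Listing 2 safely. The function name riakObjectUrl is hypothetical; only the /riak/<bucket>/<key> pattern comes from the listing above.

```javascript
// Hypothetical helper (not part of any Riak client library): builds the URL
// of an object using the /riak/<bucket>/<key> pattern shown in Listing 2.
function riakObjectUrl(host, port, bucket, key) {
  // Percent-encode the bucket and key so that characters such as "/" or
  // spaces cannot change the structure of the path.
  const safeBucket = encodeURIComponent(bucket);
  const safeKey = encodeURIComponent(key);
  return `http://${host}:${port}/riak/${safeBucket}/${safeKey}`;
}

// Example: a key containing a space (the key itself is made up).
console.log(riakObjectUrl('localhost', 8098, 'odds', 'derby 2012'));
```

You could pass the resulting URL to curl, as in Listing 2, or to any HTTP client.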

Serving requests directly


In addition to using the client libraries to integrate Riak into your current set-up, it's possible to serve user requests directly from Riak, using it as a simple HTTP engine. To demonstrate this, I will create a simple demo that shows how you can request pages directly from Riak.

Download the source code for this article. Make sure Riak is running, then execute the script load.sh. This script copies all the HTML and JavaScript files into a bucket called demo. This example uses the JavaScript client. To view the demo, open this URL in your browser: http://localhost:8098/riak/demo/demo.html

If you enter some values in the form to create a bet and submit the form, a JSON object is stored in Riak. The properties of the object correspond to the fields in the form. You are then redirected to a page that displays the value of the object you just created. Listing 3 shows the code for creating the object. The values key, odds, and description come from the values entered into the form.

Listing 3. Example use of the JavaScript client library in Riak
client.bucket("odds", function(bucket) {
  var key = $('#key').val();
  bucket.get_or_new(key, function(status, object) {
    object.contentType = 'application/json';
    object.body = {
      'odds': $('#odds').val(),
      'description': $('#desc').val()
    };
    object.store(function(status, object, request) {
      if (status == 'ok') {
        window.location = "http://localhost:8098/riak/odds/"+key;
      } else {
        alert("Failed to create object.");
      }
    });
  });
});

As mentioned previously, I assume that Riak is running in a trusted environment. In this case there's no security issue from adding pages that store and retrieve items in Riak; however, you don't want to expose this kind of functionality to the Internet at large without having some form of authentication in place.

Although it's a simple example, it gives you an idea of how Riak can serve page requests directly. You could, for example, include data stored in Riak directly in your existing web pages, either by using a technique such as JSONP or cross-origin resource sharing (AJAX requests are restricted by the same-origin policy to the server the page resides on), or by proxying requests through your servers to Riak to fetch the required data.

Using Riak as a cache


Caches are used to provide fast access to data. If requested data is contained in the cache (a cache hit), the application can serve the request quickly by reading the value from the cache, which is quicker than retrieving the value from a database. If something is not in the cache (a cache miss), the application typically has to hit the database to retrieve the data. Generally, the more requests that you can serve from the cache, the faster the system will be.

Riak has a number of features that make it a good choice for implementing a caching solution. One such feature is its pluggable storage back-end; the storage back-end determines how the data is stored. Several are available, but I'm not going to cover them all here (see Resources for more information). The default storage back-end is Bitcask, an Erlang application that provides an API for storing and retrieving data backed by a hash table, which provides fast access to data; the data is persisted.

One back-end is perhaps more relevant for this article: the Memory back-end. The Memory back-end uses an in-memory table to store all of its data (internally it uses Erlang's ets tables) and, when enabled, makes Riak behave like an LRU cache with timed expiry. The advantage of an in-memory store is that it is significantly faster than going to disk to retrieve the data. The disadvantage is that the data is not persisted: when a node goes down, the data stored in that node is lost. Because you are using Riak as a cache, this is less of an issue than it would be if Riak were your primary data store; the application can always retrieve the data from the database. In addition, Riak replicates the data across several nodes in the cluster, so it will still be available.

Riak ships with the Memory back-end included. To use it, open app.config for each node in the cluster, locate the property storage_backend and change it from riak_kv_bitcask_backend to riak_kv_memory_backend. Now add the code in Listing 4 to the end of the file.

Listing 4. Using the Memory back-end
{memory_backend, [
    {max_memory, 4096}, %% 4GB of memory
    {ttl, 86400}        %% Time in seconds
]}


Change the values to whatever is appropriate for your set-up, then restart the nodes in the cluster.

It's also possible to run multiple storage back-ends within a Riak cluster. This is useful because it lets you use different back-ends for different buckets. For example, you could configure one bucket (let's call it cache) to use the Memory back-end, and the other buckets (those that should persist their data) to use, say, Bitcask.

Now that you have Riak set up to behave like a cache, you need some way to access the data in the cluster, either to update it or to invalidate it before its expiry time.
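The cache hit and cache miss behaviour described in this section is often implemented in the application as a "look in the cache first, fall back to the database" routine. The following is a minimal sketch of that idea under stated assumptions: the TtlCache class is a hypothetical in-process stand-in for a Riak bucket that uses the Memory back-end (it is not the Riak client API), and fetchOdds and the sample data are made up for illustration.

```javascript
// Hypothetical stand-in for a bucket backed by the Memory back-end.
// It mimics only the timed expiry controlled by the ttl setting.
class TtlCache {
  constructor(ttlSeconds, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;         // injectable clock, so expiry can be tested
    this.items = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.items.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.items.delete(key); // entry has outlived its ttl
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.items.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Remove an item before its expiry time, for example after an update.
  invalidate(key) {
    this.items.delete(key);
  }
}

// Serve from the cache on a hit; on a miss, load from the database
// and store the result so that the next request is a hit.
function fetchOdds(cache, loadFromDatabase, key) {
  const cached = cache.get(key);
  if (cached !== undefined) return { value: cached, hit: true };
  const value = loadFromDatabase(key);
  cache.set(key, value);
  return { value, hit: false };
}

// Usage with a fake database lookup (sample data is made up).
const database = { 'race-1': { odds: '5/1', description: 'horse race' } };
const cache = new TtlCache(86400);
console.log(fetchOdds(cache, (k) => database[k], 'race-1').hit); // false: miss
console.log(fetchOdds(cache, (k) => database[k], 'race-1').hit); // true: hit
```

In a real deployment the get, set, and invalidate calls would go to Riak through one of the client libraries; the control flow in fetchOdds would stay the same.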

Looking for something?


As you have already seen, to retrieve data stored in Riak through the HTTP interface, you construct a URL consisting of the bucket name and the key of the object you want to retrieve, then do an HTTP GET on that URL. This is perfectly adequate when you know what the key is! However, sometimes you either don't know the key of the object you want to retrieve, or you want to retrieve a set of objects satisfying certain criteria. Then you need a way to search for objects held in the cluster.

You have already seen how to query data by running a Map/Reduce job over documents stored in the cluster. The time taken to execute the query will, in general, be proportional to the number of documents in the cluster; the more documents, the longer the query takes. This is not a problem for queries that are not time sensitive, that is, queries where the user does not expect an instant reply. For something like search, it's not feasible to (dynamically) search all of the documents every time; it could take minutes or hours to get the results back!

Fortunately, Riak already has a solution to this problem: Riak Search. Riak Search provides the functionality you need to search documents stored across your cluster. The subject of search is too large to cover in any depth in this article, but at a high level it works like this: documents are tokenized (Riak Search uses standard Lucene analysers) and added to an inverted index. This index is then queried based on the search terms a user enters. As new documents are added, they too are tokenized and added to the index.

Riak Search is disabled by default, so you need to enable it before you can use it. For each node in your cluster, open rel/riakN/etc/app.config, locate the property riak_search and set it to true. You will need to restart the nodes in the cluster.
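To make the tokenize-and-index idea described above more concrete, here is a toy sketch of an inverted index. It is an illustration only and makes no claim about how Riak Search is implemented internally: the tokenize function and InvertedIndex class are hypothetical, the tokenizer simply splits on whitespace, and only single "field:term" queries are supported.

```javascript
// Toy tokenizer: split on runs of whitespace, with no case normalisation.
function tokenize(text) {
  return text.split(/\s+/).filter((token) => token.length > 0);
}

// Toy inverted index: maps "field:term" to the set of document keys
// whose field contains that term.
class InvertedIndex {
  constructor() {
    this.postings = new Map();
  }

  // Index every string property of a document under its property name.
  add(key, doc) {
    for (const [field, value] of Object.entries(doc)) {
      if (typeof value !== 'string') continue;
      for (const term of tokenize(value)) {
        const id = `${field}:${term}`;
        if (!this.postings.has(id)) this.postings.set(id, new Set());
        this.postings.get(id).add(key);
      }
    }
  }

  // Look up one "field:term" query and return the matching keys, sorted.
  search(query) {
    const hits = this.postings.get(query);
    return hits ? [...hits].sort() : [];
  }
}

// Sample documents are made up for illustration.
const index = new InvertedIndex();
index.add('bet1', { odds: '5/1', description: 'horse racing at ascot' });
index.add('bet2', { odds: '2/1', description: 'football cup final' });
console.log(index.search('description:horse')); // matches bet1 only
```

The point of the structure is that a query becomes a single lookup in the index rather than a scan over every stored document, which is why search stays fast as the number of documents grows.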
Riak allows you to specify the name of a function to run before and after a document is added to a bucket through the use of pre- and post-commit hooks. For example, you might want to check that a document has particular required fields before adding it to the cluster.

Before a document can be searched, it needs to be indexed. To do this, install a pre-commit hook on the bucket where the documents are stored by running the following command:

$ rel/riak/bin/search-cmd install <bucket name>

This installs the pre-commit hook riak_search_kv_hook on the bucket. Now, whenever a document is added to that bucket, it is analyzed and added to the index. The default analyser is the whitespace analyser; it splits the text into tokens based on whitespace, and those tokens then get indexed. A number of different analysers are available and you can also define your own.

In many cases, Riak Search knows how to index your data. For example, out of the box, if a JSON object is added to a bucket, the value of each property is indexed and can be queried using the property name in the query string. See the search example in Listing 5. For more complicated structures it's possible to define your own schema that tells Riak Search how to index your data.

When you have some documents indexed, you need to be able to issue queries against them. One way is to run a query from the Erlang shell. For example, the query in Listing 5 searches the odds bucket for all bets that are related to horse racing; you do this by querying the description property of the stored item.

Listing 5. Searching the odds bucket for bets related to horse racing
$ rel/riak/bin/riak attach
search:search(<<"odds">>, <<"description:horse">>).

In addition, Riak Search provides a Solr-compatible HTTP API for searching documents. Apache Solr is a popular enterprise search server with a REST-like API. Because the API is compatible with Solr, it should be possible to switch out Solr (if you use it) and use Riak Search to power your searches instead. For example, to search for the odds for a particular event using the Solr interface, you would do something like this:

$ curl "http://localhost:8098/solr/odds/select?start=0&q=description:horse"
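If the search terms come from user input, it is safer to build the query URL programmatically than to concatenate strings. The following is a minimal sketch, assuming the /solr/<bucket>/select pattern and the start and q parameters shown above; the function name riakSearchUrl is hypothetical and not part of any Riak client library.

```javascript
// Hypothetical helper: builds a query URL for the Solr-compatible endpoint
// using the /solr/<bucket>/select pattern from the curl example.
function riakSearchUrl(host, port, bucket, query, start = 0) {
  const url = new URL(
    `http://${host}:${port}/solr/${encodeURIComponent(bucket)}/select`
  );
  url.searchParams.set('start', String(start));
  // URLSearchParams percent-encodes the value (including the ":"), and the
  // server decodes it back to the original query string.
  url.searchParams.set('q', query);
  return url.toString();
}

console.log(riakSearchUrl('localhost', 8098, 'odds', 'description:horse'));
```

The resulting URL can be passed to curl or to any HTTP client in place of the hand-written one.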

With search set up, you can now locate items in the data store without knowing the primary key of the items you are looking for.

Conclusion
Riak's ability to scale and reliably replicate data, plus other features such as search, make it an ideal choice for implementing a caching solution for heavy-load sites. You can easily integrate it into an existing site. With its ability to serve requests directly, you can use Riak to reduce or even eliminate the load on the application and database servers.


Downloads
Description            Name                     Size    Download method
Article source code    riakpt2sourcecode.zip    85KB    HTTP


Resources
Learn
Part 1: The language-independent HTTP API: Store and retrieve data using Riak's HTTP interface (Simon Buckle, developerWorks, March 2012): Read this introduction to Riak that covers the basics of storing and retrieving items in Riak using its HTTP API.
Read the Riak Search wiki page to learn more about how it works.
See what storage back-ends Riak provides and how they differ from each other.
Get a list of available client libraries for integrating with Riak.
See Basic Cluster Setup and Building a Development Environment for more detailed information on setting up a 3-node cluster.
Read Google's MapReduce: Simplified Data Processing on Large Clusters.
Read Introduction to programming in Erlang (Martin Brown, developerWorks, May 2011) and learn about Erlang and how its functional programming style compares with other programming paradigms such as imperative, procedural and object-oriented programming.
Read Amazon's Dynamo paper on which Riak is based. Highly recommended!
See the article How To Analyze Apache Logs to learn how you can use Riak to process your server logs.
Get an explanation of vector clocks and why they are easier to understand than you might think.
Find a good explanation of vector clocks and more detailed information on link walking on the Riak wiki.
The Project Gutenberg site is a great resource if you need some text resources for experimenting.
The Open Source developerWorks zone provides a wealth of information on open source tools and using open source technologies.
developerWorks Web development specializes in articles covering various web-based solutions.
Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools, as well as IT industry trends.
Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
Follow developerWorks on Twitter, or subscribe to a feed of Linux tweets on developerWorks.

Get products and technologies
Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss
Check out developerWorks blogs and get involved in the developerWorks community.
Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.


About the author


Simon Buckle is an independent consultant. His interests include distributed systems, algorithms, and concurrency. He has a master's degree in Computing from Imperial College, London. Check out his website at simonbuckle.com.

Copyright IBM Corporation 2012 (www.ibm.com/legal/copytrade.shtml) Trademarks (www.ibm.com/developerworks/ibm/trademarks/)
