Using Riak as a caching server to help alleviate the load on application and database servers
Simon Buckle Independent Consultant Freelance Skill Level: Intermediate Date: 15 May 2012
This article is Part 2 of a two-part series about Riak, a highly scalable, distributed data store written in Erlang and based on Dynamo, Amazon's highly available key-value store. For websites with heavy loads, a scalable caching solution can lighten the load on the application and database servers. This applies particularly to data that is read often but updated only occasionally. Explore an in-depth example of an online betting site and see how you can use Riak to implement a caching solution. You will also learn to integrate Riak with an existing website and look at other Riak features, such as search and the ability to serve user requests directly. You will need a working Riak cluster if you want to follow along with the examples. You can find the steps for setting up a cluster locally in Part 1 of this series.
Introduction
Certain types of data exhibit access patterns that lend themselves to caching. For example, online betting sites have an interesting load characteristic: odds and bet slips are requested often but updated relatively infrequently.
These situations need a highly scalable system with the following characteristics to cope with the demands of high loads:
The system acts as a reliable cache to reduce demand on the application servers and database
Copyright IBM Corporation 2012 Introducing Riak, Part 2: Integrating Riak as a heavyduty caching server for web applications Trademarks Page 1 of 12
developerWorks
ibm.com/developerWorks/
Cached items are searchable so you can update or invalidate them
Any solution is easily integrated into an existing site

Riak is a good choice for such a solution, but it is not the only candidate; many different caches are available. A popular one is memcached; however, unlike Riak, memcached doesn't provide any kind of data replication, meaning that if the server holding a particular item goes down, that item becomes unavailable. Redis, another popular key/value store that could be used as a cache, supports replication through a master-slave configuration. Riak, by contrast, has no concept of a master node, which makes the system more resilient to failure.
Website integration
Any solution needs to be easily integrated into an existing website. This matters because it might not be possible, or even desirable, to migrate all of your existing data into Riak. As mentioned previously, certain types of data lend themselves to caching, particularly, in the case of a key/value store, data that you access by a primary key. That is the kind of data that is most suitable to migrate to Riak.

As mentioned in Part 1 of this series, a number of client libraries are available for languages such as PHP, Ruby, and Java; the libraries provide an API that makes integrating with Riak very simple. In this example, I use the PHP library to show how to integrate Riak with an existing website.

Figure 1 shows the set-up to consider for this example. I left out details such as load balancing, firewalls, and so on. The servers themselves, in this case, are just simple front-end boxes with a LAMP stack installed. I assume that Riak is only used internally (it's not accessible from the outside) and that it runs in a non-hostile environment, so there are no security-related issues such as authentication. This is not as bad an assumption as it might seem, because Riak does not have any built-in authorization anyway; you really should delegate authentication and the like to the application.

Figure 1. A simple website integration
What follows is a basic example of how you might integrate Riak into your existing website. You will create a simple form that, when submitted, uses the PHP client to store an object in Riak based on the values entered in the form. Figure 2 shows an example of a simple form that an administrator might use to create a bet entry in the system. Create this form in HTML and have it POST to the PHP script in Listing 1; you can use a similar form in the source code that accompanies this article as a starting point. The "key" field entered in the form is used as the key under which the object is stored in the bucket.

Figure 2. Example form for creating a bet
Listing 1 has example PHP code that shows how to use the PHP client library to integrate with Riak. Change the path to the PHP client library (specified in require_once) to wherever you have installed it. In this case, I just put it in the same directory as the PHP script. By default, all the client libraries expect Riak to be available on port 8098.

Listing 1. Example PHP code for integrating with Riak
<?php
require_once('./riak.php');

# Could do check here to see if the current user has the
# appropriate credentials - delegated to application.

$client = new RiakClient('192.168.1.1', 8098);
$bucket = $client->bucket('odds');
$bet = $bucket->newObject($_POST['key']);

$data = array(
    'odds' => $_POST['odds'],
    'description' => $_POST['description']
);
$bet->setData($data);

# Save the object to Riak
$bet->store();

echo "Thanks!";
?>
Save the code to a PHP file (call it whatever you like) and upload it and the form to some location on your website, for example, http://www.yoursite.com/riak-test.php.
Fill out the example form and submit it. To verify that it worked, retrieve the item directly from Riak using the key you entered in the form (see Listing 2).

Listing 2. Retrieving the item from Riak
$ curl -i http://localhost:8098/riak/odds/<key>
...
{
    "odds":"",
    "description":""
}
Although this integration example used the PHP client, the approach is similar for other languages or application frameworks such as Java or Ruby on Rails.
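To make the "similar for other languages" point concrete, here is a minimal JavaScript sketch of the same store operation expressed as a plain HTTP request rather than a client-library call. The host, port, bucket, and /riak/<bucket>/<key> path are taken from the listings above; the function name buildStoreRequest and the sample key and values are hypothetical, and the sketch only builds the request description without sending anything.

```javascript
// Sketch: describe the HTTP request that would store a bet in Riak.
// buildStoreRequest is a hypothetical helper, not part of any Riak client.
function buildStoreRequest(host, port, bucket, key, data) {
  // Encode bucket and key so spaces or slashes cannot break the path.
  const url = "http://" + host + ":" + port + "/riak/" +
    encodeURIComponent(bucket) + "/" + encodeURIComponent(key);
  return {
    url: url,
    method: "PUT",                                    // store under a key we chose
    headers: { "Content-Type": "application/json" },  // the object is stored as JSON
    body: JSON.stringify(data)
  };
}

// Sample values are made up for illustration.
const request = buildStoreRequest("192.168.1.1", 8098, "odds", "race-42", {
  odds: "5/1",
  description: "horse racing: 3.15 at Ascot"
});
console.log(request.method, request.url);
```

In a runtime that provides fetch, the returned object could be passed along as fetch(request.url, request); that step is left out here so the sketch runs without a cluster.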
If you enter some values in the form to create a bet and submit the form, a JSON object is stored in Riak. The properties of the object correspond to the fields in the form. You are then redirected to a page that displays the value of the object you just created. Listing 3 shows the code for creating the object; the values key, odds, and description come from the values entered into the form.

Listing 3. Example use of the JavaScript client library in Riak
client.bucket("odds", function(bucket) {
    var key = $('#key').val();
    bucket.get_or_new(key, function(status, object) {
        object.contentType = 'application/json';
        object.body = {
            'odds': $('#odds').val(),
            'description': $('#desc').val()
        };
        object.store(function(status, object, request) {
            if (status == 'ok') {
                window.location = "http://localhost:8098/riak/odds/" + key;
            } else {
                alert("Failed to create object.");
            }
        });
    });
});
As mentioned previously, I assume that Riak is running in a trusted environment. In this case there's no security issue with adding pages that store and retrieve items in Riak; however, you don't want to expose this kind of functionality to the Internet at large without some form of authentication in place.

Although it's a simple example, it gives you an idea of how Riak can serve page requests directly. You could, for example, include data stored in Riak directly in your existing web pages, either by using a technique such as JSONP or cross-origin resource sharing (AJAX requests are restricted by the same-origin policy to the server that the page came from), or by proxying requests through your servers to Riak to fetch the required data.
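The proxying option can be sketched as a small routing decision made on your own server before anything is forwarded to Riak. The following JavaScript is one possible shape under stated assumptions: the public prefix /cache/, the allow-list of buckets, the internal address, and the function name toRiakUrl are all hypothetical; only the /riak/<bucket>/<key> path comes from the listings above.

```javascript
// Sketch: decide whether a public request may be proxied to Riak, and to which URL.
// All names and the "/cache/" prefix are assumptions made for this example.
const RIAK_BASE = "http://127.0.0.1:8098/riak/";   // internal only, never exposed
const READABLE_BUCKETS = new Set(["odds"]);        // buckets visitors may read

function toRiakUrl(method, path) {
  // Only reads are proxied; writes stay behind the application's authentication.
  if (method !== "GET") return null;
  // Accept exactly /cache/<bucket>/<key>; anything else is rejected.
  const match = /^\/cache\/([^\/]+)\/([^\/]+)$/.exec(path);
  if (match === null) return null;
  const bucket = match[1];
  const key = match[2];
  if (!READABLE_BUCKETS.has(bucket)) return null;
  return RIAK_BASE + bucket + "/" + key;
}

console.log(toRiakUrl("GET", "/cache/odds/race-42"));
console.log(toRiakUrl("POST", "/cache/odds/race-42"));
```

Returning null for anything outside the allow-list keeps the proxy read-only, which fits the earlier point that store and retrieve pages should not be exposed without authentication.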
Change the values to whatever is appropriate for your set-up. Restart the nodes in the cluster.

It's also possible to run multiple storage back-ends within a Riak cluster. This is useful because it lets you use different back-ends for different buckets. For example, you could configure one bucket (let's call it cache) to use the Memory back-end, and configure the other buckets (those that should persist the data) to use, say, Bitcask.

Now that you have Riak set up to behave like a cache, you need some way to access the data in the cluster, either to update it or to invalidate it for some reason before its expiry time.
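The difference between an item expiring on its own and being invalidated early can be illustrated with a toy in-memory cache. This JavaScript sketch is only a model of the semantics; it is not how Riak's back-ends are implemented, and the class name TtlCache, its methods, and the injectable clock are hypothetical.

```javascript
// Toy model of cache expiry versus explicit invalidation (not Riak code).
class TtlCache {
  // ttlMs: lifetime of each item; now: clock function, injectable for testing.
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.items = new Map();
  }

  set(key, value) {
    this.items.set(key, { value: value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const item = this.items.get(key);
    if (item === undefined) return undefined;
    if (this.now() >= item.expiresAt) {
      this.items.delete(key);   // expired: behaves as if it was never stored
      return undefined;
    }
    return item.value;
  }

  // Remove an item before its expiry time, for example when the odds change.
  invalidate(key) {
    return this.items.delete(key);
  }
}

let clock = 0;                                   // fake clock in milliseconds
const cache = new TtlCache(1000, function () { return clock; });
cache.set("race-42", { odds: "5/1" });
cache.invalidate("race-42");                     // removed early
console.log(cache.get("race-42"));
```

Expiry handles the common case without any bookkeeping, while explicit invalidation needs a way to find the affected items first, which is what motivates the search feature.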
you might want to check that a document has particular required fields before adding it to the cluster.

Before a document can be searched, it must be indexed. To do this, install a pre-commit hook on the bucket where the documents are stored by running the following command:

$ rel/riak/bin/search-cmd install <bucket name>

This installs the pre-commit hook riak_search_kv_hook on the bucket. Now, whenever a document is added to that bucket, it is analyzed and added to the index. The default analyser is the whitespace analyser; it splits the text into tokens on whitespace, and those tokens are then indexed. A number of different analysers are available, and you can also define your own.

In many cases, Riak Search knows how to index your data. For example, out of the box, if a JSON object is added to a bucket, the value of each property is indexed and can be queried using the property name in the query string. See the search example in Listing 5. For more complicated structures, you can define your own schema that tells Riak Search how to index your data.

When you have some documents indexed, you need to be able to issue queries against them. One way is to run a query from the Erlang shell. For example, the query in Listing 5 searches the odds bucket for all bets that are related to horse racing by querying the description property of the stored items.

Listing 5. Searching the odds bucket for bets related to horse racing
$ rel/riak/bin/riak attach
search:search(<<"odds">>, <<"description:horse">>).
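To see why a query such as description:horse finds a stored bet, it can help to model the indexing step described earlier. The JavaScript below is a deliberately simplified illustration, assuming flat JSON objects with string values and exact, case-sensitive matching; it is not Riak Search's implementation, and the names tokenize, indexDocument, and search, along with the sample documents, are hypothetical.

```javascript
// Simplified model of whitespace analysis and per-property indexing (not Riak code).

// Split text into tokens on runs of whitespace, dropping empty strings.
function tokenize(text) {
  return String(text).split(/\s+/).filter(function (token) {
    return token.length > 0;
  });
}

// Add every "property:token" term of a flat JSON object to the index.
// index maps a term to the set of keys whose document contains it.
function indexDocument(index, key, doc) {
  for (const field of Object.keys(doc)) {
    for (const token of tokenize(doc[field])) {
      const term = field + ":" + token;
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(key);
    }
  }
}

// Look up a single "property:token" query and return matching keys, sorted.
function search(index, query) {
  return Array.from(index.get(query) || []).sort();
}

const index = new Map();
indexDocument(index, "race-42", { odds: "5/1", description: "horse racing at Ascot" });
indexDocument(index, "final-7", { odds: "2/1", description: "football cup final" });
console.log(search(index, "description:horse"));   // prints [ 'race-42' ]
```

The point of the model is that the lookup never needs the primary key: the query term leads to the keys, which can then be fetched, updated, or invalidated.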
In addition, Riak Search provides a Solr-compatible HTTP API for searching documents. Apache Solr is a popular enterprise search server with a REST-like API. Because the API is compatible with Solr, it should be possible to switch out Solr (if you use it) and use Riak Search to power your searches instead. For example, to search for the odds for a particular event using the Solr interface, you would do something like this:

$ curl "http://localhost:8098/solr/odds/select?start=0&q=description:horse"
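When the search term comes from user input, the query string should be encoded rather than pasted into the URL by hand. The following JavaScript sketch builds the same kind of URL as the curl example, using only the start and q parameters shown there; the function name buildSearchUrl and its argument order are hypothetical.

```javascript
// Sketch: build a Solr-style search URL like the one in the curl example.
// buildSearchUrl is a hypothetical helper; only "start" and "q" are used.
function buildSearchUrl(host, port, index, field, term, start) {
  // URLSearchParams percent-encodes the values, so ":" becomes "%3A"
  // and spaces become "+" in the resulting query string.
  const params = new URLSearchParams({
    start: String(start),
    q: field + ":" + term
  });
  return "http://" + host + ":" + port + "/solr/" +
    encodeURIComponent(index) + "/select?" + params.toString();
}

console.log(buildSearchUrl("localhost", 8098, "odds", "description", "horse", 0));
// prints http://localhost:8098/solr/odds/select?start=0&q=description%3Ahorse
```

The encoded form %3A is the standard percent-encoding of the colon in the curl example.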
With search set up, you can now locate items in the data store without knowing the primary key of the items you are looking for.
Conclusion
Riak's ability to scale and reliably replicate data, plus other features such as search, makes it an ideal choice for implementing a caching solution for heavy-load sites. You can easily integrate it into an existing site. With its ability to serve requests directly, you can use Riak to reduce, or in some cases eliminate, the load on the application and database servers.
Downloads
Description          Name                   Size  Download method
Article source code  riakpt2sourcecode.zip  85KB  HTTP
Resources
Learn
Part 1: The language-independent HTTP API: Store and retrieve data using Riak's HTTP interface (Simon Buckle, developerWorks, March 2012): Read this introduction to Riak that covers the basics of storing and retrieving items in Riak using its HTTP API.
Read the Riak Search wiki page to learn more about how it works.
See what storage back-ends Riak provides and how they differ from each other.
Get a list of available client libraries for integrating with Riak.
See Basic Cluster Setup and Building a Development Environment for more detailed information on setting up a 3-node cluster.
Read Google's MapReduce: Simplified Data Processing on Large Clusters.
Read Introduction to programming in Erlang (Martin Brown, developerWorks, May 2011) and learn about Erlang and how its functional programming style compares with other programming paradigms such as imperative, procedural, and object-oriented programming.
Read Amazon's Dynamo paper, on which Riak is based. Highly recommended!
See the article How To Analyze Apache Logs to learn how you can use Riak to process your server logs.
Get an explanation of vector clocks and why they are easier to understand than you might think.
Find a good explanation of vector clocks and more detailed information on link walking on the Riak wiki.
The Project Gutenberg site is a great resource if you need some text resources for experimenting.
The Open Source developerWorks zone provides a wealth of information on open source tools and using open source technologies.
developerWorks Web development specializes in articles covering various web-based solutions.
Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
Attend a free developerWorks Live! briefing to get up to speed quickly on IBM products and tools, as well as IT industry trends.
Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners to advanced functionality for experienced developers.
Follow developerWorks on Twitter, or subscribe to a feed of Linux tweets on developerWorks.

Get products and technologies
Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss
Check out developerWorks blogs and get involved in the developerWorks community.
Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.