
Amazon Simple Storage Service (S3)

Amazon Simple Storage Service (S3) is cloud-based persistent storage. It operates independently
from other Amazon services. In fact, applications you write for hosting on your own servers can
leverage Amazon S3 without any need to otherwise “be in the cloud.”
When Amazon refers to S3 as “simple storage,” they are referring to the feature set—not its ease
of use. Amazon S3 enables you to simply put data in the cloud and pull it back out. You do not
need to know anything about how it is stored or where it is actually stored.
You are making a terrible mistake if you think of Amazon S3 as a remote file system. Amazon
S3 is, in many ways, much more primitive than a file system. In fact, you don’t really store
“files”—you store objects. Furthermore, you store objects in buckets, not directories. Although
these distinctions may appear to be semantic, they include a number of important differences:
• Objects stored in S3 can be no larger than 5 GB.
• Buckets exist in a flat namespace shared among all Amazon S3 users. You cannot create
“sub-buckets,” and you must be careful of namespace clashes.
• You can make your buckets and objects available to the general public for viewing.
• Without third-party tools, you cannot “mount” S3 storage. In fact, I am not fond of the use
of third-party tools to mount S3, because S3 is so conceptually different from a file system
that I believe it is bad form to treat it as such.
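The flat-namespace point is worth internalizing: tools that display "folders" inside a bucket are just grouping object keys on a delimiter character. A minimal local sketch (with made-up key names, no S3 access required) of how that grouping works:

```python
# Local sketch: S3 has no directories, only keys in a flat namespace.
# Tools that show "folders" do so by grouping keys on a delimiter,
# much as S3's own listing parameters do. Key names are invented.

keys = [
    "photos/2008/beach.jpg",
    "photos/2008/city.jpg",
    "photos/2009/lake.jpg",
    "readme.txt",
]

def list_keys(keys, prefix="", delimiter="/"):
    """Mimic a bucket listing: return (objects, common_prefixes)."""
    objects, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the delimiter acts like a "subdirectory"
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)

print(list_keys(keys))                    # "top level" of the bucket
print(list_keys(keys, prefix="photos/"))  # one apparent level down
```

The "directories" exist only in the eye of the listing code; deleting every key with a given prefix leaves nothing behind, because there was never a directory to begin with.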
Access to S3
Before accessing S3, you need to sign up for an Amazon Web Services account. You can ask for
default storage in either the United States or Europe. Where you store your data is not simply a
function of where you live. As we discuss later in this book, regulatory and privacy concerns will
impact the decision of where you want to store your cloud data. For this chapter, I suggest you
just use the storage closest to where your access to S3 will originate.
Web Services
Amazon makes S3 available through both a SOAP API and a REST API. Although developers
tend to be more familiar with creating web services via SOAP, REST is the preferred mechanism
for accessing S3 due to difficulties processing large binary objects in the SOAP API. Specifically,
SOAP limits the object size you can manage in S3 and limits any processing (such as a transfer
status bar) you might want to perform on the data streams as they travel to and from S3.
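To give a feel for what the REST API (and the wrappers that hide it) actually does, here is a sketch of how an S3 REST request is authenticated: the client signs a canonical string with its secret key using HMAC-SHA1 and sends the result in an Authorization header. The access and secret keys below are placeholder values, and the canonicalization is simplified (it omits the x-amz-* header handling a full client would need):

```python
# Sketch of S3 REST request signing (simplified; placeholder credentials).
import base64
import hmac
from hashlib import sha1

ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE"                        # placeholder
SECRET_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"    # placeholder

def sign_request(verb, resource, date, content_md5="", content_type=""):
    """Build the Authorization header value for an S3 REST call."""
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(SECRET_KEY.encode(), string_to_sign.encode(), sha1).digest()
    signature = base64.b64encode(digest).decode()
    return "AWS %s:%s" % (ACCESS_KEY, signature)

header = sign_request("GET", "/mybucket/photo.jpg",
                      "Tue, 27 Mar 2007 19:36:42 +0000")
print(header)
```

Because the signature covers the HTTP verb, resource, and date, a captured request cannot be replayed later or redirected at a different object without the secret key.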
The Amazon Web Services APIs support the ability to:
• Find buckets and objects
• Discover their metadata
• Create new buckets
• Upload new objects
• Delete existing buckets and objects
When manipulating your buckets, you can optionally specify the location in which the bucket’s
contents should be stored.
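At the REST level, that location choice is just a small XML body sent with the create-bucket request: an empty body gets you the default (United States) storage, while a LocationConstraint element selects European storage. A sketch of building that body:

```python
# Sketch: the request body for a REST "create bucket" call.
# An empty body means default (U.S.) storage; a LocationConstraint
# element places the bucket elsewhere (e.g., "EU" for Europe).

def create_bucket_body(location=None):
    """Return the PUT-bucket request body for the given location."""
    if location is None:
        return ""   # default U.S. storage
    return (
        "<CreateBucketConfiguration>"
        "<LocationConstraint>%s</LocationConstraint>"
        "</CreateBucketConfiguration>" % location
    )

print(repr(create_bucket_body()))      # default storage
print(create_bucket_body("EU"))        # European storage
```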
Unless you need truly fine-grained control over interaction with S3, I recommend using an API
wrapper for your language of choice that abstracts out the S3 REST API. My teams use JetS3t
when doing Java development. For the purposes of getting started with Amazon S3, however, you
will definitely want to download the s3cmd command-line client for Amazon S3
(http://s3tools.logix.cz/s3cmd). It provides a command-line wrapper around the S3 access web
services. This tool also happens to be written in Python, which means you can read the source to
see an excellent example of writing a Python application for S3.
BitTorrent
Amazon also provides BitTorrent access into Amazon S3. BitTorrent is a peer-to-peer (P2P)
filesharing protocol. Because BitTorrent is a standard protocol for sharing large binary assets, a
number of clients and applications exist on the market to consume and publish data via
BitTorrent. If your application can leverage this built-in infrastructure, it may make sense to take
advantage of the Amazon S3 BitTorrent support. In general, however, transactional web
applications won’t use BitTorrent to interact with S3.
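The mechanism is simple: for any publicly readable object, S3 will serve a .torrent file when you append ?torrent to the object's URL, and any standard BitTorrent client can take it from there. A sketch of constructing that URL (the bucket and object names are invented):

```python
# Sketch: S3 serves a .torrent for a publicly readable object when
# "?torrent" is appended to the object URL. Names below are invented.

def torrent_url(bucket, key):
    """URL from which a BitTorrent client can fetch the object's torrent."""
    return "http://%s.s3.amazonaws.com/%s?torrent" % (bucket, key)

print(torrent_url("my-public-bucket", "big-video.mpg"))
```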
S3 in Action
To illustrate S3 in action, we will use the s3cmd utility to transfer files in and out of S3. The
commands supported by this tool are mirrors of the underlying web services APIs. Once you
download the utility, you will need to configure it with your S3 access key and S3 secret key.
Whether you are using this tool or another tool, you will always need these keys to access your
private buckets in S3. The first thing you must do with Amazon S3 is create a bucket in which
you can store objects:
s3cmd mb s3://BUCKET
This command creates a bucket with the name you specify. As I noted earlier in the chapter, the
namespace for your bucket is shared across all Amazon customers. Unless you are the first person
to read this book, it is very unlikely that the command just shown will succeed unless you replace
the name BUCKET with something likely to be unique to you. Many users prefix their buckets
with something unique to them, such as their domain name. Keep in mind, however, that
whatever standard you pick, nothing stops other users from stepping over your naming
convention.
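One defensive habit, sketched below, is to generate bucket names from a domain you control plus a random suffix, so that even within your own convention you avoid collisions. The domain is a placeholder, and, as noted above, nothing enforces the convention:

```python
# Sketch: reduce namespace clashes by prefixing bucket names with a
# domain you control plus a random suffix. The domain is a placeholder;
# this is only a convention—nothing stops other users from copying it.
import uuid

def bucket_name(domain, purpose):
    """Derive a bucket name unlikely to collide with other S3 users'."""
    return "%s-%s-%s" % (domain, purpose, uuid.uuid4().hex[:8])

name = bucket_name("example.com", "backups")
print(name)   # e.g. example.com-backups- followed by 8 hex characters
```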
