Windows Azure Storage Abstractions and their Scalability Targets
10 May 2010 2:41 PM

The four object abstractions Windows Azure Storage provides for application developers are:

+ Blobs - Provides a simple interface for storing named files along with metadata for the file.
+ Tables - Provides massively scalable structured storage. A Table is a set of entities, which contain a set of properties. An application can manipulate the entities and query over any of the properties stored in a Table.
+ Queues - Provide reliable storage and delivery of messages for an application to build loosely coupled and scalable workflow between the different parts (roles) of your application.
+ Drives - Provides durable NTFS volumes for Windows Azure applications to use. This allows applications to use existing NTFS APIs to access a network attached durable drive. Each drive is a network attached Page Blob formatted as a single volume NTFS VHD.

In this post, we do not focus on drives, since their scalability is that of a single blob.

The following shows the Windows Azure Storage abstractions and the Uris used for Blobs, Tables and Queues. In this post we will (a) go through each of these concepts, (b) describe how they are partitioned, and (c) then talk about the scalability targets for these storage abstractions.

[Figure: Windows Azure Storage Concepts]

Storage Accounts and Picking their Locations

In order to access any of the storage abstractions you first need to create a storage account by going to the Windows Azure Developer Portal. When creating the storage account you can specify what location to place your storage account in.
The six locations we currently offer are:

1. US North Central
2. US South Central
3. Europe North
4. Europe West
5. Asia East
6. Asia Southeast

As a best practice, you should choose the same location for your storage account and your hosted services, which you can also do in the Developer Portal. This gives the computation high bandwidth and low latency to storage, and the bandwidth is free between computation and storage in the same location.

Also shown in the above slide is the Uri used to access each data object, which is:

+ Blobs
  o http://accountName.blob.core.windows.net/<container>/<blob>
+ Tables
  o http://accountName.table.core.windows.net/
+ Queues
  o http://accountName.queue.core.windows.net/

The first thing to notice is that the storage account name you registered in the Developer Portal is the first part of the hostname. This is used via DNS to direct your request to the location that holds all of the storage data for that storage account. Therefore, all of the requests to that storage account (inserts, updates, deletes, and gets) go to that location to access your data.

Finally, notice in the above hostnames the keywords “blob”, “table” and “queue”. These direct your request to the appropriate Blob, Table or Queue service in that location. Note that since Blob, Table and Queue are separate services, they each have their own namespace under the storage account. This means that in the same storage account you can have a Blob Container, a Table and a Queue each called “music”.

Now that you have a storage account, you can store all of your blobs, entities and messages in that storage account. A storage account can hold up to 100 TB of data. There is no other storage capacity limit for a storage account. In particular, there is no limit on the number of Blob Containers, Blobs, Tables, Entities, Queues or Messages that can be stored in the account, other than that they must all add up to less than 100 TB.
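The hostname scheme described above (account name first, then the service keyword) can be sketched in a few lines of Python. This is only an illustration of the naming pattern, not part of any storage client library: the function name `service_endpoint` and the constant `SERVICES` are hypothetical, and “cohowinery” is simply the example account name used elsewhere in this post.

```python
# Hypothetical helper: builds the base URI for each storage service
# from the account name, following the hostname pattern in this post.
SERVICES = ("blob", "table", "queue")


def service_endpoint(account_name: str, service: str) -> str:
    """Return the base URI for one service of a storage account."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    # The account name is the first label of the hostname; the service
    # keyword routes the request to the Blob, Table or Queue service.
    return f"http://{account_name}.{service}.core.windows.net/"


# Each service has its own namespace, so one account yields three endpoints.
endpoints = {s: service_endpoint("cohowinery", s) for s in SERVICES}
for name, uri in endpoints.items():
    print(name, uri)
```

Because the three services are addressed by separate hostnames, a container, a table and a queue that share a name such as “music” never collide.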
Windows Azure Blobs

The figure below depicts the storage concepts of Windows Azure Blob, where we have a storage account called “cohowinery” and inside of this account we created a Blob Container called “images” and put two pictures in that blob container called “pic01.jpg” and “pic02.jpg”. We also created a second blob container called “videos” and stored a blob called “vid1.avi” there.

[Figure: Blob Storage Concepts]

+ Storage Account - All access to Windows Azure Storage is done through a storage account.
  o This is the highest level of the namespace for accessing blobs.
  o An account can have many Blob Containers.
+ Blob Container - A container provides a grouping of a set of blobs. The container name is scoped by the account.
  o Sharing policies are set at the container level, where a container can be set to private or to be publicly accessible. When a container is set to Public, all its contents can be read by anyone without requiring authentication. When a container is Private, authentication is required to access the blobs in that container.
  o Containers can also have metadata associated with them. Metadata is in the form of name-value pairs, and can be up to 8KB in size per container.
  o The ability to list all of the blobs within the container is also provided.
+ Blob - Blobs are stored in and scoped by Blob Containers. Blobs can have metadata associated with them, which are name-value pairs, and can be up to 8KB in size per blob. The blob metadata can be set and retrieved separately from the blob data bits.

The above namespace is used to perform all access to Windows Azure Blobs. The URI for a specific blob is structured as follows:

http://<account>.blob.core.windows.net/<container>/<blobname>

The storage account name is specified as the first part of the hostname followed by the keyword “blob”. This sends the request to the part of Windows Azure Storage that handles blob requests. The host name is followed by the container name, followed by “/”, and then the blob name.
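As a minimal sketch of this URI structure, the following Python builds a blob URI from its three parts and splits one back apart. The helper names `blob_uri` and `split_blob_uri` are hypothetical and are not part of any storage SDK; the only rule enforced is the one this post states, that a container name cannot contain a “/”. The nested blob name in the usage lines is an invented example.

```python
from urllib.parse import urlsplit


def blob_uri(account: str, container: str, blob: str) -> str:
    """Build the URI for a blob: account in the hostname, then container/blob."""
    if "/" in container:
        raise ValueError("container name cannot contain '/'")
    return f"http://{account}.blob.core.windows.net/{container}/{blob}"


def split_blob_uri(uri: str) -> tuple[str, str, str]:
    """Split a blob URI into (account, container, blob name)."""
    parts = urlsplit(uri)
    labels = (parts.hostname or "").split(".")
    if len(labels) < 2 or labels[1] != "blob":
        raise ValueError("not a blob service URI")
    # Everything up to the first '/' in the path is the container;
    # the remainder is the blob name.
    container, _, blob = parts.path.lstrip("/").partition("/")
    return labels[0], container, blob


uri = blob_uri("cohowinery", "images", "pic01.jpg")
print(uri)
print(split_blob_uri("http://cohowinery.blob.core.windows.net/videos/2010/vid1.avi"))
```

Splitting on the first “/” only is a deliberate choice here: since the container name cannot contain that character, whatever follows it is treated as the blob name.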
Accounts and containers have naming restrictions; for example, the container name cannot contain a “/”.

There are two types of blobs supported:

+ Block Blobs - targeted at streaming workloads.
  o Each blob consists of a sequence/list of blocks.
  o Max block blob size is 200GB.
  o Commit-based Update Semantics - Modifying a block blob is a two-phase update process. First, blocks are uploaded as uncommitted blocks for a blob. Then, after they are all uploaded, the blocks to add/change/remove are committed via a PutBlockList to create the updated blob. In other words, you upload all changes and then commit them atomically.
  o Range reads can be done from any byte offset in the blob.
+ Page Blobs - targeted at random write workloads.
  o Each blob consists of an array/index of pages.
  o Max page blob size is 1TB.
  o Immediate Update Semantics - As soon as a write request for a sequential set of pages succeeds in the blob service, the write has committed, and success is returned back to the client. The update is immediate, so there is no commit step as there is for block blobs.
  o Range reads can be done from any byte offset in the blob.

Windows Azure Tables

The figure below depicts the storage concepts for Windows Azure Tables, where we have a storage account called “cohowinery” and inside of this account we created a Table called “customers” and put entities representing customers into that table, where the entities have properties like their “name”, “email”, etc. We also created a table called “winephotos” and the entities stored in that table contain properties of “PhotoID”, “Date”, etc.

[Figure: Table Storage Concepts]

The following summarizes the data model for Windows Azure Table:

+ Storage Account - All access to Windows Azure Storage is done through a storage account.
  o This is the highest level of the namespace for accessing tables.
  o An account can have many Tables.
+ Table - contains a set of entities.
Table names are scoped by the account. An application may create many tables within a storage account.
+ Entity (Row) - Entities (an entity is analogous to a "row") are the basic data items stored in a table. An entity contains a set of properties. Every entity has two properties, “PartitionKey” and “RowKey”, which together form the unique key for the entity.
  o An entity can hold up to 255 properties.
  o Combined size of all of the properties in an entity cannot exceed 1MB. This size
