Published by: Nickleus Jimenezz on Mar 16, 2013
Copyright:Attribution Non-commercial
Priority Qualities in Cloud Databases
Nickleus G. Jimenez
De La Salle University
Abstract
This paper discusses the important qualities of cloud databases. These qualities blend well with the other components of a vast cloud system. Each quality is described to justify why it is needed in a DBMS in cloud systems.
Categories and Subject Descriptors
Database, DBMS, Cloud
General Terms
Design, Management
1.0 Introduction
Cloud computing has gained popularity for deploying web applications since it delivers the computing and storage capacity that running web applications demands. By storing data in the cloud, a user can access it from any device connected to the internet. This allows people to see their data across the different devices they use in different locations, such as desktops, laptops and smartphones. The presentation of the data may change depending on the browser or client application to which the data is transmitted; the content in the cloud, however, is accessible to any internet-enabled device.
It was mentioned that web applications in the cloud are data driven, so important attention is given to the Database Management System (DBMS) of these applications to keep them functional. [1] Additionally, cloud databases must scale to large operations, provide elasticity, automate management and minimize resource consumption. These are added to the requirements that the system be fault-tolerant or reliable and highly available, so that users can get their data when they need it. [1]

The management of data in the cloud happened because of the reported benefits. One reported benefit was the economics of letting a company provide the software and hardware infrastructure to contain the data. However, there are also complications and new challenges for database systems in the cloud. The three management challenges cited in the Claremont report were limited human intervention, high-variance workloads and the variety of shared infrastructures. [2] Another problem to manage is dirty data in the databases, which is complicated by the larger scale of databases in cloud computing. [3]

There must be solutions that provide these qualities in the DBMSs of the data centers that host cloud services, so that users will not be troubled by possible failures. Database research has suggested improvements that can be made to DBMSs so that they can be used in data-centric cloud applications.
2.0 Scalability
It has been raised that traditional DBMSs are not cloud friendly, since they cannot be scaled as easily as the other components of cloud services, such as web and application servers. The claim is supported by the lack of tools and guidelines for scaling databases from a system with a few machines to one that uses thousands of machines to contain them. It can be learned from this that traditional DBMSs do not provide the capabilities to support the large scale of data cloud services have to contain and process. A traditional DBMS also has a limited number of concurrent connections available before experiencing failure, hence it may be a liability in a cloud system that requires thousands to millions of concurrent connections. The large scale of data in cloud services is allocated to the gargantuan number of possible end users of the system.
Nevertheless, cloud application frameworks such as Amazon Web Services (AWS), Microsoft Azure and Google App Engine still need DBMSs for handling the data in their applications. Cloud service providers have to use a DBMS that can be expanded to handle the great amount of workload the system has to deal with. The DBMSs must have the scalable, elastic and fault-tolerant qualities to match the other components of the system they are a part of.
The DBMSs used in cloud frameworks are not traditional DBMSs; they use what are referred to as key-value stores, the concept behind the Cloud DBMS. [4] It was noted that each framework has its own implementation of the key-value store. One system that uses key-value stores is Bigtable, which Google uses in its cloud systems to provide flexible, high-performance processing in products such as web search and Google Earth. [5]

In Cloud DBMSs, key-values become entities. Each entity is considered an independent unit of information; as a result, the units can be freely transferred to different machines. [1] Cloud DBMSs can therefore run on hundreds to thousands of machines serving as nodes in a distributed system. The machines can be regular desktop computers added to a server rack to contribute to the system as nodes. The collection of nodes can serve huge amounts of data that can scale into petabytes. [6] Each node maintains a subset of the whole database while remaining independent from the others. Key-values have proved useful for retrieving data based on primary keys. [7] In traditional DBMSs, meanwhile, the database is treated as a whole and the DBMS must guarantee data consistency. Another trait of key-value stores is that atomicity of application and user access is assured at a single-key level. [1] These characteristics of Cloud DBMSs, especially the scalability, blend well with web applications that use cloud computing technology.
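The idea of entities as independent units, each owned by exactly one node, can be illustrated with a minimal sketch. The hash-based placement and node names here are illustrative assumptions, not Bigtable's or any real system's actual scheme:

```python
import hashlib

class KeyValueCluster:
    """Toy key-value store: each entity lives on exactly one node."""
    def __init__(self, nodes):
        self.nodes = {name: {} for name in nodes}
        self.ring = sorted(nodes)

    def _node_for(self, key):
        # Deterministic placement: hash the key onto one node.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.ring[h % len(self.ring)]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        # Only the owning node is consulted; nodes stay independent.
        return self.nodes[self._node_for(key)].get(key)

cluster = KeyValueCluster(["node-a", "node-b", "node-c"])
cluster.put("user:42", {"name": "Ada"})
print(cluster.get("user:42"))  # → {'name': 'Ada'}
```

Because each key maps to a single owning node, adding machines only requires redistributing some keys, which is what makes this design scale in a way a monolithic DBMS cannot.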
3.0 Elasticity
Cloud systems and their components, including the database(s), must continue to run while nodes are being added or removed. Nodes are added or removed as the scale of the system changes. The capability of changing the scale of the system dynamically without interrupting its operation is called elasticity. The problem is that traditional DBMSs are not designed to be elastic, that is, to work on new machines on demand without reconfiguration. The removal or insertion can mean physically removing or adding a machine to the system, or simply turning a machine off or on. [1] To make databases elastic, they must be ready to migrate to other clusters of machines to share the workload without disrupting the running system. Migrating the databases, or parts of them, without halting the service is called live database migration. Efficient live migration techniques have been suggested to improve the elasticity of the DBMSs in a cloud system. One published technique is Iterative Copy, which migrates the databases on the machines on demand and at a speed that will not stop the system and its nodes from doing other tasks. [8]
Most key-value stores, like Google's Bigtable and Amazon's Dynamo, support migration of databases for fault tolerance and load balancing. People from Google claimed that Bigtable can scale with the increase in the number of machines in the system as resource demands change over time. This capability makes Bigtable elastic in distributing structured data.
3.1 Definition of Live Database Migration
Live database migration is an operation involving the migration of parts of a DBMS while the system is running. Migration has to be done during system operation to provide elasticity and to satisfy the users, who are promised use of the system at any time. Migration must not have any significant impact on the entire system if it is to be used effectively for elasticity. This implies a negligible effect on performance and minimal service interruption during migration in order to have quality service. [1]
3.2 Examples of Migration Techniques
Stop and Copy. This technique works by stopping a unit at the source DBMS node and then moving the data to the designated node. The technique is simple, but the waiting time can be significantly long. Furthermore, the entire database cache is lost when the cell is restarted at the destination node, which results in a high overhead for warming up the database cache. The overhead causes a disruption in service, making the technique unsuitable for cloud systems. [8]
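The cost of stop-and-copy can be made concrete with a small sketch. The dict-based "nodes" and field names are stand-ins for real DBMS partitions:

```python
def stop_and_copy(source, dest, cell_key):
    """Stop-and-copy: halt the cell, move its data, restart cold at dest.

    `source` and `dest` are plain dicts standing in for DBMS nodes; the
    real technique operates on database partitions over a network.
    """
    cell = source.pop(cell_key)           # service for this cell stops here
    dest[cell_key] = {"data": cell["data"],
                      "cache": {}}        # cache is lost: cold restart
    return dest[cell_key]

src = {"cell1": {"data": [1, 2, 3], "cache": {"hot_row": 1}}}
dst = {}
moved = stop_and_copy(src, dst, "cell1")
print(moved["cache"])  # → {} : the warm cache did not survive the move
```

The empty cache at the destination is exactly the warm-up overhead the paragraph above describes: every previously cached row must be re-read from disk after the restart.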
On-Demand Migration. This technique transfers minimal information during a fast stop-and-copy migration. New transactions execute at the destination DBMS once a cell (the part of the database to be migrated) is available at the destination. The technique effectively reduces service interruption, but it comes with a high post-migration overhead resulting from page faults. Recovery in the presence of failures is also complicated and requires costly synchronization between the source and destination servers. This makes the technique unideal for cloud systems, since a fault must not stop the rest of the system from running. [8]
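The post-migration page-fault overhead can be sketched as follows. The class and its counters are hypothetical illustrations, not part of any published implementation:

```python
class OnDemandMigration:
    """After a fast handover, the destination pulls pages from the source
    on first access, paying a network round trip per page fault."""
    def __init__(self, source_pages):
        self.source = dict(source_pages)   # pages still held by the old node
        self.local = {}                    # destination starts nearly empty
        self.page_faults = 0

    def read(self, page_id):
        if page_id not in self.local:      # page fault: fetch over the network
            self.page_faults += 1
            self.local[page_id] = self.source[page_id]
        return self.local[page_id]

dest = OnDemandMigration({"p1": "rows...", "p2": "rows..."})
dest.read("p1")
dest.read("p1")          # second access is served locally
print(dest.page_faults)  # → 1
```

Every cold page costs one fault after the handover, which is why the service interruption is short but the tail of degraded performance is long.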
Iterative Copy. This live database migration technique is used in a shared-storage DBMS architecture. The persistent data of a unit, sometimes called a partition, is stored in shared storage such as Network-Attached Storage (NAS) and does not need migration. The technique focuses on transferring the main-memory state of the partition so that the partition restarts 'warm' at the destination. This focus minimizes service interruption and guarantees low migration overhead, making the technique useful in cloud systems that need to satisfy a lot of demand. It also allows active transactions to continue executing at the destination during migration, and it has been noted to maximize availability. [1] These results and traits make it a live database migration technique, since it can be used in real time while the system is running. [8] The partition transferred contains the cached database state, referred to as the DB state, and the transaction state. This gives the destination node enough ability to process the data as if it were being done at the source. [1]
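The "iterative" part, copying the memory state in rounds so that each round only transfers what changed since the previous one, can be sketched like this. The round structure and the way writes are simulated are assumptions for illustration:

```python
def iterative_copy(source_state, rounds_of_writes):
    """Copy in-memory state in rounds: round 0 is a bulk copy while the
    source stays live; each later round re-copies only the keys written
    since the previous round, so the final handover delta is tiny.

    `rounds_of_writes` simulates writes arriving between rounds as lists
    of (key, value) pairs.
    """
    dest = dict(source_state)              # round 0: bulk copy, source still live
    for writes in rounds_of_writes:
        dirty = {}
        for key, value in writes:          # writes still land on the source
            source_state[key] = value
            dirty[key] = value
        dest.update(dirty)                 # next round ships only the delta
    return dest

state = {"row1": "a", "row2": "b"}
dest = iterative_copy(state, [[("row1", "a2")], [("row3", "c")]])
print(dest == state)  # → True: destination converged to the live state
```

Because only deltas are re-shipped, the source keeps serving transactions throughout, which is what lets the partition restart 'warm' at the destination.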
Independent Copy. This technique is reported to detach a unit/partition of the database from the DBMS node that contains it so that it can be transferred to another node. The transfer is done through a network, like most of the techniques discussed, since cloud systems are implemented as distributed systems. The owner of the unit can process the data in that unit. The technique also allows the system to regularly use migration as a tool for elastic load balancing. The only disruption that occurs is to the owner of the unit of data. [8]
Zephyr. This live database migration technique is used in a system designed to have nodes use their own local storage for their database partition. This can mean that a machine has its own hard disk or storage component that cannot be accessed by other machines. Zephyr uses a synchronized phase that allows both the source and destination to execute transactions simultaneously; as a result, it minimizes service interruption. Afterwards, it uses a combination of on-demand asynchronous calls to push or pull data so that the source node can complete the execution of the active transactions while Zephyr also allows the execution of new transactions at the destination node. The technique can be used in a variety of DBMS implementations because it uses standard tree-based indices and lock-based concurrency control. It is claimed that there is flexibility in selecting a destination node, since Zephyr does not rely on replication in the database layer. This makes it useful in a cloud system that has a massive amount of nodes around the world in different data centers. However, considerable performance improvement is plausible with replication. [1]
4.0 Manageability of Cloud DBMSs
Cloud DBMSs need self-management capabilities since the scale is so big that administrators cannot supervise the entire distributed system. Human administrators are limited by their physical presence; they cannot go from one node to another node in a different data center. There has to be a component of the system that manages itself so that the burdens on the administrators can be reduced. Automating the management maintains the entire system, which can have infrastructure in different data centers around the world. The data centers can have massive amounts of servers, which makes them too difficult for administrators to manage. [1] Having the system manage itself reduces the number of personnel required to maintain the system as well as the work of the administrators.
4.1 Automated System Manager
The automated manager system can be referred to as the system controller. Regardless of the name, it is an intelligent component of the cloud system that lessens the burden on administrative and maintenance personnel in managing a massive system, as well as keeping the system running smoothly. [1]
The intelligent system controller manages different aspects of the database system to automate its manageability. These aspects include the resources of a partition of the database: the resources needed to process data, the availability of the hardware, the paths to the nodes available for tasks, failure logs, failure plans and many more scenarios. The various resources shared in the system must be managed in a way that maximizes them across a gigantic number of tasks. [8]

One example of an automated controller for cloud database systems is the one used in Amazon Web Services (AWS). It is stated that it can balance load automatically so that resources are maximized. It can also ensure that there is an adequate number of nodes for the hosted application using the database(s) on the servers. It is also reported to scale the use of, and the resources allocated to, a user automatically by means of a supervising node. [9]
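The load-balancing role of such a controller can be sketched with a greedy rebalancer. The unit-at-a-time move and the threshold policy are illustrative assumptions; real controllers move whole database partitions and weigh migration cost:

```python
def rebalance(load_by_node, threshold):
    """Greedy sketch of the controller's load-distribution role: move one
    unit of load at a time from the busiest node to the idlest one until
    no node exceeds `threshold`. Assumes the average load is below the
    threshold, otherwise the loop could not terminate."""
    moves = []
    while max(load_by_node.values()) > threshold:
        hot = max(load_by_node, key=load_by_node.get)
        cold = min(load_by_node, key=load_by_node.get)
        load_by_node[hot] -= 1           # shed one unit from the hot node
        load_by_node[cold] += 1          # place it on the cold node
        moves.append((hot, cold))
    return moves

nodes = {"n1": 9, "n2": 2, "n3": 1}
rebalance(nodes, threshold=5)
print(nodes)   # no node above 5; total load unchanged
```

An automated controller runs a loop like this continuously, which is what removes the need for an administrator to watch every node.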
4.2 Roles of an Automated Manager
In terms of database systems, the priority role of the automated manager in a cloud system is load distribution. This is made even more challenging by the high-variance workloads the system has to handle. Other significant roles are ensuring the scalability and elasticity of the system. The remaining capabilities of the automated manager involve administrative operations, including monitoring the behavior and performance of the system. Some implementations, like Google Analytics, can model behavior to forecast workload spikes, thus enabling proactive actions for handling those spikes. [1]
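The forecast-then-act pattern described above can be sketched with a naive moving-average predictor. The window size, headroom factor and thresholds are all illustrative assumptions, not any system's documented policy:

```python
def forecast_next(load_history, window=3):
    """Naive moving-average forecast of the next interval's load; a
    stand-in for the behavioral models mentioned in the text."""
    recent = load_history[-window:]
    return sum(recent) / len(recent)

def should_scale_up(load_history, capacity, headroom=0.8):
    # Act proactively: add nodes before forecast load exceeds 80% capacity.
    return forecast_next(load_history) > capacity * headroom

history = [40, 70, 85, 95]   # requests/sec, trending upward
print(should_scale_up(history, capacity=100))  # → True
```

The point of forecasting rather than reacting is that new nodes can be provisioned before the spike arrives, so users never see the saturation.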
5.0 Dirty Data Management
Dirty data is understood as data that contains errors and irrelevance. [10] One capability given importance in developing cloud database systems is the ability to remove dirty data automatically. Managing data in a cloud database matters even more because there is a higher possibility of dirty data in cloud systems due to the colossal amount of data processed. The system must get rid of useless data to make room for relevant data and maximize the resources of the infrastructure. Unfortunately, traditional cleaning methods are not designed for the demands of cloud systems; implementing them may just be a liability for system efficiency. [3] Consequently, cloud DBMSs need a way to remove dirty data that has minimal impact on performance and on the quality of service to users. A data storage structure for effective and efficient dirty data management in cloud databases has been proposed as a research topic. One suggestion is a three-level index for query processing. The proper nodes that contain the dirty data are found based on the index; once a node is found to have dirty data, the identified unclean data are removed from the system. The results from experimenting on it show its practical capabilities for reducing dirty data in cloud databases. [3]
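The index-then-purge flow can be sketched as follows. The two-dict structure, the `dirty` flag and the method names are illustrative assumptions; the paper's actual three-level index design is not reproduced here:

```python
class DirtyDataIndex:
    """Sketch of index-guided dirty-data removal: a global level maps keys
    to owning nodes, a node level maps keys to records, and each record
    carries a quality flag so purging can target only the right nodes."""
    def __init__(self):
        self.global_level = {}   # key -> node name
        self.node_level = {}     # node name -> {key: record}

    def insert(self, node, key, value, dirty=False):
        self.global_level[key] = node
        self.node_level.setdefault(node, {})[key] = {"value": value,
                                                     "dirty": dirty}

    def purge_dirty(self):
        removed = 0
        for node, records in self.node_level.items():
            for key in [k for k, r in records.items() if r["dirty"]]:
                del records[key]             # drop the unclean record
                del self.global_level[key]   # keep the index consistent
                removed += 1
        return removed

idx = DirtyDataIndex()
idx.insert("node-a", "k1", "ok")
idx.insert("node-a", "k2", "???", dirty=True)
print(idx.purge_dirty())  # → 1
```

The index makes the purge targeted: only nodes known to hold flagged records are touched, which is what keeps the cleaning overhead small at cloud scale.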
6.0 Fault Tolerance
One of the desired qualities in cloud database systems is tolerating faults gracefully and allowing the system to run even when multiple faults occur at the same time. Faults are normal in a colossal cloud system, so the system has to manage them while still running, since a halt in service can harm the user experience. [11] Fault tolerance becomes vital in web applications because it allows the application to run at any time, even when problems in the system occur, hence pleasing users around the world.
In the context of transaction workloads, fault tolerance is described as the ability to recover from a failure without losing any data or updates from recently committed transactions. A distributed database must commit transactions and make progress on a workload even during a worker node failure. [12]

An example of a cloud system is AWS. Amazon claims that failures in their web service can be dealt with automatically, without disrupting the cloud applications they are hosting, as if the failure did not occur. SimpleDB is the DBMS used by the system. It is claimed to have features that make it a fault-tolerant and durable DBMS. It becomes fault tolerant since data is stored redundantly without single points of failure. When one node fails, a copy from another node is then used for the needed process. This might