Nickleus G. Jimenez
De La Salle University
nickleus_jimenz@dlsu.ph

ABSTRACT
This paper discusses the important qualities of cloud databases. These qualities blend well with the other components of a vast cloud system. The qualities are described to justify why they are needed in a DBMS deployed in cloud systems. The paper also suggests possible improvements that can be made to DBMSs so that they can be used in data-centric cloud applications.
2.0 Scalability
It was raised that traditional DBMSs are not cloud friendly since they cannot be scaled as easily as the other components of cloud services, such as web and application servers. The claim is supported by the lack of tools and guidelines for scaling databases from a system with a few machines to systems that use thousands of machines to contain them. It can be learned from this that traditional DBMSs do not provide the capabilities to support the large scale of data that cloud services have to contain and process. Traditional DBMSs also support only a limited number of concurrent connections before experiencing failure, so they may be a liability in a cloud system that requires thousands to millions of concurrent connections. The large scale of data needed in cloud services is driven by the gargantuan number of possible end users of the system. Nevertheless, cloud application frameworks such as Amazon Web Services (AWS), Microsoft Azure and Google App Engine still need DBMSs for handling the data in their applications. Cloud service providers have to use a DBMS that can be used and expanded to handle the great amount of workload the system has to deal with. The DBMSs must have the scalable, elastic and fault-tolerant qualities of the other components of the system they are a part of. The DBMSs used in cloud frameworks are not traditional DBMSs; they use what are referred to as key-value stores, the concept behind the Cloud DBMS. [4] It was noted that each framework has its own implementation of the key-value store. One system that uses a key-value store is Bigtable, which Google uses in its cloud systems to provide flexible, high-performance processing in products such as web search and Google Earth. [5] In Cloud DBMSs, key-values become entities. Each entity is considered an independent unit of information. As a result, the units can be freely transferred to different machines.
[1] Hence, cloud DBMSs can run on hundreds to thousands of machines serving as nodes in a distributed system. The machines can be regular desktop computers added to a server rack so they can contribute to the system as nodes. The collection of nodes can serve huge amounts of data that can scale into petabytes. [6] Each node maintains a subset of the whole database while remaining independent from the others. Key-value stores have proved useful for retrieving data based on primary keys. [7] In traditional DBMSs, by contrast, the database is treated as a whole and the DBMS must guarantee data consistency. Another trait of key-value stores is that atomicity of application and user access is assured at a
General Terms
Design, Management
1.0 Introduction
Cloud computing has gained popularity for deploying web applications since it delivers the computing and storage capacity those applications demand. By storing one's data in the cloud, a user can access it from any device connected to the internet. This allows people to see their data across the different devices they use in different locations, such as desktops, laptops and smartphones. The presentation of the data may change depending on the browser or client application to which the data is transmitted; however, the content in the cloud remains accessible to internet-enabled devices. It was mentioned that the web applications in the cloud are data driven, so important attention is given to the Database Management System (DBMS) of these applications to keep them functional. [1] Additionally, cloud databases must scale to large operations, provide elasticity, automate management and minimize resource consumption. These are added to the requirements that the system be fault-tolerant (reliable) and highly available so that users can get the data when they need it. [1] The management of data moved into the cloud because of its reported benefits. One benefit reported was the economics of letting a company provide the software and hardware infrastructure to hold the data. However, there are also complications and new challenges for database systems in the cloud. The three management challenges cited in the Claremont report are limited human intervention, high-variance workloads and the variety of shared infrastructures. [2] Another problem to manage is dirty data in the databases, which is complicated by the larger scale of databases in cloud computing. [3] There must be solutions that provide these qualities in the DBMSs of the data centers that host cloud services, so that users will not be troubled by possible failures. Database research has
single-key level. [1] The characteristics of Cloud DBMSs, especially scalability, blend well with web applications that use cloud computing technology.
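The key-based partitioning described above can be sketched in a few lines of Python. This is an illustrative toy, not any particular system's API: each key hashes to exactly one owning node, each node holds an independent partition of the data, and a single-key read or write touches only that node.

```python
import hashlib

class KeyValueCluster:
    """Toy sketch of a partitioned key-value store: each key maps to
    exactly one node, so single-key operations stay local and atomic."""

    def __init__(self, node_names):
        # Each node holds an independent subset (partition) of the data.
        self.nodes = {name: {} for name in node_names}
        self.names = list(node_names)

    def _owner(self, key):
        # Hash the primary key to pick the owning node.
        digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        return self.names[digest % len(self.names)]

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        return self.nodes[self._owner(key)].get(key)

cluster = KeyValueCluster(["node-a", "node-b", "node-c"])
cluster.put("user:42", {"name": "Ada"})
print(cluster.get("user:42"))  # → {'name': 'Ada'}
```

Because a key never spans nodes, guaranteeing atomicity at the single-key level requires no cross-node coordination, which is exactly why these stores scale so freely.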
3.0 Elasticity
Cloud systems and their components, including the database(s), must continue to run while nodes are being added or removed. Nodes are added or removed as the scale of the system changes. The capability to change the scale of the system dynamically without interrupting its operation is called elasticity. The problem is that traditional DBMSs are not designed to be elastic, i.e. to work on new machines on demand without reconfiguration. The removal or insertion can mean physically removing or adding a machine to the system, or simply turning a machine off or on. [1] To make the databases elastic, they must be ready to migrate to other clusters of machines to share the workload without disrupting the running system. Migrating the databases, or parts of them, without halting the service is called live database migration. Efficient live migration techniques have been suggested to improve the elasticity of DBMSs in cloud systems. One published technique is Iterative Copy, which migrates the databases in the machines on demand and at a speed that does not halt the system and its nodes from doing other tasks. [8] Most key-value stores, like Google's Bigtable and Amazon's Dynamo, support migration of databases for fault tolerance and load balancing. People from Google claimed that Bigtable can scale with the number of machines in the system as resource demands change over time. This capability makes Bigtable elastic in distributing structured data.
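One common way to make key placement elastic is consistent hashing; the sketch below is an illustrative assumption about how such placement could work, not the scheme of any system cited here. Nodes and keys hash onto the same ring, and adding a node remaps only the keys in the arc it takes over, so the rest of the cluster keeps serving undisturbed.

```python
import bisect
import hashlib

def _h(s):
    # Map a string onto the hash ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentRing:
    """Sketch of elastic key placement with consistent hashing: adding
    or removing a node remaps only the keys in the affected arc."""

    def __init__(self, nodes):
        self.ring = sorted((_h(n), n) for n in nodes)

    def owner(self, key):
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, _h(key)) % len(self.ring)
        return self.ring[i][1]

    def add_node(self, node):
        bisect.insort(self.ring, (_h(node), node))

ring = ConsistentRing(["n1", "n2", "n3"])
before = {k: ring.owner(k) for k in (f"key{i}" for i in range(1000))}
ring.add_node("n4")
after = {k: ring.owner(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
print(f"{moved} of 1000 keys moved")  # typically around a quarter, not all 1000
```

With naive modulo hashing, adding a fourth node would remap roughly three quarters of all keys; here only the keys landing on the new node's arc move, which is what keeps scale changes non-disruptive.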
On-Demand Migration. This technique transfers minimal information during a fast stop-and-copy migration. New transactions execute at the destination DBMS once a cell (the part of the database to be migrated) is available at the destination. The technique effectively reduces service interruption, but it comes with a high post-migration overhead resulting from page faults. Recovery in the presence of failures is also complicated and would require costly synchronization between the source and destination servers. This makes the technique unideal for cloud systems, since a fault must not stop the rest of the system from running. [8]

Iterative Copy. This live database migration technique is used in a shared-storage DBMS architecture. The persistent data of a unit, sometimes called a partition, is stored in shared storage such as Network Attached Storage (NAS) and does not need migration. The technique focuses on transferring the main-memory state of the partition so that the partition restarts warm at the destination. This focus minimizes service interruption and guarantees low migration overhead, making the technique useful in cloud systems that need to satisfy heavy demand. It also allows active transactions to continue executing at the destination during migration, and it is noted to maximize availability. [1] These results and traits make it a live database migration technique, since it can be used in real time while the system is running. [8] The partition transferred contains the cached database state, referred to as the DB state, and the transaction state. This gives the destination node enough information to process the data as if it were still at the source. [1]

Independent Copy. This technique is reported to detach a unit/partition of the database from the DBMS node that contains it so that it can be transferred to another node. The transfer is done through a network, like most of the techniques discussed, since cloud systems are implemented as distributed systems. The owner of the unit can process the data in that unit. The technique also allows the system to regularly use migration as a tool for elastic load balancing. The only disruption that occurs is on the owner of the unit of data. [8]

Zephyr. This live database migration technique is used in systems designed to have nodes use their own local storage for their database partitions; that is, a machine has its own hard disk or storage component that cannot be accessed by other machines. Zephyr uses a synchronized phase that allows both the source and the destination to execute transactions simultaneously, which minimizes service interruption. Afterwards, it uses a combination of on-demand asynchronous calls to push or pull data so that the source node can complete the execution of active transactions, while Zephyr also allows the execution of new transactions at the destination node. The technique can be used in a variety of DBMS implementations because it relies on standard tree-based indices and lock-based concurrency control. It is claimed that there is flexibility in selecting a destination node since Zephyr does not rely on replication in the database layer, which makes it useful in a cloud system that has a massive number of nodes spread across data centers around the world. However, considerable performance improvement is plausible with replication. [1]
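The core loop of an iterative-copy-style migration can be sketched as follows. All names here are illustrative assumptions, not the API of any system cited above: state is streamed to the destination in rounds while the source keeps serving, each round only re-copies pages dirtied during the previous one, and a final brief stop-and-copy flushes whatever little remains before ownership switches.

```python
def iterative_copy(source_state, dirty_log, max_rounds=5, stop_threshold=1):
    """Sketch of iterative-copy migration: copy in rounds while the
    source keeps serving; only the final handoff briefly pauses it."""
    destination = {}
    pending = dict(source_state)          # pages not yet copied
    for _ in range(max_rounds):
        destination.update(pending)       # copy the current snapshot
        pending = dirty_log.pop_dirty()   # pages written during the round
        if len(pending) <= stop_threshold:
            break                         # small enough for stop-and-copy
    # Final stop-and-copy: the source pauses, the last dirty pages are
    # flushed, and ownership switches to the destination node.
    destination.update(pending)
    return destination

class DirtyTracker:
    """Illustrative stand-in for write tracking during migration."""
    def __init__(self, rounds):
        self.rounds = list(rounds)
    def pop_dirty(self):
        return self.rounds.pop(0) if self.rounds else {}

# Pages dirtied during successive copy rounds shrink each round.
tracker = DirtyTracker([{"p1": "v2", "p9": "v1"}, {"p1": "v3"}])
state = {"p1": "v1", "p2": "v1", "p9": "v0"}
dest = iterative_copy(state, tracker)
print(dest)  # → {'p1': 'v3', 'p2': 'v1', 'p9': 'v1'}
```

The service-interruption window is proportional only to the final dirty set, not to the whole partition, which is why such techniques keep the system responsive during migration.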
monitoring the behavior and performance of the system. Some implementations, like Google Analytics, can model behavior to forecast workload spikes, thus enabling proactive actions for handling those spikes. [1]
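A minimal version of such proactive spike detection might look like the toy heuristic below; it is an illustrative assumption, not the forecasting model referenced above. Recent request rates are smoothed with an exponential moving average, and a spike is flagged when the newest sample far exceeds that baseline, triggering scale-out before the system saturates.

```python
def should_scale_out(samples, alpha=0.5, spike_factor=2.0):
    """Toy proactive-scaling check: smooth the request rate with an
    exponential moving average and flag a spike when the latest sample
    far exceeds the smoothed baseline."""
    ema = samples[0]
    for s in samples[1:-1]:
        ema = alpha * s + (1 - alpha) * ema
    return samples[-1] > spike_factor * ema

print(should_scale_out([100, 110, 105, 400]))  # → True (latest ~4x baseline)
print(should_scale_out([100, 110, 105, 120]))  # → False
```

Real systems use far richer models, but the shape is the same: act on a forecast rather than waiting for the overload to happen.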
take up more space, but having other copies of the data allows the system to keep doing its job when errors occur. Furthermore, it works well with the other components in AWS, so that resources are maximized and the database can be partitioned properly. [9]
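The trade-off described here, extra storage in exchange for availability, can be sketched as follows; this is an illustrative toy, not AWS's implementation. Every write goes to all replicas, and a read falls back to the next replica when one is down, so a single failure does not make the data unreachable.

```python
class ReplicatedStore:
    """Sketch of replication for fault tolerance: writes fan out to all
    replicas; reads fall back past failed replicas."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.down = set()          # indices of failed replicas

    def put(self, key, value):
        for i, replica in enumerate(self.replicas):
            if i not in self.down:
                replica[key] = value

    def get(self, key):
        for i, replica in enumerate(self.replicas):
            if i not in self.down and key in replica:
                return replica[key]
        raise KeyError(key)        # every live replica is missing the key

store = ReplicatedStore()
store.put("order:7", "shipped")
store.down.add(0)                  # simulate a failed replica
print(store.get("order:7"))        # → shipped (served by a surviving replica)
```

The data occupies three times the space, but any two replicas can fail without losing availability of reads, which is the point being made above.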
Structured Data," in Seventh Symposium on Operating System Design and Implementation, Berkeley, 2006.
[6] C.-R. Chang, L.-Y. Ho, J.-J. Wu and P. Liu, "SQLMR: A Scalable Database Management System for Cloud Computing," in International Conference on Parallel Processing 2011, Taipei, 2011.
[7] G. Chen, H. T. Vo, S. Wu, B. C. Ooi and T. Özsu, "A Framework for Supporting DBMS-like Indexes in the Cloud," in 37th International Conference on Very Large Data Bases, Seattle, 2011.
[8] S. Das, S. Nishimura, D. Agrawal and A. E. Abbadi, "Live Database Migration for Elasticity in a Multitenant Database for Cloud Platforms," in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, New York, 2011.
[9] J. Barr, A. Narin and J. Varia, "Building Fault-Tolerant Applications on AWS," Amazon, October 2011. [Online]. Available: http://media.amazonwebservices.com/AWS_Building_Fault_Tolerant_Applications.pdf. [Accessed 11 August 2012].
[10] Webopedia, "What is Dirty Data?," 2012. [Online]. Available: http://www.webopedia.com/TERM/D/dirty_data.html. [Accessed 10 August 2012].
[11] S. Das, S. Agarwal, D. Agrawal and A. E. Abbadi, "ElasTraS: An Elastic, Scalable, and Self-Managing Transactional Database for the Cloud," UCSB, California, 2010.
[12] D. J. Abadi, "Data Management in the Cloud: Limitations and Opportunities," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 32, no. 1, pp. 3-12, 2009.
Conclusion
Cloud DBMSs have to differ from traditional DBMSs due to the changed scenarios of their respective systems. Databases for cloud systems may have introduced new problems, but new solutions from various researchers and companies were produced in order to bring the technology and its benefits to users connected to the internet. The priority traits of a cloud database are the same as those of the other components in a cloud system: scalability, elasticity, easy maintenance and fault tolerance.
Acknowledgements
Many thanks to Remedios Bulos for reviewing this paper and providing guidelines and suggestions for writing it. Thanks also to De La Salle University for providing the resources needed to obtain the references for this paper.
References
[1] D. Agrawal, A. E. Abbadi, S. Das and A. J. Elmore, "Database Scalability, Elasticity, and Autonomy in the Cloud," in DASFAA '11: Proceedings of the 16th International Conference on Database Systems for Advanced Applications, Berlin, 2011.
[2] R. Agrawal et al., "The Claremont Report on Database Research," 2011. [Online]. Available: http://f1.grp.yahoofs.com/v1/QEkmUCwrrJLIjoaZRbsTn5DmvinvQYlpnMAJaTAWkvmDt5QNvWVARbTF9PuGRfKLQOjSwEtf02Cmwmm2c_SXWnR2AGaBw/ADVANDB/Research%20Papers/The%20Claremont%20Report%20on%20Database%20Research.pdf. [Accessed 1 June 2012].
[3] H. Wang, J. Li, J. Wang and H. Gao, "Dirty Data Management in Cloud Database," in Grid and Cloud Database Management, S. Fiore and G. Aloisio, Eds., Berlin: Springer-Verlag, 2011, pp. 133-150.
[4] The Art of Service, "Data Base Management Systems (DBMS) as a Cloud Service," 10 June 2011. [Online]. Available: http://artofservice.com.au/data-base-management-systems-dbms-as-a-cloud-service/. [Accessed 4 August 2012].
[5] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes and R. E. Gruber, "Bigtable: A Distributed Storage System for