Priority Qualities in Cloud Databases

Published by: Nickleus Jimenezz on Mar 16, 2013
Copyright:Attribution Non-commercial


 
Nickleus G. Jimenez
De La Salle University
nickleus_jimenz@dlsu.ph
ABSTRACT
 
This paper discusses the important qualities of cloud databases. These qualities blend well with the other components of a vast cloud system. The qualities are described to justify why they are needed in a DBMS in cloud systems.
Categories and Subject Descriptors
 
Database, DBMS, Cloud
 
General Terms
 
Design, Management
 
1.0 Introduction
 
Cloud computing has gained popularity for deploying web applications because it delivers the computing and storage capacity that running web applications demands. By storing data in the cloud, users can access it from any of their devices connected to the internet. This allows people to see their data across the different devices, such as desktops, laptops and smartphones, that they use in different locations. The presentation of the data may change depending on the browser or client application to which the data is transmitted. However, the content in the cloud remains accessible to all internet-enabled devices.
 
It was mentioned that web applications in the cloud are data driven, so significant attention is given to the Database Management System (DBMS) of these applications to keep them functional. [1]

Additionally, cloud databases must scale to large operations, provide elasticity, automate management and minimize resource consumption. These requirements come on top of the requirement that the system be fault-tolerant, or reliable, and highly available so that users can get the data when they need it. [1]

The management of data in the cloud came about because of its reported benefits. One reported benefit was the economics of letting a company provide the software and hardware infrastructure to contain data. However, there are also complications and new challenges for database systems in the cloud. The three management challenges cited in the Claremont report were limited human intervention, high-variance workloads and the variety of shared infrastructures. [2] Another problem to manage is dirty data in the databases, which is complicated by the larger scale of databases in cloud computing. [3]

There must be solutions that provide these qualities in the DBMSs of data centers that host cloud services so that users are not troubled by possible failures. Database research has suggested possible improvements that can be made to DBMSs so that they can be used in data-centric cloud applications.
 
2.0 Scalability
It was raised that traditional DBMSs are not cloud friendly since they cannot be scaled as easily as the other components of cloud services, such as web and application servers. The claim is supported by the lack of tools and guidelines for scaling databases from a system with a few machines to a system that uses thousands of machines to contain them. It follows that traditional DBMSs do not provide the capabilities to support the large scale of data that cloud services have to contain and process. Traditional DBMSs also support a limited number of concurrent connections before experiencing failure; hence they may be a liability if used in a cloud system that requires thousands to millions of concurrent connections. The large scale of data needed in cloud services is attributed to the gargantuan number of possible end users of the system.
 
Nevertheless, cloud application frameworks such as Amazon Web Services (AWS), Microsoft Azure and Google App Engine still need DBMSs for handling the data in the applications. Cloud service providers have to use a DBMS that can be used and expanded to handle the great amount of workload the system has to deal with. The DBMSs must have scalable, elastic and fault-tolerant qualities to match the other components of the system they are a part of.
 
The DBMSs used in cloud frameworks are not traditional DBMSs; they use what are referred to as key-value stores. This is the concept behind the cloud DBMS. [4] It was noted that each framework has its own implementation of the key-value store. One system that uses a key-value store is Bigtable, which is used by Google in its cloud systems to provide flexible, high-performance processing in Google products such as web search and Google Earth. [5] In cloud DBMSs, key-value pairs become entities. Each entity is considered an independent unit of information. As a result, the units can be freely transferred to different machines. [1] Hence, cloud DBMSs can run on hundreds to thousands of machines that serve as nodes in a distributed system. The machines can be regular desktop computers that are added to a server rack so that they can contribute to the system as nodes. The collection of nodes can serve huge amounts of data that can scale into petabytes. [6] Each node maintains a subset of the whole database while still being independent from the others. Key-value stores have proved to be useful in retrieving data based on primary keys. [7] Meanwhile, in traditional DBMSs the database is treated as a whole and the DBMS must guarantee data consistency. Another trait of key-value stores is that atomicity of application and user access is assured at a single-key level. [1] The characteristics of cloud DBMSs, especially scalability, blend well with web applications that use cloud computing technology.
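The key-value model described above can be illustrated with a minimal sketch. It assumes a toy in-memory cluster in which a hash of the primary key selects the owning node and a per-node lock makes each single-key operation atomic. The class name `KeyValueCluster` and its methods are hypothetical and do not correspond to the API of Bigtable or any other system cited here.

```python
import hashlib
import threading


class KeyValueCluster:
    """Toy key-value store that spreads independent entities across nodes."""

    def __init__(self, node_count):
        # Each node holds only its own subset of the whole database.
        self.nodes = [{} for _ in range(node_count)]
        # One lock per node keeps every single-key operation atomic.
        self.locks = [threading.Lock() for _ in range(node_count)]

    def node_for(self, key):
        # A stable hash of the primary key picks the owning node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):
        index = self.node_for(key)
        with self.locks[index]:
            self.nodes[index][key] = value

    def get(self, key, default=None):
        index = self.node_for(key)
        with self.locks[index]:
            return self.nodes[index].get(key, default)

    def update(self, key, func, default=None):
        # Read-modify-write on one key under the owning node's lock.
        index = self.node_for(key)
        with self.locks[index]:
            new_value = func(self.nodes[index].get(key, default))
            self.nodes[index][key] = new_value
            return new_value


cluster = KeyValueCluster(node_count=4)
cluster.put("user:42", {"name": "Ana"})
cluster.update("visits:42", lambda count: count + 1, default=0)
print(cluster.get("user:42"), cluster.get("visits:42"))
```

Because every operation touches exactly one key on one node, no coordination between nodes is needed, which is the property that lets such a store grow by adding machines. Operations spanning several keys get no atomicity guarantee in this sketch.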
 
3.0 Elasticity
 
Cloud systems and their components, including the databases, must continue to run when nodes are being added or removed. Nodes are added or removed as the scale of the system changes. The capability of changing the scale of the system dynamically without interrupting its operation is called elasticity. The problem is that traditional DBMSs are not designed to be elastic, that is, to work on new machines on demand without reconfiguration. Removal or insertion can mean physically removing or adding a machine in the system, or simply turning the machine off or on. [1] To make the databases elastic, they must be ready to migrate to other clusters of machines to share the workload without disrupting the running system. Migrating the databases or parts of them without halting the service is called live database migration. Efficient live migration techniques have been suggested to improve the elasticity of the DBMSs in the cloud system. One published technique is Iterative Copy, which migrates the databases in the machines on demand and at a speed that will not halt the system and its nodes from doing other tasks. [8]
 
Most key-value stores, like Google's Bigtable and Amazon's Dynamo, support migration of databases for fault tolerance and load balancing. People from Google claimed that Bigtable can scale as machines are added to the system when resource demands change over time. This capability makes Bigtable elastic in distributing structured data.
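One well-known way to add or remove nodes while moving only a small share of the data is consistent hashing. The sketch below is an illustration of elasticity under that assumption; this paper does not describe the partitioning scheme of any particular system, and the class `HashRing`, its `replicas` parameter and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(text):
    # Stable hash so that placement is identical on every run.
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring: each node owns several points on the ring."""

    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas
        self.ring = []  # sorted list of (point, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [entry for entry in self.ring if entry[1] != node]

    def owner(self, key):
        # The first ring point at or after the key's hash owns the key.
        position = bisect.bisect(self.ring, (_hash(key), ""))
        return self.ring[position % len(self.ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {key: ring.owner(key) for key in keys}
ring.add_node("node-d")  # scale out while the ring stays usable
after = {key: ring.owner(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Only the keys that now belong to the new node change owner; every other key stays where it was, so the rest of the system can keep serving requests during the change.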
2.1 Definition of Live Database Migration
 
Live database migration is an operation involving the migration of parts of a DBMS while the system is running. Migration has to be done during system operation to provide elasticity and to satisfy the users who are promised use of the system at any time. To be used effectively for elasticity, migration must not have any significant impact on the entire system. This suggests that quality service requires a negligible effect on performance and minimal service interruption during migration. [1]
 
2.2 Example of Migration Techniques
 
Stop and Copy. It works by stopping a unit at the source DBMS node and then moving the data to the designated node. The technique is simple, but the waiting time can be significantly long. Furthermore, the entire database cache is lost when the cell is restarted at the destination node. This results in a high overhead for warming up the database cache. The overhead causes a disruption in service, thus making the technique unsuitable for cloud systems. [8]
 
On Demand migration. This technique transfers minimal information during a fast stop and copy migration. New transactions execute at the destination DBMS once a cell (the part of the database to be migrated) is available at the destination. This technique effectively reduces service interruption, but it comes with a high post-migration overhead resulting from page faults. Another problem is that recovery in the presence of failures is complicated and would require costly synchronization between the source and destination servers. This makes the technique unsuitable for cloud systems, since a fault must not stop the rest of the system from running. [8]
 
Iterative Copy. This live database migration technique is used in a shared-storage DBMS architecture. The persistent data of a unit, which is sometimes called a partition, is stored in shared storage such as Network Attached Storage (NAS) and does not need migration. This technique focuses on transferring the main memory state of the partition so that the partition restarts 'warm' at the destination. This focus minimizes service interruption and guarantees low migration overhead, making the technique useful in cloud systems that need to satisfy a lot of demand. The technique also allows transactions that are active during migration to continue executing at the destination, and it is noted to maximize availability. [1] These results and traits make it a live database migration technique, since it can be used in real time while the system is running. [8] The transferred partition state contains the cached database state, which is referred to as the DB state, and the transaction state. This gives the destination node enough information to process the data as if the processing were being done at the source node. [1]
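The idea of Iterative Copy can be sketched as follows, as a simplified model rather than the published algorithm. It assumes that the main-memory state is a dictionary of cached pages, that the source tracks which pages change during each copy round, and that migration ends with a brief handover once the remaining delta is small. The names `Partition`, `iterative_copy`, `workload` and `threshold` are hypothetical.

```python
class Partition:
    """Main-memory state of a database partition on one node (toy model)."""

    def __init__(self):
        self.cache = {}      # page id -> cached contents
        self.dirty = set()   # pages changed since the last copy round

    def write(self, page, value):
        self.cache[page] = value
        self.dirty.add(page)


def iterative_copy(source, destination, workload, max_rounds=10, threshold=1):
    """Copy the cache in rounds while the source keeps serving writes.

    `workload` is a list of write batches; one batch arrives during each
    round to imitate transactions that run while the copy is in flight.
    Returns the number of pages shipped during the final brief handover.
    """
    source.dirty = set(source.cache)  # the first round copies everything
    for round_number in range(max_rounds):
        to_copy, source.dirty = source.dirty, set()
        for page in to_copy:
            destination.cache[page] = source.cache[page]
        # Writes that arrive during this round dirty some pages again.
        if round_number < len(workload):
            for page, value in workload[round_number]:
                source.write(page, value)
        if len(source.dirty) <= threshold:
            break
    # Handover: the source pauses only long enough to ship the last delta.
    final_delta = len(source.dirty)
    for page in source.dirty:
        destination.cache[page] = source.cache[page]
    source.dirty = set()
    return final_delta


source, destination = Partition(), Partition()
for page in range(100):
    source.write(page, f"v0-{page}")
workload = [[(page, "v1") for page in range(10)], [(3, "v2")]]
handover_pages = iterative_copy(source, destination, workload)
print(handover_pages, destination.cache == source.cache)
```

In this run the destination receives all 100 pages while the source is still serving writes, and only one page remains for the handover. With Stop and Copy, by contrast, the destination would start with an empty cache and the unit would be unavailable for the whole transfer.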
 
Independent Copy. It is reported to detach a unit/partition of the database from the DBMS node that contains it so that it can be transferred to another node. The transfer is done through a network, like most of the techniques discussed, since cloud systems are implemented as distributed systems. The owner of the unit can process the data in that unit. The technique also allows the system to regularly use migration as a tool for elastic load balancing. The only disruption that occurs is on the owner of the unit of data. [8]
 
Zephyr. This live database migration technique is used in a system designed so that nodes use their own local storage for storing their database partition. This can mean that a machine has its own hard disk or storage component that cannot be accessed by other machines. It uses a synchronized phase that allows both the source and the destination to execute transactions simultaneously. As a result, Zephyr minimizes service interruption. Afterwards, it uses a combination of on-demand asynchronous calls to push or pull data so that the source node can complete the execution of the active transactions. Meanwhile, Zephyr also allows the execution of new transactions at the destination node. The technique can be used in a variety of DBMS implementations because it uses standard tree-based indices and lock-based concurrency control. It is claimed that there is flexibility in selecting a destination node since Zephyr does not rely on replication in the database layer. This makes it useful in a cloud system that has a massive number of nodes around the world in different data centers. However, considerable performance improvement is plausible with replication. [1]
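The phase in which source and destination both execute transactions can be pictured with the toy model below. It is an assumption-laden sketch and not Zephyr's actual protocol: it assumes that every page has exactly one owner at a time, that a new transaction at the destination pulls a page on demand, and that the remaining pages are pushed once the source has finished its active transactions. The class `ZephyrMigration` and its methods are hypothetical names.

```python
class ZephyrMigration:
    """Toy model of page ownership during a Zephyr-style migration."""

    def __init__(self, pages):
        self.source = dict(pages)   # pages still owned by the source node
        self.destination = {}       # pages already owned by the destination
        self.pulled_on_demand = 0

    def read_at_destination(self, page_id):
        # A new transaction touches a page: pull it on demand if needed.
        if page_id not in self.destination:
            self.destination[page_id] = self.source.pop(page_id)
            self.pulled_on_demand += 1
        return self.destination[page_id]

    def read_at_source(self, page_id):
        # Transactions that were already active may only use pages the
        # source still owns; a migrated page can no longer be served here.
        if page_id not in self.source:
            raise KeyError(f"page {page_id} already migrated")
        return self.source[page_id]

    def finish(self):
        # Once the source's active transactions are done, push the rest.
        pushed = len(self.source)
        self.destination.update(self.source)
        self.source.clear()
        return pushed


migration = ZephyrMigration({1: "a", 2: "b", 3: "c"})
migration.read_at_source(1)        # an active transaction on the source
migration.read_at_destination(2)   # a new transaction pulls page 2
pushed = migration.finish()        # pages 1 and 3 are pushed at the end
print(migration.pulled_on_demand, pushed, sorted(migration.destination))
```

Neither node has to stop serving requests at any point in this model, which is how the technique keeps service interruption low without a shared storage layer.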
 
3.0 Manageability of Cloud DBMS
 
Cloud DBMSs need self-management capabilities since the scale is so big that administrators cannot supervise the entire distributed system. Human administrators are limited by their physical presence. They cannot go from one node to another node in a different data center. There has to be a component of the system that manages it automatically so that the burden on the administrators is reduced. Automating management maintains the entire system, which can have infrastructure in different data centers around the world. The data centers can have massive numbers of servers, which makes manual supervision too difficult for administrators. [1] Having the system manage itself reduces the number of personnel required to maintain the system as well as the work of the administrators.
3.1 Automated System Manager
 
The automated system manager can be referred to as the system controller. Regardless of the name, it is an intelligent component of the cloud system that lessens the burden on administrative and maintenance personnel in managing a massive system while keeping the system running smoothly. [1]
 
The intelligent system controller handles different aspects of the database system in order to automate its management. These aspects include the resources of a database partition, the resources needed to process data, the availability of the hardware, the paths to the nodes available for tasks, failure logs, failure plans and many more scenarios. The various resources shared in the system must be managed in a way that makes the most of them when performing a gigantic number of tasks. [8]

One example of an automated controller for cloud database systems is the one used in Amazon Web Services (AWS). It is stated that it can balance load automatically so that resources are used to the fullest. It can also ensure that there is an adequate number of nodes for the hosted application using the databases in the servers. It is also reported to scale the usage and resources allocated to a user automatically by means of a supervising node. [9]
 
3.2 Roles of an Automated Manager
 
In terms of database systems, the priority role of the automated manager in a cloud system is load distribution. This is made even more challenging by the high-variance workloads the system has to handle. Other significant roles are ensuring the scalability and elasticity of the system. The other capabilities of the automated manager involve administrative operations. These operations include monitoring the behavior and performance of the system. Some implementations, like Google Analytics, can model behavior to forecast workload spikes, thus enabling proactive actions for handling those spikes. [1]
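Load distribution can be illustrated with a simple greedy placement. This is a minimal sketch that assumes the load of each partition is known as a single number; it is not the policy of AWS or of any controller cited here, and the function name `distribute` and the partition names are hypothetical.

```python
import heapq


def distribute(partition_loads, node_count):
    """Greedy placement: heaviest partition first, onto the least-loaded node."""
    heap = [(0.0, node) for node in range(node_count)]  # (current load, node id)
    heapq.heapify(heap)
    placement = {}
    ordered = sorted(partition_loads.items(), key=lambda item: -item[1])
    for name, load in ordered:
        node_load, node = heapq.heappop(heap)  # least-loaded node so far
        placement[name] = node
        heapq.heappush(heap, (node_load + load, node))
    return placement


loads = {"p1": 50, "p2": 30, "p3": 20, "p4": 10}
placement = distribute(loads, node_count=2)
node_totals = [0, 0]
for name, node in placement.items():
    node_totals[node] += loads[name]
print(placement, node_totals)
```

With high-variance workloads the measured loads change over time, so a controller following this idea would have to rerun the placement periodically and migrate the partitions whose assignment changed.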
4.0 Dirty Data Management
 
Dirty data is understood as data that contains errors or is irrelevant. [10] One capability that is given importance in developing cloud database systems is the ability to remove dirty data automatically. Managing dirty data matters more in a cloud database because the colossal amount of data processed makes dirty data more likely. The system must get rid of the useless data to make room for relevant data and make the most of the infrastructure's resources. Unfortunately, traditional cleaning methods are not designed for the demands of cloud systems. Implementing data cleaning may just be a liability for system efficiency. [3] Consequently, cloud DBMSs need a way to remove dirty data that will have minimal impact on performance and on the quality of service to users. A data storage structure for effective and efficient dirty data management in cloud databases has been put forward as a research topic. One suggestion is a three-level index for query processing. The nodes that contain the dirty data are found based on the index. Once a node is found to have dirty data, the identified unclean data are removed from the system. The results of experiments on it show its practical capability to reduce dirty data in cloud databases. [3]
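The benefit of locating dirty data through an index can be sketched as follows. The example uses a single-level index from node to dirty record keys, which is a deliberate simplification and not the three-level index proposed in [3]. The function names, the node names and the rule that an empty `email` field marks a record as dirty are all assumptions made for illustration.

```python
def build_dirty_index(nodes, is_dirty):
    """Map each node id to the keys of its records that fail the check."""
    index = {}
    for node_id, records in nodes.items():
        dirty_keys = [key for key, record in records.items() if is_dirty(record)]
        if dirty_keys:
            index[node_id] = dirty_keys
    return index


def clean(nodes, index):
    """Remove indexed dirty records, touching only the nodes the index names."""
    removed = 0
    for node_id, dirty_keys in index.items():
        for key in dirty_keys:
            del nodes[node_id][key]
            removed += 1
    return removed


nodes = {
    "node-a": {1: {"email": "ana@example.com"}, 2: {"email": ""}},
    "node-b": {3: {"email": "li@example.com"}},
}
index = build_dirty_index(nodes, lambda record: not record["email"])
removed = clean(nodes, index)
print(index, removed)
```

Here the cleaning step visits only `node-a`, because the index lists no dirty records for `node-b`. The sketch builds the index with a full scan for brevity; keeping it cheap to maintain at cloud scale is the hard part that the cited work addresses.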
5.0 Fault Tolerance
One of the desired qualities in cloud database systems is tolerating faults gracefully and allowing the system to run even when multiple faults occur at the same time. Faults are normal in a colossal cloud system, so the system has to manage the faults while still running, since a halt in service can harm the user experience. [11] Fault tolerance becomes vital in web applications because it allows the application to run at any time, even when problems in the system occur, hence pleasing users around the world.
 
In the context of transaction workloads, fault tolerance is described as the ability to recover from a failure without losing any data or updates from recently committed transactions. A distributed database must commit transactions and make progress on a workload even during a worker node failure. [12]

An example of such a cloud system is AWS. Amazon claims that failures in its web services can be dealt with automatically, without disrupting the cloud applications it hosts, as if the failure did not occur. SimpleDB is the DBMS used by the system. It is claimed to have features that make it a fault-tolerant and durable DBMS. It becomes fault tolerant since data is stored redundantly without single points of failure. When one node fails, a copy from another node is used for the needed process. This might
