Lizhe Wang and Gregor von Laszewski
Service Oriented Cyberinfrastructure Lab, Rochester Institute of Technology
102 Lomb Memorial Drive, Rochester, NY 14623-5608
Email: Lizhe.Wang@gmail.com, laszewski@gmail.com
Contents

1 Introduction
2 Anatomy of Cloud Computing
  2.1 Definition of Cloud computing
  2.2 Functional aspects of Cloud computing
  2.3 Why is Cloud computing distinct?
  2.4 Enabling technologies behind Cloud computing
3 Related Work
  3.1 Globus virtual workspace and Nimbus
  3.2 Violin: Virtual Inter-networking on Overlay Infrastructure
  3.3 The In-VIGO system
  3.4 Virtuoso: a system for virtual machine marketplaces
  3.5 COD: Cluster-on-Demand
  3.6 OpenNEbula
  3.7 Amazon Elastic Compute Cloud
  3.8 Eucalyptus
4 Cloud Computing and Grid computing: a Comparative Study
5 Cumulus: a Scientific Computing Cloud
  5.1 Overview
  5.2 Cumulus frontend: re-engineering of the Globus Virtual Workspace Service
  5.3 OpenNEbula as the Cumulus backend
  5.4 Communication between the Cumulus frontend and the OpenNEbula
  5.5 OS Farm for virtual machine image management
  5.6 Networking solution
6 On-demand Cyberinfrastructure Provisioning
  6.1 Overview
  6.2 System Design
  6.3 The Control Service
  6.4 The Cyberaide Toolkit
7 Conclusion and outlook
Scientific Cloud Computing: Early Definition and Experience

October 26, 2008

Abstract

Cloud computing emerges as a new computing paradigm which aims to provide reliable, customized and QoS guaranteed dynamic computing environments for end users. This paper reviews our early experience of Cloud computing based on the Cumulus project on compute centers. Based on our experience, we attempt to contribute to the concept of Cloud computing: definition, functionality, enabling technology and typical applications. We also give a comparison of Grid computing and Cloud computing, as we are interested in building scientific Clouds with existing Grid infrastructures. We introduce the Cumulus project with its various aspects, such as design pattern, infrastructure and middleware. A sample use scenario is also shown to justify the concept of the Cumulus project.

1 Introduction

Cloud computing, a term which was coined in late 2007, currently emerges as a hot topic due to its abilities to offer flexible dynamic IT infrastructures, QoS guaranteed computing environments and configurable software services. As reported in the Google trends shown in Figure 1, Cloud computing (the blue line), which is enabled by virtualization technology (the yellow line), has already outpaced Grid computing (the red line).

Figure 1: Cloud computing in Google trends

Numerous projects within industry and academia have already started, for example the RESERVOIR project – an IBM and European Union joint research initiative for Cloud computing, IBM's Blue Cloud, scientific Cloud projects such as Nimbus and Stratus, and OpenNEbula. HP, Intel Corporation and Yahoo! Inc. recently announced the creation of a global, multi-data center, open source Cloud computing test bed for industry, research and education.

There are still no widely accepted definitions for Cloud computing, albeit the Cloud computing practice has attracted much attention. Several reasons lead to this situation:

• Cloud computing involves researchers and engineers from various backgrounds, e.g., Grid computing, software engineering and databases. They work on Cloud computing from different viewpoints.
• Technologies which enable Cloud computing are still evolving and progressing, for example Web 2.0 and Service Oriented Computing.
• Existing computing Clouds still lack large scale deployment and usage, which would finally justify the concept of Cloud computing.

In this paper we present a scientific Cloud computing project, which is intended to build a scientific cloud for data centers by merging existing Grid infrastructures with new Cloud technologies. The remaining parts of this paper are organized as follows. Section 2 discusses the concept of Cloud computing in terms of definition, characterization and enabling technologies. Related research efforts are reviewed in Section 3. We compare Grid computing and Cloud computing in Section 4. In Section 5 the scientific Cloud computing project is presented, and a typical use scenario is shown in Section 6. Section 7 concludes the paper.
2 Anatomy of Cloud Computing

2.1 Definition of Cloud computing

Cloud computing is becoming one of the next IT industry buzz words: users move their data and applications out to the remote "Cloud" and then access them in a simple and pervasive way. This is again a central processing use case. A similar scenario occurred around 50 years ago: a time-sharing computing server served multiple users. Until around 20 years ago, when personal computers came to us, data and programs were mostly located in local resources. Certainly the current Cloud computing paradigm is not a recurrence of that history: 50 years ago we had to adopt time-sharing servers due to limited computing resources, whereas nowadays Cloud computing comes into fashion due to the need to build complex IT infrastructures. Users have to manage various software installations, configurations and updates. Computing resources and other hardware are prone to be outdated very soon. Therefore outsourcing computing platforms is a smart solution for users to handle complex IT infrastructures.

At the current stage, Cloud computing is still evolving and there exists no widely accepted definition. Based on our experience, we propose an early definition of Cloud computing as follows:

A computing Cloud is a set of network enabled services, providing scalable, QoS guaranteed, normally personalized, inexpensive computing platforms on demand, which could be accessed in a simple and pervasive way.

2.2 Functional aspects of Cloud computing

Conceptually, users acquire computing platforms or IT infrastructures from computing Clouds and then run their applications inside them. Therefore, computing Clouds render users with services to access hardware, software and data resources, and thereafter an integrated computing platform as a service, in a transparent way:

• HaaS: Hardware as a Service
The term Hardware as a Service was coined possibly in 2006. As the result of rapid advances in hardware virtualization, IT automation and usage metering & pricing, users could buy IT hardware, or even an entire data center, as a pay-as-you-go subscription service. The HaaS is flexible, scalable and manageable to meet users' needs. Examples could be found at Amazon EC2, IBM's Blue Cloud project, Nimbus, Eucalyptus and Enomalism.

• SaaS: Software as a Service
Software or an application is hosted as a service and provided to customers across the Internet. This mode eliminates the need to install and run the application on the customer's local computers. SaaS therefore alleviates the customer's burden of software maintenance, and reduces the expense of software purchases by on-demand pricing. An early example of the SaaS is the Application Service Provider (ASP). The ASP approach provides subscriptions to software that is hosted or delivered over the Internet. Microsoft's "Software + Service" shows another example: a combination of local software and Internet services interacting with one another. Google's Chrome browser gives an interesting SaaS scenario: a new desktop could be offered, through which applications can be delivered (either locally or remotely) in addition to the traditional Web browsing experience.

• DaaS: Data as a Service
Data in various formats and from multiple sources could be accessed via services by users on the network. Users could, for example, manipulate the remote data just as they operate on a local disk, or access the data in a semantic way on the Internet. Amazon Simple Storage Service (S3) provides a simple Web services interface that can be used to store and retrieve, as declared by Amazon, any amount of data, at any time, from anywhere on the Web. ElasticDrive is a distributed remote storage application which allows users to mount a remote storage resource such as Amazon S3 as a local storage device. The DaaS could also be found at some popular IT services, e.g., Google Docs and Adobe Buzzword.

Based on the support of the HaaS, SaaS and DaaS, Cloud computing in addition can deliver the Platform as a Service (PaaS) for users. Users thus can subscribe on demand to their favorite computing platforms with requirements of hardware configuration, software installation and data access demands. The Google App Engine is an interesting example of the PaaS: it enables users to build Web applications with Google's APIs and SDKs on the same scalable systems which power the Google applications. Figure 2 shows the relationship between the services.
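The pay-as-you-go idea behind HaaS reduces to metering resource usage against a rate table. The following sketch is illustrative only: the resource names and hourly rates are invented for the example and do not correspond to any provider's actual price list.

```python
# Pay-as-you-go metering sketch: each usage record is (resource, hours);
# the bill is the rate-weighted sum. Rates below are invented examples.

HOURLY_RATES = {"vm.small": 0.10, "vm.large": 0.40}  # hypothetical prices

def bill(usage_records):
    """Sum the hourly charges for a list of (resource, hours) records."""
    total = 0.0
    for resource, hours in usage_records:
        total += HOURLY_RATES[resource] * hours
    return round(total, 2)

# A user runs one small instance for 24 hours and one large one for 2 hours.
print(bill([("vm.small", 24), ("vm.large", 2)]))  # -> 3.2
```

The point of the sketch is that the customer pays only for metered consumption, with no upfront hardware purchase.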
2.3 Why is Cloud computing distinct?

Cloud computing distinguishes itself from other computing paradigms, like Grid computing, Global computing and Internet computing, in the following aspects:

• User-centric interfaces
Cloud services should be accessed with simple and pervasive methods. In fact, Cloud computing adopts the concept of Utility computing. In other words, users obtain and employ computing platforms in computing Clouds as easily as they access a traditional public utility (such as electricity, water, natural gas, or the telephone network). In detail, the Cloud services enjoy the following features:
– The Cloud interfaces do not force users to change their working habits and environments, e.g., programming language, compiler and operating system. This feature distinguishes Cloud computing from Grid computing, as Grid users have to learn new Grid commands & APIs to access Grid resources & services.
– The Cloud client software which is required to be installed locally is lightweight. For example, the Nimbus Cloudkit client size is around 15MB.
– Cloud interfaces are location independent and can be accessed by well established interfaces like the Web services framework and the Internet browser.

• On-demand service provisioning
Computing Clouds provide resources and services for users on demand. Users can customize and personalize their computing environments later on, for example through software installation and network configuration, as users usually own administrative privileges.

• QoS guaranteed offer
The computing environments provided by computing Clouds can guarantee QoS for users, e.g., hardware performance like CPU speed, I/O bandwidth and memory size. The computing Cloud renders QoS in general by processing a Service Level Agreement (SLA) with users – a negotiation on the levels of availability, serviceability, performance, operation, or other attributes of the service like billing and even penalties in the case of violation of the SLA.

• Autonomous system
The computing Cloud is an autonomous system and is managed transparently to users. Hardware, software and data inside Clouds can be automatically reconfigured, orchestrated and consolidated to present a single platform image, finally rendered to users.

• Scalability and flexibility
Scalability and flexibility are the most important features that drive the emergence of Cloud computing. Cloud services and computing platforms offered by computing Clouds could be scaled across various concerns, such as geographical locations, hardware performance and software configurations. The computing platforms should be flexible to adapt to various requirements of a potentially large number of users.

Figure 2: Cloud functionalities (HaaS, SaaS, DaaS and PaaS on top of Cloud resources)

2.4 Enabling technologies behind Cloud computing

A number of enabling technologies contribute to Cloud computing; several state-of-the-art techniques are identified here:

• Virtualization technology
Virtualization technologies partition hardware and thus provide flexible and scalable computing platforms. Virtual machine techniques, such as VMware and Xen, offer virtualized IT infrastructures on demand. Virtual network advances, such as VPN, support users with a customized network environment to access Cloud resources. Virtualization techniques are the bases of Cloud computing since they render flexible and scalable hardware services.
• Orchestration of service flow and workflow
Computing Clouds offer a complete set of service templates on demand, which could be composed from services inside the computing Cloud. Computing Clouds therefore should be able to automatically orchestrate services from different sources and of different types to form a service flow or a workflow transparently and dynamically for users.

• Web service and SOA
Computing Cloud services are normally exposed as Web services, which follow industry standards such as WSDL, SOAP and UDDI. The services organization and orchestration inside Clouds could be managed in a Service Oriented Architecture (SOA). A set of Cloud services furthermore could be used in a SOA application environment, thus making them available on various distributed platforms and further accessible across the Internet.

• Web 2.0
Web 2.0 is an emerging technology describing the innovative trends of using World Wide Web technology and Web design that aim to enhance creativity, information sharing, collaboration and functionality of the Web. The essential idea behind Web 2.0 is to improve the interconnectivity and interactivity of Web applications. The new paradigm to develop and access Web applications enables users to access the Web more easily and efficiently. Cloud computing services are in nature Web applications which render desirable computing services on demand. It is thus a natural technical evolution that Cloud computing adopts the Web 2.0 technique. Web 2.0 applications typically include some of the following features/techniques:
– CSS to separate presentation and content,
– Folksonomies (collaborative tagging, social classification, indexing & social tagging),
– Semantic Web technologies,
– REST, XML and JSON-based APIs,
– innovative Web development techniques such as Ajax,
– XHTML and HTML markup,
– syndication, aggregation and notification of Web data with RSS or Atom feeds,
– mashups, merging content from different sources, client- and server-side,
– weblog publishing tools,
– wikis to support user-generated content, and
– tools to manage users' privacy on the Internet.

A Mashup is a Web application that combines data from more than one source into a single integrated tool. SmugMug, a digital photo sharing Web site, is an example of a Mashup: it allows the upload of an unlimited number of photos for all account types, provides a published API which allows programmers to create new functionality, and supports XML-based RSS and Atom feeds.

• World-wide distributed storage system
A Cloud storage model should foresee:
– A network storage system, which is backed by distributed storage providers (e.g., data centers), offers storage capacity for users to lease. The data storage could be migrated, merged, and managed transparently to end users for whatever data formats. Examples are the Google File System and Amazon S3.
– A distributed data system which provides data sources accessed in a semantic way. Users could locate data sources in a large distributed environment by logical name instead of physical location. The Virtual Data System (VDS) is a good reference.

• Programming model
Users drive into the computing Cloud with data and applications. Some Cloud programming models should be proposed for users to adapt to the Cloud infrastructure. For the simplicity and easy access of Cloud services, the Cloud programming model, however, should not be too complex or too innovative for end users. MapReduce [6, 7] is a programming model and an associated implementation for processing and generating large data sets across the Google worldwide infrastructures. The MapReduce model firstly involves applying a "map" operation to some data records – a set of key/value pairs – and then processes a "reduce" operation to all the values that shared the same key. The Map-Reduce-Merge method evolves the MapReduce paradigm by adding a "merge" operation. Hadoop is a framework for running applications on large clusters built of commodity hardware. It implements the MapReduce paradigm and provides a distributed file system – the Hadoop Distributed File System.
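As a concrete single-process illustration of the map and reduce steps described above – a sketch of the model, not of Google's or Hadoop's actual distributed implementations – a word count can be expressed as emitting (word, 1) pairs in the map phase, grouping by key, and summing in the reduce phase:

```python
from collections import defaultdict

# Single-process sketch of the MapReduce model: "map" emits key/value
# pairs, a shuffle groups values by key, and "reduce" folds each group.

def map_phase(records):
    for line in records:
        for word in line.split():
            yield word, 1                      # emit a (key, value) pair

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)              # group values by key
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a rose is a rose"])))
print(counts)   # -> {'a': 2, 'rose': 2, 'is': 1}
```

In a real deployment the map and reduce phases run in parallel across many nodes; the "merge" operation of Map-Reduce-Merge would add a further step joining two such reduced outputs.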
The MapReduce and the Hadoop are adopted by the recently created international Cloud computing project of Yahoo!, Intel and HP [5, 38].

3 Related Work

This section reviews current projects on Cloud computing and virtualized distributed systems.

3.1 Globus virtual workspace and Nimbus

A virtual workspace [17, 18] is an abstraction of a computing environment which is made dynamically available to authorized clients via invoking Grid services. The abstraction assigns a resource quota to the execution environment on deployment (for example processor or memory), as well as software configuration aspects of the environment (for instance the operating system installation or provided services). The Globus Virtual Workspace Service (GVWS) allows a client to dynamically deploy and manage a computing environment. The workspace could be inspected and managed through the Workspace Service operations [17, 18] (see also Figure 3):

• The Workspace Factory Service has one operation, called create, which has two required parameters: workspace metadata and a deployment request for that metadata.
• After it is created, the workspace is represented as a WSRF resource.
• The Group Service allows an authorized client to manage a set of workspaces at one time.
• The Status Service offers the interface through which a client can query the usage data that the service has collected about it.

Figure 3: Virtual workspace service

Based on the Globus virtual workspace services, a cloudkit named Nimbus has been developed to build scientific Clouds. With the Nimbus client, users could:

• browse virtual machine images inside the Cloud,
• submit their own virtual machine images to the Cloud,
• deploy virtual machines, and
• query the virtual machine status, and finally access the virtual machine.

3.2 Violin: Virtual Inter-networking on Overlay Infrastructure

Violin [15, 16] is a virtual network technology which creates high-order virtual IP networks and offers some interesting advantages:

• on-demand creation of both virtual machines and the virtual IP network which connects the virtual machines,
• customization of the virtual network topology & services,
• achievement of binary compatibility by recreating the identified runtime and network environment under which an application was originally developed, and
• containment of negative impact by isolating virtual networked resources.

Figure 4 shows a multi-layer overview of two Violins on top of a shared computing infrastructure. In the bottom layer, middleware systems federate networked resources to form a shared infrastructure. In the middle layer, the physical infrastructure spans across multiple network domains, with heterogeneous access interfaces. The top layer consists of two mutually isolated Violins. Each Violin contains its own network, operating system services, and application services customized for the application running inside it.

Figure 4: Multiple Violins on a shared Grid infrastructure

3.3 The In-VIGO system

The In-VIGO system contains three layers of virtualization in addition to the traditional Grid computing model:

• The first virtualization layer consists of virtual resource pools, for example virtual machines, virtual data, virtual applications and virtual networks, which can be connected as needed to create virtual Grids. Primitive components of a virtual computing Grid are located in this layer.
• Grid applications in the second layer are wrapped as Grid services. The implementation details of these mechanisms are hidden when the Grid enabled applications are encapsulated and composed as Grid services. In contrast to the traditional process of allocating applications to resources, managing the execution of the underlying Grid applications is decoupled from the process of utilizing and composing services. This means that tasks are mapped to virtual resources, and only virtual resources are managed across domains and computing infrastructures. Various Grid computing methodologies can be adopted to execute Grid applications; this layer manages jobs across administrative domains.
• The third layer aggregates Grid services by exporting their virtualized interfaces in order to enable display by different access devices. This layer therefore decouples the process of generating the interfaces of services from the process of delivering them on specific devices.

Figure 5: The In-VIGO approach

3.4 Virtuoso: a system for virtual machine marketplaces

The Virtuoso system is developed by Northwestern University and aims to create a marketplace for resource usage. Resource providers could sell their resources to consumers in the form of virtual machines and virtual networks. In the Virtuoso system, the user acquires a remote virtual machine configured with the desired processor type, memory size, operating system, compiler or software libraries. The resource consumer could install and configure whatever software is demanded on the virtual machine. A virtual network, VNET, links the resource consumer's virtual machines together efficiently and makes them appear, regardless of wherever they currently are, to be directly connected to the resource consumer's local network environment. VNET is an adaptive virtual network that can use inter virtual machine traffic pattern inference, virtual machine migration, overlay topology and routing rule manipulation, and resource reservations to optimize the performance of a distributed or parallel application running in the resource consumer's virtual machines.

3.5 COD: Cluster-on-Demand

The Cluster-on-Demand (COD) [3, 14] project implements a virtual cluster with the objective to separate the cluster usage from the cluster management. The virtual cluster consists of a subset of cluster nodes configured for some common purpose, associated with user accounts and storage resources, a user specified software environment and a private network space.
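COD's central idea – carving named virtual clusters out of a shared node pool and re-assigning nodes under programmatic control – can be sketched as below. The class and method names are our own illustration, not COD's actual interface.

```python
# Sketch of COD-style virtual clusters: nodes from a shared pool are
# assigned to named virtual clusters and can be returned and redeployed
# programmatically. Names and structure are illustrative only.

class NodePool:
    def __init__(self, nodes):
        self.free = set(nodes)
        self.clusters = {}                      # cluster name -> set of nodes

    def create_cluster(self, name, size):
        """Carve `size` free nodes out of the pool into a virtual cluster."""
        if size > len(self.free):
            raise ValueError("not enough free nodes")
        nodes = {self.free.pop() for _ in range(size)}
        self.clusters[name] = nodes
        return sorted(nodes)

    def destroy_cluster(self, name):
        """Return a virtual cluster's nodes to the shared pool."""
        self.free |= self.clusters.pop(name)

pool = NodePool([f"node{i}" for i in range(8)])
pool.create_cluster("physics", 3)
pool.create_cluster("biology", 2)
print(len(pool.free))        # -> 3
pool.destroy_cluster("physics")
print(len(pool.free))        # -> 6
```

In COD itself the reconfiguration step also reinstalls the node's software environment, which the sketch omits.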
Figure 6: A high level overview of the Virtuoso system

When a configuration is defined, it could be applied automatically to a cluster node, thus making it easy to redeploy cluster nodes among user communities and software environments under programmatic control. Two levels of administration are introduced in the virtual cluster. It is declared that the virtual cluster offers an opportunity to employ cluster resources more efficiently while providing better services to users.

Figure 7: Cluster on Demand

3.6 OpenNEbula

The OpenNEbula (formerly GridHypervisor) is a virtual infrastructure engine that enables the dynamic deployment and re-allocation of virtual machines in a pool of physical resources. The OpenNEbula system extends the benefits of virtualization platforms from a single physical resource to a pool of resources, decoupling the server not only from the physical infrastructure but also from the physical location. The OpenNEbula contains one frontend and multiple backends. The frontend provides users with access interfaces and management functions. The backends are installed on Xen servers, where Xen hypervisors are started and virtual machines could be backed. Communications between the frontend and the backends employ SSH. The OpenNEbula gives users a single access point to deploy virtual machines on a locally distributed infrastructure.

Figure 8: The OpenNEbula architecture

3.7 Amazon Elastic Compute Cloud

The Amazon Elastic Compute Cloud (EC2) provides Web service interfaces which deliver resizable compute capacities to users in the compute cloud. Users could rent and configure virtual machines and data storage easily and use the computing capacities remotely. Amazon S3 provides a safe, reliable and fast repository to store users' virtual machine images and data objects. In more detail, with Amazon EC2 users can:

• create an Amazon Machine Image (AMI) which contains applications, libraries, data and associated configuration settings, or use Amazon pre-configured, templated images to start up and run immediately,
• upload the AMI into the Amazon Simple Storage Service (S3),
• employ the Amazon EC2 Web service to configure security and network access,
• choose the type of instance that users want to execute, and
• start, shutdown, and monitor the instances of the user's AMI using Web service APIs.

3.8 Eucalyptus

The EUCALYPTUS (Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems) project from UCSD is an open-source software infrastructure for implementing Elastic/Utility/Cloud computing using computing clusters and/or workstation farms. The EUCALYPTUS provides interfaces compatible with Amazon EC2 and implements mediator services to translate users' requirements to the Cloud backends: the client-side API translator and the cloud controller. Figure 9 shows the EUCALYPTUS architecture. Multiple Rocks clusters serve as backends of the EUCALYPTUS cloud, which demand cluster controllers and node controllers. EUCALYPTUS enjoys several interesting features, such as compatible interfaces with Amazon's EC2, simple installation, deployment and management, and support of virtual private networks for users.

Figure 9: The Eucalyptus architecture (client-side API translator, cloud controller, cluster controllers, node controllers)

4 Cloud Computing and Grid computing: a Comparative Study

This section is devoted to comparing Cloud computing and Grid computing in various aspects. We make a comparative study in the following aspects:

• Definition
Grid computing, oriented from high performance distributed computing, aims to share distributed computing resources for remote job execution and for large scale problem solving. Grid computing emphasizes the resource side by making huge efforts to build an independent and complete distributed system. Cloud computing, which is oriented towards the industry, on the contrary follows an application-driven model: it provides user-centric functionalities and services for users to build customized computing environments.

• Infrastructure
Grid infrastructure enjoys the following features:
– Grid infrastructure normally contains heterogeneous resources, such as hardware/software configurations, access interfaces and management policies;
– Grid infrastructure in nature is a decentralized system, which spans across geographically distributed sites and lacks central control.
Cloud infrastructures, for example those of Google and Amazon, in general contain homogeneous resources, operated under central control. From the viewpoint of users, computing Clouds operate like a central compute server with a single access point. Cloud infrastructures could span several computing centers.

• Middleware
Grid computing has brought up full-fledged middleware, for example, Unicore, the Globus Toolkit and gLite. The Grid community has established well-defined industry standards for Grid middleware, e.g., WSRF. The Grid middleware could offer basic functionalities like resource management, security control, and monitoring & discovery. The middleware for Cloud computing, or the Cloud operating system, is still underdeveloped and lacks standards. A number of research issues remain unsolved, for example, Cloud service orchestration, distributed virtual machine deployment and management, and distributed storage management.

• Accessibility and application
Grid computing has the ambitious objective to offer dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities. Grid computing has gained numerous successful stories in many application fields. A recent famous example is the LCG project, which processes the huge data generated from the Large Hadron Collider (LHC) at CERN. However, inexperienced users still find it difficult to adapt their applications to Grid computing. Furthermore, it is not easy to get a performance guarantee from computational Grids. Cloud computing, on the contrary, could offer customized, scalable and QoS guaranteed computing environments for users with an easy and pervasive access. Cloud computing recently outpaces Grid computing because it could offer computing environments with desirable features, such as QoS guarantee and computing environment customization. Although Amazon EC2 has obtained a beautiful success, more killer applications are still needed to justify the Cloud computing paradigm.

Based on the aforementioned discussion, it could be concluded that Grid computing has established well organized infrastructures, middleware and application experience. It would be a good solution to build Cloud computing services on Grid infrastructure and middleware, thus providing users with satisfying services. It is interesting to develop computing Clouds on the existing Grid infrastructures to get the advantages of Grid middleware and applications. The next section discusses a scientific Cloud computing prototype based on the existing Grid infrastructure.

5 Cumulus: a Scientific Computing Cloud

5.1 Overview

The Cumulus project is an on-going Cloud computing project in the Service Oriented Cyberinfrastructure lab, which intends to provide virtual computing platforms for scientific and engineering applications. The Cumulus project currently runs on distributed IBM blade servers with Oracle Linux and the Xen hypervisors. We design the Cumulus system in a layered architecture (see also Figure 10):

• The Cumulus frontend service resides on the access point of a computing center and accepts users' requirements for virtual machine operations.
• The OpenNEbula works as the Local Virtualization Management System (LVMS) on the Cumulus backends; it contains one frontend and multiple backends. The OpenNEbula communicates with the Cumulus frontend via SSH or XML-RPC, and the OpenNEbula frontend communicates with its backends and the Xen hypervisors on the backends via SSH for virtual machine manipulation.
• The OS Farm is a Web server for generating and storing Xen virtual machine images and Virtual Appliances. The OS Farm is used in the Cumulus system as a tool for virtual machine template management.

Figure 10: Cumulus architecture (Cumulus frontend accessed via HTTP/HTTPS; OpenNEbula reached via SSH/XML-RPC; OS Farm server; Xen hypervisor hosts sharing an Oracle file system, spanning the virtual and physical network domains)

This design pattern could bring scalability and keep the autonomy of compute centers:

• The Cumulus frontend does not depend on any specific LVMS; we use a third party software – the OpenNEbula – as an example here.
• The Cumulus frontend can also delegate users' requirements to other computing Clouds.
• Compute centers inside computing Clouds could define their own resource management policies, such as the network solution or virtual machine allocation.

5.2 Cumulus frontend: re-engineering of the Globus Virtual Workspace Service

The Globus Virtual Workspace Service (GVWS) contains two pieces of software: the GVWS frontend and the GVWS control agents.
The GVWS frontend receives the virtual machine requirements and distributes them to various backend servers via SSH. The GVWS control agent is installed on every backend server and communicates with the Xen server in order to deploy a virtual workspace. In principle the GVWS frontend could be employed as a Cloud frontend service. However, some limitations of the GVWS are identified:

• Computer centers in general run their own specific LVMS, such as the VMware Server or VMware Virtual Infrastructure, to manage their local infrastructures. The GVWS, however, demands that the workspace control agents be installed directly on the backend servers. This use scenario provided by the GVWS lacks generality.
• The GVWS provides three network settings: the AcceptAndConfigure mode, the AllocateAndConfigure mode and the Advisory mode. However, users generally do not care about network configurations for virtual machines; they only demand the virtual machine network address or host name so that they can finally access the virtual machines across the network. We suggest that the network configurations and solutions be transparent to Cloud users and controlled only by the Cloud backends. This paradigm is more reasonable considering that the Cloud infrastructure might span multiple network domains.

The GVWS is therefore re-engineered to adapt it as the Cumulus frontend service:

• Extend the GVWS frontend and remove the control agents, thus allowing the GVWS to work with various virtual machine hypervisors and LVMS, like the OpenNEbula or the VMware Virtual Infrastructure. Originally the GVWS frontend issues its commands to its backends via SSH; for other LVMS, such as the OpenNEbula, communication methods should be developed and plugged into the Cumulus frontend.
• Support a new networking solution – the forward mode, which is not supported by the current OpenNEbula implementation. In the forward mode, users input no network configuration information; the backend servers allocate the IP address for the virtual machine and return it to the users.

5.3 OpenNEbula as the Cumulus backend

The OpenNEbula is used to manage the local distributed servers and provides virtual machine deployment. A common user, oneadmin, is authorized on the OpenNEbula frontend and all the OpenNEbula backends. Currently the OpenNEbula employs the NIS (Network Information System) to manage a common user system and the NFS (Network File System) for virtual machine image management. The shared space is used to store virtual machine images and to allow the OpenNEbula system to behave as a single virtual machine hypervisor. However, it has been widely recognized that the NIS has a major security flaw: it leaves the users' password file accessible to anyone on the entire network. The NFS also has some minor performance issues and does not support concurrency.

To employ the OpenNEbula in a more professional way, we merged the OpenNEbula with some modern secure infrastructure solutions, such as the Lightweight Directory Access Protocol (LDAP) and the Oracle Cluster File System (OCFS) or the Andrew File System (AFS). A directory is a set of objects with similar attributes organized in a logical and hierarchical manner; the LDAP is an application protocol for querying and modifying directory services over TCP/IP. The OpenLDAP software is used to implement the users' directory, and the OpenNEbula frontend and backends share directories using the OCFS.

5.4 Communication between the Cumulus frontend and the OpenNEbula

The communication between the Cumulus frontend and the OpenNEbula service could be either SSH (Secure Shell) or XML-RPC: the Cumulus frontend requires virtual machine operations, for example, by remotely invoking the OpenNEbula commands via SSH or by sending XML-RPC messages via the OpenNEbula APIs.

• communication with SSH
The OpenNEbula Command Line Interface (CLI) is the interface used when issuing SSH commands; our first attempt was thus to connect the Cumulus frontend with the OpenNEbula via SSH. The OpenNEbula identifies the virtual machines it runs with an identification number, which is assigned when a virtual machine is started in the context of the OpenNEbula and printed on the terminal as standard output. As a consequence, this identification number should be returned to the Cumulus frontend for later use, for example to pause, resume and reboot the just-created virtual machine. The SSH communication mode, however, has limitations.
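As a rough illustration of the SSH communication mode, a frontend might build a remote CLI invocation and scrape the identification number from the terminal output. The host name, user, and "ID: n" output format below are assumptions for illustration, not the actual Cumulus code.

```python
import re

def build_create_command(template_path: str) -> list:
    # Hypothetical invocation: run the OpenNEbula CLI on its frontend
    # host via SSH as the shared oneadmin user.
    return ["ssh", "oneadmin@opennebula-frontend",
            "onevm", "create", template_path]

def parse_vm_id(terminal_output: str) -> int:
    """Scrape the identification number printed on standard output.
    The 'ID: <n>' format is an illustrative assumption."""
    match = re.search(r"ID:\s*(\d+)", terminal_output)
    if match is None:
        raise ValueError("no identification number in output")
    return int(match.group(1))

print(build_create_command("myvm.template")[0])  # → ssh
print(parse_vm_id("ID: 42"))  # → 42
```

Scraping terminal output like this is fragile, which is part of what motivates the structured XML-RPC mode.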
The SSH program, following Unix conventions, can only get the return code after a command is executed; it cannot get the command line output, which needs to be parsed to obtain the virtual machine's identification number. Therefore the SSH communication is not supported in the current implementation.

• communication with XML-RPC
The OpenNEbula exposes XML-RPC interfaces for users to access its services, so the OpenNEbula's features can be accessed from client programs. Upon creation of a virtual machine, the XML-RPC call returns a parameter with the identification number of the created virtual machine. This method is in some aspects superior to the previous CLI implementation, since easy recovery of the virtual machine's identification number is possible. Also, all other functions return the status of the executed commands, and therefore a better control of, and interaction with, the OpenNEbula server is possible.

5.5 OS Farm for virtual machine image management

The OS Farm is an application written in Python to generate virtual machine images and virtual appliances used with the Xen VMM; Cumulus generates virtual machine images for users by using the OS Farm. The images in the OS Farm are generated in layers, which can be cached. Figure 11 shows the layers of virtual machine image generation:

• The core layer is a small functional image with a minimal set of software which is either required to run the image on a VMM or required to satisfy higher-level software dependencies.
• The base layer provides some additional software needed in order to satisfy requirements for virtual appliances.
• The image layer is on top of the core layer and provides user-defined software in addition to the core software. A subclass of image is the virtual appliance, an image with an extra set of rules aimed to allow the image to satisfy requirements of the deployment scenario.

Two methods are provided for users to connect to the OS Farm server:

• Web interface
The Web interface provides various methods to enable users to request virtual machine images. The simplest method is the simple request, which allows users to select an image class and architecture. The advanced request gives the possibility to add yum packages to the virtual machine image: a drop-down menu allows the user to choose from a list of predefined yum repositories, upon which a further list is returned that allows users to select the corresponding yum packages. Support for requests via descriptions, in either script or XML format, is also available; an image description can be uploaded through the Web interface of the OS Farm server as an XML file. A sample XML description is shown below:

<image>
  <name>myvm</name>
  <class>slc4_old</class>
  <architecture>i386</architecture>
  <package>emacs</package>
  <package>unzip</package>
  <group>Base</group>
  <group>Core</group>
</image>

• HTTP interface
The OS Farm also provides an HTTP interface. Below is a sample usage of the HTTP interface with the wget command:

wget http://www.fzk.de/osfarm/create?name=&transfer=http&class=slc_old&arch=i386&filetype=.tar&group=core&group=base

An OS Farm client is embedded in the Cumulus frontend to generate virtual machine images: when Cumulus users send virtual machine requirements to the Cumulus frontend, the frontend invokes the OS Farm client, which thereafter sends HTTP requests to the OS Farm server. The virtual machine images generated by the OS Farm are transferred to the Cumulus virtual machine repository.
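The structured response of the XML-RPC mode can be seen with Python's standard xmlrpc.client module. The sketch below only marshals and parses a response locally; the response layout (a success flag plus the new virtual machine's identification number) and the endpoint named in the comment are assumptions modeled on the OpenNEbula API, not a live call.

```python
import xmlrpc.client

# Response a VM-allocate call might return: (success flag, VM id).
# A live client would instead use
#   xmlrpc.client.ServerProxy("http://opennebula-frontend:2633/RPC2")
# and invoke the corresponding OpenNEbula method.
payload = xmlrpc.client.dumps(((True, 42),), methodresponse=True)

params, _ = xmlrpc.client.loads(payload)
success, vm_id = params[0]
print(success, vm_id)  # → True 42
```

Unlike scraping SSH terminal output, the identification number arrives here as a typed value, which is exactly the "easy recovery" advantage noted above for the CLI comparison.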
Figure 11: Layers of the OS Farm image generation (a virtual appliance builds on the base layer; an image builds on the core layer).

Figure 12: Using the OS Farm in the Cumulus (the Cumulus frontend receives the user's virtual machine requirements and invokes the OS Farm client, which sends HTTP requests to the OS Farm server; the returned image is stored in the virtual machine repository).

The OS Farm client is implemented as follows:

1. The client gets the virtual machine parameters from the Cumulus frontend, such as the virtual machine name, transfer protocol, virtual machine class name, virtual machine architecture, virtual machine image file format, and image generation layer. An example of these parameters is "myvm, http, slc4_old, i386, .tar, core".
2. The client then constructs an HTTP request and sends it to the OS Farm server. For example, a wget URL could be built like this:
wgeturl = osfarmserver + "/create?name=" + name + "&transfer=" + transferprotocol + "&class=" + class + "&arch=" + arch + "&filetype=" + filetype
3. A connection is formed between the OS Farm server and the client with the help of the java.net.URL class. Finally the client obtains the virtual machine image from the OS Farm server and stores it in the Cumulus virtual machine repository, which is implemented via the OCFS. The incoming data from the OS Farm server must be read and written simultaneously due to the large size of the virtual machine image files (200 MB – 2000 MB): it is not possible to store such large image files in the default heap space of the Java virtual machine, which is normally around 128 MB.

5.6 Networking solution

We provide a new networking solution – the forward mode. The backend servers implement the network solution for the incoming virtual machine requests: when virtual machine requirements arrive at the Cumulus frontend, the frontend just forwards the requirements to the Cumulus backend. The backend sets the virtual machine network configuration and returns it to the frontend, normally as the IP address or the host name of the virtual machine. The frontend thereafter forwards the configuration to the users, who can finally access and manipulate the virtual machines after they are started by the Cumulus backends (see also Figure 12). Users do not have to specify the network configuration themselves, and the forward mode could play an important role when the Cloud infrastructure spans multiple network domains.

Specifically, three methods to allocate virtual machine networking are implemented in the Cumulus backend:

• lease IP addresses from an IP address pool
In this solution, a file named "ipPool" stores a pool of IP addresses; the administrator of the backend is only required to declare the available IPs in this file. Since the OpenNEbula (via Xen) already supports setting a virtual machine's IP address on its creation, the virtual machine can be configured with the provided IP address: when the frontend submits the virtual machine to the backend, the first available IP address in the file is taken and parsed as a Xen parameter. Other networking parameters, such as the gateway, subnet mask, DNS and broadcast address, can be specified in the OpenNEbula's configuration files. This solution already provides functions to avoid reusing an already consumed IP address and assures that a new IP address is used for every new virtual machine.
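The IP-pool bookkeeping described above can be sketched in a few lines. The file layout assumed here (one address per line, a leading "#" marking consumed entries) is our illustrative assumption, not the actual ipPool format.

```python
def lease_ip(pool_lines):
    """Take the first free address from an ipPool-style listing and
    mark it consumed, so no address is handed out twice.
    pool_lines: list of strings; a leading '#' marks an already
    consumed entry (this marking scheme is an assumption)."""
    for i, line in enumerate(pool_lines):
        addr = line.strip()
        if addr and not addr.startswith("#"):
            pool_lines[i] = "# " + addr   # mark as consumed
            return addr
    raise RuntimeError("IP pool exhausted")

pool = ["192.168.1.10", "192.168.1.11"]
print(lease_ip(pool))  # → 192.168.1.10
print(lease_ip(pool))  # → 192.168.1.11
```

In the real system the leased address would then be handed to the OpenNEbula as a Xen parameter, and the rewritten pool file persisted so a consumed address is never reused.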
• get IPs from a central DHCP server
In this implementation, the backend sets the networking item in the configuration file to "DHCP". When the frontend sends a virtual machine to the backend and the virtual machine is started, it asks the central DHCP server for its network configuration; this solution requires no further configuration of any additional files. The problem here, however, is how to get the IP address of the started virtual machine and return it to the frontend. Since every virtual machine has its own unique MAC address, one solution is to query the DHCP server for the virtual machine's IP address by providing the MAC address; on a successful reply, the backend forwards the IP address to the frontend. A possible drawback of this solution is its dependence on a certain specific networking configuration, and depending on the administrator's policies of the backend, some queries to the DHCP server might be banned.

• identify a virtual machine with the host name
In this implementation, the backend sets the host name for the virtual machine on the virtual machine's creation; since the implementation generates a unique host name at creation time, the host name uniquely identifies a virtual machine. The backend returns the host name to the frontend, and the frontend can return it to users for later reference. However, the OpenNEbula is not yet capable of asking the Xen server to set a host name, so the OpenNEbula must be asked to submit this parameter directly to the Xen hypervisor by setting the OpenNEbula's "RAW" variable, which is used to pass to the Xen server any additional Xen parameters users would like to add.

6 On-demand Cyberinfrastructure Provisioning

6.1 Overview

This section discusses an interesting topic of Cloud computing: building a cyberinfrastructure for Cloud users on demand by re-using an existing Grid testbed. This computing paradigm differs from existing Grid portals or Grid computing environments [22, 26] in that:

• Users can demand resources in terms of performance requirements, for example CPU bandwidth, memory size and storage size.
• Users can customize the access interface and define the user computing environment or computing middleware, for example the Globus Toolkit and a virtual data system. The user computing environment could be a scientific computing workbench or a computing portal, where users work with their applications. The deployment process of the system, however, is invisible to users: customized access interfaces and middleware are deployed on the backend resources as a black box.

This design pattern could bring benefits like scalability and flexibility, since it decouples the conglutination between the frontend, the middleware and the backend servers.

6.2 System Design

The system is designed in a three-layered architecture (see also Figure 13): users access a control service through the cyberaide interfaces; the cyberaide toolkit provides the middleware functionalities; and the backend resources come from an existing Grid testbed. The control service is responsible for configuring and controlling the backend resource provision and the deployment of the cyberinfrastructure (here the cyberaide interfaces and functionalities).

Figure 13: On-demand cyberinfrastructure provisioning (users, the control service, the cyberaide interfaces and toolkit, and an existing Grid testbed).

The system behaviors are executed in the following procedure: Cloud users first access the control service to require the desired cyberinfrastructure; the subsequent provisioning steps are shown in Figure 13.
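The overview above can be made concrete with a small sketch of the control-service interaction. All names here (the request fields, the ControlService class, the returned handle) are illustrative assumptions, not part of the Cyberaide toolkit.

```python
from dataclasses import dataclass, field

@dataclass
class CyberinfrastructureRequest:
    """Hypothetical request a Cloud user could hand to the control
    service; field names are our own invention."""
    cpu_mhz: int      # demanded CPU bandwidth
    memory_mb: int    # demanded memory size
    storage_gb: int   # demanded storage size
    middleware: list = field(default_factory=lambda: ["globus-toolkit"])

class ControlService:
    """Sketch: records requests and hands back an access handle,
    keeping the actual deployment invisible to the user."""
    def __init__(self):
        self.deployed = []

    def provision(self, request):
        # A real control service would configure the backend resources
        # and deploy the cyberaide interfaces and middleware here.
        handle = f"ci-{len(self.deployed)}"
        self.deployed.append((handle, request))
        return handle

svc = ControlService()
req = CyberinfrastructureRequest(cpu_mhz=2000, memory_mb=1024, storage_gb=50)
print(svc.provision(req))  # → ci-0
```

The user only ever sees the returned handle (an access interface), which mirrors the black-box deployment property described in the overview.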
E. K. A framework for IP based virtual private networks. Leung. 16 . pages 34–42. Keahey. Dynamic virtual clusters in a Grid site  X. Harris. Moore. access on July 2008. Morgan Kaufmann. 2005.roughtype. Feb. A. Cappello. Ghemawat. Mi. 2008. In Yumerefendi. I. J. Mapreduce: simpliﬁed quality of life in the Grid.web. Germain. and S. J. D. S. Foster and C. S. Barham.  J. 2004. http://en. and X. Keahey. L. Becker. SharProceedings of the 19th ACM Symposium on Oping networked resources with brokered leases. The Google ﬁle system. VIOLIN: virtual internetworkhttp://www. Doering. pages 937–  Global Cloud computing test bed [URL]..wikipedia. pages (in part) by NSF CMMI 0540076 and NSF SDCI NMI 582–587. Chase. access on Sep. I.0 deﬁnition [URL]. 2004. R. S. 2003. E. B.build a scalable scientiﬁc cloud: Cumulus could adopt to various virtual machine hypervisors as back ends. and provision desired cyberinfrastructure on demand. In Proceedings of the 5th International Worklarge-scale distributed systems at google. 21(6):896–909. Fortes. 946.ch/glite/. Ho. Dean. and  D. SODA: a service-on-demand architecture for application service hosting utility platmanager. ACM. Chase. Ghemawat. In erating Systems Principles.  Here comes HaaS [URL].  K. 2000. 2006. Fedak. 2003. 13(4):265–275. V. Pratt. L.  S. ing on overlay infrastructure. Matsunaga.php/. Figueiredo. 2nd International Symposium of Parallel and Distributed Processing and Applications.hp. Tsugawa. ceedings of the USENIX Annual Technical Conference. New Proceedings of the 2006 USENIX Annual Technical York.. S. In Proceedings of the 12th International SymSymposium on High Performance Distributed Composium on High-Performance Distributed Computputing.  Web 2. Scientiﬁc Programming.com/hpinfo/newsroom/press/2008/080729xa. Krsul.org/wiki/web 2/. Chawla. and S. XtremWeb: a generic global computing system. http://www. Irwin. Future Generation Comp. 
In Proceedings of the 1st IEEE International SympoWork conducted by Gregor von Laszewski is supported sium on Cluster Computing and the Grid. pages 90–103. Zhang.  P. T. A. C. pages 164–177. Grit. A. 2001. N´ eri. In Proshop on Grid Computing. Zhu. R. Jiang and D.html/. H. Acknowledgment References  I. The Grid: blueprint for a new computing infrastructure. T. Dragovic. Sprenkle. Freeman. access on June 2008. Hand. J. Kesselman. In-VIGO system. Yocum. This objective deﬁnes the resource re-organization and service orchestration for Cloud computing. delegates users’ requirements to other Clouds or existing Grid testbed. Mapreduce and other building blocks for Grid.  K. Chadha. Oct. http://glite. From sandbox to playground: dynamic virtual environments in the  J.  X. Zhu. and K.com/archives/2006/03/here comes haas. From virSymposium on Operating Systems Principles. Foster. Xu. and I. Gleeson etc. In Proceedings of the 19th ACM J. 2007. In Proceedings of the 12th International forms. Fraser. M. 2005. Irwin. Conference. K. L. pages tualized resources to virtual computing Grids: the 29–43. R. Xu. Commun. 2003. 1998. P. The Internet Engineering Task Force. Adabala.  gLite [URL]. 51(1):107–113. A. Zhang. D.  B. 2003. E. 0721656. and F.cern. Grit. U. Neugebauer. Xen and the art of virtualization. Virtual workspaces: achieving quality of service and  J. Zhao. and X.  G. Syst. Dean and S. Warﬁeld. E.  S. Foster. pages 199–212. RFC2764. I. Jiang and D. Gobioff. ing. pages 174–183. data processing on large clusters. V. In Proceedings of the access on June 2008. D. L. A. 2008.
http://www.  Google Chrome [URL].com/appengine/.  Amazon Elastic Compute Cloud [URL]. July 2004. access on Nov.org/toolkit/.wss/.cs. access on June 2008. and P.  Unicore project [URL].amazon.  IBM Blue Cloud project [URL]. 36(5):38–46. Sundararaj and P. access on Sep.com/s3/. access on Nov. http://www.  P. 2008. access on Oct. 2007. work. cess on Nov. http://www. access 2008. http://docs. Jiang.com/chrome/. Milenkovic.  Cluster on Demand project [URL]. and S. H. http://hadoop.org/.com/acom/buzzword/. S. V. Technical on Sep. Towards virtual access on June 2008.amazon. access ing. Anderson. 2007.com/. http://www. S. access on June 2008. 2007.microsoft. Shoykhet. us/architecture/aa699384.ch/lhcgrid/. access on June 2008.html/. http://workspace. 2008.org/. 2005. 2008. http://www.google.  Amazon Simple Storage Service [URL]. access on July 2008. X. 2008. access on Sep. Ruth. IEEE Internet Computhttp://msdn. Bowman. R.org/clouds/nimbus. http://cloudtestbed.  A.ucsb.com/ec2/. access on Nov. http://cern. access on Sep. Knauerhase.cs. Apr. ogce.edu/nicl/cod/. Garg.com/. IEEE Computer.  Adobe Buzzword [URL].  The Open Grid Computing Environments Portal Project [URL]. and trends. J.org/committees/uddispec/doc/tcspecs. 38(5):63–69. access on Sep.collab. [URL]. http://www. Vakali. Northwest University. access system for virtual machine marketplaces.openafs. 2008. access on Sep.oasis-open. http://code. practices.aspx/. access on Sep.org /.globus. access on Sep. Application Service Provider [URL]. access on Sep.elasticdrive. Dinda. Report NWU-CS-04-39. C. ac2007. Koutsonikola and A.  OS Farm project [URL]. 2003. and  Elasticdrive Project M. A.wikipedia. [URL].enomalism. Barkai.opennebula. Tewari.org/. networks for virtual machine Grid computing.  Nimbus Project [URL].unicore. 2008. 2007. and Technology Symposium. ac. Dinda. Robinson. 2004. Toward Internet distributed computhttp://www. 2008.  Mashup project[URL]. http://www. I. 
In Proceedings of the 3rd Virtual Machine Research  OASIS UDDI Speciﬁcation [URL].globus.com/ening.htm. access on Sep. T.  LHC Computing Grid project [URL]. http://aws.google. D. Virtuoso: a  Hadoop [URL].  Enomalism Project [URL]. D.org/. IEEE Computer.ibm. 2008. 2008.adobe.  OpenNEbula Project http://www.com/. 2004. LDAP: frame. . 2008. on Sep. Eucalyptus http://eucalyptus.  A. on [URL]. A. http://en. A.edu/.  Google Docs [URL].  M. Virtual distributed environments in a shared infrastructure. V. http://aws. http://cern.ch/osfarm /.org/wiki/mashup web application hybrid/. pages 177–190. 2008.google. Lange. http://www. 2008. Goasguen. 17  Open AFS [URL]. access on Nov. 8(5):66–72.  Global Cloud Computing Research Test Bed Wiki http://www-03.duke. http://www. and Gateway Toolkit [URL]. cess on July 2008.apache.com/press/us/en/pressrelease/22613. Google App Engine [URL]. Xu.org/.  Globus Toolkit [URL].
2007. S. http://www.com/softwareplusservices/. Hsiao.globus. and D. 2007. Dasdan. http://www. http://www.com/press/us/en/pressrelease/23448.vmware.ibm. http://www. access on June 2008.  Virtual Data System [URL]. 2007. access on June 2008. 2007.html/.  Software + Services [URL]. http://www.acis. access on June 2008. A. Oracle Cluster File System [URL].  H. access on Nov.edu/. access on June 2008. access on June 2008.  Status Project [URL].  Reservoir Project [URL]. access on Nov. http://www.oasis-open.3/interfaces/index. http://vds.w3. 18 .w3. R. http://www03. In Proceedings of the ACM SIGMOD International Conference on Management of Data.uﬂ.php?wg abbrev=wsrf. access on Nov. access on Nov.  VMware virtualization technology [URL].  SmugMug [URL].org/tr/wsdl/.  Globus virtual workspace interface guide [URL]. http://oss.oracle.org/committees/tc home. access on Nov. Yang.com.  Web Service Resource Framework [URL]. Map-reduce-merge: simpliﬁed relational data processing on large clusters. pages 1029–1040.com/projects/ocfs/. Parker Jr. http://www.microsoft.smugmug.wss/.  Simple Object Access Protocol (SOAP) [URL].uchicago. 2007. 2007.edu/vws/. access on June 2008.  Web Service Description Language (WSDL) [URL].org/vm/tp1.org/tr/soap/. http://workspace.com/.