www.acm.org/crossroads

Spring 2010 • Issue 16.3


CROSSROADS STAFF
EDITOR-IN-CHIEF: Chris Harrison, Carnegie Mellon University
DEPARTMENTS CHIEF: Tom Bartindale, University of Newcastle
EDITORS: Ryan K.L. Ko, Nanyang Technological University; James Stanier, University of Sussex; Malay Bhattacharyya, Indian Statistical Institute; Inbal Talgam, Weizmann Institute of Science; Sumit Narayan, University of Connecticut
DEPARTMENT EDITORS: Daniel Gooch, University of Bath; David Chiu, Ohio State University; Rob Simmons, Carnegie Mellon University; Michael Ashley-Rollman, Carnegie Mellon University; Dima Batenkov, Weizmann Institute of Science
COPY CHIEF: Erin Claire Carson, University of California, Berkeley
COPY EDITORS: Leslie Sandoval, University of New Mexico; Scott Duvall, University of Utah; Andrew David, University of Minnesota
ONLINE EDITORS: Gabriel Saldaña, Instituto de Estudios Superiores de Tamaulipas, Mexico; Srinwantu Dey, University of Florida
MANAGING EDITOR AND PROFESSIONAL ADVISOR: Jill Duffy, ACM Headquarters
INSTITUTIONAL REVIEWERS: Ernest Ackermann, Mary Washington College; Peter Chalk, London Metropolitan University; Nitesh Chawla, University of Notre Dame; José Creissac Campos, University of Minho; Ashoke Deb, Memorial University of Newfoundland; Steve Engels, University of Toronto; João Fernandes, University of Minho; Chris Hinde, Loughborough University; Michal Krupka, Palacky University; Piero Maestrini, ISTI-CNR, Pisa; José Carlos Ramalho, University of Minho; Suzanne Shontz, Pennsylvania State University; Roy Turner, University of Maine; Ping-Sing Tsai, University of Texas—Pan American; Andy Twigg, University of Cambridge; Joost Visser, Software Improvement Group; Tingkai Wang, London Metropolitan University; Charles Won, California State University, Fresno
OFFERING #XRDS0163 ISSN#: 1528-4981 (PRINT) 1528-4982 (ELECTRONIC)
Front cover image courtesy of Opte Project.

COLUMNS & DEPARTMENTS
LETTER FROM THE EDITOR: PLUGGING INTO THE CLOUD . . . . . . . . . . . . . . . . . . . . 2
by Chris Harrison, Editor-in-Chief

ELASTICITY IN THE CLOUD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
by David Chiu

CLOUD COMPUTING IN PLAIN ENGLISH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
by Ryan K. L. Ko

FEATURES
VOLUNTEER COMPUTING: THE ULTIMATE CLOUD . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
by David P. Anderson
As a collective whole, the resource pool of all the privately-owned PCs in the world dwarfs all others. It’s also self-financing, self-updating, and self-maintaining. In short, it’s a dream come true for volunteer computing, and the cloud makes it possible.


CLOUDS AT THE CROSSROADS: RESEARCH PERSPECTIVES . . . . . . . . . . . . . . . . . . . . 10
by Ymir Vigfusson and Gregory Chockler
Despite its ability to cater to business needs, cloud computing is also a first-class research subject, according to two researchers from IBM Haifa Labs.

SCIENTIFIC WORKFLOWS AND CLOUDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
by Gideon Juve and Ewa Deelman
How is the cloud affecting scientific workflows? Two minds from the University of Southern California explain.

THE CLOUD AT WORK: INTERVIEWS WITH PETE BECKMAN OF ARGONNE NATIONAL LAB AND BRADLEY HOROWITZ OF GOOGLE . . . . . . . . . . . . . . . . . . 19
by Sumit Narayan and Chris Heiden
Two leaders in the computing world explain how they view cloud computing from the research and industry perspectives.

STATE OF SECURITY READINESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
by Ramaswamy Chandramouli and Peter Mell
Fears about the security readiness of the cloud are preventing organizations from leveraging it, and it’s up to computing professionals and researchers to start closing that gap.

THE BUSINESS OF CLOUDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
by Guy Rosen
Businesses are flocking to cloud-based solutions for their computing needs. The best way to understand the magnitude of this mass movement is to look at the hard data.

Contact ACM and Order Today!
Phone: 1.800.342.6626 (USA/Canada), +1.212.626.0500 (outside USA/Canada)
Fax: +1.212.944.1318
Postal Address: ACM Member Services, P.O. Box 11405, New York, NY 10286-1405 USA
Internet: http://store.acm.org/acmstore
Please note the offering numbers for fulfilling claims or single order purchase below.
Copyright 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page or initial screen of the document. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.
Crossroads is distributed free of charge over the internet. It is available from: http://www.acm.org/crossroads/. Articles, letters, and suggestions are welcomed. For submission information, please contact crossroads@acm.org.

Letter from the Editor

Plugging into the Cloud

After reading this issue, I had to seriously reevaluate my perception and definition of cloud computing. Unsurprisingly, cloud computing's inherent intangibility makes it tough to get a good grip on what it is and isn't, where it begins and ends. Given the wide array of computing models it encompasses, agreement among even experts is somewhat elusive. However, one thing is for sure: cloud computing is hot and will soon have a big presence on your PC.

Google, already a big player in the consumer space with services like Gmail and Google Docs, is readying ChromeOS, a thin operating system that boots right to a browser. With Chrome OS, document storage and heavy computation (like web searches) will all occur in the cloud. For end users, is this a taste of things to come?

Fortunately for programmers and students, Google has opened up its "app engine" back end, joining other powerful services like Amazon's EC2 and Yahoo!'s BOSS. If you've been thinking about getting your feet wet in the cloud, there really isn't a better time to start tinkering!

Open Hack Day

In fact, I'm already guilty. As part of Yahoo!'s Open Hack Day this past October, Julia Schwarz, Bryan Pendleton, and I (all Carnegie Mellon University students) built a cloud-based application in Python we call The Inhabited Web. The idea, briefly, is to embed a simple visualization into web pages, next to the browser's scroll bar. Small triangles are used to represent users' positions on the current page (scroll position). Collectively, this allows you to see where people are congregating on a web page, perhaps next to a great shopping bargain, interesting news story, or funny video. In the 24 "hacking" hours permitted by the contest, we built the back end on the Google App Engine (appengine.google.com), making it massively parallel and distributed. You can check it out and sign up your web site for the service at www.inhabitedweb.com.

Presenting XRDS

This issue also marks the last Crossroads that will arrive in the present format. We're very excited to announce Crossroads will be relaunching as of the next issue with an all-new look and tons of fresh content for students. We've placed special emphasis on recurring columns headed up by our new editorial team. Expect everything from code snippets and school advice, to historical factoids and lab highlights, to event listings and puzzles. Heading up these departments is a talented team from all over the globe: Daniel Gooch (University of Bath), David Chiu (Ohio State University), Rob Simmons (Carnegie Mellon), Michael Ashley-Rollman (Carnegie Mellon), Dima Batenkov (Weizmann Institute of Science, Israel), and Erin Claire Carson (University of California-Berkeley). I am also very pleased to announce that James Stanier (University of Sussex) is now part of the senior editorial team, responsible for soliciting and editing magazine feature articles, joining Ryan K. L. Ko (Nanyang Technological University, Singapore), Inbal Talgam (Weizmann Institute of Science, Israel), Sumit Narayan (University of Connecticut), and Tom Bartindale (Newcastle University).

The whole Crossroads team has been hard at work for three months on this cloud-centric edition of the magazine, and we are very excited about the amazing lineup of feature articles, covering topics from security and entrepreneurship all the way to volunteer computing. You'll also find interviews with people working on the biggest and best cloud computing systems (see page 19). I hope you find the current issue stimulating. Speaking of the web, we invite you to join our Facebook group (ACM Crossroads) and also to let us know what you think via email (crossroads@acm.org) and Twitter (hashtag #xrds).

—Chris Harrison, Editor-in-Chief

Biography

Editor-in-Chief Chris Harrison is a PhD student in the Human-Computer Interaction Institute at Carnegie Mellon University. Over the past four years, he has worked on several projects in the area of social computing and input methods at IBM Research, AT&T Labs, and, most recently, Microsoft Research. His research interests primarily focus on novel input methods and interaction technologies, especially those that leverage hardware and the environment in new and compelling ways.
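The Inhabited Web idea described above is easy to prototype. The sketch below is not the authors' actual code; it is a minimal, hypothetical Python aggregator that buckets reported scroll positions per page so that a client-side widget could draw the little triangles next to the scroll bar. A real deployment on Google App Engine would persist the counts in the datastore or memcache rather than in a module-level dictionary.

```python
from collections import defaultdict, Counter

# Hypothetical in-memory store: page URL -> histogram of scroll positions.
_positions = defaultdict(Counter)

def report_position(page_url: str, scroll_fraction: float, buckets: int = 20) -> None:
    """Record one visitor's scroll position, from 0.0 (top) to 1.0 (bottom)."""
    bucket = min(buckets - 1, max(0, int(scroll_fraction * buckets)))
    _positions[page_url][bucket] += 1

def congregation(page_url: str, buckets: int = 20) -> list:
    """Return a normalized histogram the page widget can render as triangles."""
    counts = _positions[page_url]
    total = sum(counts.values()) or 1
    return [counts[b] / total for b in range(buckets)]

if __name__ == "__main__":
    for frac in (0.1, 0.12, 0.8, 0.82, 0.81):
        report_position("http://example.org/story", frac)
    print(congregation("http://example.org/story"))
```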

Elasticity in the Cloud
By David Chiu

Take a second to consider all the essential services and utilities we consume and pay for on a usage basis: water, electricity, gas. When's the last time you plugged in a toaster oven and worried about it not working because the power company might have run out of power? Sure, it's one more device that sucks up power, but you're willing to eat the cost. Likewise, if you switch to using a more efficient refrigerator, you would expect the provider to charge you less on your next billing cycle. Elasticity has become an essential expectation of all utility providers.

In the distant past, some people have suggested that computing be treated under the same model as most other utility providers. The case could certainly be made. For instance, a company that supports its own computing infrastructure may suffer from the costs of equipment, maintenance, labor, and mounting energy bills. It would be more cost-effective if the company paid some third-party provider for its storage and processing requirements based on time and usage. While it made perfect sense from the client's perspective, the overhead of becoming a computing-as-a-utility provider was prohibitive until recently. Through advancements in virtualization and the ability to leverage existing supercomputing capacities, utility computing, known to most as cloud computing, is finally becoming realized. Industry leaders, such as Amazon Elastic Compute Cloud (EC2), Azure, Cloudera, and Google's App Engine, have already begun offering utility computing to the mainstream.

A Departure from Fixed Provisioning

Consider an imaginary application provided by my university, Ohio State. Over the period of a day, this application requires 100 servers during peak time, but only a small fraction of that during down time. Without elasticity, Ohio State has two options: either provision a fixed amount of 100 servers, or less than 100 servers. While the former case, known as over-provisioning, is capable of handling peak loads, it also wastes servers during down time. The latter case of under-provisioning might address, to some extent, the presence of idle machines. However, its inability to handle peak loads may cause users to leave its service.

The cloud offers a departure from the fixed provisioning scheme by letting us design our applications to scale servers according to the load. A simple, but interesting, property in utility models is elasticity, the ability to stretch and contract services directly according to the consumer's needs. What elasticity means to cloud users is that they should design their applications to scale their resource requirements up and down whenever possible. To provide an elastic model of computing, providers must be able to support the sense of having an unlimited number of resources. Because computing resources are unequivocally finite, this is not as easy as plugging or unplugging a toaster oven. So, is elasticity a reality?
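Before turning to how providers support this, here is a minimal sketch (not from the article) of the kind of provisioning decision an elastic application might make each period. The load samples and per-server capacity are hypothetical; the point is simply that the fleet follows demand instead of being pinned at the peak.

```python
import math

def servers_needed(requests_per_sec: float, capacity_per_server: float,
                   headroom: float = 0.2) -> int:
    """Provision just enough servers for the current load plus a safety margin."""
    return max(1, math.ceil(requests_per_sec * (1 + headroom) / capacity_per_server))

# A day in the life of the imaginary Ohio State application: peak demand needs
# roughly 100 servers, off-peak needs far fewer.
hourly_load = [120, 80, 60, 50, 300, 2400, 4800, 4700, 3000, 900, 400, 150]
fleet = [servers_needed(load, capacity_per_server=50) for load in hourly_load]
print(fleet)                                 # [3, 2, 2, 2, 8, 58, 116, 113, 72, 22, 10, 4]
print(max(fleet), sum(fleet) / len(fleet))   # peak footprint vs. average footprint
```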

Sharing Resources

In the past several years, we have experienced a new trend in processor development. CPUs are now being shipped with multi- and many-cores on each chip in an effort to continue the speed-up predicted by Moore's Law. However, for most users, the superfluous cores (even a single core) are underutilized or left completely idle. System engineers, as a result, turn to statistical multiplexing for maximizing the utilization of today's CPUs. Informally, statistical multiplexing allows a single resource to be shared by splitting it into variable chunks and allocating each to a consumer. Virtualization has since become the de-facto means toward enabling CPU multiplexing, as it allows several instances of operating systems to be run on a single host machine. In the meantime, virtualization technology has matured to a point of production, which allows cloud providers to not only maximize the usage of their own physical resources, but also multiplex their resources among multiple users. From the consumers' perspective, they are afforded a way to allocate on-demand, independent, fully-controllable systems.

But even with virtualization, the question persists: What if the physical resources run out? Currently, EC2 only allows 20 simultaneous machine instances to be allocated at any time. If that ever occurred, the provider would simply have to refuse service, which is not what users want to hear. Another option might be to preempt currently running processes. Although both are unpopular choices, they certainly leave room for the provider to offer flexible pricing options. For instance, a provider can charge a normal price for low-grade users, who might be fine with having their service interrupted very infrequently. High-grade users, on the other hand, can pay a surplus for having the privilege to preempt services and also to prevent from being preempted.

Looking Forward

With the realization of cloud computing, many stakeholders are afforded on-demand access to utilize any amount of computing power to satisfy their relative needs. Certainly, scaling applications to handle peak loads has been a long-studied issue. While downscaling has received far less attention in the past, the cloud invokes a novel incentive for applications to contract, which offers a new dimension for cost optimization problems. The elastic paradigm brings with it exciting new developments in the computing community. As clouds gain pace in industry and academia, they identify new opportunities and may potentially transform computing as we know it.

Biography

David Chiu is a student at The Ohio State University and an editor for Crossroads.
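The flexible pricing idea discussed under Sharing Resources—low-grade users who tolerate rare interruptions, high-grade users who pay to preempt and to avoid preemption—can be illustrated with a toy admission policy. The tiers, identifiers, and the 20-instance capacity below are hypothetical stand-ins, not any provider's actual mechanism.

```python
CAPACITY = 20   # e.g., a small per-account instance cap, as with EC2's 20-instance limit
running = []    # list of (instance_id, tier); tier is "high" or "low"

def request_instance(instance_id: str, tier: str) -> bool:
    """Admit a new instance, preempting a low-grade one if a high-grade request finds the pool full."""
    if len(running) < CAPACITY:
        running.append((instance_id, tier))
        return True
    if tier == "high":
        for i, (victim, victim_tier) in enumerate(running):
            if victim_tier == "low":
                running.pop(i)                     # interrupt a low-grade user...
                running.append((instance_id, tier))
                return True                        # ...so the premium request still succeeds
    return False                                   # otherwise the provider must refuse service
```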

Cloud Computing in Plain English
By Ryan K. L. Ko

I am not an evangelist of cloud computing, and I must admit that, like many, I was once a skeptic. I suppose many people believe cloud computing is just a buzzword and are also quite turned off by the ever-growing list of acronyms plaguing the computer science world. As a fledgling researcher, I was also quite appalled at how many seasoned researchers were able to recently claim that their research "has always been" cloud computing. But is "cloud computing" really just another buzzword brought forward by the software giants and the hardworking folks at the IT companies, or is there something more?

Significance of the Cloud Computing Era

Fundamentally, cloud computing is a concept that aims to enable end-users to easily create and use software without a need to worry about the technical implementations and nitty-gritties such as the software's physical hosting location, hardware specifications, efficiency of data processing, and so forth. This is done by virtualizing the products (for example, the complex network of computers, servers, services, and applications that are used in the back end) so that computing is now accessible to anyone with a computing need of any size, and practically anyone can utilize computing to the max. By accessible, we mean that it is easy for a non-technical person to use this software and even create his or her own. With cloud computing, end-users no longer need to learn a new language or worry about the program's memory requirements to create a Facebook or MySpace application. In the words of those advocating cloud computing, end-users and businesses can simply store and work on data in a "cloud," which is a virtual environment that embodies data centers, servers, and applications.

The key difference between this and other similar-sounding approaches, such as grid computing or utility computing, is in the concept of abstracting services from products. With cloud computing, it means that we are now moving toward services instead of focusing on selling products. This marks the change from the focus on full implementation of computing infrastructures before the year 2000 to the abstraction of the high-level, value-driven activities from the low-level, technical activities and details in the present and near future.

This concept is already evident in many current technologies that are not explicitly labeled as cloud computing. For example, a small- to medium-sized enterprise no longer needs to own and maintain actual physical servers to host Web applications, but is instead able to lease virtual private servers (VPS) for a monthly subscription fee. A startup company would no longer need to worry about the RAID configurations and the number of scheduled backup jobs, the number of emails to set up for its employees, and the file structure and permissions to be granted for its content management structure, but instead could focus on more important details, such as the actual web content. (More technical information on these services can be found in "The Business of Clouds," page 26.)

So, what does all this mean for common folks like you and me? It means that we are freed from the need to upgrade hardware and the need to spend more than half of the time trying to make a product work, but are now able to focus on the real essence of our activities—the value-adding activities (cf. Michael Porter's Competitive Advantage).

In today's context, cloud computing loosely means that software you use does not reside on your own computer, but rather on a host computer, run by someone else, accessed via the Internet. Hence, it is not rare to find researchers claiming that they are working in a research area that contributes to cloud computing. Yet when we look beneath the sales talk and big promises of cloud computing and observe the shifts in trends in our computing approaches, we start to realize that cloud computing is not just another buzzword, but something that embodies this innate attempt by humans to make computing easier. The evolution of computing languages from the first generation (assembly languages) to the more human-readable fourth-generation languages (4GLs, SQL), and the evolution from structural/modular programming to object-oriented programming, are both earlier evidences of this trend. Cloud computing's focus is on empowering Internet users with the ability to focus on value-adding activities and services and outsource the worries of hardware upgrades and technical configurations to the "experts residing" in the virtual cloud.

Imminent Issues

If we are evolving into a cloud-oriented environment and way of doing business, there are bound to be many problems and loopholes. While the industry is busy figuring out scalability solutions, we will need to urgently address both data privacy and data security concerns. Once a hacker or malicious attack successfully penetrates the security boundaries of the cloud, or an employee of a cloud vendor betrays the trust of the public, our data and critical information is at the complete mercy of these criminals. With so much at stake, we would need legislation and laws to catch up with the nature of cloud computing, as it will be a borderless and large-impact problem.

As cloud computing is a highly trust-based system, and given this fact, many researchers are now geared toward creating better trust evaluation mechanisms and authentication procedures. To further increase the security, experts from computer security, data integrity, computer networking, software engineering, and many other related areas are crucial people in this turn of a new era. Researchers need to find the right balance between convenience and security. It's a balancing act: when convenience increases, security decreases, and vice versa.

How Can Graduates Approach Cloud Computing?

The best way to approach this field is to have a good balance between the quest of knowledge and discernment. Do not jump on the latest buzzwords you hear. Take a step back and try to see how things fit together. A good way to do this is to organize and draw what you have learned into mind maps. To kickstart your journey, Crossroads has prepared a starter kit (see sidebar), introducing some non-technical links to interesting articles and videos.

While it is my greatest wish for you to have a better understanding of cloud computing through this article, I hope that I have also opened up your mind to witnessing the increasing influence of cloud computing in our daily lives.

Biography

Ryan K. L. Ko is a final year PhD candidate at Nanyang Technological University, Singapore, specializing in the semantic web and business process management. He is also an editor for Crossroads.

Cloud Computing Starter Kit

While there are plenty of sites and articles describing cloud computing, not many have an objective view of this high-potential but controversial topic. The following resources have been selected by Crossroads' editors in an attempt to help other students understand the meaning, concerns, and latest trends of cloud computing.

• "Storm warning for cloud computing." By Bill Thompson, BBC News. A layman's summary of the recent cloud computing trend. http://news.bbc.co.uk/2/hi/technology/7421099.stm

• "Microsoft to battle in the clouds." By Rory Cellan-Jones, BBC News. See in particular the short video clip on Microsoft Azure in this piece from the BBC. http://news.bbc.co.uk/2/hi/technology/7693993.stm

• "Click's Favourite Cloud Links." From Click, BBC News. Just for fun. See in particular G.ho.st, a global virtual computer hosting site. http://news.bbc.co.uk/2/hi/programmes/click_online/7464153.stm

• "Cloud computing is a trap, warns GNU founder Richard Stallman." By Bobbie Johnson, Guardian. Richard Stallman on why he's against cloud computing. www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman

• "Dell attempts to copyright 'cloud computing.'" By Agam Shah, for IDG News Service, published on TechWorld. Dell tries to beat other computing companies to the punchline. www.techworld.com/opsys/news/index.cfm?newsid=102279

• "Like it or not, cloud computing is the wave of the future." By Therese Poletti, MarketWatch. Highlighting concerns surrounding cloud computing. www.marketwatch.com/story/like-not-cloud-computing-wave

Volunteer Computing: The Ultimate Cloud
By David P. Anderson

Computers continue to get faster exponentially, but the computational demands of science are growing even faster. Extreme requirements arise in at least three areas.

1) Physical simulation: Scientists use computers to simulate physical reality at many levels of scale: molecule, organism, ecosystem, planet, galaxy, universe. The models are typically chaotic, and studying the distribution of outcomes requires many simulation runs with perturbed initial conditions.

2) Compute-intensive analysis of large data: Modern instruments (optical and radio telescopes, gene sequencers, gravitational wave detectors, particle colliders) produce huge amounts of data, which in many cases requires compute-intensive analysis.

3) Biology-inspired algorithms such as genetic and flocking algorithms for function optimization.

These areas engender computational tasks that would take hundreds or thousands of years to complete on a single PC. Reducing this to a feasible interval—days or weeks—requires high-performance computing (HPC). One approach is to build an extremely fast computer—a supercomputer. However, the use of distributed computing, in which jobs are run on networked computers, is often more cost-effective than supercomputing. To achieve high throughput, the rate of job completion, rather than the turnaround time of individual jobs, is the important performance metric. This subset of HPC is called high-throughput computing.

There are many approaches to distributed computing:

• cluster computing, which uses dedicated computers in a single location;
• desktop grid computing, in which desktop PCs within an organization (such as a department or university) are used as a computing resource. Jobs are run at low priority, or while the PCs are not being otherwise used;
• grid computing, in which separate organizations agree to share their computing resources (supercomputers, clusters, and/or desktop grids);
• cloud computing, in which a company sells access to computers on a pay-as-you-go basis;
• volunteer computing, which is similar to desktop grid computing except that the computing resources are volunteered by the public.

Each of these paradigms has an associated resource pool: the computers in a machine room, the computers owned by a university, the computers owned by a cloud provider. In the case of volunteer computing, the resource pool is the set of all privately-owned PCs in the world. This pool is interesting for several reasons. For starters, it dwarfs the other pools: the number of privately-owned PCs is currently 1 billion and is projected to grow to 2 billion by 2015. Second, the pool is self-financing, self-updating, and self-maintaining: people buy new PCs, maintain their computers, upgrade system software, and pay their electric bills. Third, consumer markets drive research and development, so consumer PCs, not special-purpose computers, are state of the art. The fastest processors today are GPUs developed for computer games; traditional HPC is scrambling to use GPUs, but there are already 100 million GPUs in the public pool, and tens of thousands are already being used for volunteer computing.

History of Volunteer Computing

In the mid-1990s, as consumer PCs became powerful and millions of them were connected to the Internet, the idea of using them for distributed computing arose. The first two projects, GIMPS and distributed.net, were launched in 1996 and 1997. GIMPS finds prime numbers of a particular type; distributed.net breaks cryptosystems via brute-force search of the key space. Both projects attracted tens of thousands of volunteers and demonstrated the feasibility of volunteer computing.

In 1999 two new projects were launched, SETI@home and Folding@home. SETI@home, from University of California-Berkeley, analyzes data from the Arecibo radio telescope, looking for synthetic signals from space. Folding@home, from Stanford, studies how proteins are formed from gene sequences. These projects received significant media coverage and moved volunteer computing into the awareness of the global public.

These projects all developed their own middleware: the application-independent machinery for distributing jobs to volunteer computers and for running jobs unobtrusively on these computers, as well as web interfaces by which volunteers could register, communicate with other volunteers, and track their progress. Few scientists had the resources or skills to develop such software, and so for several years there were no new projects. In 2002, the BOINC project was established, with funding from the National Science Foundation, to develop general-purpose middleware for volunteer computing, making it easier and cheaper for scientists to use.

The first BOINC-based projects launched in 2004, and today there are about 60 such projects, in a wide range of scientific areas. Some of the larger projects include Milkyway@home (from Rensselaer Polytechnic Institute, studies galactic structure), Einstein@home (from University of Wisconsin and Max Planck Institute, searches for gravitational waves), Rosetta@home (from University of Washington, studies proteins of biomedical importance), ClimatePrediction.net (from Oxford University, studies long-term climate change by simulating the Earth's climate during the next 100 years), and IBM World Community Grid (operated by IBM, hosts 5-10 humanitarian applications from various academic institutions). About 900,000 computers are actively participating in volunteer computing, doing research in the areas listed above. Together they supply about 10 PetaFLOPS of computing power, and the fraction supplied by GPUs is about 70 percent and growing. As a comparison, the fastest supercomputer supplies about 1.4 PetaFLOPS.
Evaluating Volunteer Computing

Volunteer computing can be compared with other high-performance computing paradigms in several dimensions.

Performance. In terms of throughput, volunteer computing is competitive with other paradigms, and it has the near-term potential to greatly surpass them: if participation increases to 4 million computers, each with a 1 TeraFLOPS GPU (the speed of current high-end models) and computing 25 percent of the time, the result will be 1 ExaFLOPS of computing power. Other paradigms are projected to reach this level only in a decade or more. Actually, the near-term potential of volunteer computing goes well beyond Exa-scale, since 4 million PCs is only 0.4 percent of the resource pool.

Cost effectiveness. For scientists, volunteer computing is cheaper than other paradigms—often dramatically so. A medium-scale project (10,000 computers, 100 TeraFLOPS) can be run using a single server computer and one or two staff for roughly $200,000 per year. An equivalent CPU cluster costs at least an order of magnitude more. Cloud computing is even more expensive: Amazon Elastic Computing Cloud (EC2) instances provide 2 GigaFLOPS and cost $2.40 per day, so to attain 100 TeraFLOPS, 50,000 instances would be needed, costing $43.8 million per year. However, studies suggest that cloud computing is cost-effective for hosting volunteer computing project servers.
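The cost comparison above can be lined up in a few lines of Python. The arithmetic below simply restates the article's own figures (a roughly $200,000-per-year, 100-TeraFLOPS volunteer project versus renting 2-GigaFLOPS EC2 instances at $2.40 per day); no new data is introduced.

```python
TARGET_TFLOPS = 100
volunteer_cost_per_year = 200_000            # one server plus one or two staff

ec2_instance_gflops = 2
ec2_instance_cost_per_day = 2.40
instances = TARGET_TFLOPS * 1000 // ec2_instance_gflops        # 50,000 instances
ec2_cost_per_year = instances * ec2_instance_cost_per_day * 365

print(f"{instances:,} EC2 instances -> ${ec2_cost_per_year:,.0f}/year")   # ~ $43.8 million
print(f"ratio vs. volunteer computing: {ec2_cost_per_year / volunteer_cost_per_year:.0f}x")
```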

Energy efficiency. The FLOP/Watt ratio of a PC is lower than that of a supercomputer, and it is tempting to conclude that volunteer computing is less energy-efficient than supercomputing. However, this is not necessarily the case. In cold climates, energy used by a PC may replace energy used by a space heater, to which the PC is thermodynamically equivalent. No study has been done taking such factors into account.

Resource allocation policy and public outreach. In traditional HPC paradigms, resources are allocated by bureaucracies: funding agencies, institutions, and committees. The public, although it pays for the resources, has no direct voice in their allocation, and doesn't know how they're being used. In volunteer computing, the public has direct control over how resources are allocated, and knows what they're being used for. The choice of projects is up to the volunteer. Attaching to a project allows it to run arbitrary executables on one's computer, and BOINC provides only limited (account-based) sandboxing. So the volunteer must assess the project's authenticity, its technical competence, and its scientific merit. The ownership of intellectual property resulting from the project may also be a factor. As a result, research projects that are outside of the current academic mainstream can potentially get significant computing resources, and public awareness of science is increased.

Scientific adoption. Volunteer computing has not yet been widely adopted. Sixty research groups are currently using it, while perhaps a hundred times that many could benefit from it. Cluster and grid computing are much more widely used by scientists. One reason is that, although BOINC has reduced the barrier to entry, few research groups have the resources and skills needed to operate a project. The most promising solution to this is to create umbrella projects serving multiple scientists and operated at a higher organizational level (for example, at the level of a university). Another reason is that the HPC community, on whom scientists rely for guidance, has ignored volunteer computing, perhaps because it offers neither control nor funding.

The BOINC Project

The BOINC software consists of two parts: server software that is used to create projects, and client software, which volunteers install and run on their computers. Anyone—academic researchers, hobbyists, malicious hackers—can create a project. Projects are independent; each one operates its own server and provides its own web site. BOINC has no centralized component other than a web site from which its software can be downloaded. The client software is available for all major platforms, including Windows, Linux, and Mac OS X.

Having installed the client program, volunteers can then attach it to any set of projects, and for each project can assign a resource share that determines how the computer's resources are divided among the projects. The BOINC client software lets volunteers attach to projects and monitor the progress of jobs. BOINC encourages volunteers to participate in multiple projects simultaneously. By doing so, they avoid having their computer go idle if one project is down. Multiple attachment also helps projects whose supply of work is sporadic.

BOINC does accounting of credit, a numerical measure of a volunteer's contribution to a project. The accumulation of a large amount of credit in a particular project can be a disincentive to try other projects. To combat this, BOINC provides a cross-project notion of identity (based on the volunteer's email address). Each project exports its credit statistics as XML files, and various third-party credit statistics sites import these files and display cross-project credit, that is, the volunteer's total credit across all projects. BOINC encourages volunteers to occasionally evaluate the set of available projects, and to devote their computing resources to the projects that, in their view, are doing the most important and best research.
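The resource share mentioned above is easiest to see in code. The sketch below is a simplified, hypothetical version of the accounting involved—dividing a block of CPU time among attached projects in proportion to the shares the volunteer assigned—and is not the real BOINC client scheduler.

```python
def split_cpu_time(resource_shares: dict, cpu_seconds: float) -> dict:
    """Divide available CPU seconds among projects in proportion to their shares."""
    total = sum(resource_shares.values())
    return {project: cpu_seconds * share / total
            for project, share in resource_shares.items()}

# A volunteer attached to three projects, favoring climate modeling 2:1:1.
shares = {"climateprediction.net": 200, "einstein@home": 100, "rosetta@home": 100}
print(split_cpu_time(shares, cpu_seconds=3600))
# {'climateprediction.net': 1800.0, 'einstein@home': 900.0, 'rosetta@home': 900.0}
```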
Technical Factors

Volunteer computing poses a number of technical problems. For the most part, these problems are addressed by BOINC, and scientists need not be concerned with them.

Result validation. Volunteers must trust projects, but projects cannot trust volunteers. Because volunteer computers are anonymous and untrusted, BOINC cannot assume that job results are correct, or that the claimed credit is accurate. One general way of dealing with this is replication: send a copy of each job to multiple computers, compare the results, accept the result if the replicas agree, and otherwise issue additional replicas. This is complicated by the fact that different computers often do floating-point calculations differently, so that there is no unique correct result. BOINC addresses this with a mechanism called homogeneous redundancy that sends instances of a given job to numerically identical computers. However, redundancy has the drawback that it reduces throughput by at least 50 percent. BOINC has a mechanism called adaptive replication that identifies trustworthy hosts and replicates their jobs only occasionally.

Heterogeneity. The volunteer computer population is extremely diverse in terms of hardware (processor type and speed, RAM, disk space), software (operating system and version), and networking (bandwidth, proxies, firewalls). BOINC provides scheduling mechanisms that assign jobs to the hosts that can best handle them. Projects still generally need to compile applications for several platforms (Windows 32 and 64 bit, Linux 32 and 64 bit, Mac OS X, various GPU platforms). This difficulty may soon be reduced by running applications in virtual machines.

Sporadic availability and churn. Volunteer computers are not dedicated. The time intervals when a computer is on, and when BOINC is allowed to compute, are sporadic and generally unpredictable. BOINC tracks these factors and uses them in estimating job completion times. In addition, computers are constantly joining and leaving the pool of a given project, and BOINC must address the fact that computers with many jobs in progress may disappear forever.

Scalability. Large volunteer projects can involve a million hosts and millions of jobs processed per day. This is beyond the capabilities of grid and cluster systems; the largest grids number in the tens of thousands of hosts. BOINC addresses this using an efficient server architecture that can be distributed across multiple machines. The server is based on a relational database, so BOINC leverages advances in scalability and availability of database systems. The communication architecture uses exponential backoff after failures, so that the rate of client requests remains bounded even when a server comes up after a long outage.
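The replication scheme described under Result validation reduces to a small voting rule. The sketch below is a schematic stand-in for a validator, assuming results can be compared for equivalence (the hard part in practice, given floating-point differences): accept a result once enough replicas agree, otherwise ask for another copy.

```python
from collections import Counter

def validate(replica_results: list, quorum: int = 2):
    """Return the agreed result if `quorum` replicas match, else None (issue more replicas)."""
    if not replica_results:
        return None
    value, count = Counter(replica_results).most_common(1)[0]
    return value if count >= quorum else None

print(validate(["3.141592", "3.141592"]))              # agreement -> accept
print(validate(["3.141592", "2.718281"]))               # disagreement -> send another replica
print(validate(["3.141592", "2.718281", "3.141592"]))   # quorum reached on the third copy
```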

Human Factors

All HPC paradigms involve human factors, but in volunteer computing these factors are particularly crucial and complex. To begin with, why do people volunteer? This question is currently being studied rigorously. Evidence suggests that there are several motivational factors. One such factor is to support scientific goals, such as curing diseases, finding extraterrestrial life, or predicting climate change. Another factor is community: some volunteers enjoy participating in the online communities and social networks that form, through message boards and other web features, around volunteer computing projects. Yet another reason people volunteer is the credit incentive. Some volunteers are interested in the performance of computer systems, and they use volunteer computing to quantify and publicize the performance of their computers.

To attract and retain volunteers, a project must perform a variety of human functions. It must publicize itself by whatever media are available—mass media, blogs, alumni magazines, social networking sites. It must develop web content describing its research goals, methods, and credentials. It must provide volunteers with periodic updates (via web or email) on its scientific progress. It must manage the moderation of its web site's message boards to ensure that they remain positive and useful.

From a project's perspective, volunteers are effectively anonymous. If a volunteer behaves maliciously, for example by intentionally falsifying computational results, the project has no way to identify and punish the offender. In other HPC paradigms, such offenders can be identified and disciplined or fired.

Even with the modest number (60) of current projects, the process of locating them, reading their web sites, and attaching to a chosen set is tedious, and will become infeasible if the number of projects grows to hundreds or thousands. A level of indirection can be placed between the client and projects. Instead of being attached directly to projects, the client can be attached to a web service called an account manager. The client periodically communicates with the account manager, passing it account credentials and receiving a list of projects to attach to. This framework has been used by third-party developers to create "one-stop shopping" web sites, where volunteers can read summaries of all existing BOINC projects and can attach to a set of them by checking boxes.

The framework could also be used for delegation of project selection, analogous to mutual funds. For example, volunteers wanting to support cancer research could attach to an American Cancer Society account manager. American Cancer Society experts would then select a dynamic weighted "portfolio" of meritorious cancer-related volunteer projects.

There have been attempts to commercialize volunteer computing by paying participants, directly or via a lottery, and reselling the computing power. These efforts have failed because the potential buyers, such as pharmaceutical companies, are unwilling to have their data on computers outside of their control.

Security

Volunteer computing poses a variety of security challenges. What if hackers break into a project server and use it to distribute malware to the attached computers? BOINC prevents this by requiring that executables be digitally signed using a secure, offline signing computer. What if hackers create a fraudulent project that poses as academic research while in fact stealing volunteers' private data? This is partly addressed by account-based sandboxing: applications are run under an unprivileged user account and typically have no access to files other than their own input and outputs. In the future, stronger sandboxing may be possible using virtual machine technology.
What if hackers create a fraudulent project that poses as academic research while in fact stealing volunteers’ private data? This is partly addressed by account-based sandboxing: applications are run under an unprivileged user account and typically have no access to files other than their own input and outputs. Amazon. and networking resources collectively representing the physical infrastructure of the data center(s) hosting the cloud computing system. Tracking technology: Today. or something analogous to decision markets. permeating the entire stack of any imaginable cloud architecture. Such “expert investors” would steer the market as a whole. We hope these questions will provoke interest from a larger group of researchers and academics who wish to help shape the course of the new technology. and IBM. the realization of privacy in clouds is a cross-cutting interdisciplinary challenge. The distributed computing infrastructure offers a collection of core services that simplify the development of robust and scalable services on top of a widely distributed. Two other factors that would increase scientific adoption are the promotion of volunteer computing by scientific funding agencies and and increased acceptance of volunteer computing by the HPC and computer science communities. If these challenges are addressed. However. or if there were more support for higher-level computing models.000 for several years. For example. there will be thousands of projects. most cloud computing innovations have been almost exclusively driven by a few industry leaders.

An Architectural View

The physical resources of a typical cloud are simply a collection of machines, storage, and networking resources collectively representing the physical infrastructure of the data center(s) hosting the cloud computing system. Large clouds may contain some hundreds of thousands of computers. The distributed computing infrastructure offers a collection of core services that simplify the development of robust and scalable services on top of a widely distributed, failure-prone physical platform. The services supported by this layer typically include communication (for example, multicast and publish-subscribe), data storage (such as distributed file systems and key-value lookup services), distributed agreement (consensus), failure detection, group membership, and locking. The application resource management layer manages the allocation of physical resources to the actual applications and platforms, including higher-level service abstractions (virtual machines) offered to end-users. The management layer deals with problems related to application placement, load balancing, task scheduling, resource usage monitoring, and service-level agreements.

In the remainder of this article, we enumerate some cross-cutting concerns that dissect the entire cloud infrastructure. We will focus on energy, privacy, and consistency, as well as standards, benchmarks, and test beds for conducting cloud-related research.

Energy

Large cloud providers are natural power hogs. The growth of cloud provider services has been rapid, and power consumption is a major operating expense for the large industry leaders. To reduce the carbon footprint, data centers are frequently deployed in proximity to hydroelectric plants and other clean energy sources. Google has taken steps to revamp energy use in hardware by producing custom power supplies for computers which have more than double the efficiency of regular ones [12]. They even patented a "water-based" data center on a boat that harnesses energy from ocean tides to power the nodes and also uses the sea for cooling. Microsoft, Sun, and Dell have advocated putting data centers in shipping containers consisting of several thousand nodes at a time, thus making deployment easier.

Although multi-tenancy and the use of virtualization improve resource utilization over traditional data centers, fundamental questions remain of how, where, and at what cost we can reduce power consumption in the cloud. How can we better design future hardware and infrastructure for improved energy efficiency? How can we minimize energy loss in the commodity machines currently deployed in data centers? In the same fashion that laptop processors adapt the CPU frequency to the workload being performed, data center nodes can be powered up or down to adapt to variable access patterns, for instance due to diurnal cycles or flash crowds. Some CPUs and disk arrays have more flexible power management controls than simple on/off switches, thus permitting intermediate levels of power consumption [13]. File systems spanning multiple disks could, for example, bundle infrequently accessed objects together on "sleeper" disks [9]. Solid-state disks (SSDs) have substantially faster access times and draw less power than regular mechanical disks. The downside is that SSDs are more expensive and lack durability because blocks can become corrupted after 100,000 to 1,000,000 write-erase cycles. SSDs have made their way into the laptop market—the next question is whether cloud data centers will follow [14]. Can we engineer mechanisms to store read-intensive data on SSDs instead of disks? Finally, how should data and computation be organized on nodes to permit software to decrease energy use without reducing performance?
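The "sleeper disk" idea mentioned above amounts to segregating data by access frequency so that cold disks can actually power down. The following is a minimal, hypothetical placement policy, not a description of any real file system:

```python
def place_objects(access_counts: dict, hot_fraction: float = 0.2):
    """Split objects into hot (always-on disks) and cold ('sleeper' disks that may spin down)."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    cutoff = max(1, int(len(ranked) * hot_fraction))
    return {"hot": ranked[:cutoff], "cold": ranked[cutoff:]}

counts = {"index.db": 9000, "recent.log": 4000, "cache.bin": 700,
          "photos-2008.tar": 3, "backup-q1.tar": 1}
print(place_objects(counts))
# {'hot': ['index.db'], 'cold': ['recent.log', 'cache.bin', 'photos-2008.tar', 'backup-q1.tar']}
```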
benchmarks. 16. how should data and computation be organized on nodes to permit software to decrease energy use without reducing performance? Technological advances have reduced the ability of an individual to exercise personal control over his or her personal information. if all emails in Gmail were encrypted by the user’s public key and decrypted by the user’s web browser. The management layer deals with problems related to the application placement. Can we engineer mechanisms to store read-intensive data on SSDs instead of disks? Google has taken steps to revamp energy use in hardware by producing custom power supplies for computers which have more than double  the efficiency of regular ones  [12]. the first implementations are entirely impractical. Crossroads www. Sun. If privacy leaks have serious legal repercussions. exact copies can be made in an instant.org/crossroads 11 . The latter case implies that Gmail could not serve targeted ads to the user. then cloud providers would have incentives to deploy secure information flow techniques (even if they are heavyhanded) to limit access to sensitive data and to devise tools to locate the responsible culprits if a breach is detected [17]. What are the practical points on the privacy versus functionality spectrum with respect to computational complexity and a feasible cloud business model? Secure multiparty computation (SMC) allows mutually distrusting agents to compute a function on their collective inputs without revealing their inputs to other agents [19]. Could we partition sensitive information across clouds. we enumerate some cross-cutting concerns that dissect the entire cloud infrastructure.acm. Another approach is to sacrifice the generality of homomorphic encryption. However. As a high-level example. consistency governs the semantics of accessing the cloud-based services as perceived by both the developers and end users. and others. known as serializability or strong consistency [11]. and Dell have advocated putting data centers in shipping containers consisting of several thousand nodes at a time. Gmail could produce an index (the encrypted words would just look like a foreign language) but would not understand the message contents. then one avenue of research is to abstract it as a storage and computing device for encrypted information. such as data storage. Although multi-tenancy and the use of virtualization improves resource utilization over traditional data centers. Here we examine three examples to illustrate potential directions. bundle infrequently accessed objects together on “sleeper” disks [9]. for example. task scheduling. How can such mechanisms be made practical? Is the threat of penalty to those individuals who are caught compromising privacy satisfactory. Instead. The downside is that SSDs are more expensive and lack durability because blocks can become corrupted after 100.000. thus permitting intermediate levels of power consumption [13]. Finally. Fundamental questions exist of how. data centers are frequently deployed in proximity to hydroelectric plants and other clean energy sources. How can we better design future hardware and infrastructure for improved energy efficiency? How can we minimize energy loss in the commodity machines currently deployed in data centers? In the same fashion that laptop processors adapt the CPU frequency to the workload being performed.Clouds at the Crossroads: Research Perspectives usage monitoring.000 to 1.

 Can the smaller players leverage their collective power to lobby for an open and flexible cloud computing standard that fosters competition while still allowing businesses to profit? Or can this be accomplished by the larger companies or governments? What business models are suitable for an open cloud? On the technical side. then no user would ever see $5 as the valid balance of that account (since in this case. Benchmarks. Making cloud services open and interoperable may stimulate competition and allow new entrants to enter the cloud market. other ways of weakening consistency semantics have looked into replacing single global ordering with multiple orderings. Customers would be free to migrate their data from a stagnant provider to a new or promising one without difficulty when they so choose. The largest incumbents in the market are nevertheless reluctant to follow suit and have chosen to define their own standards. making it ideal an ideal resource for experimental validation of geographically networked systems which sustain heavy churn (peer arrivals and departures). Apart from eventual consistency. In the database community. the lack of interoperability may have adverse effect on consumers who become locked-in on a single vendor. for example. There is a plethora of cloud interoperability alliances and consortia (for example. Can we make strong consistency services that are more dynamic and easier to reconfigure. causal consistency [1] allows different clients to observe different request sequences as long as each observed sequence is consistent with the partial cause-effect order. It should also help to bridge diverse perspectives on consistency that exist today within different research communities like the database and distributed systems communities. or perhaps synthetically generate them until real ones are produced? Also. and face limited random churn but occasionally suffer from large-scale correlated failures. and cloud computing is no exception. If Carol checks the account balance twice and discovers it first to be $10 and then $15. The Internet was built on open standards. and durability). Test Beds Technical innovations are often followed by standards wars. Other components have more diverse APIs. For example. How can we obtain such traces. PlanetLab constitutes more than 1.acm.  Developing distributed computing infrastructure layers or data storage systems is a hard task. The possible research questions here would have to address questions such as can we produce a comprehensive and rigorous framework to define and reason about the diverse consistency guarantees. Crossroads 12 Spring 2010/ Vol. Although it is well understood that a cloud architecture should accommodate both strongly and weakly consistent services. Researchers have looked for practical ways of circumventing the CAP theorem. when the network connectivity is restored. it might be inherently impossible to compromise on strong consistency without risking catastrophic data losses at a massive scale. could users switch between providers without needing their support. Since cloud services are typically massively distributed and replicated (for scalability and availability). Open Cloud Manifesto. such as the TPC benchmark for databases (www. for example. reaching global agreement may be infeasible. as long as both of them will eventually see $15 as the final balance. (Just imagine what would happen if withdrawals were allowed in the bank account example above. No. MapReduce and Hadoop expose a similar API. 
this type of semantics is typically implied by ACID (atomicity. consensus benchmarks enable researchers outside the major incumbent companies to advance the core cloud technologies. but evaluating them for the massive scale imposed by clouds without access to real nodes is next to impossible. How should they interact. The question is whether clouds will be as well. It should be expressive enough to allow new properties to be both easily introduced. Yahoo!’s Zookeeper [16]. Yet another problem is that for certain types of data.tpc. Open Group’s Cloud Work Group). as do the various keyvalue lookup services (Amazon’s Dynamo [8]. Most work has so far focused on relaxing the consistency semantics. for instance locking services like Google’s Chubby  [3].org) test bed for deployment. This observation underlies the notion of eventual consistency [18]. The worry is that clouds become natural monopolies. 16. memcached [4]). Bob’s deposit gets sequenced before Alice’s). Intuitively.org/crossroads . The framework should unify both weaker and stronger models and could serve as a basis for rigorous study of various consistency semantics of cloud services and their relative power. The broad question asks what components and interfaces are the “right” way to provide the cloud properties mentioned previously. For instance. supporting serializability requires the participants to maintain global agreement about the command ordering. For instance. basically substituting serializability or (some of ) the ACID properties with weaker guarantees. 3 www. The nodes in the data centers underlying the cloud tend to be numerous. how can we evaluate the properties of key-value stores like PNUTS and Facebook’s Cassandra [15]? The most appealing approach is to compare well-defined metrics on benchmark traces. DTMF Open Cloud Standards Incubator. for instance by using a third-party service? Different cloud providers often adopt similar APIs for physical resources and the distributed computing infrastructure. suppose Alice deposits $5 to a bank account with the initial balance of $0 concurrently with Bob’s deposit of $10 to the same account. consistency. Academics who work on peer-to-peer systems (P2P). For instance. rely heavily on the PlanetLab (www. such as the meta-data of a distributed file system.000 nodes distributed across nearly 500 sites. for example by composing the existing basic properties. it does not matter if Carol and Bob in the example above would see either $5 or $10 as the intermediate balances. it is unclear how the two can be meaningfully combined within a single system. providing a simpler and more robust solution? Standards. Weaker consistency semantics work well only for specific types of applications.Ymir Vigfusson and Gregory Chockler presents them as occurring in an imaginary global sequence. and real-time event dissemination services. For instance. Brewer’s celebrated CAP theorem [2] asserts that it is impossible in a large distributed system to simultaneously maintain (strong) consistency. but do not easily generalize to arbitrary services. semantics that are weaker than serializability (or ACID) tend to be difficult to explain to users and developers lacking the necessary technical background. such as cooperative editing. which allows states of the concurrently updated objects to diverge provided that eventually the differences are reconciled. While beneficial for scalability. and understood by both developers and consumers of the cloud services. 
Whereas the strategy is understandable.planet-lab.) Moreover. hierarchically structured with respect to networking equipment. such an approach creates an extra dependency on a set of servers that have to be carefully configured and maintained. isolation. or what implications would such a model have on performance and scalability? Current approaches to supporting strong consistency primarily focus on isolating the problem into “islands” of server replicas. Yahoo!’s PNUTS [6]. and to tolerate partitions—that is. A more specific question is how we can compare and contrast different implementations of similar components.org). network connectivity losses. availability.
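The bank-account example discussed above can be made concrete with a small sketch (a toy model, not a real replication protocol; the replica names are illustrative). Because deposits commute, two replicas that receive Alice's and Bob's deposits in opposite orders expose different intermediate balances but converge to the same final balance once every update has reached every replica, which is the intuition behind eventual consistency.

```python
class Replica:
    """Toy replica that applies uniquely identified deposit operations."""
    def __init__(self, name):
        self.name = name
        self.applied = {}              # update id -> amount (applied at most once)

    def apply(self, update_id, amount):
        self.applied.setdefault(update_id, amount)

    def sync_from(self, other):
        for uid, amount in other.applied.items():
            self.apply(uid, amount)

    @property
    def balance(self):
        return sum(self.applied.values())

alice = ("alice-deposit", 5)
bob = ("bob-deposit", 10)

east, west = Replica("east"), Replica("west")
east.apply(*alice)                     # east sees Alice first: balance $5
west.apply(*bob)                       # west sees Bob first: balance $10
print(east.balance, west.balance)      # 5 10  -- divergent intermediate reads

east.apply(*bob)                       # anti-entropy: every update eventually
west.sync_from(east)                   # reaches every replica, in any order
print(east.balance, west.balance)      # 15 15 -- the replicas have converged
```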

Reed B. Li. Khuller. Danga Interactive. Dean. How to Get Involved Students and researchers who are interested in shaping cloud computing should consider participating in the LADIS (www. No. E..pdf. E. 2. Eventually consistent. R. ACM. and Rowstron. 1. 18. In Proceedings of HotOS. at least for academia. com/blog_resources/PSU_white_paper. 16. E.. Ganesh. 2009. C. Hastorun. Brewer. 13.277-1. edu/projects/ladis2009/talks/ramakrishnan-keynote-ladis2009. 2. 2007. 2001. 2009. D. In Proceedings of the Workshop on New Security Paradigms. 107-113. Hoelzle. Wenliang. and the Open Cloud Testbed. 1. We encourage other players to participate and contribute resources  to  cloud research. 1993. Neiger.. A handful of test beds appropriate for cloud research have made their debut recently. 9. S. et al. Intel and Yahoo!. J. Optimizing power consumption in large-scale storage systems. P. Google Inc. including researchers from underrepresented universities. google. A simple totally ordered broadcast protocol. 2009. J.. Gentry.. Vogels.microsoft. 2008. MapReduce: Simplified data processing on large clusters. M. Distributed Comput. http://www. 6. real-world problems that embody deep trade-offs. and Reuter. Hutto. Comm. chap. 2007. Balakrishnan. M.danga. In Proceedings of EuroSys.. In Proceedings of the ACM Symposium on Theory of Computing (STOC’09). A. 12. M.. 2008. and Birman K. A. 3. D.. Morgan Kaufmann.usenix. Cooper. 291-307. In Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA). Isolation concepts. 14. PNUTS: Yahoo!’s hosted data serving platform. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP’07).. USENIX Association. F. memcached: A distributed memory object caching system. High-efficiency power supplies for home computers and servers. Dr. 15. J.com/memcached/. Crossroads www.org/crossroads Spring 2010/ Vol. He is one of the founders and organizers of the ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware (LADIS). and Saha.Clouds at the Crossroads: Research Perspectives PlanetLab’s focus on wide-area networks is suboptimal for cloud platform research. D.org/events/hotcloud10) workshops.. Smith. 19. 205-220. http://www. 2006. R. Christodorescu. G. Burrows. 37-49. In Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS’08). L. implementations and programming. 8. Secure multi-party computation problems and their applications: a review and open problems. Principles of secure information flow analysis. Energy efficient scheduling via partial shutdown. 2008.288.acm. Migrating server storage to SSDs: Analysis of tradeoffs. 2010. G. Jampani. Data management challenges in the cloud. Elnikety. Chap. 2000. Gregory Chockler is a research staff member in the Distributed Middleware group at the IBM Research Haifa Labs.cs. S. with the goal of providing a standard test bed with openaccess. ACM Queue 6. Large industry players are currently driving the research bandwagon for cloud computing..edu/ projects/ladis2010) or HotCloud (www. In Malware Detection. J. on. His research interests span a wide range of topics in the area of large-scale distributed computing and  cloud computing. DeCandia. U. and Ghemawat. In Transaction Processing: Concepts and Techniques. and the same holds true for other similar resources. Narayanan.. B. Towards robust distributed systems. ipc. 11. P. Weatherspoon. 335-350 4.. and Atallah. He holds a PhD from Cornell University. cornell.cornell.. Gray. 
He holds a PhD from the Hebrew University of Jerusalem. Ymir Vigfusson is a postdoctoral researcher with the Distributed Middleware group at the IBM Research Haifa Labs. Fully homomorphic encryption using ideal lattices. and Junqueira. or the upcoming Symposium on Cloud Computing (SoCC: http://research. Eds. His research is focused around distributed systems. et al.pdf. The Chubby lock service for loosely-coupled distributed systems. et al.. 6. P. W. In Proceedings of Principles of Distributed Computing (PODC). Donnelly. 13. Cavoukian. but the journey is only beginning. Springer-Verlag.cs. B. 2008. B. References 1. White Paper on Privacy and Digital Identity: Implications for the Internet. Ahamad. M. J. Association for Computing Machinery. Ramakrishnan. 7.. 2007. Burns..pdf. specifically. M. 17. S. http://services. and Weihl. A. W.. In Proceedings of the 70th USENIX Symposium on Operating Systems Design and Implementation (OSDI’06). ACM 51. unfortunately.com/en-us/um/redmond/events/socc2010). G. In Proceedings of ACM SIGOPS LADIS. VLDB Endow. 1995. Who will create the future “CloudLab”? 10. H. 2006. 1. and Kohli. 16. 7. Causal memory: Definitions. 3 13 . Ramakrishnan. 9. 5. Proc. Biographies Dr. A.. 2008. including Open Cirrus from HP. Privacy in the clouds. ca/images/Resources/privacyintheclouds. Thereska. Dynamo: Amazon’s highly available key-value store. A concerted multi-disciplinary effort is needed to turn the cloud computing promise into a success. http://www..

to understand the structure of the universe. and data transfer in and out of the cloud. Besides public data repositories. also known as grids [13]. No. a nation-wide cyberinfrastructure—a computational environment. Figure 2 shows a graphical representation of a small Montage workflow containing 1. a 4-degree square mosaic (the moon is 0. In physics. model the background radiation in the input images to achieve common flux scales and background levels across the mosaic. These infrastructures. For example. and others. which dictate the number of images and computational tasks in the workflow. scientists are using workflows to generate science-grade mosaics of the sky [26]. 9. In addition to traditional high performance computing (HPC) centers. the TeraGrid is composed of computational and data resources at Indiana University. And it’s difficult to achieve good performance and reliability for an application on a given system. scientific collaborations maintain community-wide data resources. Although clouds were built primarily with business computing needs in mind. and reliability. Clouds have recently appeared as an option for on-demand computing. grids. and rectify the background that makes all constituent images conform to a common background level. 16. applications can target campus clusters. Montage mosaics can be constructed in different sizes. scientists use workflows to explore the issues of biodiversity [21]. Another example is from the earthquake science domain. researchers seldom spend time at a telescope. where researchers use workflows to generate earthquake hazard maps of Crossroads 14 Spring 2010/ Vol. and. cyberinfrastructure could refer to both grids and clouds or a mix of the two—is being provided to the scientific community. In astronomy. which can result in infrastructure savings for a business. In this article we focus primarily on workflow-based scientific applications and describe how they can benefit from the new computing paradigm. they are also being considered in science. In bioinformatics. In addition to the large-scale cyberinfrastructure. Today.acm. Scientific workflows are used to bring together these various data and compute resources and answer complex research questions. data repositories hosted by entities such as the National Institutes of Health [29] provide the data gathered by Genome-Wide Association Studies and enable researchers to link particular genotypes to a variety of diseases. clouds can provide computational and storage capacity when needed. allow access to highperformance resources over wide area networks.Scientific Workflows and Clouds By Gideon Juve and Ewa Deelman I n recent years. For example.000 tasks and 750 input images.5 degrees square) corresponds to a workflow with approximately 5. One idea driving cloud computing is that businesses can plan only for a sustained level of capacity while reaching out to the cloud for resources in times of peak demand. calculate the geometry of the output mosaic on the sky. University of Illinois. 44]. to examine the structure of galaxies [46]. These normalized images are added together to form the final mosaic. empirical science has been evolving from physical experimentation to computationbased research. Originating in the business sector. and clouds. the need to process these data is growing. TeraGrid. storage. workflow applications are running on national and international cyberinfrastructures such as OSG. in gravitational-wave physics. For example. 
They stitch together computational tasks so that they can be executed automatically and reliably on behalf of the researcher. workflows are used to search for gravitational waves [5] and model the structure of atoms [40]. performance optimization. Workflows describe the relationship of the individual computational components and their input and output data in a declarative way. Workflow Applications Scientific workflows are being used today in a number of disciplines.org/crossroads . Figure 1 shows a mosaic of the Rho Oph dark cloud created using this workflow. These resources are accessible to users for storing data and performing parallel and sequential computations. including the Open Science Grid (OSG) [36] and the TeraGrid [47]. or utility computing platforms such as commercial [1. that hosts a number of heterogeneous resources. these opportunities also bring with them many challenges. When using the cloud. They provide remote login access as well as remote data transfer and job scheduling capabilities. researchers are using workflows to understand the underpinnings of complex diseases [34. Workflow management systems such as Pegasus [4. workflows are used to predict the magnitude of earthquakes within a geographic area over a period of time [10]. These workflows are composed of a number of image-processing applications that discover the geometry of the input images on the sky. In bioinformatics. It’s hard to decide which resources to use and how long they will be needed. 3 www. the Laser Interferometer Gravitational-Wave Observatory [3] maintains geographically distributed repositories holding time-series data collected by the instruments and their associated metadata. The broad spectrum of distributed computing provides unique opportunities for large-scale. consumers pay only for what they use in terms of computational resources. In earthquake science. but instead access the large number of image databases that are created and curated by the community [42].200 computational tasks. In astronomy. and EGEE [11]. Workflow management systems enable the efficient and reliable execution of these tasks and manage the data products they produce (both intermediate and final). 17] and academic clouds [31]. in general. re-project the flux in the input images to conform to the geometry of the output mosaic. In ecology. It’s hard to determine what the cost-benefit tradeoffs are when running in a particular environment. 39] orchestrate the execution of these tasks on desktops. usually distributed. Along with the large increase in online data. However. Louisiana University. complex scientific applications in terms of resource selection.
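A minimal sketch of what such a workflow system automates follows (a toy scheduler, not Pegasus or any production engine; the task names are loosely modeled on the Montage steps described above and are illustrative). The workflow is a directed acyclic graph of tasks and data dependencies, and each task is released only after all of its predecessors have produced their outputs.

```python
# Each task maps to the set of tasks whose outputs it consumes.
dependencies = {
    "reproject_img1": set(),
    "reproject_img2": set(),
    "fit_background": {"reproject_img1", "reproject_img2"},
    "add_mosaic": {"reproject_img1", "reproject_img2", "fit_background"},
}

def run_task(name):
    print(f"running {name}")           # a real engine would submit a job here

def execute_workflow(deps):
    completed = set()
    remaining = dict(deps)
    while remaining:
        # release every task whose inputs are all available
        ready = [t for t, d in remaining.items() if d <= completed]
        if not ready:
            raise ValueError("cycle or unsatisfiable dependency in workflow")
        for task in ready:
            run_task(task)              # ready tasks could also run in parallel
            completed.add(task)
            del remaining[task]

execute_workflow(dependencies)
```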

Spring 2010/ Vol. Crossroads www. Clouds that provide computational capacities (Amazon EC2 [1].acm. Figure 3: In this shake map of Southern California. This application requires large-scale computing capabilities such as those provided by the NSF TeraGrid [47]. clouds are primarily built using resource virtualization technologies [2. but scientific applications can benefit from them as well. Davy Kirkpatrick.) Southern California [38]. and K as red. however.) 3) provide reliability so that scientists do not have to manage the potentially large numbers of failures.000 to 1. The lines connecting the tasks represent data dependencies. Southern California Earthquake Center including Scott Callaghan. often have different requirements than enterprise customers. Figure 3 shows a map constructed from individual computational points. Patrick Small. Scientists. In particular. OpenNebula [43]. providing a limited number of computational platforms on demand: Cumulus [49]. and Tom Jordan. Each point is obtained from a hazard curve (shown around the map) and each curve is generated by a workflow containing approximately 800. libraries. Nimbus [31]. the three-color composite is constructed using Montage. instant messaging [25]. is often heterogeneous and distributed). Finally. These maps show the maximum seismic shaking that can be expected to happen in a given region over a period of time (typically 50 years). Platform as a service (PaaS) clouds such as Google App Engine [17] provide an entire application development environment including frameworks. These science clouds provide a great opportunity for researchers to test out their ideas and harden codes before investing more significant resources and money into the potentially larger-scale commercial infrastructure. 16. scientific codes often have parallel components and use MPI [18] or shared memory to manage message-based communication between processors. clouds are also emerging in academia. and a deployment container. In order to support such workflows. Kevin Milner. 2) optimize workflows for performance to provide a reasonable time to solution. 7.org/crossroads 15 . To support the needs of a large number of different users with different demands in the software environment. and 4) manage data so that it can be easily found and accessed at the end of the execution.000. The curves show the results of the calculations. 50] that enable the hosting of a number of different operating systems and associated software and configurations on a single hardware host. by necessity. and many others. points on the map indicate geographic sites where the CyberShake calculations were performed. (Image courtesy of CyberShake Working Group.Scientific Workflows and Clouds Figure 1: In this 75x90 arcmin view of the Rho Oph dark cloud as seen by 2MASS. 3 Figure 2: A graphical representation of the Montage workflow with 1. More coarse-grained parallel applications such as workflows rely on a shared file system to pass data between processes. J band is shown as blue. Science Clouds Today. (Image courtesy of Bruce Berriman and J. H as green. Eucalyptus [33]. Nimbus. Commercial clouds were built with business users in mind. No.200 computational tasks represented as ovals. Cumulus) are often referred to as an infrastructure as a service (IaaS) because they provide the basic computing resources needed to deploy applications and services. software systems need to 1) adapt the workflows to the execution environment (which.000 computational tasks [6]. 
Software as a service (SaaS) clouds provide complete end-user applications for tasks such as photo sharing and instant messaging [25].


2. by default. These applications are often very brittle and require a very specific software environment to execute successfully. in that they can be configured (with additional work and tools) to look like a remote cluster. 48] or Condor [8. To use a cloud storage service. In addition. 10]). as soon as the first job finishes. Elastic Block Store: a block-based storage system that provides network attached storage volumes to EC2. and the like. libraries. and for storing input and output data. where overheads of scheduling individual. workflows can usually make use of a high-performance. Thus the overall workflow can be executed much more efficiently. 3 www. or Panasas [37]. 24]. AWS services provide computational. and several others to act as worker nodes. Another interesting aspect of the cloud is that. GPFS [41]. disk). the second job will not be released to a local resource manager on the cluster until the first job successfully completes. SimpleDB: a structured key-value storage service.acm. With virtualization. Clouds are similar to grids. when running on the cloud. a workflow running on Amazon’s cloud could make use of the Simple Queue Service. Scientific workflows require large quantities of compute cycles to process tasks. and many other computations work on different execution sites. Simple Queue Service: a distributed queue service for sending messages between nodes in a distributed application. it includes resource provisioning as part of the usage mode. the environment can be customized with a given OS. memory. Elastic Compute Cloud (EC2): a service for provisioning virtual machine instances from Amazon’s compute cluster. software such as Nimbus Context Broker [22] can be used. presenting interfaces for remote job submission and data transfer. Today. ocean modeling. Setting up a virtual cluster in the cloud involves complex configuration steps that can be tedious and error-prone. In the cloud. Many of the existing workflows were developed for HPC systems such as clusters. the workflow management system would likely need to change the way it manages data. As such. these storage systems must scale well to handle data from multiple workflow tasks running in parallel on separate nodes. called “virtual clusters” [12]. relational storage (RDS). Clouds and their use of virtualization technologies may make these legacy codes much easier to run. There are many ways to deploy a scientific workflow on a cloud. Unlike the grid. which enables database records to be stored. scientists struggle to make the codes that they rely on for weather prediction. For example. No. For example. Amazon’s cloud provides services for monitoring (CloudWatch). where jobs are often executed on a best-effort basis. scientists can use existing grid software and tools to get their work done. or they can deploy their own shared file system. if there are two dependent jobs in the workflow. inter-dependent tasks in isolation (as it is done by grid clusters) can be very costly. One of the great benefits of the cloud for workflow applications is that both adaptation approaches are possible. For example. In the provisioned case. software packages. and others. The needed directory structure can be created to anchor the application in its preferred location without interfering with other users of the system. For example. parallel computing (Elastic MapReduce). and communication infrastructure on-demand via web-based APIs. Resource provisioning is particularly useful for workflow-based applications. 
How many resources and how fast one can request them is an open question. indexed and queried by key. These collections of VMs. Adapting the workflow to the cloud involves changing the workflow to take advantage of cloud-specific services. workflows can either make use of a cloud storage service. can be managed using existing offthe-shelf batch schedulers such as PBS [34. a workflow task needs to fetch input data from S3 to Crossroads Scientific Workflows The canonical example of a cloud is Amazon’s Elastic Compute Cloud (EC2). depending on the services offered by the cloud and the requirements of the workflow management system. scientific applications are often composed of many interdependent tasks and consume and produce large amounts of data (often in the Terabyte range [5. AWS offers five major services. 4. 3. an HPC cluster can be emulated in Amazon EC2 by provisioning one VM instance to act as a head node running a batch scheduler. parallel file system such as Lustre [45]. libraries. a user requests a certain amount of resources and has them dedicated for a given duration of time. rather than using a batch scheduler to distribute workflow tasks to cluster nodes. 5.org/crossroads . Adapting the cloud to the workflow involves configuring the cloud to resemble the environment for which the application was created. No one wants to touch the codes that have been designed and validated many years ago in fear of breaking their scientific quality. To achieve good performance. the second job is released to the local resource manager and since the resource is dedicated. In the cloud. to use Amazon S3. Virtualization also opens up a greater number of resources to legacy applications. Simple Storage Service (S3): an object-based storage system for the reliable storage of binary objects (typically files). these cycles are provided by virtual machines such as those provided by Amazon EC2. To automate this process. 16 Spring 2010/ Vol. Volumes can be attached to an EC2 instance as block device and formatted for use as reliable.Gideon Juve and Ewa Deelman Additionally. unshared file system. This software gathers information about the virtual cluster and uses it to generate configuration files and start services on cluster VMs. 16. Many virtual machine instances must be used simultaneously to achieve the performance required for large scale workflows. grids and supercomputers. and application code on a variety of predefined hardware configurations (CPU. it can be scheduled right away. The downside is that the environment needs to be created and this may require more knowledge and effort on the part of the scientist than they are willing or able to spend. which allows messages queued by one node to be retrieved and processed by another. which allows users to deploy virtual machine (VM) images with customized operating systems. In addition to compute cycles. which provides operations to “put” and “get” objects from a global object store that is accessible both inside and outside Amazon’s cloud. When running on HPC systems. storage. Porting these workflows to the cloud involves either adapting the workflow to the cloud or adapting the cloud to the workflow. Thus the second job will incur additional queuing delays. which is part of Amazon Web Services (AWS). 1. scientific workflows rely on shared storage systems for communicating data between workflow tasks distributed across a group of nodes.
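The staging pattern described here, pulling inputs from S3 to local disk, computing, and pushing outputs back, can be sketched as a small task wrapper. This is an illustration only: the boto3 client library, the bucket name, the object keys, and the command being run are all assumptions for the sketch, not part of the systems discussed in the article.

```python
import subprocess
import boto3  # assumed S3 client library; any S3-compatible client would work

s3 = boto3.client("s3")
BUCKET = "example-workflow-bucket"     # hypothetical bucket name

def run_staged_task(input_key, output_key, command):
    """Fetch input from S3, run the task locally, then upload the result."""
    local_in, local_out = "input.dat", "output.dat"
    s3.download_file(BUCKET, input_key, local_in)        # S3 -> local disk
    subprocess.run(command + [local_in, local_out], check=True)
    s3.upload_file(local_out, BUCKET, output_key)        # local disk -> S3

# Hypothetical usage for one workflow task (executable name is a placeholder):
# run_staged_task("inputs/image1.dat", "outputs/projected1.dat", ["./reproject"])
```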

com/ec2/.-H. in the future we are likely to see a great proliferation of clouds that have been designed specifically for science applications. Dragovic. Field.. Pratt. Amazon EBS does not allow volumes to be mounted on multiple instances. they will potentially Crossroads www. K. In addition to fast storage systems. In Workflows for e-Science.. and McNabb. J.. Lustre cannot be deployed on Amazon EC2 because it requires kernel modifications that EC2 does not allow). it comes with a performance cost. K. References 1. Beattie. Neugebauer. Ho. J. 2008. D. and Shields. 6. Mehta. Xen and the art of virtualization. E. Callaghan. 32]. or they must create their own file system using services available in the cloud... Deelman. 3 Future Outlook While many scientists can make use of existing clouds that were designed with business users in mind. Brown. The HPC systems typically used for scientific workflows are built using high-bandwidth. Clearly. Johnson. 16.. In addition they could come with science-oriented infrastructure services such as workflow services and batch scheduling services. Making multiple copies in this way can reduce workflow performance. Although virtualization provides greater flexibility. We already see science clouds being deployed at traditional academic computing centers [14. Amazon. and Weiss. P. Gannon. Additionally. In comparison. Like existing clouds. Berriman. S. Juve.. clouds can be directly beneficial to HPC centers where the staff is technically savvy. come in a variety of flavors depending on the level of abstraction desired by the user. and computational science. Physics Today 52.. 2006.. which results in poor performance for demanding workflow applications.. HPC centers are looking at expanding their own infrastructure by relying on cloud technologies to virtualize local clusters. where she heads the Pegasus project. At the same time. I. which is at least difficult and potentially impossible depending on the file system desired (for example. S. 30]. His research interests include distributed and highperformance computing.. Jacob. 2. T. LIGO and the detection of gravitational Waves. P. B. Laity. but will come equipped with features and services that are even more useful to computational scientists.. Reducing time-to-solution using distributed highthroughput mega-workflows—Experiences from SCEC CyberShake. and Su. R. B. http://aws. there is the overhead of deploying and unpacking VM images before the VM can start. Such systems could include access to collections of datasets used by the scientists. A. G. Springer. Cao. Current estimates put the overhead of existing virtualization software at around 10 percent [2. D. Finally. Gunter. T. P. Although clouds provide many different types of shared storage systems.. Eds. B. M. S. an extra VM can be started to host an NFS file system and worker VMs can mount that file system as a local partition. 1999. Singh. For example.. Milner. scientific workflows. Deelman. Spring 2010/ Vol. B. Graves. R.. most existing commercial clouds are equipped with commodity gigabit Ethernet. This cost comes from intercepting and simulating certain low-level operating system calls while the VM is running. For example. Barham. E. Prince. and parallel storage systems. K.amazon. E. 51] and VM startup time takes between 15 and 80 seconds depending on the size of the VM image [19. advances in virtualization technology... Relatively slow networks. A. G. 15.. Fraser.. E.acm. J. Virtualization overhead. 
Ewa Deelman is a research associate professor at the University of Southern California Computer Science Department and a project leader at the USC Information Sciences Institute. may reduce or eliminate runtime overheads in the future. 5.. Biographies Gideon Juve is a PhD student in computer science at the University of Southern California.. communication and storage. Although clouds like Amazon’s already provide several good alternatives to HPC systems for workflow computation. In addition. Deelman. Taylor.org/crossroads 17 . A. G. which takes time.. To run on a cloud like Amazon’s. PaaS science clouds could be similar to the science portals and gateways used today.. Montage: A grid enabled engine for delivering custom science-grade mosaics on demand. some commonly used science applications could be deployed using a SaaS model.. In Proceedings of the 19th ACM Symposium on Operating Systems Principles. However. Brady.... HPC centers can also make use of commercial clouds to supplement their local resources when user demand is high. which designs and implements workflow mapping techniques for large-scale workflows running in distributed environments. perform its computation.. scientific workflows rely on high-performance networks to transfer data quickly between tasks running on different hosts. in Amazon EC2. D.. A case study on the use of workflow technologies for scientific analysis: Gravitational wave data analysis. 52] or GlusterFS [16]. I. G.. 4. In SPIE Conference 5487: Astronomical Telescopes. Dietz. Maechling. Lack of shared or parallel file systems. the use of commodity networking hardware is not a fundamental characteristic of clouds and it should be possible to build clouds with high-performance networks in the future. there are still challenges to overcome. then transfer output data from the local disk back to S3.Scientific Workflows and Clouds a local disk. Elastic compute cloud. These applications would allow scientists from around the world to upload their data for processing and analysis. One can imagine that these science clouds will be similar to existing clouds. Hand. Barish. 164-177. A. such as improved hardware-assisted virtualization. Good. and Jordan. Vahi. C. a workflow application must either be modified to use these different storage systems.. which would allow them to provide customized environments to a wide variety of users in order to meet their specific requirements. 44. Kesselman. Fortunately. T. D. Another alternative would be to deploy a file system in the cloud that could be used by the workflow. These overheads are critical for scientific workflows because in many cases the entire point of using a workflow is to run a computation in parallel to improve performance.. M. A. and Amazon S3 does not provide a standard file system interface. J. C. 3. R. Harris. Fortunately. and Warfield. the adoption of clouds for domain scientists depends strongly on the availability of tools that would make it easy to leverage the cloud for scientific computations and data management.. K. R. such as genome repositories and astronomical image archives... 2003. they are not typically designed for use as file systems. lowlatency networks such as InfiniBand [20] and Myrinet [27]... 28. 2004. IaaS science clouds could provide access to the kinds of high-performance infrastructure found in HPC systems such as high-speed networks. D. A.. Okaya. If better performance is needed then several VMs can be started to host a parallel file system such as PVFS [23.. Katz. No. 
They could provide tools for scientists to develop and deploy applications using domain-specific APIs and frameworks.

. 200-222. K. C. 36. MIT Press. C. FREENIX Track. Chicago. 25. D.. D. Bioinformatics 19. NASA Ames Research Center. and Vetter. S. Gil. Deelman. Myrinet. and Cobban. R. et al. Blythe. H. Foster. D. 24.. R.. J. Deelman.org/. Sotomayor.eu-egee. In Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing. C. Berriman. The Eucalyptus open-source cloud-computing system. L.com/myrinet/. 48. Software as a service. 2008. Oinn. Grzegorczyk. E. C.. 2006. 2002. 20.. 19..gov. 22. J.com.. Using MPI: Portable Parallel Programming with the Message Passing Interface.. http://pegasus. J. Beichman.panasas. I. J.. 474. Nurmi. W. 2009. Youseff.. S. 33. 2008. H.opensciencegrid.. Vahi. 12. 2006. Laity. http://www.. 2003. Computer Science Tech. N. In Challenging Issues in Workflow Applications (SWBES’08). Day. 30.. S. http://www. W. 13. org/. Scientific Program. T. Magellan.E.. In Proceedings of the 4th International Conference on eScience (e-SCIENCE’08). 31. J. and Foster. D. Lustre.. In The Impact of Large Scale Near-IR Sky Surveys. Garzon. Xen-based HPC: A parallel I/O perspective. S. T. 42. 25. Condor: A hunter of idle workstations. 40. K.ipac. T. Int. Kell. Hsieh. A. Obertelli. Youseff. and Olsen. 38. Contextualization: Providing one-click virtual clusters. C. Rattu. Stevens. J. http://nebula. Eds. Eucalyptus: A technical report on an elastic utility computing architecture linking your programs to useful systems. J. 18. D. Youseff.. Wolski. L. M.nasa. E. P. Torque.. In Cloud Computing and Applications. 483-494. Keahey. R. 51.. 17. Taylor. Mehta. O. 2008. 26. B. M.. Clark. A. http://www.. Soman.. R. Gupta. Panasas Inc. D. Soman.mspx. rep. Dordrecht. S... M. OpenPBS. Y. In Workflows in e-Science. Hyperic Inc. 21. and Skjellum. http://www. M.. Goderis. Myricom. GPFS: A shared-disk file system for large computing clusters. In Proceedings of the USENIX Annual Technical Conference. Mehringer. I. http://www. and Freeman. In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing (e-SCIENCE’06). 2003. 52..com/ serviceproviders/saas/default. http://futuregrid.teragrid. Kesselman. 2008.ncbi. 28.. Skrutskie. Greenwood. Lattice QCD workflows: A case study. rHype: IBM research hypervisor. Newman. J. Litzkow. R.... C. Microsoft. 37... Ligon.cs. Ground motion environment of the Los Angeles region. Cutri. S. Celebioglu. Francoeur.. 1997. http://supercluster. http://montage. D. S... and Rajasekar. E. Graves. Yu.openpbs.. TeraGrid. R. M. J. Okaya.. Weinberg. 2004. R. and Ross. D. and Haskin. 2008. http://www. C. Tao. Obertelli. C.. M. Implementation and performance of a parallel file system for high performance distributed applications.cloudstatus. Gluster Inc. 2006. 45. Somerville. R. Kesselman. 35. Iqbal. E. Taylor. http://www.. D. Strom. Condor. In Proceedings of the 1st USENIX Conference on File and Storage Technologies. Vahi. G. Li. B.acm. 50.. M. F... Mehta. Capacity leasing in cloud systems using the opennebula engine.. Pennington. Eds. 1994. A. and Katz. Springer. Stiening. and Tuecke. 44. The database of genotypes and phenotypes (dbGaP). L. 2005. B. Robinson.org/.org. EGEE Project. The Cumulus Project: Build a scientific cloud for a data center.. M. M. and Zagorodnov. Piccoli. FutureGrid. The Two Micron All Sky Survey (2MASS): Overview and status. Wang...html.google. Managing large-scale workflow execution from resource provisioning to provenance tracking: The CyberShake example. R.gov/nusers/systems/magellan. Scheftner..edu/condor. Herne. Wolski. 32. 
K. and Zagorodnov. C. E. S. 1988. Google App Engine. 41. Nebula... 18 Spring 2010/ Vol. Comput. B. B. M. G. Pegasus.org/clouds/ nimbus.. Kunze. Kesselman. 34. Pegasus: A framework for mapping complex scientific workflows onto distributed systems. Paravirtualization for HPC systems. 15. Finlayson.nersc. Gorda. Pepper. and Elias.. I. Santa Barbara. 8. 9. 2005. Lonsdale..gluster. The anatomy of the grid: Enabling scalable virtual organizations. 15..gov/gap.com/appengine/. Livny.. and Goble.. http://www.. C. Distributed P2P computing within Triana: A galaxy visualization test case. J. D. Montero. Field. K. Wang. Dow.Gideon Juve and Ewa Deelman In Proceedings of the 4th IEEE International Conference on e-Science (e-SCIENCE’08). B.. T. E.. J.. 39. 27. Deelman....nlm.wisc.. R. http://www. 135–144. 2005. B... G. I. Foster. http://www. Good. Goble. Data integration and workflow solutions for ecology... Shields.-H.. MA. C. 104-111.. T. 13. J. InfiniBand Trade Association. 29. L. G.edu.org/crossroads Crossroads . http://www.. B.. NERSC. 2006. and Castellanos. B... I. 2006. I. CloudStatus. I. Sotomayer.org. University of California. K.... T. 2005. 14. 2008. In Proceedings of the IEEE International Parallel and Distributed Processings Symposium (IPDPS’03).... Sun Microsystems. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05). Singh.. Gannon. Jacob. 16. 3 www. InfiniBand. 513-520. F. M. D. Open Science Grid. and Shields. Taverna/myGrid: Aligning a workflow system with the life sciences community. and Krintz. G. A. In Cloud Computing and its Applications. L. 47.org. Su..isi.lustre. Deshane. 2008-10. V. G. High Perform. D.. L. 1996. 14. Panasas. Nurmi. A.. J. Gilbert. In Data Integration in Life Sciences. 2001. E..org/. W.nih. F. L.. vol. Schneider.. and Philp. Jordan. S. 49.. 471–480. Ludascher. R. Enabling Grids for E-sciencE.. Turi. M. 219-237.. Xenidis. M. 16.. and Zhao. Grzegorczyk.microsoft.. R. N. Keahey. J. IBM Research. Tseng..org/torque. A. J. Maechling. Cambridge. myGrid: Personalised bioinformatics on the information grid. Jones. In Proceedings of the 8th International Conference on Distributed Computing Systems.. Llorente. and Zhang.com. R. 10. http://code. Montage. Freeman..infinibandta. In Lecture Notes in Computer Science. 4331. Virtual Clusters for Grid Communities.. In Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid’08).. Xen and the art of repeated research. No.. Nimbus Science Cloud.. In Cloud Computing and its Applications. Gupta.. J. C. Paul. and Matthews. NCBI. Stevens.caltech.. 2008.. Structural Design Tall Special Buildings 15. R. GlusterFS. http://www. 23. http://www. 7.. W. Hull.. 11.edu. J. and Zhao. Performance implications of virtualization and hyper-threading on high energy physics applications in a grid environment. http://workspace. Schmuck. W. B.. Evanchik. G.. F. 46. http://www. Lusk. D. R. P. 43. D. Callaghan. S.. In Proceedings of the 6th IEEE International Symposium on Cluster Computing and the Grid (CCGRID’06).globus. Kluwer Academic Publishing Company. X. G.org. Appl. and Mutka. S... M. D. A. Chester. Gropp.myri. Wolski.

and now to cloud computing. another way we refer to cloud computing. In other words.The Cloud at Work. and security measures necessary to run stuffs remotely or to get to the data that is remote. but not enough candidates!) —Sumit Narayan Sumit Narayan: “Cloud computing” is the new buzz word among computer scientists and technologists. etc. 3 Pete Beckman. Now the virtual machine part. the one that is really a technological breakthrough. So all these systems which used to reside locally in an IT organization on-site are getting cloud versions. or IMAP’ping to a server that was within their infrastructure. This is because they never really had it close to them on a server down the hall. SN: What are they key new technologies behind cloud computing? Can you talk a little about the different services available through cloud computing? PB: From a technical standpoint. Spring 2010/ Vol. and speculates where it might be headed next. they are now getting that as a service remotely and are being charged an incremental fee. now you are able to ship the entire virtual machine to other sites.” has been around for a very long time. it looks like they have their emails present locally like they used to. but anything I want. but the notion is that the services. thereby allowing for utility computing. that’s really a technological breakthrough that allows me to run anything. which essentially. to meta-computing. director of the Argonne Leadership Computing Facility. That’s the only one that has a technological breakthrough. is using and shipping around virtual machines. Virtual machine technology in particular has made this much easier than in the past. Interviews with Pete Beckman of Argonne National Lab and Bradley Horowitz of Google data. To the user.acm. So. The technology. All the others are just a model breakthrough. interviewed by Sumit Narayan Pete Beckman is the director of the Argonne Leadership Computing Facility at Argonne National Laboratory (ANL) in Illinois.” as opposed to “computing on your local machine. Beckman explains cloud computing from a scientist’s perspective. it all looks the same. (He also notes that Argonne has a well-developed student internship program. like a per-user fee. Argonne National Laboratory. are located remotely. either compute or Crossroads www. The people are doing the same for calendars. but probably in another building. The idea that I can create a web application that makes it look like I’m running something locally when it is really remote—these are model differences in terms of APIs and providing a per-user capacity and so forth. They provide a very definitive and well-described interface and also allow you to run a piece of code or software remotely. Argonne National Lab is the United State’s first science and engineering research laboratory as well as home to one of the world’s fastest supercomputers.org/crossroads 19 . only requires a network connection. which people have often referred as “computing out there. the idea that I could store my data locally or remotely has been around for a long time. travel. There is no provisioning that the site has to do for hosting and running their own machines. somewhere around 20 years. the only one that has a strong root in technology is the virtual machine-based “infrastructure as a service” (IaaS). Tell us a little about the origins of cloud computing. to grid computing. No. That is unique and the new thing over the last couple of years. except that the IMAP server is now some other IMAP server. 
When people move their email to the cloud. The challenge that still remains is data. It’s used in different ways to define a variety of things. 16. Rather than the complicated nature of deciding which software can be run and which packages are available. how to best share the data. An example of model breakthrough is what people are doing when they say they are going to run their email on the cloud. They’re all a little different. and scientists have to then work out protocols. not just what they provide in a package like POP or IMAP. What does cloud computing mean to you? Pete Beckman: Distributed computing. We went from a period of distributed computing. grid computing was focused primarily on sharing the resources among the providers and the genesis of cloud computing came from some technologies that allowed folks to do this in a very clear and sandboxed way. To an organization. HR. They were probably POP’ping. policies.

like security. the language we have used to describe security has been to say that there is a “user. why do you think cloud computing is important? What opportunities is cloud computing opening in the scientific community that weren’t supported by big computing clusters like Blue Gene? PB: Scientists have been using computers and hosting data remotely for a long time. image or media editing. discussion. Cloud computing offers promise here. we will be able to allow the cloud to do that for us. Uploading photos or videos of my kids to the cloud for editing is probably still out of reach. People are migrating into using a really cheap portable device. They are used to this. No. There already are netbooks. PDFs. cyber-security plans differentiate those two very clearly. All the documentation. infrastructure? What are the risks? PB: Security is very complex in a virtualized environment. if you do something that you are not supposed to do when you are using a virtual machine. or a root. The scientific community has been doing its work remotely for a long time. Another thing of course will be. But it is changing dramatically. you can think of it as someone being able to turn a mobile phone on and off. in terms of support for that package. but a lot of colleges do not have that sort of expertise. scientists can customize and make their own special stacks. in the past. or setting up two or three servers down the hall. But not for long. It becomes a very big challenge and there are a couple of things we want to be able to do. and they work just fine. largely because of its low barrier to entry. or if they have the right version of Perl. 3 www. supercomputers had a very well-defined set of software stacks—the package is either in the stack or not. for most home users. the cluster mailing lists. on Dropbox/S3 for storage. They’re still controlled by someone else: the mobile phone service providers. Yet. because they imagine scientists here come to use our supercomputer.” and an “escalated privileged user. We see this a lot in metagenomics and biology. where scientists have a very complex workflow of 10 different tools and they want to create a web interface for that. and one we will be exploring at Argonne. Let me give you a couple of examples. But now instead of the science community provisioning all these servers and mid-range computers. There is a great Photoshop-as-a-service app. they want to teach parallel processing. There certainly is value in that sort of hacking. and there will be a lot of research in that space. SN: How do you think cloud computing would impact education? PB: Oh I think cloud computing is just amazingly fun and fantastic for education. MATLAB.acm. you don’t really require a high-end notebook for that. that lets you upload a picture and then using a web interface. funded by the U. when you give someone a virtual machine. change the color. Magellan. We really want to give people a total virtual machine. for example. computation.” Now. Whether or not it comes to a close. You may also know about Google’s Chrome OS. But you don’t get to control the cell towers. That’s administrator privileges on the phone. Department of Energy for research into the scientific community and the cloud as a way to free up scientists so that they are not provisioning. It’s kind of funny when people who are not from the scientific community visit Argonne. As an analogy. or cloud services. They are different from cloud resources that are primarily useful for mid-range computing. 
And again.” like a root. So this notion of security really has to change. These are school-aged kids who have enormous amounts of energy. you’ll see stories about folks who have a 16-node cluster bought from the cheapest parts possible on the planet. If you look at the future of computing. shrink it. With respect to media editing. Some things still need intense local hacking. We are going to write a code that calculates the surface area to volume ratio of this system with each particle moving around. supercomputers are still super. then these lightweight portal devices will become the way many people access their machine or their servers. or if they added the Java bindings to MySQL. a lot of people are using Google Docs for that.SN: Do you think the era of personal computers is coming to a close? Will we increasingly rely on cloud services like Google for search. SN: Is there anything about cloud computing that worries you.S. and Word documents. or Skype for communication? PB: It is changing. An inexpensive Atom processor on your netbook is enough. The professor would say. “We are going to do a homework assignment. Argonne has a project for cloud computing. they can just put everything together in a virtual machine and ship it around and run it wherever. etc. As a scientist. You can do simple student stuff: web. there will still be a gap. I have one of the initial versions of a netbook. They are able to do [their computing] using SSH on their laptop from a coffee shop in Italy. basic spreadsheet. We have a lot yet to explore and change. even those are making their way out into the cloud. The fact is. who is to blame? Is it the users to whom you handed the virtual machine? Are they responsible for all the security? If they upload their virtual machine that has a bunch of security holes in it and someone gets into that—how do we stop that? How do we manage that risk? How do we scan our own virtual machine? So that’s a pretty big research area. If you look at the Beowulf. They want it all together in a package so that they can run their genomics application. But. Users and escalated privileged user— administrator. Animoto is an example one that is happening in the cloud. MPI programming and scientific calculations. scratch it. email. The other thing that’s changing is. and it is really cool. SN: Cloud computing is well suited for business needs. In a virtualized environment. Skype. However. if we really get to a ubiquitous network and all these things rely on network. But now. they don’t. We have had lightweight Linux for quite some time now. But with IaaS cloud architecture. Occasionally. These machines have no cases! They’re just sitting on a table with their motherboards. either on a supercomputer or mid-range machines. Doing it in a virtual machine means they don’t have to worry whether their package is supported. there are people who have set up their own clusters at various places. crop it. probably not.org/crossroads . Of course. 16. they have the root access on that virtual machine. each student has his or her own credit hours that he or she uses in the cloud to do the computation. and it’s fantastic for universities! More and more students can get access to Crossroads 20 Spring 2010/ Vol. That means we would be giving them root access on a virtual machine. That’s the sort of thing where we are likely to be headed. and they get a couple of old or new computers and wire them together. not on the complete infrastructure. In the past. 
I can imagine in the future a student just being handed 100 credit-hours on the cloud.

Google Voice. and other services that businesses now sometimes host on site. which developed new products such as Yahoo! Pipes. and we’ll allow scientists to slice and dice and explore the data in the cloud here at Argonne and elsewhere. But with respect to the Magellan project. with respect to cloud computing? Where do you see cloud computing in ten years? Fifty years? PB: We are slowly moving to faster and faster networks. I can imagine five years from now universities routinely handing out cloud credit hours to students. SN: What is your vision for the future. Calendar. Usually.dep. Crossroads Bradley Horowitz. then they can get time on the machine.gov) that lists all its projects. both at ANL and outside? PB: Argonne has a lot of student interns.acm. We can imagine that as we improve that last mile. And Google runs one of the largest— if not the largest—“cloud computers” on the planet. in terms of motherboards and CPUs and hard disks. or write a new ray-tracer and do some visualization.000-core supercomputer that has Infiniband. We need specialized architecture for that. However.gov. your pictures. You’re never going to be without email. These kinds of tasks are much more easily accomplished in the cloud.org/crossroads 21 . We have a fantastic student program. calendaring. it will still be mid-range computing. everything from your home network and 802. Much of it requires very large networked resources. Previously. both in terms of computing power and data sets. Over time. Bradley Horowitz: Much of my research has involved looking at what computers do well and what people do well. and Picasa. Google Talk. he led Yahoo!’s advanced development division. our problem is that we cannot find enough students! It’s not that we don’t have enough slots for undergraduates or graduates who are in computational science or computer science—we don’t have enough students! Argonne has a catalog (www.11x. but it’s a great way to get students started and tinkering. but on a different level. So it’s great to be able to build applications that run on this massively scaled architecture. and very large data sets. Horowitz holds a bachelor’s degree in computer science from the University of Michigan. more things will be stored in the cloud. And. interview by Chris Heiden Bradley Horowitz oversees product management for Google Apps. biology or genomics on very high-performance machines like exa-scale machines.megallen. Google carefully calculates every penny spent. he was co-founder and CTO of Virage.anl.The Cloud at Work resources. You probably have read stories about Google’s data centers and how cheap they are. No. that line has shifted. And if a student has a fantastic idea for exploring cloud computing in some way that benefits the lab and is in line with the mission of this lab in understanding cloud computing. we have a web site that we are still working on: www. —Chris Heiden Chris Heiden: Describe your background and how it brought you into cloud computing. optimizing different variables.alcf. 3 www. we will probably see a move toward cloud computing for mid-range capabilities like email. In the future. It won’t have the impact of a 5. and drove the acquisition of products such as Flickr and MyBlogLog. earth data. A good example is face recognition. and figuring out how to marry the two most effectively. There will always be the need for a supercomputer that goes beyond commodity inexpensive computing to provide a capability that you can do in a high-end environment. Blogger. and so forth. 
we’re trying to figure out how fast we can simulate 100 years of climate change. We’re not optimizing the cost to run queries per second that can force an embarrassingly parallel search. all the way to businesses relying on cloud services that will be spread out around multiple data servers. We are rapidly moving towards that. materials. For high-performance computing. Now. Here. vice president of Product Management. There. where he oversaw the technical direction of the company from its founding through its IPO and eventual acquisition by Autonomy. we do that too. including Gmail. 16. and a master’s degree from the MIT Media Lab and was pursuing his PhD there when he co-founded Virage. but instead. your data because it is replicated somewhere in the cloud. Before joining Google. providing for that mid-range computing is where the science will go: We will be hosting climate data. he discusses the issues affecting cloud computing and how we must address them going forward.anl. even with respect to our homes. The ubiquitous nature of computing today is a perfect fit for computing in the cloud. Spring 2010/ Vol. SN: What are the opportunities for students to consider. Google. Google Docs. You can do a simple MATLAB thing or parallel processing. we will also have space where we will solve the world’s most challenging problems in climate. Students can essentially apply to a project for a summer position. we will have a place to apply for cycles or time on the cloud machine. and computers now do many things that used to require a lot of manual effort. and we will always need high-end architectures like Blue Gene that are low-powered but still massively parallel.

CH: Describe how much of an impact cloud computing will have in the next evolution in computing. How will it affect the everyday computer user?

BH: This shift to cloud computing has already started invisibly for most users. Most people use webmail without realizing it's essentially a "cloud" app; people often don't think about it even when they're using the cloud. That may be how they start using other cloud apps, too: they're invited to collaborate on a document online, they use that doc to make that collaboration happen every day, and then another, and then one day they wake up and realize it's been months since they've opened up their desktop software. I'm not sure people have quite grasped how much of computing can and will shift to the cloud. But it's going to start getting more obvious as people switch to netbooks and smartphones, as they stop having to worry about where they stored something, hassling with document versions, or all the mechanics of backing up their disks.

Cloud computing is already changing the way businesses, governments, and universities run, even if you don't hear about it as often, since it happens behind closed doors. The City of Los Angeles just switched to the cloud using Google Apps, and we're seeing big businesses like Motorola making the switch. Philadelphia International Airport cuts down on delays and costs by coordinating their operations using Google Docs. Universities like Arizona State and Notre Dame have switched to cloud-based email, and they're looking at technologies like Google Wave too. At universities, it's students demanding their administrators switch to Gmail and checking their voicemail on Google Voice. We hear about soldiers in the desert in Iraq keeping in touch with their families using video chat in Gmail. And it's individuals, driven by user demand, making daily life more mobile, more fluid, and more seamless.

CH: What do you see as the single most important issue affecting cloud computing today?

BH: The network is getting exponentially faster. It's Moore's Law meeting Nielsen's Law: computing power keeps growing, but bandwidth is growing exponentially as well, roughly 50 percent per year. People now buy phones expecting 3G data connections—talking has become the "oh yeah" feature at the bottom of a list of a dozen web and data features. At this point you expect the network to be fast and always-on, and it's starting to be available everywhere. Airplanes used to be off-limits to cloud computing, but not anymore; WiFi is becoming standard on planes. Not only can you pack tremendous computing power into a mobile phone to access the cloud anywhere, but mobile apps are what people are actually using to communicate with each other and get work done. Even the word developer is becoming more and more synonymous with web developer. The web is now the primary platform for building apps. These fast, lightweight apps sprouted in the cloud, they're built to run in the cloud, and they're now "growing down" into big enterprises.

The movement away from the desktop and on-premise solutions has been sluggish for software makers with entrenched interests on the desktop. They're moving so slowly compared to what users want. The providers actually can't keep up with user and customer demand; they can't build this stuff fast enough for what people want to do in the cloud. It helps that the stats are finally starting to get out comparing apples to apples: desktop and on-premises applications break down far more often than web apps.

CH: What are some of the aspects of cloud computing that you are working on that will revolutionize the field?

BH: I've always been interested in the way people socialize and collaborate online. Everything "cloud" is really about collaboration. It's not just a larger computer—cloud computing gives birth to a new way of working, where the stuff you produce lives in the center of a social circle and can be collaborated on by everyone in that circle, rather than in an app used in isolation. Sharing is easy, and you can work on stuff together without worrying about the mechanics of version control and invites and the like; collaboration is not an afterthought you hook into an app after it's built. As an example, you don't use Google Docs to write solo reports and print them out to stick on a shelf somewhere; you use it to get 10 people to hack out a plan everyone's happy with in one hour. It's not that there isn't ever-growing computing power on the desktop or in beefy all-purpose servers. Rather, the most interesting part of cloud computing, and what yields the most interesting real-world applications, is the networked part: it's in the network and the interaction between computers, not in individual computers themselves. It's the networked process that's revolutionary, and it's already the part of computing people really care about and that developers are developing for.

CH: What about concerns people have voiced about trusting cloud computing? Do you see these concerns as slowing adoption?

BH: I'd flip that on its head. Google's cloud is made up of a highly resilient network of thousands and thousands of disposable computers, with apps and data spread out on them across geographies. Your data is replicated live across multiple datacenters, so if a meteor hits one, the app and data keeps on running smoothly on another. The nice thing about cloud computing is that all these service providers are so publicly accountable for even the slightest glitch; there's no tolerance for downtime. The value to you if you're a cloud provider is not in locking in users' data, it's in the service. Lock-in and closed formats are a losing strategy. They make you lazy as a provider, and on the web, laziness is deadly, since competitors are a click away, which is great for users. And efforts like the Data Liberation Front and Google Dashboard make it clear that users maintain control of their data: they can take their stuff to go anytime they like, and they can always see how the data is being used.

State of Security Readiness
By Ramaswamy Chandramouli and Peter Mell

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. With this pay-as-you-go model of computing, cloud solutions are seen as having the potential to both dramatically reduce costs and increase the rapidity of development of applications. However, the security readiness of cloud computing is commonly cited among IT executives as the primary barrier preventing organizations from immediately leveraging this new technology. These problems are real and arise from the nature of cloud computing: broad network access, resource pooling, and on-demand service.

In this article, we survey some of these challenges and the set of security requirements that may be demanded in the context of various cloud service offerings (noted in the article as No. 1, No. 2, and so forth). The survey touches upon the various artifacts or entities involved in IT services, such as the users, the users' software clients, applications, data, computing platforms and hardware, and the cloud infrastructure or services. The security challenges and requirements we survey not only involve core security operations, such as encryption of data at rest and in transit, but also contingency-related operations, such as failover measures. We call the enterprise or government agency subscribing to the cloud services the "cloud user" and the entity hosting the cloud services the "cloud provider."

Service Models

To further refine the definition of cloud computing presented above, we classify cloud computing service offerings into three service models.

Software as a service (SaaS). The capability provided to the consumer is the use of a provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings. Examples of this include the case of a cloud provider offering a software application used for a specific business function, such as customer relationship management or human resources management, on a subscription or usage basis rather than the familiar purchase or licensing basis.

Platform as a service (PaaS). The capability provided to the consumer is the deployment of consumer-created or acquired applications onto the cloud infrastructure. These applications are created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations. Examples of this include the case of a cloud provider providing a set of tools for developing and deploying applications using various languages (for example, C, C++, Java) under a whole application framework (JEE, .NET, and so on).

Infrastructure as a service (IaaS). The capability provided to the consumer is provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (for example, host firewalls). Examples of this include the case of a cloud provider providing physical and virtual hardware (servers, storage volumes) for hosting and linking all enterprise applications and storing all enterprise data—in other words, the infrastructure backbone for an enterprise's data center.

Survey of Security Challenges

In reviewing the security challenges and requirements of cloud computing, we will look first at the necessary interactions between the cloud users, the users' software clients, and the cloud infrastructure or services.

The Users

When an enterprise subscribes to a cloud service, it may have a diverse user base consisting of not only its own employees but also its partners, suppliers, and contractors. In this scenario, the enterprise may need an effective identity and access management function and therefore require the following security requirements:

• support for a federation protocol for authentication of users (No. 1), and
• support for a standardized interface to enable the cloud user (or the cloud user's system administrator) to provision and de-provision members of their user base (No. 2).

Many commercial cloud services are now beginning to provide support for the security assertion markup language (SAML) federation protocol (which contains authentication credentials in the form of SAML assertions) in addition to their own proprietary authentication protocol, and hence we do not see a big obstacle in meeting the first of the above requirements. As far as the user provisioning and de-provisioning requirement is concerned, there exist common, machine-neutral formats or XML vocabularies for expressing user entitlements or access policies, such as the extensible access control markup language (XACML), and for user provisioning and de-provisioning with capabilities such as the service provision markup language (SPML). However, many of the cloud providers still use their own proprietary interfaces for user management. Until the user management interface of the cloud provider provides support for these kinds of protocols, the cloud user's control of this important security function cannot be realized.
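To make requirement No. 2 more concrete, the minimal Python sketch below shows the kind of provisioning adapter an enterprise ends up writing when a provider exposes only a proprietary user-management interface. Everything here is hypothetical: the class names, calls, and fields do not correspond to any real provider's API, and a standard such as SPML would aim to let a single adapter serve many providers instead.

```python
# Hypothetical illustration: without a standard such as SPML, each cloud
# provider needs its own provisioning adapter written against a
# proprietary interface. All names and calls below are invented.
from dataclasses import dataclass


@dataclass
class EnterpriseUser:
    user_id: str
    email: str
    role: str  # e.g., "employee", "partner", "contractor"


class ProprietaryCloudClient:
    """Stand-in for one provider's non-standard user-management API."""

    def create_account(self, login: str, mail: str, profile: str) -> None:
        print(f"provider-specific call: create {login} ({profile})")

    def delete_account(self, login: str) -> None:
        print(f"provider-specific call: delete {login}")


class ProvisioningAdapter:
    """Maps the enterprise's user model onto one provider's interface.

    With N providers and no common standard, the enterprise maintains
    N adapters like this one; requirement No. 2 asks for one standard
    interface instead.
    """

    def __init__(self, client: ProprietaryCloudClient) -> None:
        self.client = client

    def provision(self, user: EnterpriseUser) -> None:
        self.client.create_account(user.user_id, user.email, user.role)

    def deprovision(self, user: EnterpriseUser) -> None:
        self.client.delete_account(user.user_id)


if __name__ == "__main__":
    adapter = ProvisioningAdapter(ProprietaryCloudClient())
    contractor = EnterpriseUser("jdoe", "jdoe@example.org", "contractor")
    adapter.provision(contractor)
    adapter.deprovision(contractor)  # e.g., when the contract ends
```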

Access to Data

Data is an enterprise's core asset. The cloud environment has a unique ownership structure in the sense that the owner of the data is the cloud user, while the physical resources hosting the data are owned by the cloud provider. What are the security challenges and requirements surrounding access to data stored in the cloud infrastructure?

Driven by citizen safety and privacy measures, government agencies and enterprises (for example, healthcare organizations) may demand of a SaaS, PaaS, or IaaS cloud provider that the data pertaining to their applications be:

• hosted in hardware located within the nation's territory or a specific region (No. 3), and
• protected against malicious or misused processes running in the cloud (No. 4).

For many cloud providers, hosting hardware within a specific region can be done easily; protecting the data itself from malicious processes in the cloud is often more difficult. For many cloud providers, the competitiveness of the service offering may depend upon the degree of multi-tenancy. This represents a threat exposure, as the many customers of a cloud could potentially gain control of processes that have access to other customers' data.

Data Protection

Given the challenges in protecting access to cloud data, encryption may provide additional levels of security. Some enterprises, due to the sensitive or proprietary nature of data and due to other protection requirements such as intellectual property rights, may need to protect the confidentiality of data and hence may require that both data in transport and data at rest (during storage) be encrypted (Nos. 5 and 6). While encryption of data in transit can be provided through various security protocols, such as transport layer security and web services-security based on robust cryptographic algorithms, encryption of data at rest requires the additional tasks of key management (for example, key ownership, key rollovers, and key escrow). Best practices for key management have yet to evolve, especially for PaaS solutions, and this is one of the areas the standard bodies or industry consortiums have to address in order to meet the encryption requirements of data at rest.

Some IaaS providers offer storage as their service; we will call this subclass of IaaS cloud provider a cloud storage provider. Based on the above discussion, two further security requirements may arise from cloud users. First, depending upon the criticality of data, data protection may call for either periodical backups or real-time duplication or replication: secure and rapid data backup and recovery capabilities should be provided for all mission-critical data, for disaster recovery concerns (No. 7). Second, if the cloud storage provider has experienced a data breach, or if the cloud user is not satisfied with the data recovery features or data availability (which is also a security parameter) provided by that organization, the latter should have the means to rapidly migrate the data from one cloud storage provider to another. Additionally, the data protection may call for capabilities for segmenting data among various cloud storage providers, and common APIs should be required to migrate data from one cloud storage provider to another (No. 8). Hence the cloud user has to look for these capabilities in an IaaS provider offering storage service.
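One common way to approach the data-at-rest requirement today is to encrypt on the client side before data ever reaches the cloud storage provider, with keys held by the enterprise rather than the provider. The following is a minimal sketch, assuming the third-party Python `cryptography` package; the file name is a placeholder, the upload step is omitted, and a real deployment would still need the key-management practices (ownership, rollover, escrow) discussed above.

```python
# Minimal sketch: client-side encryption of data at rest before upload.
# Assumes the third-party "cryptography" package (pip install cryptography).
# Key handling is deliberately simplistic; real key management is the
# hard part discussed in the article.
from cryptography.fernet import Fernet

# The enterprise, not the cloud provider, generates and keeps the key.
key = Fernet.generate_key()  # store in the enterprise's own key store
cipher = Fernet(key)

plaintext = b"customer records destined for a cloud storage provider"
ciphertext = cipher.encrypt(plaintext)

# Only the ciphertext would be handed to the provider (upload omitted).
with open("blob.enc", "wb") as f:
    f.write(ciphertext)

# On retrieval, the enterprise decrypts locally with its own key.
with open("blob.enc", "rb") as f:
    assert cipher.decrypt(f.read()) == plaintext
```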
The biggest business factor driving the use of IaaS cloud providers is the high capital cost involved in the purchase and operation of high-performance servers and the network gear involved in linking up the servers to form a cluster to support compute-intensive applications. The economy of service offered by an IaaS cloud provider comes from the maximum utilization of physical servers, and hence it is difficult to think of an IaaS cloud offering without a virtual machine. IaaS cloud providers usually offer a platform for subscribers (cloud users) to define their own virtual machines to host their various applications and associated data by running a user-controlled operating system within a virtual machine monitor or hypervisor on the cloud provider's physical servers.

As a result, a primary concern of a subscriber to an IaaS cloud service is that their virtual machines are able to run safely without becoming targets of an attack, such as a side channel attack from rogue virtual machines collocated on the same physical server. While it's critical in PaaS to offer services to ensure the security of developed applications, in IaaS it's critical for the cloud provider to rent to the users secure operating systems in which persistent programs such as web servers and directory servers are configured properly (No. 9). Further, cloud users have the right to expect that persistent programs such as web servers will be configured to run not as a privileged user (such as root).

Vulnerabilities for PaaS

When developing applications in a PaaS cloud environment, what might leave the application security vulnerable? Vulnerabilities represent a major security concern whether applications are hosted internally at an enterprise or offered as a service in the cloud. In the cloud environment, the custom applications developed by the cloud user are hosted using the deployment tools and run time libraries or executables provided by the PaaS cloud provider. While it is the responsibility of cloud users to ensure that vulnerabilities such as buffer overflows and lack of input validation are not present in their custom applications, they might expect similar and additional properties, such as lack of parsing errors and immunity to SQL injection attacks, to be present in the application framework services provided by a PaaS cloud provider; that is, the modules in the application framework provided should be free of vulnerabilities (No. 10). Further, the modern application frameworks based on service-oriented architectures provide facilities for dynamically linking applications based on the dynamic discovery capabilities provided by a persistent program called the directory server. This is true in any enterprise IT environment. Hence this directory server program also needs to be securely configured.

If cloud users are not satisfied with the services provided by the current cloud provider due to security or performance reasons, they should have the capability to de-provision the virtual machines from the unsatisfactory cloud provider and provision them on a new cloud provider of their choice. Users may also need to migrate from one virtual machine to another in real time, for example in situations of peak loads, so as to provide a seamless computing experience for the end users. These needs translate to the following security requirements:

• the capability for the user to migrate virtual machines (in non-real time) from one cloud provider to another (No. 11),
• the capability to monitor the status of virtual machines and generate instant alerts (No. 12), and
• the capability to perform live migration of VMs from one cloud provider to another or from one cloud region to another (No. 13).

Large-scale adoption of virtual machine import format standards such as the open virtualization format will enable the user to rapidly provision virtual machines into one cloud provider environment and de-provision them at another cloud provider environment that is no longer needed by the cloud user, and thus meet the first of the above requirements. Further, a virtual machine migrated using a common import format should not require extensive time to reconfigure under the new environment; hence common run time formats are also required to enable the newly migrated virtual machine to start execution in the new environment. Tools to continuously monitor the vulnerabilities or attacks on virtual machines running on a server have already been developed or are under development by many vendors, so the second requirement can be met today. Live migration of virtual machines (in situations of peak loads) is now possible only if the source and target virtual machines run on physical servers with the same instruction set architecture; since the majority of virtualized environments run the x86 ISA, this is not a major limitation, and the industry is already taking steps to address it.

Standards

With respect to standards and cloud security readiness, we have made four major observations. First, some requirements are already met today using existing standards (such as federation protocols for authentication) and technologies (automatic real-time duplication of data for disaster recovery). Second, some requirements can be met if there is more market support for existing standards (XACML and SPML for user provisioning, open virtualization format for virtual machine migration). Third, some requirements can only be met by developing new standards (common run time formats for virtual machines, common APIs for migration of data from one cloud storage provider to another). And fourth, some requirements, such as data location and non-multi-tenancy, can be met by restructuring cost models for associated cloud service offerings.

While cloud computing presents these challenges, it has the potential to revolutionize how we use information technology and how we manage datacenters. The impact may be enormous with respect to IT cost reduction and increased rapidity and agility of application deployment. Thus, it is critical that we investigate and address these security issues. While some issues may have ready answers (such as existing security standards), others may be more problematic (such as threat exposure due to multi-tenancy). The ultimate answer is almost certainly multifaceted: technical solutions will be discovered and implemented, security standards will enable new capabilities, and differing models and types of clouds will be used for data of varying sensitivity levels to take into account the residual risk.

Biographies

Dr. Ramaswamy Chandramouli is a supervisory computer scientist in the Computer Security Division, Information Technology Laboratory at NIST. He holds a PhD in information technology security from George Mason University. He is the author of two text books and more than 30 peer-reviewed publications in the areas of role-based access control models, security policy specification and validation, model-based test development, conformance testing of smart card interfaces, and identity management.

Peter Mell is a senior computer scientist in the Computer Security Division at NIST, where he is the cloud computing and security project lead, as well as vice chair of the interagency Cloud Computing Advisory Council. He is also the creator of the United States National Vulnerability Database and lead author of the Common Vulnerability Scoring System (CVSS) version 2 vulnerability metric used to secure credit card systems worldwide.

The Business of Clouds
By Guy Rosen

At the turn of the 20th century, companies stopped generating their own power and plugged into the electricity grid. In his now famous book The Big Switch, Nick Carr analogizes those events of a hundred years ago to the tectonic shift taking place in the technology industry today. Just as with electricity, businesses are now turning to on-demand, mass-produced computing power as a viable alternative to maintaining their IT infrastructure in-house. In this article, we'll try to hunt down some hard data in order to shed some light on the magnitude of this shift, examining what the cloud means for businesses and how it is fueling a new generation of tech startups. We'll also take a look at why it is all so significant.

What is the Cloud?

While the exact definition of cloud computing is subject to heated debate, we can use one of the more accepted definitions from NIST, which lays out five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Of particular interest to us are the three service models NIST describes:

Infrastructure as a service (IaaS) displaces in-house servers, storage, and networks by providing those resources on-demand. Instead of purchasing a server, you can now provision one within minutes and discard it when you're finished, often paying by the hour only for what you actually used. (See also "Elasticity in the Cloud," page 3.)

Platform as a service (PaaS) adds a layer to the infrastructure, providing a platform upon which applications can be written and deployed. These platforms aim to focus the programmers on the business logic, freeing them from the worries of the physical (or virtual) infrastructure.

Software as a service (SaaS) refers to applications running on cloud infrastructures, typically delivered to the end user via a web browser. The end-user need not understand a thing about the underlying infrastructure or platform! This model has uprooted traditional software, which was delivered on CDs and required installation, possibly even requiring purchase of a server to run on.

The Hype

Research outfit Gartner describes cloud computing as the most hyped subject in IT today. It's extraordinary that a term that was virtually unheard of as recently as 2006 is now one of the hottest areas of the tech industry. Using Google Trends, we can find more evidence of the growing interest in cloud computing by analyzing search volume for the term cloud computing. IDC, another leading firm, estimated that cloud IT spending was at $16 billion in 2008 and would reach $42 billion by 2012.

The Reality

The big question is whether cloud computing is just a lot of hot air. On the one hand, hard data is exceedingly hard to come by: Amazon, the largest player in the IaaS space, rolls the revenues from its IaaS service into the "other" category in its financial reports, which only adds to the mystery. On the other hand, the adoption of the cloud among enterprises and backend IT systems has been likened to the dark matter of the universe—many times larger but nearly impossible to measure directly. In an attempt to shed some light on de facto adoption of cloud infrastructure, I conducted some research during 2009 that tries to answer these questions. For now, let's focus on the achievable and examine the results for the high-visibility category of public web sites.

The first study, a monthly report titled "State of the Cloud" (see www.jackofallclouds.com/category/state-of-the-cloud/), aims to estimate the adoption of cloud infrastructure among public web sites. The caveat to this technique is that it analyzes a particular cross section of cloud usage and cannot pretend to take in its full breadth. Not included are back-end use cases such as servers used for development, for research, or for other internal office systems.

It's relatively straightforward to determine whether a given site is running on cloud infrastructure, and if so, from which provider, by examining the site's DNS records as well as the ownership of its IP. Now all we need is a data set that will provide a large number of sites to run this test on. For this, we can use a site listing such as that published by marketing analytics vendor Quantcast, which makes available a ranked list of the Internet's top million sites (see www.quantcast.com/top-sites-1); in practice the top 500,000 of these million were used. To complete this survey, we test each and every one of the sites listed and tally the total number of sites in the cloud and the total number of sites hosted on each provider.

From this data, we can draw two main conclusions. First, cloud infrastructure is in its infancy with a small slice of the overall web hosting market. Second, the cloud is growing rapidly—so rapidly, in fact, that Amazon EC2 alone grew 58 percent in the four months analyzed, equivalent to 294 percent annual growth. See Figures 1 and 2.

Figure 1: Amazon EC2 has a clear hold on the cloud market.

Figure 2: The top 500,000 sites by cloud provider are shown.
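As a rough illustration of the DNS-and-IP-ownership test described above, the sketch below resolves a site's name, performs a reverse lookup, and checks the resulting hostname against a few provider domains. The suffix-to-provider mapping is a small, illustrative sample rather than the list used in the study, and a fuller classification would also consult IP-ownership (WHOIS) data, which is omitted here.

```python
# Rough sketch of the "is this site in the cloud?" test: forward-resolve
# the site, reverse-resolve the address, and look for provider-owned
# hostnames. The suffix list is a small illustrative sample, not the
# full set used in the State of the Cloud study.
import socket

CLOUD_SUFFIXES = {
    "amazonaws.com": "Amazon EC2",
    "googleusercontent.com": "Google",
}

def classify(site: str) -> str:
    try:
        ip = socket.gethostbyname(site)
        reverse_name = socket.gethostbyaddr(ip)[0]
    except (socket.gaierror, socket.herror):
        return "unresolvable"
    for suffix, provider in CLOUD_SUFFIXES.items():
        if reverse_name.endswith(suffix):
            return provider
    return "not detected as cloud-hosted"

if __name__ == "__main__":
    for site in ["example.com", "www.python.org"]:
        print(site, "->", classify(site))
```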

The second study we'll discuss examined the overall usage of Amazon EC2, based on publicly available data that had been overlooked. Every time you provision a resource from Amazon EC2 (for example, request to start a new server instance), that resource is assigned an ID, such as i-31a74258. The ID is an opaque number that is used to refer to the resource in subsequent commands. In a simplistic scenario, that ID would be a serial number that increments each time a resource is provisioned. If that were the case, we could perform a very simple yet powerful measurement: we could provision a resource and record its ID at a certain point in time. Twenty-four hours later, we could perform the same action, again recording the ID. The difference between the two IDs would represent the number of resources provisioned within the 24-hour period.

At first glance, Amazon EC2's IDs appear to have no meaning at all and are certainly not trivial serial numbers. A mixture of luck and investigation began to reveal patterns in the IDs. One by one, these patterns were isolated and dissected until it was discovered that underlying the seemingly opaque ID there is an incrementing serial number. For example, the resource ID 31a74258 can be translated to reveal the serial number 00000258. (This process was published in detail and can be found in the blog post Anatomy of an Amazon EC2 Resource ID: www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id.)

With these serial numbers now visible, we can perform our measurement as described above. For example, during a 24-hour period in September 2009, the IDs for several types of resources were recorded, translated, and the resource usage calculated from the differences. Over the 24-hour period observed, the quantities of resources provisioned were:

• Instances (servers): 50,242
• Reservations (atomic commands to launch instances): 41,121
• EBS volumes (network storage drives): 12,840
• EBS snapshots (snapshots of EBS volumes): 30,925

See Figure 3. These numbers are incredible to say the least. They show the use of Amazon EC2 to be extensive as well as dynamic. Unfortunately, these numbers represent the number of resources created and do not provide clues to how many of them exist at any given point in time, because we do not know which resources were later destroyed and when.

Figure 3: The chart shows resource usage of Amazon EC2 in the eastern United States in September 2009 over a 24-hour period.

The above view is of a single 24-hour period. RightScale, a company that provides management services on top of IaaS, collected IDs from the logs it has stored since its inception and broadened the analysis to a much larger timeframe—almost three years (see http://blog.rightscale.com/2009/10/05/amazon-usage-estimates). With this perspective, we can clearly witness the substantial growth Amazon EC2 has seen since its launch, from as little as a few hundred instances per day in the early days to today's volumes of 40,000-50,000 daily instances and more.

Amazon EC2 leads the cloud infrastructure industry by a wide margin. Amazon is reaping the rewards of being the innovating pioneer: its first cloud service was launched as early as 2005, and the richness of its offering is at present unmatched.
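The measurement itself reduces to subtracting two decoded serial numbers recorded 24 hours apart. The snippet below shows only that arithmetic; the two serial values are invented for illustration, and the actual ID-to-serial translation (described in the Anatomy of an Amazon EC2 Resource ID post) is not reproduced here.

```python
# Illustrative arithmetic behind the usage estimate. The serial numbers
# below are hypothetical; in the study they were obtained by translating
# EC2 resource IDs, a step not reproduced here.
serial_at_start = 0x00000258   # decoded from an ID provisioned at t0
serial_24h_later = 0x0000C6F2  # decoded from an ID provisioned at t0 + 24h

provisioned = serial_24h_later - serial_at_start
print(f"resources provisioned in 24 hours: {provisioned}")  # 50330 here
```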
Is the Cloud Good for Business?

What's driving adoption is the business side of the equation. We can fold the benefits of the cloud into two primary categories: economics and focus.

The first and foremost of the cloud's benefits is cost. When it comes to IT, economies of scale matter: maintaining 10,000 servers is cheaper per server than maintaining one server alone. Simple geographic factors also come into play: whereas an on-premise server must be powered by the electricity from your local grid, cloud datacenters are often constructed near sources of low-cost electricity such as hydroelectric facilities (so the cloud is good for the environment as well). These cost savings can then be passed on to customers. From an accounting point of view, there are no assets on the company's balance sheet: CAPEX (capital expenditure) becomes OPEX (operating expenditure). Instead of large up-front payments, you pay as you go for what you really use, an accountant's dream! Informal polls among customers of IaaS suggest that economics trumps all other factors.

The second reason you might use the cloud is in order to focus on your core competencies and outsource everything else. What is the benefit of holding on to on-premise servers, air-conditioned server rooms, and enterprise software—not to mention the IT staff necessary to maintain them—when you can outsource the lot? In the new model, your company, be it a legal firm, a motor company, or a multinational bank, focuses on its core business goals. The cloud companies, in turn, focus on their core competency, providing better, more reliable, and cheaper IT. Everyone wins.
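To make the CAPEX-versus-OPEX point concrete, here is a small back-of-the-envelope comparison. Every figure (server price, hourly rate, utilization) is an invented assumption for illustration, not a quoted price from the article or from any provider.

```python
# Back-of-the-envelope CAPEX vs. OPEX comparison. Every number below is
# an assumption made up for illustration, not a quoted price.
HOURS_PER_YEAR = 24 * 365

owned_server_capex = 3000.0  # purchase price, amortized over 3 years
owned_yearly_cost = owned_server_capex / 3 + 600.0  # + power/space/admin

on_demand_hourly = 0.25      # hypothetical pay-as-you-go rate
utilization = 0.25           # fraction of the year the server is busy

cloud_yearly_cost = on_demand_hourly * HOURS_PER_YEAR * utilization

print(f"owned server, per year : ${owned_yearly_cost:,.0f}")
print(f"cloud at {utilization:.0%} utilization: ${cloud_yearly_cost:,.0f}")
# At low utilization the pay-as-you-go model wins; as utilization rises
# toward 100 percent, the owned server becomes competitive again.
```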

Start-Up Companies Love the Cloud

One sector particularly boosted by cloud computing is the tech startup space. Just a few years ago, building a web application meant you had to estimate (or guesstimate) the computing power and bandwidth needed and purchase the necessary equipment up front. In practice this would lead to two common scenarios:

1) Underutilization: Before that big break comes and millions come swarming to your web site, you're only using a small fraction of the resources you purchased. The rest of the computing power is sitting idle—wasted dollars.

2) Overutilization: Finally, one day, the big break comes! Unfortunately, it's bigger than expected and the servers come crashing down under the load. In the meantime, teams scramble to set up more servers, and the CEO, under pressure, authorizes the purchase of even more costly equipment. To make things worse, a few days later the surge subsides and the company is left with even more idle servers.

Along comes cloud computing. Out goes up-front investment and in comes pay-per-use and elasticity. Before the big break, you provision the minimal number of required servers in the cloud and pay just for them. When the floods arrive, the cloud enables you to provision as many resources as needed to handle the load, so you pay for what you need but not a penny more. After the surge, you can scale your resources back down. This elasticity—the ability to scale up as well as down—leaves the two scenarios described above as moot points.

One of the best-known examples of this is a start-up company called Animoto. Animoto is a web-based service that generates animated videos based on photos and music the user provides. Video generation is a computation-intensive operation, so computing power is of the utmost importance. At first, Animoto maintained approximately 50 active servers running on Amazon EC2, which was enough to handle the mediocre success they were seeing at the time. Then, one day, its marketing efforts on Facebook bore fruit, the application went viral, and the traffic went through the roof. Over the course of just three days, Animoto scaled up its usage to 3,500 servers. Following the initial surge, traffic continued to spike up and down for a while, and Animoto took advantage of the cloud's elasticity by scaling up and down as necessary, paying only for what they really had to. How would this have been feasible, practically or economically, before the age of cloud computing?
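The elasticity described above ultimately comes down to a scaling rule that adds or removes servers as demand moves. The sketch below shows one such rule in miniature; the request rates, per-server capacity, and the viral traffic spike are all invented numbers used only to illustrate scale-up and scale-down behavior, not Animoto's actual figures.

```python
# Toy scaling rule illustrating elasticity: size the fleet to current
# demand each period, instead of buying for the worst case up front.
# All numbers are invented for illustration.
import math

REQUESTS_PER_SERVER = 100  # assumed capacity of one server
MIN_SERVERS = 2            # floor kept for availability

def servers_needed(request_rate: float) -> int:
    return max(MIN_SERVERS, math.ceil(request_rate / REQUESTS_PER_SERVER))

# A made-up week of traffic: quiet, a viral spike, then a slow decay.
traffic = [150, 180, 160, 9000, 12000, 4000, 800]

fleet = MIN_SERVERS
for day, rate in enumerate(traffic, start=1):
    target = servers_needed(rate)
    if target > fleet:
        action = "scale up"
    elif target < fleet:
        action = "scale down"
    else:
        action = "hold"
    print(f"day {day}: {rate:>6} req/s -> {target:>4} servers ({action})")
    fleet = target
```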
The Animoto story illustrates the tidal change for start-ups. If there's something start-up companies don't have much of, it's money, particularly up front. It's become so cheap to take a shot that more and more entrepreneurs are choosing the bootstrap route, starting out on their own dime, out of the realization that it now takes less to get a start-up off the ground. When they do seek external investment, they find that investors are forking over less and less in initial funding; investors prefer to see results before channeling additional funds to a company. Additionally, experience shows that new companies go through a few iterations of their idea before hitting the jackpot. Under this assumption, what matters is not to succeed cheaply but to fail cheaply, so that you have enough cash left for the next round. If you like, cloud computing has lowered the price of buying a lottery ticket for the big game that is the startup economy. Examples of such organizations include Unfuddle (SaaS-based source control running on the Amazon EC2 IaaS) and FlightCaster (a flight delay forecaster running on the Heroku PaaS).

A Bounty of Opportunity

Cloud computing isn't just an enabler for start-ups—the number of start-ups providing cloud-related services is growing rapidly. The colossal change in IT consumption has created a ripe opportunity for small, newly formed companies to outsmart the large, well-established, but slow-to-move incumbents. It's not surprising to see, therefore, that the number of such companies is consistently on the rise.

The classic opportunity is in SaaS applications at the top of the cloud stack. The existing players are struggling to rework their traditional software offerings into the cloud paradigm, and start-ups are infiltrating the market with low-cost, on-demand alternatives. Startups can innovate and be the ones to deliver that sought-after edge. These start-ups are enjoying both sides of the cloud equation: on the one hand, the rising need for SaaS and awareness of its validity from consumers; on the other hand, the availability of PaaS and IaaS, which lower costs and reduce time-to-market.

The second major opportunity is down the stack. Although providing IaaS services remains the realm of established businesses, a category of enabling technologies is emerging in areas ranging from datacenter automation to virtualization technologies and support management. Examples in this category include Virtensys and ParaScale.

The third and final category of start-ups aims to profit from the increased competition between IaaS providers. These providers are in a constant race to widen their portfolio and lower their costs, and users of IaaS tend to need more than what the provider offers, ranging from management and integration to security and inter-provider mechanisms. The belief among start-ups and venture capitalists alike is that there is a large market for facilitating the migration of big business into the cloud. Examples of such companies include RightScale, Elastra, and my own start-up, Vircado.

I for one am convinced that beyond the hype and excitement, the world of IT is undergoing a very real period of evolution. Cloud computing is not a flash flood: it will be years before its full effect is realized.

Biography

Guy Rosen is co-founder and CEO of Vircado, a startup company in the cloud computing space. He also blogs about cloud computing at JackOfAllClouds.com, where he publishes original research and analysis of the cloud market.

org" email forwarding address plus filtering through Postini phone: 800-342-6626 (US & Canada) +1-212-626-0500 (Global) hours: 8:30am–4:30pm US Eastern Time fax: +1-212-944-1318 email: acmhelp@acm. PUBLICATION SUBTOTAL: Signature . Publications Total amount due $ $ Check or money order (make payable to ACM. • acmqueue (online only) • Computers in Entertainment (online only) Computing Reviews • Computing Surveys • Crossroads • interactions (included in SIGCHI membership) Int’l Journal of Network Management (Wiley) Int’l Journal on Very Large Databases • Journal of Educational Resources in Computing (see TOCE) • Journal of Experimental Algorithmics (online only) • Journal of Personal and Ubiquitous Computing • Journal of the ACM • Journal on Computing and Cultural Heritage • Journal on Data and Information Quality • Journal on Emerging Technologies in Computing Systems • Linux Journal (SSC) • Mobile Networks & Applications • netWorker • Wireless Networks Transactions on: • Accessible Computing • Algorithms • Applied Perception • Architecture & Code Optimization • Asian Language Information Processing • Autonomous and Adaptive Systems • Computational Biology and Bio Informatics • Computer-Human Interaction • Computational Logic • Computation Theory (NEW) • Computer Systems • Computing Education (formerly JERIC) • Database Systems • Design Automation of Electronic Systems • Embedded Computing Systems • Graphics • Information & System Security • Information Systems • Internet Technology • Knowledge Discovery From Data • Mathematical Software • Modeling and Computer Simulation • Multimedia Computing. N/A $44 ❐ N/A $60 ❐ $62 ❐ $37 ❐ $35 ❐ $53 ❐ $55 ❐ $84 ❐ $85 ❐ $110 ❐ $85 ❐ $110 ❐ N/A N/A N/A N/A $119 ❐ $86 ❐ $56 ❐ $107 ❐ $50 ❐ $68 ❐ $50 ❐ $68 ❐ $43 ❐ $61 ❐ $27 ❐ $60 ❐ $64 ❐ $89 ❐ $56 ❐ $81 ❐ $64 ❐ $89 ❐ $50 ❐ $53 ❐ $44 ❐ $44 ❐ $40 ❐ $42 ❐ $36 ❐ $46 ❐ $45 ❐ $50 ❐ $48 ❐ N/A $47 ❐ $44 ❐ $45 ❐ $52 ❐ $45 ❐ $48 ❐ $43 ❐ $42 ❐ $48 ❐ $52 ❐ $43 ❐ $55 ❐ $60 ❐ $50 ❐ $43 ❐ $44 ❐ N/A $43 ❐ $42 ❐ $68 ❐ $71 ❐ $62 ❐ $62 ❐ $58 ❐ $60 ❐ $77 ❐ $64 ❐ $63 ❐ $68 ❐ $66 ❐ N/A $65 ❐ $62 ❐ $63 ❐ $70 ❐ $63 ❐ $66 ❐ $61 ❐ $60 ❐ $66 ❐ $70 ❐ $61 ❐ $108 ❐ $85 ❐ $68 ❐ $61 ❐ $62 ❐ N/A $61 ❐ $60 ❐ $ Member dues ($19.acm. Age Range: ❐ 17 & under ❐ 18-21 ❐ 22-25 ❐ 26-30 ❐ 31-35 ❐ 36-40 ❐ 41-45 ❐ 46-50 ❐ 51-55 ❐ 56-59 ❐ 60+ Do you belong to an ACM Student Chapter? ❐ Y ❐ No es I attest that the information given is correct and that I will abide by the ACM Code of Ethics. $42. ❐ Sophomore/2nd yr. EDUCATION Name of School Please check one: ❐ High School (Pre-college. Inc. Secondary School) College: ❐ Freshman/1st yr. 1.S.acmqueue. Graduate Student: ❐ Masters Program ❐ Doctorate Program ❐ Postdoctoral Program ❐ Non-Traditional Student Major Expected mo. add $50 here (for residents outside of $ North America only). dollars or equivalent in foreign currency) ❏ Visa/Mastercard Card number Signature ❏ American Express Exp.acm Name Address City STUDENT MEMBERSHIP APPLICATION AND ORDER FORM Join ACM online: www. Box 30777 New York. PLEASE CHOOSE ONE: ❏ Student Membership: $19 (USD) ❏ Student Membership PLUS Digital Library: $42 (USD) ❏ Student Membership PLUS Print CACM Magazine: $42 (USD) ❏ Student Membership w/Digital Library PLUS Print CACM Magazine: $62 (USD) P R I N T P U B L I C AT I O N S Check the appropriate box and calculate amount due. subscriptions. Communications. ❐ Junior/3rd yr. FAX this application to +1-212-944-1318. Inc.O. NY 10087-0777 For immediate processing. 
State/Province Postal code/Zip Country E-mail address CONTACT ACM Member number. Please consult with your tax advisor.org/joinacm CODE: CRSRDS Please print clearly INSTRUCTIONS Carefully complete this application and return with payment by mail or fax to ACM. ❐ Senior/4th yr. date Member dues. I understand that my membership is non transferable. or $62) T have Communications of the ACM o sent to you via Expedited Air Service. PAYMENT INFORMATION Payment must accompany application Please check one Issues per year 6 4 12 4 4 6 6 4 4 12 6 6 4 4 4 12 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 4 4 4 4 4 4 Code 143 247 104 103 XRoads 123 136 148 239 129 144 102 173 171 154 137 130 133 125 174 151 145 146 138 158 149 119 135 114 277 109 128 142 112 134 113 140 170 108 116 156 118 110 172 155 115 153 157 159 Member Member Rate Rate PLUS Air* Visit www. and optional contributions are tax deductible under certain circumstances.org for more info. if applicable Area code & Daytime phone Fax MEMBERSHIP BENEFITS AND OPTIONS • Electronic subscription to Communications of the ACM magazine • Electronic subscription to Crossroads magazine • Free software and courseware through the ACM Student Academic Initiative • 2. ACM student e-newsletter (quarterly) • ACM's Online Guide to Computing Literature • Free "acm. You must be a full-time student to qualify for student rates. of grad.500 online courses in multiple languages. and Applications • Networking • Programming Languages & Systems • Reconfigurable Technology & Systems • Sensor Networks • Software Engineering and Methodology • Speech and Language Processing (online only) • Storage • Web marked • are available in the ACM Digital Library *Check here to have publications delivered via Expedited Air Service. For residents outside North America only.org mail: Association for Computing Machinery./yr. in U. General Post Office P.000 virtual labs and 500 online books • ACM's CareerNews (twice monthly) • ACM e-news digest TechNews (thrice weekly) • Free e-mentoring service from MentorNet • ACM online newsletter MemberNet (twice monthly) • Student Quick Takes.
