Avoiding the death zone: choosing and running a library project in the cloud
Denis Galvin Mang Sun
Article information:
To cite this document:
Denis Galvin, Mang Sun, (2012), "Avoiding the death zone: choosing and running a library project in the cloud", Library Hi Tech, Vol. 30 Iss 3, pp. 418-427
Permanent link to this document:
http://dx.doi.org/10.1108/07378831211266564
Downloaded by University of Maryland College Park UMCP At 12:23 07 January 2015 (PT)
References: this document contains references to 14 other documents.
The fulltext of this document has been downloaded 886 times since 2012.
Avoiding the death zone: choosing and running a library project in the cloud
Denis Galvin and Mang Sun
Library IT, Rice University, Houston, Texas, USA
Received February 2012
Revised March 2012
Introduction
There are few terms as nebulous as cloud computing. It is defined in many different ways, in many different papers, by many different people, because the term is used to describe numerous types of online environments where computing takes place. Most of what people mean when they say cloud computing is infrastructure as a service (IaaS), software as a service (SaaS) and platform as a service (PaaS). These three environments make up the bulk of the cloud computing services available.
Cloud computing can be differentiated from traditional data center or server room
computing because it is typically on demand and off premise. It is elastic in nature and
it is typically, but not always, maintained by a third party. The term has also been applied to services such as monitoring and communications, but most of what people mean when they say cloud computing is IaaS, PaaS and SaaS.
Probably the best definition is the most open-ended: cloud computing is a set of services delivered via the internet (Korzeniowski, 2010). This allows the broadest possible interpretation of what it is, but problematically it suggests the necessity of an external entity when cloud computing does not need one. There is no reason why the model can't work on an internal basis. In fact an ideal scenario might be IaaS delivered through central IT to departments on an academic campus.
Literature review
With the maturation of cloud computing over the past several years, much interest and attention has been paid to this technology and its services by academic libraries and other institutions. Yang (2012), Mitchell (2010) and Doelitzscher et al. (2010) explicitly adopt the three-tier classification of Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS) to categorize cloud computing. Lin et al. (2009) also adopted the same SaaS/PaaS/IaaS notation to introduce cloud computing, and they further expand the idea with a layering system. They use the same three-layer architecture that is most often used to describe cloud computing. However, the way the model is presented can leave new cloud users with the impression that one type of cloud computing is reliant hierarchically on another type. Although there is some correlation and interdependence between types of cloud computing (SaaS/PaaS/IaaS), each can be thought of separately without any obvious relationship. Chorafas (2011) generalizes cloud computing into
four pillars:
(1) applications;
(2) platforms;
(3) infrastructures; and
(4) enabling services.
The first three pillars are the same categories that cloud computing is most often
segmented into:
(1) SaaS;
(2) PaaS; and
(3) IaaS.
institution. The high technical threshold means that most institutions would not be able to
build similar services. Nevertheless, it is an interesting look at private/hybrid cloud
computing which would nicely supplement or replace the public cloud.
The library at Rice University has had a projects server for a number of years. It hosts numerous sites and projects. The server was built and deployed with Debian Etch in mid-2008. At the request of the preservation department, Archivists' Toolkit, an open source data management tool, was installed on the server. Over the years the server accumulated a number of other web sites and it eventually ran a mobile application.
Sometime in 2010 the preservation department made a request for the installation of
Omeka, an open source web-publishing tool. The server already had most of the prerequisites for the installation: it was running MySQL, PHP and Apache, so Omeka installed on the machine easily. The site was turned over to the preservation department, and in testing they quickly found some issues. Every time they tried to delete items or files from Omeka's web interface the operation failed: the application threw a MySQLi error suggesting some MySQL table was not valid, or the application would crash. After investigation and testing it was found that the
server needed to be at a higher version of MySQL. Though not directly related to the
Omeka issue, it was determined that it was better to also bring PHP and the essential
modules to a higher version to avoid any other bugs. This meant upgrading Debian to Lenny, a newer version of the operating system (OS) that had the needed versions of MySQL and PHP.
Most versions of Linux today, including Debian, use a package management
system. This means software is installed out of repositories using utilities built into the
system. This solves a great problem that existed in the early days of Linux called
dependency hell, where users would install a piece of software, only to be informed that
they needed to install some other piece of software to get the current one to work.
Package management is great in that it solves this problem. It also means that users
must use the version of software which is in the repositories, or compile their own.
Compiling software means taking source code and turning it into a usable program for
the computer. This is a workable solution until the operating system needs to be
upgraded. If the operating system is upgraded and the libraries that the compiled code depends on change, the program breaks. The projects server had code that had been compiled, and shared libraries had been manually linked for other projects, so upgrading the production system without thorough testing and practice was going
to be a concern. In theory another version of MySQL could have been compiled to get to
the right version, but why use a package managed system if it means continually
compiling code?
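The trade-off described above can be sketched as follows. These commands are illustrative only (the MySQL source tarball name is hypothetical), not a procedure that was run:

```shell
# Repository route: one command, dependencies resolved automatically,
# but only the version packaged for the current Debian release.
apt-get install mysql-server

# Source route: any version, but dependencies and future OS upgrades
# become the administrator's problem (tarball name is hypothetical).
wget http://downloads.example.org/mysql-5.1.tar.gz
tar xzf mysql-5.1.tar.gz && cd mysql-5.1
./configure && make && make install
```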
Cloud computing offered a solution to this problem: why not use cloud infrastructure to temporarily host the applications and projects from the Debian box, rebuild the box with a newer operating system, and then move everything back to the rebuilt machine?
Choosing a cloud service provider
While the rest of the market was suffering over the past few years, cloud-computing providers saw their fortunes rise. Rackspace (NASDAQ ticker symbol: RAX), a hosting and cloud computing service, saw its stock price rise by over ten times between March of 2009 and December of 2011. Since cloud computing has become so hot, more providers have come to join an already crowded market. Comprehensive or quick comparison charts and matrices on cloud computing providers can be found on the internet to help choose a vendor. Comparisons are made mainly in respect to:
. types of cloud computing;
. pricing schedule;
. supported platforms and/or operating systems;
. add-on features, services and tools;
. support and service level agreements;
. administrative interface;
. data security; and
. number of years in the cloud computing field.
The site http://www.cloudorado.com/ provides a tool for comparing cloud providers. Users can calculate price based on RAM size, storage and CPU. Some of its statistics may be misleading, however: it only shows the price for a standard large server from Amazon, which makes the prices look significantly higher than they would be for a smaller server.
Amazon is the largest supplier of cloud-based solutions. They have been around for
a long time, and they do not appear to be going anywhere. For the Library IT department
they were a logical choice. They were easy to set up, and a staff member had already
been using their service outside of work.
launched Amazon calls the phone number and the user is prompted to input a PIN.
From this point forward a user can start using the S3 service, but EC2, where users create instances, can take a little longer. Once available it is possible to log into the
AWS console and choose EC2, where instances are launched.
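Instances can also be launched outside the console with Amazon's EC2 API command-line tools. In the sketch below the AMI ID and key-pair name are placeholders:

```shell
# Launch one m1.small instance from a given AMI, using an existing key pair.
# (ami-1a2b3c4d and rice-projects are placeholders.)
ec2-run-instances ami-1a2b3c4d -t m1.small -k rice-projects

# List instances to confirm the launch.
ec2-describe-instances
```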
There are two interfaces for launching an instance: Classic and Quick Launch. Classic allows for greater control over the type of instance and its configuration. Quick Launch allows for rapid deployment of an instance, but it gives less control over how it is configured. Classic lets users choose from more distributions and has more options. Classic also lets a user choose community AMIs. This is helpful for a number of reasons. Community AMIs allow users to choose preconfigured systems that others have created. If someone wants to bring up an AMI that already has Drupal installed, and they do not want to do it from scratch, they can do so through the Classic interface.
Amazon allows users to choose from Windows or a number of different versions of
Linux for the operating system of the AMI. This article focuses on the creation of an
AMI based on Linux. Costs vary, depending on the operating system chosen. Amazon
has their own Linux distribution and it is based on Red Hat so anyone familiar with
that flavor of Linux should feel right at home. It uses the same package management
system and has roughly the same layout and settings as Red Hat does. If it is necessary
to use a distribution that Amazon does not offer, they can be built, or often they are
available through community AMIs.
An instance has two requirements: a name and a key pair. The name is arbitrary, but
it is necessary to understand how a key pair works in order to be able to log into a
system. When an instance is created a user is prompted to either pick a key pair or to create one. If it is the first time any instance has been brought up, the system offers a prompt to create one. Amazon then generates a key pair: the user downloads the private key, and the matching public key is placed in the home directory of the default user on the instance. The default user is likely to be named "root" or "ec2-user". The two keys are linked through a mathematical algorithm and allow for encrypted communication between a computer and the AMI. The downloaded private key will need to have its permissions set so that it is read-only by a single user.
The whole process of launching the instance takes only a few minutes and is almost
deceptively easy. Once it is up and running it is important to remember that all of the
same rules that apply to physical hardware apply to a cloud server. Anything
worthwhile is worth backing up. Amazon could have a failure, and human error is
always a factor. Once an instance is up and running the user logs in through the key
that was created. On a machine running Linux this is fairly easy. A user need only type
the correct command and they authenticate with their key.
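The two steps, restricting the key's permissions and logging in, look like the following; the key filename and the public DNS name are placeholders:

```shell
# Create a stand-in for the downloaded private key, then make it
# readable only by its owner, as SSH requires.
touch rice-projects.pem
chmod 400 rice-projects.pem
stat -c '%a' rice-projects.pem   # → 400

# Logging in then names the key explicitly (hostname is a placeholder):
#   ssh -i rice-projects.pem ec2-user@ec2-203-0-113-10.compute-1.amazonaws.com
```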
Even though Amazon and many other third parties provide a large number of public AMIs ready for immediate use, there may be reasons to create an AMI from scratch. Among them is the desire to have a clean installation of a Linux distribution which Amazon does not offer. It is not unusual for an IT department to make the decision to standardize on one distribution of Linux. If that distribution is not supported by Amazon, it can be built, or it can be installed from a community AMI. Community AMIs often have configuration changes and other tweaks applied to them. To get a clean install of a distribution not offered by Amazon it is necessary to build it from scratch.
A project like Omeka does not use a lot of computational resources. Amazon offers different sizes of instances, each with its own tier of pricing. A small project like Omeka running Linux, Apache and MySQL can be run on a small instance. Amazon's small instance (m1.small) is roughly equivalent to a 32-bit computer with one 1.0-1.2 GHz 2007 Opteron or Xeon CPU, 1.7 GB of memory and 160 GB of disk space including an optional partition. This needs to be paired with a subscription to Amazon's Elastic IP service, which allows users to assign static IPs to the instance. Once the IP has been attached a hostname can be assigned through DNS.
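As a back-of-envelope check on the pricing tiers, an hourly rate can be turned into a monthly figure. The $0.08/hour rate below is an assumption for illustration, not a quoted Amazon price:

```shell
# Assumed on-demand rate for a small instance, in cents per hour.
rate_cents_per_hour=8
hours_per_month=720            # 30 days x 24 hours
monthly_cents=$(( rate_cents_per_hour * hours_per_month ))
echo "~\$$(( monthly_cents / 100 )) per month"   # → ~$57 per month
```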
Once the instance is up and running a user would then sign in using SSH. From
there installing Omeka is simple. Prerequisite software packages would need to be
installed including the right version of MySQL and the correct PHP module. Also, any
packages that the Omeka software requires would need to be installed. These can all be
pulled in using a package management system like Yum.
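On Amazon Linux the prerequisite pull described above might look like the following; the package and service names are assumptions and may differ between releases:

```shell
# Install the web server, database, PHP, and the PHP MySQL module.
sudo yum -y install httpd mysql-server php php-mysql

# Start the services so Omeka's web installer can reach them.
sudo service httpd start
sudo service mysqld start
```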
Omeka runs over HTTP and is an excellent candidate for the cloud. The library IT department also tried to port Archivists' Toolkit to the cloud. Testing showed this application to suffer a noticeable performance hit. It is a traditional client/server application that communicates at the socket level. For reasons that could not be fully explained (possibly the transmission of larger chunks of data, or bottlenecks on the network path between Rice and Amazon) there was an unacceptable amount of slowness with this application. The decision was made to move this server back to campus. This also led to the conclusion that nothing would be ported to the cloud but web sites and web applications. If an application does not run over HTTP or HTTPS it may not be a good candidate for the cloud.
location such as S3 or EBS. These commands can be used to create whole new AMIs,
which can then be started from the AWS console. If an instance were to be lost for some
reason it can instantly be restarted from the point of its last creation. Important data
may be better off being stored on EBS, which is persistent. That way if the AMI goes
down a new one can be started which can then mount EBS. This way no data is lost.
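Putting data on EBS amounts to attaching a volume in the console and then mounting it from within the instance. In this sketch the device name and mount point are assumptions:

```shell
# After attaching an EBS volume as /dev/sdf in the AWS console:
sudo mkfs -t ext3 /dev/sdf      # format once, on first use only
sudo mkdir -p /data
sudo mount /dev/sdf /data       # data under /data now outlives the instance
```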
Cloud computing has security issues, and like all forms of outsourcing it raises serious concerns about the security of the data assets that are entrusted to providers of cloud services (Julisch and Hall, 2010). Security departments should be notified if sensitive information is passed between campus and a cloud service provider. Academic institutions have security policies that may apply to cloud computing. SaaS solutions that require user names and passwords which authenticate through a campus LDAP server are vulnerable if the provider is compromised. The information passed through a cloud provider is potentially susceptible to harvesting and snooping. There are steps that can be taken to mitigate these vulnerabilities, including using secure connections. Cloud providers should have clear and obvious security policies, which they can provide to campus security personnel.
Commercial cloud providers are normally expected to have better, more reliable data
centers as well as more advanced and efficient disaster prevention measures. However,
Amazon did experience a major outage in April of 2011 (Pepitone, 2011). The incident
left users without services for nearly 24 hours and took down some of the largest sites
on the internet. It is tempting to look at this issue as proof of the dangers of cloud
computing, but what it should signal is that the cloud, like any new technology, should
be approached with caution but approached nonetheless (Preimesberger, 2011).
Conclusion
Rice's Library IT began using cloud computing because of a routine administrative task. A project was picked which could be gently ported to the cloud as a safe test. If it did not work it could be brought back to the data center. Some conclusions were drawn about what could and could not be ported to the cloud. After a year, a number of other projects and services are being considered for the cloud. A number of older yet popular web sites are good candidates. Windows AMIs are also being considered for some sites which were going to be ported to Linux. By moving these types of projects to the cloud it is thought that both time and money can be saved. The cloud is not right for every kind of service, but it is certainly correct for some. The preliminary experience in the cloud is encouraging enough that more applications will be ported, and it is thought this can be done in an efficient manner. There is advantage in server deployment and support, as well as some advantages to budgeting.

References
Castro-Leon, E. and He, J. (2009), “Virtual service grids: integrating IT with business processes”,
IT Pro, Vol. 11 No. 3, pp. 7-11.
Chorafas, D.N. (2011), Cloud Computing Strategies, CRC Press, Boca Raton, FL.
Doelitzscher, F., Sulistio, A., Reich, C., Kuijs, H. and Wolf, D. (2010), “Private cloud for
collaboration and e-learning services: from IaaS to SaaS”, Computing, Vol. 91 No. 1,
pp. 23-42.
Han, Y. (2010), “On the clouds: a new way of computing”, Information Technology and Libraries,
Vol. 29 No. 2, pp. 87-92.
Isckia, T. (2009), “Amazon’s evolving ecosystem: a cyber-bookstore and application service
provider”, Canadian Journal of Administrative Sciences, Vol. 26 No. 4, pp. 332-43.
Julisch, K. and Hall, M. (2010), “Security and control in the cloud”, Information Security Journal:
A Global Perspective, Vol. 19 No. 6, pp. 299-309.
Kambil, A. (2009), “A head in the clouds”, Journal of Business Strategy, Vol. 30 No. 4, pp. 58-69.
Korzeniowski, P. (2010), “SaaS, IaaS, PaaS? You must be talking ‘cloud’; it’s no passing fancy;
cloud computing keeps growing, as do the tech definitions of the term”, Vol. 3, November,
p. 3.
Lin, G., Fu, D., Zhu, J. and Dasmalchi, G. (2009), “Cloud computing: IT as a service”, IT Pro,
March/April, pp. 10-13.
Mitchell, E. (2010), “Using cloud services for library IT infrastructure”, Code4lib, No. 9.
Pepitone, J. (2011), "Amazon EC2 outage downs Reddit, Quora", CNNMoney, available at: http://money.cnn.com/2011/04/21/technology/amazon_server_outage/index.htm (accessed December 2, 2011).
Preimesberger, C. (2011), “Thoughts on AWS outage”, eWeek, Vol. 28 No. 9, p. 14.
Sotomayor, B., Montero, R.S., Llorente, I.M. and Foster, I. (2009), “Virtual infrastructure
management in private and hybrid clouds”, Internet Computing, Vol. 13 No. 5, pp. 14-22.
Yang, S.Q. (2012), “Move into the cloud, shall we?”, Library Hi Tech News, Vol. 29 No. 1, pp. 4-7.