
Software defined infrastructures

G. Kandiraju
H. Franke
M. D. Williams
M. Steinder
S. M. Black

A fundamental component of any large-scale computer system
is infrastructure. Cloud computing has completely changed the
way infrastructure is viewed, offering more simplicity, flexibility,
and monetary benefits compared to a traditional view of
infrastructure. At the core of this transformation is the notion of
virtualization of infrastructure as a whole, with providers offering
infrastructure-as-a-service (IaaS) to consumers. However, just
offering IaaS alone is insufficient for software defined environments
(SDEs). This paper examines infrastructure in the context of SDE
and discusses what we believe are some of the fundamental
characteristics required of such infrastructure, called software
defined infrastructure (SDI), and how it fits into the larger
landscape of cloud computing environments and SDEs. Various
components of SDI are discussed, including core intelligence,
monitoring pieces, and management, in addition to a brief
discussion on silos such as compute, network and storage.
Consumer and provider points of view are also presented along
with infrastructure-level service-level agreements (SLAs). Also
presented are the design principles and high-level architectural
design of the infrastructure intelligence controller, which constantly
transforms infrastructure to honor consumer requirements (SLAs)
amidst provider constraints (costs). We believe that the insights
presented in this paper can be used for better design of SDE
architectures and of data-center systems software in general.

Introduction

Cloud computing has gained significant adoption in recent years, with many consumers choosing to acquire cloud-based infrastructure, platform services, and software services rather than having them located in-house [1]. A fundamental advantage for consumers who choose cloud environments is the shift from spending on capital expenditure (CAPEX) to only operational expenditure (OPEX), coupled with almost zero time to acquire these infrastructure, platform, and software services (in contrast with weeks and even months). Additionally, this model offers other advantages and flexibility, as consumers typically need only pay as they use the infrastructure services (pay-as-you-go model), with no contracts, acquiring services as they need them, as opposed to committing the expenditure, time, and resources that they would in the traditional model.

Providers have their own advantages in this model. By dealing with all the services management challenges themselves, providers remove that burden from the consumers, offering them simplified services. At the same time, knowledge and expertise gained in provisioning for one consumer can be applied to service many consumers (as opposed to the traditional model, where each of the consumers, whether they be individual end users or large corporations, deals with managing services and the associated challenges themselves). Also, having the same services (infrastructure, platform, or software) available to many consumers fosters efficient utilization of the services and of the knowledge gained in managing them, thus leading to a high return on the capital investment that a provider makes. In addition, having continuous consumer traffic can create a constant stream of revenue. Also, for a provider, once the management of these services is mastered, hosting a new service can help retain existing consumers and also attract new ones.

Digital Object Identifier: 10.1147/JRD.2014.2298133

Copyright 2014 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without
alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed
royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

0018-8646/14 © 2014 IBM

IBM J. RES. & DEV. VOL. 58 NO. 2/3 PAPER 2 MARCH/MAY 2014 G. KANDIRAJU ET AL. 2:1

Therefore, while consumers come to providers to minimize their costs (CAPEX and OPEX), providers seek consumers to obtain the maximum from the one-time CAPEX investment. This paradigm has brought a radical change in the way computing is performed today, with the last few years seeing intense competition among many companies (e.g., Rackspace [2], Amazon [3], IBM [4], VMware [5], etc.) to attract consumers and retain them.

From a business perspective, a key element of this model relies on attracting consumers and retaining them. In this regard, a provider makes every attempt to differentiate itself from other providers by offering new features, services, enhancements, and a host of other value-adds. However, it is extremely important for a provider to do this while still keeping costs low (for maximum return on investment). It is exactly this element that we want to explore in this paper, particularly in the context of infrastructure-as-a-service (IaaS).

With high consumer adoption of the cloud, acquiring physical infrastructure is becoming more and more uncommon, if not a rare phenomenon. However, from a provider perspective, simply offering IaaS is not a business differentiator today. We believe that to offer value, infrastructure provisioning needs to be constantly transforming or "mutating" amidst continuously changing consumers (and consumer needs) coupled with constantly evolving physical infrastructures (upgrades, etc.). This requires intelligence to continuously monitor, analyze, and execute, to satisfy both consumer requirements and provider requirements to minimize cost. This is precisely the focus of this paper: we describe the infrastructure that honors these requirements and call it software defined infrastructure (SDI). In the larger setting of software defined environments (SDEs), SDI becomes an integral part of SDE, handling all the infrastructure-level goals. We discuss what we think are a few important characteristics of SDIs, along with some important components, interactions with SDI, and intelligence in SDI.

The remainder of the paper is organized as follows. The first section deals with a traditional view of infrastructure, consumers, providers, and their interaction through SLAs and infrastructure management. The second section defines and examines SDIs in detail, discussing various components, with deeper discussions on some of them (particularly the monitoring and SDI controller components). The final section concludes this paper.

What is infrastructure?

Traditionally, infrastructure refers to the physical and organizational elements put together to serve the needs of workloads. This includes all the hardware (servers, network connectivity, storage, power sources, racks, cooling elements, etc.) brought together onto a single physical platform, whether it is a simple "machine room" or a large data center. All these disparate elements are configured to work together in a coordinated manner (by connecting them and configuring the firmware, hypervisors, operating systems, and higher-level software) to serve the needs of a workload, and hence of a business.

In recent years, virtualization has completely transformed this view of infrastructure. The evolution of cloud computing in particular presents us with the notion of virtual infrastructure, where consumers request infrastructure patterns (virtual compute, virtual network, virtual storage, etc.) on which they deploy their workloads. Infrastructure has hence become closely associated with the notion of consumers (those that consume the infrastructure) and providers (those that provide the infrastructure to the consumers) [6].

Virtual infrastructure refers to a combination of three aspects (virtual compute, virtual network, and virtual storage) to create a complete end-to-end resource pattern "on top of" which a workload can be manifested. Infrastructure requirements of a workload are expressed to create an infrastructure pattern along with SLAs for all three components. Creating virtual infrastructure amidst a constantly changing workload environment and a heterogeneous physical infrastructure is a non-trivial problem.

Infrastructure consumers and providers

Cloud computing and virtualization (at all levels) have led to significant changes in the way infrastructure is viewed by both consumers and providers. We briefly highlight some important observations in this context.

Consumers

From a consumer viewpoint, infrastructure today "is just a click away," a phrase that implies that while acquiring infrastructure used to be a long process consisting of choosing, ordering, unpacking, assembling, and installing, today it is as simple as choosing and clicking on a provider's website. Also, today's infrastructure is economical (with no CAPEX, or capital expenditure, and only OPEX, or operational expenditure). Thus, consumers no longer need to worry about the actual provisioning of infrastructure (which can be an arduous task in itself) and the cost associated with it. They can simply "rent" virtual infrastructure (in a pay-as-you-go model) without even knowing where the physical infrastructure actually resides. Infrastructure today is also flexible; i.e., consumers can "resize" or "return" or "acquire" virtual infrastructure at any given time in many ways. It also provides certain guarantees in settings in which consumers are constantly looking for a better infrastructure provider, with better being

defined as infrastructure that comes with stronger guarantees. These guarantees may include performance guarantees, for which a virtual resource is expected to have a certain amount of performance (compute speed, network bandwidth, storage input/output operations per second, etc.). Also, availability guarantees may involve a virtual resource that is expected to be available for a certain fraction of time (e.g., 99.99%), or to have certain automated bring-up/recovery services, or to be associated with certain actions taken based on predicted failures or some other availability policy. Security guarantees may involve a virtual resource that is expected to be trustable and that should be quarantined in case of a security attack. Regulatory guarantees may involve a virtual resource that is expected to conform to all the regulatory and compliance rules (such as retention policies, geographic regulations, etc.) associated with an organization or a government. Other types of guarantees may also exist.

Providers

While a consumer considers a virtual infrastructure as a standalone entity, a provider is typically responsible for provisioning multiple virtual infrastructures onto a single physical infrastructure and managing them. From a provider viewpoint, several challenges may be present. Heterogeneity in the physical infrastructure is probably the first challenge to address, as physical components may comprise versatile hardware combinations such as multiple processor architectures, different kinds of storage and network devices (solid-state disks, hard disk drives, etc.), specialized functional units (such as graphical processing units, field-programmable gate arrays, and processor accelerators), appliances with specific functionality (Netezza** [7], IBM V7000U [8]), etc. While one challenge here is the management of diverse physical components, resource mapping and placement of virtual infrastructure onto such physical infrastructure is another complex optimization problem (due to a large number of variables [9–11]). Considerations here include not only the usual performance and availability service-level agreements (SLAs) that consumers request but also others, including licensing of software, energy and power savings (for minimal cost) [12], geographical placement (regulatory constraints and policies), etc.

Another important challenge that a provider faces is that of continuous transformation. This refers to two continuously evolving aspects: the changing behavior of workloads, which results in dynamically evolving characteristics of their virtual resources, and the changes in physical resources (nodes/links going down, hardware upgrades, hypervisor/operating-system upgrades, etc.). These two aspects result in continuous remapping and change of the virtual-to-physical resource mapping, requiring continuous transformation of the virtual infrastructure.

Providers also need to be extremely aware of potential cyber-attacks, malicious software, resource compromises, etc. With elements of cyber attacks not being uncommon today, providers need to be able to detect, analyze, and isolate resources at various levels of the infrastructure, while simultaneously allowing clients to plug in their own policies for the same. Thus, security becomes another challenge for providers. In addition, regulatory and legal matters may place additional constraints on data storage and compute deployment.

Finally, while a consumer's goal is to acquire (virtual) infrastructure with some guarantees, the provider's main goal is to satisfy the consumer's requests while incurring minimal cost. Providers adopt techniques such as over-commitment to give consumers the notion that they have all the resources they need, while actually multiplexing a single physical resource across potentially multiple consumers to save cost. Providers also embrace policies for energy saving and consolidation to save power costs.

Consumer-provider interaction

In this section, we focus on consumer-provider interaction and how the contract for acquiring the infrastructure is specified. In our opinion, there are two fundamental considerations here: infrastructure patterns and the SLAs associated with these patterns. We examine each of these aspects in some detail below.

Infrastructure patterns

An infrastructure pattern refers to the entire infrastructure specification (compute, network, storage) along with the associated requirements (SLAs).

Figure 1 shows an example of a three-tier application with a web server, application server, and a database (indicated by WS, AS, and DB, respectively). At the highest level, a consumer might only be interested in what the deployment as a whole provides in terms of the highest-level metric; i.e., web transactions per second (tnxs/sec) in this case. However, we believe that guaranteeing such a metric has implications for how the infrastructure needs to be instantiated; i.e., the infrastructure itself needs to guarantee certain infrastructure-level metrics (lower-level metrics). It is this conversion of metrics that an orchestrator (which sits above the infrastructure) performs, handing down lower-level metrics to the infrastructure, which then is responsible for honoring these.

Figure 1
Example of an infrastructure pattern with infrastructure-level SLAs. Here, C1, C2, C3, and C4 denote different compute requirements in terms of processing power, memory size, etc. N1 and N2 denote different network bandwidth requirements, while N3 denotes the network latency requirement. S1 denotes the local storage capacity requirement, and S2 denotes the shared storage bandwidth requirement. S3 denotes the shared storage capacity requirement.

In this example of a three-tier application, the orchestrator "decides" to provision a load balancer in front of the web server to handle many requests. For the second tier, the orchestrator decides to place three instances of application servers on three different nodes, and for the third tier, the orchestrator makes a decision to create two instances of the databases. In addition, the orchestrator also specifies the desired network bandwidth between each of these nodes and

the desired storage bandwidth and capacity for local and shared storage. The required processing power is also specified. The orchestrator might have decided these parameters using historical data collected by repeated experiments with such applications, building a knowledge-base (models such as neural networks and machine-learning techniques can be used here).

All of the above requirements constitute performance requirements from the infrastructure point of view. We refer to these requirements as the SLAs. In addition to performance SLAs, other SLAs, such as availability SLAs, security SLAs, etc., can be specified as well.

Infrastructure abstractions

Note that in the above example, while the orchestrator specifies the number of nodes, in our opinion, it will not specify which nodes. In other words, the infrastructure provider has the freedom to choose the nodes (while provisioning compute) and, similarly, the corresponding network and storage choices as well. Thus, the orchestrator at a higher level "sees" an abstraction of the infrastructure, with the actual infrastructure details (e.g., physical details such as data-center room, rack, chassis, type of network cards, and manufacturer specifications) hidden.

We think there are various aspects of this abstraction that can be provided by the infrastructure provider to the higher-level consumers. Some of these include the abstractions in the following paragraphs.

Performance abstractions

With heterogeneity being an important aspect of a data center, it becomes necessary to abstract out the real physical details and provide APIs for the consumers to acquire resources that have been calibrated under a common denomination. For example, a data center hosting systems of different processor architectures (x86, POWER*, etc.) could abstract out the exact architectural details by exposing the processing power as a calibrated number (e.g., a CoreMark** number derived by executing CoreMark [13] benchmarks), thus making it easy for consumption. Likewise, storage performance can be calibrated using the FIO (flexible I/O) [14] benchmark.

Availability abstractions

Virtual resource availability has become an important area of research over the years. Enterprise applications need to be highly available, potentially replicating processing, memory, or data across physical machines, multiple data centers, or even geographic regions around the world. However, an infrastructure consumer need not be exposed to every detail of how availability is implemented. Intelligent abstractions can be put in place that indicate the "strength" (or degree) of availability, thereby allowing a consumer

to choose what is needed (the higher the strength, the greater the cost). For example, a provider could offer five different levels of availability abstractions for a compute resource (a virtual machine or a container): (i) no availability; (ii) a 3-minute outage maximum, but re-provisioning a similar type of compute; (iii) greater than a 3-minute outage, but re-provisioning more robust compute; (iv) additional prediction capabilities to avoid several outage scenarios, but re-provisioning like (iii) above if an outage occurs; and (v) no outage, with a total disruption time of less than 3 seconds.

The above availability abstractions can be implemented by the provider in the following ways. Abstraction (ii) may simply involve a local re-start of the virtual compute. Abstraction (iii) may involve a remote re-start of the virtual compute, but on a node that has been statistically observed (over a period of time) to have a smaller number of failures (virtual compute failures). For (iv), in addition to (iii), a prediction engine is active (see the section "SDI controller"), and if a failure is predicted, the compute is live-migrated to another node (the outage is avoided). For (v), a technology like micro-checkpointing [15] can be in place, constantly replicating the virtual compute memory onto another node, thus having a secondary ready in case of primary failure. In this case, the "switch" from the primary to the secondary would take less than 3 seconds.

From a provider point of view, the ways to implement the availability abstractions can change over time (as technology evolves); therefore, abstractions play an important role in insulating the consumer from the raw implementation details.

Service-level agreements

Providers typically provide a notion of SLAs for the infrastructure consumers. These SLAs serve as a binding contract between the two parties. SLAs can range from a single page describing the guarantees that the infrastructure will offer, to hundreds of pages describing subtle nuances (pricing subtleties, outage scenarios, etc.). For a provider, they not only serve as the primary interface to interact with a consumer, but also as a way to differentiate from other providers.

Note that consumers of SLAs need not only be real consumers (such as people, businesses, and organizations) but can also be higher-level software; i.e., software that needs infrastructure. Consumers acquiring software-as-a-service (SaaS) or platform-as-a-service (PaaS) would request such services, which in turn would "materialize" infrastructure with SLAs as required.

Since different elements of infrastructure may be associated with different SLAs, an SLA specification is closely tied to an infrastructure pattern. In fact, the same template that describes the pattern may also describe the SLAs. Such templates are then consumed by the provider (i.e., provider software) to provision the infrastructure. For example, Heat [16] is evolving as an important template-provisioner (or orchestrator, as it is sometimes referred to) in this context.

Although we believe that such SLAs should ideally be defined in terms of the abstractions (defined in the section "Infrastructure abstractions"), we leave open the possibility that not every implementation detail might fit into the ideal notion of abstractions. This may indeed lead to implementation descriptions becoming part of SLAs. Given below are a few examples of infrastructure-level SLA elements that a template may refer to:

• Performance SLAs
  - compute: virtual CPU utilization and abstracted compute power as a CoreMark number
  - network: bandwidth and latency
  - storage: IOPS (input/output operations per second), capacity, and abstracted storage power as a FIO number

• Availability SLAs
  - compute: availability options (indicated above), local/remote restart, live migration, and availability zones
  - network: multi-pathing and abstracted proximity allocations
  - storage: RAID (redundant array of independent disks), flash-copy, and abstracted redundancy levels

• Security SLAs: trusted hypervisor, trusted compute, and encrypted data

• Other SLAs (examples): architectural capabilities (hyper-threading level, etc.), graphical processing units (GPUs), and solid-state disks (SSDs)

The above list is only a small sample (in the interest of space) given the multitude of possibilities and definitions for elements that constitute an SLA. In addition, providers typically package several of these elements (with packaging criteria possibly derived from their business strategies) into tiers denoted bronze, silver, gold, platinum, etc. In fact, it might indeed only be these packages that a consumer will ultimately see.

Infrastructure management

Infrastructure management (IM) refers to the piece of the provider software that accomplishes two important tasks for the provider: first, it forms the user interface and point of contact for the consumer (website/user-interface with SLA options, service listing, etc.), and second, it provisions the infrastructure that the consumer has asked for. A third important piece of the management (which we refer to later in the paper in the context of SDIs) is continuously

transforming the provisioned virtual infrastructure. In our opinion, the IM software that is built for clouds must adhere to the following characteristics.

The first characteristic is availability. IM forms the gateway to the customer. Its high availability is of foremost importance. Typical solutions here include active-active or active-passive provisioning of IM services. Examples include efforts in CloudStack HA (High Availability) [17], OpenStack** HA [18], the Eucalyptus** Cloud Controller [19], etc.

The second characteristic is scalability. With a number of consumers moving to the cloud and with the increasing size of data centers, IM scalability becomes an important factor. Only a highly scalable IM service provider can guarantee quick provisioning of virtual infrastructures in large settings.

A third characteristic is decentralization/decomposability. We believe that the IM software should not be deployed as a single monolithic piece of software. It should be built in a highly componentized manner so that IM services can be composed as required.

A fourth characteristic is upgradeability. Software life-cycle management and DevOps of IM are essential to capture the continuous evolution of the physical infrastructure and to prevent any downtime for the consumers during an upgrade.

A final characteristic is intuitiveness. IM needs to be user-friendly, simple, and intuitive for consumers.

Later, in the section "Management services," we describe the implications of SDIs for management services.

Software defined infrastructure

What is software defined infrastructure?

With virtual infrastructure becoming the primary model of consumption today, the ability to create infrastructures on-the-fly is an absolute necessity for providers. In addition, overlaying these virtual infrastructures on the same physical infrastructure to satisfy diverse SLA requirements (at the same time) is a challenging task. On one hand, this requires intelligent decisions and placement while provisioning (i.e., virtual-to-physical mapping). On the other hand, continuous transformation of the workload and physical infrastructure (discussed in Section 2) mandates that the infrastructure be continuously mutating, adapting constantly to honor the SLA guarantees.

A few aspects become important characteristics of infrastructure today. Among these, the ability to increase/decrease capacities of virtual resources on-the-fly is a key characteristic that differentiates physical and virtual infrastructures. Also required is the flexibility to expand infrastructure in various dimensions to incorporate novel capabilities and monitoring (plug-ins for new software, easy importing of new hardware, etc.). At the same time, the ability to trigger such flexibility of resources is governed by the current system behavior, which involves another important characteristic: deep monitoring of compute, network, and storage, performed in the context of diverse SLA facets such as performance, health, security, etc. (achieved using monitoring at the application, guest, hypervisor, and hardware levels). Data gathered from such deep monitoring needs to be consumed and analyzed by a control intelligence to take appropriate actions to transform/mutate the infrastructure, thus making such control intelligence a characteristic of the infrastructure. The result of such intelligence, the continuous remapping of virtual resources to physical resources, can also be seen as a characteristic of infrastructure today. Finally, this process of continuous transformation results in learning, by having the ability to grow a knowledge-base and become "more intelligent" over time, enabling capabilities such as predicting and avoiding SLA-under-threat scenarios through constant monitoring, analysis, planning, and execution.

The key goal of infrastructures today is to use all of the above characteristics (or work amidst them) to honor the SLAs promised to the consumers, while at the same time maximizing provider benefits (such as minimizing costs). We think that the above-mentioned characteristics more or less constitute our definition of an SDI.

Software defined infrastructure refers to infrastructure that is continuously transforming itself by appropriately exploiting heterogeneous capabilities, using insights gained from built-in deep monitoring, to continuously honor consumer SLAs amidst provider constraints.

Components of SDI

Unlike the view of traditional infrastructure, where software plays some role in infrastructure provisioning (with the other roles played by hardware and humans), in SDI, software plays a major role. In fact, most of the elements of SDI are expected to be completely automated. However, it is to be noted that since SDI is indeed a composition of individual silos (compute, network, and storage), the automation within each silo becomes an integral part of the end-to-end automation.

Figure 2 shows a high-level overview of SDI. Several important components include software defined compute (SDC), software defined network (SDN), software defined storage (SDS), resource abstraction, the SDI controller, management services, and monitoring services. We briefly discuss each of these and, in the interest of space, provide more information on three of the components here: management services, monitoring services, and the SDI controller. Resource abstraction was dealt with in the section "Infrastructure abstractions," whereas the remaining components are discussed by other papers in this journal issue.
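The constant monitor-analyze-plan-execute cycle implied by this definition can be sketched as a minimal control loop. The sketch below is hypothetical: the monitor callable, the SLA table, the transform action, and the knowledge-base are illustrative placeholders for the controller interfaces discussed later, not a real API.

```python
# Hypothetical sketch of one SDI control-loop iteration: consume
# deep-monitoring data, compare observed metrics against consumer SLAs,
# grow the knowledge-base, and transform resources whose SLAs are under threat.

def control_loop(monitor, slas, transform, knowledge_base):
    """One monitor -> analyze -> plan -> execute pass over all resources."""
    metrics = monitor()  # deep monitoring of compute, network, and storage
    for resource, sla in slas.items():
        observed = metrics.get(resource)
        if observed is not None and observed < sla:
            # analyze/plan: record the violation so the knowledge-base
            # can grow "more intelligent" over time
            knowledge_base.append((resource, observed, sla))
            # execute: remap or mutate the offending virtual resource
            transform(resource)

# Illustrative run: vm1 misses its storage IOPS SLA, while vm2 honors it.
kb, actions = [], []
control_loop(
    monitor=lambda: {"vm1_iops": 800, "vm2_iops": 1500},
    slas={"vm1_iops": 1000, "vm2_iops": 1000},
    transform=actions.append,
    knowledge_base=kb,
)
```

A real controller would, of course, run this loop continuously and choose among many possible transformations (resize, migrate, re-route); the point here is only the shape of the loop.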

capacity. Suppose that the SLA that was requested (and
given) to the consumer for the virtual machine implies
that the VM needs certain (virtual) compute performance.
Now, if the load on other VMs running on the same node
increases, thereby starving this VM, then migrating this
VM to another node could be one way to continue to honor
the SLA for this VM (and possibly SLAs for the other
VMs on the node too). In this case, migrating compute
instances across physical nodes without affecting any
of the higher level layers (i.e., applications running on
the compute instances that constitute the workload layer)
is an action that is completely contained in the purview
of the SDI (although not necessarily within the
compute domain).
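The migration decision in the scenario above can be reduced to a simple rule. The sketch below is a hypothetical illustration; the function name, the node representation, and the headroom threshold are assumptions made for this example, not part of the paper's design.

```python
# Hypothetical sketch of the SDC decision above: if co-located VMs starve a
# VM below its promised compute performance, pick the least-loaded node that
# still has enough spare capacity as the migration target.

def pick_migration_target(vm_perf, vm_sla, nodes, min_headroom):
    """Return the best target node name, or None if the SLA is already met
    or no node has enough headroom. `nodes` maps name -> utilization in [0, 1]."""
    if vm_perf >= vm_sla:
        return None  # SLA honored; no transformation needed
    candidates = {name: util for name, util in nodes.items()
                  if 1.0 - util >= min_headroom}
    if not candidates:
        return None  # nowhere to migrate; a real controller would escalate
    return min(candidates, key=candidates.get)  # least-loaded feasible node
```

For example, a VM delivering half of its promised performance, with candidate nodes at 90%, 30%, and 60% utilization and a required headroom of 50%, would be migrated to the 30%-utilized node.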

Figure 2
Software defined infrastructure: overview of components.

Software defined compute

Virtual compute

Virtual compute is defined as a running instance of an operating system with a certain number of virtual processors, a certain amount of memory, and potentially other specialized devices such as graphical processing units (GPUs), field-programmable gate arrays (FPGAs), or other accelerators. The capacities and capabilities of these heterogeneous compute resources are expressed in an abstract manner by calibrating them using specific benchmarks. An operating system instance itself may be instantiated as a bare-metal instance, a virtual machine, or a container. Depending on the constraints in the SLAs requested by the consumers, appropriate virtual compute instances are created for use. Virtual compute can be provisioned with guarantees on metrics such as processor performance and utilization, compute availability, isolation level, security, etc.

Software defined compute

Software defined compute refers to virtual compute that is transforming as needed. This transformation is accomplished by intelligence that is continuously consuming monitoring data and validating whether SLAs are being met (we revisit monitoring in detail in the section "Monitoring services"). When this intelligence (known as the SDI controller, discussed

Software defined network

Virtual network

The virtual network connects different virtual compute/storage instances, providing guarantees such as security and isolation, performance (network bandwidth), availability (multi-pathing and routing), etc. Creating the virtual network typically consists of configuring controllers present in physical switches and hypervisors to create virtual switches, links, and end-points. Overlaying virtual networks on physical networks, and continuously remapping this overlay, constitutes an important problem here. The evolution of standards such as OpenFlow [20] has played an important role in the proliferation of virtual networks for data centers in general.

Software defined network

Similar to SDC, SDN is transforming and evolving. Monitoring data from the network controllers is constantly being fed to the control intelligence to determine several variables (flow and bandwidth parameters, link reliability parameters, etc.). The intelligence is itself constantly evaluating and potentially mutating the virtual network(s) by reprovisioning or readjusting them using the controller API.

For example, a virtual network might have been provisioned at the start to connect two compute instances. While there might be two different physical paths connecting the nodes on which these compute instances were instantiated, one path might have been used to provision the initial virtual network. However, as the compute instances communicate, if the control intelligence determines that the required network SLAs are not being met, it
in the section BSDI controller[) determines that there might use the other physical link to distribute the traffic,
is a potential threat to the guaranteed SLA or that SLA or switch the virtual network to the other link in entirety
is already being violated, it initiates necessary action to (if SLA can be honored). Not only the applications
honor/re-enforce the SLAs. running on the compute instances, but compute instances
As an example, let us consider a simple scenario where a themselves might be completely Boblivious[ to such network
virtual machine has been created on a node with a certain mutations occurring Bunderneath[ (a hypervisor might

IBM J. RES. & DEV. VOL. 58 NO. 2/3 PAPER 2 MARCH/MAY 2014 G. KANDIRAJU ET AL. 2:7

Authorized licensed use limited to: UNIVERSITY OF HERTFORDSHIRE. Downloaded on July 23,2021 at 20:40:05 UTC from IEEE Xplore. Restrictions apply.
probably be aware of this). While some such actions may thousands of nodes and tens of thousands of paraphernalia
not require involvement of other SDI silos, some may have to (links, switches, racks, chassis, power cables, etc.).
be coordinated with SDC/SDS (we will revisit this in the Consequently, maintaining large physical data centers
section BSDI controller[). is a very difficult task, in addition to the much-needed
management of SLAs promised to the consumers. In other
Software defined storage words, some amount of manual intervention particularly
to manage the physical infrastructure is inevitable, such as
Virtual storage adding or removing physical infrastructure, connecting it
Virtual storage refers to provisioning and accessing storage (according to consumer specifications at times), nonstandard
with diverse capabilities. Important variables here include bare-metal installations, repairs, hot swaps, power and
access bandwidth (IOPS), capacity, and the nature of fault cooling related tasks, etc. With the increase in scale,
tolerance. Underlying capabilities (such as tiering, solid-state it becomes practically impossible to manually manage the
disks [21], RAID, object/block storages, and distributed system. Doing so would result in potential bottlenecks
file-systems) may be appropriately combined and abstracted and worse human errors. The programmability of the SDI
to create virtual storage (virtual volumes, etc.) to guarantee lends itself naturally to automating many of the tasks of
SLAs. data-center management, reducing the requirements impact
of manual intervention.
Software defined storage Honoring consumers’ infrastructure SLAs in an automated
Similar to SDC and SDN, SDS refers to utilizing the way becomes one of the important goals associated with
storage capabilities and transforming the virtual storage SDIs. Intelligence is required in SDI that continuously
to continuously honor the SLAs. Monitoring information keeps track of the current state of the infrastructure and
(such as number of reads/writes, IOPS, locality, etc.) plays acts (or transforms) virtual and physical infrastructure
an important role in instantiating virtual storage with so that SLAs are continuously honored. It is this core
appropriate capabilities. intelligence of SDI that we refer to as the SDI controller.
As an example, let us consider a compute instance As such, it becomes a critical entity for the provider. For
provisioned on a node with a few RAID-enabled hard-disk example, lower costs might be an important benefit of
drives. Let us consider that due to the required storage SLA such automationVon one hand people-related costs
specification of certain storage bandwidth and tolerating a can be reduced, and, on the other hand, savings may be
single-disk failure, the virtual storage was created in a striped gained in infrastructure maintenance by fine-grain real-time
fashion on multiple disks. However, due to an unexpected automation (consolidation for power and energy, savings
failure in the parity disk, the storage controller module in cooling, etc.). The blueprint of such a controller
(SDS controller), possibly in conjunction with the incorporates several components that penetrate deep
hypervisor, finally assigns a new disk [only available into the infrastructure. For example, deep monitoring
free disk, but with a lower revolutions per minute (rpm)] as (of hardware, hypervisors, network links, switches, disks,
the parity disk. At this point, while the storage availability etc.) becomes a critical component of the SDI controller.
SLA is still honored (single disk failure can still be tolerated), With disparate sources publishing a variety of monitoring
the performance SLA is under threat as the new parity information, a standard way of publishing, accumulation,
disk is a slower one. The control intelligence (see the and subscription is required. This also implies a data-center
section BSDI controller[) in this case Bdecides[ to migrate pervasive message bus for publishing and consuming
the compute instance to another node that has solid-state information and events, in a scalable and a fault-tolerant
disks for caching in order to meet the performance manner. We refer to this component of deep monitoring
criteria. Note that in this example, the SLA-under-threat and information sharing mechanism as the M or monitoring
situation was first attempted to be handled by the SDS part of the SDI controller.
controller. However, after possibilities at the SDS silo Next, consider the matter of consuming and analyzing this
have been exhausted, the SDI controller (a higher-level monitoring information. This is where different analytics
entity that coordinates across silos) was notified, which engines plug-in. For example, a health statistics engine
took the action to migrate the compute instance itself. may only be Binterested in[ and subscribe to health
Thus, while some issues might be handled at individual events such as nodes going up and down, VMs crashing
silo-level, others might have to be passed over to the and rebooting, network link failure counts, switch
higher level controller. replacements, hot disk swaps, etc. It can use this information
to derive data-center level health statistics at physical and
SDI controller virtual levels. Similarly, an infrastructure policy engine
Increasing adoption of the cloud makes data-center may subscribe to information and events related to power
expansion inevitable, resulting in large data-centers with consumption, CPU usage events, resource provisioning

times, etc. It may use such information to enforce data-center-level policies for energy consolidation, cost limits, IM scaling, etc. Similarly, a workload policy engine may subscribe to events related to VM deployment, specific performance metrics, VM state, etc. It may use this information to change the way a workload maps to the physical infrastructure. Other engines that might plug in similarly include a prediction engine, a security engine, a customer trend analysis engine, etc. This analysis and accumulation part of the data center becomes the second important component of the SDI controller, and we refer to this component as A or analysis.

Figure 3
SDI controller overview.

An important result of such analyses could be the need to transform the infrastructure (i.e., change the virtual resources in some way). This is where a policy engine might apply certain criteria or solve certain optimization problems to derive a forward plan for the (virtual) resources. We refer to this component as the P (planning) part of the SDI controller. State maintenance and cross-silo coordination are important aspects to consider here.

Infrastructure-level SLAs may also be taken into account during both the A and P phases. For instance, moving a virtual machine to a different host will likely have an impact on the network SLA and thus needs to be taken into account when entire workloads are deployed.

Finally, the last part of the SDI controller is actually executing the derived plan. In the execution phase, the controller invokes APIs (potentially across all the silos) to instantiate the designed plan. This phase becomes much more management-specific, with specific adapter functions getting invoked. For example, if OpenStack is the management system in the data center (or part of the data center), generic operations that were part of the designed plan are translated into OpenStack APIs to execute individual operations. We refer to this component as the E (execution) part of the SDI controller.

The SDI controller is a generic controller that fits the model of M, A, P, and E components (collectively referred to as the MAPE loop [22], as shown in Figure 3) and that can subsume a variety of policy engines. It is possible for one of these policy engines to govern a critical function (such as placement) with it being the sole entity performing that function data-center wide (possibly for consistency). In this case, we envision that other policy engines will coordinate

with it using an API (or the message bus, exposing operations as events). Figure 4 shows an architectural overview of the SDI controller that we envision.

Figure 4
SDI controller architectural vision. (ssh: secure shell; VM: virtual machine.)

All in all, the SDI controller is responsible for maintaining the SLA of infrastructure patterns provisioned on physical infrastructure. Generally, such management needs to consider a very broad set of concerns related to pattern performance, security, availability, and cost, as well as concerns pertaining to the operational policies of the data-center owner, which include electrical energy cost, software license entitlements, server maintenance schedules, and many others. Developing a system capable of addressing such broad considerations has eluded the technical community so far. As an example, resource mapping solutions have evolved around independent technologies narrowly scoped to networking [9], storage [10, 11, 23], cost efficiency [24], or power [12, 25]. The SDI controller addresses the challenge of holistic multiobjective management by introducing clear abstractions that are used to subdivide the problem space into smaller, more manageable components, and by introducing advanced management in each of the following three layers. The first layer involves the physical IM (infrastructure management), responsible for the discovery and management of individual physical resources and physical resource topologies, provisioning of the management stack, and implementation of policies related to health, security, and availability of physical resources. The second layer involves virtual resource management, responsible for the life-cycle management and local optimization of the virtual resources of compute, network, and storage. Typically these functions would be implemented within the domains of SDC, SDN, and SDS, respectively. The third layer involves infrastructure pattern management, responsible for the holistic orchestration and optimization spanning the domains of compute, network, and storage, leveraging the abstracted capabilities provided by SDC, SDN, and SDS, respectively, as well as the abstractions provided by the physical infrastructure. An example of a component implementing this function is described in a companion paper [26] (where an infrastructure pattern is referred to as a virtual resource topology).

While each component addresses a broad set of considerations (e.g., security, availability, etc.), each is scoped to a narrow and well-defined set of management entities, actions, and inputs. As an example, physical IM is concerned with the fine-grained state of physical servers, including server network reachability, the state of individual DIMMs (dual in-line memory modules), cores, and disks, hypervisor state, the provisioning process, etc. However,

these details are abstracted to a single "available/unavailable" flag when "seen" by an infrastructure pattern management component.

Figure 4 shows the high-level design of the SDI controller, with the MAPE loop and individual silos represented in logical fashion (i.e., compute nodes or storage disks are not shown). We believe that the SDI controller (and the policy engines comprising it) is an important value-add for a data center. This framework can particularly be applied to enforce infrastructure-level SLAs promised to the consumers in an automated manner. Also, the ability to define a diverse set of policies on data-center-wide universal functions (and APIs) can be extremely useful to providers and consumers. In fact, we believe that the API to define these policies should be exposed to consumers so that they can build and deploy their own policies.

Management services
In this section, we discuss the management services component of SDI. In the section "Infrastructure management," we highlighted a few important features that today's management services must provide: availability, scalability, decomposability (de-centralized), upgradeability, and intuitiveness. We feel that management in an SDI setting must make use of the above five aspects as primitives but build on them to provide additional capabilities (to make the infrastructure truly software defined). These capabilities include the following.

Elasticity
In SDI, the management services need to be elastic, i.e., to scale out (up) or in (down) based on need (the current load on the system). Scaling out (up) may be required when the load on the management services system is high, whether due to external consumers or due to the internal infrastructure mutation discussed earlier. Without scaling out (up), it may take a long time to provision resources (with limited management services) and hence become a consumer "pain point." At the same time, scaling in (down) is an essential feature to save costs (during low loads): there is no need to run multiple management instances on nodes that are idle. Therefore, for a provider, this dynamic scaling (up/down or out/in) is an essential factor for achieving high consumer satisfaction at low cost. We believe that this is only possible when such dynamic scaling mechanisms are automated, and this is precisely what we mean by elasticity. This would be an essential part of the SDI intelligence; in other words, elastic management services would be yet another policy engine as part of the SDI controller.

We also believe that in order for management services to be elastic, the core services need to be built exposing APIs for scaling, and the APIs are then invoked by a higher-level intelligence (see the section "SDI controller") that uses the monitoring information.

Monitoring
Every element of SDI is monitored. This includes not only deep monitoring of the hypervisor, compute, network, storage, etc., but also monitoring of resource provisioning and of the management services themselves. This is in fact an absolute requirement to even know how the infrastructure services are performing as a whole. Monitoring data here might refer to metrics such as compute provisioning time (which in itself might be split into image provisioning time, boot time, individual services start time, etc.), network and storage provisioning time, time to link virtual resources (such as associating a virtual volume with compute), metrics in management services software (authentication time), etc. Monitoring data forms the basis for the SDI controller, not only for scaling management services but also for event and correlation analysis, diagnostics, prediction models, etc.

SLAs for management services
Having SLAs for management services can be a differentiator for a cloud provider; i.e., in addition to having SLAs associated with infrastructure patterns, a consumer can also be guaranteed that the pattern will be provisioned in a specified amount of time.

Examples of management services
A number of options for management services exist today, with several clouds building their own management stacks. Examples of management software include the Eucalyptus CLC (cloud controller), CloudStack Management Server, VMware vCloud Suite, OpenStack, etc. OpenStack is evolving as a de facto open-source standard for cloud management. More than 200 companies are part of the OpenStack project, released under the Apache License.

Monitoring services
The monitoring services of SDI (and of a data center in general) form one of the most important components, aiding in making many decisions. In addition, monitoring itself might be offered as a service to the consumers. Monitoring here refers not only to performance aspects of the system but also to its health. It is also essential for enforcing security and other regulatory needs of the consumers. As mentioned in the discussion of the SDI controller, there are two fundamental aspects of monitoring. First, consider the method of measuring and gathering data; this varies across the silos and the individual components and statistics being measured. While a data center's monitoring system will make available a plethora of default statistics, consumers might be interested in specific monitoring information and should be provided a means to measure the same. Second, consider the method of sharing the measured information through a common message bus: a bus that is highly scalable, asynchronous, fault-tolerant, and light-weight (does not

interfere with system performance) where information and events can be published and subscribed to.

For either of these aspects, we think that fundamental primitives (APIs) need to be exposed to build generic services, including policies in the SDI controller discussed earlier. Some of these services may be provided by the data center in a generic way (e.g., CloudWatch), whereas others can be custom-built for the consumers (or built by consumers themselves). We believe that there are a few important characteristics that a monitoring framework for a next-generation data center needs to obey.

First, consider the end-to-end characteristic. Monitoring needs to be end-to-end; this includes monitoring of lower-level hardware, the hypervisor, virtual resources (compute, network, and storage), applications, and monitoring of infrastructure provisioning itself. APIs at all levels need to be exposed for consumption of monitoring data.

Second, consider the lightweight characteristic. We believe that a simple, lightweight, and non-intrusive monitoring system is an essential component of data centers. Simplicity is not only in terms of functionality, but also of installation and upgrade (DevOps). Heavyweight systems that come with a host of tools and analyses can quickly become an overhead and intrude on system performance.

Third, consider fault tolerance. With component failures in data centers not far from being the norm, the monitoring system needs to be a reliable source of monitored data.

Fourth, consider distribution and loose coupling. We feel that monolithic monitoring infrastructures do not scale and also increase the complexity of consumption and analysis. Simpler, lower-level monitoring services should be built so that they can be coupled together to form higher-level monitoring services (such as CloudWatch).

Fifth, consider flexible and event-driven characteristics. Monitoring APIs need to be very generic for consumers to put in data, retrieve data (say, across any time windows, any ranges, etc.), or receive events using generic criteria about metrics at any level (e.g., notification if CPU utilization on a node crosses 50%, or if a node has seen an increasing trend in memory errors in the last week, etc.).

Sixth, consider clean APIs. APIs at various levels of monitoring should be simple and intuitive. RESTful APIs have become a common model of consumption in general, and they seem to fit well here in designing loosely coupled, distributed, large-scale monitoring systems. Monitoring frameworks such as Ganglia [27] have gained a lot of attention in recent years, not only due to their scalability and light weight, but also due to the simplicity with which they can be used to compose higher-level services.

Conclusion
Virtualization has transformed the view of infrastructure and its consumption model. With the costs of infrastructure dropping, price may not be the primary differentiator for infrastructure providers. It is the additional value-adds of availability, security, monitoring, and sophisticated management, combined with intelligence for automation, that we believe will emerge as differentiators in the market. In addition, the ability for consumers to create or plug in their own policy engines as part of this automation, using a standard API exposed at the infrastructure level, would be an additional benefit for creating consumer-specific solutions. We believe that the rapid shift to consuming virtualized solutions necessitates infrastructures to be software defined, and that the characteristics and components discussed in this paper will aid the design of data-center systems software for infrastructures.

*Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both.

**Trademark, service mark, or registered trademark of Netezza, Inc., Embedded Microprocessor Benchmark Consortium, OpenStack Foundation, Eucalyptus Systems, Inc., Linus Torvalds, or The Open Group in the United States, other countries, or both.

References
1. R. Prodan and S. Ostermann, "A survey and taxonomy of infrastructure as a service and web hosting cloud providers," in Proc. 10th IEEE/ACM Int. Conf. Grid Comput., 2009, pp. 17–25.
2. RackSpace. [Online]. Available: http://www.rackspace.com/
3. Amazon Web Services. [Online]. Available: http://aws.amazon.com/
4. IBM Cloud Computing Overview, IBM Corporation, Armonk, NY, USA. [Online]. Available: http://www.ibm.com/cloud-computing/us/en/
5. Cloud Computing with VMWare Virtualization and Cloud Technology. [Online]. Available: http://www.vmware.com/cloud-computing.html
6. Review and Summary of Cloud Service Level Agreements. [Online]. Available: http://www.ibm.com/developerworks/cloud/library/cl-rev2sla.html
7. Netezza: IBM Netezza Data Warehouse Appliances, IBM Corporation, Armonk, NY, USA. [Online]. Available: http://www-01.ibm.com/software/data/netezza/
8. IBM Storwize V7000 and Storwize V7000 Unified Disk Systems, IBM Corporation, Armonk, NY, USA. [Online]. Available: http://www-03.ibm.com/systems/storage/disk/storwize_v7000/
9. O. Biran, A. Corradi, M. Fanelli, L. Foschini, A. Nus, D. Raz, and E. Silvera, "A stable network-aware VM placement for cloud systems," in Proc. 12th IEEE/ACM Int. Symp. CCGRID, Washington, DC, USA, 2012, pp. 498–506.
10. M. Korupolu, A. Singh, and B. Bamba, "Coupled placement in modern data centers," in Proc. IEEE IPDPS, 2009, pp. 1–12.
11. D. Yuan, Y. Yang, X. Liu, and J. Chen, "A data placement strategy in scientific cloud workflows," Future Gen. Comput. Syst., vol. 26, no. 8, pp. 1200–1214, Oct. 2010.
12. B. Li, J. Li, J. Huai, T. Wo, Q. Li, and L. Zhong, "Enacloud: An energy-saving application live placement approach for cloud computing environments," in Proc. IEEE Int. Conf. Cloud Comput., 2009, pp. 17–24.
13. The Embedded Microprocessor Benchmark Consortium (EEMBC). [Online]. Available: http://www.eembc.org/coremark/
14. FIO Benchmark. [Online]. Available: http://freecode.com/projects/fio
15. B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield, "Remus: High availability via asynchronous virtual machine replication," in Proc. 5th USENIX Symp. NSDI, Berkeley, CA, USA, 2008, pp. 161–174.

16. Heat. [Online]. Available: https://wiki.openstack.org/wiki/Heat
17. CloudStack High Availability. [Online]. Available: http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.1.0/html/Installation_Guide/feature-overview.html
18. OpenStack High-Availability. [Online]. Available: http://docs.openstack.org/trunk/openstack-ha/content/ch-intro.html
19. Eucalyptus. [Online]. Available: http://www.eucalyptus.com/
20. Open Networking Foundation. [Online]. Available: https://www.opennetworking.org/
21. K. El Maghraoui, G. Kandiraju, J. Jann, and P. Pattnaik, "Modeling and simulating flash based solid-state disks for operating systems," in Proc. 1st Joint Int. Conf. Perform. Eng. WOSP/SIPEW, 2010, pp. 15–26.
22. P. Lalanda, J. A. McCann, and A. Diaconescu, Autonomic Computing: Principles, Design and Implementation. London, U.K.: Springer-Verlag, 2013.
23. A. Singh, M. Korupolu, and D. Mohapatra, "Server-storage virtualization: Integration and load balancing in data centers," in Proc. ACM/IEEE Conf. SC, Piscataway, NJ, USA, 2008, pp. 53:1–53:12.
24. R. Zhou, F. Liu, C. Li, and T. Li, "Optimizing virtual machine live storage migration in heterogeneous storage environment," in Proc. 9th ACM SIGPLAN/SIGOPS Int. Conf. VEE, New York, NY, USA, 2013, pp. 73–84.
25. E. Feller, L. Rilling, and C. Morin, "Energy-aware ant colony based workload placement in clouds," in Proc. IEEE/ACM 12th Int. Conf. GRID Comput., Washington, DC, USA, 2011, pp. 26–33.
26. W. C. Arnold, D. J. Arroyo, W. Segmuller, M. Spreitzer, M. Steinder, and A. N. Tantawi, "Workload orchestration and optimization for software defined environments," IBM J. Res. & Dev., vol. 58, no. 2/3, Paper 11, 2014 (this issue).
27. Ganglia Monitoring System. [Online]. Available: http://ganglia.sourceforge.net/

Received September 25, 2013; accepted for publication October 27, 2013

Gokul Kandiraju IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (gokul@us.ibm.com). Dr. Kandiraju is a Research Staff Member and Manager for Data-center Systems Software at the IBM T. J. Watson Research Center. He received a bachelor's of technology degree in computer science from the Indian Institute of Technology, Madras, India, in 1999, and a Ph.D. degree in computer science from The Pennsylvania State University in 2004. After joining the IBM T. J. Watson Research Center in 2004, he worked in areas including operating system virtual memory management, enterprise storage systems, file-system design, and cloud computing. He has contributed to IBM products and is an author of several publications and patents. He is also a member of ACM and a senior member of IEEE.

Hubertus Franke IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (frankeh@us.ibm.com). Dr. Franke is a Research Staff Member and Senior Manager for Software Defined Infrastructures at the IBM T. J. Watson Research Center. He received a Diplom degree in computer science from the Technical University of Karlsruhe, Germany, in 1987, and M.S. and Ph.D. degrees in electrical engineering from Vanderbilt University in 1989 and 1992, respectively. He subsequently joined IBM at the IBM T. J. Watson Research Center, where he worked on the IBM SP1/2 MPI (Message Passing Interface) subsystem, scalable operating systems, Linux** scalability and multi-core architectures, scalable applications, the PowerEN architecture and application space, and, most recently, cloud platforms. He received several IBM Outstanding Innovation Awards for his work. He is author or coauthor of more than 30 patents and over 100 technical papers.

Michael D. Williams IBM Systems and Technology Group, Poughkeepsie, NY 12601 USA (mdw@us.ibm.com). Mr. Williams is a Distinguished Engineer and member of the IBM Academy of Technology. He joined IBM in 1989 after graduating magna cum laude from the State University of New York at Oswego with a bachelor's degree in computer science. Throughout his career, Mr. Williams has had a broad set of client-facing and development responsibilities in the area of systems and software. His current focus is software architecture and design for IBM's next-generation systems with an emphasis on software defined environments, cloud, and virtualization.

Malgorzata (Gosia) Steinder IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (steinder@us.ibm.com). Dr. Steinder is a Research Staff Member and Manager of the Middleware and Virtualization Management Group at the IBM T. J. Watson Research Center. She has worked on topics pertaining to the management of computation and data in large-scale distributed environments. She led a research team that invented and developed virtual machine management technology in IBM WebSphere* Cloudburst Appliance and developed a dynamic VM placement algorithm for IBM VMControl Enterprise Edition. Dr. Steinder holds a Ph.D. degree in computer science from the University of Delaware and an M.Sc. degree in computer science from AGH University of Science and Technology, Krakow, Poland.

S. Mark Black IBM Platform Computing, IBM Canada, Markham, ON L3R 9Z7 Canada (mblack1@ca.ibm.com). Mr. Black is an Architect with the IBM Platform Computing Group. He received an M.E.Sc. degree in signal processing from University of Western Ontario in 1995, and a B.Eng. degree in electrical engineering from Ryerson University in 1992. Prior to joining IBM, Mr. Black's career started in systems administration, where he gained an understanding of how systems should be managed. He then went on to work on systems management automation tools. He has designed and led the development of a large-scale UNIX** systems administration tool, several easy-to-use high-performance computing management tools, and an enterprise cloud tool, and is currently designing cluster deployment tools. He has authored and coauthored three technical papers. Mr. Black is an IEEE (Institute of Electrical and Electronic Engineers) lifetime member award holder.

