Infrastructure Architecture Essentials for Data Center and Cloud
Shankar Kambhampaty
First Release
All rights reserved. This release of the book, or any part of it, may not be duplicated, stored in an information system for retrieval purposes,
or communicated in any form or by any means, electronic, mechanical, photocopying, recording, or scanning without the author’s written
permission.
Limits of Liability: While the author has used his best efforts in preparing this book, the author makes no warranties or representations
concerning the accuracy or completeness of the book’s contents and disclaims any express or implied warranties of merchantability
or fitness for a particular purpose. There are no warranties that extend beyond the descriptions contained in this paragraph.
No warranty may be created or extended by written sales materials or sales representatives. The completeness and accuracy of the content
provided in this book and the opinions stated are not warranted or guaranteed to produce any results. The advice and strategies contained
herein are not suitable for every individual. The author shall not be held responsible or liable for any commercial damages or any type of
loss, including but not limited to loss of profit and special, incidental, consequential, or other damages.
Disclaimer: This disclaimer is to inform readers of this book that the views, thoughts, and opinions stated in this book belong solely to the
author and not necessarily to the author’s employer, institution, or other groups or individuals. The book is intended for educational purposes
only and does not replace independent professional judgment. All the references marked within square brackets “[]” in different chapters are
meant to refer to quoted material, additional material, or illustrative examples. The Intellectual Property (IP) and all rights for the referenced
content are with the respective organizations/institutions/owners. No portion of that content can be reproduced without written consent from
the respective organizations/institutions/owners. The contents of this release of the book have been checked for accuracy. However, the
author cannot guarantee full accuracy, since deviations cannot be precluded entirely. As the book is intended for educational purposes, the
author shall not be responsible for any omissions, errors, or damages arising out of the use of the information contained in the book.
Trademarks: All company names, brand names, and product names used in this book are trademarks, registered trademarks, or trade names
of the respective holders. The author is not associated with any product or vendor mentioned in this book.
Cover image and icons in figures in this book are licensed from Adobe.
Preface
After the launch of the third edition of my book on SOA and MSA in the summer of 2018, I received several
messages from students and professionals on the need for articles with insights on infrastructure
architecture. There are not many books on this subject, and most of the available content is provided
by technology vendors. Over the past year, I put together a series of articles on various aspects of
infrastructure architecture, intending to share them through my blog or an online magazine. I had
prepared draft versions of the articles when I received a suggestion to pull their content together
into a book. That is how this book became a reality!
The core concepts of infrastructure architecture have their origins in data centers, where the goal is to
provide a scalable, secure, highly available, and performant environment for applications. Over the past
decade, as the cloud has gained maturity and adoption, these concepts have also been extended to the cloud.
Therefore, I have tried to balance data center and cloud infrastructure architecture in this book.
Many new entrants to the IT industry have directly begun working on cloud platforms without a
background in data center solutions. After all, the cloud, by design, abstracts and wraps infrastructure so
that there is no need to know what runs where and how everything is wired together! Just call an API and
get the infrastructure capability you need. While that is a big plus, getting a good understanding of how
infrastructure architecture is done in a data center goes a long way in building the conceptual clarity
needed in the long run for architecting sophisticated solutions in both data center and cloud. This book
attempts to provide conceptual clarity on infrastructure architecture for solutions deployed in a data
center and the cloud to all students and computing professionals.
Like any other field, the infrastructure technology space is vast, with many possible architecture scenarios.
It is always a challenge to decide how much detail to cover in a book of this kind. Moreover,
technologies change frequently and come from several vendors with varying capabilities. However, the
architecture concepts remain largely the same. I have therefore focused on the essentials and provided,
for every major topic, many references to industry and academic literature that readers can use to
dive deeper.
Who Should Read This Book
A book such as this one typically caters to the needs of several types of readers.
1. Students, software developers: Read this book to get an overview of key aspects of infrastructure
architecture.
2. Architects: Study each chapter of the book carefully and relate to your experience.
3. Project Managers: Focus only on those sections that apply to your projects and revisit them
whenever needed.
4. CIOs and CTOs: Review the content to address any gaps you may have in your understanding and
provide your feedback on what I should further elaborate on in the subsequent releases of this book.
I want to stay connected with you, provide additional information from time to time, and answer any
questions you may have. Please post any questions or suggestions you may have at my blog –
https://archtecht.com.
I trust readers will benefit from the infrastructure architecture concepts discussed in this book and
welcome any constructive comments or suggestions on the content or format of this book.
Shankar Kambhampaty
ShankarKambhampaty@gmail.com
Acknowledgments
I want to take this opportunity to thank the Almighty and The “Line of Gurus”. Dr. M. Narasimharao and
Shri. K. M. Sastry provided remarkable insights, and I would like to express my deepest gratitude to them.
This book would not have been possible without the unstinted support of my wife, Mallika, our daughter,
Sasirekha, and our son, Harish Rohan. They have always been with me, despite many sacrificed weekends
and evenings.
I must thank my senior colleagues at DXC, Erik Wahab, Vinod Bagal, James Brady, Nachiket Sukhtankar,
and Lokendra Kumar Sethi, who have always supported me in all my publications.
I have been fortunate to have worked with Ram Mynampati, AS Murthy, Ganesan Sekhar, Dr. TV Prabhakar,
Arun Jain, Kedarnath Udiyavar, Sree Arimanitahya, Dan Hushon, and other remarkable individuals at
Satyam, Polaris, and CSC/DXC. They have valued knowledge and sharing of it through publications. I am
grateful to them for their constant encouragement.
My special thanks to Rahul Shah and Vishal Thapa for all the figures in the book, the cover design, and the
formatting of the entire text to make the book look attractive. It would not have been possible for the book
to come together without their support. They spent their personal time helping me.
Mahendran Raju, Ajit Deshpande, Vijay Nanduri, MAS Naveed, Kamalakkanan Jayaraman, Swamiraj
Govindan, Altaf Anees Sarker, and Sasirekha Kambhampaty helped me with reviews of the different
chapters in the book. They gave many useful suggestions, and I truly appreciate their inputs.
Madhav Negi and Altaf Anees Sarkar suggested structuring the articles I had written in the form of a book,
and many thanks to them for that.
I deeply value the discussions I had with my team on infrastructure architecture. Krishna Dhavala, Dinesh
Batla, Darpan Verma, Yashpal Singh, Raj Sekhar Mishra, Bajrang Gupta, Nikhil Naik, Venkat Narkulla, Pintu
Maity, Suresh Yaram, Venkat Godavarthy, Radhakrishna Arugula, Rajeev Mandapati, Ramakrishna Jasti, Sri
Charan Surapaneni, SriRam Anjanadri, Ismail Shaik Mohammed, Imran Shaik, Siddharam Gour, Chandra
Sekhar, and Kamlesh Singh - Thank you all.
Shankar Kambhampaty
Contents

Data Center
  1.1 IT, ITSM, and ITIL
  1.2 Data Center
    1.2.1 Infrastructure capabilities
    1.2.2 Power and Cooling
    1.2.3 Cabling
    1.2.4 Security
    1.2.5 Automation
    1.2.6 Monitoring
    1.2.7 Data Center Tiers
    1.2.8 Active and Passive Data Centers
  References

Cloud
  2.1 Private Cloud
  2.2 Public Cloud
  2.3 Hybrid Cloud
  2.4 Cloud Adoption Framework
  2.5 Migration Strategies to Cloud
  2.6 Well-Architected Framework
  2.7 Landing Zones
  2.8 Agile approach for cloud deployments
  References

Compute
  5.1 Mainframe running z/OS or Linux
    5.1.1 Mainframe in Hosted Facility and Cloud
  5.2 Mid-range running AIX or IBM i
    5.2.1 Mid-range on Cloud
  5.3 x86 servers running Linux or Windows
    5.3.1 Virtualization
    5.3.2 Hypervisors
    5.3.3 Servers for Virtualization
    5.3.4 Processor, Memory, and Benchmarks
    5.3.5 Virtual Server Options for Cloud
  5.4 Compute Characteristics
  References

Network
  6.1 Network Basics
    6.1.1 OSI Model
    6.1.2 LAN and WAN
    6.1.3 Virtual LAN (VLAN)
    6.1.4 Basic Network Diagram
    6.1.5 Address Translation
    6.1.6 Proxy
  6.2 Network Architecture
    6.2.1 Three-tier Architecture
    6.2.2 Two-Tier Spine-Leaf architecture
  6.3 Network virtualization
  6.4 Network services in the cloud
  References

Storage
  7.1 Block Storage
    7.1.1 Block Storage Options on-premises
    7.1.2 Block Storage Options on Cloud
  7.2 File Storage
    7.2.1 File storage options for data center
    7.2.2 File storage options for cloud
  7.3 Object Storage
    7.3.1 Object storage options for data center
    7.3.2 Object storage options for cloud
  7.4 Storage Media
  7.5 Storage Tiers
  References

Backup and Restore
  8.1 Backup/Restore criteria
  8.2 Solution patterns for backup and restore
  8.3 Back-end Network
  8.4 Types of Backup
  8.5 Operational Recovery
  8.6 Backup/Restore solutions
    8.6.1 Backup/Restore storage options for data center
    8.6.2 Backup/Restore storage options for cloud
  References

Disaster Recovery
  9.1 Disaster Recovery Characteristics
  9.2 Disaster Recovery Process
    9.2.1 Preparation
    9.2.2 Execution
  9.3 Replication for DR
  References

Monitoring
  10.1 IT Infrastructure Monitoring
    Hardware Monitoring
    Illustration
    Server Monitoring
    Storage Monitoring
    Network Monitoring
  10.2 Application Monitoring
  10.3 Event Monitoring and Correlation
  10.4 IT Operations Analytics (ITOA)
  10.5 Artificial Intelligence Operations (AIOps)
  References

Index
Chapter 1
Data Center
Organizations build applications to fulfill their business processes and deploy them in on-premises
data centers or on the public cloud. Regardless of where they are deployed, there are some essential
infrastructure components that most applications need – servers, storage, and network. All these
components require infrastructure solutioning to work together in a data center to meet the demands
of the applications. There are, of course, a few features offered by the public cloud that require further
solutioning, such as serverless compute (e.g., AWS Lambda or Azure Functions). In this chapter, the
the basic data center concepts will be outlined.
1.1 IT, ITSM, and ITIL
Information Technology Service Management (ITSM) is how the IT department of the enterprise
(including the third-party organizations) manages the delivery of IT services to consumers. It is defined
by five processes – service strategy, service design, service transition, service operation, and continual
service improvement[2]. Figure 1.1 depicts the key processes of ITSM pictorially[3].
Information Technology Infrastructure Library (ITIL) is a framework with a collection of best practices
for each major phase of the ITSM to improve and optimize resources and make them efficient. While ITIL
is a popular framework for service management, there are others. One example is Control Objectives for
Information and Related Technology (COBIT) developed by the Information Systems Audit and Control
Association (ISACA) for IT management and governance.
In an ITSM/ITIL organization, service strategy and service design processes result in services meant
to meet consumer requirements and priorities. These services are made available through a service
catalog. Consumers order the services from the service catalog, which triggers workflows for approvals.

[Figure: ITSM tool (service management) with workflows, rules, alerts, and notifications]
During service operation to deliver the service, events occur. An event is referred to as an incident when
it is unplanned and negatively affects the service, requiring a response to have the service restored.
Incidents have underlying causes that represent problems. Problems need to be addressed to mitigate or
prevent incidents from occurring. The data relating to problems, incidents, and other service management
processes are structured and managed through a knowledge management process. Therefore, the key
processes for service operation are event management, incident management, problem management,
and knowledge management[3].
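As an illustrative sketch (the class and function names are hypothetical, not taken from any ITSM tool), the distinction drawn above between an event and an incident can be expressed as:

```python
from dataclasses import dataclass

@dataclass
class Event:
    """An occurrence observed during service operation."""
    description: str
    unplanned: bool = False
    degrades_service: bool = False

def classify(event: Event) -> str:
    """Per the definitions above: an event is an incident only when it is
    unplanned and negatively affects the service, requiring a response."""
    if event.unplanned and event.degrades_service:
        return "incident"
    return "event"

# A routine event vs. an incident that needs the service restored:
print(classify(Event("Scheduled backup completed")))   # event
print(classify(Event("Web server down", True, True)))  # incident
```

In a real ITSM tool, recurring incidents would additionally be linked to a problem record capturing their underlying cause, feeding the problem management process.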
The data center is in a state of churn, with new infrastructure technologies and applications being
introduced on an ongoing basis to address changing business and IT demands. Also, some of the
current infrastructure and application deployments may have to be retired or refreshed. Therefore, there
is a need to continuously assess the state of operations, identify opportunities to plug inefficiencies,
and implement solutions (e.g., automation) for service improvement. For this purpose, operational data
trends are studied through dashboards, metrics are captured, and analytics are run. All these efforts fall
under the continual service improvement ITSM process.
A configuration management database (CMDB) is set up to maintain an inventory of all IT components;
it is updated when additions or deletions are made through the purchase, stock, license, asset, and
contract management functions in the organization.
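As a minimal sketch (the record fields and method names are illustrative, not from any specific CMDB product), a configuration item and its lifecycle updates might be modeled as:

```python
from dataclasses import dataclass

@dataclass
class ConfigurationItem:
    """One IT component tracked in the CMDB inventory."""
    ci_id: str
    ci_type: str              # e.g., "server", "storage-array", "license"
    owner: str
    status: str = "in-stock"  # in-stock | deployed | retired

class CMDB:
    def __init__(self):
        self._items = {}

    def add(self, ci):          # triggered by purchase/stock management
        self._items[ci.ci_id] = ci

    def retire(self, ci_id):    # triggered by asset/contract management
        self._items[ci_id].status = "retired"

    def inventory(self):
        """Active configuration items only."""
        return [ci for ci in self._items.values() if ci.status != "retired"]

cmdb = CMDB()
cmdb.add(ConfigurationItem("SRV-001", "server", "IT-Ops"))
cmdb.add(ConfigurationItem("LIC-042", "license", "Finance"))
cmdb.retire("LIC-042")
print(len(cmdb.inventory()))  # 1
```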
Needless to say, a data center is critical to an enterprise. A data center outage in the event of a disaster
due to an earthquake, terrorism, or any other reason can ruin an enterprise. Hence, a secondary data
center is set up for business continuity in case of failure of the primary data center.
The key infrastructure capabilities of a data center are covered in the following chapters of this book:
• Compute (Chapter 5)
• Network (Chapter 6)
• Storage (Chapter 7)
• Backup and Restore (Chapter 8)
• Disaster Recovery (Chapter 9)
• Monitoring (Chapter 10)
• Security (Chapter 11)
[Figure 1.2: Data centers with representative infrastructure components – primary and secondary data
centers, each with web application firewalls, web servers, firewalls, network (Chapter 6), mainframe
compute, and block/file/object storage (Chapter 7), with DR replication between them (Chapter 9);
monitoring (Chapter 10) and security (Chapter 11) span both sites; client devices (customers and
employees) connect over the internet and MPLS]
Note: The term “infrastructure” has a much broader connotation in TOGAF in the context of enterprise architecture. However, the
scope of infrastructure architecture covered in this book is limited to the “Phase D – Technology architecture” of the Architecture
Development Method of TOGAF. Therefore, the words “infrastructure” and “technology” are used synonymously in the context of
capabilities, solutions, and architecture for data center and cloud.
1.2.2 Power and Cooling
The focus of current data center design practices is on optimizing energy consumption and cooling and
reducing carbon emissions. To run a data center uninterrupted 24x7, there must be redundancy of power
feeds, including backup power generators and uninterruptible power supplies.
1.2.3 Cabling
The data center uses a great deal of copper and fiber-optic cables to interconnect the different systems
in the data center. Structured and planned cabling strategies are key to ensuring that there is no tangled
mesh of wires and that heat dissipates efficiently. The use of best practices, including pre-terminated
cables (with connectors from the factory) and patch panels (to make interconnections of different LAN
or fiber-optic circuits), is key to setting up the data center connections efficiently[6]. Equally important is
to document all cable configurations for ready use as and when changes are required.
1.2.6 Monitoring
The staff running the data center require a complete view of all the equipment, their usage, and
performance to respond in a timely manner to any IT problems[8]. A class of software called Data
Center Infrastructure Management (DCIM), such as Sunbird’s, provides dashboards for all critical
aspects of a data center to improve uptime, capacity planning and utilization, and data center energy
efficiency. Data center monitoring solutions are also available from vendors, such as SolarWinds, to
observe the health of critical components of the data center and take timely actions.
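As a simple hypothetical sketch (not tied to any DCIM or SolarWinds API, and with illustrative threshold values), the kind of threshold check such monitoring dashboards run against equipment metrics can be expressed as:

```python
# Hypothetical alerting thresholds for collected equipment metrics.
THRESHOLDS = {"cpu_util_pct": 90, "inlet_temp_c": 27, "psu_load_pct": 80}

def check(device: str, metrics: dict) -> list:
    """Return an alert string for every metric exceeding its threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{device}: {name}={value} exceeds {limit}")
    return alerts

# One metric (CPU utilization) over its limit produces one alert:
print(check("rack4-srv02", {"cpu_util_pct": 95, "inlet_temp_c": 24}))
```

Real monitoring stacks layer collection (e.g., SNMP polling, agents), correlation, and dashboards on top of checks like this.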
References
[1] R. Castagna, “information technology (IT)”, https://searchdatacenter.techtarget.com/definition/IT.
[2] S. Kempter, A. Kempter, “IT Process Wiki - the ITIL® Wiki”, https://wiki.en.it-processmaps.com/index.php/Main_Page.
[3] EasyVista, “ITSM Capabilities Map”, https://www.easyvista.com/itsm-capabilities-map.
[4] R. Kal, “Data Center Design 101: Everything You Need to Know”, https://www.vxchnge.com/blog/data-center-design.
[5] M. Ole, “This is how we reduce data centers’ carbon footprint”, https://blog.sintef.com/sintefenergy/this-is-how-we-reduce-data-centers-carbon-footprint/.
[6] W. Ross, “4 Data Center Cabling Strategies That Will Make Your Job Easier”, https://www.vxchnge.com/blog/data-center-cabling-strategies.
[7] E. Sampera, “What to Know About Logical Security vs Physical Security”, https://www.vxchnge.com/blog/logical-security-vs-physical-security.
[8] B. Tom, “How to Retain Control With Data Center Monitoring Software”, https://www.vxchnge.com/blog/control-with-data-center-monitoring-software.
[9] Uptime Institute, “Tier Classification System”, https://uptimeinstitute.com/tiers.
[10] Switch, “Tier 5® Platinum Data Centers”, https://www.switch.com/tier-5/.
[11] T. Slattery, “Active-active data centers key to high-availability application resilience”, https://www.techtarget.com/searchnetworking/tip/Active-active-data-centers-key-to-high-availability-application-resilience.
[12] Citrix, “Active-passive site deployment”, https://docs.citrix.com/en-us/citrix-adc/current-release/global-server-load-balancing/deployment-types/active-passive-site-deployment.html.
[13] OpenGroup, “Infrastructure Architecture”, https://pubs.opengroup.org/architecture/togaf80-doc/arch/p4/infra/infra_arch.htm#Approach.
Chapter 2
Cloud
The characteristic feature of these systems is to bring together compute, storage, and network in a “box”
with simplified management and a lower cost of deployment, maintenance, and support. This feature has
made them good candidates for the private cloud[4].
As indicated earlier, the private cloud is strategic to enterprises when they wish to have full control
of the applications, infrastructure, and data and optimize them for performance[16]. Systems of record
solutions, such as core banking applications, need to have high performance, integrate with mainframe
and mid-range systems, and support strict data privacy regulations. For such use cases, a private cloud
is a good option to consider as it provides tight control on the underlying infrastructure and data storage.
Chapter 2: Cloud 7
2.2 Public Cloud
Public cloud is a cloud computing environment open to all. Public cloud platforms, such as Amazon Web
Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), have grown in maturity and adoption
over the past couple of decades. These platforms provide compute, storage, network, and other infrastructure
resources for organizations to access over the internet or dedicated networks based on a pay-per-use model.
Traditional data centers would not be able to address such requirements at a reasonable cost.
The tech giants Amazon, Microsoft, and Google have built public cloud platforms that offer both
infrastructure as a service (IaaS) and platform as a service (PaaS) capabilities from their specially designed
data centers, which may be accessed as services through an API model.
Public cloud is strategic to an enterprise when solutions that implement business processes must support
a large number of users accessing from different types of devices and undergo frequent changes due
to changing technologies and business demands. The services offered by the public cloud platforms
(e.g., Amazon AWS, Microsoft Azure, Google GCP) provide virtually unlimited elasticity with features to
auto-scale resources as demand increases. For this reason, they are referred to as hyperscalers. These
hyperscalers regularly introduce new capabilities in their cloud platforms. Enterprises can formulate
strategies to enhance their products/services using the new capabilities. Many use cases are addressed
more effectively by the public cloud. The systems of engagement that must support many web and
mobile clients may be implemented on the public cloud. Development environments may be spun up
and brought down easily and quickly on the public cloud. Workloads that require a large number of
infrastructure resources for a few hours in a day, week, or month may be more efficiently managed on the
public cloud. Some public cloud platforms (e.g., Azure) can extend the life of some of the out-of-support
systems (e.g., Windows 2003). Enterprises may also offer software as a service (SaaS) solutions on the
public cloud with lower investments and time to market.
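The elasticity and auto-scaling described above can be sketched as a simple target-tracking rule. This is a minimal illustration with hypothetical thresholds and instance limits; real platforms (e.g., AWS Auto Scaling) offer far richer policies.

```python
# Minimal sketch of a hyperscaler-style auto-scaling rule. The target
# utilization, minimum, and maximum below are invented for illustration.

def desired_instances(current: int, cpu_utilization: float,
                      target: float = 0.60, minimum: int = 2,
                      maximum: int = 20) -> int:
    """Scale the instance count so average CPU utilization approaches target."""
    if cpu_utilization <= 0:
        return minimum
    # Proportional ("target tracking") scaling: new = current * actual / target.
    proposed = round(current * cpu_utilization / target)
    return max(minimum, min(maximum, proposed))
```

For example, a fleet of 4 instances at 90% utilization against a 60% target would be scaled out to 6 instances, while demand collapsing to zero shrinks the fleet back to the configured minimum.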
2.3 Hybrid Cloud
Many enterprises have workloads on virtualization infrastructure in the data center, private cloud platforms,
and the public cloud. Hybrid cloud involves combining the resources across the data center and public cloud
and creating an environment that may be orchestrated and managed, ideally from a “single pane of glass.”
Hybrid cloud brings the flexibility to move workloads between the private cloud and the public cloud
based on business demands – for instance, in an enterprise in the retail sector, workload demands during
the holiday season are likely to be significantly higher than off-season. Workloads deployed on a private
cloud environment can scale further into the public cloud. Applications will have to be architected to leverage
the flexibility and deploy the necessary tooling to manage the processes. For instance, hybrid cloud DR
architecture that involves backup of application data from the private cloud to the public cloud for use in case
of disaster (or cyber attack) is one way to leverage flexibility to increase the resiliency of IT infrastructure.
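The "scale further into the public cloud" pattern above (often called cloud bursting) can be sketched as a simple placement rule. The capacity model below is purely illustrative, not any platform's actual scheduler.

```python
# Hypothetical sketch of cloud bursting: place workloads on the private
# cloud until its capacity is exhausted, then overflow to the public cloud.

def place_workloads(demands: list[int], private_capacity: int) -> list[str]:
    """Return a placement ("private" or "public") for each workload demand."""
    placements, used = [], 0
    for demand in demands:
        if used + demand <= private_capacity:
            placements.append("private")
            used += demand
        else:
            placements.append("public")  # burst beyond private capacity
    return placements
```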
The move towards microservices architecture from monolithic application architecture coupled with
container orchestration in a hybrid cloud environment results in the efficient use of resources. Application
components may be deployed on infrastructure resources belonging to a hybrid cloud using infrastructure
as code (IaC) tools and techniques (e.g., Terraform). Further, application components and microservices,
packaged as containers, may be deployed using CI/CD pipelines to the container orchestrated hybrid
cloud environment using tools like Kubernetes. Such application deployments on a hybrid cloud take
advantage of the collective resources in a private cloud (and other virtualization infrastructure in the
data center) and public cloud to provide a highly efficient and resilient IT infrastructure through optimal
utilization of resources in both private cloud and public cloud.
[Figure 2.1: Cloud adoption frameworks for AWS, Azure, and GCP. The AWS approach comprises four steps (Envision, Align, Launch, Scale) and six perspectives (Business, People, Governance, Platform, Security, Operations); the Azure approach comprises Define Strategy, Plan, Ready, and Adopt, supported by Govern and Manage; the GCP approach comprises three phases (Tactical, Strategic, Transformational) and four themes (Learn, Lead, Scale, Secure).]
The cloud adoption frameworks for the public cloud platforms, AWS, Azure, and GCP, facilitate the
migration of workloads to the cloud, based on best practices, as outlined by Amazon, Microsoft,
and Google.
AWS advocates a four-step approach (Envision, Align, Launch, Scale) for cloud adoption to realize the benefits[7].
In addition, it lays out six perspectives (Business, People, Governance, Platform, Security, and Operations) that the concerned stakeholders need to own and manage to develop related capabilities on the cloud.
Azure promotes a methodology for cloud adoption also involving four steps (Define Strategy, Plan, Ready, Adopt)[8].
It also emphasizes creating an operations baseline and governance baseline for effective management
and governance.
GCP recommends a cloud adoption journey in three phases[9]:
1. Tactical.
2. Strategic.
3. Transformational.
Each of the above three phases involves four themes (learn, lead, scale, and secure) that lead the organization towards greater maturity on the cloud transformation journey.
The first two steps are related to business, and the next four are related to IT. Since the focus of this
book is on IT, the next four sections discuss them.
2.5 Migration Strategies to Cloud
Enterprises have several types of applications. To ensure that applications are migrated to the cloud
efficiently, with optimal utilization of resources, at lower cost, and in a way that delivers business value in
a timely manner, it is necessary to formulate a migration strategy that specifies the WHAT, WHY, HOW,
WHERE, and WHEN of migration to the cloud.
WHAT and WHY – An assessment is conducted on the application landscape to determine which
applications are suitable candidates to migrate to the cloud to deliver business value and realize cost
benefits. Migration of all applications to the cloud does not necessarily result in business value to the
enterprise. For instance, it may make more business sense to RETAIN legacy platforms with deep
integration to custom applications in the data center. It may also be possible to identify rarely used or
redundant applications (i.e., those that implement the same functionality). In such cases, RETIRING the
applications that are no longer needed may be the best treatment. x86-based Windows and Linux
applications are obvious targets for migration in initial phases.
a) REHOST – This is essentially a lift and shift of the application from a virtual machine in the data
center to a virtual machine on the cloud without making any changes to the application.
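The WHAT and WHY assessment above can be sketched as a simple disposition rule. The attribute names and the rules themselves are hypothetical; a real assessment weighs many more factors (cost, compliance, dependencies).

```python
# Illustrative sketch of a migration-assessment disposition rule based on
# the RETIRE / RETAIN / REHOST treatments discussed in the text.

def disposition(app: dict) -> str:
    """Return a migration treatment for an application profile."""
    if app.get("redundant") or app.get("rarely_used"):
        return "RETIRE"          # no longer needed
    if app.get("legacy_platform") and app.get("deep_integration"):
        return "RETAIN"          # keep in the data center
    if app.get("os") in {"windows_x86", "linux_x86"}:
        return "REHOST"          # lift-and-shift candidate for early phases
    return "ASSESS_FURTHER"
```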
2.6 Well-Architected Framework
A well-architected framework provides guidance for architecting solutions using cloud services to support
application deployments, based on a consistent approach and best practices. All the key cloud
platform providers, namely AWS, Azure, and GCP, have, by and large, specified the same principles that
comprise the foundation of a well-architected cloud solution[11]. Table 2-1 summarizes the principles of
a well-architected framework.
A solution developed based on the principles of the well-architected framework will run efficiently on
the cloud, meeting the requirements and delivering business value to the enterprise.
When enterprises consider the cloud as a target for their applications, they need to plan for provisioning,
management, and configuration of thousands (if not hundreds of thousands) of resources related to
compute, storage, network, and data. Any attempt to perform such tasks at scale manually is prone to
error and defeats the benefits of the cloud. Thus, there is a need to automate cloud resource provisioning,
management, and configuration.
To this end, cloud platforms provide access to the resources through APIs that may be programmatically
invoked through scripts. Tools such as Chef, Puppet, Ansible, and Terraform with scripting capabilities
used for automation in data centers are supported by cloud platforms to provision resources in the
cloud environment in an automated manner[12]. Scripts specify infrastructure as code (IaC) both for
provisioning and configuration. A landing zone is an environment provisioned and configured on the
cloud with compute, storage, network, and data-related resources, into which applications or workloads
are deployed. It brings efficiency and ease of management to cloud deployments.
Cloud management teams specify landing zones to define environments for development (DEV), testing
(TEST), user acceptance (UAT), and production (PROD) to deploy applications.
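Landing zones for the DEV/TEST/UAT/PROD environments might be expressed declaratively, in the spirit of infrastructure as code. The resource quotas and flags below are hypothetical; a real IaC tool such as Terraform would act on such a specification via the cloud provider's APIs.

```python
# Sketch of landing zones defined as data. Environment names follow the
# DEV/TEST/UAT/PROD convention in the text; all quotas are invented.

LANDING_ZONES = {
    "DEV":  {"vcpus": 16,  "storage_tb": 1,  "monitoring": True, "public_ingress": False},
    "TEST": {"vcpus": 32,  "storage_tb": 2,  "monitoring": True, "public_ingress": False},
    "UAT":  {"vcpus": 64,  "storage_tb": 5,  "monitoring": True, "public_ingress": True},
    "PROD": {"vcpus": 256, "storage_tb": 20, "monitoring": True, "public_ingress": True},
}

def provision(environment: str) -> dict:
    """Return the resource specification a provisioning tool would act on."""
    spec = LANDING_ZONES[environment]
    # A real IaC tool would now create these resources through cloud APIs;
    # here we simply return the declarative specification.
    return {"environment": environment, **spec}
```

Because the whole environment is data plus scripts, the same definition can be spun up or torn down repeatedly, which is what makes the landing zone an agile technique.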
The landing zone is an agile technique for infrastructure[13]. A given environment with thousands of
resources pre-defined as infrastructure as code (IaC) scripts may be spun up and down in minutes based
on business demand. Such provisioning may be triggered by developers through automation, with no
manual intervention by operations teams.
A landing zone is, thus, a building block for the cloud and is an integral part of the cloud adoption
framework of all major cloud providers. An enterprise typically establishes a cloud center of excellence
(CCOE) or a cloud management team that would define the landing zones to be provisioned in the cloud
with necessary monitoring and security controls to enable cloud adoption[14].
In other words, a landing zone enables a “software-defined data center” for a given enterprise in the cloud.
Two important techniques support agile in applications, one is DevOps, and the other is containerization.
DevOps involves the setup of CI/CD pipelines that foster automation of build, test, and deploy activities.
Any change can be moved into production in a few hours, sometimes even minutes. Tools like Jenkins,
Bamboo, Chef, Puppet, and Ansible support DevOps. Containerization packages the functionality of an
application into independent, deployable units, scales them, and optimizes the use of the underlying
environment for efficiency[15]. The container orchestration platforms bring in greater management and
resiliency. They enable automation for scale and efficiency. Docker and Kubernetes, the latter available
in various distributions (e.g., upstream Kubernetes on GitHub, Red Hat OpenShift, and VMware Tanzu),
support agile processes.
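The CI/CD pipeline described above can be sketched as an ordered list of build, test, and deploy stages that stops at the first failure. The stage names and artifact shape are illustrative, not any specific tool's API.

```python
# Minimal sketch of a CI/CD pipeline runner: each stage receives the build
# artifact and returns success or failure; the run halts on first failure.

def run_pipeline(stages, artifact):
    """Run each (name, stage) pair in order; stop at the first failure."""
    log = []
    for name, stage in stages:
        ok = stage(artifact)
        log.append((name, "ok" if ok else "failed"))
        if not ok:
            break
    return log

stages = [
    ("build",  lambda a: True),                 # compile/package the application
    ("test",   lambda a: a.get("tests_pass")),  # run automated tests
    ("deploy", lambda a: True),                 # roll out to the target landing zone
]
```

A failing test stage short-circuits the run, so a broken change never reaches the deploy stage; this fail-fast behavior is what lets changes move to production in hours or minutes safely.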
An agile approach for cloud is shown pictorially in Figure 2.2.
[Figure 2.2: Agile approach for cloud: on the business/IT side, application development produces builds and containers through CI; on the cloud side, CI/CD deploys them to landing zones (e.g., Landing Zone A for DEV, Landing Zone B for TEST).]
As indicated in the earlier section, a CCOE, or the cloud management team in an enterprise, establishes
cloud governance. They define landing zones, for which the business or IT development teams develop
infrastructure as code (IaC) scripts for provisioning, configuration, and management.
The agile application development process involves the use of CI tools and techniques by application teams
to create application builds and containers. These builds and containers are then deployed, using CI/CD
tools and techniques, to the DEV, TEST, UAT, and PROD environments. To do so, the application
teams use infrastructure as code (IaC) scripts to perform infrastructure provisioning, configuration, and
management. Infrastructure changes are handled through executing scripts that implement required
changes. Thus, changes to functionality result in changes to code that activate CI/CD pipelines to create
and deploy builds. In a well-integrated agile process, application teams trigger infrastructure as code (IaC)
scripts to provision, configure, and manage infrastructure resources.
In this book, the term “data center” is being used to refer to a traditional on-premises data center managed
by IT or their third-party providers. The term “cloud” is used to refer to public cloud platforms.
The architecture documents required for infrastructure solutions are discussed in the next chapter.
References
[1] NetApp, "What Is Converged Infrastructure (CI)?", https://www.netapp.com/data-storage/flexpod/what-is-converged-infrastructure/.
[2] Cisco, "FlashStack with Cisco UCS X-Series and Cisco Intersight", https://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-x-series-modular-system/flashstack-with-ucs-x-and-intersight.html.
[3] DELL, "VxRack System FLEX", https://www.dell.com/en-us/work/shop/povw/vmware-vxrack.
[4] A. Miller, "Converged vs Hyperconverged Infrastructure: The Differences Between CI & HCI", https://www.bmc.com/blogs/converged-infrastructure-vs-hyper-converged-infrastructure/#.
[5] Mainstream Technologies, "Systems of Record vs Systems of Engagement", https://www.mainstream-tech.com/systems-of-record-vs-systems-of-engagement/.
[6] S. Vennam, IBM, "Hybrid Cloud", https://www.ibm.com/in-en/cloud/learn/hybrid-cloud.
[7] AWS, "AWS Cloud Adoption Framework (AWS CAF)", https://aws.amazon.com/professional-services/CAF/.
[8] Microsoft, "Microsoft Cloud Adoption Framework for Azure", https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/.
[9] Google Cloud, "The Google Cloud Adoption Framework", https://services.google.com/fh/files/misc/google_cloud_adoption_framework_whitepaper.pdf.
[10] AWS, "AWS Prescriptive Guidance - Overview", https://docs.aws.amazon.com/prescriptive-guidance/latest/migration-retiring-applications/overview.html.
[11] K. Stalcup, "The Azure Well-Architected Review is Worth Your Time", https://www.parkmycloud.com/blog/azure-well-architected/.
[12] S. Strut, IBM, "Infrastructure as Code: Chef, Ansible, Puppet, or Terraform?", https://www.ibm.com/cloud/blog/chef-ansible-puppet-terraform.
[13] E. Rifkin, Microsoft, "Creating cloud ready environments with Azure landing zones", https://azure.microsoft.com/en-in/blog/creating-cloud-ready-environments-with-azure-landing-zones/.
[14] D. Ramachandani, "Building a Cloud Centre of Excellence in 2020: 13 Pitfalls and Practical Steps", https://www.contino.io/insights/cloud-centre-of-excellence-2020.
[15] S. Kambhampaty, "Why Your IT Strategy Should Extend The Value Of Cloud With Containerization", https://www.forbes.com/sites/forbestechcouncil/2021/07/16/why-your-it-strategy-should-extend-the-value-of-cloud-with-containerization/.
[16] S. Kambhampaty, "Choosing The Right Cloud Strategy For Your Enterprise", https://www.forbes.com/sites/forbestechcouncil/2021/12/15/choosing-the-right-cloud-strategy-for-your-enterprise/?sh=6b55ad3851b2.
Chapter 3
Thus, it is common to have three separate documents to cover the guidance provided by TOGAF and
industry best practices.
The approach that is adopted is to start with business capabilities and map them to the infrastructure
capabilities required to deliver them. Infrastructure capabilities identified may be implemented to fulfill
the needs of the business. In general, for most enterprises, compute, network, storage, backup/restore,
disaster recovery, monitoring, and security emerge as core infrastructure capabilities.
Several templates are available from different sources for preparing CTA. These may be suitably modified
based on the organization's requirements[5]. The following is an indicative list of sections that could be
part of the CTA formulated by enterprise or infrastructure architects.
2. Objectives
3. Overview
4. Scope
5. IT Strategy
6. Design Principles
8. Infrastructure Capabilities
a. Compute
b. Network
c. Storage
d. Backup/Restore
e. Disaster Recovery
f. Monitoring
g. Security
h. Others
9. Service Management
a. Service Catalog
b. Service Monitoring
a. Architecture Documents
b. Architecting Process
Templates are available in the public domain for preparing LTA[6]. Considering the guidance provided
by TOGAF 9.2 on what should be part of the architecture deliverable document, the following is an
indicative list of sections that should be part of the LTA formulated by the infrastructure architect[7].
• Solution Context
2. Requirements
• Business Requirements
• Technical Requirements
• Application Requirements
• SLA Requirements
3. Requirements Traceability Matrix
5. Scope
• In Scope
• Out of Scope
6. Solution Detail – Logical Architecture
• Architecture Decisions
• Target Architecture
– Compute
– Storage
– Network
9. Support Services
• Network
• Monitoring
• Backup/Restore
• Disaster Recovery
• Security
Key points related to the indicative list of sections for LTA are as follows:
1) Solution Overview: The solution overview section provides the overview of the solution presented
in the rest of the LTA document, in a page or two. It also provides a context in which this solution
operates, i.e., what other solutions are related.
2) Requirements: When business units want infrastructure solutions, they tend to request X number
of servers. Specifying the number of servers is not a requirement but a solution. It is the role of
the infrastructure architect to understand the need and specify the infrastructure components for
the solution.
3) Design principles: At the enterprise level, or sometimes at the individual business unit level, specific
design principles are identified (e.g., improve security). The LTA must indicate how the design
principles are adhered to.
4) Solution detail: The solution detail section describes the target state architecture and describes
the Architecture Building Blocks (ABB) and Solution Building Blocks (SBB) of the architecture. It is
recommended that ABBs and SBBs defined in the solution be consistent with TOGAF definitions.
For instance, Data Center, Rack, Switch, Storage are high-level ABBs while Servers, ESXi Cluster are
next-level ABBs. The VM that hosts the application is the SBB[8].
5) Transition and Target Operating Model: The architect should carefully consider the transition from
a baseline (AS-IS) to the target architecture and describe the key points to be kept in mind. The
architect should also describe the operating model for the target state.
6) Support services: The run-time environment will require certain support services that include
solutions for network, monitoring, backup/restore, disaster recovery, security, and so on. The
architect should assess the need for these solutions and factor them into the overall solution
suitably.
Templates are also available in the public domain for preparing PTA[6]. Considering the guidance provided
by TOGAF 9.2 on what should be part of the architecture deliverable document, the following is an
indicative list of sections that should be part of the PTA formulated by the infrastructure architect[7]:
• Solution Context
2. Requirements
• Business Requirements
• Technical Requirements
• Application Requirements
• SLA Requirements
3. Scope
• In Scope
• Out of Scope
• Component List
• Component Specification
6. Solution – Configuration
• Implementation Detail
• Physical Layout
• Network
• Monitoring
• Backup/Restore
• Disaster Recovery
• Security
8. System Recovery
Key points related to the indicative list of sections for PTA are as follows:
1. Solution Overview, Requirements, and Scope: For continuity and context, it is advisable to have
these sections similar to those in the LTA with suitable changes, where necessary. These sections
may also provide a reference to the LTA and other documents.
2. Solution detail: The physical architecture of the solution is described in this section. The baseline
architecture (AS-IS), if one exists, and target physical architecture are described in this section.
3. Solution – Hardware & Software: An important activity performed by the infrastructure architect
is to make a complete list of the infrastructure components (including software) and to work
with vendors and the project teams to determine the right fit at optimal cost. The finalized list of
components, specified in this section with part numbers, constitutes the bill of materials. Organizational
processes to order the components that need to be procured must be initiated after due approvals.
4. Solution – Setup & Configure: This important section provides the information needed for
implementation teams to stand up the infrastructure and configure it right from an architecture
perspective. It may be noted that the PTA is an architecture document and is not meant to cover
all the specifics of configuration and implementation. Such detail is maintained in implementation
manuals or run books of the different infrastructure components.
5. Support solutions: As indicated earlier, the run-time environment will require certain support services
for network, monitoring, backup/restore, disaster recovery, security, and so on. The specifics at the
physical level will need to be provided by the architect to leverage the support solutions.
The LTA and PTA represent the architecture at the logical and physical levels. To formulate these
architecture documents, the architect must know the different infrastructure capabilities, components
available from vendors, and their architectural considerations. These are described at length over the
rest of the book.
References
[1] TOGAF 9.2, "Introduction to Part II - ADM Overview", https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap04.html.
[2] Essential Project Documentation, "Technology Modelling", https://enterprise-architecture.org/docs/technology_architecture/technology_architecture_modelling_overview/.
[3] TOGAF 8.1.1, "Developing Architecture Views", https://pubs.opengroup.org/architecture/togaf8-doc/arch/chap31.html.
[4] P. Robinson, "The Tao of Technology Architecture – Part 1", https://www.ferroquesystems.com/the-tao-of-technology-architecture-part-1/.
[5] M. A. Ogush et al., "HP Architecture Template, description with examples", https://www.cs.helsinki.fi/group/os3/HP_arch_template_vers13_withexamples.pdf.
[6] C. Michaud, "Templates Repository for Software Development Process", https://blog.cm-dm.com/pages/Software-Development-Process-templates.
[7] TOGAF 9.2, "11. Phase D: Technology Architecture", https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap11.html#tag_11_03_08.
[8] TOGAF 9.2, "33. Building Blocks", https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap33.html.
The role of an infrastructure architect is to formulate infrastructure solutions to meet the above needs.
To this end, the infrastructure architect should follow a structured process to architect an infrastructure
solution. As discussed in the previous chapter, three documents define the infrastructure solutions and
need to be prepared by architects. The architecting process involves the preparation of these documents,
which is described in the following two sub-sections.
Design Principles: “Design Principles are a set of considerations that form the basis of any good
product”[3]. They are rules that guide architects and designers when making decisions and trade-offs
on architecture and design. When two or more architecture/design options are equally good, design
principles help architects and designers choose the one that best fits the principles defined.
For example, most enterprises have design principles related to improved security and cost optimization[4].
When the architect identifies two or more solution options that meet the requirements equally, the
architect decides on the option using the design principles - more secure, costs less, or both.
Technology standards & guidelines: “A standard is a document that provides requirements, specifications,
guidelines or characteristics that can be used consistently to ensure that materials, products, processes
and services are fit for their purpose”[5]. Organizations prepare and maintain a list of technology standards
to control technologies for solutions keeping in mind long-term benefits, licensing costs, and support
skillsets. Use of technology not in the list of approved standards typically requires exception approvals
from all stakeholders and sponsors.
[Figure 4.1: Process to develop the Conceptual Technology Architecture: technology drivers (IT strategy, design principles, technology standards & guidelines) inform the identification of infrastructure capabilities (compute, network, storage, backup/restore, disaster recovery, monitoring, security), from which the CTA is developed.]
As indicated in the previous chapter, the approach adopted is to start with business capabilities and map
them to the infrastructure capabilities required to deliver them. A value stream analysis of the business
capabilities may be conducted to establish the processes/activities required to deliver their value (a value
stream is the set of activities required to deliver goods or services). Infrastructure capabilities may
then be mapped to realize the business capabilities of the enterprise[7]. Such an approach results in a core
set of infrastructure capabilities that a data center or business unit needs to support. In general, for most
enterprises, compute, network, storage, backup/restore, disaster recovery, monitoring, and security emerge
as a core set of capabilities. Each of these capabilities has been described in subsequent chapters.
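The mapping from business capabilities, via value-stream activities, to infrastructure capabilities can be sketched as follows. The capability and activity names are invented for illustration; a real analysis would derive them from the enterprise's own value streams.

```python
# Illustrative sketch: a business capability decomposes into value-stream
# activities, and each activity maps to the infrastructure capabilities
# (from the core set named in the text) needed to realize it.

VALUE_STREAM = {
    "online_payments": ["process_transaction", "store_records", "detect_fraud"],
}

ACTIVITY_TO_INFRA = {
    "process_transaction": {"compute", "network"},
    "store_records":       {"storage", "backup_restore"},
    "detect_fraud":        {"compute", "monitoring", "security"},
}

def infrastructure_capabilities(business_capability: str) -> set[str]:
    """Union of infrastructure capabilities across the value-stream activities."""
    needed = set()
    for activity in VALUE_STREAM[business_capability]:
        needed |= ACTIVITY_TO_INFRA[activity]
    return needed
```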
Applications deliver the business capabilities required by organizations. Hence, an important aspect of
ensuring that infrastructure capabilities support business capabilities is defining the characteristics of
applications deployed on the infrastructure.
The CTA specifies key infrastructure capabilities required for all infrastructure solutions. It also specifies
the service characteristics related to the infrastructure capabilities.
Indicative service characteristics are provided for four key infrastructure capabilities – compute, storage,
backup/restore, and disaster recovery.
1. Compute
Compute is the infrastructure capability that supports deployment of applications. Tiers are defined for
different compute service characteristics. Typically, three tiers are defined – HIGH, MEDIUM, and LOW.
An indicative table with Compute service tiers and their characteristics is given in Table 4-2[10].
The Compute service tiering, in general, aligns with the tiers of application criticality.
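The alignment of compute service tiers with application criticality might be sketched as follows. The availability targets, redundancy, and criticality labels are illustrative only, not the contents of Table 4-2.

```python
# Illustrative sketch of HIGH/MEDIUM/LOW compute service tiers and their
# alignment with application criticality; all values are invented.

COMPUTE_TIERS = {
    "HIGH":   {"availability": "99.99%", "redundancy": "N+1", "dr": True},
    "MEDIUM": {"availability": "99.9%",  "redundancy": "N",   "dr": True},
    "LOW":    {"availability": "99%",    "redundancy": "N",   "dr": False},
}

def tier_for_criticality(criticality: str) -> str:
    """Compute tiering generally aligns with application criticality tiers."""
    return {"mission_critical":  "HIGH",
            "business_critical": "MEDIUM",
            "non_critical":      "LOW"}[criticality]
```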
3. Backup/Restore
Table 4-4 gives indicative backup/restore service characteristics for the mentioned criteria for each
Compute service tier[12].
[Figure 4.2: Process to develop logical and physical technology architecture: from the user/customer's infrastructure requirements, solution options, assumptions, risks, and constraints, the architect prepares the Logical Technology Architecture and then the Physical Technology Architecture, which the implementation team implements as the solution.]
The process to develop logical and physical technology architecture is shown in Figure 4.2. It involves
three steps – capture infrastructure requirements, prepare logical technology architecture (LTA) and
prepare physical technology architecture (PTA).
2. Understand the infrastructure requirements of applications and other related components that need
to be deployed in discussion with key stakeholders.
3. Discuss multiple solution options and constraints with the stakeholders and arrive at an agreed
infrastructure solution that is fit for purpose.
1. Objectives of Project
2. Business Requirements
4. Scope
5. System Context
a. Interfaces
6. Performance Requirements
a. Concurrent users
b. Response time
7. Scalability Requirements
a. Peak volume/load
8. Sizing Considerations
11. DR Requirements
a. Network
b. Storage
c. Backup/Restore
d. Monitoring
e. Security
Logical Technology Architecture (LTA): For the chosen solution option, the infrastructure architect
clearly identifies the architecture building blocks (ABB) and solution building blocks (SBB) as described
by the TOGAF framework and formulates the LTA[14]. The architect also has reviews conducted and
refinements made, and ensures agreement with all stakeholders, including the implementation teams.
A list of all software and associated licenses is also made to ensure compliance.
Physical Technology Architecture (PTA): The finalization of LTA is the starting point for the development
of PTA. The focus of PTA is to define a physical technology architecture with specific hardware and
software, their setup, and configuration. For cloud (both private and public), in most cases there is no
need to develop a PTA beyond a list of the software used, as no hardware needs to be stood up.
The PTA is reviewed by all stakeholders, especially the implementation team. The
implementation team then implements the solution and delivers it to the stakeholder.
The rest of the chapters in the book describe the architectural aspects of the seven infrastructure
capabilities – compute, network, storage, backup and restore, disaster recovery, monitoring, and security.
References
[1] R. Lebeaux, "IT strategy (information technology strategy)", https://searchcio.techtarget.com/definition/IT-strategy-information-technology-strategy.
[2] CIO Wiki, "IT Strategy (Information Technology Strategy)", https://cio-wiki.org/wiki/IT_Strategy_(Information_Technology_Strategy).
[3] B. Brignell, "Design Principles - An open source collection of Design Principles and methods", https://principles.design/.
[4] AWS, "AWS Well-Architected Framework", https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html.
[5] ISO, "Standards", https://www.iso.org/standards.html.
[6] Essential Project Documentation, "Technology Modelling - Defining Technology Capabilities", https://enterprise-architecture.org/docs/technology_architecture/define_technology_capability/.
[7] CIO Wiki, "IT Capability", https://cio-wiki.org/wiki/IT_Capability.
[8] J. Ferraro, "Three Top Tips for Successful Business Continuity Planning", https://esj.com/Articles/2009/06/09/Business-Continuity.aspx?Page=1.
[9] Gartner Glossary, "Service-Level Agreement (SLA)", https://www.gartner.com/en/information-technology/glossary/sla-service-level-agreement.
[10] S. Samy, "Service Criticality Tiers Standard and Architecture", https://www.linkedin.com/pulse/service-criticality-tiers-standard-architecture-sherif-samy/.
[11] R. Sheldon, "tiered storage", https://searchstorage.techtarget.com/definition/tiered-storage.
[12] IBM, "Backup and Restore: An Essential Guide", https://www.ibm.com/cloud/learn/backup-and-restore.
[13] E. Sullivan, "disaster recovery (DR)", https://searchdisasterrecovery.techtarget.com/definition/disaster-recovery.
[14] TOGAF 9.2, "33. Building Blocks", https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap33.html.
[Figure 5.1: Data Centers with representative infrastructure components. Client devices (customers over the Internet, employees over MPLS) connect to primary and secondary data centers, each containing web servers, web application firewalls, network firewalls (Chapter 6), monitoring (Chapter 10), security (Chapter 11), a mainframe, and object/file/block storage (Chapter 7), with DR replication between the two sites (Chapter 9). The compute components, the focus of this chapter, are highlighted.]
The infrastructure components on which applications are deployed represent the compute capability of
the infrastructure. In Figure 5.1, showing the data center deployments, the representative components
related to the focus of this chapter, namely compute capability, are highlighted (in a box with a dashed
outline). These are essentially server components.
Chapter 5: Compute 33
There are three key platforms in a data center on which applications are deployed – the mainframe, mid-range systems, and x86 servers.
The earlier versions of the IBM Mainframe (System/360, eServer zSeries, z9 and z10, and the zEnterprise System including z196 and zEC12) have evolved into the z13, z14, and z15, which support z/OS[2]. IBM has also launched LinuxONE, to date the only Linux-only mainframe, which may be used as a private cloud solution or hosted in IBM data centers.
[Figure 5.2: z/OS and Linux guests running on virtual processors under z/VM, a Type 1 hypervisor]
The operating system perspective for IBM Mainframe Z is shown in Figure 5.2[18].
1. The Processor Resource/Systems Manager (PR/SM) resides on the IBM Z hardware and performs logical partitioning.
6. z/OS and Linux can also be installed directly within an LPAR created by PR/SM.
Applications may be written using Fortran, COBOL, JCL, SQL, Assembler, CLIST, REXX, PL/I, C, and C++
(and languages supported by POSIX) and deployed on z/OS. The applications written with languages
supported by Linux may be deployed on the virtual processor (executed by CP), IFL, or on IBM LinuxONE.
The mainframe has a central processor (CP) and specialty engines. Workloads that run on the CP are charged software license costs, while those running on specialty engines are not. The monthly license charge (MLC) is based on usage of the central processor, measured in millions of service units (MSU) per hour over the month for each LPAR (or capacity group), considering peak usage and rolling-average computations.
• zIIP – z Integrated Information Processor; a co-processor that takes over eligible instructions from an executing workload and executes them. Eligible database workloads can be run on this processor; it cannot be used for batch or CICS programs. No MLC charges.
• zAAP – runs eligible Java workloads. Discontinued from System z13 onward.
• IFL – Integrated Facility for Linux; supports only Linux instructions. No MLC charges.
• ICF – Internal Coupling Facility; a special LPAR with the requisite synchronizing software that handles functions to share, cache, update, and balance data access across multiple processors.
• SAP – System Assist Processor; coordinates the I/O subsystems. Multiple SAP engines may be configured in a mainframe system.
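The MLC computation described above can be sketched in code. This is a simplified illustration assuming the commonly used rolling four-hour average (R4HA) of MSU consumption; the function name and the hourly samples are hypothetical, and real sub-capacity pricing involves considerably more detail.

```python
# Sketch: peak rolling four-hour average (R4HA) of MSU consumption for an LPAR.
# The hourly MSU samples below are hypothetical; the peak rolling average
# observed during the month is what drives the MLC charge.

def peak_r4ha(hourly_msu, window=4):
    """Return the highest rolling average over `window` consecutive hourly samples."""
    if len(hourly_msu) < window:
        raise ValueError("need at least one full window of samples")
    return max(
        sum(hourly_msu[i:i + window]) / window
        for i in range(len(hourly_msu) - window + 1)
    )

hourly_msu = [120, 150, 400, 420, 380, 200, 100, 90]  # one day's samples (hypothetical)
print(peak_r4ha(hourly_msu))  # the peak 4-hour window, 400..200, averages 350.0
```

Offloading eligible work to specialty engines such as zIIP or IFL lowers the CP's MSU consumption and therefore the peak R4HA on which the charge is based.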
5.2 Mid-range running AIX or IBM i
The mid-range IBM systems are best suited for compute- and data-intensive workloads such as those in defense or financial services.
The IBM System p with the AIX operating system and the IBM System i with the IBM i operating system have evolved into IBM Power Systems (based only on POWER chips), which supports both AIX and IBM i[4].
[Figure: evolution of System p (eServer pSeries, eServer p5, System p5, System p – mostly AIX-based) and System i (AS/400, eServer iSeries, eServer i5, System i5, System i – OS/400/IBM i-based) into IBM Power Systems, with AIX and IBM i running over the POWER Hypervisor]
The IBM i is bundled with its database (DB2 for i), which hence is not installed or charged separately[5]. It is a turnkey system where most basic configurations come bundled with the operating system. IBM i and OS/400 are 'object-based' operating systems, unlike UNIX, Linux, and Windows, which are 'file-based' operating systems.
The operating system perspective for IBM Power Systems is shown in Figure 5.3[19].
1. On IBM Power Systems, the POWER Hypervisor resides on the hardware.
2. Logical partitions (LPARs) are defined using the POWER Hypervisor.
3. The operating system, AIX or IBM i, is set up in the virtual processor.
Applications written in C, C++, Java, Python, IBM COBOL for AIX, Fortran, Perl, PHP, REXX, SQL, and so on may be deployed on logical partitions (LPARs) for AIX on IBM Power Systems. Likewise, LPARs may also be defined on IBM Power Systems for IBM i for deploying applications written using IBM i Control Language, RPG, COBOL/400, COBOL, C, Java, CLP, PHP, Node.js, C++, Ruby, Orion, Python, Fortran, and so on.
1. Physical Server: A physical computer dedicated to a single tenant, i.e., with dedicated compute, memory, and disk storage for the user of the system. It is also called a "bare-metal server" and is relatively more expensive since all the server resources are dedicated to one tenant. This type of server is used when –
b. The application server requires specific configurations that are not well supported by virtual environments.
2. Virtual Server (or virtual machine): A "compute resource that uses software instead of a physical computer to run programs and deploy apps"[9]. It is the server on which an operating system (Linux or Windows) is installed before deploying the applications.
5.3.1 Virtualization
Virtualization is the process of dividing a physical machine into multiple unique and isolated units called
virtual machines (VM) using virtualization software[10]. A virtualization software (for instance, VMware
vSphere) is installed on the physical machine(s) that enables virtual machines’ creation.
5.3.2 Hypervisors
A component of virtualization software called hypervisor makes it possible to create virtual machines
on the same physical machine[11]. There are two types of hypervisors:
1. Type 1 – Bare Metal Hypervisor: It runs directly on the physical machine and acts as a lightweight
operating system. ESXi Hypervisor from VMware and Hyper-V Hypervisor from Microsoft are
examples of this type of hypervisor.
When a server is virtualized with a bare-metal hypervisor such as ESXi (or Hyper-V), it is called a host (or a node). These hosts can be configured as a cluster (e.g., a VMware ESXi cluster) so that VMs can be moved around in case of host failures (high availability) or to manage the changing load on the cluster (load balancing). The hosts configured as a cluster share resources such as processor, memory, storage, and network.
Note 1: vSphere HA is a feature that enables high availability by restarting the failed VMs on other
ESXi hosts that have spare capacities. Likewise, the vSphere DRS feature enables load balancing by
treating the resources of all ESXi hosts as a global pool and automatically migrates VMs to different
ESXi hosts. A cluster has shared storage for all its ESXi hosts that maintain virtual machine disk
(VMDK) files accessible to all the VMs in hosts within the cluster.
Note 2: The storage used by a VM is stored as a file with a .vmdk extension. The format of the file is virtual machine disk (VMDK).
2. Type 2 – Hosted Hypervisor: It runs as a software layer on an operating system. Examples are
Oracle VM VirtualBox and Microsoft Virtual PC.
1. Rack servers: These fit into server racks and are suited for intensive computing operations. They are self-sufficient servers with their own hardware, including memory, RAID controller, data drives, power supply, and cooling unit.
2. Blade servers: These fit into a server chassis, which provides shared space, power, and optimized cabling. Blade servers take up less space and consume moderate power while providing high processing power. They are also hot-swappable, a feature that improves serviceability.
Several important concepts must be considered when specifying an x86 server for the solution.
1. Processor:
a. A processor (CPU) is a physical component that provides central processing unit capability to
a server. There may be more than one processor in a server.
b. A core is an operation unit within the processor. A single processor may have multiple physical cores.
d. Many processors use a hyperthreaded model. A thread is a unit of execution in a process that is executed by a core (in a processor)[14]. Multithreading enables a core to execute several threads, each running a task of a process, concurrently. Hyperthreading runs processes in parallel by presenting a single physical processor core to the operating system as two "logical" cores. The operating system schedules processes on the two "logical" cores as it does on two physical cores in a multi-processor system[15]. With hyperthreading, the total number of cores effectively doubles. Figure 5.4 provides the physical and operating system perspective.
e. Virtual processors (vCPUs) are assigned to a VM by the virtualization software. Each vCPU represents a portion of the physical processor capacity allocated to the VM.
2. Memory: In addition to the amount of memory in the server, it is important to analyze the type of RAM
(SRAM, DRAM) in the specific model of the server and the amount of Cache memory (L1/L2/L3).
Illustration: Consider a server with one socket and a processor with 4 cores.
[Figure 5.4: four physical cores presented to the operating system as logical cores, with vCPUs allocated to VMs from the logical-core pool]
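The illustration above can be made concrete with a small sketch. The helper names, the 2x hyperthreading factor, and the sample VM sizes are illustrative assumptions, not vendor guidance.

```python
# Sketch: counting a host's logical cores and a vCPU:pCPU overcommit ratio,
# a common planning figure when sizing virtualized x86 servers.

def logical_cores(sockets, cores_per_socket, hyperthreading=True):
    """Physical cores, doubled when hyperthreading exposes two logical cores each."""
    threads_per_core = 2 if hyperthreading else 1
    return sockets * cores_per_socket * threads_per_core

def vcpu_to_pcpu_ratio(vm_vcpus, sockets, cores_per_socket, hyperthreading=True):
    """Total vCPUs allocated across VMs divided by the host's logical cores."""
    return sum(vm_vcpus) / logical_cores(sockets, cores_per_socket, hyperthreading)

# One socket, 4 physical cores -> 8 logical cores with hyperthreading.
print(logical_cores(sockets=1, cores_per_socket=4))       # 8
# Five VMs with 4 vCPUs each -> 20 vCPUs on 8 logical cores = 2.5:1.
print(vcpu_to_pcpu_ratio([4, 4, 4, 4, 4], 1, 4))          # 2.5
```

An acceptable overcommit ratio depends on the workload mix; compute-intensive VMs tolerate far less overcommitment than lightly loaded ones.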
4. Server expandability: The type and number of expansion slots, ports, dedicated server storage, and other components determine a server's expandability. For instance, a server might offer only a single PCIe 2.0 slot, or provide two PCIe slots, one PCIe 2.0 and another PCIe 3.0. More slots enable greater scalability at peak loads.
5. Security-related cryptography features: Some servers have secure data encryption built-in as a
hardware feature using a crypto co-processor to carry out cryptographic operations.
6. Processor clock speed: The faster the clock speed, the more instructions can be executed per second, and the more quickly applications run. However, increasing the number of cores on a processor might require slower clock speeds, allowing more applications to run simultaneously with each running a little more slowly. Thus, a balance must be struck based on the type of workload (compute-intensive vs. I/O-intensive).
The above concepts need to be kept in mind when arriving at the server configuration.
Intel and AMD processors for x86 servers have traditionally been based on the CISC architecture.
The Intel Xeon processor is currently widely used in server models of several vendors.
5.4 Compute Characteristics
Tiers are defined for different service characteristics. Typically, there are three tiers – HIGH, MEDIUM,
and LOW. An indicative table with Compute service tiers is given in Table 5-2.
The Compute service tiering, in general, aligns with the tiers of application criticality described in
Table 4-1[17].
References
[1]
IBM, “Mainframe solutions”, https://www.ibm.com/it-infrastructure/mainframes.
[2]
IBM, “Mainframe - Family tree and chronology”, https://www.ibm.com/ibm/history/exhibits/mainframe/mainframe_FT1.html.
[3]
H. Rama, “MAINFRAME SPECIALTY ENGINES”, https://www.cmg.org/2017/07/mainframe-specialty-engines/.
[4]
IBM, “A Brief History of the IBM AS/400 and iSeries”, https://www.ibm.com/ibm/history/documents/index.html.
[5]
IBM, “DB2 for i Frequently Asked Questions”, https://www.ibm.com/downloads/cas/1DAL4A8G.
[6]
IBM, “AIX & IBM i POWER on IBM Cloud”, https://cloud.ibm.com/catalog/services/power-systems-virtual-server.
[7]
Skytap, “Using Power VMs in Skytap”, https://help.skytap.com/kb-using-power-vms.html.
[8]
B. Lee, “Physical server vs Virtual machine: The Choice is open”,
https://www.vembu.com/blog/physical-server-vs-virtual-machine-choice-open/.
[9]
VMware, “Virtual Machine”, https://www.vmware.com/topics/glossary/content/virtual-machine.
[10]
VMware, “Server Virtualization”, https://www.vmware.com/topics/glossary/content/server-virtualization.
[11]
VMware, “Hypervisor”, https://www.vmware.com/topics/glossary/content/hypervisor.
[12]
Serverstack, “Difference between Rack servers and Blade Servers”,
https://www.serverstack.in/2019/01/19/difference-between-rack-servers-and-blade-servers/.
[13]
U. Panda, “How to decide VMware vCPU to physical CPU ratio”,
https://www.cloudpanda.org/blogs/how-to-decide-vmware-vcpu-to-physical-cpu-ratio.
[14]
R. Bauer, BackBlaze, https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/.
[15]
Wikipedia, “Hyper-Threading”, https://en.wikipedia.org/wiki/Hyper-threading.
[16]
IntelliPaat, “AWS vs Azure vs Google Cloud: Choosing the Right Cloud Platform”, https://intellipaat.com/blog/aws-vs-azure-vs-google-cloud/.
[17]
S. Samy, “Service Criticality Tiers Standard and Architecture”,
https://www.linkedin.com/pulse/service-criticality-tiers-standard-architecture-sherif-samy/.
[18]
IBM, “ZOS Mainframe concepts”, https://www.ibm.com/docs/en/zos-basic-skills?topic=concepts-mainframe-hardware-evolving-design.
[19]
IBM, “Introduction to IBM Power Virtualization Management e-Learning (text only)”,
https://www.ibm.com/docs/en/power-sys-solutions/0008-ESS?topic=P8ESS/p8eew/elearning/powervm_script.html.
Chapter 6
Network
Figure 6.1: Data Centers with representative infrastructure components – Focus of this chapter
The network capability is foundational to the data center. In Figure 6.1, showing the data center deployments, the representative components related to the focus of this chapter, namely network capability, are highlighted (in a box with a dashed outline). All other infrastructure components sit on the network and interact over it. The network architecture concepts are equally important in the cloud, except that there is no need to stand up any infrastructure for them; they may all be configured using the services provided by cloud providers. Other than organizations whose products/services run only on the public cloud, all others need to deal with network components in both the data center and the cloud, and with their connectivity solutions. This chapter presents the key network architecture concepts.
Chapter 6: Network 43
Note: It may be noted in Figure 6.1 that the network cables run across the data centers and are connected to all infrastructure components. While they contribute to network capability, they are not shown highlighted with a dashed outline to avoid cluttering the figure with unnecessary detail.
Key Components of Data Center Network
• LAN & WAN
• VLAN
• Subnetwork
• DMZ
• Firewall
• Switch
• Router
• Load balancer
• Forward Proxy/Reverse Proxy
• NAT
6.1 Network Basics
This section touches on basic concepts of networking.
A data center network is a set of firewalls, routers, switches, and several other network components wired together through fiber-optic or copper cables. The network components use a set of protocols to communicate over these cables.
6.1.1 OSI Model • NAT
The communication protocols are based on the 7-Layer OSI model
summarized in Table 6-1[1].
Layer (No.)       Description                                                        Examples
Presentation (6)  Data formatting for presentation, with encryption and decryption.  HTTPS, SSL
Transport (4)     Data transfer between end systems and hosts.                       TCP, UDP
Data Link (2)     Node-to-node data frame transfer based on MAC address.             Switches
Physical (1)      Physical link (wired or wireless) for communication.               Hubs, NICs, cables
Layers 1, 2, and 3 are particularly important when defining infrastructure architecture for deploying new
network solutions.
6.1.2 LAN and WAN
A local area network (LAN) is a computer network that connects computers and other devices in a small
area. Local Area Networks (LANs) use Ethernet and provide high data transfer rate – Fast Ethernet 100
Mbps, or Gigabit Ethernet 1/10/40/100 Gbps.
A wide area network (WAN) is a network that provides connectivity across multiple regions. WANs are
of many types –
b) Multisite connected using MPLS (Multiprotocol Label Switching) – MPLS establishes a private connection linking data centers and branch offices by directing data along a path using labels.
c) Software-based SD-WAN – SD-WAN is a software-defined wide area network that allows multisite traffic to traverse MPLS or less costly internet links based on the criticality of the traffic. Encryption is used for traffic sent via internet links.
A LAN uses protocols such as Token Ring and FDDI (Fiber Distributed Data Interface). A VLAN uses the IEEE 802.1q and Inter-Switch Link (ISL) protocols[2]. The Ethernet LAN represents the collision domain on which Ethernet (CSMA/CD) frames collide. A VLAN represents the broadcast domain, i.e., a group of devices configured to receive broadcast traffic (Layer 2 data link frames) from one another. Without VLANs, a broadcast message sent from a host reaches all network devices, increasing CPU overhead on each device and reducing overall network security. With VLAN configuration, a broadcast from the host is limited to devices on the VLAN.
A VLAN can be created from one or multiple LANs. It enables the network administrator to limit access by grouping specified servers/desktops into isolated network segments. Two types of VLANs are commonly in use[3] –
1. Port-based (Untagged) VLANs: A single physical switch is simply split into multiple logical switches.
2. Tagged VLANs: Multiple VLANs use a single switch port. Tags are attached to the individual Ethernet frames as they exit the port. Tags contain the VLAN identifiers specifying the VLAN to which the frame belongs. When both switches understand tagged VLANs, the connection can be accomplished using a single cable connected to what is called a "trunk" port.
Illustration – Port-based (Untagged) VLAN
1. One Switch S1 – Figure 6.2
• All the servers have been connected to one physical switch. However, only the following servers can communicate with each other due to the configuration of the VLANs.
[Figure 6.2: servers P1, P2, P8, and P9 connected to ports 1–8 of switch S1, with the ports grouped into port-based VLANs]
2. Two Switches S1 and S2 – Figure 6.3
[Figure 6.3: servers P1, P2, P8, and P9 on switch S1 and servers Q1, Q2, Q8, and Q9 on switch S2, with the port-based VLANs spanning both switches over one inter-switch cable per VLAN]
• 4 Servers are connected to Switch S1, and the other 4 servers to Switch S2, as shown.
• Both VLANs are configured on each physical switch, and since it is a port-based VLAN configuration, one inter-switch cable per VLAN is required. Therefore, two cables are required to connect the switches for both VLANs.
• Only the following servers can communicate with each other due to the configuration of the VLANs.
Illustration – Tagged VLAN
[Figure 6.4: servers P1, P2, P8, and P9 on switch S1 and servers Q1, Q2, Q8, and Q9 on switch S2, with ports S1-8 and S2-8 connected as a "trunk" carrying traffic for both VLANs]
• Figure 6.4 shows 4 Servers connected to Switch S1 and the other 4 servers to Switch S2.
• One single physical connection is established between the two physical switches. In this
illustration, both ports S1-8 and S2-8 are configured as “trunk” ports and will carry traffic for
both VLANs.
• VLAN tags (IEEE 802.1q) are used for VLAN1 and VLAN2. Tags allow for separation of VLAN1
and VLAN2 traffic without the need for physical separation.
• VLAN tags are inserted as traffic exits a Switch S1 "trunk" port, so the next switch, S2, needs to understand 802.1q tags, because inserting a tag changes the Ethernet frame.
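The tag insertion described above can be sketched as follows. The field layout (TPID 0x8100, then a 3-bit priority, a 1-bit DEI, and a 12-bit VLAN ID) follows the IEEE 802.1q standard; the MAC addresses and the VLAN number are arbitrary examples.

```python
# Sketch: inserting an IEEE 802.1q tag into an Ethernet frame, showing why a
# receiving switch must understand tagged frames - the frame layout changes.
import struct

def tag_frame(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1q tag after the destination and source MACs."""
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID is a 12-bit field")
    tci = (priority << 13) | vlan_id          # tag control info; DEI bit left at 0
    tag = struct.pack("!HH", 0x8100, tci)     # TPID 0x8100 identifies a tagged frame
    return frame[:12] + tag + frame[12:]      # the two MACs occupy the first 12 bytes

# Broadcast destination MAC, an example source MAC, EtherType 0x0800 (IPv4).
untagged = bytes.fromhex("ffffffffffff" "0004ab123456" "0800") + b"payload"
tagged = tag_frame(untagged, vlan_id=2)
print(tagged.hex()[24:32])  # 81000002 -> TPID 0x8100, VLAN ID 2
```

The frame grows by four bytes, which is why a switch unaware of 802.1q would misparse the EtherType field of a tagged frame.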
Subnetwork: Subnetwork or subnet is a logical network partition at Layer 3 of the OSI model. At Layer 3,
i.e., at the Network layer, each computer or host has at least one IP address as a unique identifier. The
Internet Protocol (IP) is used for sending data from one computer to another over the internet[4].
A subnet is defined to subdivide large IP networks into smaller, more efficient, interconnected subnetworks. The aim is to minimize the broadcast traffic on a single network segment, thereby improving available network bandwidth. It also optimizes the usage of the available IP address space.
Subnet masks split an IP address into bits that identify the network and host parts. When a device sees
the network identification and host identification bits of another device’s IP address, it can determine if
it is part of the same network or some other network. Figure 6.5 shows the structure of the IP address
and the network identification bits and host identification bits.
[Figure 6.5: structure of a 32-bit IP address, showing the network identification and host identification bits before and after subnetting]
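The mask-based comparison described above can be sketched as follows; the addresses and masks are arbitrary illustrations.

```python
# Sketch: using a subnet mask to decide whether two hosts are on the same
# network. ANDing each address with the mask keeps only the network bits.

def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True when both addresses have identical network identification bits."""
    def to_int(dotted):
        a, b, c, d = (int(octet) for octet in dotted.split("."))
        return (a << 24) | (b << 16) | (c << 8) | d
    m = to_int(mask)
    return to_int(ip_a) & m == to_int(ip_b) & m

# With a 255.255.255.192 (/26) mask, .10 and .70 fall in different subnets;
# with a 255.255.255.0 (/24) mask, they share one subnet.
print(same_subnet("192.168.1.10", "192.168.1.70", "255.255.255.192"))  # False
print(same_subnet("192.168.1.10", "192.168.1.70", "255.255.255.0"))    # True
```

When the network bits differ, the sending host forwards the packet to its router rather than delivering it directly on the local segment.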
Subnetting is the segmentation of a network address space into subnets; devices within a subnet communicate with each other directly, while routers are used to communicate between subnets. Subnet masks specify the range of IP addresses used within a subnet. Two types of subnet specifications have been in existence[5]:
1. Classful: Three classes of subnets have been defined as summarized in Table 6-2. Older and
less used.
Class     Subnet Mask   Mask Format      Host Bits   Number of Hosts           Network Octets   Number of Networks
Class A   8-bit         255.0.0.0        24          2^24 − 2 (= 16,777,214)   1                2^(8−1) (= 128)
Class B   16-bit        255.255.0.0      16          2^16 − 2 (= 65,534)       2                2^(16−2) (= 16,384)
Class C   24-bit        255.255.255.0    8           2^8 − 2 (= 254)           3                2^(24−3) (= 2,097,152)
2. Classless: Classless Inter-Domain Routing (CIDR) notation is used to specify a subnet, as summarized in Table 6-3. A trailing slash "/" followed by a number specifies how many bits identify the network portion of the address. This notation is currently widely used.
Illustration: A /20 indicates that 20 bits are used to identify the network, and the remaining 12 bits are used to identify the host.
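The /20 illustration can be checked with Python's standard ipaddress module; the 10.0.0.0/20 network is an arbitrary example address.

```python
# Sketch: inspecting a CIDR block with the standard-library ipaddress module.
import ipaddress

net = ipaddress.ip_network("10.0.0.0/20")
print(net.netmask)            # 255.255.240.0 -> 20 network bits set
print(net.num_addresses)      # 2**12 = 4096 addresses from 12 host bits
print(net.num_addresses - 2)  # 4094 usable hosts (network and broadcast excluded)
```

The same module also answers the subnet-membership question directly, e.g. `ipaddress.ip_address("10.0.5.1") in net`.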
6.1.4 Basic Network Diagram
A basic network diagram is shown in Figure 6.6. It shows a router and a DMZ with several infrastructure
components bounded by two Firewalls. A switch, three web servers, an email server, and a DNS server
are shown as part of the DMZ. A switch, two application servers, and a database server are shown
beyond the internal firewall.
Demilitarized Zone (DMZ): A demilitarized zone (DMZ) is a network segment to prevent outside users
from gaining direct access to an organization’s internal network. It represents a “neutral zone” between
the internet and an enterprise’s intranet.
[Figure 6.6: basic network diagram with a NAT-enabled router (202.29.120.110 on the internet side, 192.168.0.1 on the LAN side), an external firewall bounding the DMZ, and an internal firewall protecting the internal network]
Firewall: “A firewall is a network device that monitors and controls incoming and outgoing network
traffic”[6]. A firewall controls traffic based on security rules specified in its configuration. It constitutes
a barrier between a trusted network and an untrusted network. Firewalls secure both LAN and WAN
environments and are of two types.
a. Traditional firewall: controls incoming or outgoing traffic at a point within the network. It tracks
traffic, typically in Layers 2 – 4 of the OSI model. Both stateless (monitors data in packets)
and stateful (applies intelligence and keeps track of the entire cycle of flow) methods may be
employed by the firewall[7].
b. Next-Gen firewall: application-aware, recognizes user of application through inspection of traffic,
blocks malware, provides integrated IPS, performs deep packet inspection, and recognizes and
decrypts SSL and SSH. It tracks traffic, typically in Layers 2 – 7 of the OSI model.
Router: A router forwards data from one network to another. It is a Layer 3 device used extensively to
forward internet traffic. There are two types of routing that may be performed.
a. Static Routing: A route table is created and maintained by a network administrator manually on
a router.
b. Dynamic Routing: A route table is created and maintained by routing protocol on a router. Commonly
used routing protocols include RIP (Routing Information Protocol), EIGRP (Enhanced Interior
Gateway Routing Protocol), and OSPF (Open Shortest Path First). While routers share dynamic
routing information with each other, the use of routing protocol brings enhanced routing capabilities
by dynamically choosing an optimal path when there are changes to network infrastructure.
Load balancer: A load balancer is a device that sits between the clients and servers and efficiently
distributes client requests to servers. The primary function of the load balancer is to provide high
availability for the hosted application. Load balancers may act at Layer 4 (IP, TCP, FTP, UDP) or
Layer 7 (HTTP). They may also be external facing handling requests from external sources or
internal facing. In Figure 6.6, a load balancer is shown distributing requests to the three web servers
at Layer 7.
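A minimal sketch of round-robin distribution, one common load-balancing method, is shown below. The server names are placeholders; real load balancers add health checks, session persistence, and weighting.

```python
# Sketch: round-robin distribution of client requests across the three web
# servers of Figure 6.6. Each new request goes to the next server in rotation.
import itertools

web_servers = ["web-1", "web-2", "web-3"]
rotation = itertools.cycle(web_servers)

def route():
    """Pick the next server in rotation for an incoming request."""
    return next(rotation)

assignments = [route() for _ in range(5)]
print(assignments)  # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']
```

Because every server receives a share of the traffic, the failure of one server degrades capacity rather than availability, which is the high-availability role described above.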
Switch: Network switches connect devices on a network by receiving data from one device and forwarding it to another. Layer 2 network switches operate at the data link layer (OSI Layer 2), inspect frames, and use MAC addresses to forward data. Multilayer switches perform all the functions that Layer 2 switches do; a Layer 3 network switch is one type of multilayer switch that forwards data using the destination IP address. Multilayer switches can perform routing functions, including static and dynamic routing, and can inspect deeper into the protocol stack.
Network Address Translation (NAT): Network address translation is the process by which IP addresses
within a data packet are replaced with different IP addresses. Either routers or firewalls perform
this process. Assume the router shown in Figure 6.6 to be NAT-capable. The LAN-side IP address is 192.168.0.1, and the internet-side IP address is 202.29.120.110. For any packet being sent to the internet, NAT changes the sender's IP address field in the packet to 202.29.120.110, hiding the IP addresses of internal servers. For all the clients/servers on the internet, only IP 202.29.120.110 is visible.
Then the question arises – how does the router correctly send the responses from servers on the internet
to the requesting client on the internal network and vice versa? It works based on the combination of
IP address and port number for each client communicating from the internal network. The NAT device
maintains a mapping of internal IP addresses and port numbers on which data is sent by internal clients
and the IP addresses of corresponding external servers while performing address translation. NAT hides
the internal device details from external clients/servers in the process.
1. The client on the internal network initiates a request to the router to connect to 148.211.63.19, port
18 and asks it to respond to IP address 192.168.0.16, port 23767.
2. The router sends a request to the remote server, 148.211.63.19, port 18, and indicates that responses
be provided on IP address 202.29.120.110, port 32122.
3. When data comes back from the server, the router accepts it on the response-port number 32122
provided by it to the server.
4. The router then passes the data to the client on the internal network, the IP address 192.168.0.16,
and response-port 23767 that the client provided to the router when it started the conversation.
[Figure: NAT translation example – internal client 192.168.0.16, port 23767, mapped by the NAT-enabled router to 202.29.120.110, port 32122]
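The four steps above can be sketched as a translation table. This is a simplification of port-based NAT; the class and method names are hypothetical, and a real NAT device also tracks protocol state and connection timeouts.

```python
# Sketch of the NAT mapping described above: (internal IP, port) pairs are
# translated to the single public IP with a freshly allocated public port.

class NatTable:
    def __init__(self, public_ip, first_port=32122):
        self.public_ip = public_ip
        self.next_port = first_port       # first public port (from the example)
        self.outbound = {}                # (internal ip, port) -> public port
        self.inbound = {}                 # public port -> (internal ip, port)

    def translate_out(self, internal_ip, internal_port):
        """Rewrite an outgoing packet's source to the public IP and a mapped port."""
        key = (internal_ip, internal_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port):
        """Route a response arriving on a public port back to the internal client."""
        return self.inbound[public_port]

nat = NatTable("202.29.120.110")
print(nat.translate_out("192.168.0.16", 23767))  # ('202.29.120.110', 32122)
print(nat.translate_in(32122))                   # ('192.168.0.16', 23767)
```

Because only the (public IP, public port) pair is visible externally, many internal clients can share one public address, which is the address-hiding behavior described above.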
6.1.6 Proxy
In simple terms, a proxy is doing something on behalf of someone or something else. There are two
types of network proxy:
1. Forward Proxy: A forward proxy (or simply proxy) is an intermediary server that forwards requests on behalf of multiple clients to an external network. It is typically placed in the DMZ to forward requests from an isolated internal network to the internet through a firewall. A forward proxy hides internal client IPs from devices on the external network. Firewalls can also perform such functions and may be used for that purpose.
2. Reverse Proxy: A reverse proxy is an intermediary server that accepts requests on behalf of multiple servers. It is also placed in the DMZ. A reverse proxy hides the IPs of internal servers from external clients.
[Figure: a forward proxy placed between the internal and external firewalls, forwarding internal client requests to external servers]
6.1.7 Microsegmentation
It is common to refer to traffic in the data center as being north-south or east-west.
1. North-South traffic: For traffic from a server in a data center to reach the internet, it needs to reach the WAN. The traffic that involves the inward and outward flow of data packets between the servers (LAN) and the WAN is north-south traffic. Traditionally, most traffic in data centers was of this kind.
2. East-West traffic: The traffic flow within a data center, VLAN, or subnet is referred to as
east-west traffic. For example, the data communication from the application server to the DB server
constitutes east-west traffic. This kind of traffic is the norm in contemporary data centers.
Figure 6.9 depicts north-south traffic and east-west traffic pictorially.
Microsegmentation is a method of creating zones in data centers and cloud environments by dividing them into distinct security segments at the individual workload level[8]. It enables the isolation of workloads from one another and secures them based on a Zero Trust (trust no user or device) approach. Using microsegmentation, network administrators can create policies to restrict network traffic, reduce the network attack surface, and provide consistent security across data centers and cloud platforms.
Traditional network segmentation has been at the Layer 2 level by defining multiple virtual segments
(VLANs) and at the Layer 3 level by defining subnets. It continues to work well for north-south traffic
crossing the perimeter security. However, security for east-west traffic between workloads needs to be
much more granular – VM and workload level[9]. Microsegmentation addresses this need by enabling
the creation of microsegments, isolating, and securing them through policies. At a VM level, multiple
virtual NICs (Network Interface Cards) such as production NIC, management NIC, and backup NIC
may be assigned, and microsegments may be configured for traffic between the VMs using network
virtualization solutions such as VMware NSX.
6.2 Network Architecture
The network architecture in a data center is the architecture for the LAN based on Layer 2 and Layer
3 switching and routing to structure the flow of traffic. A hierarchical network has been found to be
more effective for improved manageability and troubleshooting than a flat network. Hence, network
architecture in a data center using switching and routing components has been hierarchical.
In the past, north-south traffic constituted the bulk of data flow in a data center, while east-west traffic currently accounts for a significant portion. Hence, two types of network architecture have come into vogue to support these traffic flows.
1. Three-tier network architecture: It has switches in three layers – best suited for north-south traffic.
2. Two-tier spine-leaf architecture: It has switches in two layers – best suited for east-west traffic.
1. Access: This layer of devices connects user devices such as PCs, IP phones, wireless access points,
printers, and scanners to the network.
2. Distribution: This layer of devices does not provide service to end devices but aggregates data from
the access switches.
3. Core: This layer constitutes the network’s backbone and provides a high-speed connection between
different distribution layer devices.
[Figure 6.10: three-tier network architecture – Core and Aggregation/Distribution at Layer 3, Access at Layer 2]
Access Layer: This layer has access switches that implement Layer 2 VLANs for logically separated environments (shown as ENV A, ENV B, and ENV C in Figure 6.10). End-user devices are connected to switches in this layer, and traffic is restricted to the Layer 2 VLANs. These are traditional switches with 24 to 48 ports of 1 or 10 Gbps. This layer implements several Layer 2 switching services. One of these is spanning tree, which prevents loops arising when multiple connections exist between two network switches or between two ports on the same switch; otherwise, such a loop causes broadcast messages to be repeated endlessly and flood the network.
Distribution Layer: Layer 3 switches in this layer are called distribution or aggregation switches as they
aggregate data from switches in the access layer. Every Layer 2 switch is connected to a corresponding
Layer 3 switch for the logically separated environment. If any device in an ENV needs to have connectivity
to VLANs defined in a different ENV, it is implemented through tunneling at Layer 3. Layer 3 tunneling
uses network layer tunneling protocols (e.g., IPsec) to exchange data packets, adding a new IP header
to an IP packet before sending it across a tunnel created over an IP-based network.
Essentially, the Layer 3 switch routes the data to the right ENV at Layer 3.
Core Layer: The switches in this layer, called core switches, have high throughput and advanced routing
capabilities. This layer is the backbone of the network. A packet received by the core switch is routed
to the correct distribution switch and onward to the access switch where the destination device for the
packet is connected. The only service provided by core switches is to route traffic at the fastest possible
speed.
The three-tier network architecture worked well for north-south traffic. However, with increasing
east-west traffic in data centers, the three hops corresponding to the three tiers grow to four, five, or
more, adding significant latency and making latency unpredictable. Spanning tree has also exhibited
brittle failure modes in which network issues result in outages[13]. Cisco introduced virtual port
channel (vPC) technology to overcome the limitations of the Spanning Tree Protocol[10]. It also became
possible to extend the Layer 2 boundary to the core switches and have Layer 2 VLANs spread across the
ENVs. This approach enabled specific capabilities such as vMotion of VMs, and multiple
connections could be made between access switches and distribution switches. However, vPC, too, works
well only when most traffic is north-south between clients and servers.
In the spine-leaf architecture, the leaf layer switches are connected to each of the spine layer switches in a mesh topology. Spine
switches are not connected to one another. If one spine switch goes down, traffic is
routed through the other spine switches.
[Figure: Two-tier spine-leaf architecture – each leaf switch meshed to every spine switch]
Every server is only two hops away from every other server. Traffic from leaf switch to leaf switch goes via a spine switch and
constitutes east-west traffic. The Layer 2 boundary may be at the spine switches or just at the leaf switches. No
spanning tree is used; instead, FabricPath with the IS-IS protocol is used, with equal-cost load balancing that gives
high throughput, or alternatively the VXLAN protocol is used. Scalability of east-west traffic throughput
is accomplished by adding more spine switches. Likewise, the port capacity of the leaf
layer is scaled by adding more leaf switches. North-south traffic goes through the spine layer if the
Layer 2 boundary is extended to the spine. Typically, however, the Layer 2 boundary is restricted to the leaf layer, as
otherwise broadcast traffic would spread across all the switch ports.
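The scaling arithmetic above can be sketched in a few lines. The port counts and link speeds below are illustrative assumptions, not vendor specifications:

```python
# Back-of-envelope sizing for a two-tier spine-leaf fabric.
# Port counts and speeds are illustrative assumptions, not vendor specs.

def fabric_capacity(spines, leaves, uplink_gbps=40, leaf_access_ports=48):
    """Each leaf connects to every spine (full mesh), so a leaf's
    east-west uplink bandwidth scales with the number of spines, and
    total access-port capacity scales with the number of leaves."""
    uplinks_per_leaf = spines                    # one link to each spine
    leaf_uplink_gbps = uplinks_per_leaf * uplink_gbps
    total_access_ports = leaves * leaf_access_ports
    return leaf_uplink_gbps, total_access_ports

# Any server-to-server path is leaf -> spine -> leaf: always 2 hops.
HOPS_EAST_WEST = 2

bw, ports = fabric_capacity(spines=4, leaves=8)
print(bw, ports)   # 160 Gbps of uplink per leaf, 384 access ports
```

Adding a spine switch raises every leaf's uplink bandwidth at once, which is why the text says east-west throughput scales with the spine layer.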
The spine-leaf architecture described in the previous section constitutes the physical network
infrastructure and is referred to as the underlay network. Network virtualization software can run
on the underlay network to create an overlay network that runs different virtual network layers.
Two prominent overlay solutions are:
1. Cisco’s Application Centric Infrastructure (ACI): Integrated overlay approach, i.e., includes hardware
and software. It uses a virtualization technology called VXLAN (virtual extensible local area network
technology).
2. VMware’s NSX-T: Adopts an approach of software overlay over server infrastructure. It uses
a generic technology for network virtualization called GENEVE.
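Both VXLAN and GENEVE encapsulate tenant frames in outer headers, which is why overlay designs raise the underlay MTU. A minimal sketch using the commonly cited header sizes for VXLAN over IPv4:

```python
# Overlay encapsulation adds outer headers around the original frame,
# so the underlay MTU must be raised to carry full-size tenant frames.
# Header sizes are the commonly cited figures for VXLAN over IPv4.

OUTER_ETHERNET = 14   # outer MAC header (no VLAN tag)
OUTER_IPV4     = 20
OUTER_UDP      = 8
VXLAN_HEADER   = 8    # flags + 24-bit VNI (network identifier)

VXLAN_OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER

def required_underlay_mtu(tenant_mtu=1500):
    """Minimum underlay MTU so a full tenant frame fits after encapsulation."""
    return tenant_mtu + VXLAN_OVERHEAD

print(VXLAN_OVERHEAD)            # 50
print(required_underlay_mtu())   # 1550
```

GENEVE carries the same outer Ethernet/IP/UDP headers with a variable-length option field, so its overhead is at least as large.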
6.4 Network Services in the Cloud
All public cloud platforms offer services to define the network and configure it remotely through admin
consoles. Table 6-4 provides an overview of key network services from major cloud platform providers[12].
References
[1] Shaw K., "The OSI model explained and how to easily remember its 7 layers", https://www.networkworld.com/article/3239677/the-osi-model-explained-and-how-to-easily-remember-its-7-layers.html.
[2] Harmoush E., "Virtual Local Area Networks (VLANs)", https://www.practicalnetworking.net/stand-alone/vlans/.
[3] Quick B., "How Do VLANs Work?", https://www.inteltech.com/how-do-vlans-work/.
[4] Ferguson K., "subnet (subnetwork)", https://www.techtarget.com/searchnetworking/definition/subnet.
[5] Erikberg, "Notes: Networks, Subnets, and CIDR", https://erikberg.com/notes/networks.html.
[6] G. Palmer, "Network Device and Technologies 1.1 SY0-401", https://zymitry.com/network-devices-technologies/.
[7] Njoroge J., "When a Traditional Firewall Doesn't Go Far Enough", https://gtb.net/why-gtb/blog/when-traditional-firewall-doesn%E2%80%99t-go-far-enough.
[8] VMware, "What is Micro-Segmentation?", https://www.vmware.com/topics/glossary/content/micro-segmentation.
[9] Palo Alto Networks, "What is Microsegmentation?", https://www.paloaltonetworks.com/cyberpedia/what-is-microsegmentation.
[10] Cisco, "Cisco Data Center Spine-and-Leaf Architecture: Design Overview White Paper", https://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-series-switches/white-paper-c11-737022.html.
[11] Morin J., Shaw S., "Network Virtualization for Dummies", https://www.vmware.com/content/microsites/learn/en/47785_REG.html.
[12] S. Wickramasinghe, "AWS vs Azure vs GCP: Comparing The Big 3 Cloud Platforms", https://www.bmc.com/blogs/aws-vs-azure-vs-google-cloud-platforms/.
[13] Ferro G., "Why Spanning Tree Is Evil", https://www.networkcomputing.com/networking/why-spanning-tree-evil.
Chapter 7
Storage
[Figure 7.1: Data Centers with representative infrastructure components – focus of this chapter]
Enterprise systems continuously add data to their structured and unstructured data stores and process
it with OLTP, OLAP, and ML systems. Additionally, multiple copies of data are maintained for operational
recovery and disaster recovery requirements. Further, regulatory requirements in several industry
segments require data copies to be archived for several years. Due to cyber-attacks in recent times,
more copies are also being stored in cyber vaults (CV) to recover from such attacks. Consequently,
there is an enormous and increasing demand for storage in enterprises.

[Sidebar – Storage: Block Storage – Storage Area Network (SAN); File Storage – Network Attached Storage (NAS); Object Storage – RESTful API access for storage services]

The capability of storage is assessed by the following parameters[1]:
1. IOPS: Input/output operations per second.
2. Throughput: Number of bits (Gbps) or bytes (GBps) a system can read or write per second.
3. Latency: Duration for a single data request to be received, the correct data to be located, and the response to be provided by the storage media.
4. Capacity: Amount of data that can be stored, in GB or TB.
5. Availability: Percentage of time that a storage system is available for use, i.e., uptime.
6. Durability: Measure of a storage system's long-term data protection ability, i.e., not suffering from degradation, bit rot, or other corruption. It is expressed as a percentage.
7. Types of storage media supported: NVMe (SSD), SSD, SAS, SATA.
8. Storage efficiency: Optimizing storage through deduplication and compression techniques.
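The first few parameters are related by simple arithmetic. A sketch with illustrative numbers, not taken from any particular array:

```python
# Relationships between the storage metrics listed above.
# All figures are illustrative assumptions.

def throughput_mbps(iops, block_size_kb):
    """Throughput is roughly IOPS x I/O size (here MB/s for KB blocks)."""
    return iops * block_size_kb / 1024

def availability_pct(uptime_hours, total_hours):
    """Availability is the percentage of time the system is usable."""
    return 100.0 * uptime_hours / total_hours

# 20,000 IOPS at a 16 KB block size:
print(throughput_mbps(20_000, 16))               # 312.5 MB/s
# 8,755 hours up out of the 8,760 hours in a year:
print(round(availability_pct(8755, 8760), 3))    # 99.943
```

The same workload therefore looks very different depending on block size: small-block workloads are IOPS-bound, while large-block workloads are throughput-bound.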
There are three types of storage solutions in use, in both on-premises and cloud environments:
1. Block Storage.
2. File Storage.
3. Object Storage.
Before SAN, direct-attached storage (DAS) was the solution employed for storage: disks, containing
applications and data, were connected directly to servers. While disks attached to one server were
accessible from other servers, the accessed data had to flow over the servers' LAN, and
moving large amounts of data caused performance bottlenecks due to bandwidth limits.
A SAN, on the other hand, uses a separate Fibre Channel network to interconnect disks. It does not use
the LAN, so transfers of SAN storage data do not impact LAN performance. The Fibre Channel network
costs more to set up and maintain but delivers greater performance. The iSCSI and FCoE
protocols instead transfer data over the components of a standard Ethernet LAN. The
iSCSI (short for "Internet SCSI") protocol enables clients to send SCSI commands to SCSI storage devices
on remote servers, and it can work over long distances using existing network infrastructure. While iSCSI
and FCoE may suffer performance bottlenecks from sharing the standard Ethernet LAN's bandwidth,
they provide the advantage of lower setup and maintenance costs.
A SAN comprises three layers:
1. Storage layer: This layer has an array of physical disks, most often configured with RAID options to
improve storage capacity, reliability, or both. RAID (redundant array of independent disks) protects
data in the case of a disk/drive failure. It is configured to appear to the operating system (OS) as
a single logical drive. A Logical Unit Number (LUN) is typically assigned to one or more disks (e.g.,
LUN0, LUN1) and may be accessed by servers.
2. Fabric layer: The layer with SAN switches, routers, gateways, and protocol bridges constitutes the
fabric layer, over which the host layer accesses data in the storage layer.
3. Host layer: The hosts that connect with the SAN storage via the fabric layer constitute the host
layer. Each of these hosts has a separate network adapter for Fibre Channel, called a host bus
adapter (HBA), which differs from an Ethernet adapter. The hosts run the business applications and
databases and communicate with the SAN storage over the SAN fabric using the HBA.
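As a rough sketch of how the RAID level chosen in the storage layer determines the capacity available for LUNs, assuming n identical disks (simplified; real arrays also reserve spares and metadata):

```python
# Usable capacity for common RAID levels, assuming n identical disks.
# Simplified sketch: spares, metadata, and formatting overhead ignored.

def usable_tb(n_disks, disk_tb, level):
    if level == "RAID0":               # striping, no redundancy
        return n_disks * disk_tb
    if level == "RAID1":               # mirroring: half the raw capacity
        return n_disks * disk_tb / 2
    if level == "RAID5":               # one disk's worth of parity
        return (n_disks - 1) * disk_tb
    if level == "RAID6":               # two disks' worth of parity
        return (n_disks - 2) * disk_tb
    raise ValueError(level)

print(usable_tb(8, 4, "RAID5"))   # 28 TB usable from 8 x 4 TB disks
print(usable_tb(8, 4, "RAID6"))   # 24 TB
```

LUNs carved from the array then expose slices of this usable space to hosts as single logical drives.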
Illustration – Storage Area Network
[Figure 7.2: A SAN with the host layer (a physical server and an ESXi host with VMs, FC HBAs on the PCI bus), the fabric layer (switches, e.g., Brocade), and the storage layer (e.g., DELL/EMC PowerMax) with RAID arrays and LUNs]
Figure 7.2 depicts the SAN solution with the three layers. The host layer consists of physical and virtual
servers (e.g., an ESXi host with VMs), the fabric layer shows SAN switches (e.g., Brocade), and the storage
layer (e.g., PowerMax) depicts LUNs configured on the disk arrays (with RAID) for providing block storage.
1. Physical SAN
a. The disk media is typically a mix of EFD (Enterprise Flash Drive, a solid-state drive with higher
performance), FC (Fibre Channel), SAS (Serial Attached SCSI), and SATA (Serial AT Attachment).
b. DELL/EMC's PowerMax is an example of a SAN system.
2. Virtual SAN
a. With the evolution of software-defined storage, there is an option to use a Virtual SAN (vSAN). Instead
of using central storage with a separate network for accessing the storage layer, it is possible to pool
storage across multiple servers and use it as a virtual SAN, accessible over Ethernet.
It has the advantage of lower cost by using available storage capacity through software means.
VMware vSAN works with the vSphere hypervisor and can be managed with the vSphere client[6].
Table 7-1: Key block storage and database services for AWS, Azure, and GCP
7.2 File Storage
Many files are generated by both individuals and applications in any organization. A file storage
solution is an effective mechanism to store and retrieve file-based data from a system that exposes
it to operating systems as a mount point or drive mapping.
Network Attached Storage (NAS) has disk arrays that are managed by an operating system. It provides
network interfaces and is accessed using file services protocols (NFS/CIFS) over Ethernet, unlike the
block-based protocols such as Fibre Channel (FC) and iSCSI used in SANs. NAS is
better suited for unstructured data. It provides high-capacity storage at a lower cost, and admins can
add more disks to scale capacity.
Figure 7.3 depicts NAS file storage connected over Ethernet to physical servers and to virtual machines
(VMs) on a virtualization server (e.g., an ESXi host). NAS products also most often support iSCSI for
access to SCSI storage.
[Figure 7.3: NAS file storage accessed over an Ethernet/IP network via NFS/CIFS by a physical server and an ESXi host with VMs]
7.2.2 File storage options for cloud
The file storage services on public cloud platforms are summarized in Table 7-2[8].
File Storage – AWS: Elastic File System, FSx for Windows and Lustre; Azure: Azure File Storage, Avere vFXT; GCP: Google Filestore
Table 7-2: Key file storage services for AWS, Azure, and GCP
c. IOPS: Low IOPS for SATA disks and high IOPS for SSDs.
7.3.2 Object storage options for cloud
Object storage is perhaps one of the earliest services to have gained adoption on the cloud.
A comparison of object storage services offered by AWS, Azure, and GCP is given in Table 7-3[8].
Data Transfer – AWS: Snowball Edge, Import/Export disk, Snowmobile; Azure: Data Box, Import/Export; GCP: Storage Transfer Service
Table 7-3: Key object storage services for AWS, Azure, and GCP
7.5 Storage Tiers
An enterprise's storage is classified into various tiers based on the performance and price
characteristics of the data service. Table 7-5 illustrates different tiers defined for storage[11].
References
[1] Pritchard S., "Storage performance metrics: Five key areas to look at", https://www.computerweekly.com/feature/Storage-performance-metrics-Five-key-areas-to-look-at.
[2] Sullivan E., "block storage", https://searchstorage.techtarget.com/definition/block-storage.
[3] Atlantic.Net, "What is Block Storage?", https://www.atlantic.net/dedicated-server-hosting/what-is-block-storage/.
[4] Poojary N., "Understanding Object Storage and Block Storage Use Cases", https://cloudacademy.com/blog/object-storage-block-storage/.
[5] Bigelow S., "What is a SAN? Ultimate storage area network guide", https://searchstorage.techtarget.com/definition/storage-area-network-SAN.
[6] Larcom A., "VMware vSAN vs. SAN: What are the differences?", https://searchvmware.techtarget.com/tip/How-VMware-vSAN-differs-from-a-traditional-VSAN.
[7] Chapter247, "AWS vs Azure vs Google Cloud - A detailed comparison of the Cloud Services Giants", https://www.chapter247.com/blog/aws-vs-azure-vs-google-cloud-a-detailed-comparison-of-the-cloud-services-giants/.
[8] A. Adshead, "Cloud storage 101: NAS file storage on AWS, Azure and GCP", https://www.computerweekly.com/feature/Cloud-storage-101-NAS-file-storage-on-AWS-Azure-and-GCP.
[9] U. Boppana, "Red Hat Ceph Storage 3 greatly advances object storage capabilities", https://www.redhat.com/en/blog/rise-object-storage-modern-datacenter.
[10] R. Sheldon, TechTarget, "NVMe speeds vs. SATA and SAS: Which is fastest?", https://searchstorage.techtarget.com/feature/NVMe-SSD-speeds-explained.
[11] STONEFLY, "Everything you need to know about Tiered Storage", https://stonefly.com/resources/what-is-tiered-storage.
Chapter 8
Backup/Restore
[Figure 8.1: Data Centers with representative infrastructure components – focus of this chapter]
Backup/restore is the capability to create a copy of a system's data (server or desktop) and use it for recovery
should the original data be lost or corrupted[1]. A tape library, physical or virtual (VTL), and storage space
are required to store the backups. The focus of backup is on operational recovery as per the defined
Recovery Point Objective (RPO). Two types of policies are applied in enterprises for retention of backup
data, for compliance with organizational processes and regulations: short-term retention (STR) and
long-term retention (LTR)[2]. In Figure 8.1, showing the data center deployments, the representative components
related to the focus of this chapter, namely backup/restore capability, are highlighted (in a box with
a dashed outline). It may be noted that backups are different from snapshots and are meant to address
recovery needs that snapshots alone cannot[3].
The key parameters that characterize a backup service are:
1. Recovery Point Objective (RPO): Maximum allowable data that may be lost in the event of a disaster.
It is measured in terms of time and dependent on the maximum age of the data or files in backup
storage[6].
2. Recovery Time Objective (RTO): Maximum time taken to recover from the adverse incident and
restoration of normal operations to users[7].
3. Backup Success rate: Percentage of attempts of backup when data is copied correctly and
completely[8].
4. Available Backup Window: Duration when suitable for taking backups of data[9].
5. Onsite short-term retention (STR) backup period: Duration for which backup copies are retained in
an environment that enables quick restore of data in the event of failure of a system.
6. Onsite long-term retention (LTR) backup period: Duration for backup copies to be maintained
onsite for point-in-time restore.
8. Monitoring & Support: The period during which the operations team provides monitoring and
support services.
A "3-2-1 backup" strategy is typically employed for backups: 3 copies of the data are maintained,
on 2 different types of media, with 1 copy stored off-site[5].
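A minimal sketch of checking the 3-2-1 rule against an inventory of backup copies; the copy records below are hypothetical:

```python
# Check of the 3-2-1 rule described above: at least 3 copies,
# on at least 2 different media types, with at least 1 copy off-site.
# The copy inventory is hypothetical.

def satisfies_3_2_1(copies):
    media_types = {c["media"] for c in copies}
    offsite_copies = [c for c in copies if c["offsite"]]
    return len(copies) >= 3 and len(media_types) >= 2 and len(offsite_copies) >= 1

copies = [
    {"media": "disk",   "offsite": False},  # primary backup on disk
    {"media": "tape",   "offsite": False},  # second copy on tape
    {"media": "object", "offsite": True},   # third copy in off-site cloud storage
]
print(satisfies_3_2_1(copies))   # True
```

Three copies on the same media type in the same building would fail the check, which is exactly the exposure the rule is meant to prevent.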
Illustration
Figure 8.2 depicts agent-based backup with IBM’s Spectrum Protect and DELL/EMC’s Networker
technologies.
IBM's Spectrum Protect has agents deployed on the servers (AIX, Windows, Linux) from which backups
of SQL Server, DB2, and other workloads are taken. The agent takes the backup and sends it
to the Spectrum Protect server, which then writes it to storage such as DELL/EMC's Data Domain. During the
restore process, the IBM Spectrum Protect server uses the client to restore the data, or to spin up a new
virtual machine and restore the backup into it. Figure 8.2 also shows inline deduplication of non-prod
data and LTR prod data to on-premises cloud storage (e.g., DELL/EMC ECS); in this case, data
deduplication is performed while the backup data is being copied to the backup device.
[Figure 8.2: Agent-based backup with IBM Spectrum Protect and DELL/EMC Networker – clients on x86 physical/virtual servers, front-end and back-end networks, DD Boost to Data Domain for short-term retention, and inline deduplication to on-premises cloud storage (LTR prod and non-prod buckets)]
– IBM Spectrum Protect for Virtual Environments is a feature by which the backup server
takes backup of the complete image of the VM as a VMDK file. No agent is used. The restore
process involves spinning up a new VM and deploying the backed-up image.
– Likewise, Networker Image-level backup is a feature by which the backup server takes
backup of the complete image of VM as a VMDK file without an agent. The restore would be
to a new VM, and hence no agent is required.
1. Front-end network: Network segments (VLANs) used by applications to communicate with their
components (e.g., databases) and interfaces (e.g., middleware) constitute the front-end network.
2. Back-end network: Network segments (VLANs) used by backup devices to perform backups and
monitoring & management tools to communicate with their components constitute the back-end
network.
This practice separates the backup/restore operations traffic from the application traffic, which avoids
performance bottlenecks.
There are three types of backup:
1. Full backup: Backup of all files, objects, and bytes. It represents a complete backup of all data and can
be used for recovery without additional effort. A full backup takes time, depending on the
size of the data and the number of systems on which it needs to be done.
2. Differential backup: Differential backup makes a copy of data that has changed since the full
backup. During restore, the last full backup is used, and then the differential backup is applied on
top of it, thus saving time. However, as the number of days since the full backup increases, the data
to be backed up grows, increasing the time a differential backup takes.
3. Incremental backup: Backup is taken of changes made since the last backup (full or incremental).
The last full backup is used during restoration, and subsequent incremental backups are applied
in the correct order. Of the three types, this process consumes the least of the available backup
window, and it is used in most enterprises.
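The restore logic implied by the three types can be sketched as follows, assuming a schedule that uses either incrementals or differentials after a full backup:

```python
# Which backups must be applied, and in what order, to restore to a
# given day. Assumes the schedule after a full backup uses either
# incrementals or differentials, as described above.

def restore_chain(backups, target_day):
    """backups: list of (day, kind) sorted by day;
    kind is 'full', 'differential', or 'incremental'."""
    eligible = [b for b in backups if b[0] <= target_day]
    # Restoration starts from the most recent full backup.
    last_full = max(i for i, b in enumerate(eligible) if b[1] == "full")
    chain = [eligible[last_full]]
    for b in eligible[last_full + 1:]:
        if b[1] == "incremental":
            chain.append(b)           # every incremental since the full
        elif b[1] == "differential":
            chain = [chain[0], b]     # only the latest differential matters
    return chain

daily = [(0, "full"), (1, "incremental"), (2, "incremental"), (3, "incremental")]
print(restore_chain(daily, 3))
# [(0, 'full'), (1, 'incremental'), (2, 'incremental'), (3, 'incremental')]
```

Note the trade-off the text describes: an incremental scheme restores a longer chain, while a differential scheme applies at most two backups but each differential grows with time since the full.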
Two metrics characterize a system's availability and recovery:
1. Mean Time Between Failures (MTBF): Average time between system failures. MTBF relates to
system uptime and is not under the control of the operations team.
2. Mean Time To Repair (MTTR): Average time to troubleshoot, repair, and restore the system after a
failure. MTTR relates to the downtime a system can tolerate while still complying with its availability criteria.
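The two metrics combine into the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). A quick sketch with illustrative numbers:

```python
# Steady-state availability from the two metrics above:
# availability = MTBF / (MTBF + MTTR).

def availability(mtbf_hours, mttr_hours):
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A system that fails on average every 1,000 hours and takes
# 2 hours to troubleshoot, repair, and restore:
a = availability(1000, 2)
print(f"{a:.4%}")   # 99.8004%
```

The formula makes the operations lever explicit: MTBF is largely outside the team's control, so availability is improved chiefly by driving MTTR down.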
Operational recovery involves several activities, including recovery of data from backups. When a system
goes down with associated servers and applications, the data is recovered from backups.
3. Improper patching.
When the system needs to be recovered, the actions taken by the operations team for recovery are as
follows:
3. Deploy applications.
Storage Tiers and Backup/Restore Product Options (Examples):
Tier 0 – Mission-critical data for uninterrupted, disruption-free access and usage.
Tier 1 – Frequently accessed (hot) data and high-performance workloads.
Tier 2 – Infrequently accessed (warm) data with short-term retention requirements.
Tier 3 – Archival and rarely accessed data with long-term retention requirements.
Product options (examples): IBM Spectrum Protect[16], DELL/EMC Networker[17], Veritas NetBackup[18].
Table 8-2: Product Options (Examples) for backup/restore in the data center
Data Transfer – AWS: Snowball Edge, Import/Export disk, Snowmobile; Azure: Data Box, Import/Export; GCP: Storage Transfer Service
Table 8-3: Key Backup/Restore services for AWS, Azure, and GCP
References
[1] Acronis, "Data Backup – What is it?", https://www.acronis.com/en-sg/articles/data-backup/.
[2] B. Posey, "Backup retention policy best practices: A guide for IT admins", https://searchdatabackup.techtarget.com/answer/What-are-some-data-retention-policy-best-practices.
[3] C. Puricica, Veeam, "Why snapshots alone are not backups", https://www.veeam.com/blog/why-snapshots-alone-are-not-backups.html.
[4] W. Preston, "Why is Operational Recovery Needed?", https://storageswiss.com/2017/06/16/why-is-operational-recovery-needed/.
[5] Cloudian, "Data Backup in Depth: Concepts, Techniques, and Storage Technologies", https://cloudian.com/guides/data-backup/data-backup-in-depth/.
[6] Druva, "Recovery point objective definition", https://www.druva.com/glossary/what-is-a-recovery-point-objective-definition-and-related-faqs/.
[7] C. Puricica, "Demystifying Recovery Objectives", https://www.veeam.com/blog/rto-rpo-definitions-values-common-practice.html.
[8] D. Russel, Gartner Research, "Best Practices for Repairing the Broken State of Backup", https://www.gartner.com/en/documents/2574917/best-practices-for-repairing-the-broken-state-of-backup.
[9] Techopedia, "What Does Backup Window Mean?", https://www.techopedia.com/definition/991/backup-window.
[10] Acronis, "Agent vs Agentless Backup: Why it Matters", https://www.acronis.com/en-sg/articles/agent-vs-agentless-backup/.
[11] Databarracks, "What are the advantages of Agent vs Agentless backup?", https://www.databarracks.com/blog/what-are-the-advantages-of-agent-vs-agentless-backup.
[12] Techopedia, "Front and Back Ends", https://www.techopedia.com/definition/24794/front-and-back-ends.
[13] PARABLU, "Demystifying Data Backups", https://parablu.com/demystifying-data-backups-types-of-backups/.
[14] WEIBULL.COM, "Availability and the Different Ways to Calculate It", https://www.weibull.com/hotwire/issue79/relbasics79.htm.
[15] W. Preston, "Why is Operational Recovery Needed?", https://storageswiss.com/2017/06/16/why-is-operational-recovery-needed/.
[16] IBM, "IBM Spectrum Protect", https://www.ibm.com/products/data-protection-and-recovery.
[17] DELL, "Dell EMC NetWorker Data Protection Software", https://www.delltechnologies.com/en-in/data-protection/data-protection-suite/networker-data-protection-software.htm.
[18] Veritas, "NETBACKUP - Best-in-class enterprise data backup and recovery", https://www.veritas.com/protection/netbackup.
[19] Chapter247, "AWS vs Azure vs Google Cloud - A detailed comparison of the Cloud Services Giants", https://www.chapter247.com/blog/aws-vs-azure-vs-google-cloud-a-detailed-comparison-of-the-cloud-services-giants/.
Chapter 9
Disaster Recovery
[Figure 9.1: Data Centers with representative infrastructure components – focus of this chapter]
A disaster is an adverse incident that prevents the operation of the applications/infrastructure from
a data center. Based on the outage duration, the enterprise makes the call to declare a disaster.
Disaster recovery is the capability to recover successfully from a disaster in accordance with
pre-defined parameters. Two important parameters define the characteristics of recovery. In Figure
9.1, showing the data center deployments, the representative components related to the focus of this
chapter, namely DR capability, are highlighted (in a box with a dashed outline).
RPO is the Recovery Point Objective, which is the maximum allowable data that may be lost in the event
of a disaster. It is measured in terms of time and dependent on the maximum age of the data or files in
backup storage[1].
a. Business applications.
b. Infrastructure assets.
3. Recovery of any data that may have been lost or damaged during the disaster.
a. Data quality.
The DR-HIGH, DR-MEDIUM, and DR-LOW tiers should not be confused with Compute criticality tiers,
although there is a clear mapping between them –
The DR-HIGH tier is aligned to the needs of the production environment of the organization’s most
critical applications, hence the CRITICAL application criticality tier. It is characterized as having an RPO
and RTO of 12 hours in Table 9-1 with an indicative set of DR service characteristics.
The DR-MEDIUM tier is aligned to the needs of the production environment of the organization’s
STANDARD application criticality tier. It is characterized as having an RPO and RTO of 24 hours in Table
9-1 with an indicative set of DR service characteristics.
The DR-LOW tier is aligned to the non-production environments of the organization’s NON-CRITICAL
application criticality tier. It is characterized as having an RPO and RTO of 48 hours in Table 9-1 with an
indicative set of DR service characteristics.
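The tier mapping described above can be captured as a small lookup table, using the indicative RPO/RTO values quoted from Table 9-1:

```python
# DR tiers aligned to application criticality tiers, with the
# indicative RPO/RTO values (hours) quoted from Table 9-1.

DR_TIERS = {
    "DR-HIGH":   {"app_tier": "CRITICAL",     "rpo_hours": 12, "rto_hours": 12},
    "DR-MEDIUM": {"app_tier": "STANDARD",     "rpo_hours": 24, "rto_hours": 24},
    "DR-LOW":    {"app_tier": "NON-CRITICAL", "rpo_hours": 48, "rto_hours": 48},
}

def dr_tier_for(app_tier):
    """Look up the DR tier aligned to an application criticality tier."""
    for tier, spec in DR_TIERS.items():
        if spec["app_tier"] == app_tier:
            return tier
    raise KeyError(app_tier)

print(dr_tier_for("STANDARD"))   # DR-MEDIUM
```

Keeping the mapping in one place makes it easy to drive downstream choices, such as the replication approach per tier, from a single definition.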
9.2.1 Preparation
The key activities that are performed in preparing for DR[7] are shown pictorially in Figure 9.2:
1. Establish DR operational characteristics: RPO and RTO are two key characteristics.
2. Set up an alternate DR site, referred to as the secondary site, with the right model:
a. Active-active: Both the primary and secondary data centers run applications (workloads) in the
production environment and serve user requests. A load balancer distributes user requests to both
data centers.
b. Active-passive: The primary data center functions as the active application site, serves user requests,
and replicates critical business data to the secondary. The secondary data center is ready to be
activated to serve applications should the primary data center fail for some reason.
4. Establish procedures to replicate data to the DR site for identified applications: This is discussed in
the next section.
[Figure 9.2: Key activities in preparing for DR – establish DR characteristics, set up the DR site, and establish procedures to replicate data]
9.2.2 Execution
All the preparations for DR come to fruition should a DR event occur[8]. The activities to be carried out
as part of DR execution are shown in Figure 9.3 –
a. Shared application platforms: Virtualization servers (ESXi hosts) or private cloud platforms,
Mainframe, Mid-range (AIX and IBMi), SQL Server, Oracle RDBMS, IBM DB2, and so on.
2. Appliance-based replication
Appliances are deployed in primary and secondary sites. Data that needs to be replicated is
deduplicated in the appliance and replicated (e.g., IBM Spectrum Protect server[9], DELL/EMC
RecoverPoint[10]).
Illustration
Figure 9.4 depicts IBM Spectrum Protect servers deployed in primary and secondary data centers.
The clients are deployed on each of the virtual servers with applications. The server uses the clients
on the primary data center to take backups of the applications’ data into SAN-attached Virtual Tape
Library (VTL). IBM Spectrum Protect is configured to work with an appliance to deduplicate and
replicate backups to the secondary data center.
[Figure 9.4: Appliance-based replication – hardware appliances at the primary and secondary sites]
3. VM-Snapshot Replication
VM-level snapshots are taken at the primary site and sent to the secondary site. They are
point-in-time snapshots created in the hypervisor (e.g., VMware ESXi) and replicated
asynchronously[11].
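For interval-based asynchronous replication such as this, the achievable RPO follows directly from the snapshot interval. A sketch under a simplifying assumption: a snapshot only counts as recoverable once its transfer to the secondary completes.

```python
# Worst-case data loss (effective RPO) for interval-based asynchronous
# replication: everything written since the last snapshot that fully
# landed at the secondary is lost if the primary site fails.

def worst_case_rpo_minutes(snapshot_interval_min, transfer_time_min):
    """A snapshot taken at time T is usable at the secondary only after
    its transfer completes, so the exposure window is the snapshot
    interval plus the transfer time."""
    return snapshot_interval_min + transfer_time_min

# Hourly snapshots that take 15 minutes to replicate:
print(worst_case_rpo_minutes(60, 15))   # up to 75 minutes of data loss
```

This is why snapshot-based replication suits the less aggressive DR tiers, while synchronous storage-based replication is used where the RPO must approach zero.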
4. Hypervisor-based replication
The replication software plugs into the hypervisor and copies VM-level data from the primary
site to the secondary site. The data in the secondary site is continuously kept up to date either
synchronously or asynchronously with changes made on the primary site (e.g., Zerto Enterprise
Cloud Edition for VMs, VMware SRM).
Illustration
Figure 9.5 shows replication at the virtual machine level based on the Zerto platform that enables
replication at the hypervisor level[12].
[Figure 9.5: Hypervisor-based Replication]

Illustration

Figure 9.6 provides a diagrammatic view of storage-based replication with DELL/EMC's Symmetrix Remote Data Facility (SRDF). It works with DELL/EMC's PowerMax SAN storage and replicates data from the primary site to the secondary site synchronously or asynchronously.
The copy C1 shown in the primary site (Figure 9.6) is replicated to secondary site C2. A Business
Continuity Volume (BCV) is created that is refreshed with updates from C2 to support the RPO. The
BCV copy constitutes a “gold” copy of data. Snapshots are also taken on BCV for critical applications
at regular intervals to roll back easily should it be necessary.
[Figure 9.6: Storage-level replication with DELL/EMC SRDF – copy C1 at the primary site replicated to C2 at the secondary site, with a BCV "gold" copy refreshed from C2 and snapshots taken on the BCV]
References
[1] Druva, "Recovery point objective definition", https://www.druva.com/glossary/what-is-a-recovery-point-objective-definition-and-related-faqs/.
[2] C. Puricica, "Demystifying Recovery Objectives", https://www.veeam.com/blog/rto-rpo-definitions-values-common-practice.html.
[3] MKS&H, "Process of Disaster recovery", https://mksh.com/5-elements-of-a-disaster-recovery-plan-is-your-business-prepared/.
[4] E. Sullivan, "disaster recovery (DR)", https://searchdisasterrecovery.techtarget.com/definition/disaster-recovery.
[5] T. G. Cagle, "The benefits of the three-tiered system of prioritizing recovery efforts", http://www.instavisiontech.com/2021/07/12/the-benefits-of-the-three-tiered-system-of-prioritizing-recovery-efforts/.
[6] J. Moore, "What is BCDR? Business continuity and disaster recovery guide", https://searchdisasterrecovery.techtarget.com/definition/Business-Continuity-and-Disaster-Recovery-BCDR.
[7] J. Sipple, MKS&H, "5 Elements of a Disaster Recovery Plan – Is Your Business Prepared?", https://mksh.com/5-elements-of-a-disaster-recovery-plan-is-your-business-prepared/.
[8] R. Long, "Disaster Recovery Strategy Execution, or Will It Really Work?", https://www.mha-it.com/2017/01/16/disaster-recovery-strategy-execution/.
[9] IBM, "Tivoli Storage Manager - Replication of client node data", https://www.ibm.com/docs/en/tsm/7.1.1?topic=server-replication-client-node-data.
[10] DELL, "DELL EMC RecoverPoint", https://www.delltechnologies.com/en-in/data-protection/recoverpoint.htm.
[11] VMware, "vSphere Replication", https://www.vmware.com/in/products/vsphere/replication.html.
[12] Zerto, "Hypervisor-Based Replication", https://www.zerto.com/wp-content/uploads/2019/09/hypervisor-based-replication.pdf.
[13] DELL EMC, "Dell EMC SRDF", https://www.delltechnologies.com/asset/en-us/products/storage/technical-support/docu95482.pdf.
[14] NetApp, "SnapMirror software: Unified replication, faster recovery", https://www.netapp.com/data-protection/backup-recovery/snapmirror-data-replication/.
Monitoring
Figure 10.1: Data Centers with representative infrastructure components – focus of this chapter
Monitoring is a capability to capture and analyze vital parameters of infrastructure to respond promptly
and take corrective action when necessary. Several application components and infrastructure
components need to be monitored for smooth functioning and quick resolution of issues in a data
center or a cloud platform. In Figure 10.1, showing the data center deployments, the representative
components related to the focus of this chapter, namely monitoring capability, are highlighted (in a box
with a dashed outline).
The two broad categories of monitoring are –
1. Hardware monitoring.
2. Application monitoring.
Hardware Monitoring
Hardware monitoring is “nuts & bolts” monitoring of the hardware. It involves monitoring power supplies, fans, temperature, disks, arrays, memory, and CPU, and it reports health at the hardware level of servers and other data center infrastructure components. An example is SolarWinds hardware monitoring software, which monitors server hardware from different vendors[3].
Illustration
An example of hardware monitoring of DELL systems, devices, and components is DELL OpenManage[4].
An indicative list of parameters that can be monitored with DELL OpenManage is as follows:
1. Server
a. CPU.
b. Memory.
c. Processes.
e. File System/Disk.
2. Virtualization
a. Server virtualization – e.g., monitoring VM, disk, vCPU, memory, and resources that may be
reclaimed from large VMs in the environment to reduce inefficiency and improve performance. It
also includes monitoring clusters that have the highest resource demands, hosts that are being
heavily utilized, datastores running out of disk space, storage capacity, and utilization of the
vSAN environment.
b. Desktop virtualization – e.g., monitoring of critical parameters of Citrix Virtual Apps and Desktops, including license server health, broker server connectivity, connection failures, logon duration, latency, NetScaler connectivity, SSL certificate expiry, firmware upgrades, and NetScaler backups.
Examples of server monitoring tools are –
1. Nagios[5]: Monitors disk space on the server, memory, CPU usage, services on Windows/Linux, license usage, server air temperature, and WAN and internet connection latencies.
2. IBM Tivoli Monitoring (ITM)[6]: Provides monitors for the base operating system and availability; thresholds may be set for parameters such as disk, memory, and CPU, and the monitored values are evaluated against those thresholds.
3. vRealize Operations (vROPS)[7]: Monitors VMware-based VM resources for server and desktop
virtualization.
4. SolarWinds Server & Application Monitor[8]: Monitors parameters of servers, including Citrix XenApp
and XenDesktop.
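Most of the tools above follow the same threshold pattern: sample metrics, compare them against configured limits, and raise alerts on a breach. A minimal sketch of that pattern, with metric names and limits that are purely illustrative (not taken from ITM, Nagios, or any other product):

```python
# Illustrative threshold-based monitoring check; the metric names and
# limits below are hypothetical, not from any specific tool.
THRESHOLDS = {"cpu_pct": 85.0, "memory_pct": 90.0, "disk_pct": 80.0}

def evaluate(sample: dict) -> list:
    """Return an alert string for every metric that breaches its threshold."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts

sample = {"cpu_pct": 92.5, "memory_pct": 70.0, "disk_pct": 81.0}
print(evaluate(sample))
```

Real tools add per-host threshold overrides, repeat-count dampening, and severity levels on top of this basic comparison loop.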
Key storage parameters to monitor are –
1. IOPS.
2. Throughput.
3. Latency.
4. CPU utilization.
5. Queue depth.
6. Capacity.
Examples of storage monitoring tools are –
1. Unisphere for DELL/EMC SAN storage[9]: Monitors cache write pending, SRDF consistency, FE and BE utilization, thin pool usage, and disks.
2. OnCommand Unified Manager for NetApp NAS storage[10]: Parameters such as volume and aggregate
space, chassis temperatures, power supplies, disks, shelves, switches are monitored.
3. Elastic Cloud Storage (ECS) Probe[11]: Parameters for ECS object storage.
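Several of the storage parameters above (IOPS, throughput, latency) are rates derived from cumulative counters sampled at a fixed interval. A minimal sketch of that derivation; the counter field names are hypothetical, not from any array's actual API:

```python
def storage_metrics(prev: dict, curr: dict, interval_s: float) -> dict:
    """Derive rate metrics from two snapshots of cumulative I/O counters."""
    ios = curr["io_count"] - prev["io_count"]
    return {
        "iops": ios / interval_s,
        # bytes transferred per second, reported in MB/s
        "throughput_mbps": (curr["bytes"] - prev["bytes"]) / interval_s / 1e6,
        # total device busy time divided by I/O count = average latency
        "avg_latency_ms": ((curr["busy_ms"] - prev["busy_ms"]) / ios) if ios else 0.0,
    }

prev = {"io_count": 1000, "bytes": 0, "busy_ms": 0}
curr = {"io_count": 6000, "bytes": 400_000_000, "busy_ms": 2500}
print(storage_metrics(prev, curr, 10.0))
```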
2. Application performance monitoring (APM): Tracks key software application performance metrics
starting at the entry point of the web server/ application server. APM is being extended to include
the front-end, namely, the web browser, mobile, or IoT application which has traditionally been
part of end-user experience monitoring (EUEM). Extending APM to include EUEM helps get an end-
to-end perspective, optimize service performance and response time, and improve user experience.
Such a performance analysis discipline formed by APM and EUEM is referred to as digital experience
monitoring (DEM). DEM includes –
a. Synthetic monitoring – active (controlled) simulated user action by recording and playing user
actions and measuring performance and availability. Synthetic monitoring is performed for both
types of front-ends –
i. Web-based.
ii. Mobile.
b. Real User Monitoring – passive monitoring of actual user interactions by injecting JavaScript into each page and capturing and analyzing the response, e.g., DCRUM.
A group of software vendors developed and specified the Application Performance Index (Apdex) to report application performance. The anticipated satisfaction of a user is assessed and reported as a numerical score[15]. Tools such as Dynatrace generate Apdex reports.
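The Apdex score is a simple ratio against a chosen response-time threshold T: samples at or under T count as satisfied, samples up to 4T count as tolerating (at half weight), and anything slower counts as frustrated. A short sketch of the computation:

```python
def apdex(response_times_s, t=0.5):
    """Apdex score = (satisfied + tolerating / 2) / total samples.

    Satisfied: response <= t seconds; tolerating: t < response <= 4 * t;
    frustrated: anything slower. Result is between 0.0 and 1.0.
    """
    satisfied = sum(1 for r in response_times_s if r <= t)
    tolerating = sum(1 for r in response_times_s if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times_s)

# Two satisfied, two tolerating, one frustrated sample -> (2 + 1) / 5
print(apdex([0.2, 0.4, 0.9, 1.8, 3.0]))  # 0.6
```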
3. Application error monitoring: Finds application bugs to enable developers to prioritize and fix them.
Two types of error monitoring[16]–
a. Front-end monitoring – detects issues with front-end components deployed in web servers and
application servers.
b. Back-end monitoring – detects errors in integrating with back-end components like databases,
middleware, and ERP servers.
5. Application database monitoring: Monitors interaction between the application and its database
and performance of the database to identify issues with the database that could affect the
efficient working of the overlying application. A tool such as Dynatrace can do database
monitoring and report on all key parameters related to database accesses and responses
(e.g., what tables are being used, the latency, and so on).
6. Application security monitoring: Monitors the application for security issues, including malware
and other threats. Tools such as those from Contrast Security are used for this purpose[18].
Enterprises deploy application full-stack monitoring tools that provide a “360-degree” view of an
application. Such tools perform active monitoring by analyzing how the application behaves in normal
and abnormal scenarios and raise alerts. They create baselines and continuously refine them. For
instance, Dynatrace Manage OneAgent[19] is a full-stack monitoring tool that monitors processes,
services, application traffic, resources (CPU, Memory, Disk, Network), application response time,
transaction failures, errors, slowness, log monitoring, application availability.
Several types of events may be generated from a variety of sources due to changes to infrastructure
devices, computing resources, increasing data volumes, and many other reasons[21] –
1. Operating System events: Produced by operating systems (Windows, Linux, Unix, iOS, Android).
2. System events: Generated for abnormal states or system health and resource changes.
3. Network events: Produced by network ports, switches, or routers related to the health of the devices.
4. Web server events: Originated from web servers such as Microsoft IIS or Apache HTTP Server and their related hardware and software.
5. Application events: Generated from business activity monitoring software for business transactions.
7. Other Data center devices: Generated from synthetic checks, probes, real user monitoring, and
client telemetry for user interactions.
Examples of event monitoring and correlation tools are ScienceLogic SL1[20], EMC Smarts[22], Opsview, and Splunk[26].
Figure 10.2 provides a pictorial representation of the integration of the monitoring tools for event
monitoring and correlation.
[Figure 10.2 – layers, top to bottom: ITSM (e.g., Micro Focus SMAX, ServiceNow); ITOM event monitoring & correlation (e.g., SL1, Opsview) receiving alert data; hardware/server/storage monitoring (e.g., ITM, Nagios, vROps) and network monitoring & performance monitoring (e.g., SevOne); device availability monitoring and data collection across servers, storage, gateways, load balancers, routers, switches, VPN servers, firewalls, and other data center devices]
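At its simplest, event correlation collapses repeated events from the same source into a single alert within a time window, so operators see one incident instead of a flood. A toy sketch of that idea; the event shape and window size are illustrative, not taken from SL1 or any other product:

```python
def correlate(events, window_s=300):
    """Collapse repeated (source, type) events inside a time window
    into a single alert carrying an occurrence count.

    `events` is an iterable of (timestamp, source, event_type) tuples.
    """
    alerts = []
    last_seen = {}  # (source, type) -> (alert dict, last timestamp)
    for ts, source, etype in sorted(events):
        key = (source, etype)
        if key in last_seen and ts - last_seen[key][1] <= window_s:
            alert = last_seen[key][0]
            alert["count"] += 1          # duplicate within the window
        else:
            alert = {"source": source, "type": etype, "first": ts, "count": 1}
            alerts.append(alert)          # new incident
        last_seen[key] = (alert, ts)
    return alerts

raw = [(0, "sw-01", "link_down"), (30, "sw-01", "link_down"),
       (60, "srv-07", "disk_full"), (400, "sw-01", "link_down")]
print(correlate(raw))
```

Production correlation engines go much further, adding topology awareness (suppressing downstream symptoms of an upstream failure) and cross-source rules.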
IT Operations Analytics is “the practice of monitoring systems and gathering, processing, analyzing and
interpreting data from various IT operations sources to guide decisions and predict potential issues”[24].
While Big Data technologies, such as Hadoop and Cassandra, are well-suited to run analytics on
massive amounts of data and extract intelligence, there are many specialized tools for ITOA. Examples
are Elastic, Sumo Logic, Evolven, Micro Focus OpsBridge[23], Splunk[26].
ITOA tools provide analytics functionality that is generally static to analyze past monitoring data and
determine the issues. To do so, data from multiple sources are correlated and analyzed using specialized
techniques. In other words, ITOA is about using data mining techniques to discover patterns and
correlations to determine complex issues, and their root causes for operations teams to resolve them.
However, the frequently changing infrastructure in distributed environments has presented several
limitations to the analytics provided by the ITOA tools. ITOA tools and solutions have evolved to include AI/ML and predictive capabilities in analytics to overcome these limitations, which led to the concept of AIOps.
An AIOps tool has event correlation, anomaly detection, and root cause determination capabilities. When
applied to IT infrastructure monitoring data using AIOps tools, the machine learning algorithms enable
operations teams and applications teams to work efficiently to detect issues early and resolve them
quickly to minimize the impact on business and customers. With AIOps, analytics may be performed on
massive amounts of complex data in changing IT environments to predict and prevent outages, improve
uptime, and resolve issues using automation as and when they arise.
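The statistical baselining behind anomaly detection can be illustrated in miniature: flag samples that sit far from the mean of the series. This is only a stand-in for the much richer learned baselines AIOps platforms maintain; the data and threshold are invented for the example:

```python
import statistics

def anomalies(series, threshold=2.5):
    """Return indices of points more than `threshold` standard
    deviations from the series mean (a crude static baseline)."""
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series) or 1.0  # avoid division by zero
    return [i for i, v in enumerate(series) if abs(v - mean) / stdev > threshold]

# Hypothetical CPU-utilization samples with one spike at index 6
cpu = [41, 43, 40, 42, 44, 41, 97, 42, 43, 40]
print(anomalies(cpu))  # [6]
```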
Examples of AIOps tools are Micro Focus OpsBridge[23] and Splunk ITSI[26]. Many AIOps tools are powerful,
with all the capabilities of event monitoring/correlation tools and ITOA tools[27]. They can effectively
replace other legacy tools in the environment that may have implemented those capabilities and bring in
prediction, root cause analysis, and self-healing capabilities. Figure 10.3 depicts a monitoring solution
with AIOps.
[Figure 10.3 – monitoring solution with AIOps: an ITOM AIOps layer (e.g., OpsBridge, Splunk) receives alert data from hardware/server/storage monitoring (e.g., ITM, Nagios, vROps) and network monitoring & performance monitoring (e.g., SevOne), backed by device availability monitoring and data collection across servers, storage, gateways, load balancers, routers, switches, VPN servers, firewalls, and other data center devices]
Security
Figure 11.1: Data Centers with representative infrastructure components – focus of this chapter
The one topic of greatest concern to the CXOs of an enterprise is security. Penetration techniques have been so widely adopted by state and non-state actors that the question CXOs now raise is not if but when the security of their IT systems will be compromised.
Security is commonly framed in terms of the CIA triad –
• Confidentiality is the protection of sensitive and private information from unauthorized access.
• Integrity is the protection of data from unauthorized changes to preserve its overall accuracy, consistency, and completeness.
• Availability is ensuring that authorized users can access the systems and resources they need.
The key security solutions for an enterprise to support the CIA triad fall under the following categories –
1. Access Security.
2. Connectivity Security.
3. Data Security.
4. Application Security.
5. Cyber Security.
The starting point to implement access security is to establish the identity of users (human and system).
Therefore, an identity and access management (IAM) solution needs to be implemented that provides
access to enterprise resources. “Identity and access management, or IAM, is the security discipline that
makes it possible for the right entities (people or things) to use the right resources (applications or data)
when they need to, without interference, using the devices they want to use”[3]. Users and their privileges
are added, modified, and deleted across various systems using an identity management system.
An identity management system uses a directory to store all user definitions and privileges for each
user. An access manager authenticates and authorizes users against the directory before providing
access to the enterprise system[4].
Each enterprise system user typically uses five or more applications to get their work done. Establishing
a separate identity in each application and managing it is a nightmare both for the enterprise and its
users. Hence, a single sign-on system is needed to provide easy access to valid users and an elegant
mechanism for the enterprise to control its resources.
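The authenticate-then-authorize flow against a directory can be sketched with an in-memory stand-in. Everything here (the user, salt handling, and privilege shape) is hypothetical, compressed for illustration; a real deployment uses an LDAP directory and products such as those discussed below:

```python
import hashlib
import secrets

def _hash(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()

SALT = b"demo-salt"  # a real directory stores a random salt per user
# Hypothetical "directory": user definitions with privileges per application
DIRECTORY = {
    "alice": {
        "pw": _hash("s3cret", SALT),
        "privileges": {"hr-portal": ["read", "write"]},
    },
}

def authenticate(user: str, password: str) -> bool:
    """Verify credentials against the directory (constant-time compare)."""
    entry = DIRECTORY.get(user)
    return bool(entry) and secrets.compare_digest(entry["pw"], _hash(password, SALT))

def authorize(user: str, app: str, action: str) -> bool:
    """Check the user's stored privileges for the requested application."""
    entry = DIRECTORY.get(user, {})
    return action in entry.get("privileges", {}).get(app, [])

print(authenticate("alice", "s3cret"), authorize("alice", "hr-portal", "write"))
```

Single sign-on adds a session token on top of this flow so the user authenticates once and the token is accepted by every participating application.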
Illustration
Figure 11.2 depicts the access security solution with Oracle products added to the basic network diagram discussed in Chapter 6. When a request comes to the web server for access to a particular application, Oracle Access Manager authenticates and authorizes the user by verifying credentials and fetching privileges from Oracle Internet Directory.
[Figure 11.2 – Access security: internet requests pass through a NAT-enabled router (202.29.120.110 / 192.168.0.1) and the external firewall into the DMZ, where web servers hand authentication and authorization to Oracle Access Manager (single sign-on); it verifies credentials and fetches privileges from Oracle Internet Directory, which is administered through Oracle Identity Management behind the internal firewall]
1. Network architecture for secure communication: The security concepts addressed through the enterprise network architecture (VLANs, firewalls, routers, three-tier architecture, spine-leaf architecture, and microsegmentation) have been discussed in Chapter 6.
2. Security solutions for enhanced connectivity security: There are specific security solutions to
provide enhanced connectivity security for data center and cloud solutions. The key connectivity
security solutions are –
[Figure 11.3 – Connectivity security: external users connect through a VPN client to a VPN server in the DMZ; a web application firewall examines traffic ahead of the web servers, between the external and internal firewalls]
a. Web Application Firewall: “A web application firewall (WAF) is an application firewall for HTTP
applications”[6]. It is like a reverse proxy as it protects servers that host one or more web
applications by inspecting and filtering traffic between each web application and the internet.
In Figure 11.3, WAF is shown deployed in the DMZ as an appliance that examines HTTP traffic
before it reaches the web server. It may also be deployed as a server-side software plugin or
packaged as a cloud service to detect and filter threats that could expose online applications to
denial-of-service (DoS) or degrade performance. WAFs may be stateless or stateful.
Examples of attacks that WAF may be used to filter are SQL injection, Cross-site Scripting (XSS),
Layer 7 DoS, Cookie poisoning, Web scraping, and Unvalidated input.
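Conceptually, a WAF matches request content against attack signatures before the request reaches the web server. A toy sketch with three illustrative regexes; real WAFs use extensive, continuously updated rule sets (e.g., the OWASP ModSecurity Core Rule Set), not a handful of patterns:

```python
import re

# Illustrative signatures only -- far too simple for production use.
SIGNATURES = {
    "sql_injection": re.compile(r"('|%27)\s*(or|and)\s+\d+=\d+|union\s+select", re.I),
    "xss": re.compile(r"<\s*script", re.I),
    "path_traversal": re.compile(r"\.\./"),
}

def inspect(request_params: dict) -> list:
    """Return names of signatures matched anywhere in the request parameters."""
    hits = set()
    for value in request_params.values():
        for name, pattern in SIGNATURES.items():
            if pattern.search(value):
                hits.add(name)
    return sorted(hits)

print(inspect({"q": "1' OR 1=1", "name": "<script>alert(1)</script>"}))
```

A request with any hits would be blocked or logged depending on whether the WAF runs in prevention or detection mode.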
d. Virtual Private Network (VPN): A Virtual Private Network (VPN) enables an encrypted and
protected connection over public networks. The encryption takes place in real-time, making
it more difficult for third parties to track websites that users are accessing and thus provides
anonymity. A VPN server is a specially configured server. The users use the VPN client to connect
to the VPN server. Any traffic from an application (e.g., browser) configured to work with the VPN
client goes through the VPN client and is encrypted when sent to the VPN server on a public
network. The internet service provider (ISP) or other third parties cannot view which websites
and information the users look at[9].
In Figure 11.3, a VPN client is shown that a user on the internet uses to access the organization’s
applications via a VPN server deployed in the DMZ.
1. Data at rest: Data, when at rest, may exist in block storage, file storage, or object storage discussed
in chapter 7. Encryption and data masking techniques are widely used to ensure that data is
unusable, even if it reaches unauthorized users. Safe disposal of data is ensured using data wipe
techniques. Organizations may also deploy Data Leakage Prevention (DLP) solutions as an extra
measure to protect data at rest. Regulations such as GDPR and CCPA have been enacted to ensure
organizations build and deploy systems that implement data protection for personally identifiable
information (PII). Several security considerations need to be addressed for block, file, and object
storage[11].
a. Block storage security: SAN security strategy involves multiple integrated layers or zones of
security – e.g., Zone A between switches, Zone B between servers and switches, and so on.
By doing so, the failure of one layer or zone will not compromise the data under protection. Traditional FC SANs enjoy a natural security advantage over IP-based networks: an FC SAN is an isolated private environment with fewer nodes than an IP network and is therefore exposed to fewer security threats. Even so, there is no single comprehensive security solution for SANs. LUN masking and zoning, switch-wide and fabric-wide access control, role-based access control, and logical partitioning of a fabric (virtual SAN) are the most commonly used SAN security methods.
b. File storage security: NAS storage may be compromised by viruses, worms, unauthorized access,
snooping, and data tampering. Permissions and ACLs constitute the first level of protection to
NAS resources by restricting accessibility and sharing. These permissions are in addition to
c. Object storage security: Data at rest is essentially implemented using encryption techniques.
Typically, each object is encrypted with its encryption key, and the encryption keys themselves
are encrypted with a master encryption key. Client-side encryption is also often employed to
encrypt objects with encryption keys before storing them in object storage. The encryption is
supported by security keys, either natively provided by the vendor or an external key manager.
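The per-object key wrapping pattern described above (each object encrypted with its own key, and that key encrypted with a master key held elsewhere) can be sketched as follows. XOR stands in for a real cipher such as AES purely to keep the example dependency-free; it is not secure:

```python
import secrets

def _xor(data: bytes, key: bytes) -> bytes:
    # Toy cipher for illustration only -- a real system uses AES-256.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

MASTER_KEY = secrets.token_bytes(32)  # held in a key manager, not with the data

def store_object(plaintext: bytes) -> dict:
    object_key = secrets.token_bytes(32)  # unique per-object key
    return {
        "ciphertext": _xor(plaintext, object_key),
        "wrapped_key": _xor(object_key, MASTER_KEY),  # key encrypted by master key
    }

def read_object(stored: dict) -> bytes:
    object_key = _xor(stored["wrapped_key"], MASTER_KEY)  # unwrap first
    return _xor(stored["ciphertext"], object_key)

obj = store_object(b"customer record 42")
print(read_object(obj))
```

The benefit of the pattern is that rotating or revoking the master key re-protects every object without re-encrypting the object data itself.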
2. Data in use: Data in use is active and frequently accessed/updated by multiple users through
applications. The techniques employed to secure data in use are to restrict access by user
role and limit system access to only those who need it by having controls in place “before”
providing access to content. In specific cases, information rights management (IRM) and digital
rights protection may be applied to ensure that only the authorized user can use sensitive
information. Another approach is to mask personally identifiable and sensitive data before
providing it to less secure environments. In some cases, it may be sufficient to provide metadata
to consumers instead of raw data. This approach can help prevent the leakage of sensitive
information. Products are also becoming available that may be used to encrypt data in use
(e.g., Sotero KeepEncrypt™).
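Field-level masking of sensitive data before handing a record to a less secure environment can be sketched in a few lines; the record shape and field names below are hypothetical:

```python
def mask(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` characters with asterisks."""
    visible = value[-keep_last:] if keep_last else ""
    return "*" * (len(value) - len(visible)) + visible

# Hypothetical record: only the card number is considered sensitive here.
record = {"name": "Jane Roe", "card": "4111111111111111", "city": "Pune"}
masked = {k: (mask(v) if k == "card" else v) for k, v in record.items()}
print(masked["card"])  # ************1111
```

Partial masking like this preserves enough of the value for matching and support workflows while keeping the full identifier out of the downstream system.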
3. Data in motion: Data in motion is data that is in the process of being transferred from an environment
in which it is at rest (e.g., storage) to an environment subject to third-party services whose security
cannot be guaranteed. Encryption is the key technique employed in protecting data in motion. Data
is encrypted before it traverses any external or internal networks using protected tunnels, such as
HTTPS or SSL/Transport Layer Security, VPNs, and Generic Routing Encapsulation. Several types
of encryption need to be selectively applied, keeping in mind data in motion security requirements.
Two widely applied encryption techniques are[12] –
a. Symmetric encryption: Converts plaintext to ciphertext using the same key for encryption and decryption. Examples are the Advanced Encryption Standard (AES) and Triple DES.
b. Asymmetric encryption: Uses two interdependent keys, one to encrypt the data and another to decrypt it. Examples are the Diffie-Hellman key exchange and RSA.
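The Diffie-Hellman exchange can be demonstrated with Python's built-in modular exponentiation. The small 32-bit prime below is for illustration only; real deployments use 2048-bit-plus groups or elliptic curves:

```python
import secrets

# Public parameters: modulus p (a prime) and generator g.
# These illustrative values are far too small for real security.
p, g = 0xFFFFFFFB, 5

a = secrets.randbelow(p - 2) + 1   # Alice's private exponent
b = secrets.randbelow(p - 2) + 1   # Bob's private exponent

A = pow(g, a, p)                   # Alice transmits A over the open network
B = pow(g, b, p)                   # Bob transmits B over the open network

alice_secret = pow(B, a, p)        # (g^b)^a mod p
bob_secret = pow(A, b, p)          # (g^a)^b mod p

assert alice_secret == bob_secret  # both sides derive the same shared key
print("shared secret established")
```

Only A and B cross the network; an eavesdropper who sees them cannot feasibly recover the shared secret when the group is large enough.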
Application security is the discipline of security that involves processes, practices, and tools to protect applications from threats and security weaknesses throughout the application life cycle[13].
Security weaknesses are identified through static, dynamic, and interactive testing done by security scanning tools during development. Fixes are planned and made part of releases, keeping in mind the dependencies involved. Runtime protection is designed to defend in real time against malicious attacks after an application is deployed in a production environment.
Development time: The main activities conducted by developers for developing secure applications are:
1. Design secure applications to avoid vulnerabilities and risks (e.g., Top 10 of Open Web Application
Security Project – OWASP).
2. Use code repositories to develop secure and version-controlled code (e.g., GitHub).
4. Develop code, unit test, and run static code analysis (e.g., Java code with Eclipse and unit testing with JUnit). Static code analysis is performed using Static Application Security Testing (SAST).
SAST[14]:
a. Static application security testing is a white box testing method to test code for vulnerabilities.
b. Example vulnerability detected – SQL injection.
c. Example tools – Klocwork, SpectralOps, Veracode.
6. Create software build, run integration tests & conduct sanity tests to have a working application.
7. Deploy & run tests. Functional, non-functional, dynamic analysis and interactive security testing
are performed. Dynamic analysis is performed using Dynamic Application Security Testing (DAST),
Interactive Application Security Testing (IAST), and Software Composition Analysis (SCA).
DAST:
a. Example vulnerability detected – Path traversal.
b. Example tools – Acunetix, AppScan, Netsparker.
IAST[16]:
a. Interactive Application Security Testing (IAST) is a white box testing method on an application
instrumented with specific interfaces to identify vulnerabilities.
b. IAST testing is performed in real-time while the application is running to identify the problematic
code lines from a security perspective and notify the developer for remediation.
c. Example tools – Veracode, WhiteSource.
Runtime:
Post-deployment of an application into production, it is protected through web application firewalls (WAF),
bot management, and RASP (runtime application self-protection). WAF has been explained earlier in this
chapter.
Bot Management[18]:
a. Bot management is part of the runtime security of applications to protect mobile apps, web
applications, and APIs from malicious bots while permitting access for the bots that help the business
of an enterprise.
b. Example bot attack – Fake accounts.
c. Bot management solutions protect applications from attacks by different approaches to detect and
manage bots. Examples of approaches –
i. Passive: Identify malicious bots with header information and web requests.
ii. Active: Challenge the web request with tests that bots cannot perform easily (e.g., Prompt user
with a CAPTCHA).
iii. Pattern identification: Classify activity and distinguish between human users, business bots, and
malicious bots.
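The passive approach can be sketched as a classifier over header information: allow known business bots through, flag clients whose User-Agent suggests automation. The token lists below are illustrative, not from any bot-management product:

```python
BOT_UA_TOKENS = ("bot", "crawler", "spider", "curl", "python-requests")
ALLOWED_BOTS = ("googlebot", "bingbot")  # bots that help the business

def classify(headers: dict) -> str:
    """Passively classify a request from its User-Agent header."""
    ua = headers.get("User-Agent", "").lower()
    if any(good in ua for good in ALLOWED_BOTS):
        return "allowed-bot"
    if not ua or any(token in ua for token in BOT_UA_TOKENS):
        return "suspected-bot"   # candidate for an active challenge (CAPTCHA)
    return "human"

print(classify({"User-Agent": "Mozilla/5.0 (Windows NT 10.0)"}))
print(classify({"User-Agent": "python-requests/2.31"}))
```

In practice, sophisticated bots spoof browser User-Agents, which is why products layer the active and pattern-identification approaches on top of passive header checks.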
RASP[19]:
a. Runtime Application Self-Protection (RASP) is an application security method that protects an application from the inside, unlike a WAF, which protects from the outside.
b. Monitoring, detection, and protection-related RASP code is deployed into the application servers.
c. All requests to the application are intercepted by RASP, and necessary security actions are taken.
Cyber security protects information technology systems from cybercrime (for financial gain or disruption), cyber-attacks (targeted information gathering), and cyber terrorism (spreading panic and fear)[21].
The key types of cybersecurity threats and tools for protection are[22]:
1. Malware: Malicious software that can be used to cause harm to a user. Viruses, worms, Trojans, and
spyware are different forms of malware.
Tools for protection (Examples) – Avast Antivirus, Kaspersky Anti-Virus, Trend Micro Antivirus+
2. Social engineering: Uses human interaction to trick users into revealing sensitive information.
3. Phishing: Fraudulent email or messages meant to deceive users as being from reputable or known
sources to steal sensitive data.
Tools for protection (Examples) – Proofpoint Email Security and Protection, Mimecast Email
Security with Threat Protection, SpamTitan Email Security.
4. Distributed denial-of-service (DDoS): Floods a target system, server, or network with traffic from many sources to make it unavailable to legitimate users.
Tools for protection (Examples) – SolarWinds Security Event Manager, AWS Shield, Indusface AppTrana.
5. Advanced persistent threats (APTs): Sustained targeted attacks to infiltrate a network and remain
undetected for an extended period to steal data.
Tools for protection (Examples) – Security Information and Event Management (SIEM) tools such as SolarWinds Security Event Manager and Splunk Enterprise Security.
6. Man-in-the-middle (MitM): Attacks involve an interception and relay of messages between two
parties who believe they are communicating.
7. Ransomware: Involves locking the user’s computer system files and demanding a payment to
unlock them.
Tools for protection (Examples): Bitdefender Antivirus Plus, AVG Antivirus, Avast Antivirus.
8. Password attacks: As the password is the most used mechanism to authenticate users to a system, discovering the right password is a common attack approach. Brute-force attacks (randomly trying different passwords) and dictionary attacks (trying a list of common passwords) are often used by hackers to obtain a password by trial and error. Password policies, including account lockout, password change at regular intervals, and password complexity, mitigate password attacks.
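The account-lockout policy that blunts brute-force and dictionary attacks can be sketched as a counter of consecutive failures; the limit of five attempts is illustrative:

```python
MAX_ATTEMPTS = 5

failed = {}  # user -> count of consecutive failed login attempts

def try_login(user: str, password_ok: bool) -> str:
    """Lock the account after MAX_ATTEMPTS consecutive failures."""
    if failed.get(user, 0) >= MAX_ATTEMPTS:
        return "locked"            # locked accounts reject even valid passwords
    if password_ok:
        failed[user] = 0           # success resets the failure counter
        return "ok"
    failed[user] = failed.get(user, 0) + 1
    return "locked" if failed[user] >= MAX_ATTEMPTS else "denied"

for _ in range(5):
    print(try_login("mallory", False))
print(try_login("mallory", True))   # correct password, but account is locked
```

Real implementations add a lockout duration or an administrator-driven unlock so a legitimate user is not locked out permanently by an attacker's failed guesses.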
National Institute of Standards and Technology (NIST) has developed a cybersecurity framework that
provides a uniform set of rules, guidelines, and standards for organizations[23]. The five core functions
of the NIST framework are Identify, Protect, Detect, Respond, and Recover. It is a standard approach
for cybersecurity and provides the foundation for an enterprise-wide strategy around cyber risk and
compliance.
107
financial 9, 36, 104 ITIL 1, 28 N
firewall 51, 52, 59, 98 ITM 87 Nagios 87, 94
forbes 15 ITOA 86, 88, 92, 94 NAS 62, 65, 68, 70, 73, 80, 83, 88, 99, 100
formulate 8, 11, 25, 28, 32 ITOM 91 NAT 44, 52, 53
foundation 12, 28, 105 ITOps 94 native 11
foundational 43 ITSM 1, 2, 6, 28, 91 network 1, 7, 8, 12, 13, 20, 22, 32, 38, 43, 44, 45, 48,
framework 1, 10, 12, 13, 15, 32, 105 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 62, 63, 64, 65,
J 68, 72, 73, 88, 90, 96, 97, 98, 99, 104, 105
Fraud 103 Jenkins 13
Fraudulent 104 networks 8, 48, 49, 52, 59, 99, 100
FTP 44, 52 K NFS 65, 100
functionality 11, 13, 14, 78, 83, 92, 97 Kerberos 100 NIC 55
keys 100 NICs 44, 55
G Kibana 90 NIST 105, 106
Gateway 52, 67, 76 knowledge 2, 30 node 37, 44, 84
Gbps 45, 62, 67 Kubernetes 9, 13, 40 nodes 99
GCP 8, 9, 10, 12, 40, 59, 64, 66, 67, 68, 76 normal 70, 78, 90
GDPR 99 L
normalization 91
GENEVE 58 Lambda 1, 40
NoSQL 64
GitHub 13, 101 LAN 5, 44, 45, 51, 52, 54, 56, 63
notation 50
Glacier 67 landing 10, 11, 13, 14, 15
NSX 55, 58
Google 8, 9, 10, 15, 40, 41, 59, 64, 66, 67, 68, 76 LANs 45
NVMe 29, 62, 67, 68
governance 1, 10, 14 latencies 87
gsutil 76 layers 56, 58, 59, 63, 64, 99 O
guidance 12, 15, 19, 21 Linux 28, 34, 35, 36, 37, 41, 71, 87, 90 object 36, 66, 67, 68, 80, 88, 99, 100
guidelines 26, 28, 30, 105 locations 81 Objective 30, 69, 70, 71, 77, 78
logical 5, 6, 19, 30, 38, 39, 45, 48, 63, 99 objects 66, 73, 100
H LPAR 35, 36 OLAP 61
Hadoop 92 LPARs 37 OLTP 61
hardware 7, 32, 35, 36, 37, 38, 40, 41, 58, 86, 90, 94 LTR 69, 70, 71 on-premises 1, 12, 14, 62, 64, 71
HBA 63, 86 LUN 63, 99 operating 6, 20, 35, 36, 37, 38, 39, 62, 63, 65, 74, 87,
HCI 7, 15 LUNs 64 90
hierarchical 56 operational 9, 61, 69, 70, 74, 76, 79, 81
hosts 37, 38, 44, 49, 63, 80, 87 M OPEX 9
Hyperconverged 7, 15 Mainframe 34, 35, 41, 80
optimal 8, 9, 11, 22, 38, 52
hyperscalers 8 malware 52, 90, 98, 104
optimize 1, 38, 52, 89, 94
hyperthreaded 38 mapping 53, 65
orchestrated 9, 71
Hyperthreading 38 MariaDB 64
organization 1, 2, 7, 10, 18, 27, 28, 51, 52, 65, 79, 80, 99
hypervisor 11, 35, 37, 41, 64, 82, 84 mask 100
OS 28, 34, 35, 36, 41, 63, 74, 75, 81
Mbps 45, 67
I OSI 44, 48, 51, 52, 59
Measure 62
IaaS 8, 40 OSPF 52
mechanism 65, 96, 105
IaC 9, 12, 13, 14
IAM 96, 106
IAST 101, 102, 106
identification 49, 91, 103
identifier 48
IFL 35
IIS 89, 90
illustrates 29, 68
Illustration 28, 39, 46, 48, 50, 51, 53, 63, 71, 74, 81, 82, 83, 86, 96
implementation 30, 32
incident 2, 70, 77, 78, 79
indicative 18, 19, 20, 21, 22, 27, 28, 29, 30, 41, 67, 71, 74, 78, 79, 86
inefficiency 87
Information 1, 32, 52, 96, 103, 105, 106
infrastructure 7, 8, 1, 2, 4, 6, 7, 8, 9, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 25, 27, 28, 30, 31, 32, 33, 41, 43, 44, 52, 58, 61, 63, 69, 74, 77, 78, 80, 81, 85, 86, 87, 90, 91, 92, 94, 95, 96, 100
initiatives 8
integrated 14, 27, 52, 90, 91, 99
integration 7, 11, 59, 91, 101
interaction 72, 90, 104
interconnect 5, 63
interconnections 5
interdependencies 78
interfaces 7, 65, 73, 102
IOPS 62, 66, 67, 88
IP address 44, 48, 49, 52, 53, 86
IPS 52, 97, 98, 106
IPSec 57
iSCSI 62, 63, 65, 83, 86
iSeries 34, 41
M
memory 38, 39, 86, 87
MemoryStore 64
metadata 62, 66, 100
method 55, 101, 102, 104
methodology 10
metrics 2, 28, 68, 88, 89
microprocessors 34
microseconds 67
microsegmentation 55, 59, 97
microsegments 55
microservices 9
Microsoft 8, 9, 10, 15, 37, 38, 40, 59, 64, 66, 67, 76, 83, 90
middleware 73, 89
migrate 11
migration 10, 11, 15
MLC 35
mobile 8, 89, 96, 103
model 8, 9, 20, 38, 39, 44, 48, 51, 52, 59, 79, 96
modular 7, 15
module 91
monitor 85, 94
monolithic 9
MPLS 45
MSU 35
MTBF 74
MTTR 74
multilayer 52
Multiprotocol 45
multisite 45
Multithreading 38
MySQL 64
O
outage 3, 30, 77, 78
outages 57, 92
overlay 58
overprovisioning 39
overview 8, 15, 20, 23, 59, 94, 106
OWASP 101, 102, 106
P
PaaS 8, 40
packet 45, 52, 53, 57
parameters 62, 67, 77, 85, 86, 87, 88, 90
partitioning 35, 36, 99
passive 6, 79, 89
password 105
patching 70, 74
patterns 71, 92
PCIe 40, 67
perform 5, 11, 12, 14, 52, 73, 90, 94, 96, 103
performance 5, 7, 29, 63, 68, 73, 75, 87, 88, 89, 90, 94, 98, 105
permission 4, 96
perspective 34, 35, 36, 39, 89, 102
Phishing 104
physical 1, 5, 6, 22, 30, 32, 35, 37, 38, 39, 41, 45, 46, 47, 48, 58, 62, 63, 64, 69, 71, 72, 78, 81, 88
plan 10, 11, 12, 84
platform 8, 11, 12, 59, 82, 85, 88, 94
platforms 7, 8, 10, 11, 12, 13, 14, 34, 55, 59, 64, 66, 78, 80
POLP 96, 106
practice 73, 76, 84, 92, 96
practices 1, 5, 10, 12, 76, 100, 104, 106
predict 92
predictability 57
prediction 92
predictive 92
prepare 30
presentation 44
presented 20, 92
prevent 2, 51, 57, 92, 99, 100, 105
prevention 99, 106
primary 3, 6, 52, 79, 81, 82, 83
principles 10, 12, 20, 26, 28, 30, 32
priorities 1
prioritize 89
PRIORITY 27
privacy 7
privileges 96, 97, 100
process 1, 2, 14, 25, 28, 30, 37, 38, 52, 53, 61, 71, 72, 73, 78, 87, 90, 91, 100, 101
processors 35, 38, 39, 40
PROD 13, 14, 31
Protect 12, 71, 72, 75, 76, 81, 105
protected 73, 99, 100, 103
protecting 100, 101
protection 62, 76, 84, 96, 99, 100, 101, 103, 104, 105, 106
protocols 44, 52, 57, 62, 63, 65, 100
provider 1, 6, 11, 28, 99
provision 7, 8, 12, 14
provisioning 8, 12, 13, 14
Proxy 44, 54
public 1, 8, 9, 10, 14, 19, 21, 32, 43, 52, 59, 66, 99
Puppet 5, 12, 13, 15
Q
quality 78
R
rack 7, 41
racks 38
RAID 63, 64
ransomware 99
RASP 103, 104, 106
Recover 80, 81, 84, 105
recovering 78
redundant 6, 11, 63
REFACTOR 11
regulations 69
REHOST 11
reliable 34
RELOCATE 11
replace 4, 11, 92
REPLATFORM 11
replicate 80, 81
Replication 78, 81, 82, 83, 84
Replicator 81
report 89, 90, 103
repositories 101
representation 91
requirements 1, 8, 9, 12, 18, 29, 30, 31, 32, 37, 52, 61, 66, 68, 75, 100
resiliency 8, 13
resources 1, 7, 8, 9, 11, 12, 13, 14, 38, 68, 87, 90, 94, 96, 99, 106
restore 19, 20, 22, 28, 29, 32, 69, 70, 71, 72, 73, 74, 75, 76
RETAIN 11
retained 70
retired 2
retiring 2, 15
Rollback 74
routable 52
route 52, 57
routed 57
router 52, 53
routes 57
routing 52, 56, 57
RPO 27, 69, 70, 74, 75, 77, 79, 83
RTO 70
S
S3 67
SaaS 8, 11
SANs 65, 99
SAST 101
SATA 29, 62, 66, 67, 68
SBB 20, 32
SCA 101, 103
scalability 40
scalable 7
scale 7, 8, 10, 12, 13, 65
scenario 79
scenarios 7, 90
scripting 12, 101, 102
secondary 3, 6, 79, 81, 82, 83
secure 1, 10, 40, 51, 97, 100, 101, 106
securing 55, 97
security 5, 6, 7, 13, 19, 20, 22, 32, 51, 52, 55, 66, 90, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 106
segmentation 49, 55, 59
segments 13, 45, 55, 61, 73
serverless 1
servers 1, 7, 15, 20, 34, 37, 38, 40, 41, 45, 46, 47, 48, 51, 52, 53, 57, 62, 63, 64, 71, 72, 74, 80, 81, 86, 87, 88, 89, 98, 99, 104
service 1, 2, 6, 8, 11, 28, 29, 30, 32, 35, 41, 56, 57, 64, 67, 68, 71, 76, 78, 79, 89, 91, 94, 98, 99, 105
service design 1, 28
service operation 2
service provider 1, 28, 99
service strategy 1, 28
services 1, 2, 6, 7, 8, 9, 11, 12, 13, 15, 19, 20, 22, 28, 35, 36, 37, 40, 41, 43, 44, 57, 59, 62, 64, 65, 66, 67, 68, 70, 76, 78, 80, 81, 87, 90, 100
setup 13, 32, 44, 63, 94
SIEM 105
snapshots 69, 76, 82
SNMP 88
solution 12, 19, 20, 22, 25, 30, 31, 32, 34, 36, 38, 62, 64, 65, 66, 70, 72, 91, 92, 96, 99, 104
solutioning 1
solutions 2, 5, 8, 9, 12, 14, 20, 22, 23, 25, 28, 41, 43, 44, 55, 62, 73, 75, 79, 92, 94, 95, 96, 97, 98, 99, 103, 104, 106
specifications 49
specify 12, 13, 20, 49, 50
SQL 35, 37, 64, 71, 80, 89, 98, 101, 102
SRAM 39
SRDF 83, 84, 88
SSD 29, 62, 67, 68
standard 6, 32, 41, 63, 105
standards 26, 28, 30, 32, 105
storage 1, 7, 8, 12, 13, 15, 28, 29, 32, 38, 40, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 75, 76, 77, 80, 81, 83, 84, 87, 88, 94, 99, 100, 104
STR 69, 70, 71
strategic 8
strategies 4, 5, 6, 8, 10
strategy 1, 7, 9, 10, 11, 15, 28, 30, 32, 79, 84, 99, 105
subnet 48, 49, 50, 54, 59
subnets 49, 55
Subnetting 49
Subnetwork 44, 48
switch 6, 45, 46, 47, 51, 52, 57, 81, 99
switches 44, 45, 48, 52, 56, 57, 59, 63, 88, 90, 99
synchronizing 35
synchronously 82, 83
synthetic 90
T
tagged 45
target 12, 19, 20, 22, 100, 105
TB 62
TCP 44, 52
Terraform 9, 12, 15
thread 38
threads 38, 41
Tier 6, 27, 28, 29, 31, 41, 57, 68, 71, 75
tiers 6, 27, 28, 29, 32, 41, 57, 68, 75
TOGAF 19, 21, 23, 32
togaf8 23
togaf9 23, 32
Tomcat 80, 89
transformation 8, 10
trunk 45, 48
trusted 51
U
UAT 13, 14, 31
UDP 44, 52
underlay 58
uptime 5, 6, 62, 74, 88, 89, 92
V
vaults 62
vCenter 39, 83
vCPU 41, 87
vCPUs 39
Veracode 101, 102
Virtual 1, 35, 37, 38, 39, 40, 41, 45, 59, 64, 71, 72, 81, 83, 97, 99
virtualization 9, 37, 38, 39, 41, 55, 58, 87
virtualized 37
virtualizing 58
VLAN 44, 45, 46, 47, 48, 54
VMDK 38, 72
VPN 97, 99, 106
vSAN 64, 68, 87
vulnerabilities 100, 101, 102, 103
W
WAF 97, 98, 103, 104
WAN 44, 45, 51, 54, 87
Windows 8, 28, 34, 37, 41, 66, 71, 87, 90, 100
workflows 1, 5
workload 35, 40, 55
X
x86 7, 34, 37, 38, 40
Infrastructure Architecture
Essentials for
Data Center and Cloud

Many new entrants to the IT industry have begun working directly on cloud platforms without a background in data center solutions and infrastructure architecture. This book gives readers the conceptual clarity in IT infrastructure architecture needed to develop and maintain solutions for both the data center and the cloud.

• Describes an infrastructure architecting process based on industry standards.

• Focuses on the essentials of Compute, Network, Storage, Backup/Restore, Disaster Recovery, Monitoring, and Security from an architecture perspective.

• Caters to students, developers, architects, designers, and CXOs who need a good understanding of the concepts of infrastructure architecture.

• Provides many references to industry and academic literature at the end of every chapter to guide anyone who wishes to go deeper.

About the Author

Shankar Kambhampaty has been involved for 32 years in architecture, design, and development for several IT projects executed globally. He has provided leadership in technology and IT architecture over the past decade in several organizations. Shankar has written papers for international conferences and is the author of the book Service-Oriented Architecture & Microservices Architecture for Enterprise, Cloud, Big Data and Mobile, published by Wiley. He has also been a frequent speaker at architecture/technology events and an invited member of the Forbes Technology Council.

To know more, visit Shankar’s blog: www.archtecht.com.