
IT Infrastructure

Architecture

Infrastructure Building Blocks and Concepts

Mary Claire Carbonilla



PART I –
INTRODUCTION TO IT
INFRASTRUCTURE

THE DEFINITION OF
IT INFRASTRUCTURE
Introduction

 During the first decades of IT development, most
infrastructures were relatively simple. While applications
advanced in functionality and complexity, hardware
basically only got faster. In recent years, IT infrastructures
started to become more complicated as a result of the
rapid development and deployment of new types of
applications, such as e-commerce, Enterprise Resource
Planning (ERP), data warehousing, big data, the Internet
of Things, and cloud computing. These applications
required new and more sophisticated infrastructure
services that are secure, highly scalable, and available 24/7.
What is IT
infrastructure?

 IT infrastructures have been around for quite a
while. But, surprisingly enough, no generally
accepted definition of IT infrastructure seems to
exist. I found that many people are confused by the
term IT infrastructure, and a clear definition would
help them understand what IT infrastructure is and
what it is not.

THE INFRASTRUCTURE MODEL

IT building blocks

The definition of infrastructure as used
in this book is based on the building
blocks in the model as shown in Figure 2.
In this model, processes use information,
and this information is stored and
managed using applications.
Applications need application platforms
and infrastructure to run. All of this is
managed by various categories of
systems management.
Processes / Information building
block

Organizations implement business
processes to fulfil their mission and
vision. These processes are
organization-specific – they are the
main differentiators between
organizations. For example, business
processes in an insurance company
could include claim registration, claim
payment, and invoice creation.
Applications building block

The applications building block
includes three types of applications:

Client applications
Office applications
Business specific applications
Application Platform building block

Most applications need some additional
services, known as application platforms,
that enable them to work. We can identify
the following services as part of the
application platform building block:

Front-end servers
Application servers
Connectivity
Databases
Infrastructure building blocks

This book uses the selection of building
blocks as depicted in Figure 6 to
describe the infrastructure building blocks
and concepts – the scope of this book.

End User Devices
Operating Systems
Compute
Storage
Networking
Datacenters
Non-functional attributes

 An IT system does not only provide
functionality to its users; that functionality is
supported by non-functional attributes.
Non-functional attributes are the effect
of the configuration of each IT system
component, both at the infrastructure
level and above.

PART II – NON-FUNCTIONAL
ATTRIBUTES

INTRODUCTION TO
NON-FUNCTIONAL
ATTRIBUTES
Introduction

IT infrastructures provide services to applications. Some of
these infrastructure services can be well defined as a
function, like providing disk space, or routing network
messages. Non-functional attributes, on the other hand,
describe the qualitative behavior of a system, rather than
specific functionalities. Some examples of non-functional
attributes are:
· Availability
· Scalability
· Reliability
· Stability
· Testability
· Recoverability
Non-functional Requirements

 It is the IT architect or requirements engineer’s job to
find implicit requirements on non-functional
attributes (the non-functional requirements - NFRs).
This can be very hard, since what is obvious or taken
for granted by the customers or end users of a
system is not always obvious to the designers and
builders of the system. Not to be forgotten are the
non-functional requirements that other stakeholders
have, like the existence of service windows or
monitoring capabilities, which are important
requirements for systems managers.

AVAILABILITY
CONCEPTS
Introduction

 Everyone expects their infrastructure to be available
all the time. In this age of global, always-on, always
connected systems, disturbances in availability are
noticed immediately. A 100% guaranteed availability
of an infrastructure, however, is impossible. No
matter how much effort is spent on creating highly
available infrastructures, there is always a chance of
downtime. It's just a fact of life.
Availability in the
infrastructure model
This chapter discusses the concepts and
technologies used to create highly available
systems. It includes calculating availability,
managing human factors, the reliability of
infrastructure components, how to design for
resilience, and – if everything else fails –
business continuity management and
disaster recovery.
Calculating availability

In general, availability can neither be calculated nor
guaranteed upfront. It can only be reported on
afterwards, when a system has run for some years.
Availability percentages
and intervals

The availability of a system is usually
expressed as a percentage of uptime in
a given time period (usually one year
or one month). The following table
shows the maximum downtime for a
particular percentage of availability.
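
These downtime figures follow directly from the
availability percentage: downtime = period length ×
(1 − availability). A minimal Python sketch that derives
them for commonly quoted percentages (the percentages
here are illustrative, not copied from the table):

```python
# Maximum downtime per year for a given availability percentage:
# downtime = total minutes in the period * (1 - availability)
MINUTES_PER_YEAR = 365 * 24 * 60

for percent in (99.0, 99.9, 99.99, 99.999):
    downtime = MINUTES_PER_YEAR * (1 - percent / 100)
    print(f"{percent}% availability: {downtime:,.1f} minutes "
          f"({downtime / 60:.1f} hours) of downtime per year")
```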
MTBF and MTTR

The factors involved in calculating
availability are Mean Time Between
Failures (MTBF), which is the average
time that passes between failures, and
Mean Time To Repair (MTTR), which is
the time it takes to recover from a
failure.
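
Together, MTBF and MTTR determine the steady-state
availability of a component: availability =
MTBF / (MTBF + MTTR). A minimal sketch, using
illustrative figures rather than values from Table 3:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is expected to be up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Illustrative figures (assumptions, not from Table 3): a disk with an
# MTBF of 750,000 hours that takes 8 hours to replace and rebuild.
print(f"{availability(750_000, 8):.5%}")  # 99.99893%
```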
Mean Time Between
Failures (MTBF)

The MTBF is expressed in hours: how many
hours the component or service will work
without failure. Some typical MTBF figures are
shown in Table 3.
Mean Time To Repair
(MTTR)

When a component breaks, it needs to be
repaired. Usually the repair time (expressed as
Mean Time To Repair – MTTR) is kept low by
having a service contract with the supplier of
the component. Sometimes spare parts are kept
on-site to lower the MTTR (making MTTR
more like Mean Time To Replace). Typically, a
faulty component is not repaired immediately.
Sources of unavailability

Human errors
Usually only 20% of the failures leading to
unavailability are technology failures. According to
Gartner, through 2015, 80% of outages impacting
mission-critical services will be caused by people and
process issues, and more than 50% of those outages
will be caused by change/configuration/release
integration and hand-off issues.
Software bugs

After human errors, software bugs are the number
two reason for unavailability. Because of the
complexity of most software, it is nearly impossible
(and very costly) to create bug-free software.
Software bugs in applications or system drivers can
stop an entire system, like the infamous Blue Screen
of Death on Microsoft Windows.
Planned maintenance

Planned maintenance is sometimes
needed to perform systems management
tasks like upgrading hardware or
software, implementing software
changes, migrating data, or creating
backups.
Physical defects

Of course, everything breaks down eventually, but
mechanical parts are most
likely to break first. Some examples of mechanical
parts are:
Fans for cooling the equipment.
Disk drives.
Tapes and tape drives.
Environmental issues

Environmental issues can cause downtime as well.
Issues with power and cooling, and external factors like
fire, earthquakes and flooding can cause entire
datacenters to fail.

Complexity of the infrastructure

Adding more components to an overall system design
can undermine high availability: complex systems have
more potential points of failure and are more difficult
to implement correctly.
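
This effect can be quantified. For components that are
all required (a serial composition), the individual
availabilities multiply, so every extra component lowers
the availability of the whole. A minimal sketch with
illustrative numbers:

```python
from math import prod

# Serial composition: the system is up only when every component is up,
# so overall availability is the product of the component availabilities.
component_availabilities = [0.999, 0.999, 0.999, 0.999]  # four 99.9% parts
print(f"System availability: {prod(component_availabilities):.4%}")  # 99.6006%
```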
Availability patterns

A single point of failure (SPOF) is a component
in the infrastructure that, if it fails, causes
downtime to the entire system. SPOFs should
be avoided in IT infrastructures as they pose a
large risk to the availability of a system.
Redundancy

Redundancy is the duplication of critical components in
a single system, to avoid a SPOF.
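
Redundancy works because the system is only down when
all duplicated components are down at the same time, so
the unavailabilities multiply. A minimal sketch with
illustrative numbers:

```python
def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant components: 1 - product of unavailabilities."""
    unavailability = 1.0
    for a in availabilities:
        unavailability *= 1.0 - a
    return 1.0 - unavailability

# Two redundant 99% components together reach 99.99%.
print(f"{parallel_availability(0.99, 0.99):.4%}")  # 99.9900%
```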
Failover

Failover is the (semi)automatic switch-over to a standby system
(component), either in the same or in another
datacenter, upon the failure or abnormal termination of
the previously active system (component).
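
As an illustration of the idea only (not a production
design), the sketch below polls the active system's
health endpoint and promotes the standby after a number
of consecutive failed checks. The URLs and the threshold
are assumptions:

```python
import time
import urllib.request

ACTIVE = "http://primary.example.internal/health"   # hypothetical endpoints
STANDBY = "http://standby.example.internal/health"
FAILURE_THRESHOLD = 3  # consecutive failed checks before switching over

def is_healthy(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:  # connection errors and timeouts count as unhealthy
        return False

active = ACTIVE
failures = 0
while True:
    if is_healthy(active):
        failures = 0
    else:
        failures += 1
        if failures >= FAILURE_THRESHOLD and active == ACTIVE:
            active = STANDBY  # the failover: route requests to the standby
            print("Failover performed: standby system is now active")
    time.sleep(5)
```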
Fallback

Fallback is the manual switchover to an identical
standby computer system in a different location,
typically used for disaster recovery. There are three
basic forms of fallback solutions:
Hot site
A hot site is a fully configured fallback datacenter, fully
equipped with power and cooling. The applications are
installed on the servers, and data is kept up-to-date
to fully mirror the production system.
Warm site

A warm site could best be described as
a mix between a hot site and a cold site.
Cold site
A cold site differs from the other two in
that it is ready for equipment to be
brought in during an emergency, but no
computer hardware is available at the
site.
Business Continuity
Management

BCM is not about IT alone. It includes managing
business processes, and the availability of people and
workplaces in disaster situations. It includes disaster
recovery, business recovery, crisis management, and
incident management.
Disaster Recovery Planning

A Disaster Recovery Plan (DRP) contains a set of
measures to take in case of a disaster, when (parts of)
the IT infrastructure must be accommodated in an
alternative location.
RTO and RPO

Two important objectives of disaster recovery planning
are the Recovery Time Objective (RTO) and the
Recovery Point Objective (RPO). Figure 14 shows the
difference.
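
The RTO is the maximum time that may pass between a
disaster and the moment the systems are up and running
again; the RPO is the maximum amount of data loss,
measured in time, that is acceptable. A small worked
check with illustrative numbers:

```python
from datetime import timedelta

# Illustrative targets and measurements (assumptions, not from Figure 14).
rpo = timedelta(hours=4)   # at most 4 hours of data may be lost
rto = timedelta(hours=8)   # service must be restored within 8 hours

backup_interval = timedelta(hours=1)        # backups run every hour
measured_restore_time = timedelta(hours=6)  # from the last recovery test

# Worst case, disaster strikes just before the next backup would run,
# so the maximum data loss equals the backup interval.
print("RPO met:", backup_interval <= rpo)        # True
print("RTO met:", measured_restore_time <= rto)  # True
```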
