CIT 218 - Systems Administration and Maintenance - Unit 4

Unit 4
Maintaining Systems in the Enterprise
Objectives:
At the end of the unit, the student must have:
• described processes for updating software and hardware in an organization;
• interpreted various assessments for systems; and
• discussed backup strategies within an organization.
4.1 Introduction
Systems need ongoing maintenance. Understanding what to maintain is key to
providing a well-running service or system.
4.2 Updating Systems in the Enterprise
There are many strategies for updating systems. It could be hardware or it could be
software, but things need to be upgraded. So, I'll discuss a little bit about why we need
to upgrade and update systems, including for newer technology, and discuss the
strategies surrounding that. We're also going to discuss change management and why
we need it in place when updating systems. Upgrading or updating enterprise systems
is necessary for many reasons. We might have bugs that are reported in software, we
may have hard drives or other hardware that are failing, or we may have security risks.
All of these are things that need to be updated or upgraded.
How often do you look at your phone and notice that you have five or six apps
that need to be updated? There is a constant need for patching bugs and
vulnerabilities and for adding new features.
So, updating is a normal process in information technology in general.
Many other services could also be affected by updating. Mail servers, web servers,
video streaming, and ERP systems, for example, are all things that you are going to
have to upgrade, or that might be impacted by an upgrade, at some point in time.
Systems get old. For enterprise hard drives, for example, we look at the mean time
between failures (MTBF) as a reliability factor. When a hard drive is rated at an MTBF of
a million hours, that means it is a very reliable hard drive. A million hours is a long time
for a hard drive to be operating under load, but it does not mean that a single drive will
last that long. It is a statistical measurement across a large group of hard drives: the
total hours the drives run divided by the number of drives that die.
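To make the MTBF figure more concrete, here is a minimal sketch that turns an MTBF rating into a rough number of drive failures to expect per year. It assumes drives run 24 hours a day and fail at a constant rate, which is a simplification, and the function names are my own for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, assuming drives run around the clock


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Approximate fraction of drives expected to fail in one year.

    Simplifying assumption: a constant failure rate, so the yearly
    fraction is roughly hours-per-year divided by the MTBF.
    """
    return HOURS_PER_YEAR / mtbf_hours


def expected_failures_per_year(drive_count: int, mtbf_hours: float) -> float:
    """Rough number of drive failures per year across a whole fleet."""
    return drive_count * annualized_failure_rate(mtbf_hours)


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 drives rated at one million hours MTBF.
    print(expected_failures_per_year(1000, 1_000_000))
```

With 1,000 such drives, the sketch estimates roughly nine failures a year, which is why even very reliable drives still need a replacement plan.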
Let's talk about upgrade strategies.
There are many upgrade strategies out there. Software typically comes with
automatic updating, like on your phone. Say you have an Apple iPhone or an Android
phone and you see that there's an update. Apple regularly has updates, and Android
regularly has updates. I notice that I usually have the latest and greatest software on both
devices because they have an automated installation policy. Careful analysis needs to be
applied to that policy, however. If those are critical systems, maybe we shouldn't upgrade at
a certain time. For example, say we're using an iPad as the remote that controls a certain
portion of a room, like video streaming. What happens if you upgrade and it's no longer
compatible with your streaming device? Package managers for operating systems,
especially for Linux, look at the compatibility of all the software that is being installed. That
doesn't guarantee that things are going to be compatible, but they should be. Operating
systems should allow us to make sure things are compatible.
On Windows, it's hit and miss: some software is compatible and some isn't, although the
majority of the time software is going to be compatible. Hardware is more difficult to upgrade
or update, because a lot of the enterprise hardware that we are running supports very large
systems. So, if we must replace a hard drive in a system, we may need to schedule some
downtime for that service.
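One way to picture "maybe we shouldn't upgrade at a certain time" is a simple rule: non-critical systems may update whenever, but critical systems only update inside a scheduled maintenance window. The sketch below is only an illustration; the window hours, the System class, and the system names are all assumptions of mine, not a real policy.

```python
from dataclasses import dataclass
from datetime import datetime, time

# Hypothetical maintenance window: 01:00 to 04:00 local time.
WINDOW_START = time(1, 0)
WINDOW_END = time(4, 0)


@dataclass
class System:
    name: str
    critical: bool  # True if an outage would disrupt the business


def in_maintenance_window(now: datetime) -> bool:
    """Return True if 'now' falls inside the scheduled window."""
    return WINDOW_START <= now.time() < WINDOW_END


def may_update_now(system: System, now: datetime) -> bool:
    """Non-critical systems update any time; critical ones wait for the window."""
    if not system.critical:
        return True
    return in_maintenance_window(now)


if __name__ == "__main__":
    afternoon = datetime(2024, 1, 1, 14, 0)
    print(may_update_now(System("lab-pc", critical=False), afternoon))
    print(may_update_now(System("mail-server", critical=True), afternoon))
```

A rule like this does not replace testing an update for compatibility first; it only controls when the update is allowed to happen.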
Change management is important when it comes to upgrading, updating, or changing
anything in a system.
Why do we have change management in place?
It's about optimizing the overall business risk. Changes should be implemented in a way
that lessens the risk that the update, upgrade, or change negatively affects the system.
Achieving success on the first attempt is one of the outcomes that we hope to have when
we implement change management. We lessen the impact if we understand what
systems the change will impact.
So, let's talk about the definition of a change first. We have change management in
place at this university for IT. Several years ago we developed a definition of what
we believe a change is:
The addition, modification, or removal of anything that could influence IT services and
production.
The scope should include changes to all architectures, processes, tools, metrics, and
documentation, as well as changes to IT services and other configuration items.
Effective change management ensures that risks are considered, timelines are reviewed,
backup plans are established, stakeholders are met with to discuss timelines, and the
post-implementation review is discussed: what went wrong, and what went right with the
change? It makes sure everybody is informed, and communication is extremely important
when we're upgrading or updating systems. So, in conclusion, change is inevitable.
We upgrade things, we downgrade things, we put new hard drives in, we increase the
RAM of a server. All these things are generally updates driven by business requirements.
If we plan for those changes to happen, we are far more likely to succeed.
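The items that effective change management checks (risks, timelines, backup plans, stakeholders) can be written down as a simple change record. The sketch below is one possible shape for such a record, not the form we use at the university; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeRequest:
    """Hypothetical change record covering the items discussed above."""
    summary: str
    affected_systems: List[str] = field(default_factory=list)
    risks: List[str] = field(default_factory=list)
    scheduled_for: str = ""       # timeline, e.g. a date and time
    backup_plan: str = ""         # how to roll back if the change fails
    stakeholders_informed: bool = False

    def missing_items(self) -> List[str]:
        """List what still has to be filled in before the change goes ahead."""
        missing = []
        if not self.affected_systems:
            missing.append("affected systems")
        if not self.risks:
            missing.append("risks")
        if not self.scheduled_for:
            missing.append("timeline")
        if not self.backup_plan:
            missing.append("backup plan")
        if not self.stakeholders_informed:
            missing.append("stakeholder communication")
        return missing


if __name__ == "__main__":
    change = ChangeRequest(summary="Increase RAM on the web server")
    print(change.missing_items())
```

A change with an empty list from missing_items() has at least been thought through; the post-implementation review still happens after the change is made.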

4.3 When Something Goes Wrong
Something is always going to go wrong; it's how you react to it that makes a
difference. So, I'll discuss Incident Response Plans, how organizations may react to
a disaster or an outage, and a little bit about what we should be looking at during those
incidents.
Going back to the planning for a disaster lesson: something is always going to
happen. It may not be a natural disaster, but your website could get attacked, files may go
missing, or theft may happen.
Scenario 1:
“About 12 years ago I was working at a company where we had two data thefts. They
broke through the doors and stole laptops, and they stole one server. I don't know why
they stole just one.”
But stuff like that happens. We need to make sure that we plan for this kind of thing,
just like I talked about in the planning for disaster lesson.
Disasters are inevitable, so how we deal with those incidents makes a huge
difference. What is an incident?
An incident is whenever a user is not receiving the expected level of service from an IT
service. The expected level of service could be based on a service level agreement, for
example, and in an incident we're not meeting that service level.
A major incident or outage could be defined as a significant event which demands a
response beyond the normal routine, resulting from an uncontrolled development in the
course of the operation of any establishment or transient work activity.
We developed a major incident response plan, and that's how we defined major incidents
or outages. In another organization, an incident may mean something completely different,
and an outage may mean something completely different as well.
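To show what "not meeting that service level" can look like in numbers, here is a small sketch that compares measured availability against a service level agreement target. The 99.9 percent target and the 30-day month are example values I picked, not figures from any real agreement.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the period during which the service was up."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_met(total_minutes: float, downtime_minutes: float,
            target_percent: float) -> bool:
    """Return True if measured availability reaches the agreed target."""
    return availability_percent(total_minutes, downtime_minutes) >= target_percent


if __name__ == "__main__":
    minutes_in_month = 30 * 24 * 60  # hypothetical 30-day month
    # Example target of 99.9 percent and one hour of downtime.
    print(sla_met(minutes_in_month, 60, 99.9))
```

At a 99.9 percent target, a 30-day month allows only about 43 minutes of downtime, so a single one-hour outage already misses the service level.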
Let’s go to incident response plans. According to SANS, an Incident Response Plan should
include:
• Preparation;
• Identification;
• Containment;
• Eradication;
• Recovery; and
• Lessons learned.
That's what a good response plan should contain. The goal is to minimize damage,
and part of that could be communication. If we tell users that we had an outage, or that we
made a mistake, it's going to go much better than if we don't tell people. Not only may we
have damage to systems, we may also have damage to reputation because of an incident.
So, the more we communicate what is going on, the less the damage could be.
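The six phases listed above are worked through in order, so one simple way to track where a response currently stands is an ordered list of phases. This is just a sketch of that idea; the Phase enum and the next_phase helper are names I made up for illustration.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    """The six incident response phases, in the order listed above."""
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows 'current', or None after the last one."""
    if current is Phase.LESSONS_LEARNED:
        return None
    return Phase(current + 1)


if __name__ == "__main__":
    phase: Optional[Phase] = Phase.PREPARATION
    while phase is not None:
        print(phase.name)
        phase = next_phase(phase)
```

In practice the lessons learned feed back into preparation for the next incident, so the sequence is better thought of as a cycle than a one-time checklist.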
We need to understand our critical systems and identify which ones are mission critical.
To do that, we look at what somebody is using on a day-to-day basis. Can they
do their job well? We also need to identify the support structure for those services. For
example, in my Incident Response Plan I may have the network, which is defined as
firewalls, switches, routers, connections to data centers, and connections to mission
critical buildings. For any disruption of those services, I'm going to develop a response
plan for that service.
How do I get that service up and running as soon as possible?
What about power disruptions? Power disruptions could impact data centers, for
example.
And what is a data center running? Well, it could be running authentication.
Sure, your data may be out in the Cloud, but you're still relying on onsite systems to
get to that information.
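The point about cloud data still relying on onsite authentication is really a point about dependencies. A minimal sketch of that idea: record what each service depends on, then ask which services are impacted when one component fails. The service names and the dependency map below are hypothetical examples, not an inventory of any real environment.

```python
from typing import Dict, List, Set

# Hypothetical map: each service and the components it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "authentication": ["data-center-power", "network"],
    "cloud-file-storage": ["authentication", "network"],
    "email": ["authentication", "network"],
}


def impacted_services(failed: str) -> Set[str]:
    """Return every service that directly or indirectly depends on 'failed'."""
    impacted: Set[str] = set()
    changed = True
    while changed:  # repeat until no new impacted service is found
        changed = False
        for service, deps in DEPENDS_ON.items():
            if service in impacted:
                continue
            if any(dep == failed or dep in impacted for dep in deps):
                impacted.add(service)
                changed = True
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_services("data-center-power")))
```

In this example a power disruption in the data center takes out authentication, and with it the cloud file storage and email that rely on authentication, even though their data lives offsite.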
An Incident Response Plan also defines the roles. Roles are critical to make sure that
people know their job during an incident. Your role may be the same as the job that you
currently have in system administration, or it may be different: as a system administrator
you may be managing communications, for example.
Understanding how those roles interact, and testing those interactions, will go a long
way in an incident. Communication during outages is critical, and I can't stress this
enough. People want to know what's going on. That's why news anchors like to say
repeatedly what is going on: to inform the public as to what's happening. The
communications officer role within an Incident Response Plan should be defined.
Also, if your organization has a communications officer or public relations
representative, their involvement is important as well. We may not want to say something
to the media that may get us in trouble.
We also need to identify logistics during an incident. Logistics could be things like:
who's going to get sleep? What if I have a disaster on campus and I need to work on
the network for over 24 hours straight? You're going to have to get sleep somewhere in there,
so let's identify who's on call at what time.
Who takes over for whom?
What about purchasing?
How about food? Identifying all of those in a major Incident Response Plan, or even in a
procedure that is not public, will get you far. So, in conclusion, an Incident Response Plan is
critical for your success during an incident. It may not prevent the damage, but it may
lessen it.
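For the on-call question raised above (who is on call at what time, and who takes over for whom), a rotation can be worked out ahead of time. The sketch below splits a long incident into fixed-length shifts and assigns staff in turn; the eight-hour shift length and the staff names are assumptions for illustration only.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Shift = Tuple[str, datetime, datetime]  # (person, shift start, shift end)


def build_rotation(staff: List[str], start: datetime,
                   total_hours: int, shift_hours: int = 8) -> List[Shift]:
    """Split an incident into shifts and assign staff in round-robin order."""
    shifts: List[Shift] = []
    elapsed = 0
    index = 0
    while elapsed < total_hours:
        # The final shift may be shorter than a full shift.
        length = min(shift_hours, total_hours - elapsed)
        begin = start + timedelta(hours=elapsed)
        end = begin + timedelta(hours=length)
        shifts.append((staff[index % len(staff)], begin, end))
        elapsed += length
        index += 1
    return shifts


if __name__ == "__main__":
    # Hypothetical 30-hour incident covered by three administrators.
    rotation = build_rotation(["Ana", "Ben", "Cy"], datetime(2024, 1, 1, 8, 0), 30)
    for person, begin, end in rotation:
        print(person, begin, end)
```

Each shift ends exactly where the next begins, which answers "who takes over for whom" before the incident starts rather than in the middle of it.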
Essay-Quiz
1. Why do we need systems maintenance? What makes it so essential in the organization?
Laboratory 4.1
▪ Cite a scenario in which you could create and apply an Incident Response Plan (follow
the sequence of the SANS incident response plan).
