
Unit 3
Directory Services & Data Recovery and Backups

Objectives:
At the end of the unit, the student must have:
 understood what services a directory server provides;
 understood what LDAP and Active Directory are;
 discussed the tradeoffs between on-site and off-site backups;
 explored what characteristics to evaluate when designing a backup system;
 defined the value and importance of backup and recovery testing;
 determined the different options for data backup and the risks that each one protects against; and
 understood the purpose and contents of a disaster recovery plan.

3.1 Introduction to Directory Services
Have you ever looked up someone's phone number in a phone directory? Or used a directory listing at a shopping mall to find a specific store? A directory server essentially provides the same functionality. A directory server contains a lookup service that provides mapping between network resources and their network addresses. It's used to organize and look up organizational objects and entities, ranging from things like user accounts, user groups, and telephone numbers to network shares. Instead of managing user accounts and computer information locally on every machine, all that information can be stored on a directory server for easy access and management. The ideal enterprise-quality directory server should support replication.
This means that the stored directory data can be copied and distributed across a number of physically distributed servers, but still appear as one unified data store for querying and administering. Why is replication important? It provides redundancy by having multiple servers available simultaneously, so there'll be minimal disruption to the service in the event that one of the servers explodes. Replication also decreases latency when you access the directory service. By having replicas of your directory server located in each office, you're able to answer directory service queries more quickly. The directory service should also be flexible, allowing you to easily create new object types as your needs change. The information stored in the directory server should be accessible from a variety of OS types and from the designated areas of the corporate network.
Directory services are useful for organizing data and making it searchable for an organization. This is achieved through the use of a hierarchical model of objects and containers. The containers are referred to as organizational units, or OUs, and they can contain objects or more organizational units. This is similar in organizational structure to a file system. OUs are like folders, which can contain individual files, or objects, for a directory service. OUs can also contain additional folders. The management benefits of this structure are pretty clear. Can you imagine trying to keep your music library organized if there was no such thing as subfolders? Crazy.
This hierarchical structure can be used to convey additional information about what's stored within. Take your directory structure as an example. You may have an OU called users, which contains all user accounts. Within this OU, there could be additional OUs which represent the actual team structure of your organization. The users OU could contain additional OUs like sales, engineering, and marketing, which include the user account objects for the individuals that belong to these teams.
This structure can be used to convey differences between these sub-OUs and their users. For example, we could enforce stricter password requirements for members of engineering without affecting sales or marketing. Sub-OUs inherit the characteristics of their parent OU, so any changes made to the higher-level users OU would affect all sub-OUs, including sales, marketing, and engineering. A small sketch of this inheritance idea is shown below.
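To make OU nesting and inheritance more concrete, here's a minimal sketch in Python. It isn't how any particular directory product stores its data; the OU names and the password-length setting are just illustrative values based on the example above.

```python
# A minimal sketch (not tied to any particular directory product) showing how
# a setting applied to a parent OU is inherited by its sub-OUs unless a
# sub-OU overrides it.

directory = {
    "users": {
        "policy": {"min_password_length": 8},
        "sub_ous": {
            "sales": {"policy": {}, "sub_ous": {}},
            "engineering": {"policy": {"min_password_length": 16}, "sub_ous": {}},
            "marketing": {"policy": {}, "sub_ous": {}},
        },
    }
}

def effective_policy(tree, path):
    """Walk from the top-level OU down to the target OU, letting each
    level override the settings inherited from its parent."""
    policy = {}
    node = {"sub_ous": tree, "policy": {}}
    for ou in path:
        node = node["sub_ous"][ou]
        policy.update(node["policy"])
    return policy

print(effective_policy(directory, ["users", "sales"]))        # inherits 8 from users
print(effective_policy(directory, ["users", "engineering"]))  # overridden to 16
```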
Someone with the responsibilities of a systems administrator, whether that's a sysadmin or an IT support specialist, will typically administer the directory server. Standard OS management tasks are involved here, like ensuring that updates are installed and configuring standard services. Other responsibilities include the installation and configuration of the directory service itself, so installing the service and configuring any related services. If multiple servers are used in a replication setup, this needs to be configured too. It's very likely that the hierarchy and overall structure of the directory itself would also be up to the sysadmin to design and implement. Well, that covers the high-level overview of what exactly a directory service is. We'll dive deeper into more specific details later in this unit.

3.1.1 What is a Directory Server?


Directory services became an open network standard for interoperability among different software
vendors. In 1988, the X.500 directory standard was approved and included protocols like Directory Access
Protocol or DAP, Directory System Protocol or DSP, Directory Information Shadowing Protocol or DISP,
and Directory Operational Bindings Management Protocol or DOP.
Alternatives to DAP were designed to allow clients to access the X.500 directory. The most popular of these alternatives was the Lightweight Directory Access Protocol, or LDAP. Since these are open standards for
communication and access for directory services, a bunch of different implementations of these services
cropped up. There are offerings from Apache, Oracle, IBM, and Red Hat, but we'll cover two in more detail
later in this module.
The first is Microsoft's implementation, which is referred to as Active Directory or AD. It has some
customization and added features for the Windows platform. There are also open source implementations of
directory services using LDAP. A popular example of this is OpenLDAP, which we'll also cover in greater
detail. OpenLDAP supports a wide range of platforms like Windows, Unix, Linux and various Unix
derivatives.
In addition to the server software, there are also client tools used for accessing and administering a directory server. Microsoft offers Active Directory Users and Computers, or ADUC, which works well with Microsoft's Active Directory server. There are also other, more open tools that can be used to interface with a lot of other directory server implementations. Along with clients for administering and managing a directory server, there are also client applications that can interface with and query a directory server.
All major OS platforms support integrating into a directory server for login and authentication purposes. The advantage here is that this allows for centralized management of user accounts. We'll cover the details of centralized management in the next lesson, so don't worry too much about that right now. When looking at specific implementations for your directory server, you'll want to consider OS support: not just the OS of the server that'll be running the directory service itself, but also what OSes your client fleet runs and their compatibility or support for your directory service. You can read more about why this is important in the next reading.

3.2 Centralized Management


The job of the systems administrator is to, well, administer systems. Sysadmins have a set of systems
they're responsible for. And they have to manage those systems so they're available to serve their function to the
organization. For example, as a sysadmin, I might be responsible for making sure that all of the servers in my
network are kept up to date with security patches and application updates. Should I go around and log into each
server, checking each one at a time? What if I need to manage user accounts on end user devices? Should I go
to each employee's desk and set their account up that way? I guess I could, but that would be super time-
consuming, and probably inconsistent.
Instead, what I want to do is use centralized management, a central service that provides instructions
to all of the different parts of my IT infrastructure. Directory services are one of these services. Remember in
earlier lessons when you created accounts and gave them access to resources on your computer? Imagine that
you worked for an organization that has dozens, hundreds, or even thousands of computers and people who use
them. You can't possibly go into each of those computers to set them up. Directory services provide centralized authentication, authorization, and accounting, also known as AAA.

When computers and applications are configured to use directory services, or AAA services, decisions
about granting or denying access to computers, file systems, and other IT resources are now centralized. Now
you can create a user account once, and it's available for the entire network at once. Easy, well, sort of. For now
you should understand that your directory service will be responsible for granting or denying access to
computers, file systems, and other IT resources.
Now let's go one step further. Let's say you have a network file system that you need to give everyone
in the IT department access to. You could set up the network share, then give it a list of user accounts to grant
access to the share. But what happens when someone new joins the IT department? What about when someone
leaves? Instead of granting access based on who you are, what if you granted access based on what you do?
In most organizations, access to computer and network resources is based on your role in the
organization. When you manage access to resources on a computer and on the network, you'll often grant and
deny access based on user groups. User groups can be used to organize user accounts in all sorts of ways. You
might create groups based on the buildings that people work out of, or the person's role in the organization, or really
almost anything else. What's important is that you use groups to organize accounts based on the way that you'll
manage them. If you are a systems administrator, then you might have permission to do things like creating user
accounts and resetting passwords. You are allowed to do that because of your role as a systems administrator.
If you add another systems administrator to your organization, you don't want to have to find out all of
the things that a sysadmin should have access to, then grant them individual account access to each of those
resources. That would just take forever. Instead, we'll create a group of sysadmins and add all system
administrators to that group. Then we can give the systems administrators' group access to any resources they
need. If you or another person change roles in the company, then all you have to do is change the groups that
you are a part of, not the rights that you have to directly access resources.
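Here's a minimal sketch of that group-based approach, which the next paragraph names role-based access control. The group names, user names, and actions are hypothetical, purely for illustration.

```python
# A minimal sketch of granting access through groups rather than individual
# accounts. All names here are made up for illustration.

group_members = {
    "sysadmins": {"alice", "bob"},
    "it-support": {"carol"},
}

resource_access = {
    "reset-passwords": {"sysadmins", "it-support"},
    "create-user-accounts": {"sysadmins"},
}

def is_allowed(user, action):
    """A user may perform an action if any group they belong to has been
    granted access to that action."""
    user_groups = {g for g, members in group_members.items() if user in members}
    return bool(user_groups & resource_access.get(action, set()))

print(is_allowed("carol", "reset-passwords"))       # True
print(is_allowed("carol", "create-user-accounts"))  # False

# Adding a new sysadmin is a single group change, not a per-resource grant:
group_members["sysadmins"].add("dana")
```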
We call this role-based access control, or RBAC. Controlling access to resources isn't all you can do.
You can also centralize configuration management. Just like you don't want to run around to every computer to
configure user accounts, you wouldn't want to do that to set up printers, configure software, or mount network
file systems. By centralizing the configuration management of your computers and software, you can create
rules about how things should work in your organization. There are many ways to centralize your configuration
management. And an easy way to get started is with as simple a tool as logon scripts that run each time
someone logs on to a computer. Later in this module, we'll look at Active Directory and its group policy
objects, which are a way to manage the configuration of Windows machines. There are also dedicated
configuration management frameworks, like Chef, Puppet or SCCM, that can be used for super simple or super
powerful configuration management. These are outside the scope of this module, so check out the internet.

3.3 What is LDAP?


Before we jump further into directory services, let's talk about the underlying protocol that's used in directory services, called LDAP, or the Lightweight Directory Access Protocol. LDAP is used to access information in directory services over a network. Two of the most popular directory services that use LDAP are Active Directory and OpenLDAP, which we'll talk about more in upcoming lessons. There are a lot of different operations you can use in LDAP. You can add a new entry in the directory server database, like creating a new user object called Cristi. You can delete an entry in the directory server database. You can modify entries, and much, much more.
When we say entry, we're referring to the LDAP entry format, or LDAP notation, for records in the directory service. An LDAP entry is just a collection of information that's used to describe something. Take a look at the example below. Don't worry too much about what it says. The format of an LDAP entry basically has a unique entry name denoted by dn, or distinguished name, then attributes and values associated with that entry. So, CN is the common name of the object; in this case, since it's a person, we use Devan Sri-Tharan as the name. OU is the organizational unit, such as a group; in this case, Sysadmin is used. DC is the domain component; so example.com is split into example then com. Again, it's not necessary to remember these attributes. You can reference them in the next reading. The takeaway here is that LDAP notation is used for entries in directory services to describe attributes using values.
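As a concrete illustration, here's a minimal sketch of how the entry described above could be created programmatically. It assumes the third-party Python ldap3 library and a hypothetical server and admin account (neither is named in the text); the point is just how the dn is built from the cn, ou, and dc attributes.

```python
# A minimal sketch using the third-party ldap3 library; server name, admin
# account, and password are hypothetical placeholders.
from ldap3 import Server, Connection

server = Server("ldap.example.com")
conn = Connection(server, user="cn=admin,dc=example,dc=com",
                  password="not-a-real-password")
conn.bind()

# The distinguished name (dn) combines the common name (cn), the
# organizational unit (ou), and the domain components (dc).
dn = "cn=Devan Sri-Tharan,ou=Sysadmin,dc=example,dc=com"
conn.add(dn, "inetOrgPerson", {
    "cn": "Devan Sri-Tharan",   # common name
    "sn": "Sri-Tharan",         # surname, required by inetOrgPerson
    "ou": "Sysadmin",           # organizational unit
})
conn.unbind()
```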

3.3.1 LDAP Authentication
If you were around when phone books were used, you might remember that these big old books contained
the names, addresses, and phone numbers of people in your neighborhood or community, who wanted the
information to be publicly listed. This is way different from the phone book or contact list you have in your
mobile phone. The people who are in your contacts directory gave you their phone numbers for your use only.
When using LDAP, there are different authentication levels that can be used to restrict access to
certain directories, similar to those big public phone directories or those private mobile phone directories.
Maybe you have a directory that you want to make public so anyone can read the entries in the directory or
maybe you just want to keep that data private to only those who need it. We'll discuss how LDAP does this
authentication and what methods it uses.
We talked about the different operations you can do with LDAP, like add, remove or modify entries in
a directory. Another operation that you can perform is the Bind operation, which authenticates clients to the
directory server. Let's say you want to log in to a website that uses a directory service. You enter your account login information and password, and that information is then passed along to the directory server. The website uses LDAP to check whether that user account exists in the directory and whether the password is valid. If it's valid, then you will be granted access to that account. You want your data to be protected and encrypted while it's completing this process.
There are three common ways to authenticate. The first is anonymous, then simple, and the last is
SASL, or Simple Authentication and Security Layer. When using anonymous binding, you aren't actually authenticating at all; depending on how it's configured, anyone could potentially access that directory, just like our public phone book example. When you use simple authentication, you just need the directory entry name and password; this is usually sent in plain text, meaning it's not secure at all. Another authentication method
that's commonly used is SASL authentication.
This method can employ the help of security protocols like TLS. SASL authentication requires the
client and the directory server to authenticate using some method. One of the most common methods for this
authentication is using Kerberos. Kerberos is a network authentication protocol that is used to authenticate user
identity, secure the transfer of user credentials, and more. Kerberos by itself can be a complex topic that we'll
revisit in the IT security course. If you want to learn more about Kerberos right now, you can check out the internet for that. Once the client has successfully authenticated with the LDAP server or directory service, the user will be authorized to use
whatever access levels they have. In the next few lessons we're going to dive deeper into two of the most
popular directory services that use LDAP, Active Directory and OpenLDAP.
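To make the bind operation more concrete, here's a minimal sketch of anonymous and simple binds, again assuming the third-party Python ldap3 library and a hypothetical server name.

```python
# A minimal sketch of LDAP bind types; the server and credentials are
# hypothetical placeholders.
from ldap3 import Server, Connection, ANONYMOUS, SIMPLE

server = Server("ldap.example.com", use_ssl=True)   # encrypt the connection

# Anonymous bind: no credentials at all.
anon = Connection(server, authentication=ANONYMOUS)
anon.bind()

# Simple bind: a distinguished name and a password. Without TLS/SSL the
# password would travel in plain text, which is why encryption matters here.
simple = Connection(server,
                    user="cn=Devan Sri-Tharan,ou=Sysadmin,dc=example,dc=com",
                    password="not-a-real-password",
                    authentication=SIMPLE)
print(simple.bind())   # True if the directory accepted the credentials

# SASL binds (for example Kerberos via the GSSAPI mechanism) are also
# supported by ldap3, but require additional system configuration.
```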

3.4 Planning for Data Recovery


Have you ever had something unexpected and catastrophic happen to a piece of tech you owned? Maybe you dropped your cellphone and it shattered, or spilled a glass of water all over your laptop. I know it's happened to me, and no, it's not fun. While hardware may be hard to replace,
data can be even harder to get back, especially lost photos of special moments, important documents and more.
In this module we're going to arm you with tools you need to be prepared for these accidents, so you can
minimize the impact and loss of data. All right, let's get right into it,
data recovery.

3.4.1 What is Data Recovery?


What exactly is data recovery? If you've ever broken a cell phone, you probably lost some good
pictures along with the phone itself. Data recovery is a process of attempting to recover the data that's lost
from the broken phone. But this is just one example of attempting to recover from unexpected data loss. Data
recovery, in general terms, is the process of trying to restore data after an unexpected event that results in
data loss or corruption.
Maybe a device that contains data was physically damaged, or an attacker performed malicious actions,
or malware deleted critical data. Whatever the cause, the effect is the same: you suddenly lost some really important data and you need to figure out how to get it back. How you go about trying to restore this lost data
depends on a few factors.
One is the nature of the data that was lost. If a device has been damaged, you might be able to recover data from the damaged hardware. This could involve using data recovery software, which can analyze failed hard disks or flash drives and try to locate and extract data files. Another factor that would affect your data recovery is the
presence of backups. If you're lucky, or you had the foresight to plan for the unexpected, you have data backed
up. And you can restore the data that was lost. Data recovery is an important part of an IT system or
organization.
Since data is a critical component of any business operation, as an IT support specialist, part of your role
is to ensure that this data is available and protected from corruption or loss. So if something goes wrong, the
organization can continue with their business operations with minimal disruptions. That's why it's critical to be
able to recover from unexpected events that could impact your business data. When an unexpected event
occurs, your main objective is to resume normal operations as soon as possible, while minimizing the disruption
to your business functions.
By the end of this module, you'll have practical tools and methods that you can use to protect your
data. One of the most important techniques you'll learn is how to effectively back up your data. The best way to be prepared for a data-loss event is to have a well-thought-out disaster plan and procedure in place.
Disaster plans should involve making regular backups of any and all critical data that's necessary for
your ongoing business processes. This includes things like customer data, system databases, system configs
and financial data. You'll learn more about how to design and implement a data disaster plan throughout this
module. And lastly, you'll learn more about what IT folks call a post-mortem.
Imagine that something did go wrong with your systems and you had to use a disaster plan. You might
have discovered issues when recovering your data that weren't covered in the disaster plan. A post-mortem is a
way for you to document any problems you discovered along the way, and most importantly, the ways you
fixed them so you can make sure they don't happen again. Being unprepared for a major data loss event can and
has severely impacted businesses. In the upcoming lessons, you'll learn how to prepare for data loss, which is a
key part of any IT role. If you're interested in hearing more about how real companies have been impacted by
unexpected data loss, check out the supplementary reading. Otherwise, we're going to kick-start our journey of data recovery by learning how to back up data. Ready? Let's get started.

3.4.2 Backing up your Data


So, you want to protect your organization from critical data loss. Good instincts. But where do you
start? Let's run down some of the key things to keep in mind when designing a data backup and recovery plan.
The first thing to figure out is what data you need to backup.
In a perfect world, you should only be backing up data that's absolutely necessary for operations and can't be found in another source. So, things like e-mails, sales databases, financial spreadsheets, server configurations, and databases should all be included. But what about the downloads folder on your laptop? Is it really necessary to back up all those cat pictures too? Probably not. Backing up data isn't free. Every additional file you back up takes up a little more disk space, increasing the overall cost of your backup solution.
Once you've figured out what data you'd like to back up, you should find out how much total data you currently have. But it's not enough just to think about what your backup storage requirements are right now. Your organization may continue to grow, and your backup needs should grow with it. Make sure that you account for future growth and choose a solution that's flexible enough to easily accommodate increases in data
backups.
Data can be backed up either locally to systems on site, or the backup data can be sent off-site to remote
systems. Both approaches have positives and negatives, and can help reduce different risks. The advantage of
onsite backup solutions is that the data is physically very close. This makes accessing the data a lot quicker.
You won't need as much outbound bandwidth since you aren't sending the data out of your internal network. If
you need to restore data from backups, that should happen pretty quickly since the data is close at hand. But what if the unexpected event is a building fire? Now the systems we were backing up, along with the backup server, have been lost in the fire. Yikes. We've lost everything.
This is why offsite backups are strongly recommended. This involves making backups of critical data
and sending the backup data offsite to remote systems in a different physical location. This could be another
backup server that you control in a different office, or a cloud hosted backup service. But there are trade-offs.
Offsite backups better prepare us for catastrophic events that can wipe out data from an entire office, but
sending data offsite means that you need to transmit the data outside of your network.
This means you need to consider things like encryption and bandwidth. Your internet connection will
be used to transmit the backup data. Depending on how much data you're sending off site and how fast the
internet connection is, this could take a lot of time. Another important thing to consider is encryption of
backups. Since backups will often contain sensitive and confidential business data, it's important the data is
handled securely and stored in a way that prevents unauthorized access. When sending data off-site, it's especially important to make sure that data is being transmitted securely, preferably encrypted via TLS. But that's not all.
The resulting backup data that's stored should also be encrypted at rest. This is just good security practice. In
the next lesson, we will discuss some of the practical tools you can use to back up your data.
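To get a feel for the bandwidth consideration mentioned above, here's a back-of-the-envelope sketch; the backup size and uplink speed are made-up example numbers.

```python
# A rough estimate of how long an off-site backup transfer might take.
# The data size and uplink speed are hypothetical example values, and the
# GB-to-megabit conversion uses the decimal (1000-based) approximation.

backup_size_gb = 500          # total data to send off-site
uplink_mbps = 100             # available outbound bandwidth, megabits/second

size_megabits = backup_size_gb * 8 * 1000   # GB -> gigabits -> megabits
transfer_seconds = size_megabits / uplink_mbps

print(f"~{transfer_seconds / 3600:.1f} hours")   # roughly 11.1 hours
```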

3.4.3 Backup Solutions


So you're looking to bring a backup solution into your organization. But how do you choose between a DIY backup system or one of the many cloud providers? Well, let's start by looking at
the tradeoffs between the two. On site, or self-managed backups, could be as simple as buying a commercial
NAS device, loading it with a bunch of hard drives, and sending data to it over the network. This would
definitely work, but it might not be the best long-term solution. How do you grow the disk capacity when you need more storage space? How do you handle a failed hard disk? Because hard disks will fail eventually. By the way, it's important to call out that these options aren't mutually exclusive.
There's nothing stopping you from implementing both on-site and offsite backups. Actually, it's often
recommended to have both if it's within your organization's budget. One thing that you should consider when evaluating the backup strategy for an organization is the backup time period.
How long do you need to hang on to backups for? This answer will impact your long term storage
needs and overall costs to maintain a backup system. One approach, which balances cost with convenience, is
to archive older data using a slower but cheaper storage mechanism. The standard medium for archival backup
data storage is data tapes. These are a lot like audio cassette tapes since they use spools of magnetic tape run
through machines that allow data to be
written to and read back from the tape.
Tape storage is pretty cheap, but isn't as easy or quick to access as data stored on
hard drives or solid-state drives. This storage system is usually used for long-term archival purposes, where data isn't likely to be needed; if it is needed, some delay in getting the data isn't a concern. There are dozens and dozens of backup solutions available, so we won't cover specific ones since there are way too many. But we'll cover some common tools and give you some examples of
backup solutions available.
One is the command line utility rsync. Rsync isn't explicitly a backup tool, but it's very commonly used as one. It's a file transfer utility that's designed to efficiently transfer and synchronize files between locations or computers. Rsync supports compression, and you can use SSH to securely transfer data over a network. Using SSH, it can also synchronize files between remote machines, making it super useful for simple automated backups, as the sketch below shows.
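Here's a minimal sketch of that kind of automated rsync-over-SSH backup; the paths and host name are hypothetical, and it assumes rsync and SSH key-based authentication are already set up.

```python
# A minimal sketch of calling rsync over SSH for a simple automated backup.
# The source path and destination host are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "rsync",
        "-az",              # archive mode plus compression
        "--delete",         # mirror deletions so the copy matches the source
        "/srv/important-data/",
        "backup-user@backup.example.com:/backups/important-data/",
    ],
    check=True,             # raise an error if rsync reports a failure
)
```

A job like this could be scheduled (for example with cron) to run nightly, which is about as simple as automated backups get.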
Apple has a first-party backup solution available for its Mac operating system called Time Machine. It operates using an incremental backup model. Time Machine supports restoring an entire system from backup or individual files. It even allows restoring older versions of backed-up files.
Microsoft also offers a first-party solution called Backup and Restore. This has two modes of operation: a file-based version, where files are backed up to a zip archive, or a system image, where the entire disk is saved block by block to a file. File-based backups support either complete backups or incremental ones. System image backups support differential mode, only backing up blocks on the disk that have changed since the last backup.

3.4.4 Testing Backups


There's one last super important topic when it comes to backups. Testing them. The field of IT is
littered with tragic tales of IT support specialists and sysadmins attempting to restore data from a backup after a
data loss incident, only to discover that their backups are invalid. That's not just embarrassing, it's completely
terrifying. The takeaway here is that it isn't sufficient to just set up regular backups. That's only half of the
equation. The other half is a recovery process and that process needs to be tested regularly.
Restoration procedures should be documented and accessible so that anyone with the right access can
restore operations when needed. You don't want your time off to be interrupted because your colleague back at
the office doesn't know how to restore the SQL database from the backup, right? Of course not. So, document
the procedure and make sure you regularly test the documentation to make sure it works now and in the future.
This process is called Disaster Recovery testing and is critical to ensuring a well functioning recovery system.
Disaster recovery testing should be a regular exercise that happens once a year or so. It should have
different teams, including IT support specialists, going through simulations of disaster events. They'll test and evaluate how well-prepared or unprepared your organization is to handle lots of unexpected events. These scenarios can be anything from a simulated natural disaster, like an earthquake, to a fictional event, like a horde of zombies shutting down an office. If that's the case, backups would be the least of your worries. But it's still important: whatever the scenario, it will help your IT teams test their emergency procedures and figure out what works and, most importantly, what doesn't. These simulated events are the perfect way to discover any gaps in
your planning. If you discover that you aren't protected from data loss in any given scenario, it's an opportunity
to learn and fix this gap without risking real data loss. Sounds like a win-win, doesn't it?

3.4.5 Types of Backups


So we've talked about how important backups are, and why you should be backing up any important
data. And some tools that can be used to help you back up data. But how exactly do you decide when, and how
to back up data? Well let's explore those options.
There's a couple of ways to perform regular backups on data that's constantly changing. You can do a
full backup on a regular basis, which involves making a copy of all the data to be backed up. The full, unmodified contents of all files to be backed up are included in this backup mechanism, whether the data was modified or not. In the case of data that doesn't change very often, like operating system configuration files, this approach can be inefficient. You're backing up multiple copies of data that isn't changing, which wastes space and uses bandwidth unnecessarily. That doesn't seem like the best idea, does it?
A more efficient approach is to only back up files that have changed or been created since the last full backup; this is called a differential backup. The advantage is that you aren't storing backups of duplicated, unchanging data; only the files that changed are backed up, saving some storage space and time to perform the backup. But you wouldn't want to completely stop taking full backups. Over time, you'd wind up tracking and storing lots of copies of files that change a lot, which would also take up more and more disk space over time. To avoid this, it's a good practice to perform infrequent full backups while also doing more frequent differential backups. How often you perform a full backup will depend on how far back you want changes to be tracked. Let's say we perform full backups once every week and differential backups daily. In the worst-case scenario, we'll lose close to 24 hours of data changes. That's not bad.
Another efficient way to back up changing data is to perform regular incremental backups. While a differential backup backs up files that have been changed or created since the last full backup, an incremental backup only backs up the data that's changed within files since the last backup. This is even more efficient in terms of both disk space and time required compared to differential backups. Again, you'll want to use frequent incremental backups along with less frequent full backups. But because this approach only stores differences in the files that have changed since the last incremental backup, it's possible that all incremental backups are needed to fully reconstruct the files. If one of these incremental backups is missing or corrupt, it might not be possible to recover data any more recent than the last full backup.
Another drawback is that recovery might be more time-consuming. This is because the most recent version of backed-up data has to be recreated by integrating the last full backup with each incremental backup that follows. For super large files that are changing frequently, this could require a lot of time to process. The sketch below contrasts the two restore paths.
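Here's a minimal sketch contrasting the two restore paths. The "backups" are just dictionaries of file contents, purely to illustrate why a differential restore needs only the latest differential while an incremental restore needs the whole chain.

```python
# A toy illustration of differential vs. incremental restore logic.

full = {"config.txt": "v1", "data.db": "v1"}
differentials = [  # each differential holds everything changed since the full backup
    {"data.db": "v2"},
    {"data.db": "v3", "config.txt": "v2"},
]
incrementals = [   # each incremental holds only changes since the previous backup
    {"data.db": "v2"},
    {"config.txt": "v2"},
    {"data.db": "v3"},
]

def restore_differential(full_backup, latest_differential):
    # Only the full backup and the most recent differential are needed.
    restored = dict(full_backup)
    restored.update(latest_differential)
    return restored

def restore_incremental(full_backup, incremental_chain):
    # Every incremental since the full backup must be applied, in order;
    # a missing or corrupt link breaks the chain from that point on.
    restored = dict(full_backup)
    for inc in incremental_chain:
        restored.update(inc)
    return restored

assert restore_differential(full, differentials[-1]) == restore_incremental(full, incrementals)
```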
One more thing backup systems can do to help save space is compression. When creating a backup, all the files and folder structures will be copied and put into an archive. Archives are useful for keeping files organized and preserving their full structure. Besides archiving the files, backups can also be compressed. This is a mechanism of storing the same data while requiring less disk space by using complex algorithms. Those are way too complicated to go into detail here, but it's important to call out that not all data types lend themselves to being compressed. This means that the space savings from compression will depend on what you're backing up. Another thing you should know about compressing backups is the expense of restoration. To recover data from a backup, it needs to be decompressed first. Depending on the size of your backups, this could take a lot of time and disk space to expand.
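As a small illustration of archiving and compression, here's a minimal sketch using Python's standard-library tarfile module; the source path is a hypothetical placeholder.

```python
# A minimal sketch of creating and restoring a compressed archive.
import tarfile

# Create a gzip-compressed archive of everything under /srv/important-data.
with tarfile.open("backup.tar.gz", "w:gz") as archive:
    archive.add("/srv/important-data", arcname="important-data")

# Restoring requires decompressing and extracting the archive first,
# which costs time and temporary disk space for large backups.
with tarfile.open("backup.tar.gz", "r:gz") as archive:
    archive.extractall(path="/tmp/restore")
```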
We touched on backup storage location a bit in the last lesson, but let's dive into a little more detail. Good news: there's a pretty cheap and easy-to-maintain option out there for storing backup data on site. You can use a commercial NAS device, or configure a file server with a large amount of disk space. Wherever you choose to store your backup data, you'll need a lot of space. You could go out and buy a giant 10-terabyte hard disk, which could work for a little while. But what do you do once your backup data grows to fill that one disk? Are they even making disks larger than 10 terabytes yet? Another thing to worry about is what you do if that one disk holding all your backed-up data fails. Yikes, that wouldn't be good.
These are issues a RAID array can address. RAID stands for Redundant Array of Independent Disks. It's a method of taking multiple physical disks and combining them into one large virtual disk. There are lots of types of RAID configurations, called levels. Depending on the characteristics desired from the RAID, various RAID levels prioritize features like performance, capacity, or reliability. RAID arrays are a great, inexpensive way of creating a lot of data capacity while minimizing the risk of data loss in the event of disk failure. They can even be flexible enough to allow future growth in disk capacity. We won't go into the nitty-gritty details of the different RAID levels available, but I want to stress the fact that RAID isn't a backup solution. It's a data storage solution that has some hardware failure redundancy available in some of the RAID levels. But storing data on a RAID array doesn't protect against accidentally deleting files, or against malware corrupting your data. This is so important that I'm going to say it one more time: RAID is not a replacement for backups.
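To make the capacity-versus-reliability tradeoff a bit more concrete without going into the RAID levels themselves, here's a minimal sketch with hypothetical disk counts and sizes; it only looks at usable capacity, and, again, none of this replaces actual backups.

```python
# A rough comparison of usable capacity for a few common RAID levels,
# using made-up numbers. Performance and rebuild behavior differ too.

disks = 4            # number of physical disks in the array
disk_size_tb = 10    # size of each disk in terabytes

usable_tb = {
    "RAID 0 (striping, no redundancy)": disks * disk_size_tb,
    "RAID 1 (all disks mirrored)": disk_size_tb,
    "RAID 5 (striping with parity, survives one disk failure)": (disks - 1) * disk_size_tb,
}

for level, capacity in usable_tb.items():
    print(f"{level}: {capacity} TB usable")
```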

3.4.6 User Backups

As an IT support specialist working closely with users in your organization, the topic of user backups is sure to come up. We've already covered backing up mission-critical operational data, but what about the spreadsheets and PDFs on Jane Doe's laptop? She's going to want to make sure that she doesn't lose those if her laptop gets stolen. While it's important to have a backup solution for infrastructure and critical systems, you also need to think about your users and their valuable files. Ensuring reliable backups for client devices is a bit more challenging than for infrastructure devices. There are likely to be lots more client devices to back up compared to infrastructure ones. Plus, there are laptops, phones, and tablets that won't be in the office all the time.
One solution to user backups is to use a cloud service designed for syncing and backing up files across
platforms and devices. Some examples of these are things like Dropbox, Apple iCloud and Google Drive,
which are simple and straightforward to use. There's no complicated scheduling or configuration compared to
infrastructure backups. They make it easy for users to configure what files or folders they want to have backed
up and then ensure the files are synchronized with what's stored in the cloud. As an IT support specialist, this is
especially relevant. When users accidentally spill a cup of coffee on their laptop, they're going to come to you
hoping their precious family photos can be saved. Getting users set up with an easy-to-use and effective backup system for their files is a great way to avoid this situation.

3.5 Disaster Recovery Plan


We have all experienced an accident or made a mistake before. We're only human, right? As an IT
support specialist, it's important that you're prepared for a disastrous accident or mistake. Maybe an electrical
surge fried the hard drive in a database server or someone deleted the wrong folder. These things happen and
you need to be prepared for them. This is where the disaster recovery plan comes in.
A disaster recovery plan is a collection of documented procedures and plans on how to react and
handle an emergency or disaster scenario, from the operational perspective. This includes things that should be
done before, during and after a disaster. The goal of the disaster recovery plan is to minimize disruption to
business and IT operations, by keeping downtime of systems to a minimum and preventing significant data loss.
Despite the name, a disaster recovery plan will actually cover preventive measures and detection measures on
top of the post disaster recovery approach. Preventative measures cover any procedures or systems in place that
will proactively minimize the impact of a disaster. This includes things like regular backups and redundant
systems. Anything that's done before an actual disaster that's able to reduce the overall downtime of the event is
considered preventative. For example, it's standard for critical network infrastructure or services to have redundant power supplies. They're often fed from different power sources, like battery backup. This is designed to
minimize the downtime that would be caused by one power supply failing or a power outage. That would be a
preventative measure.
We'll talk more about this when we cover designing an effective disaster recovery plan. Detection measures are meant to alert you and your team that a disaster has occurred that can impact operations. Obviously, you need to be aware of a disaster in order to take appropriate steps to reduce the impact on your organization. Timely notification of a disaster is critical, since some steps of the disaster recovery plan might be time sensitive to ensure there's no data loss or equipment damage. If there's a power outage, for example, critical systems should fall back to battery power. But battery backup power will only keep the systems online for so long.
To avoid potential data loss or damage, these systems should be gracefully shut down before they
completely lose power. This is why lots of systems that support redundant power supplies also have a function
to send alerts on power loss events. Other things that should be monitored to help head off any unexpected disasters include environmental conditions inside server and networking rooms. Flood sensors can alert you to water coming into the server room. Temperature and humidity sensors can watch for dangerously high temperatures or suboptimal moisture levels in the air. This can alert you to failed cooling in the server room. Smoke
detectors and fire alarms are also critically important.
Remember, that it isn't just the technology that's necessary for a business to continue to operate. You
also need the people working there. The specifics of building evacuation usually fall to the building management team, but as an IT support specialist, you'll likely work closely with some members of this team on things like power delivery, heating and cooling systems, and building evacuation. If there's a fire and the building needs to be evacuated, you should be prepared to set up temporary accommodations so people can still
work effectively. This might be as simple as sending everyone home for the day and having them work from
home. But you should be prepared for the situation ahead of time and have a plan in place to ensure that
everyone is able to work from home effectively.
Finally, corrective or recovery measures are those enacted after a disaster has occurred. These measures
involve steps like restoring lost data from backups or rebuilding and reconfiguring systems that were damaged.
Once the disaster has been detected and steps have been taken to either prevent an outage or at least minimize
downtime, work should begin on restoring full operations of everything affected. Usually, a disaster will take
out one system that's part of a redundant pair or a replication scheme which would prevent a complete service
outage. But that would mean you aren't prepared for another disaster. When one system in a redundant pair suffers a failure, the remaining system becomes a single point of failure, because it now only takes one more failure to completely take the service down. That's not a scenario you want to be in. We've covered the overall elements of a disaster recovery plan. Why don't you take a little break to recover yourself, then join me in the next lesson, where we'll run down the details of designing a disaster recovery plan of your own. Does that include planning for the zombie apocalypse? You'll have to read on to find out.

3.5.1 Designing a Disaster Recovery Plan

So, what exactly goes into an effective disaster recovery plan? Well, it depends. Keep in mind, there's
no one-size-fits-all for disaster recovery plans. The mechanisms chosen and procedures put in place will depend
a lot on the specifics of your organization and environment. But we can start by covering the three types of measures in more detail. We'll also go over some examples to help give you an idea of what to think about.
A good way to understand what to plan for is to perform a risk assessment. This involves taking a long
hard look at the operations and characteristics of your teams. A risk assessment allows you to prioritize certain
aspects of the organization that are more at risk if there's an unforeseen event. A risk assessment can involve
brainstorming hypothetical scenarios and analyzing these events to understand how they'd impact your
organization and operations. When you look into preventive measures, pay attention to systems that lack redundancy. If something is critical to operations, it should probably have a redundant spare, just in case. Make sure you have a sound backup and recovery system, along with a good strategy in place. Ideally, you should have regular, automated backups to backup systems located both on site and off site.
It's also critical that you have data recovery procedures clearly documented and kept up-to-date. Data
recovery following a disaster is hopefully something you will rarely do. It's important to make sure that these
procedures are updated since systems change and evolve throughout their lifetime. Redundancies shouldn't be
limited only to systems. Anything critical to operations should be made redundant whenever possible. This
includes power delivery or supply, communications systems, data links, and hardware. Think through the
impacts that a disaster affecting each of these aspects would have on your day-to-day operations. What would happen to your network if the building lost power? Could you continue to work if the fiber-optic data line to the building got damaged by nearby construction work? What if the core router for the office just burst into flames?
The last one is a little far-fetched, but I think you get where I'm going. Another super important
preventive measure that should be evaluated and verified is operational documentation. Make sure that every
important operational procedure is documented and accessible. This includes things like setting up and
configuring critical systems and infrastructure. Any steps or specific configuration details that are needed to
restore 100 percent functionality to core systems and services should be documented in detail. It's also
important that this documentation is kept up-to-date.
An effective way to do this is to periodically verify that the documented steps actually work. You don't want to find yourself in a situation where you need to reconfigure a system following an incident, only to discover that the documentation is wrong. When looking at detection measures, you'll want to make sure you have a comprehensive system in place that can quickly detect and alert you to service outages or abnormal environmental conditions. If uptime and availability are important for your organization, you'll likely have two
internet connections; a primary and a secondary.
You want to monitor the connection status of both of these links. Ideally, they should be configured to
automatically fail over if one goes down. But you'll still want to be alerted when this happens, so you can
investigate why the link went down and work on getting it back as soon as possible. The way to think about
designing detection measures is to evaluate what's most critical to the day-to-day functioning of the
organization. What infrastructure services and access are absolutely vital? Those are the ones you should be
closely monitoring. And it's not just outages or complete failures you should be watching for. Of course, you
want to monitor for those, but you also want to monitor for conditions that indicate that a problem is likely to
occur.
If we can avoid a catastrophic failure by being alerted to an overheating server before it fails, that
would be much better, wouldn't it? So, you want to monitor the conditions of services and infrastructure equipment, things like temperatures, CPU load, and network load. For a service, monitoring error rates and requests per second will give you insight into the performance of the system. You should investigate any unusual spikes or
unexpected increases. These early warning systems allow you to head off disaster before it brings operations to
a halt. And of course, you absolutely must test these systems.
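Here's a minimal sketch of that kind of threshold-based detection measure; the metrics and threshold values are hypothetical.

```python
# A minimal sketch of checking monitored readings against thresholds and
# raising an alert before a failure becomes an outage. All values are
# made-up examples.

thresholds = {
    "server_room_temp_c": 30,
    "cpu_load_percent": 90,
    "error_rate_percent": 5,
}

readings = {
    "server_room_temp_c": 34,   # cooling may have failed
    "cpu_load_percent": 45,
    "error_rate_percent": 2,
}

def check_alerts(readings, thresholds):
    """Return the list of metrics that exceed their thresholds."""
    return [name for name, value in readings.items() if value > thresholds[name]]

for metric in check_alerts(readings, thresholds):
    # In a real system this would page the on-call person or open a ticket.
    print(f"ALERT: {metric} is above its threshold")
```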
Simulate the conditions your monitoring systems are designed to catch. Make sure the detection
thresholds actually fire the alerts like they're supposed to. This testing goes beyond just the monitoring systems
too. You'll also want to test your reactions and responses to these alerts. If your monitoring systems reliably trigger alerts but everyone just ignores them, they aren't super useful, are they? You want to conduct regular disaster tests to make sure systems are functioning and that your procedures to handle them are also up to the task.
Corrective or recovery measures include actions that are taken to restore normal operations and to
recover from an incident or outage. This includes tasks like restoring a corrupted database from a backup or
rebuilding and re-configuring a server. Your disaster plan should include reference or links to documentation
for these types of tasks. Anything and everything that would be required to restore normal operations following
some disaster. This is where the steps for restoring various systems and data from backups should be. The
disaster recovery doc doesn't need to contain the details of the operations. Links and references are sufficient.
But it's important to be prepared for a situation where typical documentation access methods are unavailable. Let's
imagine you keep all your operational documentation in a wiki on a dedicated server. What happens when that
server suffers an outage? It's important that critical documentation is accessible if an emergency scenario or
disaster strikes. In the next lesson, we'll cover what to do after a failure. See you there.

3.6 Post-Mortems
Making mistakes is part of being human. We all make them from time to time, and hopefully each
mistake is a learning opportunity. We look at what happened and try to understand why it did to avoid it in the
future. Think of a toddler learning the stove is hot. They touch it once and it burns them. They quickly learn
that touching the stove hurts, so they stop doing it. It's important that we're able to learn from our mistakes or
else we'd constantly be burning ourselves. That's the purpose of a post-mortem report.
We create a post-mortem after an incident, an outage, or some event when something goes wrong, or
at the end of a project to analyze how it went. This report documents, in detail, what exactly happened leading
up to, during, and after the event or project. It tries to highlight things that went well and things that didn't. The
purpose of a post-mortem is to learn something from an event or project, not to punish anyone or highlight
mistakes. Sure, there will likely be some mistakes that'll be addressed, but the intention isn't to punish or shame; it's meant to understand the root cause of why the mistakes happened and how to prevent them from happening again. The post-mortem process isn't quite over once the report is written up; it then needs to be communicated. These types of reports have value beyond the people immediately involved in the incident, the investigation, and the writing of the report.
Sharing post-mortems with other teams at an organization helps encourage a culture of learning from mistakes. It shows that your team is willing to acknowledge when you mess up, but that you don't let mistakes hold you back. It sets a good example for others to follow, and the content of the report might trigger a thought in some other team and make them realize they have a similar problem in their infrastructure.
You may also identify areas that could be improved that are the responsibility of teams that weren't
involved in the incident. When it comes to post-mortems, sharing is caring. If you embrace post-mortems, they
tend to foster a culture where it's okay to make mistakes. This is a healthy attitude to have in any organization.
If people are afraid to make mistakes, then decision making will be very conservative. It's hard to push the
boundaries and try new things if everyone is afraid of screwing up. But if the prevailing culture says that
mistakes are okay as long as we can learn from them, then you'll have an organization that's willing to take
risks, and try out new and exciting ideas.

3.6.1 Writing a Post-Mortem

Now that we know the benefits of a postmortem, let's take a closer look at what goes into one. A typical
postmortem for an incident begins with a brief summary, just a short paragraph that summarizes the incident. It should include what the incident was, how long it lasted, what the impact was, and how it was fixed. Be mindful of time zones when listing times and dates in the postmortem; always clearly state the time zone to be absolutely clear.
Next, you need a detailed timeline of key events. This should include everything that happened
throughout the event like when it started, and when people involved were notified or realized what was going
on. Every action taken in an attempt to resolve this situation should also be identified. These could contain
times and dates along with time zones, and who did what. The timeline should wrap up with the actions taken
that resolved the outage and restored services, signaling the end of the event.
Next, a very detailed and honest account of the root cause is covered. This should go into detail explaining what led to the issue. It could be something like a configuration change that was pushed live without proper testing, or a command that was mistyped. Remember that the intent isn't to blame or shame; it's to be honest about what went wrong so that a lesson can be learned from it. If the cause stemmed from a lack of testing, this might indicate areas for improvement in test verification. If it was from a mistyped command, this may reveal the need to automate a manual process. A more detailed explanation of the resolution and recovery efforts should be documented next. This is similar to the timeline covered earlier and should include the dates, times, and time zones. But it should go into more detail about what steps were taken to recover, the rationale and reasoning behind those actions, and what the outcome of each step was. Including the rationale gives those reading the report more context on how the event played out.
Lastly, close out the report with a list of specific actions that should be taken to avoid the same scenario from happening again. This should include any actions or efforts aimed at improving the response handling too, such as steps to reduce response time. When you spin up a list of things to improve, look out for things like improvements to monitoring systems. Maybe the incident investigation revealed a gap in visibility into critical systems, or the investigation of the cause turned up an automation system that isn't functioning as intended. While it's outside the scope of a postmortem report to come up with solutions to these gaps, they should be listed in the areas for improvement. Based on these discoveries, new projects can be started to address the deficiencies found.
One thing that often gets overlooked in a postmortem is what went well. But this is just as important as analyzing what went wrong. During the post-incident analysis, it's also good to highlight things that went well. These include failsafe or failover systems that worked as designed and prevented a large outage or minimized the severity of the outage. This helps to demonstrate the effectiveness of the systems in place. For some folks, like those in finance, this is good news. It justifies any cost associated with these systems by clearly demonstrating a tangible benefit. This is important for any preventative system, since they are frequently viewed as unnecessary costs by those who may not fully understand their benefit.
These examples of systems working to prevent or reduce the impact of outages make that benefit very
clear. Hopefully, you're better prepared now to learn from any mistakes you might make in your career. And
mistakes will be made because you are human after all. Own them, learn from them, and do better next time.
Such is true in IT, work, and in life. Well, that brings us to the close of our sysadmin course. We hope you have
a better understanding of the responsibilities of a sysadmin, and how closely they relate to the work of an IT
support specialist.
Feel free to review any sections that might have been a bit heavy on the technical side. Next up, we've prepared a quiz for you that touches on everything we've learned in the unit. I wish you the best of luck.

Multiple-Choice Quiz

1. What does a directory server provide?


a. A real-time communication service
b. A network time service
c. A lookup service for an organization
d. A replication service

2. What benefits does replication provide? Choose all that apply.


a. Redundancy b. Enhanced security c. Increased capacity d. Decreased latency

3. What’s the most popular directory services protocol used today?


a. Directory Access Protocol
b. Directory System Protocol
c. Lightweight Directory Access Protocol
d. Directory Operational Bindings Management Protocol

4. Which of these are examples of centralized management? Choose all that apply.
a. Role-based access control
b. Centralized configuration management
c. Copying configurations to various systems
d. Local authentication

5. Which of these are components of an LDAP entry? Choose all that apply.
a. Uncommon name
b. Common name
c. Organizational user
d. Distinguished name

6. What does the LDAP Bind operation do exactly?


a. Modifies entries in a directory server
b. Looks up information in a directory server
c. Authenticates a client to the directory server
d. Changes the password for a user account on the directory server
7. Which of the following are authentication types supported by the LDAP Bind operation? Choose all
that apply.
a. Anonymous b. Simple c. Complex d. SASL

8. What is Active Directory? Choose all that apply.


a. An open-source directory server
b. A windows-only implementation of a directory server
c. Microsoft’s implementation of a directory server
d. An LDAP-compatible directory server

9. How is an Organizational Unit different from a normal container?


a. It’s not: it’s just a different name for a container
b. It can hold other objects
c. It can only hold other containers
d. It can hold additional containers

10. When you create an Active Directory domain, what’s the name of the default user account?
a. Superuser b. Root c. Username d. Administrator

11. True or false: Machines in the Domain Controllers group are also members of the Domain Computers
group.
a. True b. False

12. In what way are security groups different from distribution groups?
a. They’re the exact same thing.
b. Security groups are used for computers, while distribution groups are used for users.
c. Security groups are used for users, while distribution groups are used for computers.
d. Security groups can be used to provide access to resources, while distribution groups are only used
for email communication.

13. What’s the difference between changing a password and resetting a password?
a. Changing a password requires the previous password
b. Changing a password does not require the previous password
c. Resetting a password locks the account
d. They’re the same

14. True or False: Joining a computer to Active Directory involves joining the computer to a workgroup.
a. True b. False

15. Joining a computer to an AD domain provides which of the following advantages? Choose all that
apply.
a. Centralized authentication
b. More detailed logging
c. Centralized management with GPOs
d. Better Performance

16. What are Group Policy objects?


a. Special types of containers
b. Special types of computer groups
c. Special types of user groups
d. Settings for computers and user accounts

17. What’s the difference between a policy and a preference?


a. They’re the exact same thing.
b. A policy is used to set a preference
c. A policy is enforced by AD, while a preference can be modified by a local user.
d. A policy can be modified by a local user, while a preference is enforced by AD.

18. With a brand-new AD domain, what do you need to change before you can target groups of users and
machines with GPOs?
a. Nothing; the default configuration is good to go.
b. You need to place users and computers into new OU’s
c. You need to create an administrator account.
d. You need to rename the default groups.

19. Select the right order of enforcement of GPOs:


a. Site → Domain → OU
b. OU → Domain → Site
c. Domain → Site → OU
d. Site → OU → Domain

20. How does a client discover the address of a domain controller?


a. It’s pushed via an AD GPO.
b. It sends a broadcast to the local network.
c. It makes a DNS query, asking for the SRV record for the domain.
d. It’s provided via DHCP.

21. Which of the following could prevent you from logging into a domain-joined computer? Choose all that
apply.
a. You’re unable to reach the domain controller.
b. Your computer is connected to Wi-Fi.
c. The user account is locked.
d. The time and date are incorrect.

Laboratory 3.1 Seven People, One Command Line

What do you think about this case study?

Sysadmin George was assigned to create a new web server on a machine outside the corporate firewall
and connect it to an authentication server inside the firewall on one of the customer accounts he supported.
His manager sent detailed instructions for the process, which included sample commands for over
twenty steps to be performed under a very tight deadline. The first few steps for creating the new web server
appeared to go well, but configuring the authentication server to work with the new web server produced a
vague error message: “Error: Could not connect to server.”
For the next few hours, George was involved in increasingly intense troubleshooting. Through
telephone, e-mail, instant messaging, and in-person conversations, he worked with seven different people,
including his manager, the network team, his office mate, the architect of the system, a technical support
person, a colleague, and a software developer. Each asked him questions about system behavior, entries in log
and configuration files, error codes, and so on, and each suggested commands to run. Each sought his attention
and trust, competing for the right to tell him what to do.
We refer to this collaboration pattern as “Seven People, One Command Line,” as various people
participated in troubleshooting, but only George had access to the troubled system. His manager wanted to
know when the problem would be fixed and whether others should be redirected to help him complete the task
on time. The support person wanted to resolve the problem ticket and end the call as quickly as possible. His
colleague wanted to help within the limitations imposed by his own responsibilities. The system architect

14
wanted to know if there was any problem in the overall design without being mired in the details. Other
specialists waited for instructions to manipulate the subsystems they were responsible for.
The problem was eventually found to be a network misconfiguration. George misunderstood the
meaning of a certain configuration parameter for the new web server (ambiguously labeled as “port”) to be for
communication from the webserver to the authentication server, when in fact it was the opposite. The former
would have been permitted by the firewall, but the latter was not.
George’s misunderstanding affected the remote collaborators significantly throughout the
troubleshooting session. We witnessed several instances in which he ignored or misinterpreted evidence of the
real problem, filtering what he communicated by his incorrect understanding of the system configuration, which
in turn greatly limited his collaborators’ ability to understand the problem. George’s error propagated to his
collaborators. The solution was finally found by the one collaborator who had independent access to the
systems, which meant his view of the systems was not contaminated by George’s incorrect understanding.
