
Audit logs for security and compliance

Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA

WRITTEN: 2004

DISCLAIMER:
Security is a rapidly changing field of human endeavor. Threats we face literally
change every day; moreover, many security professionals consider the rate of
change to be accelerating. On top of that, to stay in touch with such an
ever-changing reality, one has to evolve with the space as well. Thus, even
though I hope that this document will be useful to my readers, please keep in
mind that it was possibly written years ago. Also, keep in mind that some of the
URLs might have gone 404; please Google around.

A well-worn maxim proclaims that “knowledge is power”, but where do we get our
knowledge about IT resources? The richest source of such information is logs
and audit trails. Through logs and alerts (which we treat similarly to logs and
audit trails), information systems often give signs that something is amiss, or even
will be amiss soon.

What are some examples of log files and audit trails? We can classify the log
files by the source that produced them, since it usually determines the type of
information contained in the files. For example, host log files, produced by UNIX,
Linux and Windows, are different from network device logs, produced by Cisco,
Nortel, and Lucent routers, switches, and other network gear. Similarly, security
appliance logs, produced by firewalls, intrusion detection systems, and intrusion
"prevention" systems, are very different from both host and network logs. In fact,
security devices display a wide diversity in what they log and the format in
which they do it. Ranging in function from simply recording suspicious IP
addresses all the way to full network traffic capture, security devices produce an
amazing wealth of information, both relevant and totally irrelevant to the situation
at hand.

Thus, logs present unique challenges. Some of the questions that we ask are:

• How do we find what is relevant for the situation at hand?
• How can we learn about intrusions—past, present and maybe even future
—from the logs?
• Is it realistic to expect to surf through gigabytes of log files in search of
evidence that might not even be there, since the hacker was careful not to
leave any traces?
• How do we use logs to come up with high-level metrics, indicating the
health of our enterprise?
• Can compliance auditors use the logs to prove or disprove regulatory
compliance in the organization?
Let us briefly demonstrate some common log examples. UNIX and Linux
installations produce a flood of messages, in plain text, via the syslog or "system
logger" daemon. Such messages can indicate (a rough classification sketch follows this list):
• There is a problem with a secondary DNS server.
• A user has logged in to the machine.
• A forbidden DNS access has occurred.
• A user has provided a password to the Secure Shell daemon for remote login.
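
As a rough illustration of what working with such messages might look like, a short Python sketch could classify incoming lines; the patterns below are hypothetical, and real syslog text varies by daemon, distribution and version:

import re

# Hypothetical patterns for the message types listed above; real syslog
# text varies by daemon, distribution and version.
PATTERNS = {
    "dns_secondary_problem": re.compile(r"named\[\d+\]: zone .* refresh: failure", re.I),
    "user_login":            re.compile(r"session opened for user (\w+)", re.I),
    "dns_access_denied":     re.compile(r"named\[\d+\]: .* denied", re.I),
    "ssh_password_auth":     re.compile(r"sshd\[\d+\]: Accepted password for (\w+)", re.I),
}

def classify(line):
    """Return the first matching message category, or 'other'."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return "other"

sample = "Jun 12 10:14:03 host1 sshd[4242]: Accepted password for alice from 10.0.0.5 port 51122 ssh2"
print(classify(sample))   # -> ssh_password_auth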

Similarly, newer Windows versions also provide extensive system logging. Windows uses
a proprietary binary format to record three types of log files: system, application,
and security. For example, the system log contains various records related to the
normal - and not so normal - operation of the computer.

In many cases, the log files don't just give up clear answers; the answers need to be
extracted - sometimes forcefully - from them. This is accomplished by performing
“log analysis”. Log analysis is the science and art of extracting answers from
computer-generated audit records. Often, even seemingly straightforward logs
need analysis and correlation with other information sources. Correlation means
the manual or automated process of establishing relationships between
seemingly unrelated events happening on the network. Events that happen on
different machines at different times could have some sort of relationship,
relevant to the situation. Such relationships need to be discovered and disclosed.
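
To make the idea of correlation concrete, here is a minimal Python sketch. It assumes the events have already been parsed into records with illustrative field names, and simply pairs firewall and IDS events that share a source IP within a short time window:

from datetime import datetime, timedelta

# Hypothetical, pre-parsed events; the field names are illustrative only.
firewall_events = [
    {"time": datetime(2004, 6, 12, 10, 14, 0), "src_ip": "10.0.0.5", "action": "deny"},
]
ids_alerts = [
    {"time": datetime(2004, 6, 12, 10, 14, 30), "src_ip": "10.0.0.5", "signature": "port scan"},
]

def correlate(fw_events, ids_events, window=timedelta(minutes=5)):
    """Pair firewall and IDS events from the same source IP within a time window."""
    pairs = []
    for fw in fw_events:
        for alert in ids_events:
            same_source = fw["src_ip"] == alert["src_ip"]
            close_in_time = abs(fw["time"] - alert["time"]) <= window
            if same_source and close_in_time:
                pairs.append((fw, alert))
    return pairs

for fw, alert in correlate(firewall_events, ids_alerts):
    print(f"{alert['src_ip']}: IDS '{alert['signature']}' near firewall '{fw['action']}'")

Real correlation engines handle many sources, clock skew and far larger volumes, but the core operation is establishing this kind of relationship between records that were logged separately.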

Why analyze the logs? The answer differs for different environments. For
example, for a home or small office (SOHO) computer system, logs are only useful
in the case of major system trouble (such as hardware or operating system
failures) or security breaches, which are easier to prevent since you only have to
watch a single system or a small number of systems. Often, your time is better
spent reinstalling your Windows operating system and keeping up with
patches and updates. Poring over logs for signs of potential intrusions is not
advisable for most users, with the possible exception of hard-core log analysis
addicts. Only the minimum amount of logging should thus be enabled, and the
analysis boils down to firing up the Windows event viewer after something goes
wrong.

Next, let us consider a small to medium business with no full-time security staff.
In this sense, it is similar to a home system, with a few important differences.
This environment often has people who astonish security professionals with
comments such as "Why would somebody want to hack us, we have nothing that
they need?" Now more and more people understand that disk storage, processor
cycles, and high-speed network connections have a lot of value for attackers.
Log analysis for such an organization focuses on discovering, detecting and
dealing with high-severity threats. While it is well known that many low-severity
threats reflected in logs might be precursors to a more serious attack, a small
company rarely has the resources to investigate them.

A large corporation is subject to many more administrative requirements than an
individual. Among these are responsibility to shareholders, fear of
litigation and other liability. Due to the above, the level of security and
accountability is higher. Most organizations connected to the Internet now have
at least a firewall and some sort of a dedicated network for public servers
exposed to the Internet. Many also have deployed spam filters, intrusion
detection systems (IDS), intrusion prevention systems (IPS) and Virtual Private
Networks (VPNs) and are looking at more novel solutions such as anti-spyware.
All these technologies raise concerns about what to do with logs coming from
them, as companies rarely hire new security staff just to handle the logs. In such
an environment, log analysis is of crucial importance. The logs present one of the
best ways of detecting the threats flowing from the hostile Internet as well as
from inside their networks.

Overall, do you have to do log analysis? The answer to this question ranges from
a “not likely” for a small business to an unquestionable “Yes!!!” for a larger
organization.

By now, we have convinced you that the information in logs can be tremendously
important; we also stated that such information will often be extremely
voluminous. However, such a log analysis and review program needs to be
consistent.

Imagine you work for one of those companies where information security is taken
seriously, senior management support is taken for granted, the appropriate IT defenses
are deployed and users are educated on the security policy. Firewalls are
humming along, intrusion detection systems are installed and an incident response
team is ready for action. This will probably go a long way towards creating a
more secure enterprise computing environment. Let's look at it from the
prevention-detection-response model. The above solutions provide the
technical side of prevention, detection and response. The complex interplay
between prevention, detection and response is further complicated by the
continuous decision-making process: 'what to respond to?', 'how to prevent an
event?', etc. Such decisions are based on the information provided by the
security infrastructure components. Paradoxically, the more security devices one
deploys, the more firewalls are blocking messages and generating logs, the more
detection systems are sending alerts, the more messages the servers spew, the
harder it is to make the right decisions about how to react. Logs from all of the
above devices need to be consistently and diligently analyzed to arrive at the
right security decisions.

What are the common options for optimizing the security decisions made by the
company executives? The security information flow needs to be converted from
logs and alerts into a decision. Attempts to create a fully automated solution
for making such a decision, some even based on artificial intelligence, have not
yet reached a commercially viable stage. The problem is thus to create a system
that reduces the information flow sufficiently and then provides some guidance to
the system's human operators so they can make the right security decision.

In addition to facilitating decision making in the case of a security-related log or other
event indication (defined as a single communication instance from a security
device) or an incident (defined as a confirmed attempted intrusion or other
attack), reducing the information flow is required for implementing security
benchmarks. Assessing the effectiveness of deployed security controls is an
extremely valuable part of an organization's security program. Such an
assessment can be used to calculate a security Return On Investment (ROI) and
to enable other methods for marrying security and business needs.
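
As a purely illustrative aside, one commonly cited simplification of security ROI (sometimes called ROSI) compares the loss a control is expected to avoid against its cost. The figures below are made up, and the formula is only one rough approximation, not a prescribed method:

# A simplified, illustrative security ROI ("ROSI") calculation; the figures
# are invented and the formula is only one common approximation.
annual_loss_expectancy = 200_000.0   # estimated yearly loss without the control
mitigation_ratio       = 0.75        # fraction of that loss the control prevents
annual_control_cost    = 60_000.0    # cost of the control (licenses, staff time)

loss_avoided = annual_loss_expectancy * mitigation_ratio
rosi = (loss_avoided - annual_control_cost) / annual_control_cost
print(f"Estimated ROSI: {rosi:.0%}")   # -> Estimated ROSI: 150%

The point is not the exact formula: the inputs (expected loss, mitigation effectiveness) can only be estimated if logs are collected and reviewed consistently.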

The commonly utilized scenarios can be loosely categorized into install-and-forget
(unfortunately, all too common) with no log analysis in sight, manual log
data reduction (or, reliance on a particular person to extract and analyze the
meaningful audit records) and in-house automation tools (such as scripts and
utilities aimed at processing the information flow). Let us briefly look at
advantages and disadvantages of the above methods.

Is there a chance that the first approach - deploying and leaving
the security infrastructure unsupervised, with no log review - has a business
justification anywhere outside of a very small environment such as the one described
above? Indeed, some people do drive their cars without mandatory car
insurance, but companies are unlikely to be moved by the same reasons that
motivate reckless drivers. Most readers have probably heard 'Having a
firewall does not provide 100% security' many times. In fact, it is often stated that
0-day (i.e. previously unknown) exploits and new vulnerabilities are less of a
threat to security than the company's own employees. Technology solutions are rarely
effective against social and human problems. Advanced firewalls can probably
be made to mitigate the threat from new exploits, but not from the firewall
administrators' mistakes and deliberate tampering from the inside of the
protected perimeter. In addition, a total lack of feedback and awareness of security
technology performance, coming from a log collection and review program, will
prevent a company from taking a proactive stance against new threats and
adjusting its defenses against the flood of attacks hitting its bastions.

The next possibility is where no consistent log review program is present but
some employees are dedicated to the task. Does relying on human experts to
understand your log information and to provide effective response guidelines
based on the gathered evidence constitute a viable alternative to doing nothing?

Two approaches to the problem are common. First, a security professional can
study the logs only after a security incident. Careful examination of log
evidence collected by various security devices will certainly shed light on the
incident and will likely help to prevent its recurrence and further loss. However,
in cases where extensive damage is done, it is already too late, and prevention of
future incidents of the same kind will not return the stolen intellectual property or
appease the disappointed business partners. Expert response after the fact has a
good chance of being delayed in the age of fast, automated attack tools. The second
option is to review the accumulated audit trail data periodically, such as on a
daily or weekly basis. A simple calculation is in order. A single border router will
produce several hundred messages per second on a busy network, and so will
the firewall. Adding host messages from hundreds of servers will increase the
flow to possibly thousands per second. Now if one is to scale this to a global
company network infrastructure, the information flow will increase hundredfold.
No human expert or team will be able to review, let alone analyze, the incoming
flood of signals.

But what if a security professional chooses to automate the task by writing a
script or a program to alert him or her to significant alerts and log records?
Such a technical approach to a log review program may help with data collection
(centralized syslog server or a database) and alerting (email, pager, voice mail).
However, a series of important questions arises. Collected log and audit data will
greatly help with an incident investigation, but what about the timeliness of the
response? Separating meaningful events from mere chaff is not a trivial task,
especially in a global distributed and multi-vendor environment. Moreover, even
devices sold by a single vendor might have various event logging and
prioritization schemes. Thus, designing the right data reduction and analysis
scheme that optimizes the security decision process might require significant time
and capital investment and still not reach the set goals due to a lack of
specific analysis expertise.
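
A minimal sketch of such an in-house script might look like the following Python fragment; the log path, patterns and mail addresses are hypothetical, and it assumes a centralized syslog file and a local SMTP relay:

import re
import smtplib
import time
from email.message import EmailMessage

# Hypothetical settings: adjust the log path, patterns and mail relay to your site.
LOG_FILE = "/var/log/central-syslog.log"
ALERT_PATTERNS = [re.compile(p) for p in (r"Failed password", r"DENY", r"segfault")]

def send_alert(line):
    msg = EmailMessage()
    msg["Subject"] = "Log alert"
    msg["From"] = "logwatch@example.org"
    msg["To"] = "oncall@example.org"
    msg.set_content(line)
    with smtplib.SMTP("localhost") as smtp:   # assumes a local mail relay
        smtp.send_message(msg)

def follow(path):
    """Yield new lines appended to the file, like 'tail -f'."""
    with open(path) as handle:
        handle.seek(0, 2)                     # start at the end of the file
        while True:
            line = handle.readline()
            if not line:
                time.sleep(1)
                continue
            yield line

for line in follow(LOG_FILE):
    if any(p.search(line) for p in ALERT_PATTERNS):
        send_alert(line)

Such a script solves collection and alerting, but it illustrates the limitation discussed above: everything hinges on a hand-maintained pattern list, with no prioritization or cross-device context.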

In addition, escalating alerts on raw event data (such as 'if you see a specific bad
IDS signature, send me an email') will quickly turn into the "boy that cried wolf"
story with pagers screaming for attention and not getting it. In light of the above
problems with prioritization, simply alerting on "high-priority" events is not a
solution. Indeed, IDS systems can be tuned to produce fewer alerts, but to
effectively tune the system one needs access to all the feedback provided by
the security infrastructure and not just to raw IDS logs. For example, outside and
inside firewall logs are very useful for tuning the IDS deployed in the DMZ.

Overall, it appears that simply investing in more and more security devices
without a consistent program to analyze and review their logs will not create
more security. One needs to keep in close touch with the deployed devices, and
the only way to do it is by using special-purpose automated tools to analyze all
the information they produce and to draw meaningful conclusions aimed to
optimize the effectiveness of the IT defenses. While having internal staff write
code to help accumulate and map data might be acceptable as a short-term measure
in small environments, the maintenance, scalability and continued
justification for such systems likely yield a very low ROI. In fact, this need has driven the
birth of Security Information Management (SIM) products that have, as their only
focus, the collection and correlation of this data as well as the creation of
executive-level metrics from logs.

Logs are also immensely valuable for compliance programs. Many recent US
regulations such as HIPAA, GLBA, Sarbanes-Oxley and many others have items
related to audit logging and handling of those logs. For example, a detailed
analysis of the security requirements and specifications outlined in the HIPAA
Security Rule sections §164.306, §164.308, and §164.312, shows some items
relevant to auditing and logging. Specifically, section §164.312 (b) “Audit
Retention” covers audit, logging, monitoring controls for systems that contain
patient information. Similarly, the Gramm-Leach-Bliley Act (GLBA) section 501 and
others have items that indirectly address the collection and review of audit logs.
Centralized logging of security events across a variety of devices, analysis,
reporting, and risk analysis all provide information to demonstrate the presence and
effectiveness of the security controls implemented by organizations and help
identify, reduce the impact of, and remedy a variety of security breaches in the
organization. The importance of logs for regulatory compliance will only grow as
standards (such as ISO17799) become the foundations of new regulations.

Common mistakes of log analysis

We covered the need to collect logs and review them via a carefully planned
program. However, when planning and implementing log collection and analysis
infrastructure, organizations often discover that they aren't realizing the full
promise of such a system. This happens due to some common log-analysis
mistakes. We cover such typical mistakes organizations make when analyzing
audit logs and other security-related records produced by security infrastructure
components.

No. 1: Not looking at the logs

Let's start with an obvious but critical one. While collecting and storing logs is
important, it's only a means to an end -- knowing what's going on in your
environment and responding to it. Thus, once technology is in place and logs are
collected, there needs to be a process of ongoing monitoring and review that
hooks into actions and possible escalation.

It's worthwhile to note that some organizations take a half-step in the right
direction: They review logs only after a major incident. This gives them the
reactive benefit of log analysis but fails to realize the proactive one -- knowing
when bad stuff is about to happen.

Looking at logs proactively helps organizations better realize the value of their
security infrastructures. For example, many complain that their network intrusion-
detection systems (NIDS) don't give them their money's worth. A big reason for
that is that such systems often produce false alarms, which leads to decreased
reliability of their output and an inability to act on it. Comprehensive correlation of
NIDS logs with other records such as firewalls logs and server audit trails as well
as vulnerability and network service information about the target allows companies
to "make NIDS perform" and gain new detection capabilities.

Some organizations also have to look at log files and audit trails due to
regulatory pressure.

No. 2: Storing logs for too short a time

This mistake makes the security team think they have all the logs needed for monitoring
and investigation (while saving money on storage hardware), and then leads to
the horrible realization after an incident that all the logs are gone due to the retention
policy. The incident is often discovered a long time after the crime or abuse has
been committed.

If cost is critical, the solution is to split the retention into two parts: short-term
online storage and long-term off-line storage. For example, archiving old logs on
tape allows for cost-effective off-line storage, while still enabling future analysis.
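
A minimal sketch of such a two-tier retention scheme, with made-up directories and a compressed on-disk archive standing in for tape, might look like this:

import gzip
import shutil
import time
from pathlib import Path

# Hypothetical directories and retention window; tune to your own policy.
ONLINE_DIR  = Path("/var/log/online")    # short-term, searchable storage
ARCHIVE_DIR = Path("/var/log/archive")   # long-term storage (stand-in for tape)
ONLINE_DAYS = 30

def archive_old_logs():
    """Compress and move logs older than the online retention window."""
    cutoff = time.time() - ONLINE_DAYS * 86400
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    for log in ONLINE_DIR.glob("*.log"):
        if log.stat().st_mtime < cutoff:
            target = ARCHIVE_DIR / (log.name + ".gz")
            with open(log, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            log.unlink()   # remove the online copy only after archiving

if __name__ == "__main__":
    archive_old_logs()

The essential design choice is that nothing is ever simply deleted at the end of the online window; it is moved to cheaper storage where it remains available for future investigations.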

No. 3: Not normalizing logs

What do we mean by "normalization"? It means we can convert the logs into a
universal format, containing all the details of the original message but also
allowing us to compare and correlate different log data sources such as Unix and
Windows logs. Across different application and security solutions, log format
confusion reigns: some prefer Simple Network Management Protocol, others
favor classic Unix syslog. Proprietary methods are also common.
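
To illustrate, a toy normalization step in Python might map a syslog line and an already-exported Windows event record onto the same handful of fields; the field names and parsing rules here are illustrative assumptions, not a real normalization schema:

import re
from datetime import datetime

# A toy "universal" record: every source is mapped to the same few fields.
SYSLOG_RE = re.compile(r"^(?P<ts>\w{3} +\d+ [\d:]+) (?P<host>\S+) (?P<prog>[^:\[]+)\S*: (?P<msg>.*)$")

def normalize_syslog(line, year=2004):
    m = SYSLOG_RE.match(line)
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return {"time": ts, "host": m.group("host"),
            "source": m.group("prog"), "message": m.group("msg")}

def normalize_windows(event):
    # Assumes the Windows event was already exported to a dict by some collector.
    return {"time": event["TimeGenerated"], "host": event["ComputerName"],
            "source": event["SourceName"], "message": event["Message"]}

unix = normalize_syslog("Jun 12 10:14:03 host1 sshd[4242]: Accepted password for alice")
win  = normalize_windows({"TimeGenerated": datetime(2004, 6, 12, 10, 15, 0),
                          "ComputerName": "WKSTN7", "SourceName": "Security",
                          "Message": "Successful logon"})
print(unix["host"], win["host"])   # both records now share the same fields

Once every source lands in the same shape, comparison and correlation across Unix, Windows and security devices become straightforward set operations rather than per-format expertise.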

Lack of a standard logging format means that companies need different expertise
to analyze the logs. Not all skilled Unix administrators who understand syslog
format will be able to make sense out of an obscure Windows event log record,
and vice versa.

The situation is even worse with security systems, because people commonly
have experience with a limited number of systems and thus will be lost in the log
pile spewed out by a different device. As a result, a common format that can
encompass all the possible messages from security-related devices is essential
for analysis, correlation and, ultimately, for decision-making.

No. 4: Failing to prioritize log records

Assuming that logs are collected, stored for a sufficiently long time and
normalized, what else lurks in the muddy sea of log analysis? The logs are there,
but where do we start? Should we go for a high-level summary, look at most
recent events or something else? The fourth error is not prioritizing log records.
Some system analysts may get overwhelmed and give up after trying to chew through a
king-size chunk of log data without getting any real sense of priority.

Thus, effective prioritization starts from defining a strategy. Answering questions
such as "What do we care about most?" "Has this attack succeeded?" and "Has
this ever happened before?" helps to formulate it. Consider these questions to
help you get started on a prioritization strategy that will ease the burden of
gigabytes of log data, collected every day.
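
As one hypothetical way to turn those questions into something actionable, a toy scoring function might weight asset criticality, attack outcome and novelty; the weights and field names below are illustrative assumptions only:

# A toy prioritization score built around the three questions above; the
# weights and field names are illustrative assumptions, not a standard.
CRITICAL_ASSETS = {"db01", "payroll"}         # "what do we care about most?"

def priority(event, seen_before):
    score = 1
    if event["target"] in CRITICAL_ASSETS:
        score += 3                            # hits on critical assets rank higher
    if event.get("outcome") == "success":
        score += 4                            # "has this attack succeeded?"
    key = (event["target"], event["signature"])
    if key not in seen_before:
        score += 2                            # "has this ever happened before?"
        seen_before.add(key)
    return score

seen = set()
events = [
    {"target": "db01", "signature": "sql-injection", "outcome": "success"},
    {"target": "kiosk3", "signature": "port-scan", "outcome": "blocked"},
]
for e in sorted(events, key=lambda e: priority(e, seen), reverse=True):
    print(e["target"], e["signature"])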

No. 5: Looking for only the bad stuff

Even the most advanced and security-conscious organizations can sometimes
get tripped up by this pitfall. It's sneaky and insidious and can severely reduce
the value of a log-analysis project. It occurs when an organization is only looking
at what it knows is bad.

Indeed, a vast majority of open-source tools and some commercial ones are set
up to filter and look for bad log lines, attack signatures and critical events, among
other things. For example, Swatch is a classic free log-analysis tool that's
powerful, but only at one thing -- looking for defined bad things in log files.

However, to fully realize the value of log data, it needs to be taken to the next
level -- to log mining. In this step, you can discover things of interest in log files
without having any preconceived notion of what you need to find. Some
examples include compromised or infected systems, novel attacks, insider abuse
and intellectual property theft.

It sounds obvious, but how can we be sure we know of all the possible malicious
behavior in advance? One option is to list all the known good things and then
look for the rest. It sounds like a solution, but such a task is not only onerous, but
also thankless. It's usually even harder to list all the good things than it is to list
all the bad things that might happen on a system or network. So many different
events occur that weeding out attack traces just by listing all the possibilities is
ineffective.

A more intelligent approach is needed. Some of the data mining (also called
"knowledge discovery in databases") and visualization methods actually work on
log data with great success. They allow organizations to look for real anomalies
in log data, beyond "known bad" and "not known good."
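
As a deliberately simple stand-in for such methods, the following Python sketch flags message types that are rare relative to overall volume, without any list of "known bad" signatures; the input labels are assumed to come from an earlier parsing step:

from collections import Counter

# A very simple stand-in for log mining: flag message types that are rare
# relative to the overall volume, with no "known bad" signature list at all.
def rare_message_types(message_types, threshold=0.01):
    counts = Counter(message_types)
    total = sum(counts.values())
    return {mtype: n for mtype, n in counts.items() if n / total < threshold}

# Illustrative input: one label per log record, produced by an earlier parsing step.
sample = ["ssh_login"] * 500 + ["web_access"] * 1500 + ["config_change"] * 3
print(rare_message_types(sample))   # -> {'config_change': 3}

Real data mining and visualization approaches are far more sophisticated, but the principle is the same: surface what is unusual, not just what is already known to be bad.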

Avoiding these mistakes will take your log-analysis program to the next level and
enhance the value of your company's security and logging infrastructures.

To conclude, logs might be the untapped treasures of security, allowing
organizations to gain security benefits using an existing security infrastructure.
To realize them, however, the log collection and review program needs to be
carefully planned and common mistakes need to be avoided.

ABOUT THE AUTHOR:

This is an updated author bio, added to the paper at the time of reposting in
2009.

Dr. Anton Chuvakin (http://www.chuvakin.org) is a recognized security expert in
the field of log management and PCI DSS compliance. He is the author of the books
"Security Warrior" and "PCI Compliance" and a contributor to "Know Your Enemy
"Security Warrior" and "PCI Compliance" and a contributor to "Know Your Enemy
II", "Information Security Management Handbook" and others. Anton has
published dozens of papers on log management, correlation, data analysis, PCI
DSS, and security management (see the list at www.info-secure.org). His blog
http://www.securitywarrior.org is one of the most popular in the industry.

In addition, Anton teaches classes and presents at many security conferences
across the world; he recently addressed audiences in the United States, UK,
Singapore, Spain, Russia and other countries. He works on emerging security
standards and serves on the advisory boards of several security start-ups.

Currently, Anton is developing his security consulting practice, focusing on
logging and PCI DSS compliance for security vendors and Fortune 500
organizations. Dr. Anton Chuvakin was formerly a Director of PCI Compliance
Solutions at Qualys. Previously, Anton worked at LogLogic as a Chief Logging
Evangelist, tasked with educating the world about the importance of logging for
security, compliance and operations. Before LogLogic, Anton was employed by a
security vendor in a strategic product management role. Anton earned his Ph.D.
degree from Stony Brook University.
