WRITTEN: 2004
DISCLAIMER:
Security is a rapidly changing field of human endeavor. The threats we face literally
change every day; moreover, many security professionals consider the rate of
change to be accelerating. On top of that, to stay in touch with such an
ever-changing reality, one has to evolve with the space as well. Thus, even
though I hope that this document will be useful to my readers, please keep in
mind that it was possibly written years ago. Also, keep in mind that some of the
URLs might have gone 404; please Google around.
A well-worn maxim proclaims that “knowledge is power”, but where do we get our
knowledge about IT resources? The richest source of such information is logs
and audit trails. Through logs and alerts (which we treat similarly to logs and
audit trails), information systems often give signs that something is amiss, or even
will be amiss soon.
What are some examples of log files and audit trails? We can classify log
files by the source that produced them, since the source usually determines the type of
information contained in the files. For example, host log files, produced by UNIX,
Linux, and Windows systems, are different from network device logs, produced by Cisco,
Nortel, and Lucent routers, switches, and other network gear. Similarly, security
appliance logs, produced by firewalls, intrusion detection systems, and intrusion
“prevention” systems, are very different from both host and network logs. In fact,
security devices vary widely in what they log and the format in which they
record it. Ranging in function from simply recording suspicious IP
addresses all the way to full network traffic capture, security devices produce an
amazing wealth of information, both relevant and totally irrelevant to the situation
at hand.
Thus, logs present unique challenges for analysis. For example, newer Windows
versions provide extensive system logging, using a proprietary binary format to
record three types of log files: system, application, and security. The system
log, for instance, contains various records related to the normal (and not so
normal) operation of the computer.
In many cases, the log files don't simply give up their answers; the answers need
to be extracted (sometimes forcefully) from them. This is accomplished by performing
“log analysis”. Log analysis is the science and art of extracting answers from
computer-generated audit records. Often, even seemingly straightforward logs
need analysis and correlation with other information sources. Correlation is
the manual or automated process of establishing relationships between
seemingly unrelated events happening on the network. Events that happen on
different machines at different times could have some sort of relationship
relevant to the situation; such relationships need to be discovered and disclosed.
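As an illustration, correlation by a shared attribute within a time window can be sketched in a few lines. The event records, field names, and log messages below are hypothetical, chosen only to show the idea:

```python
from datetime import datetime, timedelta

# Hypothetical, already-parsed events from two different sources.
firewall_events = [
    {"time": datetime(2004, 5, 1, 10, 0, 5), "src_ip": "10.1.2.3", "msg": "denied tcp/445"},
    {"time": datetime(2004, 5, 1, 10, 0, 9), "src_ip": "10.9.9.9", "msg": "denied tcp/80"},
]
ids_alerts = [
    {"time": datetime(2004, 5, 1, 10, 0, 7), "src_ip": "10.1.2.3", "msg": "SMB probe"},
]

def correlate(a_events, b_events, window_seconds=30):
    """Pair events from two sources that share a source IP
    and occur within `window_seconds` of each other."""
    window = timedelta(seconds=window_seconds)
    pairs = []
    for a in a_events:
        for b in b_events:
            if a["src_ip"] == b["src_ip"] and abs(a["time"] - b["time"]) <= window:
                pairs.append((a, b))
    return pairs

for fw, ids in correlate(firewall_events, ids_alerts):
    print(f"{fw['src_ip']}: firewall '{fw['msg']}' near IDS alert '{ids['msg']}'")
```

A real correlation engine would index events by attribute rather than compare all pairs, but the relationship being established is the same.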
Why analyze the logs? The answer is different for different environments. For
example, for a home or small office (SOHO) computer system, logs are only useful
in the case of major system trouble (such as hardware or operating system
failures) or security breaches, which are easier to prevent since you only have to
watch a single system or a small number of systems. Often, your time is better
spent reinstalling your Windows operating system and keeping up with
patches and updates. Poring over logs for signs of potential intrusions is not
advisable for most users, with the possible exception of hard-core log analysis
addicts. Only the minimum amount of logging should thus be enabled, and the
analysis boils down to firing up the Windows event viewer after something goes
wrong.
Next, let us consider a small to medium business with no full-time security staff.
In this sense, it is similar to a home system, with a few important differences.
This environment often has people who astonish security professionals with
comments such as "Why would somebody want to hack us, we have nothing that
they need?" Now more and more people understand that disk storage, processor
cycles, and high-speed network connections have a lot of value for attackers.
Log analysis for such an organization focuses on discovering, detecting and
dealing with high-severity threats. While it is well known that many low-severity
threats reflected in logs might be precursors for a more serious attack, a small
company rarely has the resources to investigate them.
A large corporation is subject to far more administrative requirements than an
individual: responsibility to shareholders, fear of litigation, and other
liability concerns. As a result, the expected level of security and
accountability is higher. Most organizations connected to the Internet now have
at least a firewall and some sort of a dedicated network for public servers
exposed to the Internet. Many also have deployed spam filters, intrusion
detection systems (IDS), intrusion prevention systems (IPS) and Virtual Private
Networks (VPNs) and are looking at more novel solutions such as anti-spyware.
All these technologies raise concerns about what to do with logs coming from
them, as companies rarely hire new security staff just to handle the logs. In such
an environment, log analysis is of crucial importance. The logs present one of the
best ways of detecting the threats flowing from the hostile Internet as well as
from the inside of their networks.
Overall, do you have to do log analysis? The answer to this question ranges from
a “not likely” for a small business to an unquestionable “Yes!!!” for a larger
organization.
Imagine you work for one of those companies where information security is taken
seriously: senior management support is taken for granted, the appropriate IT defenses
are deployed, and users are educated on the security policy. Firewalls are
humming along, intrusion detection systems are installed, and the incident response
team is ready for action. This will probably go a long way toward creating a
more secure enterprise computing environment. Let's look at it from the
perspective of the prevention-detection-response model. The above solutions provide the
technical side of prevention, detection, and response. The complex interplay
between prevention, detection, and response is further complicated by the
continuous decision-making process: 'what to respond to?', 'how to prevent an
event?', and so on. Such decisions are based on the information provided by the
security infrastructure components. Paradoxically, the more security devices one
deploys (more firewalls blocking traffic and generating logs, more detection
systems sending alerts, more messages spewed by the servers), the
harder it is to make the right decisions about how to react. Logs from all of the
above devices need to be consistently and diligently analyzed to arrive at the
right security decisions.
What are the common options for optimizing the security decisions made by the
company executives? The security information flow needs to be converted from
logs and alerts into a decision. Attempts to create a fully automated solution
for making such decisions, some even based on artificial intelligence, have not
yet reached a commercially viable stage. The problem is thus to create a system
that reduces the information flow sufficiently and then provides some guidance to
the system's human operators, so they can make the right security decision.
Is there a chance that the first approach (deploying the security infrastructure
and leaving it unsupervised, with no log review) has a business
justification anywhere outside of a very small environment such as the one described
above? Indeed, some people do drive their cars without mandatory car
insurance, but companies are unlikely to be moved by the same reasons that
motivate reckless drivers. Most readers have probably heard 'having a
firewall does not provide 100% security' many times. In fact, it is often stated that
0-day (i.e., previously unknown) exploits and new vulnerabilities are less of a
threat to security than the company's own employees. Technology solutions are rarely
effective against social and human problems. Advanced firewalls can probably
be made to mitigate the threat from new exploits, but not from firewall
administrators' mistakes or deliberate tampering from inside the
protected perimeter. In addition, a total lack of feedback and awareness about security
technology performance, the kind that comes from a log collection and review program, will
prevent a company from taking a proactive stance against new threats and
adjusting its defenses against the flood of attacks hitting its bastions.
The next possibility is where no consistent log review program is present, but
some employees are dedicated to the task. Does relying on human experts to
understand your log information and to provide effective response guidelines
based on the gathered evidence constitute a viable alternative to doing nothing?
Two approaches to the problem are common. First, a security professional can
study the logs only after a security incident. Careful examination of log
evidence collected by various security devices will certainly shed light on the
incident and will likely help to prevent a recurrence and further loss. However,
in cases where extensive damage is done, it is already too late: prevention of
future incidents of the same kind will not return the stolen intellectual property or
appease the disappointed business partners. Expert response after the fact has a
good chance of being too slow in the age of fast, automated attack tools. The second
option is to review the accumulated audit trail data periodically, such as on a
daily or weekly basis. A simple calculation is in order. A single border router will
produce several hundred messages per second on a busy network, and so will
the firewall. Adding host messages from hundreds of servers will increase the
flow to possibly thousands per second. Now, if one is to scale this to a global
company network infrastructure, the information flow will increase a hundredfold.
No human expert or team will be able to review, let alone analyze, the incoming
flood of signals.
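The back-of-the-envelope arithmetic above can be made concrete. The rates below are the illustrative figures from the text, not measurements from any real network:

```python
# Illustrative rates: a busy border router and a firewall each emit
# "several hundred" messages per second; hundreds of servers push the
# combined flow into the thousands per second.
router_mps = 300       # messages per second from the border router
firewall_mps = 300     # messages per second from the firewall
server_mps = 2400      # aggregate host messages from hundreds of servers

total_mps = router_mps + firewall_mps + server_mps
per_day = total_mps * 60 * 60 * 24
print(f"{total_mps} messages/sec -> {per_day:,} messages/day")
```

Even at these modest assumed rates, a single site produces over a quarter of a billion messages per day, which is the point: no team of humans can review that stream.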
In addition, escalating alerts on raw event data (such as 'if you see a specific bad
IDS signature, send me an email') will quickly turn into the "boy who cried wolf"
story, with pagers screaming for attention and not getting it. In light of the above
problems with prioritization, simply alerting on "high-priority" events is not a
solution. Indeed, IDS systems can be tuned to produce fewer alerts, but to
effectively tune the system one needs access to the whole body of feedback provided by
the security infrastructure, not just to raw IDS logs. For example, outside and
inside firewall logs are very useful for tuning an IDS deployed in the DMZ.
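One minimal mitigation for the "cried wolf" problem is to suppress repeated pages for the same signature within a quiet period. The class and signature names below are hypothetical, a sketch rather than a recommended design:

```python
from datetime import datetime, timedelta

class RateLimitedAlerter:
    """Escalate a given signature at most once per quiet period,
    instead of paging on every raw event."""
    def __init__(self, quiet_period_minutes=60):
        self.quiet = timedelta(minutes=quiet_period_minutes)
        self.last_sent = {}  # signature -> time of last page

    def should_page(self, signature, now):
        last = self.last_sent.get(signature)
        if last is None or now - last >= self.quiet:
            self.last_sent[signature] = now
            return True
        return False

alerter = RateLimitedAlerter()
t0 = datetime(2004, 5, 1, 12, 0)
print(alerter.should_page("WEB-IIS cmd.exe access", t0))                         # True
print(alerter.should_page("WEB-IIS cmd.exe access", t0 + timedelta(minutes=5)))  # False
print(alerter.should_page("WEB-IIS cmd.exe access", t0 + timedelta(hours=2)))    # True
```

Note that suppression only hides the flood; as the text argues, real tuning requires feedback from the rest of the infrastructure, not just fewer pages.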
Overall, it appears that simply investing in more and more security devices
without a consistent program to analyze and review their logs will not create
more security. One needs to keep in close touch with the deployed devices, and
the only way to do that is by using special-purpose automated tools to analyze all
the information they produce and to draw meaningful conclusions aimed at
optimizing the effectiveness of the IT defenses. While having internal staff write
code to help accumulate and map the data might be acceptable as a stopgap
in small environments, the maintenance, scalability, and continued
justification for such systems likely yield a very low ROI. In fact, this need has driven the
birth of Security Information Management (SIM) products that have, as their only
focus, the collection and correlation of this data as well as the creation of
executive-level metrics from logs.
Logs are also immensely valuable for compliance programs. Many recent US
regulations, such as HIPAA, GLBA, and Sarbanes-Oxley, have items
related to audit logging and the handling of those logs. For example, a detailed
analysis of the security requirements and specifications outlined in HIPAA
Security Rule sections §164.306, §164.308, and §164.312 shows several items
relevant to auditing and logging. Specifically, section §164.312(b), “Audit
controls”, covers audit, logging, and monitoring controls for systems that contain
patient information. Similarly, the Gramm-Leach-Bliley Act (GLBA) section 501 and
others have items that indirectly address the collection and review of audit logs.
Centralized logging of security events across a variety of devices, analysis,
reporting, and risk analysis all provide information to demonstrate the presence and
effectiveness of the security controls implemented by an organization and help
identify, reduce the impact of, and remedy a variety of security breaches in the
organization. The importance of logs for regulatory compliance will only grow as
standards (such as ISO 17799) become the foundations of new regulations.
We have covered the need to collect logs and review them via a carefully planned
program. However, when planning and implementing a log collection and analysis
infrastructure, organizations often discover that they aren't realizing the full
promise of such a system. This happens due to some common log-analysis
mistakes. Below, we cover the typical mistakes organizations make when analyzing
audit logs and other security-related records produced by security infrastructure
components.
Let's start with an obvious but critical one: collecting logs and never looking
at them. While collecting and storing logs is
important, it's only a means to an end -- knowing what's going on in your
environment and responding to it. Thus, once the technology is in place and logs are
collected, there needs to be a process of ongoing monitoring and review that
hooks into actions and possible escalation.
It's worthwhile to note that some organizations take a half-step in the right
direction: They review logs only after a major incident. This gives them the
reactive benefit of log analysis but fails to realize the proactive one -- knowing
when bad stuff is about to happen.
Looking at logs proactively helps organizations better realize the value of their
security infrastructures. For example, many complain that their network intrusion-
detection systems (NIDS) don't give them their money's worth. A big reason for
this is that such systems often produce false alarms, which decreases the
reliability of their output and leads to an inability to act on it. Comprehensive correlation of
NIDS logs with other records, such as firewall logs and server audit trails, as well
as with vulnerability and network service information about the target, allows companies
to "make NIDS perform" and gain new detection capabilities.
Some organizations also have to look at log files and audit trails due to
regulatory pressure, and they often retain the logs only for the short period the
regulation explicitly requires.
This makes the security team think they have all the logs needed for monitoring
and investigation (while saving money on storage hardware), leading to
the horrible realization after an incident that all the relevant logs are gone due to the retention
policy. An incident is often discovered a long time after the crime or abuse has
been committed.
If cost is critical, the solution is to split the retention into two parts: short-term
online storage and long-term off-line storage. For example, archiving old logs on
tape allows for cost-effective off-line storage, while still enabling future analysis.
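A minimal sketch of this online/off-line split follows. It assumes logs sit as ordinary files in one directory and uses a local archive directory as a stand-in for tape or other off-line media:

```python
import os
import shutil
import time

def archive_old_logs(log_dir, archive_dir, max_age_days=30):
    """Move log files older than `max_age_days` from online storage
    to an archive directory (a stand-in for tape/off-line storage).
    Returns the list of file names that were moved."""
    os.makedirs(archive_dir, exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            shutil.move(path, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

In practice the "archive" step would also compress and catalog the files so that future analysis (the whole point of long-term retention) stays feasible.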
The situation is even worse with security systems, because people commonly
have experience with a limited number of systems and thus will be lost in the log
pile spewed out by an unfamiliar device. As a result, a common format that can
encompass all the possible messages from security-related devices is essential
for analysis, correlation and, ultimately, for decision-making.
Assuming that logs are collected, stored for a sufficiently long time and
normalized, what else lurks in the muddy sea of log analysis? The logs are there,
but where do we start? Should we go for a high-level summary, look at the most
recent events, or do something else? The fourth error is not prioritizing log records.
Some system analysts get overwhelmed and give up after trying to chew
through a king-size chunk of log data without any real sense of priority.
Indeed, a vast majority of open-source tools and some commercial ones are set
up to filter and look for bad log lines, attack signatures and critical events, among
other things. For example, Swatch is a classic free log-analysis tool that's
powerful, but only at one thing -- looking for defined bad things in log files.
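The "look for defined bad things" style of tools like Swatch can be sketched in a few lines of Python. The patterns and sample lines below are illustrative, not a real Swatch configuration:

```python
import re

# Illustrative "known bad" patterns, in the spirit of a Swatch config.
BAD_PATTERNS = [
    re.compile(r"Failed password for root", re.IGNORECASE),
    re.compile(r"cmd\.exe", re.IGNORECASE),
    re.compile(r"segfault"),
]

def scan(lines):
    """Yield (line_number, line) for every line matching a bad pattern."""
    for num, line in enumerate(lines, start=1):
        if any(rx.search(line) for rx in BAD_PATTERNS):
            yield num, line

sample = [
    "May  1 10:00:01 host sshd[123]: Accepted password for alice",
    "May  1 10:00:05 host sshd[124]: Failed password for root from 10.9.9.9",
    "May  1 10:00:09 host httpd: GET /scripts/..%255c../winnt/system32/cmd.exe",
]
for num, line in scan(sample):
    print(num, line)
```

The limitation is built into the design: anything not on the pattern list passes silently, which is exactly why the next step, log mining, matters.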
However, to fully realize the value of log data, it needs to be taken to the next
level -- to log mining. In this step, you can discover things of interest in log files
without having any preconceived notion of what you need to find. Some
examples include compromised or infected systems, novel attacks, insider abuse
and intellectual property theft.
It sounds obvious: How can we be sure we know of all the possible malicious
behavior in advance? One option is to list all the known good things and then
look for the rest. It sounds like a solution, but such a task is not only onerous, but
also thankless. It's usually even harder to list all the good things than it is to list
all the bad things that might happen on a system or network. So many different
events occur that weeding out attack traces just by listing all the possibilities is
ineffective.
A more intelligent approach is needed. Some of the data mining (also called
"knowledge discovery in databases") and visualization methods actually work on
log data with great success. They allow organizations to look for real anomalies
in log data, beyond "known bad" and "not known good."
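As a minimal illustration of looking beyond "known bad" and "known good", one can flag log message types that occur rarely, since novel activity often surfaces as an unusual log line. The crude templating step below (stripping variable fields) is an assumption standing in for far more capable data mining methods:

```python
import re
from collections import Counter

def template(line):
    """Reduce a log line to a message 'type' by stripping variable
    fields (IP-like tokens, then any remaining numbers)."""
    line = re.sub(r"\d+\.\d+\.\d+\.\d+", "<ip>", line)
    return re.sub(r"\d+", "<n>", line)

def rare_messages(lines, max_count=1):
    """Return message types seen at most `max_count` times --
    candidates for a closer look, with no signature required."""
    counts = Counter(template(l) for l in lines)
    return [t for t, c in counts.items() if c <= max_count]

# Fifty routine logins plus one unusual kernel message.
sample = (
    ["sshd[%d]: Accepted password for alice from 10.0.0.%d" % (i, i) for i in range(50)]
    + ["kernel: promiscuous mode enabled on eth0"]
)
print(rare_messages(sample))
```

The routine logins collapse into a single high-frequency template and drop out, leaving only the anomaly, with no preconceived notion of what "bad" looks like.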
Avoiding these mistakes will take your log-analysis program to the next level and
enhance the value of your company's security and logging infrastructures.
This is an updated author bio, added to the paper at the time of reposting in
2009.