Cross Platform Security Analysis

Anton Chuvakin, Ph.D., GCIA

WRITTEN: 2003

DISCLAIMER: Security is a rapidly changing field of human endeavor. The threats we face literally change every day; moreover, many security professionals consider the rate of change to be accelerating. On top of that, to stay in touch with such an ever-changing reality, one has to evolve with the space as well. Thus, even though I hope that this document will be useful to my readers, please keep in mind that it was possibly written years ago. Also, keep in mind that some of the URLs might have gone 404; please Google around.

In spite of recent economic turmoil, the last several years brought much wider adoption of new security technologies, such as firewalls, intrusion detection systems, anti-virus solutions and others. Almost every Internet-connected organization now has a firewall included as part of its network infrastructure; most Windows shops have an anti-virus solution (often both on the email gateway and on the end-user desktops); and Intrusion Detection Systems (IDSs) are slowly but surely gaining wider acceptance. The trend is further enhanced by the growing popularity of so-called "appliance" based security systems.

Today's "normal" security routine is thus one where the firewalls hum and protect our perimeters, A/V systems block malicious code, and IDSs sniff out intrusions. Some classes of attacks, however, demand (re)actions from a security team, and we generally refer to these as cases of incident response.

To launch the incident response process into action, a decision to do so has to be made. Security management can be viewed as a continuous decision-making process: 'What to respond to?', 'How to prevent a loss?', 'What to do with the event?', etc. Such decisions are based on the information provided by the security infrastructure components. Paradoxically, the more security devices one deploys, the harder it is to make the right decisions about how and when to react.

In order to make this decision easier and more effective, one needs to convert the vast body of diverse audit data reported by security devices into knowledge. Most audit data contains information on "atomic events", i.e. information related to a single communication instance from a security device. Correlating all the events reported by a single device to understand the nature of an attack, or to simply distinguish an attack from normal and "first-time" behavior, is hard; correlating events reported by dozens of security systems seems almost exponentially harder, as these may reveal unrelated yet simultaneous attacks, or an attack of significant reach or scope.

What approaches are commonly used to do this? They can be loosely categorized into (1) doing nothing with the data; (2) manual data reduction; and (3) in-house automation tools, such as scripts and utilities aimed at processing the information flow. Let us briefly look at the advantages and disadvantages of the above methods.

First, is there any merit to the approach of ignoring all audit data generated by security systems? Imagine a retail store installing video surveillance monitors but never reviewing tape recordings to identify shoplifting incidents. This familiar example illustrates that few security devices operate well in "install-and-forget" mode. Admittedly, some security devices, such as a properly deployed firewall, are deterrents, valuable to a company even if no one is reviewing the logs. IDS, however, is clearly a different story. Having nobody watching the IDS is as effective as having no IDS at all. If your "intrusion detection" method is to wait for customers to alert you of your network security problems, the business will clearly suffer.

Second, does relying on human experts to interpret your security information and to provide effective responses based on the data gathered constitute a viable alternative to doing nothing? Two approaches to the problem are possible. First, a security professional can study the evidence after the security incident. Careful examination of audit data collected by various security systems will certainly shed light on the incident and will likely help to prevent a recurrence. However, in cases where extensive damage has been done, a "post mortem" investigation is too late: information critical to identifying the exact nature of the attack, and useful for preventing future incidents of the same kind, is lost. Moreover, no amount of investigation can recover disclosed confidential information or intellectual property, or appease angry business partners.

The second approach is to review the accumulated audit trail data daily or weekly. Can this scale? A simple calculation is in order. A single border router will produce dozens of messages per second on a busy network, and so will the firewall. Adding host event auditing messages from many servers will increase the flow to possibly many dozens per second. Now if one tries to scale this to an average company network infrastructure, the information flow will increase one hundredfold. No human expert or team will be able to perform an "eyeball" review, let alone analyze, the incoming flood of signals.
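The back-of-envelope arithmetic above can be made concrete. A minimal sketch, with illustrative per-device rates (all numbers are assumptions chosen for the example, not measurements):

```python
# Rough estimate of daily audit-data volume, using illustrative rates.
# All per-device rates below are assumptions for the sake of the example.

SECONDS_PER_DAY = 24 * 60 * 60

devices = {
    "border router": (1, 30),          # (device count, events/second each)
    "firewall": (2, 30),
    "servers (host auditing)": (50, 2),
}

total_eps = sum(count * eps for count, eps in devices.values())
daily_events = total_eps * SECONDS_PER_DAY

print(f"Aggregate rate: {total_eps} events/second")
print(f"Daily volume:   {daily_events:,} events")
# Even a modest 190 events/second works out to over 16 million events
# per day -- far beyond what any team can review by eye.
```

Multiply this by a hundred for a large enterprise and the impossibility of manual review becomes obvious.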

What about simple alert analysis automation? Writing a script or a program to call a security expert's attention to the significant events looks like a reasonable thing to do. Such programs may help with data collection (a centralized syslog server or a database) and alarm processing (email, pager, voice mail). However, a series of important questions arises. Collected data will greatly help with an incident investigation, but what about the timeliness of the response? Separating meaningful events from mere chaff (the constant stream of low-level scanning and probing of an organization's publicly accessible systems) is not a trivial task, especially in a large multi-vendor environment, where audit data records vary in context, content, and even in the classification of events which merit an audit record! Moreover, even devices sold by a single vendor might have various event prioritization schemes, thus making a simple "alert on high priority events" a complicated pursuit. Designing a data reduction and analysis scheme that optimizes a security decision-making process might consume significant time and capital investment and still not reach the set goals due to a lack of the specific analysis expertise. In addition, alerting on raw event data (such as 'if you see a specific IDS signature, send an email') will quickly turn into the "boy who cried wolf" story, with pagers screaming for attention and not getting it.
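A naive version of such an alerting script might look like the sketch below; the log lines and signature string are hypothetical. It illustrates exactly the "cry wolf" failure mode: every raw match fires a page, with no aggregation, deduplication or prioritization.

```python
# A deliberately naive "alert on raw events" script: it pages on every
# matching line, with no deduplication or prioritization -- precisely
# the "boy who cried wolf" behavior described above.
# The signature name and sample log lines are invented for illustration.

import re

SIGNATURE = re.compile(r"WEB-IIS cmd\.exe access")  # example signature name

def page_on_match(log_lines, notify):
    """Call notify() for every matching line -- no filtering at all."""
    alerts = 0
    for line in log_lines:
        if SIGNATURE.search(line):
            notify(line)
            alerts += 1
    return alerts

# Even a quiet stretch of logs produces a page per routine probe:
sample = [
    "snort: WEB-IIS cmd.exe access from 10.0.0.5",
    "sshd: accepted password for admin",
    "snort: WEB-IIS cmd.exe access from 10.0.0.9",
]
pages = page_on_match(sample, notify=lambda line: None)
print(pages)  # 2 pages for 2 routine scans -- noise, not knowledge
```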

One possible solution to the data deluge conundrum is Security Information (or Event) Management (SIM, SEM) software. SIM is a process of collecting, analyzing and presenting security event data in a format that facilitates making security decisions. The process goes beyond the simple concept of "collect -> analyze -> present", and attempts to encompass everything that occurs from the time a "suspicious" or "bad" packet undergoes scrutiny to the time when a security manager initiates an appropriate response. For example, collection of security event data is likely to occur at many security systems. The analysis component is likely to include

* correlation (i.e. discovering and highlighting relationships between seemingly unrelated events);

* trending (long term time-based statistics);

* risk assessment (computation of a variable risk score for each asset); and

* basic anomaly detection (such as host profiling).
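As one concrete instance of the last item, host profiling can be sketched as building a per-host baseline of observed services and flagging deviations from it. The hosts, ports and data below are invented example values:

```python
# Minimal host-profiling sketch: learn which ports each host is seen
# serving during a training period, then flag traffic to any port
# outside that baseline. Hosts and ports are invented example data.

from collections import defaultdict

def build_profiles(observations):
    """observations: iterable of (host, port) pairs from a training period."""
    profiles = defaultdict(set)
    for host, port in observations:
        profiles[host].add(port)
    return profiles

def is_anomalous(profiles, host, port):
    """A connection is anomalous if the port is outside the host's baseline."""
    return port not in profiles.get(host, set())

baseline = build_profiles([
    ("web01", 80), ("web01", 443), ("mail01", 25),
])
print(is_anomalous(baseline, "web01", 443))   # False: a normal service
print(is_anomalous(baseline, "web01", 6667))  # True: IRC on a web server
```

Real SIM profiling is of course richer (traffic volumes, time-of-day patterns), but the shape is the same: baseline, then deviation.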

Finally, data presentation is likely to include a variety of graphical displays - diagrams, text-based summary reports and raw event display. In addition to "static" report data, the SIM solution can present data in near "real-time" such as in the form of a running graph, animated network topology or some other way.

Some SIM functionality enhances the above core features. It includes alert notification with filtering, advanced visualization (such as dynamic 3D pictures), management-specific reporting, background and timed report generation, incident workflow management and even live intrusion countermeasures, such as reconfiguring the firewalls and routers.

It is worth noting that one of the most over-hyped components of SIM is correlation. Everyone has a unique definition for correlation: some are contradictory, but all make for lively discussions at conferences. The simplest way to understand correlation is to use a dictionary definition, e.g., Merriam-Webster's: to correlate is "to establish a mutual or reciprocal relation between, to present or set forth so as to show relationship". In SIM, correlation helps find relationships between events in order to determine "what is really going on".
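As a toy illustration of this definition, a minimal correlation rule might link a port scan and a subsequent exploit alert from the same source address. The event structure, field names and time window below are invented for the sketch:

```python
# Toy correlation rule: flag a source address that first appears in a
# "scan" event and then, within a time window, in an "exploit" event.
# Event dictionaries, field names and the window are invented examples.

WINDOW = 300  # seconds; an assumed correlation window

def correlate_scan_then_exploit(events):
    """Return source IPs seen scanning and then exploiting within WINDOW."""
    last_scan = {}  # source IP -> timestamp of most recent scan event
    hits = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["type"] == "scan":
            last_scan[ev["src"]] = ev["ts"]
        elif ev["type"] == "exploit":
            t = last_scan.get(ev["src"])
            if t is not None and ev["ts"] - t <= WINDOW:
                hits.append(ev["src"])
    return hits

events = [
    {"ts": 100, "src": "10.1.1.1", "type": "scan"},
    {"ts": 120, "src": "10.2.2.2", "type": "exploit"},  # no prior scan
    {"ts": 250, "src": "10.1.1.1", "type": "exploit"},  # scan -> exploit
]
print(correlate_scan_then_exploit(events))  # ['10.1.1.1']
```

Neither event alone tells the analyst much; the established relationship between the two is what makes the second one actionable.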

Let's consider an example. A match against a network IDS signature is triggered upon processing a suspicious packet (at that stage, we do not know whether this has any significance). The IDS software records an alert and stores the packet for further analysis. Abstractly, the IDS alert is then passed through an interface from the IDS product to a SIM solution. In a product, such an interface might be standard UNIX syslog, or a proprietary API that uses proprietary protocols and proprietary encryption. Upon receiving the event data, the SIM solution converts the data into its preferred format. Known as "normalization", this process converts events from different vendor systems into a format suitable for analysis (typically XML-based).
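Such a normalization step can be sketched as a set of per-vendor parsers feeding one common schema. Both input formats below are made up, not real vendor formats, and Python dictionaries stand in for the XML schema a real product would use:

```python
# Sketch of event normalization: per-source parsers convert vendor-specific
# records into one common schema. Both input line formats are invented.

def parse_fw(line):
    """'DROP 10.0.0.1 -> 192.168.1.5:80' -- a made-up firewall format."""
    action, rest = line.split(" ", 1)
    src, dst = rest.split(" -> ")
    dst_ip, dst_port = dst.split(":")
    return {"source": "firewall", "action": action.lower(),
            "src_ip": src, "dst_ip": dst_ip, "dst_port": int(dst_port)}

def parse_ids(line):
    """'ALERT sig=1234 src=10.0.0.1 dst=192.168.1.5' -- made-up IDS format."""
    fields = dict(f.split("=") for f in line.split()[1:])
    return {"source": "ids", "action": "alert",
            "src_ip": fields["src"], "dst_ip": fields["dst"],
            "signature": fields["sig"]}

normalized = [
    parse_fw("DROP 10.0.0.1 -> 192.168.1.5:80"),
    parse_ids("ALERT sig=1234 src=10.0.0.1 dst=192.168.1.5"),
]
# Both events now share src_ip/dst_ip keys and can be analyzed together.
print(all("src_ip" in e for e in normalized))  # True
```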

Event analysis often begins at this stage. SIM solutions might categorize the event and relate it to other similar events coming from other sources. The normalized event is stored on a SIM server, usually in a relational database. The event might also be sent to a notification engine (where it might trigger a page or an email) and presented to the user in near "real-time" (provided that it is not dropped by a filter). The event can then be presented in a report, together with other events, used for trending and analysis (as part of which it might trigger a correlation rule), plotted on a graph, etc.

While the above process sounds simple, there are many challenges. A SIM solution can only act on events it receives and understands. Events received from security devices which use proprietary or poorly documented audit mechanisms are hard to normalize. What if a locked-down appliance allows nothing to be installed on it (which is normal) and provides no way to get the raw events out, other than through a vendor web GUI? What if the device has a high, sustained event rate, such as a busy firewall? What if an application sends messages that are not really security-relevant but, under some unlikely circumstances, useful for a forensic or troubleshooting investigation?

Forensics quality and court survivability of stored data - preservation of an electronic chain of custody, for example - present additional challenges. Data collection across a massively distributed enterprise is another major obstacle, due to all sorts of control and management issues as well as network bandwidth issues. Security of the data collection mechanisms is also a challenge: one has to make sure that the audit data is not compromised in transit to the SIM server. It is worth noting that most of the above challenges have solutions implemented by SIM vendors.

Analysis challenges dwarf the above collection-related ones. Here is a brief list.

* Just what do we do with all those gigabytes of data, now that we have them stored in one place? The problem is exacerbated by the extreme diversity of the collected data. Attack alerts, system and hardware problems, utilization data, and various access control list traversals all happily reside in the database, begging to be converted into knowledge. This information should be matched together and possibly tied to the subjects of network communication, such as assets within a company. Analysis processes should focus on answering the questions "what is going on" and "what needs to be done, if anything". For example, is there a significant change in some network behavior? Then check the trending report. Is that a mail virus spreading again from mail server to mail server? Confirm and eradicate.

* What is more important to act on first? Designing a sensible priority scheme is a challenge. Many security device vendors provide their own message severity ratings, but these are not really compatible with each other. For example, a high severity message from a host IDS is likely more important than a high severity message from an application, or even a high severity message from a firewall.
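One common way out is a per-device-type mapping of vendor severities onto a single scale. The sketch below does exactly that; the numeric values are illustrative judgment calls, not a standard:

```python
# Sketch: map (device type, vendor severity) onto one 0-100 scale, so a
# "high" from different sources ranks differently. All numbers here are
# illustrative assumptions, not a standard.

PRIORITY = {
    ("host_ids", "high"): 90,  # host IDS "high": likely a real compromise
    ("firewall", "high"): 60,  # firewall "high": often just a blocked probe
    ("app", "high"): 50,
    ("host_ids", "low"): 30,
    ("firewall", "low"): 10,
}

def priority(device_type, vendor_severity):
    """Unified priority; unknown combinations get a middling default."""
    return PRIORITY.get((device_type, vendor_severity), 40)

# A host IDS "high" now outranks a firewall "high", as argued above:
print(priority("host_ids", "high") > priority("firewall", "high"))  # True
```

The hard part, of course, is not the lookup table but agreeing on its contents for every device type in the enterprise.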

* Is it the same or not? Figuring out whether multiple devices are reporting the same event is a challenge. While it might sound as if CVE could help here, the sad truth is that it really does not. CVE is too focused on vulnerabilities, and not everything in the logs is related to a known vulnerability: a daemon crash or an unauthorized access is not present in CVE. Hopefully, MITRE's new and ambitious CIEL project (Common Intrusion Event List) will help rectify the situation. And if we add all sorts of timing issues (such as missing time synchronization, different time zones, etc.), knowing when two reported events are really the same becomes next to impossible.
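A crude way to decide "is it the same event?" is fuzzy matching on the key event fields plus a tolerance for clock skew. In the sketch below, the field names and the tolerance are arbitrary assumptions, and the timing problems described above can easily defeat it:

```python
# Crude duplicate detection: two reports are treated as "the same event"
# if their key fields match and their timestamps fall within a skew
# tolerance. The tolerance is an arbitrary assumption; unsynchronized
# clocks or time-zone mistakes can easily exceed it.

SKEW_TOLERANCE = 60  # seconds

def same_event(a, b, tolerance=SKEW_TOLERANCE):
    return (a["src_ip"] == b["src_ip"]
            and a["dst_ip"] == b["dst_ip"]
            and a["category"] == b["category"]
            and abs(a["ts"] - b["ts"]) <= tolerance)

ids_report = {"ts": 1000, "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
              "category": "web_attack"}
fw_report = {"ts": 1035, "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
             "category": "web_attack"}  # same attack, 35 s of clock skew
print(same_event(ids_report, fw_report))  # True

# With one device an hour off (wrong time zone, no NTP), the very same
# attack looks like two distinct events:
fw_off = dict(fw_report, ts=1035 + 3600)
print(same_event(ids_report, fw_off))  # False
```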

Without going deep into the realm of Artificial Intelligence (AI), a SIM solution can only provide partial answers to these and many other questions arising in the security process.

To conclude, cross-device security data analysis is a fun problem to solve, and there is a definite need for a solution. The need will only grow as more security devices and applications go online. There are still challenges in how to collect and, more importantly, how to analyze the collected audit data, but existing solutions do provide a lot of value for companies willing to spend a little time deploying a SIM solution.

ABOUT THE AUTHOR: This is an updated author bio, added to the paper at the time of reposting in 2009. Dr. Anton Chuvakin is a recognized security expert in the field of log management and PCI DSS compliance. He is an author of the books "Security Warrior" and "PCI Compliance" and a contributor to "Know Your Enemy II", "Information Security Management Handbook" and others. Anton has published dozens of papers on log management, correlation, data analysis, PCI DSS, and security management. His blog is one of the most popular in the industry. In addition, Anton teaches classes and presents at many security conferences across the world; he recently addressed audiences in the United States, UK, Singapore, Spain, Russia and other countries. He works on emerging security standards and serves on the advisory boards of several security start-ups. Currently, Anton is developing his security consulting practice, focusing on logging and PCI DSS compliance for security vendors and Fortune 500 organizations. Dr. Anton Chuvakin was formerly a Director of PCI Compliance Solutions at Qualys. Previously, Anton worked at LogLogic as Chief Logging Evangelist, tasked with educating the world about the importance of logging for security, compliance and operations. Before LogLogic, Anton was employed by a security vendor in a strategic product management role. Anton earned his Ph.D. degree from Stony Brook University.
