
Honeypot Essentials

Anton Chuvakin, Ph.D., GCIA, GCIH

WRITTEN: 2003

DISCLAIMER:
Security is a rapidly changing field of human endeavor. The threats we face literally change every
day; moreover, many security professionals consider the rate of change to be accelerating. On top of
that, to stay in touch with such an ever-changing reality, one has to evolve with the space as
well. Thus, even though I hope that this document will be useful to my readers, please keep in
mind that it was possibly written years ago. Also, keep in mind that some of the URLs might have
gone 404; please Google around.

Summary:

The paper covers honeypot (and honeynet) basics and definitions and then outlines
important implementation and setup guidelines. It also describes some of the security
lessons a company can derive from running a honeypot, based on the author's experience
running a research honeypot. The article also provides insights into the techniques of
attackers and concludes with considerations useful for answering the question "Should
your organization deploy a honeynet?"

I.

While known to security professionals for a long time, honeypots have recently become a hot
topic in information security. However, the amount of technical information available on
their setup, configuration, and maintenance is still sparse, as are qualified people able to
run them. In addition, higher-level guidelines (such as need and business case
determination) are similarly absent.

This paper will cover some of the honeypot (and honeynet) basics and definitions and
then will outline important implementation issues. It will also cover security lessons a
company can derive from running a research honeypot.

What is a honeypot? Lance Spitzner, founder of the Honeynet Project
(http://www.honeynet.org), defines a honeypot as "a security resource whose value lies in
being probed, attacked or compromised". The Project differentiates between research and
production honeypots. The former are focused on gaining intelligence about
attackers and their technologies and methods, while the latter are aimed at decreasing the
risk to a company's IT resources and providing advance warning about incoming attacks
on the network infrastructure. Honeypots of any kind are hard to classify using the
"prevention - detection - response" metaphor, but it is hoped that after reading this paper
their value will become clearer to readers.

This paper will focus on operating a research honeypot, or "honeynet". The term
"honeynet", as used in this article, originated in the Honeynet Project and means a network of
systems with fairly standard configurations connected to the Internet. The only difference
between such a network and a regular production network is that all communication is recorded
and analyzed, and no attacks targeted at third parties can escape the network.
Sometimes, the system software is slightly modified to help deal with encrypted
communication, often used by attackers. The systems are never "weakened" for easier
hacking, but are often deployed in default configurations with a minimum of security
patches. They might or might not have known security holes. The Honeynet Project defines
such honeypots as "high-interaction" honeypots, meaning that attackers interact with a
deception system exactly as they would with a real victim machine. On the other hand,
various honeypot and deception daemons are "low-interaction", since they only provide an
illusion to an attacker, and one that can hold his attention for a short time only. Such
honeypots have value as early attack indicators, but do not yield in-depth information
about the attackers.

Research honeypots are set up with no extra effort to lure attackers - blackhats locate and
exploit systems on their own. This happens due to the widespread use of automated hacking
tools, such as fast multiple-vulnerability scanners and automatic penetration scripts. For
example, an attacker in our honeynet attempted to scan 200,000 systems for a
single FTP vulnerability in one night using such tools. Research honeypots are also
unlikely to be used for prosecuting intruders; however, researchers have been known to track
hacker activities using various covert techniques for a long time after the intruder broke
into their honeypot. In addition, prosecution based on honeypot evidence has never been
tested in a court of law. It is still wise to involve the company legal team before setting up
such a hacker study project.

Overall, the honeypot is the best tool for looking into malicious hacker activity. The
reason for that is simple: all communication to and from the honeynet is malicious by
definition. No data filtering, no false positives and no false negatives (the latter only if the
data analysis is adequate) are obscuring the picture. Watching the honeypot provides
insight into intruders' personalities and can be used to profile attackers. For example, in
the recent past the majority of penetrated Linux honeypots were hacked by Romanian
attackers.

What are some of the common-sense prerequisites for running a honeynet? First, a
honeypot is a sophisticated security project, and it makes sense to take care of security
basics first. If your firewall crashes or your intrusion detection system misses attacks, you
are clearly not yet ready for a honeypot deployment. Running a honeypot also requires
advanced knowledge of computer security. After running a honeynet for my employer, a
member of the Honeynet Research Alliance, I can state that operating a honeynet presents
the ultimate challenge a security professional can face. The reason is simple: no "lock it down
and maintain a secure state" model is possible for such a deception network. It requires
in-depth expertise in many security technologies and beyond.

Some of the technical requirements follow. Obviously, the honeypot systems should not
be allowed to attack other systems or, at least, such ability should be minimized. This
requirement often conflicts with the desire to create a more realistic environment in which
malicious hackers can "feel at home", so that they manifest the full spectrum of their behavior.
Related to the above is the need for proper separation of the research honey network from the
company's production machines. In addition to protecting innocent third parties, similar
measures should be utilized to prevent attacks against your own systems from your
honeypot. Honeypot systems should also have reliable out-of-band management. The
main reason for this capability is to be able to quickly cut off network access to
and from the honeypot in case of emergency (and they do happen!), even if the main
network connection is saturated by an attack. That may sound contradictory to the above
statement about preventing outgoing attacks, but Murphy's Law might play a trick or two,
and "human errors" can never be totally excluded.

The Honeynet Research Alliance (http://www.honeynet.org/alliance/) has guidelines on Data
Control and Data Capture for deployed honeynets. They distill the above ideas and
guidelines into a well-written document, "Honeynet Definitions, Requirements, and
Standards" (http://www.honeynet.org/alliance/requirements.html). The document
establishes some "rules of the game", which have a direct influence on honeynet firewall
rule sets and IDS policies.

Data Control is the capability required to control the network traffic flow in and out of the
honeynet in order to contain the blackhat's actions within the defined policy. For example,
rules such as 'no outgoing connections', 'limited number of outgoing connections per time
unit', 'only specific protocols and/or locations for outgoing connections', 'limited bandwidth
of outgoing connections', 'attack string filtering in outgoing connections', or their
combination can be used on a honeynet. Data Control functionality should be
multilayered, allow for manual and automatic intervention (such as remote disabling of the
honeypot) and should make every effort to protect innocent third parties from becoming
victims of attacks launched from the honeynet (and launched they will be!).
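To make the 'limited number of outgoing connections per time unit' rule concrete, here is a minimal sketch of such a rate-limiting decision in Python. This is an illustrative helper only, not code from any actual honeynet deployment; the class name and thresholds are invented:

```python
from collections import deque

class OutboundLimiter:
    """Sketch of a Data Control rule: allow at most `max_conns`
    outbound connections per sliding window of `window` seconds.
    Hypothetical helper for illustration, not a real honeynet tool."""

    def __init__(self, max_conns, window_seconds):
        self.max_conns = max_conns
        self.window = window_seconds
        self.times = deque()  # timestamps of recent outbound connections

    def allow(self, now):
        # Drop timestamps that have fallen out of the sliding window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) < self.max_conns:
            self.times.append(now)
            return True
        return False  # over the limit: block (and presumably alert)
```

In a real deployment this decision would sit in the firewall layer (e.g. connection-rate limits in the rule set) rather than in a user-space script, and would be backed by a second, independent control layer.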

Data Capture defines the information that should be captured on the honeypot system for
future analysis, data retention policies, and standardized data formats which facilitate
information sharing between honeynets and cross-honeynet data processing. Cross-
honeypot correlation is an extremely promising area of future research, since it allows for
the creation of an early warning system about new exploits and attacks. Data Capture also
covers the proper separation of honeypots from production networks to protect the attack
data from being contaminated by regular network traffic. Another important aspect of
Data Capture is timely documentation of attacks and other incidents occurring in the
honeypot. It is crucial for research to have a well-written log of malicious activities and
configuration changes performed on the honeypot system.
II.

Let's turn to the practical aspects of running a honeynet. Our example setup consists of three
hosts (see diagram): a victim host, a firewall and an IDS. This is the simplest
configuration to maintain; however, a workable honeynet can even be set up on a single
machine if a virtual environment (such as VMware or UML-Linux) is used. Combining IDS
and firewall functionality by using a gateway IDS (such as "snort-inline") reduces the
requirement to just two machines. A gateway IDS is a host with two network
cards that analyzes the traffic passing through it and can make packet-forwarding
decisions (like a firewall) and send alerts based on network packet contents (like an IDS).
Currently, the honeynet uses Linux on all systems, but various other UNIX flavors will be
deployed as "victim" servers by the time this paper is published. Linux machines in default
configurations are hacked often enough to provide a steady stream of data on blackhat
activity. "Root"-level system penetration within hours of deployment is not unheard of.
UNIX also provides a safe choice for a victim system OS due to its higher transparency
and the ease of reproducing a given configuration.

The honeypot is run on a separate network connection - always a good idea, since the
deception systems should not be seen as owned by your organization. The firewall (a hardened
Linux "iptables" stateful firewall) allows and logs all inbound connections to the
honeypot machines and limits the outgoing traffic depending upon the protocol (with full
logging as well). It also blocks all IP spoofing attempts and fragmented packets, which are often
used to conceal the source of a connection or to launch a denial-of-service attack. The firewall
also protects the analysis network from attacks originating from the honeypot. In fact, in
the above setup, an attacker has to pierce two firewalls to get to the analysis network. The IDS
machine is also firewalled, hardened, and runs no services accessible from the untrusted
network. The part of the rule set relevant to protecting the analysis network is very simple:
no connections are allowed from the untrusted LAN to the analysis network. The IDS (Snort
from www.snort.org) records all network traffic to a database and a binary traffic file via a
stealth IP-less interface and also sends alerts on all known attacks detected by its wide
signature base (approximately 1650 signatures as of July 2002). In addition, specially
designed software is used to monitor the intruder's keystrokes and covertly send them to a
monitoring station.

All data capture and data control functionality is duplicated, as per Honeynet Project
requirements. The 'tcpdump' tool is used as the secondary data capture facility, a bandwidth-
limiting device serves as the second layer of data control, and a stealth kernel-level key
logger backs up the keystroke recording. Numerous automated monitoring tools, some
custom-designed for the environment, watch the honeypot network for alerts and
suspicious traffic patterns.

Data analysis is crucial for the honeypot environment. The evidence, in the form of
system, firewall and IDS log files, IDS alerts, keystroke captures and full traffic captures, is
generated in overwhelming amounts. Events are correlated, and suspicious ones are
analyzed using the full packet dumps. It is highly recommended to synchronize the time
via Network Time Protocol on all the honeypot servers for more reliable data correlation.
SIM software can be used to enable advanced data correlation and analysis, as well as
to log the compromises using an Incident Resolution Management system. Unlike in a
production environment, having traffic data available in the honeypot is extremely helpful.
It also allows for reliable recognition of new attacks. For example, a Solaris attack on the
"dtspcd" daemon (TCP port 6112) was first captured in one of the Project's honeypots and
then reported to CERT. Several new attacks against Linux Samba¹ servers were also
detected recently.
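As an illustration of why synchronized clocks matter for correlation, the following Python sketch pairs firewall log events with IDS alerts by source IP and timestamp proximity. The event format and the function itself are hypothetical, invented for demonstration; real SIM products perform far richer correlation:

```python
def correlate(fw_events, ids_alerts, tolerance=2.0):
    """Pair firewall events with IDS alerts from the same source IP
    whose timestamps fall within `tolerance` seconds of each other.
    Events are hypothetical (unix_timestamp, source_ip) tuples.
    If the sensors' clocks drift apart, such matching breaks down --
    hence the NTP recommendation above."""
    pairs = []
    for fw_ts, fw_ip in fw_events:
        for ids_ts, ids_ip in ids_alerts:
            if fw_ip == ids_ip and abs(fw_ts - ids_ts) <= tolerance:
                pairs.append((fw_ip, fw_ts, ids_ts))
    return pairs
```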

The above setup has gone through many system compromises, several massive
outbound denial-of-service attacks (all blocked by the firewall!), major system vulnerability
scanning, a stint serving as an Internet Relay Chat server for Romanian hackers, and other
exciting things. It passed with flying colors through all the above "adventures" and can be
recommended for deployment.

III.

What insights have we gained about the attacking side from running the honeynet? It is
true that most of the attackers "caught" in such honeynets are "script kiddies", i.e. the less
enlightened part of the hacker community. While famous early honeypot stories (such as
those described in Bill Cheswick's "An Evening with Berferd" and Cliff Stoll's "The Cuckoo's
Egg") dealt with advanced attackers, most of your honeypot experience will probably be
related to "script kiddies". Contrary to common wisdom, companies do have something to
fear from the script kiddies. The number of scans and attacks aimed by the attackers at
Internet-facing networks ensures that any minor mistake in network security configuration
will be discovered fairly soon. Every unsecured server running a popular operating
system (such as Solaris, Linux or Windows) will be taken over fairly soon. Default
configurations and bugs in services (UNIX/Linux ssh, bind, ftpd and now even the Apache
web server and Windows IIS are primary examples) are the reason. We have captured
and analyzed multiple attack tools exploiting the above flaws. One such tool is a fully
automated scanner that looks for 25 common UNIX vulnerabilities, runs hundreds of attack
threads simultaneously, and deploys a rootkit upon system compromise. The
software can be set to choose a random class A network (16 million hosts) and first scan it for a
particular network service. Then, on the second pass, the program collects FTP banners (such
as "ftp.example.com FTP server (Version wu-2.6.1-16) ready.") for target selection. On the
third pass, the servers that had the misfortune of running a particular vulnerable version of
the FTP daemon are attacked, exploited and backdoored for convenience. The owner of
such a tool can return in the morning to pick up a list of IP addresses that he now "owns"
(meaning, has privileged access to).
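As a rough illustration of the banner-based target selection described above, the following Python sketch extracts the host name and server version from such an FTP greeting. The function and its regular expression are this author's invention for demonstration, not code from the actual tool:

```python
import re

def parse_ftp_banner(banner):
    """Extract the host and daemon version from an FTP greeting
    banner of the form quoted above. A scanner (or, equally, a
    defensive log analyst) can match the version string against a
    list of known-vulnerable releases. Illustrative sketch only."""
    m = re.match(r"(\S+) FTP server \(Version (\S+?)\) ready\.", banner)
    if not m:
        return None  # banner format not recognized
    return {"host": m.group(1), "version": m.group(2)}
```

The same parsing logic serves defenders: comparing captured banners against vulnerable-version lists shows which of your own exposed servers such a tool would have selected.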

In addition, malicious attackers are known to compile Internet-wide databases of available
network services, complete with their versions, so that hosts can be compromised
quickly after a new software flaw is discovered. In fact, there is always a race between
various groups to take over more systems. This advantage can come in handy in case of a
local denial-of-service war. While "our" attackers have not tried to draft the honeypot into
their army of "zombie" bots, they did use it to launch old-fashioned point-to-point denial of
service attacks (such as UDP and ping floods and even the ancient modem hang-up ATH
DoS).

¹ Samba is a Linux/UNIX implementation of the Microsoft Server Message Block (SMB) protocol.

The attackers' behavior seemed to indicate that they are used to operating with no
resistance. One attacker's first action was to change the 'root' password on the system -
clearly an action that will be noticed the next time the system administrator tries to log in. Not a
single attacker bothered to check for the presence of the Tripwire integrity checking system,
which is included by default in many Linux distributions. On the next Tripwire run, all the
"hidden" files are easily discovered. Another attacker created a directory for himself
as "/his-hacker-handle", something that every system administrator worth his/her salt will see at
once. Rootkits (i.e. hacker toolkits for maintaining access to a system, including
backdoors, Trojans and common attack tools) now reach megabyte sizes and feature
graphical installation interfaces suitable for novice blackhats. Research indicates that
some of the script kiddies "own" networks consisting of hundreds of machines that can be
used for DoS or other malicious purposes.

An exposed UNIX system is most often scanned on ports 111 (RPC services), 139
(SMB), 443 (OpenSSL) and 21 (FTP). Recent (2001-2003) remote "root" bugs in those
services account for this phenomenon. A system running a vulnerable Apache with SSL is
compromised within several days.
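Rankings like the one above come straight from counting inbound probes in the firewall logs. A minimal Python sketch, assuming the log has been reduced to hypothetical (source IP, destination port) tuples, might look like this:

```python
from collections import Counter

def top_targeted_ports(events, n=4):
    """Rank destination ports by the number of inbound connection
    attempts. `events` is an invented format for illustration:
    a list of (source_ip, dest_port) tuples parsed from firewall
    logs. Returns the n most-probed ports with their hit counts."""
    counts = Counter(port for _, port in events)
    return counts.most_common(n)
```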

Another benefit of running a honeypot is a better handle on Internet noise. Clearly,
security professionals who run Internet-exposed networks are well aware of the common
Internet noise (such as CodeRed, SQL and MSRPC worms, warez site FTP scans, etc.).
A honeypot allows one to observe the minor oscillations of such noise. Sometimes such
changes are meaningful. In the recent case of the MS SQL worm, we detected a sharp
increase in TCP port 1433 access attempts just before the news of the worm became
public. The same spike was seen when the RPC worms were released. The number of
hits was similar to the well-researched CodeRed growth pattern. Thus, we concluded that a
new worm was out.
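A crude version of this spike detection can be sketched in a few lines of Python. The baseline-and-factor heuristic is invented purely for illustration; a real detector would use a proper statistical model of the per-port background noise:

```python
def spike_detected(history, today, factor=5.0):
    """Flag a sharp increase in daily hit counts for one port,
    like the TCP/1433 jump seen before the MS SQL worm news broke.
    `history` is a list of prior daily hit counts; `today` is the
    current count. Threshold logic is illustrative, not tuned."""
    if not history:
        return False  # no baseline yet -- cannot judge
    baseline = sum(history) / len(history)
    return today > factor * max(baseline, 1.0)
```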

An additional value of the honeypot is in its use as a security training platform. Using the
honeypot, a company can raise the level of incident response skills of its security
team. Honeypot incidents can be investigated and the answers then verified via the
honeypot's enhanced data collection capabilities. 'What tool was used to attack?' - here it
is, on the captured hard drive or extracted from network traffic. 'What did they want?' -
look at their shell command history and know. One can quickly and effectively develop
network and disk forensics skills, attacker tracking, log analysis, IDS tuning and many
other critical security skills in the controlled but realistic environment of the honeypot.

More advanced research uses of the honeypot include hacker profiling and tracking,
statistical and anomaly analysis of incoming probes, capture of worms and analysis of
malicious code development. By adding some valuable resources (such as e-commerce
systems and billing databases) and using covert intelligence techniques to lure
attackers in, more sophisticated attackers can be attracted and studied. That will, however,
increase the operating risks.
IV. Abuse of the compromised systems

The more recent OpenSSL incidents are more interesting, since the attacker does not
gain root upon breaking into the system (ending up instead as, say, user "apache"). One might
think that owning a system with no "root" access is useless, but we usually see active system
use in these cases. Here are some of the things that such non-root attackers do on
compromised systems:

1. "IRC till you drop"

Installing an IRC bot or bouncer is a popular choice for such attackers. Several IRC
channels dedicated entirely to communication among the servers compromised by a particular
group were observed on several occasions. Running an IRC bot does not require
additional privileges.

2. "Local exploit bonanza"

Throwing everything they have at the Holy Grail of root access seems common as well.
Often, the attacker will try half a dozen different exploits in an attempt to elevate his privileges
from mere "apache" to "root".

3. "Evil daemon"

A secure shell daemon can be launched by a non-root user on a high-numbered port. This
was observed in several cases. In some of these cases, the intruder accepted the fact that
he would not have root. He then started to make his new home on the net more comfortable
by adding a backdoor and some other tools in "hidden" directories (".. " and other non-printable
names are common) in /tmp or /var/tmp.
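Such "hidden" names are easy to flag programmatically. Here is a minimal Python heuristic for them, written for this article as a sketch rather than a complete rootkit detector:

```python
def suspicious_name(name):
    """Flag directory names using common attacker 'hiding' tricks:
    dot-dot-plus-whitespace (e.g. '.. ') or embedded non-printable
    characters. Heuristic sketch only; real checks would compare
    against a file integrity baseline such as Tripwire's."""
    if name in (".", ".."):
        return False  # legitimate directory entries
    stripped = name.rstrip()
    if stripped in (".", "..") and stripped != name:
        return True   # the '.. ' trailing-whitespace trick
    # control characters (< 0x20) and DEL (0x7f) are non-printable
    return any(ord(c) < 32 or ord(c) == 127 for c in name)
```

Applied to the entries of /tmp and /var/tmp, this kind of check surfaces exactly the directories a careless intruder hopes a quick 'ls' will miss.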

4. "Flood, flood, flood"

While spoofed DoS is stealthier and harder to trace, many of the classic DoS attacks
do not require root access. For example, ping floods and UDP floods can be initiated by
non-root users. This capability is sometimes abused by intruders, who exploit the fact that
even when the attack is traced, the only source found would be a compromised machine
with no logs present.

5. "More boxes!"

Similar to root-owning intruders, those with non-root shells may use the compromised
system for vulnerability scanning and widespread exploitation. Many of the scanners, such
as the OpenSSL autorooter recently discovered by us, do not need root to operate, but are still
capable of discovering and exploiting a massive number of systems (thousands and more)
within a short time period. Such large networks can be used for devastating denial-of-service
attacks (such as those recently warned about by CERT).

V. Conclusion

In conclusion, we will try to answer the question: "Should you do it?" The precise
answer depends upon your organization's mission and available security expertise. Again,
the emphasis here is on research honeypots and not on "shield" or protection honeypots.
If your organization has taken care of most routine security concerns, has a developed in-house
security program (since calling in an outside consultant to investigate your honeypot incidents
does not qualify as a wise investment) and requires first-hand knowledge of attacker
techniques and the latest Internet threats, the answer tends towards a tentative "yes".
Major security vendors and consultancies, or universities with advanced computer security
programs, might fall into this category. If you are not happy with your existing security
infrastructure and want to replace or supplement it with new cutting-edge "honeypot
technology", the answer is a resounding "no". Research honeypots will not *directly*
impact the safety of your organization. Moreover, honeypots have their inherent dangers,
which are analyzed in papers posted on the Honeynet Project site. The dangers include
uncertain liability status, possible hacker retaliation and others.

ABOUT THE AUTHOR:

This is an updated author bio, added to the paper at the time of reposting in 2009.

Dr. Anton Chuvakin (http://www.chuvakin.org) is a recognized security expert in the field of log
management and PCI DSS compliance. He is the author of the books "Security Warrior" and "PCI
Compliance" and a contributor to "Know Your Enemy II", "Information Security Management
Handbook" and others. Anton has published dozens of papers on log management, correlation,
data analysis, PCI DSS, and security management (see the list at www.info-secure.org). His blog,
http://www.securitywarrior.org, is one of the most popular in the industry.
In addition, Anton teaches classes and presents at many security conferences across the world; he
recently addressed audiences in the United States, UK, Singapore, Spain, Russia and other countries.
He works on emerging security standards and serves on the advisory boards of several security
start-ups.

Currently, Anton is developing his security consulting practice, focusing on logging and PCI DSS
compliance for security vendors and Fortune 500 organizations. Dr. Anton Chuvakin was formerly
a Director of PCI Compliance Solutions at Qualys. Previously, Anton worked at LogLogic as
Chief Logging Evangelist, tasked with educating the world about the importance of logging for
security, compliance and operations. Before LogLogic, Anton was employed by a security vendor
in a strategic product management role. Anton earned his Ph.D. from Stony Brook
University.
