
1 Bibliography

1.1 Introduction

Client-side vulnerabilities, drive-by attacks, malicious servers: today we are literally overwhelmed, and the web becomes less secure every day. Many tools have been developed to counter this, and the one we deal with in this paper is obviously one of the most important, the client honeypot.

A honeypot is a security device designed to lure malicious activity to itself. Capturing such malicious activity allows studying it to understand the operations and motivations of attackers, and subsequently helps to better secure computers and networks. A honeypot does not have any production value; it is a security resource whose value lies in being probed, attacked, or compromised [6]. Since it has no production value, any new activity or network traffic coming from the honeypot indicates that it has been successfully compromised. Theoretically speaking, false positives, as commonly found on traditional intrusion detection systems, do not exist on honeypots, even though there are particular cases which will be discussed later. Honeypots can be split into two categories: server honeypots and client honeypots. The purpose of this bibliography is to clarify and understand the ins and outs of the latter, which appeared recently with projects like HoneyC or Capture-HPC.

Basically, client honeypots crawl the web to find and identify web servers that exploit client-side vulnerabilities. Instead of passively waiting to be attacked, client honeypots actively crawl the web to search for servers that exploit the client as part of the server response. They belong either to the low-interaction or to the high-interaction class of client honeypots. The idea of client honeypots was first articulated by Lance Spitzner in June 2004. A few client honeypots have been created since then, namely HoneyMonkey, HoneyClient, HoneyC and Capture-HPC. Some of these implementations crawl the web with a real browser and detect exploits based on the state of the OS (such as monitoring changes to the file system, configuration and process list); since they make use of real systems, they are classified as high-interaction client honeypots. On the other hand, low-interaction honeypots make use of emulated clients and of an analysis engine that may rely on an algorithm other than OS state inspection, such as signature matching.

1.2 Low interaction honeypots

Low-interaction client honeypots have many advantages. They can be contained in a stand-alone application, so installation is highly simplified. They are also faster than high-interaction client honeypots because they are based on emulated services. We will evaluate this approach through three different tools, focusing mainly on Monkey-Spider but also looking at HoneyC [3] and SpyBye [2].

1.2.1 Monkey-Spider, a good example

Monkey-Spider is a recent project led by Ali Ikinci [1]. As we will see, it appears to be the most successfully completed of the three, so we will take this project as an example to explain how the different blocks work together, and we will flesh it out with HoneyC and SpyBye. The application can be divided into three parts.

The seeder. The seeder block generates the starting URLs for the crawler, that is, what we call the queue. There are three different ways to seed. First, you can use search engines (Yahoo, Google or MSN): you simply specify a keyword like "warez" or "pirate" and configure how many URLs you want back; the communication is performed via web services. Second, you can use the URLs found in spam mails. If you set up a spamtrap, an email account established specifically to collect spam, you can extract the URLs contained in the messages and use them as seeds for your crawl. However, Monkey-Spider does not examine attached files. Third, you can re-queue the URLs you have already processed thanks to the monitorDB seeder. This feature is used to constantly reseed previously found malicious content over time from the malware database. Some other low-interaction honeyclients, like HoneyC, also offer the possibility to enter URLs statically.

The crawler. Monkey-Spider picked Heritrix as its crawler. Heritrix is the Internet Archive's open-source web crawler project. It is multi-threaded, very extensible and easily customizable. Heritrix can optionally crawl content through an interconnected web proxy, which can be used to increase performance and avoid duplicate crawling. Heritrix queues the URLs generated by the seeder and stores the crawled contents on the file server while generating detailed log files. SpyBye, by contrast, does not offer any crawler: it acts while you browse the web, before the website you are trying to access is displayed.

The analysis engine. The analysis process can be executed on a different computer. Content is extracted from the file server and analysed with different anti-virus solutions and malware analysis tools. Monkey-Spider can be configured to use Avast, ClamAV or F-Prot. Identified malware and malicious websites are then saved in a special directory, and the related malicious information is stored in a MySQL database. Finally, binary and JavaScript files are copied into an additional archive directory for further research purposes using CWSandbox. HoneyC's analysis engine is based on Snort signature matching, and SpyBye allows a webmaster to determine whether a web site is malicious through a set of heuristics and by scanning content against the ClamAV engine.
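To make the analysis step more concrete, here is a minimal sketch, not Monkey-Spider's actual code, of how crawled content could be scanned with ClamAV's clamscan command-line tool. It assumes clamscan is installed; the directory paths are hypothetical, and the MySQL recording described above is replaced by simply copying flagged samples aside.

```python
import subprocess
from pathlib import Path

# Hypothetical locations; Monkey-Spider's real layout may differ.
CRAWL_DIR = Path("/data/crawl")        # files dumped by the crawler
MALWARE_DIR = Path("/data/malware")    # where detected samples are kept

def scan_crawled_files(crawl_dir: Path) -> list[tuple[Path, str]]:
    """Run clamscan over the crawl directory and return (file, signature) hits."""
    result = subprocess.run(
        ["clamscan", "--no-summary", "-r", str(crawl_dir)],
        capture_output=True, text=True,
    )
    hits = []
    for line in result.stdout.splitlines():
        # clamscan reports infected files as "<path>: <signature> FOUND"
        if line.endswith("FOUND"):
            path, rest = line.split(": ", 1)
            hits.append((Path(path), rest.removesuffix(" FOUND")))
    return hits

if __name__ == "__main__":
    MALWARE_DIR.mkdir(parents=True, exist_ok=True)
    for path, signature in scan_crawled_files(CRAWL_DIR):
        # Copy the flagged sample aside for later analysis.
        (MALWARE_DIR / path.name).write_bytes(path.read_bytes())
        print(f"{path} flagged as {signature}")
```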

1.2.2 Limitation and further work

A part of the web is completely hidden, what is commonly named the Deep Web. According to Bergman (reference to be added), the public information on the hidden web was 400 to 550 times larger than the publicly indexable web in 2001. Those pages do not have static URLs: they are composed of forms and authentication mechanisms, and honeyclients do not process them. We can also mention obfuscated code that is generated dynamically: because a low-interaction honeyclient does not execute scripts, it cannot catch dynamically generated links and content. Finally, we have to mention that top sites are crawled more often than less-known ones, which could make a difference in the results. Further work would be to improve the crawler to avoid always browsing the same websites (Amazon, YouTube, ...).
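The dynamic-content limitation is easy to illustrate. In the sketch below, run on a made-up page, a static link extractor based on regular expressions finds nothing, because the link only exists once the script has executed, which is exactly the situation of a low-interaction honeyclient that does not run JavaScript.

```python
import re

# A made-up page: the malicious link is only assembled when the script runs.
page = """
<html><body>
<script>
  var p = ["http://", "evil.example", "/exploit.html"];
  document.write('<a href="' + p.join('') + '">free stuff</a>');
</script>
</body></html>
"""

# Static extraction, as a low-interaction honeyclient without a JS engine would do.
static_links = re.findall(r'href="(http[^"]+)"', page)
print(static_links)  # [] -- the dynamically generated link is invisible
```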

1.3 High interaction honeypots

Contrary to low-interaction honeypots, high-interaction honeypots aim to recreate a real environment: a real system and real applications browse the web like a normal user would. Changes of the system state are logged and analysed. These solutions rely on virtualization to easily revert the system to a clean state in case of corruption. The processing time of high-interaction honeypots is a matter of concern because of the delays implied by using real systems [5]. High-interaction client honeypots crawl the web in order to collect data that generates variations of the host state. These variations can be file creations, registry modifications, or changes in the process list of the system. The data collected from these systems is more accurate than with low-interaction honeypots: these systems detect zero-day exploits [4], but at the same time they are more complex and require a lot of administration. The three-component architecture previously introduced for low-interaction honeyclients can also be applied to high-interaction honeyclients: they use a queuer, a visitor and an analysis engine to perform their task. The queuer selects the URLs to browse and gives them to the visitor, which in our case is mainly a web browser. The last component monitors state changes and decides whether the attack has been successful or not.
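As a rough illustration of the analysis engine's job, the sketch below (a simplification, not the code of any of the cited tools) snapshots a directory tree before and after a URL is visited and reports created or modified files; real honeyclients apply the same kind of diff to the registry and the process list as well. The monitored path and the visit() call are placeholders.

```python
import hashlib
import os

def snapshot(root: str) -> dict[str, str]:
    """Map every file under root to a hash of its contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    state[path] = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                continue  # file vanished or is unreadable
    return state

def diff(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return files that were created or modified between the two snapshots."""
    return [p for p, h in after.items() if before.get(p) != h]

# Usage sketch: snapshot, let the visitor (a real browser) open the URL,
# snapshot again, then let the analysis engine decide.
# before = snapshot("C:/Windows")      # hypothetical monitored tree
# visit("http://suspicious.example")   # hypothetical visitor component
# changes = diff(before, snapshot("C:/Windows"))
# if changes: print("state changed -> URL classified as malicious", changes)
```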

1.3.1 Web honeypot

A honeypot like Capture-HPC is a client-server system that permits global administration of a set of client honeypots. The client implements the browser and the analysis engine; it uses the VMware Server solution to revert the virtual system in case of corruption. The server implements the queuer; it also receives results from the honeyclient systems and logs them. Results are sent to the Capture-HPC server in real time. Capture-HPC is able to monitor multiple instances of the Internet Explorer browser and to detect which instances are compromising the system; this process is called the bulk navigation algorithm [4]. However, this system does not work with Firefox. Capture-HPC also permits interaction with software other than web browsers, like instant messaging clients, PDF readers and office suites such as Microsoft Office. Capture-HPC checks for changes in system files, the registry and processes.

The MITRE HoneyClient checks for changes in system and application files and in the registry. It is one of the first high-interaction honeyclients developed. The Spycrawler honeypot from the University of Washington is a proprietary crawler developed in 2005. It focuses on spyware and uses Google search results as an entry point. The major drawback of this solution is that it relies on the Ad-Aware spyware database to evaluate a web site.

Zero-day exploit detection is also a purpose of high-interaction client honeypots. HoneyMonkey [7] uses several honeypot clients at different patch levels to detect this kind of attack. It implements a bulk and sequential algorithm to perform its analysis: a group of URLs is browsed in parallel, and if the system is compromised, each URL is then checked individually by systems with different security patch levels, which permits detecting zero-day exploits. The high-interaction honeyclient Pezzonavante also uses this scheme of operating systems at different patch levels. It additionally relies on integrity checks, security tool scans (anti-virus and anti-spyware), IDS, traffic analysis and snapshot comparisons. It is a fast honeyclient because it only performs monthly integrity checks, but it is known to miss some alerts. There are also other solutions, like SiteAdvisor from McAfee, that use high-interaction honeyclients to obtain their results.
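A minimal sketch of the bulk-then-sequential idea used by HoneyMonkey (and echoed by Capture-HPC's bulk navigation) could look as follows; visit(), is_compromised() and revert() stand in for the state-monitoring and VM-revert machinery and are assumptions, not real APIs of those tools.

```python
def check_bulk_then_sequential(urls, visit, is_compromised, revert):
    """Visit a batch of URLs at once; if the system ends up compromised,
    revert the VM and re-visit each URL alone to identify the culprit(s)."""
    for url in urls:
        visit(url)               # in practice the batch is opened in parallel browser instances
    if not is_compromised():
        return []                # whole batch is clean: one check was enough
    revert()                     # restore the virtual machine to a clean state
    malicious = []
    for url in urls:             # sequential pass, one URL per clean system
        visit(url)
        if is_compromised():
            malicious.append(url)
        revert()
    return malicious
```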

1.3.2 Mail honeypot

SHELIA aims to provide a mail honeypot. It opens mails via an MUA like Outlook Express and typically goes through the spam folder searching for malicious content and for URLs, which are then opened in web browsers.
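The URL-extraction step performed on the spam folder could be sketched as below, assuming the folder is available as a local mbox file; this is a simplification, since SHELIA actually drives a real mail client, and the path is hypothetical.

```python
import mailbox
import re

URL_RE = re.compile(r'https?://[^\s"<>]+')

def urls_from_spam(mbox_path: str) -> set[str]:
    """Collect every URL found in the plain-text parts of a spam mailbox."""
    urls = set()
    for message in mailbox.mbox(mbox_path):
        for part in message.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                urls.update(URL_RE.findall(payload.decode("utf-8", "replace")))
    return urls

# Each collected URL would then be handed to the visitor (a real browser
# inside the honeypot VM) for state-change analysis.
# print(urls_from_spam("/var/mail/spamtrap"))  # hypothetical path
```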

1.3.3 Performance issue

The main issue with high-interaction honeypots is speed. Usually, the scans performed aim to determine whether a web server is malicious. Running multiple scans on different servers in parallel improves speed but does not permit identifying the malicious servers precisely. Algorithms like divide-and-conquer for high-interaction honeypots can improve speed by up to 72% [5].
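A hedged sketch of the divide-and-conquer idea of [5]: instead of re-visiting every URL of a compromised batch one by one, the batch is split recursively, so a single malicious URL in a large batch is found in a logarithmic number of visits. As before, visit_batch(), is_compromised() and revert() are placeholders for the real machinery.

```python
def find_malicious(urls, visit_batch, is_compromised, revert):
    """Recursively bisect a batch of URLs to locate the malicious ones."""
    visit_batch(urls)
    compromised = is_compromised()
    revert()                              # always return to a clean state
    if not compromised:
        return []                         # the whole batch is benign
    if len(urls) == 1:
        return list(urls)                 # a single URL caused the compromise
    mid = len(urls) // 2
    return (find_malicious(urls[:mid], visit_batch, is_compromised, revert) +
            find_malicious(urls[mid:], visit_batch, is_compromised, revert))
```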

1.4 Honeyweb

In this part, we introduce the Honeyweb project and see how it could be useful for the honeyclient user community. Honeyweb is a web platform based on Java technologies that allows honeyclient users to manage and run honeypots remotely through the web. It is also useful to centralize and aggregate log files from heterogeneous honeyclients. Honeyweb is still in development and not all functionalities are implemented yet, but eventually, thanks to web services, Honeyweb will be able to communicate with different honeyclients on different systems, using parallel threads. If we succeed in gathering a large number of honeyclient users around Honeyweb, we could get better results by crossing their results.

Name           Creation date  Monitored entities          Specificity
Capture-HPC    unknown        files, processes, registry  Client/server architecture that allows parallel browsing; interaction with multiple browsers (IE, Opera, Firefox) and other programs (Adobe Reader, OpenOffice).
HoneyMonkey    unknown        files, processes, registry  Forwards a malicious URL from an unpatched honeyclient to a fully patched one, which permits detecting zero-day exploits.
HoneyClient    2004           files, registry             Includes a scoring system that establishes a hierarchy between crawled URLs.
UW Spycrawler  2005           spyware infections          Only uses the Ad-Aware spyware database to detect spyware.

Table 1: Specificities of high-interaction honeyclients

Here are some milestones that Honeyweb should reach. First, the number of registered honeyclients is very important: the more we have, the larger the amount of URLs we can analyse. As we saw in the previous part, one of the limitations is that the web keeps changing all the time, so from a wide crawling of the web we could aggregate the logs to determine the history of websites, for example to figure out when they became malicious. Second, the number of different honeyclients matters. Getting many different types, high-interaction and low-interaction but also honeyclients from different continents and countries, would be very useful. Indeed, this heterogeneity allows us to examine each URL more deeply: for example, we can first browse a large range of IP addresses with low-interaction honeyclients and then redirect the malicious ones to high-interaction honeyclients for a deeper analysis.
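As a purely illustrative sketch (Honeyweb's real interfaces are still under development and are not specified here), aggregating heterogeneous honeyclient reports could amount to normalizing them into a common record before crossing the results, as in the hypothetical structure below.

```python
from dataclasses import dataclass

@dataclass
class Report:
    """A normalized record, independent of the honeyclient that produced it."""
    url: str
    malicious: bool
    honeyclient: str      # e.g. "Capture-HPC", "Monkey-Spider"
    interaction: str      # "high" or "low"
    timestamp: str        # when the URL was visited

def cross_results(reports: list[Report]) -> dict[str, list[Report]]:
    """Group reports by URL so the history of a site can be reconstructed,
    e.g. to figure out when a given website became malicious."""
    history: dict[str, list[Report]] = {}
    for report in reports:
        history.setdefault(report.url, []).append(report)
    for timeline in history.values():
        timeline.sort(key=lambda r: r.timestamp)
    return history

# Low-interaction clients could feed candidate URLs first; those flagged
# malicious would then be re-queued for high-interaction analysis.
```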

1.5 Conclusion

The World Wide Web is a system of interlinked hypertext documents that runs over the Internet. We consider the whole web as the set of content available on computer systems with a unique IP address at a fixed point in time, and we call this set the web. The first drawback of every piece of research on this set is that it changes significantly over time and cannot be predicted from any database or similar source. Therefore we cannot dump the web at a particular point in time; instead we crawl content, extract as many links as possible from it and try to follow them as far as useful. Finally, we should mention that malicious servers can hide themselves from the honeyclients' IP range using blacklists. That is why it is very important not to reveal IP addresses, locations or anything related to honeyclients that could compromise their efficiency. Additionally, future malware could use vulnerabilities in the malware analysis tools employed to report back the honeyclients' IP ranges, or even to detect the type of honeyclient. Just as the Internet community blacklists dangerous web sites, such blacklisting of dangerous web clients could become common among malicious web site operators.

References
[1] A. Ikinci, T. Holz, and F. Freiling. Monkey-Spider: Detecting Malicious Websites with Low-Interaction Honeyclients. In Sicherheit 2008, Saarbruecken, 2008.

[2] N. Provos. SpyBye: Finding malware.

[3] C. Seifert, I. Welch, and P. Komisarczuk. HoneyC - the low-interaction client honeypot. In Proceedings of the 2007 NZCSRCS, Waikato University, Hamilton, New Zealand, 2007.

[4] C. Seifert. Cost-effective Detection of Drive-by-Download Attacks with Hybrid Client Honeypots. PhD thesis, Victoria University of Wellington, 2009.

[5] C. Seifert, I. Welch, and P. Komisarczuk. Application of divide-and-conquer algorithm paradigm to improve the detection speed of high interaction client honeypots. In SAC '08: Proceedings of the 2008 ACM Symposium on Applied Computing, pages 1426-1432, New York, NY, USA, 2008. ACM.

[6] L. Spitzner. Honeypots: Tracking Hackers. Addison-Wesley Professional, 2003.

[7] Y.M. Wang, D. Beck, X. Jiang, R. Roussev, C. Verbowski, S. Chen, and S. King. Automated web patrol with Strider HoneyMonkeys. In Proceedings of the 2006 Network and Distributed System Security Symposium, pages 35-49. Citeseer, 2006.
