Khaled Almakadmeh and Mhammed Almakadmeh
Concordia Institute for Information Systems Engineering, Faculty of Engineering and Computer Science, Concordia University, Montreal, QC, Canada H3G 2W1
ABSTRACT
The Internet has created new forms of human interaction through services such as e-mail, Internet forums, and online banking. On the other hand, it has provided countless opportunities for crimes to be committed. Many digital techniques have been developed and used to help cybercrime investigators in the process of evidence collection. In this paper, we present an efficient digital forensics mining tool that helps cybercrime investigators collect and analyze evidence by providing various forensically important features.
Keywords: Evidence, Digital Forensics, Semantic Search, Cybercrime Investigation.
1 INTRODUCTION
The Internet has provided many services that make life easier for people around the world, including e-mail, instant messaging (IM), online banking, and many other services that most people cannot stop using. However, according to published statistics, thousands of businesses and government departments, among them Western Union, Creditcards.com, and CD Universe, have been hacked, resulting in over a billion dollars of damages per year, and these losses are climbing. This makes the job of law enforcement officers, including cybercrime investigators, more difficult and complicated because of the large amount of data that has to be collected and analyzed. Most cyber criminals use high-technology devices, which requires law enforcement agencies to have efficient tools and utilities to gather and analyze data from these devices. These reasons were the primary motivation for conducting our research in computer forensics and developing our Digital Forensic Evidence Mining Tool. It is dedicated to helping cybercrime investigators collect and analyze evidence from suspects' devices. We have provided features that are highly needed, helpful, and supportive of evidence collection.

Search engines like Google, Yahoo, and many others perform keyword search. However, what cybercrime investigators need is the ability to do a semantically oriented search. Semantic search [1] provides great flexibility during the investigation process. For example, the word "cocaine" is not going to be mentioned frequently in a drug dealer's communications; instead, when an investigator searches for a word like "cocaine", (s)he expects to get results that contain the term cocaine or any related term. Table 1 shows some examples of terms and their synonyms/hyponyms.

Table 1: Examples of terms and their synonyms/hyponyms

Term            Synonyms/Hyponyms
Cocaine         Blow, Nose Candy, Snow, Crack, Tornado
Bank            Depository, Reserve, Backlog, Stockpile, Deposit, Container, Money Resource, Money Box
Investigation   Probe, Inquiry, Enquiry, Research, Investigating
Internet        Net, Cyberspace, System, Electronic Net, Computer Network

Our tool is able to enrich the search with various semantic suggestions that the investigator can use. While developing our tool we faced many challenges: we had to take the tool's efficiency, robust functionality, and visualization into consideration during the whole development cycle. Besides these challenges, our solution had to be scalable to a large number of files and ready to accommodate new features. In addition, the tool needs to be very responsive: search results need to be displayed and ready to be processed within a few seconds.

2 RELATED WORK
In this section, we focus on previous tools and solutions that have been proposed to help cybercrime investigators. First, we discuss the standalone utilities used in this field; in subsequent sections we describe how our tool takes advantage of them by integrating them and providing more customized features that help cybercrime investigators perform their jobs.

The first utility we use is Google Desktop Search (GDS) [2], provided by Google. GDS is a desktop search engine that provides full-text search over a wide range of file types, such as emails, documents of all types, audio files, images, chat logs, and the history of web pages that the user has visited. What makes it efficient is that, after the initial setup and the first build of the index, indexing occurs only when the machine is idle, so the machine's performance is not affected. GDS also stays up to date by monitoring changes to existing files and newly added files. Last but not least, GDS helps find deleted files: it creates cached copies (snapshots) of all files. These copies are returned in the search results and can be viewed even if the original files have been deleted.

The other utility we use is WordNet [3], a large English lexical database. It provides nouns, verbs, adjectives, and adverbs that are grouped into sets of cognitive synonyms called "synsets". Synsets are interlinked by means of conceptual-semantic and lexical relations. In [4], indexing with WordNet synsets is used to improve text retrieval. We take advantage of this utility to show the investigator a broad collection of suggestions that (s)he can pass to GDS. Our solution is discussed further in subsequent sections.
In this section, we give an overview of our tool's architecture. Then, we discuss how each component contributes to the overall functionality. After that, we show the use case and activity diagrams of our tool.
4.1 System Architecture
The system architecture provides a comprehensive overview of the tool and its supporting infrastructure. Figure 1 shows the architecture of our tool.
Figure 1: Tool Architecture

3 PROBLEM STATEMENT
A good problem statement should answer the following questions:
What is the problem? The investigator needs to be able to query the criminals' devices to build knowledge about what information they contain. This knowledge can be used to provide evidence and/or to prevent future incidents.
Who has the problem? The intended clients for this solution are cybercrime investigators; they face a problem when trying to search the information on criminals' devices effectively and efficiently.
What is the solution? A full-featured desktop tool that uses GDS and WordNet to provide semantic search on a suspect's computer.
4.2
The system components are: the Graphical User Interface, the WordNet API, the Google Desktop SDK, and the Business Layer.
We describe each component from a technical perspective and explain how the components communicate with each other to handle a submitted task. Then, we present the implemented features that are of great use to cybercrime investigations.
4.3 Graphical User Interface
The Graphical User Interface (GUI) was designed to be simple, intuitive, and yet very practical. It presents all of our tool's functionalities in a clear and standard layout to minimize the user's learning curve. The GUI also provides menus that offer the same functionalities as the main window components; these menus are intended to help users who are more menu-oriented. Figure 2 shows a screenshot of our tool.
In addition, the investigator has further options, such as specifying whether (s)he wants to look for nouns, verbs, or adjectives related to the previously searched term. Below that panel there is a definition window that shows the definition of the word selected in the suggestion panel, together with an example of its use. Double-clicking a term in the suggestion panel initiates a new request to search for that term, and the results are displayed in a new tab. Because related terms are searched only on request, this approach keeps the tool working at a high performance level.
4.5 Google Desktop SDK
The Google Desktop Search SDK consists of the following:
Event Schemas: The GDS engine processes event objects sent to it by other components (the Business Layer, or even the GUI). An event object consists of the content data the investigator wants the engine to index and store, as well as additional meta-information and properties about that content or the event object. The event schemas specify the allowed event types and the relevant properties for each event type.
Developer Indexing API: The Developer Indexing API consists of interfaces used to construct event objects and send them to GDS.
Developer Search API: This is the only part of the SDK that our tool uses. It sends an HTTP request containing the investigator's search query to the Google Desktop Search engine. The HTTP response contains the desktop search results in XML format.
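As a minimal sketch of how the XML returned by the Developer Search API might be consumed, the Python snippet below turns a response body into a list of records. The sample document and the element names (results, result, category, title, url, snippet) are assumptions made for illustration; the response schema is not specified in this paper and should be checked against the GDS SDK documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body. The element and attribute names are assumed
# for illustration and are not taken from the paper.
SAMPLE_RESPONSE = """<results count="2">
  <result>
    <category>file</category>
    <title>notes.txt</title>
    <url>C:/evidence/notes.txt</url>
    <snippet>meet me at the bank</snippet>
  </result>
  <result>
    <category>email</category>
    <title>Re: delivery</title>
    <url>outlook:0001</url>
    <snippet>the snow arrives on Friday</snippet>
  </result>
</results>"""


def parse_results(xml_text):
    """Return (total_count, hits), where each hit is a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    total = int(root.get("count", "0"))
    hits = []
    for node in root.findall("result"):
        # Keep every child element so unknown fields are not silently lost.
        hits.append({child.tag: (child.text or "") for child in node})
    return total, hits


total, hits = parse_results(SAMPLE_RESPONSE)
for hit in hits:
    print(hit["category"], hit["title"], hit["url"])
```

Keeping each hit as a plain dictionary leaves the sketch independent of any particular set of fields, which matters here because the field names are only assumed.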
Figure 2: Digital Forensics Evidence Mining Tool
For the semantic search functionality, we decided not to search automatically for all synonyms of the desired term, since this approach would overload the tool and overwhelm the investigator with a large number of results. Instead, we designed our tool to search only for the desired term. Figure 3 shows the more practical, feature-rich suggestion panel. When the forensic investigator enters a term and presses Enter, the suggestion panel shows a tree view that contains synonyms, acronyms, sister terms, etc.
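The design choice described here (search only the entered term, and offer related terms as optional follow-up queries) can be sketched as follows. This is an illustration only: the SUGGESTIONS dictionary is a hand-built stand-in populated from Table 1, not the WordNet API, and the function names are hypothetical.

```python
# Stand-in for WordNet lookups, populated from Table 1 (assumption: the real
# tool queries the WordNet API instead of a fixed dictionary).
SUGGESTIONS = {
    "cocaine": ["Blow", "Nose Candy", "Snow", "Crack", "Tornado"],
    "bank": ["Depository", "Reserve", "Backlog", "Stockpile", "Deposit",
             "Container", "Money Resource", "Money Box"],
    "investigation": ["Probe", "Inquiry", "Enquiry", "Research", "Investigating"],
    "internet": ["Net", "Cyberspace", "System", "Electronic Net", "Computer Network"],
}


def suggest(term):
    """Return related terms to display in the suggestion panel."""
    return list(SUGGESTIONS.get(term.strip().lower(), []))


def queries_for(term, chosen=()):
    """Return the queries to run: the entered term first, followed only by
    the suggestions the investigator explicitly picked."""
    allowed = set(suggest(term))
    return [term] + [s for s in chosen if s in allowed]


print(suggest("Cocaine"))
print(queries_for("cocaine"))            # only the entered term is searched
print(queries_for("cocaine", ["Snow"]))  # plus one suggestion chosen by the user
```

Nothing beyond the entered term is queried unless the investigator selects it, which is what keeps the result set small.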
When the investigator submits a search query, (s)he actually generates an HTTP request that includes a &format=xml parameter. For example, to search for "Google" you would send something like:
http://127.0.0.1:4664/search&s=1ftR7c_hVZKYvuYSRWnFHk91Z0?q=Google&format=xml
To break this down:
http://127.0.0.1:4664/ is the localhost address and GDS port.
search&s=1ftR7c_hVZKYvuYSRWnFHk91Z0 is the search command and a security token.
?q=Google is the query term(s) parameter.
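The request format described in this section can be assembled programmatically. The sketch below is a hedged illustration rather than the tool's actual code: the function name build_search_url is hypothetical, the security token is assumed to be supplied by the caller, and the optional num argument corresponds to the &num= parameter discussed later in this section. Standard URL form encoding produces the + separators and %22 quotes that the query syntax expects.

```python
from urllib.parse import quote_plus

# Localhost address and GDS port, as given in the text.
GDS_BASE = "http://127.0.0.1:4664/"


def build_search_url(query, token, num=None):
    """Build a GDS search URL that asks for XML results.

    query -- raw query text; wrap a phrase in double quotes to search for it
    token -- security token of the local GDS installation (caller-supplied)
    num   -- optional maximum number of results to return
    """
    url = f"{GDS_BASE}search&s={token}?q={quote_plus(query)}&format=xml"
    if num is not None:
        url += f"&num={int(num)}"
    return url


TOKEN = "1ftR7c_hVZKYvuYSRWnFHk91Z0"  # example token from the text
print(build_search_url("Google", TOKEN))
print(build_search_url("Google GDS", TOKEN))
print(build_search_url('"Google Desktop Search"', TOKEN, num=50))
```

Sending the request (for example with urllib.request.urlopen) is left out because it requires a running GDS instance.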
Figure 3: Panel shows suggestions for "Cocaine"
If the investigator wants to search for more than one term, the terms are separated with +s. For example, to search for both "Google" and "GDS", use: ?q=Google+GDS. If the investigator wants to search for a specific phrase, the terms are separated with +s and the phrase is surrounded with %22s. For example, to search for the phrase "Google Desktop Search", use: ?q=%22Google+Desktop+Search%22. To search for the two phrases "Google Desktop Search" and "Copyright 2005", use: ?q=%22Google+Desktop+Search%22+%22Copyright+2005%22. The &format=xml parameter specifies that the HTTP response returns the search results in XML format. By default, an HTTP search response returns only the first ten results. The developer can request a different number by appending the &num= parameter, followed by the maximum number of results to be returned for the query. There is no problem if this maximum is greater than the total number of search results; only the available results are returned, with no null "results".
4.6 Business Layer
The activity diagram [5] shows the flow of the program when a search task is submitted to the tool. As shown in the diagram, the user can specify advanced search options before executing the search and can also choose a keyword from WordNet to run the search again. After the results are shown, the user can generate a report and save it for later use when presenting the evidence in a court of law.
This component is at the core of our tool. It receives the search terms from the GUI, interacts with the WordNet component when the investigator wants to search for a keyword from the suggestion panel, and sends the search term together with the search preferences to the GDS engine. The business layer then processes the results and sends them back to the GUI to be shown to the investigator. This layer acts as the brain of our tool, where all the processing complexity is hidden and kept separate from the GUI. It is composed of classes and functions that communicate with the rest of the components.
4.7 Use Case Diagram
Figure 4: Activity Diagram
4.9 Applicability
Our tool runs on Windows XP, Vista, and Windows 7. By using the Google Desktop Search engine, our tool can access all file types that GDS indexes, including MS Office files, Outlook files, archive files (such as .zip and .rar), email, and web history files.
The use case diagram [5] gives an abstract view of the functionalities the investigator can use when working with our tool.
Our tool provides a feature-rich environment for the investigator, with many features that help in evidence analysis and report generation. Below is a description of the functionalities our tool provides:
5.1 Result Display: By default, search results are displayed in groups of twenty per page; the Previous and Next buttons allow the investigator to navigate between result pages. The total number of results found is shown at the top of the results page.
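The paging behaviour of the result display can be illustrated with a short sketch. The function names and the 1-based page numbering are assumptions; only the default of twenty results per page comes from the description above.

```python
def page_count(total_results, per_page=20):
    """Number of pages needed to show total_results items."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return (total_results + per_page - 1) // per_page  # ceiling division


def get_page(results, page, per_page=20):
    """Return the slice of results shown on the given 1-based page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    start = (page - 1) * per_page
    return results[start:start + per_page]


results = [f"hit-{i}" for i in range(45)]  # placeholder search results
print(page_count(len(results)))            # pages available to the Next button
print(get_page(results, 3))                # the last, partially filled page
```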
Figure 3: Use case Diagram
5.2 Access All File Types: Using the Google Desktop Search engine, our tool can access all file types that GDS indexes, including MS Office files, Outlook files, archive files (such as .zip and .rar), and web history files.
5.3 Semantic Search: A feature-rich panel suggests many variations of the keyword and includes a small panel that shows the meaning of each word and a sample sentence illustrating its use.
5.5 Multiple Tabs: For each keyword searched, a new tab opens, allowing the investigator to conduct further searches and close any unneeded tab.
5.6 Advanced Search: Provides more options that allow the tool to filter the results.
A. Choose among various file types for a more refined search, including the most common file types such as text, images, audio/video, archive (zip), and HTML files.
B. Choose a specific file category, such as email or web, to search only that type of file.
C. Choose the number of results per page.
D. Sort the results by relevance: when checked, related files (within the same directory) are displayed sequentially, one after another.
5.7 Display File Snippet: Allows the investigator to see the searched term in context within the file where it was found.
5.8 Display Detailed File Information: Shows the creation date, last access date, last write date, file attributes, and MD5 checksum value.
5.9 Open the File in the Appropriate Application: Triggered when the file name is double-clicked in the graphical user interface.
5.10 Comprehensive Menu: Provides the same functionalities to the user if (s)he is more accustomed to using menus.
5.11 Report Generation: Allows multiple files to be selected from multiple search result tabs and added to a report generated in HTML format. For each file, the report shows the file title, path, MD5 checksum value, and file size.
5.12 Set the Search Path: Restricts the search to a specific directory.
5.13 Calculate and Display the Hash (MD5) of the File: Used to prove the integrity of the seized evidence.
5.14 Help Menu: Provides the user with a manual describing how to use the tool's functionalities.
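As a minimal sketch of the file details and MD5 checksum mentioned in features 5.8, 5.11, and 5.13, the Python snippet below hashes a file in chunks and collects a few metadata fields. The function and key names are hypothetical, and the paper does not describe how the tool computes these values. The creation date is omitted because its availability through os.stat differs between operating systems.

```python
import hashlib
import os
from datetime import datetime, timezone


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 checksum of a file without loading it fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_details(path):
    """Collect the details a report entry might list for one file."""
    st = os.stat(path)  # read timestamps before hashing touches the file

    def as_iso(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "title": os.path.basename(path),
        "path": os.path.abspath(path),
        "size_bytes": st.st_size,
        "last_write": as_iso(st.st_mtime),
        "last_access": as_iso(st.st_atime),
        "md5": md5_of_file(path),
    }
```

Recomputing the checksum later and comparing it with the value stored in the report is the basic way such a hash supports an integrity claim.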
In this paper, we developed a Digital Forensics Evidence Mining Tool dedicated to helping cybercrime investigators collect and analyze data from a suspect's computer. The solution provides features that are highly needed, helpful, and supportive of evidence collection. We took advantage of existing APIs, namely the Google Desktop Search API and the WordNet API, to enrich our application. Because requirements in this fast-moving field keep changing, our solution is designed to be scalable and can be adjusted to accommodate future requirements and features, providing a unique and essential tool for cybercrime investigators.
[1] R. Guha, Rob McCool, Eric Miller, Semantic search, International World Wide Web Conference, Proceedings of the 12th international conference on World Wide Web.
[2] Benjamin Turnbull, Barry Blundell, Jill Slay, Google Desktop as a Source of Digital Evidence.
[3] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller, Introduction to WordNet: An On-line Lexical Database.
[4] Julio Gonzalo, Felisa Verdejo, Irina Chugur, Juan Cigarrain, Indexing with WordNet synsets can improve text retrieval, UNED, Ciudad Universitaria.
[5] G. Booch, J. Rumbaugh, I. Jacobson, Unified Modeling Language User Guide.