Cyber Forensic Analysis – A Machine Learning Prediction

Group Members:

Mridula C M

Jimna M

Parvathy Chandran T

Rinsha N

ABSTRACT:

Cyber forensic specialists use computing facilities and analytics capabilities to gain insights and reach conclusions faster. Data analytics is transforming many fields and offers a faster path to results in cyber forensic investigations. In this study, we examine the importance of machine learning techniques for cyber forensics. The objective is to understand the different applications of machine learning to forensic analysis.

INTRODUCTION:

Earlier, cyber forensics concentrated mainly on workstations and the networks connecting them. Now almost anything can be connected to the internet, and ubiquitous connectivity has extended forensics to any device that can communicate. Data science and analytics have drawn attention across business, industry, academia, research, health, transportation and more. Analytics has proven effective at solving computing research problems and has started to inspire other areas as well. The many devices and applications in use produce vast amounts of data during operation and continue to do so. Analysing this data during an investigation plays a major role in gathering evidence in cyber forensics.

Technology directly influences our daily lives, so extracting and analysing the data held by devices is a serious concern in cyber forensics. With the growing use of technology and IoT devices, the number of cybercrimes has increased. Cyber forensics deals with tackling cybercrimes and has two models: the McKemmish model and the Kent/NIST model. It is the process of identifying, extracting, examining and analysing data while maintaining its integrity, so that it is admissible in a court of law. The goal of digital forensics is to conduct an investigation while maintaining a documented chain of evidence, in order to find out exactly what happened on a computing device, when it happened and who was responsible for it. Forensic investigators follow a set of procedures and require specialised expertise and tools for collecting and storing data available to end users. Computer forensics covers electronic evidence discovery, mobile forensics, cell site analysis, cloud forensics, drone forensics, Windows forensics, Mac forensics, network forensics, cybercrime investigation and so on.

FORENSIC PROCESS

The digital forensic process is a recognized scientific process used in digital forensic investigations. It is mostly used in computer and mobile forensic investigations and consists of three steps: acquisition, analysis and reporting.

DATA ACQUISITION -> ANALYSIS -> REPORTING

Diagram 1

Four Phases of Digital Forensics:

Data Acquisition: Data acquisition begins with data seizure: collecting digital evidence and identifying the suspected media using procedures that preserve the integrity of the data. To avoid losing volatile data, such as the list of current network connections or the contents of battery-powered devices, the data should be collected in a timely manner. After acquiring the digital data, create an exact duplicate image of the original and validate it with hash values. A hash function such as MD5, SHA-1 or SHA-256 applies a mathematical algorithm to the digital data and returns a fixed-length bit string, the hash value. Two pieces of data that produce the same hash value are, for practical purposes, identical. If the hash values of the original and the image match, we can show that the evidence is still in its original state.
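
The validation of a duplicate image against the original can be sketched in Python as follows. This is a minimal illustration, assuming the original and the image are ordinary files on disk; the function names file_digest and images_match are hypothetical, and only the standard hashlib module is used.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def images_match(original_path, duplicate_path, algorithm="sha256"):
    """Hypothetical helper: True when both files produce the same digest."""
    return file_digest(original_path, algorithm) == file_digest(duplicate_path, algorithm)
```

Recording the digest at acquisition time and recomputing it before analysis gives a simple, repeatable integrity check. SHA-256 is used as the default here because MD5 and SHA-1 are no longer considered collision resistant.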

Data Examination and Analysis: After the duplicate image is created, examination and analysis are carried out on the duplicate so that the integrity of the original is preserved. Depending on the forensic request, an analyst reports findings about different types of information such as email, log files, documents and images. The results of the examination phase should be analysed using well-documented methods and techniques.

Reporting: The results of the analysis should be reported. The report includes a description of the actions taken, an explanation of how tools and procedures were used, any other actions that were performed, recommendations for improving the existing system and so on.

The rapid advancement of technology, and the resulting increase in data generation, has created problems for cybercrime investigations. New devices, technologies and protocols make crime analysis harder. The speed at which various devices generate data, and the heterogeneity of that data, demand new methodologies and tools for data analysis. Technology is now the greatest source of big data, and analytics needs to work effectively to turn that data into intelligence. Even once the data for analysis has been identified, investigators face problems caused by its real-time nature and volume. It is not easy for analysts to examine the collected data within the time limit in which it would be of use.

DIGITAL FORENSIC CHALLENGES

A variety of tools exist for finding evidence, but the majority of them fail to solve correlation problems and maintain consistency, which is very important for the report to be accepted by the court. The lack of standard techniques for examining and analysing large volumes of data from multiple heterogeneous sources creates diversity problems in analysis. The number of devices seized from crime scenes has increased, and many of them are potentially rich in evidence with plenty of associations, which seriously affects the timeliness of investigations and causes delays in prosecutions. An analyst also needs to answer the following questions in the preliminary report [b] to be filed.

1. Who or what application generated the data under analysis? (This is the identity challenge.)

2. Where and when was the data found?

3. What are its connections and associated information?

4. Is it related to any other offences, and what did the user do with the data?

5. Is there any other relevant information?

Investigations will benefit only if we use intelligent, automated approaches to evidence collection and processing to examine the data. Data mining techniques such as pre-processing and dimensionality reduction are useful for making data suitable for investigation. Machine learning algorithms have great potential for biometric estimation, location tracking, and detecting anomalies and deviations in digital data, including audio, video and text files.

Analysis can be viewed as a procedure in which we first gather the required data, then analyse it using tools and statistical models, and finally interpret the results. This is also known as knowledge discovery from a data set or from records. Data mining helps discover knowledge from any data set, such as establishing the identity behind a particular piece of information, and can be used successfully during crime investigations. We can classify the digital information collected to reduce the effort required of investigators and to obtain the necessary information within the available time.

LITERATURE REVIEW

To handle large volumes of real-time data, we need to combine different types of forensic technologies, such as network forensics, computer (device) forensics and cloud forensics. Text summarization, a technique for shortening long content, is one way of handling large amounts of information and can relieve information overload. With support from automatic summarization technology, an analyst can analyse IoT data within a limited time span.
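
To illustrate what automatic summarization involves, the sketch below implements a simple frequency-based extractive summarizer. It is not the method of any cited work: it assumes plain English text, scores each sentence by how often its words occur in the whole document, and keeps the top-scoring sentences. The function name summarize and the small stop-word list are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is",
              "was", "on", "at", "for", "with", "it", "by"}


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]
    frequency = Counter(words)

    def score(sentence):
        # A sentence scores the summed document frequency of its content words.
        tokens = re.findall(r"[a-z0-9']+", sentence.lower())
        return sum(frequency[t] for t in tokens if t not in STOP_WORDS)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Because the summary is built only from sentences that appear in the source, every line of it can be traced back to the original record, which matters when the output supports an investigation.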

Computing devices that may have been used for criminal purposes can provide forensic evidence. The data from these devices can be used to establish, or rule out, a motive for a crime. One example is the CCTV footage that investigating officers use to prove a crime. Analysis is challenging when data is collected from a variety of devices. As more and more devices are connected, identifying the potentially relevant data becomes critical. With this situation in mind, Darren Quick suggested data reduction and semi-automatic subset analysis, which helped in the timely analysis of large volumes of data.

P. Rizwan et al. used a machine learning algorithm to predict traffic density from data collected by CCTV cameras. The system can be used as a low-cost alternative for traffic congestion control.

Named entity extraction is very useful in cyber forensic analysis because it recovers meaningful entities such as names, addresses, details of narcotics dealing and vehicle specifications. Commonly used entity extraction techniques, such as lexical lookup, rule-based extractors, statistical techniques and machine learning based models, are relevant to IoT data because they improve the speed of analysis.
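
The two simplest techniques named above, lexical lookup and rule-based extraction, can be sketched with Python's standard re module. Everything in this example is an illustrative assumption: the lookup list KNOWN_NAMES, the regular expressions (the vehicle pattern loosely follows one registration-plate layout) and the function name extract_entities are hypothetical and would need adapting to real case data.

```python
import re

# Hypothetical lookup list; in practice this would come from case material.
KNOWN_NAMES = {"john doe", "acme traders"}

# Simple illustrative patterns; real data needs locale-specific rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "vehicle": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?[A-Z]{1,2}\s?\d{4}\b"),
}


def extract_entities(text):
    """Return a dict mapping entity type to the list of matches found."""
    # Rule-based extraction: one regular expression per entity type.
    found = {label: pattern.findall(text) for label, pattern in PATTERNS.items()}
    # Lexical lookup: case-insensitive search for each known name.
    lowered = text.lower()
    found["name"] = sorted(name for name in KNOWN_NAMES if name in lowered)
    return found
```

Statistical and machine learning based extractors generalise beyond fixed lists and patterns, but a lookup-and-rules pass like this one is fast and easy to explain.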

Outlier analysis in data mining supports the identification of differences between files located in the same directory of a system and helps reveal any deviation of a particular item from the others. This also helps the investigator recognise potential intrusions into the system. Discriminant analysis is a technique used to assign an incident to a matching known incident, providing a mechanism for event reconstruction to prove a case. Many visualisation techniques are available which, together with data mining approaches, help investigators easily spot outliers and deviations. Such visualisation is a useful starting point for further analysis.
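
As a rough illustration of outlier analysis on files in one directory, the sketch below flags files whose size deviates strongly from the directory median, using a robust score based on the median absolute deviation (MAD). File size is only one possible feature, chosen here as an assumption for simplicity; the function name size_outliers and the 3.5 cut-off (a commonly used value for this score) are likewise illustrative.

```python
import os
import statistics


def size_outliers(directory, threshold=3.5):
    """Return names of files whose size deviates strongly from the directory median."""
    sizes = {
        entry.name: entry.stat().st_size
        for entry in os.scandir(directory)
        if entry.is_file()
    }
    if len(sizes) < 3:
        return []  # too few files to say what is typical
    median = statistics.median(sizes.values())
    mad = statistics.median(abs(size - median) for size in sizes.values())
    if mad == 0:
        # Most files share one size; anything different stands out.
        return sorted(name for name, size in sizes.items() if size != median)
    # 0.6745 scales the MAD so the score is comparable to a z-score.
    return sorted(
        name for name, size in sizes.items()
        if 0.6745 * abs(size - median) / mad > threshold
    )
```

The median and MAD are used instead of the mean and standard deviation because a single very large file would otherwise distort the baseline it is being compared against.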

MOTIVATION OF STUDY

The internet is an essential component of life today, connecting anyone without restriction. The number of internet users is growing at a rate of one billion every year. Business, banking, industry, academia, corporate organisations, health care and others have moved their activities online. Some organisations allow employees to bring their own devices and connect them for communication. The internet is certainly an opportunity for society, but it is not safe from cyber terrorists and attacks.

Cyber forensics is a field of computer science that systematically analyses computer-related information when criminal activity is reported. With the advancement of technology, the number of cases filed has crossed into the millions, and a lack of resources and skilled experts delays sentencing. Because the physical world is now connected and cyber-physical systems are widespread, the volume of data collected for analysis is inevitably huge. The investigation procedure relies entirely on the experience of investigators and on expensive tools. Sometimes the reliability of a tool is questioned in court, leading to dismissal of the case. It is very difficult for an analyst to conclude a report if two different tools give different results. The cost of investigation also becomes too high when it depends on commercial software for analysis. In this situation, an analyst can turn to machine learning and artificial intelligence to gain knowledge that supports the case.

The increasing use of technology such as navigation systems, autonomous vehicles, cell phones, home automation systems and smart surveillance systems has increased the types of evidence available. In this technology-driven world, conventional crimes also need to be tied to cyber forensics.
MACHINE LEARNING FOR CYBER FORENSICS – USE CASE

1. Text Forensics

Text documents and email messages are a primary source of information and may be the main source of digital evidence. Manual analysis of a huge amount of data is impractical, so our first task is to use machine learning to extract information from email using Python. The main task is to find the owner and attributes of a text document or email; we use named entity extraction to determine the authorship of the text document or email message. Entity extraction helps establish whether a suspected person is responsible for the crime-related information.

Our project applies data mining techniques to extract information from damaged media using the Python language. In the analysis phase we perform entity extraction, identify correlations, sort forensic data into groups and collect keywords through interviews. Finding keywords lets us pursue the case more effectively. A digital forensic investigator will be interested in gathering information and conducting interviews regarding computer crime, child pornography, fraud, hacking and other digital crimes.

For the study, a text document such as a PDF (Portable Document Format) file was downloaded via Google. To retrieve the authorship information, an entity extraction approach was applied to the text document. To gain information about a particular keyword, we used a program written in Python that retrieves all the matching words. The program is useful for finding particular words that are relevant to the investigation in a large file. Extracting text from PDFs lets us parse hundreds of PDF files and extract keywords in order to make them searchable; part of solving the problem was figuring out how to extract textual data from all these PDF files. This study can provide a useful view of unknown data sets by immediately revealing, at a minimum, who and what the information contains. As a result, an analyst would be able to see a structured representation of all the names of people, companies, brands, cities or countries, and even phone numbers, in a corpus, which could serve as a point of departure for further analysis and investigation.

Steps followed are:


Step 1: Import the following Python libraries:

1. PyPDF2 (to convert simple, text-based PDF files into text readable by Python)

2. textract (to convert non-trivial, scanned PDF files into text readable by Python)

3. re (the "re" module included with Python, used primarily for string searching and manipulation)

Step 2: Read the PDF file and convert it, including any scanned page images, into text.

Step 3: Find keywords in the text and count the occurrences of each keyword.

Step 4: Check whether a given word is present in the PDF file.

We implemented these steps in Python 3.
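
The four steps might look roughly like the sketch below. It is an illustration under stated assumptions, not the program used in the study: it assumes a recent PyPDF2 release that provides PdfReader (older releases exposed PdfFileReader instead), and textract with Tesseract installed for the OCR fallback. The function names read_pdf_text, count_keywords and contains_word are hypothetical.

```python
import re
from collections import Counter


def read_pdf_text(path):
    """Step 2: extract text with PyPDF2, falling back to textract OCR if empty."""
    import PyPDF2  # third-party; imported lazily so the helpers below work without it

    reader = PyPDF2.PdfReader(path)
    text = "".join(page.extract_text() or "" for page in reader.pages)
    if not text.strip():
        import textract  # third-party; needs Tesseract installed for OCR

        text = textract.process(path, method="tesseract", language="eng").decode("utf-8")
    return text


def count_keywords(text, min_length=3):
    """Step 3: count how often each word occurs (case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if len(word) >= min_length)


def contains_word(text, word):
    """Step 4: check whether a whole word is present in the text."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None
```

A typical use would be to call read_pdf_text on each file, pass the result to count_keywords, and inspect counts.most_common(20) alongside contains_word checks for names gathered during interviews.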

Output:

The output shows the number of times each word occurs in the PDF.

Figure 1

From Figure 1, investigators can get an idea of how many times the victim is mentioned in relation to the crime.

Figure 2

Figure 2 may help the analyst to check directly whether a person is involved in the crime or not.

Extracting this type of information from text is very useful for the investigator to check whether it matches the information collected through interviews. This helps in obtaining strongly connected information.

CONCLUSION
With new technologies, existing forensic procedures are no longer adequate, and the field of digital forensics needs to be extended to data analytics and machine learning. This paper identifies the use of data mining and machine learning for crime investigations in big data processing. Our analysis of works available in the literature also indicates that data mining is useful in analysing digital evidence. This paper explores the incorporation of analytic techniques into forensic investigation. The presented use case gives an overview of the possibilities of machine learning in forensic analysis. Future work will explore the possibilities of deep learning in digital forensic investigation.

REFERENCES

1. [b] Ovie L. Carroll, Stephen K. Brannon, Thomas Song, “Computer Forensics: Digital Forensic Analysis Methodology”, Cybercrime Lab, Computer Crime and Intellectual Property Section, Criminal Division, United States Department of Justice.

2. [z] Rami Mustafa A Mohammad, Mohammed Alq, “A comparison of machine learning techniques for file system forensics analysis”, Journal of Information Security and Applications, March 2019.

3. Deepti Sehrawat, Nasib Singh Gill, “Data Mining in IoT and its Challenges”, International Journal of Computer Sciences and Engineering, 2018.

4. Francesco Servida, Eoghan Casey, “IoT forensic challenges and opportunities for digital traces”, Science Direct, Digital Investigation, Volume 28, Supplement, April 2019, Pages S22-S29.

5. Darren Quick et al., “IoT Device Forensics and Data Reduction”, IEEE Access, 2018.

6. [a] P. Rizwan, K. Suresh, M. R. Babu, “Real time smart traffic management system for smart cities by using internet of things and big data”, Emerging Technological Trends (ICETT), International Conference on, IEEE, 2016, pp. 1–7.

7. Raburu George, Omollo Richard, Okumu Daniel, “Applying Data Mining Principles in the Extraction of Digital Evidence”, International Journal of Computer Science and Mobile Computing.

8. George Forman, Kave Eshghi, Stephane Chiocchetti (2005), “Finding similar files in large document repositories”, in KDD ’05: Proceeding of the eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 394–400, ACM, New York, NY, USA, ISBN 1-59593-135-X.

9. Kumar Shanu Singh, Annie Irfan, Neelam Dayal, “Cyber Forensics and Comparative Analysis of Digital Forensic Investigation Frameworks”, 2019 4th International Conference on Information Systems and Computer Networks (ISCON).

10. Asaf Varol, Yeşim Ülgen Sönmez, “Review of evidence analysis and reporting phases in digital forensics process”, 2017 International Conference on Computer Science and Engineering (UBMK).
