Data Collection and Audit Logs of Digital Forensics in Cloud
Pranab Batsa
Department of IT, MIT Campus, Anna University
Chennai, Tamilnadu, India
prbatsa@gmail.com
I. INTRODUCTION
Digital forensics is the application of scientific methods to lawful investigation. Forensic Data Analysis (FDA) is a branch of the digital forensics domain; traditionally, it has involved extracting and collecting structured data related to financial crimes. The objective of digital forensics is to search, identify, determine and study patterns of crime. Structured data is the data, and its corresponding information, obtained from the underlying databases of various application systems. Unstructured data is the data or information extracted from communication-based documents and files, or from the data fragments of mobile devices; computer forensics is concerned with the analysis of such unstructured data. Prominent technologies such as NoSQL (Not Only SQL) systems are widely used to address the challenges of existing forensic analysis systems, applying log-based forensic analysis methods to achieve scalability and reliability in distributed storage systems.
One way to resolve this issue is to investigate such large business firms and to collect the logs corresponding to the timestamps of the crime in question. The main contribution of this work is to understand different properties of forensic investigations [11], [15], to use these properties to identify a better forensic algorithm, and then to qualitatively evaluate different data collection and investigation methods that cover the design space. In this work, a log-based forensic analysis method is proposed for distributed storage machines, utilizing the logs collected from the crime scene and device to identify the perpetrator. A modified intrusion detection algorithm is proposed to detect an intrusion by comparing the original and the acquired database. The obtained data is collected, audited and investigated in a distributed storage system such as Cassandra, and a forensic investigation is subsequently performed on the backend data storage server. With this scheme, the average time taken for a query decreases significantly when the server workloads are balanced, and the volume of input and output is reduced drastically by filtering the evidence into what is necessary and what is not. To test the proposed log-based forensic model, the selective analysis and investigation capability of the Cassandra NoSQL system is used [6] and validated using business firm datasets. Cassandra is an open-source database management system designed to handle large amounts of data, and it integrates with Hadoop MapReduce, whose mapper and reducer algorithms are used to cluster large datasets. Although the suggested approach is validated on a business safe system, the same theory can be applied to other secure applications that frequently perform forensic operations. The high volume of stored data will at some point become so complex that it is practically impossible for forensic investigators to examine the complete datasets thoroughly, and because the datasets are confidential and secret, the forensic process incurs additional delay. Investigating such confidential data in Cassandra makes processing faster and the response time quicker.
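As an illustration of how audit logs could be collected by timestamp from Cassandra during such an investigation, the following sketch uses the Python cassandra-driver. The keyspace, table and column names (forensics, audit_log, device_id, event_time, action, user_id) are hypothetical and only indicate the intended structure, not the actual schema used in this work.

from cassandra.cluster import Cluster
from datetime import datetime

# Connect to the Cassandra cluster that stores the collected audit logs.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('forensics')  # hypothetical keyspace

# Retrieve the log events of one device within the time window of the suspected crime.
query = ("SELECT device_id, event_time, action, user_id "
         "FROM audit_log "
         "WHERE device_id = %s AND event_time >= %s AND event_time <= %s")
rows = session.execute(query, ('device-42',
                               datetime(2017, 3, 1, 0, 0),
                               datetime(2017, 3, 2, 0, 0)))
for row in rows:
    print(row.event_time, row.action, row.user_id)

cluster.shutdown()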
The remainder of the paper is organized as follows: Section 2 reviews related work on the techniques used in forensic analysis for digital investigation in the cloud, and Section 3 presents the motivation for the proposed model and its use of the log files in the cloud.

II. RELATED WORK

Forensic analysis that relies on log files in the cloud has limitations: it is difficult to separate evidence from a large set of data, the chances of losing information are higher because of the dynamic nature of scaling up and down, and reliable sources of evidential data in cloud forensics are very limited.
Maria Chalkiadaki [4] and Satoshi Fukuda et al. [6] discussed refining the average response time of search queries by executing range queries in parallel in Cassandra and by scheduling the search queries. Cassandra is a well-known NoSQL database that provides high availability with no single point of failure. The authors mixed single and range queries and assigned a suitable priority to each query in the proposed scheduling method. Their technique requires additional computation but yields improved locality, and the scheme performs well compared to SQL-based queries. However, it focuses only on small networks, limitations appear when it is applied to large networks, and its computational complexity is very high.
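This is not the authors' implementation, but a minimal sketch of issuing sub-range queries concurrently against Cassandra with the Python driver's asynchronous API; the table name, partition key and the way the token range is split are assumptions made only for illustration.

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('forensics')  # hypothetical keyspace

# Split one large scan into sub-ranges of the token space and issue them in parallel.
stmt = ("SELECT * FROM audit_log_by_id "
        "WHERE token(id) >= %s AND token(id) < %s")
sub_ranges = [(-3000, -1000), (-1000, 1000), (1000, 3000)]
futures = [session.execute_async(stmt, (lo, hi)) for lo, hi in sub_ranges]

# execute_async returns ResponseFuture objects; gather the partial results.
results = []
for future in futures:
    results.extend(future.result())

cluster.shutdown()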
Zehui Li, Zhiwei Chen, Yatao Yang, Zichen Li and Ruoqing Zhang [3] proposed an Efficient Hashing Scheme with a Shifted Transversal Design group-testing algorithm. Their technique calculates hash values for all events in a log file as the integrity proof and precisely locates the events that have been corrupted. The Efficient Hashing Scheme focuses on log integrity checking in policy-based security scanning systems and provides a structure that uses a pooling strategy, the Shifted Transversal Design (STD), to verify the integrity and trustworthiness of monitoring logs. A fast, high-throughput screening strategy is introduced to ensure the integrity of monitoring logs in the monitoring system, and a Monitoring-as-a-Service (MaaS) framework improves efficiency and guarantees data security. The latest secure hash standard, SHA-3 (Secure Hash Algorithm 3), is adopted to certify the integrity of the message digests. The secure hash algorithm provides digital forensic investigators with a consistent and safe method for supervising user activity over a cloud framework, although adopting the scheme increases the storage overhead. The authors calculate a hash value over all event records and store it securely for later verification; such a strategy can only provide a yes-or-no answer about the integrity of the whole log file and is not accurate enough for a single event.
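As an illustration of per-event integrity hashing of the kind described above, the sketch below computes a SHA-3 digest for every log event using Python's hashlib and re-checks an acquired copy against the stored digests; the log file names and line-per-event format are assumptions, and this is not the authors' STD pooling scheme.

import hashlib

def event_digest(event_line):
    # SHA3-256 digest of a single log event.
    return hashlib.sha3_256(event_line.encode('utf-8')).hexdigest()

# Hash every event in the monitoring log and keep the digests for later verification.
with open('monitor.log') as f:  # hypothetical log file, one event per line
    stored_digests = [event_digest(line.rstrip('\n')) for line in f]

# Later, re-hash the acquired copy and locate any corrupted events.
with open('monitor_acquired.log') as f:
    for i, line in enumerate(f):
        if event_digest(line.rstrip('\n')) != stored_digests[i]:
            print('event', i, 'has been modified')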
Anthony Keane and Neha Thethi [2] discussed digital forensics investigations in the cloud. The main objective of their work was to verify whether it is possible for forensic investigators to image all of the evidence within the limited time available to complete an investigation properly, given the time consumed in imaging the different storage volumes in the cloud. The authors followed the investigation framework proposed by Martini and Choo [9]. This framework includes iteration phases, which allow evidence to be recognized and preserved even after it emerges in the assessment and analysis phase of the forensic investigation. Their approach demonstrated how the creation, identification and destruction of events would have severe consequences for the network.
Steps:
…
end for
for i = 1 to number of data fields
    if di = 1
        Rset = Rset + di
return Rset
End
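The surviving fragment of the pseudocode above accumulates every data field whose flag di is set into the result set Rset. A minimal Python sketch of that loop, with the field list and flag vector assumed purely for illustration, is:

# d[i] == 1 marks the i-th data field as relevant evidence (assumed encoding).
def collect_relevant_fields(fields, d):
    rset = []
    for i in range(len(fields)):
        if d[i] == 1:
            rset.append(fields[i])  # corresponds to Rset = Rset + di
    return rset

# Example: only the fields flagged with 1 are returned.
print(collect_relevant_fields(['login', 'logout', 'delete'], [1, 0, 1]))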
Fig. 3 represents the master and slave method for clustering the datasets (logs). A sample-and-ignore concept is applied here to eliminate repeated values and data. Each slave node forms a cluster of the attributes, including the timestamps of the evidence. The master node sends tasks and data to every slave node; each slave node processes its data by forming a cluster-like structure and sends its report back to the master node.
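A simplified sketch of this master/slave flow is given below: the master splits the log records among the slaves, each slave drops duplicates and groups its share by a timestamp attribute, and the grouped results are merged into one report. The record layout (string timestamps, an 'event' field) and the hour-level grouping key are assumptions for illustration only.

from collections import defaultdict

def slave_process(records):
    # Sample and ignore: drop repeated records, then cluster by timestamp hour.
    unique = {(r['timestamp'], r['event']) for r in records}
    clusters = defaultdict(list)
    for ts, event in unique:
        clusters[ts[:13]].append(event)  # key 'YYYY-MM-DD HH' from a string timestamp
    return clusters

def master(records, num_slaves=3):
    # The master distributes the records among the slaves and merges their reports.
    shares = [records[i::num_slaves] for i in range(num_slaves)]
    report = defaultdict(list)
    for share in shares:
        for key, events in slave_process(share).items():
            report[key].extend(events)
    return report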
Algorithm 2: Efficient Segregation and Filtering Algorithm for Data Partition
Input: dataset dS; segregation ratio Sr (for segregating the inculpatory and exculpatory evidence)
Output: clusters (unstructured data filtered based on the division of logs)
Steps:
1. Phase 1: Segregation
2. The master m reads the data and sends each data element to one slave with probability Sr;
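A minimal sketch of the segregation step of Algorithm 2, in which each data element is routed to the selected slave with probability Sr; the record format and the interpretation of the two resulting pools are assumptions for illustration only.

import random

def segregate(dataset, sr):
    # Phase 1: send each element to the selected slave with probability Sr,
    # roughly separating candidate (inculpatory) evidence from the rest.
    selected, remaining = [], []
    for element in dataset:
        if random.random() < sr:
            selected.append(element)
        else:
            remaining.append(element)
    return selected, remaining

# Example: on average about 30% of the log records go to the selected pool.
selected, remaining = segregate(['record%d' % i for i in range(10)], sr=0.3)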
No. of Queries    Execution time (sec) in Cassandra    Execution time (sec) in SQL
1041              0.697                                1.222
1473              2.333                                2.666
2048              3.773                                4.556
2998              4.456                                5.278
3845              4.543                                5.765
4754              5.222                                6.768
Figure 4. Comparison of Execution time of Cassandra with SQL
Figure 5. Linear scalability of reads and writes of Cassandra during
forensic analysis investigation
REFERENCES
14. Keyun Ruan, Ibrahim Baggili, Joe Carthy, and Tahar Kechadi, Survey on cloud forensics and critical criteria for cloud forensic capability: A preliminary analysis, Proceedings of the 2011 ADFSL Conference on Digital Forensics, Security and Law, 2011.
15. Bill Nelson, Amelia Phillips, and Christopher Stewart, Computer Forensics and Investigations as a Profession, in Guide to Computer Forensics and Investigations, Fourth Edition, Boston: Course Technology, 2010, ch. 1.
16. P. Pirzadeh, J. Tatemura, and H. Hacigumus, Performance evaluation of range queries in key value stores, in Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on, pp. 1092-1101, 2011.
17. D. Reilly, C. Wren, and T. Berry, Cloud Computing: Forensic Challenges for Law Enforcement, Internet Technology and Secured Transactions (ICITST), IEEE Press, London, 2010.
18. S. Biggs and S. Vidalis, Cloud Computing: The Impact on Digital Forensic Investigations, IEEE Press, 2009.