

Data Collection and Audit Logs of Digital Forensics in Cloud

M. R. Sumalatha 1, Associate Professor (Sr. Grade), Department of IT, MIT Campus, Anna University, Chennai, Tamil Nadu, India

Abstract - Cloud computing has become increasingly important to both users and service providers, since it is convenient for users to obtain the services they require from their providers. As the number of services offered rises, so does the amount spent on the privacy and security of those services. The internet has grown less secure in recent years, and with the rapid expansion and popularization of cloud computing, crimes committed through or against these systems will also increase in the near future. The digital environment should be capable not only of preventing attacks but also of dealing with high security threats. A log-based forensic model, intended to build a forensic-friendly system, is proposed. The proposed framework supports a cloud computing digital investigation environment that depends on a particular event's "cause and effects": it searches for evidence and develops hypotheses about the events behind a crime. In the proposed work, log auditing plays a vital role in the forensic framework. A logs model with timestamps can collect information from the cloud rapidly for forensic purposes, reducing the complexity involved in this kind of forensics.

Keywords - Distributed Database Systems, Logs Model, Crimes, Evidence, Digital Investigation, Digital Forensics.



Digital forensics is the application of science, mainly for lawful investigation. Forensic Data Analysis (FDA) is a branch of digital forensics. In the past, it has been used to extract and collect structured data on financial crimes. The objective of digital forensics is to search, identify, determine and study patterns of crime. Structured data is data, and its corresponding information, drawn from the underlying databases of application systems. Unstructured data is data extracted from communication-based documents and files or from the data fragments of mobile devices; computer forensics is the analysis of unstructured data. Prominent technologies such as NoSQL (Not Only SQL) systems are extensively used to address the challenges in existing forensic analysis systems: log-based forensic analysis methods on such systems achieve scalability and reliability in distributed storage.

Pranab Batsa 2, Department of IT, MIT Campus, Anna University, Chennai, Tamil Nadu, India

One way to resolve this issue is to investigate such large business firms by collecting logs, with their timestamps, for the crime concerned. The main contribution of this work is to understand the different properties of forensic investigations [11], [15], use these properties to find a better forensic algorithm, and then qualitatively evaluate different data collection and investigation methods that cover the design space. In this work, a log-based forensic analysis method is proposed for distributed storage machines: logs collected from the crime scene and the device are used to identify the perpetrator. A modified intrusion detection algorithm is proposed that detects an intrusion by comparing the original and acquired databases. The obtained data is collected, audited and investigated in a distributed storage system such as Cassandra, and a forensic investigation is subsequently performed on the backend data storage server. The average time taken for a query decreases significantly under this scheme when server workloads are balanced, and the volume of input and output is reduced drastically by filtering the evidence into what is necessary and what is not. To test the proposed log-based forensic model, a selective analysis and investigation capability of the Cassandra NoSQL system is used [6] and validated on business firm datasets. Cassandra is an open source database management system designed to handle large amounts of data; it integrates with Hadoop MapReduce, whose mapper and reducer algorithms cluster large datasets. Although the suggested approach is validated on a business security system, the same theory can be applied to other secure applications that perform forensic operations frequently.
The volume of stored data will at some point become so complex that it is practically impossible for forensic investigators to examine the complete datasets thoroughly. Because the datasets are confidential and secret, the forensic process suffers delays. Investigating such confidential data in Cassandra makes processing faster and response times quicker.

The remainder of the paper is organized as follows: Section 2 surveys related work on the techniques used in forensic analysis for digital investigation in the cloud. Section 3 gives the motivation for the proposed model

978-1-4673-9802-2/16/$31.00© 2016 IEEE


and the definition of the problem. Section 4 presents and discusses the evaluation results of log-based forensic analysis in Cassandra using the sub-query concept and analyzes its performance. Section 6 concludes the research work and discusses future enhancements.


Research in digital forensics has focused on the log-based forensic investigation of business firms; the different schemes involved are elaborated below. Log-based forensic algorithms are widely used to select the best forensic analysis applicable to firms' datasets. Several forensic algorithms are available, including the tiled bitmap algorithm, image collection techniques, and the shifted transversal design group testing algorithm. Timothy Vidas, Nicolas Christin and Daniel Votipka [3] gave a tactic for investigating digital crimes on Android devices. By applying a special boot mode, comprehensive extraction of evidence with the least possibility of data tampering or exclusion becomes possible. Collecting, analyzing and examining data from devices is essential to overcome the coming challenges in investigation. In their study, the common Android middleware gathers images on a broad series of devices; the work focuses on recovery-based image collection techniques. They were able to gather data from evidence-loaded storage without worrying about evidence manipulation. However, their approach has limitations: forensic data acquisition is an old field, and most tools arose out of necessity and targeted the foremost platform, Microsoft Windows computers. The diversity of phone hardware and software has slowed the progress of these tools. Sana Shaikh, Radhika Randhir and Ashish Chavan [1] presented a tiled bitmap algorithm for separately investigating fraud events taking place in different tiles, as well as several corruption activities inside a tile. All the tools in their system interact with a newly created central database, making it possible for the notarizer and validator to do their work.
In their decision-making approach, a one-way hash function (an MD5 hash) combined with a shielded notarizer is always used to confirm a validator and approve the data's original, legal identity, and each data owner is assigned a distinct digital signature. The aim is to provide a notarization service strong enough, and a validation scheme precise enough, to manage maximal integrity requirements and data security. Ting Sang [9] discussed using a logs model to make a system forensic-friendly. The method extracts information from the cloud rapidly for forensic purposes, which simplifies forensic analysis. The author's log-based method reduces the difficulty of establishing non-repudiation of behaviors on the cloud. A fake user is recognized easily by comparing the local log modules, which gather information into the local log record, with the maintained local logs

and with the log files in the cloud. The method has limitations: it is difficult to separate proof from a large set of data, the chance of losing information is higher because of the dynamic nature of scaling up and down, and in cloud forensics reliable sources of evidential data are very limited. Maria Chalkiadaki [4] and Satoshi Fukuda et al. [8] discussed refining the average response time of search queries by executing range queries in parallel in Cassandra and by scheduling search queries. Cassandra is a well-known NoSQL database providing high availability with no single point of failure. The authors mixed single and range queries under a scheme that gives a suitable priority to each query in the proposed scheduling method. Their technique requires further computation but yields improved locality, and the scheme performs well compared with SQL-based queries. However, it focuses only on small networks; limitations appear when it is applied to large networks, and the computational complexity is very high. Zehui Li, Zhiwei Chen, Yatao Yang, Zichen Li and Ruoqing Zhang [7] proposed an efficient hashing scheme with a shifted transversal design group testing algorithm. Their technique calculates hash values for all events in a log file as the integrity proof and precisely locates the events that have been corrupted. The efficient hashing scheme focuses on log integrity checks in policy-based security scanning systems and gives a structure that uses a pooling strategy, the Shifted Transversal Design (STD), to verify the integrity and trustworthiness of monitoring logs. A fast, high-throughput screening strategy ensures the integrity of monitoring logs in the monitoring system, and a Monitoring-as-a-Service (MaaS) framework improves efficiency and guarantees data security.
The latest secure hash standard, SHA-3 (Secure Hash Algorithm 3), has been adopted to certify the integrity of information digests. The Secure Hash Algorithm provides digital forensic investigators with a consistent and safe method for supervising user activity on a cloud framework, though adopting the scheme increases storage overhead. The authors calculate a hash value for every event record and store it securely for later verification. This strategy can provide only a yes-or-no answer about the integrity of the whole log file; it is not accurate enough for a single event. Anthony Keane and Neha Thethi [2] discussed digital forensics investigations in the cloud. The main objective of their work is to verify whether it is possible for forensic investigators to image all the evidence in the limited time available to finish an investigation properly, given the time consumed in imaging different storage volumes in the cloud. The authors followed the investigation framework proposed by Martini and Choo [11]. The framework includes iterative phases that enable evidence to be recognized and preserved even after it surfaces in the assessment and analysis phase of the investigation. Their approach demonstrated how the creation, identification and destruction of events can have severe consequences for


collecting and analyzing evidence, and thus for the result of an investigation. The authors focused mainly on the FTK Imager toolkit. They showed that the FTK Remote Agent is the most capable, exhibiting a nearly 12 percent drop in time, making it an effective approach compared with the conventional one.
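The per-event hashing idea surveyed above can be sketched in a few lines; this is a minimal illustration using Python's standard hashlib (the sample log lines and variable names are ours, not the cited authors'):

```python
import hashlib

def hash_events(log_lines):
    """Compute one SHA-3 digest per event record, stored at write time."""
    return [hashlib.sha3_256(line.encode("utf-8")).hexdigest() for line in log_lines]

def tampered_events(log_lines, stored_digests):
    """Return indices of events whose current digest no longer matches."""
    return [i for i, (line, digest) in enumerate(zip(log_lines, stored_digests))
            if hashlib.sha3_256(line.encode("utf-8")).hexdigest() != digest]

log = ["10:01 user=alice action=login",
       "10:05 user=alice action=upload file=report.doc"]
digests = hash_events(log)                      # kept securely for later verification
log[1] = log[1].replace("report.doc", "x.doc")  # simulate tampering
print(tampered_events(log, digests))            # → [1]
```

Unlike a single hash over the whole file, per-event digests localize exactly which records were altered.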

Muhammad Sharjeel Zareen et al. [5] discussed the latest challenges and responses in digital forensics. The authors mainly highlight how digital forensics (DF) responds to the latest developments in technologies and trends, including network and system technologies, computing security, the emergence of smart devices, and the use of anti-forensics measures. They identify the deficits in the response of DF with respect to technology and trends, and propose improvements and measures to address the identified shortfalls.

Mohammad Wazid et al. [6] categorized digital forensics tools into five categories: memory forensics, network forensics, mobile phone forensics, database forensics and computer forensics. In our work, we focus mainly on computer forensics and database forensics. The main issue overlooked by digital forensic investigation is the absence of laws in the cyber field. The authors clearly state that, even where cyber laws exist, it is easy for criminals to commit crimes without being caught because the cyber policing system is not equipped to deal with the problems. We therefore focus on performing every step of the forensic analysis investigation efficiently, with trustworthy evidence. In summary, the survey analyzes the overall problems and solutions of existing methodologies, such as the recovery image collection technique, policy-based security monitoring systems, the tiled bitmap algorithm, and the hashing scheme with the shifted transversal design group testing algorithm, all of which have limitations. Our proposed model addresses and overcomes these problems using log-based forensic analysis, whose features improve the accuracy of forensic analysis and make the investigation process easier and less time-consuming.


The objective of the proposed log-based forensic model is to perform the forensic analysis steps with an efficient algorithm at each stage of examination and analysis, which enhances the scalability and performance of business firm data. The proposed logs model scheme is capable of identifying the person involved in the crime scene concerned, with trusted evidence as proof. Data Collection: Data collection comprises recognizing likely accessible sources and obtaining data from them. Sources may be a personal computer, mobile phone or laptop. Acquiring data while a system is off is known as static acquisition; acquisition from a system that is on is called live acquisition.

As shown in Fig. 2.1, data acquisition has three steps:

  • 1. Plan development,

  • 2. Data acquisition, and

  • 3. Authentication of the acquired data.

DF tools are recommended for data acquisition. Depending on the case, a decision must be made whether to acquire locally or over the network, but local acquisition is preferred. For legitimacy, the acquired data must be confirmed to prove its soundness and precision: it is compared with the hash value of the original source data using digital forensic tools. Note, however, that in live (online) acquisition the source itself will be modified if this confirmation is attempted. Examination: Once the acquisition of data is done, this stage incorporates examining and extracting the relevant chunks of information from the data gathered in the earlier stage. Generally the gathered data is huge, and it is challenging to discover the chunk of information related to the incident under investigation by manual searching; DF tools offer solutions to this problem:

  • 1. A keyword search option, to examine the gathered data for text and patterns.

  • 2. Separation of data based on attribute types, e.g. video, text, graphic, audio.

Analysis: The information gathered in the previous stage is used to draw conclusions in this stage. Evidence found on different storage devices is integrated in cross-drive analysis, a new domain.
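The acquisition-verification step described above (comparing the acquired data against the hash of the original source) can be sketched as follows; the chunked hashing and throwaway demo files are our illustrative choices, not a specific DF tool's implementation:

```python
import hashlib
import tempfile

def file_digest(path, algorithm="sha256", chunk_size=8192):
    """Hash a file/image in chunks so large evidence images fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def acquisition_verified(source_path, image_path):
    """Static acquisition is sound only when both digests match exactly."""
    return file_digest(source_path) == file_digest(image_path)

# Demo with two throwaway files standing in for the source media and its image.
with tempfile.NamedTemporaryFile(delete=False) as src, \
     tempfile.NamedTemporaryFile(delete=False) as img:
    src.write(b"disk sectors ...")
    img.write(b"disk sectors ...")
print(acquisition_verified(src.name, img.name))   # → True
```

On a live system the same comparison is unreliable, since reading the source changes its state while the digest is being computed.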


Possible sources of evidence:

  • Host Intrusion Detection System (HIDS) logs
  • Web content and browser logs
  • Firewall and access logs
  • Chat logs
  • Application cache
  • Access logs
  • Transaction logs
  • Packet content
  • Header content

Table I Possible Sources of Evidence

Forensic Analysis Engine: Business firm datasets are taken for forensic analysis. The dataset considered here concerns a company-based issue: every employee is restricted from using certain sites or links, mainly file sharing links. If an employee shares a confidential document with outside persons through a file sharing link, the admin receives an alert that somebody has accessed a restricted URL and shared a confidential document. A confidential document is one containing sensitive words; the sensitive-word count is calculated for the file shared over the restricted link, and based on this count the admin is alerted. The admin then sends a request about the crime to a forensic investigator.

The forensic investigator seizes the suspect machine based on the IP address used for the crime and collects the logs relevant to the crime.
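The alerting rule described above can be sketched as follows; the watch-list, restricted domain and threshold here are hypothetical stand-ins, since the paper does not publish its actual lists:

```python
import re

# Hypothetical watch-list and restricted domain -- illustrative, not the paper's data.
SENSITIVE_WORDS = {"confidential", "salary", "prototype"}
RESTRICTED_LINKS = {"fileshare.example.com"}
ALERT_THRESHOLD = 3

def sensitive_word_count(document_text):
    """Count occurrences of watch-listed words in the shared document."""
    words = re.findall(r"[a-z]+", document_text.lower())
    return sum(1 for word in words if word in SENSITIVE_WORDS)

def admin_alert(access_log_entry, document_text):
    """Alert only when a restricted link is used AND the shared document's
    sensitive-word count reaches the threshold."""
    used_restricted = any(link in access_log_entry for link in RESTRICTED_LINKS)
    return used_restricted and sensitive_word_count(document_text) >= ALERT_THRESHOLD

entry = "10.0.0.7 GET http://fileshare.example.com/upload"
doc = "Confidential prototype specs with confidential salary figures"
print(admin_alert(entry, doc))   # → True
```

The alert carries the source IP from the access log entry, which is what the investigator uses to locate the suspect machine.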



Figure 1. Log based forensic system architecture

Cassandra is used as the forensic analysis engine in the proposed system for log-based forensic analysis: logs are collected, preserved and segregated with the help of the sub-query concept, which reduces query processing time. A clustering algorithm filters out the exculpatory evidence, which makes the analysis process easier, and the performance of the forensic analysis investigation also improves. Reporting: Finally, reporting is the last and concluding stage of DF, where the outcome of the data analysis stage is arranged and provided to the concerned official and the court. The report is documented and prepared with provable evidence so that everyone can understand it.
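The sub-query idea — narrowing the log rows step by step before the costlier analysis phase — can be sketched in plain Python; a real deployment would issue the equivalent CQL through a Cassandra driver, and the rows below are illustrative, not the paper's dataset:

```python
from datetime import datetime

# Rows as they might be read back from a Cassandra log table (illustrative data).
logs = [
    {"user": "emp01", "ip": "10.0.0.7", "ts": datetime(2016, 1, 2, 10, 5), "event": "upload"},
    {"user": "emp02", "ip": "10.0.0.9", "ts": datetime(2016, 1, 2, 11, 0), "event": "login"},
    {"user": "emp01", "ip": "10.0.0.7", "ts": datetime(2016, 1, 3, 9, 30), "event": "share"},
]

def sub_query(rows, ip, start, end):
    """Each narrowing step shrinks the candidate set: first filter by the
    suspect IP, then restrict to the time window of interest."""
    by_ip = [r for r in rows if r["ip"] == ip]
    return [r for r in by_ip if start <= r["ts"] <= end]

hits = sub_query(logs, "10.0.0.7",
                 datetime(2016, 1, 2), datetime(2016, 1, 2, 23, 59))
print([r["event"] for r in hits])   # → ['upload']
```

Because each sub-query discards irrelevant rows early, the deeper analysis (clustering, evidence correlation) runs on a much smaller set.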


Figure 2. Forensic analysis engine

Fig. 2 represents the forensic analysis steps followed in the proposed system, which produce trusted evidence for the crime concerned.


The above features are implemented by detecting an intrusion in the log collected from the suspected machine or device. Before entering the analysis phase of forensics, the logs are audited and stored in the database, where clustering is applied to detect whether an intruder has tampered with the data.

Algorithm 1: Detecting intrusion by comparing the original and acquired databases

Oset is the set of the original database
Aset is the set of the acquired database
Dset is the set of data field indices
Ur is the username, dt is the date and time, di is the index data field
Rset is the set of results

  • 1. di = 0, which is the index field

  • 2. Rset = { }

  • 3. for i = 1 to number of data fields

  • 4. Ai ← data field i of Aset

  • 5. Oi ← data field i of Oset

  • 6. if Ai and Oi are not equal then

  • 7. set di = 1

  • 8. end for

  • 9. for i = 1 to number of data fields

  • 10. if di = 1

  • 11. Rset = Rset + di

  • 12. return Rset

  • 13. end

Figure 3. Master and slave method for segregating and filtering the datasets

Fig. 3 represents the master and slave method for clustering the datasets (logs). A sample-and-ignore concept is applied to eliminate repeated values and data. Each slave node forms a cluster of the attributes, including the timestamps of the evidence. The master node sends tasks and data to every slave node; each slave node processes the data by forming cluster-like structures and sends its report back to the master node.

Algorithm 2: Efficient segregation and filtering algorithm for data partition

Input: dataset dS; segregation ratio Sr (segregating the inculpatory and exculpatory evidence)
Output: clusters (division of logs)

  • 1. Phase 1 – Segregation

  • 2. The master m reads the data and sends the data elements to one slave with probability Sr;

  • 3. The slave uses the specific feature to find clusters in Sr and passes the cluster descriptions to the master m;

  • 4. Phase 2 – Filtering

  • 5. After receiving the descriptions, the master m reads the data, ignores the elements from the clusters found in the master (removing the duplicates), and sends the rest to the s slaves according to the method used for data partition, which is mainly based on the evidence;

  • 6. The s slaves use the specific feature to find clusters in the received elements and send the cluster descriptions to one machine, where partitions are made and data is extracted;

  • 7. One machine merges the clusters received with the ones from the master; let the merged result be the clusters;

  • 8. The clusters are returned; these clusters are the filtered data taken into account for forensic analysis.

In Algorithm 2, segregation and filtering are the two phases involved in the examination process of forensic analysis. Based on specific features, unwanted evidence is removed and the remainder clustered. The main advantage is minimizing the volume of the datasets, which reduces the computational complexity of processing the operations.
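The field-by-field comparison of Algorithm 1 can be sketched as follows; this is our minimal reading of the pseudocode, with the flagged indices (di) collected directly into the result set (Rset), and the sample employee record is hypothetical:

```python
def detect_intrusion(original, acquired):
    """Sketch of Algorithm 1: compare each data field of the acquired
    database against the original and collect the indices (Rset) of
    fields that differ."""
    rset = []                                   # Rset starts empty
    for i, (o_field, a_field) in enumerate(zip(original, acquired)):
        if o_field != a_field:                  # di flagged on a mismatch
            rset.append(i)
    return rset

# Hypothetical employee record: one field altered in the acquired copy.
original = ["emp01", "alice@corp.example", "2016-01-02 10:05", "report.doc"]
acquired = ["emp01", "alice@corp.example", "2016-01-02 10:05", "x.doc"]
print(detect_intrusion(original, acquired))     # → [3]
```

A non-empty result set tells the investigator exactly which fields of the acquired database were tampered with, so the later clustering stage only has to examine those.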


To evaluate the proposed scheme for effective forensic analysis of a business firm, the log-based forensic model is embedded into the open source code of Apache Cassandra 2.1.4. The performance of the proposed mechanism is evaluated by applying the forensic analysis algorithm to business firm datasets. Experimental analysis was conducted by implementing the algorithm on company datasets consisting of Employee ID, Gender, Name Set, Given Name, Middle Name, Surname, DOB, Personal Email Address, Company Email Address, System Username, System Password, Telephone Number, System Number, Department, First Sign-in, Last Sign-out, IP Address, File Sharing Accessed Link, Domain Name Access Time, Protocol, Document Type, Sensitive Words, and Sensitive Word Occurrence Count. Forensic analysis is done by investigating every employee attribute together with the timestamps of their actions. Inculpatory evidence is evidence related to the crime; taking that into account, querying and analysis are processed in Cassandra using sub-queries. Sub-queries make the analysis process faster, which improves the accuracy of Cassandra.



No. of Queries	Execution time (sec) in Cassandra	Execution time (sec) in SQL
	0.697  4.543	1.222  5.765
	2.333  5.222	2.666  6.768
	3.773  4.456	4.556  5.278

Table II Execution Time of Cassandra for Forensic Analysis Investigation



Figure 4. Comparison of Execution time of Cassandra with SQL

The proposed scheme is tested against existing forensic tools. From Table II, the computational efficiency of the log-based forensic model makes it possible to perform forensic analysis for business firms with millions of records when investigating crime datasets; it performs well on both small and large volumes of data. Using the Jenkins and Cyclop tools for analysis, the execution time of queries in Cassandra is compared with SQL. As the number of queries grows, the execution time increases, but Figure 4 clearly shows that the execution time of Cassandra is less than that of SQL, so the performance of Cassandra is effectively improved.


Figure 5. Linear scalability of reads and writes of Cassandra during forensic analysis investigation

In the examination phase, clustering based on timestamps removes the unwanted evidence. The forensic analysis process improves Cassandra's linear scalability. Figure 5 shows the test generation of forensic analysis using Cassandra; test analysis is done in Jenkins to make the forensic analysis efficient. Sensitive and confidential data can be analyzed in Cassandra within a short period of time, which makes its performance and scalability higher while keeping the data secure.


Cybercrime is increasingly common in today's world, and cloud computing plays an important role in forensic analysis. The traditional log-based model, which simplified the complexity of establishing non-repudiation of behaviors, was applied on the cloud; however, it is not enough for other kinds of digital forensics. Building a framework for a cloud computing digital investigation means building an environment developed around the causes and effects of an event. Searching for evidence that shows the causes and effects of events, and developing hypotheses about such events based on the crime, has become a necessity in forensic investigation. Cassandra, a distributed storage system that handles parallel and distributed data, is helpful in analyzing forensic data based on events and causes for effective prediction. The sub-query concept in Cassandra makes this possible and improves its performance, and analyzing crime datasets in Cassandra makes the forensic investigation process easier. Still, there are no proper and efficient guidelines or standards for cloud security. To fit the cloud computing environment, where large volumes of data are stored, investigators are updating the guidelines of traditional digital forensics. The proposed system also improves digital forensics in the cloud environment.

As future work, the proposed method can be applied to mobile-based cloud forensics. The proposed log-based forensic analysis can also be implemented for social network analysis and other applications, and forensic models that predominantly provide good results for these applications could be developed.


  • 1. Ashish Chavan, Sana Shaikh, Radhika Randhir, "Tiled Bitmap Algorithm and Forensic Analysis of Data Tampering (An Evolutionary Approach)", IJRET: International Journal of Research in Engineering and Technology, vol. 03, issue 02, 2014.

  • 2. Neha Thethi, Anthony Keane, "Digital Forensics Investigations in the Cloud", IEEE International Advance Computing Conference, 2014.

  • 3. Daniel Votipka, Timothy Vidas and Nicolas Christin, "Passe-Partout: A General Collection Methodology for Android Devices", IEEE Transactions on Information Forensics and Security, vol. 8, no. 12, pp. 1937-1946, 2013.

  • 4. Maria Chalkiadaki, Kostas Magoutis, "Managing Service Performance in the Cassandra Distributed Storage System", IEEE International Conference on Cloud Computing Technology and Science, 2013.

  • 5. Muhammad Sharjeel Zareen, Adeela Waqar and Baber Aslam, "Digital Forensics: Latest Challenges and Response", IEEE 2nd National Conference on Information Assurance (NCIA), 2013.

  • 6. Mohammad Wazid, Avita Katal, R. H. Goudar, Sreenivas Rao, "Hacktivism Trends, Digital Forensic Tools and Challenges: A Survey", Proceedings of the IEEE Conference on Information and Communication Technologies (ICT), 2013.

  • 7. Ruoqing Zhang, Zhiwei Chen, Zehui Li, Yatao Yang and Zichen Li, "An Efficient Scheme for Log Integrity Check in Security Monitoring System", IEEE International Conference on Cloud Computing, 2013.

  • 8. Satoshi Fukuda, Ryota Kawashima, Shoichi Saito and Hiroshi Matsuo, "Improving Response Time for Cassandra with Query Scheduling", IEEE First International Symposium on Computing and Networking, 2013.

  • 9. Ting Sang, "A Log-Based Approach to Make Digital Forensics Easier on Cloud Computing", IEEE International Conference on Intelligent System Design and Engineering Applications, 2013.

    • 10. George Grispos, Tim Storer, William Bradley Glisson, "Calm Before the Storm: The Challenges of Cloud Computing in Digital Forensics", International Journal of Digital Crime and Forensics, 4(2), 28-48, April-June 2012.

    • 11. B. Martini and K. Choo, "An Integrated Conceptual Digital Forensic Framework for Cloud Computing", Digital Investigation, vol. 9, no. 2, pp. 71-80, 2012.

    • 12. Ray Hunt, Sherali Zeadally, “Network forensics–An Analysis of Techniques, Tools and Trends”, IEEE Computer Journal Issue 99, 2012.

    • 13. D. Westermann et al., “Automated Inference of Goal-Oriented Performance Prediction Functions,” in Proc. of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE), Essen, Germany, Sept. 2012.

    • 14. Keyun Ruan, Ibrahim Baggili, Joe Carthy, and Tahar Kechadi, "Survey on Cloud Forensics and Critical Criteria for Cloud Forensic Capability: A Preliminary Analysis", Proceedings of the 2011 ADFSL Conference on Digital Forensics, Security and Law, 2011.

    • 15. Bill Nelson, Amelia Phillips and Christopher Stewart, “Computer Forensics and Investigations as a Profession” in Guide to Computer Forensics and Investigations, Fourth Edition, Boston, Course Technology, 2010, ch.1.

    • 16. P. Pirzadeh, J. Tatemura, and H. Hacigumus, "Performance Evaluation of Range Queries in Key Value Stores", in Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on, pp. 1092-1101, 2011.

    • 17. D. Reilly, C. Wren, T. Berry, "Cloud Computing: Forensic Challenges for Law Enforcement", Internet Technology and Secured Transactions (ICITST), IEEE Press, London, 2010.

    • 18. S. Biggs, S. Vidalis, "Cloud Computing: The Impact on Digital Forensic Investigations", IEEE Press, 2009.