Name: Leopoldine Vassilio Date: February 12, 2021

Paper: Lanterna D, Barili A. Forensic analysis of deduplicated file systems. Digital Investigation.
2017; 20(Supplement):S99-S106. https://doi.org/10.1016/j.diin.2017.01.008

Summary: (Problem) This paper presents an emerging issue in digital forensics: the use of
deduplication in file systems, which digital forensic tools could not detect at the time of the
paper’s publication in 2017. Deduplication is a recent advancement in file systems and storage
that saves storage space by breaking files into chunks and keeping only one copy of each chunk
that is identical across several files. (Solution) In this paper, Lanterna and Barili break down
two implementations of deduplication, OpenDedup and Windows 2012 deduplication, and propose
methods to identify and restore deduplicated files and file systems in both implementations.
(Meaning) This paper is significant because it brings attention to the complete lack of digital
forensic tools that can detect and restore deduplicated files.
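
To make the mechanism described in the summary concrete, here is a minimal Python sketch of chunk-level deduplication. It is an illustration only and does not reproduce the on-disk format of OpenDedup or Windows 2012 deduplication: the fixed 8-byte chunk size, the use of SHA-256, and the names ChunkStore, add_file, and rehydrate are all assumptions made for this example.

```python
import hashlib

CHUNK_SIZE = 8  # hypothetical and deliberately tiny; real systems use far larger chunks


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes (stored once)
        self.files = {}   # file name -> ordered list of chunk hashes

    def add_file(self, name, data):
        sequence = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # an identical chunk is not stored again
            sequence.append(digest)
        self.files[name] = sequence

    def rehydrate(self, name):
        # Restoring a file needs its ordered hash sequence, not only the chunk store.
        return b"".join(self.chunks[d] for d in self.files[name])


store = ChunkStore()
file_a = b"AAAAAAAABBBBBBBBCCCCCCCC"
file_b = b"AAAAAAAABBBBBBBBDDDDDDDD"
store.add_file("a.txt", file_a)
store.add_file("b.txt", file_b)

stored_bytes = sum(len(c) for c in store.chunks.values())
print(len(file_a) + len(file_b), "bytes of files held in", stored_bytes, "bytes of chunks")
```

The two sample files share their first two chunks, so the store keeps four chunks instead of six. The per-file hash sequence is what a forensic tool would need to recover in order to rehydrate a file.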

Strengths: This paper does an excellent job of explaining deduplication clearly and concisely,
not only as a concept but also in two different implementations: OpenDedup and Windows
2012 deduplication. With this paper, Lanterna and Barili were able to convey the need and the
urgency for developing digital forensic capabilities to detect deduplication and restore
deduplicated files on devices.

Weaknesses: The main weakness of this paper is its limited coverage of how deduplication is
implemented in practice. While the paper does a good job of explaining how deduplication works
in the two analyzed implementations, OpenDedup and Windows 2012 deduplication may not be
the most influential and widely used implementations in today’s digital landscape.

Analysis I: One critique I have of this paper is that the authors did not conduct their experiment
in a consistent manner. The chunking experiment was done using the first 12 pages of “Alice in
Wonderland” with the Windows 2012 deduplication implementation but was not conducted on a
device using OpenDedup. While they were able to prove the necessity of having the hash
sequences to restore files, it would have been interesting to conduct the same experiment using
OpenDedup and compare the results. Taking this one step further, Lanterna and Barili could have
examined how each implementation breaks files into chunks, as this is important knowledge in
gaining a complete understanding of how deduplication is implemented in practice and would be
useful when building new digital forensic tools, or extending existing ones, to detect and
restore deduplicated files.
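
To illustrate why the chunking strategy matters for such a comparison, the sketch below contrasts two general techniques: fixed-size chunking and content-defined chunking. Neither is claimed to match OpenDedup or Windows 2012 deduplication. The window size, the divisor, the toy boundary rule (cut where the sum of the last four bytes is divisible by 7), and all function names are assumptions chosen for the example; production systems typically use a proper rolling hash.

```python
import hashlib


def fixed_chunks(data, size=8):
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data, window=4, divisor=7):
    """Cut wherever the sum of the last `window` bytes is divisible by `divisor`.

    The boundary depends only on nearby content, so it moves with the data.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if sum(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def shared_chunks(chunks_a, chunks_b):
    """Count distinct chunks (by SHA-256) that appear in both lists."""
    hashes_a = {hashlib.sha256(c).digest() for c in chunks_a}
    hashes_b = {hashlib.sha256(c).digest() for c in chunks_b}
    return len(hashes_a & hashes_b)


original = b"Alice was beginning to get very tired of sitting by her sister on the bank"
modified = b"!" + original  # a single byte inserted at the front

fixed_shared = shared_chunks(fixed_chunks(original), fixed_chunks(modified))
cdc_shared = shared_chunks(content_defined_chunks(original),
                           content_defined_chunks(modified))
print("shared chunks, fixed-size:", fixed_shared)
print("shared chunks, content-defined:", cdc_shared)
```

With fixed offsets, the one-byte insertion shifts every later chunk, so no chunk of the original file reappears. With content-defined boundaries, the cuts move with the text and later chunks are found again. An examiner reconstructing files would therefore need to know which strategy a given implementation uses.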

Analysis II: In my opinion, the most significant impact of the paper on the field was that it
conveyed the need for digital forensics to be able to detect deduplication and restore
deduplicated files. Beyond this, while the details behind two implementations of
deduplication were interesting, the paper did not provide a set way to automate the process so
that it could be built into digital forensic tools. The authors laid the groundwork, but much work
remains to be done, so I believe that this paper had a limited impact on the field. To date, this
paper has been cited by other academics only twice, and neither citing paper was related to
digital forensics; both concern the use of deduplication in cloud environments.1, 2 From my
research, no tools currently exist that can detect deduplication.

Moreover, I believe this paper has a limited impact due to the deduplication
implementations Lanterna and Barili chose to analyze. Based on my online searches, OpenDedup
does not appear to be widely used today, and Windows 2012 deduplication is now outdated
because newer versions have since been released.3 It is possible that more recent versions
remain similar to the 2012 implementation, but additional research would be required to know
for certain. Because both implementations see limited use in today’s digital landscape, the
impact of the paper’s analyses is not far-reaching: even if a digital forensic tool were created
to detect both of these implementations, its use would likely be quite limited.

Analysis III: This paper opens the door to many further research directions, as the goal of the
paper was to bring awareness to the lack of detection of deduplication in digital forensic tools at
the time of writing. Due to the storage savings it provides, it is likely that deduplication will
only continue to increase in usage on devices. In fact, Microsoft published an article in 2017
stating that users could likely see space savings of 30-50% for user documents and 80-95% for
virtualization libraries.3 It is critical, then, that digital forensic tools gain the capability to
detect these types of file systems and to rehydrate files during an investigation. As mentioned
earlier, from my research I was not able to find any digital forensic tools that claimed to be able
to detect and restore deduplicated file systems.

As Lanterna and Barili mention in their paper, future research might include analyzing
other implementations of deduplication such as Data Domain File System (DDFS), Zettabyte
File System (ZFS) and B-tree file system (Btrfs), as well as analyzing more recent versions of
Microsoft deduplication, which can be found on Windows Server 2019 and Windows Server
2016.4 Microsoft has added new functionality to its deduplication implementation, such as
ReFS support and support for large files and volumes.4 It would be interesting to compare the
Windows 2012 deduplication implementation to more recent versions. Furthermore, future
research could take the form of developing a new digital forensic tool, or extending an
existing one, so that it can detect when deduplication is in use on a device, identify which
implementation is in use, and then rehydrate and restore files stored on, or deleted from, the
device. For a tool like this to be useful, it would need to be able to do the above for a wide
range of deduplication implementations.

References
1. Majed SO, Thamer SK. Cloud based industrial file handling and duplication removal
using source based deduplication technique. AIP Conference Proceedings.
2020;2292(1):030012. https://aip.scitation.org/doi/abs/10.1063/5.0030989.

2. Shi A, Liu K, Pei D. A Novel Approach to File Deduplication in Cloud Storage Systems.
Artificial Intelligence and Security. 2020:503-512.
https://link.springer.com/chapter/10.1007%2F978-981-15-8086-4_48

3. Data Deduplication Overview. Microsoft.
https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/overview.
Published May 9, 2017. Accessed February 10, 2021.

4. What’s New in Data Deduplication. Microsoft.
https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/whats-new.
Published April 17, 2019. Accessed February 10, 2021.



Grading Rubric: Content is graded as follows: 1 point for summary; 1 strengths; 1 weakness;
2 analysis I, 2 analysis II, and 2 analysis III. The detailed grading rubric for the critique is shown
below:
Content /9
Grammar /1
Quality of presentation /2
Total /12
