
2.9 Ransomware Detection Datasets


Although many datasets for ransomware detection systems are available on the internet, every dataset
has its limitations. Some of the most common datasets are DREBIN, ISOT, RanSAP, and CICAndMal2017.
This section briefly discusses each of them.

ISOT
The ISOT ransomware dataset combines behavioral information on various ransomware samples and
benign programs. The ransomware samples were collected from several anti-malware vendors and obtained
via VirusTotal under an academic license. The collection consists of 669 ransomware samples, covering
the majority of the well-known ransomware families and variants found in the wild. The dataset also
contains data from 103 benign applications, which reflect the software most widely used by Windows
users. The dataset occupies 428 GB on disk. Cuckoo Sandbox was used to analyze both the ransomware
and the benign samples [11].
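
One practical consequence of the sample counts above (669 ransomware versus 103 benign) is class imbalance. The following is a minimal sketch of how inverse-frequency class weights could be derived from those counts. The function name `balanced_class_weights` and the weighting formula are illustrative assumptions and are not part of the ISOT dataset or its documentation.

```python
# Sample counts taken from the ISOT description above.
counts = {"ransomware": 669, "benign": 103}


def balanced_class_weights(class_counts):
    """Return inverse-frequency weights: total / (n_classes * class_count).

    Hypothetical helper for illustration only; a classifier could use these
    weights so that the smaller class is not ignored during training.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * n) for label, n in class_counts.items()}


weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With this weighting, each class contributes the same total weight, so the 103 benign samples count as much in aggregate as the 669 ransomware samples.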

RanSAP
The RanSAP dataset is a collection of time-series storage access patterns for ransomware researchers,
available through a public repository (Hirano, 2021 [20]). It is one of the few open datasets that
include dynamic ransomware features. The dataset covers access patterns for ransomware variants, for
samples running on different operating systems, and for storage devices with full drive encryption
enabled, so it can be used to compare the access patterns of different ransomware families. The
behavioral features were collected through a thin hypervisor layer.
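
To illustrate what working with time-series storage access patterns can look like, the sketch below aggregates individual access records into fixed time windows. The record layout `(timestamp_s, op, n_bytes)`, the `"R"`/`"W"` operation codes, and the function name `window_features` are hypothetical and chosen only for illustration; the actual RanSAP file format may differ.

```python
from collections import defaultdict


def window_features(records, window_s=1.0):
    """Sum read and write bytes per time window.

    `records` is an iterable of (timestamp_s, op, n_bytes) tuples, where `op`
    is "R" or "W". This layout is an assumption, not the RanSAP schema.
    """
    buckets = defaultdict(lambda: {"read_bytes": 0, "write_bytes": 0})
    for timestamp_s, op, n_bytes in records:
        if op not in ("R", "W"):
            raise ValueError(f"unknown operation: {op!r}")
        index = int(timestamp_s // window_s)
        key = "read_bytes" if op == "R" else "write_bytes"
        buckets[index][key] += n_bytes
    # Return windows in chronological order; empty windows are omitted.
    return [dict(window=i, **buckets[i]) for i in sorted(buckets)]


# Made-up records for demonstration only.
records = [(0.1, "R", 4096), (0.4, "W", 4096), (1.2, "W", 8192), (1.9, "W", 4096)]
features = window_features(records)
for row in features:
    print(row)
```

Per-window totals like these are one simple way to turn a raw access trace into a sequence that can be compared across ransomware families.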

CICAndMal2017
To prevent complex malware samples that can detect an emulator environment from changing their
runtime behavior, the CICAndMal2017 collection consists of malicious and benign applications run on
actual smartphones. More than 10,854 samples (4,354 malicious and 6,500 benign) were gathered from
various sources. Over 6,000 benign apps released on the Google Play store in 2015, 2016, and 2017
were collected, and 5,000 of the collected samples (426 malicious and 5,065 benign) were installed on
actual devices. The malware samples in the CICAndMal2017 dataset fall into four categories:
ransomware, adware, scareware, and SMS malware. They come from 42 different malware families.

DREBIN
The DREBIN dataset originates from DREBIN, a method for detecting Android malware that allows
malware to be identified directly on mobile devices by automatically inferring detection patterns. To
collect as many features from an application's code and manifest as feasible, DREBIN performs a broad
static analysis. These features are organized into sets of strings (such as network addresses,
permissions, and API calls) and embedded in a joint vector space. The dataset described here includes
feature vectors of 215 attributes extracted from 15,036 applications (5,560 malware apps from the
Drebin project and 9,476 benign apps). It was used to develop and evaluate the multilevel classifier
fusion approach presented in "DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android
Malware" [13], published in IEEE Transactions on Cybernetics.
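
The embedding of string features into a shared vector space can be sketched as follows: each distinct feature string gets one dimension, and an application is represented by a binary vector marking which features it contains. This is a minimal illustration of the general idea, not the DREBIN implementation; the feature strings and the function names `build_vocabulary` and `embed` are made up for the example.

```python
def build_vocabulary(apps):
    """Map every distinct feature string to a fixed vector index."""
    all_features = sorted(set().union(*apps))
    return {feature: index for index, feature in enumerate(all_features)}


def embed(app_features, vocabulary):
    """Return a binary vector with 1 where the app has the feature."""
    vector = [0] * len(vocabulary)
    for feature in app_features:
        index = vocabulary.get(feature)
        if index is not None:  # features outside the vocabulary are ignored
            vector[index] = 1
    return vector


# Hypothetical feature sets; real feature names will differ.
app_a = {"permission::SEND_SMS", "api_call::sendTextMessage", "url::example.com"}
app_b = {"permission::INTERNET", "url::example.com"}

vocabulary = build_vocabulary([app_a, app_b])
vector_a = embed(app_a, vocabulary)
vector_b = embed(app_b, vocabulary)
print(vector_a)
print(vector_b)
```

Because every application is mapped onto the same set of dimensions, the resulting vectors can be passed directly to a standard classifier.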

The full research paper describing the CICAndMal2017 dataset and its underlying principles is:
- Arash Habibi Lashkari, Andi Fitriah A. Kadir, Laya Taheri, and Ali A. Ghorbani, “Toward Developing a
Systematic Approach to Generate Benchmark Android Malware Datasets and Classification”, in the
Proceedings of the 52nd IEEE International Carnahan Conference on Security Technology (ICCST),
Montreal, Quebec, Canada, 2018.

References
[10] Cuckoo Sandbox. Cuckoo Sandbox: Automated malware analysis. Retrieved from https://cuckoosandbox.org
[20] https://www.sciencedirect.com/science/article/pii/S2666281721002390?via%3Dihub#bib25
[30] http://ieeexplore.ieee.org/document/8245867/
