The document discusses using deep reinforcement learning to improve malware detection in antimalware software. It proposes a new method that uses DRL to determine the optimal time to stop emulating unknown files in a sandbox, rather than stopping after a predetermined number of events. This allows the model to continue emulating files until it can confidently classify them, preventing attackers from evading detection by delaying malicious activity.
Abstract: Antimalware software is essential for detecting malicious software, and the engines that power these products often emulate unknown programs inside a sandbox before allowing them to run on the host operating system. Because a file cannot be emulated indefinitely, the engine uses heuristics to decide when to halt execution. Previous studies have analysed the sequence of system calls generated during emulation to determine whether an unknown file is malicious. However, these models typically require emulation to stop after a predetermined number of events counted from the start of the file. In addition, these classifiers are not accurate enough to decide on their own when emulation should halt partway through a file. In this research, we propose a novel method based on deep reinforcement learning (DRL) that overcomes this limitation by learning the best moment to terminate the file's execution. Because the DRL-based system continues to emulate the unknown file until it can make a confident decision to stop, attackers cannot evade detection simply by delaying malicious activity until after a predetermined number of system calls.
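The contrast between a fixed event cutoff and a learned stopping decision can be illustrated with a minimal sketch. It is not the system described in the abstract: `toy_policy` is a hand-written stand-in for a trained DRL agent, and the event names, the `CONTINUE`/`HALT` actions, the confidence rule, and the `max_events` budget are all assumptions made only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

CONTINUE, HALT = 0, 1  # hypothetical action set for the stopping agent

Policy = Callable[[List[str]], Tuple[int, float]]


def toy_policy(history: List[str]) -> Tuple[int, float]:
    """Stand-in for a trained agent: returns (action, P(malicious)).

    Assumption: two illustrative API calls count as suspicious, and the
    agent halts only once both have been observed.
    """
    suspicious = {"WriteProcessMemory", "CreateRemoteThread"}
    hits = sum(event in suspicious for event in history)
    p_malicious = min(1.0, hits / 2)
    action = HALT if p_malicious >= 1.0 else CONTINUE
    return action, p_malicious


def emulate(events: Sequence[str], policy: Policy,
            max_events: int = 10_000) -> Tuple[int, float]:
    """Feed events to the policy one at a time.

    Stops when the policy chooses HALT, the trace ends, or the safety
    budget max_events is exhausted. Returns (events_consumed, P(malicious)).
    """
    history: List[str] = []
    for i, event in enumerate(events):
        if i >= max_events:
            break
        history.append(event)
        action, p_malicious = policy(history)
        if action == HALT:
            return len(history), p_malicious
    # Trace ended or budget hit: report the policy's current estimate.
    _, p_malicious = policy(history)
    return len(history), p_malicious


# A trace that delays its malicious behaviour behind 50 harmless calls.
delayed = ["Sleep"] * 50 + ["WriteProcessMemory", "CreateRemoteThread", "ExitProcess"]

adaptive = emulate(delayed, toy_policy)                    # policy decides when to stop
fixed_cutoff = emulate(delayed, toy_policy, max_events=20) # emulation cut at 20 events
```

In this toy trace, the run truncated at 20 events only ever sees the harmless prefix, while the adaptive run keeps consuming events until the stand-in policy is confident. In a real system the policy would be learned, and some upper budget would presumably still be needed so that emulation terminates.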