Malla: Demystifying Real-World Large Language Model Integrated Malicious Services
our analysis of Malla listings across nine underground marketplaces, our study uncovered a rapid increase in Mallas over the six months from April 2023 to October 2023. Interestingly, we observe that miscreants utilized an LLMA hosting platform, Poe, a chatbot hosting platform offered by Quora, to showcase a “vouch copy” of their Malla. Despite violating the platform’s usage policies [74], this activity went unchecked throughout our observation window from July 2023 to October 2023. Furthermore, a pricing comparison indicates that Malla offers a more economical option for malware generation, especially when juxtaposed with the rates of traditional malware vendors ($5-199 vs. $399). To provide a deeper understanding of the economic factors at play, our case study focused on a specific Malla service, WormGPT. The findings revealed a staggering revenue exceeding $28K in just three months, underscoring the significant financial allure to Malla vendors. Through investigating 207 Malla samples, we explore the research question: To what extent can Malla effectively generate malicious content, including malware, phishing emails, and deceptive websites? Our findings bring to light that certain Mallas, such as DarkGPT and EscapeGPT, excel in producing high-quality malware that is both compilable and capable of evading VirusTotal detection [35], while others (e.g., WolfGPT) can create phishing emails with a high readability score that manage to bypass OOPSpam [21]. Although Mallas generally lag in crafting phishing sites, one Malla (EscapeGPT) distinguishes itself by generating operational phishing site code that goes unnoticed. This highlights the significant concerns surrounding the cybercriminal exploitation of LLMs.

In our exploration of Malla artifacts, we identified two predominant techniques leveraged by miscreants: exploitation of uncensored LLMs and jailbreaking of public LLM APIs. We call an LLM “uncensored” when it can freely generate any content, regardless of its potential harm, offensiveness, or inappropriateness, without undergoing any intentional filtering or suppression. Our findings accentuate the ethical quandaries posed by publicly accessible, uncensored LLMs when they fall into the hands of adversaries. A case in point is the LLM Luna AI Llama2 Uncensored, provided by Tap [30]. Our data suggests that this model has been exploited by Mallas to facilitate malware generation. Moreover, our research brings to light 182 distinctive jailbreak prompts associated with five public LLM APIs that have been exploited. Notably, OpenAI emerges as the LLM vendor most frequently targeted by Mallas. Among its offerings, gpt-3.5-turbo appears to be particularly susceptible to jailbreak prompts. We took the responsible step of informing the affected parties about our findings.

Contributions. We summarize the contributions as follows:
• We conduct the first in-depth empirical study of real-world cybercriminal activities surrounding the misuse of LLMs as malicious services.
• Our study provides a detailed examination of the Malla ecosystem, revealing its significant growth and impact on today’s public LLM services. Our study reveals that public LLMA hosting platforms have been abused for hosting Mallas.
• We characterize real-world Malla samples, including their development frameworks, exploitation techniques such as jailbreak prompts, and effectiveness in generating various malicious contents. This sheds light on the capabilities and potential threats posed by these malicious services.
• We have discovered a set of artifacts integral to Mallas, including 45 prompts exploited by miscreants to engineer malware and deceptive phishing campaigns, 182 jailbreak prompts that circumvent the protective measures of public LLM APIs, etc.

2 Background

2.1 Large language model

A language model is a type of machine learning model designed to understand and generate human language based on a probability distribution over text corpora. In recent years, significant scaling improvements have been achieved by increasing model sizes (from a few million parameters [50] to hundreds of billions [47]) and incorporating larger text corpora (from a few gigabytes, e.g., the English Wikipedia dataset, to hundreds of gigabytes [51]). These advancements have empowered pre-trained large language models (LLMs) to demonstrate remarkable proficiency across a wide array of downstream LLM-integrated applications (LLMA), such as chatbots, coding assistants, and machine translation.

Paradigms for building LLM-integrated applications. To employ a pre-trained LLM for downstream tasks and applications, there are two primary paradigms, i.e., “pre-train and prompt” and “pre-train and fine-tune,” as elaborated below.
• “Pre-train and prompt.” In this paradigm, the pre-trained LLM is used as-is, while users provide a text or template, known as a prompt, to guide the generation toward answers for desired tasks. This approach is straightforward and cost-effective, as the same pre-trained model can be used without the need for task-specific data. However, the quality of results heavily relies on the quality and formulation of the prompts. Crafting effective prompts (a.k.a. prompt engineering [60]) is essential for obtaining desired outcomes. In our study, we have observed that this paradigm emerges as the predominant approach employed by Mallas.
• “Pre-train and fine-tune.” In this approach, a pre-trained LLM is adapted for a particular downstream task through fine-tuning. This fine-tuning process requires training the model using a substantial volume of labeled data that pertains to the target task. While this approach can lead to state-of-the-art performance on specific tasks, it can be computationally expensive due to the need for task-specific data and training resources.

LLM vendors and LLMA hosting platforms. LLM vendors, such as OpenAI, Anthropic, and Meta, have established potent and sophisticated models by training on vast corpora, spanning various topics and domains, enabling them to generate coherent and contextually relevant text based on input prompts. These models are often made available to the wider public and developer community through API services, which can be utilized to build various applications and services.

[Figure 1: overview of a Malla — (1) develop a malicious LLM application, powered by a public LLM API (vendor) or an uncensored LLM; (2) deploy it on a web server or a third-party hosting platform; (3) promote it (e.g., for malware generation).]
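To make the “pre-train and prompt” paradigm concrete, the sketch below wraps a pre-trained model behind a fixed prompt template via a vendor’s chat-completion API. The endpoint and payload shape follow OpenAI’s chat-completions API; the translation task, template text, and function names are our own illustrative choices, not taken from any Malla.

```python
# Minimal sketch of an LLM-integrated application under the
# "pre-train and prompt" paradigm: the pre-trained model is used
# as-is, and the application is essentially a fixed prompt template
# wrapped around a chat-completion API call.
import json
import urllib.request

SYSTEM_PROMPT = "You are a translation assistant. Translate the user's text into French."

def build_payload(user_text: str, model: str = "gpt-3.5-turbo") -> dict:
    """Combine the fixed prompt template with the end user's input."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
    }

def query(user_text: str, api_key: str) -> str:
    """Send the templated prompt to the pre-trained model (no fine-tuning)."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_payload(user_text)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Swapping the system prompt swaps the downstream task, which is precisely why this paradigm is cheap for application builders and, as later sections show, attractive to Malla operators.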
2.3 Scope of Problem

In this study, the term “malicious service” refers to the exploitative misuse of LLMs for the purpose of facilitating cybercriminal activities. Specifically, based on the functionalities observed in Malla during our study, our focus is on the following cybercriminal activities: generating code for malware, writing phishing emails, and creating corresponding phishing or scam webpages, among other illicit activities. We acknowledge that LLMs can potentially be misused for other prohibited purposes, such as generating NSFW content or disinformation. However, our study centers on the threats posed by the malicious misuse of LLMs in the context of cybercrime. We believe that studying these threats will supplement the current understanding of LLM misuse.

3 Data Collection

In our paper, we categorize Mallas into two types: commercial Malla services, where malicious LLM-integrated applications are created and deployed for profit, and publicly accessible Malla projects, where malicious applications are developed and distributed as publicly available projects. In this section, we first explain the methods used to collect commercial Malla services and public Malla projects, as well as their artifacts (e.g., backend LLMs, abused public LLM APIs, and prompts used by Mallas). After that, we discuss the ethical implications of our data collection (§ 3.3).

3.1 Malla Services

To identify Malla services, our study collected 13,353 listings from nine underground marketplaces and forums (i.e., Abacus Market, Kerberos Market, Kingdom Market, WeTheNorth Market, MGM Grand Market, Hack Forums, XSS.is, BreachForums, and BlackHatWorld) from November 30, 2022 to October 12, 2023. Note that in our research we focus on underground marketplaces for malware and other cyber products/services, instead of illegal drugs.

Specifically, to recognize the listings involving Malla services, we crafted a collection of 145 keywords related to “large language model” using WordStream [81], a search keyword generation tool, and then searched those keywords on the underground marketplaces and forums. For the four forums (i.e., Hack Forums, XSS.is, BreachForums, and BlackHatWorld), we built our scrapers on top of Selenium [27] to conduct site crawling and content parsing. For the five Tor-based marketplaces (i.e., Abacus Market, Kerberos Market, Kingdom Market, WeTheNorth Market, and MGM Grand Market), we implemented the scrapers based on the Tor browser [33] and Selenium [27]. In our study, we ensured complete data scraping through a range of manually checked measures, including real-time monitoring of HTTP status codes and returned page sizes, vigilant session management to address session expiration issues, and the use of CAPTCHA-solving mechanisms when confronted with unsuccessful access attempts.

Datasets. In this way, we collected the following datasets related to commercial Malla services, also shown in Table 2:
• Malla listing dataset (Ls). This dataset consists of 25 Malla listings from five underground marketplaces and forums. The length of these listings ranges from 41 words to 730 words, with an average of 216.75 words. A typical Malla listing consists of various artifacts, including the service name, pricing information, functionality (e.g., malware generation), demo screenshots featuring prompts and responses associated with malicious functionality, contact information, and links to the storefront website. Given the limited number of Malla listings, we opted for a manual extraction process to recognize those artifacts. This hands-on approach ensured that we documented these artifacts with accuracy and thoroughness. We elaborate on those artifacts in § 4. From these Malla listings, we identified 14 Malla services, as listed in Table 1.
• Samples of Malla services (Ds). Out of the 14 Malla services, we collected nine Malla samples to investigate their behaviors. Note that two samples were provided as complimentary copies (a.k.a. voucher copies), while the remaining seven samples were obtained through purchasing Malla services. We elaborate on the ethical discussion related to Malla service purchases in § 3.3. Importantly, we always initiated our purchase attempts by requesting a voucher copy from Malla providers. If a Malla provider furnished us with such a complimentary copy, we refrained from making any further purchases. Note that not all of our purchase attempts yielded successful results; some encountered difficulties due to suspicions on the part of the sellers that we might be acting in a “white hat” capacity. One notable instance involved a purchase attempt with the DarkBERT vendor. In this transaction, the vendor became alarmed when we inquired about DarkBERT’s capabilities and performance. Ultimately, our attempt was declined. In other instances, despite the websites for WormGPT [40], FraudGPT [12], and BLACKHATGPT [2] indicating that access would be automatically emailed to the customer upon confirmation of Bitcoin payment, we never received such access after making the payment. Several other individuals reported similar experiences with WormGPT on underground forums [17, 38]. Likewise, users have highlighted scams associated with DarkBERT and DarkBARD [39].
• Backend LLMs abused by Malla services (Ms). In our research, we studied the backend LLMs driving the Malla services based on the Malla samples (Ds). For samples provided as source code, such as Evil-GPT and WolfGPT, we inspected the LLM models or APIs within the code. For Mallas hosted on websites, like BadGPT, EscapeGPT, DarkGPT, and FreedomGPT, we monitored network traffic and examined their headers, payloads, and responses. For those hosted on LLMA hosting platforms, such as XXXGPT, we extracted backend LLM information by parsing their hosting pages.
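The traffic-based inspection described above can be sketched as a simple fingerprint lookup over captured requests. The fingerprint table, field names, and output format below are our own illustrative assumptions, not the paper’s actual ruleset.

```python
# Simplified sketch of inferring a Malla's backend LLM from captured
# network traffic: match the request host against fingerprints of
# known LLM vendors, and read the model name from the payload when
# the API (OpenAI-style) carries it directly.
from urllib.parse import urlparse

# Hypothetical fingerprint table; real analysis would cover more vendors.
VENDOR_FINGERPRINTS = {
    "api.openai.com": "OpenAI",
    "api.anthropic.com": "Anthropic",
}

def infer_backend(request_url: str, payload: dict) -> str:
    """Guess the backend LLM from one captured request."""
    host = urlparse(request_url).netloc
    vendor = VENDOR_FINGERPRINTS.get(host, "unknown vendor")
    # OpenAI-style payloads name the model directly, e.g. the
    # "model=gpt-3.5-turbo" field observed for EscapeGPT.
    model = payload.get("model", "unknown model")
    return f"{vendor}/{model}"
```

For example, a request to `https://api.openai.com/v1/chat/completions` with `{"model": "gpt-3.5-turbo"}` would be labeled `OpenAI/gpt-3.5-turbo`; Mallas that proxy traffic through self-owned servers defeat this lookup, which is why the paper resorts to the reverse-engineering approach of § 5.1.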
Table 1: Malla services and details

| Name | Price | w/wo Voucher Copy | Infrastructure | Released Time (yyyyMM) | w. Sample |
|------|-------|-------------------|----------------|------------------------|-----------|
| CodeGPT | 10 βytes* | No | Jailbreak prompts | 202304 | Yes |
| MakerGPT | 10 βytes* | No | Jailbreak prompts | 202304 | Yes |
| FraudGPT | $90/month | No | - | 202307 | No |
| WormGPT | €100/month | No | - | 202307 | No |
| XXXGPT | $90/month | Yes | Jailbreak prompts | 202307 | Yes |
| WolfGPT | $150 | No | Uncensored LLM | 202307 | Yes |
| Evil-GPT | $10 | No | Uncensored LLM | 202308 | Yes |
| DarkBERT | $90/month | No | - | 202308 | No |
| DarkBARD | $80/month | No | - | 202308 | No |
| BadGPT | $120/month | No | Censored LLM | 202308 | Yes |
| BLACKHATGPT | $199/month | No | - | 202308 | No |
| EscapeGPT | $64.98/month | No | Uncensored LLM | 202308 | Yes |
| FreedomGPT | $10/100 messages | Yes | Uncensored LLM | - | Yes |
| DarkGPT | $0.78/50 messages | Yes | Uncensored LLM | - | Yes |

* βytes is the forum token of hackforums.net.
From our investigations, we discerned that both BadGPT and XXXGPT utilize OpenAI GPT-3.5 as their backend LLM, while Evil-GPT and WolfGPT employ the APIs of OpenAI Davinci-003 and OpenAI Davinci-002, respectively.

While our efforts to determine the backend LLMs for DarkGPT, EscapeGPT, and FreedomGPT were inconclusive, we did uncover some telling clues. For example, DarkGPT purports to be powered by Davinci-003, a claim we found on its chat interface. During our traffic monitoring for EscapeGPT, we observed “model=gpt-3.5-turbo” and “jailbreak=gpt-evil” in its payload text. FreedomGPT also mentioned its use of an uncensored model named Liberty [13] and, in its repository, provided the download URL of Liberty’s offline model [14], directing users to the repository of the “Luna AI Llama2 Uncensored” model [31]. However, as the vendors of these Malla services employ self-owned servers without disclosing their backend LLMs or APIs, determining the exact backend LLMs remains challenging. To infer the backend LLMs of these Malla services, we propose a reverse-engineering approach detailed in § 5.1.

• Malicious prompt dataset (Pm). To showcase their functionalities in the listings, Malla services typically include screenshots featuring prompt-response pairs related to their malicious capabilities. In our study, we gathered 45 of these prompts, referred to as “malicious prompts,” extracted directly from the screenshots. Among this collection, we identified 35 prompts associated with malware generation, five prompts tailored for phishing email creation, and five prompts designed for phishing site creation. Notably, of the prompts associated with malware generation, 26 specify the required programming language: 11 for Python, 10 for C/C++, 2 for C#, etc. This collection offers valuable insights into the prompts employed by miscreants for malware generation and phishing email/site creation.

3.2 Malla Projects

To understand the design and execution of publicly accessible Malla projects, we built a collection of such projects (Dp) hosted on two prominent public LLMA hosting platforms: the Platform for Open Exploration (Poe) and FlowGPT. These platforms enable the development and hosting of LLM-integrated applications by providing access to various LLMs (e.g., GPT-3.5, LLaMA-2), alongside the capability to utilize custom text prompts.

In our study, we compiled 73 search keywords by extracting topic keywords from Malla service listings using GPT-4. After that, we utilized the search APIs provided by Poe and FlowGPT to retrieve all the relevant LLMA projects associated with these keywords. In this way, we collected 575 and 174 LLMA projects hosted on Poe and FlowGPT, respectively. We further triaged the LLMA projects expressing malicious intent in their visible information, including titles, descriptions, welcome messages, and visible prompts. More specifically, we detected not-safe-for-work (NSFW) content in the collected LLMA projects utilizing an NSFW Text Classifier [19]. We fed the NSFW Text Classifier the aforementioned text carrying the information of each LLMA project. In this way, we flagged 417 and 154 suspicious Malla projects from the Poe and FlowGPT LLMA projects, respectively.

To validate that those projects are indeed Malla projects, we employed malicious prompts (Pm) related to the three malicious functionalities (i.e., malware generation and phishing email/site generation), respectively, to collect their responses. We filtered out LLMA projects with invalid responses. Here we define an invalid response as a response that lacks effective malicious content. This includes malware code that cannot be compiled, phishing emails that are hard for a broad audience to understand, and phishing websites that cannot be executed by browsers. Note that for each malicious prompt, we queried an LLMA project three times. We consider an LLMA project to be a Malla project if at least one of the three responses is valid.

Datasets. In our study, we collected the following datasets related to Malla projects, also shown in Table 2:
• Malla project dataset (Dp). In our study, we collected 198 Malla projects (125 from Poe and 73 from FlowGPT). Among these, 184 projects (113 from Poe and 71 from FlowGPT) exhibited the capability to produce malware, 80 (54 from Poe and 26 from FlowGPT) were adept at formulating phishing emails, and 31 (17 from Poe and 14 from FlowGPT) demonstrated proficiency in designing phishing web pages. More details will be discussed in § 4.3.
• Jailbreak prompts employed by Malla projects (Pj). Another critical aspect of our research involves the analysis of the jailbreak prompts employed in Malla projects. For part of the projects hosted on Poe and FlowGPT, their jailbreak prompts are visible. In our study, we collected 127 jailbreak prompts utilized by 91 Malla projects on Poe and 52 Malla projects on FlowGPT. Note that for those projects without visible prompts, we uncovered their prompts via the prompt injection approach elaborated in § 5.
• Backend LLMs employed by Malla projects (Mp). For the Malla projects hosted on platforms like FlowGPT and Poe, the hosting page documents the backend LLMs used by these projects. In our research, we parsed these hosting pages to extract information about the backend LLMs associated with Malla projects. Through this method, we identified five distinct backend LLMs being employed by Malla projects, including OpenAI GPT-3.5, OpenAI GPT-4, Pygmalion-13B, Claude-instant, and Claude-2-100k.

3.3 Discussion

Potential bias. Given the inherent difficulties in thoroughly identifying Malla services and analyzing their illicit activities, our study relied on the available data (including Malla services observed during our research, pre-trained LLM models and jailbreak prompts we could fingerprint, and other accessible resources). This reliance may introduce some bias into our study. While we consider our research the pioneering large-scale investigation into Malla services, offering valuable insights into this emerging underground phenomenon, we exercise caution when drawing conclusions.

Ethical concerns. Our study involves malicious service purchases, which could raise legal and ethical considerations, particularly in the context of interactions and transactions with miscreants. Specifically, our study has been approved by our institution’s IRB. In close collaboration with our IRB counsel, we crafted comprehensive guidelines (e.g., always asking for a voucher copy, not deanonymizing sellers) governing our conversations and purchase interactions. These guidelines were designed to establish a robust legal and ethical framework, thereby minimizing any potential risk of harm to any involved parties. Also, the approach in our study was legally deployed under the sting operation guidance [65]. To assess the ethical considerations and potential risks associated with our study, we applied a critical analysis informed by the principles outlined in the Menlo Report [44] and cybersecurity research ethical frameworks [57]. Particularly, in line with previous cybercrime research that has employed malicious service purchase as a data collection methodology [43, 56, 59, 62, 80], we maintain a steadfast belief that the potential societal benefits resulting from our research far surpass the relatively minimal elevated risks of harm. It is important to note that within this data collection, there is likely a minimal or non-existent presence of Personally Identifiable Information (PII), and our comprehensive analysis did not yield any instances of PII. Thus, there is minimal risk of us creating any privacy harm through our analysis. We did not attempt to deanonymize anyone as part of our study.

In addition, our research involved testing LLMA projects on Poe and FlowGPT using malicious prompts to uncover Malla projects. Such an approach could raise ethical concerns that warrant thorough discussion. Specifically, we took utmost care to ensure that our testing did not disrupt services, harm users, or lead to any unintentional damages. Particularly,
[Figure 2: growth of Malla projects over time (legend: Project, Owner).]

in FlowGPT emerged on February 27, 2023. As shown in Figure 2, the number of Malla projects in FlowGPT increases rapidly, particularly in the months of April and September. This uptick mirrors the growth trajectory observed in Malla services. Note that our analysis focused exclusively