We’ll teach you how to build a Discord bot running Stable Diffusion in 15 minutes or less;

We’ll then go through all the fragilities that a simple bot like that needs to address;

Finally, we’ll dive into the actual cloud architecture we are using, involving AWS, Runpod and Reyki AI as our cloud partners.

Let’s begin.
https://followfoxai.substack.com/p/how-to-build-a-generative-ai-services 1/11
12/5/23, 9:04 PM How to build a generative AI service's backend
It takes less than 15 minutes to set up a Discord Stable Diffusion bot (here’s how)
1. To create a Discord slash command, you first need a bot to register it. To create the bot, create an application in the Discord Developer Portal.
a. Once it is created, click on it, click on “Bot”, and turn on “Server Members Intent” and “Message Content Intent”.
b. Finally, click on “OAuth2”, then on “URL Generator”, and select the “bot” scope. Select at least the “Send Messages” and “Read Messages/View Channels” permissions. Then copy the URL generated at the bottom of the screen and open it to add the bot to a Discord server you are an admin of.
2. After creating the bot, you must then write the code that connects to it. You could, for instance, adapt this code we open-sourced if you’d like to build one in Python; it was the prototype of our Distillery bot.
3. After creating and coding the bot, you must then install Stable Diffusion, say, by using the hugely popular Automatic1111 web UI, making sure it runs as an API service.
This is simple enough. Assuming you already run Automatic1111, you could just run it
with the added --api parameter, then run the bot.py code in another terminal, and done –
with less than 200 lines of code and 15 minutes or less of effort, you now are a proud
owner of a generative AI art bot in Discord.
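To make this concrete, here is a minimal sketch (in Python) of the kind of payload such a bot would send to Automatic1111’s txt2img API once the web UI is launched with --api. The helper name and default parameters are our own illustration, not code from the bot:

```python
# Hypothetical helper: build the JSON payload for Automatic1111's
# /sdapi/v1/txt2img endpoint (available when the web UI runs with --api).
def build_txt2img_payload(prompt, negative_prompt="", steps=20, width=512, height=512):
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "width": width,
        "height": height,
    }

# The bot would POST this to the local API and decode the base64-encoded
# images in the response, e.g.:
#
#   import requests
#   r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img",
#                     json=build_txt2img_payload("a fox in a forest"))
#   images = r.json()["images"]  # list of base64-encoded strings

payload = build_txt2img_payload("a fox in a forest", negative_prompt="blurry")
```

Notice that the bot only ever fills in the prompt and the negative prompt, which is exactly the rigidity discussed next.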
If, however, you wish to run a scalable system, you’ll soon understand the limitations of this approach:
3. No workflow flexibility. As it is, the bot expects a prompt and a negative prompt, and that’s it. Any other workflow, such as changing models, doing image-to-image, or using ControlNet, necessitates direct changes to the code.
4. No redundancies. If anything breaks, as things often do, you’ll have pure downtime. Also, there’s no such thing as ‘failing gracefully’ in this approach – any failure will leave the system hanging and will require a human to re-run the code to bring it back online.
5. Bandwidth issues. Assuming the PC running the backend sits on-premises, any connectivity hiccup can kill the bot. Also, bandwidth and latency themselves might become a limitation if the number of requests is high enough.
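As one illustration of what ‘failing gracefully’ from point (4) could look like, here is a hypothetical retry wrapper; the function names are ours, not from the bot’s code:

```python
import time

# Hypothetical sketch: wrap a generation call with retries and a fallback,
# instead of letting one failure hang the whole bot.
def call_with_retries(fn, retries=3, delay=0.0):
    last_error = None
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:  # in real code, catch specific exceptions
            last_error = exc
            time.sleep(delay)
    # Fail gracefully: report the error instead of crashing the process.
    return {"error": str(last_error)}

# Usage: a flaky backend that fails twice, then succeeds on the third try.
attempts = {"n": 0}
def flaky_generate():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("backend hiccup")
    return {"image": "..."}

result = call_with_retries(flaky_generate)
```

The naive 200-line bot has no such layer, so any exception simply kills the request loop.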
When you task yourself with dealing with these issues, you’ll soon get into a rabbit hole
that can become incredibly deep. You’ll need to work on some of that internet magic
we’ve grown to depend on and love, called DevOps.
This post aims to explain our take on the internet magic that makes Distillery run. Its intended audience is generative AI enthusiasts who’d like to create something similar but are beginners in the field of DevOps. With it, we aim to fulfill our promises of being transparent with our methods and committed to the open-source cause.
In order to service several requests in parallel while avoiding a fleet of dedicated servers, we opted to run our service using a provider of serverless GPU endpoints. A ‘serverless’ setup allows for the use of (and payment for) a fraction of the time of a GPU running in the cloud, as opposed to renting or owning a machine in full. This resolved points (1) and (2) on the list above.
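As a sketch of how submitting a job to a serverless endpoint looks, the snippet below builds a request for a Runpod-style serverless API. The endpoint id, API key, and helper function are made-up illustrations, and the exact routes should be checked against the provider’s documentation:

```python
# Hypothetical sketch of submitting a job to a serverless GPU endpoint.
# Runpod's serverless API (at the time of writing) exposes per-endpoint
# /run and /status routes; the id and key below are placeholders.
def build_run_request(endpoint_id, api_key, workflow_payload):
    return {
        "url": f"https://api.runpod.ai/v2/{endpoint_id}/run",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"input": workflow_payload},
    }

# An HTTP client such as requests would then do:
#   requests.post(req["url"], headers=req["headers"], json=req["json"])
# and poll the matching /status/{job_id} route until the job completes.
# You pay only for the seconds the GPU container actually runs.

req = build_run_request("abc123", "MY_KEY", {"prompt": "a fox"})
```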
In order to tackle the workflow flexibility issue — point (3) above — we opted to use ComfyUI as a backend. ComfyUI is extremely fast and extensible, and it allows us to design any number of workflows with ease. Its main difficulty lies in its unusual approach to an API: it needs to receive a JSON file containing the full workflow it must run in order to generate an image. To make it work in a serverless environment, we created and open-sourced ComfyUI-Serverless, a connector that can be used to automatically start ComfyUI, deliver the payload to it, and then deliver the output images to the requestor.
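The full-workflow-as-JSON idea can be sketched as follows. ComfyUI’s API represents a workflow as a dict of node ids mapping to a class type and its inputs; the two-node template below is a deliberately simplified illustration, not a complete graph or our production workflow:

```python
import copy

# Simplified illustration of a ComfyUI API workflow: node ids map to
# {"class_type": ..., "inputs": ...}; node "2" references node "1"'s
# output via the ["1", 1] link convention.
WORKFLOW_TEMPLATE = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "", "clip": ["1", 1]}},
}

def build_workflow(prompt_text):
    # Inject the user's prompt into the text-encode node and return the
    # kind of payload ComfyUI receives (e.g. via its /prompt route).
    workflow = copy.deepcopy(WORKFLOW_TEMPLATE)
    workflow["2"]["inputs"]["text"] = prompt_text
    return {"prompt": workflow}

payload = build_workflow("a watercolor fox")
```

Because the whole graph travels with each request, swapping models or adding image-to-image means swapping templates, not redeploying code — which is exactly the flexibility the naive bot lacked.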
In order to tackle the lack of redundancies — point (4) above — we chose to run all our processes in containers that are automatically restarted if something breaks their runs. To ensure that the system keeps running even if one of its components fails, we decided to split the back-end processes into three types:

A process that runs the script for the bot, controlling the interaction between the back-end and the Discord front-end (the “Bot”);

A process that manages the queues, dispatching payloads to the serverless GPU endpoint and collecting the results (the “Master”);

A container that runs the actual Stable Diffusion inference on a serverless GPU (the “Worker”).
We also run multiple instances of each component: we currently run 3 instances (or
“shards”, in Discord parlance) of the Bot and 5 concurrent instances of the Master. We
had to be careful in our code to avoid situations where concurrent instances race against
each other — for instance, to ensure that each request is processed only one single time,
we used a caching server that locks each request to the first Bot that asks for it.
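The locking trick can be illustrated with an in-memory stand-in for the caching server. Real memcached’s add command has the same only-the-first-caller-wins semantics, which is what makes it usable as a lock:

```python
# In-memory stand-in for the caching server, for illustration only.
# memcached's add() is atomic and succeeds only if the key is absent,
# so the first Bot shard to add() a request id "wins" that request.
class FakeCache:
    def __init__(self):
        self._store = {}

    def add(self, key, value):
        # Returns True only for the first caller, like memcached add.
        if key in self._store:
            return False
        self._store[key] = value
        return True

cache = FakeCache()
# Three concurrent Bot shards try to claim the same request:
claims = [cache.add("request:42", f"shard-{i}") for i in range(3)]
```

Only one shard gets True back, so each request is processed exactly once even with several Bot instances polling the same queue.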
Finally, by being very careful in the Bot code that deals with parsing and preprocessing of commands, we can minimize the risk of forbidden content — point (6) above — being generated.
With all this said, we can now show the schematic workflow of our current version,
v0.2.9 Izarra. As stated before, internet magic, it seems, can get complicated really fast:
1. A user sends a generation request to the Bot via a slash command in Discord.
2. The Bot parses the request and then creates a payload containing the ComfyUI API workflow in JSON format. The Bot then pushes the payload into a data table called the “Generation Queue”.
3. The Master pops the Generation Queue and retrieves the payload, sending it to the
serverless GPU endpoint. The serverless GPU endpoint will create a Worker
container to process the request.
4. A Worker receives the payload, generates the images, saves them in cloud storage, and delivers pointers to the output images back to the Master.
5. The Master receives the output images’ pointers and then pushes them to a data
table called “Send Queue”.
6. The Bot pops the Send Queue, retrieves the output images’ pointers, downloads the
images, and then sends them back to the requestor user in Discord.
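The steps above can be sketched end-to-end with in-memory queues standing in for the two data tables; all names here are illustrative, not the production code:

```python
from queue import Queue

# Toy end-to-end sketch of steps 2-6, with in-memory queues in place of
# the "Generation Queue" and "Send Queue" data tables.
generation_queue = Queue()
send_queue = Queue()

def bot_submit(request):                   # step 2: Bot pushes the payload
    generation_queue.put({"workflow": request})

def master_dispatch(worker):               # step 3: Master pops and dispatches
    payload = generation_queue.get()
    pointers = worker(payload)             # step 4: Worker generates images
    send_queue.put(pointers)               # step 5: Master pushes the pointers

def bot_deliver():                         # step 6: Bot pops and delivers
    return send_queue.get()

def fake_worker(payload):
    # Pretend cloud-storage pointer returned by the GPU Worker.
    return ["s3://bucket/output-001.png"]

bot_submit("a fox in watercolor")
master_dispatch(fake_worker)
delivered = bot_deliver()
```

Decoupling the three roles through queues is what lets each one crash, restart, or scale independently of the others.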
The most important part of our workflow is the Master/Worker relationship. It’s a 1-to-
N relationship, meaning that there can be only one Master, but there can be several
Workers at any given time. Also, the fact that the Master is separated from the Bot
front-end makes Distillery ready to receive and service direct API requests in the near
future.
To run Distillery, we contract the services of three cloud providers: AWS, which hosts the backbone of our service; Runpod, which runs Stable Diffusion and does the inferencing work; and Reyki AI, our cloud cost optimization partner (more on Reyki AI in a moment).
It’s notoriously difficult to find your way around AWS’ constellation of services. We took
a long time to figure out what could be the simplest configuration of AWS services that
could satisfy our design requirements. We ultimately chose to use AWS Lightsail as the
platform to launch Distillery. AWS Lightsail acts as a simplified, mini-AWS, much more
straightforward to use than any other AWS service we considered. (Seriously. It took us more time to figure out how to use AWS CloudWatch, which is just a logging service, than it took us to deploy everything else we do in AWS Lightsail. I wish someone had told me about AWS Lightsail when I first started learning how to run things in the cloud.)
Excluding the GPU inferencing itself, and a few minor details, Distillery runs entirely inside containers in AWS Lightsail, using the object storage solution (called S3) offered inside Lightsail, and a MySQL database also managed under AWS Lightsail. Other, minor components of the AWS architecture include the aforementioned AWS CloudWatch as a log server, AWS Elastic Container Registry (ECR) to store the container images, and a managed Memcached instance as a caching server to make sure the redundant containers don’t conflict with each other.
We’ve been using Runpod as our cloud compute partner for most of our explorations.
We chose to deploy in Runpod in light of their excellent UI, their competitive prices for
serverless GPUs, and finally, due to the ability to design our own worker container (a
practice aptly called “bring-your-own-container” in the industry). Those reasons,
together with the great working relationship we developed with their team, made us
keep them as our main GPU partner to this day.
We had a lot of bugs with the connections to Runpod in the first months running Distillery, but these were largely resolved by a combination of updates on Runpod’s end and bugfixes in our own code. Our worker now runs reliably, and we welcome anyone to use it — you can find the open-source code here. Our workers run on machines with Nvidia RTX A6000 GPUs, and we are ready to have up to 60 of them working concurrently.
The main issue we’ve been having with Runpod is the availability of free GPUs for serverless inferencing. While this is arguably a result of how successful they have been at attracting new customers, it has become a potentially limiting factor in our service’s growth, so we are looking to add a second inferencing service, AWS SageMaker, to our system.
AWS bills, especially as we onboard AWS SageMaker, can be very expensive if not managed properly, and AWS’s opaque pricing scheme makes us especially concerned. To address that, we hired Reyki AI.
Reyki AI is an excellent idea that, in our view, should become the industry standard in
managing cloud computing costs. Reyki AI's solutions automate the process of
managing cloud infrastructure, ensuring that organizations get the most value out of
their cloud investments.
They’ve built quite an amazing product that uses machine learning to predict our cloud
use, and then allocates to Distillery the best available rates for the type of computing
service we require at any given moment. While this is already important for Distillery
Again, we absolutely loved Reyki AI and the concept behind it. If you’re building
anything cloud-based, you should consider having them as a partner. They work with
the main cloud providers (AWS, Google and Azure), so it’s very likely they can be useful.
In summary
Our hope is that this post can help enthusiasts in their journey to learn how to deploy an
AI system in a scalable way. If you are interested in deploying something similar, and
want to exchange ideas with us, we will be glad to talk!
And if you’d like to check out Distillery, you can do so by following this link.