
How To Make Custom AI-Generated Text With GPT-2

September 4, 2019 · 10 min read · AI, Text Generation





In February 2019, OpenAI released a paper describing GPT-2, an AI-based text-generation model based on the Transformer architecture and trained on massive amounts of text from around the internet. From a text-generation perspective, the included demos were very impressive: the text is coherent over a long horizon, and grammatical syntax and punctuation are near-perfect.

At the same time, the Python code which allowed anyone to download the model (albeit only smaller versions, out of concern that the full model could be abused to mass-generate fake news) and the TensorFlow code to load the downloaded model and generate predictions were open-sourced on GitHub.

Neil Shepperd created a fork of OpenAI’s repo which contains additional code allowing the existing OpenAI model to be finetuned on custom datasets. A notebook was created soon after that can be copied into Google Colaboratory; it clones Shepperd’s repo and finetunes GPT-2 backed by a free GPU. From there, the proliferation of GPT-2-generated text took off: researchers such as Gwern Branwen made GPT-2 Poetry and Janelle Shane made GPT-2 Dungeons and Dragons character bios.

I waited to see if anyone would make a tool to help streamline this finetuning and text-generation workflow, à la textgenrnn, which I had made for recurrent neural network-based text generation. Months later, no one did. So I did it myself. Enter gpt-2-simple, a Python package which wraps Shepperd’s finetuning code in a functional interface and adds many utilities for model management and generation control.

Thanks to gpt-2-simple and​ ​this Colaboratory Notebook​, you can easily finetune GPT-2 on your
own dataset with a simple function, and generate text to your own specifications!
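
To give a sense of what that looks like in code, here is a minimal sketch of the gpt-2-simple workflow; the dataset filename input.txt is a placeholder, and hyperparameters like steps should be tuned to your own data:

```python
import gpt_2_simple as gpt2

# Download the "small" 124M model (~500MB) into a local models/ directory.
gpt2.download_gpt2(model_name="124M")

# Finetune on a plain-text file; "input.txt" is a placeholder for your own dataset.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="input.txt",
              model_name="124M",
              steps=1000)  # number of training steps; adjust to your dataset size

# Generate text from the finetuned model.
gpt2.generate(sess)
```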

How GPT-2 Works


OpenAI has released three flavors of GPT-2 models to date: the “small” 124M-parameter model (500MB on disk), the “medium” 355M model (1.5GB on disk), and recently the “large” 774M model (3GB on disk). These models are much larger than what you see in typical AI tutorials and are harder to wield: the “small” model hits GPU memory limits while finetuning on consumer GPUs, the “medium” model requires additional training techniques before it can be finetuned on server GPUs without going out-of-memory, and the “large” model cannot be finetuned at all on current server GPUs without going OOM, even with those techniques.

The actual Transformer architecture GPT-2 uses is very complicated to explain (here’s a great lecture). For the purposes of finetuning, since we can’t modify the architecture, it’s easier to think of GPT-2 as a black box, taking in inputs and providing outputs. Like previous forms of text generators, the inputs are a sequence of tokens, and the outputs are the probabilities of the next token in the sequence, with these probabilities serving as weights for the AI to pick the next token. In this case, both the input and output tokens are byte pair encodings. Most RNN approaches use either character tokens (slower to train, but case/formatting is preserved) or word tokens (faster to train, but case/formatting is lost); byte pair encoding instead “compresses” the input to the shortest combination of bytes that still preserves case and formatting, a compromise between the two approaches that unfortunately adds randomness to the final generation length. The byte pair encodings are later decoded into human-readable text for the final generation.
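
As a toy illustration of the “probabilities as weights” idea (not the real GPT-2 vocabulary or its actual probabilities), generation can be thought of as repeatedly sampling the next token from the model’s output distribution:

```python
import random

# Hypothetical output distribution over the next token; the real model assigns
# a probability to every entry in its ~50,000-token byte-pair vocabulary.
next_token_probs = {
    " the": 0.41,
    " a":   0.22,
    " an":  0.09,
    " its": 0.05,
}

tokens, weights = zip(*next_token_probs.items())
next_token = random.choices(tokens, weights=weights, k=1)[0]  # weighted sampling
print(next_token)
```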

The pretrained GPT-2 models were trained on websites linked from Reddit. As a result, the model has a very strong grasp of the English language, allowing this knowledge to transfer to other datasets and perform well with only a small amount of additional finetuning. However, due to the English bias in the encoder’s construction, languages with non-Latin characters, such as Russian and CJK, will perform poorly in finetuning.

When finetuning GPT-2, I recommend using the 124M model (the default) as it’s the best balance of speed, size, and creativity. If you have large amounts of training data (>10 MB), then the 355M model may work better.
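
In gpt-2-simple, the model size is simply the model_name argument; here is a rough sketch of that >10 MB rule of thumb (the filename is hypothetical):

```python
import os
import gpt_2_simple as gpt2

dataset = "input.txt"  # placeholder for your training data

# Rough heuristic: prefer the "medium" 355M model only for larger datasets.
model_name = "355M" if os.path.getsize(dataset) > 10 * 1024 * 1024 else "124M"
gpt2.download_gpt2(model_name=model_name)
```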

gpt-2-simple And Colaboratory


In order to better utilize gpt-2-simple and showcase its features, I created my own Colaboratory Notebook, which can be copied into your own Google account. A Colaboratory Notebook is effectively a Jupyter Notebook running on a free (with a Google account) virtual machine with an Nvidia server GPU attached (randomly a K80 or a T4; the T4 is ideal), hardware which would normally be cost-prohibitive.
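
Since the GPU assignment is random, a quick way to see which one you received is to run the following in a Colaboratory cell:

```python
# Shell out to nvidia-smi from the notebook to see whether you were
# assigned a K80 or a T4.
!nvidia-smi
```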
