
Creating talking head videos with generative AI
Creating talking head videos using various generative AI techniques and tools

Sau Sheong


14 min read · Dec 26, 2023


Talking heads are exactly what they sound like: a person talking in front of a video camera, showing mostly the head and sometimes up to the shoulders or even the torso. If you watch any TV or social media, you're likely to have seen them. They are pretty popular among social media video creators and are often used for product reviews, training videos, explainers, newscasting, reporting and so on.

There are a number of AI services that can create pretty amazing talking head
videos that speak in any number of languages, and they are all quite mind-
bogglingly good. As usual, I was curious — how did they do it, and can I recreate
something similar?

Well, the answer is obviously yes since I’m writing this article. Of course, my
attempt is far from the efforts of those well-funded companies, but I believe I come
reasonably close.

Let me give you a couple of quick samples of the output before jumping in. Here’s
one closer to the season with Santa Claus sending his Christmas greetings.

Video: Santa Claus Christmas greetings (youtube.com). This digital avatar is created using Persona: https://github.com/sausheong/persona

Here’s another one, in Chinese, just to show it works in multiple languages, sending
greetings for the upcoming Chinese New Year.

Video: Chinese New Year Greetings in Chinese (youtube.com). This talking head video is created using Persona: https://github.com/sausheong/persona

Creating talking heads


The overall algorithm is quite straightforward:

1. Generate the speech that the talking head is going to speak

2. Create a still image of the talking head

3. Animate the still image of the talking head into a video


4. Generate the moving lips based on the speech and super-impose them on the
talking head video, along with the speech

5. Improve the quality of the video (optional)

The last step is optional, but if you're going to upload your video to social media you should have decent quality to share.
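
Before diving into each step, here's a minimal sketch of how the steps chain together, using the function names introduced later in this article (the real persona.py adds argument parsing and the optional improvement step):

# Minimal pipeline sketch using the functions described in this article;
# path_id names the working folder under temp/ for intermediate files,
# and message/prompt are the text inputs shown in the examples below.
path_id = "demo"
generate_speech(path_id, "temp.wav", "daniel", message)
generate_image(path_id, "avatar.png", prompt)
animate_face(path_id, "temp.wav", "driver.mp4", "avatar.png", "animated.mp4")
modify_lips(path_id, "temp.wav", "animated.mp4", "results/demo.mp4")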

If you're amazed that I came up with all of the above, I should disclaim that I didn't do any of it on my own. I just took existing algorithms and code and stitched them together in a way that makes sense to generate the talking head. Each algorithm/project is amazing on its own, but putting them all together has quite a different effect altogether. It was deeply satisfying too.

I put all the code in a project called Persona. I'll show how it works in a bit, and how to use it to generate talking head videos later.

GitHub - sausheong/persona: Persona AI avatar
https://github.com/sausheong/persona

For now, let's take a look at the first step: generating the speech.

Generating the speech


For speech generation, I have many, many options. There are plenty of libraries and
services. In fact, even OpenAI has a text-to-speech API which is trivial to use.
However, in the spirit of using Python libraries, I decided against using any APIs to
generate the speech.

Instead, I use Tortoise-TTS, a text-to-speech Python library that uses AI to generate pretty high quality speech.

import os

import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voices

def generate_speech(path_id, outfile, voice, text, speed="standard"):
    tts = TextToSpeech(kv_cache=True, half=True)
    selected_voices = voice.split(',')
    for _, selected_voice in enumerate(selected_voices):
        if '&' in selected_voice:
            voice_sel = selected_voice.split('&')
        else:
            voice_sel = [selected_voice]
        voice_samples, conditioning_latents = load_voices(voice_sel)

        # the speed parameter maps to tortoise's preset
        gen, _ = tts.tts_with_preset(text, preset=speed, k=1,
                                     voice_samples=voice_samples,
                                     conditioning_latents=conditioning_latents,
                                     return_deterministic_state=True)
        if isinstance(gen, list):
            for j, g in enumerate(gen):
                torchaudio.save(os.path.join("temp", path_id, outfile),
                                g.squeeze(0).cpu(), 24000)
        else:
            torchaudio.save(os.path.join("temp", path_id, outfile),
                            gen.squeeze(0).cpu(), 24000)

The code is not mine; I just tweaked it. It's pretty straightforward, however. First, I create a TextToSpeech instance. Then I load the voices from the library of voices provided by Tortoise-TTS. You can actually create your own voices: you just need to record at least 3 snippets of at least 10 seconds each into WAV files and place them in the voices directory.

Then I use the voices to generate the speech, and finally I use torchaudio to save it to a file.
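
If you want to try a custom voice, preparing the samples can be scripted too. Here's a minimal sketch (hypothetical paths and voice name) that cuts a longer recording into three 10-second clips for the voices directory:

import os
import torchaudio

# Hypothetical example: cut one long recording into three 10-second WAV
# clips and save them as a new voice under tortoise's voices directory.
waveform, sr = torchaudio.load("my_recording.wav")
clip_len = 10 * sr
voice_dir = os.path.join("tortoise", "voices", "myvoice")
os.makedirs(voice_dir, exist_ok=True)
for i in range(3):
    clip = waveform[:, i * clip_len:(i + 1) * clip_len]
    torchaudio.save(os.path.join(voice_dir, f"{i}.wav"), clip, sr)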

This is how the generate_speech function is used.

message = """Apple today confirmed that it will be permanently closing its


Infinite Loop retail store in Cupertino, California on January 20. Infinite
Loop served as Apple's headquarters between the mid-1990s and 2017, when its
current Apple Park headquarters opened a few miles away."""

generate_speech(path_id, "temp.wav", "daniel", message, "ultra_fast")

https://medium.com/@sausheong/creating-talking-head-videos-with-generative-ai-2df3947fd506 4/26
1/6/24, 1:16 PM Creating talking head videos with generative AI | by Sau Sheong | Dec, 2023 | Medium

The code will generate a WAV file called temp.wav which contains the speech. The ultra_fast preset is the fastest; you can also use fast or standard (which is the slowest).

Tortoise-TTS is powerful but unfortunately can be quite slow. If you prefer to use another text-to-speech generator, feel free. While this code produces WAV files, you can also supply other formats such as MP3 as input. I'll show you how later.
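
As a preview, the conversion is just an ffmpeg call, done the same way modify_lips normalizes its audio input later. A sketch with hypothetical file names:

import subprocess, platform

# Hypothetical example: normalize any input audio (here an MP3) to WAV
# with ffmpeg, mirroring the conversion done later in modify_lips.
command = "ffmpeg -y -i speech.mp3 speech.wav"
subprocess.call(command, shell=platform.system() != 'Windows')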

Generating the talking head image


As before, there are plenty of choices of libraries and APIs to use. The most logical
choice for me, without using APIs, would be to choose one of the many Stable
Diffusion models available.

I chose the SDXL-Turbo model because it’s an exciting new model that promises to
generate images really quickly. I was quite blown away the first time I tried it and
even though it didn’t run as fast as I thought it would on my own hardware, it was
fast enough.

The code was also dead simple, using HuggingFace’s diffusers library.

import os
from diffusers import AutoPipelineForText2Image

def generate_image(path_id, imgfile, prompt):
    pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
    image = pipe(prompt=prompt, num_inference_steps=4,
                 guidance_scale=0.0).images[0]
    image.save(os.path.join("temp", path_id, imgfile))

First, I load the model into a pipeline using AutoPipelineForText2Image. Then I generate an image using the pipeline, and save it to a file.

This is how I used the generate_image function.

avatar_description = "Young Indian man with short dark hair, serious look"
generate_image(path_id, imgfile, f"hyperrealistic digital avatar, centered, \
{avatar_description}, rim lighting, studio lighting, looking at the \
camera")

You might be wondering why I broke the prompt up into 2 pieces. It's basically to ensure that whatever the input is later on, I'll always create a talking head.
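
Put another way, the fixed framing text acts as a template around the variable description. A sketch of the idea (the helper function is hypothetical):

# Hypothetical helper: the fixed framing text wraps whatever description
# is supplied, so the output is always a centered, studio-lit head shot.
def make_prompt(avatar_description):
    return (f"hyperrealistic digital avatar, centered, {avatar_description}, "
            f"rim lighting, studio lighting, looking at the camera")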

This is the output image.

As before with generating speech, you can use another text-to-image generator, or even a headshot photo of a person. Persona can use PNG or JPG files.

Animating the talking head


This is where things get interesting. I could, in fact, create a talking head without any head animations. This works quite well for shorter speeches, but it does look decidedly odd to have a talking head staring unblinkingly at you while he/she is talking.


I decided to look for ways to animate the head, and I found an interesting project. In fact, the project I found combines 2 different projects to produce the effect of an animated talking head that I wanted.

The combined project, called Face Animation in Real Time, consists of 2 separate projects:

1. One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing and,

2. GAN Prior Embedded Network for Blind Face Restoration in the Wild

I took that project and tweaked it to make it work for me. Let's take a step back and explain what these 2 projects do.

The first one, One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing, takes a single still image and, using another video (called a driver video), reproduces the motion of that video on the still image. As you can tell, this is what animating the talking head is all about.

The second project, GAN Prior Embedded Network for Blind Face Restoration in the Wild, does blind face restoration on the video output of the first project, making the end result a lot more natural.

This is the animate_face function that brings all the necessary pieces together and generates an animated video file from a still image and a driver video.

import os, sys, cv2, yaml, imageio, torch, subprocess, platform
import numpy as np
import torch.nn.functional as F
from mutagen.wave import WAVE
from datetime import timedelta
from tqdm import tqdm
from face_vid2vid.sync_batchnorm.replicate import DataParallelWithCallback
from face_vid2vid.modules.generator import OcclusionAwareSPADEGenerator
from face_vid2vid.modules.keypoint_detector import KPDetector, HEEstimator
from face_vid2vid.animate import normalize_kp
from batch_face import RetinaFace

def animate_face(path_id, audiofile, driverfile, imgfile, animatedfile):
    faceanimation = FaceAnimationClass(os.path.join("temp", path_id, imgfile),
                                       use_sr=False)

    # trim the driver video to the duration of the speech
    tmpfile = f"temp/{path_id}/tmp.mp4"
    duration = get_audio_duration(os.path.join("temp", path_id, audiofile))
    hms = seconds_to_hms(duration)
    command = f"ffmpeg -ss 00:00:00 -i {driverfile} -to {hms} -c copy {tmpfile}"
    subprocess.call(command, shell=platform.system() != 'Windows')

    # read the trimmed driver video into frames
    capture = cv2.VideoCapture(tmpfile)
    fps = capture.get(cv2.CAP_PROP_FPS)
    frames = []
    _, frame = capture.read()
    while frame is not None:
        frames.append(frame)
        _, frame = capture.read()
    capture.release()

    # animate the still image frame by frame, then write out the video
    output_frames = []
    for frame in tqdm(frames):
        result = faceanimation.inference(frame)
        output_frames.append(result)
    writer = imageio.get_writer(os.path.join("temp", path_id, animatedfile),
                                fps=fps, quality=9, macro_block_size=1,
                                codec="libx264", pixelformat="yuv420p")
    for frame in output_frames:
        writer.append_data(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    writer.close()

A driver video is nothing more than a short snippet of any talking head video; it doesn't need any audio. The driver video's head and facial motions (blinking eyes, raised eyebrows etc.) will be applied on top of the still image.

First, I create an instance of FaceAnimationClass with the still image. Then I take the driver video and convert it into frames. For every frame, I use the FaceAnimationClass instance to generate a new frame based on the still image. Finally, I take all the frames and write them to a new video.

Notice that before I start using the driver video, I use ffmpeg to trim it down to a smaller temporary driver file. This is because my driver video is about 1 minute long, and if I use it directly it will take more time to process. To reduce processing time, I trimmed the driver video to the same length as the speech.
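
One small note: seconds_to_hms is used in animate_face but not shown above. A minimal reconstruction of it (my sketch, consistent with the timedelta import in the code) could be:

from datetime import timedelta

# My reconstruction of the helper used above: format a duration in seconds
# into a form that ffmpeg's -to flag accepts, e.g. "0:00:54.123000".
def seconds_to_hms(seconds):
    return str(timedelta(seconds=seconds))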


A driver video (no sound needed) that I took from a news snippet

This is the still image I used, generated by SDXL-Turbo based on the prompt “Young
Indian man with short dark hair, serious look”.

Young Indian man with short dark hair, serious look

And this is the animated face that is produced by applying the driver video on the
still image.


Animated face output (no sound)

You might notice that the lip movement is quite small; it doesn't matter, because we're going to super-impose another set of lips on it.

Generating and super-imposing the moving lips


This is the crux of the entire effort. Without the lips matching the speech, the whole
thing will look like some weird foreign language dubbing. We want natural-looking
talking heads with the lips moving according to the words. And to do this there is
really only one project that does it very well, and that is the Wav2Lip project.

The modify_lips function puts together the speech and the animated video to
produce the combined output video.

def modify_lips(path_id, audiofile, animatedfile, outfilePath):
    # resize_factor, rotate, crop, mel_step_size, wav2lip_batch_size,
    # checkpoint_path, device, datagen and load_model are module-level
    # settings and helpers taken from the Wav2Lip inference script
    animatedfilePath = os.path.join("temp", path_id, animatedfile)
    audiofilePath = os.path.join("temp", path_id, audiofile)
    tempAudioPath = os.path.join("temp", path_id, "temp.wav")
    tempVideoPath = os.path.join("temp", path_id, "temp.avi")

    video_stream = cv2.VideoCapture(animatedfilePath)
    fps = video_stream.get(cv2.CAP_PROP_FPS)

    full_frames = []
    while 1:
        still_reading, frame = video_stream.read()
        if not still_reading:
            video_stream.release()
            break
        if resize_factor > 1:
            frame = cv2.resize(frame, (frame.shape[1]//resize_factor,
                                       frame.shape[0]//resize_factor))
        if rotate:
            frame = cv2.rotate(frame, cv2.cv2.ROTATE_90_CLOCKWISE)

        y1, y2, x1, x2 = crop
        if x2 == -1: x2 = frame.shape[1]
        if y2 == -1: y2 = frame.shape[0]

        frame = frame[y1:y2, x1:x2]
        full_frames.append(frame)

    # convert the speech to WAV, then to a mel spectrogram
    command = 'ffmpeg -y -i {} -strict -2 {}'.format(audiofilePath,
                                                     tempAudioPath)
    subprocess.call(command, shell=True)
    wav = wav2lip.audio.load_wav(tempAudioPath, 16000)
    mel = wav2lip.audio.melspectrogram(wav)

    if np.isnan(mel.reshape(-1)).sum() > 0:
        raise ValueError('Mel contains nan!')

    # break the mel spectrogram into one chunk per video frame
    mel_chunks = []
    mel_idx_multiplier = 80./fps
    i = 0
    while 1:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + mel_step_size > len(mel[0]):
            mel_chunks.append(mel[:, len(mel[0]) - mel_step_size:])
            break
        mel_chunks.append(mel[:, start_idx : start_idx + mel_step_size])
        i += 1

    full_frames = full_frames[:len(mel_chunks)]
    batch_size = wav2lip_batch_size
    gen = datagen(full_frames.copy(), mel_chunks)

    # run the model batch by batch and write out the lip-synced frames
    for i, (img_batch, mel_batch, frames, coords) in enumerate(tqdm(gen,
            total=int(np.ceil(float(len(mel_chunks))/batch_size)))):
        if i == 0:
            model = load_model(checkpoint_path)
            frame_h, frame_w = full_frames[0].shape[:-1]
            out = cv2.VideoWriter(tempVideoPath,
                                  cv2.VideoWriter_fourcc(*'DIVX'),
                                  fps, (frame_w, frame_h))

        img_batch = torch.FloatTensor(np.transpose(img_batch,
                                                   (0, 3, 1, 2))).to(device)
        mel_batch = torch.FloatTensor(np.transpose(mel_batch,
                                                   (0, 3, 1, 2))).to(device)

        with torch.no_grad():
            pred = model(mel_batch, img_batch)

        pred = pred.cpu().numpy().transpose(0, 2, 3, 1) * 255.

        for p, f, c in zip(pred, frames, coords):
            y1, y2, x1, x2 = c
            p = cv2.resize(p.astype(np.uint8), (x2 - x1, y2 - y1))
            f[y1:y2, x1:x2] = p
            out.write(f)

    out.release()

    # combine the lip-synced video with the speech audio
    command = 'ffmpeg -y -i {} -i {} -strict -2 -q:v 1 {}'.format(
        tempAudioPath, tempVideoPath, outfilePath)
    subprocess.call(command, shell=platform.system() != 'Windows')

First, I take the animated video from before. Then I take the speech file, convert it into WAV format (this is why we can use other input formats; everything is first converted into WAV), and then into a mel spectrogram using the librosa library. The mel spectrogram is then broken up into chunks and converted into batches, alongside the frames from earlier.
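
The chunking arithmetic deserves a note. Wav2Lip's mel spectrograms are computed from 16 kHz audio with a hop size of 200 samples, which gives 80 mel frames per second of audio; that is where the mel_idx_multiplier = 80./fps in the code comes from. A quick illustration:

# Each video frame at fps frames/second advances 80/fps mel steps, and
# each chunk is mel_step_size (16) steps wide, i.e. 0.2 seconds of audio.
fps = 25.0
mel_idx_multiplier = 80.0 / fps
for i in range(3):
    print(f"video frame {i} starts at mel index {int(i * mel_idx_multiplier)}")
# video frame 0 starts at mel index 0
# video frame 1 starts at mel index 3
# video frame 2 starts at mel index 6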

These batches are then fed into the model to generate a new set of frames with the correct lip movements according to the speech. These frames are finally compiled into a video, together with the original speech file, to create the output video.

This is the output video that is produced, putting everything together.

Video: Persona talking head - Apple closing its Infinite Loop retail store (youtube.com). Talking head created by Persona.

Improving the video


The output video is 256x256 and the video quality is passable for its size. However, if you want something bigger (for example, for posting on YouTube) you'd want a better quality video.

To do this, we need to break the video down into frames first, then use a technique to make each image look better, then reassemble the frames back into the video.
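
Chained together, the improvement step looks like this sketch (hypothetical paths; the helper functions are shown below):

# Sketch of the improvement pipeline using the helpers defined below;
# the paths are hypothetical, persona.py derives them from the path_id.
vid2frames("results/demo.mp4", "temp/demo/frames")
improve("temp/demo/frames", "temp/demo/improved")
restore_frames("temp/demo/temp.wav", "results/demo_large.mp4",
               "temp/demo/improved")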


There are a number of techniques that can be used for making the images higher resolution; many of them are GAN-based image restoration techniques. I found that Real-ESRGAN works pretty well for me, so that's the one I used.

First of all, we need to break down the video into frames using vid2frames .

def vid2frames(vidPath, framesOutPath):
    vidcap = cv2.VideoCapture(vidPath)
    success, image = vidcap.read()
    frame = 1
    while success:
        cv2.imwrite(os.path.join(framesOutPath, str(frame).zfill(5) + '.png'),
                    image)
        success, image = vidcap.read()
        frame += 1

Now that we have a bunch of image files in a directory, we need to take each of them
and improve them using Real-ESRGAN.

import glob
import torch
from RealESRGAN import RealESRGAN
# t_map (from the p_tqdm package) maps sequentially with a tqdm progress bar
from p_tqdm import t_map

def improve(disassembledPath, improvedPath):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = RealESRGAN(device, scale=4)
    model.load_weights('weights/RealESRGAN_x4.pth', download=True)
    files = glob.glob(os.path.join(disassembledPath, "*.png"))

    results = t_map(real_esrgan, files, [model]*len(files),
                    [improvedPath]*len(files))

I used t_map to wrap tqdm around the real_esrgan function in order to show
progress.

from PIL import Image

def real_esrgan(img_path, model, improvedPath):
    image = Image.open(img_path).convert('RGB')
    sr_image = model.predict(image)
    img_name = os.path.basename(img_path)
    sr_image.save(os.path.join(improvedPath, img_name))


The improved image files are placed in another directory (this can take some time since I didn't run it in parallel). Once it's done, I can use restore_frames to combine the improved images and the speech audio file into a final output video.

def restore_frames(audiofilePath, videoOutPath, improveOutputPath):
    no_of_frames = count_files(improveOutputPath)
    audio_duration = get_audio_duration(audiofilePath)
    # the frames were saved zero-padded to 5 digits by vid2frames
    framesPath = improveOutputPath + "/%05d.png"
    fps = no_of_frames/audio_duration
    # reassemble the frames and the audio into the final video (the tail of
    # this command was cut off in the original; the output part is my
    # reconstruction)
    command = f"ffmpeg -y -r {fps} -f image2 -i {framesPath} " \
              f"-i {audiofilePath} {videoOutPath}"
    subprocess.call(command, shell=platform.system() != 'Windows')

def get_audio_duration(audioPath):
    audio = WAVE(audioPath)
    duration = audio.info.length
    return duration

def count_files(directory):
    return len([name for name in os.listdir(directory) if
                os.path.isfile(os.path.join(directory, name))])

Running Persona
Now that we have all the pieces in place, let's see how to run Persona. Remember, this is not a web application; it's just a script that puts the various pieces of code together by calling functions.

First, you need to clone the repo from GitHub and install the required
dependencies.

$ git clone https://github.com/sausheong/persona.git
$ cd persona
$ pip install -r requirements.txt

Next, you need to download the following weights and PyTorch files.

1. Wav2Lip weights — place them in the wav2lip directory

2. Face detection weights — place the file at wav2lip/face_detection/detection/sfd/s3fd.pth

3. Real-ESRGAN weights — create a folder named weights and place the file in there

Once all dependencies are installed, you can run Persona by calling the persona.py script. This is the default way to call Persona.

$ python persona.py

Running it the first time takes a while because other than the files you downloaded
earlier, the script will automatically download other necessary weights.

A new folder named temp will be created to store all temporary files, and a new folder named results will be created to store the final videos. In the temp folder, a new folder with a path_id will be created to store all the temporary files created during the creation of the talking head video.
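
For reference, after a run the folders could look something like this (a hypothetical layout; the actual file names come from the code above):

temp/
  <path_id>/
    temp.wav        # generated speech
    avatar.png      # generated still image
    tmp.mp4         # trimmed driver video
    animated.mp4    # animated face, before lip sync
results/
  <path_id>.mp4     # final talking head video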

If you want to use your own image for your talking head video, you can do this.

$ python persona.py --image=<path/to/your/image>

If you want to use your own speech file, you can do this.

$ python persona.py --speech=<path/to/your/wav file>

Remember to use only WAV files.

Running it this way only creates the smaller videos in the results directory. If you
want the larger video, you can do this.


$ python persona.py --improve

This runs the normal generation, followed by the improvement step. The
improvement step can be pretty slow (> 20 minutes).

How about if you generated a smaller video but you want to now improve it to a
larger one? I’ve got you covered for that too.

$ python persona.py --improve --path_id=<your path id> --skipgen

The skipgen flag tells Persona to skip all the generation, and the improve flag tells Persona to improve the video. However, you still need to tell Persona which video file to use and where to store the temporary frames, so you need to provide a path_id as well.

Hardware
A note about the hardware I ran this on: so far I've only tried this on an Intel x86_64 machine with Nvidia GPUs, using CUDA. I spun up an n1-highmem-16 (16 vCPU, 8 core, 104 GB memory) instance on Google Cloud with 2x T4 GPUs and ran my experiments on it.

Running Persona on this configuration generally takes about 4 minutes to generate the small talking head video.

generating speech: 54 seconds
generating avatar image: 18 seconds
animating face: 2 minutes
modifying lips: 48 seconds
total time: 4 minutes

Depending on the amount of text spoken, the speed can differ greatly. In fact, I usually find generating speech the slowest part. If you're impatient and want to use something else, you can try OpenAI's text-to-speech, which is pretty fast (I have some commented out code in speech.py you can uncomment). You can also try any other text-to-speech generator, or even record your own!
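
For reference, here's a minimal sketch of an OpenAI text-to-speech call (illustrative only, not necessarily the exact commented-out code in speech.py):

# Illustrative sketch of OpenAI text-to-speech; requires OPENAI_API_KEY
# to be set in the environment.
from openai import OpenAI

client = OpenAI()
response = client.audio.speech.create(model="tts-1", voice="alloy",
                                      input="Season's greetings from Santa!",
                                      response_format="wav")
response.stream_to_file("temp.wav")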

I also tried other GPUs and while the more expensive ones generally run faster, they
are also a LOT more expensive so beware.

Final thoughts
I spent a couple of days over the Christmas break mucking around with talking head videos. It was a fascinating journey, I learnt a lot, and I hope you enjoy playing about with the project as well!

Libraries
Here are the libraries used.

Generating speech

GitHub - neonbjb/tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
github.com

Generating the still image

GitHub - Stability-AI/generative-models: Generative Models by Stability AI
github.com

Animating the face

GitHub - sky24h/Face_Animation_Real_Time: One-shot face animation using webcam, capable of running in real time
github.com

GitHub - yangxy/GPEN
github.com

GitHub - zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis: Pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"
github.com

Improving the video

GitHub - ai-forever/Real-ESRGAN: PyTorch implementation of Real-ESRGAN model
github.com
