
Kdenlive's Speech-to-Text Tool. This is my experience

Last week, Pablinux told you about the new version of Kdenlive, the video editing tool
from the KDE project. As I have mentioned before, I prefer OpenShot, which has a gentler
learning curve, but I was very interested in the speech-to-text tool that this new
version incorporates, so I decided to take a look at it.

Although I have written my share of articles on Linux alternatives to this or that
Windows program (no one can call themselves a Linux blogger without having written one
of those), it is not an approach that I like. I think programs should be discussed
on their own merits. If I had to define Kdenlive in some way, I would say that it
is a video editor for hobbyists who want their creations to look professional.

I've said it in the past and I stand by it (come at me one at a time): free and open
source software has libraries for multimedia work that make Adobe and Blackmagic
products look like mere toys. The big problem is that nobody was interested in bringing
these tools together behind a simple, attractive interface with complete,
easy-to-understand documentation. Although Kdenlive is still far from that goal, its
developers are on the right track.

For converting speech to text, Kdenlive uses two tools from the Python Package Index
repository.

Vosk is an open source, offline speech recognition toolkit. It offers speech
recognition models for 17 languages and dialects: English, Indian English, German,
French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch,
Catalan, Arabic, Greek, Farsi, and Filipino.

Kdenlive uses Vosk models through a module written in Python.
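
To give an idea of what driving Vosk from Python looks like, here is a minimal sketch of my own. It is not Kdenlive's actual script. It assumes the audio has already been extracted to a mono 16-bit PCM WAV file and that a model has been unpacked into a folder; the function names `transcribe_wav` and `words_from_results` are invented for illustration.

```python
import json
import wave


def words_from_results(raw_results):
    """Collect the timed word entries from Vosk's JSON result strings."""
    words = []
    for raw in raw_results:
        # A result without recognized words carries no "result" key.
        words.extend(json.loads(raw).get("result", []))
    return words


def transcribe_wav(wav_path, model_dir):
    """Run a mono 16-bit PCM WAV file through a Vosk model (assumed inputs)."""
    # Imported here so the helper above works even without vosk installed.
    from vosk import Model, KaldiRecognizer

    raw_results = []
    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        recognizer.SetWords(True)  # ask for per-word start and end times
        while True:
            chunk = wav.readframes(4000)
            if not chunk:
                break
            # True means Vosk has finished an utterance and has a result ready.
            if recognizer.AcceptWaveform(chunk):
                raw_results.append(recognizer.Result())
        raw_results.append(recognizer.FinalResult())
    return words_from_results(raw_results)
```

Each entry returned is a small dictionary with the recognized word and its start and end times in seconds, which is exactly the information needed for the subtitle step.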


However, having the transcript is not enough. You also have to sync it with the video.
For this we need another Python module, one for creating subtitles.
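
To show what that syncing amounts to, here is a rough sketch that groups timed words into numbered subtitle cues in the SRT format. Kdenlive delegates this job to the subtitle module; I write the format by hand here only so the example runs without extra packages. The input shape (dictionaries with `word`, `start` and `end` in seconds) mirrors what Vosk reports when word timing is enabled, and the names `srt_timestamp`, `words_to_srt` and `words_per_cue` are my own assumptions.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, words_per_cue=7):
    """Group timed words into numbered SRT cues of a fixed word count."""
    cues = []
    for i in range(0, len(words), words_per_cue):
        group = words[i:i + words_per_cue]
        text = " ".join(w["word"] for w in group)
        # A cue spans from its first word's start to its last word's end.
        start = srt_timestamp(group[0]["start"])
        end = srt_timestamp(group[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"word": "hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 1.25},
    ]
    print(words_to_srt(demo))
```

Splitting on a fixed number of words is the crudest possible choice. A real tool would probably also break on pauses or on a maximum cue duration.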

Kdenlive will check that you have these modules installed. To install them, you first
need the python3-pip package from your distribution, and then you run these
commands:

pip3 install vosk

pip3 install srt
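
If you want to confirm that both modules are visible to Python before opening Kdenlive, a quick check like the following should do. It is only a sketch of my own, not the check Kdenlive itself performs, and the helper name `missing_modules` is invented for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["vosk", "srt"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("vosk and srt are both installed")
```

Run it with the same python3 that pip3 installs into; otherwise a module can look missing even though it is installed for another interpreter.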

Next, we have to install the voice models. To do this, open Kdenlive and go
to Settings > Configure Kdenlive > Speech to Text.

To load the models you have two options: either download them from this page
and load them manually (you must first check the Custom models folder box), or
paste in a link from the list shown on that same page.

Using the Speech to Text tool


1. Make sure the subtitles option is enabled in the View menu. Next,
load the video you want to transcribe.
2. Move the video to the first video track and drag the blue line over the length
you want to transcribe.
3. Click on the subtitles tab and then on the + sign.
4. A track is added at the top. Click on the icon to the left of the eye.
5. Select the transcription model and choose whether to transcribe a single clip, all
the clips in the timeline, or just part of the timeline. Then click on Process.

I compared Speech to Text with the free version of a cloud tool, and I have also seen
auto-captioned videos on YouTube and on paid course platforms. I have to say that it is
not perfect, but it is no worse than those alternatives. It has problems when
the speakers have poor diction or talk over music or other sounds.
But, anticipating the question you are about to ask me: yes, it can be used to subtitle
a series or a movie, although, given the limitations mentioned, the subtitles may have
to be finished by hand.

And if the folks at Kdenlive step up a bit and integrate a translation module,
it would be perfect.

There is room for improvement. Today, if you want to change the
appearance of the subtitles, you have to insert code. There is also no way to
export them; you can only see them embedded in the video.

But, as I said above, without a doubt the project is on the right track.
