
Transcription COSTS

A100 80 GB = $1.79 per hour


≈ $0.0004972 / second
3 minutes of audio in 11s
1 min of audio in 3.3s

Cost per second on A100 80GB


60 * 60 = 3600 secs
$1.79 / 3600 = $0.0004972222

cost of processing 1 minute of audio


3.3 seconds * $0.0004972222 = $0.0016408333

cost of processing 10 hours per user per month on A100 80GB


$0.0016408333 x 10 hours x 60 minutes = $0.98449998

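The arithmetic above can be checked with a short script. The $1.79/hr A100 price and the 3.3 s of GPU time per minute of audio are this page's assumptions, not quoted vendor figures:

```python
# Back-of-envelope transcription cost on an A100 80 GB (assumed $1.79/hr).
HOURLY_RATE = 1.79                      # USD per GPU-hour (assumption from this page)
COST_PER_SECOND = HOURLY_RATE / 3600    # ≈ $0.0004972 per GPU-second

SECONDS_TO_PROCESS_ONE_MINUTE = 3.3     # Whisper on A100, per the estimate above

cost_per_audio_minute = SECONDS_TO_PROCESS_ONE_MINUTE * COST_PER_SECOND
cost_per_user_month = cost_per_audio_minute * 10 * 60   # 10 hours of audio per month

print(f"{COST_PER_SECOND:.7f}")        # 0.0004972
print(f"{cost_per_audio_minute:.7f}")  # 0.0016408
print(f"{cost_per_user_month:.2f}")    # 0.98
```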

==========================================================
The average speaking rate for adults in English is around 125 to 150 words per minute
(WPM). Assuming a brisk rate of 155 WPM:

Time to speak 15 words (in seconds) = (15 words / 155 WPM) * 60 seconds per minute
Time to speak 15 words (in seconds) ≈ 5.81 seconds

15-word voice note = approx 5-6 seconds
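The timing above can be reproduced in one line; 155 WPM is the brisk rate assumed here:

```python
# Time to speak a 15-word voice note at 155 WPM (upper end of typical rates).
WORDS = 15
WPM = 155
seconds = WORDS / WPM * 60
print(round(seconds, 2))  # 5.81
```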

==========================================================


The Whisper STT model can transcribe 13 minutes of audio in 54 seconds on an NVIDIA
Tesla V100S GPU (about 4.2 seconds per minute of audio). The A100 80 GB GPU is about
2.5 times faster than the Tesla V100S GPU, so one minute of audio in about 3.3
seconds is a conservative estimate for the A100.
On the CPU, faster-whisper is also faster than the original model: the original can
transcribe 13 minutes of audio in 10 minutes and 31 seconds on an Intel(R) Xeon(R)
Gold 6226R, while faster-whisper can do it in 2 minutes and 44 seconds.
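As a sanity check on the CPU numbers quoted above, the implied faster-whisper speedup is roughly 3.8x:

```python
# CPU transcription of 13 minutes of audio (times quoted above).
original_s = 10 * 60 + 31   # original Whisper: 10 min 31 s
faster_s = 2 * 60 + 44      # faster-whisper: 2 min 44 s
speedup = original_s / faster_s
print(round(speedup, 1))  # 3.8
```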

==========================================================


BARK would run significantly faster on an A100 80 GB GPU than on an NVIDIA T4 GPU.
The A100 80 GB GPU is about 2.5 times faster than the T4 GPU, so we can expect BARK
to process a prediction in about 20 seconds on the A100 80 GB GPU.

NVIDIA A100 80GB can process 1,500 sentences per second. Source:


https://www.nvidia.com/en-us/data-center/a100/

======================STT COSTS============================
Nvidia A40 GPU: cost of processing 15 words (approximately 5.81 seconds of audio).

Cost per second on A40 GPU = $0.0013

So, the cost of processing 15 words (5.81 seconds) would be:

5.81 seconds * $0.0013 = $0.007553

10 hours * 60 minutes/hour * 137.5 words/minute = 82500 words

The cost of processing 82500 words would be:

(82500 words / 15 words per 5.81 seconds) * $0.007553 = 5500 * $0.007553 ≈ $41.54

So, the cost of processing 10 hours of audio per user per month on the Nvidia A40
GPU using the Bark model would be approximately $41.54.
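Putting the A40 numbers together in a short script (the $0.0013 per GPU-second rate and 5.81 s per 15 words are this page's own assumptions):

```python
# Monthly processing cost on an Nvidia A40 (assumed $0.0013 per GPU-second).
COST_PER_SECOND = 0.0013
SECONDS_PER_15_WORDS = 5.81            # ~155 WPM, from the timing estimate above
words_per_month = 10 * 60 * 137.5      # 10 h of speech at 137.5 WPM = 82,500 words
chunks = words_per_month / 15          # 5,500 fifteen-word chunks
monthly_cost = chunks * SECONDS_PER_15_WORDS * COST_PER_SECOND
print(round(monthly_cost, 2))  # 41.54
```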

-----------------------------------------------------------------------------------
Nvidia A100 GPU
The cost per second on A100 80GB GPU = $0.0004972222
The average speaking rate for adults in English is around 125 to 150 words per
minute. We'll use the average of these two values, 137.5 words per minute, to
estimate the total words spoken.

Time to speak 15 words (in seconds) ≈ 5.81 seconds (the brisker 155 WPM rate used
earlier)

A100 80 GB = $1.79 per hour

the cost per second would be $1.79 / 3600 = $0.0004972222.

The average voice note is around 6 seconds and takes up about 16 KB of output
audio. Processing time unknown!

5.81 seconds * $0.0004972222 = $0.002889

Now, the cost of processing 10 hours of audio per user per month.

In 10 hours, a user would speak:

10 hours * 60 minutes/hour * 137.5 words/minute = 82500 words

The cost of processing 82500 words would be:

(82500 words / 15 words per 5.81 seconds) * $0.002889 = 5500 * $0.002889 ≈ $15.89
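The same monthly estimate for the A100, using the page's assumed $1.79/hr price:

```python
# Monthly processing cost on an A100 80 GB (assumed $1.79 per GPU-hour).
COST_PER_SECOND = 1.79 / 3600          # ≈ $0.0004972 per GPU-second
SECONDS_PER_15_WORDS = 5.81            # ~155 WPM, from the timing estimate above
chunks = 82500 / 15                    # 82,500 words per user per month
monthly_cost = chunks * SECONDS_PER_15_WORDS * COST_PER_SECOND
print(round(monthly_cost, 2))  # 15.89
```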

=========================custom solution vs others==================


Text to Speech

The Nvidia A100 GPU with custom faster BARK will do faster calculations than AWS
Textract, IBM TTS, and Google Cloud Speech-to-Text.

Service                         Transcription time (seconds)
BARK on Nvidia A100 GPU         54
AWS Textract                    120
IBM TTS                         120
Google Cloud Speech-to-Text     120

Speech to Text

Service                         Transcription time (seconds)
Whisper on Nvidia A100 GPU      54
AWS Transcribe                  120
IBM STT                         150

===================================================================================
The cost per second of usage would be higher if the GPU is idle for some of the
time.

Other factors to consider:
The size of the audio files that need to be transcribed.
The accuracy requirements for the transcriptions.
The amount of time that each user will need to transcribe their audio files.
Once you have considered these factors, you can use the following formula to
calculate the maximum capacity of users the A100 can handle:

Maximum capacity = (GPU capacity) / (Time per user)
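A minimal sketch of that formula, assuming a fully utilized GPU over a 30-day month and the 3.3 s-per-audio-minute Whisper estimate from earlier (both are this page's assumptions, not measured figures):

```python
# Maximum capacity = (GPU capacity) / (Time per user)
gpu_seconds_per_month = 30 * 24 * 3600    # one A100, fully utilized for 30 days
time_per_user = 10 * 60 * 3.3             # 10 h of audio -> 600 min * 3.3 s/min
max_users = gpu_seconds_per_month / time_per_user
print(int(max_users))  # 1309
```

In practice utilization is well below 100%, so a real deployment would divide a smaller effective capacity by the same per-user time.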

You might also like