
Reference of HMM-based Speech Synthesis Engine

hts_engine API version 1.10


HTS Working Group
December 25, 2015

1 Engine structures
1.1 Audio
HTS_Audio   Audio output wrapper.

size_t sampling_frequency - sampling frequency
size_t max_buff_size - buffer size of audio output device
short *buff - current buffer
size_t buff_size - current buffer size
void *audio_interface - audio interface specified in compile step
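
The buffer fields above suggest a fill-and-flush pattern: samples accumulate in buff until buff_size reaches max_buff_size, then the buffer is handed to the device. The following is a minimal sketch of that pattern only. DemoAudio, demo_audio_flush and demo_audio_write are hypothetical names, not part of hts_engine, and the flush merely counts calls instead of talking to an audio device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffer fields of HTS_Audio. */
typedef struct {
    size_t max_buff_size;   /* capacity of buff, in samples */
    short *buff;            /* caller-provided storage */
    size_t buff_size;       /* samples currently stored */
    size_t flush_count;     /* demo only: number of flushes so far */
} DemoAudio;

/* Stand-in for handing the buffer to an audio device. */
void demo_audio_flush(DemoAudio *audio)
{
    assert(audio != NULL);
    if (audio->buff_size > 0) {
        audio->flush_count++;
        audio->buff_size = 0;
    }
}

/* Append one sample; flush first when the buffer is already full. */
void demo_audio_write(DemoAudio *audio, short sample)
{
    assert(audio != NULL && audio->buff != NULL && audio->max_buff_size > 0);
    if (audio->buff_size == audio->max_buff_size)
        demo_audio_flush(audio);
    audio->buff[audio->buff_size++] = sample;
}
```

A final demo_audio_flush call after the last sample drains whatever is left in the buffer.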

1.2 Model
HTS_Window   Window coefficients to calculate dynamic features.

size_t size - # of windows (static + deltas)
int *l_width - left width of windows
int *r_width - right width of windows
double **coefficient - window coefficients
size_t max_width - maximum width of windows
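
To illustrate how a window turns static features into a dynamic feature, here is a small sketch that applies one window to a one-dimensional sequence. It is an illustration under assumptions, not the engine's code: demo_apply_window is a hypothetical helper, left and right are non-negative tap counts, the taps are stored in a plain array starting at the leftmost offset, and frames outside the sequence are treated as zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Apply one window to sequence x (length n) at frame t.
 * coef holds left + right + 1 taps; coef[k + left] is the tap for
 * offset k in [-left, right]. Out-of-range frames contribute 0.
 */
double demo_apply_window(const double *x, size_t n, size_t t,
                         const double *coef, int left, int right)
{
    double sum = 0.0;
    int k;

    assert(x != NULL && coef != NULL && left >= 0 && right >= 0 && t < n);
    for (k = -left; k <= right; k++) {
        long idx = (long) t + k;
        if (idx < 0 || idx >= (long) n)
            continue;               /* zero padding at the edges */
        sum += coef[k + left] * x[idx];
    }
    return sum;
}
```

With the taps {-0.5, 0.0, 0.5}, a commonly used delta window, the result at frame t is half the difference between the next and the previous frame.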

HTS_Pattern   List of patterns in a question and a tree.

char *string - pattern string
HTS_Pattern *next - pointer to the next pattern

HTS_Question   List of questions in a tree.

char *string - name of this question
HTS_Pattern *head - pointer to the head of pattern list
HTS_Question *next - pointer to the next question




HTS_Node   List of tree nodes in a tree.

int index - index of this node
size_t pdf - index of PDF for this node (leaf node only)
HTS_Node *yes - pointer to its child node (yes)
HTS_Node *no - pointer to its child node (no)
HTS_Node *next - pointer to the next node
HTS_Question *quest - question applied at this node
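
The node, question and pattern lists together describe a binary decision tree: each internal node asks a question, and the answer selects the yes or no child until a leaf with a PDF index is reached. The sketch below walks such a tree. It uses hypothetical Demo* mirror types rather than the real structures, assumes that only internal nodes carry a question, and simplifies pattern matching to a plain substring test, which the engine's own matcher may not do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DemoPattern {
    const char *string;             /* pattern string */
    struct DemoPattern *next;       /* next pattern of the same question */
} DemoPattern;

typedef struct DemoQuestion {
    const char *string;             /* question name */
    DemoPattern *head;              /* head of pattern list */
} DemoQuestion;

typedef struct DemoNode {
    size_t pdf;                     /* PDF index (leaf only) */
    struct DemoNode *yes;           /* child taken when the question matches */
    struct DemoNode *no;            /* child taken otherwise */
    DemoQuestion *quest;            /* NULL for a leaf (assumption) */
} DemoNode;

/* A question matches when any of its patterns occurs in the label. */
int demo_question_match(const DemoQuestion *q, const char *label)
{
    const DemoPattern *p;

    for (p = q->head; p != NULL; p = p->next)
        if (strstr(label, p->string) != NULL)
            return 1;
    return 0;
}

/* Walk from the root to a leaf and return the leaf's PDF index. */
size_t demo_tree_search(const DemoNode *root, const char *label)
{
    const DemoNode *node = root;

    assert(root != NULL && label != NULL);
    while (node->quest != NULL)
        node = demo_question_match(node->quest, label) ? node->yes : node->no;
    return node->pdf;
}
```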

HTS_Tree   List of decision trees in a model.

HTS_Pattern *head - pointer to the head of pattern list for this tree
HTS_Tree *next - pointer to the next tree
HTS_Node *root - root node of this tree
size_t state - state index of this tree

HTS_Model   Set of PDFs, decision trees and questions.

size_t vector_length - vector length (static features only)
size_t num_windows - # of windows for delta
HTS_Boolean is_msd - flag for MSD
size_t ntree - # of trees
size_t *npdf - # of PDFs at each tree
float ***pdf - PDFs
HTS_Tree *tree - pointer to the list of trees
HTS_Question *question - pointer to the list of questions

HTS_ModelSet   Set of duration models, HMMs and GV models.

char *hts_voice_version - version of HTS voice format
size_t sampling_frequency - sampling frequency
size_t frame_period - frame period
size_t num_voices - # of HTS voices
size_t num_states - # of HMM states
size_t num_streams - # of streams
char *stream_type - stream type
char *fullcontext_format - fullcontext label format
char *fullcontext_version - version of fullcontext label
HTS_Question *gv_off_context - GV switch
char **option - options for each stream
HTS_Model *duration - duration PDFs and trees
HTS_Window *window - window coefficients for delta
HTS_Model **stream - parameter PDFs and trees
HTS_Model **gv - GV PDFs and trees

1.3 Label
HTS_LabelString   Individual label string with time information.

HTS_LabelString *next - pointer to the next label string
char *name - label string
double start - start frame specified in the given label
double end - end frame specified in the given label

HTS_Label   List of label strings.

HTS_LabelString *head - pointer to the head of label string
size_t size - # of label strings

1.4 State stream


HTS_SStream   Individual state stream.

size_t vector_length - vector length (static features only)
double **mean - mean vector sequence
double **vari - variance vector sequence
double *msd - MSD parameter sequence
size_t win_size - # of windows (static + deltas)
int *win_l_width - left width of windows
int *win_r_width - right width of windows
double **win_coefficient - window coefficients
size_t win_max_width - maximum width of windows
double *gv_mean - mean vector of GV
double *gv_vari - variance vector of GV
HTS_Boolean *gv_switch - GV flag sequence

HTS_SStreamSet   Set of state streams.

HTS_SStream *sstream - state streams
size_t nstream - # of streams
size_t nstate - # of states
size_t *duration - duration sequence
size_t total_state - total state
size_t total_frame - total frame
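
The duration sequence and the two totals are related in a simple way if each duration entry is read as the number of frames spent in one state. The sketch below shows that relationship under that reading. demo_total_frame and demo_state_of_frame are hypothetical helpers, not engine functions.

```c
#include <assert.h>
#include <stddef.h>

/* Sum per-state durations (in frames) to get the utterance length. */
size_t demo_total_frame(const size_t *duration, size_t total_state)
{
    size_t i, total = 0;

    assert(duration != NULL || total_state == 0);
    for (i = 0; i < total_state; i++)
        total += duration[i];
    return total;
}

/* Return the state that a frame belongs to, or total_state if out of range. */
size_t demo_state_of_frame(const size_t *duration, size_t total_state,
                           size_t frame)
{
    size_t i, end = 0;

    assert(duration != NULL || total_state == 0);
    for (i = 0; i < total_state; i++) {
        end += duration[i];         /* first frame after state i */
        if (frame < end)
            return i;
    }
    return total_state;
}
```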

1.5 PDF stream


HTS_SMatrices   Matrices/Vectors used in the speech parameter generation algorithm.

double **mean - mean vector sequence
double **ivar - inverse diagonal variance sequence
double *g - vector used in the forward substitution
double **wuw - W^T U^{-1} W
double *wum - W^T U^{-1} m

HTS_PStream   Individual PDF stream.

size_t vector_length - vector length (static features only)
size_t length - stream length
size_t width - width of dynamic window
double **par - output parameter vector
HTS_SMatrices sm - matrices for parameter generation
size_t win_size - # of windows (static + deltas)
int *win_l_width - left width of windows
int *win_r_width - right width of windows
double **win_coefficient - window coefficients
HTS_Boolean *msd_flag - Boolean sequence for MSD
double *gv_mean - mean vector of GV
double *gv_vari - variance vector of GV
HTS_Boolean *gv_switch - GV flag sequence
size_t gv_length - frame length for GV calculation
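
One plausible way to obtain a Boolean sequence for MSD is to compare a per-frame MSD weight against the stream's threshold. The sketch below assumes exactly that: per-frame weights are already available, and a frame is flagged when its weight is strictly greater than the threshold. demo_msd_flags is a hypothetical helper and the strict comparison is an assumption about the decision rule.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Set flag[t] to 1 when msd[t] exceeds threshold, else 0.
 * Returns the number of flagged frames.
 */
size_t demo_msd_flags(const double *msd, size_t length, double threshold,
                      int *flag)
{
    size_t t, flagged = 0;

    assert(msd != NULL && flag != NULL);
    for (t = 0; t < length; t++) {
        flag[t] = msd[t] > threshold;
        flagged += (size_t) flag[t];
    }
    return flagged;
}
```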

HTS_PStreamSet   Set of PDF streams.

HTS_PStream *pstream - PDF streams
size_t nstream - # of PDF streams
size_t total_frame - total frame


1.6 Generated parameter stream


HTS_GStream   Generated parameter stream.

size_t vector_length - vector length (static features only)
double **par - generated parameter

HTS_GStreamSet   Set of generated parameter streams.

size_t total_nsample - total sample
size_t total_frame - total frame
size_t nstream - # of streams
HTS_GStream *gstream - generated parameter streams
double *gspeech - generated speech

1.7 Engine
HTS_Condition   Synthesis condition.

size_t sampling_frequency - sampling frequency
size_t fperiod - frame period
size_t audio_buff_size - audio buffer size (for audio device)
HTS_Boolean stop - stop flag
double volume - volume
double *msd_threshold - MSD thresholds
double *gv_weight - GV weights
HTS_Boolean phoneme_alignment_flag - flag for using phoneme alignment in label
double speed - speech speed
size_t stage - if stage = 0 then gamma = 0 else gamma = -1/stage
HTS_Boolean use_log_gain - log gain flag (for LSP)
double alpha - all-pass constant
double beta - postfiltering coefficient
double additional_half_tone - additional half tone
double *duration_iw - weights for duration interpolation
double **parameter_iw - weights for parameter interpolation
double **gv_iw - weights for GV interpolation

HTS_Engine   Engine itself.

HTS_Condition condition - synthesis condition
HTS_Audio audio - audio output
HTS_ModelSet ms - set of duration models, HMMs and GV models
HTS_Label label - label
HTS_SStreamSet sss - set of state streams
HTS_PStreamSet pss - set of PDF streams
HTS_GStreamSet gss - set of generated parameter streams

2 Engine functions
2.1 Initialize engine
2.1.1 HTS_Engine_initialize

Type        void
Use         Initialize engine.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
Attention!! To start the engine, you must call this function first.

2.2 Load models


2.2.1 HTS_Engine_load

Type        HTS_Boolean
Use         Load duration PDFs and trees from files using given file names.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            char **voices - HTS voice file names
            size_t num_voices - # of HTS voices
Attention!! You must initialize the engine using HTS_Engine_initialize before calling this function.

2.3 Synthesize speech and set/get synthesis parameters


2.3.1 HTS_Engine_set_sampling_frequency

Type        void
Use         Set sampling frequency.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t i - sampling frequency (Hz), 1 <= i

2.3.2 HTS_Engine_get_sampling_frequency

Type        size_t
Use         Get sampling frequency.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.3 HTS_Engine_set_fperiod

Type        void
Use         Set frame shift.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t i - frame shift (point), 1 <= i

2.3.4 HTS_Engine_get_fperiod

Type        size_t
Use         Get frame shift.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
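
Because the frame shift is given in points (samples) and the sampling frequency in Hz, frame counts convert to sample counts and to seconds with simple arithmetic. The sketch below shows that conversion. Both helpers are hypothetical and only assume that every frame advances by exactly fperiod samples.

```c
#include <assert.h>
#include <stddef.h>

/* Number of samples spanned by total_frame frames. */
size_t demo_frames_to_samples(size_t total_frame, size_t fperiod)
{
    return total_frame * fperiod;
}

/* Start time of a frame in seconds. */
double demo_frame_time(size_t frame, size_t fperiod, size_t sampling_frequency)
{
    assert(sampling_frequency > 0);
    return (double) (frame * fperiod) / (double) sampling_frequency;
}
```

For example, with a sampling frequency of 48000 Hz and a frame shift of 240 points, 200 frames cover 48000 samples, i.e. one second.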

2.3.5 HTS_Engine_set_audio_buff_size

Type        void
Use         Set buffer size for direct audio output.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t i - buffer size (sample)
Attention!! Default value is 0. If i = 0, direct audio play is turned off.

2.3.6 HTS_Engine_get_audio_buff_size

Type        size_t
Use         Get buffer size for direct audio output.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
Attention!! Default value is 0. If i = 0, direct audio play is turned off.

2.3.7 HTS_Engine_set_stop_flag

Type        void
Use         Set stop flag.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            HTS_Boolean b - flag
Attention!! Default value is FALSE.

2.3.8 HTS_Engine_get_stop_flag

Type        HTS_Boolean
Use         Get stop flag.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
Attention!! Default value is FALSE.

2.3.9 HTS_Engine_set_volume

Type        void
Use         Set volume in dB.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            double f - volume in dB
Attention!! Default value is 0.0.

2.3.10 HTS_Engine_get_volume

Type        double
Use         Get volume in dB.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.11 HTS_Engine_set_msd_threshold

Type        void
Use         Set MSD threshold.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t stream_index - index of streams
            double f - threshold

2.3.12 HTS_Engine_get_msd_threshold

Type        double
Use         Get MSD threshold.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t stream_index - index of streams

2.3.13 HTS_Engine_set_gv_weight

Type        void
Use         Set GV weight.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t stream_index - index of streams
            double f - GV weight
Attention!! Default value is 1.0.

2.3.14 HTS_Engine_get_gv_weight

Type        double
Use         Get GV weight.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t stream_index - index of streams

2.3.15 HTS_Engine_set_speed

Type        void
Use         Set speech speed.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            double f - speed
Attention!! Default value is 1.0.

2.3.16 HTS_Engine_set_phoneme_alignment_flag

Type        void
Use         Set flag to use phoneme alignment in label.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            HTS_Boolean b - flag
Attention!! Default value is FALSE.

2.3.17 HTS_Engine_set_alpha

Type        void
Use         Set frequency warping parameter alpha.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            double f - alpha, 0.0 <= f <= 1.0

2.3.18 HTS_Engine_get_alpha

Type        double
Use         Get frequency warping parameter alpha.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.19 HTS_Engine_set_beta

Type        void
Use         Set postfiltering coefficient parameter beta.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            double f - beta, 0.0 <= f <= 1.0
Attention!! Default value is 0.0.

2.3.20 HTS_Engine_get_beta

Type        double
Use         Get postfiltering coefficient parameter beta.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
Attention!! Default value is 0.0.

2.3.21 HTS_Engine_add_half_tone

Type        void
Use         Add half tone.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            double f - half tone
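
A half tone is one twelfth of an octave, so on a natural-log F0 scale a shift of f half tones is an addition of f * ln(2) / 12. The sketch below illustrates that arithmetic only. demo_shift_lf0 and DEMO_HALF_TONE are hypothetical names, and the assumption that the pitch value is stored as natural-log F0 is mine, not stated in this reference.

```c
#include <assert.h>

/* ln(2) / 12: change in natural-log F0 for one half tone. */
#define DEMO_HALF_TONE 0.05776226504666210911810267678818

/* Shift a natural-log F0 value by the given number of half tones. */
double demo_shift_lf0(double lf0, double half_tones)
{
    assert(half_tones == half_tones);   /* reject NaN */
    return lf0 + half_tones * DEMO_HALF_TONE;
}
```

Shifting by 12 half tones adds ln(2) to the log value, which doubles F0. A negative argument lowers the pitch by the same rule.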

2.3.22 HTS_Engine_set_duration_interpolation_weight

Type        void
Use         Set weight for duration interpolation.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t voice_index - index of duration models
            double f - interpolation weight

2.3.23 HTS_Engine_get_duration_interpolation_weight

Type        double
Use         Get weight for duration interpolation.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t voice_index - index of duration models

2.3.24 HTS_Engine_set_parameter_interpolation_weight

Type        void
Use         Set weight for parameter interpolation.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t voice_index - index of parameter models
            size_t stream_index - index of streams
            double f - interpolation weight
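
Interpolation weights are typically used to blend the values contributed by several loaded voices into one value. The sketch below shows a plain weighted sum as one way to read that. demo_interpolate is a hypothetical helper, and whether the engine normalizes the weights or blends in exactly this manner is not stated in this reference.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted sum of one value across voices: sum over i of weight[i] * value[i]. */
double demo_interpolate(const double *value, const double *weight,
                        size_t num_voices)
{
    size_t i;
    double sum = 0.0;

    assert(value != NULL && weight != NULL && num_voices > 0);
    for (i = 0; i < num_voices; i++)
        sum += weight[i] * value[i];
    return sum;
}
```

With weights that sum to 1.0 the result stays within the range spanned by the individual voices.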

2.3.25 HTS_Engine_get_parameter_interpolation_weight

Type        double
Use         Get weight for parameter interpolation.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t voice_index - index of parameter models
            size_t stream_index - index of streams


2.3.26 HTS_Engine_set_gv_interpolation_weight

Type        void
Use         Set weight for GV interpolation.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t voice_index - index of GV models
            size_t stream_index - index of streams
            double f - interpolation weight

2.3.27 HTS_Engine_get_gv_interpolation_weight

Type        double
Use         Get weight for GV interpolation.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t voice_index - index of GV models
            size_t stream_index - index of streams

2.3.28 HTS_Engine_get_total_state

Type        size_t
Use         Get total # of states.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.29 HTS_Engine_set_state_mean

Type        void
Use         Set mean value of state.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t stream_index - index of streams
            size_t state_index - index of states
            size_t vector_index - index of vector
            double f - mean value

2.3.30 HTS_Engine_get_state_mean

Type        double
Use         Get mean value of state.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t stream_index - index of streams
            size_t state_index - index of states
            size_t vector_index - index of vector


2.3.31 HTS_Engine_get_state_duration

Type        size_t
Use         Get state duration.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t state_index - index of states

2.3.32 HTS_Engine_get_nvoices

Type        size_t
Use         Get # of HTS voices.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.33 HTS_Engine_get_nstream

Type        size_t
Use         Get # of streams.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.34 HTS_Engine_get_nstate

Type        size_t
Use         Get # of states.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.35 HTS_Engine_get_fullcontext_label_format

Type        const char *
Use         Get fullcontext label format defined in HTS voice.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.36 HTS_Engine_get_fullcontext_label_version

Type        const char *
Use         Get fullcontext label version defined in HTS voice.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.37 HTS_Engine_get_total_frame

Type        size_t
Use         Get total # of frames.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure


2.3.38 HTS_Engine_get_nsamples

Type        size_t
Use         Get # of samples.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.39 HTS_Engine_get_generated_parameter

Type        size_t
Use         Get generated parameter.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t stream_index - index of streams
            size_t frame_index - index of frames
            size_t vector_index - index of vector

2.3.40 HTS_Engine_get_generated_speech

Type        size_t
Use         Get generated speech.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            size_t index - index of samples
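
Generated speech is held as floating-point samples in the gspeech field of HTS_GStreamSet, so writing it to a 16-bit device or file needs a clipping conversion. The sketch below shows one such conversion. demo_to_pcm16 and demo_convert are hypothetical helpers, and they assume the samples are already scaled to the 16-bit range and that short is 16 bits wide.

```c
#include <assert.h>
#include <stddef.h>

/* Clip one sample to the signed 16-bit range, truncating toward zero. */
short demo_to_pcm16(double x)
{
    if (x > 32767.0)
        return 32767;
    if (x < -32768.0)
        return -32768;
    return (short) x;
}

/* Convert nsamples floating-point samples into 16-bit PCM. */
void demo_convert(const double *speech, size_t nsamples, short *out)
{
    size_t i;

    assert((speech != NULL && out != NULL) || nsamples == 0);
    for (i = 0; i < nsamples; i++)
        out[i] = demo_to_pcm16(speech[i]);
}
```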

2.3.41 HTS_Engine_synthesize_from_fn

Type        HTS_Boolean
Use         Synthesize speech from file name.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            char *fn - label file name

2.3.42 HTS_Engine_synthesize_from_strings

Type        HTS_Boolean
Use         Synthesize speech from string list.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            char **lines - label string list
            size_t num_lines - # of lines

2.3.43 HTS_Engine_generate_from_fn

Type        HTS_Boolean
Use         Generate state sequence from file name (1/3 synthesis step).
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            char *fn - label file name


2.3.44 HTS_Engine_generate_from_strings

Type        HTS_Boolean
Use         Generate state sequence from string list (1/3 synthesis step).
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            char **lines - label string list
            size_t num_lines - # of lines

2.3.45 HTS_Engine_generate_parameter_sequence

Type        HTS_Boolean
Use         Generate parameter sequence (2/3 synthesis step).
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.46 HTS_Engine_generate_sample_sequence

Type        HTS_Boolean
Use         Generate sample sequence (3/3 synthesis step).
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.3.47 HTS_Engine_save_information

Type        void
Use         Output trace information.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            FILE *fp - output file pointer

2.3.48 HTS_Engine_save_label

Type        void
Use         Output label with time.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            FILE *fp - output file pointer

2.3.49 HTS_Engine_save_generated_parameter

Type        void
Use         Output generated parameter.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            FILE *fp - output file pointer


2.3.50 HTS_Engine_save_generated_speech

Type        void
Use         Output generated speech.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            FILE *fp - output file pointer

2.3.51 HTS_Engine_save_riff

Type        void
Use         Output RIFF format file.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
            FILE *fp - output file pointer
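
For readers unfamiliar with the RIFF container, the sketch below fills the canonical 44-byte WAVE header for uncompressed PCM. It is not the engine's implementation: demo_put_le and demo_wav_header are hypothetical helpers, and the choice of mono 16-bit samples is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store the low nbytes of v at p in little-endian order. */
void demo_put_le(unsigned char *p, unsigned long v, int nbytes)
{
    int i;

    for (i = 0; i < nbytes; i++)
        p[i] = (unsigned char) ((v >> (8 * i)) & 0xFF);
}

/* Fill a 44-byte WAVE header for mono 16-bit PCM (assumed layout). */
void demo_wav_header(unsigned char *header, unsigned long sampling_frequency,
                     unsigned long nsamples)
{
    unsigned long data_bytes = nsamples * 2;    /* 2 bytes per sample */

    assert(header != NULL);
    memcpy(header, "RIFF", 4);
    demo_put_le(header + 4, 36 + data_bytes, 4);    /* bytes after this field */
    memcpy(header + 8, "WAVE", 4);
    memcpy(header + 12, "fmt ", 4);
    demo_put_le(header + 16, 16, 4);                /* fmt chunk size */
    demo_put_le(header + 20, 1, 2);                 /* format tag: PCM */
    demo_put_le(header + 22, 1, 2);                 /* channels: mono */
    demo_put_le(header + 24, sampling_frequency, 4);
    demo_put_le(header + 28, sampling_frequency * 2, 4);    /* bytes per second */
    demo_put_le(header + 32, 2, 2);                 /* block align */
    demo_put_le(header + 34, 16, 2);                /* bits per sample */
    memcpy(header + 36, "data", 4);
    demo_put_le(header + 40, data_bytes, 4);        /* size of sample data */
}
```

The sample data, as little-endian 16-bit values, would follow directly after these 44 bytes.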

2.3.52 HTS_Engine_refresh

Type        void
Use         Free the label, state streams, PDF streams and generated parameter streams after each synthesis.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure

2.4 Free engine


2.4.1 HTS_Engine_clear

Type        void
Use         Free engine.
Arguments   HTS_Engine *engine - pointer to HTS_Engine structure
