United States Patent (10) Patent No.: US 7,672,840 B2
Sasaki et al. (45) Date of Patent: Mar. 2, 2010

(54) VOICE SPEED CONTROL APPARATUS

(75) Inventors: Hitoshi Sasaki, Kawasaki (JP); Hiroshi Katayama, Kawasaki (JP); Rika Nishiike, Kawasaki (JP)

(73) Assignee: Fujitsu Limited, Kawasaki (JP)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 318 days.

(21) Appl. No.: 11/653,952

(22) Filed: Jan. 17, 2007

(65) Prior Publication Data
US 2007/0118363 A1, May 24, 2007

Related U.S. Application Data
(63) Continuation of application No. PCT/JP2004/010340, filed on Jul. 21, 2004.

(51) Int. Cl.
G10L 21/04 (2006.01)
G10L 11/02 (2006.01)

(52) U.S. Cl.: 704/215; 704/226; 704/503

(58) Field of Classification Search: 704/210, 704/215, 226, 233, 240, 253, 502, 503, 504
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,305,420 A * 4/1994 Nakamura et al. ... 704/271
5,611,018 A 3/1997 Tanaka et al. ... 704/215
5,699,481 A * 12/1997 Shlomot et al. ... 704/228
5,809,454 A * 9/1998 Okada et al.
5,842,123 A * 11/1998 Hamamoto et al. ... 455/412.2
5,995,925 A * 11/1999 Emori ... 704/208
6,324,509 B1 * 11/2001 Bi et al. ... 704/248
6,374,213 B2 4/2002 Imai et al. ... 704/233
6,377,931 B1 * 4/2002 Shlomot ... 704/503
6,711,536 B2 * 3/2004 Rees ... 704/210
6,782,363 B2 * 8/2004 Lee et al. ... 704/248
7,412,376 B2 * 8/2008 Florencio et al. ... 704/206
7,480,613 B2 * 1/2009 Kellner ... 704/231
(Continued)

FOREIGN PATENT DOCUMENTS
EP 0944036 9/1999
(Continued)

OTHER PUBLICATIONS
Supplementary European Search Report dated Aug. 27, 2008.
(Continued)

Primary Examiner: Martin Lerner
(74) Attorney, Agent, or Firm: Kratz, Quintos & Hanson, LLP

7 Claims, 30 Drawing Sheets

(57) ABSTRACT
A voice speed control apparatus comprising: an utterance/non-utterance judging unit judging whether a processing target segment in an inputted voice signal is an utterance segment or a non-utterance segment; a non-utterance continuation length acquiring unit acquiring a non-utterance continuation length representing a length of the voice signal judged continuously to be the non-utterance; a determining unit determining a reproducing speed of the processing target segment in the voice signal in accordance with the non-utterance continuation length so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller; and a changing unit changing the reproducing speed of the voice signals corresponding to the reproducing speed.
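The determination rule in the abstract can be illustrated with a small sketch. The linear ramp, the 2.0x ceiling and the 400 ms ramp length below are assumptions chosen for illustration, not values taken from the patent; any monotonically increasing curve that keeps the speed near 1.0x for short non-utterance runs would fit the same description.

```python
def reproducing_speed(non_utterance_ms: float,
                      max_speed: float = 2.0,
                      ramp_ms: float = 400.0) -> float:
    """Hypothetical speed for a segment judged to be non-utterance.

    The speed rises with the non-utterance continuation length, so short
    runs (where a misjudged utterance endpoint is likely) are barely sped up,
    while long runs approach max_speed.
    """
    if non_utterance_ms <= 0.0:
        return 1.0  # no non-utterance run yet: normal speed
    fraction = min(non_utterance_ms / ramp_ms, 1.0)
    return 1.0 + (max_speed - 1.0) * fraction
```

With these assumed parameters a 20 ms run is reproduced at only 1.05x, whereas a run of 400 ms or more reaches the full 2.0x.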
[Front-page drawing: block diagram of a voice speed control apparatus comprising an utterance/non-utterance judging unit, a continuation time calculating unit, a delay quantity acquiring unit, a voice speed determining unit and a voice speed control unit; signals labelled: input signal, result of judgment, non-utterance continuation time/judgment result, delay quantity, accumulated delay quantity, voice speed, voice speed controlled signal]
U.S. PATENT DOCUMENTS
2001/0032072 A1 10/2001 Inoue
2002/0004722 A1 1/2002 Inoue
2003/0028375 A1 2/2003 Kellner ... 704/235
2007/0265839 A1 * 11/2007 Sasaki et al. ... 704/201
2007/0276657 A1 * 11/2007 Gournay et al. ... 704/203

FOREIGN PATENT DOCUMENTS
JP 8-292796 11/1996
JP 9-73299 3/1997
JP 9-147472 6/1997
JP 2000-244972 9/2000
JP 2001-109499 4/2001
JP 2001-154684 6/2001
JP 2001-184100 7/2001
JP 2001-255894 9/2001
JP 2001-318700 11/2001
JP 2003-216200 7/2003

OTHER PUBLICATIONS
Notice of Reason for Rejection dated Jul. 28, 2009 issued in corresponding Japanese application No. 2006-527702 with English translation.

* cited by examiner
[FIG. 2: input signal divided into 20 ms frames, labelled INPUT SIGNAL (n-1), INPUT SIGNAL (n), INPUT SIGNAL (n+1)]
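FIG. 2 shows the input signal divided into 20 ms frames, and the summary notes that the non-utterance continuation length may be counted in frames. A minimal sketch of that framing, assuming an 8 kHz sampling rate (an assumption; no rate is stated here) and discarding any trailing partial frame; the function names are hypothetical:

```python
from typing import List, Sequence

def split_into_frames(samples: Sequence[float],
                      sample_rate_hz: int = 8000,
                      frame_ms: int = 20) -> List[Sequence[float]]:
    """Cut a signal into consecutive, non-overlapping fixed-length frames."""
    frame_len = sample_rate_hz * frame_ms // 1000  # 160 samples at 8 kHz
    last_start = len(samples) - frame_len
    # A trailing remainder shorter than one frame is dropped.
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def frames_to_ms(frame_count: int, frame_ms: int = 20) -> int:
    """Convert a non-utterance continuation length in frames to milliseconds."""
    return frame_count * frame_ms
```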
[FIG. 5: flowchart. START; S01 utterance/non-utterance judgment; S02 result of judgment? If non-utterance: calculate non-utterance continuation time, then determine voice speed in non-utterance section. If utterance: S07 determine voice speed in utterance section. Then S05 voice speed control process; S06 output voice speed controlled signal]
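The flow of FIG. 5 can be sketched as a per-frame loop that resets a counter on utterance frames and otherwise derives the speed from the running non-utterance continuation time. The 20 ms frame, the 1.0x utterance speed and the linear ramp are illustrative assumptions, and the function names are hypothetical.

```python
from typing import List

FRAME_MS = 20.0  # assumed frame length (cf. the 20 ms frames of FIG. 2)

def non_utterance_speed(continuation_ms: float,
                        max_speed: float = 2.0,
                        ramp_ms: float = 400.0) -> float:
    """Hypothetical ramp: short non-utterance runs are barely sped up."""
    return 1.0 + (max_speed - 1.0) * min(continuation_ms / ramp_ms, 1.0)

def control_speeds(is_utterance: List[bool],
                   utterance_speed: float = 1.0) -> List[float]:
    """Return one reproducing speed per frame, following the FIG. 5 steps."""
    speeds = []
    run_frames = 0  # frames judged continuously to be non-utterance
    for uttered in is_utterance:              # S01/S02: per-frame judgment
        if uttered:
            run_frames = 0
            speeds.append(utterance_speed)    # S07: utterance section
        else:
            run_frames += 1                   # continuation time, in frames
            speeds.append(non_utterance_speed(run_frames * FRAME_MS))
    return speeds                             # S05/S06 would apply these
```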
[FIG. 7A: graph; drawing content not recoverable from the scan]
[FIG. 12: flowchart (second embodiment). START; acquire delay; S01 utterance/non-utterance judgment; S02 result of judgment? If non-utterance: calculate non-utterance continuation time, then determine voice speed in non-utterance section. If utterance: S07 determine voice speed in utterance section. Then S05 voice speed control process; S06 output voice speed controlled signal]
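FIG. 12 adds a delay acquisition step. The following is a hedged sketch of the configuration in which utterance frames are slowed down, the resulting delay is accumulated, and the maximum speed in non-utterance frames grows with that accumulated delay before being combined with the non-utterance continuation time. The 0.8x slowdown, the 500 ms-per-step growth, the 3.0x cap and the 400 ms ramp are all assumptions, not values from the patent.

```python
from typing import List, Tuple

FRAME_MS = 20.0    # assumed frame length
SLOW_SPEED = 0.8   # hypothetical slowdown applied to utterance frames
RAMP_MS = 400.0    # hypothetical ramp length for the continuation factor

def max_speed_for_delay(delay_ms: float, cap: float = 3.0,
                        ms_per_step: float = 500.0) -> float:
    """Hypothetical: the maximum speed grows with the accumulated delay."""
    return min(1.0 + delay_ms / ms_per_step, cap)

def simulate(is_utterance: List[bool]) -> Tuple[List[float], float]:
    delay_ms = 0.0   # accumulated delay (delay quantity acquiring unit)
    run_ms = 0.0     # non-utterance continuation time
    speeds = []
    for uttered in is_utterance:
        if uttered:
            run_ms = 0.0
            speed = SLOW_SPEED
        else:
            run_ms += FRAME_MS
            top = max_speed_for_delay(delay_ms)
            speed = 1.0 + (top - 1.0) * min(run_ms / RAMP_MS, 1.0)
        # A FRAME_MS frame played at `speed` lasts FRAME_MS / speed ms, so
        # the delay changes by the difference from real time (never < 0).
        delay_ms = max(delay_ms + FRAME_MS / speed - FRAME_MS, 0.0)
        speeds.append(speed)
    return speeds, delay_ms
```

When no delay has accumulated, the maximum speed stays at 1.0x and no speedup occurs, which mirrors the statement that the increase in reproducing speed is restrained when the accumulated delay is small.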
[FIG. 16: flowchart. START; S10 utterance/non-utterance judging process; S11 judged result of processing target frame? If utterance: S19 determine voice speed in utterance section. If non-utterance: calculate non-utterance continuation time in past direction; calculate continuation time in future direction; is the non-utterance continuation time in future direction above a threshold? YES: determine voice speed based on non-utterance continuation time in past direction; otherwise: determine voice speed based on non-utterance continuation time in future direction. Then S17 voice speed control process; S18 output voice speed controlled signal]
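The third-embodiment flow of FIG. 16 looks ahead as well as back. Below is a hedged sketch, assuming a 10-frame lookahead window, a 10-frame threshold and the same hypothetical linear ramp; the patent text states only that, below a threshold, the speed becomes slower as the future-directional continuation length becomes smaller. All function names are hypothetical.

```python
from typing import List

def past_run(judgments: List[bool], i: int) -> int:
    """Frames judged continuously to be non-utterance, ending at frame i."""
    n = 0
    while i - n >= 0 and not judgments[i - n]:
        n += 1
    return n

def future_run(judgments: List[bool], i: int, lookahead: int) -> int:
    """Non-utterance frames from frame i onward, limited to the lookahead."""
    n = 0
    while n < lookahead and i + n < len(judgments) and not judgments[i + n]:
        n += 1
    return n

def ramp(frames: int, max_speed: float = 2.0, full_frames: int = 20) -> float:
    """Hypothetical speed curve: rises linearly with the run length."""
    return 1.0 + (max_speed - 1.0) * min(frames / full_frames, 1.0)

def speed_at(judgments: List[bool], i: int,
             lookahead: int = 10, threshold: int = 10) -> float:
    """Speed for frame i; True in `judgments` means 'judged to be utterance'."""
    if judgments[i]:
        return 1.0
    future = future_run(judgments, i, lookahead)
    if future < threshold:
        # An utterance (possibly a misjudged starting point) is close ahead:
        # slow down as the remaining non-utterance run shrinks.
        return ramp(future)
    return ramp(past_run(judgments, i))
```

For a 30-frame pause, a frame in the middle is sped up on the basis of the past run, while the last frame before the next utterance falls back to nearly normal speed.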
FIG. 19
Degree of reliability          | Result of subtraction
HIGH                           | less than 0 dB
INTERMEDIATE                   | equal to or larger than 0 dB but less than 3 dB
LOW                            | equal to or larger than 3 dB but less than 6 dB
NAUGHT (judged to be uttered)  | equal to or larger than 6 dB
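The thresholds of FIG. 19 translate directly into a lookup. The sketch below uses hypothetical names, and its averaging of past non-utterance power values in decibels is an assumption. The patent text only says that the average power of past non-utterance segments is subtracted from the power of the processing target segment.

```python
from typing import Sequence

def subtraction_result(frame_power_db: float,
                       past_noise_power_db: Sequence[float]) -> float:
    """Frame power minus the average power of past non-utterance frames.

    Averaging in the dB domain is an assumption made for this sketch.
    """
    average = sum(past_noise_power_db) / len(past_noise_power_db)
    return frame_power_db - average

def reliability(diff_db: float) -> str:
    """Degree of reliability of a non-utterance judgment, per FIG. 19."""
    if diff_db < 0.0:
        return "HIGH"
    if diff_db < 3.0:
        return "INTERMEDIATE"
    if diff_db < 6.0:
        return "LOW"
    return "NAUGHT"  # 6 dB or more: judged to be uttered
```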
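[FIG. 20: table relating the degree of reliability (rows legible in the scan include INTERMEDIATE and NAUGHT (judged to be uttered)) to the maximum voice speed; values not recoverable from the scan]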
[FIG. 21: flowchart. START; S01 utterance/non-utterance judgment; S20 acquire degree of reliability; S02 result of judgment? If non-utterance: calculate non-utterance continuation time, then determine voice speed in non-utterance section. If utterance: S07 determine voice speed in utterance section. Then S05 voice speed control process; S06 output voice speed controlled signal]
[FIG. 23: graph of voice speed (multiplying factor) against signal-to-noise ratio (dB); axis ticks at 10 dB and 20 dB]
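A hedged sketch of a FIG. 23 style relation. The 10 dB and 20 dB breakpoints echo the axis ticks visible in the scan, but the linear shape and the 1.0x and 2.0x speed limits are assumptions; the function names are hypothetical.

```python
import math

def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in dB from linear power values."""
    return 10.0 * math.log10(signal_power / noise_power)

def max_speed_from_snr(snr: float,
                       low_db: float = 10.0, high_db: float = 20.0,
                       low_speed: float = 1.0, high_speed: float = 2.0) -> float:
    """Hypothetical curve: a higher SN ratio allows a higher maximum speed."""
    if snr <= low_db:
        return low_speed   # noisy signal: misjudgment likely, keep speed low
    if snr >= high_db:
        return high_speed  # clean signal: speed up to obviate the delay
    t = (snr - low_db) / (high_db - low_db)
    return low_speed + (high_speed - low_speed) * t
```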
[FIG. 24: flowchart. START; acquire signal-to-noise ratio; S01 utterance/non-utterance judgment; S02 result of judgment? If non-utterance: calculate non-utterance continuation time, then determine voice speed in non-utterance section. If utterance: S07 determine voice speed in utterance section. Then S05 voice speed control process; S06 output voice speed controlled signal]
[FIG. 26 (PRIOR ART): graph of voice speed (multiplying factor, 1.0 to 2.0) against non-utterance continuation time, with threshold t1]
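For comparison, the conventional step rule of FIG. 26 can be sketched as follows. The value t1 = 200 ms is hypothetical; 2.0x is the example value given in the description. The helper measuring the largest frame-to-frame change is also hypothetical.

```python
from typing import List

def prior_art_speed(continuation_ms: float,
                    t1_ms: float = 200.0, fast_speed: float = 2.0) -> float:
    """Conventional rule: 1-fold inside the protection section, then a
    fixed faster speed once the continuation time exceeds t1."""
    return 1.0 if continuation_ms <= t1_ms else fast_speed

def largest_jump(speeds: List[float]) -> float:
    """Largest change in speed between consecutive frames."""
    return max(abs(b - a) for a, b in zip(speeds, speeds[1:]))
```

Sampled every 20 ms, this rule jumps by a full 1.0x at t1. An utterance endpoint misjudged just beyond t1 is therefore reproduced at double speed at once.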
VOICE SPEED CONTROL APPARATUS

CROSS-REFERENCE TO RELATED APPLICATION

This is a continuation of Application PCT/JP2004/010340, filed on Jul. 21, 2004, now pending, the contents of which are herein wholly incorporated by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technology that is effectively applied to an apparatus, a method and a program that change the reproducing speed of a voice without changing its tone pitch.

2. Description of the Related Art

There has hitherto been proposed a technology for making the content of a conversation easier to hear by slowing down the speed of the conversation (which will hereinafter be called the "voice speed") without changing the pitch of the conversing partner's voice. If the voice speed is simply slowed down, however, a delay corresponding to the slowdown occurs. To solve this problem, technologies have been proposed that obviate the delay by diminishing a non-utterance section (a section in which a sound such as a human voice is not uttered) existing in the middle of the conversation (an intermission or pause) and by increasing the voice speed in the non-utterance section (refer to Patent documents 1 and 2).

FIG. 25 is a diagram showing an example of function blocks of a conventional voice speed control apparatus P1. In the conventional voice speed control apparatus P1, for a section judged to be non-utterance (i.e., the non-utterance section) by an utterance/non-utterance judging unit P2, a continuation time calculating unit P3 calculates the length of continuation time of this non-utterance section. Then, a voice speed determining unit P4 determines whether or not the voice speed should be increased according to the continuation time of the non-utterance section, and a voice speed control unit P5 controls the voice speed in the non-utterance section.

FIG. 26 is a graphic chart for explaining a conventional mechanism of how the voice speed is controlled. In FIG. 26, "t1" represents a non-utterance continuation time threshold value. The section ranging from the start of the non-utterance section up to "t1" is called a protection section. In the protection section, as shown in FIG. 26, the voice speed is in most cases kept at, e.g., a 1-fold speed without being increased. If the continuation time of the non-utterance section (the non-utterance continuation time) acquired by the continuation time calculating unit P3 exceeds "t1", the voice speed determining unit P4 determines that the voice speed is to be doubled. Then, the voice speed control unit P5 controls the voice speed according to this value (the 2-fold value). The specific 2-fold value is only an example; other values (triple, quintuple, etc.) may also be used. The delay is obviated by such a process.

Patent document 1: Japanese Patent Application Laid-Open Publication No. 2003-216200
Patent document 2: Japanese Patent Application Laid-Open Publication No. 08-292796
Patent document 3: Japanese Patent Application Laid-Open Publication No. 2000-244972

When executing the process of diminishing the non-utterance section and the process of increasing the voice speed in the non-utterance section, however, it is necessary to take account of the accuracy of the utterance/non-utterance judgment. For instance, misjudgment might occur in the utterance/non-utterance judgment under a noisy environment. FIG. 27 is a graph showing an example of an inputted voice under an environment with no noise. FIG. 28 is a graph showing an example of the inputted voice under an environment with noise. In FIGS. 27 and 28, each upper graph shows a power value, while each lower graph shows an example of a result of the utterance/non-utterance judgment. Under the environment with no noise, the utterance/non-utterance judgment is made precisely, including at the utterance starting points and the utterance endpoints. Under the noisy environment, however, the noise level may take a value approximate to or exceeding the power value at the utterance starting points and at the utterance endpoints, and in this case the utterance starting points and the utterance endpoints are absorbed by the noise. Hence, under the noisy environment, it is difficult to achieve a precise utterance/non-utterance judgment. For example, under the noisy environment, there is a high possibility that voice elements exhibiting small voice power, such as those at the utterance starting points and at the utterance endpoints, are misjudged to be unuttered in spite of being uttered (e.g., the voice elements depicted by dotted lines in the lower graph in FIG. 28). Besides the utterance starting points and the utterance endpoints, the voice elements with small voice power are exemplified by unvoiced consonants.

If the process of diminishing the non-utterance section and the process of increasing the voice speed are executed on the basis of such a misjudgment, problems arise in that a voice element vanishes and the non-utterance continuation length is excessively reduced. FIG. 29 is an explanatory graph showing the problems caused when the process of diminishing the non-utterance section and the process of increasing the voice speed in the non-utterance section are executed on the basis of the misjudgment. In FIG. 29A, the utterance starting points and the utterance endpoints are judged accurately because there is no noise. Hence, the process of diminishing the non-utterance section existing between the utterance starting point and the utterance endpoint and the process of increasing the voice speed are properly carried out. In FIG. 29B, on the other hand, the utterance starting point(s) and the utterance endpoint(s) are misjudged due to the noise. Therefore, in the case of FIG. 29B, the process of diminishing the non-utterance section is executed without taking account of the utterance endpoint judged to be the non-utterance section (the waveform of the utterance endpoint depicted by the dotted line) and the utterance starting points judged to be the non-utterance section (the two waveforms of the utterance starting points drawn by the dotted lines, illustrated superimposed on the waveform of the utterance endpoint). As a result, the non-utterance section between the utterance starting point and the utterance endpoint depicted by the dotted lines becomes excessively short, and in the illustrated case a voice element vanishes because one (or both) of the utterance starting point and the utterance endpoint is cut off. Further, increasing the voice speed in the non-utterance section, as compared with diminishing the non-utterance section, prevents the utterance starting points and the utterance endpoints from being lost. However, the problem that the utterance starting points and the utterance endpoints are hard to hear still remains unsolved.

This problem, especially at the utterance endpoints, can be obviated to some extent by providing a protection section. FIG. 30 is a graph showing an example of how the
voice speed is controlled in the case of providing the protection section. If the misjudged portion at an utterance endpoint extends beyond the protection section, the problem of the utterance endpoint being hard to hear is not obviated. In this case, one might consider setting the protection section comparatively long. In the protection section, however, the voice speed is basically not increased, and hence an excessively long protection section hinders the obviation of the delay and is therefore undesirable.

SUMMARY OF THE INVENTION

Such being the case, it is an object of the present invention, which solves these problems, to provide an apparatus capable of making the utterance endpoints easy to hear under the noisy environment, even in the case where the utterance endpoints in particular are misjudged to be the non-utterance section, while obviating the delay.

The present invention takes the following configurations in order to solve the problems. A first aspect of the present invention is a speed control apparatus comprising an utterance/non-utterance judging unit, a non-utterance continuation length acquiring unit, a determining unit and a changing unit. The utterance/non-utterance judging unit judges whether a processing target segment in inputted voice signals is an utterance segment or a non-utterance segment. The non-utterance continuation length acquiring unit acquires a non-utterance continuation length representing the length of the voice signal judged continuously to be the non-utterance by the utterance/non-utterance judging unit. The non-utterance continuation length may be expressed in any kind of unit, provided that the unit represents the length of the voice signal. For instance, the non-utterance continuation length may be expressed as the length of time taken to reproduce the voice signal at a normal speed, or as a frame count of the voice signals segmented into a plurality of frames. The determining unit determines a reproducing speed of the processing target segment in the voice signals in accordance with the non-utterance continuation length acquired by the non-utterance continuation length acquiring unit, so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller. The changing unit changes the reproducing speed of the voice signal according to the reproducing speed determined by the determining unit.

Generally, under the noisy environment the utterance endpoints are likely to be judged to be unuttered for reasons such as their small voice power. Accordingly, in the voice signals around the utterance endpoints, if the voice speed is abruptly increased after the non-utterance judgment has been made, the voice speed in a segment of misjudged utterance endpoints is subjected to a sharp increase, and an adverse influence such as the vanishment of the voice element has hitherto been produced. To cope with this problem, according to the first aspect of the present invention, the non-utterance continuation length is acquired, and the reproducing speed in the segment judged to be unuttered is determined according to the acquired non-utterance continuation length. At this time, the voice speed is determined so that the increase in reproducing speed is restrained to a greater degree as the non-utterance continuation time gets shorter. Hence, the degree of speedup is restrained in a segment having a short non-utterance continuation time, i.e., a segment exhibiting a high possibility that the utterance endpoints exist. It is therefore possible to prevent or reduce the adverse influence such as the vanishment of the voice element at the utterance endpoints. On the other hand, the reproducing speed is determined to get higher as the non-utterance continuation length gets larger. Accordingly, for a segment having a long non-utterance continuation time, i.e., a segment exhibiting a low possibility that the utterance endpoints exist, the delay can be efficiently obviated by emphasizing the speedup.

According to the first aspect of the present invention, the speed control apparatus may be configured to further comprise a speed decreasing unit that, if a segment is judged to be utterance by the utterance/non-utterance judging unit, makes the reproducing speed of the voice signal slower than a normal reproducing speed, and a delay quantity acquiring unit cumulatively acquiring the delay quantity generated by the speed decreasing unit. If thus configured, the determining unit determines a maximum value of the reproducing speed on the basis of the accumulated value of the delay quantities acquired by the delay quantity acquiring unit, so that the maximum value of the reproducing speed gets larger as the accumulated value of the delay quantities gets larger, and determines the reproducing speed in the processing target segment in the voice signals according to this maximum value and the non-utterance continuation length.

According to the first aspect of the present invention having this configuration, the speed decreasing unit carries out low-speed reproduction of the segment judged to be uttered, and hence the user can easily hear the voice of the segment judged to be uttered (the uttered segment). Then, the determining unit and the changing unit increase the reproducing speed of the segment judged to be unuttered (the unuttered segment), thereby obviating the delay caused by the speed decreasing unit. At this time, when determining the reproducing speed, the determining unit determines the maximum value according to the accumulated value of the delay quantities generated by the slowdown in the speed decreasing unit. Therefore, the reproducing speed is determined so as to become higher as the accumulated value of the delay quantities becomes larger, thereby effectively obviating the accumulated delay. On the other hand, if the accumulated value of the delay quantities is small, the increase in reproducing speed is restrained, so that priority is given to preventing an adverse influence such as a missing voice element (a skipped voice element) rather than to a speedup that is not particularly required.

According to the first aspect of the present invention, the utterance/non-utterance judging unit may be constructed so as to further make a judgment about predetermined segments in the future direction from the processing target segment in the inputted voice signals. In this case, the non-utterance continuation length acquiring unit is constructed to acquire a future-directional continuation length representing the length of the signal judged continuously to be the non-utterance signal from the processing target segment in the future direction. Furthermore, if the future-directional continuation length is smaller than a threshold value, the determining unit determines the reproducing speed in the processing target segment in accordance with the future-directional continuation length, so that the reproducing speed becomes slower as the future-directional continuation length becomes smaller.

According to the first aspect of the present invention having this configuration, the voice speed of the segment judged to be unuttered is determined based on the non-utterance continuation length in the future direction. To be specific, the determining unit determines the voice speed so that the voice speed becomes slower as the non-utterance continuation length in the future direction becomes shorter. Therefore, the voice
speed of a non-utterance segment close to a segment judged to be uttered is restrained from rising. Consequently, it is feasible to prevent or reduce the adverse influence (such as the vanishment of the voice element) if the utterance starting points are misjudged to be unuttered.

According to the first aspect of the present invention, the utterance/non-utterance judging unit may be constructed so as to further acquire a degree of reliability of the judgment result for each segment to be judged. In this case, the determining unit determines the maximum value of the reproducing speed in accordance with the degree of reliability, so that the maximum value of the reproducing speed gets larger as the degree of reliability gets higher, and determines the reproducing speed in the processing target segment in the voice signals in accordance with the maximum value and the non-utterance continuation length.

According to the first aspect of the present invention having this configuration, the maximum voice speed is determined based on the degree of reliability of the judgment result. Specifically, the maximum voice speed used when determining the voice speed of the non-utterance segment gets higher as the degree of reliability of the judgment result gets higher. Hence, if the degree of reliability of a judgment of being unuttered is low, the maximum voice speed can be kept low. It is therefore possible to reduce an adverse influence such as a skipped voice element when a misjudgment occurs. On the other hand, if the degree of reliability of a judgment of being unuttered is high, the maximum voice speed is set high. Hence, priority is given to increasing the voice speed rather than to reducing the adverse influence of a misjudgment, and the accumulation of delays can be effectively reduced.

According to the first aspect of the present invention, the utterance/non-utterance judging unit may be constructed to subtract the average of the power values of the voice signals in the segments judged in the past to be non-utterance segments from the power value of the voice signal in the processing target segment, and to acquire, based on the result of this subtraction, a degree of reliability that gets higher as the subtracted result gets lower and becomes lower as the subtracted result becomes higher.

According to the first aspect of the present invention, the speed control apparatus may be configured to further comprise a signal-to-noise ratio acquiring unit acquiring a signal-to-noise ratio in the processing target segment in the voice signals. In this case, the determining unit determines a maximum value of the reproducing speed in accordance with the signal-to-noise ratio acquired by the signal-to-noise ratio acquiring unit, so that the maximum value of the reproducing speed gets larger as the signal-to-noise ratio gets higher and becomes smaller as the signal-to-noise ratio becomes lower, and determines the reproducing speed in the processing target segment in the voice signals in accordance with the maximum value and the non-utterance continuation length.

According to the first aspect of the present invention having this configuration, the maximum voice speed is determined according to the signal-to-noise ratio (SN ratio). If the SN ratio is high, i.e., if the signal is clean, there is a low possibility that a misjudgment occurs in the utterance/non-utterance judgment, and therefore the maximum value of the speedup is set high in order to obviate the delay. Conversely, if the SN ratio is low, i.e., if the signal is noisy, there is a high possibility that a misjudgment occurs in the utterance/non-utterance judgment, and hence the maximum value of the speedup is set low, thereby preventing the occurrence of the adverse influence.

A second aspect of the present invention is a speed control apparatus comprising an utterance/non-utterance judging unit, a speed decreasing unit, a delay quantity acquiring unit, a determining unit and a changing unit. The utterance/non-utterance judging unit judges whether a processing target segment in inputted voice signals is an utterance segment or a non-utterance segment. If the segment is judged to be utterance by the utterance/non-utterance judging unit, the speed decreasing unit makes the reproducing speed of the voice signal slower than a normal reproducing speed. The delay quantity acquiring unit cumulatively acquires the delay quantity generated by the speed decreasing unit. The determining unit determines the reproducing speed of the processing target segment in the voice signals on the basis of the accumulated delay quantity acquired by the delay quantity acquiring unit, so that the reproducing speed gets higher as the delay quantity gets larger. The changing unit changes the reproducing speed of the voice signal according to the reproducing speed determined by the determining unit.

According to the second aspect of the present invention having this configuration, the increase in reproducing speed is based not on the non-utterance continuation length but on the accumulated value of the delay quantities, and substantially the same effect as in the first aspect of the present invention is exhibited.

A third aspect of the present invention is a speed control apparatus comprising a unit judging whether an inputted voice signal is an utterance or a non-utterance; and a unit decreasing a reproducing speed in an utterance section in the voice signal while performing one of diminishing a non-utterance section in the voice signal and increasing the reproducing speed in the non-utterance section.

The first aspect through the third aspect may be implemented by an information processing apparatus executing a program. Namely, the present invention can be specified as a program for making the information processing apparatus execute the processes executed by the respective units in the first aspect through the third aspect described above, or as a recording medium on which the program is recorded. Further, the present invention may also be specified as a method by which the information processing apparatus executes the processes executed by the respective units described above.

According to the present invention, it is possible to prevent or reduce an adverse influence such as the vanishment of the voice element at the utterance endpoints. On the other hand, in a segment having a low possibility that the utterance endpoints exist, the delay can be efficiently obviated by emphasizing the speedup.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing an example of function blocks in a first embodiment of a voice speed control apparatus;
FIG. 2 is a diagram showing an example of input signals;
FIG. 3 is a graphic chart showing an example of how a voice speed is controlled by a voice speed determining unit if judged to be unuttered;
FIG. 4 is a graphic chart showing an example of how the voice speed is controlled by the voice speed determining unit if judged to be unuttered;
FIG. 5 is a flowchart showing an operational example in the first embodiment of the voice speed control apparatus;
FIG. 6 is a graphic chart showing an example of how the voice speed is controlled in the first embodiment of the voice speed control apparatus;
FIGS. 7A, 7B, 7C, and 7D are graphs showing an example of an effect in the first embodiment of the voice speed control apparatus;
FIG. 8 is a diagram showing an example of function blocks in a second embodiment of the voice speed control apparatus;
FIG. 9 is a graphic chart showing a relationship between an accumulated delay quantity and the voice speed;
FIG. 10 is a graphic chart showing a relationship between non-utterance continuation time and the voice speed;
FIG. 11 is a graphic chart showing a relationship between the non-utterance continuation time and the voice speed;
FIG. 12 is a flowchart showing an operational example in the second embodiment of the voice speed control apparatus;
FIG. 13 is a diagram showing an example of function blocks in a third embodiment of a voice speed control apparatus;
FIG. 14 is a diagram showing an example of a frame in which the utterance/non-utterance judging unit acquires a result of utterance/non-utterance judgment in the third embodiment;
FIGS. 15A and 15B are graphs showing a relationship between the voice speed determined by the voice speed determining unit and the non-utterance continuation time in the third embodiment;
FIG. 16 is a flowchart showing an operational example in the third embodiment of the voice speed control apparatus;
FIG. 17 is a graphic chart showing an example of how the voice speed is controlled in the third embodiment of the voice speed control apparatus;
FIG. 18 is a diagram showing an example of function blocks in a fourth embodiment of the voice speed control apparatus;
FIG. 19 is a table showing a relational example between a degree of reliability and a result of subtraction;
FIG. 20 is a table showing a relational example between a degree of reliability and a maximum voice speed;
FIG. 21 is a flowchart showing an operational example in the fourth embodiment of the voice speed control apparatus;
FIG. 22 is a diagram showing an example of function blocks in a fifth embodiment of the voice speed control apparatus;
FIG. 23 is a graph showing a relational example between a signal-to-noise ratio and the maximum voice speed;
FIG. 24 is a flowchart showing an operational example in

DESCRIPTION OF THE REFERENCE NUMERALS AND SYMBOLS

1a, 1b, 1c, 1d, 1e voice speed control apparatus
2a, 2c, 2d utterance/non-utterance judging unit
3a, 3c continuation time calculating unit
4a, 4b, 4c, 4d, 4e voice speed determining unit
5a, 5b voice speed control unit
6 delay quantity acquiring unit
7 signal-to-noise ratio acquiring unit
P1 voice speed control apparatus
P2 utterance/non-utterance judging unit
P3 continuation time calculating unit
P4 voice speed determining unit
P5 voice speed control unit

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

First Embodiment

System Architecture

To start with, an example of a configuration of a voice speed control apparatus 1a will be explained as a first embodiment of a voice speed control apparatus 1. In terms of hardware, the voice speed control apparatus 1a includes a CPU (Central Processing Unit), a main storage device (RAM) and an auxiliary storage device, which are connected to each other via a bus. The auxiliary storage device is constructed by use of a nonvolatile storage device. The nonvolatile storage device referred to herein is a so-called ROM (Read-Only Memory, including an EPROM (Erasable Programmable Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), a mask ROM, etc.), an FRAM (Ferroelectric RAM), a hard disc and so on.

FIG. 1 is a diagram showing an example of function blocks of the voice speed control apparatus 1a. The voice speed control apparatus 1a functions as an apparatus including an utterance/non-utterance judging unit 2a, a continuation time calculating unit 3a, a voice speed determining unit 4a and a voice speed control unit 5a when a variety of programs (an OS, applications, etc.) stored in the auxiliary storage device are loaded into the main storage device and executed by the CPU. The CPU executes the programs, thereby implementing the utterance/non-utterance judging unit 2a, the continuation time calculating unit 3a, the voice speed determining unit 4a and the voice speed control unit 5a. Further, the utterance/non-utterance judging unit 2a, the continuation time calculating unit 3a, the voice speed determin
the fifth embodiment of the voice speed control apparatus; 50 ing unit 4a and the Voice speed control unit 5a may also be
FIG. 25 is a diagram showing an example of function constructed as dedicated chips. Next, the respective function
blocks of a conventional Voice speed control apparatus; units included in the Voice speed control apparatus la will be
FIG. 26 is an explanatory graphic chart of a conventional explained.
mechanism for controlling the Voice speed; <Utterance/Non-Utterance Judging United
FIG. 27 is a graph showing an example of an input voice
55 The utterance/non-utterance judging unit 2a judges
whether a processing target segment (frame) of input signals
under an environment with no noise; inputted to the Voice speed control apparatus 1a is a voice
FIG. 28 is a graph showing an example of then input voice utterance segment or a voice non-utterance segment (this
under an environment with a noise; process is called a utterance/non-utterance judgment). Any
FIGS. 29A and 29B are graphs for explaining a problem 60 type of existing utterance/non-utterance judging technolo
occurred in the case of executing a process of diminishing a gies may be applied to the utterance/non-utterance judging
non-utterance section and a process of increasing the Voice unit 2a. A specific example of the process executed by the
speed in a non-utterance section on the basis of misjudgment; utterance/non-utterance judging unit 2a will hereinafter be
and described.
FIG. 30 is a graph showing an example of how the voice 65 FIG. 2 is a diagram showing an example of the input
speed is controlled in the case of providing a protection sec signals. FIG. 2 shows the example of the input signals in the
tion. case of executing the process on a frame-by-frame basis,
wherein one frame is on the order of 20 ms (160 samples when the sampling rate is 8 kHz). Namely, in this case, each of the frames illustrated in FIG. 2 becomes the processing target segment (frame).

For example, the utterance/non-utterance judging unit 2a can judge whether the voice is uttered or unuttered based on a power value of the input signal. Let the processing target frame be an input signal (n). First, the utterance/non-utterance judging unit 2a calculates a power value of the input signal (n), e.g., a sum-of-squares mean value of the input signal (n). Next, it calculates a non-utterance mean power value (a mean in units of dB) of the past frames (an input signal (n-1), an input signal (n-2), . . . ). Subsequently, it subtracts the non-utterance mean power value of the past frames from the power value of the input signal (n). Then, it makes the utterance/non-utterance judgment depending on whether or not the result of this subtraction is larger than a threshold value.

Note that the process of making the utterance/non-utterance judgment described above is given by way of one example, and the utterance/non-utterance judgment may be implemented by applying other processes. The utterance/non-utterance judging unit 2a transfers the result of the judgment to the continuation time calculating unit 3a.

<Continuation Time Calculating Unit>

The continuation time calculating unit 3a calculates a period of time called the non-utterance continuation time. This is the length of the portion of the input signal in which a non-utterance state continues, measured when the signal is reproduced at a normal speed. Namely, the continuation time calculating unit 3a calculates the period of time for which the utterance/non-utterance judging unit 2a continuously judges that there is a non-utterance state. The continuation time calculating unit 3a transfers the thus-calculated non-utterance continuation time to the voice speed determining unit 4a.

It is to be noted that another scheme may also be taken, in which another value is used as the length of the portion of the input signal in which the non-utterance state continues. Examples are the number of frames (frame count) judged to be unuttered, or an amplitude count.

<Voice Speed Determining Unit>

If the result of the judgment by the utterance/non-utterance judging unit 2a shows the non-utterance state, the voice speed determining unit 4a determines a voice speed in accordance with the non-utterance continuation time calculated by the continuation time calculating unit 3a. FIGS. 3 and 4 are graphic charts each showing an example of how the voice speed determining unit 4a controls the voice speed when the non-utterance state is judged. In FIGS. 3 and 4, a graph depicted by a bold broken line represents the relationship between the voice speed determined by the voice speed determining unit 4a and the non-utterance continuation time.

The voice speed determining unit 4a determines the voice speed so that the voice speed gets faster as the non-utterance continuation time gets longer. For instance, it may raise the multiplying factor of the voice speed linearly up to a 2-fold speed. The rise starts when the non-utterance continuation time exceeds a threshold value t2 and continues up to a threshold value t3 (see FIG. 3). In this case, for example, the voice speed determining unit 4a may calculate the voice speed from the non-utterance continuation time on the basis of a mathematical expression representing the relationship between the non-utterance continuation time and the voice speed. The voice speed may be set to increase linearly from t2 to t3, or it may be set to increase non-linearly.

Further, the voice speed determining unit 4a may increase the voice speed up to a 2-fold speed in multiple stages (five stages in FIG. 4) in accordance with the non-utterance continuation time (see FIG. 4). In this case, the voice speed determining unit 4a may determine the voice speed in stages on the basis of a plurality of threshold values (t1 to t5) provided for the non-utterance continuation time. The voice speed determining unit 4a transfers the determined voice speed to the voice speed control unit 5a.

Moreover, if the utterance/non-utterance judging unit 2a judges the voice to be uttered, the voice speed determining unit 4a determines a voice speed in an utterance section. For example, the voice speed determining unit 4a sets the voice speed in the utterance section slower than a normal voice speed (e.g., slower than a 1-fold speed). With this scheme, a user can hear the voice in the utterance section more easily.

<Voice Speed Control Unit>

The voice speed control unit 5a changes the voice speed of the input signal according to the voice speed determined by the voice speed determining unit 4a. To be specific, the voice speed control unit 5a changes the voice speed in the processing target frame, i.e., the frame subjected to the utterance/non-utterance judgment made by the utterance/non-utterance judging unit 2a. Then, the voice speed control unit 5a outputs the signal whose voice speed has been changed as a voice-speed-controlled signal.

OPERATIONAL EXAMPLE

FIG. 5 is a flowchart showing an operational example of the voice speed control apparatus 1a. The operational example will hereinafter be explained with reference to FIG. 5.

Upon a start of the process, to begin with, the input signals are inputted to the voice speed control apparatus 1a. Then, the utterance/non-utterance judging unit 2a makes the utterance/non-utterance judgment about the processing target frame in the input signals (S01).

If, as a result of this judgment, the processing target frame is judged to be a non-utterance frame (S02: non-utterance), the continuation time calculating unit 3a calculates the non-utterance continuation time (S03). Next, the voice speed determining unit 4a determines a voice speed in the non-utterance section on the basis of this non-utterance continuation time (S04).

On the other hand, if the processing target frame is judged in S02 to be an utterance frame (S02: utterance), the voice speed determining unit 4a determines the voice speed in the processing target frame as the utterance section (S07).

Then, the voice speed control unit 5a executes a voice speed control process on the processing target frame (S05) and outputs the voice-speed-controlled signal (S06).

Operation/Effect

The voice speed control apparatus 1a determines the voice speed in the non-utterance section so that it becomes faster according to the length of the non-utterance continuation time. FIG. 6 is a graphic chart showing an example of how the voice speed control apparatus 1a controls the voice speed. Suppose, for instance, that a misjudgment of utterance endpoints occurs beyond a protection section. With the scheme described above, these utterance endpoints are hardly sped up, because the degree of speedup is restrained (e.g., the voice speed is changed to a level close to the 1-fold speed). It is therefore possible to obviate two problems. The first is that the voice elements at the utterance endpoints become difficult to hear. The second is that some voice elements vanish at the utterance
endpoints. Further, since the voice speed increases according to the length of the non-utterance continuation time, the elimination of the delay is not hindered. Moreover, as shown in FIG. 6, the apparatus restrains the speedup at the next utterance starting points (or at part of the utterance starting points), as the case may be. The voice utterance can thereby be made easier to hear than with the prior art.

Further, the voice speed control apparatus 1a can obviate problems such as the vanishment of voice elements without greatly increasing the delay quantity. This is done by changing the length of the protection section and the gradient of the voice speed change rate after the protection section. In other words, the voice speed can be finely controlled, and, if a misjudgment occurs, its adverse influence can be minimized. For instance, when real-time processing is emphasized, it is effective to shorten the protection section while increasing the gradient.

Moreover, for example, in FIG. 3, the protection section extending up to "t1" is reduced to "t2", which is shorter than "t1". The length of time from "t2" to "t1" is then made equal to the length of time from "t1" to "t3". In this way, the quantity of delay eliminated in the voice speed control apparatus 1a is made equal to that eliminated in the prior art, while the vanishment of voice elements is prevented.

Furthermore, FIG. 7 is a diagram showing an example of the effect of the voice speed control apparatus 1a. FIGS. 7(a) through 7(d) are graphs depicting waveforms of an uttered voice such as "ara-ta-ni-ha-kku-tsu-shi" ("it is newly excavated"). FIG. 7(a) shows the waveform with no processing applied either in the utterance section or in the non-utterance section, i.e., it shows the input signals themselves. In FIGS. 7(b) through 7(d), the voice speed is slowed down in the utterance section. FIG. 7(b) illustrates an output waveform obtained with the prior art that diminishes the non-utterance section. FIG. 7(c) shows an output waveform obtained with the prior art that increases the voice speed in the non-utterance section. FIG. 7(d) shows an output waveform (a waveform of a voice speed change signal) obtained when the voice speed control apparatus 1a increases the voice speed in the non-utterance section.

As understood from FIG. 7(a), the voice element "tsu", one of the utterance endpoints in "ha-kku-tsu-shi", shows a small power value. Hence, this voice element "tsu" might be misjudged to be unuttered. FIGS. 7(b) and 7(c) show that this misjudgment causes the voice element to vanish. In FIG. 7(d), on the other hand, the voice element does not vanish even if "tsu" is misjudged to be unuttered, because a rapid increase in voice speed in the non-utterance section is prevented.

Moreover, in bidirectional communications there has hitherto been a problem pertaining to the delay time. When the reproducing speed is changed, the delay increases before and after the change, and communication becomes hard to carry out or cannot be established. This type of problem can also be solved.

MODIFIED EXAMPLE

The voice speed control apparatus 1a, as shown in FIG. 6, executes a process of decreasing the voice speed in the utterance section, but it may also be configured not to execute this process. Namely, the voice speed control apparatus 1a executes the process of increasing the voice speed in the non-utterance section, but it may be configured not to execute the process of decreasing the voice speed in the utterance section. For instance, it is effective to apply the voice speed control apparatus 1a to a system such as an IP telephone (Internet Protocol telephone), in which a delay arises from causes other than slowing down the voice speed in the utterance section. In this case, a delay that occurs in the IP telephone for a reason other than the voice speed control can be eliminated.

Further, when decreasing the voice speed in the utterance section, the voice speed control apparatus 1a may be configured to calculate the continuation time in the utterance section. It may then determine the voice speed in the utterance section according to this continuation time.

Second Embodiment

System Architecture

Next, an example of a configuration of a voice speed control apparatus 1b will be explained by way of a second embodiment of the voice speed control apparatus 1. FIG. 8 is a diagram showing an example of function blocks of the voice speed control apparatus 1b. The voice speed control apparatus 1b differs from the voice speed control apparatus 1a in three respects. It includes a delay quantity acquiring unit 6, a voice speed determining unit 4b that replaces the voice speed determining unit 4a, and a voice speed control unit 5b in place of the voice speed control unit 5a. Other components of the voice speed control apparatus 1b are basically the same as those of the voice speed control apparatus 1a. The points in which the voice speed control apparatus 1b differs from the voice speed control apparatus 1a will hereinafter be described.

<Voice Speed Control Unit>

If the result of the judgment by the utterance/non-utterance judging unit 2a shows an utterance, the voice speed control unit 5b slows down the voice speed of the processing target frame. It may decrease the voice speed on the basis of the voice speed in the utterance section determined by the voice speed determining unit 4b. Alternatively, it may decrease the voice speed in the utterance section down to a predetermined voice speed, irrespective of the determination by the voice speed determining unit 4b.

Further, each time the voice speed is changed, the voice speed control unit 5b notifies the delay quantity acquiring unit 6 of the delay quantity generated by the change. For example, a delay quantity takes a positive value when the voice speed control decreases the voice speed in the utterance section. It takes a negative value when the voice speed control increases the voice speed in the non-utterance section. The voice speed control unit 5b may acquire each such delay quantity and notify the delay quantity acquiring unit 6 of it.

<Delay Quantity Acquiring Unit>

The delay quantity acquiring unit 6 accumulates the delay quantities generated when the voice speed control unit 5b decreases the voice speed in the utterance section. From this it acquires an accumulated delay quantity at each point of time of processing, hereinafter referred to as the "accumulated delay quantity". For instance, each time the voice speed control process is executed, the delay quantity acquiring unit 6 may acquire the delay quantity generated by that process from the voice speed control unit 5b. It may then obtain the accumulated delay quantity by accumulating these delay quantities. The delay quantity acquiring unit 6 transfers the accumulated delay quantity at each point of time of processing to the voice speed determining unit 4b.
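The sign convention described for the second embodiment can be sketched as follows. A delay quantity is positive when the utterance section is slowed down and negative when the non-utterance section is sped up; the delay quantity acquiring unit 6 accumulates these values. This is a minimal illustration under stated assumptions, not the patented implementation:

- a fixed 20 ms frame is used;
- the delay generated by one frame is taken as its output duration minus its input duration (frame_ms / speed minus frame_ms);
- the accumulated delay is clamped at zero, which the text does not state;
- the names `frame_delay_ms` and `DelayAccumulator` are hypothetical.

```python
FRAME_MS = 20.0  # assumed frame length: 20 ms, i.e. 160 samples at 8 kHz


def frame_delay_ms(speed: float, frame_ms: float = FRAME_MS) -> float:
    """Delay generated by playing one input frame at `speed` (1.0 = normal speed).

    The output lasts frame_ms / speed, so the result is positive when the frame
    is slowed down (speed < 1) and negative when it is sped up (speed > 1).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return frame_ms / speed - frame_ms


class DelayAccumulator:
    """Hypothetical stand-in for the delay quantity acquiring unit 6."""

    def __init__(self) -> None:
        self.total_ms = 0.0

    def notify(self, delay_ms: float) -> float:
        """Add one notified delay quantity and return the accumulated delay."""
        # Assumption: output cannot run ahead of the input, so never go below 0.
        self.total_ms = max(0.0, self.total_ms + delay_ms)
        return self.total_ms


if __name__ == "__main__":
    acc = DelayAccumulator()
    # Two slowed utterance frames (+20 ms each), then five sped-up frames (-10 ms each).
    for speed in (0.5, 0.5, 2.0, 2.0, 2.0, 2.0, 2.0):
        acc.notify(frame_delay_ms(speed))
    print(acc.total_ms)  # prints 0.0
```

The clamp reflects one possible design choice. Once the backlog has been worked off, further speedup in the non-utterance section cannot remove delay that does not exist.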
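Returning to the first embodiment, the power-based judgment described for the utterance/non-utterance judging unit 2a can be sketched in Python. The sketch compares the frame power (sum-of-squares mean, in dB) with the mean dB power of past non-utterance frames against a threshold. It is a hedged illustration only:

- the 6 dB threshold is hypothetical;
- the initial noise level of -60 dB, used before any non-utterance frame has been seen, is hypothetical;
- the rule that only frames judged unuttered update the mean is an assumption;
- the class name `PowerVad` is hypothetical.

```python
import math
from typing import Sequence


def frame_power_db(frame: Sequence[float]) -> float:
    """Sum-of-squares mean of the samples in dB (a tiny floor avoids log10(0))."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)


class PowerVad:
    """Sketch of the power-difference utterance/non-utterance judgment.

    threshold_db and initial_noise_db are hypothetical tuning values.
    """

    def __init__(self, threshold_db: float = 6.0, initial_noise_db: float = -60.0) -> None:
        self.threshold_db = threshold_db
        self.initial_noise_db = initial_noise_db
        self._noise_sum_db = 0.0  # sum of dB powers of past non-utterance frames
        self._noise_count = 0

    def noise_mean_db(self) -> float:
        """Mean dB power of past non-utterance frames, or the initial guess."""
        if self._noise_count == 0:
            return self.initial_noise_db
        return self._noise_sum_db / self._noise_count

    def is_utterance(self, frame: Sequence[float]) -> bool:
        """Judge one frame: utterance if (power - non-utterance mean) > threshold."""
        power_db = frame_power_db(frame)
        utterance = power_db - self.noise_mean_db() > self.threshold_db
        if not utterance:
            # Assumption: only frames judged unuttered update the mean.
            self._noise_sum_db += power_db
            self._noise_count += 1
        return utterance


if __name__ == "__main__":
    vad = PowerVad()
    quiet = [0.001] * 160  # one 20 ms frame at 8 kHz, very low level
    loud = [0.5] * 160
    print([vad.is_utterance(f) for f in (quiet, loud, quiet)])  # [False, True, False]
```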
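The following sketch covers two pieces of the first embodiment. The first is how the continuation time calculating unit 3a turns consecutive non-utterance judgments into a non-utterance continuation time. The second is how a FIG. 3 style mapping might convert that time into a voice speed: 1-fold up to t2, then a linear rise to a 2-fold speed at t3.

It is a minimal sketch under assumptions:

- the 20 ms frame length is taken from the text;
- the threshold values t2 = 100 ms and t3 = 300 ms are hypothetical;
- the names `ContinuationTimer` and `ramp_speed` are hypothetical.

```python
FRAME_MS = 20  # one frame is on the order of 20 ms in the text


class ContinuationTimer:
    """Counts consecutive non-utterance frames (hypothetical stand-in for unit 3a)."""

    def __init__(self, frame_ms: int = FRAME_MS) -> None:
        self.frame_ms = frame_ms
        self.frames = 0

    def update(self, is_utterance: bool) -> int:
        """Return the non-utterance continuation time in ms at normal playback speed."""
        self.frames = 0 if is_utterance else self.frames + 1
        return self.frames * self.frame_ms


def ramp_speed(t_ms: float, t2_ms: float = 100.0, t3_ms: float = 300.0,
               max_speed: float = 2.0) -> float:
    """FIG. 3 style mapping: 1-fold up to t2, linear rise to max_speed at t3."""
    if t_ms <= t2_ms:
        return 1.0
    if t_ms >= t3_ms:
        return max_speed
    return 1.0 + (max_speed - 1.0) * (t_ms - t2_ms) / (t3_ms - t2_ms)


if __name__ == "__main__":
    timer = ContinuationTimer()
    t = 0
    for _ in range(10):  # ten consecutive non-utterance frames
        t = timer.update(False)
    print(t, ramp_speed(t))  # 200 1.5
```

A non-linear rise between t2 and t3, which the text also allows, would only change the last line of `ramp_speed`.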
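The multi-stage alternative of FIG. 4 can be sketched with a threshold lookup. In this scheme the voice speed steps up to a 2-fold speed in five stages as the non-utterance continuation time reaches t1 to t5.

The following are hypothetical illustration values, not figures from the patent:

- the five thresholds (60, 120, 180, 240 and 300 ms);
- the per-stage speeds (1.2 to 2.0 in equal steps);
- the function name `staged_speed`.

```python
from bisect import bisect_right
from typing import Sequence


def staged_speed(t_ms: float,
                 thresholds_ms: Sequence[float] = (60, 120, 180, 240, 300),
                 speeds: Sequence[float] = (1.2, 1.4, 1.6, 1.8, 2.0)) -> float:
    """FIG. 4 style mapping: the speed steps up each time t_ms reaches t1..t5.

    thresholds_ms must be sorted ascending, with one speed per threshold.
    """
    if len(thresholds_ms) != len(speeds):
        raise ValueError("need one speed per threshold")
    stage = bisect_right(thresholds_ms, t_ms)  # how many thresholds t_ms has reached
    return 1.0 if stage == 0 else speeds[stage - 1]


if __name__ == "__main__":
    print([staged_speed(t) for t in (40, 100, 200, 400)])  # [1.0, 1.2, 1.6, 2.0]
```

Using `bisect_right` means a continuation time exactly equal to a threshold already counts as having reached that stage. Whether the patent intends "reaches" or "exceeds" at the boundary is not specified.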
<Voice Speed Determining Unit>

The voice speed determining unit 4b differs from the voice speed determining unit 4a in how it determines the voice speed in the non-utterance section. It uses both the non-utterance continuation time obtained by the continuation time calculating unit 3a and the accumulated delay quantity obtained by the delay quantity acquiring unit 6.

FIG. 9 is a graphic chart showing a relationship between the accumulated delay quantity and the voice speed. The voice speed determining unit 4b determines a maximum voice speed based on the accumulated delay quantity. As shown in FIG. 9, if the accumulated delay quantity lies between the threshold value d1 and the threshold value d2, the voice speed determining unit 4b increases the maximum voice speed according to a rise in the accumulated delay quantity.

FIGS. 10 and 11 are graphic charts each showing a relationship between the non-utterance continuation time and the voice speed. The voice speed determining unit 4b determines the voice speed along the graphs depicted in FIGS. 10 and 11. The inputs are the maximum voice speed determined according to the accumulated delay quantity and the non-utterance continuation time. Namely, the voice speed gets faster as the non-utterance continuation time gets longer, with the maximum voice speed derived from the accumulated delay quantity serving as an upper limit.

OPERATIONAL EXAMPLE

FIG. 12 is a flowchart showing an operational example of the voice speed control apparatus 1b. Note that in FIG. 12, the same processes as those shown in the flowchart in FIG. 5 are marked with the same reference symbols. Referring to FIG. 12, the explanation of the operational example of the voice speed control apparatus 1b will hereinafter focus on the processes that differ from those of the voice speed control apparatus 1a.

Upon a start of processing, to begin with, the input signals are inputted to the voice speed control apparatus 1b, and the delay quantity acquiring unit 6 acquires the accumulated delay quantity at this point of time (S08). Thereafter, the utterance/non-utterance judging unit 2a makes the utterance/non-utterance judgment (S01). Then the continuation time calculating unit 3a calculates the non-utterance continuation time (S03). After S03, the voice speed determining unit 4b determines the voice speed in the non-utterance section (S09). It does so on the basis of the accumulated delay quantity obtained by the delay quantity acquiring unit 6 and the non-utterance continuation time obtained by the continuation time calculating unit 3a.

After this process, as in the case of the voice speed control apparatus 1a, the voice speed control unit 5b of the voice speed control apparatus 1b executes the processes in S05 and S06. Further, if the result of the judgment by the utterance/non-utterance judging unit 2a shows an utterance, the process is the same as in the case of the voice speed control apparatus 1a (refer to S07).

Operation/Effect

One of the reasons why the voice speed control apparatus 1 increases the voice speed in the non-utterance section is to eliminate the delay caused by decreasing the voice speed in the utterance section. Therefore, if almost no delay occurs, there is no need to increase the voice speed in the non-utterance section. Hence, it is also effective to control the voice speed in the non-utterance section according to the accumulated delay quantity.

From this point of view, the voice speed control apparatus 1b determines the maximum voice speed according to the accumulated state of the delay quantities. This maximum serves as a criterion for determining the voice speed in the non-utterance section. With this scheme, it is possible to prevent the voice speed from being increased unnecessarily when the accumulated delay quantity is small. In other words, when the accumulated delay quantity is small, the vanishment of voice elements can be reduced more effectively than by the voice speed control apparatus 1a.

MODIFIED EXAMPLE

The voice speed control apparatus 1b may be configured so that the continuation time calculating unit 3a does not calculate the continuation time in the non-utterance section, i.e., does not calculate the non-utterance continuation time. If configured in this way, the voice speed determining unit 4b determines the voice speed on the basis of the accumulated delay quantity alone. To be specific, in this configuration, it uses the graph shown in FIG. 9 to determine from the accumulated delay quantity the voice speed itself, rather than the maximum voice speed. For example, this determination can be implemented by taking the value along the ordinate axis of the graph shown in FIG. 9 as the voice speed.

Further, the voice speed control unit 5b in the voice speed control apparatus 1b may notify the delay quantity acquiring unit 6 of the delay quantity after the processes in S05 and S06.

Moreover, the voice speed determining unit 4b, instead of the voice speed control unit 5b, may acquire the delay quantity and notify the delay quantity acquiring unit 6 of it.

Third Embodiment

System Architecture

Next, an example of a configuration of a voice speed control apparatus 1c will be explained by way of a third embodiment of the voice speed control apparatus 1. FIG. 13 is a diagram showing an example of function blocks of the voice speed control apparatus 1c. The voice speed control apparatus 1c differs from the voice speed control apparatus 1a in three respects. It includes an utterance/non-utterance judging unit 2c that replaces the utterance/non-utterance judging unit 2a, a continuation time calculating unit 3c in place of the continuation time calculating unit 3a, and a voice speed determining unit 4c as a substitute for the voice speed determining unit 4a. Other components of the voice speed control apparatus 1c are basically the same as those of the voice speed control apparatus 1a. The points in which the voice speed control apparatus 1c differs from the voice speed control apparatus 1a will hereinafter be described.

<Utterance/Non-Utterance Judging Unit>

The utterance/non-utterance judging unit 2c differs from the utterance/non-utterance judging unit 2a in which frames it acquires utterance/non-utterance judgment results for. Besides the processing target frame, it covers the frames anterior to it (frames in a past direction) and the frames posterior to it (frames in a future direction). FIG. 14 is a diagram showing an example of the frames for which the utterance/non-utterance judging unit 2c acquires the results of the utterance/non-utterance judgments. These are the processing target frame, L frames anterior to this processing target frame and M frames posterior
thereto. Namely, the utterance/non-utterance judging unit 2c acquires the results of the utterance/non-utterance judgments about (1+L+M) frames. The utterance/non-utterance judging unit 2c may acquire these results by executing the utterance/non-utterance judgment each time. Alternatively, it may store the results of the judgments about frames that have already been judged, and execute the judgment only for frames that newly need it. The utterance/non-utterance judging unit 2c transfers the results of the utterance/non-utterance judgments about the respective frames to the continuation time calculating unit 3c.

<Continuation Time Calculating Unit>

If the utterance/non-utterance judging unit 2c judges that the processing target frame is a non-utterance frame, the continuation time calculating unit 3c acquires two frame counts. The first is the number of frames judged to be non-utterance frames consecutively from the processing target frame in the past direction. The second is the number of frames judged to be non-utterance frames consecutively in the future direction.

Specifically, in the past direction, the continuation time calculating unit 3c refers to the results of the utterance/non-utterance judgments about an input signal (n-1), an input signal (n-2), an input signal (n-3), and so on, sequentially down to an input signal (n-L). It then acquires the number of frames judged to be non-utterance frames consecutively from the processing target frame. In the future direction, it refers to the results of the utterance/non-utterance judgments about an input signal (n+1), an input signal (n+2), an input signal (n+3), and so on, sequentially up to an input signal (n+M). It likewise acquires the number of frames judged to be non-utterance frames consecutively from the processing target frame.

Then, based on the acquired frame counts, the continuation time calculating unit 3c obtains the lengths of the non-utterance continuation time from the processing target frame in the past direction and in the future direction, respectively. It transfers both of these non-utterance continuation times to the voice speed determining unit 4c.

<Voice Speed Determining Unit>

The voice speed determining unit 4c differs from the voice speed determining unit 4a in how it determines the voice speed in the non-utterance section. It uses both the non-utterance continuation time in the past direction from the processing target frame and the non-utterance continuation time in the future direction. If the non-utterance continuation time in the future direction is shorter than a predetermined threshold value, the voice speed determining unit 4c determines the voice speed on the basis of that future-direction time. In doing so, it makes the voice speed slower as the non-utterance continuation time in the future direction gets shorter. The following is an explanation of a specific process of the voice speed determining unit 4c.

FIG. 15 illustrates graphs each showing a relationship between the voice speed determined by the voice speed determining unit 4c and the non-utterance continuation time. FIG. 15(a) is the graph showing the relationship between the non-utterance continuation time in the future direction and the voice speed. FIG. 15(b) is the graph showing the relationship between the non-utterance continuation time in the past direction and the voice speed.

The voice speed determining unit 4c first acquires the voice speed along the graph shown in FIG. 15(a) on the basis of the non-utterance continuation time in the future direction. If the voice speed at this time is less than a 2-fold speed, it determines that voice speed as the voice speed in the processing target frame. In other words, this happens when the non-utterance continuation time in the future direction is smaller than a threshold value "t7". If instead the voice speed at this time is the 2-fold speed, the voice speed determining unit 4c goes on to acquire the voice speed along the graph shown in FIG. 15(b) on the basis of the non-utterance continuation time in the past direction. In other words, this happens when the non-utterance continuation time in the future direction is larger than the threshold value "t7". The voice speed determining unit 4c then determines the voice speed acquired from the past direction as the voice speed in the processing target frame.

The voice speed determining unit 4c may instead execute the process as below. To start with, it judges whether the non-utterance continuation time in the future direction is equal to or greater than the threshold value "t7". If that time is less than "t7", the voice speed determining unit 4c determines the voice speed from the non-utterance continuation time in the future direction along the graph in FIG. 15(a). If that time is equal to or greater than "t7", it determines the voice speed from the non-utterance continuation time in the past direction along the graph in FIG. 15(b).

The voice speed determining unit 4c may also carry out the process in the following manner. At first, it determines a voice speed from the non-utterance continuation time in the future direction along the graph in FIG. 15(a). Next, it determines a voice speed from the non-utterance continuation time in the past direction along the graph in FIG. 15(b). It then takes the slower of these two voice speeds as the voice speed in the processing target frame.

The voice speed determining unit 4c is characterized by reducing the voice speed gradually when the signal shifts from the non-utterance section to the utterance section. It achieves this by using the non-utterance continuation time in the future direction. Accordingly, the unit is not limited to the methods described above. Any other method that judges the timing of the shift from the non-utterance section to the utterance section from the future-direction non-utterance continuation time, and reduces the voice speed gradually, may also be applied to the voice speed determining unit 4c.

OPERATIONAL EXAMPLE

FIG. 16 is a flowchart showing an operational example of the voice speed control apparatus 1c. FIG. 16 shows the flowchart in a case where the threshold-based method is applied to the voice speed determining unit 4c. In that method, the threshold value "t7" decides whether the process uses the non-utterance continuation time in the future direction or in the past direction. An operational example
of the voice speed determining unit 4c will hereinafter be described with reference to FIG. 16.

Upon a start of processing, the input signals are first inputted to the voice speed control apparatus 1c. Then, the utterance/non-utterance judging unit 2c makes the utterance/non-utterance judgments about the processing target frame and the respective frames positioned before and after this processing target frame in the input signals (S10). When, as a result of these judgments, the processing target frame is judged to be a non-utterance frame (S11: non-utterance), the continuation time calculating unit 3c calculates the non-utterance continuation time in the past direction and the non-utterance continuation time in the future direction (S12, S13). Next, the voice speed determining unit 4c judges whether the non-utterance continuation time in the future direction is equal to or greater than the threshold value “t7”. If this value is equal to or greater than the threshold value “t7” (S14: YES), the voice speed determining unit 4c determines the voice speed along the graph shown in FIG. 15(b) by use of the non-utterance continuation time in the past direction (S15). If, on the other hand, this value is less than the threshold value “t7” (S14: NO), the voice speed determining unit 4c determines the voice speed along the graph shown in FIG. 15(a) by use of the non-utterance continuation time in the future direction (S16). If, in the utterance/non-utterance judgment, the processing target frame is instead judged to be an utterance frame (S11: utterance), the voice speed determining unit 4c determines the voice speed to be the voice speed for the utterance section (S19). Then, the voice speed control unit 5a executes the voice speed control process on the processing target frame according to the voice speed determined by the voice speed determining unit 4c (S17), and outputs a voice-speed-controlled signal (S18).

Operation/Effect

FIG. 17 is a graphic chart showing an example of how the voice speed control apparatus 1c controls the voice speed. In the voice speed control apparatus 1c, the voice speed in the non-utterance section is determined based on the non-utterance continuation time in the future direction as well as on the non-utterance continuation time in the past direction. To be specific, the voice speed determining unit 4c determines the voice speed so that the voice speed gets slower as the non-utterance continuation time in the future direction gets shorter. Hence, the voice speed in a non-utterance section close to an utterance section is not controlled to a fast-reading state but to, e.g., the normal (1-fold) voice speed, or even to a speed as slow as 0.5-fold. It is therefore possible to prevent or reduce the adverse influence (the loss of voice elements) that arises when the starting point of an utterance is misjudged to be non-utterance.

MODIFIED EXAMPLE

The voice speed control apparatus 1c may also be configured to further include the delay quantity acquiring unit 6. If configured in this way, the voice speed determining unit 4c of the voice speed control apparatus 1c may determine the maximum voice speed on the basis of the accumulated delay quantity, as done by the voice speed determining unit 4b. Then, when determining the voice speed on the basis of the non-utterance continuation time in the past direction or the non-utterance continuation time in the future direction, the voice speed determining unit 4c may determine the voice speed on the basis of this maximum voice speed.

Fourth Embodiment

System Architecture

Next, an example of a configuration of a voice speed control apparatus 1d will be explained as a fourth embodiment of the voice speed control apparatus 1. FIG. 18 is a diagram showing an example of function blocks of the voice speed control apparatus 1d. The voice speed control apparatus 1d differs from the voice speed control apparatus 1a in that it includes an utterance/non-utterance judging unit 2d in place of the utterance/non-utterance judging unit 2a, and a voice speed determining unit 4d in place of the voice speed determining unit 4a. The points in which the voice speed control apparatus 1d differs from the voice speed control apparatus 1a will hereinafter be described.

<Utterance/Non-Utterance Judging Unit>

The utterance/non-utterance judging unit 2d differs from the utterance/non-utterance judging unit 2a in that, in the utterance/non-utterance judgment, it not only judges whether the processing target frame is an utterance frame or a non-utterance frame but also, when the frame is judged to be non-utterance, obtains a degree of reliability of that judgment. Through the same process as the utterance/non-utterance judging unit 2a, the utterance/non-utterance judging unit 2d subtracts the average power value of past non-utterance signals from the power value of the input signal (n). Then, based on the result of this subtraction, the utterance/non-utterance judging unit 2d obtains a value representing the reliability (the degree of reliability). FIG. 19 is a table showing an example of a relationship between the degree of reliability and the result of the subtraction. The utterance/non-utterance judging unit 2d obtains the degree of reliability on the basis of the result of the subtraction made above and the table in FIG. 19. Then, the utterance/non-utterance judging unit 2d transfers the obtained degree of reliability to the voice speed determining unit 4d.

<Voice Speed Determining Unit>

The voice speed determining unit 4d is closer in construction to the voice speed determining unit 4b than to the voice speed determining unit 4a, and is therefore explained by comparison with the voice speed determining unit 4b. The voice speed determining unit 4d differs from the voice speed determining unit 4b in that it determines the maximum voice speed on the basis of the degree of reliability rather than the accumulated delay quantity. The voice speed determining unit 4d sets the maximum voice speed faster as the degree of reliability of the non-utterance judgment made by the utterance/non-utterance judging unit 2d becomes higher, and slower as the degree of reliability becomes lower. FIG. 20 is a table showing an example of a relationship between the degree of reliability and the maximum voice speed. The voice speed determining unit 4d determines the maximum voice speed on the basis of, e.g., the degree of reliability received from the utterance/non-utterance judging unit 2d and the table shown in FIG. 20. Then, in the same way as the voice speed determining unit 4b, the voice speed determining unit 4d determines the voice speed on the basis of the maximum voice speed and the non-utterance continuation time along, e.g., the graphs illustrated in FIGS. 10 and 11.

OPERATIONAL EXAMPLE

FIG. 21 is a flowchart showing an operational example of the voice speed control apparatus 1d. Note that the same processes in FIG. 21 as those shown in the flowchart in FIG.
5 are marked with the same reference symbols as those in FIG. 5. Referring to FIG. 21, the description of the operational example of the voice speed control apparatus 1d will hereinafter focus on the processes that differ from those of the voice speed control apparatus 1a.

Upon a start of processing, the utterance/non-utterance judging unit 2d, after making the utterance/non-utterance judgment (after S01), obtains the degree of reliability of this judgment (S20). At this time, the utterance/non-utterance judging unit 2d may be constructed so as not to obtain the degree of reliability when it makes an utterance judgment.

Then, after the continuation time calculating unit 3a has calculated the non-utterance continuation time (after S03), the voice speed determining unit 4d determines the voice speed in the non-utterance section on the basis of the degree of reliability obtained by the utterance/non-utterance judging unit 2d and the non-utterance continuation time acquired by the continuation time calculating unit 3a (S21). After this process, in the same way as in the voice speed control apparatus 1a, the voice speed control unit 5a of the voice speed control apparatus 1d executes the processes in S05 and S06. Further, if the result of the judgment made by the utterance/non-utterance judging unit 2d is an utterance, the process is the same as in the voice speed control apparatus 1a (refer to S07).

Operation/Effect

In the voice speed control apparatus 1d, the maximum voice speed is determined based on the degree of reliability of the judgment when the utterance/non-utterance judging unit 2d makes a non-utterance judgment. To be specific, in the voice speed control apparatus 1d, the maximum voice speed gets faster as the degree of reliability of the non-utterance judgment made by the utterance/non-utterance judging unit 2d gets higher, and slower as the degree of reliability gets lower. Hence, if the degree of reliability of the non-utterance judgment is low, i.e., if the frame might in fact be an utterance, keeping the maximum voice speed low reduces the adverse influence of a misjudgment, such as a skipped (missing) voice element. By contrast, if the degree of reliability of the non-utterance judgment is high, i.e., if there is little possibility that the frame is an utterance, the maximum voice speed is set high. Priority is thereby given to raising the voice speed rather than to reducing the adverse influence of a misjudgment, and the accumulation of delays can be effectively decreased.

MODIFIED EXAMPLE

The voice speed control apparatus 1d may also be configured to further include the delay quantity acquiring unit 6 used in the second embodiment. If thus configured, the voice speed determining unit 4d may be constructed so as to determine the maximum voice speed on the basis of not only the degree of reliability but also the accumulated delay quantity. For instance, the voice speed determining unit 4d may determine the maximum voice speed on the basis of a table consisting of three items (fields) of information: the degree of reliability, the accumulated delay quantity, and the maximum voice speed.

Fifth Embodiment

System Architecture

Next, an example of a configuration of a voice speed control apparatus 1e will be explained as a fifth embodiment of the voice speed control apparatus 1. FIG. 22 is a diagram showing an example of function blocks of the voice speed control apparatus 1e. The voice speed control apparatus 1e differs from the voice speed control apparatus 1a in that it includes a voice speed determining unit 4e in place of the voice speed determining unit 4a and further includes a signal-to-noise ratio acquiring unit 7. The other components of the voice speed control apparatus 1e are basically the same as those of the voice speed control apparatus 1a. The points in which the voice speed control apparatus 1e differs from the voice speed control apparatus 1a will hereinafter be explained.

<Signal-to-Noise Ratio Acquiring Unit>

The signal-to-noise ratio acquiring unit 7 acquires a signal-to-noise ratio (SN ratio) for the processing target frame of the utterance/non-utterance judging unit 2a in the input signals inputted to the voice speed control apparatus 1e. Any technique for acquiring the SN ratio can be applied to the signal-to-noise ratio acquiring unit 7, and an explanation of a specific process for acquiring the SN ratio is omitted. The signal-to-noise ratio acquiring unit 7 transfers the acquired SN ratio to the voice speed determining unit 4e.

<Voice Speed Determining Unit>

The voice speed determining unit 4e is closer in construction to the voice speed determining unit 4b than to the voice speed determining unit 4a, and hence is explained by comparison with the voice speed determining unit 4b. The voice speed determining unit 4e differs from the voice speed determining unit 4b in that it determines the maximum voice speed on the basis of the SN ratio rather than the accumulated delay quantity. The voice speed determining unit 4e sets the maximum voice speed higher as the SN ratio acquired by the signal-to-noise ratio acquiring unit 7 gets higher, and lower as the SN ratio becomes lower. FIG. 23 is a graph showing an example of a relationship between the SN ratio and the maximum voice speed. The voice speed determining unit 4e determines the maximum voice speed on the basis of, for example, the SN ratio received from the signal-to-noise ratio acquiring unit 7 and the graph illustrated in FIG. 23. Then, in the same way as the voice speed determining unit 4b, the voice speed determining unit 4e determines the voice speed on the basis of the maximum voice speed and the non-utterance continuation time along, e.g., the graphs shown in FIGS. 10 and 11.

OPERATIONAL EXAMPLE

FIG. 24 is a flowchart showing an operational example of the voice speed control apparatus 1e. Note that the same processes in FIG. 24 as those shown in the flowchart in FIG. 5 are marked with the same reference symbols as in FIG. 5. Referring to FIG. 24, the description of the operational example of the voice speed control apparatus 1e will hereinafter focus on the processes that differ from those of the voice speed control apparatus 1a.

Upon a start of processing, the signal-to-noise ratio acquiring unit 7 acquires the SN ratio (S22) in parallel with the utterance/non-utterance judging process (the process in S01) by the utterance/non-utterance judging unit 2a. Then, after the continuation time calculating unit 3a has calculated the non-utterance continuation time (after S03), the voice speed determining unit 4e determines the voice speed in the non-utterance section on the basis of the SN ratio acquired by the signal-to-noise ratio acquiring unit 7 and the non-utterance continuation time obtained by the continuation time calculating unit 3a (S23). After this process, in the same way as in the
case of the voice speed control apparatus 1a, the voice speed control unit 5a of the voice speed control apparatus 1e executes the processes in S05 and S06. Further, if the result of the judgment by the utterance/non-utterance judging unit 2a is an utterance, the process is the same as in the voice speed control apparatus 1a (refer to S07).

Operation/Effect

In the voice speed control apparatus 1e, the maximum voice speed is determined based on the SN ratio acquired by the signal-to-noise ratio acquiring unit 7. To be specific, in the voice speed control apparatus 1e, the maximum voice speed gets higher as the SN ratio gets higher, and lower as the SN ratio gets lower. Generally, a high SN ratio indicates that the quantity of noise in the signals (here, the input signals) is small, i.e., that the signals are in a favorable state and their reliability is high. Accordingly, when the SN ratio is low, i.e., when there is a high possibility that the utterance/non-utterance judgment is wrong, keeping the maximum voice speed low reduces the adverse influence of a misjudgment, such as a skipped (missing) voice element. By contrast, when the SN ratio is high, i.e., when there is a low possibility of a misjudgment, the maximum voice speed is set high. Priority is thereby given to raising the voice speed rather than to reducing the adverse influence of a misjudgment, and the accumulation of delays can be effectively decreased.

MODIFIED EXAMPLE

The voice speed control apparatus 1e may also be configured to further include the delay quantity acquiring unit 6 used in the second embodiment. If thus configured, the voice speed determining unit 4e may be constructed so as to determine the maximum voice speed on the basis of not only the SN ratio but also the accumulated delay quantity. For instance, the voice speed determining unit 4e may determine the maximum voice speed on the basis of a table consisting of three items (fields) of information: the SN ratio, the accumulated delay quantity, and the maximum voice speed.

The present invention can be applied to any apparatus in which a delay occurs when voice is reproduced, and the effects described above can thereby be obtained.

Others

The disclosures of international application PCT/JP2004/010340 filed on Jul. 21, 2004, including the specification, drawings and abstract, are incorporated herein by reference.

What is claimed is:

1. A voice speed control apparatus comprising:
an utterance/non-utterance judging unit to judge whether a processing target segment in an inputted voice signal is an utterance segment or a non-utterance segment;
a non-utterance continuation length acquiring unit to acquire a non-utterance continuation length representing a length of the voice signal judged continuously to be the non-utterance by said utterance/non-utterance judging unit;
a determining unit to determine a reproducing speed of the processing target segment in the voice signal in accordance with the non-utterance continuation length acquired by said non-utterance continuation length acquiring unit so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller;
a changing unit to change the reproducing speed of the voice signal, corresponding to the reproducing speed determined by said determining unit;
a speed decreasing unit to get, if judged to be utterance by said utterance/non-utterance judging unit, the reproducing speed of the voice signal slower than a normal reproducing speed; and
a delay quantity acquiring unit to acquire cumulatively a delay quantity generated by said speed decreasing unit,
wherein said determining unit determines a maximum value of the reproducing speed on the basis of an accumulated value of the delay quantities acquired by said delay quantity acquiring unit so that the maximum value of the reproducing speed gets larger as the accumulated value of the delay quantities gets larger, and determines the reproducing speed in the processing target segment in the voice signals, corresponding to this maximum value and the non-utterance continuation length.

2. A voice speed control apparatus comprising:
an utterance/non-utterance judging unit to judge whether a processing target segment in an inputted voice signal is an utterance segment or a non-utterance segment;
a non-utterance continuation length acquiring unit to acquire a non-utterance continuation length representing a length of the voice signal judged continuously to be the non-utterance by said utterance/non-utterance judging unit;
a determining unit to determine a reproducing speed of the processing target segment in the voice signal in accordance with the non-utterance continuation length acquired by said non-utterance continuation length acquiring unit so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller, and
a changing unit to change the reproducing speed of the voice signal, corresponding to the reproducing speed determined by said determining unit,
wherein said utterance/non-utterance judging unit further makes judgments about predetermined segments in the future direction from the processing target segment in the inputted voice signals,
said non-utterance continuation length acquiring unit acquires a future-directional continuation length representing a length of the signal judged to be the non-utterance signal continuously from the processing target segment in the future direction, and
said determining unit determines, if the future-directional continuation length is smaller than a threshold value, the reproducing speed in the processing target segment in accordance with the future-directional continuation length so that the reproducing speed becomes slower as the future-directional continuation length becomes smaller.

3. A voice speed control apparatus, comprising:
an utterance/non-utterance judging unit to judge whether a processing target segment in an inputted voice signal is an utterance segment or a non-utterance segment;
a non-utterance continuation length acquiring unit to acquire a non-utterance continuation length representing a length of the voice signal judged continuously to be the non-utterance by said utterance/non-utterance judging unit;
a determining unit to determine a reproducing speed of the processing target segment in the voice signal in accordance with the non-utterance continuation length acquired by said non-utterance continuation length acquiring unit so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller, and
a changing unit to change the reproducing speed of the voice signal, corresponding to the reproducing speed determined by said determining unit,
wherein said utterance/non-utterance judging unit further acquires a degree of reliability on the non-utterance judgment result made by said utterance/non-utterance judging unit about the respective segments to be judged, and
said determining unit determines the maximum value of the reproducing speed in accordance with the degree of reliability on the non-utterance judgment result so that the maximum value of the reproducing speed gets larger as the degree of reliability on the non-utterance judgment result gets higher, and determines the reproducing speed in the processing target segment in the voice signals in accordance with the maximum value and the non-utterance continuation length.

4. A voice speed control apparatus, comprising:
an utterance/non-utterance judging unit to judge whether a processing target segment in an inputted voice signal is an utterance segment or a non-utterance segment;
a non-utterance continuation length acquiring unit to acquire a non-utterance continuation length representing a length of the voice signal judged continuously to be the non-utterance by said utterance/non-utterance judging unit;
a determining unit to determine a reproducing speed of the processing target segment in the voice signal in accordance with the non-utterance continuation length acquired by said non-utterance continuation length acquiring unit so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller, and
a changing unit to change the reproducing speed of the voice signal corresponding to the reproducing speed determined by said determining unit,
wherein said utterance/non-utterance judging unit further acquires a degree of reliability on the judgment result about the respective segments to be judged, and
said determining unit determines the maximum value of the reproducing speed in accordance with the degree of reliability so that the maximum value of the reproducing speed gets larger as the reliability gets higher, and determines the reproducing speed in the processing target segment in the voice signals in accordance with the maximum value and the non-utterance continuation length, and
wherein said utterance/non-utterance judging unit subtracts an average of power values of the voice signals in the segments judged to be the non-utterance segments in the past from a power value of the voice signal in the processing target segment, and acquires, based on a result of this subtraction, a degree of reliability that gets higher as a value of the subtracted result gets lower and also a degree of reliability that becomes lower as the value of the subtracted result becomes higher.

5. A voice speed control apparatus comprising:
an utterance/non-utterance judging unit to judge whether a processing target segment in an inputted voice signal is an utterance segment or a non-utterance segment;
a non-utterance continuation length acquiring unit to acquire a non-utterance continuation length representing a length of the voice signal judged continuously to be the non-utterance by said utterance/non-utterance judging unit;
a determining unit to determine a reproducing speed of the processing target segment in the voice signal in accordance with the non-utterance continuation length acquired by said non-utterance continuation length acquiring unit so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller,
a changing unit to change the reproducing speed of the voice signal, corresponding to the reproducing speed determined by said determining unit; and
a signal-to-noise ratio acquiring unit to acquire a signal-to-noise ratio in the processing target segment in the voice signals,
wherein said determining unit determines a maximum value of the reproducing speed in accordance with the signal-to-noise ratio acquired by said signal-to-noise ratio acquiring unit so that the maximum value of the reproducing speed gets larger as the signal-to-noise ratio gets higher and so that the maximum value of the reproducing speed becomes smaller as the signal-to-noise ratio becomes lower, and determines the reproducing speed in the processing target segment in the voice signals in accordance with the maximum value and the non-utterance continuation length.

6. A recording medium recorded with a program, the program making an information processing apparatus execute:
judging whether a processing target segment in inputted voice signals is an utterance segment or a non-utterance segment;
acquiring a non-utterance continuation length representing a length of the voice signal judged continuously to be the non-utterance;
determining a reproducing speed of the processing target segment in the voice signals in accordance with the acquired non-utterance continuation length so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller;
changing the reproducing speed of the voice signal, corresponding to the determined reproducing speed;
getting, if judged to be utterance by said judging, the reproducing speed of the voice signal slower than a normal reproducing speed; and
acquiring cumulatively a delay quantity generated by said getting,
wherein said determining determines a maximum value of the reproducing speed on the basis of an accumulated value of acquired delay quantities so that the maximum value of the reproducing speed gets larger as the accumulated value of the delay quantities gets larger, and determines the reproducing speed in the processing target segment in the voice signals, corresponding to this maximum value and the non-utterance continuation length.
7. A voice speed control method comprising:
judging whether a processing target segment in inputted voice signals is an utterance segment or a non-utterance segment;
acquiring a non-utterance continuation length representing a length of the voice signal judged continuously to be the non-utterance;
determining a reproducing speed of the processing target segment in the voice signals in accordance with the acquired non-utterance continuation length so that the reproducing speed gets higher as the non-utterance continuation length gets larger and so that an increase in reproducing speed is restrained to a greater degree as the non-utterance continuation length becomes smaller;
changing the reproducing speed of the voice signal, corresponding to the determined reproducing speed;
getting, if judged to be utterance by said judging, the reproducing speed of the voice signal slower than a normal reproducing speed; and
acquiring cumulatively a delay quantity generated by said getting,
wherein said determining determines a maximum value of the reproducing speed on the basis of an accumulated value of acquired delay quantities so that the maximum value of the reproducing speed gets larger as the accumulated value of the delay quantities gets larger, and determines the reproducing speed in the processing target segment in the voice signals, corresponding to this maximum value and the non-utterance continuation length.
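The speed-determining rules described above (a reproducing speed that rises with the non-utterance continuation length up to a maximum speed, with that maximum set from the accumulated delay, the degree of reliability, or the SN ratio, and the third embodiment's threshold test on the future-direction continuation time) can be illustrated with a short Python sketch. It is a hypothetical illustration, not the patented implementation: the graphs and tables of FIGS. 10, 11, 15, 19, 20 and 23 are not reproduced in the text, so every breakpoint, the reliability table, the default threshold t7 = 0.5 s, and all function names (interpolate, speed_from_length, max_speed_from_delay, max_speed_from_reliability, max_speed_from_snr, speed_third_embodiment) are assumptions chosen for readability.

```python
from bisect import bisect_right

NORMAL_SPEED = 1.0  # 1-fold (normal) reproducing speed


def interpolate(x, xs, ys):
    """Clamped piecewise-linear lookup; stands in for a graph such as FIG. 10 or 23."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def speed_from_length(length_s, max_speed):
    """Speed rises with the non-utterance continuation length, capped at max_speed.

    Hypothetical ramp: normal speed for pauses up to 0.1 s, max_speed from 1.0 s
    on, so the speed-up is restrained more strongly the shorter the pause is.
    """
    return interpolate(length_s, [0.1, 1.0], [NORMAL_SPEED, max_speed])


def max_speed_from_delay(accumulated_delay_s):
    """Claim 1 style cap: a larger accumulated delay allows a larger maximum speed.

    Hypothetical mapping: 1.5x with no backlog, rising to 3x at 4 s of delay.
    """
    return interpolate(accumulated_delay_s, [0.0, 4.0], [1.5, 3.0])


def max_speed_from_reliability(frame_power_db, avg_nonutterance_power_db):
    """Fourth-embodiment style cap (cf. claim 4 and FIGS. 19, 20).

    The smaller the excess of the frame power over the average power of past
    non-utterance frames, the higher the reliability of the non-utterance
    judgment and the higher the cap. Thresholds and table values are hypothetical.
    """
    diff = frame_power_db - avg_nonutterance_power_db
    reliability = 2 if diff < 3.0 else 1 if diff < 6.0 else 0
    return {0: 1.2, 1: 2.0, 2: 3.0}[reliability]


def max_speed_from_snr(snr_db):
    """Fifth-embodiment style cap: a higher SN ratio allows a higher maximum speed."""
    return interpolate(snr_db, [0.0, 30.0], [1.2, 3.0])


def speed_third_embodiment(past_s, future_s, max_speed, t7=0.5):
    """Third-embodiment style choice between FIG. 15(b) and FIG. 15(a).

    Called for non-utterance frames only. If the pause still ahead (future_s)
    is at least t7, the past-direction length is used (S14: YES -> S15);
    otherwise the speed drops toward normal as the next utterance nears
    (S14: NO -> S16). Both curves and the default t7 are hypothetical.
    """
    if future_s >= t7:
        return speed_from_length(past_s, max_speed)
    return interpolate(future_s, [0.0, t7], [NORMAL_SPEED, max_speed])


if __name__ == "__main__":
    cap = max_speed_from_delay(2.0)  # 2.25 with the assumed mapping
    print(speed_from_length(0.55, cap))  # halfway up the assumed ramp
    print(speed_third_embodiment(2.0, 0.25, 3.0))  # 2.0: an utterance is near
```

A clamped piecewise-linear lookup stands in for each graph because the text fixes only the direction of each relationship (rising or falling), not its shape; step tables like those of FIGS. 19 and 20 would serve equally well.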
