If the interface is given the capability of learning new commands, the language restriction is effectively removed, since the learning process would be invoked whenever an unrecognised command is encountered. We may therefore envisage an interface having a sensibly chosen, restricted vocabulary and the ability to extend this vocabulary should unknown phrases be encountered.
In using a natural language interface a significant amount of processing is involved in recognising the user’s commands. This processing may make systems using an interface somewhat slower in their response than the same system without the interface. Whilst this may be frustrating to the experienced computer user, the naive or irregular user may be prepared to tolerate it as the cost of being able to communicate easily with the machine. It is not suggested that natural language interfaces should at present provide the normal means of communicating with the machine; rather, they may provide a vehicle for the inexperienced or naive user to communicate, whilst experienced users may still prefer to issue commands directly. It is highly likely that this restriction will be removed in future as more powerful processors are produced.

This paper will describe a pseudonatural language interface designed to allow naive computer users to define and perform signal processing tasks. The intended users of the interface would be knowledgeable about signal processing, but ignorant of any programming languages. The interface is therefore biased towards under-

Fig. 1 Block diagram of architecture of syntactic processor

the preprocessor, the sequencer, the ATN parser and the ellipsis processor. These components will now be examined.

The user’s input is preprocessed in two respects. Any abbreviations are expanded and the text is checked for spelling errors. The system requests the user to verify the spelling of words not appearing in an on-line dictionary.

The preprocessed sentence is then passed to the sequencer. In defining the interface it has been assumed that the top level construct employed in formulating commands is the sequence. A sequence is a connected group of natural language statements each referring to a
373 IEE PROCEEDINGS, Vol. 137, Pt. E, No. 5, SEPTEMBER 1990
single action to be performed. The sequencer’s function is to establish the boundaries between statements.

The statements of a command may be connected by the words ‘then’, ‘and then’ or ‘and’, or the colon symbol. Statements sequenced by any connector other than ‘and’ are easily identified since these symbols are not used in any other context. Ambiguities may occur when ‘and’ is used to connect statements since this word may also be used in data definitions (‘add 2, 3, and 4’). Such ambiguities are resolved by searching for the longest self-consistent substring within a command.

As an example of the sequencer’s operation, consider the following input command: ‘consider the berlin eeg data: display the samples between 1.5 and 2.7 milliseconds in the last three channels and then multiply this by 0.75: divide this result by 2.43 and display it’. The sequencer would first search for any unambiguous connectors (‘:’, ‘then’, ‘and then’) and mark their positions. Any of the statements thus delimited containing the word ‘and’ are further processed by attempting to use the ‘and’ in a data specification context. Should this fail, the ‘and’ is treated as a connector. The above input would therefore yield the following separated statements:

consider the berlin eeg data
display the samples between 1.5 and 2.7 milliseconds in the last three channels
multiply this by 0.75
divide this result by 2.43
display it

the operator and operands. The system is not restricted to this: infix notation may also be used. Neither is the system restricted to having operators written in full, since it is being deliberately biased towards mathematical

ordering is not preserved.)
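The two-pass behaviour of the sequencer described above may be sketched as follows. This is a minimal illustration in Python rather than the authors’ implementation: the function names are hypothetical, and the test `and_is_data` is an assumed stand-in for the data specification ATN (it accepts an ‘and’ as part of a data definition only when the next word is a number), which is considerably cruder than the longest self-consistent substring search the text describes.

```python
import re

# Connectors that are never used in any other context (':', 'and then', 'then').
# 'and then' is listed before 'then' so that the longer connector is consumed whole.
UNAMBIGUOUS = re.compile(r"\s*(?::|\band then\b|\bthen\b)\s*")
NUMBER = re.compile(r"^\d+(\.\d+)?$")


def and_is_data(words, i):
    """Assumed stand-in for the data specification ATN: the 'and' at index i
    is taken to belong to a data definition if the following word is a number."""
    return i + 1 < len(words) and NUMBER.match(words[i + 1].strip(",")) is not None


def split_on_and(statement):
    """Second pass: treat 'and' as a connector only where the data reading fails."""
    words = statement.split()
    parts, current = [], []
    for i, word in enumerate(words):
        if word == "and" and not and_is_data(words, i):
            parts.append(" ".join(current))
            current = []
        else:
            current.append(word)
    parts.append(" ".join(current))
    return [p for p in parts if p]


def sequence(command):
    """First pass: split at the unambiguous connectors, then resolve each 'and'."""
    statements = []
    for chunk in UNAMBIGUOUS.split(command):
        chunk = chunk.strip()
        if chunk:
            statements.extend(split_on_and(chunk))
    return statements


if __name__ == "__main__":
    text = ("consider the berlin eeg data: display the samples between 1.5 and 2.7 "
            "milliseconds in the last three channels and then multiply this by 0.75: "
            "divide this result by 2.43 and display it")
    for statement in sequence(text):
        print(statement)
```

Applied to the example command, the sketch keeps the ‘and’ in ‘between 1.5 and 2.7’ inside its statement and splits at the ‘and’ preceding ‘display it’.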
Fig. 4 Repetition transition network

the repeat . . . until looping structure. It has been observed [2] that naive users quickly discovered the existence of a nonspecified structure in an interface when its use was required. Note also that repetition may be implied by the time sequence nature of the data, for instance when a request is made to apply an operator to a subset of the data.

Whilst these programming primitives are simple, it is the thesis of structured programming [12] that the three constructs (sequence, selection, and iteration) are sufficient for defining any algorithm. Therefore, although the range of programming primitives is limited, it is sufficiently large for any program to be implemented.

Pronouns in a command are always assumed to refer to the result of the previous computation, rather than to the same data as the previous command. Thus one could type the command ‘multiply the data by 7’ followed by the command ‘multiply it by 3’. The net effect is that the data has been multiplied by 21, since ‘it’ refers to the result of the first multiplication rather than to the original data. Enforcing this identification allows us to place all pronouns on an arc of the data networks and associate the previous result with them.

Data in the input is specified by noun phrases. These are highly domain dependent. For instance, in the current application, data consists of files made up of twelve sets of time series measurements called channels. The channels in a file contain the same number of data points, which may not be the same as the number of points in any other file. The current system will understand data specified as points identified by their number, the time at which they were collected, a range of these values, or the data’s channel number. Changing the system’s domain will require extensive redefinition of the data specifying noun phrases.
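The forms of data specification listed above (point number, collection time, a range of either, or channel number) may be illustrated by a small selection routine. The sketch below is written in Python under stated assumptions and is not taken from the system itself: the `DataSpec` structure, the `select` function and the `sample_interval_ms` parameter are hypothetical, points are numbered from zero, and the first point of each channel is assumed to have been collected at time zero.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class DataSpec:
    """Hypothetical result of parsing a data-specifying noun phrase."""
    channels: Optional[List[int]] = None                  # 1-based channel numbers; None means all
    point_range: Optional[Tuple[int, int]] = None         # inclusive range of 0-based point numbers
    time_range_ms: Optional[Tuple[float, float]] = None   # inclusive range of collection times


def select(data: Sequence[Sequence[float]], spec: DataSpec,
           sample_interval_ms: float) -> List[List[float]]:
    """Return the selected section of each requested channel.

    `data` holds one sequence of points per channel; all channels in a file
    are taken to contain the same number of points.
    """
    n_points = len(data[0])
    first, last = 0, n_points - 1
    if spec.point_range is not None:
        first, last = spec.point_range
    elif spec.time_range_ms is not None:
        start, end = spec.time_range_ms
        eps = 1e-9  # tolerance against floating point rounding at the range limits
        first = math.ceil(start / sample_interval_ms - eps)
        last = math.floor(end / sample_interval_ms + eps)
    first, last = max(first, 0), min(last, n_points - 1)
    channels = spec.channels if spec.channels is not None else list(range(1, len(data) + 1))
    return [list(data[c - 1][first:last + 1]) for c in channels]


if __name__ == "__main__":
    # Twelve synthetic channels of ten points each, sampled every 0.25 ms (assumed values).
    data = [[c * 100 + i for i in range(10)] for c in range(1, 13)]
    spec = DataSpec(channels=[10, 11, 12], time_range_ms=(0.5, 1.5))
    for section in select(data, spec, sample_interval_ms=0.25):
        print(section)
```

A change of domain would, as the text notes, require the noun phrase networks that produce such a specification to be redefined; only the selection step is sketched here.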
The primitive networks described thus far, and any networks taught by a user, are stored separately but are treated identically when the system attempts to match them with user input. This separation has been enforced to maintain the generality of the system since the meta-networks (the repetition and selection ATNs) will be appropriate for all programming applications, the command ATNs will be appropriate to mathematical

3 Semantic processing

Semantic processing is achieved using a frame based approach. A frame consists of a collection of labelled slots. Each object in the system, be it a command or data, is represented by a frame. As many slots as possible in each frame have been given default values, either data or
references to subframes. As a command is parsed the instructions augmenting the network cause a frame appropriate to it to be instantiated and the frame’s slots are filled or altered. When sufficient slots in a frame have been filled and parsing of a command is completed, the appropriate action may be performed. Rules have been formulated for frame handling. If the newly filled frame is a subframe its result is transferred to the parent frame and the subframe may be deleted. If the current frame is a command frame, filling its slots will result in the command being executed. The result of executing a command may be passed on as data to a further command or may be stored in the system’s workspace frame for possible consideration by further input.

The system’s workspace frame contains four slots for the current and previous inputs and results. The previous input is retained for ellipsis resolution. The previous result is maintained and displayed for the user’s benefit. The workspace frame also contains slots for storing labelled portions of data.

As an example of the semantic processing, assume that the user had typed the command ‘multiply channel 4 by 7’. The syntactic processor will parse this and cause the multiplication frame to be instantiated, since ‘multiply’ is recognised by the multiplication ATN. The multiplication frame will have an indeterminate number of slots, referencing data subframes. Further parsing causes the operands ‘channel 4’ and ‘7’ to be extracted from the input and placed in two of the data subframes. The data subframes are now replete, so the data may be moved to the parent frame and the subframes deleted. Sufficient arguments are now present for the multiplication to be performed; the result is stored in the workspace’s current result slot since there are no further commands to be considered. The multiplication frame may now be removed.

As a more complex example consider the input sentence ‘multiply the sum of channels 2 and 6 by the difference between channels 9 and 7’. The main verb of the sentence is recognised as multiply; the multiplication frame is therefore instantiated. The arguments for the multiplication frame are recognised to be the results of a summation and a subtraction. These two frames are therefore instantiated and linked as subframes of the multiplication frame. Further parsing identifies channel references as the arguments to the two subframes. The frame structure of Fig. 5 is generated. Having filled sufficient data slots for processing to proceed, the system will perform the instructions specified by the lowest level frames, transfer the results to the parent ones and delete the lowest level frames until the top level frame’s result has been computed. The result is transferred to the workspace.

Fig. 5 Frame structure associated with the sentence ‘multiply the sum of channels 2 and 6 by the difference between channels 9 and 7’

4 Formatting of output

The interface produces several types of output. It repeats the current and previous commands, one statement per line in the case of multistatement inputs. It traps any error messages and prints them, rather than allow them to crash the system. It outputs the results of computations as a time series display, or as a histogram. It also enters into a meaningful dialogue when requesting clarification of a command. This last aspect of the system will be examined in the next Section.

5 Teaching the system

Should all attempts at parsing the input fail, the interface will require clarification of the unknown portions of the input. The system will search the current input for variable names (anything satisfying a data specification ATN) and replace it or them with formal parameters. The input command is then formatted into a request for clarification. The user responds by typing a definition which the system will build into a new transition network. Parsing of the input command continues in a depth first manner (the most recently input command not yet understood is parsed, and if necessary clarified). User defined networks are stored separately from the system-provided networks since this aids the maintenance of user specificity.

One aspect of system efficiency has not been addressed, that is the response of the system to being taught a command it already knows. Ideally the system would attempt to match the newly expounded definition against all the previously taught definitions. If a match is found then the newly defined command would be made a
synonym of the matching command and the new definition discarded. This contraction of the system’s data base could take place either during the learning phase or as a separate activity. In the current version of the system it is intended to be a separate activity and has not yet been implemented.

As an example of how the system is taught new commands consider the following dialogue. The user has typed the command ‘remove the average from channel one’. The system is unfamiliar with the command remove; it therefore requests clarification:

clarify REMOVE <x-1> FROM <x-2>

The terms in angle brackets have been inserted by the system in place of the noun phrases used in the command. The user is required to respond with a definition, assuming that ‘remove’ had some meaning peculiar to the application being considered:

scale x-1 by 0.5 :
decrement this by 1.2:
subtract this result from x-2.

The system will recognise this as a sequence of commands and will bind them to the ATN it has extracted from the original command. It will now attempt to parse each command. Assume it is unfamiliar with the first two; subtraction is a system primitive and will therefore be recognised. The user may redefine scale in terms of multiplication, which will be recognised as a primitive. Similarly decrement may be redefined in terms of subtraction. The three new definitions are saved in the user’s command file.

As an example of learning a more complex command, consider convolution. This may be defined as performing an overlap integral between a signal and a template for each point in the signal. The overlap integral is defined as the summation of the products of overlapping points between the template and signal when the template is centred on the point of the signal currently being convolved. The learning dialogue would closely follow this definition. The system would build up two new ATNs: one for the overlap integral and one for the top level of the convolution.

6 Implementation and examples

The system has been implemented in Domain-Lisp (a variant of Common Lisp) and is running on an Apollo DN 3000 workstation. It has been tested using twelve channel electroencephalogram data collected by collaborators of the Ophthalmic Optics Department at UMIST: this data is called ‘the Berlin EEG data’. Any time series data could have been utilised equally well. The test data consists of files containing twelve independent series of points of equal length. An example session will be described.

The system is invoked by a user with the result that the user’s file of ATNs is loaded. The user initialises the system’s workspace by instructing it to ‘consider the Berlin EEG data’. This results in the data contained in the appropriate file being loaded into the current result slot of the workspace. Suppose the system is requested to ‘display the samples between 0.5 and 1.5 milliseconds in the first five channels’. This command is placed in the current command slot and the earlier command transferred to the previous command slot. As a result of parsing the current command the default time series frame will be instantiated. Further parsing will result in the appropriate sections of channels one to five being selected and inserted into the data slot of the time series frame and the result slot of the workspace. Routines associated with the display frame autoscale the data as required and generate labels for the display’s axes (the default labels are ‘time’ on the horizontal and fiducial marks on the vertical axes). At this stage the system’s graphical output is as shown in Fig. 6. In the upper portion of the screen are displayed the previous (left hand side) and current (right hand side) outputs of signal processing commands. The left hand side is blank at this stage as the previous command did not result in any graphical output. In the lower portion of the screen are displayed the current and previous commands typed by the user.

The user may now give this data a name by instructing the system to ‘call this group 1’. The pronoun processor associates the data in the current result slot with ‘this’ and binds it to the name. This command has no graphical results; the current result display is therefore empty, and the previous result display now contains what used to be the current result, i.e. the five time series of data. The system also copies the result slot into another portion of the workspace and labels this portion ‘group 1’. If data with the same name already existed, the system will produce an overwrite error message informing the user that the previous data has been lost.

If the user now types ‘display samples between 0.5 and 1.5 milliseconds in the last five channels and call it group 2’ a similar series of actions will be performed. The system will now display the data in the top right portion of the screen; the top left portion will be empty (Fig. 7). The data will again be duplicated into the workspace and labelled. The user may now use the names given to the data in mathematical formulae: ‘display group 1 - group 2’ would result in the difference between equivalent points in the two data sets being displayed in the current result window. The previous result window will now display the original group 2 data. Had the user typed ‘group 1 - group 2’ the computation would have been performed, but the result not displayed (Fig. 8).

Suppose now that the user types ‘remove the DC component’. Parsing of this command will fail as remove and DC component are unknown to the system. The system will therefore enter its learning mode and prompt the user to ‘clarify remove <x-1>’. The user may respond with a suitable definition of this entity: ‘subtract x-1 from each point of the data’. The system will then seek clarification of any further unknown phrases in the input: subtract is a system primitive and requires no further clarification, whereas DC component is an unknown concept and therefore requires clarification. The user will respond as he thinks is appropriate, possibly by typing ‘divide the sum of the data points by the number of data points’. It is intended that as the user defines algorithms, the system will build a tree like diagrammatic representation (cf. the Jackson diagram); this has not yet been implemented. Having had the algorithm defined the system may execute the command, resulting in the graphical output of Fig. 9.

7 Conclusions and discussion

A restricted natural language interface for experts in the domain of signal processing has been implemented. It provides the experts with the facility to investigate the effects of applying algorithms to their data without them
[Screen display residue: panels titled ‘E.E.G. DATA’ (‘FIVE CHANNELS’), horizontal axis ‘TIME (MSECS)’; command lines shown include ‘DISPLAY THE SAMPLES BETWEEN 0.5 AND 1.5 MILLISECONDS IN THE 1ST’ and ‘DISPLAY GROUP 1 - GROUP 2’]
Fig. 9 DC component, having been taught to the system, has been extracted from each datum
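The two definitions taught in the session (‘subtract x-1 from each point of the data’ and ‘divide the sum of the data points by the number of data points’) amount to subtracting the mean of the data from every point. A minimal Python sketch of that arithmetic is given below. The function names are hypothetical, and it is assumed here that the DC component is computed separately for each channel; the text does not state whether the mean is taken per channel or over the whole data set.

```python
from typing import List, Sequence


def dc_component(points: Sequence[float]) -> float:
    """Taught definition: divide the sum of the data points by the number of data points."""
    return sum(points) / len(points)


def remove(value: float, points: Sequence[float]) -> List[float]:
    """Taught definition of 'remove <x-1>': subtract x-1 from each point of the data."""
    return [p - value for p in points]


def remove_dc_component(channels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply 'remove the DC component' to every channel (per-channel mean is an assumption)."""
    return [remove(dc_component(channel), channel) for channel in channels]


if __name__ == "__main__":
    channels = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
    print(remove_dc_component(channels))
```

Each channel returned by the sketch has zero mean, which is the property a display such as Fig. 9 would be expected to show.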