Related Work

This chapter surveys previous work in structured text processing. A common theme
in much of this work is a choice between two approaches: the syntactic approach
and the lexical approach.

In general terms, the syntactic approach uses a formal, hierarchical definition of
text structure, invariably some form of grammar. Syntactic systems generally parse
the text into a tree of elements, called a syntax tree, and then search, edit, or
otherwise manipulate the tree.

The lexical approach, on the other hand, is less formal and rarely hierarchical.
Lexical systems generally use regular expressions or a similar pattern language to
describe structure. Instead of parsing text into a hierarchical tree, lexical
systems treat the text as a sequence of flat segments, such as characters, tokens,
or lines.
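
As a minimal sketch of the lexical approach, the following Python fragment treats a text as a flat sequence of lines and uses a single regular expression to pick out the lines that look like header fields. The sample text, the pattern, and the name header_fields are hypothetical and chosen only for illustration; they are not taken from any system surveyed in this chapter.

```python
import re

# Hypothetical sample: only the "Name: value" lines are described by a pattern.
TEXT = """From: alice@example.com
Subject: lunch
this line is not described by any pattern
To: bob@example.com
"""

# One pattern describes a header line; nothing else about the text is specified.
HEADER = re.compile(r"^([A-Za-z-]+):\s*(.*)$")


def header_fields(text):
    """Scan the text line by line and keep (name, value) for matching lines."""
    fields = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            fields.append((match.group(1), match.group(2)))
    return fields


print(header_fields(TEXT))
```

The line that matches no pattern is simply skipped. No tree is built, and no description of the text as a whole is required.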

The syntactic approach is generally more expressive, since grammars can capture
aspects of hierarchical text structure, particularly arbitrary nesting, that the
lexical approach cannot. However, the lexical approach is generally better at
handling structure that is only partially described. Other differences between the
two approaches will be seen in the sections below.
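
The point about nesting can be illustrated with a small sketch, again in Python and again under assumed names. The function innermost_groups applies a regular expression to parenthesized text, while parse is a hand-written recursive-descent parser for the toy grammar list := '(' (list | char)* ')'. Both functions and the grammar are hypothetical examples, not part of any system discussed here.

```python
import re


def innermost_groups(text):
    """Lexical: this pattern matches only groups with no parentheses inside."""
    return re.findall(r"\(([^()]*)\)", text)


def parse(text):
    """Syntactic: parse  list := '(' (list | char)* ')'  into nested Python lists."""

    def parse_list(pos):
        # text[pos] is '('; return (tree, index just past the matching ')').
        children = []
        pos += 1
        while pos < len(text) and text[pos] != ")":
            if text[pos] == "(":
                child, pos = parse_list(pos)  # recursion handles any depth
                children.append(child)
            else:
                children.append(text[pos])
                pos += 1
        if pos >= len(text):
            raise ValueError("unbalanced parentheses")
        return children, pos + 1

    if not text.startswith("("):
        raise ValueError("expected '(' at start of input")
    tree, end = parse_list(0)
    if end != len(text):
        raise ValueError("unexpected text after closing ')'")
    return tree


print(innermost_groups("(a(b(c))d)"))  # only the deepest group is found
print(parse("(a(b(c))d)"))             # the full nesting is recovered as a tree
```

The regular expression recovers only the innermost group, whereas the parser returns the whole hierarchy. Conversely, the parser rejects any input that does not fit its grammar completely, which is the sense in which the lexical approach copes better with partially described structure.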
