Russian Experience in Hypertext: Automatic Compiling of Coherent Texts
R. S. Gilyarevskii
All-Russian Institute for Scientific and Technical Information, Usievicha 20a, Moscow 125219, Russia
M. M. Subbotin
State Scientific Technical Center of Hypertext Information Technologies, Zemlyanoi Val 52/16, Moscow, Russia
Russian hypertext research emphasizes algorithmic navigation. Navigation rules are based on features of hypertext nodes formulated in terms of graph theory. The trail built in this navigation can be perceived as nonformal reasoning or as a coherent text. In creating hypertext systems there appear specific problems of logic and structural analysis which were first advanced by Russian researchers. The Russian hypertext systems HYPERLOG, HYPERNET, BAHYS, and SEMPRO are described.

Received February 15, 1991; revised June 16, 1991, March 16, 1992; accepted October 21, 1992.
© 1993 John Wiley & Sons, Inc.
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 44(4):185-193, 1993 CCC 0002-8231/93/040185-09

Introduction: The Concept of Hypertext

The readers of JASIS are familiar with the basic concept of hypertext (Lunin & Rada, 1989), which is why we shall dwell only upon some features of this concept that are significant from our point of view.

Hypertext is interpreted as a nonlinear, network form of arranging textual material. Thus, textual material consists of separate fragments (“nodes”) with indicated possible transitions (“links”) between them. There are various ways of establishing these links, but for us the important issue is the semantic proximity of the linked fragments.

As a rule, every fragment is connected with several others by links, which gives the material a network form. The process of reading the fragments that form a hypertext, by following the links, is called navigation.

A hypertext network gives a reader the possibility of navigating along different routes, i.e., reading the material in different orders, and not in only one, as in the case of reading ordinary linear texts. In contrast to the network forms of representing knowledge which are used in the systems of artificial intelligence, the hypertext network is intended not for acquiring derivative knowledge (conclusion, appraisal, diagnosis, etc.) but for reading the textual material sequentially; the goal here is the same as in reading a linear text: to master the knowledge given in the text.

When knowledge is represented in the form of hypertext, the reader can acquire it more actively: he/she alone can select the initial point and order of reading in accordance with his/her intellectual interests and knowledge.

These advantages of hypertext, in comparison with linear text, become apparent only if the reader establishes proper navigational routes in the hypertext network, i.e., if the consecutively read fragments form a joint, coherent content.

The problem of finding suitable intelligent navigation routes is an acute one when large and complex hypertext networks are under discussion.

Large and Complex Hypertext Networks

Very large hypertext networks usually appear when we speak about a dynamic hypertext (Carmel, McHenry, & Cohen, 1989) that expands permanently with newly inserted information. Complex networks emerge in a hypertext that originally is not built hierarchically (new information is not expected to be inserted into rubrics set beforehand), and especially when the links of a new node are set by estimating its semantic proximity to each available node. As a rule, many cycles appear in hypertexts built in this way. It should also be mentioned that a link which reflects semantic proximity may go in two directions, so a graph built on these links is, in general, not directed. The more links are related to the same node in a network, the more difficult it is for the user to select the next node during navigation. The well-known problem of “getting lost” in hypertext emerges (Conklin, 1987).
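The network form described above can be modeled as an undirected graph. The following Python sketch is illustrative only: the four fragment names and their links are hypothetical, and `build_network` is a name introduced here, not part of any system discussed in this article. It stores each link in both directions, as the text suggests for links that reflect semantic proximity, and counts the links of each node, i.e., the number of choices a reader faces there.

```python
from collections import defaultdict


def build_network(links):
    """Build an undirected hypertext network from (node, node) pairs."""
    network = defaultdict(set)
    for a, b in links:
        network[a].add(b)  # a semantic-proximity link is stored
        network[b].add(a)  # in both directions (undirected graph)
    return dict(network)


# Hypothetical fragments and links, for illustration only.
# A-B-C form a cycle; D hangs off C.
links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
network = build_network(links)

# Number of links per node = number of possible next steps from that node.
choices = {node: len(neighbors) for node, neighbors in network.items()}
print(choices)  # prints {'A': 2, 'B': 2, 'C': 3, 'D': 1}
```

Node C has the most links, so it is the point at which a reader navigating manually has the most alternatives to weigh.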
Direction of the links, a hierarchically organized structure, the building of composite nodes, and, finally, the availability of routes paved beforehand for the reader all reduce the difficulty of choosing during navigation. But this is achieved simply by reducing the possible choice. All the mentioned means of facilitating navigation weaken the most interesting property of hypertext: the variety of implicit routes (linear texts), the possibility of revealing in the hypertext unexpected, but quite intelligent routes.

From our point of view, it is very important that the hypertext include all the variety and fullness of possible semantic links between cognitive elements, and thus a great variety of potential navigational routes.

This means that facilitation of navigation, and of the perception of the material read in the process of navigation, must be reached not through directly reducing the number of possible routes, but in another way.

We believe it can be reached by using certain rules and criteria in selecting each next node from the possible variety.

Navigation Which Provides Coherence of the Read Material

If links in hypertext reflect a semantic connection of the content of nodes, then the sequence of text fragments determined by the navigational route always has some level of semantic organization. But the presence of semantic links between neighboring fragments alone does not guarantee that the joint content of the nodes will be coherent for the user and thus available for mastering it effectively. To meet these requirements during navigation, one must take into consideration not only the semantic proximity to the immediately preceding node, but also the semantic and logical relations with all preceding nodes of the navigational route.

Under ordinary “manual” navigation in a large and complex hypertext network it is very difficult to select a next node that would meet the mentioned requirements. We believe a true, effective choice of the next node in navigation can be made on the basis of an estimation of its place in the topological structure of the hypertext network.

Correspondence between the Hypertext Topological Structure and the System of Semantic Links of Knowledge Units

There are direct and mediate links between the elements of knowledge in any subject domain. A direct link takes place if one unit of knowledge confirms the other; makes it more concrete; generalizes it; or makes it appear as a cause, goal, etc.

The topological structure of the hypertext network can quite adequately reflect a system of such semantic links under two conditions: if nodes are elementary monosemantic units of knowledge (statements, ideas, facts, etc.); and if links are established in the hypertext in all cases when a direct semantic link is present.

Correspondence between the hypertext topology and the system of links of the knowledge represented in the hypertext makes it possible to impart a semantic interpretation to the graph characteristics of the nodes. In particular, a correct semantic connection of the nodes in the navigation route can be described in graph theory terms (e.g., as a requirement that every next node have links not only with the previous node, but also with some other preceding nodes of the navigational route). Such criteria of selection of nodes in a navigational route have a heuristic character. Over the last 20 years we have identified and tested a series of requirements on the graph characteristics of the node being chosen. Keeping to these requirements provides quite high logical and semantic qualities of the textual material being built in the process of navigation.

Possibility of Navigation under Rules (Algorithmic Navigation)

With criteria available for selecting nodes according to structural, graph-theoretic features, it becomes possible to carry out navigation under rules. Usage of these rules does not turn the hypertext into an expert system. These rules do not require an already checked model of a subject domain; they are, in general, independent of the concrete material. That is why they can also be applied in dynamic hypertext, which is permanently expanding and changing with new information.

In the Russian systems HYPERLOG, SEMPRO, and BAHYS, these rules are implemented in corresponding algorithms and programs and provide algorithmic navigation through hypertext.

Illustrating the Algorithmic Navigation in Dynamic Hypertext

Let us assume that we accumulate knowledge on the problem “navigation in hypertext,” finding statements from literature on this problem and writing them down in the order we came across them. Let us assume that now we have the 14 statements listed in Table 1.

TABLE 1. Statements on the problem “navigation in hypertext,” in the order they were written down.
1. Hypertext, or nonlinear text, consists of separate text fragments which are called “nodes.”
2. While in process of navigation, user has access to information on the content of neighboring nodes and, on this basis, selects his/her route through the network.
3. A necessity arises to have something like a compass for hypertext navigation.
4. Russian research indicates that rules of algorithmic navigation may be based on structural characteristics of the hypertext network.
5. Possibility to move from one node to another is called “link.”
6. Following the hypertext network’s links is called “navigation.”
7. In complex networks, where each node has many links, selection for the next navigational step becomes very difficult.
8. Structural characteristics of the hypertext network, used for setting rules of algorithmic navigation, can be described in terms of graph theory.
9. Following links, it is possible to traverse a hypertext network in various directions and by different routes.
10. With the rules, navigation may be carried out algorithmically.
11. Since every node may have many links, the so-called “hypertext network” emerges. A hypertext network may be very large and of high complexity.
12. Each node has a prescribed set of links.
13. Rules determining direction of navigation in accordance with the user’s subject matter should act as a compass.
14. Many researchers point out that, in large hypertext networks, it is easy for the user to become disoriented or to get lost.

Let us establish direct semantic links in our collection of statements and express them in a table (Table 2). The left column of Table 2 contains the statement numbers in the order they were set down initially, and the rows contain the numbers of the semantically adjacent statements for each one.

The hypertext network reflected in this table can easily be presented graphically (Fig. 1).

The presence of the link in this hypertext is a property of adjustment of the corresponding statements; therefore, navigation here provides a certain level of coherence in the sequence of statements.

However, ordinary manual navigation does not, as a rule, provide a very high level of coherence. The reader can try to move across the links of this hypertext and, in most cases, will notice the defects typical of this type of text building. On one hand, they are characterized by a violation of logic (e.g., later statements evidently introduce ideas,
which were used in previous statements); on the other, as a rule, far from all of the statements are included in texts built by manual navigation. Often, when the navigational route reaches some node, it turns out that all the neighboring nodes have already been used in the building of the linear text; having returned, one can move to new statements, but, as we know, it is very easy to get lost in a large hypertext (the reader permanently returns to routes he has already passed).

Our system, BAHYS, has automatically created a linear text of these 14 statements, built on the basis of one of the algorithms which implement our heuristic rules (see Table 3).

One can become convinced that this linear text is logically consistent and sets out the material quite systematically.

The algorithm used in this case selects the nodes beginning from the third; the first two nodes are set by the user. The nodes are selected on the basis of purely structural indications. The following two criteria were used for selection: the current node should have no fewer than two direct links with previous nodes of the route (this provides quite a high level of general coherence of the linear text being built); and there must be a direct link with the immediately preceding node (this provides continuity of content).

For navigation in large hypertexts other criteria are also used. Of importance is the following criterion: the total number of links of the next node must not be greater than that of the previous node (as numerous observations show, this provides a deductive character of exposition, from general to particular) (Subbotin, 1986).

Semantical Gaps

Of course, the possibility of building a logically coherent exposition of a theme depends not only on the criteria of selection of the next node, but also on the accumulated material itself, on the actual topology of the hypertext network. That is why the navigational algorithm must be adapted to the network structure. If it is impossible to select the next node by strong criteria (just like the one mentioned above), weaker criteria are used, or the entire set of criteria is not taken into account. Systems carrying out algorithmic
TABLE 2. Direct semantic links between the statements of Table 1.
Statement   Adjacent statements
1           5 12
2           6 7 9
3           7 10 13 14
4           8 10 13
5           1 9 11 12
6           2 7 9 11
7           2 3 6 11 13 14
8           4 10
9           2 5 6 11
10          3 4 8 13
11          5 6 7 12 14
12          1 5 11
13          3 4 7 10
14          3 7 11
FIG. 1. Arrows indicate the “good” path, forming good linear text.
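The two selection criteria stated in the text can be checked mechanically against Table 2. The Python sketch below is an illustration under stated assumptions, not the BAHYS implementation. `TABLE_2` copies the rows of Table 2, and `ROUTE` is the order in which the statements of Table 1 appear in Table 3, obtained by matching the statement texts. Links are assumed to be undirected: the table as reproduced here lists 11 under statement 9 but not 9 under statement 11, so the sketch symmetrizes the rows. The names `symmetrize` and `meets_criteria` are hypothetical.

```python
# Rows of Table 2: statement number -> numbers of adjacent statements.
TABLE_2 = {
    1: [5, 12], 2: [6, 7, 9], 3: [7, 10, 13, 14], 4: [8, 10, 13],
    5: [1, 9, 11, 12], 6: [2, 7, 9, 11], 7: [2, 3, 6, 11, 13, 14],
    8: [4, 10], 9: [2, 5, 6, 11], 10: [3, 4, 8, 13],
    11: [5, 6, 7, 12, 14], 12: [1, 5, 11], 13: [3, 4, 7, 10],
    14: [3, 7, 11],
}

# Order of the Table 1 statements as they appear in Table 3.
ROUTE = [1, 5, 12, 11, 9, 6, 2, 7, 14, 3, 13, 10, 4, 8]


def symmetrize(table):
    """Treat every listed link as undirected (an assumption of this sketch)."""
    graph = {node: set(adjacent) for node, adjacent in table.items()}
    for node, adjacent in table.items():
        for other in adjacent:
            graph[other].add(node)
    return graph


def meets_criteria(graph, route_so_far, candidate):
    """Check the two criteria for adding `candidate` to the route."""
    # Criterion 2: a direct link with the immediately preceding node.
    linked_to_previous = route_so_far[-1] in graph[candidate]
    # Criterion 1: at least two direct links with nodes already in the route.
    links_into_route = len(graph[candidate] & set(route_so_far))
    return linked_to_previous and links_into_route >= 2


graph = symmetrize(TABLE_2)

# The first two nodes are set by the user, so checking starts at the third.
violations = [
    ROUTE[i]
    for i in range(2, len(ROUTE))
    if not meets_criteria(graph, ROUTE[:i], ROUTE[i])
]
print(violations)  # prints []
```

Under these assumptions every step of the Table 3 order satisfies both criteria. By contrast, moving from statements 1 and 5 directly to statement 9 would fail the first criterion, since statement 9 is linked to only one of the two.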
TABLE 3. Linear text of the 14 statements built automatically by BAHYS.
Hypertext, or nonlinear text, consists of separate text fragments which are called “nodes” (Lunin & Rada, 1989).
Possibility to move from one node to another is called “link” (Gilyarevskii & Kaloshin, 1988).
Each node has a prescribed set of links (Luhn, 1958).
Since every node may have many links, the so-called “hypertext network” emerges. A hypertext network may be very large and of high complexity (Nelson, 1966).
Following links, it is possible to traverse a hypertext network in various directions and by different routes (Bush, 1945).
Following the hypertext network’s links is called “navigation” (Chelnokov, 1985).
While in process of navigation, the user has access to information on the content of neighboring nodes and, on this basis, selects his/her route through the network (Carmel, McHenry, & Cohen, 1989).
In complex networks, where each node has many links, selection for the next navigational step becomes very difficult (Otlet, 1975).
Many researchers point out that, in large hypertext networks, it is easy for the user to become disoriented or to get lost (Tseitin, 1961).
A necessity arises to have something like a compass for hypertext navigation (Conklin, 1987).
Rules determining direction of navigation in accordance with user’s subject matter should act as a compass (Garfield, 1955).
With the rules, navigation may be carried out algorithmically (Engelbart & English, 1968).
Russian research indicates that rules of algorithmic navigation may be based on structural characteristics of hypertext network (Subbotin, 1986).
Structural characteristics of hypertext network, used for setting rules of algorithmic navigation, can be described in terms of graph theory (Otlet, 1975).
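A navigation procedure built on the two criteria, with a fallback that records semantic gaps, might look like the Python sketch below. It is an illustration under several assumptions and not any of the authors' algorithms: the tie-break (lowest statement number), the choice of "direct link with the preceding node only" as the weaker criterion, and the name `navigate` are all hypothetical, and the links of Table 2 are treated as undirected.

```python
# Rows of Table 2: statement number -> numbers of adjacent statements.
TABLE_2 = {
    1: [5, 12], 2: [6, 7, 9], 3: [7, 10, 13, 14], 4: [8, 10, 13],
    5: [1, 9, 11, 12], 6: [2, 7, 9, 11], 7: [2, 3, 6, 11, 13, 14],
    8: [4, 10], 9: [2, 5, 6, 11], 10: [3, 4, 8, 13],
    11: [5, 6, 7, 12, 14], 12: [1, 5, 11], 13: [3, 4, 7, 10],
    14: [3, 7, 11],
}


def symmetrize(table):
    """Treat every listed link as undirected (an assumption of this sketch)."""
    graph = {node: set(adjacent) for node, adjacent in table.items()}
    for node, adjacent in table.items():
        for other in adjacent:
            graph[other].add(node)
    return graph


def navigate(graph, first, second):
    """Return (route, gaps) for a route starting with two user-set nodes.

    `gaps` holds the 0-based route positions at which the next node could
    only be chosen by the weaker criterion.
    """
    route, gaps = [first, second], []
    while True:
        unused = sorted(set(graph) - set(route))
        # Weaker criterion: a direct link with the preceding node.
        linked = [n for n in unused if route[-1] in graph[n]]
        # Strong criteria: additionally, two or more links into the route.
        strong = [n for n in linked if len(graph[n] & set(route)) >= 2]
        if strong:
            route.append(strong[0])    # hypothetical tie-break: lowest number
        elif linked:
            gaps.append(len(route))    # mark a semantic gap at this position
            route.append(linked[0])
        else:
            return route, gaps         # all nodes used, or a dead end


route, gaps = navigate(symmetrize(TABLE_2), 1, 5)
print(route)  # prints [1, 5, 12, 11, 9, 6, 2, 7, 14, 3, 13, 10, 4, 8]
print(gaps)   # prints []

# A hypothetical three-node chain: node 3 has only one link into the route,
# so it can be reached only by the weaker criterion and is marked as a gap.
chain = {1: {2}, 2: {1, 3}, 3: {2}}
print(navigate(chain, 1, 2))  # prints ([1, 2, 3], [2])
```

Started from statements 1 and 5, the sketch yields the statement order of Table 3 without gaps; the only tie it has to break is between statements 2 and 7 after statement 6.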
navigation indicate places in the built linear text where the next node was selected by weaker criteria and where, therefore, logical correctness and coherence are weakened (i.e., a semantical gap).

As a rule, a gap is overcome by inserting additional information, by adding new cognitive elements which are able to establish a link between fragments of a linear text. Indication of the gaps stimulates a purposeful search for new information and ideas.

Construction of Hypertext and Algorithmic Navigation as an Instrument of Logical Arranging of the Accumulated Material

The example given shows that construction of the hypertext and algorithmic navigation through it can be considered an instrument for systematizing the accumulated knowledge.

If, at input, we have a free and nonorganized collection of statements and at the output we want to achieve a well-organized sequence (from the point of view of logical and semantic criteria), we must convert statements into nodes, set direct semantic links between them, and then carry out navigation under the structural criteria.

If the user had ordered and structured the accumulated statements according to a previously fixed structure of the given subject domain, he would actually have reduced new information to some strict, a priori scheme. If the collection of statements is left unstructured, then the possibility of comprehending and acquiring the stored knowledge is lost. A structuralization (hierarchization, systematization) of the accumulated material that proceeds from the semantic and logical links contained in the collection of statements would be most desirable.

Of course, the problem of the laboriousness of this kind of systematization of knowledge appears. First, there is the supposed building of the hypertext on the principle of setting links in all cases of close semantic contiguity between statements. Though not complicated intellectually, the procedure of establishing semantic contiguity (direct link) turns out to be laborious when every newly inserted statement must be related to every statement already available in the hypertext.

In practice, the presence or absence of semantic contiguity is ascertained not for every pair of statements, but only for those which, by some indications, most probably have this contiguity. The corresponding pairs of statements are selected on the basis of key words, which can be singled out automatically on the basis of morphological and syntactic analysis of the text. In many cases a practically satisfactory final result can be obtained with fully automatic establishment of links between statements.

In our case, it should also be noted that it is not necessary to point out the type of relation when establishing semantic contiguity (correspondingly, there is no need to classify these relations); the fact of the presence of some relation is itself important. The explanation is that the final result, a good linear text, presumes high coherence of every next statement with the previous fragments of the exposed material, independently of what concrete relations this statement has with the previous ones.

Some Other Directions of Algorithmic Structural Analysis of the Hypertext Network

Until now, we spoke about a structural analysis of the hypertext network, meaning the linear adjustment of its nodes. However, other types of analyses are also of certain interest, in particular, the singling out, by structural indications, of subgraphs which reflect the content of the given hypertext as a whole. As in the case of algorithmic navigation, we speak about structural indications which have a semantic interpretation, because of the correspondence between the topological structure of the network and the system of semantic links of cognitive elements.

Thus, the more links the node has in the hypertext network, obviously the higher and more considerable the
FIG. 2. Graph reflecting dissertations prepared for 1965-1969. Autom., automatic(ed); Bibl., bibliography(ic); DB, data base(s); ES, expert system(s); Inf., information; IR, information retrieval; IS, information system(s); Lang., language; Lib., library(ies); Ref., reference; S&T, science and technology; SDI, selective dissemination of information; Transl., translation. [Figure not reproduced; surviving node labels: comparison of types of IS; statistical description of text; theory of IR; choice of inf. lang.]
FIG. 3. Graph reflecting dissertations prepared for 1970-1974. See legend to Figure 2 for abbreviations. [Figure not reproduced; surviving node label: S & T specialized IS.]
any subject area (ultimately human knowledge) is a single system of links among cognitive elements forming it. It is this particular understanding that has been forming in information science for a long time. Pioneers of this science had advanced similar ideas as far back as the pre-hypertext period. Now it is becoming clear that successes in the development of computer facilities and programming have made it possible to realize ideas which had been developed within library and information sciences long before. Bibliographers, for hundreds of years, appreciated the difficulty of organizing and presenting information. Possibilities and trends in the development of information technology in this field were guessed and correctly predicted by the pioneers of informatics.

P. Otlet is known to most specialists only in connection with the Universal Decimal Classification (UDC) he created in 1905. As far back as 1905 he realized the need to order the world system of scientific communication. In his report at the International Congress on Bibliography and Documentation (Brussels, 1908) he expressed an idea which contained the kernel of hypertext technology:

  The medium of the organization of scientific work is the book, above all in its latest form, the periodical . . . . The only conception which corresponds to reality is to consider all books, all periodical articles, all the official reports as volumes, chapters, paragraphs in one great book, the Universal Book, a colossal encyclopedia framed from all that has been published . . (Otlet, 1975a)

It should certainly be taken into account that this idea was expressed at the beginning of the 20th century and was oriented at the technical possibilities of that time. Although they were very limited by present-day standards, Otlet foresaw modern achievements, even systems of remote access to data banks. In 1934, he wrote:

  Any one from afar would be able to read the passage, expanded or limited to the desired subject, projected on his individual screen. Thus, in his armchair, any one would be able to contemplate the whole of creation or certain of its parts. (Otlet, 1975b)

From the time of the pioneers of hypertext (Bush, 1945; Engelbart & English, 1968; Nelson, 1966) to the mid-1980s the hypertext idea experienced an incubation period when numerous projects which developed certain aspects of this idea were carried out in an isolated way within the framework of various scientific directions. As a rule, the designers, far from using the term “hypertext,” often did not realize their link with the works by Engelbart and Nelson.

Only now is it possible to identify the range of works which in fact developed certain essential elements of the hypertext concept. Although the task is far from simple, its solution would facilitate the use of the potential of former projects to ensure further development of hypertext and
FIG. 4. Graph reflecting dissertations prepared for 1975-1979. See legend to Figure 2 for abbreviations. [Figure not reproduced; surviving node labels: semantic compression; informatics for management; inf. needs; inf. services.]
FIG. 5. Graph reflecting dissertations prepared. See legend to Figure 2 for abbreviations. [Figure not reproduced; surviving node labels: international bibl.; bibl. in agriculture; bibl. in science; bibl. in engineering; inf. services in; software for; language interface; software for DB.]
improvement of its new forms and tools. If the problem of hypertext is viewed from broad logical-linguistic positions, one can see that it arose from attempts to overcome the narrowness of traditional tools of information retrieval: hierarchical classification schemes and descriptor languages of coordinate indexing. Similar attempts in earlier years led to Luhn’s (1958) ideas of automatic abstracting and Garfield’s (1955) citation networks. The former line proved unpromising, whereas the latter was brilliantly developed in science citation indexes and in maps and atlases of scientific fields.

Soviet Investigations in the 1960s and 1970s

One of the important lines leading to modern linear text generation is based on the logical-linguistic research of the 1960s. In the 1960s the USSR conducted major logical-linguistic and logical-mathematical projects aimed at establishing formalized models of natural language with a view to automating information processes. Of the many works of this cycle, mention should be made of the reports by Tseitin (1961) and Ivanov (1961) at the Conference on Information Processing, Machine Translation and Automatic Reading of Text held in Moscow in 1961.

In the Tseitin report, the main task dealing with the construction of a model of text was to identify a host of grammatically correct phrases and to indicate which were equivalent to each other in meaning. The task was to develop an algorithm generating a multitude of such equivalent pairs. The Ivanov report ended with a prophetic idea to the effect that deliberate operations over linguistic systems required for machine translation and information retrieval may be linked by feedback with the development of these systems themselves.

Of special interest are works dealing with the distributive analysis of text and developing methods of American descriptive linguistics (Harris, 1951). In this connection we would like to mention the book by Andreev (1967) and a doctoral thesis by Shaikevich (1982). The dissertation
FIG. 6. Semantical area of a node in BAHYS (adjacent nodes represented by first lines). [Screen image not reproduced; it shows the node “A necessity arises to have something like a compass for hypertext navigation.” together with the first lines of its adjacent nodes.]
lows one to effect transitions (navigation) between different hypertexts (designers call this feature “hyperlink”). Visualization of a hypertext net combined with various images and pictographs is a good tool for intensifying the user’s thinking in solving complex problems. HYPERNET is also a product of the State Scientific Technical Center of Hypertext Information Technologies (Moscow).

SEMPRO (Semantic Processor) is a hypertext system of a similar type designed and disseminated by the Soviet-Finnish-Bulgarian joint venture NOVINTEKH. It takes into account peculiarities of users with a humanistic thinking process and is specially adapted to the main stages of the authorship process: accumulation of information (source texts), creation of a variety of notes and fragments for the future text, and compilation of variants of the exposition.

All the described systems with algorithmic navigation implement the same rules of compiling coherent texts from small fragments. All of them inform the user of the semantic gaps that cannot be suppressed in the constructed texts. Each of these systems supports the creation and updating of hyperbases using keywords for searching “candidates for linking” [this method is similar to that used by designers of the Arizona Analyst Information System; see Carmel, McHenry, & Cohen (1989)].

References

Andreev, N. D. (1967). Statistical-combinatoric methods in theoretical and applied linguistics. Leningrad (in Russian).
Bush, V. (1945). As we may think. Atlantic Monthly, 276, 101-108.
Carmel, E., McHenry, W. K., & Cohen, J. (1989). Building large, dynamic hypertext: How do we link intelligently? Journal of Management Information Systems, 6, 33-50.
Chelnokov, V. M. (1985). Making the concept of integrity operational in the representation of knowledge. In System research: Methodological Problems. A yearbook (pp. 103-112). Moscow (in Russian).
Conklin, J. (1987). Hypertext: An introduction and survey. Computer, 17-24.
Engelbart, D., & English, W. (1968). A research center for augmenting human intellect. AFIPS Conference Proceedings, 33, 1.
Garfield, E. (1955). Citation indexes for science. Science, 122, 108-111.
Gilyarevskii, R. S., & Kaloshin, V. V. (1988). Development trends of informatics (based on Soviet dissertations from 1965 to 1980). Automatic Documentation & Mathematical Linguistics, 22, 56-68.
Harris, Z. (1951). Methods in structural linguistics. Chicago.
Ivanov, V. V. (1961). On constructing of information language for texts on descriptive linguistics. Conference on Information Processing, Machine Translation and Automatic Reading of Text, Moscow, p. 15 (in Russian).
Ivanov, V. G., & Subbotin, M. M. (1978). Analysis and updating of comprehensive solutions with the use of computers on the basis of the method of logical-semantic modeling. Moscow (in Russian).
Luhn, H. P. (1958). The automatic creation of literature abstracts (autoabstracts). IBM Journal of Research and Development, 2, 159-165.
Lunin, L. N., & Rada, R. (Eds.) (1989). Perspectives on hypertext. Articles about hypertext. Journal of the American Society for Information Science, 40, 158-220.
Nelson, T. (1966, May). The information systems in the future. Information retrieval: A critical view. In G. Schecter (Ed.), Third Annual Colloquium on Information Retrieval, Philadelphia.
Otlet, P. (1975a). La documentation en matiere administrative. In Actes de la Conference internationale de bibliographie et de documentation (pp. 147-154). Bruxelles (W. Boyd Rayward, trans.) The Universe of Information (p. 16). Moscow, FID 520 (Original work published in 1908).
Otlet, P. (1975b). Traite de documentation. Bruxelles (W. Boyd Rayward, trans.) The Universe of Information (p. 354). Moscow, FID 520 (Original work published in 1934).
Shaikevich, A. Y. (1982). Distributive-statistical analysis of texts. Doctoral thesis, Moscow (in Russian).
Subbotin, M. M. (1986). Computer applications and the construction of chains of reasoning. Automatic Documentation & Mathematical Linguistics, 20, 1-10.
Tseitin, G. S. (1961). On constructing of mathematical models of language. Conference on Information Processing, Machine Translation and Automatic Reading of Text, Moscow, p. 11 (in Russian).
Zefirova, V. L. (1990). The research hypertext system HYPERLOG. NOVINTEKH: International Computer Journal, 1, 29-30.