
An electronic thesaurus of Vedic texts

Jost GIPPERT, Frankfurt

When the first attempts to digitize Old Indic texts were made in the late 1970s, nobody could foresee that it would take only a few decades to put scholarly investigation of the language and contents of Vedic texts on an electronic basis. Beginning with the famous Texas version of the RV-Saṃhitā, several projects dedicated to the entry of Vedic texts were undertaken independently all over the world, in the US, Japan, and Europe, until in 1987 a joint project of an electronic thesaurus of texts relevant for Indo-European studies was outlined by some Indo-Europeanists during a conference in Leiden (Netherlands). It goes without saying that within this project, which was later given the name "TITUS"1, the corpus of Vedic texts plays a prominent role, and on the basis of a free exchange of data, the aim of being exhaustive in this respect has nearly been achieved; cp. table 1 below, which lists those texts that have already been digitized or are at present being prepared electronically2. It is to be hoped that the existing gaps will soon be filled, provided contributors for the texts in question can be found.

In the present paper, I should like to discuss some peculiar problems that have to be tackled when Vedic texts are prepared for electronic analysis and retrieval as a corpus. As a matter of fact, the Sanskrit language, and Vedic even more so, raises a set of unusual problems when textual material is to be adapted to digital processing. If we consider that electronic retrieval is essentially searching, which presupposes unambiguous, i.e. clearly identifiable, word forms, it becomes clear at once that as soon as we are dealing with a corpus of texts rather than a single text, unification is required to a considerable extent.

1
"Thesaurus indogermanischer Text- und Sprachmaterialien" ("Thesaurus of Indo-
European textual and linguistic material"). For the first announcement of the project, cf.
GIPPERT (1987); for reports and descriptions, cf. GIPPERT (1995a), (1995b) and (1997a).
2
In the table, those texts that have already been worked on are marked (by a shadowed background). The state of the project is permanently documented in the WWW pages http://titus.uni-frankfurt.de/texte/texte2.htm and http://titus.fkidg1.uni-frankfurt.de/texte/texte2.htm, from which most of the Vedic texts are directly retrievable. For a special report about Old Indic and Iranian texts represented in the TITUS collection cf. GIPPERT (2000).

Within Sanskrit, the main problem to be tackled in this respect is the treatment of sandhi phenomena. We all know that word forms such as devaḥ, devo, devas, devaś or agniḥ, agnis, agniṣ, agniś, agnir are contextual variants of but one underlying form each; for the computer, however, each of them represents an individual unit, distinct from the others by its outer shape. The problem has led to several solutions which are not compatible with each other in a corpus environment. A few examples may suffice to illustrate the effects of this.
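To make the problem concrete, here is a minimal Python sketch of how an indexer might collapse the contextual variants just listed into a single lookup key. It is a toy built on stated assumptions, not the TITUS procedure: it only knows the final-sound alternations of the two example words (a-stem -as surfacing as -o, and final -s appearing as -ḥ, -s, -ś, -ṣ or -r), and the function names are hypothetical.

```python
import unicodedata
from collections import defaultdict

VISARGA = "\u1e25"  # ḥ
# Final sounds that may stand for an underlying -s in the examples:
# ḥ (pausa), s, ś (U+015B), ṣ (U+1E63), r.
# ASSUMPTION: every word ending in one of these is treated as having
# underlying final -s; genuine -r stems (e.g. punar) would need a lexicon.
FINAL_VARIANTS = (VISARGA, "s", "\u015b", "\u1e63", "r")


def underlying_key(form: str) -> str:
    """Map a sandhi variant to one index key written with final visarga."""
    form = unicodedata.normalize("NFC", form)
    if form.endswith("o"):
        # devo < devas before voiced sounds (ASSUMPTION: -o only from -as)
        return form[:-1] + "a" + VISARGA
    for final in FINAL_VARIANTS:
        if form.endswith(final):
            return form[: -len(final)] + VISARGA
    return form


def group_variants(forms):
    """Group surface forms under their (assumed) underlying key."""
    groups = defaultdict(list)
    for form in forms:
        groups[underlying_key(form)].append(form)
    return dict(groups)


print(group_variants(["devaḥ", "devo", "devas", "devaś",
                      "agniḥ", "agnis", "agniṣ", "agniś", "agnir"]))
```

Even this toy shows why the choice of key is a scholarly decision: the same rule that correctly folds agnir into agniḥ would wrongly fold a genuine -r form.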
One of the most widespread solutions is the one I should call the "minimalistic approach". It is confined to a plain transliteration of what we would read in a Devanāgarī edition, with word boundaries left unmarked when they fall within akṣaras. This approach was, e.g., used in the first electronic adaptation of the Śatapatha-Brāhmaṇa (Mādhyandina recension), prepared by H.S. ANANTANĀRĀYAṆA as early as 1975; cf. Fig. 1, which shows ŚBM 1,1,1,1 both in the underlying printed edition (by A. WEBER, 1849/1964) and in the electronic version thus produced, with word boundaries of the type in question emphasized. It is clear that a retrieval of unique word forms can hardly yield reliable results on this basis.
vratamúpaiṣyan | ántareṇāhavanī́yaṃ ca
gā́rhapatyaṃ ca prāṅ tíṣṭhannapa úpaspṛśati
tadyádapá upaspṛśátyamedhyo vai púruṣo
yadánṛtaṃ vádati téna pū́tirantarato médhyā vā
ā́po médhyo bhūtvā́ vratamúpāyānī́ti pavítraṃ vā
ā́paḥ pavítrapūto vratamúpāyānī́ti tásmādvā́ apa
úpaspṛśati

Fig. 1: ŚBM 1,1,1,1: Weber’s edition vs. e-text (word boundaries within akṣaras emphasized)
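The retrieval problem can be demonstrated with the most naive word index imaginable. The Python sketch below treats every blank-separated string of one Fig. 1 line as a "word" and queries the result; the helper names are hypothetical and only serve the illustration.

```python
import unicodedata

# One line of the "minimalistic" e-text of ŚBM 1,1,1,1 (Fig. 1), with word
# boundaries inside akṣaras left unmarked, as in the Devanāgarī edition.
FIG1_LINE = "tadyádapá upaspṛśátyamedhyo vai púruṣo"


def whitespace_index(text: str) -> set[str]:
    """Naive retrieval index: every blank-separated string counts as a word."""
    return set(unicodedata.normalize("NFC", text).split())


def find(index: set[str], word: str) -> bool:
    return unicodedata.normalize("NFC", word) in index


index = whitespace_index(FIG1_LINE)
for query in ("vai", "yád", "apá", "amedhyo"):
    print(query, find(index, query))
```

Only vai is found; yád, apá and amedhyo are all present in the passage but hidden inside the strings tadyádapá and upaspṛśátyamedhyo.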

A second solution might be called the "Western traditional approach". Here, the electronic text is based on a transcription in which word boundaries are indicated unless vowels are contracted according to the rules of vocalic sandhi. This approach is, e.g., met with in the electronic version of the RV Khilāni provided by C. JORDÁN CÓLERA and F.J. MARTÍNEZ GARCÍA (1996), in accordance with the underlying printed edition by I. SCHEFTELOWITZ (1906/1966), which was laid out in just the same way; cf. Fig. 2, showing RVKh I,1 with cases of contraction marked.

1 a sámaikṣiṣyordhvámahasa ādityéna sahī́yasā /
b aháṃ yyáśasvínāṃ yáśo viśvā rūpā́ny ā́ dade /
2 a udyánn adyá ví no bhaja pitā́ putrébhyo yáthā/
b dīrghāyutvásya heśiṣe tásya no dhehi sūrya /
3 a udyántan tvā mitramaha āróhantaṃ vvicakṣaṇa/
b páśyema śarádaś śatám jī́vema śarádaś śatám /
4 abhí tyáṃ meśáṃ puruhūtám ṛgmíyam /1.

Fig. 2: RVKh I,1: Scheftelowitz’s edition vs. e-text (vowels contracted by sandhi marked)

A third solution, which reflects another way of transcription established within Western tradition, represents a "half-way" compromise between a transliterative rendering of Devanāgarī text (in the sense of a Saṃhitā-Pāṭha) and an interpretative, transcriptional dissolution of word forms (resembling a Pada-Pāṭha version). Here, vocalic sandhi is resolved in all cases, even where contraction would occur, with apostrophes used as markers whenever needed. This method was, e.g., applied in the electronic version of the Gobhila-Gṛhyasūtra prepared, again, by C. JORDÁN CÓLERA and F.J. MARTÍNEZ GARCÍA (1995), mirroring the edition provided by F. KNAUER (1884); cf. Fig. 3, which shows GobhGS 1,1 with the "false avagrahas" resulting from this transcription method marked.
yajñopavītinā ’’cāntodakena kṛtyam. udagayane
pūrvapakṣe puṇye ’hani prāg āvartanād ahnaḥ
kālaṃ vidyāt yathādeśaṃ ca. ⟨sarvāṇy⟩ evā
’nvāhāryavanti. apavarge ’bhirūpabhojanaṃ
yathāśakti. - brahmacārī vedam adhītyā ’ntyāṃ
samidham abhyādhāsyan jāyāyā vā pāṇiṃ
jighṛkṣan, anuguptā apa āhṛtya,
prāgudakpravaṇaṃ deśaṃ samaṃ vā parisamuhyo
’palipya, madhyataḥ prācīṃ lekhām ullikhyo
Fig. 3: GobhGS 1,1: Knauer’s edition vs. e-text ("false avagrahas" marked)

A "maximalistic" approach with regard to the dissolution of sandhi was developed by P. SCHREINER and his students in the preparation of an e-text of the Chāndogya-Upaniṣad (1985-86, on the basis of the editions by LIMAYE/VADEKAR 1958 and SENART). Here, all word boundaries affected by sandhi are marked (by an asterisk, *), with contracted vowels restored. Although this method marks compound boundaries (by a plus sign, +) as well, it is still distinct from a real Pada-Pāṭha in that the word forms are not represented by their pausa variants; cf. Fig. 4, showing ChUp 1,1,10 as it appears in the Devanāgarī edition by O. BÖHTLINGK (1889), with cross-sandhi markings emphasized.
<101.10/01> tena*ubhau kuruta.h [!] ya/s*
ca*etad* eva.m* veda ya/s* ca na veda !
<101.10/02> n-an-a tu vidy-a ca*a+vidy-a ca !
<101.10/03> yad* eva vidyay-a karoti
/sraddhay-a*upani.sad-a tad* eva v-iryavatta-
ra.m* bhavati*iti khalv* etasya*eva*a+k.sara-
sya*upavy-akhy-ana.m* bhavati !!101.10!
Fig. 4: ChUp 1,1,10: Böhtlingk’s edition vs. e-text (cross-sandhi markings emphasized)
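Since Fig. 4 and the transcription of the same passage in Fig. 5b can be compared line by line, part of the coding of this ASCII e-text can be read off: -a, -i, -u for long vowels, /s for ś, .s for ṣ, .h for ḥ, .m for ṃ, * for a sandhi-affected word boundary and + for a compound boundary. The Python sketch below decodes a line under exactly these inferred correspondences and splits it into words. The code table is an assumption reconstructed from the two figures, not a documented specification of the original coding, and line-final hyphens (as in vīryavatta-/ra.m) are not handled.

```python
import re
import unicodedata

# ASSUMED correspondences, inferred from Fig. 4 vs. Fig. 5b only.
CODES = [
    ("/s", "ś"), (".s", "ṣ"), (".h", "ḥ"), (".m", "ṃ"),
    ("-a", "ā"), ("-i", "ī"), ("-u", "ū"),
]


def decode(coded: str) -> str:
    """Convert the ASCII coding to Unicode transcription."""
    text = coded
    for code, char in CODES:
        text = text.replace(code, char)
    # Compound boundary, written with a hyphen as in Fig. 5b (a-vidyā).
    # Done last so the new hyphens are not mistaken for length marks.
    text = text.replace("+", "-")
    return unicodedata.normalize("NFC", text)


def words(coded: str) -> list[str]:
    """Split at blanks and at the sandhi marker *; drop bare punctuation (!, [!])."""
    text = decode(coded).replace("*", " ")
    return [w for w in text.split() if re.search(r"\w", w)]


print(words("n-an-a tu vidy-a ca*a+vidy-a ca !"))
print(words("tena*ubhau kuruta.h [!] ya/s*"))
```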

A comparison of the four methods illustrated above clearly shows that with a view to a unified corpus structure, the mere "mirroring" of printed editions can only be a first step and may have misleading effects. It also illustrates the dilemma we encounter when trying to represent both a coherent running text and its analysis into word forms in but one transcription: trying to combine the information provided by a plain Saṃhitā-Pāṭha text with its Pada-Pāṭha-like interpretation means trying to kill two birds with one stone. This is why we have decided to keep Saṃhitā-Pāṭha and Pada-Pāṭha versions of Vedic texts apart within the TITUS project, in order to provide a maximum of information both on the original shape of the transmitted text and on its analysis. On the basis of the WordCruncher retrieval system3, which has proven extremely well suited for the given task and has therefore been adopted as the main basis of the TITUS text collection, two solutions are feasible for this purpose.
The simpler solution consists in a line-by-line arrangement of the two text variants within one text file, each verse being represented by two clearly distinct versions. Such an arrangement meets the requirements of electronic text retrieval only if the two versions are indexed separately, thus providing access both to the Saṃhitā-Pāṭha word forms and to their Pada-Pāṭha interpretations; separate indexing of this kind is indeed one of the most remarkable features of the WordCruncher retrieval system. Cp. Fig. 5, which shows the synoptic arrangement of the three verses of ChUp 1,1,10 in its WordCruncher representation, together with the structure of the underlying input text.

Fig. 5a: ChUp 1,1,10 (synoptical arrangement of plain transcriptional and analytic text variants)

3
The WordCruncher retrieval system has been developed by Brigham Young University since the 1980s; within the TITUS project, a special server providing online access to the texts preindexed with it has been in operation since 1997 (cf. http://titus.uni-frankfurt.de/texte/tituswc.htm). Unfortunately, its usage is restricted to an MS Windows environment even in its most recent version, and it remains doubtful whether a cross-platform version will ever appear.

|v1 〈 Tiovpl16 〉 tenobhau kurutaḥ {\} yaś caitad evaṃ veda yaś ca na veda \ 〈 Tn16 〉
〈 Tiovps16 〉 tena+ ubhau kurutaḥ {\} yaś+ ca+ etad+ evaṃ+ veda yaś+ ca na veda \ 〈 Tn16 〉
|v2 〈 Tiovpl16 〉 nānā tu vidyā cāvidyā ca \ 〈 Tn16 〉
〈 Tiovps16 〉 nānā tu vidyā ca+ a-vidyā ca \ 〈 Tn16 〉
|v3 〈 Tiovpl16 〉 yad eva vidyayā karoti śraddhayopaniṣadā tad eva vīryavattaraṃ bhavatīti khalv
etasyaivākṣarasyopavyākhyānaṃ bhavati \\101.10\ 〈 Tn16 〉
〈 Tiovps16 〉 yad+ eva vidyayā karoti śraddhayā+ upaniṣadā tad+ eva vīryavattaraṃ+ bhavati+
iti khalv+ etasya+ eva+ a-kṣarasya+ upavyākhyānaṃ+ bhavati \\101.10\ 〈 Tn16 〉
Fig. 5b: (same, input structure for WordCruncher retrieval system)
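The point of the line-by-line arrangement is that the two versions feed two separate indexes. The Python sketch below imitates this for the first two verses of Fig. 5b, assuming that the font codes Tiovpl16 (plain text) and Tiovps16 (analytic text) can serve as version labels, that |vN gives the verse reference, and that + marks a boundary within the analytic text. It is a toy reimplementation under those assumptions, not WordCruncher's own indexer.

```python
import re
from collections import defaultdict

# First two verses of Fig. 5b (ChUp 1,1,10), input structure as shown there.
SAMPLE = r"""|v1 〈 Tiovpl16 〉 tenobhau kurutaḥ {\} yaś caitad evaṃ veda yaś ca na veda \ 〈 Tn16 〉
〈 Tiovps16 〉 tena+ ubhau kurutaḥ {\} yaś+ ca+ etad+ evaṃ+ veda yaś+ ca na veda \ 〈 Tn16 〉
|v2 〈 Tiovpl16 〉 nānā tu vidyā cāvidyā ca \ 〈 Tn16 〉
〈 Tiovps16 〉 nānā tu vidyā ca+ a-vidyā ca \ 〈 Tn16 〉"""

VERSE = re.compile(r"^\|v(\d+)")
SEGMENT = re.compile(r"〈\s*(Tiovp[ls]16)\s*〉(.*?)〈\s*Tn16\s*〉")


def build_indexes(text):
    """Return {version label: {word: set of verse numbers}}."""
    indexes = {"Tiovpl16": defaultdict(set), "Tiovps16": defaultdict(set)}
    verse = None
    for line in text.splitlines():
        if m := VERSE.match(line):
            verse = int(m.group(1))
        for label, body in SEGMENT.findall(line):
            # '+' separates words in the analytic text; braces and
            # backslashes are structure/punctuation codes, not words.
            for token in re.findall(r"[^\s{}\\+]+", body):
                indexes[label][token].add(verse)
    return indexes


idx = build_indexes(SAMPLE)
plain, pada = idx["Tiovpl16"], idx["Tiovps16"]
print(sorted(plain["caitad"]), "etad" in plain, sorted(pada["etad"]))
```

A query for etad thus succeeds only in the analytic index, while the fused caitad remains retrievable in the plain one.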

Another solution provided by the WordCruncher system is the synchronized arrangement of two distinct text versions, each of them again accessible via a separate index. Within the TITUS project, this solution has, e.g., been chosen for the edition of the Kāṭha Saṃhitā which is at present being worked on by Chl.H. WERBA (since 1998); cp. Fig. 6, which illustrates KS 19,2: 14,7 ff. arranged in this way, with sentences enumerated by letters for easy reference.

Fig. 6: KS 19,2e-j: 14,7 ff. (Saṃhitā-Pāṭha and Pada-Pāṭha versions arranged synoptically)

Within the TITUS project, only a few texts have so far been prepared in such a twofold way, and the question of which method of arrangement to prefer is still open. The same holds true for some minor points regarding sandhi that will have to be decided before the complete Vedic corpus can be made ready for cross-textual online retrieval. Among others, they comprise the question of which sandhi variant should be taken as the underlying form in Pada-Pāṭha versions (-ḥ for final -s in devaḥ, -ṣ in agniḥ, -r in punaḥ?) and whether or not the "normalized" forms of a Pada-Pāṭha version should be marked wherever they are altered by sandhi in the corresponding Saṃhitā-Pāṭha passage. Cp. Fig. 7, showing BAU 1,1,1 as prepared by M. ALBINO (1996-7), with final -s represented as such in the Pada-Pāṭha version and a tilde (~) added where the plain text has sandhi variation.

Fig. 7: BAU 1,1,1 (Saṃhitā-Pāṭha and Pada-Pāṭha arranged line-by-line, with additional markings)

A second principle that must be taken into account when preparing Vedic texts for electronic cross-corpus retrieval is word accentuation. For the computer, word forms such as devaḥ, agniḥ, or bṛhaspatiḥ are not at all identical with their accented variants deváḥ, agníḥ, bṛ́haspátiḥ, and special treatment must be envisaged to enable a common search for equivalents of this type within the Vedic corpus. The problem seems even harder to overcome if we consider that within Vedic tradition, the notation of accentuation differs to a certain extent between schools, so that we have to distinguish a Ṛgveda style, a Śatapatha-Brāhmaṇa style, a Kāṭhaka style, and a Sāmaveda style. These divergences have again led to several approaches in the preparation of e-texts.
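In Unicode terms the mismatch is easy to see and easy to neutralize: an accented vowel decomposes (NFD) into its base letter plus a combining acute (U+0301) or grave (U+0300), which can be dropped while macrons and dots below are kept. The Python sketch below assumes that accents are encoded exactly that way; notations using other characters (e.g. the superscript digits of Sāmaveda texts) would need separate treatment.

```python
import unicodedata

# ASSUMPTION: accents are written with combining acute (U+0301) or grave (U+0300).
ACCENT_MARKS = {"\u0301", "\u0300"}


def strip_accents(word: str) -> str:
    """Remove accent marks but keep macrons, dots below, etc."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ACCENT_MARKS)
    return unicodedata.normalize("NFC", kept)


for accented, plain in [("deváḥ", "devaḥ"), ("agníḥ", "agniḥ"),
                        ("bṛ́haspátiḥ", "bṛhaspatiḥ")]:
    print(accented, accented == plain, strip_accents(accented) == plain)
```

For each pair the direct comparison fails and the comparison after stripping succeeds.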
A "minimalistic" approach can be seen in A. PANDEY’s transcription of the Sāmaveda-Saṃhitā (Kauthuma), which contains no marking of accents at all, in contrast with the traditional Devanāgarī representation, which uses superscript digits 1, 2, and 3 to denote accents, as in SĀNTAVALEKAR’s edition (1956). Cp. Fig. 8, showing SVK 1,1,1,1,1-5:
{01a} agna ā yāhi vītaye gṛṇāno havyadātaye .
{01c} ni hotā satsi barhiṣi .. 1
{02a} tvamagne yajñānāṃ̆ hotā viśveṣāṃ̆ hitaḥ .
{02c} devebhirmānuṣe jane .. 2
{03a} agniṃ dūtaṃ vṛṇīmahe hotāraṃ viśvavedasam.
{03c} asya yajñasya sukratum .. 3
{04a} agnirvṛtrāṇi jaṅghanaddraviṇasyurvipanyayā.
{04c} samiddhaḥ śukra āhutaḥ .. 4
{05a} preṣṭhaṃ vo atithiṃ̆ stuṣe mitramiva priyam.
Fig. 8: SVK 1,1,1,1,1-5: accented Devanāgarī vs. unaccented transcriptional e-text

If we consider that there is a clear correspondence between the three numeric notations and their Ṛgvedic equivalents, 1 agreeing with an RV udātta, 2 with an RV svarita (or with an udātta lowered where no svarita follows, i.e. in verse-final position or immediately preceding an anudātta), and 3 with an RV anudātta, we might propose to use the common transcription method of the RV for the SV Saṃhitā as well; cp. Fig. 9, showing the corresponding text passage, RV 6,16,10 ff., both in its Devanāgarī representation (taken from M. MÜLLER’s edition, 1877/1965) and as a transcriptional e-text.
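The correspondence just described amounts to a lookup table with one context rule. The Python sketch below labels each syllable of a Sāmaveda line from its superscript digit (1, 2, 3, or None when unmarked) and flags the ambiguous case named above: a 2 in verse-final position or immediately before a 3 may also be a lowered udātta. It encodes only the correspondence stated here, not a full account of Sāmavedic accent notation, and the one-entry-per-syllable input format is an assumption.

```python
from typing import Optional, Sequence


def classify_sv_marks(marks: Sequence[Optional[int]]) -> list[str]:
    """Label each syllable of one verse from its Sāmaveda superscript digit."""
    labels = []
    last = len(marks) - 1
    for i, mark in enumerate(marks):
        following = marks[i + 1] if i < last else None
        if mark == 1:
            labels.append("udātta")
        elif mark == 3:
            labels.append("anudātta")
        elif mark == 2:
            # Verse-final, or right before a 3: may be a lowered udātta;
            # elsewhere a 2 corresponds to an RV svarita.
            if i == last or following == 3:
                labels.append("svarita or lowered udātta")
            else:
                labels.append("svarita")
        else:
            labels.append("unmarked")
    return labels


print(classify_sv_marks([3, 1, 2, None]))
print(classify_sv_marks([1, 2]))
```

Keeping the ambiguous label explicit, rather than silently choosing acute or grave, preserves exactly the information a conversion to RV-style transcription would otherwise have to decide.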

|p10
|va 〈Tiovmxla16〉ágna ā́ yāhi vītáye〈Tn16〉
|vb 〈Tiovmxla16〉gṛṇānó havyádātaye /〈Tn16〉
|vc 〈Tiovmxla16〉ní hótā satsi barhíṣi //〈Tn16〉
|p11
|va 〈Tiovmxla16〉táṃ tvā samídbhir aṅgiro〈Tn16〉
|vb 〈Tiovmxla16〉ghṛténa vardhayāmasi /〈Tn16〉

Fig. 9: RV 6,16,10 ff.: Accented Devanāgarī text vs. e-text (accented transcription with structure elements)

It is clear that the traditional rendering of the RV accentuation is highly interpretative as it stands, in that it reduces the threefold notational system of udāttas, svaritas, and anudāttas to two basic units, an acute and a grave mark. In this respect, it contrasts with two other approaches that have also been adopted in digitizing Vedic texts. One of them, which might be styled "superficial", consists in the one-by-one representation of the anudātta-like accent mark occurring in the Śatapatha-Brāhmaṇa tradition by an acute accent, thus neglecting the possibility that a given accent mark may represent various types of accentuation. If A. WEBER was right, we have to assume that in ŚB manuscripts, a horizontal stroke below an akṣara regularly denotes a svarita of the following syllable. In this way, it often seems to mark the udātta of the syllable it pertains to, but in a sequence of udātta syllables, only the last one receives the stroke; furthermore, the stroke marks the non-udātta syllable before an independent svarita4. On this basis, an interpretative rendering gives a picture similar to the one used for the RV Saṃhitā; cp. Fig. 10, which shows ŚBM 1,1,1,1 as quoted before (Fig. 1), but with additional marking (by underlining) of independent svaritas and of vowels that may be regarded as bearing an udātta although they have no accent mark.
vratám úpaiṣyan | ántareṇāhavanī́yaṃ ca
gā́rhapatyaṃ ca prā́ṅ tíṣṭhann apá úpaspṛśati
tádyádapá upaspṛśáty amedhyó vái púruṣo yád
ánṛtaṃ vádati téna pū́tir antarató médhyā vā́
ā́po médhyo bhūtvā́ vratám úpāyānī́ti pavítraṃ vā́
ā́paḥ pavítrapūto vratám úpāyānī́ti tásmād vā́
apá úpaspṛśati || 1 || só ’gním evābhī̀kṣamāṇo

Fig. 10: ŚBM 1,1,1,1: Devanāgarī edition vs. e-text (with non-marked accents restored)

4
The "double stroke" used to denote independent svaritas, as in, e.g., ’gním evābhī̀kṣamāṇo, is an invention of WEBER’s and has no basis in manuscript tradition; cf. his edition (1849/1964), p. XII.

As against the "superficial" approach outlined above, a one-by-one rendering of what the written tradition has may also result in a "maximalistic" view. This is, e.g., met with in V. PETR’s and P. VAVROUŠEK’s e-text of the Atharvaveda (Śaunaka) Saṃhitā, which is based on the edition by ROTH-WHITNEY (1856) and its transcription by Ch. ORLANDI (1991). Here, we find the digits 1 and 3 used in accordance with their Indic equivalents, १ and ३, which in the original text serve to denote independent svaritas that immediately precede an udātta, 1 being used if the svarited vowel is short, 3 if it is long; cf. Fig. 11, which shows AVS 13,4,23 ff. prepared in this way. It must be stated that the information these digits provide is rather redundant if, as in the given case, the svarita itself is marked by a grave accent in the transcription, length being an intrinsic feature of the vowel in question.
bhūtáṃ ca bhávyaṃ ca śraddhā́ ca rúciś ca
svargáś ca svadhā́ ca //23//
yá etáṃ devám ekavṛ́taṃ véda //24//
sá evá mṛtyúḥ sò3 ’mṛ́taṃ sò3 ’bhvà1ṃ sá rákṣaḥ
//25//
sá rudró vasuvánir vasudéye namovāké vaṣaṭkāró
’nu sáṃhitaḥ //26//
tásyemé sárve yātáva úpa praśíṣam āsate //27//
tásyāmū́ sárvā nákṣatrā váśe candrámasā sahá
//28// {17}

Fig. 11: AVS 13,4,23 ff.: Devanāgarī text vs. e-text (with marking of independent svaritas)
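The redundancy can be made explicit: once the svarita vowel carries a grave accent in the transcription, the digit is predictable from the length of that vowel. The Python sketch below predicts 1 or 3 for tokens of the kind found in Fig. 11 (sò3, ’bhvà1ṃ) and reads off the digit actually written for comparison. Its notion of length is a simplifying assumption: a macron, e, o, or a diphthong ai/au counts as long, everything else as short.

```python
import re
import unicodedata

GRAVE, MACRON = "\u0300", "\u0304"


def predicted_digit(token: str):
    """Return 3 if the vowel bearing a grave is long, 1 if short, None if no grave."""
    chars = unicodedata.normalize("NFD", token)
    for i, ch in enumerate(chars):
        if ch != GRAVE:
            continue
        j = i - 1
        while j > 0 and unicodedata.combining(chars[j]):
            j -= 1  # step back over other diacritics to the base vowel
        base, marks = chars[j], chars[j + 1 : i]
        following = chars[i + 1 : i + 2]
        is_long = (
            MACRON in marks
            or base in "eo"
            or (base == "a" and following in ("i", "u"))  # diphthongs ai, au
        )
        return 3 if is_long else 1
    return None


def written_digit(token: str):
    """The digit 1 or 3 actually written in the e-text token, if any."""
    m = re.search(r"[13]", token)
    return int(m.group()) if m else None


for token in ("sò3", "’bhvà1ṃ"):
    print(token, written_digit(token), predicted_digit(token))
```

Where prediction and written digit agree throughout a text, the digits can be dropped from the index without loss of information.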

It goes without saying again that with a view to cross-corpus retrieval, the divergent notations illustrated above are an impeding factor. Given that the interpretative transcription used for the RV Saṃhitā can be adapted to the other accentuation styles as well, one might wonder whether we should not aim at a unique representation of accented vowels for all texts concerned, all the more since it is only with certain types of secondary svaritas caused by sandhi that we have to expect noteworthy differences5. But even if a unification of accent marking is thus possible to a high extent, the question remains how to deal with the basic dichotomy of accented vs. non-accented texts we meet in Vedic tradition. A common solution is indeed required if we intend to automatically trace a quotation of a verse containing, e.g., accented puróhitaṃ in an unaccented environment where we must expect purohitaṃ instead. On the other hand, we should not simply dispense with the information accented texts contain by providing them with unaccented indexes only (which would easily be possible on the basis of the WordCruncher system). To meet these requirements, accented Vedic texts of the TITUS collection are now being prepared in such a way that the word forms they contain are accessible not only via an "accented" index but also via an unaccented meta-index. This means that the occurrence of puróhitaṃ in RV 1,1,1a will also be found in a cross-textual search for unaccented purohitaṃ; cf. Fig. 12, which illustrates the procedure of a "library search" as offered by the WordCruncher retrieval system.

5
Cf. WEBER (1849/1964), p. XIII, who underlines the specific occurrence of svaritas in the White Yajurveda tradition where an unaccented vowel is contracted with a preceding accented one; cf. MACDONELL (1910), 104, § 108 for a survey of effects of this kind.
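The two-level arrangement can be sketched in a few lines of Python, assuming accents are encoded as combining acute and grave marks: every form goes into an exact index, and its accent-free key additionally points back to it in a meta-index. The class name and the second reference ("unaccented text X") are hypothetical; WordCruncher's library search works on its own preindexed files.

```python
import unicodedata
from collections import defaultdict


def unaccented(word: str) -> str:
    """Drop combining acute/grave accents, keep all other diacritics."""
    d = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", "".join(c for c in d if c not in "\u0300\u0301"))


class AccentMetaIndex:
    """Exact ('accented') index plus an unaccented meta-index on top of it."""

    def __init__(self):
        self.exact = defaultdict(list)  # form as written -> references
        self.meta = defaultdict(set)    # accent-free key -> forms as written

    def add(self, form: str, ref: str) -> None:
        form = unicodedata.normalize("NFC", form)
        self.exact[form].append(ref)
        self.meta[unaccented(form)].add(form)

    def search_exact(self, query: str) -> list:
        return list(self.exact.get(unicodedata.normalize("NFC", query), []))

    def search_unaccented(self, query: str) -> list:
        hits = []
        for form in sorted(self.meta.get(unaccented(query), ())):
            hits.extend((form, ref) for ref in self.exact[form])
        return hits


index = AccentMetaIndex()
index.add("puróhitaṃ", "RV 1,1,1a")
index.add("purohitaṃ", "unaccented text X (hypothetical)")
print(index.search_exact("purohitaṃ"))
print(index.search_unaccented("purohitaṃ"))
```

The accented form stays intact in the exact index, so nothing is lost for queries that do care about accentuation.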

Fig. 12: puróhitaṃ in RV 1,1,1a accessed via a "library search" for purohitaṃ

The practicability of this procedure notwithstanding, it must be stated that the reduction of the word forms to be retrieved to unaccented units entails a loss of consistency, in that it makes a distinction of enclitic and non-enclitic variants of pronominal forms, or of accented and unaccented verbal forms, impossible. This is why, for the future, we should envisage an alternative solution, viz. to provide accented "meta-indexes" instead of unaccented ones. It goes without saying that this presupposes much more scholarly analysis, given that the distinction of variants of the type named above can only be founded on a thorough linguistic analysis. The same holds true for another task of the future, viz. the preparation of indexes for automatic lemmatization, which would enable us to search for complete paradigms in one step. This, too, presupposes a huge amount of scholarly work, which practically consists in the morphological identification of all word forms contained in the Vedic corpus. On this basis, it will finally be possible to provide searching facilities for morphological and syntactical features, such as the occurrence of particular cases depending on a given verb etc.6
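What a lemmatization index adds can be sketched in Python: once every form carries a lemma, the form-to-lemma table can be inverted so that one query returns the references of a whole paradigm. The table and the word index below are hypothetical hand-made fragments (a few case forms of deva and agni, with placeholder references); producing the real table for the whole corpus is precisely the scholarly work described above.

```python
from collections import defaultdict

# HYPOTHETICAL fragment of a form -> lemma table (case forms of deva, agni).
FORM_TO_LEMMA = {
    "devaḥ": "deva", "devam": "deva", "devena": "deva", "devāḥ": "deva",
    "agniḥ": "agni", "agnim": "agni", "agninā": "agni",
}


def paradigms(form_to_lemma):
    """Invert form -> lemma into lemma -> set of forms."""
    by_lemma = defaultdict(set)
    for form, lemma in form_to_lemma.items():
        by_lemma[lemma].add(form)
    return by_lemma


def search_paradigm(lemma, word_index, form_to_lemma=FORM_TO_LEMMA):
    """Collect the references of every form of `lemma` from a plain word index."""
    forms = paradigms(form_to_lemma).get(lemma, set())
    return {form: word_index.get(form, []) for form in sorted(forms)}


# HYPOTHETICAL plain word index with placeholder references.
word_index = {"agnim": ["ref 1"], "agninā": ["ref 2"]}
print(search_paradigm("agni", word_index))
```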

6
On the basis of A. LUBOTSKY’s concordance (1997), a morphological inventory of the word forms of the RV Saṃhitā is at present being prepared in the course of the "AUREA" project ("Avesta und Rigveda: Elektronische Analyse"; cf. http://titus.uni-frankfurt.de/curric/aurea.htm).

A special task Vedic tradition brings about with respect to computational analysis is the differentiation of metrical and prose passages. If we consider that in a metrical environment, word forms may behave in a special way both phonetically and morpho-syntactically, a clear distinction of the context types is required right from the beginning. Beyond that, we can expect a thorough computational analysis to reveal important new information about the metrical structures of Vedic texts themselves. Within a special project ("AUREA", cf. n. 6), the preliminaries of a metrical "parsing" have recently been tested for the RV Saṃhitā on the basis of the digital text provided by B.A. VAN NOOTEN and G. HOLLAND (1994). For this purpose, the "metrically restored" text of this edition was electronically cross-checked with the Pada-Pāṭha-like text variant prepared by A. LUBOTSKY for his word concordance (1997), in order to detect inconsistencies and questionable presuppositions of the underlying text7. The first results of the investigations carried out on this basis, mostly concerning residues of laryngeals in the oldest strata of the RV Saṃhitā, have recently been published8; the basic considerations can be summarized as follows.
For a proper treatment of older Vedic metrics with respect to their underlying phonological structure, at least ten subtypes of syllables must be distinguished; cp. the following listing, which also contains some further essential elements.
-  long syllable (by nature: long vowel, diphthong)
=  long syllable (by position caused by muta cum liquida)
~  long syllable (by position caused by ch and ḷh)
_  long syllable (by position, all other cases)
V  uncertain syllable (short vowel before word-initial vowel, -á ā́ ← /-é ā/)
W  uncertain syllable (short vowel + consonant before word-initial vowel, -am a-)
Y  uncertain syllable (short vowel before aspirate, duhitā́)
^  uncertain syllable (first element of disyllabic vowel; Ü in Neue Wege)
‘  uncertain syllable (filling syllable in "incomplete" verses, mostly before caesura)
U  short syllable (all other cases)
X  undefined verse-final syllable (anceps)
/  caesura in trimeter verses
\  "secondary caesura" in trimeter verses (beginning of cadence)
#  word boundary within verse
&  string separator

On the basis of these syllable types, preindexing could be done in a twofold way: either "differentiated", with all types indexed separately, or "undifferentiated", with all "long" types taken together as "long", and all "uncertain" and "short" types taken together as "short". In this way, several preindexed files were produced which allow for exact and "wildcard" searches of certain metrical constellations. Thus we can now easily search for 11-syllable verses with "non-canonical" cadences of the type -UX or the subtype -YX, i.e. constellations where we might expect laryngeal residues to exist. A search undertaken on this basis immediately shows that there are 35 examples of -YX cadences, as illustrated in Fig. 13.

7
Cf. GIPPERT (1999), 99 n. 9 for an example. Cp. Fig. 12, which shows the different text variants arranged interlinearly: plain Saṃhitā-Pāṭha text, Saṃhitā-Pāṭha text divided into verse units, Pada-Pāṭha text (LUBOTSKY), and metrically restored text (VAN NOOTEN/HOLLAND). All four variants are provided with a separate index.
8
Cf. GIPPERT (1997b) and (1999).

Fig. 13: Search for triṣṭubh cadences of the type -YX

1,33,9a; 1,63,4a; 1,77,3b; 1,100,16c; 1,103,4d; 1,121,8c; 1,121,9d; 1,141,12b; 1,186,8c; 2,20 (211),1b;
2,30 (221),6a; 4,4 (300),10c; 4,16 (312),20b; 4,17 (313),18a; 5,31 (385),5c; 5,33 (387),5b; 5,41 (395),5b;
6,1 (442),12c; 6,40 (481),5a; 6,62 (503),9a; 6,65 (506),2b; 6,66 (507),7b; 7,34 (550),24b; 7,69 (585),7c;
7,93 (609),5c; 7,96 (612),2c; 9,94 (806),1a; 9,96 (808),2c; 10,1 (827),7a; 10,23 (849),4c; 10,39 (865),14b;
10,77 (903),2d; 10,77 (903),5a; 10,78 (904),4d; 10,99 (925),4c
Output: Reference list

A countercheck reveals that there are 278 examples in all of irregular cadences of the (undifferentiated) -UX type; cf. Fig. 14.
Fig. 14: Search for irregular triṣṭubh cadences (type -UX, undifferentiated)

1,33,9a; 1,36,12a; 1,59,4a; 1,60,4c; 1,61,1d; 1,61,11a; 1,62,3d; 1,62,5a; 1,63,4a; 1,77,3b; 1,89,4b; 1,89,6a;
1,89,10c; 1,91,21c; 1,100,6b; 1,100,8c; 1,100,16c; 1,103,4d; 1,104,3b; 1,117,22b; 1,121,1a; 1,121,8c; 1,121,9d;
1,121,15a; 1,122,10b; 1,122,10d; 1,122,11d; 1,126,1c; 1,140,13c; 1,141,12b; 1,149,1b; 1,158,5a; 1,162,22a;
1,165,15c; 1,166,15c; 1,167,2c; 1,167,5b; 1,167,11c; 1,168,10c; 1,169,5a; 1,173,8d; 1,173,11b; 1,173,12b;
1,174,9a; 1,181,1b; 1,186,2d; 1,186,8c; 1,186,9c; 2,4 (195),1b; 2,4 (195),3d; 2,13 (204),1a; 2,18 (209),2d; ...
Output: Reference list
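The two kinds of preindexing and the cadence search can be imitated on strings of the syllable symbols listed above. In the Python sketch below, a verse is a string such as "U-U-/U-UU\-YX"; the separators /, \, # and & are ignored for counting, the undifferentiated view maps all long types to - and all uncertain and short types to U, and a verse qualifies if it has 11 syllables and ends in -UX, optionally with a given symbol (e.g. Y) in the short slot. The verse strings are hypothetical, and whether the original searches restricted the first cadence position to one particular long type is not stated; this sketch accepts any long type there.

```python
from typing import Optional

LONG_TYPES = set("-=~_")
UNCERTAIN_TYPES = set("VWY^\u2018")  # \u2018 is the filling-syllable symbol
SEPARATORS = set("/\\#&")


def syllables(verse: str) -> list[str]:
    """Differentiated view: one symbol per syllable, separators removed."""
    return [s for s in verse if s not in SEPARATORS]


def undifferentiated(symbols: list[str]) -> str:
    """Collapse long types to '-', uncertain and short types to 'U'; keep X."""
    out = []
    for s in symbols:
        if s in LONG_TYPES:
            out.append("-")
        elif s in UNCERTAIN_TYPES or s == "U":
            out.append("U")
        else:
            out.append(s)
    return "".join(out)


def irregular_tristubh_cadence(verse: str, subtype: Optional[str] = None) -> bool:
    """11-syllable verse ending in -UX; optionally require `subtype` in the U slot."""
    symbols = syllables(verse)
    if len(symbols) != 11 or not undifferentiated(symbols).endswith("-UX"):
        return False
    return subtype is None or symbols[-2] == subtype


verses = {  # HYPOTHETICAL metrical strings, not taken from the RV
    "a": "U-U-/U-UU\\-YX",
    "b": "U-U-/UUU\\-U-X",
}
print([ref for ref, v in verses.items() if irregular_tristubh_cadence(v)])
print([ref for ref, v in verses.items() if irregular_tristubh_cadence(v, "Y")])
```

Keeping the differentiated symbols and collapsing them only at query time lets one file answer both the -UX and the -YX search.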

According to the principles outlined above, some special indexes could also be prepared, concerning, e.g., the distribution of word forms or syllables with respect to metrical positions; cp. Fig. 15 and Fig. 16, which illustrate the searching facilities thus obtainable.

Fig. 15: Search for word forms according to their metrical position

Fig. 16: Search for syllables according to their metrical position

A final result of these investigations will be a sandhi index of the RV Saṃhitā, which will give exhaustive information about the occurrence of certain sandhi types; cf. Fig. 17, showing a preliminary search for the constellation of (written and metrically real) abhinihita sandhi.

Fig. 17: Search for written and metrically real abhinihita sandhi

We hope to be able soon to present the results of these efforts to the public, both for online and offline retrieval and in printed form. It goes without saying that a lot of preparatory work has to be undertaken before this aim is reached. Everybody who is interested in extending our knowledge of the Vedic language by computational means is invited to participate in the project.

References:

BÖHTLINGK, O. (ed., 1889): Ḱhândogjopanishad. Leipzig.


GIPPERT, J. (1987): Mitteilung über einen geplanten Thesaurus altindogermanischer Texte auf
Datenträgern, in: Die Sprache 32/2, 1986 [1987], p. 429.
— (1995a): TITUS – das Projekt eines indogermanistischen Thesaurus, in: LDV-Forum 12/2, 1995, pp. 35-47 (also http://titus.uni-frankfurt.de/texte/titusldv.htm).
— (1995b): TITUS – Von der Keilschrifttafel zur Textdatenbank, in: Forschung Frankfurt 4/1995, pp. 46-56 (also http://titus.uni-frankfurt.de/texte/titusff.htm).
— (1997a): TITUS – Alte und neue Perspektiven eines indogermanistischen Thesaurus, in: Studia Iranica, Mesopotamica et Anatolica 2, 1996 [1997], pp. 49-76 (also http://titus.uni-frankfurt.de/personal/jg/pdf/jg1998b.pdf).
— (1997b): Laryngeals and Vedic metre, in: Sound Law and Analogy (Fs. Beekes), Amsterdam – Atlanta, pp. 63-79.
— (1999): Neue Wege zur sprachwissenschaftlichen Analyse der vedischen Metrik, in: Compositiones indogermanicae in memoriam Jochem Schindler, Praha, pp. 97-125 (manuscript finished Oct. 1996).
— (2000): "Indoiranistisches Text-Retrieval. Elektronische Bearbeitungen altiranischer und vedischer Texte", in: Indoarisch, Iranisch und die Indogermanistik. Arbeitstagung der Indogermanischen Gesellschaft vom 2. bis 5. Oktober 1997 in Erlangen, ed. by B. FORSSMAN and R. PLATH, Wiesbaden, pp. 133-145.
GLASENAPP, H.v. (1929): Die Literaturen Indiens. Wildpark-Potsdam.
KNAUER, F. (ed., 1884): Das Gobhilagṛhyasūtra. Dorpat / Leipzig.
LIMAYE, V.P. / VADEKAR, R.D. (ed., 1958): Eighteen Principal Upaniṣads. I. Poona.
LUBOTSKY, A. (1997): A Ṛgvedic Word Concordance. I-II. New Haven, Conn.
MACDONELL, A.A. (1910): Vedic Grammar. Strassburg.
MÜLLER, F.M. (ed., 1877/1965): The Hymns of the Rig-Veda. 2nd ed. Oxford 1877 / 3rd ed.
(repr.) Varanasi 1965.
ORLANDI, Ch. (ed., 1991): Gli inni dell’ Atharvaveda (Saunaka). Pisa.
ROTH, R. / WHITNEY, W.D. (ed., 1856): Atharva Veda Sanhita. I. Berlin.
SĀNTAVALEKAR, D. (ed., 1956): Sāmaveda-Saṃhitā. Pāraḍī.
SCHEFTELOWITZ, I. (ed., 1906/1966): Die Apokryphen des Ṛgveda. Breslau 1906 / Repr. Hildesheim 1966.
VAN NOOTEN, B. / HOLLAND, G. (1994): Rigveda: A Metrically Restored Text with an
Introduction and Notes. Boston, Mass.
WEBER, A. (ed., 1849/1964): Śatapathabrāhmanam. Berlin 1849 / Repr. Varanasi 1964.

Table 1: Vedic texts (to be) incorporated in TITUS (after GLASENAPP 1929, p. 45)

Saṃhitās
  Ṛgveda: Ṛgveda-Saṃhitā (+ Khilāni) (Śākala) (Vāṣkala)
  Sāmaveda: Sāmaveda-Saṃhitā (Rāṇāyanīya) (Kauthuma) (Jaiminīya)
  Yajurveda, "Black": Kapiṣṭhala-Kaṭha-Saṃhitā; Kāṭhaka-Saṃhitā (Caraka-Kaṭha); Maitrāyaṇī-Saṃhitā; Taittirīya-Saṃhitā
  Yajurveda, "White": Vājasaneyi-Saṃhitā (Mādhyaṃdina) (Kāṇva)
  Atharvaveda: Atharvaveda-Saṃhitā (Śaunaka, Paippalāda)

Brāhmaṇas
  Ṛgveda: Aitareya-Brāhmaṇa; Kauṣītaki-Brāhmaṇa
  Sāmaveda: Tāṇḍya-Mahā-(=Pañcaviṃśa-)Brāhmaṇa; Ṣaḍviṃśa-Brāhmaṇa (w. Adbhuta-Br.); Jaiminīya-Brāhmaṇa; Chāndogya-Brāhmaṇa (= Mantra-Br.); Jaiminīya-(=Talavakāra-)Upaniṣad-Brāhmaṇa; Ārṣeya-Brāhmaṇa
  Yajurveda, "Black": (Kaṭhaka-Brāhmaṇa); Taittirīya-Brāhmaṇa
  Yajurveda, "White": Śatapatha-Brāhmaṇa (Mādhyaṃdinīya) (Kāṇvīya)
  Atharvaveda: Gopatha-Brāhmaṇa

Āraṇyakas
  Ṛgveda: Aitareya-Āraṇyaka; Kauṣītaki-Āraṇyaka
  Yajurveda, "Black": Kaṭha-Āraṇyaka; Taittirīya-Āraṇyaka
  Yajurveda, "White": Bṛhad-Āraṇyaka (Mādhyaṃdinīya) (Kāṇvīya)

Upaniṣads
  Ṛgveda: Aitareya-Upaniṣad; Kauṣītaki-Upaniṣad
  Sāmaveda: Chāndogya-Upaniṣad; Kena-Upaniṣad
  Yajurveda, "Black": Kaṭha-Upaniṣad; Taittirīya-U.; Mahānārāyaṇa-U.; Maitrāyaṇa-Upaniṣad; Śvetāśvatara-U.
  Yajurveda, "White": Bṛhad-Āraṇyaka-Upaniṣad (Mādhyaṃdinīya) (Kāṇvīya); Īśa-Upaniṣad
  Atharvaveda: Muṇḍaka-Upaniṣad; Praśna-Upaniṣad; Māṇḍūkya-Upaniṣad

Śrauta-Sūtras
  Ṛgveda: Āśvalāyana-Śrautasūtra; Śāṅkhāyana-Śrautasūtra
  Sāmaveda: Ārṣeyakalpa; Drāhyāyaṇa-Śrautasūtra; Lāṭyāyana-Śrautasūtra
  Yajurveda, "Black": Baudhāyana-ŚS; Bhāradvāja-ŚS; Mānava-Śrautasūtra; Yajña-Śrautasūtra; Āpastamba-ŚS; Hiraṇyakeśi-ŚS; Vādhūla-ŚS; Vaikhānasa-ŚS
  Yajurveda, "White": Kātyāyana-Śrautasūtra
  Atharvaveda: Vaitāna-Śrautasūtra

Gṛhya-Sūtras
  Ṛgveda: Āśvalāyana-Gṛhyasūtra; Śāṅkhāyana-Gṛhyasūtra
  Sāmaveda: Gobhila-Gṛhyasūtra; Khādira-Gṛhyasūtra; Jaimini-Gṛhyasūtra
  Yajurveda, "Black": Mānava-Gṛhyasūtra; Laugākṣi-Gṛhyasūtra (= Kāṭhaka-Gṛhyasūtra); Vārāha-Gṛhyasūtra?; Baudhāyana-GS; Bhāradvāja-GS; Āpastamba-GS; Hiraṇyakeśi(=Satyāṣāḍha)-Gṛhyasūtra; Vādhūla-GS; Vaikhānasa-GS; Āgniveśya-GS
  Yajurveda, "White": Pāraskara-Gṛhyasūtra
  Atharvaveda: Kauśika-Gṛhyasūtra

Śulva-Sūtras
  Yajurveda, "Black": Baudhāyana-Śulvasūtra; Laugākṣi-Śulvasūtra; Mānava-Śulvasūtra; Āpastamba-Śulvasūtra
  Yajurveda, "White": Kātyāyana-Śulvasūtra

Dharma-Sūtras
  Ṛgveda: Vāsiṣṭha-Dharmasūtra
  Sāmaveda: (Gautamīya-Dharmasūtra ?)
  Yajurveda, "Black": Baudhāyana-DhS; Vaiṣṇava-Dharmasūtra; Hārīta-Dharmasūtra; Āpastamba-DhS; Hiraṇyakeśi(=Satyāṣāḍha); Vaikhānasa-DhS
  Yajurveda, "White": (Yājñavalkya-Smṛti); (Kātyāyana-Smṛti)

Other texts
  Ṛgveda: Nighaṇṭu; Nirukta
  Sāmaveda: Sāmavidhāna-Brāhmaṇa; Vaṃśa-Brāhmaṇa
  Yajurveda, "Black": Taittirīya-Prātiśākhya; Vaikhānasa-Mantrapraśna
