
INTRODUCTION

At the TEI and XML in Digital Libraries Workshop that was held at the Library of
Congress in July 1998, several working groups were formed to consider various
aspects of the Text Encoding Initiative. Group 1 was charged to recommend
some best practices for TEI header content and to review the relationship
between the Text Encoding Initiative header and MARC. To this end,
representatives of the University of Virginia Library and the University of
Michigan Library gathered in Ann Arbor in early October to develop a
recommended practice guide. Our work was assisted by similar efforts that had
taken place in the United Kingdom under the auspices of the Oxford Text Archive
the previous year. The following document represents a draft of those
recommended practices. It has been submitted to various constituencies for
comment.

Definition:

Text Encoding Initiative: defines a general-purpose scheme that makes it
possible to encode different views of a text. It “grew out of technology based textual
analysis applications employed by Humanities scholars”, e.g., tracing the use of
the word ‘love’ in poems of a given genre within a specific historical period. The focus has
been on text capture (putting into electronic form a text that already exists in another
medium) rather than text creation (where no other copy of the text exists). The scheme
assumes that texts, and works on texts, share a common core of textual features.

Encoding:

SGML (ISO 8879) and ISO 646 (7-bit character set standard). The scheme provides
encodings for different views of a text, alternative encodings for the same text features,
and mechanisms for user-defined extensions. The Guidelines make it
possible to encode many different views of the text, simultaneously if necessary.
The TEI Guidelines are extensible and not prescriptive: few features are mandatory, but the
Guidelines define a core set of tags.

The TEI Header is a set of descriptions prefixed to a TEI-encoded document. It
specifies four components:

• file description (a full bibliographic description),

• encoding description (the level of detail of the analysis; the aim or purpose for which
an electronic file was encoded; the editorial principles and practices used during the
encoding of the text),

• text profile (classificatory and contextual information such as the text’s subject
matter, the languages and sublanguages used, the situation in which it was
produced, the participants and their setting),

• revision history (history of changes during the electronic file’s development).

The header contains bibliographic information supporting resource discovery, and data
management portions supporting use of the resource.
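
As a rough illustration of the four components listed above, the following Python sketch parses a small, made-up header and reports which of the four parts are present. The element names (teiHeader, fileDesc, encodingDesc, profileDesc, revisionDesc) are the ones used in the TEI Guidelines. Everything else is an assumption made for illustration: the sample content is invented, the function name describe_header is hypothetical, and the sample has not been validated against any TEI schema and omits the namespace used by XML versions of the TEI.

```python
import xml.etree.ElementTree as ET

# Invented sample header (assumption): titles and dates are placeholders.
SAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt><title>Sample Poems: an electronic edition</title></titleStmt>
    <publicationStmt><p>Unpublished demonstration file.</p></publicationStmt>
    <sourceDesc><p>Transcribed from a printed copy.</p></sourceDesc>
  </fileDesc>
  <encodingDesc>
    <editorialDecl><p>Spelling has not been normalized.</p></editorialDecl>
  </encodingDesc>
  <profileDesc>
    <langUsage><language ident="en">English</language></langUsage>
  </profileDesc>
  <revisionDesc>
    <change when="2000-01-01">First draft of the header.</change>
  </revisionDesc>
</teiHeader>
"""

# TEI element name -> the component label used in the list above.
COMPONENTS = {
    "fileDesc": "file description",
    "encodingDesc": "encoding description",
    "profileDesc": "text profile",
    "revisionDesc": "revision history",
}


def describe_header(xml_text):
    """Return {component label: True/False} for one header (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return {
        label: root.find(tag) is not None
        for tag, label in COMPONENTS.items()
    }


if __name__ == "__main__":
    for label, present in describe_header(SAMPLE_HEADER).items():
        print(f"{label}: {'present' if present else 'missing'}")
```

A check of this kind only confirms that the four parts exist; it says nothing about whether their content follows any recommended practice.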

http://libraries.mit.edu/guides/subjects/metadata/standards/tei.html
HISTORY
The TEI was established in 1987 to develop, maintain, and promulgate
hardware- and software-independent methods for encoding humanities data in
electronic form. Over nearly three decades the TEI has been extraordinarily
successful at achieving its objective and it is now widely used by scholarly
projects and libraries around the world.

Although a comprehensive history of the TEI has not yet been written, all known
documentary resources about the TEI are stored in the Archive. If you (or others
you know) have electronic copies of any original TEI documents not available
here, please get in touch. The archive of the TEI-L discussion list is a rich
resource for historical information, as is the archive of the now defunct TEI-TECH
mailing list, which can be downloaded in its entirety.

Origins of the TEI

When the Text Encoding Initiative (TEI) was originally established, scholarly
projects and libraries attempting to take advantage of digital technology seemed
to be faced with an overwhelming obstacle to creating sustainable and shareable
archives and tools: the proliferating systems for representing textual material.
These systems seemed almost always to be incompatible, often poorly designed,
and multiplying at nearly the same rapid rate as the electronic text projects
themselves. This situation was inhibiting the development of the full potential of
computers to support humanistic inquiry by erecting barriers to access, creating
new problems for preservation, making the sharing of data (and theories) difficult,
and making the development of common tools impractical.

Part of the problem was simply a lack of opportunity for sustained communication
and coordination, but there were more systemic forces at work as well. Longevity
and re-usability were clearly not high on the priority lists of software vendors and
electronic publishers, and proprietary formats were often part of a business
strategy that might benefit a particular company, but at the expense of the
broader scholarly and cultural community. At the end of the eighties there was a
real concern that the entrepreneurial forces which (then as now) drive information
technology forward would impede such integration by the proliferation of mutually
incompatible technical standards.

In November 1987 a meeting at Vassar College was convened to address these
problems. Sponsored by the Association for Computers and the Humanities and
funded by the National Endowment for the Humanities, it brought together a
diverse group of scholars from many different disciplines, representing
leading professional societies, libraries, archives, and projects in a number of
countries in Europe, North America, and Asia. At this meeting the intellectual
foundation for the Text Encoding Initiative was articulated. The organization of the
actual work of developing the TEI Guidelines was then undertaken by the three
TEI sponsoring organizations: the Association for Computers and the Humanities,
the Association for Literary and Linguistic Computing, and the Association for
Computational Linguistics. A Steering Committee was organized from
representatives of the sponsoring organizations, and an Advisory Board of
delegates from various professional societies was formed. To lead the actual
work two editors were chosen and four working committees appointed. By the
end of 1989 well over 50 scholars were already directly involved and the size of
the effort was growing rapidly.

The initial phase resulted in the release of the first draft (known as "P1") of the
Guidelines in June 1990. A second phase, involving an additional 15 working
groups making revisions and extensions, immediately began and released its
results throughout 1990–1993. Then, after another round of revisions,
extensions, and supplements, the first official version of the Guidelines (‘P3’) was
released in May 1994. Early on in this process a number of leading humanities
textbase projects adopted the Guidelines — while they were still very much a
moving target of rapidly changing drafts — as their encoding scheme, identifying
problems and needs and contributing proposed solutions.
In addition, workshops and seminars were conducted to introduce the wider
community to the Guidelines and ensure a steady source of experience to
support continuing development. As more scholars became acquainted with the
Guidelines, comments, corrections, and requests for extensions arrived from
around the world. In the end there were nearly 200 scholars from many
disciplines, professions, and countries in the core group that was developing the
TEI Guidelines.

The TEI Consortium

In January of 1999, the University of Virginia and the University of Bergen
(Norway) presented a proposal to the TEI Executive Committee for the creation
of an international membership organization, to be known as the TEI Consortium,
which would maintain, continue developing, and promote the TEI. This proposal
was accepted by the TEI Executive Committee, and shortly thereafter, Virginia
and Bergen added two other host institutions with longstanding ties to the TEI:
Brown University and Oxford University.

This group then formulated an Agreement to Establish a Consortium for the
Maintenance of the Text Encoding Initiative, which was the basis on which a
transition group comprising representatives from the three original sponsoring
organizations of the TEI, as custodians of rights in the TEI, and from the
incoming Host Organizations set about the job of drafting and incorporating the
TEI Consortium during 2000. Incorporation was completed during December of
2000, and the first Board members took office during January of 2001.

The goal of establishing the TEI Consortium was to maintain a permanent home
for the TEI as a democratically constituted, academically and economically
independent, self-sustaining, non-profit organization. In addition, the TEI
Consortium was intended to foster a broad-based user community with sustained
involvement in the future development and widespread use of the TEI
Guidelines. In both of these goals the creation of the Consortium has proven a
positive step. Inasmuch as the original goal of the TEI was to promote
collaborative research on electronic texts, by making the encoding system no
longer an obstacle to such work, the Consortium's efforts are similarly directed
towards making the TEI encoding system as effective a tool for creating,
archiving, and sharing textual data as possible. For its members, the TEI
Consortium provides valuable services to assist them in the creation and use of
digital resources, and to help them stay abreast of rapidly changing technologies
and practices.

Following the establishment of the TEI Consortium, a critical priority was the
release of an XML version of the TEI Guidelines, updating P3 to enable users to
work with the emerging XML toolset. The P4 version of the Guidelines was
published in June 2002. It was essentially an XML version of P3, making no
substantive changes to the constraints expressed in the schemas apart from
those necessitated by the shift to XML, and changing only corrigible errors
identified in the prose of the P3 Guidelines. However, given that P3 had by this
time been in steady use since 1994, it was clear that a substantial revision of its
content was necessary, and work began immediately on the P5 version of the
Guidelines. This was planned as a thorough overhaul, involving a public call for
features and new development in a set of crucial areas including character
encoding, graphics, manuscript description, standoff markup, and the language
in which the TEI Guidelines themselves are written. The P5 version of the
Guidelines was released in November 2007.
OBJECTIVE
1) Review notes and documents prepared by Manuscript Description work
group concerning collation.

2) Review the needs and practices of those parts of the TEI community (and
relevant parts of the potential TEI community: i.e. those who would use
the TEI if it included provision for this kind of encoding) likely to use
facilities for encoding collation and physical document structure.

3) Propose a detailed work plan to improve and extend upon the
recommendations currently provided by TEI P4 in these areas. The work
plan will be determined by agreement of the working group but is expected
to address at least the following:

• provision for encoding basic structural information about each page in the
document (i.e. its identification with respect to the collation of the entire
document), this information being associated directly with the individual
page;

• provision for encoding a summary of structural information about the
document as a whole (i.e. an equivalent of a collational formula, encoded
in the TEI header);

• provision for several types of commentary on the physical document
structure (e.g. information, both structured and unstructured, such as
measurements, identification, and description of features of paper or
typography; summaries of printing history; identification of cancels, etc.);

• provision for several types of derived analytical perspectives on the
physical document structure (e.g. reconstructions of individual formes,
bifolia, other higher-order structures) using stand-off markup (e.g. <join>),
and provision for where this information should be located within the
encoded document;

• in concert with the Manuscript Description workgroup, harmonization of
treatment of collation and physical document structure for printed books
and manuscripts, at least to ensure that no redundant or incompatible
recommendations are made in either section of the Guidelines.

4) Respond to comments on relevant other work that may be routed to this
work group by the editors.
FUNCTION
1) A TEI Header can serve many publics. Headers can be created in a text
center and reflect the center's standards, or they can serve as the basis
for other types of metadata system records produced by other agencies.
Headers can function in detached form as records in a catalog, as a title
page inherent to the document, or as a source for index displays.
2) In addition, a header may describe a collection of documents, a single
item, or a portion of an item. Variances in TEI Header content can result
from making different choices of what is being described.
3) A TEI Header may not have a one-to-one correspondence with a MARC
record. One TEI Header may have multiple MARC analytic records, or one
MARC record may be used to describe a collection of TEI documents with
individual headers.
4) A TEI Header serves several purposes. It may contain an historical
background on how the file has been treated. It can extend the information
of a classic catalog record. The Text Center and/or cataloging agency can
act as the gatekeeper for creators by providing standards for content.
5) Does the TEI Header act as the electronic title page or as a catalog
record? Is it integral to the document it describes or independent?
Depending on the community being served, the TEI elements will reflect
the interest of that community. Nonetheless, it is possible to describe a set
of "best practices" that will produce compatible content while
accommodating this variety of purposes. Compatibility of content
encourages a more understandable set of results when information about
assorted items is displayed as a set of search results, a contents list, or an
index, and it allows for more reasonable conversion of content information
from TEI tags to elements of other metadata sets when this action seems
advisable.
6) It is a traditional practice of librarianship to agree upon where in a
document and in what order of preference one should look to identify the
title, author, etc., of that document. This permits a certain consistency in
terminology and allows for a certain amount of authentication of content.
We recommend the following preferences to those who create headers
and to those who attempt to use headers to create traditional catalog
records that are compliant with AACR2 and ISBD(ER) rules.
7) As a member of the academic community, the header creator/editor has a
responsibility to verify, whenever humanly possible, the intellectual source
for an electronic document that presents itself without any information
regarding its source or authorship.
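
To make the idea of converting header content into the elements of another metadata set more concrete, here is a minimal Python sketch that copies a few header fields into a flat record. The header paths (fileDesc/titleStmt/title and so on) follow TEI element names, and the target field names (title, creator, publisher, date) are borrowed from the Dublin Core element set. The crosswalk itself, the function name header_to_record, and the sample data are all assumptions made for illustration; this is not an official TEI-to-MARC or TEI-to-Dublin-Core mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk (assumption): header path -> target field name.
CROSSWALK = {
    "fileDesc/titleStmt/title": "title",
    "fileDesc/titleStmt/author": "creator",
    "fileDesc/publicationStmt/publisher": "publisher",
    "fileDesc/publicationStmt/date": "date",
}

# Invented example header (assumption): names and dates are placeholders.
EXAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Sample Poems: an electronic edition</title>
      <author>Doe, Jane</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Text Center</publisher>
      <date>1998</date>
    </publicationStmt>
    <sourceDesc><p>Transcribed from a printed copy.</p></sourceDesc>
  </fileDesc>
</teiHeader>
"""


def header_to_record(xml_text):
    """Collect the mapped header fields into a dict of lists (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    record = {}
    for path, field in CROSSWALK.items():
        values = []
        for element in root.findall(path):
            # Join nested text and collapse runs of whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                values.append(text)
        if values:
            record[field] = values
    return record


if __name__ == "__main__":
    for field, values in header_to_record(EXAMPLE_HEADER).items():
        print(f"{field}: {'; '.join(values)}")
```

Each field holds a list because a header may carry several titles or authors. A mapping like this is only predictable when headers are filled in consistently, which is the point of agreeing on best practices for header content.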

http://www-personal.umich.edu/~jaheim/teiguide.html
BENEFITS
There are several tangible benefits of membership in the TEI Consortium, and
the TEI is in the process of developing additional benefits as well. One of the
most important benefits, which is difficult to quantify, is the fact that support for
the TEI helps ensure that this important community standard will continue to be
available and supported for the future, and that its development keeps pace with
the needs of the text encoding community. Other, more specific benefits include
the following:

1) TEI annual meeting and conference
The TEI annual meeting and conference is a central event in the TEI
community and an excellent opportunity to meet with other TEI projects
and users and learn more about new developments in the TEI world.
Registration is free to current members and subscribers.

2) Voting in TEI elections
All TEI member institutions have a vote in TEI elections, which is cast by
their designated elector at the TEI annual meeting.

3) Discounts on software
The TEI works to negotiate discounts with vendors of software. Currently
TEI members and subscribers are entitled to a 20% discount on the
popular <oXygen/> XML editor, which comes bundled with TEI schemas
and stylesheets. Members and subscribers may obtain a discount code by
contacting the TEI at membership@tei-c.org.

4) Discounts on training and consultation
TEI members and subscribers are entitled to receive discounts from
participating institutions on TEI training workshops and consultation.

5) Free printed copy of the TEI Guidelines
All TEI members receive a free copy of each new printed release of the
TEI Guidelines.

The TEI continues to explore additional opportunities for membership benefits,
such as discounts on vendor rates for digitization services. Any new benefits will
be announced on TEI-L and on the TEI website.

http://www.tei-c.org/Membership/benefits.xml?style=printable
CONCLUSION
The above overview hopefully demonstrates the comprehensive nature of the
TEI Header as a mechanism for documenting electronic texts. The emergence of
the electronic text over the past decade has presented librarians and cataloguers
with many new challenges. Existing library cataloguing procedures, while
inadequate to document all the features of electronic texts properly, were used as
a secure foundation onto which additional features directly relevant to the
electronic text could be grafted. Chapter Nine of AACR2 (Anglo-American
Cataloguing Rules) requires substantial updating and revision, as it assumes that
all electronic texts are published through a publishing company and cannot
adequately catalogue texts which are only published on the Internet. The TEI
Header has proved to be an invaluable tool for those concerned with
documenting electronic resources; its supremacy in this field can be measured
by the increasing number of electronic text centres, libraries, and archives which
have adopted its framework. The Oxford Text Archive has found it indispensable
as a means of managing its large collection of disparate electronic texts, not only
as a mechanism for creating its searchable catalogue, but as a means of creating
other forms of metadata which can communicate with other information systems.

Ironically, it is the same generality and flexibility offered by the TEI Guidelines
(P3) on creating a header which have hindered the progress of one of the main
goals of the TEI and the hopes of the electronic text community as a whole,
namely the interoperability and interchangeability of metadata. Unlike the Dublin
Core element set, which has a defined set of rules governing its content, the TEI
Header has a set of guidelines, which allow for widely divergent approaches to
header creation. While this is not a major problem for individual texts, or texts
within a single collection, the varying ways in which the guidelines are interpreted
and put into practice make interoperability with other systems using TEI
Headers more difficult than first imagined. As with the Dublin Core element set,
what is required is the wholesale adoption of a mutually acceptable code of
practice which header creators could implement.

One final aspect of the TEI Header is a cause of irritation to those creating and
managing TEI Headers and texts: the apparent dearth of affordable and user-friendly
software aimed specifically at header production. While this has long been a general
criticism of SGML applications as a whole, the TEI can in no way be held to
blame for this absence, as it was not part of the TEI remit to create software.
However, it has contributed to the relatively slow uptake and implementation of
the TEI Header as the predominant method of providing well structured metadata
to the electronic text community as a whole. Until this situation is adequately
resolved, the tools on offer tend to be freeware products designed by people
within the SGML community itself, or large and very expensive purpose-built
SGML-aware products aimed at the commercial market.
http://www.slais.ubc.ca/COURSES/libr500/2000-2001-wt1/www/L_Little-Wolfe/tei.htm

1. To specify a common interchange format for machine readable texts.

2. To provide a set of recommendations for encoding new textual
materials. The recommendations would specify both what features are
to be encoded and how those features are to be represented.

3. To document the major existing encoding schemes, and develop a
metalanguage in which to describe them.

(From The ACH/ACL/ALLC Text Encoding Initiative: An Overview by Susan Hockey.)
