
The Adelaide Text Analysis Tool explained

Concordancing software developed at The University of Adelaide, Australia.

AdTAT is an easy-to-use, cross-platform tool capable of working with
collections of written text (corpora) in the following ways:

• It can conduct basic word and phrase searches.

• It can conduct associated word and phrase searches.

• It can provide frequency lists of words appearing to the left and right of search terms.

• It can print and save results.

• It can assist you in constructing corpora.

• Corpora built with AdTAT can be saved to disk for later use.

The following instructions make the software very simple to use, and
feature many screen-shots to illustrate a step-by-step introduction to
both the software and the concordancing process.

Minimum requirements: AdTAT runs on Windows 2000 or later and Macintosh OS 10.4 or later,
and requires Java version 1.5 to be installed. The minimum RAM required is 512 MB,
but 1 GB is recommended.

Acknowledgements: The development of this software was funded by a University of
Adelaide Teaching Development Grant awarded to the Faculty of Sciences in 2007.

The development team consisted of Dr Jennifer Watling, Margaret Cargill, Dr Ian Green,
Ray Adams and Andrew Hall.

Table of Contents
Opening a file or corpus
Searching for a word
  Search Results
Refining your search results
  Controlling the search term
  Controlling the output
Associated Word Search
Working with results
Making a corpus
  Preparing text for concordancing
  PDF documents
  Editing text from PDF documents
  Not all PDFs can be easily copied

How to use AdTAT

Opening a file or corpus

To use this software you will need a corpus, or collection of text. A corpus can be a
single text file or a group of text files (.txt format). When you have some text for your
corpus, open the AdTAT package to begin. You should see a screen like this:

Click the Load File(s) button. This will take you to an Open File window. You can
navigate to the place on your computer’s drive where you have saved a corpus of text
ready to explore with the software.

Double-click on a folder to open it.

Decide whether you want to search a single text file, several files, or a folder
containing a number of files.

You can choose a single file by clicking on the filename.

You can select a number of files within a folder by holding down the Control key
(Command key on a Macintosh) as you select files.

You can also choose a single folder which contains a number of text files.

Make your selection and click the Load File(s) button at the bottom of the Open File window.

Corpus details will then appear under Corpus description.

You can enter a name or description here if you want to save the corpus (using the
File menu) for later use. This is optional.

Searching for a word

One of the simplest functions of AdTAT is a basic word search. The software searches
a loaded corpus, finding every occurrence of the search term to show you the other words
that appear around it: the collocates used by the writers of the texts in your corpus.
Choose the Basic Search tab, and you will see a window like this:

Enter the word you wish to search for (called a search term or keyword) and click the
Basic Search button.

Search Results
If your keyword is found, the program will generate a concordance list. This helps you
to see which words are commonly grouped together (collocates) in the text(s) being searched.

When you select a concordance line, a panel shows where the word appears in the source
text, giving the original context.


Refining your search results

Controlling the search term

AdTAT can search for phrases as well as words.
You can also specify parts of words you want to find:
a string that starts with, ends with or contains the
characters that you enter.

This example shows the results of a search for all words that contain “differ”.
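The four ways of matching a search term can be pictured as regular-expression patterns. The following is a minimal Python sketch, not AdTAT’s own code (AdTAT is a Java application). The function names and the mode labels `word`, `starts`, `ends` and `contains` are hypothetical, and case-insensitive matching is an assumption.

```python
import re


def make_pattern(term: str, mode: str = "word") -> "re.Pattern[str]":
    """Build a case-insensitive regex for a hypothetical match mode."""
    t = re.escape(term)  # treat the term literally; escaped spaces allow phrases
    patterns = {
        "word": rf"\b{t}\b",        # the whole word or phrase only
        "starts": rf"\b{t}\w*",     # words that start with the term
        "ends": rf"\w*{t}\b",       # words that end with the term
        "contains": rf"\w*{t}\w*",  # words that contain the term anywhere
    }
    if mode not in patterns:
        raise ValueError(f"unknown mode: {mode!r}")
    return re.compile(patterns[mode], re.IGNORECASE)


def find_matches(text: str, term: str, mode: str = "word") -> list[str]:
    """Return every matching string in the order it occurs in the text."""
    return [m.group(0) for m in make_pattern(term, mode).finditer(text)]


sample = "These results differ. The difference was indifferent to others."
print(find_matches(sample, "differ", "contains"))
# ['differ', 'difference', 'indifferent']
```

With `starts`, the same search finds only “differ” and “difference”, because the match must begin at the start of a word.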

Controlling the output

In the Basic Search window, you can also change the Line width, allowing you to see
more of each line of text in the concordances. The default width is 60 characters, but
sometimes it is useful to see more of the text surrounding your keyword.

Searches for common words can find too many results, causing confusion and making it
hard to see useful examples. If a search produces too many concordance lines for the
keyword, try limiting them to a smaller number by changing the Maximum concordances setting.
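Concordance lines of this kind are usually called keyword in context (KWIC). The sketch below is a minimal Python illustration, not AdTAT’s implementation. The function name `kwic` and its parameters are hypothetical: `width` defaults to the 60 characters mentioned above, and `max_lines` stands in for the Maximum concordances setting.

```python
import re
from typing import Optional


def kwic(text: str, keyword: str, width: int = 60,
         max_lines: Optional[int] = None) -> list[str]:
    """Return keyword-in-context lines with the keyword roughly centred.

    Each line is `width` characters long (one less if `width` minus the
    keyword length is odd). `max_lines` caps the number of lines returned.
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    side = max((width - len(keyword)) // 2, 0)  # characters kept on each side
    lines = []
    for m in pattern.finditer(text):
        left = text[max(m.start() - side, 0):m.start()].rjust(side)
        right = text[m.end():m.end() + side].ljust(side)
        lines.append(left + m.group(0) + right)
        if max_lines is not None and len(lines) >= max_lines:
            break
    return lines


corpus = "A small corpus is easy to search. A large corpus is not."
for line in kwic(corpus, "corpus", width=20):
    print(f"[{line}]")
# [ small corpus is eas]
# [ large corpus is not]
```

Padding both sides to the same length keeps the keyword in the same column on every line, so the collocates on either side are easy to scan.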

You can change the way concordances are sorted, depending on whether you want to see
collocates which appear to the left or right of the keyword. Pull down the Sort type
menu to change this setting.
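Sorting can be sketched by storing each concordance as three parts: the words before the keyword, the keyword itself, and the words after it. This minimal Python sketch rests on an assumption about the behaviour, not on AdTAT’s code. A right sort orders lines by the first word after the keyword, then the second, and so on. A left sort orders them by the word immediately before the keyword. The names `concordances` and `sort_concordances` are hypothetical.

```python
import re

# A concordance: (words before the keyword, the keyword, words after it)
Concordance = tuple[list[str], str, list[str]]


def concordances(text: str, keyword: str, span: int = 5) -> list[Concordance]:
    """Collect up to `span` words of context on each side of every occurrence."""
    words = re.findall(r"[\w'-]+", text)
    return [
        (words[max(i - span, 0):i], w, words[i + 1:i + 1 + span])
        for i, w in enumerate(words)
        if w.lower() == keyword.lower()
    ]


def sort_concordances(rows: list[Concordance],
                      sort_type: str = "right") -> list[Concordance]:
    """Order lines alphabetically by their right or left context."""
    if sort_type == "right":
        # first word after the keyword, then the second, and so on
        return sorted(rows, key=lambda r: [w.lower() for w in r[2]])
    if sort_type == "left":
        # word immediately before the keyword, then the one before that
        return sorted(rows, key=lambda r: [w.lower() for w in reversed(r[0])])
    raise ValueError(f"unknown sort type: {sort_type!r}")


rows = concordances("the human mind. a human body. every human cell.", "human")
print([r[2][0] for r in sort_concordances(rows, "right")])  # ['body', 'cell', 'mind']
print([r[0][-1] for r in sort_concordances(rows, "left")])  # ['a', 'every', 'the']
```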

By default, the lines of output are numbered, as in some of the examples in these
instructions, making it easier to refer to specific concordances in notes and discussions.
If you do not want the lines numbered, click to remove the tick from the box marked Numbering.

By default, the software displays collocates which appear immediately next to the keyword.
If you want to examine words which are 2, 3 or 4 words away from the keyword, use the
Collocate distance from keyword setting.
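The Collocate distance from keyword setting can be pictured as an offset into the list of words in the text. This minimal Python sketch returns the word found a given number of positions to the left or right of each occurrence of the keyword. The function name `collocates_at` is hypothetical, and lower-casing and the simple word pattern are assumptions.

```python
import re


def collocates_at(text: str, keyword: str, distance: int = 1,
                  side: str = "right") -> list[str]:
    """Return the word `distance` positions left or right of each keyword."""
    if side not in ("left", "right"):
        raise ValueError(f"unknown side: {side!r}")
    words = re.findall(r"[\w'-]+", text.lower())
    offset = distance if side == "right" else -distance
    return [
        words[i + offset]
        for i, w in enumerate(words)
        if w == keyword.lower() and 0 <= i + offset < len(words)
    ]


text = "The human resources department and the human rights office"
print(collocates_at(text, "human", 1))               # ['resources', 'rights']
print(collocates_at(text, "human", 2))               # ['department', 'office']
print(collocates_at(text, "human", 1, side="left"))  # ['the', 'the']
```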

Associated Word Search

An Associated Word Search finds occurrences of two words or phrases which are close
together but not necessarily next to each other.

This function is available by selecting the Associated word search tab and is useful for
finding or demonstrating words that may be grammatically linked but do not necessarily
appear side by side in a sentence. Examples of such searches might be those for terms like
“discuss” [something] … “with”, or “not” [something], “but” [something else].

Enter both search terms and click the Associated Word Search button.

This example shows results of a search for “not” with the Associated term, “but”:
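An associated word search can be sketched as looking forward from each occurrence of the primary term for the associated term within a limited number of words. The Python below is only a minimal illustration. The manual does not say how far apart the two terms may be, so the `window` of 5 words is a hypothetical value. The search also looks only forward from the primary term and handles only single-word terms.

```python
import re


def associated_search(text: str, primary: str, associated: str,
                      window: int = 5) -> list[str]:
    """Return the stretch of words from each `primary` to the nearest
    following `associated` term, if it occurs within `window` words."""
    words = re.findall(r"[\w'-]+", text.lower())
    primary, associated = primary.lower(), associated.lower()
    hits = []
    for i, w in enumerate(words):
        if w != primary:
            continue
        for j in range(i + 1, min(i + 1 + window, len(words))):
            if words[j] == associated:
                hits.append(" ".join(words[i:j + 1]))
                break  # keep only the nearest associated term
    return hits


text = "This is not a cause but a symptom. It is not clear. Not only this but also that."
print(associated_search(text, "not", "but", window=3))
# ['not a cause but', 'not only this but']
```

A plain word window ignores sentence boundaries. With `window=5`, the second “not” would also be paired with the “but” in the following sentence, so the window size affects how many false matches appear.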


Working with results

You can choose to remove a concordance from your search results by selecting the line,
then clicking the Delete Concordance button.

The frequency of collocates can be found by clicking the Display button in the Collocates
frequency box below your search results. A search can be made for left or right collocates
by changing the Sort type selection. After an Associated Word Search, collocates of the
primary search term are displayed, not collocates of the associated term.

The example here shows the right collocates for the word “human” found in a sample corpus
of business writing, after a Basic Word Search. By default, the list is sorted by frequency,
but you can click on the column heading, Word, to re-sort the list alphabetically.
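A collocate frequency list is essentially a count of the words found next to the keyword. This minimal Python sketch uses `collections.Counter`. The function name is hypothetical, and breaking ties alphabetically when sorting by frequency is an assumption rather than documented AdTAT behaviour.

```python
import re
from collections import Counter


def right_collocate_frequencies(text: str, keyword: str,
                                by: str = "frequency") -> list[tuple[str, int]]:
    """Count the words immediately to the right of `keyword`."""
    words = re.findall(r"[\w'-]+", text.lower())
    counts = Counter(
        words[i + 1] for i, w in enumerate(words[:-1]) if w == keyword.lower()
    )
    if by == "frequency":
        # most frequent first; ties broken alphabetically
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if by == "word":
        return sorted(counts.items())  # alphabetical by word
    raise ValueError(f"unknown ordering: {by!r}")


text = "human resources, human rights, human resources and human error"
print(right_collocate_frequencies(text, "human"))
# [('resources', 2), ('error', 1), ('rights', 1)]
print(right_collocate_frequencies(text, "human", by="word"))
# [('error', 1), ('resources', 2), ('rights', 1)]
```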

Note: A word frequency list for the entire corpus can be found by pulling down the Corpus
menu and selecting Word Frequency. This allows you to see the kind of vocabulary most
commonly used in your corpus.
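A whole-corpus word frequency list can be sketched in the same way, counting every word rather than only collocates. In this minimal Python sketch, the function name is hypothetical. Folding everything to lower case, and treating apostrophes and hyphens inside words as part of the word, are assumptions.

```python
import re
from collections import Counter
from typing import Optional


def word_frequency(text: str, top: Optional[int] = None) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return Counter(words).most_common(top)  # top=None returns every word


print(word_frequency("The cat and the dog and the bird.", top=2))
# [('the', 3), ('and', 2)]
```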

Clicking the Save Results button below the concordance pane will allow you to save the
current search result to disk for later reference.

Clicking the Print Results button will output the current search result to your
computer’s default printer.

Making a corpus

The best way to use the concordancing process is first to gather a collection of
articles which are relevant to the kind of writing you want to investigate. If you want to
examine research articles in a particular discipline, for example, a useful corpus would
consist of published articles from that discipline. This would allow you to search for
language features that are commonly used in this kind of writing.

The size of a corpus depends on the searches you intend to do. There are drawbacks
to having too little text in your corpus, as you may not find enough examples of
little-used terms and expressions. Similarly, a corpus which is too large can produce
too many examples, especially of common words, to allow easy evaluation of
language features.

In trials of concordancing software during the development of this package, it was
found that about 20 published journal articles, totalling around 100,000 words, made a
suitable corpus for examining the terms and language features used in writing in
particular disciplines of science.

The following steps will help you to construct your own corpus quickly and easily:

1. Make sure the documents you want to use are written in current English, with
standard usage of prepositions, articles, verb tenses and other grammatical
features. This requirement can be met by selecting articles for which at
least some of the authors are likely to be “native speakers” of English, and by
ensuring that the articles are from a reputable source – check author and
publisher information, as well as the text itself, for guidance on this.
2. Obtain electronic copies of the articles, keep only the text (sentences and
paragraphs; no page numbers, headers or footers, tables or figures), and save
each article as a text file (.txt format). Your sources may be web pages, PDF
documents or word processor files. See the following section, Preparing text
for concordancing, for more details on converting text from these sources.
3. Save all the .txt files in a single folder on your computer.
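Once the .txt files are in one folder, a short script can check how much text the corpus holds, for comparison with the figure of around 100,000 words mentioned above. This is a minimal Python sketch with hypothetical function names. Reading the files as UTF-8, and counting words with a simple regular expression, are assumptions, so the total may differ slightly from counts reported by other tools.

```python
import re
from pathlib import Path


def load_corpus(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` (not subfolders) into {filename: text}."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(folder).glob("*.txt"))
    }


def corpus_word_count(texts: dict[str, str]) -> int:
    """Count word-like tokens across all texts."""
    return sum(len(re.findall(r"[\w'-]+", t)) for t in texts.values())
```

For example, `corpus_word_count(load_corpus("my_corpus"))` would give the total for a folder named `my_corpus` (a hypothetical name). Files saved in another encoding are still read, but any undecodable characters are replaced.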

Preparing text for concordancing

If you receive text in the form of a Microsoft® Word document, simply open the file and
save it as a ‘text only’ file. This will give it a ‘.txt’ extension. It should be saved in a
folder that you intend to use as your corpus.

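If you are copying text from web pages, select all the text, then copy and paste it
immediately into a word processor document and save it as a ‘text only’ file. If you are
using a hard copy, you need to scan the document, convert it to text using Optical
Character Recognition (OCR) software, and save it as a text only document.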


PDF documents
When using PDF-based texts, software such as Omnipage or PDF2Text can be used
for conversion. If you use this method, follow the instructions provided with the software.
A manual, copy-and-paste procedure can be used to convert PDF documents to the
‘text only’ format. This may require some trial and error at first, but involves a few
relatively simple steps:

1. Download the file (if it is online).
2. Open the file in Adobe® Reader, or similar PDF-reader software.
3. Open your word processor (such as MS Word®) and start a new document.
4. Select the PDF text to be copied, one page or column at a time. Care should
be taken not to copy the headers, footers and page numbers. Also, do not
copy authors' names, tables, figures, reference lists or acknowledgements, as
these will all have features which are not usually found in prose – the
sentences and phrases that you may require for models of written language.
5. Copy text (Control-C in Windows, Command-C on a Macintosh)
from the PDF document.
6. Paste the text into the new word processor document (Control-V in Windows,
Command-V on a Macintosh). Do this one page or column at a time, repeating
steps 4–6 for each part until the whole document has been copied across to
the new word processing document.
7. Select the ‘Save as…’ option.
8. Choose ‘Text only’ as the file format and name the file before saving.
9. Edit the text file as necessary (see the following section: Editing text
from PDF documents).

Take care not to copy headers and footers when copying text from PDF documents.
Select only the text you want to copy into your new text file.

Editing text from PDF documents

As mentioned in step 4 above, care should be taken when copying from PDF
documents not to copy unwanted text into the new text file.

The easiest method in the long term is to copy text from one page or column at a time
and paste each ‘capture’ into the new word processor file. Repeat this process, then
later ‘repair’ the text so that it is restored to its original continuous flow by ensuring
that there are appropriate breaks between words, sentences and paragraphs. This
avoids copying the unwanted parts from the beginning of the process and also
provides a text document which will display correctly in AdTAT.
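Some of this repair can be automated. The following minimal Python sketch (the function name is hypothetical) does three things: it keeps blank lines as paragraph breaks, joins wrapped lines within a paragraph, and rejoins words hyphenated across a line break. It assumes that paragraphs in the pasted text are separated by blank lines. It cannot tell a line-break hyphen from a real one, so a compound such as “little-used” split at its hyphen would become “littleused”. Proofread the result.

```python
import re


def repair_pdf_text(raw: str) -> str:
    """Restore continuous paragraphs in text pasted from a PDF."""
    raw = raw.replace("\r\n", "\n").strip()          # normalise Windows line endings
    paragraphs = re.split(r"\n\s*\n", raw)           # blank lines separate paragraphs
    repaired = []
    for para in paragraphs:
        # rejoin a word hyphenated across a line break: "concor-\ndancing"
        para = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", para)
        para = re.sub(r"\s*\n\s*", " ", para)        # other line breaks become spaces
        para = re.sub(r"[ \t]+", " ", para)          # collapse repeated spaces
        repaired.append(para.strip())
    return "\n\n".join(repaired)


raw = "The concor-\ndancing process\nis simple.\n\nSecond para-\ngraph here."
print(repair_pdf_text(raw))
```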

The process of text conversion and editing seems complex at first, but becomes an
almost mechanical routine with practice.

Not all PDFs can be easily copied

Some PDF files have security measures embedded to prevent copying, primarily for
reasons of copyright protection. If the copy-and-paste process in steps 4–6 above does
not work, this is usually the cause. Other PDFs are scanned image files and require
Optical Character Recognition (OCR) software to convert them to text.

Windows, MS Word and Microsoft Word are registered trade marks of Microsoft Corporation.
Macintosh is a registered trade mark of Apple Inc.
Omnipage is a registered trade mark of Nuance Communications, Inc.
PDF2Text is a registered trade mark of Retsina Software Solutions.
Adobe Reader is a registered trade mark of Adobe Systems Incorporated.
The above products are referred to in this documentation for information purposes only.
AdTAT is not affiliated with, nor has it been authorized, sponsored, or otherwise approved by
the organisations responsible for the products mentioned. The developers of AdTAT likewise
do not intend such references to be taken as endorsement of these products.
