
TASK 1

Interpreting collocation graphs


a. Search for the word interest using the following settings: Span L5-R5, Statistics MI2, Statistic
cut-off value 8.0, Collocation frequency 5.0, Lemma. Look at the collocation graph. What are the
three strongest collocates of the word interest? What are the six most frequent collocates? (Use
the mouse wheel to zoom and Ctrl + / Ctrl – to change the font size.)
b. Click the view option in the top right corner and select ‘Positional’. What are the strongest
collocates that appear to the left of the node?
c. Now change the view option to ‘Word Class’.
What prepositions collocate with interest? What adjectives collocate with interest?

Higher – 6 times

In – 93 times
Interest in – 67

Of – 58
Interest of – 11

Of interest – 24

Public – 5 times
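The Span (L5-R5) and ‘Positional’ settings used in Task 1 can be illustrated with a minimal Python sketch. It counts the words occurring up to five tokens to the left and up to five tokens to the right of each occurrence of the node, keeping the two sides apart as the Positional view does. It assumes a lower-cased, whitespace-tokenised text; the function name positional_collocates and the sample sentence are hypothetical, and this is not how #LancsBox itself is implemented.

```python
from collections import Counter


def positional_collocates(tokens, node, left=5, right=5):
    """Count words near `node`, keeping the left and right sides apart.

    `left` and `right` play the role of a span such as L5-R5.
    """
    left_counts, right_counts = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Up to `left` tokens before this occurrence (clipped at the start).
        left_counts.update(tokens[max(0, i - left):i])
        # Up to `right` tokens after this occurrence (clipped at the end).
        right_counts.update(tokens[i + 1:i + 1 + right])
    return left_counts, right_counts


# Invented sample text, lower-cased and split on whitespace.
tokens = "the public interest in the matter grew and their interest in art is of interest".split()
left, right = positional_collocates(tokens, "interest")
print("left: ", left.most_common(3))
print("right:", right.most_common(3))
```

In this toy text, public appears only on the left of the node and in appears twice on the right, which is the kind of contrast the Positional view makes visible.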

TASK 2
Comparing association measures
a. Search for the word interest using different association measures with the default collocation
settings. Complete the table below with the top ten collocates of interest for each association
measure.

FREQUENCY | MI | MI2 | LOG-LIKELIHOOD | LOG DICE

b. How do the different sets of collocates compare with each other? What type of words do they
include (grammatical or lexical words, frequent or infrequent)? Is there a preferable association
measure?
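For Task 2, it may help to see how each measure is computed from the same counts. The sketch below uses the formulas commonly given for these measures: expected frequency E11 = R1 * C1 / N, MI = log2(O11 / E11), MI2 = log2(O11^2 / E11), log-likelihood summed over the 2x2 contingency table, and log Dice = 14 + log2(2 * O11 / (R1 + C1)). It assumes simple corpus-wide counts; #LancsBox may adjust these for the window size, so its scores will not match this output exactly. The function names and the counts in the example are invented.

```python
import math


def contingency(o11, r1, c1, n):
    """Observed and expected 2x2 tables for a node/collocate pair.

    o11: co-occurrences of node and collocate; r1: node frequency;
    c1: collocate frequency; n: corpus size in tokens.
    """
    observed = [[o11, r1 - o11],
                [c1 - o11, n - r1 - c1 + o11]]
    row_totals = [r1, n - r1]
    col_totals = [c1, n - c1]
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
                for i in range(2)]
    return observed, expected


def mi(o11, r1, c1, n):
    """Mutual information: log2(O11 / E11)."""
    return math.log2(o11 / (r1 * c1 / n))


def mi2(o11, r1, c1, n):
    """MI2: squares O11, which boosts frequent pairs relative to MI."""
    return math.log2(o11 ** 2 / (r1 * c1 / n))


def log_likelihood(o11, r1, c1, n):
    """2 * sum(O * ln(O / E)) over the four cells; empty cells add 0."""
    observed, expected = contingency(o11, r1, c1, n)
    total = 0.0
    for i in range(2):
        for j in range(2):
            o, e = observed[i][j], expected[i][j]
            if o > 0:
                total += o * math.log(o / e)
    return 2 * total


def log_dice(o11, r1, c1, n=None):
    """log Dice: 14 + log2(2 * O11 / (R1 + C1)); n is not needed."""
    return 14 + math.log2(2 * o11 / (r1 + c1))


# Invented counts: a node of frequency 100 in a 1,000,000-token corpus,
# paired with a frequent collocate and with a rare one.
pairs = {
    "frequent collocate": (50, 100, 30_000, 1_000_000),
    "rare collocate": (5, 100, 10, 1_000_000),
}
for label, counts in pairs.items():
    scores = ", ".join(f"{f.__name__}={f(*counts):.2f}"
                       for f in (mi, mi2, log_likelihood, log_dice))
    print(f"{label}: {scores}")
```

With these invented counts, MI scores the rare collocate far above the frequent one, whereas log-likelihood, which also weighs the amount of evidence, ranks the frequent pair higher; comparing such rankings is one way into question b.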

TASK 3
Interpreting collocation networks
a. Create a graph for the word time using the following settings: Span L5-R5, Statistics MI,
Statistic cut-off value 5.0, Collocation frequency 5.0, Type. Find the first-order collocate spend
in the graph and double-click on it. Then find the second-order collocate money in the graph and
double-click on it. Comment on the connection between time and money that you can see in the
collocation network.
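The idea behind the collocation network in Task 3 can be sketched in Python: first find the collocates of the node (time), then treat one of them (spend) as a new node and find its collocates, which are second-order collocates of time. For brevity this minimal sketch ranks collocates by raw co-occurrence frequency rather than MI, treats each sentence as a separate unit, and uses invented sentences; the function names collocates and network are hypothetical.

```python
from collections import Counter


def collocates(sentences, node, span=5, min_freq=1):
    """Co-occurrence frequency of words within `span` tokens of `node`."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token == node:
                counts.update(tokens[max(0, i - span):i])
                counts.update(tokens[i + 1:i + 1 + span])
    # Drop the node itself and anything below the frequency threshold.
    return {w: f for w, f in counts.items() if w != node and f >= min_freq}


def network(sentences, node, via, span=5, min_freq=1):
    """First-order collocates of `node`, second-order collocates reached
    through the first-order collocate `via`, and the words they share."""
    first = collocates(sentences, node, span, min_freq)
    if via not in first:
        return first, {}, []
    second = {w: f for w, f in collocates(sentences, via, span, min_freq).items()
              if w != node}
    shared = sorted(set(first) & set(second))
    return first, second, shared


# Invented sentences, each treated as a separate unit.
sentences = [s.split() for s in [
    "we spend time together",
    "they spend time at home",
    "people spend money on food",
    "do not spend money",
]]
first, second, shared = network(sentences, "time", "spend", span=5, min_freq=2)
print("first-order: ", first)
print("second-order:", second)
print("shared:      ", shared)
```

In this toy data money never occurs within five words of time, yet the two are connected through spend; shared lists any words that collocate with both time and spend.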

TASK 4

In the following tasks, we use the LOB corpus, which is provided with #LancsBox.

Collocation settings
GraphColl produces collocation tables and graphs. You can search for the node and its
collocates after selecting the appropriate settings:
■ Span: how many words to the left (L) and to the right (R) of the search term (node) are
considered when searching for collocates.
■ Statistics: the association measure used to compute the strength of collocation.
■ Threshold: the minimum frequency and statistic cut-off values an item (word, lemma, POS)
must reach to be considered a collocate.
■ Type: the unit (type, lemma, part-of-speech [POS] tag) used for collocates.
■ Corpus: the corpus that is being searched.

Try searching for collocations in #LancsBox

1. Load the LOB corpus into #LancsBox: download the LOB corpus, then import it.
2. Open the GraphColl tool in #LancsBox.
3. Follow the instructions.
4. Adjust the collocation settings.
5. Type the search term and click ‘Search’.

With GraphColl you can:
■ Find the collocates of a word or phrase.
■ Find colligations (co-occurrence of grammatical categories).
■ Visualise collocations and colligations.
■ Identify shared collocates of words or phrases.
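The Threshold and Type settings can be sketched as follows: each token carries its word form, lemma and POS tag, collocates are counted in the chosen unit, and items below the minimum frequency are dropped. A statistic cut-off would be applied in the same way to an association score and is left out for brevity. The tags are simplified Penn-style labels chosen for illustration (the LOB corpus uses its own tagset), the function name collocates_by_unit is hypothetical, and span=1 is used only to keep the toy output small.

```python
from collections import Counter

# Position of each unit in a (word form, lemma, POS tag) triple.
UNIT_INDEX = {"type": 0, "lemma": 1, "pos": 2}

# Invented tagged text; the simplified Penn-style tags are illustrative only.
TOKENS = [
    ("interest", "interest", "NN"), ("rates", "rate", "NNS"),
    ("rose", "rise", "VBD"), (".", ".", "."),
    ("the", "the", "DT"), ("interest", "interest", "NN"),
    ("rate", "rate", "NN"), ("fell", "fall", "VBD"), (".", ".", "."),
]


def collocates_by_unit(tokens, node_lemma, unit="lemma", span=5, min_freq=5):
    """Count collocates of every occurrence of `node_lemma`, expressed in
    the chosen unit, and keep those reaching `min_freq`."""
    k = UNIT_INDEX[unit]
    counts = Counter()
    for i, token in enumerate(tokens):
        if token[1] != node_lemma:  # the node is matched by its lemma
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(t[k] for t in window)
    return {item: f for item, f in counts.items() if f >= min_freq}


for unit in ("type", "lemma", "pos"):
    print(unit, collocates_by_unit(TOKENS, "interest", unit, span=1, min_freq=2))
```

Only the lemma unit groups rates and rate together, so in this toy text only it produces a collocate that reaches the frequency threshold of 2.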

TASK 5
Using any corpus (e.g. Brown or LOB in #LancsBox), search for words ending in ‘-ly’ with the
wildcard query *ly. How many of them are adverbs? How many are nouns? Analyse only two pages of results.
Deliverable: a table showing the word-class categories and the number of words in each.
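For Task 5, the wildcard query *ly behaves roughly like the shell-style patterns matched by Python's fnmatch module. The sketch below assumes you have already noted each matching word together with the word class you assigned to it; the sample words, the coarse labels and the function name tally_by_class are illustrative, not LOB data or tags. Words such as friendly (an adjective) also match *ly, so the table may need a category for other word classes.

```python
import fnmatch
from collections import Counter


def tally_by_class(tagged_words, pattern="*ly"):
    """Count the words matching a wildcard pattern, grouped by word class."""
    counts = Counter()
    for word, word_class in tagged_words:
        # fnmatchcase: * matches any run of characters; lower() ignores case.
        if fnmatch.fnmatchcase(word.lower(), pattern):
            counts[word_class] += 1
    return counts


# Illustrative hand-labelled results; replace with the words you analyse.
sample = [
    ("quickly", "adverb"), ("family", "noun"), ("friendly", "adjective"),
    ("really", "adverb"), ("July", "noun"), ("only", "adverb"), ("ran", "verb"),
]
for word_class, n in tally_by_class(sample).most_common():
    print(f"{word_class:<10}{n}")
```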
WHELK is used to find absolute and relative frequencies of the search terms in the corpus files.
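The difference between the absolute and relative frequencies reported by WHELK can be shown with a small sketch: the absolute frequency is the raw count of the search term in each file, and the relative frequency normalises that count by the file's size so that files of different lengths can be compared. The normalisation base (per 1,000 tokens here, via the hypothetical per parameter), the function name and the file contents are assumptions; check which base #LancsBox actually reports.

```python
def frequency_table(files, term, per=1000):
    """Absolute and relative frequency of `term` in each file.

    Relative frequency = absolute / file size * per, i.e. occurrences
    per `per` tokens.
    """
    table = {}
    for name, tokens in files.items():
        absolute = tokens.count(term)
        relative = absolute / len(tokens) * per if tokens else 0.0
        table[name] = (absolute, relative)
    return table


# Hypothetical corpus files, already tokenised.
files = {
    "file_a.txt": "the interest in the interest rate".split(),
    "file_b.txt": "no match in this file".split(),
}
for name, (absolute, relative) in frequency_table(files, "interest").items():
    print(f"{name}: absolute={absolute}, relative={relative:.1f} per 1,000 tokens")
```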
NGRAMS is used to identify lexical bundles.
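The idea behind NGRAMS can be sketched as counting every contiguous sequence of n words and keeping those that recur at least min_freq times. This minimal version assumes a whitespace-tokenised text and ignores the dispersion criteria often applied to lexical bundles; the function name lexical_bundles and the sample text are hypothetical.

```python
from collections import Counter


def lexical_bundles(tokens, n=3, min_freq=2):
    """Contiguous n-grams occurring at least `min_freq` times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(g): f for g, f in grams.items() if f >= min_freq}


text = "at the end of the day and at the end of the week"
print(lexical_bundles(text.split(), n=3, min_freq=2))
```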

