
University of Bahrain

Department of English Language and Literature


Conference and Seminars Committee
2016-2017

Using Corpora for Language and Linguistic Research

1. Go to http://corpus.byu.edu/coca/
2. Register, so that you can log in every time you need to use the corpus.
3. Go to ACCOUNT > Usage Limits to see how many queries you are allowed per day, depending on your
user status.
4. Check out corpus information by clicking on these tabs. This will give you information about the size of the
corpus, the different genres included in it, etc.

Exercise 1: Learn the basics

5. Go to SEARCH, type the word nice, then hit Find matching strings.
6. Check out the FREQ of the word, then tick the box next to the word to retrieve all the contexts where the
word has been used.

7. Notice how many pages of results there are.

8. Also notice where each context has been retrieved from, and from what year.
9. Notice also that you can download a random sample from the corpus consisting of 100, 200, 500, or 1000
words. Choose your data sample size, hit the button, then copy and paste the contexts into an Excel
sheet.
10. If you’re interested in particular contexts, you can save them in a list that you can go back to later. You need to
provide a title for your list – read more about the Save list function under HELP.

11. You can find out more about the distribution of a word (or structure) in the different genres within the corpus
by clicking on Chart.

12. You may also search for strings of words. Type lose weight and see what you can get.
13. REMEMBER: always hit reset before you start a new search.
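
For readers who want to see what a frequency count and a list of contexts amount to, here is a minimal offline sketch in Python. It does not connect to the corpus and is not how the corpus website works internally; the sample text and the function names (tokenize, frequency, kwic) are invented for illustration only.

```python
import re
from collections import Counter

# Toy text invented for illustration; it is NOT taken from the corpus.
TEXT = (
    "It was a nice day. She said something nice about the nice old house. "
    "People who want to lose weight often walk more."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency(tokens, word):
    """Count how many times `word` occurs (compare the FREQ column)."""
    return Counter(tokens)[word]

def kwic(tokens, phrase, width=3):
    """Return (left, keyword, right) triples for a word or a string of words."""
    target = phrase.lower().split()
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, " ".join(target), right))
    return hits

tokens = tokenize(TEXT)
print("FREQ of 'nice':", frequency(tokens, "nice"))
for left, key, right in kwic(tokens, "lose weight"):
    print(f"{left:>20} [{key}] {right}")
```

The width argument plays the role of the amount of context shown on each side of the search word; the online interface shows much longer contexts along with their source and year.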

DELL – March 2017 – Dr. Dana Abdulrahim Page 1


Exercise 2: Find out the most frequent collocates of a word

14. Hit the Collocates button, and then type nice in the given search box, then hit Find collocates.

15. You can limit your search to particular words that collocate with your search word, which you need to provide
in the second box. Again remember, if the corpus gives you error messages, hit the reset button to make sure
you don’t give any conflicting search commands.
16. Notice that you can specify the part of speech of the collocates you’re interested in, as well as the
distance/location of that word in relation to the KWIC (i.e. before or after the KWIC, two/three/four words
after the KWIC, etc.)
17. Choose the part of speech of the collocates that you’re interested in. What do you find when you type
the following?

18. What do you think the following search command is looking for?

Exercise 3: Compare the use of two words

19. If you need to check out the difference between synonymous words, you can do that by hitting the Compare
button, and typing both words in the given search boxes.

20. Notice how the results change when you change the conditions of the search (e.g. part of speech of the
collocates, distance of the collocate, etc.)



21. The results table helps you compare the uses of these words by showing the number of times each of
the search words was found to collocate with a given word. Hit W1 and W2 to find specific contexts of use.
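
The idea behind Exercises 2 and 3 (counting the words that occur within a set distance of a search word, then putting the counts for two search words side by side) can be sketched offline in Python. This is only an illustration under simple assumptions: the sentences are invented, the function names collocates and compare are hypothetical, and the sketch ignores part of speech and sentence boundaries, which the corpus interface handles for you.

```python
import re
from collections import Counter

# Invented toy sentences, NOT corpus data.
TEXT = (
    "What a nice day. Have a nice day. It is a nice little place. "
    "That is a good idea. A good day starts early. Such a good little idea."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, left=0, right=2):
    """Count words up to `left` tokens before and `right` tokens after `node`.

    Simplification: the window may run across sentence boundaries.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts

def compare(tokens, w1, w2, **span):
    """Side-by-side collocate counts: {collocate: (count with W1, count with W2)}."""
    c1 = collocates(tokens, w1, **span)
    c2 = collocates(tokens, w2, **span)
    return {w: (c1[w], c2[w]) for w in sorted(set(c1) | set(c2))}

tokens = tokenize(TEXT)
# Collocates found one word after each search word.
for word, (n1, n2) in compare(tokens, "nice", "good", left=0, right=1).items():
    print(f"{word:<8} W1={n1}  W2={n2}")
```

Changing left and right corresponds to changing the distance of the collocate from the search word, so the table changes in the same way the online results do when you alter the search conditions.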

Exercise 4: Looking up lemmas vs. inflected forms

22. So far we’ve experimented with inflected forms. However, if you’re searching for a verb, e.g. ‘go’, and would
like to retrieve contexts containing all of its inflected forms (go, went, gone, going, goes, etc.), you need to
type your search word in square brackets: [go].

23. The lexical form in brackets is the lemma (i.e. the uninflected form of the lexical item, as listed in the
dictionary). Typing it in the search box will yield all of its inflected forms, whereas when we remove the
brackets we are looking for one particular inflected form.
24. You can try that again now with [nice]. What inflected forms can you find?
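
The difference between a lemma search and an inflected-form search can also be pictured with a small offline Python sketch. It assumes a tiny hand-made lemma table (LEMMA_FORMS, hypothetical and far from complete) and an invented sample text; the corpus itself relies on its own full annotation, which this sketch does not reproduce.

```python
import re
from collections import Counter

# Hypothetical hand-made lemma table, for illustration only.
LEMMA_FORMS = {
    "go": {"go", "goes", "going", "went", "gone"},
    "nice": {"nice", "nicer", "nicest"},
}

# Invented sample text, NOT corpus data.
TEXT = (
    "She went home. They go out often, and he goes too. "
    "We are going; she has gone. A nicer day, the nicest one."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def search(tokens, query):
    """[go] matches every listed form of the lemma; go matches that exact form only."""
    if query.startswith("[") and query.endswith("]"):
        forms = LEMMA_FORMS[query[1:-1]]
    else:
        forms = {query}
    return Counter(t for t in tokens if t in forms)

tokens = tokenize(TEXT)
print(search(tokens, "[go]"))    # all inflected forms found in the text
print(search(tokens, "go"))      # the exact form only
print(search(tokens, "[nice]"))  # forms of the lemma 'nice' found in the text
```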

Exercise 5: Using partial constructions to search for varieties

25. Type *more in the search box and examine the frequency tables. Now type more* and see what you get.
26. Type more * than and again check out the frequency table. This is a partial construction where the
asterisk (*) indicates a missing part that the corpus fills in with existing words/morphemes.
27. What does the following string mean and what outcomes do you expect the corpus to provide?

Now that you’ve learned the basics of corpus search, go ahead and have fun experimenting with other lexical
items/constructions!
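
As a starting point for such experimenting, the asterisk searches from Exercise 5 can be imitated offline with regular expressions. The Python sketch below is an approximation under stated assumptions: an asterisk attached to a word is read as "any run of letters, possibly none", a free-standing asterisk as "exactly one word", and the sample text and function names are invented. The corpus interface may treat edge cases differently.

```python
import re

# Invented sample text, NOT corpus data.
TEXT = (
    "Moreover, she is more careful than anymore. "
    "Furthermore he has more money than time, more or less."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def slot_to_regex(slot):
    """Turn one query slot into a regular expression."""
    if slot == "*":
        return r"[a-z']+"  # a free-standing asterisk must match a whole word
    # Inside a word, '*' stands for any run of letters (possibly empty).
    return "".join(r"[a-z']*" if ch == "*" else re.escape(ch) for ch in slot)

def search(tokens, query):
    """Return every token sequence that matches the query slot by slot."""
    slots = [re.compile(slot_to_regex(s)) for s in query.lower().split()]
    n = len(slots)
    hits = []
    for i in range(len(tokens) - n + 1):
        if all(p.fullmatch(t) for p, t in zip(slots, tokens[i:i + n])):
            hits.append(" ".join(tokens[i:i + n]))
    return hits

tokens = tokenize(TEXT)
print(search(tokens, "*more"))        # words ending in 'more'
print(search(tokens, "more*"))        # words beginning with 'more'
print(search(tokens, "more * than"))  # 'more', any one word, 'than'
```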

