
ASSIGNMENT 1

Send me the corpus you compiled, along with a document in which you answer the questions below
(attach the files to this assignment).

A) Create a corpus of English texts and analyse it with AntConc.


Requirements:

The corpus must

§ include texts on the same topic or of the same genre (e.g. newspaper articles, Wikipedia articles, literary
texts such as short stories or novels by the same author or of the same genre, or any other texts that have
something in common)
§ have at least 6,000 tokens (the more, the better)
§ include the texts as separate files (separate subcorpora)
§ be in .txt format (the format processed by AntConc)
§ be cleaned of unnecessary tokens (words that you might copy but that have nothing to do with the text,
e.g. column titles or advertisements from a newspaper website)
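
Before loading the files into AntConc, the size requirement can be roughly cross-checked with a short script. The following is a minimal sketch, not part of the assignment and not AntConc's own method: it assumes the corpus files are UTF-8 .txt files in one folder, and it assumes a token is a run of letters that may contain internal apostrophes or hyphens. The function names and the token pattern are illustrative choices, so the totals may differ slightly from the figures AntConc reports.

```python
import re
from pathlib import Path

# Assumption: a token is a run of letters, optionally joined by
# internal apostrophes or hyphens (e.g. "cat's", "well-known").
# AntConc's token definition is configurable and may differ.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def tokenize(text):
    """Return the lowercased tokens found in text."""
    return TOKEN_RE.findall(text.lower())


def corpus_token_counts(folder):
    """Map each .txt file name in folder to its token count."""
    counts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the script running on stray bytes
        text = path.read_text(encoding="utf-8", errors="replace")
        counts[path.name] = len(tokenize(text))
    return counts
```

Calling corpus_token_counts on the corpus folder and summing the values gives an approximate total to compare against the 6,000-token minimum; the per-file counts also show how evenly the subcorpora are sized.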

AntConc may be downloaded here: http://www.laurenceanthony.net/software/antconc/

Tutorials on how to use AntConc may be found here: https://www.youtube.com/watch?v=O3ukHC3fyuc

B) Answer the following 10 questions to report your results:

1) What types of texts did you include in your corpus? State the number of types and tokens in your corpus
without a stop list.

2) State the number of types and tokens in your corpus with a stop list.

3) Give the type/token ratio without a stop list and the type/token ratio with a stop list.

4) What does the type/token ratio (with a stop list) say about the lexical variety in your corpus?

5) List the top 10 most frequent lexical types in your corpus (mention their frequencies). What do they say about
your corpus? What parts of speech are they? (check how they are used in the concordancer)

6) Use the lemma list provided by me and list the forms of the top 10 lexical types in your corpus.

7) Take the three most frequent lexical types in your corpus and see what collocations they form (1st left and 1st
right collocates for each type). Draw up a list and then interpret the results in a few lines.

8) List the top three 3-word and 4-word clusters formed by the third most frequent lexical type in your corpus.

9) What are the 3 most frequent 3-grams, 4-grams and 5-grams in your corpus? List them.

10) What does this brief analysis show about the texts in your corpus? Did it help you find out anything new about
your texts? Have your findings confirmed your initial expectations about your texts? Please comment in 10 to
15 lines (200-250 words).
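
The figures asked for in questions 1 to 3 and 9 can be sanity-checked outside AntConc. The sketch below is only an illustration under stated assumptions: the token pattern, the function names and the stop-word set passed in are hypothetical choices rather than AntConc's settings, so small differences from AntConc's output are to be expected. It computes the type/token ratio as the number of types divided by the number of tokens, and counts n-grams as runs of n consecutive tokens.

```python
import re
from collections import Counter

# Assumption: same illustrative token pattern as a simple letter-based
# tokenizer; AntConc's own token definition may differ.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def tokenize(text):
    """Return the lowercased tokens found in text."""
    return TOKEN_RE.findall(text.lower())


def type_token_stats(tokens, stop_words=frozenset()):
    """Return (types, tokens, type/token ratio, frequency table).

    Tokens listed in stop_words are dropped before counting, which
    mimics applying a stop list.
    """
    kept = [t for t in tokens if t not in stop_words]
    freq = Counter(kept)
    n_tokens = len(kept)
    n_types = len(freq)
    ttr = n_types / n_tokens if n_tokens else 0.0
    return n_types, n_tokens, ttr, freq


def top_ngrams(tokens, n, k=3):
    """Return the k most frequent n-grams as (tuple, count) pairs."""
    grams = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return grams.most_common(k)
```

For the sentence "the cat sat on the mat and the cat slept", this gives 7 types and 10 tokens (ratio 0.7) without a stop list, and 4 types and 5 tokens (ratio 0.8) once "the", "on" and "and" are removed. The frequency table's most_common(10) method lists candidates for question 5. Note that the ratio falls as a corpus grows, so it is most informative when comparing corpora of similar size.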
