Stefan Th. Gries
Quantitative corpus linguistics with R: a practical introduction


Morphology 1: suffixation in English: -ic vs. -ical adjectives
A problematic phenomenon in English is adjective suffixation with -ic and -ical. On the one
hand, it is difficult to detect any pattern governing the distribution of suffixes: when does an
adjective end in -ic only (cf. acrobatic/*acrobatical) and when does it end in -ical only
(*zoologic/zoological)? Marchand (1969) suggested that words in wider common use tend to end
in -ical. On the other hand, when an adjective root does take both suffixes (electric(al),
historic(al), etc.), this raises the question of whether the two forms that constitute a pair are in fact
synonymous. It is well known that some such adjective pairs come with clear meaning
differences: politic ('artful, crafty, prudent') and political ('having to do with politics'), economic
('having to do with economics') and economical ('money-saving') as well as historic ('famous,
memorable') and historical ('pertaining to history') are examples in point. However, there
are very many cases where the distinction is far from clear; examples include problematic(al),
symmetric(al), geometric(al), and many more.
In this case study, you will focus on a very small issue, namely Marchand's
claim quoted above that adjectives ending in -ical tend to be in wider common use, but you will
approach this issue from two different perspectives. The first of these will be dealt with in
Assignments 1 to 3: you will test whether adjectives in -ic are on average less frequent and less
widely dispersed than their corresponding counterparts ending in -ical.
Assignment 1
On the basis of the above characterization, formulate the alternative hypotheses and null
hypotheses in text form and in statistical form.
When you are done, load the file <C:/_qclwr/_scripts/morphology_1_assignment1_icical.r> and
compare your solution with it.
In order to retrieve the frequencies of such adjectives you would need a larger corpus than can be
provided with this book. However, frequency lists from the BNC Version 1 are available for
download from the internet.
Assignment 2
Download the complete frequency list of the BNC Version 1 at
<http://www.kilgarriff.co.uk/BNClists/all.al.gz>. Unzip the file (with, for example, 7-zip; cf. the
Appendix for the link) and open it with a text editor (e.g., Tinn-R or SciTE). This is what it
should look like:
100106029 !!WHOLE_CORPUS !!ANY 4124
1 !*?* unc 1
602 % nn0 113
1 %/100 unc 1
3 %/!" unc 1
1 %29# unc 1
1 %#$000 unc 1
An introduction published 2009 by Routledge, Taylor & Francis Group
Replace the first line by the following line ...
FREQUENCY WORD POS FILES
... and save the changed file into <C:/_qclwr/_inputfiles/corp_bnc_sgml_freql.txt>. Note by the
way that it is often not possible to load this file into spreadsheet software because this file has
more than 938,000 rows.
Write a script that has the following characteristics and performs the following operations:
(i) The script prompts the user to load that file into a data frame called freqs and makes the
columns available as variable names. Note: there are many different characters in the file
which may interfere with R's default settings. Make sure you do not forget to
- specify the right separator;
- specify you want no character for comments (comment.char in R);
- specify you want no quote character (quote in R).
(Consult the help on read.table if necessary.)
(ii) Retrieve from the file all words that end in -ic and that are tagged as adjectives (use only
the exact tag "<w AJ0>", no portmanteau tags) and their frequencies as well as the
numbers of files in which they occur.
(iii) Retrieve from the file all words that end in -ical and that are tagged as adjectives (use
only the exact tag "<w AJ0>", no portmanteau tags) and their frequencies as well as the
numbers of files in which they occur.
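The three steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the solution script that accompanies the book: the rows are made up, the file is assumed to be separated by single spaces, and the adjective tag is assumed to appear in lower case (aj0) in the frequency list, as the tags nn0 and unc do in the sample rows. The sketch accesses columns with $; calling attach(freqs) would make them available as variable names.

```r
# Toy stand-in for the edited frequency list (made-up rows, same four columns).
toy.lines <- c("FREQUENCY WORD POS FILES",
               "120 economic aj0 80",
               "95 economical aj0 60",
               "40 music nn1 30",
               "7 historic aj0-nn1 5",
               "3 it's# unc 2",
               "210 historical aj0 150")
toy.file <- tempfile(fileext = ".txt")
writeLines(toy.lines, toy.file)

# In a real script, file.choose() in place of toy.file prompts the user.
# quote = "" and comment.char = "" keep ' and # from being interpreted.
freqs <- read.table(toy.file, header = TRUE, sep = " ",
                    quote = "", comment.char = "",
                    stringsAsFactors = FALSE)

# Exact adjective tag only; portmanteau tags such as aj0-nn1 are excluded.
is.adj <- freqs$POS == "aj0"
ic   <- freqs[is.adj & grepl("ic$",   freqs$WORD), ]
ical <- freqs[is.adj & grepl("ical$", freqs$WORD), ]
```

The data frames ic and ical then hold the words together with their FREQUENCY and FILES values, which is the input the next assignment needs.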
When you are done, load the file <C:/_qclwr/_scripts/morphology_1_assignment2_icical.r> and
<C:/_qclwr/_outputfiles/morphology_1_assignment2_icical.RData> and compare your solution
with them.
Assignment 3
On the basis of these data,
(i) represent the distributions of adjective frequencies and adjective file occurrences
graphically with box plots (for reasons that you will understand when you look at the first
output, I recommend using a logarithmic scaling of the y-axis; enter ?boxplot at the R
prompt);
(ii) analyze the distributions statistically and interpret the findings.
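As a rough illustration of both steps, the sketch below draws box plots on a logarithmic y-axis and runs a one-tailed test. The numbers are invented for the example and are not BNC results, and the choice of the Wilcoxon rank-sum test is an assumption made here (a common option for skewed frequency data); the text above does not prescribe a particular test.

```r
# Made-up frequencies for illustration only; not BNC results.
freq.ic   <- c(1, 3, 8, 20, 55, 140, 900)
freq.ical <- c(2, 9, 30, 75, 210, 1300, 4800)

# Box plots with a logarithmic y-axis (all values must be positive).
boxplot(list("-ic" = freq.ic, "-ical" = freq.ical),
        log = "y", ylab = "Frequency (log scale)")

# One-tailed Wilcoxon rank-sum test of whether -ic frequencies are lower.
result <- wilcox.test(freq.ic, freq.ical, alternative = "less")
print(result$p.value)
```

The same two calls can be repeated for the numbers of files, since dispersion is the second variable of interest.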
When you are done, load the file <C:/_qclwr/_scripts/morphology_1_assignment3_icical.r>,
<C:/_qclwr/_outputfiles/morphology_1_assignment3-plot1.png>,
<C:/_qclwr/_outputfiles/morphology_1_assignment3-plot2.png>, and
<C:/_qclwr/_outputfiles/morphology_1_assignment3-plot3.png> and compare your solution with
them.
Second, in Assignments 4 to 6 you will test whether the nouns immediately preceded by
adjectives in -ic are on average less frequent than the nouns immediately preceded by adjectives
in -ical.
Assignment 4
On the basis of the above characterization, formulate the alternative hypotheses and null
hypotheses in text form and in statistical form.
When you are done, load the file <C:/_qclwr/_scripts/morphology_1_assignment4_icical.r> and
compare your solution with it.
Assignment 5
Write a script that has the following characteristics and performs the following operations:
(i) The script prompts the user to choose a corpus file to search
(<C:/_qclwr/_inputfiles/corp_bnc_sgml_1.txt>).
(ii) The script opens each corpus file, retrieves only the lines with sentence numbers, deletes
unwanted annotation (all tags other than POS tags) and converts them all into lower case.
(iii) The script retrieves all sequences of
- an adjective tag (use the exact tag "<w AJ0>" and portmanteau tags where "AJ0"
is the first part of the portmanteau tag);
- a word that ends in -ic or in -ical;
- a noun tag (use all tags beginning with an "N");
- a word.
(iv) The script extracts all noun tokens from these matches, separately for -ic adjectives and
for -ical adjectives.
(v) For each noun collocate type (not token!) of one of the two adjective groups, the script
searches in the BNC frequency list for all identical word forms that are tagged as nouns
(as defined above), sums up the frequencies that are found for these, and stores the sum in
a vector for the statistical analysis to follow. (Note: this can be done with loops and this is
maybe easier at first, but ultimately you should try to get a solution with vector functions to
work.)
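Steps (ii) to (v) can be sketched on a few invented lines as shown below. Everything here is an assumption made for illustration: the toy corpus lines and frequencies are made up, sentence lines are assumed to contain <s n=...>, POS tags are assumed to be the <w ...> and <c ...> tags, and the regular expressions are one possible formulation rather than the ones used in the book's solution script. The file prompt of step (i) is left out so that the example runs without user input.

```r
# Made-up lines imitating BNC SGML annotation.
corpus <- c('<s n="1"><w AT0>The <w AJ0>economic <w NN1>crisis <w VVD>hit <w AJ0>historical <w NN2>records<c PUN>.',
            '<bncDoc id="toy">',
            '<s n="2"><w AT0>A <w AJ0-NN1>classic <w NN1>crisis <w CJC>and <w AJ0>typical <w NN1>case<c PUN>.')

# (ii) Keep sentence lines, delete all tags except <w ...>/<c ...>, lower-case.
sents <- grep("<s n=", corpus, value = TRUE)
sents <- gsub("<(?![wc] )[^>]*>", "", sents, perl = TRUE)
sents <- tolower(sents)

# (iii) Adjective tag (aj0, optionally first part of a portmanteau tag),
# word in -ic/-ical, noun tag (any tag beginning with n), word.
pattern <- "<w aj0(-[a-z0-9]+)?>[^< ]*ic(al)? <w n[^>]*>[^< ]+"
matches <- unlist(regmatches(sents, gregexpr(pattern, sents, perl = TRUE)))

# (iv) Noun tokens, separately for the two adjective groups.
is.ical    <- grepl("ical <w n", matches)
nouns      <- sub(".*>", "", matches)
nouns.ic   <- nouns[!is.ical]
nouns.ical <- nouns[is.ical]

# (v) Made-up frequency list; sum noun-tagged frequencies per word form.
freqs <- data.frame(WORD = c("crisis", "crisis", "records", "records", "case", "case"),
                    POS = c("nn1", "np0", "nn2", "vvz", "nn1", "vvb"),
                    FREQUENCY = c(50, 2, 30, 10, 400, 5),
                    stringsAsFactors = FALSE)
noun.rows <- freqs[grepl("^n", freqs$POS), ]
noun.sums <- tapply(noun.rows$FREQUENCY, noun.rows$WORD, sum)

# Look up each collocate type (not token) without an explicit loop.
freq.ic.nouns   <- noun.sums[unique(nouns.ic)]
freq.ical.nouns <- noun.sums[unique(nouns.ical)]
```

Indexing the named result of tapply by the unique noun types replaces a loop over types; a type that does not occur in the frequency list would come back as NA and would need to be handled before the statistical analysis.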
When you are done, load <C:/_qclwr/_scripts/morphology_1_assignment5_icical.r> as well as
<C:/_qclwr/_outputfiles/morphology_1_assignment5_icical.RData> and compare your solution
with them.
Assignment 6
On the basis of these data,
(i) represent the distribution of adjectives' collocates' frequencies graphically with a box plot
(for reasons that you will understand when you look at the first output, I recommend
using a logarithmic scaling of the y-axis; enter ?boxplot at the R prompt);
(ii) analyze the distribution statistically and interpret the findings.
When you are done, load <C:/_qclwr/_scripts/morphology_1_assignment6_icical.r> and
<C:/_qclwr/_outputfiles/morphology_1_assignment6-plot.png> and compare your solution with
them.
For further study/exploration: Kaunisto (1999, 2001), Gries (2001, 2003b), and the references
cited there.