Stefan Th. Gries, Quantitative corpus linguistics with R: a practical introduction
Morphology 1: suffixation in English: -ic vs. -ical adjectives

A problematic phenomenon in English is adjective suffixation with -ic and -ical. On the one hand, it is difficult to detect any pattern governing the distribution of the suffixes: when does an adjective end in -ic only (cf. acrobatic/*acrobatical) and when does it end in -ical only (*zoologic/zoological)? Marchand (1969) suggested that words in wider common use tend to end in -ical. On the other hand, when an adjective root does take both suffixes (electric(al), historic(al), etc.), this raises the question of whether the two forms that constitute a pair are in fact synonymous. It is well known that some such adjective pairs come with clear meaning differences: politic ('artful, crafty, prudent') and political ('having to do with politics'), economic ('having to do with economics') and economical ('money-saving'), as well as historic ('famous, memorable') and historical ('pertaining to history') are examples in point. However, there are very many cases where the distinction is far from clear; examples include problematic(al), symmetric(al), geometric(al), and many more.

In this case study, you will focus on a very small issue, namely Marchand's claim mentioned above that adjectives ending in -ical tend to be in wider common use, but you will approach this issue from two different perspectives. The first of these will be dealt with in Assignments 1 to 3: you will test whether adjectives in -ic are on average less frequent and less widely dispersed than their corresponding counterparts ending in -ical.

Assignment 1
On the basis of the above characterization, formulate the alternative hypotheses and null hypotheses in text form and in statistical form. When you are done, load the file <C:/_qclwr/_scripts/morphology_1_assignment1_icical.r> and compare your solution with it.

In order to retrieve the frequencies of such adjectives, you would need a larger corpus than can be provided with this book.
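For Assignment 1, one possible way of writing the statistical form is sketched below. It assumes that "on average" is operationalized via medians, since word frequencies are usually heavily skewed (a mean-based version would look the same with \bar{f} in place of \tilde{f}). Here \tilde{f} stands for the median token frequency and \tilde{d} for the median number of files (the dispersion measure) of each adjective group; this notation is introduced only for illustration and is not taken from the book's solution file. Because each H1 is directional, each H0 is written as its complement, which implies a one-tailed test.

```latex
\begin{aligned}
&\text{Frequency:}  && H_1\colon \tilde{f}_{\text{-ic}} < \tilde{f}_{\text{-ical}}
                    && H_0\colon \tilde{f}_{\text{-ic}} \geq \tilde{f}_{\text{-ical}} \\
&\text{Dispersion:} && H_1\colon \tilde{d}_{\text{-ic}} < \tilde{d}_{\text{-ical}}
                    && H_0\colon \tilde{d}_{\text{-ic}} \geq \tilde{d}_{\text{-ical}}
\end{aligned}
```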
You can, however, download frequency lists of the BNC Version 1 from the internet.

Assignment 2
Download the complete frequency list of the BNC Version 1 at <http://www.kilgarriff.co.uk/BNClists/all.al.gz>. Unzip the file (with, for example, 7-zip; cf. the Appendix for the link) and open it with a text editor (e.g., Tinn-R or SciTE). This is what it should look like:

100106029 !!WHOLE_CORPUS !!ANY 4124
1 !*?* unc 1
602 % nn0 113
1 %/100 unc 1
3 %/!" unc 1
1 %29# unc 1
1 %#$000 unc 1

An introduction published 2009 by Routledge, Taylor & Francis Group

Replace the first line by the following line

FREQUENCY WORD POS FILES

and save the changed file into <C:/_qclwr/_inputfiles/corp_bnc_sgml_freql.txt>. Note, by the way, that it is often not possible to load this file into spreadsheet software because it has more than 938,000 rows. Write a script that has the following characteristics and performs the following operations:
(i) The script prompts the user to load that file into a data frame called freqs and makes the columns available as variable names. Note: there are many different characters in the file which may interfere with R's default settings. Make sure you do not forget to
- specify the right separator;
- specify that you want no comment character (comment.char in R);
- specify that you want no quote character (quote in R).
(Consult the help on read.table if necessary.)
(ii) Retrieve from the file all words that end in -ic and that are tagged as adjectives (use only the exact tag "<w AJ0>", which appears in lowercase as aj0 in the frequency list; no portmanteau tags), together with their frequencies and the numbers of files in which they occur.
(iii) Retrieve from the file all words that end in -ical and that are tagged as adjectives (use only the exact tag "<w AJ0>"; no portmanteau tags), together with their frequencies and the numbers of files in which they occur.
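The steps of Assignment 2 could be sketched in R roughly as follows. This is a minimal sketch, not the book's solution file: it assumes the file is space-separated, starts with the header line given above, and uses lowercase tags such as aj0, as the sample suggests. To keep it self-contained, it writes the sample lines shown above plus three invented rows (hypothetical words "toyic"/"toyical" with made-up counts, not BNC data) to a temporary file; for the real task you would pass file.choose() to read.table instead.

```r
# Toy stand-in for corp_bnc_sgml_freql.txt: header line, the sample lines
# shown above, and three INVENTED rows (hypothetical words, made-up counts).
toy_lines <- c(
  "FREQUENCY WORD POS FILES",
  "100106029 !!WHOLE_CORPUS !!ANY 4124",
  "1 !*?* unc 1",
  "602 % nn0 113",
  "1 %/100 unc 1",
  "3 %/!\" unc 1",
  "1 %29# unc 1",
  "1 %#$000 unc 1",
  "50 toyic aj0 10",      # hypothetical -ic adjective
  "70 toyical aj0 12",    # hypothetical -ical adjective
  "5 toyical aj0-nn1 2"   # hypothetical portmanteau-tagged row
)
freq_file <- tempfile(fileext = ".txt")
writeLines(toy_lines, freq_file)

# (i) For the real file, replace freq_file with file.choose() to prompt the user.
freqs <- read.table(freq_file, header = TRUE,
                    sep = " ",          # the list is space-separated
                    quote = "",         # a '"' inside a word must not open a string
                    comment.char = "",  # a '#' inside a word must not start a comment
                    stringsAsFactors = FALSE)
# attach(freqs) would make FREQUENCY, WORD, POS and FILES available as
# variables; this sketch indexes the data frame explicitly instead.

# (ii)/(iii) Exact tag aj0 only, so portmanteau tags such as aj0-nn1 are excluded.
adj  <- freqs[freqs$POS == "aj0", ]
ic   <- adj[grepl("ic$",   adj$WORD), c("WORD", "FREQUENCY", "FILES")]
ical <- adj[grepl("ical$", adj$WORD), c("WORD", "FREQUENCY", "FILES")]
print(ic)
print(ical)
```

With the default quote and comment.char settings, the rows containing " and # would make read.table fail or misread those rows, which is why the assignment asks you to switch both off.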
When you are done, load the files <C:/_qclwr/_scripts/morphology_1_assignment2_icical.r> and <C:/_qclwr/_outputfiles/morphology_1_assignment2_icical.RData> and compare your solution with them.

Assignment 3
On the basis of these data, (i) represent the distributions of adjective frequencies and adjective file occurrences graphically with box plots (for reasons that you will understand when you look at the first output, I recommend using a logarithmic scaling of the y-axis; enter ?boxplot at the R prompt); (ii) analyze the distributions statistically and interpret the findings. When you are done, load the files <C:/_qclwr/_scripts/morphology_1_assignment3_icical.r>, <C:/_qclwr/_outputfiles/morphology_1_assignment3-plot1.png>, <C:/_qclwr/_outputfiles/morphology_1_assignment3-plot2.png>, and <C:/_qclwr/_outputfiles/morphology_1_assignment3-plot3.png> and compare your solution with them.

Second, in Assignments 4 to 6 you will test whether the nouns immediately preceded by adjectives in -ic are on average less frequent than the nouns immediately preceded by adjectives in -ical.

Assignment 4
On the basis of the above characterization, formulate the alternative hypotheses and null hypotheses in text form and in statistical form. When you are done, load the file <C:/_qclwr/_scripts/morphology_1_assignment4_icical.r> and compare your solution with it.

Assignment 5
Write a script that has the following characteristics and performs the following operations:
(i) The script prompts the user to choose a corpus file to search (<C:/_qclwr/_inputfiles/corp_bnc_sgml_1.txt>).
(ii) The script opens each chosen corpus file, retrieves only the lines with sentence numbers, deletes unwanted annotation (all tags other than POS tags), and converts these lines into lower case.
(iii) The script retrieves all sequences of
- an adjective tag (use the exact tag "<w AJ0>" and portmanteau tags where "AJ0" is the first part of the portmanteau tag);
- a word that ends in -ic or in -ical;
- a noun tag (use all tags beginning with an "N");
- a word.
(iv) The script extracts all noun tokens from these matches, separately for -ic adjectives and for -ical adjectives.
(v) For each noun collocate type (not token!) of one of the two adjective groups, the script searches the BNC frequency list for all identical word forms that are tagged as nouns (as defined above), sums up the frequencies that are found for these, and stores the sum in a vector for the statistical analysis to follow. (Note: this can be done with loops, and this may be easier at first, but ultimately you should try to get a solution with vector functions to work.)

When you are done, load <C:/_qclwr/_scripts/morphology_1_assignment5_icical.r> as well as <C:/_qclwr/_outputfiles/morphology_1_assignment5_icical.RData> and compare your solution with them.

Assignment 6
On the basis of these data, (i) represent the distribution of the adjectives' collocate frequencies graphically with a box plot (for reasons that you will understand when you look at the first output, I recommend using a logarithmic scaling of the y-axis; enter ?boxplot at the R prompt); (ii) analyze the distribution statistically and interpret the findings. When you are done, load <C:/_qclwr/_scripts/morphology_1_assignment6_icical.r> and <C:/_qclwr/_outputfiles/morphology_1_assignment6-plot.png> and compare your solution with them.

For further study/exploration: Kaunisto (1999, 2001), Gries (2001, 2003b), and the references cited there.
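As a rough illustration of Assignment 5 (iii) to (v) and Assignment 6, here is a minimal sketch on two hypothetical toy inputs: three made-up, already lower-cased sentence lines in BNC SGML style (it assumes each word follows its <w ...> tag and is followed by a space, and that portmanteau tags look like aj0-nn1) and a tiny invented stand-in for the data frame freqs. None of the counts are real BNC figures, and the regular expression and the helper lookup() are assumptions of this sketch rather than the book's solution. The same box-plot and test calls would apply to the adjective data of Assignment 3; the one-tailed Wilcoxon (U-)test is one reasonable option for skewed frequencies, not the only one.

```r
# Assignment 5 (iii)-(v), sketched on hypothetical toy input.
# Made-up, already lower-cased sentence lines in BNC SGML style.
sents <- c(
  "<s n=1><w aj0>economic <w nn1>policy <w vvz>matters",
  "<s n=2><w aj0>economical <w nn2>cars <w vvb>sell",
  "<s n=3><w aj0-nn1>classic <w nn1>car <c pun>."
)

# (iii) Adjective tag (aj0, or a portmanteau tag starting with aj0), a word
# in -ic/-ical, a noun tag (starting with n), and the following word.
pattern <- "<w aj0(-[a-z0-9]{3})?>[^ <]*ic(al)? <w n[^>]*>[^ <]+"
matches <- unlist(regmatches(sents, gregexpr(pattern, sents, perl = TRUE)))

# (iv) Split by adjective suffix and strip everything up to the noun tag.
is_ical    <- grepl("ical <w n", matches, fixed = TRUE)
nouns      <- sub(".*<w n[^>]*>", "", matches, perl = TRUE)
ic_nouns   <- unique(nouns[!is_ical])  # noun types, not tokens
ical_nouns <- unique(nouns[is_ical])

# Hypothetical stand-in for the frequency list (invented counts).
freqs <- data.frame(
  FREQUENCY = c(900, 40, 300, 250, 60),
  WORD      = c("policy", "policy", "car", "cars", "cars"),
  POS       = c("nn1", "aj0-nn1", "nn1", "nn2", "np0"),
  stringsAsFactors = FALSE
)

# (v) Vectorized lookup: sum the frequencies of all noun-tagged rows per
# word form once, then index that table by the collocate types (no loop).
noun_rows  <- freqs[grepl("^n", freqs$POS), ]
noun_sums  <- tapply(noun_rows$FREQUENCY, noun_rows$WORD, sum)
lookup     <- function(types) as.numeric(noun_sums[match(types, names(noun_sums))])
ic_freqs   <- lookup(ic_nouns)
ical_freqs <- lookup(ical_nouns)

# Assignment 6: box plot with a log-scaled y-axis, written to a throwaway file.
pdf(tempfile(fileext = ".pdf"))
boxplot(list(ic = ic_freqs, ical = ical_freqs), log = "y",
        ylab = "noun collocate frequency")
invisible(dev.off())

# One-tailed test; H1: collocates of -ic adjectives are less frequent.
result <- wilcox.test(ic_freqs, ical_freqs, alternative = "less")
print(result)
```

The lookup avoids a loop by summing noun frequencies once per word form with tapply() and then indexing that table with match(); collocate types missing from the list come back as NA, which wilcox.test() drops before testing.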