
March 2008

Journal of Foreign Languages, Vol. 31, No. 2

Article ID: 1004-5139(2008)02-0023-10    CLC number: H0-06    Document code: A




The Firthian Foundations of Corpus Linguistics

WEI Nai-xing
(Natural Language Processing Institute, Shanghai Jiao Tong University, Shanghai 200030, China)

Abstract: This paper sets out to explore the theoretical foundations of corpus linguistics by discussing major theoretical positions and methods featuring in Firthian linguistics. It holds that Firthian linguistics is characterized by its empiricist epistemological view in philosophy, its social view of language, its monist position in language studies and its observable data-based research paradigm and related methods. The Neo-Firthians have continued along and developed further the Firthian line of thinking, and established a set of theories and methods, including "the central place of lexis in language studies", "the unity of form and meaning", "the notion of extended unit of meaning", and "a language-internal strategy in semantics". The paper deals with these related issues respectively and, also, spells out developmental linkages among them, which leads to the conclusion that Firthian theories, Neo-Firthians' views and methods included, constitute a vital component of the foundations of corpus linguistics.
Key words: empiricist paradigm; monism; neo-Firthians; lexis; extended unit of meaning; language-internal strategy

0.

J. R. Firth
(Firthian linguistics)
Saussure, Bloomfield, Chomsky
Firth
Sydney Allen, Jack Carnochan, Eugenie Henderson, T. F. Mitchell, F. R. Palmer, R. H. Robins, R. K. Sprigg, Eileen Whitley
(Firthians)
Michael Halliday, John Sinclair, Angus McIntosh, Peter Strevens
Halliday, Sinclair
(Neo-Firthians)
(Post-Firthian Corpus Linguists)
John Sinclair, Michael Stubbs, Michael Hoey, Antoinette Renouf, Wolfgang Teubert
John Sinclair
Firth, Sinclair, Halliday
London-Edinburgh-Birmingham-Sydney
Jan Svartvik: "Corpora are becoming the mainstream"
G. Leech, Lancaster

Svartvik, J. 1996. Corpora are becoming mainstream [A]. In J. Thomas & M. Short (eds). Using Corpora for Language Research: Studies in Honour of Geoffrey Leech. 3-13. London: Longman.

1.

(Empiricism), (Epistemology)
John Locke, George Berkeley, David Hume, Francis Bacon, Aristotle

Empiricism: philosophy, the theory that all knowledge is derived from sense-experience [1: 605]

(Rationalism)
René Descartes, Gottfried Wilhelm von Leibniz, Baruch Spinoza

Rationalism: the theory that reason rather than experience is the foundation of certainty in knowledge [1: 1539]

Rationalism and empiricism are views of the nature of human knowledge. Broadly speaking, empiricists hold that all our knowledge derives from our sensory, experiential, or empirical interaction with the world. Rationalists, by contrast, hold the negation of this, that there is some knowledge that does not derive from experience. [2: 2000: xvi]

Firth: "Attested language ... duly recorded is in the focus of attention for the linguist" [4: 29]
Firth, Halliday, Sinclair, 1966 [5: 148-162; 6: 410-430]
Sinclair, London-Edinburgh Corpus [7: 21-54]


Firth, Bacon
Firth, Chomsky
(Stubbs, [3: 28])

2.

Firth, (Attested data); Sinclair, (Naturally occurring data); (Textual data)
Firth, Malinowski
London-Edinburgh Corpus: 135,000; 12,000; 1960, 1974
The farmer kills the duckling ([3: 30])
[4: 19]:

As we know so little about mind and as our study is essentially social, I shall cease to respect the duality of mind and body, thought and word, and be satisfied with the whole man, thinking and acting as a whole, in association with his fellows.

Firth, [8]
Teubert: "Corpus linguistics is based on the concept that language is a fundamentally social phenomenon." [9: 4]
(Externalized language), (Internalized language), Chomsky

Sinclair [10: 110], Aitchison, [11: 121-2]
Teubert, Firth
Teubert [12: 97]:

It depends on which aspect of language we are interested in. If we want to find out what is common to all languages, we should embrace Chomskyan linguistics. If we want to find out if a French sentence is structured grammatically, we should rely on standard linguistics. If we want to find out what words, sentences and texts mean, we should opt for corpus linguistics.

3.

Saussure, Chomsky, Firth, Halliday, Sinclair
[3: 29-31]
Saussure
Firth, Saussure [4: 192]:

Such a language in the Saussurean sense is a system of signs placed in categories. It is a system of different values, not of concrete and positive terms. Actual people do not talk such "a language". However systematically you may talk, you do not talk systematics.

Firth
(Running Texts)
(Monitor Corpora)
Halliday, Chomsky:

Chomsky's theory of competence and performance had driven a massive wedge between the system and instance, making it impossible by definition that analysis of actual texts could play any part in explaining the grammar of a language, let alone in formulating a general linguistic theory. [13: 30]

BNC: 200,000,000; Bank of English: 500,000,000
[15: 122]
(Lexical item), (Lexico-grammar, Lexical grammar)
Sinclair, (Corpus-driven linguistics) [14: 84-100]
Halliday:

"They are grammatical items when described grammatically, as entering (via classes) into closed systems and ordered structures, and lexical items when described lexically, as entering into open sets and linear collocations." [16: 77]

4.

Firth, (Modes of meaning) [4: 192-6]
Halliday, 1966 [5: 148-62]; Sinclair, "Beginning the Study of Lexis" [6: 410-30]
Sinclair, Lexis
Frederick J. Newmeyer, "a functional Chomskyan"
Sinclair [10: 174]
"Grammar is needed because you cannot say everything at the same time"
E. O. Winter, Sinclair

Sinclair; Corpus Linguistics at Work, Tognini-Bonelli: corpus-based approach and corpus-driven approach, 65-83, 84-100
Sinclair, 2003; Wolfgang Teubert (Teubert 2003: 73)
Halliday, Lexico-grammar; Sinclair, Lexical grammar; Tognini-Bonelli 2001: 90-95
Newmeyer: "Grammar is grammar and usage is usage." [17: 695]
Sinclair, COBUILD; Francis, Hunston [18]
Firth: (Colligation), (Collocates), (Semantic preference), (Semantic prosody)
[19: 155]:

The end result will be that we will be able to specify all major lexical items in terms of their syntactic preferences and all grammatical structures in terms of their key lexis and phraseology.

5.

Firth, (Contextualism)
Firth [4: 7]:

The complete meaning of a word is always contextual, and no study of meaning apart from a complete context can be taken seriously.

5.1

Saussure: (Signifier), (Signified) [20: 66-7]
Sinclair, Chomsky
Sinclair, Corpus, Concordance, Collocation: node, concordances, structure and meaning
Sinclair, (Lemma) [10: 78]
5.2

(Unit of meaning)
Firth: "You shall know a word by the company it keeps"
[10: 113] [21: 156-159]
Sinclair
(Phraseological tendency): (Conventionality), (Idiomaticity)
(Linguistic prefabrications), (Lexical phrases), (Lexical chunks, bundles), (Stereotypings), (Formulae)
Sinclair, (Extended unit of meaning) [22: 75-106]
Sinclair, (Terminological tendency), (Open choice principle) [22: 82]
(Delexicalization) [10: 110]

5.3

(Language-Internal Strategy)
[15: 120-121]
(Ontological models), (Logic models), (Universal concepts)
Firth
[12: 103]
(Paraphrase)

Sinclair: language-internal; Wolfgang Teubert: discourse-internal (Teubert 2004: 100-103)
Firth
COBUILD, Sinclair, Firth
Sinclair, Teubert; Teubert: "Universe of discourse"
Sinclair [15: 115]

References:

[1] Pearsall, J. The New Oxford Dictionary of English [Z]. Shanghai: Shanghai Foreign Language Education Press, 2001.
[2] Wilson, R. A. and Keil, F. C. The MIT Encyclopedia of the Cognitive Sciences [Z]. Shanghai: Shanghai Foreign Language Education Press, 2000.
[3] Stubbs, M. Text and Corpus Analysis [M]. Oxford: Blackwell Publishers, 1996.
[4] Firth, J. R. Papers in Linguistics 1934-1951 [C]. London: Oxford University Press, 1957.
[5] Halliday, M. A. K. Lexis as a linguistic level [A]. C. E. Bazell, J. C. Catford, M. A. K. Halliday & R. H. Robins. In Memory of J. R. Firth [C]. London: Longman, 1966. 148-162.
[6] Sinclair, J. Beginning the study of lexis [A]. C. E. Bazell, J. C. Catford, M. A. K. Halliday & R. H. Robins. In Memory of J. R. Firth [C]. London: Longman, 1966. 410-430.
[7] Sinclair, J. and Jones, S. English Lexical Collocations: a study in computational linguistics. Cahiers de Lexicologie [A]. Reprinted in Foley, J. A. 2000. J. M. Sinclair on Lexis & Lexicography [C]. Singapore: UniPress, 1974. 21-54.
[8] Halliday, M. A. K. Language as Social Semiotic [M]. London: Arnold, 1978.
[9] Teubert, W. Corpus Linguistics - A Partisan View [J]. TELRI Newsletter, 1999, (8/99): 4-19.
[10] Sinclair, J. Corpus, Concordance, Collocation [M]. Oxford: Oxford University Press, 1991.
[11] Kjellmer, G. A mint of phrases [A]. K. Aijmer & B. Altenberg. English Corpus Linguistics: Studies in Honour of Jan Svartvik [C]. London: Longman, 1991. 111-127.
[12] Teubert, W. Language and Corpus Linguistics [A]. M. A. K. Halliday, Wolfgang Teubert, Colin Yallop and Anna Čermáková. Lexicology and Corpus Linguistics [C]. London and New York: Continuum, 2004. 73-112.
[13] Halliday, M. A. K. Corpus studies and probabilistic grammar [A]. K. Aijmer & B. Altenberg. English Corpus Linguistics: Studies in Honour of Jan Svartvik [C]. London: Longman, 1991. 30-43.
[14] Tognini-Bonelli, E. Corpus Linguistics at Work [M]. Amsterdam and Philadelphia: John Benjamins, 2001.
[15] Sinclair, J. Progress and Prospects in Corpus Linguistics [J]. Modern Foreign Languages, 2004, (2): 112-128.
[16] Halliday, M. A. K. Lexical relations [A]. Kress, G. System and Function in Language [C]. Oxford: Oxford University Press, 1976.
[17] Newmeyer, F. J. Grammar is Grammar and Usage is Usage [J]. Language, 2003, 79(4): 682-707.
[18] Hunston, S. and Francis, G. Pattern Grammar: A Corpus-driven Approach to the Lexical Grammar of English [M]. Amsterdam and Philadelphia: John Benjamins, 2000.
[19] Francis, G. A Corpus-driven Approach to Grammar: Principles, Methods and Examples [A]. M. Baker, G. Francis & E. Tognini-Bonelli. Text and Technology: In Honour of John Sinclair [C]. Amsterdam: John Benjamins, 1993. 137-156.
[20] Saussure, F. de. Course in General Linguistics [M]. Beijing: Foreign Language Teaching and Research Press, 2001.
[21] Louw, B. Irony in the Text or Insincerity in the Writer? The Diagnostic Potential of Semantic Prosodies [A]. M. Baker, G. Francis & E. Tognini-Bonelli. Text and Technology: In Honour of John Sinclair [C]. Amsterdam: John Benjamins, 1993. 157-176.
[22] Sinclair, J. The Search for Units of Meaning [J]. Textus IX, 1996: 75-106.

Received: 2006-05-25
Author: WEI Nai-xing (1955- )