
Cambridge Sketch Engine – Description of Sketch corpora

1. Cambridge International Corpus (CIC)

Subcorpus name Description


American TV and Radio From 1992-98 and 2004-2008
American academic Journals and academic non-fiction
American business magazines From 1995-99 and 200-2008
American magazines
American newspapers From 2001, 2004, 2007 and 2008
American spoken including CAMSNAEA and NAEC
American written including fiction, non-fiction, magazines, websites
BNC academic British National Corpus – academic written English
BNC fiction British National Corpus - fiction (1994)
BNC spoken Unscripted informal conversations from different age, region and social classes, and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins
BNC written British National Corpus – (1979-94) Fiction, Non Fiction & Magazines etc
Br spoken A selection of spoken British English including CANBECA, (lexicography version)
British academic including CUP academic books and journals
British regional newspapers Includes Northern Echo (1990 – 1993)
British spoken
British written including non-fiction, fiction, magazines, websites
CUP Business books CUP Business Books (2005-2008)
CUP ELT books CUP student books
CUP journals CUP journals
Economist newspapers The Economist (1998/2000 & 2004 & 2008)
Financial Times newspapers Financial Times (2008)
Guardian newspapers Guardian/Observer (2007, 2000, 1999, 1997)
Independent newspapers Independent/Independent On Sunday (2008, 2005, 2004, 1999)
Mail newspapers Daily/Sunday Mail (2002 – Feb 2005, 2000, 1996, 1991)
Mirror newspapers Mirror Group Newspapers (Aug 1998-June 2002 and July 2002-Feb 2005)
Scottish newspapers Glasgow/Sunday Herald (1999 – Feb 2005)
Telegraph newspapers Daily/Sunday Telegraph (2004, 2003 and 1995)
Times newspapers The Times/Sunday Times (2006)
Wolverhampton Business Corpus Written business English gathered from the web, including product descriptions, company press releases, annual financial reports, business journalism, academic research papers, political speeches and government reports.

© Cambridge University Press 2012 CONFIDENTIAL 1


Last updated: 11.01.12
2. Cambridge Academic Corpus (CAC)

Subcorpus name Description


American academic Academic journals and non-fiction
BAWE British Academic written English
BNC academic British National Corpus - academic (1979-94)
BNC Spoken Academic AC spoken citations only taken from BNC
British academic Academic journals and non-fiction
CAMWANAE American freshman essays
CANCODE Spoken Academic AC spoken citations only taken from CANCODE
CORNELL Spoken Academic AC spoken citations only taken from CORNELL
CUP books - Education HR books (cites 21-31)
CUP books - HR ED books (cites 32-52)
CUP journals Academic and professional journals
EAPLEC Lectures from invited speakers for the Cambridge Academic English book.
LIBEL Academic spoken interactions recorded in Limerick and Belfast Universities
LNGLEC Lancaster Linguistics lectures
NUCASE Academic spoken interactions recorded in Newcastle University
VOXPOPS Interviews recorded for Cambridge Academic English

3. Cambridge Spoken Corpus (CSC)

Subcorpus name Full title Description


Am Air Travel Info Air Travel Information System (ATIS0 complete) Air Traffic Control speech
Am Athelstan Corpus of Spoken Professional American English (1997) Spoken interactions in professional settings, academic discussions such as faculty council meetings and committee meetings, and transcripts of White House press conferences (question-and-answer sessions). 2 million words.
Am Aviation Boston Flight Control (1991) Air Traffic Control speech
Am Callhome Callhome American English Speech 120 unscripted telephone conversations between native speakers of English in North America and their family or close friends. (Calls originate in North America, but 75% are to people overseas.)
Am Camsnae Cambridge Corpus of Spoken North American English 500 thousand words of spoken American English.
Am Cornell Cambridge-Cornell Corpus of Spoken North American English (1997-2000) Spoken North American English, comprising 0.5 million words of informal, highly interactive, multiparty conversations between family / friends from North America.
Am Micase Michigan Corpus of Academic Spoken English MICASE is a collection of nearly 1.8 million words of transcribed academic speech from the University of Michigan in Ann Arbor, including lectures, classes and seminars in Humanities and Arts, Social Sciences, Biological and Health Sciences, and Physical Sciences.
Am Naec North American English Course (1997-2000)
Am Project More Project MORE: Charlotte 1998 American English - minority language groups who speak English as a second language – narratives about personal history and customs / culture
Am Santa Barbara The Santa Barbara Corpus of American Spoken English Spontaneous everyday speech from all over the United States, including face-to-face conversations, telephone calls, social gatherings, lectures and work conversations. Speakers cover a wide range of regions, occupations, ethnic and social backgrounds, and ages.
Am Switchboard The Switchboard Corpus Fully spontaneous telephone conversations from every major dialect of American English. Speakers range in age from 20 to 69 years.
Am Voicemail Voicemail Corpus (Part 1 and 2) American voicemail messages
Am analysis - American English TV and radio analysis
Am interview - American English TV and radio interviews
Am news - American English TV and radio news
BNC spoken British National Corpus – spoken English Unscripted informal conversations from different age, region and social classes, and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins
Br Canbec Cambridge and Nottingham Spoken Business English Corpus Conversations, phone calls, meetings etc from companies ranging from large global corporations to small partnerships. Recordings were made in a range of functions within each business
Br Cancode Cambridge and Nottingham Corpus of Discourse in English Spontaneous speech taken from across the British Isles, comprising a wide variety of social interactions, including casual conversation, people socialising together, people shopping, people finding out information and discussions. All interactions are coded according to the level of familiarity of the speakers’ relationship.

Br Lexicorpus British Lexicorpus 1979-1999 General chats among friends
Br spoken
Can Strathy Strathy Corpus of Canadian English 57 million words of written and spoken Canadian English, including ...

