“Yes, your honor!”: A corpus-based study of technical vocabulary
in discipline-related movies and TV shows
Eniko Csomay*, Marija Petrovic
San Diego State University, San Diego, CA, USA
Received 22 January 2012; revised 9 May 2012; accepted 10 May 2012
Vocabulary is an essential element of every second/foreign language teaching and learning program. While the goal of language
teaching programs is to focus on explicit vocabulary teaching to promote learning, “materials which provide visual and aural input
such as movies may be conducive to incidental vocabulary learning.” (Webb and Rodgers, 2009, p. 412) The present study uses
corpus-based techniques to investigate the extent to which watching discipline-related movies and TV shows in a second/foreign
language may facilitate incidental learning of technical vocabulary. A corpus of 130,000 words was compiled from movies/TV
shows with legal content to track the frequency and distribution of legal technical vocabulary items. The findings show
that the majority of technical terms are encountered more than ten times in the movies/shows, and that technical terms tend to
co-occur in particular segments of movies and episodes. The results suggest that the potential for incidental vocabulary learning
through movies and TV shows with legal content is high and that these media could also be used for teaching language patterns in
an English for Specific Purposes (ESP) classroom with a relevant focus. Hence, watching discipline-specific TV shows and movies is
beneficial for both incidental learning and explicit teaching.
© 2012 Elsevier Ltd. All rights reserved.
Keywords: Corpus; English for Specific Purposes; Incidental vocabulary learning; Law; Media; Technical vocabulary
1. Background
The underlying principle of any language program with a focus on English for Specific Purposes (ESP) is that all
aspects of teaching are tailored to meet the learners’ specific language learning needs in a given context. To translate
this principle to all aspects of curriculum design (e.g., selection of target language forms and functions, materials
design, lesson plans, etc.), ESP practitioners have been engaging in the analysis of the contextual and language
features of the genres language learners would be studying, living in or working with (Johns, 2001; Dudley-Evans and
St John, 1998). As a result, an important role has been given to the analysis of multiple genres apparent in various
contexts, including English for Academic Purposes (EAP) and English for Occupational Purposes (EOP). In the area
of EAP, research has shown that the linguistic characteristics of spoken and written registers in the academic contexts
* Corresponding author. Department of Linguistics and Asian/Middle Eastern Languages, 5500 Campanile Dr., San Diego, CA 92182-7727,
USA. Tel.: +1 619 549 4706 (cell).
E-mail address: ecsomay@mail.sdsu.edu (E. Csomay).
System 40 (2012) 305–315
vary (cf., Samraj 2002, 2008 for written; Csomay, 2002, 2007 for spoken) depending on the discipline and on the level
of instruction. In the area of EOP, language used in professional settings, much of the work has focused on English for
Business Purposes. Although English for Legal Purposes (ELP) is also an important sub-area of EOP, it is still
a “relatively uncultivated corner of the ESP field.” (Dudley-Evans and St John, 1998, p. 50) Yet, non-native speakers of
English who practice law in the United States as well as those non-native speakers who are entering programs in the
US to study law would consider ELP a beneficial area to study.
That vocabulary is a key element of any language learning context is evident. Yet, learning and knowing
vocabulary is perhaps even more important in an ELP context, as law is a disciplinary area where, as in the sciences, concepts
are learnt through a high number of nouns and discipline-specific vocabulary. We continue this section by, first, explaining
how we understand and define technical vocabulary in the disciplinary area of law (1.1). Second, we discuss aspects of
vocabulary learning and teaching in order to argue that English as a Foreign Language (EFL)
environments still have the potential to foster implicit vocabulary learning through the context of movies and TV shows (1.2).
Third, we introduce the basic principles of corpus-based methodology, the method we used to conduct our research (1.3).
1.1. Technical vocabulary in the discipline of law
Nation (2001) defines technical vocabulary as words “recognizably specific to a particular topic, field or discipline”
(p. 198). Technical vocabulary can also be defined based on the words’ occurrence patterns in specific contexts
versus more general contexts. In this school of thought, Nation (2001) classifies words into frequency bands (e.g.,
the first 1000 most frequently occurring words or the second 1000 most frequently occurring words), and treats as
technical vocabulary those words that are used both inside and outside a field but whose uses with a particular
meaning fall mostly within the field. Tiersma (1999), on the other hand, defines technical vocabulary
based on a word’s multiple senses, saying that a technical term is a word or phrase either used exclusively by
a particular profession or used in a way that differs from its prototypical meaning. Consequently, as he explains
further, the “legal profession focuses intensely on the words that constitute law” (p. 1).
In reality, the combination of these two approaches characterizes technical vocabulary use most accurately. While some
technical terms rarely appear outside a particular field, others are common outside the field but carry a different meaning
when used in a specific discipline or context (Nation, 2001), and some words are more common in particular disciplines
and contexts than elsewhere. As for legal language, a great deal of legal vocabulary looks like ordinary language but has
quite a distinct meaning in specific contexts. In the context of law, Tiersma (1999) calls this set “legal homonyms.” He
exemplifies the term with the word brief, which, “in the language of the law is a noun referring to a type of legal document,
not an adjective” and party, which “is someone who is part of lawsuit (often a single person or entity)” (p. 111).
In this study, we look at vocabulary distribution as well as word meaning in specific contexts, thus using both of
these approaches to determine technical vocabulary in our texts.
1.2. Explicit versus implicit vocabulary learning in context
As previous empirical research has shown (Nation, 2001; Schmitt, 2008), a good understanding of a text requires
98% coverage for reading (i.e., not knowing one in every 50 words in a text), and 95% for listening (i.e., not knowing
one in every 20 words in a text). Hence, learning vocabulary is an essential part of learning a second/foreign language
whether it happens through self-study, through explicit instruction in the classroom, or incidentally through the
environment. One of the central themes in the debate that has emerged from past studies is whether the focus should be
on explicit vocabulary teaching or implicit vocabulary learning (DeCarrico, 2001). During the past decades, much
research has provided insights and inspired teachers to improve the efficacy of explicit vocabulary teaching (Stahl,
1999), leading teachers to sophisticated techniques for teaching words explicitly in the classroom. However, it is still
less obvious how to promote incidental vocabulary learning, and as Schmitt (2008) points out, learning
programs need not only to include explicit vocabulary teaching but also to promote incidental learning.
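The link between the coverage thresholds cited above and the "one in every N words" glosses is simple arithmetic. The following minimal sketch makes it explicit; the function name is our own and is introduced for illustration only.

```python
def words_per_unknown(coverage_percent: float) -> float:
    """Return the number of running words that contain, on average, one unknown word."""
    unknown_percent = 100 - coverage_percent
    if unknown_percent <= 0:
        raise ValueError("coverage must be below 100%")
    return 100 / unknown_percent


# Thresholds cited in the text: 98% for reading, 95% for listening.
for skill, coverage in (("reading", 98), ("listening", 95)):
    n = words_per_unknown(coverage)
    print(f"{skill}: {coverage}% coverage = one unknown word in every {n:.0f} words")
```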
Incidental learning takes place when the learner is reading or listening to “normal language use” (Nation, 2001)
without actively and consciously engaging in the actual learning process by, for example, looking up words in the
dictionary or memorizing word meanings. Unlike in explicit teaching and learning, the actual process of
learning happens subconsciously and mainly from the context. Hence, in line with other immersion models of
learning, the more frequent the exposure to, or the encounters with, the words in context, the
greater the chances are for learning to take place.
1.2.1. General contextual characteristics
As mentioned above, incidental learning mainly happens from the context and in the environment in which the learning
takes place. The more frequent the encounters are, the greater the chances are of learning the word. Depending on how
informative the context is, a learner needs to be exposed to a word between six and twenty times before acquisition takes
place (Stahl, 1999). If the learner is in an environment where the target language is natively used, it is reasonable to assume
that there are more opportunities for incidental learning to take place from the context. That is, English Language Learners
in a native English-speaking environment (ESL), such as the United States, England, or Australia, have an “immediate and
natural need [for learners] to use English” (Celce-Murcia, 2001, p. 20). Put simply, they are frequently exposed to real-life,
non-classroom situations where they are able to subconsciously learn meanings emerging in everyday contexts.
In contrast, English Language Learners in a non-English speaking environment (EFL) are in an environment where the
target language is not commonly used outside the classroom, and “the main exposure to discourse in the target language
that learners will have is in the classroom itself, via the teacher” (O’Keeffe et al., 2007, p. 222) and/or their peers. Hence,
the exposure to “normal language use” is mostly limited to only a few hours per week in the classroom. This lack of
exposure is emphasized as a common problem especially for EFL language learners (DeCarrico, 2001), as the constrained
contextual factors limit opportunities for incidental vocabulary learning in general and through listening in particular.
1.2.2. Movies and TV shows as context
Most of the research investigating incidental learning has looked at the phenomenon through reading. Relatively little
research has been carried out on incidental learning through listening and, more specifically, on the extent to which learners
can benefit from watching movies and TV shows. One of the most recent studies, by Webb and Rodgers (2009), indicates
that “materials which provide visual and aural input such as movies may be conducive to incidental vocabulary learning”
(p. 412). In addition, according to Nation’s (2001) corpus-based study, technical words typically cover about 5% of
a general text, and about 10% of a specialized text. Watching movies with discipline-specific content provides an
opportunity for more encounters with unknown words (Webb, 2010) and is therefore conducive to
incidental learning. For example, a TV drama that follows the lives of medical staff in a hospital (e.g., House, Grey’s
Anatomy) will provide input comprising vocabulary pertaining to medical procedures and treatments. Hence, the
assumption is that watching discipline-related movies and TV shows can benefit the learning of technical vocabulary.
However, it is only recently that it has become possible to determine the vocabulary coverage of the texts students are exposed to
through these environments. With the intensive use of computational and corpus-based methodologies, we are now
able to describe texts in terms of their vocabulary profile, for example, their proportions of general versus technical
vocabulary. The final part of this section gives a brief introduction to corpus-based methodologies.
1.3. The corpus-based approach to vocabulary studies
The primary goal of corpus-based studies is to examine patterns of language use based on large collections of
naturally occurring texts. Typically, lexical studies look at frequencies of individual words or the co-occurring patterns
of two or more words, and analyze those patterns for their grammatical and functional characteristics in texts. The
frequency patterns are identified through sophisticated computational methods, which enable investigators to identify
complex patterns in large amounts of language data and yield consistent results. Then, the patterns extracted
this way are associated with extra-textual variables, such as discipline or register, in order to systematically describe
differences in use (Biber et al., 1998; Biber et al., 2004).
The influence of corpora and corpus-based research on educational theories and practices has been most apparent in
the areas of vocabulary analysis (Gardner, 2007). The corpus-based approach has been explained as a means of enabling
researchers to better identify and classify vocabulary items by using high-powered computers, robust software, and
large electronic collections of texts sampled from actual language use (Gardner, 2007; Adolphs and Schmitt, 2003).
Correspondingly, in the field of ESP, the corpus-based approach has been used in determining specific vocabulary that
learners will need within a particular discipline. The approach has been especially useful in investigating whether general
vocabulary lists are adequate for distinct disciplines. Consequently, various word lists have been devised as a means of facilitating language
learning in ESP courses (Lam, 2001; Martínez et al., 2009; Mudraya, 2006; Wang, Liang and Gee, 2008).
1.4. Goal of the study
The goal of this study is to track the percent coverage of technical vocabulary in movies and TV shows pertaining to
one specific content or disciplinary area: law. In the subsequent sections, first, we outline the methodology we applied
to compile a corpus of relevant transcripts, and the analytical methods we used to extract technical vocabulary from
the corpus. Then, we report on the findings.
2. Research methodology
2.1. Corpus
The transcripts of seven movies and five TV episodes with relevant legal content comprised the corpus of texts for
further analysis. The movies had a total running time of 1031 mins. (17 h and 11 mins.) and an average running time
of 147 mins. (2 h and 27 mins.). The TV shows had a total running time of 238 mins. (3 h and 58 mins.) and an average
running time of 47.52 mins. (0.75 h).
The corpus consisted of 128,897 words. The sub-corpus of TV episodes (Law and Order) had a total of 32,300
words, an average of 6460 words for each of the five episodes. As for movies, a total of 96,597 words was collected
from seven movies, with an average of 13,799 running words each. The total number of running words for each movie
and for the five episodes is presented in Table 1 below.
The movies were selected from a list of twenty-five legal movies that was published on the American Bar
Association Journal website (Honorable mentions, 2008). The five episodes comprise a convenience sample and were
randomly selected from the available TV scripts. Metadata in the texts, such as scene descriptions, names of the
characters, and story line, were deleted from the scripts; hence, only the language of dialogues and of monologic
opening and closing statements in court scenes was included in the analysis.
2.2. Definitions and word lists
To create a final word list of technical vocabulary, we took two steps in arriving at our definition of technical
vocabulary, following the principles of the two approaches outlined in 1.1 above. In the first step, we listed specialized
words in the corpus based on their vocabulary distribution using Nation’s Range program. In the second step, in addition
to the specialized vocabulary determined in step one, we determined which words in our word lists extracted from
the corpus would qualify as technical vocabulary (in our context, “legalese”) by checking a specialized dictionary.
More specifically, the goal of the first step was to create a list of specialized vocabulary through quantitative methods.
We used Nation’s Range program to obtain the set of less frequently occurring vocabulary in our corpus, and called it
specialized vocabulary. The program first checked every word in our corpus against already available
base lists of the most frequently occurring words, and selected those words that could not be found in any one of these lists. The
three base lists are: 1) A General Service List of English Words available from West’s corpus of texts; 2) The Academic
Word List available from Coxhead created based on a corpus of academic texts, with two sub-categories: a) the top most
frequent 2000 words of English and b) the words that are not among the first 2000 words of English but are frequent in
upper secondary school and university texts from a wide range of subjects (Nation, 2005). After checking
each word from our corpus against these lists, the program provided the frequency of each word. The fourth list
produced by Nation’s program contained the word types that could not be found in any of the three frequent-word lists
described above. Words in this fourth list were treated as specialized vocabulary in our study.
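The filtering logic of this first step can be sketched in a few lines of Python. This is only an illustration of the idea (keep the word types found in none of the base lists), not the Range program itself; the three tiny base lists below are invented stand-ins and do not reflect the actual membership of the real lists.

```python
import re
from collections import Counter


def specialized_types(text, base_lists):
    """Count the word types in `text` that appear in none of the base lists."""
    known = set().union(*base_lists)
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    return Counter(token for token in tokens if token not in known)


# Hypothetical miniature base lists (assumptions for illustration only).
base_list_1 = {"your", "i", "move", "to", "the"}
base_list_2 = {"honor", "entire"}
base_list_3 = {"panel"}

line = "Your Honor, I move to dismiss the entire panel."
result = specialized_types(line, [base_list_1, base_list_2, base_list_3])
print(result)
```

With these toy lists, only dismiss falls outside all three lists and would therefore be passed on to the second, dictionary-based step.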
The goal of the second step was to determine the set of technical vocabulary used in our corpus using qualitative
methods in two stages. In stage one, all word lists were checked against a legal dictionary. More specifically, the
content words on Nation’s (2005) three base lists were checked for words that could potentially be used both inside
and outside the field of law (Nation, 2001; Tiersma, 1999). Since word meaning is dependent on the context in which
the word appears, the text files from our corpus were used to examine the contexts in which content words from the three lists
occurred. If a word occurred in a legal context, its meaning was checked against a legal dictionary. After the words
were identified, word families were formed.
Note: The number of texts included in the corpus was determined based on the potential viewing time and opportunity within a single semester
(Webb, 2010). Over an average 12-week semester, seven movies and five episodes mean that a movie or an episode can be watched at
least once a week.
In stage two, from the specialized vocabulary list created in the first step outlined above, those words were
identified and selected that reflected technical vocabulary pertinent to the legal context (“legalese”). Then, word
families were created and each word family was checked against a legal dictionary. Word families were formed with
inflected forms and closely related derived forms of a head word. This was an important step in relation to
incidental vocabulary learning because, as cited in Stahl (1999, p. 8), “someone knowing one of the words (in the family)
could guess or infer the meaning of others when encountering it in context.” A sample of headwords and their corresponding
derivations is shown in Table 2 below (see the Appendix for the full list of headwords).
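Counting encounters by word family rather than by individual word type can be sketched as follows. The sketch assumes a simple mapping from headword to family members; the acquit and adjourn families follow Table 2, the dismiss family and its counts follow the figures reported for The Rainmaker in this study, and the function and variable names are our own.

```python
from collections import Counter

# Headword -> members of its family (the headword itself included).
FAMILIES = {
    "acquit": {"acquit", "acquittal", "acquitted", "acquitting"},
    "adjourn": {"adjourn", "adjourned", "adjournment"},
    "dismiss": {"dismiss", "dismissal", "dismissing"},
}


def family_encounters(type_counts, families):
    """Sum the counts of individual word types under their headword."""
    form_to_head = {form: head for head, forms in families.items() for form in forms}
    totals = Counter()
    for form, count in type_counts.items():
        head = form_to_head.get(form)
        if head is not None:  # forms outside every family are ignored
            totals[head] += count
    return totals


# Counts reported for the dismiss family in The Rainmaker.
rainmaker_counts = {"dismiss": 4, "dismissal": 1, "dismissing": 1}
totals = family_encounters(rainmaker_counts, FAMILIES)
print(totals["dismiss"])
```

The three word types together give six potential encounters with the family headed by dismiss, which is the figure the study works with.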
Since some words, despite having the same root, could not be placed within the same family due to a difference in
meaning, they were checked against Black’s Law Dictionary (1990). The two sample definitions of technical terms below, taken from the dictionary
(Black, 1990), illustrate this point:
Jury. A body of persons temporarily selected from the citizens of a particular district, and invested with power
to present or indict a person for a public offense.
Jurisprudence. The philosophy of law, or the science which treats of the principles of positive law and legal relations.
All in all, the specialized vocabulary identified with Nation’s Range program was checked for its meaning
against a discipline-specific dictionary, Black’s Law Dictionary (1990). The final word list consisted of 651 word
types in 218 word families (see the list of word families in the Appendix). The two terms, specialized and technical
vocabulary, are used interchangeably in our study.
2.3. Analysis
To identify which words to include in the analysis, each word in our corpus was checked against the three word lists
in Nation’s program described above, and those words that did not appear on any of the lists in that program comprised
our specialized word list (Step 1 above). This specialized word list was then checked against Black’s Law Dictionary
(1990) and a set of ‘legalese’ word types and their families was created (Step 2 above). The total number of
occurrences of these word types and families was then calculated for each of our sub-corpora (TV shows and movies).
Finally, the results were analyzed to determine the percentage of running words covered by technical vocabulary.
Table 1
Number of running words in movies and TV shows.
Genre       Title                             Number of words
TV series   Law and order (average length)    6460
            Subtotal of 5 episodes            32,300
Movies      The rainmaker                     10,815
            Runaway jury                      16,053
            A few good men                    18,723
            The pelican brief                 9438
            The devil’s advocate              13,202
            Sleepers                          16,580
            A time to kill                    11,786
            Subtotal of 7 movies              96,597
Total                                         128,897
Table 2
Headwords and their derivations.
Headword    Derived words
Abuse       abused, abuser, abuses, abusive
Accuse      accused, accusing, accuser, accusations
Acquit      acquittal, acquitted, acquitting
Adjourn     adjourned, adjournment
For subsequent analyses, following the classification of encounters in Webb (2010), the technical terms were
grouped into frequency bands according to the number of encounters in each sub-corpus. As mentioned above, a
learner needs to be exposed to a word between six and twenty times before acquisition takes place. Even though
five encounters may not lead to full acquisition, they may lead to partial knowledge of the meaning. Thus, if a word
family was encountered fewer than five times, it was not included in further analysis, as the potential for
incidental vocabulary learning is lower. In contrast, if the word family occurred five times or more, it was
included in the subsequent analyses. For the purposes of this study, three bands were established based on the number
of encounters with each family. The first frequency band contains words occurring five to seven times in the TV
episodes or in the movies. The second frequency band contains words encountered 8 or 9 times, and the third
frequency band includes words whose word family was encountered 10 or more times in either one of the sub-corpora.
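The banding rule just described can be written as a small function. This is a minimal sketch of the thresholds stated above; the function name and the use of None for excluded families are our own choices.

```python
def frequency_band(encounters: int):
    """Map a number of encounters to Band 1 (5-7), Band 2 (8-9) or Band 3 (10+).

    Families encountered fewer than five times are excluded from the analysis,
    which is signalled here by returning None.
    """
    if encounters < 5:
        return None
    if encounters <= 7:
        return 1
    if encounters <= 9:
        return 2
    return 3


for n in (4, 6, 8, 12):
    print(n, frequency_band(n))
```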
Finally, to extract and analyze the textual contexts in which each technical vocabulary item occurs,
a freely available concordancing program, AntConc, was used. This program made it possible to view each word in its larger context.
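For readers unfamiliar with concordancing, the keyword-in-context (KWIC) display that such a program produces can be approximated with a short script. The sketch below is our own simplified illustration, not AntConc's implementation; the function name, the character-based context window, and the sample line (taken from Extract 4) are choices made for illustration.

```python
import re


def kwic(text, forms, width=30):
    """Return keyword-in-context lines for every token whose form is in `forms`."""
    targets = {form.lower() for form in forms}
    lines = []
    for match in re.finditer(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text):
        if match.group().lower() in targets:
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so that the keywords line up.
            lines.append(f"{left:>{width}}{match.group().upper()}{right}")
    return lines


sample = ("I'm inclined to grant the motion to dismiss. "
          "You can re-file it in federal court, take it somewhere else.")
concordance = kwic(sample, {"dismiss", "court"})
for line in concordance:
    print(line)
```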
3. Findings
3.1. Coverage by specialized/technical vocabulary
Of the 128,897 running words in the entire set, 7980 were found to belong to the specialized vocabulary list; thus,
technical vocabulary accounts for 6.2% of the language spoken in the movies and TV episodes. More specifically, of
the 32,300 running words in the TV episodes, 1124 are technical vocabulary, covering 3.5% of the total vocabulary, and
of the 96,597 running words in the movie transcripts, 6918 are specialized vocabulary, covering 7.2% of the total text.
The number of running words, the number of technical words, and the percentages for the TV episodes, the movies, and the entire set are
shown in Table 3 below.
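The coverage percentages follow directly from the reported counts, as the following minimal sketch shows. The counts are those given in the text; the function name and the dictionary layout are our own.

```python
def coverage(technical_tokens: int, running_words: int) -> float:
    """Percentage of running words accounted for by technical vocabulary."""
    return 100 * technical_tokens / running_words


# (technical tokens, running words) as reported in the text.
reported = {
    "TV episodes": (1124, 32_300),
    "Movies": (6918, 96_597),
    "Entire set": (7980, 128_897),
}

for name, (technical, total) in reported.items():
    print(f"{name}: {coverage(technical, total):.1f}%")
```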
The findings show that on average, technical vocabulary covers more than 5% of the language in our discipline-related
corpus of movies and TV shows. As Table 3 also shows, movies tend to be longer than TV episodes, resulting
in a higher running word count as well as a higher number of technical vocabulary occurrences. This is
consistent with Webb’s (2010) finding that there is a greater chance of encountering particular words when
watching more movies. More specifically, while the total word count in movies (96,597) is nearly 75% of the total
number of running words in the corpus, the total number of specialized words in movies (6918) comprises 86% of the
total number of specialized vocabulary items in the corpus. This finding indicates that there are potentially more chances to
encounter technical vocabulary in movies than in TV episodes. However, it is not the case that the number of technical
vocabulary items grows exponentially with the number of words in a text. This is evidenced by the fact that in the movies,
7.2% of the total running words are specialized vocabulary and in the episodes, specialized vocabulary accounts for 3.5%
of the total running words. While the total number of running words in the movies is three times as high as in the
episodes, the proportion of technical vocabulary in the running words is only double. In other words, just
because movies tend to be at least twice as long as TV episodes, it does not mean that we would encounter twice as
many specialized or technical vocabulary items.
Table 3
Coverage by specialized/technical vocabulary.
                                          Law and order     Movies            Total
Total running word count                  32,300 (25%)      96,597 (75%)      128,897 (100%)
Average running word count                6460              13,800            10,741
Total specialized vocabulary              1124 (13%)        6918 (87%)        7980 (100%)
Average specialized vocabulary            224.8             988.3             665.0
Average specialized vocabulary coverage   4%                7%                6%
To illustrate, in the script for the movie The Rainmaker, the word dismiss was encountered four times, the word dismissal was
encountered once, and the word dismissing was encountered once. Thus, as a word family, dismiss was counted with a potential of six
encounters.
The number of encounters with technical vocabulary in each sub-corpus is shown in Table 4 below. As mentioned
before, in total, 174 word types of technical vocabulary occurred in the five episodes of Law and Order (on average, 25.6
technical vocabulary items per episode), 477 in the set of movies (on average, 68.1 technical vocabulary items per movie), and 651
in the entire set (on average, 54.3 per viewing). In terms of word families, on average, each TV episode has 12.2 technical
word families occurring, and movies have 22.4 per movie.
As Table 4 also shows, many of the technical terms appear ten or more times in both of the sub-corpora. The least
populated frequency band is Band 2. This means that if a technical term occurs, it tends to occur either five to
seven times or ten times or more. This trend is supported by the data shown in Table 5, where the percentages
for each category of both word types and word families are shown.
Accordingly, of the total word types, 14.1% occur five to seven times, 7.5% occur eight to
nine times, and 78.4% appear ten times or more in the corpus. A similar proportion is shown in the percentages of word
families and their frequency.
3.2. The case of “The Rainmaker”
To analyze the coverage and the potential for incidental learning in a single movie, and
to better understand the effect of technical vocabulary, we randomly selected one movie from the corpus, The
Rainmaker. We analyzed its vocabulary make-up and found 291 specialized vocabulary items occurring in the text,
which cover 2.7% of the total of 10,815 running words. Table 6 below summarizes the distributional patterns of the
word types falling into the three frequency bands.
As shown in Table 6, in this movie, a total of 62 word types comprising 25 word families were repeated five or more times.
Of the 62 word types, 23 were repeated 5 to 7 times, 5 were repeated 8 or 9 times, and 34
were repeated 10 or more times. In line with the general tendency (see above), 37.1% of the total types occurred
five to seven times, and 54.8% occurred at least ten times. These data support our earlier finding that if a technical
vocabulary item occurs, it is typically repeated at least ten times.
Table 4
Encounters with technical vocabulary.
Text/encounters             Band 1 (5–7)      Band 2 (8–9)      Band 3 (10+)      Total
                            Types   Family    Types   Family    Types   Family    Types   Family
Law & order (5 episodes)
  Total                     46      19        30      10        98      32        174     61
  Average                   9.2     3.8       6       2         19.6    6.4       25.6    12.2
7 movies
  Total                     46      20        19      8         412     129       477     157
  Average                   6.6     2.9       2.7     1.14      58.9    18.4      68.1    22.4
12 pieces
  Total                     92      39        49      18        510     161       651     218
  Average                   7.7     3.3       4.1     1.5       42.5    13.4      54.3    18.2
Table 5
Percentage of encounters.
Text/encounters             Band 1 (5–7)        Band 2 (8–9)        Band 3 (10+)        Total
                            Types %  Family %   Types %  Family %   Types %  Family %   Types %  Family %
Law and order               26.4     31.1       17.2     16.4       56.3     52.5       26.7     28.0
Movies                      9.6      12.7       4.0      5.1        86.4     82.2       73.3     72.0
Total/entire set            14.1     17.9       7.5      8.3        78.4     73.8       100      100
Table 6
The rainmaker.
Text/encounter              Band 1 (5–7)      Band 2 (8–9)      Band 3 (10+)      Total
                            Types   Family    Types   Family    Types   Family    Types   Family
The rainmaker               23      11        5       3         34      11        62      25
Percent of total            37.1%   44%       8.1%    12%       54.8%   44%       100%    100%
The following text extracts are taken from the movie The Rainmaker and illustrate the words from the different
frequency bands in context. The word family with the head word argue, comprising the words argue, argued, argument,
illustrates the band of 5–7 encounters (Extract 1). The 8–9 frequency band is illustrated by the word family with the head
word depose, comprising the words depose, deposition, depositions (Extract 2). Finally, Extract 3 shows the word family with
the head word client, comprising the words client, clients, and illustrates the frequency band of 10 or more encounters.
Extract 1. Band 1 (5–7 encounters)
morning ? Nine o’clock ? We ’re gonna ARGUE Great Benefit ’s motion to dismiss.
continuance? - No, I ’m prepared to ARGUE the motion. - Are you a lawyer ? I
man has passed the bar , - - let him ARGUE the case. We welcome him to big-time
5-85 , Southwest 2nd , page 431 . ARGUED by J. Lyman Stone. It shows that
might have you handle some of that ARGUMENT, Rudy. It would be awfully
Extract 2. Band 2 (8–9 encounters)
my manners ? I come from Memphis to DEPOSE four people, two of whom are not
’s your call , son ? I ’m going to DEPOSE Mr Lufkin, then I ’m going to go
I ’m out of town Thursday . The DEPOSITION is set for next Thursday afternoon
ready for war . It ’s my DEPOSITION, but it ’s their turf. Young
with Mr Underhall . It ’s my DEPOSITION, I ’ll call the witnesses in the
close . - Two days before her DEPOSITION? - I really don’t remember. I
Two days before she was to give a DEPOSITION in this matter. She was the pers
Benefit . I ’m going to take DEPOSITIONS from all the executives. We ’ll
Extract 3. Band 3 (10+ encounters)
I guess . You should fight for your CLIENT, refrain from stealing money, and
o become personally involved with a CLIENT. But there ’s all kinds of lawyers
now ? She no longer works for our CLIENT. We can’t produce her as a witness
What a coincidence. - My CLIENT’s going through downsizing. How
you. I’m sorry about the boy. My CLIENT wants to settle, Rudy. Let’s say
that the claim was covered. . My CLIENT should have paid out somewhere
just don’t tell anybody. My CLIENT’s son died of leukaemia because
it was more than that . Your CLIENT has a billion dollars, and your
has a billion dollars, and your CLIENT killed my son. I wanted to sue for
place for me to go but down . Every CLIENT I ever have will expect the same
You heard of them? - You got these CLIENTS signed? - I ’m on my way to see
of lawyers. and all kinds of CLIENTS, too. You okay , baby? Can I get
passed the bar, and these are my CLIENTS. Mr Stone filed this on my behalf
handled a lot of cases . I told my CLIENTS at Great Benefit - - that costs
We wouldn’t be here if your CLIENTS had done what they should’ve done
3.2.1. Vocabulary distribution
Another interesting pattern emerging from this data is that technical terms are not evenly distributed within
a movie. A closer analysis of the contexts in which technical terms or specialized vocabulary occur makes it
clear that technical terms tend to cluster together in particular scenes. The following is an excerpt from the script of
the movie The Rainmaker, in which the technical terms lawsuit, motion, dismiss, re-file, and court co-occur within
a segment of 27 words. In Extract 4, this means that 5 out of 27 words in the segment are technical
vocabulary items. In terms of percentage, the technical terms account for 18.5% of the segment. This can be further
illustrated by the example in Extract 5, where 7 out of 21 words (33.3%) are technical terms.
Extract 4.
A: I’m really tired of this type of lawsuits. I’m inclined to grant the motion to dismiss. You can re-file it in federal
court, take it somewhere else.
Extract 5.
A: Bailiff, remove Mr Porter. Mr Porter, you are excused from the jury.
B: Your Honor, I move to dismiss the entire panel.
This finding brings a new issue to light. If technical terms are grouped with such density, how likely is it that they
can be learnt from the context? Further research is needed to analyze concordance plots for distribution within the texts
and the effects of density on the incidental learning of technical vocabulary.
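The density figure reported for Extract 4 can be reproduced with a short script. The following is a minimal sketch only: the set of technical word forms is a hypothetical stand-in for the word families in Appendix A, and the tokenization rule (contractions and hyphenated forms count as one word) is an assumption chosen so that the word count matches the 27 words reported above.

```python
import re

# Hypothetical list of technical word forms, for illustration only;
# the study itself works with word families (see Appendix A).
TECHNICAL_FORMS = {"lawsuit", "lawsuits", "motion", "dismiss", "re-file", "court"}

# Assumption: a word is a run of letters, optionally joined by apostrophes
# or hyphens, so "I'm" and "re-file" each count as one word.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")


def technical_density(segment, technical_forms):
    """Return (technical word count, total word count, percentage)."""
    words = [w.lower() for w in WORD_RE.findall(segment)]
    hits = sum(1 for w in words if w in technical_forms)
    percentage = round(100 * hits / len(words), 1) if words else 0.0
    return hits, len(words), percentage


extract_4 = (
    "I'm really tired of this type of lawsuits. I'm inclined to grant the "
    "motion to dismiss. You can re-file it in federal court, take it "
    "somewhere else."
)
print(technical_density(extract_4, TECHNICAL_FORMS))  # (5, 27, 18.5)
```

Matching on a fixed set of word forms is the simplest option; matching on headwords of word families would require a lemmatizer or a family list, which is not sketched here.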
Another way of looking at co-occurring patterns is to examine concordance lines for those words that frequently
appear together. Nation (2001, p. 17) defines a collocation as “groupings of words into phrases or clauses” processed
as a unit, not as a group of separate words. While collocations are typically two-word sequences that appear together
and that have a strong idiomatic character, lexical bundles are word combinations that are identified through frequency
measures. More specifically, they have to occur at least 10 times in a million words and in at least five
texts (Biber et al., 1999). From the point of view of language learning, spending time on collocations is worth
the effort since, on the one hand, collocations make up a large number of the expressions that are difficult to learn
because their parts are at times semantically unpredictable. On the other hand, by acquiring a wide variety of these
collocations, learners may sound more native-like, more fluent, and more proficient in the given second or foreign language.
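The two thresholds for lexical bundles mentioned above (at least 10 occurrences per million words, in at least five texts) can be expressed as a simple filter over n-gram counts. This is a rough sketch under stated assumptions: the function name, the whitespace tokenization, and the default bundle length of four words are illustrative choices, not part of the present study.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_per_million=10, min_texts=5):
    """Return n-grams that meet both the frequency and the range threshold.

    The result maps each bundle to (raw count, number of texts it occurs in).
    """
    counts = Counter()
    text_range = defaultdict(set)
    total_words = 0
    for text_id, text in enumerate(texts):
        words = text.lower().split()  # assumption: whitespace tokenization
        total_words += len(words)
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            counts[gram] += 1
            text_range[gram].add(text_id)
    bundles = {}
    for gram, count in counts.items():
        per_million = count * 1_000_000 / total_words  # normalized frequency
        if per_million >= min_per_million and len(text_range[gram]) >= min_texts:
            bundles[" ".join(gram)] = (count, len(text_range[gram]))
    return bundles


# Toy corpus: one sequence recurs in five texts, another appears only once.
toy_texts = ["i move to dismiss the case"] * 5 + ["we filed a motion yesterday"]
print(lexical_bundles(toy_texts, n=3))
```

In a small corpus the normalized frequency threshold is met by very few raw occurrences, so the range criterion (number of texts) does most of the filtering.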
Concordance lines allow us to discover collocations that occur in law movies and TV shows. The following are
concordance lines for the word dismiss, excerpted from the concordance file for the entire set.
Extract 6. Concordance for dismiss
it . So what ? You ’re just gonna DISMISS the case against Billy ? I don’t
time , I ’m gonna file a motion to DISMISS . You won’t got it . I will get it
Washington . Excuse me , I didn’t DISMISS you . I beg your pardon . I ’m not
argue Great Benefit ’s motion to DISMISS . I think we ’re ready . Deck and I
. - What ’s next ? - The motion to DISMISS is still pending . Oh , yes. That
jury . Your Honour, I move to DISMISS the entire panel . - Denied. - It
due process to just automatically DISMISS the visually impaired.. . - He ’s
The concordance reveals phrases such as dismiss the case, dismiss (panel, visually impaired), grant the motion to
dismiss, motion to dismiss is (still) pending. Some other collocations that can be found in the corpus are: under false
pretense, standard procedures, preliminary hearing, badgering of the witness, leading the witness, and many more. At
first sight, it seems that technical jargon typically appears in clusters, with two or more words frequently
co-occurring at a time. It certainly is possible that in legalese, collocations are less common than word clusters containing
three or more words at a time. Although the potential for incidental learning of collocations pertaining to technical
vocabulary through watching movies and TV shows was not the focus of this study, and hence was not investigated here,
future research conducted in this area could certainly highlight those patterns.
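Concordance lines of the kind shown for dismiss can be generated with a few lines of code. The sketch below is an illustration only, not the concordancing software used in the study; the function name and the context width are assumptions, and it matches a single word form rather than a whole word family.

```python
import re


def concordance(text, keyword, width=35):
    """Return keyword-in-context lines with the node word in capitals."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the node words line up.
        lines.append(f"{left:>{width}}{match.group().upper()}{right}")
    return lines


sample = (
    "Your Honour, I move to dismiss the entire panel. "
    "The motion to dismiss is still pending."
)
for line in concordance(sample, "dismiss", width=20):
    print(line)
```

Aligning the node word in a fixed column is what makes recurring neighbours such as motion to dismiss easy to spot when the lines are read vertically.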
4. Summary and further research
The goal of the study was to investigate the extent to which watching discipline-specific movies and/or TV series in
a second or foreign language may facilitate incidental learning of technical vocabulary. The findings showed that the
majority of technical terms are encountered more than ten times; hence, there is a potential for incidental learning of
technical vocabulary through watching discipline-related movies and TV shows. Even watching a single movie provides
an opportunity for acquiring technical terms; nevertheless, there is a greater potential if movies are watched regularly.
The results reported here provide evidence that there is a potential for using discipline-related movies and TV
shows in promoting vocabulary learning. Learners’ vocabulary can increase when they are taught or
deliberately learn new words, either by encountering them in context or by gaining control of prefixes, suffixes
and other word building devices (Nation, 2001). Simply watching discipline-specific movies and TV shows once
a week provides ample opportunities for incidental vocabulary acquisition. Although a single movie already provides an
opportunity for acquiring technical terms, the results also showed that the potential is greater if discipline-specific
movies and TV shows are watched regularly, at least once a week. As for direct, explicit vocabulary teaching
practices, providing students with concordance lines from the shows is a great way to raise their awareness of
language patterns. Most certainly, the combination of explicit vocabulary teaching and incidental vocabulary learning
leads to the best results.
The results also have theoretical implications. One implication relates to vocabulary dispersion in texts. As shown
in Extracts 1 and 2, technical terms co-occur in particular segments of a movie. This finding indicates that it is not
enough to rely solely on frequency measures when we describe patterns of vocabulary use. We also need to look at
measures of lexical dispersion to capture the uneven distribution of vocabulary patterns, and describe how they relate
to specific discourse functions (Csomay, 2010; Csomay and Cortes, 2010). Another implication relates to
co-occurrence patterns of a word in its immediate context. The concordance lines clearly showed how two or more
words come together in sequence. Further research could use quantitative measures to calculate the likelihood that
particular word sequences occur together in discipline-specific media as opposed to general language use. In
sum, these two types of co-occurrence patterns point to two phenomena: how words co-occur with other related words
in particular phrases, and how they also cluster at particular points in the text as a whole.
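One simple way to look beyond overall frequency is to compute the share of technical words in consecutive segments of a script. The sketch below illustrates the idea only: the fixed segment size, the function name, and the small set of technical word forms are all assumptions, and it is not one of the dispersion measures used in the studies cited above.

```python
import re

# Assumption: a word is a run of letters, optionally joined by apostrophes
# or hyphens.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")


def segment_densities(text, technical_forms, segment_size=50):
    """Percentage of technical words in each consecutive fixed-size segment."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    densities = []
    for start in range(0, len(words), segment_size):
        segment = words[start:start + segment_size]
        hits = sum(1 for w in segment if w in technical_forms)
        densities.append(round(100 * hits / len(segment), 1))
    return densities


# Hypothetical technical forms and a toy text, for illustration only.
forms = {"motion", "court", "jury", "verdict"}
toy_text = "motion court jury verdict we went home early"
print(segment_densities(toy_text, forms, segment_size=4))  # [100.0, 0.0]
```

In the toy text the overall density is 50%, yet all technical words fall in the first segment; this is the kind of uneven distribution that a single frequency count hides.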
One of the limitations of this study is the inability to analyze concordance plots for co-occurrence of technical
terms within a movie or TV episode. Further research is needed to analyze concordance plots for distribution and the
effects of density on the incidental learning of technical vocabulary. A study similar to the present one should be
carried out to analyze technical vocabulary in an authentic setting (e.g. a courtroom) for the purpose of assessing the
authenticity of the technical vocabulary use in law-related movies and TV shows. A further focus of research could be
the fact that the language of law is not characterized only by technical terms; it also has a specific style. What makes it
particular is the use of long sentences, binomial expressions, passives and nominalization, and the avoidance of first
and second person pronouns (Tiersma, 1999). If future research shows that the language used in law-related movies
and TV shows also bears syntactic resemblances to the language in authentic legal settings, ESP classrooms can make
greater use of this potential for language teaching.
Appendix A.
Headwords in word family of technical vocabulary
Abuse, Accuse, Acquit, Adjourn, Admit, Adopt, Affidavit, Alibi, Allege, Allegiance, Appeal, Appoint, Argue,
Arraign, Arrest, Assault, Attorney, Badger, Bail, Bailiff, Bar, Batter, Bench, Breach, Brief, Broker, Bruise, Brutal,
Chamber, Charge, Civil, Client, Commit, Complain, Comply, Conceal, Confidential, Conspiracy, Constitute,
Constitution, Contempt, Contract, Convene, Convince, Coronary, Counsel, Counselor, Counter, County, Court,
Credible, Crime, Custody, Defend, Deliberate, Delinquent, Deny, Depose, Deputy, Detain, Diagnose, Disclose,
Discover, Dismiss, Dispute, Divorce, Document, Drug, Embezzle, Establish, Evidence, Exam, Exculpatory, Excuse,
Execute, Executive, Exhibit, Expunge, False, Felon, File, Fingerprint, Firearm, Foreman, Forensic, Forfeit, Forge,
Foster, Fraud, Grant, Guilt, Hearing, Hearsay, Hereby, Homicide, Honor, Identity, Impartial, Implicate, Imply,
Incarcerate, Indicate, Indict, Innocent, Insane, Intent, Interrogate, Investigate, Jail, Judge, Jurisdiction, Jurisprudence,
Jury, Justice, Kidnap, Law, Legal, Litigate, Misdemeanor, Molest, Motion, Motive, Murder, Narcotics, Negotiate,
Nuptial, Object, Obstruct, Offend, Oppose, Order, Ordinance, Pedophile, Penalty, Perjury, Permit, Plaintiff, Plea,
Pledge, Precedent, Prejudicial, Preliminary, Preside, Prison, Procedure, Proceed, Process, Proof, Prosecute, Punish,
Punitive, Qualify, Rape, Recess, Recollect, Recuse, Release, Relevant, Remand, Repeat, Represent, Resume, Reveal,
Rule, Sentence, Session, Settle, Sidebar, Slaughter, Solicit, Solve, Speculate, State, Statute, Stipulate, Subpoena, Sue,
Suffocate, Suggest, Suicide, Summon, Supervise, Suppose, Supreme, Surveillance, Suspect, Sustain, Tamper, Testify,
Theft, Tort, Trial, Verdict, Verify, Victim, Vindicate, Violate, Void, Volition, Vouch, Waive, Warden, Warrant,
Whereabouts, Withdraw, Withstand, Witness, Wrong.
References
Adolphs, S., Schmitt, N., 2003. Lexical coverage of spoken discourse. Applied Linguistics 24, 425–438.
Biber, D., Johansson, S., Leech, G., Conrad, S., Finegan, E., 1999. The Longman Grammar of Spoken and Written English. Longman, London.
Biber, D., Conrad, S., Reppen, R., 1998. Corpus Linguistics. Cambridge University Press, Cambridge.
Biber, D., Conrad, S., Cortes, V., 2004. ‘If you look at…’: lexical bundles in university teaching and textbooks. Applied Linguistics 25, 371–405.
Black, H.C., 1990. Definitions of the Terms and Phrases of American and English Jurisprudence, Ancient and Modern. West Publishing Co, St. Paul, Minn.
Celce-Murcia, M., 2001. Teaching English as a Second or Foreign Language. Heinle & Heinle, Boston, MA.
DeCarrico, J.S., 2001. Vocabulary learning and teaching. In: Celce-Murcia, M. (Ed.), Teaching English as a Second or Foreign Language. Heinle & Heinle, pp. 285–299.
Csomay, E., 2002. Variation in academic lectures: Interactivity and level of instruction. In: Reppen, R., Fitzmaurice, S.M., Biber, D. (Eds.), Using Corpora to Explore Linguistic Variation. John Benjamins, Amsterdam/Philadelphia, pp. 203–226.
Csomay, E., 2007. A corpus-based look at linguistic variation in classroom interaction: Teacher talk versus student talk in American University classes. Journal of English for Academic Purposes 6, 336–355.
Csomay, E., 2010. Lexical dispersion and discourse structure: A corpus-based study. Paper presented at the American Association of Applied Linguists (AAAL) Conference. Atlanta, Georgia. pp. 6–9.
Csomay, E., Cortes, V., 2010. Lexical bundle distribution in university classroom talk. In: Gries, S.T., Wulff, S., Davies, M. (Eds.), Corpus linguistic applications: current studies, new directions. Rodopi, Amsterdam, pp. 153–168.
Dudley-Evans, T., St John, M.J., 1998. Developments in English for Specific Purposes. Cambridge University Press, Cambridge.
Gardner, D., 2007. Validating the construct of word in applied corpus-based vocabulary research: a critical survey. Applied Linguistics 28,
Johns, A., 2001. English for specific purposes: tailoring courses to student needs and to the outside world. In: Celce-Murcia, M. (Ed.), Teaching English as a Second or Foreign Language. Heinle & Heinle, pp. 285–299.
Honorable mentions, 2008. Retrieved May 19, 2011, from http://www.abajournal.com/magazine/article/honorable_mentions.
Lam, J., 2001. A Study of Semi-Technical Vocabulary in Computer Science Texts, with Special Reference to ESP Teaching and Lexicography. Language Centre, Hong Kong University of Science and Technology, Hong Kong.
Martínez, I., Beck, S., Panza, C., 2009. Academic vocabulary in agriculture research articles: a corpus-based study. English for Specific Purposes 2, 183–198.
Mudraya, O., 2006. Engineering English: a lexical frequency instructional model. English for Specific Purposes 25, 235–256.
Nation, P., 2001. Learning Vocabulary in Another Language. Cambridge University Press, Cambridge.
Nation, P., 2005. Range and frequency programs. Retrieved from http://www.victoria.ac.nz/lals/resources/range.aspx.
O’Keeffe, A., McCarthy, M., Carter, R., 2007. From Corpus to Classroom. Cambridge University Press, Cambridge.
Samraj, B., 2002. Introductions in research articles: variations across disciplines. English for Specific Purposes 21, 1–17.
Samraj, B., 2008. A discourse analysis of master’s theses across disciplines with a focus on introductions. Journal of English for Academic Purposes 7, 55–67.
Schmitt, N., 2008. Instructed second language vocabulary learning. Language Teaching Research 12, 329–363.
Stahl, S.A., 1999. Vocabulary Development. Brookline Books.
Tiersma, P.M., 1999. Legal Language. University of Chicago Press, Chicago.
Wang Liang, S., Gee, G., 2008. Establishment of a medical academic word list. English for Specific Purposes 27, 442–458.
Webb, S., 2010. A corpus driven study of the potential for vocabulary learning through watching movies. International Journal of Corpus Linguistics 15, 497–519.
Webb, S., Rodgers, H., 2009. The lexical coverage of movies. Applied Linguistics 30, 407–427.