A comparative study of introductory it
in research articles across eight disciplines
Matthew Peacock
City University of Hong Kong
This paper presents a corpus-based analysis of the form, function, and frequency
of introductory it plus that-clause and to-clause complementation. These structures
are said to be particularly important in academic English. We examined disciplinary
variation in 288 research articles across eight disciplines, four science and four
non-science: Biology, Chemistry, Physics, Environmental Science, Business, Language
and Linguistics, Law, and Public and Social Administration. We examined all 6,008
occurrences of it, recorded 110 different forms of the patterns, and investigated
their function. Results indicate that Biology, Chemistry, and Environmental Science
writers used the structures significantly less frequently than non-science writers,
while Law writers used them more often. Numerous other statistically significant
disciplinary differences were found. We conclude that the structures perform three
important functions: evaluating the likelihood or validity of propositions,
evaluating or commenting on the difficulty of procedures, and evaluating or
commenting on the necessity of procedures.
Keywords: corpus analysis, introductory it, evaluation, research articles,
disciplinary variation, research writing
1. Introduction
This article presents a corpus-based analysis of introductory it plus that-clause
and to-clause complementation, specifically the patterns Biber et al. (1999) term
it v-link ADJ that and it v-link ADJ to-inf. Examples of these two patterns are it is
possible that and it is important to respectively. Our corpus was 288 research
articles (RAs) from eight disciplines: Biology, Chemistry, Physics, Environmental
Science, Business, Language and Linguistics, Law, and Public and Social
Administration. Our aims were to find and list all the adjectives used in the structures,
determine the function and frequency of each form of the structures, and compare
the use of the structures across the eight disciplines.
International Journal of Corpus Linguistics 16:1 (2011), 72–100. doi 10.1075/ijcl.16.1.04pea

issn 1384–6655 / e-issn 1569–9811 © John Benjamins Publishing Company
Williams (2002: 45) calls the structures vital to discourse communities, adding
that each discourse community develops a code for communication. This code is
one of their defining characteristics, and its uniqueness lies in the use of patterns
or collocations rather than individual words. We chose the RA for this research
because of its significance for the spread of knowledge. RAs have been called the
key medium for legitimating findings and disciplines (Hyland 1996: 252), and the
preferred genre that academic discourse communities choose for communication
(Williams 1998: 153). The language of these communities distinguishes and
defines them, and corpus-based methods are effective tools for research in the area
(Williams 2002: 45).
Section 2 will examine previous research on the topic. Section 3 will explain
the aims of this research and also describe the corpus and how it was investigated.
Section 4 will present our results, and will be followed by a discussion in Section 5,
and conclusions in Section 6.
2. The form, frequency, and function of introductory it plus that-clause
and to-clause complementation
Few previous corpus studies have investigated our target patterns. Biber et al.
(1999) examined them in the Longman Spoken and Written English (LSWE) corpus
of news articles, academic prose (book extracts plus RAs), fiction, and
conversation. Groom (2005), in what he calls an exploratory study, investigated them in
four corpora: 3.1 million words from History RAs, 3.2 million words from History
book reviews, 4 million words from Literary Criticism RAs, and 1 million words
from Literary Criticism book reviews. Römer (2009: 141), in a study focussing on
how vocabulary and grammar are interrelated, also presents a case study examining
introductory it across two non-native speaker corpora (L1 German), namely
undergraduate essays (234,000 words) and undergraduate plus graduate term papers
and essays (200,000 words), and 90 RAs (from Linguistics, Philosophy, and Social
Sciences; 611,000 words). However, her count goes beyond the patterns it v-link
ADJ that and it v-link ADJ to-inf and includes others such as it is a beginning of one
and an end of another (ibid.: 150). Table 1 shows the empirical findings from these
three studies. As Groom (2005) provides raw numbers and not frequency per million
words for the nine individual forms, we calculated normalized frequencies to
assist comparisons; these numbers need to be treated with caution because of this
normalizing. We will first show results for it v-link ADJ that, followed by it v-link
ADJ to-inf:
Regarding frequency, it can be seen that the patterns are more common in
academic prose, especially RAs, than in other registers. The highest frequency for
Table 1. Summary of previous empirical findings

Structure Author Corpus Frequency Function
LSWE news articles; “Moderately 1. Express degree of
academic prose common” certainty
Biber et
it v-link ADJ that 2. Express affective states
al. 1999 LSWE fiction; con-
“Rare” 3. Evaluate situations /
versation events
Groom History RAs 149 p.m.w. Assess the likelihood or
it is ADJ that1
2005 LitCrit RAs 114 p.m.w. validity of claims
it is clear that
it is (un)likely All “relatively
that Biber et common”, Express degree of cer-
Whole LSWE
it is (im)possible al. 1999 more than 10 tainty
that p.m.w.
it is true that
History RAs 38 p.m.w.
it is clear that
LitCrit RAs 31 p.m.w.
it is possible that
it is true that
it is likely that
Groom History RAs 7 p.m.w. Assess the likelihood or
it is evident that
2005 LitCrit RAs 4 p.m.w. validity of claims
it is obvious that
it is well-known History RAs 4 p.m.w.
that LitCrit RAs 3 p.m.w.
it is certain that
it is impossible History RAs 4 p.m.w.
that LitCrit RAs 1 p.m.w.
Table 1. (continued)
Structure | Author | Corpus | Frequency | Function
it v-link ADJ to | Biber et al. 1999 | Whole LSWE | “More common” in academic prose |
it is (im)possible to; it is difficult to; it is hard to | Biber et al. 1999 | Whole LSWE | All “relatively common”, more than 10 p.m.w. | Report the attitude of the writer, e.g. likelihood, importance, necessity, towards the proposition in the to-clause
it is difficult to | Biber et al. 1999 | LSWE academic prose | 52 p.m.w. |
it is easy to | Biber et al. 1999 | LSWE academic prose | 26 p.m.w. |
it is possible to | Biber et al. 1999 | LSWE academic prose | 24 p.m.w. |
it is ADJ to | Groom 2005 | History book reviews | 102 p.m.w. | Assess whether something is relatively easy or difficult to do
it is ADJ to | Groom 2005 | LitCrit book reviews | 143 p.m.w. | Assess whether something is relatively easy or difficult to do
Note. p.m.w. = per million words; LitCrit = Literary Criticism; SocSci = Social Sciences
it v-link ADJ that is in Groom’s History RAs, followed by Literary Criticism RAs,
then LSWE news articles and academic prose, then LSWE fiction and conversation.
The highest frequency for it v-link ADJ to is in Literary Criticism book reviews,
followed by History book reviews and LSWE academic prose. Regarding
functions, we see that four different functions have been listed for it v-link ADJ
that. This is not surprising as the writers looked at different registers. But we can
say that the main function seems to be commenting on or evaluating the proposition
which follows the structure. A narrower explanation of function is that the
structure is used to buttress claims; this can be seen by looking in isolation at
most of the more common adjectives found: clear, likely, true, evident, obvious,
well-known, and certain. Turning to it v-link ADJ to, we see that it also functions
to express the writer’s attitude to the proposition following the structure. Looking
in isolation at Biber et al.’s (1999) four most common adjectives, (im)possible,
difficult, hard and easy, leads to the tentative conclusion that assessing whether
something is relatively easy or difficult to do (Groom 2005) is an important function.
It seems that there is some agreement on the function of these structures, and
on the fact that they are more common in academic writing than in other registers.
But we do not know much about their use across different disciplines or about the
actual adjectives employed (apart from the 12 or so listed) and their frequency.
The two studies do offer some further, though partial, information. Biber
et al. (1999: 673) list a further 72 less common adjectives for it v-link ADJ that, and
a further 38 less common adjectives for it v-link ADJ to-inf (ibid.: 720), though
without any information on frequency. Groom (2005) adds that the patterns
contribute to textual cohesion and reports that genre is the controlling variable, as the
two genres have different rhetorical purposes: RAs to make claims, book reviews
to evaluate the claims in books. He looked only at is, seems and would be and
lists few individual patterns. Neither of the two studies aimed to test the statistical
significance of results. However, both studies are valuable for their corpus-based
investigation of our target structures.
Other authors describe the function of introductory it in clauses such as it
has been shown and it is also suggested. Rodman (1991: 22) looked at eight
Science / Engineering RAs and four textbook chapters and says it provides author
comment. Charles (2000: 48) says it functions to evaluate situations and confer a
“positive aura” and “appropriate academic persona” on writers, and to make
evaluations appear more objective. Charles (2006), in a study of 16 MPhil theses,
comments that introductory it introduces propositions, evaluates them, and obscures
the writer’s identity. Kaltenböck (2003) says its function is to create expectancy
and help readers anticipate a comment. Finally, Hewings & Hewings (2001, 2002
and 2004) looked at research writers’ use of it in 28 Business RAs from three
journals and 15 MBA dissertations. They say it is used to disguise the personal and
subjective nature of evaluations and give an impression of objective, impersonal
knowledge, and to allow writers to appear neutral about propositions. They
maintain that all this makes the structure particularly important in academic English.2
The comments of Charles (2000, 2006) and of Hewings & Hewings (2001, 2002
and 2004) about introductory it seem particularly useful in describing its function,
and there is some agreement among these writers.
We will now turn to the rationale for the present research. It is claimed that
research writers use it / introductory it to give an impression of objective, impersonal
knowledge and appear neutral about claims, and that this makes the structure
particularly important in academic English. If this is correct, and as claims appear to
be such an important part of research writing, e.g. RAs, the proposals (of Groom
2005, Biber et al. 1999 and others) that our target patterns evaluate claims mean
they may be a particularly important part of research writing including RAs. This
makes them worth investigating further. Groom’s (2005) research is valuable: he
suggests that disciplines can be differentiated by their “preferred phraseologies”,
and that this notion is well worth examining on a larger scale. Taylor & Chen
(1991: 332) make a similar comment, saying that scientific discourse is heavily
restricted by disciplinary cultures, and that much more attention must be given to
discipline differences. However, Groom (2005) looked at only two (non-science)
disciplines and calls for more research across a wider range of disciplines, saying
the findings will be of great interest to researchers, English for Academic Purposes
practitioners, and applied linguists. Also, he looked only at the phrase when it
contained the linking verbs is, seems, and would be. We realized that there must
be other forms and a preliminary examination of our corpus found many, e.g. it
becomes / has become clear that, it may be (more) difficult to, it often remains
difficult to, this makes it necessary to, it might be useful to; however, the focus of this
study is on the adjectives in the phrase in question, not the verbs. Finally, Groom
(2005) lists few individual patterns and does not test the statistical significance of
his results.
If our target patterns are important, they must be acquired by aspiring research
writers, many of whom may be non-native speakers. Bhatia (2000: 147) says
a strong justification for genre research is that it informs the teaching of research
writing, especially to writers who wish to join academic discourse communities.
Ahmad (1997: 273) adds that writers may fail to get published when their work is
written in an incorrect rhetorical style. Several writers, e.g. Oakey (2002), Hunston
(2002), Hewings & Hewings (2002: 374, 380–381; 2004) and Wray (2002: 143–147,
201–210), say that using introductory it, it clauses and other patterns may be more
difficult for non-native speakers (NNS), though these assertions are not usually
based on corpus analysis.3 Paltridge (1993: 175) adds that non-native speakers
need help in joining the discourse community of international academic research.
Römer (2009) presents results on NNS use of introductory it, although beyond the
it v-link ADJ that and it v-link ADJ to-inf patterns. While the structure is not the
focus of her study, she does note that NNS often use it to “express strong emotions
and personal opinions” (ibid.: 156) using terms such as amazing and unbelievable,
which are not found in her corpus of RAs.
There has been little research into disciplinary variation regarding our target
patterns and we suggest the area is increasingly important due to the fast-growing
numbers of research writers around the world. We propose the area has not
received the attention it warrants and that further corpus-based research is needed,
to assess variation across a number of science and non-science disciplines. The
results should tell us much more about the nature of RAs, and help teachers of
research writing inform learners of appropriate patterns. Our research can also
throw light on the above-mentioned suggestions from previous literature (e.g.
Williams 2002, Charles 2000, Kaltenböck 2003, Groom 2005 and Taylor & Chen 1991).
3. Research method
3.1 Research aims
Our research investigated the form, function, and frequency of the introductory it
patterns it v-link adj that and it v-link adj to-inf in 288 RAs in eight disciplines:
Biology, Chemistry, Physics, Environmental Science, Business, Language and
Linguistics, Law, and Public and Social Administration. These eight disciplines were
selected because they represent a range of subjects and also contain large numbers
of research writers around the world. In 2006 there were 31,337 students enrolled
on advanced research degrees in these subjects in the UK and USA alone
(Organisation for Economic Co-Operation and Development, 2008 report).
We aimed to compare the use of the structures across the eight disciplines. The
specific aims were to do the following within our corpus:
1. Find and list all the forms of it v-link adj that and it v-link adj to-inf. That is,
locate and make a list of all adjectives used in the structures.
2. Investigate the function of all forms of the patterns. That is, examine how each
occurrence of each structure operated within its context — what it tried to
achieve.
3. Investigate the frequency of all forms of the patterns. That is, determine the
rate of occurrence of every form of both structures.
4. Investigate disciplinary variation in the patterns. That is, explain how the
function and frequency of each form and both patterns varied across the eight
disciplines, and between science and non-science writers.
3.2 The RA corpus
Our corpus consisted of 288 published RAs, 36 from each discipline. The total
corpus length was 1,613,871 words, so the average RA length was 5,604 words. Table 2
shows the lengths of the disciplinary corpora:
Table 2. Lengths of disciplinary corpora
Discipline                          Number of RAs   Word length
Biology                             36              211,338
Chemistry                           36              140,116
Environmental Science               36              177,749
Physics                             36              183,683
Total for the four sciences         144             712,886
Business                            36              215,165
Language and Linguistics            36              237,834
Law                                 36              226,606
Public and Social Administration    36              221,380
Total for the four non-sciences     144             900,985
Total for all disciplines           288             1,613,871
Six leading refereed journals were selected from each discipline, except Law,
from which four were selected as fewer Law journals contain empirical data-driven
RAs. We visited the relevant academic departments and asked two sources from each
to name six principal journals from their field. These 46 journals do represent some
variation within disciplines, though controlling for this variation was beyond the
scope of this study. For example, within Language and Linguistics there are
differences in the area of research covered by the journals System and Language
Sciences. Another example is Business and the different research represented by
International Business Review and Journal of Business Venturing; and this must also
be true for the other disciplines. Finally, we note that our informants told us that
Chemistry and Biology is a Biology journal, and Chemical Physics is a Physics
journal; neither is a Chemistry journal.
Six RAs published in the period 2000–2003 were randomly chosen from each
journal by giving each RA a number and drawing the numbers from a box. Only
empirical data-driven RAs with the Introduction, Method, Results, Discussion
format were chosen, because we agree with Hyland (1998: 97) that this is
an important genre; RAs consisting of essays or discussions, or RAs authored by
writers already chosen, were not used. We suggest that the disciplinary corpora are
sufficiently representative because of their size and because of the use of discipline
sources to choose journals.
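The random selection of six RAs per journal can be mimicked in software. The sketch below is only an illustration of drawing numbers without replacement; the pool of 40 eligible articles, the seed, and the function name draw_articles are hypothetical and do not describe the procedure actually used, which was a physical draw from a box.

```python
import random

def draw_articles(article_ids, k=6, seed=None):
    """Draw k distinct article numbers without replacement, like numbers from a box."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(article_ids), k)

# Hypothetical pool: 40 eligible RAs from one journal, numbered 1 to 40.
eligible = range(1, 41)
chosen = draw_articles(eligible, k=6, seed=2003)
print(sorted(chosen))
```

Recording the seed is one way to let a later reader reproduce the same selection.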
3.3 Investigating the corpus
Analysis was done in the following steps (steps 3, 4 and 5 will be further explained
below):
1. All occurrences of it in the corpus were found using WordSmith Tools 4.0
(Scott 2004). There were 6,008.
2. All occurrences of it were manually examined.
3. All forms of the target patterns were recorded.
4. The function of every occurrence was individually checked by reading the rel-
evant sentence and surrounding sentences.
5. The frequency of all forms was calculated.
Steps 3 and 5: (a) ‘Form’ refers to the occurrence of an adjective, either in it v-link
adj that or in it v-link adj to-inf, regardless of the verb employed. E.g. it is possible
that and it seems possible that count as one form, and it is possible to counts as a
second form. (b) The corpus was split into disciplinary corpora at times during
these steps in order to check discipline variations.
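The definition of a form in (a) can be illustrated with a short sketch. It assumes a simple regular expression as a first-pass candidate finder, which is not the procedure used in this study (every occurrence of it was examined manually). The verb and modifier lists, the names count_forms and per_million, and the sample sentences are all hypothetical. Matches are grouped by adjective and complement regardless of the verb, and a raw count is then normalized to a frequency per million words.

```python
import re
from collections import Counter

# Hypothetical, deliberately short lists; a real corpus needs many more entries.
LINK_VERBS = r"(?:is|seems|becomes|remains|may be|might be|would be)"
MODIFIERS = r"(?:also|quite|thus|very|rather|obviously)"
PATTERN = re.compile(
    rf"\bit\s+{LINK_VERBS}\s+(?:{MODIFIERS}\s+)?(\w+)\s+(that|to)\b",
    re.IGNORECASE,
)

def count_forms(text):
    """Count candidate forms keyed by (adjective, complement), ignoring the verb."""
    counts = Counter()
    for match in PATTERN.finditer(text):
        adjective = match.group(1).lower()
        complement = match.group(2).lower()
        counts[(adjective, complement)] += 1
    return counts

def per_million(count, corpus_words):
    """Normalize a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_words

sample = (
    "It is possible that the effect is small. It seems possible that it varies. "
    "It is possible to measure this. It is also clear that more data are needed."
)
counts = count_forms(sample)
print(counts)
```

Here it is possible that and it seems possible that are counted as one form, while it is possible to is a second form. The expression also matches non-adjectives (for example it is said that), so every candidate would still need manual checking.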
Step 4: (a) ‘Function’ refers to how a pattern operates or acts in its context. For
example, in the phrase It is likely that this pattern is reproduced throughout the
cerebrocerebellar system, the function of It is likely that is to evaluate the likelihood
of the proposition this pattern is reproduced throughout the cerebrocerebellar system.
Regarding the analysis of function and meaning in large corpora, Meyer (2001: 284)
says analysts need to be cautious when researching unfamiliar disciplines, as in the
present study. (b) Manual checking of the function of every occurrence is vital for two
reasons: First, to find forms that will not be found with a keyword search. For
example, a keyword search for it is clear that would not have found the following
forms from our corpus: it is also / quite / thus clear that. Other forms in our corpus
which would not have been found with a keyword search are it seems quite likely that,
it may be / does not seem / might seem necessary to, and it is obviously / rather /
very difficult to. Second, to check whether each instance matches the target function
or not. Many authors stress the importance of doing this: e.g. we can get frequency
from statistical analysis but context is vital in understanding function
(Tognini-Bonelli 2001: 272; 2004: 11–12); it is vital to examine each instance to
confirm its relevance (Read & Nation 2004: 30–34); a “microscopic study” must be
carried out before categorisation can be done (Williams 2002: 60); we have to look at
quantitative, lexical, and pragmatic features when identifying discontinuous phrases
in corpora (Oakey 2006); a purely statistical approach has drawbacks (Hunter & Smith
2006).
Two evaluators were involved in step 4: myself and a lecturer with a Master’s in
Applied Linguistics from another local institution. In order to be able to measure
inter-rater agreement, the second coder independently carried out step 4, that
is, evaluated the function of every occurrence of each form. I reassessed the
function of every occurrence six weeks after the initial analysis to measure intra-rater
agreement (alone, as the second coder was not available). Inter-rater and
intra-rater agreement were both 100%. It is necessary to have a high degree of inter-rater
reliability in research into collocation (Read & Nation 2004: 30–34).
Statistical significance was set at p < .05 and tested with the chi-squared test,
accessed through the log-likelihood calculator (Rayson et al. 2004).4
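The significance test can be sketched as follows. This is a minimal illustration that assumes the statistic is the two-corpus log-likelihood value, compared against the chi-squared critical value of 3.84 (one degree of freedom, p < .05). The function name and the raw occurrence counts are hypothetical; only the corpus sizes come from Table 2.

```python
import math

CRITICAL_VALUE_P05 = 3.84  # chi-squared critical value, 1 degree of freedom

def log_likelihood(count_a, words_a, count_b, words_b):
    """Two-corpus log-likelihood statistic for the frequency of one item."""
    total_count = count_a + count_b
    total_words = words_a + words_b
    statistic = 0.0
    for observed, words in ((count_a, words_a), (count_b, words_b)):
        # Expected count if the item were equally frequent in both corpora.
        expected = words * total_count / total_words
        if observed > 0:  # a zero count contributes nothing to the sum
            statistic += observed * math.log(observed / expected)
    return 2 * statistic

# Corpus sizes from Table 2; the occurrence counts are hypothetical.
science_words, non_science_words = 712_886, 900_985
statistic = log_likelihood(155, science_words, 279, non_science_words)
print(statistic > CRITICAL_VALUE_P05)
```

Working from raw counts and corpus sizes, rather than from normalized frequencies, keeps the test sensitive to the different lengths of the disciplinary corpora.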
4. Results
Results will be presented here, and discussed in the next section. We will give an
overview of our results for form, followed by the results for function and
frequency. “Significantly more/less frequently” refers to p < .05 tested with the chi-squared
test. Some representative examples of patterns from disciplinary corpora are given
below in Sections 4.2 and 5. Examples of all the patterns used significantly more
frequently by the sciences, by the non-sciences, or by any one discipline are given
in Appendix 2: these examples are numbered (1) to (24) and arranged with
reference to the relevant table (see Section 4.3 below for a description of the seven
tables) in which the frequency results are presented.
4.1 Overview — Form
110 different forms (i.e. structures controlled by an adjective) were found, with 78
of these appearing only once or twice in the entire corpus. There were 51 it v-link
adj that forms and 59 it v-link adj to-inf forms.
4.2 Function
Every occurrence of it v-link adj that functioned to evaluate propositions, that is,
to assess the likelihood/validity of propositions. To evaluate the likelihood/validity
of a proposition means to assess how likely it is to be true, and/or to assess
how sound or well-founded in fact it is. Careful checking and rechecking of every
single occurrence, by both raters, confirmed this finding. Corpus examples are It
is possible that these slightly older individuals may be involved in riskier lifestyles
(Law), It is clear that the problem is a thermoelastic one (Physics) and It is obvious
that the Scr-PTS in combination with the glycerol transport and catabolic system is
an ideal model system (Biology).
The pattern it v-link adj to-inf, with the single exception of it is interesting to,
had two different functions: evaluating/commenting on the difficulty of
procedures, and evaluating/commenting on the need for or the necessity of procedures.
To evaluate the difficulty of a procedure means to assess the relative ease or
difficulty of a procedure, and to evaluate the need for/necessity of a procedure means
to assess how necessary it is. Thorough checking and rechecking confirmed that
every occurrence fell into one or the other category. Examples of “difficulty” are it
is difficult to detect the true quality of a product (Business) and it is possible to
calculate the excess free energy of mixing (Chemistry). Examples of “necessity” are it is
important to determine whether clouds are present or not (Environmental Science)
and During the search for these optimum values, it is useful to change one parameter
of the gradiometer device while the others are kept constant (Physics).
It is interesting to had two functions. The first was to comment on the need for/
necessity of a further procedure, that is, to say that further research will be useful.
Examples are To probe the mechanism of this effect, it will be interesting to
examine the influence of CYC-B/Cdk1 on putative regulators of chromosome congression
such as CENP-E (Biology) and It would be interesting to design a study to attempt
to bridge qualitative analysis with appropriate quantitative data (Law). The second
function was to comment on a matter of interest, that is, to point out that something
is worthy of note. Examples are It is interesting to note the unusually low
cascading contribution in ice VIII (Physics) and Nevertheless, it is interesting to notice the
multi-dimensionality of our MP construct (Business).
4.3 Frequency
Tables 3 and 4 show results for it v-link adj that. We see in Table 3 that science
writers used the pattern significantly less frequently than non-science writers. The
difference between sciences and non-sciences is even greater if Physics is
excluded: the frequency in Biology, Chemistry and Environmental Science together was
183 per million words, and Physics writers used the pattern significantly more
frequently than writers in the other three sciences (the latter finding is not shown
in Table 3). Among individual disciplines, Language and Linguistics used it
significantly more frequently, and Biology and Environmental Science used it
significantly less. The first column in Table 4 shows the most common forms in ranked
order. The two most common were it is possible that and it is clear that. Columns 2
and 3 compare sciences and non-sciences: the non-sciences used it is possible that,
it is important that and it is worth noting/worthwhile/noteworthy that significantly
more frequently. Science writers used it is obvious that significantly more often.
We also compared the distribution of all forms across individual disciplines, and
across individual journals within each discipline. There were no significant
differences.
Table 3. it v-link adj that: disciplinary differences (frequencies per million words)
Discipline                 Frequency   Discipline                          Frequency
Biology                    170*        Business                            297
Chemistry                  228         Language and Linguistics            362*
Environmental Science      163*        Law                                 313
Physics                    310         Public and Social Administration    267
FOUR SCIENCES              217.75*     FOUR NON-SCIENCES                   309.75
ALL DISCIPLINES            263.75
Note. * = statistically significant difference
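The summary figures in Table 3 and the 183 per million words quoted for Biology, Chemistry and Environmental Science together can be checked with a short calculation. The sketch below rests on an inference from the numbers rather than on anything stated in the text: the FOUR SCIENCES row appears to be the unweighted mean of the four disciplinary frequencies, while the three-discipline figure appears to pool the sub-corpora by word count. The function names are hypothetical.

```python
# Word counts from Table 2 and frequencies per million words from Table 3.
WORDS = {"Biology": 211_338, "Chemistry": 140_116,
         "Environmental Science": 177_749, "Physics": 183_683}
FREQ_PMW = {"Biology": 170, "Chemistry": 228,
            "Environmental Science": 163, "Physics": 310}

def unweighted_mean(disciplines):
    """Simple average of the per-discipline frequencies."""
    return sum(FREQ_PMW[d] for d in disciplines) / len(disciplines)

def pooled_frequency(disciplines):
    """Frequency per million words when sub-corpora are pooled by word count."""
    weighted = sum(FREQ_PMW[d] * WORDS[d] for d in disciplines)
    return weighted / sum(WORDS[d] for d in disciplines)

all_sciences = list(WORDS)
three_sciences = ["Biology", "Chemistry", "Environmental Science"]
print(unweighted_mean(all_sciences))             # 217.75
print(round(pooled_frequency(three_sciences)))   # 183
```

Because the inputs are already rounded per-million figures, the pooled value is an approximation of what the raw counts would give.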
Table 4. it v-link adj that: the most common forms, ranked. Science/non-science
differences (frequencies per million words)
Item                                             Science   Non-science
it is possible that                              24        53*
it is clear that                                 36        40
it is likely that                                20        34
it is interesting that                           22        20
it is important that                             11        26*
it is reasonable that                            17        16
it is worth noting/worthwhile/noteworthy that    6         20*
it is unlikely that                              10        13
it is obvious that                               18*       4
it is evident that                               14        7
it is apparent that                              8         12
it is expected that                              13        7
it is well known that                            10        4
it is true that                                  3         9
Tables 5 and 6 show results for it v-link adj to-inf “difficulty”:
Table 5. it v-link adj to-inf “difficulty”: disciplinary differences (frequencies per million words)
Chemistry                93      Language and Linguistics    198
Environmental Science    124     Law                         203*
FOUR SCIENCES            119*    FOUR NON-SCIENCES           165
ALL DISCIPLINES          145
Table 6. it v-link adj to-inf “difficulty”: the most common forms, ranked. Science/non-science
differences (frequencies per million words)
Item                                 Science   Non-science
it is possible to                    53*       37
it is difficult to                   20        43*
it is not possible (impossible) to   17        19
it is hard to                        0         13*
it is easy to                        1         11*
Table 5 shows that science writers used this pattern significantly less frequently
than non-science writers. As with it v-link adj that, the difference between sciences
and non-sciences is even greater if Physics is excluded: the frequency in Biology,
Chemistry and Environmental Science together was 94 per million words, and
Physics writers used the pattern significantly more frequently than writers in the
other three sciences. Among individual disciplines, Law used it significantly more
frequently, and Biology significantly less. Table 6 shows the most common forms
in ranked order. The two most common were it is possible to and it is difficult to.
Science writers used it is possible to significantly more frequently. The non-sciences
used it is difficult to, it is hard to and it is easy to significantly more; in fact
science writers hardly ever used the latter two forms. There were no significant
differences by individual discipline or by individual journal regarding individual forms.
Tables 7 and 8 show results for it v-link adj to-inf “necessity”:
Table 7. it v-link adj to-inf “necessity”: disciplinary differences (frequencies per million words)
Chemistry                21*     Language and Linguistics    88
Environmental Science    90      Law                         278*
FOUR SCIENCES            69*     FOUR NON-SCIENCES           147
ALL DISCIPLINES          112
Table 8. it v-link adj to-inf “necessity”: the most common forms, ranked. Science/non-science
differences (frequencies per million words)
Item                 Science   Non-science
it is important to   17        44*
it is necessary to   21        20
it is useful to      8         27*
it is optimal to     0         20*
We see in Table 7 that science writers used this pattern significantly less frequently
than non-science writers. It looks as though Physics writers used it more than the
other three sciences, but there was no significant difference. Among individual
disciplines, Law used it significantly more frequently, in fact more than twice
as often as any other discipline. So the overall frequency of this pattern is largely
due to the influence of this one discipline. Biology and Chemistry, on the other
hand, used it significantly less often. Table 8 shows the most common forms in
ranked order. The two most common were it is important to and it is necessary to.
Non-science writers used it is important to and it is useful to significantly more
frequently. There were no significant differences by individual discipline or by
journal.
Table 9 shows results for it v-link interesting to-inf:
Table 9. it v-link interesting to-inf: science/non-science differences (frequencies per million words)
Item                                                     Science   Non-science
it is interesting to “recommended analysis/procedure”    24        16
it is interesting to “matter of interest”                13        13
We see in Table 9 that the first function, commenting on the need for further
procedures, is more common. It made up 61% of occurrences. There were no
significant differences between sciences and non-sciences, by individual discipline, or
by journal.
5. Discussion
Our results differ from Groom (2005): we found it v-link adj that to be more than
twice as common as he did (though he looked only at is, seems and would be, and
investigated different disciplines). Regarding function, he says the structure
assesses the “likelihood or validity” of claims, while Biber et al. (1999) say it
expresses degrees of certainty or affective states, or evaluates situations/events.
Our findings were similar to Groom (2005) in that we concluded it evaluates the
likelihood/validity of propositions, but differed slightly from Biber et al. (1999),
though this is not surprising as they looked at a different corpus, the whole LSWE.
Regarding it v-link adj to-inf, we found it much more common in RAs than Groom (2005)
did in book reviews. Regarding function, Biber et al. (1999) say it is associated with
likelihood, importance and necessity and we found (with the single exception of it
is interesting to) that it has two different functions: evaluating/commenting on
either the difficulty of procedures, or the necessity of procedures.
It is evident that overall the highest use of it v-link adj that and it v-link adj
to-inf “difficulty” was in Law and in Language and Linguistics and the lowest in
Biology, Chemistry and Environmental Science. For it v-link adj to-inf “necessity”,
the highest use was also in Law and the lowest in Biology and in Chemistry. We
suggest that among our more useful findings are that science writers use the struc-
tures significantly less frequently than non-science writers and that Law used two
patterns significantly more frequently. Regarding the total number of forms used,
we propose that three findings stand out. First, writers used a very wide range of
forms: we found 110 different forms. Second, no fewer than 71% (78) of them were
used only once or twice. Third, 93 different forms were used by non-science writers,
but only 52 by science writers: non-science writers employed a wider range of
linguistic resources.
Our findings lead us to suggest that our target patterns do function to intro-
duce and evaluate propositions, obscure the writer’s identity/give an impression of
neutrality and objective impersonal knowledge, confer a “positive aura” on them,
increase objectivity and help readers anticipate comments. Hewings & Hewings
(2001, 2002, 2004) said introductory it/it is particularly important in academic
English and we propose that, because claims are so important in research writing,
e.g. RAs, and because our target patterns support, evaluate or comment on claims,
they are an important part of research writing. This would also make them part
of the defining code of RAs, as Williams (2002) called them. And we did find that
scientific discourse is somewhat constrained by disciplinary cultures (Taylor &
Chen 1991).
We will try to explain the discipline differences that we found and, in particular,
the significant differences between sciences and non-sciences. Writers in three
of the four sciences used the patterns significantly less frequently than did
non-science writers: it v-link adj that in Biology and in Environmental Science, it
v-link adj to-inf “difficulty” in Biology and it v-link adj to-inf “necessity” in Biology
and in Chemistry (Tables 3, 5 and 7). Returning to the whole corpus to see where
in the RA these patterns typically appear, we found that all three patterns occur
most often in the discussion and conclusions, though there are a few exceptions.
One form, it is important that, appears more often in the literature review and in
the research methods section. Another, it is useful to, appears most frequently in
research methods. And while it is difficult to and it is important to appear most
often in the discussion and conclusions, they are also used (though less often) in
the literature review and research methods.
We studied the corpus in more detail to try to explain the discipline differences.
Careful examination of the science corpus revealed that Biology, Environmental
Science and Chemistry RAs read more like narrative or descriptive writing
(Physics RAs, on the other hand, had more resemblance to the non-science
disciplines). These science authors seem to just describe their materials (in detail),
their methods and their results — describing the steps they took and their find-
ings, one by one. The following typical extracts from these sciences illustrate this
— all examples are numbered, continuing the sequence from Appendix 2. We have
highlighted materials in boldface, methods in italics and underlined findings.
Biology — from a research methods section:

(25) Arc-5 proteins were isolated by SDS-PAGE and the gel was stained with
colloidal Coomassie blue […] The dried gel pieces were reswollen in 700
µl 5 mM Tris-HCl (pH 8.0) containing 2% (w/w) trypsin/protein or 2%
Staphylococcus aureus (V8) protease/protein, and left to stand at 37°C
for 12 h [18][…] The peptide masses were matched with the theoretical
arcelin-5 peptide masses using the program Peptidesort from GCG
Molecular Sequence Analysis Software Package.
Biology — from a discussion section:

(26) Gel supershift analyses and measurements of JunD protein and jun D
mRNA [17] demonstrate that while JunD is the major Jun AP-1 component
that binds the DNA in P+ cells, its participation in P- cells is modest due to a
constitutive shortage of junD mRNA and protein expression in P- cells.
Environmental Science — from a research methods section:

(27) A 3 m³ tethered balloon was flown at elevations up to 250 m above ground
level (AGL) at the central location. Air samples were collected on adsorbent
tubes filled with a combination of Carbosieve SIII and Carbotrap (Supelco
Inc., Bellefonte, PA) for 30 min using automated air samplers […] The
hydrocarbons collected with the cartridges were concentrated using cryo-
focusing techniques and analyzed on a gas chromatograph with flame
ionization detectors (GC/FID).
Environmental Science — from a discussion section:

(28) From the conducted experimental tests, it can be seen that of the two designs
tested system A outperforms system B in terms of the retention of stored
thermal energy with a vessel heat-loss coefficient of between 2.42 to 2.97 W/
m² K, respectively.
Chemistry — from a research methods section:

(29) The master sample was synthesized from a nanoprecursor obtained by liquid
mixing in a citrate melt. Annealed Gd2O3 (99.99%, Stanford Materials,
~15 g) was dry-mixed with 300 g of high-purity citric acid monohydrate
(Fluka, <0.02% sulfate ash) and dissolved upon melting assisted by some 10
mL of water added at the bottom of the beaker.
Chemistry — from a discussion section:

(30) The cordierite precursor powder densified completely before crystallizing
into cordierite at 1000°C. This shows the high potentiality of this type of
organic gel for the synthesis of silicates […] The process is very efficient,
provides ultrafine powders at relatively low temperatures, with a high level of
chemical homogeneity, especially when associated with the citrate process.
It is valuable for a very great number of compounds, including silicates and
aluminosilicates. It is well adapted to combinatorial chemistry.
It can be seen that the descriptions of materials and methods, in particular, are
dense and very detailed, and are presented in a narrative or descriptive style. Pre-
sumably Biology, Environmental Science and Chemistry authors aim to show the
order of events, or rather we can say that this order is sufficient for readers. Per-
haps readers do not need to be explicitly given certain arguments — evaluation of
propositions (how likely they are to be true, how sound or well-founded in fact
they are), comments on the relative ease or difficulty of procedures, or comments
on the need for/necessity of procedures. In other words, authors in these three
sciences tend to let readers evaluate propositions by themselves. We could also
say they seem not to develop certain arguments in the same way as other authors.
The sciences did use just two individual forms, it is obvious that and it is pos-
sible to, significantly more frequently than non-science writers (see Tables 4 and
6). On closer examination, we found that they often use it is obvious that to justify
their research methods or comment on their results and it is possible to to justify
their research methods, as the following typical examples show:
Biology:
(31) It is obvious that the Scr-PTS in combination with the glycerol transport and
catabolic system is an ideal model system to study in detail the so-called
‘glucose’-effects such as catabolite repression and inducer exclusion.
Chemistry:
(32) It is obvious that the potential of the arrest decreases linearly with increasing
of pH of the solution according to the relation: Ear = E0 − 0.059 pH.
Chemistry:
(33) The experimental results reported in this paper show that by polymerising
a mixture of amino acids in presence of a molecule able to act as a template,
it is possible to obtain a mixture of peptides that show selective molecular
recognition properties towards the template itself.
Examples (31) and (33) demonstrate the use of it is obvious that and it is possible
to, respectively, to justify research methods. Example (32) shows the use of it is
obvious that to comment on results.
Regarding the non-sciences, we examined discipline differences with whole
patterns. First, Language and Linguistics authors used it v-link adj that signifi-
cantly more frequently than other non-science writers did (Table 3). We can say it
is significantly more common, and presumably more necessary or important, for
them to evaluate propositions, i.e. assess their likelihood/validity. A more detailed
examination of the corpus showed that Language and Linguistics authors seem to
construct long complex chains of discussion and argument, as the following two
representative examples show. Example (34), from an RA introduction, constructs
an argument using however and thus (boldface added), between evaluated propo-
sitions:
(34) It is apparent from previous studies that the F0 band plays an important
role in transmission of social status and dominance information and that
elimination of the F0 leads to lessened perceived quality of conversation.
However, other nonverbal channels, such as the visual, transmit
accommodational social status and dominance information as well, and
claims for primacy by the vocal channel in serving this function have not
been sufficiently supported. Thus, in order to increase knowledge about
F0 function, this article reports on tests of vocal channel primacy in
transmission of social status and dominance accommodation information
[…] it is possible that a restricted channel can even enhance conversation
quality in some forms of interaction.
Example (35), from a discussion section, constructs an argument using thus and
hence, between evaluated propositions:
(35) It is perhaps not so surprising that the deficits are restricted in this way.
Consider first the finding that none of the subjects were impaired on the test
of NP syntax in Experiment 3. Previous research has shown that knowledge
of canonical English word order is extremely resistant to disruption by
brain damage [53]. Thus the good performance of the subjects on the
syntax test is not unusual and actually fits into a more general pattern. Now
consider the finding that none of the subjects were impaired on the test of
grammatically irrelevant perceptual and conceptual features of adjectives in
Experiment 2 […] It is likely that these features are neurally implemented in
widely distributed brain regions that are specialized for particular types of
information content [54]. Hence it would be rather strange for a single focal
lesion to impair a very large set of these features.
Second, Law authors used it v-link adj to-inf “difficulty” significantly more fre-
quently than other non-science writers did (Table 5). Apparently they find it more
necessary and important to evaluate/comment on the difficulty of procedures. Re-
examination of the Law corpus revealed that RAs have lists of (and long discussions
and explanations of) a lot of difficult procedures, as the following typical examples
show. Example (36), from a research methods section, exemplifies the use of the
pattern to support these functions:
(36) Data are limited, and cross-country measures of abatement costs are
unavailable. Time-series methods focusing on the United States are limited
by the difficulty of attributing capital flows to particular domestic policies
[…] Compliance deadlines are often phased in, and federal levels of
regulatory enforcement vary among administrations. Further, it is impossible
to aggregate over regulatory enforcement, which varies among states. State
governments carry the work on air, water, and solid waste controls (with the
exception of the Superfund), and they enforce product liability law in their
courts. Finally, if companies’ responses to regulatory costs are cumulative,
based on their assessment and projection of costs associated with a number
of regulatory programs, it would be difficult to identify which regulatory
event triggered any particular movement of capital abroad.
The next illustrative example is from a discussion section:

(37) Another methodological difficulty concerns the use of the beer tax as a
proxy for all alcohol taxes. In principle, it would be desirable to include
measures of wine and liquor taxes as well, but including all three beverage
taxes may lead to intractable multicollinearity problems: Because states tend
to change all three taxes together, it is very difficult to sort out the effect of
one tax from another, and the estimators have such large standard errors
that it is impossible to reject virtually any hypothesis. These difficulties
notwithstanding, focusing only on beer taxes can lead to mistaken policy
interpretations. In particular, the estimated coefficients will tend to overstate
the impact that raising just beer taxes would have on fatalities.
Third, Law authors used it v-link adj to-inf “necessity” significantly more frequent-
ly than other non-science writers did (Table 7) — actually more than twice as often
as any other discipline. Evidently it is very much more necessary and important
for Law authors to evaluate/comment on the need for/necessity of procedures.
Further examination of the corpus revealed that it is common for Law authors to
present long and complex justifications for their research and research procedures.
The following characteristic examples (38)–(40) exemplify this — the first is from
a discussion section:
(38) Victims who participate in restorative justice are at risk of trauma when
offenders are unresponsive to the program. It is therefore important to assess
the likelihood that offenders will be unresponsive, on account of social,
cognitive, and/or psychological traits.
Example (39) is from an introduction:

(39) Limited research exists which examines the nature of these characterizations
on television, and the depictions of racial minorities within these programs
have been largely ignored. For this reason, it is important to systematically
examine the characteristics of such media content, with particular attention
directed at the types of images and behaviors associated with minority
characters.
The final example is from a discussion section:

(40) In any case, these controls transfer wealth from landlords to tenants and
reduce the number of new parks built. Hence, it is important to determine if
the impetus for rent controls comes in response to opportunistic behavior by
park owners or is simply an attempt to capture rising land values on the part
of tenants.
Also, non-science authors used certain individual forms significantly more
frequently: it is important that, it is possible that, it is worth noting/worthwhile/
noteworthy that, it is difficult to, it is hard to and it is easy to. First, they appear to use it
is important that to justify their research methods. Example (41) from a Language
and Linguistics research methods section shows this function:
(41) At the same time the generalizability and external validity of language data
depend on the reliability and validity of the language data obtained via the
different SLA tests and tasks. It is, therefore, important that these data will be
obtained via tests and tasks which are reliable and valid.
Second, they seem to use it is possible that in the discussion section, to evaluate
further explanations of their results. This can be seen in examples (42)–(43), from
Law and Business respectively:
(42) It is possible that agencies involved in task forces do not produce more than
non-task force agencies.
(43) It is possible that family businesses are not aware of opportunities offered by
alternative sources of funding.
Third, they use it is worth noting/worthwhile/noteworthy that in the discussion section
to underline the importance of certain findings. This can be seen in example
(44) from Law:
(44) It is noteworthy that associate members were least likely to report that their
gang met with other gangs.
Fourth, they used it is difficult to, it is hard to and it is easy to to comment on their
research, as can be seen in examples (45)–(46):
Business — from a research methods section:
(45) However, it is well recognised that it is often difficult to analyse qualitative
data in the social sciences where measuring tools are often crude and
behavioural activities and processes are complex and multi-faceted.
Business — from a discussion section:

(46) In other words, it is relatively easy to identify risks, and you may not need
a process for that purpose, but you do need one for the more complicated
tasks of to analyzing, tracking and controlling the project risks [sic].
Finally, non-science authors used it is important to, it is useful to and it is optimal
to significantly more frequently (Table 8). While they do use them to comment
on the necessity of procedures, further examination showed that they are often
used as part of justification of the usefulness of those procedures. Example (47) is
from a Public and Social Administration discussion section, (48) from a Business
research methods section, and (49) from a Law research methods section:
(47) The team concluded that it would be very important to avoid acting as an
omnipotent judge.
(48) It is useful to assess validity in association with making a conceptual
interpretation of the model.
(49) If y is smaller than a certain threshold value, where 0 < E(h)/q, it is optimal
to use regulation as the sole means of controlling risk. Otherwise, it is
optimal to use liability only.
All the above differences, and examples, appear to be common in and typical of
the relevant disciplines. We therefore assume that the patterns revealed may be
accepted within the relevant discipline as recognized ways for writers to evaluate
propositions and the difficulty of or necessity for procedures — i.e. they are disci-
pline norms. And these norms are important. RA writers pursue agreement and
appeal to their audience of journal editors and readers for membership of their
discourse community, and face sanctions, including rejection, if they step outside
disciplinary conventions. A lot depends on publication and peer acceptance, and
the sanctions of rejected papers and claims must motivate authors to follow these
conventions. Hyland (2000: 78) says that writers need to “project an insider ethos”,
and also (1999: 108) that discipline differences reflect rhetorical constraints within
a discipline. Schmitt & Carter (2004: 2) say if a sequence is frequent in a corpus,
this indicates “it is conventionalised” by the discourse community. We hope we
have revealed some of these conventional forms in various disciplinary corpora.
Hyland also claims (1999: 107) that “[p]ublished academic writing is not the
faceless discourse it is often assumed to be” and (1998: 16, 22–23) that writers
need collective agreement that their data represent facts. This notion is also ad-
dressed by others, e.g. Latour & Woolgar (1986: 75–77, 87–88, 198–201), who say
it is important for scientists to present their claims as facts. Latour (1987: 41–43,
64, 109–114) describes how scientists construct facts and Kuhn (1996: 170) adds
that “scientific progress is not quite what we had taken it to be”. Gilbert & Mulkay
(1984: 39–40) also assert that scientists manufacture reality and sometimes pres-
ent speculation as fact (ibid.: 138, 159). We suggest that the discipline norms we
found represent different ways of achieving this.
6. Conclusions
Future studies can conduct informant interviews to investigate why disciplinary
differences occur, and also study other disciplines.
It is hoped that the discipline variations we found can inform the teaching of
research writing. Awareness of these variations is important for teachers, and dis-
cipline-specific teaching of the patterns is advisable. Useful teaching approaches
may be immersion in repeated instances (Stubbs 1995: 380) and a concordance-
based and inductive approach, raising student awareness, noticing the use of cor-
pus extracts, and teaching learning strategies (Jones & Haywood 2004: 295, 276).
We suggest that this research has added to our understanding of disciplinary
conventions in the form, function and frequency of the target structures. We have
also provided information about science/non-science variation in research writing
across a number of disciplines. We hope our findings improve our knowledge of
RAs and of academic writing, particularly our more central findings that certain
science writers use the structures less frequently than non-science writers, that
Law writers use them more frequently, and regarding the important functions of
the patterns revealed in this research.
Notes
1. History and LitCrit RAs together amount to 263 (149 + 114) p.m.w. for the pattern it is ADJ
that, though the frequency figures for specific adjectives (see below, in this table) total 232
p.m.w. This is because Groom found further forms, which are not listed in his paper.
2. Similar comments are made by Hyland & Tse (2005) and Stotesbury (2003) about evaluative
that in RA abstracts. Also see Hyland (2008: 53).
3. See also Schmitt & Carter (2004: 6), Shei & Pain (2000: 168), Howarth (1998: 39–41), Ven-
tola (1992: 191), Golebiowski (1999: 238–240), Yakhontova (1997: 105), Vassileva (1997: 217),
Duszak (1994), Wood (2001: 76–7), and Bahns & Eldaw (1993: 101).
4. The log-likelihood calculator is available at http://ucrel.lancs.ac.uk/llwizard.html
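For readers who wish to reproduce this kind of comparison without the online tool, the statistic can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard two-corpus log-likelihood formulation (observed frequencies compared with frequencies expected from the relative corpus sizes); the function name, the counts and the corpus sizes are hypothetical and are not taken from this study.

```python
import math

def log_likelihood(freq1, size1, freq2, size2):
    """Two-corpus log-likelihood (G2) for the frequency of one item."""
    total_freq = freq1 + freq2
    total_size = size1 + size2
    # Expected frequencies if the item were spread evenly across both corpora.
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Hypothetical counts and corpus sizes, for illustration only.
ll = log_likelihood(freq1=120, size1=1_000_000, freq2=60, size2=1_000_000)

# 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.
print(round(ll, 2), ll > 3.84)
```

A value above 3.84 would conventionally be read as a significant difference at p < 0.05; whether that threshold matches the one applied in a given study is an assumption that should be checked.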
References
Ahmad, U. K. 1997. “Research article introductions in Malay: Rhetoric in an emerging research
community”. In A. Duszak (Ed.), Culture and Styles of Academic Discourse. Berlin: Mouton
de Gruyter, 273–301.
Bahns, J. & Eldaw, M. 1993. “Should we teach EFL students collocations?”. System, 21 (1), 101–114.
Bhatia, V. K. 2000. “Genres in conflict”. In A. Trosborg (Ed.), Analysing Professional Genres.
Amsterdam/Philadelphia: John Benjamins, 147–161.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken
and Written English. Harlow: Pearson Education.
Charles, M. 2000. “The role of an introductory it pattern in constructing an academic persona”.
In P. Thompson (Ed.), Patterns and Perspectives: Insights into EAP Writing Practice. Read-
ing: CALS, 45–59.
Charles, M. 2006. “Revealing and obscuring the writer’s identity: Evidence from a corpus of
theses”. In R. Kiely, P. Rea-Dickins, H. Woodfield & G. Clibbon (Eds.), Language, Culture
and Identity in Applied Linguistics. London: BAAL/Equinox, 147–161.
Duszak, A. 1994. “Academic discourse and intellectual styles”. Journal of Pragmatics, 21, 291–
313.
Gilbert, G. N. & Mulkay, M. 1984. Opening Pandora’s Box: A Sociological Analysis of Scientists’
Discourse. Cambridge: Cambridge University Press.
Golebiowski, Z. 1999. “Application of Swales’ model in the analysis of research papers by Polish
authors”. IRAL, 37 (3), 231–247.
Groom, N. 2005. “Pattern and meaning across genres and disciplines: An exploratory study”.
Journal of English for Academic Purposes, 4 (3), 257–277.
Hewings, M. & Hewings, A. 2001. “Anticipatory ‘it’ in academic writing: An indicator of disci-
plinary difference and developing disciplinary knowledge”. In M. Hewings (Ed.), Academic
Writing in Context: Implications and Applications. Birmingham: University of Birmingham
Press, 199–214.
Hewings, M. & Hewings, A. 2002. “ ‘It is interesting to note that…’: A comparative study of antic-
ipatory ‘it’ in student and published writing”. English for Specific Purposes, 21 (4), 367–383.
Hewings, M. & Hewings, A. 2004. “Impersonalizing stance: A study of anticipatory ‘it’ in student
and published academic writing”. In C. Coffin, A. Hewings & K. O’Halloran (Eds.), Apply-
ing English Grammar: Functional and Corpus Approaches. London: Arnold, 101–116.
Howarth, P. 1998. “Phraseology and second language proficiency”. Applied Linguistics, 19 (1),
24–44.
Hunston, S. 2002. “Pattern grammar, language teaching, and linguistic variation: Applications of
a corpus-driven grammar”. In R. Reppen, S. M. Fitzmaurice & D. Biber (Eds.), Using Cor-
pora to Explore Linguistic Variation. Amsterdam/Philadelphia: John Benjamins, 167–183.
Hunter, D. & Smith, R. 2006. “Identifying keywords and charting their development: Method-
ological issues in corpus-based historical research”. Paper presented at the BAAL Corpus
Linguistics Special Interest Group, The Open University, Milton Keynes, UK, 28 April 2006.
Hyland, K. 1996. “Talking to the academy: Forms of hedging in science research articles”. Writ-
ten Communication, 13 (2), 251–281.
Hyland, K. 1998. Hedging in Scientific Research Articles. Amsterdam/Philadelphia: John Ben-
jamins.
Hyland, K. 1999. “Disciplinary discourses: Writer stance in research articles”. In C. N. Candlin
& K. Hyland (Eds.), Writing: Texts, Processes and Practices. London: Longman, 99–121.
Hyland, K. 2000. Disciplinary Discourses: Social Interactions in Academic Writing. Harlow, UK:
Longman.
Hyland, K. 2008. “Academic clusters: Text patterning in published and postgraduate writing”.
International Journal of Applied Linguistics, 18 (1), 41–62.
Hyland, K. & Tse, P. 2005. “Hooking the reader: A corpus study of evaluative that in abstracts”.
English for Specific Purposes, 24 (2), 123–139.
Jones, M. & Haywood, S. 2004. “Facilitating the acquisition of formulaic sequences: An explor-
atory study in an EAP context”. In N. Schmitt (Ed.), Formulaic Sequences. Amsterdam/
Philadelphia: John Benjamins, 269–300.
Kaltenböck, G. 2003. “On the syntactic and semantic status of anticipatory it”. English Language
and Linguistics, 7 (2), 235–255.
Kuhn, T. 1996. The Structure of Scientific Revolutions. 3rd ed. Chicago: University of Chicago
Press.
Latour, B. 1987. Science in Action. Milton Keynes: Open University Press.
Latour, B. & Woolgar, S. 1986. Laboratory Life: The Construction of Scientific Facts. 2nd ed. Princ-
eton, NJ: Princeton University Press.
Meyer, I. 2001. “Extracting knowledge-rich contexts for terminography: A conceptual and
methodological framework”. In D. Bourigault, C. Jacquemin & M.-C. L’Homme (Eds.), Re-
cent Advances in Computational Terminology. Amsterdam/Philadelphia: John Benjamins,
279–302.
Oakey, D. 2002. “Formulaic language in English academic writing: A corpus-based study of the
formal and functional variation of a lexical phrase in different academic disciplines”. In R.
Reppen, S. M. Fitzmaurice & D. Biber (Eds.), Using Corpora to Explore Linguistic Variation.
Amsterdam/Philadelphia: John Benjamins, 111–129.
Oakey, D. 2006. “Phraseology beyond the bundle: Finding a way in”. Paper presented at the
BAAL Corpus Linguistics Special Interest Group, The Open University, Milton Keynes, UK,
28 April 2006.
Organisation for Economic Co-Operation and Development. 2008. “Education Statistics”. Paris:
Centre for Educational Research and Innovation.
Paltridge, B. 1993. “Writing up research: A systemic functional perspective”. System, 21 (2),
175–192.
Rayson, P., Berridge, D. & Francis, B. 2004. “Extending the Cochran rule for the comparison of
word frequencies between corpora”. In G. Purnelle, C. Fairon & A. Dister (Eds.), Le Poids
des Mots: Proceedings of the 7th International Conference on Statistical Analysis of Textual
Data (JADT 2004), Volume II. Louvain-la-Neuve, Belgium, 10–12 March, 2004. Louvain:
Presses universitaires de Louvain, 926–936.
Read, J. & Nation, P. 2004. “Measurement of formulaic sequences”. In N. Schmitt (Ed.), Formu-
laic Sequences. Amsterdam/Philadelphia: John Benjamins, 23–35.
Rodman, L. 1991. “Anticipatory it in scientific discourse”. Journal of Technical Writing and Com-
munication, 21 (1), 17–27.
Römer, U. 2009. “The inseparability of lexis and grammar: Corpus linguistic perspectives”. An-
nual Review of Cognitive Linguistics, 7 (1), 140–162.
Schmitt, N. & Carter, R. 2004. “Formulaic sequences in action”. In N. Schmitt (Ed.), Formulaic
Sequences. Amsterdam/Philadelphia: John Benjamins, 1–22.
Scott, M. 2004. WordSmith Tools. Version 4. Oxford: Oxford University Press.
Shei, C.-C. & Pain, H. 2000. “An ESL writer’s collocational aid”. Computer Assisted Language
Learning, 13 (2), 167–182.
Stotesbury, H. 2003. “Evaluation in research article abstracts in the narrative and hard sciences”.
Journal of English for Academic Purposes, 2 (4), 327–341.
Stubbs, M. 1995. “Collocations and cultural connotations of common words”. Linguistics and
Education, 7, 379–390.
Taylor, G. & Chen, T. 1991. “Linguistic, cultural, and subcultural issues in contrastive discourse
analysis: Anglo-American and Chinese scientific texts”. Applied Linguistics, 12 (3), 319–336.
Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benjamins.
Tognini-Bonelli, E. 2004. “Working with corpora: Issues and insights”. In C. Coffin, A. Hewings
& K. O’Halloran (Eds.), Applying English Grammar: Functional and Corpus Approaches.
London: Arnold, 11–24.
Vassileva, I. 1997. “Hedging in English and Bulgarian academic writing”. In A. Duszak (Ed.),
Culture and Styles of Academic Discourse. Berlin: Mouton de Gruyter, 203–221.
Ventola, E. 1992. “Writing scientific English: Overcoming intercultural problems”. International
Journal of Applied Linguistics, 2 (2), 191–220.
Williams, G. C. 1998. “Collocational networks: Interlocking patterns of lexis in a corpus of plant
biology research articles”. International Journal of Corpus Linguistics, 3 (1), 151–171.
Williams, G. 2002. “In search of representativity in specialised corpora: Categorisation through
collocation”. International Journal of Corpus Linguistics, 7 (1), 43–64.
Wood, A. 2001. “International scientific English: The language of research scientists around the
world”. In J. Flowerdew & M. Peacock (Eds.), Research Perspectives on English for Academic
Purposes. Cambridge: Cambridge University Press, 71–83.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
Yakhontova, T. 1997. “The signs of a new time: Academic writing in ESP curricula of Ukrainian
universities”. In A. Duszak (Ed.), Culture and Styles of Academic Discourse. Berlin: Mouton
de Gruyter, 103–112.
Appendix 1 — Journals in the corpus
Biology
Applied Soil Ecology
Biochimica et Biophysica Acta
Biomass and Bioenergy
Chemistry and Biology
Current Biology
Journal of Biotechnology
Business
Industrial Marketing Management
International Business Review
International Journal of Project Management
International Journal of Research in Marketing
Journal of Business Venturing
Journal of Operations Management
Chemistry
Analytical Biochemistry
Analytica Chimica Acta
Corrosion Science
International Journal of Inorganic Materials
Journal of Chemical Thermodynamics
Journal of Solid State Chemistry
Environmental Science
Applied Energy
Atmospheric Environment
Biomass and Bioenergy
Ecological Modelling
Environmental Pollution
Global Environmental Change
Language and Linguistics
English for Specific Purposes
Journal of Neurolinguistics
Language and Communication
Language Sciences
Speech Communication
System
Law
California Law Review
Canadian Journal of Criminology
International Review of Law and Economics
Journal of Criminal Justice
Physics and Material Science

Acta Materialia
Chemical Physics
International Journal of Fatigue
Journal of Luminescence
Journal of the Mechanics and Physics of Solids
Physica C: Superconductivity
Public and Social Administration
Child Abuse & Neglect
Evaluation and Program Planning
Habitat International
International Journal of Public Sector Management
Social Science & Medicine
World Development
Appendix 2 — Examples from the corpus
Example for Table 3

Language and Linguistics use of it v-link adj that:
(1) It is likely that this pattern is reproduced throughout the cerebrocerebellar system
Examples for Table 4
Non-sciences use of it is possible that:
(2) It is possible that such matching could eliminate hyperpriming (Language and Linguistics)
(3) It is possible that managers in liquidated businesses have handled the need for resources dif-
ferently from managers in surviving businesses (Business)
Non-sciences use of it is important that:
(4) it is important that we found the contextual circumstances adequately addressed (Public and
Social Administration)
(5) Clearly, it is important that both technical and management decision-making types of
competence are possessed by SME owner-managers (Business)
Non-sciences use of it is noteworthy/worth noting that:
(6) It is noteworthy that our findings provided empirical support for the conceptual work of
Dreux (Business)
(7) It is worth noting that Anglo-American research apparently affects research culture outside
this sphere (Language and Linguistics)
Sciences use of it is obvious that:
(8) It is obvious that there are many factors that may affect the formation of Bi2223 (Physics)
(9) It is obvious that the phenomenon is associated with the plastic deformation mechanism
(Physics)
Example for Table 5

Law use of it v-link adj to-inf “difficulty”:
(10) It is difficult to determine whether any real differences occurred because of jurors’
consideration of mitigating circumstances
Sciences use of it is possible to:
(11) it is possible to construct the corresponding hypothetically broadened Bragg peak of the
20-phase mixture (Chemistry)
(12) it is now possible to easily carry out 100 or more Monte Carlo runs with a complex
photochemical grid model (Environmental Science)
Non-sciences use of it is difficult to:
(13) It is difficult to change the planning horizon without process, equipment, and planning
system modifications (Business)
(14) Once rights are granted to copyright owners, it is difficult to withdraw them (Law)
Non-sciences use of it is hard to:
(15) it is hard to assess what the unique impact of each dimension is (Business)
(16) it is hard to think of any market that is not in some sense “potential” (Law)
Non-sciences use of it is easy to:
(17) In other words, it is relatively easy to identify risks (Business)
(18) It is easy to see why a safe, effective vaccine presents an attractive alternative (Law)
Law use of it v-link adj to-inf “necessity”:
(19) it is important to explore whether the public perceives acts of wife abuse as necessarily
criminal
Non-science use of it is important to:
(20) it is important to parallel such an approach with attempts to change attitudes towards TB
in the community (Public and Social Administration)
(21) It is important to ensure that both L1 and L2 English speakers have the opportunity to
participate in the MBA classroom on an equal footing (Language and Linguistics)
Non-science use of it is useful to:
(22) It is useful to consider what might stand in the way of the systematic screening of
offenders (Law)
(23) It is useful to assess validity in association with making a conceptual interpretation of the
model (Business)
Non-science use of it is optimal to:
(24) it is optimal to use peremptory challenges to reduce the probability of a wrongful
acquittal (Law)
Author’s address
Matthew Peacock
Department of English
City University of Hong Kong
Tat Chee Avenue
Kowloon
Hong Kong
enmatt@cityu.edu.hk
Copyright of International Journal of Corpus Linguistics is the property of John Benjamins Publishing Co. and
its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's
express written permission. However, users may print, download, or email articles for individual use.

International Journal of Corpus Linguistics 16:1 (2011), 72–100. doi 10.1075/ijcl.16.1.04pea
