doi:10.1093/ijl/ecu028 · Advance access publication 26 January 2015
COLOURS IN ONLINE DICTIONARIES: A CASE OF FUNCTIONAL LABELS
Abstract
The study attempts to determine if the colour of functional (part-of-speech and syntactic) labels influences the speed and effectiveness of dictionary search for grammar as well as the retention of the retrieved information. It also investigates whether the role of
colour is conditioned by the presence of examples in the microstructure. In an online
multiple choice test, participants consulted a purpose-built e-dictionary which consisted
of entries with and without examples. There were two dictionary versions. One
presented the entries in black and white. In the other, functional labels were in
colour. For each test item, the time of task completion was automatically recorded.
In the immediate retention test, the participants did the same task without access to any
dictionary. Results show that functional labels in colour significantly increase the speed
and effectiveness of online dictionary search. They also improve retention.
1. Introduction
53). By the same token, lexicographers are warned against overdoing colours,
as too much colour in a dictionary might be counterproductive.
In what follows, colours in electronic dictionaries are given more attention.
In Section 2, relevant research into the role of colours in visual search is re-
viewed and its significance for dictionary design and consultation is empha-
sized. The potential role of colours in electronic dictionaries is discussed from
the perspective of cognitive load hypothesis. Next, the section offers an over-
view of the use of colours in the major monolingual English learners’ diction-
aries online. Finally, the status of examples in electronic dictionaries is
revisited. Section 3 gives details of the study conducted to gain an insight into the actual role of colour in online dictionary functional labels. The
obtained results are summarized in Section 4, and conclusions are drawn in
Section 5. Their significance in the context of other research is considered in Section 6, where the limitations of the experiment are also acknowledged. An
indication of further areas of investigation concludes the paper.
2. Literature review
Empirical research in the field of computer science shows that colours increase
the visual salience of the information displayed on the computer screen and help
in visual search (Tamborello and Byrne 2007: 183). Fisher and Tan (1989)
observed that people estimate the relative cost of attending to highlighting or
disregarding it; they attend to highlighting if it is more predictive of item status.
In their series of experiments, participants had to find a target digit in the back-
ground of four distractor digits. Highlighting validity, or the probability that the
target is highlighted, was also manipulated. In experiment one, the target was highlighted in (randomly selected) half of the cases, and a distractor in the other half. In experiment two, the target was always highlighted (the
level of highlighting validity was 100 percent). It turned out that colour signifi-
cantly reduced search time when highlighting was 100 percent valid. When high-
lighting validity was 50 percent, search time in the colour condition proved to be
no shorter than when no highlighting was present (Fisher and Tan 1989: 20, 22).
In their replication of the study by Fisher and Tan (1989), Tamborello and Byrne (2007) empirically tested the relative costs of attending to highlighting at other levels
of highlighting validity. Nine highlighting validity conditions were introduced: 0,
12.5, 25, 37.5, 50, 62.5, 75, 87.5 and 100 percent. When highlighting validity was
set at 75 percent, for example, a participant received half control trials (no highlighting) and half highlighted trials (the target digit was highlighted in red), 75 percent of which had valid highlighting and 25 percent invalid highlighting. It was found that the subjects were fastest on valid trials, slower on control trials, and slowest on invalid trials. Moreover, increasing highlighting validity resulted in significantly shorter response times on valid trials and longer response times on invalid ones. Thus, the participants proved sensitive to highlighting, and the sensitivity was higher at higher levels of validity. As the authors put it, “the effect of
validity on sensitivity is nearly linear: sensitivity increases as validity increases”
(Tamborello and Byrne 2007: 184).
One obvious conclusion which these studies suggest for dictionary design is that it might be useful to highlight the relevant (i.e., looked-up) information. Another is that information not useful in a given situation should not be highlighted, in order to prevent distracting users or reducing their sensitivity to colour.
On the other hand, it is possible to argue that the microstructure is (still) quite
predictable and mostly static, so dictionary users, especially experienced ones,
could be expected to know where to find specific information, whether high-
lighted or not. Besides, it is typically linguistic information needed to solve a
specific linguistic problem that is looked for, not arbitrary targets. The orienting
of dictionary users’ attention must then be voluntary, which in the literature on
visual search is also known as internal, goal-directed, top-down or endogenous
(Posner 1980: 5, Jamet 2014: 47). Endogenous control over the locus of attention
suggests that observers focus on the regions or objects in the visual display that they choose for further processing, bearing in mind their goals and assumptions about their current task (Yantis 1993: 676). However, it is worth pointing out
that the orienting of attention within a visual field can also be involuntary, or
exogenous (external), and result from stimuli related to the items displayed on
the screen, rather than any specific goal (Posner 1980: 5, Jamet 2014: 47). Such
stimulus-driven or bottom-up selection of information from visual displays
occurs when some properties of the stimulus capture attention independently
of the observer’s objectives and assumptions (Yantis 1993: 676). Importantly,
stimulus-driven selection must not be dismissed as irrelevant in the context of
dictionary use. The eye-tracking study by Tono (2011) reveals that when search-
ing for specific word senses, dictionary users are often lost in lexicographic data.
Some of the subjects’ search paths were surprisingly “complex” and “tangled”,
because “users moved their eyes all over the entry but could not find an answer”,
which implies that “even though users consult a dictionary, that does not auto-
matically mean that they bring back the right information with them” (Tono
2011: 149-150). Possibly, additional graphic highlighting could successfully pre-
vent them from getting lost in their goal-directed searches. Another eye-tracking
investigation into dictionary look-up shows that elements in bold do catch sig-
nificant attention of users searching for word senses in the microstructure (Lew
et al. 2013: 248, 253). Yantis (1993: 676) points out that it is in fact a combi-
nation of exogenous and endogenous control over the locus of attention that
determines how attention is ultimately deployed in the visual field. The recent eye
tracking research by Jamet (2014) suggests that deliberate, endogenous attention
orienting mechanisms are more often successfully adopted when supported by
[Table 1: Continued.]
not clear; in the fluent condition (Arial) and one disfluent condition (Comic Sans MS greyscale) retention was significantly better than in the other two disfluent conditions (Impact, Impact italics). Nonetheless, what could have affected
the results was not so much the font itself as the fact that any font stood out
from the other fonts which each subject saw in the test. In the second study by
Nesi (2011), in which only 36 participants were involved, five fonts were used in
entries for a different set of five words, but the font conditions were not
rotated; each participant had access to entries in only one font. Again, Arial
was the fluent condition, and the other fonts represented disfluent conditions
(Arial Black, Comic Sans MS greyscale, Impact, Bradley Hand). The results
show that retention was the best for two disfluent fonts (Arial Black and
Bradley Hand), with the difference between them nearing significance
(p=0.08). Yet, the worst results were obtained in another disfluent condition
(Impact), and not the easiest-to-read fluent condition. It is important to note,
however, that the retention task was open-ended; the participants were even
allowed to draw pictures, which made retention difficult to evaluate. The results should be treated with caution also because there were very few subjects in each condition and they knew the names of the fonts, among which there were words like comic or hand. This could have affected their attitudes and
performance. Most importantly, however, the distinction between the fluent and disfluent conditions might not have been clear enough in the first place, since many typographic features differing across the fonts (such as spacing, ligatures, swashes, pitch or size) could have influenced the results. Such features were not controlled in either study.
(e.g., Laufer 1992, Minaeva 1992, Sinclair 1984, 1991, Fox 1987, Landau 2001).
Today, examples in learners’ dictionaries are mostly corpus-based (Potter 1998,
Prinsloo and Gouws 2000, Atkins and Rundell 2008, Prinsloo 2013, Yamada
2013), and the debate itself is considered an ill-founded “ideological struggle resulting in loss of perspective” (Prinsloo 2013: 511). The process of finding
candidates for examples is largely automated and much easier than it was a few
decades ago (cf. Engelberg et al. 2009, Kilgarriff 2013). Furthermore, in elec-
tronic dictionaries, storage space ceases to be an issue, and lexicographers can
include more examples than in dictionaries on paper, largely constrained for
space (Rundell and Kilgarriff 2011: 276). This is especially true for the electronic dictionaries which explore the medium more fully than those which are simply retro-digitized versions of paper dictionaries (cf. Tono 2009), also known as printed online dictionaries (Fuertes-Olivera 2013: 326) or copycats (Tarp 2011: 58–60).
Access to richer stores of data in electronic dictionaries, including additional
examples or even a featured text corpus (cf. Varantola 1994, Atkins 1996), is
dictated by the need to satisfy communicative and cognitive needs of dictionary
users. The former include text reception and production, while the latter in-
volve acquiring knowledge in general (Bergenholtz and Johnsen 2013: 562).
Enriching electronic dictionaries with access to corpora is also perfectly in
line with data-driven learning (Johns 1991a, 1991b), which involves the use
of corpus data to figure out the meaning and usage of words. Put differently,
exposure to real language encourages learners to test hypotheses about word
meaning and use. Learners become language detectives, because ready solu-
tions are not offered to them, only data which need to be further analysed
(Flowerdew 2009: 339). O’Sullivan (2007: 277) argues that data-driven learning
enhances “predicting, observing, noticing, thinking, reasoning, analysing, in-
terpreting, reflecting, exploring, making inferences (inductively or deductively),
focusing, guessing, comparing, differentiating, theorising, hypothesising, and
verifying.” Importantly, drawing conclusions concerning language makes lan-
guage facts more likely to be remembered (Kilgarriff 2009). Landure and
Boulton (2010) found that corpus consultation in conjunction with other
tools (dictionaries and translation engines) resulted in better written language
production and improved learner autonomy. This is an important conclusion,
since corpus-based examples conventionally supplied in dictionary entries “are
not necessarily geared to language production errors and certainly do not pro-
vide repeated exposure to specific target structures that can be problematic to
learners with a specific mother tongue background” (Frankenberg-Garcia
2012: 287).
Initially, a CD-ROM which contained a corpus was made available to users of a specific dictionary, e.g., COBUILD. However, there was usually no (pre)filtering done, and users willing to draw on the extra resource had to work like
would-be corpus linguists (Bogaards 2013: 410, Frankenberg-Garcia 2014: 142).
It is difficult to disagree with Kilgarriff (2009) that “most learners do not want to
be corpus linguists, and concordances are unfamiliar and difficult objects”.
Nowadays, links to corpora and customized tools for exploring corpus data
can be accessed from many online dictionaries (Müller-Spitzer 2013: 378). It is
stressed that since the data are to be explored by non-linguists, “lexicographers
should weigh the ease of hypertextual access against the descriptive objective
underlying the dictionary, and . . . the situation in which the user needs such extra
data” (Debus-Gregor and Heid 2013: 1011). To avoid overloading the user with
irrelevant examples, Abel (2013: 1123) stresses the role of intelligent dictionary
integration with outer sources, including corpora. Possible features of such in-
telligent integration include word sense disambiguation to allow users to search
for examples of words in specific senses, or pre-selecting good examples with the
help of specialized software, such as GDEX (Kilgarriff et al. 2008, Rundell and
Kilgarriff 2011; cf. Kosem et al. 2013). Frankenberg-Garcia (2012: 289) suggests
providing hyperlinks to different examples for decoding and encoding purposes.
Such methods of intelligent dictionary integration with external data are likely to
prevent a “suffocation effect”, whereby all potentially relevant information is displayed, rather than what is really needed in a specific situation (Fuertes-Olivera 2013: 335, cf. Bergenholtz and Johnsen 2013: 561). “Less is more” (Fuertes-Olivera 2013: 335) appears then to be a better approach in this respect than “the more the merrier” (Debus-Gregor and Heid 2013: 1003).
It is also recommended that e-microstructures should be dynamic and cus-
tomizable (Engelberg et al. 2009). Hierarchical arrangement of lexicographic
data, flexible presentation modes and highlighting search results can success-
fully prevent dictionary users from getting lost in the vast amount of poten-
tially relevant information (Yamada 2013: 204-206, cf. Lew and Tokarek 2010).
Clickable more buttons are among the efficiency-oriented devices which make
it possible to get access to more information on demand and ensure that the
presentation of lexicographic data is compact when users do not need extra
information (Debus-Gregor and Heid 2013: 1005). See more and see less but-
tons can be found in MEDO. Clicking the latter, the user hides labels for
syntactic subcategories, phonetic transcription, links to word forms, colloca-
tions and examples. It is also worth noting that the dictionary adopts a dual
track approach, in which productive vocabulary is exemplified, but receptive
entries (for much less frequent words unlikely to be used productively) do not
include examples at all (cf. Bogaards 2003: 50, Dziemianko 2006: 24–25).
Naturally, the idea is to limit to a minimum the amount of data necessary to
solve a given punctual information deficit, assumed here to be of a receptive
nature. Example sentences with receptive words cannot be called up even by
clicking See more. It is important to note that other learners’ dictionaries, also
those which do not adopt the dual-track principle and do not include effi-
ciency-oriented tools similar to see more/less, have been found not to exemplify
all headwords (and their senses); CALD2 does not offer examples in 13 percent
of the items included, LDOCE4 and MEDAL2 leave out examples in over 25
percent of the cases (Bogaards 2013: 404). Recently, Ostermann (2014), having
analysed the coverage of headwords in example sentences in the Big Five, has
called it “random” and “far from systematic”.
Overall, it seems that examples gradually cease to be a default and static
feature of dictionary entries. The increasing awareness of fine-grained control
over the amount of information available to a dictionary user sometimes jus-
tifies excluding examples from specific entries for good, or making them avail-
able on demand to address users’ needs more adequately. This seems to be a
reasonable approach not only considering reference needs and the data-driven
approach to learning, but also discrepant research results, which do not pro-
vide compelling or conclusive evidence for the actual usefulness of examples.
3. Methods
The aim of the present paper is to investigate the influence of functional labels
in colour on search time, the effectiveness of dictionary use and retention.
Functional labels are understood here as part-of-speech and syntactic labels
after Burkhanov (1998: 89).
The study attempts to answer the following research questions:
(1) Does the presence of colour in functional labels affect the speed and
effectiveness of online dictionary search for grammatical information?
(2) Do functional labels in colour help users remember the retrieved gram-
matical information?
(3) Is the role of colour, if any, moderated by entry completeness? More specif-
ically, does the effect of functional labels in colour on search time, informa-
tion retrieval and retention depend on the presence of examples in entries?
3.2 Materials
To achieve the aims of the study, an experiment was conducted in which a ques-
tionnaire, a main test and a post-test were employed. All the materials were de-
signed using the Moodle platform and were available online. The questionnaire
made it possible to profile the subjects, control for colour blindness and acquaint
participants with the functionalities of the experimental tool. The main test was
designed to examine the role of colours in online dictionary labels, and retention
was checked with the help of the post-test immediately after the main test.
In the main test, 18 English words (six nouns, six verbs and six adjectives) were used in sentences drawn from corpora (mainly COCA). The target words
together with their immediate context had been removed from the corpus sen-
tences and their grammatical properties were manipulated in multiple-choice
questions. The subjects were requested to choose one of the four options given
for each target word which could best complete the sentence. To perform the
task, the participants were asked to consult the entries supplied below each
sentence. The entries were compiled specifically for the purpose of the study on
the basis of the information found in the Big Five. To make sure that the
subjects relied on the mini-dictionaries and not on their background lexical
and syntactic knowledge, the originally selected target words, which were al-
ready infrequent, were replaced by even rarer ones. The substitutes were drawn
from online dictionaries of difficult words in English (The Phrontistery and The
Grandiloquent Dictionary), and represented the same parts of speech as the
originally chosen words. Table 2 gives the words together with the substitutes
eventually used in the study (in brackets).
In the test, the sequence of the items was randomized to prevent learning
effects. The system automatically shuffled the sentences (with the accompany-
ing entries) each time the test was attempted.
The manipulation of the grammatical properties of headwords in multiple
choice questions depended on the grammatical description offered in the sup-
plied entries. Distractors illustrated those features which were not represented
in the entries, and could not be justified by entry consultation. In general, the
manipulations concerned noun countability, verb transitivity and adjective predicative or attributive uses. For example, a noun described in the entry as countable was shown as an uncountable one, an intransitive verb was represented as a transitive one, and an adjective which is only attributive functioned as a predicative one or, much less often, as a noun or verb from which an adjective is formed by relevant suffixation. Apart from the distractors, there
was always only one option in each multiple choice question which represented
the same grammatical properties of a given target item as its entry. A more
detailed account of manipulations in multiple choice questions is given in
Table 3, with examples of correct and incorrect options (distractors) in italics.
The entries used in the study were of two types: minimalistic and complete.
A minimalistic entry consisted of the headword, its phonetic transcription,
part-of-speech, syntactic and style labels as well as a definition. In a complete
entry, there was also an example of usage.
The gapped sentences used in the test and the example sentences employed in
complete entries were selected very carefully. They had to flesh out the head-
word properties which were shown in functional labels. After all, examples in
complete entries had to be a reliable source of grammatical information and
help participants choose correct answers in the multiple choice task. When
corpora were searched, candidates for gapped sentences and examples met
the same query criteria. In the entry for the adjective brumal (extenuating),
which had the label [only before noun], the following example was given:
Among the hits for the same query, extenuating + noun, there was the fol-
lowing sentence:
It served as a basis for the gapped sentence, which, when correctly filled in
the multiple-choice task, should read:
A similar strategy of harvesting sentences was followed in each case, and the
grammatical properties of a given lexical item shown in the label were always
the guiding principle. Apart from that, attention was paid to sentence length
and internal complexity. Excessive length of both dictionary examples and
gapped sentences was avoided, and corpus sentences were edited for difficult
words. In a few cases, examples were drawn from monolingual learners’ dictionaries. Even then the examples (and corpus candidates for gapped sentences) were checked against relevant functional labels to make sure that they represented the same grammatical characteristics of a given headword. In fact, this approach prevented accepting any dictionary example at face value. To illustrate, the following example from LDOCE5:
could not be used to illustrate the intransitive verb fub (burgeon) [intransi-
tive]. The form burgeoning in the example above is used attributively and per-
forms the function of an adjective pre-modifying a noun, and not that of an
intransitive verb. Even more importantly, this deverbal adjective does not give
clear information on the transitivity of the verb from which it is derived; pre-
sent participles of transitive verbs can also be used attributively, e.g., an irritat-
ing habit. It follows that sentence 5 could not help subjects realize that the
headword fub (burgeon) was intransitive. Similarly, the OALDCE8 example of
sweep (n):
There were 219 participants in the study, all of whom were doing degrees in the
Faculty of English at Adam Mickiewicz University in Poznan, Poland (B2-C1
in CEFR). 113 of them were given the colour version of the test, and the other
106 were assigned to the black and white version.
In the experimental session, the subjects were first allotted five minutes to fill out
the questionnaire. Then, they were requested to take the main test and answer
all the multiple choice questions, but there was no time limit to perform the task.
In the test, time spent on answering each question was registered by the Moodle
logging facility, so that there were exact timestamps for each test item.
Immediately after the main test, the subjects were asked to take the retention
test, in which they answered the same questions as in the main test, but without
access to any dictionary entries. Additionally, the sequence of test items and the
order of options in multiple choice questions were changed in the retention test to
reduce learning effects. The participants had 15 minutes to complete the post-test.
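Per-item search times of this kind can be derived from a timestamped log by differencing consecutive entries. A minimal sketch, assuming a simplified (item, timestamp) export format for illustration rather than Moodle's actual log schema:

```python
from datetime import datetime

# Hypothetical log rows: (test item, timestamp of submission).
# The format below is an assumption, not Moodle's real export.
log = [
    ("item_01", "2014-05-12 10:00:00"),
    ("item_02", "2014-05-12 10:00:47"),
    ("item_03", "2014-05-12 10:01:35"),
]

# Parse timestamps and take differences between consecutive submissions.
times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for _, ts in log]
durations = [(later - earlier).total_seconds()
             for earlier, later in zip(times, times[1:])]
print(durations)  # [47.0, 48.0]
```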
4. Results
4.1 Time
Figure 2 shows the mean time needed to do the main test in the two experi-
mental conditions (colour and black and white).
[Figure 2: Mean time of test completion by interface (black and white: 1003 s; colour: 792 s).]
In the black and white condition, the subjects needed 16 minutes 43 seconds
to find the relevant information on the headwords, and in the colour condition
– 13 minutes 12 seconds. The difference of 3 minutes 31 seconds in favour of
the colour condition was statistically highly significant (t = 8.88, p = 0.00**),
and the effect size was large (Cohen’s d = 1.19), which means that there was
about 62 percent of non-overlap between the two interfaces in the distribution
of time needed to do the test. It is also worth noting that there was greater
variability in the time needed to find the right information in the black and
white condition (SD 193 sec. = 3 min 13 sec.) than in the colour condition (SD
159 sec. = 2 min 39 sec.).
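The reported effect size and non-overlap figure can be reproduced from the means and standard deviations above. A sketch using Cohen's d with a pooled standard deviation and Cohen's U1 non-overlap statistic (assuming roughly equal group sizes and normal distributions):

```python
from math import sqrt
from statistics import NormalDist

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d with a pooled SD (equal-n simplification)."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

def non_overlap_u1(d):
    """Cohen's U1: proportion of non-overlap between two normal distributions."""
    phi = NormalDist().cdf(abs(d) / 2)
    return (2 * phi - 1) / phi

# Figures reported in the study: mean times and SDs in seconds,
# black and white (1003, 193) vs colour (792, 159).
d = cohens_d(1003, 193, 792, 159)
print(round(d, 2))                      # 1.19
print(round(non_overlap_u1(d) * 100))   # 62 (percent non-overlap)
```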
Figures 3 and 4 present the mean time spent on a test item by test version (interface) and by entry type, respectively. Unless clearly stated otherwise, the results discussed in Section 4 come from a 2-level between-group, 2-level within-subject mixed ANOVA.
Figure 3 shows that the search for grammatical information on a test item
was on average 12 seconds faster when labels were in colour than when they
were in black and white. The difference between the two interfaces was statis-
tically significant (F(1,16) = 8.84, p = 0.01*; partial eta2=0.356). As can be
seen from Figure 4, in turn, the search for grammatical information on a
single item was on average four seconds faster when there were no examples
in entries. In other words, examples slowed down the search by four seconds
per test item. This effect was not statistically significant (F(1,16) = 1.96,
p = 0.18; partial eta2=0.109).
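For a one-degree-of-freedom effect, partial eta-squared can be recovered from the reported F-ratio as F·df_effect / (F·df_effect + df_error); a quick check against the values above:

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared recovered from an F-ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

# Interface effect on per-item search time: F(1,16) = 8.84
print(round(partial_eta_squared(8.84, 1, 16), 3))  # 0.356
# Entry effect on per-item search time: F(1,16) = 1.96
print(round(partial_eta_squared(1.96, 1, 16), 3))  # 0.109
```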
Figure 5 gives information on mean search time for grammatical information
on a test item in each interface and entry condition.
When working with minimalistic and complete entries, the subjects needed
less time to perform the search when labels were in colour. The difference
between the two interfaces was similar for each entry condition (for complete
entries: 13.63 seconds, for minimalistic entries: 9.78 seconds). The interaction
between entry and interface had no statistical significance (F(1,16) = 0.46,
p = 0.51; partial eta2=0.023). It is also worth noting that there was no
[Figure 3: Mean time (for a test item) by interface.]
[Figure 4: Mean time (for a test item) by entry type (complete: 51.85 s; minimalistic: 47.89 s).]
[Figure 5: Mean time (for a test item) by entry type and interface.]
statistically significant difference between the two entry conditions in the time
needed to find information on a test item in the colour interface. In other
words, when labels were in colour, the subjects spent almost the same time
extracting grammatical information from minimalistic (43 seconds) and com-
plete entries (45.04 seconds).
[Figure 6: Main test results by interface (black and white: 66.69%; colour: 82.55%).]
[Figure 7: Main test results by entry type (complete: 75.29%; minimalistic: 73.95%).]
[Figure 8: Main test results by entry type and interface (black and white: complete 67.89%, minimalistic 65.49%; colour: complete 82.69%, minimalistic 82.40%).]
Figures 6 and 7 present the results obtained in the main test in each test version
and entry condition, respectively.
As can be seen, around 83 percent of the subjects’ answers were correct when
labels were in colour, and only about 67 percent were correct when labels were
in black and white (Figure 6). The difference of almost 24 percent (82.55*100/
66.69 = 123.78) is statistically significant and the effect size is large; nearly 40
percent of the between groups variance can be explained by the manipulated
interface (F(1,16) = 10.42, p = 0.01*; partial eta2 = 0.395). However, the subjects supplied a comparable number of correct answers when minimalistic (74 percent) and complete entries (75 percent) were used (Figure 7). The difference of less
than two percent (75.29*100/73.95 = 101.81) between the two entry conditions
is not statistically significant and accounts for only about one percent of the
within subject variance (F(1,16) = 0.13, p = 0.72; partial eta2 = 0.008).
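The percentage differences reported throughout Section 4 follow a simple ratio convention (a × 100 / b, i.e., one condition expressed as a percentage of the other); a minimal sketch:

```python
def relative_pct(a, b):
    """The ratio convention used in the paper: a as a percentage of b."""
    return round(a * 100 / b, 2)

# Main-test accuracy, colour (82.55%) vs black and white (66.69%):
print(relative_pct(82.55, 66.69))  # 123.78, i.e. almost 24 percent better
# Entry conditions, complete (75.29%) vs minimalistic (73.95%):
print(relative_pct(75.29, 73.95))  # 101.81, i.e. under 2 percent apart
```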
Data on main test results obtained in each entry condition in both test ver-
sions are presented in Figure 8.
Irrespective of whether the subjects were using complete or minimalistic
entries, there were always about a quarter more correct answers in the
colour condition than in the black and white condition (complete entries:
82.69*100/67.89 = 121.80; minimalistic entries: 82.40*100/65.49 = 125.81).
The interaction effect between entry and interface has no statistical significance
(F(1,16) = 0.08, p = 0.78; partial eta2 = 0.005).
4.3 Retention
Retention results in the two test versions and entry conditions are presented in
Figures 9 and 10, respectively.
As can be seen from Figure 9, retention was 29 percent (63.13*100/49.11 = 128.55) better when labels had been in colour in the main test; in the colour condition, over 63 percent of the successfully retrieved grammatical information was retained, while in the black and white condition only about half was. The interface significantly affected the retention of grammar
(F(1,16) = 8.26, p = 0.01*; partial eta2 = 0.340). Figure 10 shows, in turn,
that retention was 11 percent better when the subjects had seen examples in
entries (59.06*100/53.17 = 111.08). In other words, examples enhanced remem-
bering the retrieved grammatical information by 11 percent; retention slightly
exceeded 59 percent when complete entries were used and 53 percent when
minimalistic entries were consulted. The effect of entry type on retention was
not statistically significant (F(1,16) = 1.93, p = 0.18, partial eta2 = 0.108).
Figure 11 presents the results of the retention test when both entry and
interface conditions are taken into account.
Comparable differences in retention (of 25-32 percent) in favour of the
colour interface were observed when the students had been using complete
and minimalistic entries (complete entries: 65.68*100/52.44 = 125.24; minima-
listic entries: 60.57*100/45.78 = 132.31). The effect of colour clearly did not
depend on entry type (F(1,16) = 0.03, p = 0.86; partial eta2 = 0.002).
[Figure 9: Retention test results by interface (black and white: 49.11%; colour: 63.13%).]
[Figure 10: Retention test results by entry type (complete: 59.06%; minimalistic: 53.17%).]
[Figure 11: Retention test results by entry type and interface (colour: complete 65.68%, minimalistic 60.57%; black and white: complete 52.44%, minimalistic 45.78%).]
5. Conclusions
The results of the study show that labels in colour reduce the time of dictionary
lookup for grammatical information, make the search more successful and help
users remember the retrieved grammatical information (cf. research questions 1
and 2). Arguably, then, colours increase germane cognitive load and reduce
extraneous cognitive load, which should be assessed positively. Unfortunately,
the same does not hold for examples. Unlike colours, examples do not speed up
the search for grammatical information in entries. They do not increase the
effectiveness of dictionary consultation, either, and, most surprisingly, do not
help users to remember the grammatical properties of the looked-up words to
an extent which would be statistically significant. In fact, a positive effect of
examples on retention was noted, but it was too small to reach statistical sig-
nificance. Importantly, however, examples do not interfere with colours; the
use of colour in labels has a similar effect regardless of whether entries are rich
or minimalistic. In other words, the effect of functional labels in colour on
search time, information retrieval and retention does not depend on entry
completeness (cf. research question 3).
6. Discussion
grammatical information from being more effective than the search for gram-
mar in the black and white interface. Style labels, even though in colour, were
simply of no use in the task, and did not produce as great a capture effect as
functional labels in colour, which supplied information necessary to do the test
and whose consultation was endogenously motivated. In further experiments,
however, it might be advisable to include distractor tasks, not only cues, so that
the goal-directed search is more difficult to automate. In this way, colour-driven (exogenously motivated) search behaviour might be made more independent of, though naturally not separate from, endogenous control. The interaction between endogenous and exogenous (including colour-driven) control
over search for dictionary information should be carefully examined in the
future.
The study has a number of limitations, some of which suggest other direc-
tions of further research. It centred on the retrieval of grammatical information
from short entries in a grammar-focused task. It is by no means certain that similar results would be obtained if other information categories had to be retrieved from longer entries in different situations of dictionary use. It is
not known to what extent the results are colour-specific, either. Perhaps different effects would be observed if other colours were used in further investigations. The suitability of different colours for highlighting (different types of)
dictionary information seems to be another unexplored area of dictionary
design and use. Introducing flexibility in highlighting depending on individual
user preferences and contexts of dictionary use (e.g., reading comprehension,
writing formal essays, academic papers or informal e-mails, text correction or
proofreading) poses a serious challenge for the successful use of colours in dictionaries, one which researchers should also take up. In the present study, as pointed out above, style labels were in colour, even though they
did not convey information useful in the test. Although they served to reduce
the salience of functional labels in colour, they did not cancel out their positive
effect. Yet, the study cannot answer the question as to whether there should be
any limit on the number of information categories displayed in colour at the
same time, beyond which colours might simply cease to be useful. Finally, only
immediate retention was tested. It remains to be seen whether colours could
enhance retention also in the long run. There is no doubt that an eye-tracking
study is needed to get a much better insight into the role of colours in electronic
dictionary interfaces.
Even more importantly, it might be instructive to empirically test the relative
usefulness of different highlighting methods. In existing dictionaries available
in black and white, various typographic tools usually serve to highlight infor-
mation, e.g., italics, bold print, capital letters, grey (rather than black) font. It is
vital to know if colour facilitates visual search and retention more than such
traditional highlighting techniques. Admittedly, the black and white interface in the current study was not naturalistic, since no highlighting methods were used in it at all.
Acknowledgement
Special thanks go to Mikolaj Placzek MSc, who adapted the Moodle logging
facility to the requirements of the experiment.
Notes
5 Style labels, absent from the entry for professor, are also in purple in LDOCE5.
6 Individual aesthetic feelings, intuition and affiliation (e.g., the main colour on the
website of the institution where a dictionary was compiled) are the reasons usually given
by lexicographers when asked about the rationale behind the choice of specific colours for
entry elements.
7 See Varantola (1994: 609) for a critical assessment of the search options the CD-
ROM offered.
8 Unfortunately, for practical reasons, including the availability of subjects and the
computer laboratory, no pilot study could be conducted. Nonetheless, several academic
teachers and native speakers of English were consulted prior to the experiment to obtain
feedback on the reliability of the test. They considered the test reliable and had no serious
reservations.
9 Test samples are given in the Appendix. They appear in colour in the online version
of the article.
10 In their feature-integration theory of attention, Treisman and Gelade (1980: 99)
assume that “simple features can be detected in parallel with no attention limits” (hence
the name parallel search), so the search for items distinguished by any such feature
remains unaffected by variations in the number of other unmarked items in the display.
They hold that parallel search applies when the searched-for target is a singleton, i.e.,
when it differs from other items in the visual field in one single feature (like colour, size,
motion, e.g., a red item among green items, or a blue letter among red Ts and green Xs).
In such conditions, a target distinguished by a single feature is salient in the search
display and is quickly detected independently of display size. For example, in the search
for a red item among green items, the number of green items makes hardly any differ-
ence (cf. Nagy and Sanchez 1990). As Wolfe (1998a: 17) puts it, “all items can be
processed at once to a level sufficient to distinguish targets from non-targets. The red
item, if present, ‘pops out’ and makes its presence known”. On the other hand, when the
target is defined by a conjunction of features (e.g., a vertical red line in a background of
horizontal red and vertical green lines, or a green T among red Ts and green Xs), a serial
search is used and the time of target detection increases with increasing display size
(Treisman and Gelade 1980: 99; Turatto et al. 2004: 298; Gaspelin et al. 2012: 1464).
Wolfe (1998a: 17) explains that when conjunctions of features come into play, “the
target might be the first item visited by attention. It might be the last item or it
might be any item in between. On average, attention will need to visit half of the
items.” Interestingly, in the study by Jonides and Yantis (1988), singletons defined by
static manipulation of colour did not elicit involuntary and automatic capture of atten-
tion, but only onset singletons did, i.e., those which appeared abruptly. Recently,
Gaspelin et al. (2012) have noted that attention capture effects by colour singletons are greater in parallel than in serial search, while capture effects by abrupt onsets are greater in serial than in parallel search. However, the distinction between the two search
modes (parallel and serial) has often been shown to be difficult to draw in the first place
(Pashler 1987; Townsend 1990; Wolfe 1994, 1998a, 1998b).
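The contrast between the two predicted search-time patterns can be sketched as a toy model (illustrative only; the time units, parameter values and function names are assumptions, not data from the studies cited):

```python
# Toy model of the two search modes described above (arbitrary time units):
# parallel (single-feature) search is flat in display size, while serial
# (conjunction) search grows linearly with it.
def parallel_search_time(display_size: int, base: float = 1.0) -> float:
    # A feature singleton "pops out": detection time ignores display size.
    return base

def serial_search_time(display_size: int, per_item: float = 1.0) -> float:
    # On average, attention visits half of the items before finding the target.
    return per_item * display_size / 2

for n in (4, 16, 64):
    print(n, parallel_search_time(n), serial_search_time(n))
```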
References
Mayor, Michael. (ed.). 2009. Longman Dictionary of Contemporary English. (Fifth edi-
tion.) Harlow: Longman. (LDOCE5) http://www.ldoceonline.com/.
McIntosh, Colin. (ed.). 2013. Cambridge Advanced Learner's Dictionary. (Fourth edition.) Cambridge: Cambridge University Press. (CALD4) http://dictionary.cambridge.org/dictionary/learner-english/.
Perrault, Stephen J. (ed.). 2008. Merriam-Webster’s Advanced Learner’s English Dictionary.
Springfield: Merriam-Webster. (MWALED) http://www.learnersdictionary.com/.
The Phrontistery. http://phrontistery.info/.
Rundell, Michael. (ed.). 2007. Macmillan English Dictionary Online. Oxford: Macmillan
Education. (MEDO) http://www.macmillandictionary.com/.
Sinclair, John. (ed.). 2012. Collins COBUILD Advanced Dictionary. (Seventh edition.)
Boston: Heinle Cengage Learning, Glasgow: Harper Collins Publishers.
(COBUILD7) http://www.myCOBUILD.com/.
Turnbull, Joanna. (ed.). 2010. Oxford Advanced Learner's Dictionary of Current English. (Eighth edition.) Oxford: Oxford University Press. (OALDCE8) http://oald8.oxfordlearnersdictionaries.com/.
B. Other literature
Abel, Andrea. 2013. ‘Electronic Dictionaries for Computer-Assisted Language
Learning’. In Rufus H. Gouws, Ulrich Heid, Wolfgang Schweickard and Herbert
E. Wiegand (eds), Dictionaries. An International Encyclopedia of Lexicography.
Supplementary Volume: Recent Developments with Focus on Electronic and
Computational Lexicography. Berlin, Boston: De Gruyter Mouton, 1115–1136.
Al-Ajmi, Hashan. 2008. ‘The Effectiveness of Dictionary Examples in Decoding: The
Case of Kuwaiti Learners of English’. Lexikos, 18: 15–26.
Almind, Richard. 2005. ‘Designing Internet Dictionaries’. Hermes, 34: 37–54.
Atkins, Beryl T. Sue. 1996. ‘Bilingual Dictionaries: Past, Present and Future’.
In Martin Gellerstam, Jerker Järborg, Sven-Göran Malmgren, Kerstin Norén,
Lena Rogström and Catalina Röjder Papmehl (eds), EURALEX’96 Proceedings:
Papers Submitted to the 7th EURALEX International Congress on Lexicography in
Göteborg, Sweden. Göteborg: Göteborg University, 515–546.
Atkins, Beryl T. Sue and Michael Rundell. 2008. The Oxford Guide to Practical Lexicography. New York: Oxford University Press.
Béjoint, Henri. 1981. ‘The Foreign Student’s Use of Monolingual English Dictionaries:
A Study of Language Needs and Reference Skills’. Applied Linguistics, 2.3: 207–222.
Bergenholtz, Henning and Mia Johnsen. 2013. ‘User Research in the Field of Electronic
Dictionaries: Methods, First Results, Proposals’. In Rufus H. Gouws, Ulrich Heid,
Wolfgang Schweickard and Herbert E. Wiegand (eds), Dictionaries. An International
Encyclopedia of Lexicography. Supplementary Volume: Recent Developments with
Focus on Electronic and Computational Lexicography. Berlin, Boston: De Gruyter
Mouton, 556–568.
Bergenholtz, Henning and Sven Tarp (eds) 1995. Manual of Specialised Lexicography:
The Preparation of Specialised Dictionaries. Amsterdam: Benjamins.
Bogaards, Paul. 1996. ‘Dictionaries for Learners of English’. International Journal of
Lexicography, 9.4: 277–320.
Bogaards, Paul. 2003. ‘MEDAL: The Fifth Dictionary for Learners of English’.
International Journal of Lexicography, 16.1: 43–55.
Nesi, Hilary. 2011. ‘The Effect of E-dictionary Font on Vocabulary Retention’. Paper
read at E-Lex2011. Electronic Lexicography in the 21st Century: New Applications for
New Users. Bled. http://videolectures.net/elex2011_nesi_effect/.
O’Sullivan, Íde. 2007. ‘Enhancing a Process-oriented Approach to Literacy and Language
Learning: The Role of Corpus Consultation Literacy’. ReCALL, 19.3: 269–286.
Ostermann, Carolin. 2014. ‘Frame Semantics and Learner’s Dictionaries: Frame
Example Sections as a New Dictionary Feature’. Paper read at Euralex 2014
Congress, July 2014, Bolzano, Italy.
Pashler, Harold. 1987. ‘Detecting Conjunctions of Color and Form: Reassessing the Serial Search Hypothesis’. Perception and Psychophysics, 41: 191–201.
Posner, Michael I. 1980. ‘Orienting of Attention’. The Quarterly Journal of Experimental
Psychology, 32.1: 3–25.
Potter, Liz. 1998. ‘Setting a Good Example: What Kind of Examples Best Serve the
Users of Learners’ Dictionaries?’ In Thierry Fontenelle, Philippe Hiligsman,
Archibald Michels, André Moulin and Siegfried Theissen (eds), Actes Euralex’98
Proceedings. Liège: Cambridge University Press, 357–362.
Prinsloo, Daniel. 2013. ‘New Developments in the Selection of Examples’. In Rufus
H. Gouws, Ulrich Heid, Wolfgang Schweickard and Herbert E. Wiegand (eds),
Dictionaries. An International Encyclopedia of Lexicography. Supplementary
Volume: Recent Developments with Focus on Electronic and Computational
Lexicography. Berlin, Boston: De Gruyter Mouton, 509–516.
Prinsloo, Daniel and Rufus H. Gouws. 2000. ‘The Use of Examples in Polyfunctional
Dictionaries’. Lexikos, 10: 138–156.
Rundell, Michael. 2012. ‘It Works in Practice but Will It Work in Theory? The Uneasy
Relationship between Lexicography and Matters Theoretical’. In Ruth Vatvedt Fjeld
and Julie Matilde Torjusen (eds), Proceedings of the 15th EURALEX Congress. Oslo:
University of Oslo, 47–92.
Rundell, Michael and Adam Kilgarriff. 2011. ‘Automating the Creation of Dictionaries:
Where will It all End?’ In Fanny Meunier, Sylvie De Cock, Gaëtanelle Gilquin and
Magali Paquot (eds), A Taste for Corpora. In Honour of Sylviane Granger. Université
Catholique de Louvain: John Benjamins, 257–281.
Sinclair, John. 1983. ‘Lexicography as an Academic Subject’. In Reinhard
R. K. Hartmann (ed.), LEXeter ’83 Proceedings. Tübingen: Niemeyer, 3–12.
Sinclair, John. 1984. ‘Naturalness in Language’. In Jan Aarts and Willem Meijs (eds),
Corpus Linguistics: Recent Developments in the Use of Computer Corpora in English
Language Research. Amsterdam: Rodopi, 203–210.
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Summers, Della. 1988. ‘The Role of Dictionaries in Language Learning’.
In Ronald Carter and Michael McCarthy (eds), Vocabulary and Language
Learning. London: Longman, 111–125.
Svensén, Bo. 2009. Practical Lexicography: Principles and Methods of Dictionary-
making. Oxford: Oxford University Press.
Sweller, John. 2010. ‘Element Interactivity and Intrinsic, Extraneous and Germane
Cognitive Load’. Educational Psychology Review, 22.2: 123–138.
Sweller, John, Jeroen van Merriënboer and Fred Paas. 1998. ‘Cognitive Architecture and Instructional Design’. Educational Psychology Review, 10.3: 251–296.
Tamborello, Franklin and Michael Byrne. 2007. ‘Adaptive but Non-optimal
Visual Search Behavior with Highlighted Displays’. Cognitive Systems Research, 8:
182–191.
Tarp, Sven. 2011. ‘Lexicographical and Other E-tools for Consultation Purposes:
Towards the Individualization of Needs Satisfaction’. In Pedro Fuertes-Olivera and
Henning Bergenholtz (eds), E-lexicography: The Internet, Digital Initiatives and
Lexicography. London, New York: Continuum, 54–70.
Tono, Yukio. 2009. ‘Pocket Electronic Dictionaries in Japan: User Perspectives’.
In Henning Bergenholtz, Sandro Nielsen and Sven Tarp (eds), Lexicography at a
Crossroads: Dictionaries and Encyclopedias Today, Lexicographical Tools
Tomorrow. Bern: Peter Lang Verlag, 33–67.
Tono, Yukio. 2011. ‘Application of Eye-tracking in EFL Learners’ Dictionary Look-up
Process Research’. International Journal of Lexicography, 24.1: 124–153.
Townsend, James T. 1990. ‘Serial and Parallel Processing: Sometimes They Look Like Tweedledum and Tweedledee but They Can (and Should) Be Distinguished’. Psychological Science, 1: 46–54.
Trap-Jensen, Lars. 2010. ‘One, Two, Many: Customization and User Profiles in Internet
Dictionaries’. In Anne Dykstra and Tanneke Schoonheim (eds), Proceedings of the
XIV Euralex International Congress. Ljouwert: Afûk, 1133–1143.
Treisman, Anne M. and Garry Gelade. 1980. ‘A Feature-integration Theory of
Attention’. Cognitive Psychology, 12: 97–136.
Turatto, Massimo, Giovanni Galfano, Simona Gardini and Gian Gastone Mascetti. 2004.
‘Stimulus-driven Attentional Capture: An Empirical Comparison of Display-size and
Distance Methods’. The Quarterly Journal of Experimental Psychology, 57A.2:
297–324.
Varantola, Krista. 1994. ‘The Dictionary User as Decision Maker’. In Willy Martin,
Willem Meijs, Margreet Moerland, Elsemiek ten Pas, Piet van Sterkenburg and
Piek Vossen (eds), Proceedings of the 6th EURALEX International Congress.
Amsterdam: Vrije Universiteit Amsterdam, 606–611.
Verlinde, Serge, Patrick Leroyer and Jean Binon. 2010. ‘Search and You Will Find.
From Stand-alone Lexicographic Tools to User Driven Task and Problem-oriented
Multifunctional Leximats’. International Journal of Lexicography, 23.1: 1–17.
Wolfe, Jeremy M. 1994. ‘Guided Search 2.0: A Revised Model of Visual Search’.
Psychonomic Bulletin and Review, 1: 202–238.
Wolfe, Jeremy M. 1998a. ‘Visual Search’. In Harold Pashler (ed.), Attention. London: University College London Press, 13–73.
Wolfe, Jeremy M. 1998b. ‘What Can 1 Million Trials Tell Us about Visual Search?’
Psychological Science, 9: 33–39.
Yamada, Shigeru. 2013. ‘Monolingual Learners’ Dictionaries: Where Now?’
In Howard Jackson (ed.), The Bloomsbury Companion to Lexicography. London:
Bloomsbury Publishing, 188–212.
Yantis, Steven. 1993. ‘Stimulus-driven Attentional Capture and Attentional Control
Settings’. Journal of Experimental Psychology: Human Perception and Performance,
19.3: 676–681.
Zgusta, Ladislav. 1971. Manual of Lexicography. The Hague: Mouton.
Appendix
Sample 1b: Main test, labels in black and white, complete entry
Sample 2b: Main test, labels in black and white, minimalistic entry