Professional Documents
Culture Documents
Reading Time and Vocabulary Rating in The Japanese Language: Large-Scale Reading Time Data Collection Using Crowdsourcing
Reading Time and Vocabulary Rating in The Japanese Language: Large-Scale Reading Time Data Collection Using Crowdsourcing
Abstract
This study examined the effect of the differences in human vocabulary on reading time. This study conducted a word
familiarity survey and applied a generalised linear mixed model to the participant ratings, assuming vocabulary to be a random
effect of the participants. Following this, the participants took part in a self-paced reading task, and their reading times were
recorded. The results clarified the effect of vocabulary differences on reading time.
5178
tures (Miyauchi et al., 2017), syntactic and seman- 2014) for stimulus sentences. OW is one copyright-free
tic categories (Kato et al., 2018), and co-reference in- sample in BCCWJ. For OT, 38 samples (17 elementary,
formation, including exophora and clause boundaries 9 middle, and 12 high school) of Japanese language
(Matsumoto et al., 2018). Annotation labels and read- classes (contemporary writing) were used. Reading
ing time were compared (Asahara, 2017; Asahara and time data on these stimulus texts were collected, as-
Kato, 2017; Asahara, 2018b; Asahara, 2018a). How- suming that anyone who had received Japanese lan-
ever, the text amount and number of participants were guage education in Japan had read them in the past.
limited. This study aimed to increase the reliability This study assumed that it would be used as contrast
of the results by increasing the amount of data. Eye- data for readability evaluation in Japanese language ed-
tracking experiments could not be performed in the ucation after obtaining data on people who received
laboratory due to the COVID-19 pandemic; therefore, Japanese language education overseas or learners of
large-scale self-paced reading time data were collected the Japanese language. For PB, 83 samples of BC-
via crowdsourcing to precede the Natural Stories Cor- CWJ core data were used. The study was conducted
pus (Futrell et al., 2018). based on various annotations, such as dependency on
Following this, datasets of word familiarity ratings these data. It was assumed that the data would be used
were surveyed. The NTT Database Series: Lexical for commercial purposes, limited by the reading time
Properties of Japanese (Nihongo-no Goitokusei) (NTT data of PN. Table 1 shows the number of samples, sen-
Communication Science Laboratories, 1999 2008) is tences, and phrases in the stimulus sentences (for the
the world’s largest database that examines lexical fea- number of recruited participants, see the next section).
tures from a variety of perspectives, aiming to clarify For comparison, the Natural Stories Corpus sampled
Japanese language functions. In addition, it contains approximately 1,000 words. The present data aver-
subjective data, such as word familiarity, orthography- aged over 1,000 characters as OW, OT, and PB samples
type appropriateness, word accent appropriateness, in BCCWJ. For each stimulus sentence, two questions
kanji familiarity, complexity, reading appropriateness, with ‘yes’ or ‘no’ answers were established to check
and word mental image, and objective data based on whether the participants read the content correctly.
the frequency of vocabulary as it appears in newspa-
pers. The Word Familiarity Database (Heisei Era Ver- 3.1.2. Self-Paced Reading Method
sion) (Amano and Kondo, 1998)(Amano and Kondo,
1999; Amano and Kondo, 2008) is an advanced lexical
database on the familiarity of vocabulary. The Word
Familiarity Database (Reiwa Era Version) (NTT Com-
munication Science Laboratories, 2021) was created,
as it was noted that people’s perceptions of vocabu-
lary changed since the first survey. The word mental
image characteristic database collected information on
the ‘ease of sensory imagery of semantic content’ for
written and spoken stimuli (Sakuma et al., 2008). In
addition, NINJAL has been continuously working on
the estimation of word familiarity for the WLSP (Asa-
hara, 2019) and has published several lexical tables.
5179
or over 3,000 ms4 were eliminated. Consequently, the
Table 2: The statistics of the participants
number of cases shown in ‘Objects of analysis’ in the
registers participants per sample table was considered appropriate data points.
OW White paper 425
OT Textbooks 200 3.2.2. Format of the Reading Time Data
PB Books 200 Table 5 shows the format of the experimental data.
The data are in the TSV form. BCCWJ Sample ID
and BCCWJ start are the location information in BC-
experiment was conducted (Asahara, 2019) in section
CWJ. This information enabled the comparison of mor-
4.1. This study recruited 2,092 people with a large
phological information and various annotation data in
variance in word-familiarity responses. This research
BCCWJ. SPR sentence ID and SPR bunsetsu ID are
was conducted as a preliminary experiment, as eligible
the sentence ID and phrase ID in the experiment,
participants could be selected in advance, and the vo-
which indicated the order of presentation in the ex-
cabulary of each participant could be estimated when
periment. SPR surface is a surface form. OT and
estimating word familiarity. For OW, this study re-
PB are masked so that only the number of charac-
cruited 500 people on a trial basis, and 427 people
ters (SPR word length) can be seen. SPR reading time
participated. For OTs and PBs, this study recruited
is the reading time to be analysed. In addition, a
200 people.3 This study recruited 100 participants who
logarithmic reading time (SPR log reading time) was
answered ‘yes’ to content-checking questions and 100
generated during the analysis. SPR instructiontime is
participants who answered ‘no’. The number of partic-
the time when the first instruction was given. Read-
ipants are presented in Table 2.
ing instructions were skipped in the second and sub-
3.2. Organising the Reading Time Data sequent experiments. SPR QA question is a ‘yes’
3.2.1. Eliminating Inappropriate Data or ‘no’ question for checking during the experi-
ment, and SPR QA answer is the correct answer.
SPR QA correct is a flag indicating whether the par-
Table 3: Elimination of inappropriate samples ticipant answered the ‘yes’ or ‘no’ question correctly.
Description OW OT PB The published data comprised only correct answers.
All samples 425 7,453 16,553 SPR QA qatime is the time required for the partici-
(1) disagree 54 278 199 pant to answer the ‘yes’ or ‘no’ question. SPR subj ID
(2) multiple time 0 46 64 is the participant ID. SPR averageRT is the aver-
(3) <150ms or 2,000ms< 15 1,439 2,875 age reading time per sample during the experiment.
(4) incorrect answer 48 825 2,090 SPR timestamp indicates the time when the experi-
Appropriate samples 308 4,865 11,325 ment was performed. SPR trial indicates the num-
ber of times the reading time of the genre was mea-
As the data in this study were collected online, it was sured. SPR control is the method of presenting in-
assumed that data quality was worse than data collected formation during the experiment (whether there was a
in person. Therefore, inappropriate data were elimi- space between phrases). To examine the impact of de-
nated, including (1) data of respondents who did not pendency on reading time, the information of BCCWJ-
agree with the method of handling the experimental DepPara (Asahara and Matsumoto, 2016) was added to
data and the reward payment method at the start of the the core data OW/PB. As BCCWJ-DepPara defines a
experiment, (2) repeated submissions by the same par- sentence boundary different from BCCWJ, there was a
ticipant, with only the first submission retained, (3) re- discrepancy with the phrase ID when presented (Dep-
sponses with an average reading time per sample below Para bid, DepPara depid). The number of dependen-
150 ms or above 2,000 ms, and (4) data of respondents cies (DepPara depnum) was based on the BCCWJ-
who answers the ‘yes’ or ‘no’ question incorrectly. DepPara standard. For OT, textbooks of school type
(BCCWJ OT school type, OT01: elementary, OT02:
Table 4: Elimination of inappropriate data points middle, OT03: high school) were set as the object of
Description OW OT PB analysis.
Appropriate samples 140,449 6,114,814 11,309,783
(5) <100 or 3,000< 3,653 409,916 540,403 3.3. Preliminary Analysis of the Reading
Objects of analysis 136,797 5,704,898 10,769,380 Time Data
3.3.1. Analytical Method
Inappropriate data for each data point were eliminated. This section presents the preliminary analysis results
The ‘Appropriate samples’ row in Table 4 shows the of a frequency-based analytical method (Baayen, 2008;
number of data point contained in the appropriate sam- Vasishth et al., 2021). Reading times were exam-
ple. An additional five points that were below 100 ms
4
As a criterion, we referred to the Natural Stories Cor-
3
In some samples, 200 or more people were recruited due pus(Futrell et al., 2018), which removed data points below
to a mistake made in the experimental set-up in OT and PB. 100 ms or above 3,000 ms.
5180
Table 5: Data format
column name description OW OT PB
BCCWJ Sample ID sample ID in BCCWJ ✓ ✓ ✓
BCCWJ start position in BCCWJ ✓ ✓ ✓
SPR sentence ID sentence ID ✓ ✓ ✓
SPR bunsetsu ID phrase ID ✓ ✓ ✓
SPR surface surface form ✓ masked masked
SPR word length number of characters ✓ ✓ ✓
SPR sentence sentence ✓ masked masked
SPR reading time reading time ✓ ✓ ✓
SPR log reading time logarithm of reading time ✓ ✓ ✓
SPR instruction time instruction time ✓ ✓ ✓
SPR QA question YES/NO question ✓ ✓ ✓
SPR QA answer correct answer of YES/NO question ✓ ✓ ✓
SPR QA correct correct or not in YES/NO question ✓ ✓ ✓
SPR QA qa time time of YES/NO question ✓ ✓ ✓
SPR subj ID participant ID ✓ ✓ ✓
SPR averageRT average reading time in a sample ✓ ✓ ✓
SPR timestamp timestamp of experiment ✓ ✓ ✓
SPR trial trial number per subject N/A ✓ ✓
SPR control presentation method ✓ ✓ ✓
DepPara bid phrase id in treebank ✓ N/A ✓
DepPara depid dependent id in treebank ✓ N/A ✓
DepPara depnum the number of dependents ✓ N/A ✓
BCCWJ OT school type school types for textbooks N/A ✓ N/A
ined using a generalised linear mixed model (R (R or absence of significant differences. Values in paren-
Core Team, 2020), lme4 (Bates et al., 2015), and theses are standard deviation. The presentation or-
stargazer (Hlavac, 2018)). SPR sentence ID (experi- der (SPR sentence ID, SPR bunsetsu ID) in the sam-
mental sentence ID) and SPR bunsetsu ID (experimen- ples tended to have a shorter reading time with any
tal phrase ID), which present information of the presen- register. There was also a tendency for the reading
tation order, and SPR word length, which is the num- times to increase as the length of words increased
ber of words of surface form, were used as fixed ef- (SPR word length). The correlation between OW and
fects. For OW and PB, the number of dependencies PB and the number of dependencies of the phrase (Dep-
of the phrase, DepPara depnum, were considered. For Para depnum) was examined. In both results, read-
OT, school type, SPR OT school type, was considered. ing times tended to be shorter for phrases with a large
For OT and PB, the trial order, SPR trial, was mod- number of dependencies. For OT and PB, the trial or-
elled as a fixed effect when the same participant read der when the same participant read multiple samples,
multiple samples. This study considered SPR subj ID SPR trial, was considered as a fixed effect. For OT,
(participant ID) as a random effect to model individ- the reading times decreased as the number of trials in-
ual differences between participants. For OT/PB, BC- creased; however, for PB, the reading times increased
CWJ Sample ID (sample ID of BCCWJ) was used as a as the number of trials increased. For OT, school
random effect to model individual differences between type, SPR OT school type, was considered. Partici-
samples. The analytical formula is as follows: pants tended to spend less time reading high school
textbooks than elementary school textbooks. For OT, it
was easy to proceed in the order of elementary school
SPR reading time ∼ SPR sentence ID → middle school → high school on selection screens.
+ SPR bunsetsu ID + SPR word length There was a correlation between SPR trial and the ele-
+ SPR trial + DepPara depnum mentary, middle, and high school types. This may have
led to discrepancies between the registers in the effect
+ BCCWJ OT school type+
of the number of trials in reading times. The partici-
+ (1|SPR subj ID factor) pants tried OW, OT, and PB in a specified order. In the
+ (1|BCCWJ Sample ID). (1) first experiment, the reading times tended to be longer
than those of the other registers. OT includes textbooks
Data points with values outside 3SD were eliminated likely to be encountered during Japanese language ed-
and the analysis was conducted again. ucation in Japan. However, due to the influence of
3.3.2. Results the presentation order, the reading times tended to be
Table 6 and 7 shows the analysis results, including longer than that of PB. For books, comparisons were
the estimates for the fixed effects with the presence made between genres using the Nippon Decimal Clas-
5181
Table 6: Preliminary analysis results of reading time by GLMM
Dependent variable:
SPR reading time
OW (white paper) OT (textbook) PB (book)
SPR sentence ID −6.042∗∗∗ (0.048) −0.125∗∗∗ (0.0004) −0.143∗∗∗ (0.001)
SPR bunsetsu ID −1.477∗∗∗ (0.046) −2.047∗∗∗ (0.011) −0.856∗∗∗ (0.006)
SPR word length 24.311∗∗∗ (0.160) 5.113∗∗∗ (0.021) 6.705∗∗∗ (0.014)
DepPara depnum −15.048∗∗∗ (0.555) −5.225∗∗∗ (0.033)
SPR trial −0.760∗∗∗ (0.005) 0.379∗∗∗ (0.006)
BCCWJ OT school typeOT02 −7.721 (8.898)
BCCWJ OT school typeOT03 −25.228∗∗∗ (8.352)
Constant 540.050∗∗∗ (11.951) 361.566∗∗∗ (7.155) 306.887∗∗∗ (5.321)
data points 133,806 5,617,794 10,701,504
elimination rate (outside 3SD) 1,726 (0.0126) 87,103 (0.0152) 168,671 (0.0155)
log-likelihood −897,781.800 −34,071,483.000 −64,679,004.000
∗∗∗
Note: p<0.01
sification and other methods. In this design, the judgments were split between
character-based (WRITE and READ) and voice-based
4. Reading Time with Vocabulary Rating (SPEAK and LISTEN) judgments, and between pro-
duction (WRITE and SPEAK) and reception (READ
4.1. Building Vocabulary Rating Data and LISTEN) judgments. The participants gave
This study quantified the bias of the participants ob- five ratings for each factor, ranging from 5 (well-
tained during the collection of word familiarity data known/often used) to 1 (little known/rarely used).
(Asahara, 2019) and used it for vocabulary data. The collected data were modelled in the following
Specifically, data on the extent to which participants ways: word familiarity as a random effect of 84,114
knew, wrote, read, spoke, and heard the target word surface forms5 and vocabulary as a random effect
were collected. The list included 96,557 target words of 6,732 participants using a Bayesian linear mixed
taken from the WLSP. Voice data (oral pronunciations) model. The graphical model used to estimate the rat-
for the lexical entries were not included; however, ings is shown in Figure 4: Nword is the number of
speech and hearing were considered as two of the fol- words (surface forms), Nsubj is the number of partici-
lowing five perspectives: pants, Index i : 1 . . . Nword is the index of words, index
j : 1 . . . Nsubj is the index of participants, and y (i)(j)
KNOW: How much do you know about the target is the rating of KNOW, WRITE, READ, SPEAK, LIS-
word? TEN. y was generated by a normal distribution with
µ(i)(j) and σ, as follows:
WRITE: How often do you write the target word?
y (i)(j) ∼ N ormal(µ(i)(j) , σ).
READ: How often do you read the target word?
σ is a hyper-parameter of the standard deviation and
SPEAK: How often do you speak the target word?
5
96,557 words include several homonyms and consist of
LISTEN: How often do you hear the target word? 84,114 surface forms.
5182
KNOW WRITE+SPEAK WRITE+READ
KNOW, WRITE, READ, SPEAK, and
5000
600 1200
LISTEN are the original estimates.
1000
Frequency
Frequency
Frequency
2000
WRITE+SPEAK, READ+LISTEN,
0 400
and WRITE+SPEAK-READ-LISTEN
are for production or reception.
0
-3 -1 0 1 2 3 -4 -2 0 2 4 -4 -2 0 2 4
WRITE+READ, SPEAK+LISTEN,
and WRITE+READ-SPEAK-LISTEN
are for text or speech.
2500
2500
1000
1000
Frequency
Frequency
Frequency
Frequency
1000
1000
0 400
0 400
0
-3 -1 0 1 2 3 -3 -1 0 1 2 3 -4 -2 0 2 4 -4 -2 0 2 4
WRITE+SPEAK-READ-LISTEN WRITE+READ-SPEAK-LISTEN
SPEAK LISTEN
6000
6000
2500
Frequency
Frequency
4000
4000
Frequency
Frequency
1000
1000
2000
2000
0
0
-3 -1 0 1 2 3 -3 -1 0 1 2 3
-4 -2 0 2 4 -4 -2 0 2 4
Frequency
Frequency
15
20
10
10
5
5
5
0
-3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3
(i) (i)
µ(i)(j) is a linear formula of slopes γsubj , slopes γword , Figure 2 shows the estimated word familiarities. Fig-
and an intercept α: ure 3 shows the distribution of the vocabulary of those
(i) (j)
who participated in the collection of reading time data.
µ(i)(j)=α+γword +γsubj . The frequentist model for the word familiarity rate es-
timation could not be constructed due to the number of
The slopes were modelled by a normal distribution with
parameters in the environment.
the hyper-parameters of µword , σword , µsubj , and σsubj
(means and standard deviations): The data is available at https://github.com/
masayu-a/WLSP-familiarity.
(i)
γword ∼ N ormal(µword , σword ),
(j)
γsubj ∼ N ormal(µsubj , σsubj ). 4.2. Reading Time with Vocabulary Rating
(i)
The word familiarity rates were composed by γword . Data of the participants who participated in the word fa-
The biases of subject participants were modelled by miliarity survey, including over 200 answers, and over
(j)
γsubj . five samples of the reading time experiment were anal-
This study used a normal distribution with a standard ysed. Table 8 shows the statistics of the participants,
deviation of 1.0 for words and 0.5 for the participants. and Table 9 shows the data with vocabulary ratings.
5183
(𝑖𝑖)
𝛾𝛾𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤 ~ 𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁(𝜇𝜇𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤 , 𝜎𝜎𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤 )
𝛼𝛼 𝜎𝜎
(𝑗𝑗)
𝛾𝛾𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 ~ 𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁(𝜇𝜇𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 , 𝜎𝜎𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 )
𝜇𝜇𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤
𝑖𝑖 ∶ 1 … 𝑁𝑁𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤 (𝑖𝑖) (𝑗𝑗)
𝜇𝜇 (𝑖𝑖)(𝑗𝑗) = 𝛼𝛼 + 𝛾𝛾𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤 + 𝛾𝛾𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠
𝜎𝜎𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤
(𝑖𝑖)
𝛾𝛾𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤 𝑦𝑦 (𝑖𝑖)(𝑗𝑗) ~ 𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁(𝜇𝜇 (𝑖𝑖)(𝑗𝑗) , 𝜎𝜎)
(𝑗𝑗)
𝜎𝜎𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝛾𝛾𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠
𝑗𝑗: 1 … 𝑁𝑁𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠
5184
Table 10: Analysis results of reading time with vocabulary rating by GLMM
Dependent variable:
SPR reading time
OW (white paper) OT (textbooks) PB (books)
SPR sentence ID −6.087∗∗∗ (0.051) −0.127∗∗∗ (0.0004) −0.142∗∗∗ (0.001)
SPR bunsetsu ID −1.501∗∗∗ (0.049) −2.046∗∗∗ (0.011) −0.856∗∗∗ (0.006)
SPR word length 24.820∗∗∗ (0.170) 5.170∗∗∗ (0.021) 6.798∗∗∗ (0.015)
SPR trial −0.757∗∗∗ (0.005) 0.382∗∗∗ (0.006)
DepPara depnum −15.310∗∗∗ (0.591) −5.258∗∗∗ (0.034)
WFR subj rate (Vocab) −81.239∗∗∗ (21.227) −16.169 ∗
(9.087) −18.405∗∗ (8.731)
Constant 558.984∗∗∗ (12.936) 353.723∗∗∗ (6.548) 306.631∗∗∗ (5.425)
data points 121,769 5,407,252 10,321,560
elimination rate (outside 3SD) 2,732 (0.0219) 83,724 (0.0152) 162,740 (0.0155)
log-likelihood −818,815.100 −32,796,021.000 −62,393,234.000
∗
Note: p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01
Table 11: Analysis results of logarithm reading time with vocabulary rating by GLMM
Dependent variable:
SPR log reading time
OW (white paper) OT (textbooks) PB (books)
SPR sentence ID −0.012∗∗∗ (0.0001) −0.0004∗∗∗ (0.00000) −0.0004∗∗∗ (0.00000)
SPR bunsetsu ID −0.003∗∗∗ (0.0001) −0.006∗∗∗ (0.00003) −0.003∗∗∗ (0.00002)
SPR word length 0.036∗∗∗ (0.0002) 0.011∗∗∗ (0.0001) 0.014∗∗∗ (0.00004)
SPR trial −0.002∗∗∗ (0.00001) 0.001∗∗∗ (0.00002)
DepPara depnum −0.022∗∗∗ (0.001) −0.012∗∗∗ (0.0001)
WFR subj rate (Vocab) −0.180∗∗∗ (0.040) −0.052 ∗∗
(0.025) −0.052∗∗ (0.026)
Constant 6.255∗∗∗ (0.025) 5.826∗∗∗ (0.020) 5.664∗∗∗ (0.016)
data points 135,070 5,412,398 10,327,584
elimination rate (outside 3SD) 1,559 (0.0125) 78,578 (0.0143) 156,716 (0.0149)
log-likelihood −38,816.700 −598,180.400 −743,585.500
∗
Note: p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01
and vocabulary in Japanese was developed. Before based Computational Psycholinguistics Using Annota-
the reading time experiments, word familiarity rate ex- tion Data’.
periments were conducted with the same participants.
The effect of vocabulary on reading times in Japanese 7. Appendix: Results of Bayesian Linear
was investigated by conducting a large-scale survey Mixed Model
with multi-register materials. The results of the study
confirmed that the group with a large vocabulary had
shorter reading times. Table 12: Analysis results of reading time with vocab-
The data without the original textdata is avail- ulary rating by BLMM of lognormal
able at https://github.com/masayu-a/ OW(white paper) mean se mean sd
BCCWJ-SPR26 . Further analyses will be conducted α intercept 6.217 0.064 0.123
by comparing the data with various annotations of BC- βsentid -0.012 0.000 0.002
CWJ. The Bayesian analysis will also be conducted. In βbid -0.003 0.000 0.001
addition, we hope to digitise the process of acquiring βlength 0.037 0.000 0.006
language reading ability by collecting the reading βdependency -0.024 0.001 0.009
times of L1 and L2 learners. βsubjrate -0.118 0.094 0.116
σ 0.988 0.075 0.097
6. Acknowledgments σsubj 2.603 1.020 1.254
We thank the anonymous reviewers for their valuable
feedback. This work has been partially supported A Bayesian linear mixed model evaluation (Sorensen
by JSPS KAKENHI (Grant Number JP18H05521 et al., 2016) by rstan was conducted. The following
and JP22H00663) and is NINJAL project ‘Evidence- lognormal model was used:
6
The original text is available at https://clrd. model {
ninjal.ac.jp/bccwj/en/. real mu;
5185
// prior Bates, D., Mächler, M., Bolker, B., and Walker, S.
gamma_subj ˜ normal(0,sigma_subj); (2015). Fitting linear mixed-effects models using
for (k in 1:N) { // lme4. Journal of Statistical Software, 67:1–48.
mu = alpha + beta_length * length[k] +
Cop, U., Dirix, N., Driegphe, D., and Duyck, W.
beta_dependent * dependent[k] +
(2017). Presenting geco: An eyetracking corpus of
beta_sentid * sentid[k] +
beta_bid * bid[k] + monolingual and bilingual sentence reading. Behav-
beta_subjrate * subjrate[k] + ior Research Methods, 49:602–615.
gamma_subj[subjid[k]]; Frank, S. L., Monsalve, I. F., Thompson, R. L., and
time[k] ˜ lognormal(mu,sigma); Vigliocco, G. (2013). Reading-time data for eval-
} uating broad-coverage models of english sentence
} processing. Behavior Research Methods, 45:1182–
1190.
The results of OW (white paper) are presented in Ta- Futrell, R., Gibson, E., Tily, H. J., Blank, I., Vishnevet-
ble12. The results were same as the frequentist model sky, A., Piantadosi, S., and Fedorenko, E. (2018).
results. In addition, the blmm evaluation was per- The Natural Stories Corpus. In Proceedings of the
formed on OT and PB. The models of OT and PB did Eleventh International Conference on Language Re-
not converge. Further studies will investigate more so- sources and Evaluation (LREC 2018), Miyazaki,
phisticated statistical models with the blmm. Japan, May. European Language Resources Associ-
ation (ELRA).
8. Bibliographical References Futrell, R., Gibson, E., Tily, H. J., Blank, I., Vishnevet-
sky, A., Piantadosi, S. T., and Fedorenko, E. (2021).
Amano, S. and Kondo, T. (1998). Estimation of Men-
The natural stories corpus: A reading-time corpus of
tal Lexicon Size with Word Familiarity Database. In
english texts containing rare syntactic constructions.
Proceedings of International Conference on Spoken
Language Resources and Evaluation, 55(1):63–77.
Language Processing, volume 5, pages 2119–2122.
Hlavac, M. (2018). stargazer: Well-formatted regres-
Asahara, M. and Kato, S. (2017). Between Reading
sion and summary statistics tables. R package ver-
Time and Syntactic / Semantic Categories. In Pro-
sion 5.2.2.
ceedings of IJCNLP, pages 404–412.
Asahara, M. and Matsumoto, Y. (2016). BCCWJ- Husain, S., Vasishth, S., and Srinivasan, N. (2015). In-
DepPara: A syntactic annotation treebank on tegration and prediction dificulty in Hindi sentence
the ‘Balanced Corpus of Contemporary Written comprehension: Evidence from an eye-tracking cor-
Japanese’. In Proceedings of the 12th Workshop on pus. Journal of Eye Movement Research, 8(3):1–12.
Asian Language Resources (ALR12), pages 49–58, Kato, S., Asahara, M., and Yamazaki, M. (2018). An-
Osaka, Japan, December. The COLING 2016 Orga- notation of ‘word list by semantic principles’ la-
nizing Committee. bels for balanced corpus of contemporary written
Asahara, M., Ono, H., and Miyamoto, E. T. (2016). japanese. In Proceedings of the 32nd Pacific Asia
Reading-Time Annotations for “Balanced Corpus of Conference on Language, Information, and Compu-
Contemporary Written Japanese”. In Proceedings of tation (PACLIC 32).
COLING, pages 684–694. Kennedy, A. and Pynte, J. (2005). Parafoveal-on-
Asahara, M. (2017). Between reading time and infor- foveal effects in normal reading. Vision Research,
mation structure. In Proceedings of PACLIC, pages 45(2):153–168.
15–24. Kliegl, R., Grabner, E., Rolfs, M., and Engbert, R.
Asahara, M. (2018a). Between reading time and (2004). Length, frequency, and predictability effects
clause boundaries in Japanese - wrap-up effect in of words on eye movements in reading. European
a head-final language. In Proceedings of PACLIC, Journal of Cognitive Psychology, 16(1-2):262–284.
pages 19–27. Kliegl, R., Nuthmann, A., and Engbert, R. (2006).
Asahara, M. (2018b). Between Reading Time and Tracking the mind during reading: The influence
Zero Exophora in Japanese. In Proceedings of of past, present, and future words on fixation dura-
READ2018, pages 34–36. tions. Journal of Experimental Psychology: Gen-
Asahara, M. (2019). Word familiarity rate estima- eral, 135(1):12–35.
tion using a Bayesian linear mixed model. In Pro- Kuperman, V., Dambacher, M., Nuthmann, A., and
ceedings of the First Workshop on Aggregating Kliegl, R. (2010). The effect of word position
and Analysing Crowdsourced Annotations for NLP, on eye-movements in sentence and paragraph read-
pages 6–14, Hong Kong, November. Association for ing. Quarterly Journal of Experimental Psychology,
Computational Linguistics. 63:1838–1857.
Baayen, R. H. (2008). Analyzing Linguistic Data: A Laurinavichyute, A. K., Sekerina, I. A., Alexeeva,
practical Introduction to Statistics using R. Cam- S., Bagdasaryan, K., and Kliegl, R. (2019). Rus-
bridge University Press. sian sentence corpus: Benchmark measures of eye
5186
movements in reading in russian. Behavior Research ROM]. University of Dundee, Psychology Depart-
Methods, 51(3):1161–1178. ment.
Luke, S. G. and Christianson, K. (2018). The Steven Luke. (2017). The Provo Corpus: A Large Eye-
provo corpus: A large eye-tracking corpus with Tracking Corpus with Predictability Norms. OSF.
predictability norms. Behavior Research Methods, NTT Communication Science Laboratories. (2021).
50:826–833. NTT. NTT Goi Database.
Maekawa, K., Yamazaki, M., Ogiso, T., Maruyama, T., Naoko Sakuma and Mutsuo Ijuin and Takao Fushimi
Ogura, H., Kashino, W., Koiso, H., Yamaguchi, M., and Masayuki Tanaka and Shigeaki Amano and
Tanaka, M., and Den, Y. (2014). Balanced Corpus Kimihisa Kondo. (2008). Tangoshinzousei: NTT
of Contemporary Written Japanese. Language Re- Database Series: Nihongo-no Goitokusei Volume 8.
sources and Evaluation, 48:345–371. Sanseido.
Matsumoto, S., Asahara, M., and Arita, S. (2018).
Japanese clause classification annotation on the bal-
anced corpus of contemporary written japanese. In
Proceedings of the 13th Workshop on Asian Lan-
guage Resources (ALR13), pages 1–8.
Miyauchi, T., Asahara, M., Nakagawa, N., and Kato, S.
(2017). Annotation of information structure on the
balanced corpus of contemporary written japanese.
In Proceedings of 2017 Conference of the Pacific As-
sociation for Computational Linguistics.
NTT Communication Science Laboratories. (1999-
2008). NTT Database Series [Nihongo-no Goitoku-
sei]: Lexical Properties of Japanese.
Pan, J., Yan, M., Richter, E. M., Shu, H., and Kliegl,
R. (2021). The beijing sentence corpus: A chinese
sentence corpus with eye movement data and pre-
dictability norms. Behavior Research Methods.
R Core Team. (2020). R: A language and environment
for statistical computing.
Sorensen, T., Hohenstein, S., and Vasishth, S. (2016).
Bayesian linear mixed models using Stan: A tutorial
for psychologists, linguists, and cognitive scientists.
Quantitative Methods for Psychology, 12:175–200.
Vasishth, S., Schad, D., Bürki, A., and Kliegl, R.
(2021). Linear Mixed Models in Linguistics and
Psychology: A Comprehensive Introduction.
5187