Acta Psychologica
journal homepage: www.elsevier.com/locate/actpsy
ARTICLE INFO

Keywords:
Statistical learning
Word segmentation
Online target detection
Two-alternative forced choice task
Statistically-induced chunking recall task

PsycINFO classification codes:
2343 (Learning & Memory)
2220 (Tests & Testing)

ABSTRACT

Despite the essential role of statistical learning in shaping human behavior, there are still controversies concerning its measurement. In this paper, we present a novel online target-detection task in an acoustic word segmentation paradigm, which is able to track the process of learning and does not build on deliberation and decision making. Besides testing the novel online task, we also examined its relationship with two offline measures: the traditional two-alternative forced choice (2AFC) task, and the statistically-induced chunking recall (SICR) task (Isbilen et al., 2017). Participants showed a significant learning effect on the online task, reflected in the decrease of reaction times during training and in the differences between reaction times to predictable versus unpredictable targets. Online learning scores correlated with the 2AFC scores, but this association was only present when participants did not have explicit knowledge about stimuli. SICR scores were not associated with any of the other measures. The internal consistency was higher for online learning measures than for the other two tasks. These findings show that the online target detection task is a good tool for assessing statistical learning, and invite further research on its psychometric properties.
* Corresponding author at: Department of Cognitive Science, Budapest University of Technology and Economics, Budapest, Egry József utca 1, H-1111, Hungary.
E-mail addresses: lukics.krisztina.sara@ttk.bme.hu (K.S. Lukics), lukacs.agnes@ttk.bme.hu (Á. Lukács).
https://doi.org/10.1016/j.actpsy.2021.103271
Received 5 June 2020; Received in revised form 8 February 2021; Accepted 10 February 2021
Available online 22 March 2021
0001-6918/© 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
K.S. Lukics and Á. Lukács Acta Psychologica 215 (2021) 103271
participants are shown legal words from the learning phase and part-words or non-words from the same syllable set and with the same length. In judgement tasks and two-alternative forced choice (2AFC) tasks (in the case of adults) or in head-turn preference paradigms (in the case of infants), participants are able to discriminate between words and part-words or non-words above chance level.¹ This above-chance performance has been regarded as evidence that SL has occurred. However, although judgement and forced-choice tasks are the most widely used tests for SL, there have been concerns about their psychometric suitability, and therefore, their accuracy. This is especially problematic when testing individual differences in SL, which, like in other fields of cognitive science, have recently gained significant attention (particularly in the case of investigating the relationship between linguistic skills and SL, e.g., Conway, Bauernschmidt, Huang, & Pisoni, 2010; Kidd, 2012; Misyak & Christiansen, 2012; but see Kaufman et al., 2010, for the relationship of SL with intelligence and personality traits, and Siegelman & Frost, 2015, for methodological considerations).

Psychometric concerns about judgement or forced-choice measures point out that in adults, these show no, or only weak, associations with other measures of SL on the same learning material (e.g., Batterink & Paller, 2017; Batterink, Reber, Neville, & Paller, 2015; Franco, Eberlen, Destrebecqz, Cleeremans, & Bertels, 2015; Isbilen, McCauley, Kidd, & Christiansen, 2017), they show low internal consistency (Arnon, 2019; Siegelman, Bogaerts, Elazar, Arciuli, & Frost, 2018), and modest stability over time (Arnon, 2019; Siegelman & Frost, 2015) (however, internal consistency and test-retest reliability are higher in the case of nonlinguistic SL tasks, Arnon, 2019; Siegelman, Bogaerts, & Frost, 2017; Siegelman, Bogaerts, Elazar, et al., 2018). Moreover, test-retest reliability is extremely weak in the case of children (Arnon, 2019). There are also theoretical considerations against the use of post-exposure explicit tasks. As deliberation and decision making processes are inherent in judgement tasks, noise deriving from variation in other cognitive abilities involved in explicit judgments might be large in these task scores, questioning their validity in measuring SL. Furthermore, metalinguistic instructions might be challenging for children, presenting a further potential confound in assessing their SL capacity.

These arising concerns and the effort to find better targeted tasks have resulted in a diverse set of paradigms providing an opportunity to find a more proper measure for SL. The target detection paradigm was adapted to verbal statistical learning by Franco et al. (2015). This task requires participants to press a button whenever they hear/see a given item in the stream. As participants proceed to extract the underlying structure of stimuli, they become increasingly better at processing the next element in linguistic sequences, resulting in quicker button presses. Franco and colleagues demonstrated learning with this measure in a post-training task: after familiarization, in a set of consecutive short streams, they compared RTs for syllables that were either predictable (second or third syllables of a word) or unpredictable (first syllables of a word). They found a meaningful diversity in these RT scores, which did not correlate with 2AFC scores. In a similar design, Batterink et al. (2015) also found no correlation between the 2AFC measures and RT measures from their target detection task.

In a subsequent study, Batterink and Paller (2017) also used a post-familiarization item-detection task in their word segmentation paradigm. They found that across all their measures, RT scores correlated with degree of neural entrainment throughout the familiarization phase, which they assumed to be the most direct indicator of learning. Although the item-detection RT measure and the explicit measures were correlated, no associations were found between entrainment changes and explicit SL measures.

In the studies by Franco et al. (2015), Batterink et al. (2015), and Batterink and Paller (2017), RTs were measured following a separate learning phase. This way, while making the familiarization phase identical to those in traditional SL tasks, they applied post-hoc testing, and measured learning at one point only. Item-detection tasks, however, also give an opportunity to track the process of learning online. Based on the Serial Reaction Time Task (Nissen & Bullemer, 1987), Hunt and Aslin (2001) created a series of segmentation tasks where light patterns on a response box followed a statistical pattern and triplets of light pairs formed “words”. Participants had to press lit buttons. As a measure of learning, reaction times of button presses to predictable versus unpredictable items were contrasted. This task revealed the acquisition of statistical patterns in a domain and modality different from the traditional word segmentation paradigms, but the findings motivate tracking SL online with other stimulus arrangements as well. In the verbal acoustic domain, Batterink (2017) examined the process of SL in a word segmentation task by comparing RTs for predictable and unpredictable syllables through numerous short streams. While the psychometric properties and the accuracy of these tasks are yet to be tested, the less explicit nature and certain findings (for example, the relationship between neural entrainment during learning and post-learning RT scores in the study of Batterink, 2017) are promising.

The statistically-induced chunking recall (SICR) task by Isbilen et al. (2017) is the result of a targeted attempt to find better measures for individual differences in SL. In this task, after familiarization, participants have to recall strings that are either structured by the statistical properties encountered during the preceding training phase (i.e., strings that consist of words), or unstructured strings which contain syllables of the familiarization phase in random order. Results showed that short-term memory performance for structured items was better than for unstructured ones. When assessing the test-retest reliability of the measure, Isbilen and her colleagues found that recall performance for structured items and random items both had excellent stability over time; the difference score, however, was less stable. They further examined the relationship between SICR performance for structured items and 2AFC scores, and found no correlation between the two measures. The authors argue that since SICR proved to be a reliable measure of SL, and 2AFC is not reliable, SICR is able to capture individual differences in SL ability.

As both target detection and SICR paradigms are good candidates for being more targeted measures of SL, which could make them suitable for more accurate group level testing as well as assessing individual differences, we need detailed studies of both measures, together with developing new tasks which might better meet these expectations. The aim of the present study is to introduce a new target detection paradigm which examines statistical learning online. We wanted to develop a new task 1) to assess learning online, getting insight into the process of learning over time, not only in a post-hoc manner; 2) to have a measure that does not build on deliberation and decision making, and thus measures SL with less influence and noise from other cognitive capacities; and 3) to obtain a task which is not metalinguistic and, this way, more suitable for children. In Experiment 1, we wanted to examine the relationship between online target detection measures and scores from SICR and 2AFC tasks testing the knowledge of the same training material. By examining the relationship between online target detection scores and SICR and 2AFC scores, we wanted to test the validity of this new measure, as well as to contribute to the search for a more targeted measure of SL. We hypothesized 1) that we would find evidence of learning in online RT changes in the target detection task, 2) that the measures from the online item-detection task and the SICR task would show a significant correlation, as they are less influenced by other cognitive domains than the 2AFC task, and 3) that the relationship between the 2AFC task and the other two measures would be more modest.
¹ In the case of adults, there may be small differences in the test: they can be instructed to give their responses based on preference, familiarity, or the word status of the test sample.
2. Experiment 1

2.1. Method

Table 2
Sequences in the SICR task by type.
Word    Part-word    Non-word
Fig. 1. The procedure of the experiment. In the first phase, participants completed an online target detection task. After this phase, there were two more tasks: a two-
alternative forced choice (2AFC) task and a statistically-induced chunking recall (SICR) task. The order of the two latter was counterbalanced across participants to
eliminate potential order effects.
task, we calculated an accuracy rate for each sequence type (word, part-word, and non-word) by dividing the number of correctly recalled bigrams by the number of all bigrams, yielding a number between 0 and 1. This method differed from that of Isbilen et al. (2017), who quantified learning as the number of correctly recalled syllables. We chose this score as it provides the best resolution of recalled sequences while still reflecting the acquisition of item relations, which is a core element of SL.

2.2. Results

2.2.1. Online target detection task

In the online target detection task, RTs were collected from accurate button presses for targets within a 1200 ms time window from stimulus onset. Accuracies were calculated by dividing the number of correct responses on targets by the number of targets. To analyze reaction times and accuracies, we calculated the median of RTs and accuracies for each block by participant. Descriptive statistics for RTs and accuracies are shown in Table 3.

Table 3
Descriptive statistics of median RTs and accuracies by block in the online target detection task.
Block        Block type   RT median   RT range         ACC median   ACC range
Block TRN1   Training     400.00      268.00–797.00    0.60         0.33–0.87
Block TRN2   Training     342.50      105.00–658.00    0.73         0.47–1.00
Block TRN3   Training     176.00      70.00–627.00     0.80         0.40–1.00
Block RND    Random       611.00      485.00–816.00    0.53         0.20–1.00
Block REC    Recovery     236.00      35.00–631.50     0.80         0.33–1.00

RT changes through blocks 1–5 were analyzed with a Friedman test (as the assumption of normality of residuals in the parametric ANOVA was violated), and a significant effect of Block was found, χ²(4) = 70.09, p < .001. Post hoc Wilcoxon signed ranks tests revealed that RTs in each block were significantly different from the previous one (we applied Holm-Bonferroni sequential corrections because of the multiple comparisons, Gaetano, 2018: Block TRN1 > Block TRN2: Z = −3.03, p = .002, r = 0.58; Block TRN2 > Block TRN3: Z = −4.13, p < .001, r = 0.80; Block TRN3 < Block RND: Z = −4.52, p < .001, r = 0.87; Block RND > Block REC: Z = −4.52, p < .001, r = 0.87). That is, we found a learning and/or practice effect yielding decreasing RTs through the first three blocks, and a learning effect reflected in significantly longer RTs for unpredictable syllables in the random block than in the preceding and subsequent word blocks. The changes in RTs through blocks are illustrated in Fig. 2.

Fig. 2. Median reaction times by block. Boxes indicate RT data between the first and third quartiles, vertical lines indicate medians, and whiskers illustrate the range of data in each group.

As the assumption of normal distribution was violated in the case of accuracies, as well, a Friedman test was used to test differences between the blocks. This test also showed a significant block effect, χ²(4) = 42.07, p < .001. The post hoc Wilcoxon signed ranks tests after Holm-Bonferroni sequential corrections showed that there was a significant increase in hit rates between the first two blocks, and participants were significantly less accurate in their responses in the random block compared to the preceding and following word blocks, but there was no significant difference between the second and third training blocks (Block TRN1 < Block TRN2: Z = −3.17, p = .003, r = 0.61; Block TRN2 < Block TRN3: Z = −1.87, p = .062, r = 0.36; Block TRN3 > Block RND: Z = −3.46, p = .002, r = 0.67; Block RND < Block REC: Z = −3.77, p < .001, r = 0.73). Accuracies through blocks are shown in Fig. 3.

2.2.2. Two-alternative forced choice (2AFC) task

As there was no order effect either in the 2AFC task or in the SICR task, task order was not taken into account in the analysis of the post-familiarization tasks. In the 2AFC task we tested three contrasts: 1)
word vs. part-word, 2) word vs. non-word, and 3) part-word vs. non-

\[
\text{RT difference} = \frac{(\text{Block RND RT mean} - \text{Block TRN3 RT mean}) + (\text{Block RND RT mean} - \text{Block REC RT mean})}{2}
\]
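As a minimal illustration of the RT difference score, the sketch below computes it for one participant from three per-block RT values. The function name, argument names, and the example values are hypothetical and are not taken from the study; the per-block RT summaries are assumed to be available already, in milliseconds.

```python
def rt_difference(rnd_rt: float, trn3_rt: float, rec_rt: float) -> float:
    """Average slowdown in the random block relative to its two neighbors.

    rnd_rt, trn3_rt, rec_rt are one participant's RT summaries (in ms) for
    Block RND, Block TRN3, and Block REC. Names are illustrative only.
    """
    return ((rnd_rt - trn3_rt) + (rnd_rt - rec_rt)) / 2.0


# Invented example values (ms), not data from the experiment.
score = rt_difference(rnd_rt=600.0, trn3_rt=200.0, rec_rt=300.0)
print(score)  # 350.0
```

A positive score means the participant was slower in the random block than in the neighboring word blocks; a score near zero means no such slowdown.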
the training blocks. For instance, a participant may show very fast learning (identical RTs at the beginning and at the end of the training) if their RTs did not decrease through the training blocks. But if their RTs at the end of the training did not differ from RTs in the reference block (Block RND), or their RTs were too scattered at the end of training, the lack of change through Blocks TRN1 to TRN3 cannot be attributed to fast learning. A detailed description of the three scores is provided in Appendix A. In the case of the 2AFC and SICR measures, the overall scores were used, as these cover the testing of all three contrasts.

We calculated Spearman-Brown reliability coefficients of split-half correlations for the indices used in the correlational analyses through 5000 iterations. For each measure, in each iteration, we divided the trials of the task into two random sets with the constraint that different trial types were represented equally in the two sets (two random sets for RTs in the different blocks in the target detection task for the online measures, and two random sets for the different contrast types in the 2AFC and SICR tasks), calculated task indices based on the responses in the two trial sets for each participant, and correlated indices calculated from these two sets. We applied Spearman-Brown correction to the obtained Pearson correlation coefficients in each iteration as the split-half method underestimates reliabilities. Mean reliability coefficients were 0.71 for RT training (2.5th percentile: 0.46; 97.5th percentile: 0.86), 0.70 for RT difference (2.5th percentile: 0.50; 97.5th percentile: 0.86), and 0.61 for learning efficiency (2.5th percentile: 0.27; 97.5th percentile: 0.83). Reliabilities of the offline measures were more moderate: 0.58 for 2AFC overall (2.5th percentile: 0.30; 97.5th percentile: 0.78), and 0.43 for SICR overall (2.5th percentile: 0.07; 97.5th percentile: 0.71).

In the analysis of relationships between RT training, RT difference, 2AFC overall, and SICR overall scores, the two RT measures showed a significant correlation, and there was also a significant correlation between the RT difference and the 2AFC overall measures. On the other hand, the SICR overall score did not correlate with any of the other task measures. Results of this correlation analysis are detailed in Table 6.

We also examined the relationship between learning efficiency and 2AFC overall and the SICR overall measures. A distinct correlation analysis was made as learning efficiency shares variability with both RT training and RT difference. Learning efficiency was significantly correlated with the 2AFC overall score, but the SICR overall score did not correlate with the other two measures (Table 7).

For a more direct comparison of our results and those of Isbilen et al. (2017), we also included three further measures used in their original study, the 2AFC word > non-word contrast, the SICR word syllable-by-syllable scores (the proportion of recalled syllables in the case of six-syllabic sequences consisting of words), and the SICR word trigram scores (the proportion of recalled trigrams in the case of six-syllabic sequences consisting of words), in two additional analyses. There was no significant relationship either between the 2AFC word > non-word and SICR word syllable-by-syllable scores (rs = 0.02, p = .911), or between the 2AFC word > non-word and SICR trigram word scores (rs = 0.08, p = .708).

Table 6
Correlations between online and offline measures.
Measure             1        2        3
1. RT training
2. RT difference    0.68*
3. 2AFC overall     0.19     0.49*
4. SICR overall     −0.13    −0.14    0.10
Note. As the RT difference and 2AFC overall scores did not have a normal distribution, Spearman’s rs coefficients are reported here.
* p < .01.

Table 7
Correlations between learning efficiency and offline measures.
Measure                   1        2
1. Learning efficiency
2. 2AFC overall           0.59*
3. SICR overall           0.08     0.10
Note. As the 2AFC overall score did not have a normal distribution, Spearman’s rs coefficients are reported here.
* p < .01.

In Experiment 1, we tested the sensitivity of a novel target detection measure of the word segmentation paradigm to SL processes. This online target detection task proved to be capable of assessing SL: the decreasing RTs and increasing accuracies through the first three blocks show that the paradigm is suitable for measuring learning in its process, providing data about its temporal properties, and the difference between the random block and the neighboring word blocks showed a learning effect that is not influenced by motor practice. We also calculated another exploratory online measure, learning efficiency, which, contrary to the other two online measures, RT training and RT difference, is capable of capturing the dynamics of learning. RT training and RT difference had relatively high split-half reliabilities. Although they did not reach the conventional psychometric standard of 0.80, these values exceed internal consistencies measured in linguistic statistical learning studies with the 2AFC task (Arnon, 2019; Siegelman, Bogaerts, Elazar, et al., 2018). Learning efficiency had more moderate reliability; however, as this was an exploratory measure, we aimed to test its adequacy further in Experiment 2.

In Experiment 1, we looked at the relationship of the novel target detection task measures and two offline measures from the 2AFC task, and the SICR task by Isbilen et al. (2017). We expected the target detection and SICR measures to show a significant correlation, as both are hypothesized to be more adequate measures of SL than the widely used 2AFC task. We also expected that the 2AFC measure, due to its deficiencies, would show weaker correlations with the online item detection and SICR task scores. Contrary to our hypotheses, the SICR measure was not associated with any other measures, and we found a significant correlation between two of our online measures (the RT difference and learning efficiency) and 2AFC scores. 2AFC scores were not particularly reliable, being in the range of earlier findings (Arnon, 2019; Siegelman, Bogaerts, Elazar, et al., 2018). The internal consistency of the SICR measure was especially weak.

A possible explanation for the relationship of online and offline 2AFC scores is that as the presentation rate was relatively slow (compared to the presentation rates from 215 to 300 ms, Franco et al., 2015; Batterink et al., 2015; Batterink & Paller, 2017), and the word set was relatively small and well separable (four words with TPs of 1 between syllables within words), the structure of the streams could have been transparent for participants. Earlier studies with visual SL tasks found that under explicit instructions and slower presentation rates (800 ms, 6000 ms, 14,000 ms) participants may gain explicit knowledge of the stimuli (as discussed by Arciuli, Torkildsen, Stevens, & Simpson, 2014). While there were no explicit instructions to look for words in the present study, the properties of the word set and the slower presentation rate could lead to the formation of explicit representations, which then would facilitate performance on the online target detection task, as well as on the 2AFC task, mediating the relationship between them. In a second experiment, we aimed to explore this possibility. As the correlations between the SICR measure and either the online target detection or 2AFC scores were not significant, the SICR measure may not have been affected by explicit representations, or may have been affected differently. Because of the lack of a significant association, we did not include the SICR task in Experiment 2.

In Experiment 1, participants may have developed explicit
representations of words or segments predicting target syllables, and this explicit knowledge could have facilitated performance in the online target detection and the 2AFC task, driving associations between the RT difference scores and the 2AFC measures. To explore this possibility, we wanted to look at the relationship between the online reaction time (RT difference and learning efficiency) and the offline 2AFC measures in more detail taking into account the explicit knowledge of words by participants. Thus, in Experiment 2, we tested participants with the same word segmentation task without the SICR task, as we found no associations between the scores of this task and measures from the other two tasks in Experiment 1. We also assessed participants’ explicit knowledge of the training stimuli in a debriefing session after the task.

3.1. Method

3.1.1. Participants

64 university students (mean age: 20.88, SD = 2.37, 45 females, 8 left-handed) participated in the experiment for course credit; they were all native speakers of Hungarian. Before the experiment, all of them gave informed consent, in accordance with the principles set out in the Declaration of Helsinki and the stipulation of the local IRB. The study was approved by the United Ethical Review Committee for Research in Psychology (EPKEB). Participants were informed about the purpose of the research after the study.

3.1.2. Stimuli

Stimuli were the same as in Experiment 1, without stimuli for the SICR subtask.

3.1.3. Procedure

Participants completed the experiment in groups of two or three. The procedure was the same as that of Experiment 1 without the SICR subtask. Participants completed the online learning task first, and then the offline 2AFC task. After the experiment, they filled out a debriefing form consisting of a set of open-ended questions asking about their knowledge of the stimuli used in the experiment. Questions of the debriefing session are listed in Appendix B.

3.2. Results

3.2.1. Online target detection task

To analyze reaction time and accuracy data through the online task, we calculated median RTs as well as target accuracies for each block by participant as in Experiment 1. Descriptive statistics for these measures are shown in Table 8.

Table 8
Descriptive statistics of median RTs and accuracies by block in the online target detection task.
Block        Block type   RT median   RT range         ACC median   ACC range
Block TRN1   Training     412.50      198.00–780.50    0.67         0.00–1.00
Block TRN2   Training     318.25      25.00–747.00     0.73         0.13–1.00
Block TRN3   Training     222.75      11.50–762.50     0.80         0.27–1.00
Block RND    Random       597.50      364.00–762.00    0.60         0.27–0.93
Block REC    Recovery     310.00      53.00–698.00     0.73         0.27–0.93

We analyzed RTs through the five blocks with a Friedman ANOVA (as the assumption of normal distribution of the residuals in the parametric ANOVA was violated) and found a significant effect of Block, χ²(4) = 116.61, p < .001. Two participants were excluded from this analysis as they gave no responses in Block TRN1. For the post hoc Wilcoxon signed ranks tests we applied Holm-Bonferroni sequential
0.66; Block TRN2 > Block TRN3: Z = −3.16, p = .002, r = 0.40). RTs in the random block were also significantly different from RTs in the neighboring word blocks (Block TRN3 < Block RND: Z = −6.50, p < .001, r = 0.83; Block RND > Block REC: Z = −6.41, p < .001, r = 0.81). Changes in median RTs are illustrated in Fig. 4.

Fig. 4. The distribution of median RTs by block. Boxes indicate RT data between the first and third quartiles, vertical lines indicate medians, and whiskers show the range of data in each group.

For analyzing the learning process in accuracy changes in target detection, we also used a Friedman ANOVA and found a significant effect of Block, χ²(4) = 36.48, p < .001. Post hoc Wilcoxon signed rank tests with Holm-Bonferroni corrections for significance levels showed that through the first three blocks, there was a significant increase in accuracies for targets between the blocks TRN1 and TRN2, but not between blocks TRN2 and TRN3 (Block TRN1 < Block TRN2: Z = −2.26, p = .048, r = 0.28; Block TRN2 < Block TRN3: Z = −1.65, p = .099, r = 0.21). Accuracies in the random block were significantly lower than in the neighboring word blocks (Block TRN3 > Block RND: Z = −4.15, p < .001, r = 0.52; Block RND < Block REC: Z = −4.81, p < .001, r = 0.60). Changes in target accuracies through blocks are illustrated in Fig. 5.

3.2.2. Two-alternative forced choice (2AFC) task

In the 2AFC task, as in Experiment 1, we tested three contrasts: 1) word vs. part-word, 2) word vs. non-word, and 3) part-word vs. non-word sequences. For each contrast, a score was calculated by dividing the number of correct answers by the number of all trials. The overall measure was calculated by averaging accuracy rates in the three contrasts. These measures are detailed in Table 9. We found a significant difference from the chance level of 0.5 in the case of all 2AFC measures.

3.2.3. Relationships between online and offline measures

As in Experiment 1, we assessed the reliability of Experiment 2 measures. The method of reliability estimation was identical to that of Experiment 1: we calculated Spearman-Brown reliability coefficients of split-half correlations in 5000 random samples for each measure. Mean reliability coefficients were 0.74 for RT training (2.5th percentile: 0.62; 97.5th percentile: 0.83), 0.86 for RT difference (2.5th percentile: 0.80; 97.5th percentile: 0.91), and 0.68 for learning efficiency (2.5th percentile: 0.53; 97.5th percentile: 0.80). Reliability of 2AFC overall was 0.37 (2.5th percentile: 0.13; 97.5th percentile: 0.57).

We used the RT difference and the 2AFC overall measures for assessing the relationship between the task scores, as these measures were correlated in Experiment 1. The correlation between these measures was significant, but moderate (rs = 0.32, p = .009). With the purpose of testing its adequacy, we also examined the relationship of learning efficiency to the 2AFC overall measure, and here too, there was a weak, but
significant relationship (rs = 0.26, p = .037).

To see whether developing explicit representations of words contributed to correlations between the online and offline measures, we divided participants along two dimensions derived from the answers they gave in the debriefing questionnaire. Responses were scored by two independent raters. The first dimension reflected whether they noticed any local pattern in the stimuli, that is, whether they had any knowledge about what syllable or set of syllables predicted their target syllable. If a participant gave responses to Questions 1–4 implying knowledge of a local pattern, i.e., what preceded their target syllable, they got an overall score of 1 in the local pattern dimension. These responses included sentences like “My target syllable occurred after the syllable ‘go’.”, “I

Fig. 5. The distribution of accuracies by block. Boxes indicate accuracy data between the first and third quartiles, vertical lines indicate medians, and whiskers show the range of data in each group.

Table 9
Accuracy rates and the differences from chance level in the case of 2AFC measures.
Type                            Median   Range        Z value    p(x > 0.5)   Effect size
1. 2AFC overall                 0.71     0.42–0.92    Z = 6.81   p < .001     r = 0.85
2. 2AFC word > part-word        0.75     0.25–1.00    Z = 6.20   p < .001     r = 0.78
3. 2AFC word > non-word         0.88     0.50–1.00    Z = 6.75   p < .001     r = 0.84
4. 2AFC part-word > non-word    0.63     0.13–1.00    Z = 2.88   p = .004     r = 0.36
Note. Differences from chance level of 0.5 were tested using Wilcoxon signed rank tests, as the measures were not normally distributed.

The association between online and offline measures was not significant in the case of those who knew what predicted their target syllables (RT difference and 2AFC overall: rs = 0.06, p = .744; learning efficiency and 2AFC overall: rs = 0.04, p = .810), but there was a significant relationship between online and offline test scores in the case of participants who did not have explicit knowledge about what syllables or syllable combinations predicted their targets (RT difference and 2AFC overall: rs = 0.50, p = .013; learning efficiency and 2AFC overall: rs = 0.55, p = .006). The difference between the strength of associations in the presence and absence of local pattern knowledge was not statistically significant either in the case of RT difference and 2AFC overall scores (Z = 1.51, p = .132), or in the case of learning efficiency and 2AFC overall scores (Z = 1.50, p = .134).

A similar pattern was observed when participants were grouped by knowledge of the global pattern. There was a trend-like difference between the groups in RT difference scores, Z = −1.76, p = .079, r = 0.23, but there was no difference for learning efficiency scores, Z = −1.33, p = .185, r = 0.17, and 2AFC overall scores, Z = −0.44, p = .662, r = 0.06. Descriptive statistics of groups are shown in Table 11. The relationship between online and offline measures was not significant in those who noticed that the stream consisted of words (RT difference and 2AFC overall: rs = 0.14, p = .550, learning efficiency and 2AFC overall: rs = −0.15, p = .507), but it was significant in those who did not have explicit knowledge about the pattern (RT difference and 2AFC overall: rs = 0.38, p = .018, learning efficiency and 2AFC overall: rs = 0.55, p < .001). The difference between these correlations in the presence and absence of global knowledge was not statistically significant in the case of RT difference and 2AFC overall scores (Z = 1.56, p = .118), but the correlations of learning efficiency and 2AFC overall scores in the presence and absence of global pattern knowledge differed significantly (Z = 2.95, p = .003). The relationship of online and offline measures in samples divided along aspects of recognizing local and global patterns is illustrated in Figs. 6 and 7.

3.3. Discussion

Experiment 2 was designed to examine the effect of forming explicit representations on the relationship between measures of the online target detection task and the 2AFC task. We replicated findings of Experiment 1, as we found a positive relationship between the online target detection measures (RT difference and learning efficiency) and the 2AFC task measures. Contrary to our assumption, though, this relationship was only present in the absence of explicit knowledge about local or global patterns. We found no evidence of an association between
knew which syllable preceded my target.”, or “‘Go’ was always followed measures in the groups of participants who had explicit knowledge
by ‘ki’.”. If the answers gave no indication of local pattern knowledge, about what predicted their target syllable in the online task, or about the
they got an overall score of 0. If their responses were ambiguous across
Questions 1–4, they were not scored and were excluded from analyses
investigating the effect of local pattern knowledge. The second dimen
Table 10
sion reflected whether they noticed a global pattern, that is, whether Descriptive statistics of RT difference, learning efficiency and 2AFC overall
they knew that the stimuli consisted of words. If a participant gave re scores grouped by knowledge of local patterns.
sponses to Questions 1–4 implying knowledge of the global pattern, i.e.,
Measure Knowledge of local patterns
that the streams consisted of words, they got a score of 1 in the global
pattern dimension. These responses included sentences like “There were No Yes
words in the speech stream.”, “Syllables could be combined.”, or “There Median Range Median Range
were patterns in the speech stream.”. If the answers gave no indication of 1. RT difference 162.50 − 315.25–556.50 402.50 9.00–593.50
global pattern knowledge, they got a score of 0. If their responses were 2. Learning efficiency 0.05 − 0.12–0.32 0.15 0.01–0.30
ambiguous regarding global pattern knowledge, they were not scored 3. 2AFC overall 0.69 0.42–0.83 0.71 0.50–0.92
K.S. Lukics and Á. Lukács Acta Psychologica 215 (2021) 103271
fact that the streams consisted of words. This suggests that developing explicit representations about the stimuli was not the factor explaining a significant relationship between the online target detection and the 2AFC task measures, and implies that implicit extraction of structure from sound patterns can account for the positive relationship found in Experiments 1 and 2.
Furthermore, our online measures showed good split-half reliability values, especially the RT difference score. On the other hand, we observed weak internal consistency for 2AFC task scores, replicating earlier findings (Arnon, 2019; Siegelman, Bogaerts, Elazar, et al., 2018).

Table 11
Descriptive statistics of RT difference, learning efficiency and 2AFC overall scores grouped by knowledge of global patterns.
Measure                 Knowledge of global pattern: No    Knowledge of global pattern: Yes
                        Median   Range                     Median   Range
1. RT difference        362.75   −110.00–539.50            265.00   45.75–564.00
2. Learning efficiency  0.13     −0.05–0.32                0.05     0.00–0.30
3. 2AFC overall         0.71     0.42–0.92                 0.75     0.50–0.88

4. General discussion

In this paper, a novel online target detection task was introduced to test statistical word segmentation abilities. Most SL tasks measure learning only at a given point, which prevents gaining insight into the process of learning throughout the task. The present task was designed to be suitable for tracking learning online through reaction times and accuracies. The aim of Experiment 1 was to test the sensitivity of online measures from this task to SL processes, and to look at their relationship with measures from a 2AFC task and the SICR task by Isbilen et al. (2017). Recently, several authors in the SL literature raised concerns that 2AFC might be influenced by other cognitive processes, like deliberation and decision-making (e.g., Arnon, 2019; Christiansen, 2018; Isbilen et al., 2017), and therefore it may not be a suitable measure for SL. The online target-detection and the SICR tasks are processing-based measures (Christiansen, 2018), which have the advantage of not requiring reflection on the acquired representations; they are therefore both good candidates for measuring SL.
Fig. 6. The relationship between the 2AFC overall and RT difference measures in groups of participants A) who had explicit knowledge of global patterns, B) who did
not have any explicit knowledge of global patterns, C) who had explicit knowledge about local patterns, D) who did not have any explicit knowledge of local patterns
in the experimental stimuli. For illustrative purposes, we fitted a regression line to the data, and a minimal amount of jitter was added to increase visibility.
Fig. 7. The relationship between the 2AFC overall and learning efficiency measures in groups of participants A) who had explicit knowledge of global patterns, B) who
did not have any explicit knowledge of global patterns, C) who had explicit knowledge about local patterns, D) who did not have any explicit knowledge of local
patterns in the experimental stimuli. For illustrative purposes, we fitted a regression line to the data, and a minimal amount of jitter was added to increase visibility.
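The group comparisons illustrated in Figs. 6 and 7 rest on testing whether two correlation coefficients obtained in independent groups differ. The text does not state which procedure produced the reported Z values; one common choice is Fisher's r-to-z transformation, sketched below in Python. The function name and the group sizes in the usage example are hypothetical, and applying the transformation to Spearman coefficients is itself an approximation, so the output is not expected to reproduce the reported statistics.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two correlations
    measured in independent groups (Fisher's r-to-z transformation).

    Assumption: this is a generic textbook procedure, not necessarily
    the one used in the study.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 observations")
    # Fisher transformation: atanh(r) is approximately normal
    # with variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical group sizes; the correlations are the learning
    # efficiency vs. 2AFC values reported for the global pattern groups.
    z, p = compare_independent_correlations(0.55, 40, -0.15, 22)
    print(f"Z = {z:.2f}, p = {p:.3f}")
```

Because the standard error depends only on the two group sizes, a large difference between coefficients can still be non-significant when one group is small, which is worth keeping in mind when reading the non-significant comparisons reported for RT difference.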
In Experiment 1, we found that the online target detection measure was sensitive to learning: RTs decreased and accuracies increased through the three training blocks with streams of words, and the disruption of structure in the random block resulted in a significant increase in reaction times and decrease of accuracies, relative to both the previous and next word blocks. These results – the decrease in RTs and increase in accuracies – show that the prediction of targets becomes better over time and likely reflect participants’ increasing sensitivity to the statistical structure of the stream. The presence of such a learning effect makes the online target detection task a good candidate for a more targeted SL measure. This is also supported by reliability estimates of task indices: RT training and RT difference both had high internal consistencies across the two experiments. We also introduced and verified the adequacy of the learning efficiency measure. In contrast to the other two online measures based on reaction time differences between blocks, this measure is sensitive to the dynamics of learning. In both experiments, learning efficiency showed similar associations with offline measures as the RT difference online score. While in our study, introducing learning efficiency did not extend our findings, we believe it can be a promising tool in atypical populations, where the dynamics of SL can be more affected and variable (e.g., Developmental Language Disorder, Developmental Dyslexia, etc.) than what we see in typical performance. In these cases, this measure of the dynamics of learning can be an additional tool for capturing the nature of differences between populations. However, despite being a promising measure, its relatively low reliability needs further attention.
Despite the promising results, our paradigm is not without limitations. In order to create an online target detection task which is capable of tracking learning from the beginning and also feasible for participants who are exposed to the stimuli for the first time, we used a slow presentation rate. This rate of 500 ms (with 400 ms long syllables and 100 ms long pauses) diverges from the timing of natural language, which is a limitation of this task. We are currently working on a version with a faster presentation rate (with 270 ms syllable duration and 30 ms pauses) to test whether the presentation rate of syllables has an effect on learning.
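The internal consistency of the RT-based indices discussed above is commonly estimated with a split-half procedure. The following Python sketch assumes an odd/even trial split, a per-half median as the score, and the Spearman-Brown correction; none of these details is confirmed by the text for the present study, and all function names are hypothetical.

```python
import math
from statistics import median


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step the half-test correlation up to full test length."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(trials_by_participant, score=median):
    """Odd/even split-half reliability across participants.

    trials_by_participant: one list of trial-level values (e.g. RTs in ms)
    per participant. `score` summarises each half (median by default,
    which is an assumption made for this sketch).
    """
    half_a = [score(trials[0::2]) for trials in trials_by_participant]
    half_b = [score(trials[1::2]) for trials in trials_by_participant]
    return spearman_brown(pearson_r(half_a, half_b))


if __name__ == "__main__":
    # Hypothetical RTs (ms) for four participants, six trials each.
    rts = [
        [512, 498, 530, 505, 520, 515],
        [430, 445, 428, 450, 441, 436],
        [610, 590, 605, 620, 598, 615],
        [480, 470, 495, 488, 476, 482],
    ]
    print(round(split_half_reliability(rts), 3))
```

For a difference score such as RT difference, the same idea applies if `score` is replaced by a function that computes the block contrast separately within each half.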
As one reviewer pointed out, responses to last syllables could serve as cues to word boundaries, thus enhancing performance in the 2AFC task: that is, correct motor responses reliably coincided with word boundaries. Indeed, results from the statistical learning literature show that input from other than the primary input modality can contribute to learning. In a word segmentation study with four-month-old infants (Seidl, Tincoff, Baker, & Cristia, 2015), babies were more successful in extracting a word from a speech stream if their knees or elbows were touched by the experimenter while the word was presented in the stream. However, the present online target detection paradigm was different from that of Seidl et al. (2015) in two important aspects. First, in our task, participants’ motor responses were self-initiated, not external signals, so they might be less likely to have a cue value. Second, infants got a continuous tactile input during the target word. Moreover, self-initiated motor responses are not reliable cues in natural languages like, for instance, changes in prosody (e.g., Kabak, Maniwa, & Kazanina, 2010; Langus, Marchetto, Bion, & Nespor, 2012; Morgan, Meier, & Newport, 1987). In sum, while it cannot be excluded that motor responses served as cues to word boundaries, this does not necessarily undermine the validity of learning based on transitional probabilities between syllables.
As in the current online target detection task the instruction was to respond to the last syllable of a word, a further possible criticism is that instead of gaining information about the extraction of words, we only gathered evidence about the acquisition of a given syllable pair (the target bigram). In this paradigm, words are defined as a sequence of two syllable pairs with higher transitional probabilities bounded by syllable pairs with lower transitional probabilities, so detecting “words” reflects learning of sequences of syllable pairs. Different experimental designs can provide more information about the entire syllable triplet forming a word. One possibility is to have an alternating target in the online task, so that reaction time and accuracy data can be collected about all syllables during the learning process. Another possible solution is designing a post hoc 2AFC task in which the items also test whether participants learned other transitional probabilities than the one involving the target syllable, for instance, by including word-foil contrasts where both sequences include the target bigram, so that familiarity decisions cannot be based on the presence of this bigram.
To sum up, the online target detection paradigm is a promising task for measuring SL, and additional studies may further verify and improve its validity. This should be complemented by further testing its psychometric properties essential for assessing individual differences effectively (Siegelman et al., 2017).
We found evidence of learning in the 2AFC task as well, similarly to many previous studies. For this measure, similarly to earlier studies, we did not find high internal consistency in either of the two experiments (Arnon, 2019; Siegelman, Bogaerts, Elazar, et al., 2018). For the SICR task, we replicated the findings of Isbilen et al. (2017): participants were more successful in recalling sequences of words than sequences of non-words, and they also performed above chance level in the case of our overall learning score, indicating a significant learning effect. However, reliability of task scores was very weak.
To see how measures from the new target detection task relate to other SL measures, we tested associations between measures from all three tasks. We expected that the online task would have a positive relationship with the SICR task, and a more modest, or insignificant relationship with the 2AFC task. We also expected to see only a weak or no relationship between the SICR and 2AFC tasks. Analyses of relationships between the measures only partly supported our hypotheses. We found a positive correlation between the online target detection and 2AFC measures: those who showed a bigger online learning effect in the RT difference and learning efficiency scores were also more efficient in the 2AFC task and had higher 2AFC overall scores (and this finding was replicated in Experiment 2). This result is similar to the findings of Siegelman and his colleagues (Siegelman, Bogaerts, Kronenfeld, & Frost, 2018), who tracked learning online utilizing a self-paced learning task in a visual nonverbal segmentation paradigm. They found a positive association between online scores derived from the self-paced learning phase and 2AFC scores within the same task. On the other hand, we found no association between the RT training scores and 2AFC scores.
Earlier studies examining the relationship between the 2AFC task and target detection tasks found partly different patterns of associations. Franco et al. (2015), and Batterink et al. (2015) found no correlation between a 2AFC task and their post-training target detection measure, while Batterink and Paller (2017) found a positive relationship between a familiarity rating task and their item-detection task (they did not analyze associations between their target detection and 2AFC tasks). However, there are significant differences between these earlier studies and the current one. The target detection task in the present study differed from the detection tasks in previous studies in three important aspects, which could all contribute to the different patterns we got: 1) we were monitoring responses to syllables during training, in contrast with the post-familiarization tasks utilized by the other studies; 2) there was only one target syllable, in contrast with the other studies, where target syllables were alternating during the task; 3) we measured learning by comparing RT performance of syllables in the last position of words versus syllables in a pseudorandom order, while earlier studies contrasted performance for syllables in different positions within words, yielding varying predictability for target syllables in 1st, 2nd and 3rd positions (that is, the first syllables of words are still predictable to some extent, making the pseudorandom stream a more adequate condition for measuring processing of unpredictable syllables).
We hypothesized that forming explicit representations about words could be a mediating factor in the relationship between the online and 2AFC measures and we conducted Experiment 2 to examine this possibility. The results showed that the association between scores on the two tasks is only observed in participants without explicit knowledge of structure in the stream. This suggests that this relationship is not likely to be the result of forming explicit representations enhanced by the relative transparency of the stimulus set, and the online target detection task and the 2AFC task might tap into similar SL processes. One possibility is that they both reflect the operation of a single mechanism, with the small amount of shared variation caused by methodological confounds in the tasks (e.g., by psychometric and methodological shortcomings of the 2AFC task). Another possibility is that multiple processes are at work during SL and they affect performance of the two tasks to a different extent so that they do not share a large variance. These multiple processes might be different mechanisms which drive learning, such as acquisition of transitional probabilities (as suggested in the original studies by Saffran, Aslin, & Newport, 1996; and Saffran, Newport, & Aslin, 1996) and chunk-formation (as proposed by e.g., Perruchet, 2018), or different stages of processing, like encoding of stimuli, pattern detection, retention and retrieval (Bogaerts, Siegelman, & Frost, 2016; Frost et al., 2015). Further studies should shed light on this question.
The analysis of the relationship between 2AFC and SICR tasks replicated the findings of Isbilen et al. (2017): as no relationship was observed between 2AFC and SICR task measures, these two measures seem to be independent. Moreover, contrary to our hypothesis, there was no positive association between the online target detection and SICR measures. A possible explanation is that SICR taps into a different aspect or mechanism of SL. Another plausible cause of this result is that both measures were considerably noisy, which could hide a potential relationship between the tasks. An additional possible source of this pattern of results is that SICR is a production-based measure. As Isbilen et al. (2017) noted, “unlike 2AFC and reaction time tasks, SICR requires both immediate comprehension and production on the part of the
learner” (p. 568). Both 2AFC and target detection paradigms test SL more from the side of perception: participants have to perform operations on perceived sequences of items. In the SICR task, SL is measured from the side of production: participants have to articulate sequences of items. That is, in our case, the item detection and 2AFC tasks address the question “How does SL ability affect processing of incoming sequences?”, while the question for SICR can be formulated more like “How does SL ability affect processing of incoming stimuli and production of these represented sequences?”. Speech perception and production are different processes. For instance, data from patients with speech deficits suggest that verbal short-term memory may not be unitary, with separate input and output buffers in operation (e.g., Howard & Nickels, 2005; Martin, Lesch, & Bartha, 1999; see Jacquemot & Scott, 2006, for a discussion). This is especially important in this case, as SICR is essentially a verbal short-term memory task. Consequently, the effect of SL on short-term memory may not be unitary: it can influence the input side, perception, as well as the output side, production, possibly in a distinct manner. This is an aspect that needs further investigation, for instance, through measuring the effect of stimulus structure on processing-based rather than production-based short-term memory measures (following the methods of digit span tasks which measure verbal short-term memory by pointing to elements of sequences or matching of two sequences), and including production-based measures for other task types too.
As in the case of earlier studies, we did not find very strong relationships between different measures of SL, which can be explained partly by their psychometric properties and noisiness, but may still raise concerns regarding their validity. Studies looking for new approaches in SL measurement often build on the implicit notion that a single core mechanism is at work during SL (as described for the relationship between priming and recognition measures in Shanks & Perruchet, 2002, or different SL paradigms, Perruchet & Pacton, 2006), hence the purpose is to find a method which measures it with the largest accuracy. Although eliminating methodological shortcomings is an important effort, it is not necessarily the only source of variation between different measures. If one assumes multiple mechanisms behind SL (e.g., learning TPs or chunk-formation), or takes the entire process of SL into consideration (e.g., encoding, pattern detection, retention, retrieval), it is reasonable to assume that different tasks are sensitive to different SL mechanisms or processes. That is, the goal might rather be finding multiple accurate measures, which together could better describe a person’s SL ability that shapes behavior than scores obtained on a single task (like in the case of executive functions, Miyake & Friedman, 2012). Further work is needed on different SL tasks testing less homogeneous populations, and systematically assessing and improving their psychometric properties (see e.g., Siegelman et al., 2017, for a summary of shortcomings in SL testing and potential solutions).

5. Conclusion

Statistical learning contributes to the acquisition of many knowledge types and skills. Despite its crucial role in several domains of cognition, there is no consensus about what tasks are appropriate for measuring it. The widely used judgement and 2AFC measures are criticized for their psychometric weaknesses, making them less favorable for assessing either group-level effects or individual differences. As part of the efforts to find more suitable measurements of statistical learning, we introduced a novel online target detection paradigm for statistical word segmentation offering measures that do not build on deliberation and decision making. We found that our new task is suitable for measuring statistical learning effects, provides an opportunity to track learning online, and has favorable measures of reliability. This makes our task a good candidate for investigating group-level effects, as well as individual differences.
We also examined the relationship between online learning and two other measures from a statistically-induced chunking recall and a two-alternative forced choice task. Performance levels were not correlated between all three tasks. The online target detection measure and the two-alternative forced choice task were positively associated, and this relationship was only observed when participants did not form explicit representations about stimuli. Scores of the statistically-induced chunking recall task were not correlated with performance on the other two tasks. This pattern of findings might reflect multiple statistical learning processes or be a product of low reliability and noisiness of the measures. We hope that our study contributes to the quest to find suitable tests for assessing statistical learning, and inspires future studies to systematically assess the psychometric properties of different measures.

Declaration of competing interest

None.

Acknowledgements

This work was supported by the Momentum Research Grant of the Hungarian Academy of Sciences (Momentum 96233 ‘Profiling learning mechanisms and learners: individual differences from impairments to excellence in statistical learning and in language acquisition’, PI: Ágnes Lukács). We thank the volunteers and students for participating in the experiments. We are also grateful to Dorottya Dobó for her help in the design of the study, data collection and for her comments on the manuscript; to Fruzsina Krizsai for her help in data processing; and we would like to thank Bertalan Polner and Kornél Németh for their advice on the statistical analysis.
2
Note that we used 75 ms as a unitary range instead of using individual reaction time spreads (e.g., interquartile range), as the spreads in Block TRN3 greatly
varied across participants. That is, a participant with a larger reaction time spread in Block TRN3 would reach their “end state” of learning sooner than a participant
with the same median reaction times but smaller spread.
which none of the slices had an RT greater than this value. We then subtracted the number of the start of the slice from the number of all possible slices (e.g., if it was the slice 19 to 28, we subtracted 19 from 36, getting 17), thus getting the number of remaining slices, where median RTs are already equal to or below the critical value. As the maximum number of remaining slices is 35 (e.g., when a participant reaches the critical value immediately with the first slice), we divided the number of remaining slices by 35. As a result, we got a number between 0 and 1.
The RND-TRN3 difference score is calculated from the difference between the random block (Block RND) and the last training block (Block TRN3): the median RT of Block TRN3 is subtracted from the median RT of Block RND. As the time window was 1200 ms in the target detection task, and thus the largest possible difference between the two blocks is 1200 ms, we divided this number by 1200. As a result, we got a number that could vary between −1 and 1 (where negative values mean that RTs were smaller in Block RND). We included this score as it can indicate a learning effect irrespective of the presence or absence of the RT decrease through the training blocks.
The TRN3 RT spread score is calculated from the interquartile range of RTs in Block TRN3. As the time window was 1200 ms, and thus the largest possible interquartile range was also 1200 ms, we divided the TRN3 RT interquartile range by 1200, and subtracted this number from 1. As the variability of this number was relatively small among participants (between 0.69 and 0.98), we took its square to magnify its effect. As a result, we got a number that could vary between 0 and 1. We included this score as we hypothesized that stable and less scattered RTs in Block TRN3 indicate more successful learning through the training blocks.
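The three component scores described above can be sketched in Python as follows. The function names are hypothetical, the per-slice median RTs and the critical value are taken as given inputs (their construction is described earlier in the appendix), the interquartile range uses the inclusive quartile method of Python's statistics module, and returning 0 when even the last slice exceeds the critical value is an assumption; the original analysis may differ in these details.

```python
from statistics import quantiles

TIME_WINDOW_MS = 1200  # response window of the target detection task


def remaining_slice_score(slice_medians, critical_value):
    """Share of slices left once median RTs stay at or below the
    critical value. Slices are numbered from 1, as in the text."""
    n = len(slice_medians)
    if n < 2:
        raise ValueError("at least two slices are required")
    start = None
    # Walk backwards to find the earliest slice from which no later
    # slice has a median RT above the critical value.
    for idx in range(n - 1, -1, -1):
        if slice_medians[idx] > critical_value:
            break
        start = idx + 1
    if start is None:  # assumption: criterion never reached -> 0
        return 0.0
    return (n - start) / (n - 1)


def rnd_trn3_difference_score(median_rnd, median_trn3):
    """Median RT of Block RND minus median RT of Block TRN3,
    scaled by the time window; ranges from -1 to 1."""
    return (median_rnd - median_trn3) / TIME_WINDOW_MS


def trn3_spread_score(rts_trn3):
    """Squared complement of the scaled interquartile range of RTs
    in Block TRN3; ranges from 0 to 1."""
    q1, _, q3 = quantiles(rts_trn3, n=4, method="inclusive")
    return (1 - (q3 - q1) / TIME_WINDOW_MS) ** 2


if __name__ == "__main__":
    # Worked example from the text: criterion reached at slice 19 of 36,
    # leaving 36 - 19 = 17 remaining slices out of a maximum of 35.
    medians = [600] * 18 + [400] * 18  # hypothetical slice medians (ms)
    print(remaining_slice_score(medians, critical_value=450))
    print(rnd_trn3_difference_score(median_rnd=700, median_trn3=400))
    print(trn3_spread_score([400, 500, 600, 700, 800]))
```

How the three components are combined into the final learning efficiency score is not covered by this sketch.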
References
Arciuli, J., Torkildsen, J. v. K., Stevens, D. J., & Simpson, I. C. (2014). Statistical learning under incidental versus intentional conditions. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00747
Arnon, I. (2019). Do current statistical learning tasks capture stable individual differences in children? An investigation of task reliability across modality. Behavior Research Methods. https://doi.org/10.3758/s13428-019-01205-5
Batterink, L. J. (2017). Rapid statistical learning supporting word extraction from continuous speech. Psychological Science, 28(7), 921–928. https://doi.org/10.1177/0956797617698226
Batterink, L. J., & Paller, K. A. (2017). Online neural monitoring of statistical learning. Cortex, 90, 31–45. https://doi.org/10.1016/j.cortex.2017.02.004
Batterink, L. J., Reber, P. J., Neville, H. J., & Paller, K. A. (2015). Implicit and explicit contributions to statistical learning. Journal of Memory and Language, 83, 62–78. https://doi.org/10.1016/j.jml.2015.04.004
Bogaerts, L., Siegelman, N., & Frost, R. (2016). Splitting the variance of statistical learning performance: A parametric investigation of exposure duration and transitional probabilities. Psychonomic Bulletin & Review, 23(4), 1250–1256. https://doi.org/10.3758/s13423-015-0996-z
Christiansen, M. H. (2018). Implicit statistical learning: A tale of two literatures. Topics in Cognitive Science. https://doi.org/10.1111/tops.12332
Conway, C. M., Bauernschmidt, A., Huang, S. S., & Pisoni, D. B. (2010). Implicit statistical learning in language processing: Word predictability is the key. Cognition, 114, 356–371. https://doi.org/10.1016/j.cognition.2009.10.009
Franco, A., Eberlen, J., Destrebecqz, A., Cleeremans, A., & Bertels, J. (2015). Rapid serial auditory presentation: A new measure of statistical learning in speech segmentation. Experimental Psychology, 62(5), 346–351. https://doi.org/10.1027/1618-3169/a000295
Frost, R., Armstrong, B. C., Siegelman, N., & Christiansen, M. H. (2015). Domain generality versus modality specificity: The paradox of statistical learning. Trends in Cognitive Sciences, 19(3), 117–125. https://doi.org/10.1016/j.tics.2014.12.010
Gaetano, J. (2018). Holm-Bonferroni sequential correction: An Excel calculator (1.3) [Microsoft Excel workbook]. Retrieved from https://www.researchgate.net/publication/322568540_Holm-Bonferroni_sequential_correction_An_Excel_calculator_13.
Howard, D., & Nickels, L. (2005). Separating input and output phonology: Semantic, phonological, and orthographic effects in short-term memory impairment. Cognitive Neuropsychology, 22(1), 42–77. https://doi.org/10.1080/02643290342000582
Hunt, R. H., & Aslin, R. N. (2001). Statistical learning in a serial reaction time task: Access to separable statistical cues by individual learners. Journal of Experimental Psychology: General, 130(4), 658.
Isbilen, E. S., McCauley, S. M., Kidd, E., & Christiansen, M. H. (2017). In Testing statistical learning implicitly: A novel chunk-based measure of statistical learning (pp. 564–569).
Jacquemot, C., & Scott, S. K. (2006). What is the relationship between phonological short-term memory and speech processing? Trends in Cognitive Sciences, 10(11), 480–486. https://doi.org/10.1016/j.tics.2006.09.002
Kabak, B., Maniwa, K., & Kazanina, N. (2010). Listeners use vowel harmony and word-final stress to spot nonsense words: A study of Turkish and French. Laboratory Phonology, 1(1), 207–224.
Kaufman, S. B., DeYoung, C. G., Gray, J. R., Jiménez, L., Brown, J., & Mackintosh, N. (2010). Implicit learning as an ability. Cognition, 116(3), 321–340. https://doi.org/10.1016/j.cognition.2010.05.011
Kidd, E. (2012). Implicit statistical learning is directly associated with the acquisition of syntax. Developmental Psychology, 48(1), 171–184. https://doi.org/10.1037/a0025405
Langus, A., Marchetto, E., Bion, R. A. H., & Nespor, M. (2012). Can prosody be used to discover hierarchical structure in continuous speech? Journal of Memory and Language, 66(1), 285–306.
Martin, R. C., Lesch, M. F., & Bartha, M. C. (1999). Independence of input and output phonology in word processing and short-term memory. Journal of Memory and Language, 41(1), 3–29. https://doi.org/10.1006/jmla.1999.2637
Misyak, J. B., & Christiansen, M. H. (2012). Statistical learning and language: An individual differences study: Individual differences in statistical learning. Language Learning, 62(1), 302–331. https://doi.org/10.1111/j.1467-9922.2010.00626.x
Miyake, A., & Friedman, N. P. (2012). The nature and organization of individual differences in executive functions: Four general conclusions. Current Directions in Psychological Science, 21(1), 8–14. https://doi.org/10.1177/0963721411429458
Morgan, J. L., Meier, R. P., & Newport, E. L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19(4), 498–550.
Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of learning: Evidence from performance measures. Cognitive Psychology, 19(1), 1–32. https://doi.org/10.1016/0010-0285(87)90002-8
Perruchet, P. (2018). What mechanisms underlie implicit statistical learning? Transitional probabilities versus chunks in language learning. Topics in Cognitive Science. https://doi.org/10.1111/tops.12403
Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences, 10(5), 233–238. https://doi.org/10.1016/j.tics.2006.03.006
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928. https://doi.org/10.1126/science.274.5294.1926
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35(4), 606–621. https://doi.org/10.1006/jmla.1996.0032
Seidl, A., Tincoff, R., Baker, C., & Cristia, A. (2015). Why the body comes first: Effects of experimenter touch on infants’ word finding. Developmental Science, 18(1), 155–164.
Shanks, D. R., & Perruchet, P. (2002). Dissociation between priming and recognition in the expression of sequential knowledge. Psychonomic Bulletin & Review, 9(2), 362–367. https://doi.org/10.3758/BF03196294
Siegelman, N., Bogaerts, L., Elazar, A., Arciuli, J., & Frost, R. (2018). Linguistic entrenchment: Prior knowledge impacts statistical learning performance. Cognition, 177, 198–213.
Siegelman, N., Bogaerts, L., & Frost, R. (2017). Measuring individual differences in statistical learning: Current pitfalls and possible solutions. Behavior Research Methods, 49(2), 418–432. https://doi.org/10.3758/s13428-016-0719-z
Siegelman, N., Bogaerts, L., Kronenfeld, O., & Frost, R. (2018). Redefining “learning” in statistical learning: What does an online measure reveal about the assimilation of visual regularities? Cognitive Science, 42, 692–727.
Siegelman, N., & Frost, R. (2015). Statistical learning as an individual ability: Theoretical perspectives and empirical evidence. Journal of Memory and Language, 81, 105–120. https://doi.org/10.1016/j.jml.2015.02.001