
Application of concepts from Cross-Recurrence Analysis in speech production: An overview and a comparison to other nonlinear methods

Leonardo Lancia Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany Susanne Fuchs Center for General Linguistics (ZAS/Phonetik), Berlin, Germany Mark Tiede Haskins Laboratories, New Haven, USA

Corresponding author: Leonardo Lancia DeutscherPlatz, 6. 04103 Leipzig, Germany Email: leonardo_lancia@eva.mpg.de


This is an author-produced manuscript that has been peer reviewed and accepted for publication in the Journal of Speech, Language, and Hearing Research (JSLHR). As the “Papers in Press” version of the manuscript, it has not yet undergone copyediting, proofreading, or other quality controls associated with final published articles. As the publisher and copyright holder, the American Speech-Language-Hearing Association (ASHA) disclaims any liability resulting from use of inaccurate or misleading data or information contained herein. Further, the authors have disclosed that permission has been obtained for use of any copyrighted material and that, if applicable, conflicts of interest have been noted in the manuscript.

Downloaded From: http://jslhr.pubs.asha.org/ by Zentrum Fuer Allgemeine Sprachwissenschaft, Susanne Fuchs on 04/14/2014

Abstract

Purpose: The aim of this paper is to introduce an important tool, Cross Recurrence Analysis, to speech production applications, by showing how it can be adapted to evaluate the similarity of multivariate patterns of articulatory motion. The method differs from classical applications of cross recurrence analysis because no phase space reconstruction is conducted and a cleaning algorithm removes the artifacts from the recurrence plot. The main features of the proposed approach are robustness to non-stationarity and efficient separation of amplitude variability from temporal variability.

Methods: These claims are tested by applying our method to synthetic stimuli whose variability has been carefully controlled. The proposed method is also demonstrated in a practical application: we use it to investigate the role of biomechanical constraints in articulatory reorganization as a consequence of speeded repetition of CVCV utterances containing a labial and a coronal consonant.

Results: Overall our approach provides more reliable results than other methods, particularly in the presence of high variability.

Conclusions: The proposed method is a useful and appropriate tool for quantifying similarity and dissimilarity in patterns of speech articulator movement, especially in such research areas as speech errors and pathologies, where unpredictable divergent behavior is expected.

KEYWORDS: Articulatory kinematics, Cross Recurrence Analysis, time series, variability.


Introduction

Speech production is a highly complex motor skill where articulatory movements are precisely coordinated in space and time. Different coordination patterns are the building blocks for a variety of phonemes and allophonic variations that make human phonemic inventories so rich. For instance, in a voiceless alveolar stop, tongue tip closure is coordinated with jaw rising (Mooshammer, Hoole, Geumann, 2007), glottal opening (Löfqvist and Yoshioka, 1984) and velar closure (Krakow, 1999). If glottal opening disappears, the voiceless alveolar stop may become a voiced alveolar stop. If the velar closure is missing, it may become a nasal alveolar stop. If the jaw rising is missing, the voiceless alveolar stop may become a weak variant, since the lower incisors cannot be used as an obstacle source to realize a prominent burst. Small variations in coordination might be possible, since none of these articulatory coordinative patterns will be repeated exactly the same way by a given speaker. There are at least two challenging questions, crucial for a better understanding of the articulatory coordination in space and time: How much variability in articulatory coordination is allowed to still satisfy the listener in understanding what has been said? Which individual, morphological, socio-cultural, language-specific and motor control factors are responsible for the observed variability (e.g. Perkell and Klatt 1986; Lindblom, Brownlee, Davis, and Moon, 1992; Schöner, Martin, Reimann, and Scholz, 2008)? These questions are so fundamental that variability in coordinated motion is a key parameter in speech production, speech motor control and in defining what sets pathological speech apart from normal speech. In this paper we present a new technique that is able to compare and quantify the similarity of repetitions of single articulatory movements as well as coordinated movements of different articulators.
In order to quantify the variability which characterizes the production of a given articulatory movement, many productions of that movement need to be compared and their similarity/dissimilarity assessed. Commonly this task is accomplished by considering a strongly


reduced representation of the movements. For example, articulatory gestures are frequently represented by position, velocity and/or acceleration values at one or several critical points in time. There is however a growing number of studies in which alternative methods are adopted to compare speech movements in time. Among these methods are Dynamic Time Warping (Sakoe and Chiba, 1978), Functional Data Analysis (e.g. Lucero, Munhall, Gracco and Ramsay, 1997; Lucero and Koenig, 2000; Lucero, 2005), the Spatio-Temporal Index (e.g., Smith et al., 1995), relative phase indexes (e.g., Van Lieshout and Moussa, 2000; Lancia and Tiede, 2012) and Correlation Map Analysis (e.g., Barbosa, Déchaine, Vatikiotis-Bateson & Yehia, 2012). The advantage of these methods is that they take into account the whole development of motion in time. Since they consider the whole time series, these methods are neutral with respect to the theoretical assumptions concerning the units of speech production (be it a target region, gesture, syllable, word, or prosodic phrase; see Smith & Goffman, 2004 for discussion) in contrast to the widespread approach of comparing articulatory characteristics at selected temporal landmarks (for a comparison of methods see Van Lieshout & Namasasivan, 2006). Moreover, mastering these methods can avoid time-consuming manual or semiautomatic labeling. Time series methods also provide a means of distinguishing between amplitude and temporal variability. Such separation is particularly useful in the context of speech motor control mechanisms to determine whether time is a control parameter for the motor control system, or rather a byproduct of movement amplitude, velocity and stiffness (Kelso, Vatikiotis-Bateson, Saltzman & Kay 1985, but see also Fuchs, Perrier & Hartinger, 2011 for a different opinion).
In this paper we introduce a new technique based on Cross Recurrence Analysis (CRA), a method that is well-suited for assessing the similarity of repeated articulatory motions over time. The main advantages of our approach based on CRA are that multivariate data (for


example, concurrent movements of the tongue, lower and upper lips, jaw and head) can easily serve as input, and the method is robust in the presence of a large amount of variability in the data. Moreover, temporal alignment prior to data analysis is not required, nor is precise segmentation of the data. The paper is organized in the following way: First, some frequently used methods for analyzing time series are briefly described. We will discuss their underlying assumptions and will then make a transition to CRA and its characteristics. A new algorithm tailored to the properties of speech is implemented and validated by means of synthetic stimuli whose variability has been systematically controlled. Results are compared to other methods frequently used to distinguish amplitude and temporal variability. The method is then applied to articulatory data of human speech recorded by electromagnetic articulometry (EMA). In this context we present a case study addressing the variability of articulator motion during a speech repetition experiment in which the behavior of the jaw underwent systematic qualitative changes as speech rate increased. Finally, potential applications of Cross Recurrence Analysis to speech production are discussed.

Background

It has been observed repeatedly that when comparing two articulatory trajectories it is appropriate to separate differences in the shape of the movements from phasing differences in their timings and durations. Such a separation supports quantification of amplitude variability distinct from temporal variability. For example, it is useful to capture the similarity between two trajectories that have the same amplitude but are characterized by different rates of change in time, e.g. due to variations in speech rate. In order to compare such trajectories it is necessary to determine which point of the first trajectory should be compared with which point of the second.
This mapping is often termed temporal alignment, and it is produced through a so-called time-warping function.


Several approaches have been proposed to find an appropriate warping function. For example, Smith, Goffman, Zelaznik, Ying & McGillem, (1995) have used linear interpolation of a series of lower lip movements aligned with consistent initial and final landmarks to obtain a standardized number of samples (time normalization), which are partitioned into equally-sized bins. In each bin the standard deviations of the aligned movement trajectories are then computed and averaged to obtain the index of spatiotemporal variability (STI). It has been observed that the computation of this index is powerful as long as the needed warping function is linear (Lucero, 2005). When the difference in speed between two trajectories is not a constant value, nonlinear methods are more appropriate. In Lucero, Munhall, Gracco & Ramsay (1997), articulatory signals are aligned using the registration technique introduced by Ramsay in the framework of Functional Data Analysis (FDA; Ramsay & Silverman, 1997). Within this approach, the trajectories are first linearly aligned in order to have the same length; and then an average trajectory is computed. The optimal transformation of time is determined through an algorithm which minimizes the distance between each linearly aligned time series and the average time series. As with linear STI, once the trajectories share the same time-scale, this can be partitioned, and for each bin a standard deviation from the mean can be computed. Note that the registration method was originally developed for the alignment of monodimensional signals, and it performs poorly when multivariate signals, corresponding to the simultaneous motion of several articulators, are compared (Lucero & Koenig, 2000). It also assumes that each trajectory tracks the same events (peaks and valleys). The Dynamic Time Warping (DTW) algorithm (Sakoe & Chiba, 1978) can also be used to calculate the degree of variability across trajectories. 
This can be achieved by pairwise comparison of the trajectories from an observed sample. An index of variability is computed by averaging the dissimilarity measures obtained from the pairwise comparisons. To compare two time series X and Y of lengths M and N a distance matrix D of size (M × N) is obtained by


calculating the distance between every possible pair of points selected from the two time series. D(i, j) thus corresponds to the distance between X(i) and Y(j). The function mapping the points of one time series to the points of the other time series corresponds to a path connecting the lower left corner of the matrix D(1,1) to the top right corner D(M,N). D(1,1) stores the distance between the initial states of the two time series and D(M,N) the distance between the end states. The path connecting these locations is found through dynamic programming. A measure of dissimilarity of the two time series can be obtained by averaging the distances between the aligned points. Two important assumptions must be met in order for DTW to produce a good alignment. First, the initial and final points of the two trajectories must already be aligned. Second, as with FDA, the time series under comparison must have the same peaks and valleys in the same order. It has also been observed that the presence of local amplitude differences across curves is a source of error in any alignment obtained using DTW (Keogh & Pazzani, 2001; but see Lucero & Koenig, 2000 for additional criticism).

New solution to an old problem (Introduction of cross recurrence analysis)

Cross Recurrence Analysis (CRA) is a recent addition to methods for comparing time series. Its particular advantage is that it does not assume that the mapping between two time series is continuous, which means that a comparison is possible even when some events are present in one time series but not in the other. This technique is based on the same distance matrix used by DTW. However each distance smaller than a predefined threshold is considered as an indicator of the presence of equivalent states in the two time series. Indices of similarity can be obtained by quantifying properties of the distribution of the locations in the distance matrix which correspond to matches across time series.
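The shared role of the distance matrix in DTW and CRA can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the function names (`distance_matrix`, `dtw_dissimilarity`, `cross_recurrence`), the Euclidean distance, the unconstrained step pattern and the toy trajectories are all assumptions made for illustration.

```python
import math

def distance_matrix(x, y):
    """D[i][j] = Euclidean distance between state x[i] and state y[j]."""
    return [[math.dist(xi, yj) for yj in y] for xi in x]

def dtw_dissimilarity(d):
    """Average distance along the minimum-cost path from D(1,1) to D(M,N)."""
    m, n = len(d), len(d[0])
    cost = [[math.inf] * n for _ in range(m)]
    back = [[None] * n for _ in range(m)]
    cost[0][0] = d[0][0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            # Allowed predecessors: one step in x, one step in y, or both.
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            best, prev = min(candidates)
            cost[i][j] = d[i][j] + best
            back[i][j] = prev
    # Backtrack from the end states to the initial states.
    path = [(m - 1, n - 1)]
    while path[-1] != (0, 0):
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return cost[m - 1][n - 1] / len(path), path

def cross_recurrence(d, eps):
    """CRA view of the same matrix: mark every distance not exceeding eps."""
    return [[1 if v <= eps else 0 for v in row] for row in d]

# Toy univariate trajectories: y is a time-warped copy of x.
x = [(0,), (1,), (2,), (1,), (0,)]
y = [(0,), (0,), (1,), (2,), (2,), (1,), (0,)]
d = distance_matrix(x, y)
score, path = dtw_dissimilarity(d)
crp = cross_recurrence(d, eps=0.0)
```

In this sketch DTW returns a single continuous path through the matrix, whereas the thresholded matrix keeps every match and imposes no continuity on the mapping.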
These features make alignment of the time series unnecessary and the whole approach more robust to artifacts. Several variants of the basic approach have been proposed to deal with various kinds of signals (see Marwan, Romano, Thiel, & Kurths, 2007 for a review). The next section contains a summary


description of the basic CRA method, followed by a more detailed discussion of the steps performed to conduct CRA and of the potential pitfalls of this method. We then present an original method derived from CRA conceived to compare multivariate signals with different time-scales without temporal alignment.

Recurrence and Cross-Recurrence Analysis

Recurrence Analysis focuses on the degree of repeatability of a single signal over time. When instead the similarity/dissimilarity of two different signals is compared the approach is known as Cross-Recurrence Analysis (Zbilut, Giuliani & Webber, 1998; Marwan & Kurths, 2002). These techniques are traditionally used to analyze and compare univariate time series under the assumption that the observed time series are produced by a multidimensional dynamical system whose behavior cannot be observed directly. Time delay embedding (Takens, 1981) is used to extract from an observed univariate time series a multidimensional time series which shares many important features with the trajectory of the underlying dynamical system. This multidimensional time series can be constructed from the observed signal and a given number of time delayed copies of that signal. The value on the first dimension of the reconstructed time series at time t corresponds to the value of the observed time series at that same time, while the value on the ith dimension at time t corresponds to the value of the observed time series at time t + (i − 1)τ, where τ is the delay parameter. Once the multidimensional time series is obtained this is submitted to Recurrence Analysis. To illustrate the principles of Recurrence Analysis, we use it to characterize a simple synthetic univariate time series obtained by calculating the sine of the angle θ as it moves through the interval [0:6π] (shown in Figure 1a). Methods to estimate optimal values for the time delay and the embedding dimensions are discussed in the next section.
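Time delay embedding of the sine example can be sketched in a few lines of Python. The function name `delay_embed`, the sampling density of 100 samples per cycle and the use of forward delays (later samples fill the higher dimensions) are assumptions of this sketch, not details taken from the analysis reported here.

```python
import math

def delay_embed(signal, delay, dims):
    """Return the states (s[t], s[t + delay], ..., s[t + (dims - 1) * delay])."""
    n_states = len(signal) - (dims - 1) * delay
    if n_states <= 0:
        raise ValueError("signal too short for this delay and number of dimensions")
    return [tuple(signal[t + k * delay] for k in range(dims))
            for t in range(n_states)]

# Sine of an angle moving through [0, 6*pi): three full cycles.
samples_per_cycle = 100  # assumed sampling density
theta = [2 * math.pi * k / samples_per_cycle for k in range(3 * samples_per_cycle)]
signal = [math.sin(a) for a in theta]

# A delay of a quarter cycle corresponds to pi/2; two embedding dimensions.
states = delay_embed(signal, delay=samples_per_cycle // 4, dims=2)
```

With these settings each reconstructed state pairs a sample with the sample a quarter cycle later, so the bivariate trajectory traces a circle in the reconstructed space.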
For this example we applied time delay embedding, with the delay parameter equal to π/2, and with two embedding dimensions (the original and the delayed copy of the signal). We obtained a bivariate time


series corresponding to sin(θ) and sin(θ + π/2) (shown in Figure 1b). Let Xi represent the state of the bivariate signal at time step i. The Recurrence Plot (RP) in Figure 1c shows, for each time step i of the bivariate signal (horizontal axis), all the corresponding time steps j (vertical axis) where the state Xj is equal to Xi, representing these points with a black dot. Constructing these plots requires the comparison of each state Xi with all states Xj. Since Xj is always equal to Xi when j=i, a recurrence plot always contains a line of dots along its main diagonal (i=j). This line is called the Line Of Identity (LOI). Since, in our example, the signal is periodic, all sub-diagonals are present, as shown in Figure 1c. The vertical distance between the sub-diagonals corresponds to the period of the oscillation.

------------ Insert Figure 1 around here ------------

The CRA analog to a recurrence plot is called a Cross Recurrence Plot (CRP, illustrated by Figure 1d). In a CRP the horizontal axis is the time axis of the first time series, while the vertical axis is the time axis of the second time series. A black dot at coordinates i,j corresponds to time points where the time series have the same state. The main difference between RPs and CRPs is the absence of the line of identity in the latter. The presence of such a line would indicate that the two time series behave in an identical way. However, if the two time series proceed through the same states at the same time or at different times, the CRP plot will display continuous stretches of connected dots on the main diagonal or on sub-diagonals. In addition, if the two signals behave similarly throughout their extents but with local mismatches and with a certain amount of time warping, a CRP will display a quasi-continuous line locally bowed around the main diagonal as in Figure 1d.
This line is then termed the Line Of Synchronization (LOS), because, when present, it contains the information needed to synchronize the two signals (The point Xi of one time series is mapped on a point of the other time series if one of the points of the LOS has coordinates i,j). Differences in the amplitudes of the time series determine the presence of discontinuities in


the LOS, and differences in the timescales determine changes of its local slope (Figure 1d). Continuous stretches of lines parallel to the LOS indicate the presence of periodicities in both signals.

Time delay and embedding dimensions

To apply time delay embedding to a univariate time series an appropriate delay parameter and number of dimensions (delayed copies) have to be determined. The aim of such embedding is to reconstruct a smooth trajectory in a multidimensional space which preserves the uniqueness of the original dynamics and the linearization of the dynamics at each time step. If the behavior of the underlying system is deterministic, then the uniqueness of the dynamics is preserved if each time it is found in a given state it changes in the same direction and by the same amount, and this relation must be preserved in the reconstructed trajectory. Uniqueness is more likely to be preserved with an increasing number of dimensions for the reconstructed time series, and it is generally preserved when m > 2d, where m is the number of dimensions in the reconstructed time series and d is the number of dimensions of the underlying dynamical system. The impact of the delay parameter on the results of the embedding has been discussed in several papers (cf. Small, 2005 for a review). If the delay τ is too small the dimensions of the

reconstructed time series will be strongly correlated which results in spurious similarities between points of the reconstructed trajectory. But if the delay τ is too large the functioning of the underlying system at time t will lose influence on values measured at times t + kτ for higher values of k. The following heuristics are generally used for estimating optimal values for the delay τ and the number of embedding dimensions m (cf. Marwan et al., 2007; Small, 2005). An appropriate time delay value is generally chosen by comparing the original time series with its delayed versions using mutual information. As an illustration we use CRA to compare the vertical movement of the jaw (henceforth JAW) and the tip of the tongue (henceforth TT) as recorded during repeated production of the


utterance /tapa/ (see Footnote 1 below for details on the collection of these data). The corresponding time series are shown in Figure 2a and 2b respectively. Each time series was normalized to have zero mean and standard deviation equal to 1. Computation of the mutual information between two trajectories permits quantification of their interdependence at some delay (lag). In this heuristic, the delay between the original and its lagged copy is gradually increased, and the lag corresponding to the first minimum of the mutual information function between the two time series is chosen as a time delay for the embedding (cf. Fig. 2c and 2d). Choosing the minimum of the mutual information for the delay τ minimizes the correlation between the dimensions. The number of embedding dimensions is chosen using the false nearest neighbors method. Each state Xi of a multidimensional trajectory has a given number of nearest neighbors, which are other observed states of the same trajectory whose distance in space from Xi is low (ignoring distance in time). Multidimensional time series are constructed with an increasing number of embedding dimensions, using the value for the delay τ determined through the application of mutual information. Each time a dimension is added, the number of nearest neighbors is computed for each point of the reconstructed time series. The procedure is stopped when adding one dimension does not produce a significant drop in the number of the nearest neighbors (cf. Fig. 2e and 2f). The reconstructed time series for this example, obtained with the selected delay and number of embedding dimensions, are shown in Fig. 2g and 2h.

------------ Insert Figure 3 around here ------------

Note however that the estimation of these parameters can be problematic, especially in the analysis of non-stationary signals. For such signals a given delay can be correct for some sections of the signals but inappropriate for other sections (Marwan, Thiel, Nowaczyk, 2002; Marwan, 2008).
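The mutual information heuristic for the delay can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure applied to the JAW and TT data: mutual information is estimated from a plain equal-width histogram with 8 bins, the function names `mutual_information` and `first_minimum_delay` are hypothetical, and the test signal is a synthetic sine.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=8):
    """Histogram (plug-in) estimate, in bits, of the mutual information of a and b."""
    def binned(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant signal
        return [min(int((v - lo) / width), bins - 1) for v in values]

    qa, qb = binned(a), binned(b)
    n = len(qa)
    pa, pb, pab = Counter(qa), Counter(qb), Counter(zip(qa, qb))
    # Sum over occupied cells of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    return sum((c / n) * math.log2(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())

def first_minimum_delay(signal, max_lag, bins=8):
    """Lag (in samples) of the first local minimum of the lagged mutual information."""
    mi = [mutual_information(signal[:-lag], signal[lag:], bins)
          for lag in range(1, max_lag + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1  # index k holds lag k + 1
    return mi.index(min(mi)) + 1  # fall back to the global minimum

# Synthetic signal with a period of 100 samples.
signal = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
lag = first_minimum_delay(signal, max_lag=60)
```

Histogram estimates depend on the number of bins and on the amount of data, so the selected lag should be checked against a plot of the mutual information function rather than taken at face value.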
Also note that, as in the example presented in this section, applications of CRA are usually limited to the comparison of two univariate signals (however see Marwan,


Thiel & Nowaczyk, 2002 for examples of CRA computed from multivariate signals). In this paper we wish to address the comparison of multivariate signals corresponding to the joint motion of the articulators in the 3D space. While methods for embedding multivariate signals have been proposed (Stewart, 1996; Garcia & Almeida, 2005; and Vlachos & Kugiumtzis, 2008), there are some considerations which suggest that for our purposes an alternative approach may be more appropriate. When using CRA to examine the evolution over time of two variables regulated by essentially the same dynamical system it makes sense to reconstruct the two time series by adopting a common value for the delay parameter and a common value for the number of embedding dimensions. However, when comparing speech movements corresponding to two productions of the same utterance, we have to take into account that the same acoustic/articulatory goal can be realized in different ways depending on the speakers’ habits, physiology or on task requirements (for example, it is reasonable to expect that at high speech rates motion patterns are simplified and a reduced number of degrees of freedom is available). Also, results from some areas of speech research indicate that a working assumption that the same dynamical system underlies the behavior of the compared trajectories is clearly false, as with categorical speech errors (errors in which the speaker produces a speech pattern which is qualitatively different from the one expected) or pathologies which, like apraxia of speech, compromise the recruitment and the control of the articulators at abstract levels (cf. Ziegler, Staiger & Aichert, 2010). In these cases the dynamical systems underlying the two time series may reside in different spaces and a comparison of their trajectories is meaningless. An additional technical issue concerns the lengths of the time series compared. 
A basic assumption of the embedding procedure is that we can look at the evolution of the system over time in order to reconstruct its present state. The assumption works reasonably well when repeated behavior is observed. However this is not necessarily the case when comparing


two productions of the same speech utterance, because each of them corresponds to an independent repetition of a motion pattern. An alternative analysis strategy is to avoid performing any embedding, instead comparing the time series directly as they exist in the space defined by their original dimensions (henceforth the data space). For our example we compare the positions of the recorded multivariate trajectories. However such a choice has important consequences for the interpretation of the results. First we cannot interpret the similarity value obtained from the comparison of two multivariate trajectories as an index of the overlap between two dynamical systems. This means that no inferences can be made on the similarity of the evolution over time of the processes which produce the recorded time series. Second, the various dimensions in each of the time series compared can be correlated. We can interpret correlation across dimensions as evidence that several observed dimensions carry information about the same underlying process. A process affecting several dimensions will have a stronger influence on the distance measures used to detect recurrences than a process influencing only one dimension. This bias is maximized when using a Euclidean norm to determine the recurrences and minimized when using a maximum norm (see next section). Third, the presence of correlations across dimensions also introduces artifactual recurrences in the plot because correlated coordinates lie near the diagonal of the data space, reducing its effective dimensionality (this is easily seen in 2 dimensions where points whose coordinates are correlated lie near the straight line of slope 1). If this is combined with too lax a similarity criterion a multidimensional time series will behave like a monodimensional time series for the purpose of the comparison. This generates a particular kind of artifact in the CRPs of periodic trajectories in which lines with negative slope (i.e.
lines oriented toward the bottom right corner) can be observed. The presence of such lines with negative slope indicates that a


portion of one time series is inappropriately matching a time-reversed portion of the other time series.

Norm and criterion

Once two trajectories have been prepared, whether submitted to time-delay embedding or not, each state vector of one reconstructed trajectory is compared to each state vector of the other reconstructed trajectory and their distance is stored in a distance matrix. As for DTW, the distance values (D(i,j)) can be computed according to different norms. The most frequently used are the Euclidean norm:

$$D(i,j) = \sqrt{\sum_{n=1}^{N} \left( X_n(i) - Y_n(j) \right)^2} \qquad (1)$$

and the maximum norm:

$$D(i,j) = \max_{n = 1, \dots, N} \left| X_n(i) - Y_n(j) \right| \qquad (2)$$

Here n indicates the different coordinates of the time-series and N indicates the number of dimensions. The distance matrix in Fig. 2i was obtained using a Euclidean norm to compare pairwise the states of the multidimensional time series in Fig. 2g with the states of the time series in Fig. 2h. Once the distance matrix has been obtained, a criterion must be applied in order to choose which distance values indicate the presence of similar state vectors and thus the presence of recurrences. Using a fixed threshold (ε), two state vectors are considered equivalent if their distance does not exceed the threshold. CRP(i,j) will be equal to one if D(i,j) ≤ ε. This criterion was applied to the distance matrix in Fig. 2i; the resulting CRP is shown in Fig. 2l. With a fixed recurrence rate criterion, the threshold is adjusted in order to produce a predetermined number of recurrence points which corresponds to a percentage of the number of available locations in the CRP. This percentage will be referred to as the recurrence rate parameter. With a fixed amount of neighbors criterion, the number of recurrence points present in each column of the recurrence plot is fixed to a percentage of the number of points that a column


can contain and the threshold is varied from one column to the other, in order to obtain the desired amount of dots in each column. The percentage of the number of points in a column which are promoted to recurrence points will be referred to as the amount of neighbors parameter. The use of a norm to collapse together several dimensions in the computation of the similarity between states of the two time series is related to a common issue in multivariate time series comparisons. When computing the distance scores between state space vectors we collapse distance values from several dimensions according to a norm (we mentioned the Euclidean and the maximum norm). Because the norm has only one threshold parameter, we need to normalize the values in each dimension and bring them to a common scale. If we assume constant measurement noise across dimensions but variable signal to noise ratio, the normalization step amplifies noise. Noise amplification increases with the difference in signal to noise ratio. This highlights the importance of signal conditioning prior to analysis.

Cross recurrence quantification

When using cross recurrence analysis to compare two signals, the main quantitative indices of their similarity are derived from counts of the points belonging to diagonal lines, i.e. straight lines whose angle with the horizontal axis is equal to π/4 (Webber & Zbilut, 1994; Marwan & Kurths, 2005; Marwan, Romano, Thiel & Kurths, 2007). The determinism (referred to as %DET) is equal to the percentage of dots belonging to diagonal lines with respect to the total number of dots present in the plot. This measure quantifies how well one time series is predicted from the other one. Other frequently used measures include the mean length of the diagonals (commonly denoted as MEANL) and their maximum length (MAXL). Quantification indices based on straight diagonal lines work properly when the compared time series are stationary and share the same time scale.
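The three recurrence criteria and the %DET index can be sketched in Python as follows. This is an illustrative sketch rather than the code used for the analyses: the function names are hypothetical, the distance matrix is stored as d[i][j] so that row i of the list corresponds to column i of the plot, ties at the threshold may yield slightly more points than requested, and the minimum line length of 2 is an assumption.

```python
def crp_fixed_threshold(d, eps):
    """Fixed threshold: CRP(i, j) = 1 when D(i, j) does not exceed eps."""
    return [[1 if v <= eps else 0 for v in row] for row in d]

def crp_fixed_recurrence_rate(d, rate):
    """Pick eps so that about `rate` of all locations become recurrence points."""
    values = sorted(v for row in d for v in row)
    k = max(1, round(rate * len(values)))
    eps = values[k - 1]
    return crp_fixed_threshold(d, eps), eps

def crp_fixed_neighbors(d, fraction):
    """Pick a separate threshold for each i so that each plot column
    holds about `fraction` of the points it can contain."""
    crp = []
    for row in d:  # d[i] holds the distances shown in column i of the plot
        k = max(1, round(fraction * len(row)))
        eps_i = sorted(row)[k - 1]
        crp.append([1 if v <= eps_i else 0 for v in row])
    return crp

def determinism(crp, min_len=2):
    """%DET: percentage of recurrence points lying on diagonal runs
    (slope pi/4) of at least min_len consecutive points."""
    m, n = len(crp), len(crp[0])
    total = sum(map(sum, crp))
    on_lines = 0
    for offset in range(-(m - 1), n):  # offset = j - i identifies a diagonal
        i, j = max(0, -offset), max(0, offset)
        run = 0
        while i < m and j < n:
            if crp[i][j]:
                run += 1
            else:
                if run >= min_len:
                    on_lines += run
                run = 0
            i += 1
            j += 1
        if run >= min_len:
            on_lines += run
    return 100.0 * on_lines / total if total else 0.0

# Distances between two identical ramps: D(i, j) = |i - j|.
d = [[abs(i - j) for j in range(4)] for i in range(4)]
crp, eps = crp_fixed_recurrence_rate(d, rate=0.25)
det = determinism(crp)
```

The index computed here counts only straight diagonal runs, which is the limitation discussed next.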
However, when comparing speech movements made by different articulators we expect their timescales to be different. The

Downloaded From: http://jslhr.pubs.asha.org/ by Zentrum Fuer Allgemeine Sprachwissenschaft, Susanne Fuchs on 04/14/2014

matches occurring across signals with different time scales result in bowed lines in the cross recurrence plots, and recurrence points belonging to such bowed lines are not captured by measures based on straight diagonal lines. If the speed of the movement is not uniform across or within the time series, different signals may differ in their smoothness and the same signal can vary its smoothness over time. This can be appreciated by comparing panels e and f of Figure 1 where two regions of the CRP in panel d of the same figure are magnified. The band in panel e is thicker than the band in panel f because the time series on the vertical axis of the CRP is smoother near the end (after frame 1000) than at the middle (between frames 600 and 700). Indeed, if two time-series X and Y present a match with Xi≈Yj, and Y is particularly smooth around j, there is a strong probability that the two signals will also match at position (i,j+1) because of the small difference between Yj and Yj+1. While the presence of bowed lines in the plot can obscure the similarity between portions of the two signals, variable smoothness introduces false matches. New algorithm To reduce the potential biases introduced by the artifacts described in the last three sections, we implemented an algorithm derived from simple image processing techniques and from the strategy proposed by Marwan, Thiel & Nowaczyk (2002) to track structures of connected dots in a recurrence plot. Before computation of the CRP, the time series are resampled to a predetermined length, which permits the comparison of CRP measures obtained from various pairs of time-series. Then the CRP is computed by applying the fixed recurrence rate criterion on a matrix of distances obtained by using the maximum norm. Each group of connected dots is then isolated using a connected components labeling algorithm (Rosenfeld & Pfaltz; 1966). 
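The computation of the CRP and the isolation of groups of connected dots described above can be sketched as follows. This is a hedged Python illustration rather than the Matlab code distributed with the paper: all names are hypothetical, ties between equal distances are broken by index order, the labeling uses a simple flood fill instead of the two-pass algorithm of Rosenfeld & Pfaltz (1966), and 8-connectivity is an assumption.

```python
def max_norm_distances(x, y):
    """d[i][j] = largest absolute difference across dimensions of x[i] and y[j]."""
    return [[max(abs(a - b) for a, b in zip(xi, yj)) for yj in y] for xi in x]


def crp_fixed_rate(x, y, rate=0.05):
    """Binary plot in which the rate * len(x) * len(y) smallest distances are dots."""
    d = max_norm_distances(x, y)
    cells = sorted((d[i][j], i, j) for i in range(len(x)) for j in range(len(y)))
    n_keep = round(rate * len(x) * len(y))
    crp = [[0] * len(y) for _ in x]
    for _, i, j in cells[:n_keep]:
        crp[i][j] = 1
    return crp


def label_components(crp):
    """Groups of 8-connected dots, each returned as a list of (i, j) pairs."""
    n_rows, n_cols = len(crp), len(crp[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    groups = []
    for i in range(n_rows):
        for j in range(n_cols):
            if not crp[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            stack, group = [(i, j)], []
            while stack:
                a, b = stack.pop()
                group.append((a, b))
                # Visit the eight neighbors of the current dot.
                for da in (-1, 0, 1):
                    for db in (-1, 0, 1):
                        p, q = a + da, b + db
                        if (0 <= p < n_rows and 0 <= q < n_cols
                                and crp[p][q] and not seen[p][q]):
                            seen[p][q] = True
                            stack.append((p, q))
            groups.append(group)
    return groups


# Two identical two-dimensional toy trajectories (already on a common scale).
x = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5]]
y = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5]]
crp = crp_fixed_rate(x, y, rate=0.25)
groups = label_components(crp)
```

With identical inputs and a rate of 0.25, the four retained dots are the zero distances on the main diagonal, which form a single connected group.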
We systematically select each group of connected dots, and apply the algorithm described by Marwan et al. (2002) to the smallest rectangular portion of the plot containing any dots belonging to the selected group of dots. This algorithm is used to reconstruct a line of
thickness equal to one dot which follows the path underlying the group of connected dots. The initial point of the line is placed at the coordinates of the dot nearest to the lower left corner of the smallest rectangle containing the connected dots. A rectangular window of size () is placed with its lower left corner on the starting dot. The length of the two sides of the window is increased by one until one or more dots are found under the region of the CRP delimited by the rectangular window. If dots are found under the window, the location of the second dot of the line is obtained by computing their center of mass. A new rectangular window of size () is placed with its lower left corner on the second dot of the reconstructed line and the search for a new dot is repeated. The algorithm terminates when the right or the top side of the plot is reached. Notice that placing at each iteration the lower left corner of the rectangular window at the location of the last tracked point introduces a monotonic constraint on the line to be found. This line can only grow in the top right direction. This feature helps discard groups of connected dots oriented in the bottom right direction (see supplemental material for a more detailed description of the algorithm and a Matlab implementation).

----------------- Insert Figure 3 around here -----------------

The time series shown in panels a and b of Figure 3 represent the evolution over time of the vertical displacements of the tip of the tongue (TT), the lower lip (LL) and the jaw (JAW), recorded using EMA during the production of two repetitions of the utterance ‘pasa’. The recurrence plot shown in Figure 3c is obtained by applying the methods described in the preceding sections, without embedding, to a comparison of these trajectories.
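The line tracking step described before the Figure 3 example (growing window, center of mass, monotonic constraint) can be sketched as follows. This Python sketch is only an approximation under stated assumptions and is not the supplemental Matlab implementation: the function name is hypothetical, the initial window side `start_size=2` is an assumed value, the center of mass is rounded half up to a grid position, and the search is restricted to the bounding rectangle of the group.

```python
import math


def track_line(dots, start_size=2):
    """Reduce one group of connected dots, given as (i, j) pairs, to a thin path."""
    dot_set = set(dots)
    i_min, i_max = min(i for i, _ in dots), max(i for i, _ in dots)
    j_min, j_max = min(j for _, j in dots), max(j for _, j in dots)
    # Start from the dot nearest to the lower left corner of the bounding rectangle.
    current = min(dots, key=lambda p: (p[0] - i_min) ** 2 + (p[1] - j_min) ** 2)
    path = [current]
    # Stop once the right or the top side of the rectangle is reached.
    while current[0] < i_max and current[1] < j_max:
        i0, j0 = current
        size = start_size
        while True:
            # Dots under the window whose lower left corner is the last tracked point.
            found = [(i, j) for (i, j) in dot_set
                     if i0 <= i < i0 + size and j0 <= j < j0 + size
                     and (i, j) != current]
            covers_rest = i0 + size - 1 >= i_max and j0 + size - 1 >= j_max
            if found or covers_rest:
                break
            size += 1  # grow both sides of the window by one
        if not found:
            break  # no dot above and to the right: the line cannot grow further
        # Next point: center of mass of the dots found, rounded half up.
        ci = sum(i for i, _ in found) / len(found)
        cj = sum(j for _, j in found) / len(found)
        current = (math.floor(ci + 0.5), math.floor(cj + 0.5))
        path.append(current)
    return path


# A band three dots thick around the main diagonal of a 6 x 6 plot.
band = [(i, j) for i in range(6) for j in range(6) if abs(i - j) <= 1]
thin = track_line(band)

# A group oriented toward the bottom right yields no line beyond its starting dot.
falling = [(0, 3), (1, 2), (2, 1), (3, 0)]
stub = track_line(falling)
```

Because the window is always anchored at the last tracked point, each new point lies above and/or to the right of the previous one, so the thick band collapses to its central diagonal while the falling group produces a single-point path that can be discarded.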
Although the CRP was obtained by comparing both the vertical and horizontal displacement of the articulators considered, to simplify the display, only the vertical positions of the articulators are displayed. Panel d shows the output of our algorithm when that recurrence plot is processed. Once all the artifactual dots have been removed, we can quantify the similarity between the time series by computing the percentage of dark dots belonging to continuous
lines (regardless of their slope and curvature) with respect to the total number of dark dots present in the plot before the application of the cleaning algorithm. When a fixed recurrence rate criterion is adopted, the total number of dots is set at a fixed percentage of the number of locations present in the plot. This quantity is obtained by multiplying the product of the lengths of the time-series (i.e. the area of the plot) by the fixed recurrence rate parameter. The proposed index of similarity is a version of the %DET index insensitive to differences in the time-scales; therefore it will be referred to as elastic determinism (%EDET) in the remainder of this paper.

Validating CRA and comparison with other methods

The aim of the following analysis is to test whether the proposed method is sensitive to amplitude variability while not being sensitive to temporal variability. To do this, several sets of synthetic trajectories were produced, in which we introduced variability directly by modifying the amplitude values of the time series (amplitude variability), or indirectly, by transforming the time-scale of the trajectories (temporal variability). Our expectation is that the values of the %EDET index obtained from the comparison of the time series will decrease when a given amount of amplitude variability is directly introduced in the time series, but will not decrease when the same amount of amplitude variability is introduced by manipulating the time-scales of the signal. The stimuli are intended to reproduce vertical and horizontal movements of the TT, the LL and the jaw during the production of the utterance /pata/. As a template we used an EMA recording of these articulator trajectories obtained from a German speaker.
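The definition of %EDET given at the start of this section reduces to a simple ratio, which may be easier to follow as a short sketch. The Python below is illustrative only: the function name and the `min_points` filter are hypothetical, and the retained lines are assumed to be lists of plot coordinates produced by a cleaning step.

```python
def percent_edet(lines, len_x, len_y, rate, min_points=2):
    """Percentage of the dots present before cleaning that lie on retained lines."""
    # Dots kept after cleaning; lines shorter than min_points are ignored (assumption).
    kept = sum(len(line) for line in lines if len(line) >= min_points)
    # Under the fixed recurrence rate criterion the number of dots before cleaning
    # is the area of the plot multiplied by the recurrence rate.
    total_before_cleaning = rate * len_x * len_y
    return 100.0 * kept / total_before_cleaning


# Hypothetical cleaned plot: one continuous line of 100 dots and one stray dot,
# for two series of 250 samples each and a recurrence rate of 0.05.
lines = [[(k, k) for k in range(100)], [(0, 5)]]
edet = percent_edet(lines, 250, 250, 0.05)
```

Here the plot holds 0.05 × 250 × 250 = 3125 dots before cleaning, so a single retained line of 100 dots gives a %EDET of about 3.2.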
We constructed 12 groups of stimuli, each composed of 60 time series of fixed length with 6 dimensions (vertical and horizontal components of TT, LL, and jaw displacement), and varied temporal and amplitude variability across the groups (relative to the prototype time-series). The amount of variability characterizing a group of time series corresponds to the average deviation of the
time-series of the group from the template. To simplify computations, values of variability are computed separately for each dimension in units of standard deviation of the template. This means for instance that a value of 0.3 for the variability characterizing one synthetic vertical trajectory of the jaw indicates that the mean error between the synthetic and the recorded jaw movement is equal to 0.3 times the standard deviation of the recorded vertical movement of the jaw. Further details on the production of the synthetic signals are provided with the supplementary material accompanying this paper. Before being compared, the signals were normalized such that, on each dimension, their mean was 0 and their standard deviation was 1. Some examples of jaw trajectories produced with different values of temporal and amplitude variability are shown in panels a-n of Figure 4.

----------------- Insert Figure 4 around here -----------------

Results

We applied our version of CRA, FDA, and DTW to compare pairwise synthetic stimuli from the 12 different sets, including comparisons without any adjustment of the timescales. Since these signals were produced with the same length, results obtained without any alignment should correspond to results obtained by linear normalization of the time scales. In each of the 12 sets, we grouped the signals into 30 pairs to obtain an equal number of similarity measures. Trajectories were paired following the order in which they were generated (i.e. the first was paired with the second, the third with the fourth and so on). For FDA, we first aligned all the members from the same set, and then we compared the aligned signals by pairs. Panels o-r from Figure 4 show the mean results of pairwise comparisons of the trajectories belonging to the different groups over the value of the variance coefficients. Gray marks indicate values from groups with amplitude variability set to 0.035 and increasing temporal
variability. Black marks indicate values from groups with temporal variability set to 0.035 and increasing amplitude variability. For the modified CRA, results are given in terms of 100 − %EDET in order to facilitate direct comparison across panels; measures based on DTW, FDA alignment and linear normalization are obtained by computing the mean error on each dimension after alignment. The results of the comparisons in Figure 4 show that our variant of cross recurrence analysis is the method showing the least sensitivity to temporal variation while still being sensitive to amplitude variation (see caption). This behavior is observed even in the presence of a considerable amount of temporal variability (up to 0.4 times the standard deviation of the prototype trajectories), while the other indices show sensitivity to even moderate amounts of temporal variability. The poor performance of linear normalization was expected on the basis of its inability to separate amplitude variability from non-uniform temporal variability. The reduced performance of FDA with multivariate signals was also expected, given the difficulty of aligning events across multiple signal components. The poor DTW results are somewhat unexpected, as the synthetic signals satisfied the basic requirements for this method (absence of discontinuity in the mapping between the signals and full realization of the basic pattern in both the signals being compared). However, the poor performance of DTW may be due to its sensitivity to local differences in peak amplitude values, due to the use of the standard deviation as a parameter for the normalization procedure. When changing the time scale of a trajectory, its standard deviation can change considerably. Therefore, if the two time series compared are normalized with respect to the standard deviation, differences in amplitude scales can be introduced.
If we normalize instead by peak amplitude (Wang & Gasser, 1997) or with the amplitude range, the DTW results are closer to those obtained using recurrence analysis.
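The effect of the normalization choice discussed above can be reproduced on a toy signal. The following Python sketch is an illustration under simplifying assumptions, not part of the analysis reported here: the helper names are hypothetical, the time-scale change is a crude lengthening of the initial hold, and a constant signal (zero standard deviation or zero range) is not handled.

```python
import statistics


def zscore(x):
    """Normalize to mean 0 and (population) standard deviation 1."""
    m, s = statistics.mean(x), statistics.pstdev(x)
    return [(v - m) / s for v in x]


def range_normalize(x):
    """Normalize so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]


def lengthen_hold(x, index, extra):
    """Crude time-scale change: repeat the sample at `index` `extra` more times."""
    return x[:index + 1] + [x[index]] * extra + x[index + 1:]


base = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
warped = lengthen_hold(base, 0, 6)  # same gesture after a longer initial hold

# Peak height of the gesture after each normalization.
peak_z_base, peak_z_warped = max(zscore(base)), max(zscore(warped))
peak_r_base, peak_r_warped = max(range_normalize(base)), max(range_normalize(warped))
```

The two signals contain the same gesture, yet after normalization by the standard deviation their peaks differ markedly (roughly 2.0 versus 2.8), whereas normalization by the amplitude range leaves both peaks at 1.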

Behavior of cross recurrence analysis with real speech movements

We use recordings of articulatory movements in order to illustrate a complete application of the method to test a research hypothesis. The experiment was based on the observation by Rochet-Capellan & Schwartz (2007) that in the speeded repetition of CVCV sequences containing a labial and a coronal consonant the articulatory coordination of the tongue, lip and jaw complex undergoes a reorganization (the “Labial-Coronal” effect). While repeating utterances like /pata/ or /tapa/ at slow speech rates, the jaw was elevated each time a consonant constriction was produced. However at fast speech rates two consecutive jaw raising cycles merged into one. Together with the reduction of the jaw cycles, at fast speech rates the phasing of the other articulators also changes. The lip constriction gesture (LL) tends to get closer in time to the following tongue tip (TT) constriction, and consequently gets further away from the preceding tongue tip constriction. If we assume that the temporal cohesion of speech gestures is an index of their grouping in the same utterance, this phasing is more compatible with a sequence of utterances starting with a labial consonant (like /pata/) than with a sequence of utterances starting with a coronal (like /tapa/). Regardless of the order of the consonants in a given target utterance, at fast speech rates speakers tended to phase the LL and the TT gestures with the first closely preceding the second. Therefore, when repeating an utterance like /tapa/ with an increasing speech rate, speakers tended to reorganize to a LL:TT coordination more compatible with the production of /pata/. This observation is particularly interesting because in the lexical phonotactics of many languages, CVCV sequences containing a labial and a coronal consonant start more frequently with the labial than with the coronal (MacNeilage & Davis, 2000). Identifying factors which may lead to gestural reorganization under production rate pressure could provide an explanation for this observed cross-linguistic preference. The merging of consecutive jaw cycles has also been observed in the speeded repetition of CVCV disyllables
by German speakers (Lancia & Fuchs, 2011). In that study it was shown that when one jaw cycle per utterance was observed, the raising of the jaw tended to be synchronized with the production of the coronal constriction. The aim of the following analysis (following Lancia & Fuchs, 2011) is to observe the influence of mechanical constraints on reorganization of this type. We tested whether, as speech rate increased, the utterances produced with the jaw raised during the coronal constriction but not during the labial constriction were less variable (more robust) than utterances produced by raising the jaw during the labial constriction and not during the coronal constriction. This assumption was based on the observation that during the production of a coronal constriction the jaw is more elevated and its position less variable than during the production of a labial constriction (Keating, Lindblom, Lubker, & Kreiman, 1994; Koenig, Lucero & Löfqvist, 2003; Mooshammer et al, 2007). If at fast speech rates the jaw cycle synchronous with the tongue tip is needed to support the formation of the coronal constriction, we can then expect that articulatory patterns in which this jaw cycle is reduced are more likely to change than articulatory patterns in which the jaw cycle synchronous with the lower lip is reduced. In the Lancia and Fuchs (2011) study subjects were asked to produce reiterant speech first at an increasing and then at a decreasing speaking rate over a time interval of 16 seconds tracking a visual metronome. The metronome consisted of successively varying black and white images with a frequency between 3.3 and 20 Hz. We recorded the same speech material as in the study by Rochet-Capellan and Schwartz, extended to front and back tongue articulations, but will only present the results for /pata/, /tapa/, /fata/, /tafa/, /sapa/ and /pasa/. Four speakers of German with no reported history of speech, language or hearing impairment were recorded1. 
In order to determine which jaw cycle was more reduced, for each utterance we compared peak jaw height during the production of each constriction. If the jaw is raised only for the
production of the coronal constriction, we expect it to be higher than during the production of the labial constriction. We first computed the ratio of jaw elevation at the coronal closure relative to the labial closure (hereafter called JawRatio) for each utterance in order to determine which jaw cycle would be reduced. Second, in order to test the effect of general jaw elevation on the independence of lower lip motion, the mean elevation of the jaw was computed in each utterance. Third, we computed the %EDET index to compare the similarity of all articulatory motions of consecutive utterances2. Each utterance was represented by the vertical and horizontal displacement of the jaw, the tongue tip (TT) and the lower lip (LL). No embedding was performed, but each dimension was normalized so as to have mean 0 and standard deviation equal to 1, and each utterance was resampled to a standardized length of 250 samples. The distance matrix between the time series representing two consecutive utterances was obtained using the maximum norm, and the fixed recurrence rate criterion was adopted to derive the CRP. The recurrence rate was set to 0.05. In order to focus on rapid speech rate behavior, only utterances shorter than 0.3 sec were selected for the analysis (N=596). This threshold was adopted because, as observed by Rochet-Capellan and Schwartz (2007), utterances shorter than 0.3 sec differ qualitatively from longer utterances in many respects. Using these parameters, we ran a mixed-model linear regression analysis in R (www.r-project.org/) using lmer with %EDET as the dependent variable and JawRatio, mean elevation of the jaw, segmental content (/fata/, /pata/, /pasa/, with /pata/ as reference level) and order of the consonants (labial-coronal vs. coronal-labial) as fixed effect predictors, and subjects as random effects. We included all the possible interactions among the predictors and discarded those which did not reach significance in a stepwise fashion 3.
We also included a random intercept for each speaker as well as a speaker specific random slope for each predictor. The following significant effects were found:
1) %EDET was inversely correlated with JawRatio (slope estimate= -0.99, t= -4.05, pMCMC= 0.04);
2) /pasa/ shows smaller values for %EDET with respect to /pata/ (estimate= -2.15, t= -2.96, pMCMC= 0.04);
3) the interaction between the utterance type and the mean elevation of the jaw revealed that in /pata/ the average elevation of the jaw was more positively correlated with %EDET in CL utterances than in LC utterances (estimate= 2.06, t= 2.87, pMCMC= 0.005);
4) the three-way interaction between utterance type, mean elevation of the jaw and the internal order of the consonants revealed that this effect is significantly weaker in /pasa/ utterances (estimate= -1.91, t= -4.8, pMCMC= 0.0001) and that it is likely to be reversed in /fata/ utterances (estimate= -4.2738, t= -1.96, pMCMC= 0.05).

The remaining effects did not reach significance. We will focus on the first, since in this context we are less interested in a comparison of different utterances. Due to measurement conventions, the height of the jaw is always negative, because it is deduced from the vertical distance between the jaw sensor and the bite plane. Therefore high values of JawRatio indicate that the jaw is actually lower during the production of the coronal constriction than during the production of the labial constriction; low values for this variable indicate the inverse relation. JawRatio showed a negative correlation with %EDET, meaning that at fast speech rates the collective behavior of the articulators changes less from one utterance to the following one if in the first utterance the jaw is more elevated during the coronal constriction than during the labial constriction.
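Because jaw height is negative by convention, the direction of JawRatio is easy to misread; a small worked example may help. The Python below is a sketch with invented heights (in arbitrary units, not measured data), and the function name is hypothetical.

```python
def jaw_ratio(height_at_coronal, height_at_labial):
    """Ratio of peak jaw height at the coronal closure to that at the labial closure.

    Both heights are negative: they are vertical distances below the bite plane.
    """
    return height_at_coronal / height_at_labial


# Hypothetical values: the jaw is closer to the bite plane (higher) at the coronal.
higher_at_coronal = jaw_ratio(-8.0, -10.0)

# Hypothetical values: the jaw is further from the bite plane (lower) at the coronal.
lower_at_coronal = jaw_ratio(-12.0, -10.0)
```

The first case gives a ratio below 1 and the second a ratio above 1, which is why a negative slope for JawRatio corresponds to greater similarity when the jaw is more elevated during the coronal constriction.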

Problematic cases

The analysis strategy adopted in the preceding section, based on pairwise comparisons of utterances produced consecutively in the task, was also applied using DTW. However, we could not observe a significant relation between JawRatio and the similarity scores obtained from application of DTW. A close inspection of those pairwise comparisons for which the two methods gave diverging results revealed three causes of systematic failure of DTW
alignment. These were 1) differences in local peak amplitudes across the signals; 2) the presence of different articulatory events in the utterances; and 3) differences in the order of the gestures composing the utterances. These cases are exemplified in Figures 5, 6 and 7. By comparing the time series shown in panels a and b of each figure we obtained the CRP shown in panel c. The gray lines shown with the CRPs correspond to the time warping functions obtained using DTW. Horizontal or vertical portions of the warping function indicate that several points from one time series are lumped together and mapped onto one point of the other time series. In some of the examples, the locations of these errors are indicated on the time series on the left side of the figure. A portion of one time series whose points are mapped onto the same point of the other time series is displayed over a gray background; the point to which the whole portion is mapped is indicated by a vertical gray bar. The time series correspond to the vertical movement of TT, LL and the jaw as recorded during repetitions of the disyllable ‘pata’. Several errors were observed. The phonetic interpretation of the acoustic signal corresponding to each time series is displayed on the corresponding panel.

----------------- Insert Figure 5 around here -----------------

The example from Figure 5 shows the consequences of DTW’s sensitivity to local peak amplitude differences. The errors are produced by the higher elevation of the jaw in the first half of the time series shown in panel b.

----------------- Insert Figure 6 around here -----------------

The alignment error observed in Figure 6 is due to the presence of an additional event in the time series contained in panel b of that figure, where the gestures needed to produce the syllable /ta/ are repeated. Since the mapping produced by DTW is continuous, the same events must be contained in the same order in the time-series compared.
----------------- Insert Figure 7 around here -----------------

The errors observed in Figure 7 are due to the inversion of the relative order of the two syllables in the time-series shown in panel b. Although the examples illustrated in Figures 5 and 6 are correctly handled by our variant of cross recurrence analysis, the differences in the order underlying the gestures in the two utterances may affect the results. It is expected that changes in the relative order of the articulatory gestures will affect the shape of their trajectories because the starting point of each gesture changes from one order to the other. In such cases our method correctly reports the differences. However we cannot exclude a priori the possibility that the same gestures are produced similarly but in reversed order in the two utterances compared. In such cases the quantification of similarity/dissimilarity between the two utterances should be treated with a degree of skepticism. The %EDET measure, and its classical version %DET, are local measures of similarity between two time series: they refer to the similarity in the changes from one point in time to the next one. These measures are thus minimally affected by the change in the relative order between entire portions of trajectory. Depending on the purpose of our analysis, it may be useful to detect these cases and treat them separately from others. The following approach can be used to identify such cases, in which the gestures which result in similarities across the utterances are not produced in the same order. We can consider the distribution of the distances between the points belonging to continuous lines in the plot and the main diagonal of the plot, and characterize this distribution using one of the available tests for unimodality (e.g. Silverman, 1981; Hartigan & Hartigan, 1985). If this distribution is unimodal we can infer that the correctly repeated portions of utterances are being produced in the same order. 
However, if the distribution is strongly multimodal the ordering of the segments is likely changing from one utterance to the other.
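The quantity on which such a unimodality test would operate can be illustrated as follows. This Python sketch only computes the distances from the main diagonal; it does not implement the tests of Silverman (1981) or Hartigan & Hartigan (1985). The function name is hypothetical, the distances are kept signed so that the two sides of the diagonal can be told apart, and both series are assumed to have been resampled to the same length.

```python
import math


def signed_diagonal_distances(points):
    """Signed perpendicular distance of each plot location (i, j) from the line i = j."""
    return [(j - i) / math.sqrt(2) for i, j in points]


# Hypothetical line points when both utterances contain the gestures in the same
# order: the matches stay close to the main diagonal.
same_order = [(k, k + 1) for k in range(20)]

# Hypothetical line points when the two halves are swapped in the second utterance:
# each half matches a portion lying far from the diagonal, on opposite sides.
swapped = [(k, k + 10) for k in range(10)] + [(k + 10, k) for k in range(10)]

d_same = signed_diagonal_distances(same_order)
d_swapped = signed_diagonal_distances(swapped)
```

In the first case all distances fall in one narrow cluster, whereas in the second they split into two clusters of opposite sign, which is the kind of multimodality the tests cited above are meant to detect.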

Discussion The approach to cross recurrence analysis proposed in the current paper differs from other implementations of this technique in several aspects: due to the introduction of a cleaning step time delay embedding can be avoided, the %EDET index is robust against temporal variability, and artifacts introduced by non-stationarity and variable smoothness of the signals are reduced. The results of the analysis using synthetic stimuli show that the additional artifacts introduced by the lack of an embedding step are efficiently handled by our cleaning algorithm. Eliminating embedding considerably reduces the number of parameters to be controlled in the application of this technique, and is preferable to embedding with an inappropriate number of dimensions (delayed copies of the original trajectory) and/or inappropriate time delay, which results in artifacts (Marwan et al., 2007). This is particularly useful in the comparison of speech signals given the assumption of nonstationarity. For such signals appropriate values for the embedding parameters may be hard to define. A remaining parameter of importance controls the selectivity for mapping the (cross) recurrence points. If the fixed recurrence rate criterion is adopted in the computation of the CRP, this parameter corresponds to the percentage of locations in the CRP which are promoted to recurrences before any cleaning step is performed. By modulating the sensitivity of the analysis, this parameter gives flexibility to the method. Importantly, the proposed %EDET index is much less sensitive to the parameter governing the selectivity of the CRP than the classical %DET because artifacts resulting from a too lax a criterion are reduced in the cleaning steps. With both the synthetic and the natural signals analyzed in this paper we tested several values for this parameter ranging between 0.01 and 0.1 without obtaining qualitative changes in the results (Webber & Zbilut, 2009 suggest a value equal to 0.05). 
However, fine-tuning still improves the analysis. An automatic method to choose the optimal
value for this parameter has been proposed by Schinkel, Dimigen and Marwan (2008). According to these authors the criterion which selects the recurrence points should be tuned in order to maximize the separation between the signals as sorted according to the experimental factors. In this work we have shown some preliminary applications of a variant of Cross-Recurrence Analysis to a study of the Labial-Coronal Effect in a speeded repetition task. The acceleration paradigm is designed to induce a reorganization of articulatory coordination so that even when subjects produce a coronal consonant first and a labial second, they will frequently shift to the reverse patterns during fast speech. In contrast, when subjects start with a labial consonant first and a coronal second, they will not completely reorganize their articulatory motion. In our analysis we wanted to investigate the articulatory coordination between the jaw, tongue tip and the lower lip. We hypothesized that during fast speech the jaw is only raised once per utterance, and that jaw raising occurs synchronously with tongue tip closure, but not with the lower lip. Thus, the coupling between the jaw and the tongue tip is stronger for the production of an alveolar stop than the coupling between the jaw and the lower lip during the production of a bilabial stop. The method based on CRA we applied here is one of few that can investigate the variability of coordinated motions over time, and is thus a promising tool for research applications within speech production and speech pathology. In speech production for instance this tool can be used to investigate variability in coordinated actions over extended sequences, which has the potential to further development of theoretical concepts that consider all articulators and their respective contributions to the produced utterance (e.g. Articulatory Phonology or exemplar models of speech production). 
Furthermore, results could be used to contrast approaches that focus only on the variability of
the main articulator involved in the production of a certain phoneme (e.g. the presence vs. absence of a given articulatory feature in feature geometry). The CRA method discussed here is also a potentially valuable tool for speech pathology. A particular characteristic of many speech pathologies is that articulatory motions deviate from normal patterns and are generally more variable. Identifying the factors associated with articulatory variability in a given pathology would provide crucial information concerning the nature of the pathology itself. For example, numerous examples in studies of disfluencies show that perceptually fluent speech can be produced by stutterers, but that it differs from fluent speech produced by normal speakers (Zimmermann, 1980; Caruso, Abbs & Gracco, 1988; Smith & Kleinow, 2000). In some studies it is suggested that stuttering is an impairment of general motor control functions (e.g. Max, Caruso & Gracco, 2008; Van Lieshout, Bose, Square, & Steele, 2007; but see McClean, Tasko & Runyan, 2004 for an alternative hypothesis.). Tests of this hypothesis generally include a comparison between the variability observed across utterances produced by normal speakers and the variability observed across fluent utterances produced by stutterers, and the %EDET measure introduced above could provide a useful metric for performing such comparisons. Our measure of variability can more generally be used to quantify data collected through diadochokinetic speech tasks which are often used as a diagnostic tool (e.g. 
Kent, Kent & Rosenbek, 1987) for children with normal speech skills (Yaruss & Logan, 2002) but also for children with persistent speech disorders (e.g., Wren, Roulstone & Miller, 2012; Preston & Edwards, 2009), for patients with Parkinson’s disease (e.g., Wong, Murdoch & Whelan, 2012; Karlsson et al., 2011), stuttering (e.g., Loucks & de Nil, 2006), aphasia (Bose & van Lieshout, 2012) or persons with amyotrophic lateral sclerosis (Mefferd, Green & Pattee, 2012). The analysis of data collected in such tasks is often based on some measure of articulatory variability. The application of CRA to this case would thus be straightforward (and basically
identical to the application to the repeated speech task described in this paper), allowing discussion of repetitive speech behavior and its evolution in time within a speaker, identification of portions of utterances where the articulatory behavior is more variable, quantification of changes in the amount of variability over the duration of a sequence of repeated utterances, separation of temporal and spatial variability, and support for models of the underlying control mechanisms (Xu, 2010). Conclusion In this paper we have illustrated the basic features of cross recurrence analysis; we adapted these concepts to the comparison of multivariate time-series, and have introduced a new index of similarity/dissimilarity called %EDET. This index is computed after removing spurious recurrences from the cross-recurrence plots, and supports multidimensional comparison of motion patterns observed across multiple articulators. Through the application of this method to the pairwise comparisons of articulatory movements from a given sample we can estimate the amount of articulatory variability in the sample. The method is sensitive to differences in amplitude variability across different events while being relatively insensitive to variability in their relative timing. Importantly this approach does not rely on time normalization of the motion patterns. Other methods used to compare movement trajectories require an aligning step, which is prone to errors when comparing multivariate trajectories or trajectories which differ qualitatively (i.e., which do not present the same events in the same order). Our implementation of cross recurrence analysis is thus a useful alternative given high levels of temporal and amplitude variability, and it is therefore suitable for applications where other methods would be unreliable, as for example in the analysis of speech errors, diadochokinetic tasks or pathological speech.

Acknowledgements

This work was partially sponsored by a grant from the BMBF (01UG0711) and by a grant from the German-French University in Saarbrücken awarded to the PILIOS project.

References

Ballard, K.J., Robin, D.A., Woodworth, G. & Zimba, L.D. (2001). Age-related changes in motor control during articulator visuomotor tracking. Journal of Speech, Language, and Hearing Research, 44(4), 763–777.
Barbosa, A.V., Déchaine, R.-M., Vatikiotis-Bateson, E. & Yehia, H.C. (2012). Quantifying time-varying coordination of multimodal speech signals using correlation map analysis. Journal of the Acoustical Society of America, 131(3), 2162–2172.
Benoit, C. (1986). Note on correlation analysis in speech timing. Journal of the Acoustical Society of America, 80(6), 1846–1848.
Bose, A. & van Lieshout, P. (2012). Speech-like and non-speech lip kinematics and coordination in aphasia. International Journal of Language and Communication Disorders, 47(6), 654–672.
Caruso, A.J., Abbs, J.H. & Gracco, V.L. (1988). Kinematic analysis of multiple movement coordination during speech in stutterers. Brain, 111, 439–456.
Conte, E., Vena, A., Federici, A., Giuliani, R. & Zbilut, J. (2004). A brief note on possible detection of physiological singularities in respiratory dynamics by recurrence quantification analysis of lung sounds. Chaos, Solitons & Fractals, 21(4), 869–877.
Eckmann, J., Kamphorst, S. & Ruelle, D. (1987). Recurrence plots of dynamical systems. EPL Europhysics Letters, 4, 973–977.
Fuchs, S., Perrier, P. & Hartinger, M. (2011). A critical evaluation of gestural stiffness estimations in speech production based on a linear second-order model. Journal of Speech, Language, and Hearing Research, 54, 1067–1076.
Garcia, S.P. & Almeida, J.S. (2006). Multivariate phase space reconstruction by nearest neighbor embedding with different time delays. Physical Review E, 72(2), 027205. doi:10.1103/PhysRevE.72.027205.
Hartigan, J.A. & Hartigan, P.M. (1985). The dip test of unimodality. The Annals of Statistics, 13(1), 70–84.
Huang, B.H. & Jun, S.A. (2011). The effect of age on the acquisition of second language prosody. Language and Speech, 54(3), 387–414.
Iwanski, J. & Bradley, E. (1998). Recurrence plots of experimental data: To embed or not to embed. Chaos, 8(4), 861–871.
Karlsson, F., Unger, E., Wahlgren, S., Blomstedt, P., Linder, J., Nordh, E., Zafar, H. & van Doorn, J. (2011). Deep brain stimulation of caudal zona incerta and subthalamic nucleus in patients with Parkinson's disease: Effects on diadochokinetic rate. Parkinson's Disease, 2011:605607. Epub.
Keating, P.A., Lindblom, B., Lubker, J. & Kreiman, J. (1994). Variability in jaw height for segments in English and Swedish VCVs. Journal of Phonetics, 22(4), 407–422.
Kelso, J.A.S., Vatikiotis-Bateson, E., Saltzman, E.L. & Kay, B. (1986). A qualitative dynamic analysis of reiterant speech production: Phase portraits, kinematics, and dynamic modeling. Journal of the Acoustical Society of America, 77(1), 266–280.
Kent, R.D., Kent, J.F. & Rosenbek, J.C. (1987). Maximum performance tests of speech production. Journal of Speech and Hearing Disorders, 52(4), 367–387.
Keogh, E. & Pazzani, M. (2001). Derivative dynamic time warping. Proceedings of the First SIAM International Conference on Data Mining, Chicago, USA.
Koenig, L., Lucero, J. & Löfqvist, A. (2003). Studying articulatory variability using functional data analysis. Proceedings of the 15th International Congress of Phonetic Sciences, 269–272.
Krakow, R.A. (1999). Physiological organization of syllables: a review. Journal of Phonetics, 27(1), 23–54.
Lancia, L. & Fuchs, S. (2011). The labial coronal effect revisited. Proceedings of the 11th International Seminar on Speech Production, Montreal, Canada, 187–194.
Lancia, L. & Tiede, M. (2012). A survey of methods for the analysis of the temporal evolution of speech articulator trajectories. In S. Fuchs & P. Perrier (eds.), Speech Planning and Dynamics, 233–271. Frankfurt am Main, Germany: Peter Lang.
Lindblom, B., Brownlee, S., Davis, B. & Moon, S.J. (1992). Speech transforms. Speech Communication, 11(4), 357–368.
Löfqvist, A. & Yoshioka, H. (1984). Intrasegmental timing: Laryngeal-oral coordination in voiceless consonant production. Speech Communication, 3(4), 279–289.
Loucks, T.M. & De Nil, L.F. (2006). Oral kinesthetic deficit in adults who stutter: A target-accuracy study. Journal of Motor Behavior, 38(3), 238–246.
Lucero, J. (2005). Comparison of measures of variability of speech movement trajectories using synthetic records. Journal of Speech, Language, and Hearing Research, 48(2), 336–344.
Lucero, J. & Koenig, L. (2000). Time normalization of voice signals using functional data analysis. Journal of the Acoustical Society of America, 108, 1408–1420.
Lucero, J., Munhall, K., Gracco, V. & Ramsay, J. (1997). On the registration of time and the patterning of speech movements. Journal of Speech, Language, and Hearing Research, 40(5), 1111–1117.
Lucero, J.C. & Löfqvist, A. (2003). Functional data analysis of articulatory variability in VCV sequences. Proceedings of the 6th International Seminar on Speech Production, Sydney, Australia, 156–160.
MacNeilage, P.F. & Davis, B.L. (2000). On the origin of internal structure of word forms. Science, 288(5465), 527–531.
Maizel, J. & Lenk, R. (1981). Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proceedings of the National Academy of Sciences, USA, 78(12), 7665–7669.
March, T., Chapman, S. & Dendy, R. (2005). Recurrence plot statistics and the effect of embedding. Physica D: Nonlinear Phenomena, 200(1–2), 171–184.
Marwan, N. (2010). How to avoid potential pitfalls in recurrence plot based data analysis. International Journal of Bifurcation and Chaos, 21(4), 1003–1017.
Marwan, N., Romano, M., Thiel, M. & Kurths, J. (2007). Recurrence plots for the analysis of complex systems. Physics Reports, 438(5–6), 237–329.
Marwan, N. & Kurths, J. (2002). Nonlinear analysis of bivariate data with cross recurrence plots. Physics Letters A, 302(5–6), 299–307.
Marwan, N. & Kurths, J. (2005). Line structures in recurrence plots. Physics Letters A, 336(4–5), 349–357.
Marwan, N., Thiel, M. & Nowaczyk, N. (2002). Cross recurrence plot based synchronization of time series. Nonlinear Processes in Geophysics, 9(3–4), 325–331.
Max, L., Caruso, A.J. & Gracco, V.L. (2003). Kinematic analyses of speech, orofacial nonspeech, and finger movements in stuttering and nonstuttering adults. Journal of Speech, Language, and Hearing Research, 46(1), 215–232.
McClean, M.D., Tasko, S.M. & Runyan, C.M. (2004). Orofacial movements associated with fluent speech in persons who stutter. Journal of Speech, Language, and Hearing Research, 47(2), 294–303.
Mefferd, A.S., Green, J.R. & Pattee, G. (2012). A novel fixed-target task to determine articulatory speed constraints in persons with amyotrophic lateral sclerosis. Journal of Communication Disorders, 45(1), 35–45.
Mooshammer, C., Hoole, P. & Geumann, A. (2007). Jaw and order. Language and Speech, 50(2), 145–176.
Perkell, J.S. & Klatt, D.H. (1986). Invariance and variability in speech processes. In Symposium on Invariance and Variability of Speech Processes, Cambridge, MA, US. Lawrence Erlbaum Associates, Inc.
Preston, J.L. & Edwards, M.L. (2009). Speed and accuracy of rapid speech output by adolescents with residual speech sound errors including rhotics. Clinical Linguistics & Phonetics, 23(4), 301–318.
Ramsay, J. & Silverman, B. (1997). Functional Data Analysis. New York: Springer.
Rosenfeld, A. & Pfaltz, J.L. (1966). Sequential operations in digital picture processing. Journal of the Association for Computing Machinery, 13(4), 471–494.
Sakoe, H. & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1), 43–49.
Schinkel, S., Dimigen, O. & Marwan, N. (2008). Selection of recurrence threshold for signal detection. The European Physical Journal Special Topics, 164(1), 45–53.
Schinkel, S., Marwan, N. & Kurths, J. (2007). Order patterns recurrence plots in the analysis of ERP data. Cognitive Neurodynamics, 1(4), 317–325.
Shockley, K., Santana, M.V. & Fowler, C.A. (2003). Mutual interpersonal postural constraints are involved in cooperative conversation. Journal of Experimental Psychology: Human Perception and Performance, 29(2), 326–332.
Schöner, G., Martin, V., Reimann, H. & Scholz, J.P. (2008). Motor equivalence and the uncontrolled manifold. Proceedings of the 10th International Seminar on Speech Production, Strasbourg, France, 23–28.
Silverman, B.W. (1981). Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society, Series B, 43, 97–99.
Small, M. (2005). Applied nonlinear time series analysis: applications in physics, physiology and finance. World Scientific Series on Nonlinear Science, Series A: Monographs and Treatises, 52. Hackensack, NJ: World Scientific Publishing Co. Pte. Ltd.
Smith, A. & Goffman, L. (2004). Interaction of language and motor factors in speech production. In B. Maassen, R.D. Kent, H. Peters, P. van Lieshout & W. Hulstijn (eds.), Speech Motor Control in Normal and Disordered Speech, 225–252. Oxford, England: Oxford University Press.
Smith, A., Goffman, L., Zelaznik, H., Ying, G. & McGillem, C. (1995). Spatiotemporal stability and patterning of speech movement sequences. Experimental Brain Research, 104(3), 493–501.
Smith, A. & Kleinow, J. (2000). Kinematic correlates of speaking rate changes in stuttering and normally fluent adults. Journal of Speech, Language, and Hearing Research, 43(2), 521–536.
Takens, F. (1981). Detecting strange attractors in turbulence. In D.A. Rand & L.S. Young (eds.), Dynamical Systems and Turbulence, Lecture Notes in Mathematics, vol. 898, 366–381. Berlin, Germany: Springer-Verlag.
Thiel, M., Romano, M., Read, P. & Kurths, J. (2004). Estimation of dynamical invariants without embedding by recurrence plots. Chaos, 14(2), 234–243.
Van Lieshout, P. & Moussa, W. (2000). The assessment of speech motor behaviors using electromagnetic articulography. The Phonetician, 81(I), 9–22.
Van Lieshout, P.H., Bose, A., Square, P.A. & Steele, C.M. (2007). Speech motor control in fluent and dysfluent speech production of an individual with apraxia of speech and Broca's aphasia. Clinical Linguistics & Phonetics, 21(3), 159–188.
Van Lieshout, P. & Namasivayam, A.K. (2010). Speech motor variability in people who stutter. In B. Maassen & P. van Lieshout (eds.), Speech Motor Control: New Developments in Basic and Applied Research, chapter 11, 190–214. Oxford, England: Oxford University Press.
Vlachos, I. & Kugiumtzis, D. (2008). State space reconstruction for multivariate time series prediction. Nonlinear Phenomena in Complex Systems, 11(2), 241–249.
Wang, K. & Gasser, T. (1997). Alignment of curves by dynamic time warping. The Annals of Statistics, 25(3), 1251–1276.
Webber Jr., C. & Zbilut, J. (1994). Dynamical assessment of physiological systems and states using recurrence plot strategies. Journal of Applied Physiology, 76(2), 965–973.
Wong, M.N., Murdoch, B.E. & Whelan, B.M. (2012). Lingual kinematics during rapid syllable repetition in Parkinson's disease. International Journal of Language & Communication Disorders, 47(5), 578–588.
Wren, Y.E., Roulstone, S.E. & Miller, L.L. (2012). Distinguishing groups of children with persistent speech disorder: findings from a prospective population study. Logopedics Phoniatrics & Vocology, 37(1), 1–10.
Xu, Y. (2010). In defense of lab speech. Journal of Phonetics, 38, 329–336.
Yaruss, J.S. & Logan, K.J. (2002). Evaluating rate, accuracy, and fluency of young children's diadochokinetic productions: a preliminary investigation. Journal of Fluency Disorders, 27(1), 65–85.
Young, G.S., Rogers, S.J., Hutman, T., Rozga, A., Sigman, M. & Ozonoff, S. (2011). Imitation from 12 to 24 months in autism and typical development: A longitudinal Rasch analysis. Developmental Psychology, 47(6), 1565–1578.
Zbilut, J.P., Giuliani, A. & Webber, C.L. (1998). Detecting deterministic signals in exceptionally noisy environments using cross-recurrence quantification. Physics Letters A, 246(1), 122–128.
Zbilut, J., Hu, Z., Giuliani, A. & Webber Jr., C. (2000). Singularities of the heart beat as demonstrated by recurrence quantification analysis. Engineering in Medicine and Biology Society, 2000. Proceedings of the 22nd Annual International Conference of the IEEE, (4), 2406–2409.
Zbilut, J., Thomasson, N. & Webber, C. (2002). Recurrence quantification analysis as a tool for nonlinear exploration of nonstationary cardiac signals. Medical Engineering & Physics, 24(1), 53–60.
Ziegler, W., Staiger, A. & Aichert, I. (2010). Apraxia of speech: what the deconstruction of phonetic plans tells us about the construction of articulate language. In B. Maassen & P. van Lieshout (eds.), Speech Motor Control: New Developments in Basic and Applied Research, 1, 3–22. Oxford, England: Oxford University Press.
Zimmermann, G. (1980). Stuttering: A disorder of movement. Journal of Speech, Language, and Hearing Research, 23(1), 122–136.

Footnotes
1 Articulatory movements were recorded by means of electromagnetic articulography (EMA; Carstens AG100). Three sensor coils were glued on the speakers' tongue: the most anterior one about 1 cm from the tip, the most posterior one about 5 cm from the tip, and the third one equidistant between them. Sensors for tracking lower lip and jaw movements were glued on the vermilion border of the lower lip and just below the lower incisors. Two reference sensors, one attached at the bridge of the nose and the other just above the upper incisors, were used to compensate for head movement in the helmet. A bite plane served as a reference to determine the vertical and horizontal coordinates of each individual vocal tract. Movements were recorded at a sampling rate of 200 Hz, with audio recorded concurrently at 22050 Hz. Articulatory movements were low-pass filtered at 18 Hz.

2 The choice of conducting comparisons only between consecutive utterances is specific to this experimental design. In this way we can measure the influence of the relative elevation of the jaw during the production of each utterance on the precision of its repetition. In other experimental designs different comparison strategies may be adopted.

3 A non-significant interaction was retained if a likelihood ratio test showed that the model with the interaction fitted the data significantly better than the model without the interaction.

Figure captions

Figure 1. Panel (a): sine of θ over the range [0, 6π]. Panel (b): continuous line: sine of θ over the range [0, 6π]; dashed line: sine of θ over the range [π/2, 6π + π/2]. Panel (c): RP of the time series shown in panel (b). Empty and filled circles on the time series indicate a recurring state vector occurring three times. Similar symbols indicate the corresponding points in the RP. Panel (d): CRP obtained from the comparison of the time series plotted against the horizontal and vertical axes of the plot. The time series plotted against the vertical axis is obtained by adding perturbations to both the amplitude and the time scale of the time series plotted against the horizontal axis. Panels (e) and (f): magnification of two regions of the CRP in panel (d).

Figure 2. Cross recurrence plot of JAW and TT movement signals. Panel (a): jaw movement. Panel (b): TT movement. Panels (c) and (d): mutual information over lag for the time series in (a) and (b). The lag corresponding to the first minimum is indicated by the vertical dashed line. Panels (e) and (f): nearest neighbors over number of dimensions for the time series in (a) and (b), obtained with τ = 16. Panels (g) and (h): reconstructed time series. The different dimensions are shifted along the y axis to avoid overlap. Panel (i): distance matrix obtained from all possible pairwise comparisons between the frames of the time series in (g) and the frames of the time series in (h), using Euclidean distance. Panel (l): cross recurrence plot obtained from the matrix in (i) using a fixed threshold of 0.1.

Figure 3. Panels (a) and (b): JAW, LL and TT movement signals in the vertical direction during two productions of the utterance /pasa/ (one production per panel). Panels (c) and (d): examples of CRPs computed with the standard procedure (panel c) and at the end of the processing steps (panel d). The CRPs were obtained by comparing the time series displayed in panel (a) with the time series in panel (b).

Figure 4. Examples from the 12 sets of synthetic trajectories for jaw vertical movement (10 trajectories per set shown) and results of the analysis of variability. Panels (a–f): trajectories obtained by setting the coefficient for the temporal variance to 0.035. Panels (g–n): trajectories obtained by setting the coefficients for the amplitude variances to 0.035. The 6 values used for the varying coefficient in each row are 0.02, 0.1, 0.2, 0.3, 0.4, 0.5. Panel (o): results of the comparisons conducted with our adaptation of cross recurrence analysis. Panels (p–r): results obtained with DTW, FDA and linear normalization. The circles indicate the median values from the comparisons among members of the 12 groups illustrated in panels (a–n). The bars indicate the standard deviations and are centered over the mean values. The x axis shows the value of the variability coefficient that varies across the groups. The black circles are obtained from groups characterized by different amplitude variability coefficients. The gray circles are obtained from groups characterized by different temporal variability coefficients.

Figure 5. Example of problematic comparison. Panels (a) and (b): vertical displacement of JAW, LL and TT during two productions of /pata/. Panel (c): CRP obtained from the time series in panels (a) and (b). The gray bold line in panel (c) indicates the warping function obtained using DTW. See text for details.

Figure 6. Example of problematic comparison. Panels (a) and (b): vertical displacement of JAW, LL and TT during two productions of /pata/. Panel (c): CRP obtained from the time series in panels (a) and (b). The gray bold line in panel (c) indicates the warping function obtained using DTW. See text for details.

Figure 7. Example of problematic comparison. Panels (a) and (b): vertical displacement of JAW, LL and TT during two productions of /pata/. Panel (c): CRP obtained from the time series in panels (a) and (b). The gray bold line in panel (c) indicates the warping function obtained using DTW. See text for details.
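
For readers unfamiliar with the warping function mentioned in the captions of Figures 5 to 7, the sketch below shows one basic way such a path can be obtained with dynamic time warping. It is an illustrative assumption rather than the procedure used for the figures: the function name `dtw_path`, the Euclidean local cost, the unconstrained symmetric step pattern and the toy sequences are all hypothetical, and practical DTW implementations often add slope or window constraints.

```python
import math


def dtw_path(x, y):
    """Basic dynamic time warping between two sequences of frames.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs running from (0, 0) to (len(x) - 1, len(y) - 1).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j]: minimal accumulated cost of aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1],
                                acc[i - 1][j],
                                acc[i][j - 1])
    # Backtrack from the last cell to recover the warping function.
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        steps = [(acc[i - 1][j - 1], i - 1, j - 1),
                 (acc[i - 1][j], i - 1, j),
                 (acc[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path


# Toy one-channel sequences (assumed data): y repeats the first frame of x.
cost, path = dtw_path([(0,), (1,), (2,)], [(0,), (0,), (1,), (2,)])
```

Plotting the `(i, j)` pairs of `path` over a cross recurrence plot of the same two sequences gives a line comparable in kind to the gray bold line described in the captions.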
