Running Gait Paper

Vietoris-Rips Persistence on Running
Gait-Based Features
Evan Carter, Alexander Rubin, Nikhil Pulimood
Duke University
August 11, 2019

Abstract
This project explores the use of Topological Data Analysis (TDA) in running gait analysis. The major
motivation of this project was to identify biological features that affect a person’s running gait, similar to
how certain features have been shown to affect a person’s walking gait. In the process of cleaning our data,
we developed an algorithm to produce the median length of a runner’s stride. After calculating the median
stride length, we performed a persistent homology analysis using a Vietoris-Rips (VR) Complex on the
resulting normalized stride path. Our final results suggested that height and speed are correlated with the
persistent homology of specific sensors on a runner’s body.
I. Introduction
Running gait analysis has received a lot of contemporary interest in the field of gait pattern
recognition. Since walking gait pattern has proven to be a unique human identifier [1, 2], it was
natural for us to try to understand whether this attribute exists in running gait data. Running gait
is inherently a cyclic motion, and thus time series persistent homology made sense as a form of
analysis. An interesting point raised by Lamar-Leon et al. [3, 2, 4] is that with persistent
homology, connected components, holes, and cycles are useful indicators, even as the shape of the
data gets overwhelmed with a large number of data points. Since running gait data contains a large
number of data points, we chose to use persistent homology as our primary tool in analyzing running
gait.

II. Dataset
i. Set Up
Our data was collected and compiled into an open dataset by [5]. The researchers who collected the
data placed various sensors across each runner’s body, as seen in figure 1 below. The runners were
then each placed on a treadmill, warmed up, and asked to run at a slow pace (2.5 m/s), a medium pace
(3.5 m/s), and a fast pace (4.5 m/s). We call these different speeds 25, 35, and 45 respectively.
The room with the treadmill was surrounded by motion capture cameras that collected the sensor data.

Figure 1: The Thigh Top Medial sensor is circled green and the Heel Bottom sensor is circled red.

ii. Sensors Used
Originally, we planned on looking at all the sensors of the lower leg, since many of the papers that
we read on gait pattern used this part of the leg in their analyses. However, we found that many of
our chosen sensors were difficult to process in MATLAB due to data corruption. This is why we
decided to only use the thigh top medial and heel bottom sensors on a runner’s leg. These sensors
can be seen in figure 1.

Persistence on Running Gait • May 2018
III. Methods
Below are the steps that we used to normalize the dataset and perform persistence using a
VR-complex:
• Returned the point cloud of a person for a given foot, speed, and feature.
• Ran a sliding window on the point cloud.
• Returned the critical points of the x-coordinates from the sliding window averages.
• Removed "fake" critical points (see the "Fake" Critical Points subsection for more information).
• Returned the median stride path of the sliding window averages.
• Returned the max filtration value by calculating the max distance between consecutive data
  points and multiplying by some factor.

Point Cloud
After trial-and-error, we developed the above methods to normalize the dataset in order to run TDA.
First, we calculated a point cloud by initializing a struct data structure in MATLAB that reads a
file for a given person and speed. While iterating through the rows of the struct, we retrieved the
x and y-coordinates of the given feature from each row, and stored these points in an array. We then
exported and visualized the array. In order to better visualize the stride paths of the runner, we
shrunk the point cloud by a given factor.

Figure 2: 450 data points of a runner’s point cloud with left foot, 2.5 m/s, and feature thigh top
medial.

Sliding Window
In order to clean up the stride paths, we implemented a sliding window algorithm that is very
similar to a forward moving average with a specified step size. We first initialized an array to
hold the averages and subsequently iterated through the point cloud with a counter i. During each
iteration, we calculated the x, y, z averages of row i, row i + 1, row i + 2, up to row
i + stepsize. The sliding window algorithm can be found on our GitLab at
https://gitlab.oit.duke.edu/acr43/running-gait/blob/master/get_sliding_window.m.

Figure 3: 450 data points of a runner’s point cloud with left foot, 2.5 m/s, and feature thigh top
medial after performing the sliding window with step size 20.
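The sliding window step can be illustrated with a short sketch. The Python below is not the MATLAB implementation linked above; it is a minimal version that assumes the point cloud is a list of (x, y, z) tuples, that each window covers rows i through i + step_size inclusive as described in the text, and that incomplete windows at the end of the data are dropped. The function name and the end-of-data handling are assumptions made for illustration.

```python
from statistics import fmean


def sliding_window_average(points, step_size):
    """Forward moving average: output row i averages input rows i .. i + step_size.

    `points` is a list of equal-length coordinate tuples, e.g. (x, y, z).
    Windows that would run past the end of the data are skipped (an assumption).
    """
    averaged = []
    for i in range(len(points) - step_size):
        window = points[i : i + step_size + 1]
        # zip(*window) groups the x values, the y values, and the z values,
        # so each coordinate is averaged separately.
        averaged.append(tuple(fmean(axis) for axis in zip(*window)))
    return averaged
```

Under this reading, a step size of 20 means each output row averages 21 consecutive rows of the point cloud.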
Critical Points
In the figure above, many of the stride paths overlap. Consequently, when running a VR-complex on
the sliding window averages, the complex would detect many unwanted cycles. Therefore, we came up
with an approach to take the average of all the strides of a runner at a particular foot, speed, and
feature. In order to do this, we needed to calculate the average time of a stride. We assumed that
the x-coordinates of the runner would be cyclical, since the runner would always begin a new stride
after the previous one had concluded. We considered the time distances between the peaks of each
stride, and we assumed that each peak would correspond with a local maximum of the x-coordinates.
Therefore, we calculated the time distances between consecutive local maxima and returned the
average of these distances.

Median Stride Path
In order to calculate the average stride of a runner, we initialized an array with a size equal to
the average time distance. This array would hold the points of the average stride. We initialized a
counter i that starts at 1 and ends at the average time distance, and we iterated through our newly
initialized array with this counter. We assigned each row of our array the average of the point
cloud’s row i, row i + average time distance, row i + 2 ∗ average time distance, and so on. The code
can be found at
running-gait/blob/master/get_median_stride_path.m.

"Fake" Critical Points
This approach did not prove to be effective, as many of the stride paths were not complete. Instead
of taking the average time distance, we decided to take the median time distance. This seemed to
work for many of the runners at high speeds, but at low speeds, we were not getting uniform gaps
between points. Uniform gaps are very important, since when running a VR-complex on points with
uniform gaps, these points in the stride become connected at nearly the same time. We began to
analyze the local x-maxima, and we realized that at low speeds, there were several x-values that
should not have been considered local maxima. These incorrect values were very close to the real
local maxima, and thus the calculated time distances between the real maxima and the "fake" maxima
were minuscule relative to other time distances. This would greatly affect our calculated average
time distance, and would sometimes affect the median time distance if there were many "fake" local
maxima. These "fake" maxima existed primarily at low speeds. We believe that runners are not going
to have uniform strides at low speeds. We developed an algorithm to remove these points, and the
code can be found at
running-gait/blob/master/get_peaks.m.

Figure 4: "Fake" local maxima from x-coordinates of a runner with left foot, 2.5 m/s, and feature
thigh top medial.

After removing the "fake" local maxima, there were still some strides at low speeds that were not
getting completed. We decided to run our median stride path algorithm on the averages from our
sliding window instead of on the point cloud. This proved to be highly effective, and the strides
for each runner in the dataset were completed at any speed. We stuck to the median stride instead of
the average stride, since we were unsure if there were still some "fake" critical points that we
were not detecting with our algorithm.

Figure 5: Median stride of a runner at 2.5 m/s and feature thigh top medial, using a step size of 20
for the sliding window averages.

Max Filtration Value
After finally calculating the median stride path, we calculated the max filtration value in order to
run persistence. The algorithm which we used to calculate the max filtration value can be found at
https://gitlab.oit.duke.edu/acr43/running-gait/blob/master/get_distance.m.
With the median stride path and max filtration value, we ran a Vietoris-Rips complex on a Euclidean
metric space of the median stride path. We used the Javaplex library in MATLAB to perform the
persistence.

IV. Results
We computed the barcodes of different runners from the dataset, and we specifically chose three of
these runners for our discussion here. We chose a runner who was tall, a runner who had an average
height, and a runner who was short. We analyzed the barcode diagrams of their thigh top medial and
heel bottom at a slow speed and at a fast speed, 2.5 m/s and 4.5 m/s respectively. The shortest
runner (Person 13) has a height of 165.2 cm, roughly 5’5". The average height runner (Person 22) has
a height of 175 cm, roughly 5’9". The tallest runner (Person 14) has a height of 187.2 cm, roughly
6’1.6".

There are two major takeaways from our TDA analysis. Firstly, the barcodes of the heel bottom could
give a good indication of speed. At 2.5 m/s, the one-dimensional chain of the heel bottom for both
runners terminated between 255 and 260. At 4.5 m/s, however, the one-dimensional cycle for both
runners terminated between 410 and 420. Secondly, the barcodes of the thigh top medial could give a
good indication of height. As seen in the figures for the thigh top medial at a speed of 25, there
are two distinct one-dimensional cycles. For Person 14 (tallest), the longest living cycle
terminated at step 27, whereas for Person 13 (shortest) it terminated at step 16. For Person 22
(medium height), the longest living one-dimensional cycle terminated at step 23. As expected, Person
22’s termination step falls between those of the short and tall runners, but interestingly it is 4
steps from the tall runner and 7 steps from the short runner, even though the medium-height runner
is 12.2 cm shorter than the tall runner and 9.8 cm taller than the short runner. This suggests that
the relationship is not linear, because the medium-height termination step is about twice as far
from the short runner’s as from the tall runner’s.

The persistent homology barcode diagrams for feature heel bottom for the three runners can be found
at
https://gitlab.oit.duke.edu/acr43/running-gait/tree/master/persistent-homology-diagrams/heel-bottom.

The persistent homology barcode diagrams for feature thigh top medial for the three runners can be
found at
https://gitlab.oit.duke.edu/acr43/running-gait/tree/master/persistent-homology-diagrams/thigh-top-medial.
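As a rough illustration of the normalization steps described in the Methods section (critical points, "fake" critical point removal, median stride path, and max filtration value), the following Python sketch strings them together. It is not the MATLAB code from the repository. In particular, the rule used here to discard "fake" maxima (merging peaks that are closer than a hypothetical min_gap number of samples) is an assumed stand-in, since the text does not spell out the rule used in get_peaks.m. The function names, the min_gap parameter, and the choice to average rows spaced by the median time distance are likewise assumptions.

```python
from math import dist
from statistics import fmean, median


def local_maxima(xs):
    """Indices where x rises into the sample and does not rise after it."""
    return [i for i in range(1, len(xs) - 1) if xs[i - 1] < xs[i] >= xs[i + 1]]


def drop_fake_peaks(peaks, xs, min_gap):
    """Assumed rule: peaks closer than min_gap samples count as one peak.

    Within such a cluster, only the peak with the largest x value is kept.
    """
    kept = []
    for p in peaks:
        if kept and p - kept[-1] < min_gap:
            if xs[p] > xs[kept[-1]]:
                kept[-1] = p
        else:
            kept.append(p)
    return kept


def median_time_distance(peaks):
    """Median number of samples between consecutive peaks, rounded to an int."""
    return round(median(b - a for a, b in zip(peaks, peaks[1:])))


def median_stride_path(points, period):
    """Row i is the average of rows i, i + period, i + 2 * period, ..."""
    path = []
    for i in range(period):
        rows = points[i::period]
        path.append(tuple(fmean(axis) for axis in zip(*rows)))
    return path


def max_filtration_value(path, factor):
    """Largest distance between consecutive path points, scaled by factor."""
    return factor * max(dist(a, b) for a, b in zip(path, path[1:]))
```

Here `points` would be the sliding window averages and `xs` their x-coordinates. The resulting path and filtration value are the inputs that the text describes passing to the Vietoris-Rips computation in Javaplex.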
Figure 6: Persistent homology of thigh top medial for a runner with height 165.2 cm and speed 2.5
m/s.
Figure 7: Persistent homology of thigh top medial for a runner with height 187.2 cm and speed 2.5
m/s.
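The non-linearity noted in the Results can be checked with a short calculation. The sketch below uses only the heights and thigh top medial termination steps reported above for Persons 13, 22, and 14, and compares the change in termination step per centimeter of height over the two intervals. The variable names are ours, and three runners are too few to support more than an illustration.

```python
# (height in cm, termination step of the longest living 1-dimensional cycle)
# Values as reported in the Results for the thigh top medial at 2.5 m/s.
short = (165.2, 16)   # Person 13
medium = (175.0, 23)  # Person 22
tall = (187.2, 27)    # Person 14


def steps_per_cm(a, b):
    """Change in termination step per cm of height between two runners."""
    return (b[1] - a[1]) / (b[0] - a[0])


slope_short_to_medium = steps_per_cm(short, medium)  # 7 steps over 9.8 cm
slope_medium_to_tall = steps_per_cm(medium, tall)    # 4 steps over 12.2 cm

print(f"{slope_short_to_medium:.2f} steps/cm vs {slope_medium_to_tall:.2f} steps/cm")
```

If the relationship were linear, the two rates would be roughly equal. Here the first is about twice the second, which is consistent with the observation in the text.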
V. Discussion
i. Over Fitting
The nature of our dataset made it necessary to normalize our data to perform any sort of analysis.
As shown earlier, it was not possible to perform persistent homology on the original point cloud
because there was close to no recognizable shape. Our normalization method created a median stride
path, which could be compared between subjects. The result of this method can be seen in figure 8.
We found differences in the subjects’ stride paths and drew conclusions with this data.

Figure 8: Point cloud of thigh top medial, which was normalized using the sliding window algorithm
with step size 20 and then the median stride path algorithm.

Unfortunately, it is possible that our normalization techniques could cause us to lose some aspects
of a runner’s gait pattern. Perturbations that can be seen in our removal of "fake" local maxima may
be important features that we are smoothing. We developed a time series animation of a runner’s
strides along with his median stride path in order to visualize any stride details that we could be
losing with our normalization techniques. The video can be found at
https://gitlab.oit.duke.edu/acr43/running-gait/blob/master/Thigh_Heel_Path_Animation%20(Converted).mov.
While the sliding window was originally important to clear up the data for our stride path
algorithm, we later found that modifications to the algorithm make the sliding window unnecessary.
If we were to continue our research, we would remove the effect of the sliding window and hopefully
retain some lost information. Since a gait pattern is considered to be similar to a fingerprint,
these fine details may be essential components for our analysis. On the other hand, the main form of
our analysis uses TDA, and thus the general shape of the data is the most important aspect. While
the exact shape of the stride can change, the number of connected components and the overall
homology of the data stay consistent [2].
ii. Further Research
There are some areas of research which would have been interesting directions to explore, but for
which we did not have either the data or the resources. As data science tools improve, there will be
better methods for helping with injury treatment and biomechanical research [6].

Injury
The effects of injury on running gait would have been an interesting area to analyze using
persistent homology. In a prior study, researchers were able to discern differences in the kinematic
gait patterns of healthy and pathological runners [7]. These differences occurred in the frontal and
sagittal plane knee angles (P<0.001), independent of age, height, weight, and running speed. The
dataset we used does have some injury data, but in order to understand the effect of injury, we
would need data from the subjects before and after injury, which is not available. If we could find
more features (such as height, weight, etc.) that can be used to classify the persistent homology of
subjects’ gait patterns, then we could compare subjects with similar features and observe the
effects of injury between these subjects.

Biological Sex Classification
When we started this project, we began our research by reading Lamar-Leon et al. [3, 2], who used
gait pattern to classify a person’s biological sex. Like other papers on gait recognition that used
persistent homology, this paper used computer vision technology to gather data about the subject’s
gait. Interested by this classification technique, we planned to explore whether these findings in
walking gait pattern translated to running gait. Unfortunately, the dataset we used only provided
data for one female subject, making it difficult to do any sort of classification.

Running Form
Another area that would have been interesting to explore is an analysis of running form efficiency
and its relationship to persistent homology. A runner’s form is not readily available, since we do
not have the videos of our runners. However, we could potentially recreate the motion using an
animated point cloud with all the features. We started to visualize this in the animation link
above. Given our experience with this type of animation and the subjective nature of classifying
running form efficiency, analyzing this data could prove challenging.
The dataset does include the foot-strike pattern of each subject. Mid-foot and fore-foot strike
patterns are considered to be more efficient than heel-strikes, so it would be interesting to
explore whether the foot-strike pattern affects the persistent homology of a running gait. It is
also possible that other researchers have performed running form efficiency analysis on this
specific open dataset, and those results would be helpful if we decided to do this type of analysis.

Competitive vs. Elite
In a similar vein as running form, the competition level of a runner would be an interesting
variable to explore with persistent homology. The dataset classifies a subject’s competition level
using the designations "Competitive" and "Elite," but we are unable to determine how this
classification is made. The dataset does provide a runner’s preferred race distance and their pace
at this distance. It would be interesting for us to independently compare a subject’s ability to the
abilities of other competitive runners, and thus make our own competitive classification. We could
use this to compare our gait analysis.
References
[1] A. Phinyomark, G. Petri, E. Ibáñez-Marcelo, S. T. Osis, and R. Ferber, “Analysis of big data
in gait biomechanics: Current trends and future directions,” Journal of Medical and Biological
Engineering, Jul 2017.
[2] J. Lamar-Leon, R. A. Baryolo, E. Garcia-Reyes, and R. Gonzalez-Diaz, “Gait-based carried
object detection using persistent homology,” in Progress in Pattern Recognition, Image Analysis,
Computer Vision, and Applications (E. Bayro-Corrochano and E. Hancock, eds.), (Cham), pp. 836–
843, Springer International Publishing, 2014.
[3] J. Lamar-León, R. A. Baryolo, E. B. G. Reyes, and R. González-Díaz, “Persistent-homology-
based gait recognition,” CoRR, vol. abs/1707.06982, 2017.
[4] J. L. Leon, A. Cerri, E. G. Reyes, and R. G. Diaz, “Gait-based gender classification using
persistent homology,” in Progress in Pattern Recognition, Image Analysis, Computer Vision, and
Applications (J. Ruiz-Shulcloper and G. Sanniti di Baja, eds.), (Berlin, Heidelberg), pp. 366–373,
Springer Berlin Heidelberg, 2013.
[5] R. Fukuchi, C. A. Fukuchi, and M. Duarte, “A public data set of running biomechanics and
the effects of running speed on lower extremity kinematics and kinetics,” 3 2017.
[6] R. Ferber, S. Osis, J. Hicks, and S. Delp, “Gait biomechanics in the era of data science,” Journal
of Biomechanics, vol. 49, 10 2016.
[7] A. Phinyomark, S. Osis, B. Hettinga, and R. Ferber, “Kinematic gait patterns in healthy runners:
A hierarchical cluster analysis,” Journal of Biomechanics, vol. 48, pp. 3897–3904, 11 2015.
