
Current Directions in Biomedical Engineering 2015; 1:192–195

Floris Ernst* and Philipp Saß

Respiratory motion tracking using Microsoft's Kinect v2 camera
Abstract: In image-guided radiotherapy, monitoring and compensating for respiratory motion is of high importance. We have analysed the possibility of using Microsoft's Kinect v2 sensor as a low-cost tracking camera. In our experiment, eleven circular markers were printed onto a Lycra shirt and tracked in the camera's color image using cross-correlation-based template matching. The 3D position of each marker was determined from this information and the mean distance of all template pixels from the sensor. In an experiment with four volunteers (male and female) we demonstrated that real-time position tracking is possible in 3D. By averaging over the depth values inside the template, it was possible to increase the Kinect's depth resolution from 1 mm to 0.1 mm. The noise level was reduced to a standard deviation of 0.4 mm. Temperature sensitivity of the measured depth values was observed for about 10–15 minutes after system start.

Keywords: radiotherapy; motion compensation; respiratory tracking; template matching

DOI: 10.1515/CDBME-2015-0048

*Corresponding Author: Floris Ernst, University of Lübeck, Institute for Robotics and Cognitive Systems, Germany, E-mail: ernst@rob.uni-luebeck.de
Philipp Saß: University of Lübeck, Institute for Robotics and Cognitive Systems, Germany

1 Introduction

In many clinical applications, detecting and tracking respiratory motion is required. As an example, image-guided radiotherapy (IGRT) of the chest and abdomen relies heavily on this principle: some kind of marker is placed on or attached to the patient's chest and monitored using a non-invasive localisation device. The trajectory of the marker is subsequently analysed and used to either dynamically activate the treatment beam (called gating [3]) or to guide the radiation source [6]. Especially in the second scenario, tracking one marker may not be sufficient: the actual target of the treatment beam – the tumour – is typically not observed directly. Although this could be done (either using continuous X-ray localisation [7] or 3D ultrasound tracking [1]), the current method in clinical use relies on a mathematical model linking the motion on the patient's chest to the motion of the actual target.

It has been shown that the accuracy of these correlation algorithms can be improved by incorporating multiple markers [2]. In this work, we demonstrate how consumer hardware (Microsoft's Kinect v2 depth sensor) can be used to accurately track the 3D position of multiple markers using a special marker shirt.

2 Methods and materials

To acquire respiratory motion traces, a special marker shirt was developed. Eleven marker templates were printed onto a Lycra shirt, ensuring a tight fit on the volunteers, with the positions of the markers corresponding to areas relevant for the measurement. Each marker consists of a black circle surrounded by a black ring. Details and the numbering of the markers are shown in Figure 1.

Tracking the position of the markers is done using Microsoft's Kinect v2 camera (see Figure 2) and the corresponding software development kit (SDK) [5]. The camera is able to simultaneously capture three different types of images at a frame rate of up to 30 Hz: a color image (1920 × 1080 pixels), an infrared-illuminated grayscale image (512 × 424 pixels), and a depth image (512 × 424 pixels, depth resolution of 1 mm). Typical images are shown in Figure 3. Details about the technology behind the sensor are given in [4].

Figure 1: Marker shirt for motion tracking. (a) Photo of the Lycra shirt; (b) schematics and numbering of the templates.

Figure 3: Typical frames acquired with the Kinect v2 sensor. (A): color image, (B): IR-illuminated scene, (C): depth image, (D): overlay of color and depth images.
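The SDK delivers these streams through frame readers. The following sketch (our illustration, not the authors' published code) shows how synchronized color and depth frames can be acquired in C# with the Kinect for Windows SDK 2.0; a multi-source reader is used so that both modalities refer to the same instant, which matters when the chest is moving:

    using System;
    using Microsoft.Kinect;

    class FrameGrabber
    {
        static void Main()
        {
            KinectSensor sensor = KinectSensor.GetDefault();
            sensor.Open();

            // Read color and depth together so both refer to the same instant.
            var reader = sensor.OpenMultiSourceFrameReader(
                FrameSourceTypes.Color | FrameSourceTypes.Depth);

            reader.MultiSourceFrameArrived += (s, e) =>
            {
                MultiSourceFrame frame = e.FrameReference.AcquireFrame();
                if (frame == null) return;

                using (ColorFrame color = frame.ColorFrameReference.AcquireFrame())
                using (DepthFrame depth = frame.DepthFrameReference.AcquireFrame())
                {
                    if (color == null || depth == null) return;

                    // 1920 x 1080 BGRA color image.
                    byte[] colorPixels = new byte[1920 * 1080 * 4];
                    color.CopyConvertedFrameDataToArray(colorPixels, ColorImageFormat.Bgra);

                    // 512 x 424 depth image, one ushort per pixel, in millimetres.
                    ushort[] depthPixels = new ushort[512 * 424];
                    depth.CopyFrameDataToArray(depthPixels);

                    // ... hand the buffers to the tracking pipeline ...
                }
            };

            Console.ReadLine(); // keep the process alive while frames arrive
            sensor.Close();
        }
    }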

Figure 2: Kinect v2 sensor. Photograph courtesy of Microsoft Corp.

Using these images and the known intrinsics and extrinsics of the color and IR cameras inside the Kinect sensor, it is possible to determine the 3D position of each pixel in the depth image. We have developed an application that allows selecting and tracking up to 15 markers in real time. The general process, illustrated in Figure 4, is as follows:
1. During setup, the user is shown a camera image of the subject and is asked to select the initial positions of the markers and the templates to use for tracking.
2. The position of each template inside its region of interest (ROI) is determined using template matching.
3. The distance of the center point of the found template is determined.
4. The matching ROIs are re-centered around the position of the last match.

Figure 4: Process of region-of-interest-based template matching. (A) – template found inside ROI (black). (B) – template moved. (C) – template found in old ROI. (D) – old ROI (gray) and new ROI (black).
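The factory calibration is exposed through the SDK's CoordinateMapper, so no manual calibration step is needed to obtain metric 3D coordinates. A minimal sketch of this lookup (our naming; the marker is assumed to have been located in the color image, as described above):

    using System;
    using Microsoft.Kinect;

    static class DepthLookup
    {
        // Build a lookup table that maps every pixel of the 1920x1080 color
        // image to the corresponding pixel of the 512x424 depth image. Pixels
        // without valid depth are mapped to non-finite coordinates.
        public static DepthSpacePoint[] ColorToDepthTable(
            CoordinateMapper mapper, ushort[] depthData)
        {
            var table = new DepthSpacePoint[1920 * 1080];
            mapper.MapColorFrameToDepthSpace(depthData, table);
            return table;
        }

        // 3D camera-space position (in metres) of a marker found at
        // color-image coordinates (cx, cy).
        public static CameraSpacePoint MarkerPosition(
            CoordinateMapper mapper, DepthSpacePoint[] table,
            ushort[] depthData, int cx, int cy)
        {
            DepthSpacePoint d = table[cy * 1920 + cx];
            if (float.IsInfinity(d.X)) // no depth available at this color pixel
                return default(CameraSpacePoint);

            int dx = (int)Math.Round(d.X);
            int dy = (int)Math.Round(d.Y);
            ushort depthMm = depthData[dy * 512 + dx];
            return mapper.MapDepthPointToCameraSpace(d, depthMm);
        }
    }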
To reduce the noise in the measured depth value and to increase the depth resolution, the z coordinate of the template was computed as the average depth of all pixels in the template (20 × 20 pixels per template).
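Since the 20 × 20 window contains 400 depth samples, averaging reduces uncorrelated noise by roughly a factor of √400 = 20, consistent with the resolution gain from 1 mm to 0.1 mm stated in the abstract. A minimal sketch of this averaging step (our helper; we assume invalid depth pixels report a value of 0, as in the Kinect SDK):

    // Average the raw depth values over the 20x20 template window to obtain a
    // sub-millimetre z estimate; pixels without valid depth (value 0) are skipped.
    static double TemplateDepth(ushort[] depth, int width, int x0, int y0, int size = 20)
    {
        double sum = 0;
        int count = 0;
        for (int y = y0; y < y0 + size; y++)
            for (int x = x0; x < x0 + size; x++)
            {
                ushort d = depth[y * width + x];
                if (d > 0) { sum += d; count++; }
            }
        return count > 0 ? sum / count : double.NaN; // depth in mm, ~0.1 mm resolution
    }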
Template matching is done using cross-correlation. It is implemented in C# using a wrapper library (EmguCV) around the OpenCV computer vision library.
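The paper does not list the exact calls, but with Emgu CV the matching step can be sketched as follows, using normalized cross-correlation and a search ROI of twice the template size:

    using System.Drawing;
    using Emgu.CV;
    using Emgu.CV.CvEnum;

    static class MarkerMatcher
    {
        // Find the best match of 'template' inside 'roi' (a sub-rectangle of
        // the color frame) and return its position in full-image coordinates.
        public static Point Match(Mat frame, Mat template, Rectangle roi)
        {
            using (Mat search = new Mat(frame, roi))
            using (Mat result = new Mat())
            {
                CvInvoke.MatchTemplate(search, template, result,
                    TemplateMatchingType.CcorrNormed); // normalized cross-correlation

                double minVal = 0, maxVal = 0;
                Point minLoc = default(Point), maxLoc = default(Point);
                CvInvoke.MinMaxLoc(result, ref minVal, ref maxVal, ref minLoc, ref maxLoc);

                // Top-left corner of the best match, shifted back into frame coordinates.
                return new Point(roi.X + maxLoc.X, roi.Y + maxLoc.Y);
            }
        }
    }

Re-centering the ROI on the returned position implements step 4 of the tracking loop above.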
2.1 Volunteer study

The Kinect sensor was attached to an industrial robot (Adept Viper s850) to allow accurate and stable placement. The setup is shown schematically in Figure 5. In a small volunteer study (four participants, one female, three male), we evaluated the possibility of feature tracking. Our volunteers were asked to lie down in a supine position and breathe normally for three to four minutes.

2.2 Accuracy measurements

Finally, the stability and accuracy of the Kinect sensor were evaluated in another experiment. First, the robot shown in Figure 5 was programmed to follow a sinusoidal motion (similar to respiratory motion) along the z-axis while the distance to the patient couch was computed for each camera frame. Second, the distance to the patient couch was measured repeatedly for about twelve minutes to determine the amount of noise and possible drift.

Figure 5: Schematic setup of the experiment (Kinect sensor mounted on the robot above the patient couch, connected to the control PC).

3 Results

Using our multi-threaded implementation in C#, tracking eleven markers in the color camera image – using ROIs of twice the size of the marker template – was possible in real time on a MacBook Pro Retina (2.3 GHz Core i7, four cores, 16 GiB RAM, SSD). In general, the runtime of one template matching iteration was around 80 ms.

3.1 Volunteer study

Recording motion traces of the markers worked for all four volunteers (three male, one female), although markers one and three were difficult to track due to stretching of the fabric. Figure 6 shows the distances measured for all eleven templates. Note the large differences in amplitude between the individual markers.

The depth motion trace of a second volunteer (subject four) is shown in Figure 7. Note the much larger amplitude for markers 5–8 and 11 and the sudden motion around t = 95 s due to the volunteer sneezing. Additionally, the values from markers one and three (red and blue, respectively) show that tracking them is difficult due to deformation.

The in-image motion of the template was also evaluated; it is shown exemplarily for one marker (marker eight of subject one) in Figure 8. Here, it is clear that there is very little motion in the left/right direction, as would be expected. In the superior/inferior direction, however, some motion is present (one pixel corresponds to approximately 1–1.5 mm in our setup, depending on the exact distance from the sensor), albeit not as strong as in the anterior/posterior direction.

Figure 6: Anterior/posterior motion traces of the markers for subject one (female).

Figure 7: Anterior/posterior motion traces of the markers for subject four (male). Note the sudden peak around t = 95 s, which is due to the volunteer sneezing, and the low quality of markers one and three.

Figure 8: Inferior/superior and left/right motion traces for marker eight of subject one.

Figure 9: Sinusoidal motion trace performed by the robot (red) and as measured by template matching with averaged depth values (blue).

Figure 10: Measurement noise from a static target, recorded over twelve minutes (blue), and running average (red).
3.2 Accuracy measurements

Using the same setup as described before, we determined the absolute accuracy of the depth measurements. The trajectory of the robot – overlaid with the measured distance to the template – is given in Figure 9. Clearly, the distance measured by the Kinect sensor deviates substantially from the true motion of the robot: the maximum deviation is 3.7 mm and the root mean square error (RMSE) is 2.0 mm at a working distance on the order of 50 cm.
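For reference, these error metrics follow directly from the paired samples of commanded robot position and measured template distance; a minimal sketch (our helper, not the authors' evaluation code):

    using System;

    static class AccuracyMetrics
    {
        // Maximum absolute deviation and root mean square error between the
        // robot's commanded z positions and the distances measured by template
        // matching (both in mm, sampled at the same instants).
        public static void Evaluate(double[] robotZ, double[] measured,
            out double maxErr, out double rmse)
        {
            double sumSq = 0;
            maxErr = 0;
            for (int i = 0; i < robotZ.Length; i++)
            {
                double e = robotZ[i] - measured[i];
                sumSq += e * e;
                maxErr = Math.Max(maxErr, Math.Abs(e));
            }
            rmse = Math.Sqrt(sumSq / robotZ.Length);
        }
    }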
The results of the static measurement evaluation are shown in Figure 10. The measurement was taken directly after turning on the Kinect sensor, and some kind of time-dependent drift is visible. We believe that this is due to the changing temperature of the sensor PCB. The depth value is determined – as outlined above – by averaging all pixels in the template, resulting in sub-millimetre resolution. The noise level, however, is still considerable: we observe a standard deviation of 0.4 mm.
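The running average used to visualise this drift (red curve in Figure 10) can be sketched as a simple trailing-window filter (the window length is our assumption; the paper does not state it):

    using System;

    static class Smoothing
    {
        // Trailing running average over a window of n samples; used here only
        // to visualise slow drift, as in the red curve of Figure 10.
        public static double[] RunningAverage(double[] x, int n)
        {
            var y = new double[x.Length];
            double sum = 0;
            for (int i = 0; i < x.Length; i++)
            {
                sum += x[i];
                if (i >= n) sum -= x[i - n]; // drop the sample leaving the window
                y[i] = sum / Math.Min(i + 1, n);
            }
            return y;
        }
    }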

4 Discussion

We have demonstrated that the Kinect v2 sensor's data streams – color image and depth image – can be used to track multiple markers on the human chest in 3D and in real time using standard hardware. Additionally, by averaging the depth values inside the marker template, it is possible to substantially reduce the measurement noise, to a standard deviation of 0.4 mm. On the other hand, however, we observed that the depth values measured using the robotic setup and the sinusoidal motion pattern deviate strongly from the actual data: the motion amplitude of the sine was 20 mm, whereas the amplitude from template matching was more than 25 mm – at least 25 % more. We believe that this is caused by multiple factors:
1. inaccurate alignment of the depth axis of the Kinect sensor with the robot's z-axis and the template center;
2. errors in the sensor's calibration (the Kinect sensor stores its intrinsics and extrinsics in firmware, and we did not perform camera calibration).




As next steps, we plan to perform sub-pixel template matching to increase the resolution along the L/R and S/I axes, and to further analyse the accuracy of the setup by tracking the markers with a dedicated tracking device (such as NDI's Polaris Spectra system). The operating speed of the system (currently about 15 fps) could also be increased by further parallelising the code, so that every frame from the Kinect v2 is used. We need to make sure, however, that the light emitted by the Kinect v2 sensor does not interfere with the IR light used by the Spectra system: both operate in the near-infrared range around 850 to 860 nm.
Author's Statement
Conflict of interest: Authors state no conflict of interest. Informed consent: Informed consent has been obtained from all individuals included in this study. Ethical approval: The research related to human use complies with all relevant national regulations and institutional policies, is in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors' institutional review board or equivalent committee.

References

[1] O. Blanck, P. Jauer, F. Ernst, R. Bruder, and A. Schweikard. Pilot-Phantomtest zur ultraschall-geführten robotergestützten Radiochirurgie. In H. Treuer, editor, 44. Jahrestagung der DGMP, Cologne, Germany, 2013. DGMP, pages 122–123.
[2] R. Dürichen, M. A. F. Pimentel, L. Clifton, A. Schweikard, and D. A. Clifton. Multi-task Gaussian processes for multivariate physiological time-series analysis. IEEE Transactions on Biomedical Engineering, 62(1):314–322, 2014. doi:10.1109/TBME.2014.2351376.
[3] J. Hanley, M. M. Debois, D. Mah, G. S. Mageras, A. Raben, K. Rosenzweig, B. Mychalczak, L. H. Schwartz, P. J. Gloeggler, W. Lutz, C. C. Ling, S. A. Leibel, Z. Fuks, and G. J. Kutcher. Deep inspiration breath-hold technique for lung tumors: the potential value of target immobilization and reduced lung density in dose escalation. International Journal of Radiation Oncology, Biology, Physics, 45(3):603–611, 1999. doi:10.1016/s0360-3016(99)00154-6.
[4] D. Lau. The science behind Kinects or Kinect 1.0 versus 2.0. http://www.gamasutra.com/blogs/DanielLau/20131127/205820/The_Science_Behind_Kinects_or_Kinect_10_versus_20.php, November 2013. Online, last visited 2015-03-24.
[5] Microsoft Corporation. Kinect for Windows SDK 2.0. http://www.microsoft.com/en-us/download/details.aspx?id=44561, October 2014. Online, last visited 2015-03-24.
[6] A. Schweikard, H. Shiomi, and J. R. Adler, Jr. Respiration tracking in radiosurgery. Medical Physics, 31(10):2738–2741, 2004. doi:10.1118/1.1774132.
[7] H. Shirato, S. Shimizu, K. Kitamura, T. Nishioka, K. Kagei, S. Hashimoto, H. Aoyama, T. Kunieda, N. Shinohara, H. Dosaka-Akita, and K. Miyasaka. Four-dimensional treatment planning and fluoroscopic real-time tumor tracking radiotherapy for moving tumor. International Journal of Radiation Oncology, Biology, Physics, 48(2):435–442, 2000. doi:10.1016/s0360-3016(00)00625-8.
