Human-Machine Collaborative Systems for Microsurgical Applications

A. M. Okamura, D. Kragic, P. Marayong, M. Li, and G. D. Hager
Engineering Research Center for Computer-Integrated Surgical Systems and Technology
The Johns Hopkins University, Baltimore, MD 21218
aokamura@jhu.edu, hager@cs.jhu.edu

Abstract— We describe our current progress in developing Human-Machine Collaborative Systems (HMCSs) for vitreoretinal eye surgery. Three specific problems considered here are (1) the development of systems tools for describing and implementing an HMCS, (2) segmentation of complex tasks into logical components given sensor traces of a human performing the task, and (3) measuring HMCS performance. Our goal is to integrate these into a full microsurgical workstation with the ability to automatically "parse" traces of user execution into a task model, which is then loaded into the execution environment, providing the user with assistance through online recognition of task state. The major contributions of our work to date include an XML task graph modeling framework and execution engine, an algorithm for real-time segmentation of user actions using continuous Hidden Markov Models, and validation techniques for analyzing the performance of HMCSs.

Fig. 1. Structure of a Human-Machine Collaborative System.

I. INTRODUCTION

The goal of the Human-Machine Collaborative Systems (HMCS) project is to investigate human-machine cooperative execution of small-scale, tool-based manipulation activities. Our work on HMCS is specifically aimed at microsurgery [8], [14] and, more recently, cell manipulation, but the basic principles apply to many other fine-scale tasks such as opto-electronic assembly and assembly of LIGA parts [3]. The motivation for collaborative systems is based on evidence [8], [17] suggesting that humans operating in collaboration with robotic mechanisms can take advantage of robotic speed and precision, but avoid the difficulties of full autonomy by retaining the human component "in the loop" for essential decision making and/or physical guidance [8].

Our approach to HMCS focuses on three inter-related problems: (1) Synthesis: the development of systems tools necessary for describing and implementing an HMCS; (2) Modeling: given sensor traces of a human performing a task, segmenting those traces into logical task components and/or measuring the compatibility of a given HMCS structure to that sequence of components; and (3) Validation: measuring HMCS performance.

Figure 1 depicts the high-level structure of the human-machine collaborative systems we are developing. Logically, there are three components: the human, the augmentation system, and an observer. We assume that a user primarily manipulates the environment using the augmentation system, although unaided manipulation may take place in some settings (dashed line). The user is able to visually observe the tool and the surrounding environment, and directs an augmentation device using force and position commands. The system may also have access to endpoint force data, targeted visual data, and/or other application-dependent sensors (e.g., pre-operative or intra-operative imaging). The role of the observer is to assess available sensor data (including haptic feedback from the user) and initiate, modify, or terminate various forms of assistance. Optional direct interaction between the observer and the user may also be used to convey information or otherwise synchronize their interaction. A schematic sketch of this loop is given at the end of this section.

The basic notion of HMCS is clearly related to traditional teleoperation, although the goal in HMCS is not to "remotize" the operator [7] but rather to provide appropriate levels of operator assistance depending on context. At one extreme, shared control [5] can be viewed as an HMCS for manipulation tasks in which some degrees of freedom are controlled by machine and others by the human. At the other extreme, supervisory control [16] gives a more discrete, high-level notion of human-machine interaction. Our notion of HMCS essentially incorporates both views, combining them with broader questions of modeling manipulation activities consisting of multiple steps and varying levels of assistance, and validating those models against human performance data.

In the remainder of this paper, we review our results to date in the areas of synthesis, modeling, and validation. For peer-reviewed versions of this work, see [1], [2].
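To make the data flow of Figure 1 concrete, the following sketch (Python; every class, method, and attribute name here is our own illustrative invention, not part of the actual HMCS software) shows one plausible rendering of the collaborative loop: sensors are read, the observer maps them to an assistance mode, and the augmentation system applies that mode while the user guides the tool.

```python
class Observer:
    """Watches sensor data and decides which form of assistance the
    augmentation system should initiate, modify, or terminate."""

    def __init__(self, task_model):
        self.task_model = task_model

    def update(self, sensors):
        state = self.task_model.recognize(sensors)       # e.g., HMM output (Section III)
        return self.task_model.assistance_for(state)     # e.g., a virtual fixture (Section II)


def control_cycle(robot, observer):
    """One cycle of the collaborative loop."""
    sensors = robot.read_sensors()            # handle forces, vision, imaging, ...
    assistance = observer.update(sensors)     # observer selects the assistance mode
    robot.apply_assistance(assistance)        # user guides the tool under that mode
```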

II. SYNTHESIS

Our current system development efforts are motivated by the domain of retinal microsurgery. Microsurgical tasks occur near the limit of the manipulation capabilities of the human hand-eye system. For example, a challenging task is retinal vein cannulation: the introduction of a micro-pipette into a vessel of approximately 100 µm in diameter [18].

We augment the surgeon's physical abilities using the Johns Hopkins University Steady Hand Robot (JHU SHR, Figure 2). Briefly, the JHU SHR is a 7-DOF robot equipped with a force-sensing handle at the endpoint. Tools are mounted at the end-effector and "manipulated" by an operator holding the force-sensing handle, also attached to the end-effector, implementing a means of direct control for the operator. The robot is ergonomically appropriate for minimally invasive microsurgical tasks and provides micron-scale accuracy [17].

We have investigated two general forms of motion assistance with the JHU SHR: force scaling [15] and motion guidance using virtual fixtures [10]. In this paper, we focus on the latter. Studies on virtual fixtures for teleoperation have indicated that user performance can increase as much as 70% with fixture-based guidance [13]. Here, we describe our framework for implementing virtual fixtures on the JHU SHR.

Fig. 2. Experimental setup using the Johns Hopkins University Steady Hand Robot (right).

A. Describing Spatial Motion Constraints

"Virtual fixtures" in our implementation provide cooperative control of the manipulator by "stiffening" a hand-held guidance mechanism against certain directions of motion or forbidden regions of the workspace. We model the robot as a purely kinematic Cartesian device with tool tip position $x \in SE(3)$ and a control input that is the endpoint velocity $v = \dot{x} \in \mathbb{R}^6$, all expressed in the robot base frame. The robot is guided by applying forces and torques $f \in \mathbb{R}^6$ on the manipulator handle, and it responds to the applied force.

We define SHR guidance geometrically by identifying a space of "preferred" directions of motion. Let us assume that we are given a $6 \times n$ time-varying matrix $D = D(t)$, $0 < n < 6$, representing the instantaneous preferred directions of motion, likewise expressed in robot base coordinates. For example, if $n$ is 1, the preferred direction is along a curve in SE(3); if $n$ is 2, the preferred directions span a surface; and so forth. We define two projection operators, the span and the kernel of the column space, as

  $\mathrm{Span}(D) \equiv [D] = D(D'D)^{+}D'$   (1)
  $\mathrm{Ker}(D) \equiv \langle D \rangle = I - [D]$   (2)

where $+$ denotes the pseudo-inverse for the case where $D$ is (column) rank deficient.

By decomposing the input force vector $f$ into two components, $f_D = [D]f$ and $f_\tau = f - f_D$, and introducing an admittance ratio $k_\tau \in [0, 1]$ that attenuates the non-preferred component of the force input, we arrive at an admittance control of the form

  $v = k\,(f_D + k_\tau f_\tau) = k\,([D] + k_\tau \langle D \rangle)\,f$   (3)

Thus, the final control law is in the general form of an admittance control with a time-varying gain matrix determined by $D(t)$. By choosing $k$, we control the overall admittance of the system. In the case $k_\tau = 1$ we have an isotropic admittance; choosing $k_\tau$ low imposes the additional constraint that the robot is stiffer in the non-preferred directions of motion. In what follows, we refer to the case of $k_\tau = 0$ as a hard virtual fixture, since it is not possible to move in any direction other than the preferred direction. All other cases will be referred to as soft virtual fixtures.

The development to this point directly supports motion in a subspace, but it does not allow us to define a fixed desired motion trajectory. If $u = f(x, S)$ is the signed distance of the tool tip from a reference surface $S$, with $D_u$ the corresponding direction that moves the tool tip back toward $S$, we can define a new preferred direction as

  $D_c(x) = (1 - k_d)\,[D]\,\frac{f}{\|f\|} + k_d\,D_u, \qquad 0 \le k_d \le 1$   (4)

combining the two vectors that encode motion in the preferred direction and correction of the tool tip back to $S$. The constant $k_d$ governs how quickly the tool is moved toward the reference surface. One minor issue here is that the division by $\|f\|$ is undefined when no user force is present. As projection is invariant to scale, we write (4) as

  $D_c(x) = (1 - k_d)\,[D]\,f + k_d\,\|f\|\,D_u, \qquad 0 \le k_d \le 1$   (5)

and apply (3) with $D = D_c$. One potential problem with this control law is that when the user applies no force, there is no virtual fixture because there is no defined preferred direction, so there is a discontinuity at the origin. However, in practice the resolution of any force sensing device is usually well below the numerical resolution of the underlying computational hardware computing the pseudo-inverse, so the user will never experience this discontinuity.

An implementation of this framework in operation is shown in Figure 2. A CCD camera is attached to the robot and views a curve on a task plane, and a user attempts to move the robot to follow the curve (similar to tracking a blood vessel in preparation for retinal cannulation). The view is processed using the XVision tracking system [6] to determine, in real time, the tangent to the reference path at the point closest to the center of the image; the center is marked with a cross. Thus, if $t_x$ and $t_y$ are the components of the tangent to the curve in the image plane, and $(n_x, n_y)$ is the vector from the center to the closest point on the curve, then we have $u = (n_x\; n_y\; 0\; 0\; 0\; 0)'$ and $D = (t_x\; t_y\; 0\; 0\; 0\; 0)'$. More details and extensions of this basic task can be found in [11].
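To make the control law concrete, the following sketch (Python with NumPy; the function names, gains, and example vectors are ours and purely illustrative, not the JHU SHR implementation) computes the projections of (1)-(2), the admittance law of (3), and the scaled combined direction of (5):

```python
import numpy as np

def span(D):
    """Span projection [D] = D (D'D)^+ D'  (Eq. 1)."""
    return D @ np.linalg.pinv(D.T @ D) @ D.T

def kernel(D):
    """Kernel projection <D> = I - [D]  (Eq. 2)."""
    return np.eye(D.shape[0]) - span(D)

def fixture_velocity(f, D, k=1.0, k_tau=0.2):
    """Admittance law v = k([D] + k_tau <D>) f  (Eq. 3).
    k_tau = 1 gives isotropic admittance; k_tau = 0 gives a hard fixture."""
    return k * (span(D) + k_tau * kernel(D)) @ f

def combined_direction(f, D, D_u, k_d=0.5):
    """Scaled preferred direction D_c = (1 - k_d)[D]f + k_d ||f|| D_u  (Eq. 5),
    blending motion along D with a correction of the tool tip toward the path."""
    return (1.0 - k_d) * (span(D) @ f) + k_d * np.linalg.norm(f) * D_u

# Planar curve-following example from Section II-A: the tangent (tx, ty) and the
# correction vector (nx, ny) would come from the vision system each cycle; the
# numbers below are placeholders only.
D = np.array([[0.8, 0.6, 0.0, 0.0, 0.0, 0.0]]).T      # 6 x 1 preferred direction
D_u = np.array([0.01, -0.02, 0.0, 0.0, 0.0, 0.0])     # toward the closest curve point
f = np.array([1.0, 0.2, 0.0, 0.0, 0.0, 0.0])          # user force/torque on the handle

D_c = combined_direction(f, D, D_u, k_d=0.3).reshape(6, 1)
v = fixture_velocity(f, D_c, k=0.5, k_tau=0.1)        # commanded tool-tip velocity
```

In the running system, D and D_u would be refreshed every control cycle from the tangent and correction vectors reported by the vision tracker.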

III. MODELING AND SENSING HUMAN INTENT

One problem that arises in HMCS is that of providing the appropriate assistance to a user, based on the intent or context of his or her actions. In order to do so, the system must have a model of the task being performed, and a means for relating the actions of the user to that model. We have investigated continuous Hidden Markov Models (HMMs) [12] as a means of modeling tasks from sensor traces of user execution, and we have developed a new algorithm for HMM recognition that allows for real-time segmentation of user actions. Experimental results with users are presented in Section IV.

A. High-Level Task Specification

Most microsurgical tasks consist of discrete, serial, quasi-static steps, each with a clearly defined outcome. Hence, models for such procedures can be defined by relatively simple graphs, as shown in Figures 3 and 4. For example, retinal vein cannulation involves positioning and orienting a needle in the vicinity of the vein, and inserting it when appropriate until contact is made. Upon contact, puncturing is performed, after which the needle can be safely withdrawn.

In order to provide appropriate assistance, it must be possible to create models of procedures, and to attach appropriate types or levels of assistance to each node. To this end, we have developed a set of tools for defining task graphs. A task graph modeling system allows a user to interactively design a graph for the task to be performed and save the graph as an XML file; task graphs are represented using a specialization of the Extensible Markup Language (XML, www.xml.org). The graph itself references a library of basic primitives for providing user assistance at each node, and a set of transition conditions. A task graph execution engine reads the XML description and coordinates the execution of the graph: at each state, the graph executor initiates the appropriate assistance mode and monitors for transition events to subsequent states. A minimal sketch of such a graph and executor is given below.

Fig. 3. GUI used to generate task graphs.

Fig. 4. A graph example for the vein cannulation task, with states Start, Move, Orient, Insert, Puncture, Hold, Retract, Done, and Error, and transitions triggered by button presses, tool contact, and error events.
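As an illustration of the task graph and executor just described, the sketch below encodes a simplified stand-in for the Figure 4 graph as a Python dictionary together with a minimal executor. The real system stores graphs in XML and invokes robot assistance primitives; the particular edge assignments and assistance names here are guesses read off the figure, not the original task file.

```python
# Hypothetical, simplified stand-in for the XML task graph of Figure 4.
# Each node names an assistance primitive; edges are keyed by transition events.
VEIN_CANNULATION_GRAPH = {
    "Start":    {"assist": "none",            "next": {"button1": "Move"}},
    "Move":     {"assist": "coarse_guidance", "next": {"button2": "Orient", "error": "Error"}},
    "Orient":   {"assist": "orientation_vf",  "next": {"button1": "Insert", "error": "Error"}},
    "Insert":   {"assist": "insertion_vf",    "next": {"contact": "Puncture", "error": "Error"}},
    "Puncture": {"assist": "force_scaling",   "next": {"button1": "Hold", "error": "Error"}},
    "Hold":     {"assist": "position_hold",   "next": {"button1": "Retract", "error": "Error"}},
    "Retract":  {"assist": "retraction_vf",   "next": {"button1": "Done"}},
    "Error":    {"assist": "none",            "next": {"button1": "Start"}},
    "Done":     {"assist": "none",            "next": {}},
}

def execute(graph, events, start="Start"):
    """Minimal graph executor: activate the node's assistance mode,
    then follow the edge labeled by each incoming transition event."""
    state = start
    for event in events:
        node = graph[state]
        print(f"state={state:9s} assist={node['assist']:16s} event={event}")
        state = node["next"].get(event, state)   # ignore events with no outgoing edge
    return state

# Example run through a nominal cannulation:
# execute(VEIN_CANNULATION_GRAPH,
#         ["button1", "button2", "button1", "contact", "button1", "button1", "button1"])
```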

Currently, there are three ways of generating a task graph: (1) using the graphical interface, (2) directly writing an XML file, and (3) automated off-line task modeling. We now turn to the latter of these methods.

B. Offline Task Modeling

The basic hypothesis of offline task modeling is that, for restricted domains such as microsurgery, it is possible to model tasks using a small vocabulary of primitive "gestemes", each with an associated assistance mode. We have tested this hypothesis using data acquired from task execution with the JHU SHR and modeled using the Hidden Markov Model Toolkit (HTK, http://htk.eng.cam.ac.uk).

Two tasks were investigated: (1) a peg-in-hole task and (2) a paint task, which are analogous to retinal vessel cannulation and retinal-membrane peeling, respectively. For the peg-in-hole task, five gestemes were used: place, position, insert, withdraw, and remove. For the paint task, the four gestemes were: place, position, paint, and remove. The input data acquired from the robot consisted of seven variables: the Cartesian forces and torques expressed in robot end-effector coordinates, and the magnitude of translation between the last reading and the current one. We assign a linearly sequential left-to-right (SLR) structure to the HMMs of the gestemes. Each gesteme was trained independently on a training set, then the gestemes were run in parallel to segment a test set (Figure 5). Five users participated in the experiment and each user performed ten runs. The obtained segmentation accuracy of the system over all users was around 84% for the peg-in-hole task and 90% for the paint task. In addition, we found that we could use gestemes trained for the paint task in the peg-in-hole task. These results, although preliminary, suggest that for suitably constrained situations, it is plausible that complex tasks can be modeled using a small set of manipulation primitives.

Fig. 5. Network for real-time, continuous HMM recognition: three five-state SLR gesteme models operating in parallel.

C. Online Recognition and Assistance

Given a task model, the second question is whether it is possible to determine the appropriate task state online. To do this, we have developed an online continuous HMM recognition algorithm [9]. Suppose we have $M$ potential models and each model has $N$ states. For a given model $\lambda$, let $\phi_{j,\lambda}(t)$ represent the likelihood of observing vectors $o_1$ to $o_t$ and being in state $j$ at time $t$. The partial likelihood of each model can be computed as

  $\phi_{j,\lambda}(t) = \Big[\sum_{i=1}^{N}\phi_{i,\lambda}(t-1)\,a_{ij}\Big]\,b_j(o_t)$   (6)

where $\phi_{1,\lambda}(1) = 1$ and $\phi_{j,\lambda}(1) = 0$ for $1 < j \le N$. The likelihood of state 1 of a model $\lambda$ at time $t$ can be defined as

  $\phi_{1,\lambda}(t) = \frac{\sum_{m=1}^{M}\phi_{N,m}(t-1)}{M}, \qquad 1 \le \lambda \le M$   (7)

For each model, we define the sum of the likelihoods of all states as the total likelihood at time $t$, $L(o_1, o_2, \ldots, o_t \mid \lambda) = \sum_{j=1}^{N}\phi_{j,\lambda}(t)$, and the model with the highest total likelihood represents the current hypothesis. The structure of each HMM is again SLR and each model operates in parallel; one model is chosen to describe the user motion at any given time. A sketch of this recursion is given at the end of this subsection.

We tested the online recognition algorithm with the JHU SHR in a planar environment, where the task was to follow a sinusoidal path but avoid a portion of the path covered by a circle. In our experiment, we included three models/gestemes (HMMs) of user actions: (1) silence (user is doing nothing), (2) follow curve (user is following the curve), and (3) avoid curve (user is not following the curve). The data used was the force in the x and y directions. The force, $f$, was decomposed into the direction parallel to the reference curve, $\delta$, and the normal direction, $\tau$; $f_\delta$, $f_\tau$, and $f \cdot \delta$ were the elements of the observation vector used to train the HMMs. The accuracy of recognition and the sensitivity of our algorithm to training in differing environments were examined by an experiment where (1) the same sine curve was used for both training and recognition, and (2) different sine curves were used for training and recognition. The real-time HMM recognition output was given every 33 ms. The average accuracy of real-time continuous recognition among all subjects was larger than 90% in both cases, and the decrease in recognition performance when different curves were used was only 1.08%. We believe that this improvement in recognition over the offline case is due to the additional context provided by the virtual fixtures.
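A minimal sketch of the recursion in (6)-(7) and the model-selection rule is given below (Python with NumPy/SciPy). It assumes single-Gaussian output densities and one particular bookkeeping for the shared re-entry term of (7); the HTK-trained models and the exact algorithm of [9] may differ.

```python
import numpy as np
from scipy.stats import multivariate_normal

class GestemeHMM:
    """Left-to-right HMM with Gaussian output densities (a simplification;
    HTK-trained models would typically use Gaussian mixtures)."""

    def __init__(self, A, means, covs):
        self.A = np.asarray(A)              # N x N transition matrix
        self.means = means                  # list of per-state mean vectors
        self.covs = covs                    # list of per-state covariance matrices
        self.N = self.A.shape[0]
        self.phi = np.zeros(self.N)         # phi_{j,lambda}(t)
        self.phi[0] = 1.0                   # phi_{1,lambda}(1) = 1, others 0

    def b(self, j, o):
        return multivariate_normal.pdf(o, self.means[j], self.covs[j])

def online_step(models, o):
    """One recognition step for observation o, returning the index of the model
    with the highest total likelihood L(o_1..o_t | lambda) = sum_j phi_{j,lambda}(t)."""
    M = len(models)
    # Eq. 7: re-entry likelihood for state 1, shared across all models.
    restart = sum(m.phi[-1] for m in models) / M
    for m in models:
        new_phi = np.empty(m.N)
        for j in range(m.N):
            # Eq. 6: propagate within the model and weight by the output density.
            new_phi[j] = (m.phi @ m.A[:, j]) * m.b(j, o)
        # Add the shared re-entry mass to state 1 (one reading of Eq. 7); a
        # practical implementation would also rescale phi to avoid underflow.
        new_phi[0] += restart * m.b(0, o)
        m.phi = new_phi
    totals = [m.phi.sum() for m in models]
    return int(np.argmax(totals))
```

In the planar experiment, one such model would be trained for each of the silence, follow curve, and avoid curve gestemes, and online_step would be called every 33 ms with the current observation vector.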

IV. VALIDATION

During task execution, speed and accuracy are generally used as metrics to evaluate performance. There is typically a trade-off between these two metrics, as described by Fitts' Law [4]. In this section, we demonstrate that both speed and accuracy can be improved simultaneously with the aid of virtual fixture guidance.

A. Virtual Fixtures

We present two experiments that examine the effects of virtual fixture guidance ranging from complete guidance (admittance ratio $k_\tau = 0$) to no guidance ($k_\tau = 1$). The experimental setup in both cases makes use of the curve-following virtual fixture described in Section II-A. The reference path was a 0.39 mm thick sine curve with a 35 mm amplitude and 70 mm wavelength. The subjects were provided with instructions as shown in Figure 6, and told to move along the path as quickly as possible without sacrificing accuracy. The error represents the deviation from the reference path.

Fig. 6. Task descriptions for the human performance experiments: (a) path following, (b) off-path targeting, and (c) avoidance. The black line denotes the virtual fixture reference path, and the path to be followed by the user is dark gray.

Experiment I included five subjects performing the path following task three times with eleven admittance ratios from 0 to 1 (0, 0.1, ..., 1). Experiment II included eight subjects performing each task three times with four discrete admittance ratios corresponding to four guidance levels (0 = complete, 0.3 = medium, 0.6 = soft, and 1 = none). For Experiment II, we added a circle of radius 10 mm with its center located at the midpoint of the sine curve. For the off-path targeting task, the time and error during motion from Point A to Point B in Figure 6 were recorded. For the avoidance task, we recorded the time each subject needed to avoid the circle and get back on the curve after leaving it; no error was measured, so only the execution time was considered.

For Experiment I, the average execution time and error were used to determine the improvement in performance with different guidance levels. In general, a decrease in admittance ratio reduces error; however, a decrease in admittance ratio does not improve execution time significantly. The data indicate that improvements in error and time have linear relationships with admittance ratio, as shown in Figure 7. This is to be expected, since the output velocity of the SHR is linearly proportional to the admittance ratio. We note that none of the subjects performed worse in both time and error when the admittance ratio decreased.

Fig. 7. Average normalized time and error versus admittance ratio for the path following task.

For the data obtained in Experiment II, we performed ANOVA and multiple pair-wise comparisons using Tukey's method to determine significant differences (a sketch of this analysis is given after the table); the data are shown in Table I. For the path following task, complete guidance resulted in the best performance, and more guidance generally resulted in shorter time and/or higher accuracy compared to no guidance; the execution times for no guidance and soft guidance do not differ significantly, and the analysis shows no difference in error across guidance levels except between medium and complete guidance. For the off-path targeting task, reducing guidance reduces the time and error during target acquisition. For the avoidance task, stronger guidance slowed task execution. Detailed results can be found in [10].

TABLE I. Experiment II: experimental results for eight subjects, listing the average and standard deviation of time (seconds) and error (pixels) for each admittance ratio ($k_\tau$) and guidance level. Rows 1-4: path following; rows 5-7: off-path targeting; rows 8-10: avoidance.
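The statistical analysis described above can be reproduced with standard tools. The sketch below (Python with SciPy and statsmodels) runs a one-way ANOVA over guidance levels followed by Tukey pair-wise comparisons; the numeric values are invented placeholders and are not the data summarized in Table I.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical completion times (seconds) grouped by guidance level; the real
# per-subject measurements are those behind Table I and are not reproduced here.
times = {
    "complete": np.array([18.2, 19.0, 17.5]),
    "medium":   np.array([19.5, 20.4, 18.9]),
    "soft":     np.array([21.0, 20.2, 22.1]),
    "none":     np.array([24.3, 25.0, 23.8]),
}

# One-way ANOVA across guidance levels, then Tukey's method for the
# pair-wise comparisons, mirroring the analysis described for Experiment II.
F, p = f_oneway(*times.values())
print(f"ANOVA: F = {F:.2f}, p = {p:.4f}")

groups = np.concatenate([np.full(len(v), k) for k, v in times.items()])
values = np.concatenate(list(times.values()))
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```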

B. Online Assistance

In this experiment, 8 subjects were asked to follow the sine curve (model follow curve) and the edge of the circle (model avoid curve) in three different modes: (1) HMM+VF, where real-time recognition was used to determine when to apply or remove the virtual fixture; (2) VF, where a virtual fixture with a constant admittance ratio of 0.3 was applied; and (3) NG, where the virtual fixture was removed (no guidance, $k_\tau = 1$). The error (deviation from the curve) and time were again used to validate the performance. To create a "gold standard" for the correct segmentation, the subjects were asked to press a space bar when they intended to transition from one model to another.

The results are presented in Figure 8, showing the good performance of the combined algorithm (HMM+VF). The HMM+VF technique provides the user with assistance when he/she wants to follow the curve, and removes the assistance otherwise. The results also indicate that less guidance reduces the execution time.

Fig. 8. Mean and standard deviation of average error and time for the follow curve section of the task, the avoid curve section of the task, and the total task. HMM+VF indicates that the virtual fixture was switched off by the HMM when it detected that the user intended to move away from the curve, VF indicates a constant virtual fixture with admittance ratio $k_\tau = 0.3$, and NG indicates no guidance ($k_\tau = 1$).

V. CONCLUSIONS AND FUTURE WORK

We have presented the structure of our HMCS project and current theoretical and experimental results to date. We have successfully developed a control method for implementing guidance virtual fixtures with an admittance-controlled cooperative robot, as well as techniques for automatic recognition of user activities. These components can be integrated through a task graph and execution engine specifically designed for serial procedures such as those encountered in microsurgery.

The results indicate that the selection of virtual fixture admittance is a task-dependent process. Using equal weighting for error vs. time, we found that an admittance ratio of approximately 0.6 is optimal. For surgical applications, error reduction is more important than the time required to perform the task. However, for tasks that require significant interaction from the user, such as object avoidance and off-path targeting, strong guidance will reduce accuracy and increase execution time, so a lower amount of guidance is recommended. Based on the linear relationship between admittance and performance found in Experiment I, we developed admittance ratio selection parameters that can be used to determine an appropriate guidance level [10]. Therefore, online admittance tuning is recommended to achieve the full benefit of virtual fixture guidance.

We are currently in the process of integrating these components into a full microsurgical workstation for vitreoretinal eye surgery. With this system, we expect to shortly be able to automatically "parse" traces of user execution into a task model, load the task model into the execution environment, and thereby provide the user with assistance using online recognition of task state. The system will be tested using ex-vivo and in-vivo experiments involving evaluation of system performance during operation by surgeons. We are also exploring HMCS implementations for teleoperated systems.

ACKNOWLEDGMENTS

The authors thank Russell Taylor and Blake Hannaford for their insight related to this research. This material is based upon work supported by the National Science Foundation under Grants No. EEC-9731478, IIS-0099770, and ITR-0205318.

REFERENCES

[1] D. Kragic, P. Marayong, M. Li, A. M. Okamura, and G. D. Hager, "Human-Machine Collaborative Systems for Microsurgical Applications," 2003 International Symposium on Robotics Research. (In press.)
[2] D. Kragic, P. Marayong, M. Li, A. M. Okamura, and G. D. Hager, "Human Machine Collaborative Systems for Microsurgical Applications," submitted to the International Journal of Robotics Research.
[3] J. Feddema and T. Christenson, "Parallel Assembly of High Aspect Ratio Microstructures," SPIE Microrobotics and Microassembly, vol. 3834, 1999.
[4] P. Fitts, "The information capacity of the human motor system in controlling the amplitude of movement," Journal of Experimental Psychology, 47:381-391, 1954.
[5] C. Guo, T. Tarn, and A. Bejczy, "Fusion of Human and Machine Intelligence for Telerobotic Systems," IEEE International Conference on Robotics and Automation, pp. 3110-3115, 1995.
[6] G. Hager and K. Toyama, "The XVision System: A General Purpose Substrate for Real-Time Vision Applications," Computer Vision and Image Understanding, 69(1):21-27, 1998.

[7] G. Hirzinger, B. Brunner, G. Grunwald, and J. Heindl, "A Sensor-based Telerobotic System for the Space Robot Experiment ROTEX," Int. Symp. on Experimental Robotics, 1991.
[8] R. Taylor, P. Jensen, L. Whitcomb, A. Barnes, R. Kumar, D. Stoianovici, P. Gupta, Z. Wang, E. deJuan, and L. Kavoussi, "A Steady-Hand Robotic System for Microsurgical Augmentation," International Journal of Robotics Research, 18:1201-1210, 1999.
[9] M. Li and A. M. Okamura, "Recognition of Operator Motions for Real-Time Assistance Using Virtual Fixtures," 11th International Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pp. 125-131, 2003.
[10] P. Marayong, M. Li, A. M. Okamura, and G. D. Hager, "Spatial Motion Constraints: Theory and Demonstrations for Robot Guidance using Virtual Fixtures," 2003 IEEE International Conference on Robotics and Automation.
[11] P. Marayong, A. Bettini, and A. M. Okamura, "Effect of Virtual Fixture Compliance on Human-Machine Cooperative Manipulation," IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1089-1095, 2002.
[12] L. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, 77(2):257-286, 1989.
[13] L. Rosenberg, Virtual Fixtures, PhD Thesis, Dept. of Mech. Eng., Stanford University, 1994.
[14] D. Rothbaum, J. Roy, G. Hager, R. Taylor, L. Whitcomb, H. Francis, and J. Niparko, "Robot-assisted stapedotomy: Micropick fenestration of the stapes footplate," Otolaryngology - Head and Neck Surgery, pp. 417-426, 2002.
[15] J. Roy and L. Whitcomb, "Adaptive Force Control of Position/Velocity Controlled Robots: Theory and Experiment," IEEE Transactions on Robotics and Automation, 18(2):121-137, 2002.
[16] T. Sheridan, "Human Supervisory Control of Robot Systems," IEEE International Conference on Robotics and Automation, 1986.
[17] R. Kumar, T. Goradia, A. Barnes, P. Jensen, L. Whitcomb, D. Stoianovici, L. Auer, and R. Taylor, "Performance of robotic augmentation in common dextrous surgical motions," Medical Image Computing and Computer-Assisted Interventions, 1999.
[18] J. Weiss, "Injection of tissue plasminogen activator into a branch retinal vein in eyes with central retinal vein occlusion," Ophthalmology, 108(12):2249-2257, 2001.