Weng Fei Low, Zhi Gao, Cheng Xiang, and Bharath Ramesh*
The N.1 Institute for Health, National University of Singapore
Email: bharath.ramesh03@u.nus.edu
Abstract
deal with the event camera output is to implicitly choose a time-period by buffering a certain number of past events [25, 14, 27]. However, choosing an ideal number of events to buffer while retaining critical image information is intricately dependent on the apparent object motion, scene complexity and spatial resolution of the sensor. Fig. 1 illustrates the drawback of these two ways of processing the event camera output.

In addition to the above ways of handling the event camera output, a third class of existing event-based works [23, 7] aims to render conventional video streams for traditional computer vision processing, albeit under challenging motion conditions where normal cameras fail to capture crisp images. These methods unsurprisingly revert to one of the main drawbacks of frame-based processing, i.e., high power consumption [4].

In contrast to the above works, we explicitly model the set of active events in the spirit of [5] and compute continuous-time flow information. Previous works [5, 21, 2] employ iterative routines to estimate the visual flow, which are relatively slow and resource intensive in both software and hardware implementations at the typically high event rates of the DVS. Thus, the motivation of our work is to devise a robust single-shot optical flow estimation algorithm (SOFEA) that allows for a broader range of downstream applications in a power-efficient manner. To this end, the two main theoretical contributions of this paper are:

• An efficient greedy algorithm to optimally select the neighboring events associated with an incoming event for accurate non-iterative local Surface of Active Events (SAE) plane fitting (Section 4.2).

• A mathematically simpler description of the local planar SAE, for efficient computation in hardware or software, enabled by imposing an additional constraint that the incoming event lies on it (Section 4.3).

Our framework estimates the optical flow of each event, and thus, event stream images can be rendered at any given time instant using lifetime estimation, as shown and evaluated against [21] in the experiments section.

2. Related Work

The local plane fit approach is a widely adopted method [5] for event-based optical flow estimation. The single plane fit described in [29] is heavily prone to noise since it considers all neighboring events with the same polarity. Therefore, to discard outlying data points, performing plane fitting iteratively is essential to improve accuracy [2, 21, 5]. The recent work [2] describes a hardware implementation by placing a 3 × 3 spatial constraint on the size of the neighborhood used for plane fitting, which limits the goodness-of-fit. Additionally, the incorporation of noise filtering and old pixel thresholds [2] further limits the maximum speed of dynamics.

In contrast to the above plane fitting methods, [6] presented a variant of Lucas-Kanade flow estimation for event cameras. More recent approaches for optical flow estimation from event-based sensors include conventional block matching on adaptive time-sampled event stream images [20]. Reference [3] uses a variational approach that simultaneously estimates optical flow and image intensity, as temporal contrast is generated due to a combination of both. Note that these methods are computationally intensive and have limited applications under low-power settings or real-time scenarios. In this regard, some works [17, 8] have advocated the use of spiking neural networks to process event-based data. Nonetheless, this area of research remains fairly unexplored and also requires dedicated hardware platforms.

Compared to the above works, our flow estimation approach neither depends on a temporal window (periodic/adaptive) nor is iterative [2, 21, 5]. Instead, we efficiently select the optimal neighbors of the incoming event and rely on a simple assumption that it lies on the local planar SAE for a single-shot flow estimation. Therefore, the proposed approach has potential for operating under low-power settings in a small application-specific IC, or for running in real time in software or in embedded systems.

The remainder of the paper is organized as follows. The basic description of SAE and optical flow using plane fitting is given in Section 3. Next, Section 4 presents our proposed approach SOFEA for DVS. Section 5 presents the experimental results and comparison to existing works. Finally, the paper is concluded in Section 6.

3. Surface of Active Events and Optical Flow

An event $e$ can be characterized by a 4-tuple $(x, y, t, p)$, where $(x, y)$ represents the spatial location or position of the event in pixel coordinates, $t$ represents the time of occurrence of the event and $p \in \{-1, +1\}$ represents the polarity of the event. On top of that, the event stream can be defined as a sequence of events $(e_n)_{n \in \mathbb{N}}$ such that $i < j \implies t_i \le t_j$, where $e_i = (x_i, y_i, t_i, p_i)$ and $e_j = (x_j, y_j, t_j, p_j)$.

Disregarding event polarity, each event carries information in the three-dimensional $xyt$ spacetime domain. Consequently, the sequence of most recent events over the entire spatial domain, which is a subset of the event stream $(e_n)_{n \in \mathbb{N}}$, can be simply described by a spatio-temporal surface $\Sigma_e$. The spatio-temporal surface $\Sigma_e$ is better known as the Surface of Active Events (SAE), as coined in [5]. The mathematical definition of the SAE is given by:

$$
\Sigma_e : \mathbb{R}^2 \to \mathbb{R}, \qquad \mathbf{p} \mapsto \Sigma_e(\mathbf{p}) = t \tag{1}
$$

where $\mathbf{p} = \begin{bmatrix} x & y \end{bmatrix}^\top$.
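As an illustration of Equation 1, the SAE can be held in memory as a per-pixel map from position to the timestamp of the most recent event. The sketch below is a minimal example under our own assumptions: the class name `SurfaceOfActiveEvents`, its attributes, and the use of minus infinity for pixels that have not yet fired are hypothetical choices for illustration, not part of the original method description.

```python
import numpy as np


class SurfaceOfActiveEvents:
    """Hypothetical container: one timestamp (and polarity) per pixel."""

    def __init__(self, width, height):
        # -inf marks pixels where no event has occurred yet (an assumption).
        self.t = np.full((height, width), -np.inf)
        self.pol = np.zeros((height, width), dtype=np.int8)

    def update(self, x, y, t, p):
        # Events arrive in non-decreasing time order, so overwriting keeps
        # the most recent event at each position.
        self.t[y, x] = t
        self.pol[y, x] = p

    def __call__(self, x, y):
        # Evaluate Sigma_e at position p = (x, y).
        return float(self.t[y, x])


sae = SurfaceOfActiveEvents(width=4, height=3)
sae.update(1, 2, 0.5, +1)
sae.update(1, 2, 0.9, -1)   # a later event at the same pixel replaces the first
print(sae(1, 2))
```

Storing a single value per pixel mirrors the fact that $\Sigma_e$ is a single-valued function of position.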
Figure 2: Spatio-temporal points on the Surface of Active Events (SAE) of a stripe translating in the negative x-axis direction. At a given time instant, the SAE can be used to generate the event stream image (grey plane), which is a binary representation of currently active pixels.

Note that $\Sigma_e$ is a single-valued function of the event position $\mathbf{p}$. In other words, each point on the spatio-temporal surface indicates the occurrence or activation of an event at the particular position and time instant, given by its $\begin{bmatrix} x & y & t \end{bmatrix}^\top$ coordinates. Thus, despite the asynchronous nature of the event stream, the exact position of all active or occurring events for any given time instant $t$ can be easily determined from the corresponding time slice of the SAE, which we denote as an Event Stream Image. Figure 2 provides an illustration of an SAE and how an Event Stream Image can be acquired from the corresponding SAE.

A simple expression for $\Sigma_e$ can be obtained by approximating the SAE as a plane in a local spatial neighborhood. In other words, the SAE is locally approximated by the first-order Taylor series expansion of $\Sigma_e$ as follows:

$$
\Sigma_e(\mathbf{p} + \Delta\mathbf{p}) \approx \Sigma_e(\mathbf{p}) + \nabla\Sigma_e(\mathbf{p})^\top \Delta\mathbf{p} \tag{2}
$$

With the approximation above, the local planar SAE can be sufficiently described by a set of plane parameters $\Pi = \begin{bmatrix} a & b & c & d \end{bmatrix}^\top$ as follows:

$$
\begin{bmatrix} \mathbf{p} \\ t \\ 1 \end{bmatrix}^\top \Pi = 0 \tag{3}
$$

From the equation above, a closed-form expression for $\Sigma_e$ can also be derived as such:

$$
\Sigma_e(\mathbf{p}) = \begin{bmatrix} -\frac{a}{c} & -\frac{b}{c} & -\frac{d}{c} \end{bmatrix} \begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix} \tag{4}
$$

An interesting property of the SAE is that its spatial gradient $\nabla\Sigma_e(\mathbf{p}) = \begin{bmatrix} \frac{\partial \Sigma_e(\mathbf{p})}{\partial x} & \frac{\partial \Sigma_e(\mathbf{p})}{\partial y} \end{bmatrix}^\top$ encodes information of optical flow associated with the active events. For a local planar SAE, $\nabla\Sigma_e(\mathbf{p})$ can be further defined as $\begin{bmatrix} -\frac{a}{c} & -\frac{b}{c} \end{bmatrix}^\top$. For instance, if the stripe in Figure 2 were moving at half the speed, the magnitude of $\frac{\partial \Sigma_e(\mathbf{p})}{\partial x}$ would be twice as large, while $\frac{\partial \Sigma_e(\mathbf{p})}{\partial y}$ would remain approximately zero. The mathematical relation of the normal flow with respect to $\nabla\Sigma_e(\mathbf{p})$ of a local planar SAE is given as such [29]:

$$
\mathbf{v}_\perp = \frac{1}{\lVert \nabla\Sigma_e(\mathbf{p}) \rVert} \nabla\hat{\Sigma}_e(\mathbf{p}) = -\frac{c}{a^2 + b^2} \begin{bmatrix} a \\ b \end{bmatrix} \tag{5}
$$

where $\nabla\hat{\Sigma}_e(\mathbf{p}) = \frac{\nabla\Sigma_e(\mathbf{p})}{\lVert \nabla\Sigma_e(\mathbf{p}) \rVert}$ is the unit vector of $\nabla\Sigma_e(\mathbf{p})$.

With the above analysis of $\nabla\Sigma_e(\mathbf{p})$, the local planar approximation of the SAE, as made in Equation 2, also implies an assumption of uniform normal flow across a local spatial neighborhood. This method of estimating the normal flow of events through the spatial gradient of the corresponding local planar SAE, as devised in [5], is widely known as the Local Plane Fit method [29].

4. Single-shot Optical Flow Estimation Algorithm (SOFEA) for DVS

In this section, we introduce the Single-shot Optical Flow Estimation Algorithm (SOFEA) for DVS in detail. The motivation behind the algorithm is to devise a non-iterative and robust optical flow estimation method for event cameras that enables efficient hardware realization. This allows for a broader range of downstream applications in a power-efficient manner. Figure 3 shows the framework of SOFEA, which is based on the Local Plane Fit method described in Section 3.

The algorithm is executed for each incoming event from the event stream, with the current event being processed denoted as the Event-of-Interest (EOI) $e_{EOI}$. First, the EOI passes through a refractory filter, which contributes to a better conditioned SAE for subsequent processing. Then, we greedily select past spatially neighboring events that are (believed to be) associated with the EOI so that a local SAE plane fitting can be performed in a fast and non-iterative manner, while maintaining robustness of the plane fit. A final noise rejection filter, based on the goodness-of-fit, is employed before the normal flow estimate for the EOI is released to downstream applications. In the following subsections, we address the details of each component of the framework given in Figure 3.

Figure 3: Framework of Single-shot Optical Flow Estimation Algorithm (SOFEA) for DVS.

4.1. Refractory Filter

The refractory filter employed in SOFEA is closely related to the refractory period following a neuron action potential [12]. As applied in [2, 25, 29], refractory filtering is performed by suppressing incoming events within a period of time after a past event has occurred at the same position. This period of time is known as the refractory period $T_{rf}$. Mathematically speaking, an event $e_i$ will be filtered out if there exists an event $e_j$ such that $x_i = x_j$, $y_i = y_j$, $t_i > t_j$ and $t_i < t_j + T_{rf}$.

The incorporation of a refractory filter contributes to a better conditioned SAE for plane fitting [29]. Apart from achieving noise reduction, the refractory filter also induces invariance of the SAE to the scene dynamic range, to some extent, as it suppresses the burst of events following the first that occurs at the same position due to a change in contrast. However, this limits the maximum event occurrence rate at a particular position, and hence the maximum speed of dynamics [2]. With that said, the optimal value of $T_{rf}$ depends on the scene dynamics and the DVS used, and may be appropriately set based on the intended application.

4.2. Greedy Selection of Associated Spatially Neighboring Events

To estimate the parameter $\Pi$ of the local planar SAE associated with the EOI, events in the $L \times L$ ($L$ is odd) spatial neighborhood are utilized to perform plane fitting. In contrast to [5], we only consider events that occurred in the past, including the EOI, for fitting the local planar SAE. This approach is similarly adopted in [2, 21] so that no additional latency is introduced and no limits (both lower and upper bound) are implicitly imposed onto the speed of dynamics. With that, it is not a trivial task to distinguish spatially neighboring events of the EOI that truly belong to the local planar SAE, as some of them belong to that of the past

1. Events in $\mathcal{N}^*$ have a common polarity with the EOI.
2. Events in $\mathcal{N}^*$ are located in the $L \times L$ spatial neighborhood about the EOI.
3. The EOI is not an event in $\mathcal{N}^*$.
4. Each event in $\mathcal{N}^*$ is an 8-neighbor of the EOI or of other events in $\mathcal{N}^*$.
5. The maximum timestamp difference between the EOI and events in $\mathcal{N}^*$ is minimized. In other words, the minimum timestamp of events in $\mathcal{N}^*$ is maximized, since the EOI has the largest timestamp.
6. If Criteria 1 to 5 yield non-unique solutions, $\mathcal{N}^*$ is taken as the one that gives the maximum sum of timestamps.

With the optimality criteria above, it can be proven by mathematical induction that the following greedy heuristic yields a locally optimal set of neighboring events $\mathcal{N}$ (proof provided in the supplementary material):

At each stage, select the common polarity event with the next largest timestamp in the $L \times L$ spatial neighborhood about the EOI, which is also an 8-neighbor of the EOI or any previously selected event.

Figure 4 provides a graphical illustration of the greedy selection algorithm. An efficient implementation of the procedure is also given in Algorithm 1, which is an adaptation of Prim's algorithm [24] for finding the minimum spanning
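Algorithm 1 Efficient Greedy Selection of the Locally Optimal Set of Neighboring Events $\mathcal{N}$ Associated to the EOI
$\mathcal{L} \leftarrow \{\, e \mid e$ is in the $L \times L$ spatial neighborhood about $e_{EOI}$ and $e \neq e_{EOI} \,\}$
$\mathcal{C} \leftarrow \{\, e \mid e$ is an 8-neighbor of $e_{EOI}$ with the same polarity $\}$
$\mathcal{N} \leftarrow \emptyset$
while $|\mathcal{N}| < N_{nb}$ and $\mathcal{C} \neq \emptyset$ do
    $e_{new} \leftarrow e \in \mathcal{C}$ with the largest timestamp
    $\mathcal{C} \leftarrow \mathcal{C} \cup \{\, e \mid e$ is an 8-neighbor of $e_{new}$ with the same polarity, $e \notin \mathcal{C} \cup \mathcal{N}$ and $e \in \mathcal{L} \,\} \setminus \{ e_{new} \}$
    $\mathcal{N} \leftarrow \mathcal{N} \cup \{ e_{new} \}$
end while
return $\mathcal{N}$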
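The following is a minimal sketch of how Algorithm 1 might be realized in software, assuming the SAE is stored as two 2-D arrays holding the timestamp and polarity of the most recent event at each pixel, with minus infinity marking pixels that have no event. The function name `greedy_select`, its arguments, and the use of a binary heap for the candidate set are our own illustrative assumptions and are not taken from the original implementation.

```python
import heapq

import numpy as np


def greedy_select(sae_t, sae_p, x, y, pol, L=5, n_nb=3):
    """Select up to n_nb past events around the EOI at (x, y) with polarity pol.

    sae_t / sae_p: hypothetical per-pixel timestamp and polarity arrays,
    indexed as [row (y), column (x)]; -inf timestamps mean "no event".
    """
    half = (L - 1) // 2
    height, width = sae_t.shape
    seen = {(x, y)}   # the EOI plus everything already in C or N
    heap = []         # candidate set C as a max-heap on timestamp (negated keys)

    def push_neighbors(cx, cy):
        # Add valid, unseen 8-neighbors of (cx, cy) to the candidate set.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = cx + dx, cy + dy
                if (nx, ny) in seen:
                    continue
                if not (0 <= nx < width and 0 <= ny < height):
                    continue
                if abs(nx - x) > half or abs(ny - y) > half:
                    continue  # outside the L x L neighborhood about the EOI
                if not np.isfinite(sae_t[ny, nx]) or sae_p[ny, nx] != pol:
                    continue  # no event here, or opposite polarity
                seen.add((nx, ny))
                heapq.heappush(heap, (-float(sae_t[ny, nx]), nx, ny))

    push_neighbors(x, y)
    selected = []
    while len(selected) < n_nb and heap:
        neg_t, nx, ny = heapq.heappop(heap)   # candidate with largest timestamp
        selected.append((nx, ny, -neg_t))
        push_neighbors(nx, ny)
    return selected
```

Keeping the candidates in a heap means each stage retrieves the most recent candidate without rescanning the whole neighborhood; a hardware realization would likely organize this differently.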
Figure 5: All 4 possible spatially collinear arrangements of events, visualized for a 5 × 5 pixel neighborhood. Each circle indicates a specific location in the local neighborhood, where the EOI is located on the central circle. The color of a circle (excluding white) reflects the particular collinear arrangement it is associated with. Thus, the set of events in $\mathcal{N}$ are collinear if and only if all of them are located on circles of the same color, other than white.

where $\Delta\mathbf{p} = \mathbf{p}_{EOI} - \mathbf{p}$ and $\Delta t = t_{EOI} - t$.

The above simple description can also be similarly derived from Equation 2 with $\mathbf{p} = \mathbf{p}_{EOI}$. The ordinary least squares estimate of $\nabla\Sigma_e(\mathbf{p})$, denoted as $\nabla\Sigma_e(\mathbf{p})_{LS}$, can be obtained as follows:

$$
\nabla\Sigma_e(\mathbf{p})_{LS} = \Delta P^{\dagger} \Delta \mathbf{t} \tag{10}
$$

where:

$$
\nabla\Sigma_e(\mathbf{p})_{LS} = (\Delta P^\top \Delta P)^{-1} \Delta P^\top \Delta \mathbf{t} = \frac{1}{\det(\Delta P^\top \Delta P)} \operatorname{adj}(\Delta P^\top \Delta P) \, \Delta P^\top \Delta \mathbf{t} \tag{11}
$$

where:

$$
\Delta P^\top \Delta P = \begin{bmatrix} \sum_{i=1}^{N_{nb}} \Delta x_i^2 & \sum_{i=1}^{N_{nb}} \Delta x_i \Delta y_i \\ \sum_{i=1}^{N_{nb}} \Delta y_i \Delta x_i & \sum_{i=1}^{N_{nb}} \Delta y_i^2 \end{bmatrix}
$$

$$
\Delta x = x_{EOI} - x, \qquad \Delta y = y_{EOI} - y
$$

Note that for an $L \times L$ ($L$ is odd) spatial neighborhood about the EOI, $\Delta x$ and $\Delta y$ lie within the range $[-L_{\frac{1}{2}}, L_{\frac{1}{2}}]$, where $L_{\frac{1}{2}} = \frac{1}{2}(L - 1)$. With the above constraint, each of the product terms $\Delta x^2$, $\Delta x \Delta y$ and $\Delta y^2$ in the entries of $\Delta P^\top \Delta P$ can be efficiently computed with the use of a Lookup Table (LUT) which stores all $\frac{1}{2}(L_{\frac{1}{2}} + 1)(L_{\frac{1}{2}} + 2)$ distinct pairwise products of integers in
[Figure panels: (a) SOFEA; (b) Mueggler et al. [21]. Table row: SOFEA, 42.30 ± 87.57, 16.73 ± 18.09]
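The closed-form least squares estimate of Equations 10 and 11, followed by the normal flow relation of Equation 5, can be sketched as below. This is an illustrative example only: the function names `gradient_ls` and `normal_flow`, the tuple layout `(x, y, t)`, and the choice to return `None` when $\Delta P^\top \Delta P$ is singular (for instance when the selected events are collinear with the EOI) are our own assumptions.

```python
import numpy as np


def gradient_ls(eoi, neighbors):
    """Least squares SAE gradient from an EOI and its selected neighbors.

    eoi: (x, y, t) of the Event-of-Interest.
    neighbors: iterable of (x, y, t) for the selected past events.
    """
    x0, y0, t0 = eoi
    dx = np.array([x0 - x for x, _, _ in neighbors], dtype=float)
    dy = np.array([y0 - y for _, y, _ in neighbors], dtype=float)
    dt = np.array([t0 - t for _, _, t in neighbors], dtype=float)

    # Entries of the 2 x 2 matrix dP^T dP and of the vector dP^T dt.
    sxx, sxy, syy = dx @ dx, dx @ dy, dy @ dy
    bx, by = dx @ dt, dy @ dt

    det = sxx * syy - sxy * sxy
    if det == 0:
        return None  # singular system: no unique plane through these points

    # adj([[sxx, sxy], [sxy, syy]]) = [[syy, -sxy], [-sxy, sxx]]
    gx = (syy * bx - sxy * by) / det
    gy = (sxx * by - sxy * bx) / det
    return np.array([gx, gy])


def normal_flow(grad):
    """Normal flow as the gradient divided by its squared norm (Equation 5)."""
    n2 = float(grad @ grad)
    if n2 == 0.0:
        return None  # flat surface: flow is undefined
    return grad / n2


# Events lying exactly on the plane t = 0.5 x + 0.25 y.
grad = gradient_ls((2, 2, 1.5), [(1, 2, 1.0), (2, 1, 1.25), (1, 1, 0.75)])
print(grad, normal_flow(grad))
```

Because the matrix is only 2 × 2, the inverse is written out through the adjugate and determinant rather than calling a general solver, which matches the spirit of a non-iterative, hardware-friendly computation.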