
Accepted Manuscript

Capturing significant events with neural networks

Harold Szu, Charles Hsu, Jeffrey Jenkins, Jefferson Willey, Joseph Landa

PII: S0893-6080(12)00024-X
DOI: 10.1016/j.neunet.2012.01.003
Reference: NN 2944

To appear in: Neural Networks

Received date: 29 August 2011


Revised date: 23 December 2011
Accepted date: 18 January 2012

Please cite this article as: Szu, H., Hsu, C., Jenkins, J., Willey, J., & Landa, J. Capturing
significant events with neural networks. Neural Networks (2012),
doi:10.1016/j.neunet.2012.01.003

This is a PDF file of an unedited manuscript that has been accepted for publication. As a
service to our customers we are providing this early version of the manuscript. The manuscript
will undergo copyediting, typesetting, and review of the resulting proof before it is published in
its final form. Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.

Capturing Significant Events with Neural Networks


Harold Szu*a, Charles Hsub, Jeffrey Jenkinsa, Jefferson Willeyc, and Joseph Landad
a US Army NVESD, Fort Belvoir, VA
b Trident Systems Inc., Fairfax, VA
c US Naval Research Lab, Washington, DC
d Briartek Inc., Alexandria, VA

ARTICLE INFO

Keywords: Compressive Sensing; Associative Memory; Human Visual System; Compressive Video Sampling

ABSTRACT

Smartphone video capture and transmission to the Web contribute to data pollution. In contrast, mammalian eyes sense everything but capture only significant events, allowing us to vividly recall their causes. Likewise, in our videos we wish to skip redundancies and keep only significant differences, as determined by real-time local median filters. We construct a Picture Index (PI) of ones (center-of-gravity changes) among zeros (no changes), called Motion Organized Sparseness (MOS). Only non-overlapping, time-ordered PI pairs are admitted into the outer-product Associative Memory (AM). Another outer product, between each PI and its image, builds a Hetero-AM (HAM) for fault-tolerant retrieval.

1. INTRODUCTION

We wish to describe a spatiotemporal information-sampling strategy for a small-field-of-view handheld Smartphone. We begin with a common-sense question about video frame rates: "How many views or frames does a monkey need in order to tell a good zookeeper from a bad one?" Monkeys select 3 distinctive views, which we refer to as m frames: frontal, side, and a 45° view [1]. Interestingly, humans need only m = 2 views when constructing a 3-D building from architectural blueprints, or for visualizing a human head. These kinds of questions, posed by Tomaso Poggio et al. [1], can be related to an important medical-imaging application. The Compressive Sensing (CS) strategy in medical imaging may spare the patient unwanted radiation exposure by taking a smaller number of m views of an even smaller number of exposed pixels, counted by the l0-norm. For a vector x of N components, ||x||_2 = (sum_i |x_i|^2)^(1/2) is the Least Mean Squares (LMS) distance; ||x||_1 = sum_i |x_i| is the Manhattan, or city-block, distance; and ||x||_0 = sum_i |x_i|^0 counts the non-zero elements (anything non-zero raised to the zeroth power is 1 mathematically). The k-degree of sparseness is the sensing degree of freedom satisfying k = ||x||_0 << N.
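As a concrete illustration of these three norms (a minimal sketch of our own, not from the original text), consider a sparse vector in NumPy:

```python
import numpy as np

# A sparse vector with N = 5 components and k = 2 non-zeros
x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])

l0 = np.count_nonzero(x)          # l0 "norm": counts the non-zero elements (k)
l1 = np.sum(np.abs(x))            # l1: Manhattan / city-block distance
l2 = np.sqrt(np.sum(x ** 2))      # l2: Euclidean / LMS distance
```

Here k = ||x||_0 = 2 << N = 5, illustrating the sparseness condition above.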
A million-dollar question remains in medical-imaging technology: how to block the imaging radiation so as to form m optimal sparse linear combinations. We do not know of a high-precision technology for controlling X-rays and gallium contrast agents (cf. Concluding Remarks). Instead, we address the Smartphone data-pollution challenge by exploiting Artificial Neural Network (ANN) and Associative Memory (AM) learning technology. We wish to capture m significantly meaningful frames to render a video "Cliff Notes," without the usual post-processing time and labor costs.
We follow mammalian vision, from a Darwinian viewpoint, paying attention to significant and abrupt changes for mating, food, and survival reasons. For example, when driving at night in the rain, we watch for pedestrians, not random raindrops. When the visual stimuli received by both eyes agree at a moment in time, it is certainly a signal; if they disagree, it could be noise and be rejected. Such selective image fusion is an effortless, unsupervised learning experience that separates signal from noise. Combining the power of two eyes with the brain's associative memory, we can also solve sophisticated image-source de-mixing problems (cf. Concluding Remarks) [3].
Smartphone Spotting App: Suppose a Smartphone took 3 pictures of the same person in variable poses and 2 pictures of another person, creating a private Facebook-style Web database [A]. After a phone call arranging to meet a friend in a football stadium, one may wish to turn on the face-spotting app. The phone camera can match any incoming picture y with the Smartphone database [A], which is mathematically equivalent to an over-determined inverse, fitting the highly redundant database [A] with a sparse representation x of the incoming picture (the identification with the known facial poses in [A] must be sparse to be potentially unique):

y = [A] x.   (1)
Single Frame App: Consider an image-acquisition rectangular matrix [A] of m rows and N columns, each row consisting of a few known ones (transparent ones for keeping pixels) among a dense sea of zeros (opaque zeros for rejecting pixels). Statistically speaking, the ones of each row will not overlap with the ones of the other rows; each row is sparse and thus orthogonal to the others in the statistical sense. Thus, the same linear algebra of Eq. (1) can be interpreted differently in the single-frame app. The column vector y has m measured summary values (similar to the monkey's m = 3 views and the human's m = 2 views) of the unknown image vector x of N pixels. Given the input measurement vector y and the known sparse orthogonal acquisition matrix [A], we must determine the unknown image x of N pixels. The question is what to do about the N − m missing conditions. In other words, finding x from y becomes an ill-posed inverse problem.
Solving the ill-posed inverse requires a performance measure, e.g. the LMS similarity ||[A]x − y||_2, together with a constraint at the minimum sparse city-block distance, min ||x||_1, using the l1-norm rather than the l0-norm for computational-tractability reasons. Without the constraint, the LMS is blind to all possible direction cosines within the hyper-sphere surface, yielding Penrose's pseudo-inverse: the right-multiplier [A]^T([A][A]^T)^(-1) or the left-multiplier ([A]^T[A])^(-1)[A]^T. Indeed, using the sparseness constraint, solving the l2-constrained l1-optimization becomes a linear-programming CS problem, as published by Emmanuel Candès of Caltech, Justin Romberg of Georgia Tech, and Fields medalist Terence Tao of UCLA [2, 6], as well as David Donoho of Stanford, who adopted the pre-processing of a wavelet sub-band codec before CS [7]. The Sparse Measurement Theorem states that the sampling operator [A] has the Restricted Isometry Property if, with its matrix representation having k ones randomly distributed among zeros, every k-sparse x is bounded within (1 − δ_k)||x||_2^2 ≤ ||[A]x||_2^2 ≤ (1 + δ_k)||x||_2^2 for some 0 < δ_k < 1.
Instead of seeking a coarse inverse solution located on the hyper-sphere LMS surface, CRTD (Candès, Romberg, Tao, and Donoho) sought a sharper solution at a corner of the hypercube inscribed within the hyper-sphere, by imposing the sparseness constraint as the l1-norm (replacing the true sparseness constraint, the l0-norm, whose combinatorial choices of k non-zeros among N components are intractable).
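The l1-constrained optimization can indeed be posed as a linear program. The following sketch is our own illustration (the problem sizes, variable names, and the SciPy solver are our choices, not the authors'): splitting x into non-negative parts u, v with x = u − v turns min ||x||_1 subject to [A]x = y into a standard LP:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, m, k = 40, 20, 3                       # ambient dim, measurements, sparsity
A = rng.standard_normal((m, N))           # random sampling matrix
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                            # under-determined measurements

# Linear program: min 1^T (u + v)  s.t.  A (u - v) = y,  u, v >= 0
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * N), method="highs")
x_hat = res.x[:N] - res.x[N:]             # recovered sparse solution
```

The LP returns a feasible solution whose l1-norm is no larger than that of any other solution of [A]x = y, which is exactly the corner-of-the-hypercube picture described above.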

Now we introduce our paper. There is more than one way to achieve sparseness: a purely random way or an organized way. We choose the latter, assigning information meaning to the locations of the ones. These ones are selected sparsely by the significant changes of the local Center of Gravity among neighborhood frames, and are zeros otherwise. Furthermore, this sparse pattern of ones among zeros is taken as the Picture Index ΔX_f representing the full-resolution image X_f in the Massive Distributive Parallel (MDP) Hetero-Associative Memory:

[HAM] = Σ_f X_f ΔX_f^T,   (2)

where the superscript T denotes the transpose of a column vector to a row vector. The sampling rate can be adaptively decided by the C.G. changes of the local scenery movement, from 30 Hz down to a few Hz or less, until the next Picture Index ΔX_(f+1) is found and satisfies the orthogonality condition

ΔX_(f+1)^T ΔX_f = 0,   (3)

whereby the information flow helps select and keep 2 frames from several bypassed frames. Thereby, we record this fact in the jump-over sequential-storage index Associative Memory

[AM] = Σ_f ΔX_(f+1) ΔX_f^T.   (4)

Then the input of a Picture Index to the index associative memory [AM] can reproduce the next new picture index, in a fault-tolerant fashion, in the originally stored time order:

ΔX_(f+1) = σ([AM] ΔX_f),   (5)

where the well-known McCulloch–Pitts neuronal sigmoid logic σ is the neuronal two-state normalization (firing or not) that has the Boltzmann canonical-ensemble form in terms of the Boltzmann constant k_B and the local brain temperature T in Kelvin:

σ(u) = 1 / (1 + exp(−u / k_B T)).

In the cooled-down local limit, k_B T → 0, the sigmoid logic reduces to von Neumann binary logic, σ(u) → {1, 0}, which could have more zeros than ones as the sparse representation. Moreover, a sequentially updated Hetero-AM storage is defined:

[HAM] = Σ_f X_f ΔX_f^T,   (6)

and [HAM] can be used to recover a high-resolution image at some equilibrium temperature T:

X_f = σ_T([HAM] ΔX_f).   (7)
This strategy emulates the Hippocampus AM storage at the center of the brain. The AM and sigmoid logic are familiar to the neural-network community, but their relationship to current Compressive Sensing has not been elucidated before. In this adaptive or learning aspect, our approach generalizes the statistically purely random sparse pseudo-orthogonality: our orthogonality is achieved deterministically by non-overlapping ones among zeros.
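A toy example of this deterministic orthogonality (our sketch, with hypothetical 8-pixel indices): two Picture Indices whose ones do not overlap have an exactly zero inner product, so an outer-product AM recalls one from the other without cross-talk:

```python
import numpy as np

# Two sparse binary Picture Indices whose ones do not overlap
p1 = np.array([1, 0, 0, 1, 0, 0, 0, 0])
p2 = np.array([0, 0, 1, 0, 0, 0, 1, 0])
assert p1 @ p2 == 0                     # deterministic (not statistical) orthogonality

AM = np.outer(p2, p1)                   # "write": the time-ordered pair p1 -> p2
recall = (AM @ p1 > 0).astype(int)      # "read": inner product plus threshold
```

The read returns p2 exactly, because [AM] p1 = p2 (p1 · p1) and the threshold normalizes away the positive scalar.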

In Sect. 2, we review the AM storage in terms of the sparse matrix algebra of outer-product "write" operations and inner-product "read" operations. The Picture Index (PI) is automatically produced by video-frame-generated Motion Organized Sparseness (MOS), which is crucial for achieving non-overlapping orthogonality, and therefore the fault tolerance (FT) and generalization. Our approach is intended as frame-selective, information-compressive sampling. We wish eventually to produce automated "Cliff Notes" (not shown), which would merge all distinctive frames into a single (large) frame story line to aid human analysts, who otherwise must manually sift through terabytes of data.

2. ASSOCIATIVE MEMORY

We will illustrate the mathematical consequence of sparse orthogonality, which enjoys both fault tolerance and generalization as two sides of the same AM coin. Before that, we wish to mention the biological basis of a sparse neuronal representation called the grandmother neuron(s). Todd and Marois in Nature 2004 [9] summarized the capacity limit of visual short-term memory in the human posterior parietal cortex, where a sparsely arranged neuronal population (sometimes even a single neuron or a few neurons) fires intensely for 1 second without disturbing others, supporting the independence strategy that yields the orthogonality attribute. The "grandmother neuron(s)" may be activated by other stimuli and memories, but is the sparse sole representation of "grandmother" for the individual. A mammalian strategy of paired sensors involves a short-term working memory. To substantiate the electric brain response as a differential response of visual event-related potentials, Pazo-Alvarez et al. in Neuroscience 2004 [5] reviewed various modalities of brain-imaging methodologies: fMRI (Ahlfors '99), MEG, VEPs (Kremlacek '01), ERPs, and PET (Cunningham '90; Watson '93). They confirmed the possibility of automatic comparison and selection of motion-direction changes. This is the biological basis of MOS.
The necessary and sufficient condition for MDP AM storage is orthogonality, as depicted in Fig. 1. Thus, we recapitulate the essential attributes, sparseness and orthogonality, as inspired by the brain's hippocampus storage mechanism. The hippocampus is a 2-D mesh street map, like Manhattan (Fig. 1c), and N traffic lights are N analog neurons sending Morse-coded firing rates as blinking lights at 30 Hz to 100 Hz. According to the Hebbian learning product rule, all N^2 street intersections can store the products of neuronal firing rates as impedance values at the neuronal synaptic-gap intersections. Even if a local neuron or traffic light breaks down, the distributed memory, which remains on the street map, can be re-directed and retrieved.
Given facial images, three possible significant or salient features, such as the eyes, nose, and mouth, can be extracted in the rounding-off cool limit: the maximum firing rate of 100 Hz rounds to one, and lower firing rates to zero, i.e. (1, 0) <-> (big, small). When these neuronal firing rates broadcast among themselves, they form the Hippocampus Associative Memory [AM] at the synaptic gap junctions, denoted by the weight matrix {W_ij}. When a small child is first introduced to their Aunt and Uncle, the image of the Uncle gets compared with that of the Aunt. The five-sense fusion is facilitated beneath the cortical layer through the Claustrum [10]. The child could distinguish the Uncle by multi-sensing, noticing that he has a normal-sized mouth (0), a bigger nose (1) by comparison to the Aunt's, and normal-sized eyes (0). These features can be expressed as firing rates f_old = (n1, n2, n3) = (eye, nose, mouth) = (0, 1, 0), which turns out to be a coordinate axis of the family feature space. Likewise, the perception of the Aunt having big (1) eyes, a smaller (0) nose, and a smaller (0) mouth, (1, 0, 0), forms another coordinate axis. Mathematically, with k/N = 0.3, this selection of sparse saliency features satisfies the orthogonality criterion; this ANN sparse classifier satisfies the nearest-neighbor classifier principle, as well as Fisher's mini-max classifier criterion of minimum intra-class spread and maximum inter-class separation [11, 12].
When the Uncle smiles at the child, the child generates a new input feature vector f_new = (n1, n2, n3) = (eye, nose, mouth) = (0, 1, 1) through the same neural pathway; the responses then arrive at the hippocampus, where the AM system effortlessly recognizes or corrects the new input back to the most likely relative, namely the big-nose Uncle state (0, 1, 0), within the fault tolerance of direction cosine cos(45°). We write "data" to the AM by an outer-product operation between the Uncle's feature vector in its column and row forms, over-written with that of the Aunt in the same 2-D storage without cross-talk confusion. This is MDP, happening among hundreds of thousands of neurons in a local unit. If the Uncle smiles, he is read by the child as a new input. The AM matrix–vector inner product takes the three feature neurons (0, 1, 1), which send their 100 Hz firing rates through the AM architecture of Fig. 1c, and the output (0, 1, 0), obtained after applying a sigmoid threshold to each neuron, confirms that he remains the big-nose Uncle.
[Fig. 1 appears here: panels a and b show the 3-D feature space with axes x (eye), y (nose), z (mouth), the stored feature vectors (0,1,0) and (1,0,0), and the new input (0,1,1); panel c shows the 3×3 synaptic weight matrix {w11 ... w33} connecting neurons n1 (eye), n2 (nose), and n3 (mouth).]
Fig. 1a: Feature Organized Sparseness (FOS) may serve as the fault-tolerance attribute of a distributive associative memory matrix. When a child meets his Aunt and Uncle for the first time, the child pays attention to extract three feature neurons, which fire at 100 Hz or less, represented by 1 or 0 respectively. Fig. 1b: Even if the Uncle smiled at the child (indicated by (0,1,1) in the first quadrant) at the next point in time, the child can still recognize him by the vector inner-product read procedure of the [AM] matrix and the new input vector (0, 1, 1). A smiling uncle is still the uncle, as it should be. Mathematically speaking, the brain's hippocampal storage is equivalent to the vector outer product of the feature vector (0, 1, 0) with itself in the associative memory [AM] matrix. Fig. 1c: This broadcasting communication circuitry network is called the artificial neural network (ANN), indicating adjustable or learnable weight values {W_ij; i, j = 1, 2, 3} of the neuronal synaptic gaps among 3 neurons, indicated by 9 adjustable resistances, where both Uncle and Aunt feature memories are additively stored concurrently.

Write by the vector outer product, repeatedly over-written onto the identical storage space, forming the associative matrix memory [AM]; orthogonal features are necessary for soft failure, indicated here in a 3-dimensional feature subspace of N-D. Read by the vector inner product, recalling from the sparse memory template, and then use the nearest neighbor to correct the input data via the vector inner product:

[AM] f_new = [1 0 0; 0 1 0; 0 0 0] (0, 1, 1)^T = (0, 1, 0)^T.

The fault-tolerant AM erases the one-bit error (the bottom bit), recovering the original state, which is equivalent to a semantic generalization: a big-nosed smiling uncle is still the same big-nosed uncle. Thus, for storage, orthogonality can produce either fault tolerance or generalization as two sides of the same coin, according to the orthogonal or independent feature vectors. In other words, despite his smile, the AM self-corrected the soft failure of about one bit of the k degrees of sparseness, or generalized the original uncle feature set, depending on feedback cognitive supervision. We have 10 billion neurons and 100 billion synapses, used or lost; with some help from replenishment and regeneration, the synapses could last over 125 years. Perhaps another reason for us, having creative free will, is not to saturate our brain-memory degrees of freedom, keeping them at a sparse 10–15% level.
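The Uncle/Aunt write-and-read arithmetic above can be reproduced in a few lines (our sketch of the computation, not the authors' code):

```python
import numpy as np

uncle = np.array([0, 1, 0])   # (eye, nose, mouth): big nose
aunt  = np.array([1, 0, 0])   # big eyes

# "Write": over-write both outer products onto the same 2-D storage
AM = np.outer(uncle, uncle) + np.outer(aunt, aunt)

# "Read": the smiling uncle (0,1,1) carries a one-bit soft failure on the mouth bit
smiling_uncle = np.array([0, 1, 1])
recall = (AM @ smiling_uncle >= 1).astype(int)   # sigmoid in the cool-down limit
```

The recall returns (0, 1, 0): the smiling uncle is corrected back to the stored big-nose Uncle, because the mouth row of [AM] is all zeros.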
What is the biological mechanism that regulates sparse traffic? In the retina, ganglion cells regulate the flow of information between the photoreceptor cells (rods, cones, etc.) and the brain. Photoreceptors always send a signal when activated, while ganglion cells send a signal to the brain only if their membrane potential passes an internal threshold. In a sense, the ganglion cells act as gatekeepers for the brain, passing along significant spatiotemporal information. The Picture Index is our mathematical embodiment of a gatekeeper procedure for information-selective compressive sampling, defined as follows. Since ΔX_f is a gatekeeper to the stored [AM], Eq. (3) implies a variable video sampling rate, no longer fixed at the oversampling rate of 30 Hz but reduced to a longer local sampling interval, applied whenever a newly sampled frame change is orthogonal to the old one. Otherwise, the associated Associative Memory [AM] of Eq. (4) will suffer from cross-talk between Picture Indices and will not be fault tolerant. A change ΔX_f is significant if it satisfies the orthogonality condition Eq. (3), and it then becomes admissible for the overlaid AM storage of Eq. (4). Conversely, an analyst may wish to find a specific sparse Picture Index with an image input to the matrix [HAM]; he or she can retrieve the associated Picture Index, and a nearest-neighbor classifier can instantaneously find the next Picture Index from [AM] and the next full-resolution image from [HAM], without the search time, as defined in Eq. (4).

3. SIGNIFICANT EVENT STORAGE AND RETRIEVAL

We define a significant event by the example of a tiger jumping out of bushes. The processing window may have a variable resolution with learnable window sizes (64×64, 32×32, 16×16, etc.), in order to determine the optimum Local Center of Gravity (LCG) movement. This may be estimated by windowed median filters (not mean filters), used to select the majority gray-value delegate-pixel locations and image weight values, according to the local gray-value histogram. Then we draw the optical-flow vector from one local delegate-pixel location to the next, pointing from one to the next. The length of the vector is proportional to the change of gray values, with the delegate weights. Similarly, we repeat this median filter over all windows of two frames. We can sequentially update these optical-flow vectors over multiple frames, testing whether the net summation, plotted tail-to-head, covers a significant movement relative to the window size, say half of it. The net is then thresholded to the value "one," representing the whole window population, to build a Picture Index (PI) (indicating that a tiger might be jumping out, with significant LCG movement); otherwise, the net LCG is thresholded to zero (as when wind blows tree branches or bushes in a cyclic motion without net LCG motion). We could choose the largest LCG jump among f frames, or simply keep the last frame of imagery representing all these f frames (to save working cache memory).
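One plausible implementation of the windowed-median change detector is sketched below (our own assumptions: the window size, the threshold fraction, and the helper name `picture_index` are illustrative, not from the paper):

```python
import numpy as np
from scipy.ndimage import median_filter

def picture_index(frame_prev, frame_next, win=16, frac=0.5):
    """Binary Picture Index: 1 where the windowed-median change is significant,
    0 where the scene is static or fluctuates without a net LCG shift."""
    d = np.abs(median_filter(frame_next, size=win)
               - median_filter(frame_prev, size=win))
    if d.max() == 0:                          # no change anywhere
        return np.zeros_like(d, dtype=np.uint8)
    return (d > frac * d.max()).astype(np.uint8)
```

The median (rather than mean) filter suppresses isolated raindrop-like noise pixels, so only a coherent regional change such as the jumping tiger produces ones.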

Fig. 2a: Video images of a tiger jumping out of the bushes. Fig. 2b: Optical-flow vectors over f = 5 frames reveal LCG movement in the tiger region versus cyclic fluctuation of the wind-blown bushes. Fig. 2c: The associated sparse Picture Index representation.

The pseudo-code is described as follows.

1. Read images (frame by frame: X1, X2, ..., Xf, ...) from an image sequence.
2. Find the Picture Indices (dX1, dX2, ..., dXf, ...) by thresholding the frame differences Xf+1 − Xf.
3. Check the orthogonality condition, Eq. (3), between the picture indices dXf+1 and dXf.
4. Using outer products, form AM = [dXf+1 dXf^T] and HAM = [Xf dXf^T].
5. Given a picture index dXf, retrieve the next dXf+1 by the inner product [AM] dXf.
6. Given dXf, recover the image Xf by the inner product [HAM] dXf, etc.
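The six steps can be sketched end-to-end as follows (our illustrative Python with toy 4×4 frames; the function name `mos_store` and the fixed threshold are assumptions, not the authors' code):

```python
import numpy as np

sigmoid = lambda u: (u > 0).astype(float)   # cool-down-limit binary logic

def mos_store(frames, thresh=0.5):
    """Steps 1-4: difference frames into Picture Indices, admit only
    mutually orthogonal PIs, and write the [AM] and [HAM] outer products."""
    N = frames[0].size
    AM, HAM = np.zeros((N, N)), np.zeros((N, N))
    kept_pi, kept_img = [], []
    for f_prev, f_next in zip(frames, frames[1:]):
        pi = (np.abs(f_next - f_prev).ravel() > thresh).astype(float)  # step 2
        if kept_pi and any(pi @ q != 0 for q in kept_pi):              # step 3
            continue                       # overlapping change: skip the frame
        if kept_pi:
            AM += np.outer(pi, kept_pi[-1])      # step 4: AM = [dXf+1 dXf^T]
        HAM += np.outer(f_next.ravel(), pi)      # step 4: HAM = [Xf dXf^T]
        kept_pi.append(pi)
        kept_img.append(f_next)
    return AM, HAM, kept_pi, kept_img

# Toy sequence: two disjoint single-pixel changes
f0 = np.zeros((4, 4))
f1 = f0.copy(); f1[0, 0] = 1.0
f2 = f1.copy(); f2[3, 3] = 1.0
AM, HAM, pis, imgs = mos_store([f0, f1, f2])

next_pi = sigmoid(AM @ pis[0])                   # step 5: index recall
recovered = (HAM @ next_pi) / next_pi.sum()      # step 6: image recall
```

Because the two Picture Indices are orthogonal, `next_pi` equals the second Picture Index exactly, and `recovered` reshapes to the frame associated with it, without any search.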

In the following figures, we consider a high-speed example, (i) spotting the rotational direction of the laces during a baseball pitch, and a slow-speed example, (ii) catching an orchid's punctuated blossoming at midnight.

Fig. 3: How do we hit a curveball? A curveball takes about half a second to travel 60 ft 6 in at 80–100 mph (113~140 ft/sec) from the pitcher's mound to home plate. Good batters learn cues for a quick reaction by (i) catching 4 types of seam rotation (fastball 90 mph: topspin; curveball 75 mph: diagonal spin; slider 85 mph: diagonal spin; knuckleball 65 mph: no spin), evident from the ball's red seams at arm's length away, where the visual Contrast Sensitivity Function is optimal; (ii) taking the initial batting condition into account (e.g., a pitcher will likely throw a slower pitch with 2 strikes against the batter); (iii) watching body, shoulder, or hand-hold clues from the pitcher, and the catcher's final body movement. When the pitcher's fingers spin the top side of a fastball forward and down during the release of the ball, the spin increases the top side's velocity, thus Vtop > Vbottom. Since air drag is proportional to the square of velocity, Ftop ~ Vtop^2 (the first power comes from the air mass in the collision cylinder and the second power from the air-mass momentum), the drag produces a top-heavy distribution function known as the Magnus effect in sports [16]. A topside-spin-forward ball generates a larger drag force on the top side, pushing the ball down toward home plate. The other physics effects are that air density is lower the farther the ball flies, and the less humidity, the less drag [16].

Fig. 4: Top: two "writes" to the HAM via an outer product of feature changes following a sigmoid threshold operation. Two HAM writes and reads are displayed on the top and bottom, respectively. Outer products of the feature changes (in binary mode after a threshold operation) and the object image are continuously updated in the [HAM] ("over-write" mode), shown at the top left. The top right shows the update of the HAM with two different scenes (a ball with topspin in this example). Bottom: recalling an image from the HAM via an inner product with an arbitrary change vector (darker grayscale indicates zeros); a sigmoid transfer function is used to recall the closest image, shown at the bottom left. The bottom right depicts another test result using a different change vector.

Fig. 5: This HVS strategy ensures, for a sequence of a blossoming orchid (1st row), that the updated change has a similar degree of sparseness and marks exactly where the change happened (2nd row). In fact, a full sequence of sharp images (5th row) can be sequentially recalled by priming the [AM] sub-unit of Eq. (4) (3rd row) and the [HAM] recall of Eq. (2) (4th row).

4. ICA Linear Algebra versus CS Linear Algebra

In this Letter, we have relaxed the "purely random" condition of the sparse sampling matrix to "motion-organized" sparseness (MOS). Consequently, we can recover the original image in two steps, from a single [AM] storage of sampling-time-ordered picture indices and their hetero-associated image storage [HAM]. In the case of a multispectral color camera without explicit object motion, we shall treat the time index like the spectral index; the MOS then becomes Feature Organized Sparseness (FOS), e.g. day–night 4-spectral imaging in our compressive sampling strategy. The Rosetta stone may be deciphered by replacing the MOS and the known [A] with an unknown ICA de-mixing for blind source/feature extraction. We thereby generalize the purely random pseudo-orthogonality sparseness condition to the orthogonal sparsely-extracted-feature condition. In other words, we shall determine, inversely, the sampling mask [Φ] by solving the ICA Blind Source Separation (BSS) problem for both the feature sources s and the mixing matrix [A]. Thus, we compare the CS linear algebra with the ICA algebra side by side as follows:

CS: y = [Φ][Ψ] s;   (8)
ICA: X = [A] s.   (9)
Here the Hubel–Wiesel wavelet is modeled by the digital sub-band wavelet bases [Ψ] successfully applied to JPEG 2000 video compression [14, 15]; s is a column vector of N components, of which only k wavelet coefficients are non-zero. A smaller l1-norm value can rule out, for example, superfluous homogeneous uniform solutions, which might reduce the contrast of a true image. The CS community adopted, in the Fourier spatial-frequency domain, the wavelet bases as column vectors, forming a long rectangular matrix [Ψ], and a wide rectangular sampling matrix [Φ], which is filled with randomly scattered ones among zeros. Their product turns out to be the unknown mixing matrix in ICA:

[A] = [Φ][Ψ].   (11)

(1) Symmetric Wiener whitening with the ensemble-average matrix <X X^T>: by definition, [W] = <X X^T>^(-1/2), satisfying <([W]X)([W]X)^T> = [W] <X X^T> [W]^T = [I]. Q.E.D.

(2) Orthogonal transform: by definition, for unit-variance independent sources, [I] = <([W]X)([W]X)^T> = ([W][A]) <s s^T> ([W][A])^T = ([W][A])([W][A])^T, so the whitened mixing matrix [W][A] is orthogonal. Q.E.D.

Step (2) reduces the ICA de-mixing to an orthogonal rotation. It provides a simple geometrical solution as the killing vector, orthogonal to all the row vectors except one. This rotation procedure generates a correspondingly independent source along that specific direction [11, 12].
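Step (1) can be verified numerically. The following sketch uses a hypothetical 2×2 mixing matrix and Laplacian (i.e. sparse) sources of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 5000))            # sparse independent sources
A = np.array([[2.0, 1.0], [1.0, 3.0]])     # "unknown" mixing matrix
X = A @ S                                  # observed mixtures

# Step (1): symmetric Wiener whitening, W = <XX^T>^(-1/2)
C = (X @ X.T) / X.shape[1]                 # ensemble-average matrix <XX^T>
d, E = np.linalg.eigh(C)
W = E @ np.diag(d ** -0.5) @ E.T           # symmetric inverse square root
Z = W @ X                                  # whitened data: <ZZ^T> = I
```

After whitening, the remaining de-mixing of Z is an orthogonal rotation, exactly as Step (2) states.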
This ICA algebra might help design a new FOS sampling matrix [Φ] deterministically, as follows. We need to apply unsupervised learning to determine the source s blindly, without knowing the mixing matrix [A], which is the inverse of the feature-extraction matrix [W]. We apply the aforementioned Step (1), Wiener whitening in the image domain, and Step (2), orthogonal matching pursuit, to derive the source algebraically. The ICA procedure provides us a rank-1 outer-product estimation of how to find a needle-in-the-haystack feature trap, and so can increase the efficiency of a multi/hyper-spectral compressive-sensing methodology. Given the linear measurement data, we find the independent sources; given all the independent sources, we construct orthogonal ones (by the Gram–Schmidt procedure); given the linear measurement data, we prefer the orthogonal feature extraction. We verified, by the rank-1 outer-product approximation, that the chain relationship is self-consistent.

5. Concluding Remarks: Physics and Mathematics Challenges



The CRTD CS methodology was originally developed for medical imaging. A digital CT scan, e.g. angiography, may be operated at 40% lower absorbed X-ray radiation than earlier models (500 milli-rem, "Roentgen equivalent man" after Wilhelm Roentgen; in SI units, 5 milli-sievert after Rolf Sievert, where 1 Sv = 1 Joule/kg). For the X-ray skull target, CRTD further reduced the amount of radiation exposure in computer simulations by another factor of ten, say from 300 slices of X-ray projections to 30 slices 1 cm apart, and could still reproduce an image of identical size and resolution. Such feasibility of less radiation with fewer measurements holds promise for the medical community because it allows more patients to be examined at lower risk and less cost (i.e., wear and tear on equipment and technicians). Besides the sparse imaging content, the reason that CRTD can sample much less than the Nyquist critical sampling rate of 2 samples per Fourier mode may be that bio-tissue coupling tends to cause an over-estimate of the independent number of Fourier modes. The adequate number of views m is proportional to the net degree of freedom k. CRTD introduced a mask [Φ], a long rectangular matrix having k ones (1 means keep a pixel) distributed in a purely random fashion at arbitrary locations among numerous zeros. An original image should be filtered through this random sparse sampling mask. Unfortunately, because of the short penetration of the radiation, the physics challenge of blocking X-rays (or probing RF fields with contrast agent everywhere) is easier said than done, and has so far escaped the community, with the exception of optics, demonstrated by Baraniuk of Rice Univ. The mathematical challenge is that, given under-measurements y of the image, finding the original larger set of image pixels x is an ill-posed inverse problem without sufficient a-priori information. A trivial example of the ambiguity is that any homogeneous solution x0, satisfying [A] x0 = 0, could be added to an inhomogeneous solution x and still satisfy the original set of measurements; but this added homogeneous solution contributes a larger total magnitude and is therefore ruled out by the minimum l1-norm constraint. This post-processing linear programming is relatively slow but has inspired over 300 research and development papers worldwide.

IN SUMMARY

For a Smartphone camera with a small field of view, we do not use the long rectangular sampling matrix of purely random sparse ones among zeros, in order to avoid the delay of image recovery. We use motion-organized sparseness (MOS) and keep the consequence of independence. We took advantage of the LCG-organized changes, as ones among zeros, for useful information-retrieval indexing. We made sure that the frame sampling time is delayed enough to allow sufficient motion displacement to generate non-overlapping ones among zeros. This sparse change template then becomes a natural picture index, which generalizes the Kodak image-chip concept originally developed for digital restoration of analog film. We add the change to the image chip, producing the picture index ΔX_f. Then we adopt the fault-tolerant associative memory [AM], computed by the outer product between two time-ordered indices. We also form another outer product between each ΔX_f and the corresponding original image X_f, called the Hetero-Associative Memory [HAM]. These pairs are how we can instantaneously recall any image, without suffering cross-talk or delay from slow search and recovery. In short, we do not delete pixels; we delete entire frames that are redundant or insignificant, keeping an entirely new frame only if its change picture index is orthogonal to all earlier ones: if ΔX_(f+1)^T ΔX_f = 0, then ΔX_(f+1) is admissible.
The biological HVS has a dense population of about 140 million rod cells for night vision (grayscale) and 6.5 million daylight color cones (a mixture of red-, green-, and blue-sensing cells) per eye, arranged for multi-spectral CS in a seemingly random mosaic pattern at the retina. We have explored only the motion-organized sparseness (MOS) of our eyes. The full potential of organized sparseness has not yet been exploited for the multispectral camera that a 4th-generation Smartphone ought to have with this CSp app. One could build a full EOIR-spectrum fovea camera by applying generalized Bayer filters to spectrally blind Photon Detectors (PD), emulating cones and rods at each pixel. We note in passing that current camera technology already employs a deterministic sparseness principle: (i) the Bayer color filter mosaic (four detector cells per color pixel, covering RGB color and gray, trading spatial resolution for spectral resolution) lets a spectrally blind photon-detector array of N cells measure N/4 color pixels; (ii) in a Vanadium Oxide (VOx) night-vision Focal Plane Array, the N = m pin-outs at the Charge-Coupled Device (CCD) readout may be represented by the image acquisition matrix [A] of Eq. (1), which performs full row-sums, column-sums, and diagonal-sums. Since storage technology has become inexpensive, or "silicon dirt cheap," we can afford the seemingly wasteful 2-D distributive AM storage of 1-D independent vectors, and we gain much in return. We wish to design the camera with over-written 2-D storage in an MDP fashion, in terms of a sparsely orthogonal Picture Index (PI), following the fault-tolerant Associative Memory (AM) principle. We can thereby avoid the cross-talk confusion and the unnecessary random-access-memory (RAM) search delay of the classical 1-D sequential optical CD storage concept: one pigeon per hole.
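The sum-readout acquisition matrix described in (ii) can be sketched directly: each row of [A] is a 0/1 mask that sums one image row, one column, or one diagonal. The function name and the 4×4 toy frame below are illustrative assumptions, not the VOx hardware layout:

```python
import numpy as np

def sum_acquisition_matrix(n):
    """Rows of [A] are 0/1 masks whose inner product with a flattened
    n x n frame yields the full row-sums, column-sums, and the two main
    diagonal sums: 2n + 2 measurements instead of n^2 raw pixels."""
    rows = []
    for i in range(n):                          # row-sums
        m = np.zeros((n, n)); m[i, :] = 1.0; rows.append(m.ravel())
    for j in range(n):                          # column-sums
        m = np.zeros((n, n)); m[:, j] = 1.0; rows.append(m.ravel())
    rows.append(np.eye(n).ravel())              # main diagonal sum
    rows.append(np.fliplr(np.eye(n)).ravel())   # anti-diagonal sum
    return np.stack(rows)

A = sum_acquisition_matrix(4)
x = np.arange(16.0)   # a toy 4x4 frame, flattened
y = A @ x             # 10 sum measurements instead of 16 pixels
```

The measurement count grows linearly in n rather than quadratically, which is the deterministic-sparseness trade the text points to.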
The design of the FOS or MOS sparse sampling matrix may be possible by exploiting unsupervised-learning ICA ANNs. There are two ways to factorize the joint pdf: a posteriori or a priori. The a posteriori approach is based on the engineering filter concept: design a filter so that nothing meaningful comes out except noise, characterized by the maximum a posteriori entropy of the de-mixed neural-net weighted output, developed by Tony Bell, Terrence Sejnowski, Shun-ichi Amari and Erkki Oja (BSAO) circa 1997 [11]. Independently, the a priori approach is based on physics statistical mechanics, by Szu et al., maximizing the a priori unknown feature-source entropy S and minimizing the unknown mixing-matrix [A] error energy E at an isothermal equilibrium. The Helmholtz free energy is the total energy E minus the non-workable source entropy term, E - T0 S, at the human equilibrium brain temperature T0 = 37 °C. This mathematics is known as Lagrange Constrained Neural Networks [3], which was demonstrated with Matlab code for: (i) 7-band Landsat spectral images over a Mediterranean desert city, (ii) dual infrared-spectral breast cancer images [13], and (iii) a point-nonlinear space-variant mixture image [12]. Applying these blind-source-separation methodologies to inversely design a sparse sampling matrix might open up a new camera-design challenge for the ANN community.
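As a toy illustration of the a posteriori (BSAO) route, the following sketch applies a natural-gradient infomax update with a tanh score, appropriate for super-Gaussian (here Laplacian) sources, to a synthetic 2×2 mixture; the learning rate, batch size, and source model are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent super-Gaussian (Laplacian) sources, linearly mixed.
S = rng.laplace(size=(2, 5000))
Amix = np.array([[1.0, 0.6],
                 [0.4, 1.0]])
X = Amix @ S

# Natural-gradient infomax / quasi-maximum-likelihood update,
#   dW = lr * (I - tanh(u) u^T / batch) W,
# whose equilibrium E[tanh(u) u^T] = I separates super-Gaussian sources.
W = np.eye(2)
lr = 0.01
for epoch in range(200):
    for t in range(0, X.shape[1], 100):
        u = W @ X[:, t:t+100]
        W += lr * (np.eye(2) - np.tanh(u) @ u.T / u.shape[1]) @ W

# If separation succeeds, P = W @ Amix approaches a scaled permutation.
P = W @ Amix
```

Each row of P should end up dominated by a single entry, i.e. each output channel recovers one source up to scale and ordering.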
REFERENCES
[1]. Martin A. Giese and Tomaso Poggio, “Neural mechanisms for the recognition of biological movements,”
Nature Reviews Neuroscience 4, 179-192 (March 2003).
[2]. E. J. Candes, J. Romberg, and T. Tao, "Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information," IEEE Trans. IT 52(2), pp. 489-509, Feb. 2006.
[3]. H. Szu, I. Kopriva, "Comparison of LCNN with Traditional ICA Methods," WCCI/IJCNN 2002; H. Szu, P. Chanyagorn, I. Kopriva, "Sparse coding blind source separation through Powerline," Neurocomputing 48(1-4), pp. 1015-1020, 2002; H. Szu, "Thermodynamics Energy for both Supervised and Unsupervised Learning Neural Nets at a Constant Temperature," Int. J. Neural Syst. 9(3), pp. 175-186, 1999; H. Szu and C. Hsu, "Landsat spectral unmixing à la superresolution of blind matrix inversion by constraint MaxEnt neural nets," Proc. SPIE 3078, pp. 147-160, 1997; H. H. Szu and C. Hsu, "Blind de-mixing with unknown sources," Proc. Int. Conf. Neural Networks, vol. 4, pp. 2518-2523, Houston, June 1997.
[4]. D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," J. Physiol. 160, pp. 106-154, 1962.
[5]. P. Pazo-Alvarez, E. Amenedo, and F. Cadaveira, "Automatic detection of motion direction changes in the human brain," Eur. J. Neurosci. 19, pp. 1978-1986, 2004.
[6]. E. J. Candes and T. Tao, "Near-Optimal Signal Recovery from Random Projections: Universal Encoding Strategies," IEEE Trans. IT 52(12), pp. 5406-5425, 2006.
[7]. D. Donoho, "Compressed Sensing," IEEE Trans. IT 52(4), pp. 1289-1306, 2006.
[8]. E. H. Cohen, E. Barenholtz, M. Singh, and J. Feldman, "What change detection tells us about the visual representation of shape," J. Vis. 5, pp. 313-321, 2005.
[9]. J. J. Todd and R. Marois, "Capacity limit of visual short-term memory in human posterior parietal cortex," Nature 428, pp. 751-754, 15 Apr. 2004.
[10]. F. C. Crick and C. Koch, "What is the function of the claustrum?" Phil. Trans. R. Soc. B 360, pp. 1271-1279, 2005.
[11]. A. J. Bell and T. J. Sejnowski, "A new learning algorithm for blind signal separation," Adv. in Neural Inf. Proc. Sys. 8, MIT Press, pp. 757-763, 1996; A. Hyvärinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Comp. 9, pp. 1483-1492, July 1997; S. Amari, "Information Geometry," in Geometry and Nature, Contemporary Math. (eds. Nencka and Bourguignon), vol. 203, pp. 81-95, 1997.
[12]. H. H. Szu and I. Kopriva, "Unsupervised learning with stochastic gradient," Neurocomputing 68, pp. 130-160, Oct. 2005.
[13]. H. Szu, L. Miao, and H. Qi, "Unsupervised learning with mini free energy," Proc. SPIE ICA, Wavelets, Neural Networks, vol. 6576, doi:10.1117/12.725198, 2007; L. Miao, H. Qi, and H. Szu, "A Maximum Entropy Approach to Unsupervised Mixed-Pixel Decomposition," IEEE Trans. Image Proc. 16(4), pp. 1008-1021, 2007.
[14]. C. Hsu and H. Szu, "WaveNet processing of live video via radio," Journal of Electronic Imaging 7, pp. 755-768, 1998.
[15]. H. Szu, C. Hsu, Thaker, and M. Zaghloul, "Image wavelet transform implemented with discrete wavelet chip," Optical Engineering, July 1994.
[16]. R. K. Adair, "The Physics of Baseball" (Perennial, New York, 3rd ed., 2002); L. Bloomfield, "The Physics of Everyday Life."