A Real-Time Motion-Feature-Extraction VLSI Employing Digital-Pixel-Sensor-Based Parallel Architecture
Gandhiji Institute Of Science & Technology, Jaggayapeta, A.P, INDIA
Author: B. Saieesh, GIST
Guide: K. Jaya Swaroop, Asst. Prof. (GIST)

ABSTRACT: A very-large-scale integration (VLSI) chip capable of extracting motion features from moving images in real time has been developed, employing row-parallel and pixel-parallel architectures based on digital pixel sensor (DPS) technology. Directional edge filtering of input images is carried out in row-parallel processing to minimize the chip real estate. To achieve a real-time response of the system, a fully pixel-parallel architecture has been explored in adaptive binarization of filtered images for essential feature extraction, as well as in their temporal integration and derivative operations. As a result, self-speed-adaptive motion feature extraction has been established. The chip was designed and fabricated in a 65-nm CMOS technology and used to build an object detection system. Motion-sensitive target image localization was demonstrated as an illustrative example.

Index Terms—Block-readout scheme, digital pixel sensor (DPS), motion feature extraction (MFE), parallel architecture, vision chip.

I. INTRODUCTION
Real-time motion recognition is becoming increasingly important in various applications, such as automotive vehicle control, efficient human–computer interaction [1], video surveillance [2], remote gesture control [3], sign language interpretation [4], [5], surgery support, and so forth. In automotive vehicle control, for instance, it can be used for slip detection; in surgery support, monitoring the behavior of a beating heart would play a crucial role. For such applications, high-speed image processing employing high-frame-rate image sensors (about 1000 frames/s) can provide many details of the motion, which is very useful for improving the speed and precision of the recognition. Many applications benefit from high-speed image processing at about 1000 frames/s, such as the system reported in [74] for counting objects flowing rapidly on a line, in [75] for observing microorganisms under a microscope, and in [76] for object regrasping using a robot hand. Capabilities at 1000 frames/s thus have many potential applications.

For building such high-speed systems, motion feature extraction (MFE), an important and computationally heavy process in motion recognition [6], [7], is required to be processed in less than 1 ms. Traditional software-based MFE algorithms are computationally very expensive, and building high-speed systems is quite difficult even when state-of-the-art multicore general-purpose processors are utilized [8], [9]. Although fine-tuned software exploiting the single-instruction multiple-data operations of modern processors and/or graphics processing units has been developed for particular applications [10], [11], the high hardware costs and large power consumption severely limit the use of such approaches in portable devices as well as in large-scale systems. As a result, high-speed and low-power architectures for MFE are now highly demanded.

A number of very-large-scale integration (VLSI) processors with parallel architectures have been developed for performance enhancement in image processing algorithms. Image data processing usually includes some computationally intensive low-level processing [12], such as image filtering, which needs to be performed repeatedly at every pixel site in the entire image, or at least in the region of interest. Processors in [13]–[16] improved performance and power efficiency by developing parallel processing circuitries mainly for such low-level image processing tasks. Processors in [17]–[23] implemented fine-grain parallel processing architectures by investigating image processing tasks at all levels (low, middle, and high levels [12]) in some specific applications. Such processors have made real-time image processing systems feasible [14], [20]. Therefore, the parallel architecture is important for improving the performance of image data processing. However, all these processors primarily target still image recognition and do not deal with motion or action recognition problems. In addition, the delay caused by data transfer from image sensors to processors, which is usually carried out one pixel datum per clock cycle, severely limits the efficiency of image data access, making such systems unsuitable for time-critical applications such as automotive vehicle control. Furthermore, systems composed of multiple chips, including image sensors, processors, and in certain cases memories that buffer frame image data, are still very power hungry. In this sense, building system-on-a-chip (SoC) image processing systems is a very promising approach.

This is because it allows us to build the system in compact and power-efficient parallel circuitries. Computational image sensors (vision chips), which integrate both image sensors and processing circuits on the same chip, allow us to build image processing SoCs, and optimized VLSI architectures provide an efficient approach for developing high-speed and low-power image processing systems.

Smart sensors with analog processing circuits have been developed for image filtering, edge extraction, and motion detection. One of the drawbacks of processing in the analog domain is the lack of programmability. Furthermore, although digital circuits are easy to scale down with advanced process technologies, building analog processing circuits in more advanced technologies requests much harder design efforts to guarantee the same level of accuracy in the signal processing, which makes such approaches not suitable for performing the middle-level and high-level processing in complex algorithms. Recently, visual sensor SoCs employing parallel digital processing elements (PEs) have been developed to process the data at all levels. These works demonstrated the real-time performance of the image processing SoC concept.

The digital pixel sensor (DPS), with in-pixel analog-to-digital conversion (ADC), has good compatibility with digital processing circuitries, and therefore the functionality of the chip can be easily enhanced. Because of the in-pixel ADC of the captured image and the storage of the digitized image data, complex algorithms such as on-chip image compression can be implemented with low power consumption. In [33], the VLSI implementation of a spatial-temporal movie compression algorithm demonstrated that this approach is very effective. In [34], a scalable architecture for implantable neural signal recording was proposed. Another advantage of using DPS-based VLSIs is the employment of digital circuits not only in each pixel, but also in the image processing circuit modules that are built with digital memories [35].

Two important issues arise in the direct connection between the image sensor array and digital processing circuitries. One is the ADC of the captured image and the storage of the digitized image data. The other is the scheme for efficient digital image data transfer from the storage to the processing circuitries, in which multiple-row data access is essential for kernel processing like image filtering. In systems where the image sensor and processing circuitries are not directly connected, the image data first need to be buffered in the on-chip memory for the following computation; for example, the processing circuits need to wait for 256 clock cycles every time they access the digitized image data in a single row. This would not be a very efficient way to take, and the same is true for the access to analog image data in active pixel sensors separated from the processing circuitries. Therefore, developing DPS-based VLSIs with a direct connection between the image sensor array and digital processing circuitries provides a promising solution to implementing high-speed image processing SoCs, in particular those for motion recognition applications, as well as DPS-based interframe processing.

Regarding algorithms, intelligent image processing algorithms, such as those employed in most of the motion analysis algorithms, are usually very complex, and how to make them VLSI-implementation friendly has not yet been explored very extensively, resulting in a large gap between image processing algorithms and their VLSI implementation. The most common approaches are to translate already available software tools [40]–[42] into VLSI, which is not very efficient. In the development of application-specific VLSIs, employing VLSI-implementation-friendly algorithms is quite essential for achieving better power efficiency and small latency. Therefore, it is of paramount importance to consider the VLSI-implementation friendliness from the very early stages of algorithm development. Mimicking the brain functions provides us with a lot of inspiration in algorithm development, because the brain has excellent ability in a variety of image processing tasks. With such consideration of hardware friendliness, many VLSI-implementation-friendly algorithms have been proposed. In [43], a human posture recognition algorithm utilizing specific smart sensors was developed. In a neural feature extraction algorithm developed with particular consideration of hardware friendliness, a 95% reduction of hardware resources was reported, reducing the computation complexity by five times compared with the algorithms without such consideration.

According to the biological discoveries, directional edges are detected in the primary visual cortex as important clues for visual recognition. The output signals from the primary visual cortex are transferred to two different processing pathways: 1) the ventral stream for pattern recognition and 2) the dorsal stream for motion recognition. From the inspiration of such biological principles, VLSI-implementation-friendly directional-edge-based image feature representation algorithms were developed. Because of such algorithms, VLSI chips for directional edge detection and those for image recognition were developed to achieve real-time performance with low power consumption. For example, a still image recognition system with a processing time of only 906 μs per frame is reported using such a processor. Besides still image analysis, a directional-edge-based object tracking algorithm was also proposed, with its FPGA implementation, in which a speed of 150 frames/s was experimentally demonstrated. The succeeding works further developed edge-based motion recognition algorithms in which the self-speed-adaptive MFE processing is included as one of the most crucial steps, and a speed of 900 frames/s was expected if the algorithm was directly implemented on VLSI with an on-chip image sensor. Although VLSI chips for post-MFE optical flow detection and pattern recognition have been implemented, a VLSI dedicated to high-speed and low-power MFE has not yet been developed.

In this paper, a VLSI capable of extracting motion features from moving images in real time has been developed, employing fine-grain row-parallel and pixel-parallel architectures based on the DPS technology. The chip was developed in a 65-nm CMOS process to implement the parallel on-chip digital circuits for a DPS array with a resolution of 100 × 100. The block-readout scheme is combined with row-parallel bit-serial processing circuitries to perform the 5 × 5-pixel kernel convolutions of directional edge filtering immediately following the ADC. To achieve a real-time response of the system for moving image processing, a fully pixel-parallel architecture has been explored in adaptive binarization of filtered images for essential feature extraction, as well as in their temporal integration and derivative operations. This allows us to operate the system in a self-speed-adaptive manner. The measurement results show that this chip can perform MFE within 0.9 ms after capturing the image when running at 20 MHz, at a power consumption of 9 mW. The directional-edge-based features generated by this chip can be used for motion recognition as well as for still image recognition and object tracking [58], [59]. An object detecting system has also been developed using the fabricated chip. With the motion features, the system can localize the images of target objects only when they are in motion.

The remainder of this paper is organized as follows. Section II explains the MFE algorithm that we developed for VLSI implementation. Section III describes the VLSI architecture. Section IV presents the experimental results obtained from the prototype chip. Discussions are given in Section V. Finally, Section VI concludes this paper.

II. MFE ALGORITHM
This section explains the MFE algorithm that generates self-speed-adaptive motion features from

video sequences. This algorithm has two steps: the first step extracts static features in every frame, and the second step extracts motion features only from the frames where a significant amount of motion is detected. In the first step, the merged significant edge map (MSEM) is generated to represent static features, as described in Section II-A. In the second step, the self-speed-adaptive motion feature is generated, as described in Section II-B.

A. MSEM Generation
The MSEM generation is composed of two main processes: 1) the local feature extraction (LFE) and 2) the global feature extraction (GFE). Fig. 1 shows these two processes, where LFE is shown at the top and GFE is shown in the middle.

Fig. 1. MSEM generation including LFE, GFE, and logical OR. Movie clip from the Weizmann human action data set.

First, four-directional edge filtering (horizontal, +45°, vertical, and −45°) is carried out: convolutions between a 5 × 5-pixel local image centered at each pixel site and four 5 × 5-pixel filtering kernels (one for each direction) are calculated. Then, based on the four convolution results, the maximum gradient value is selected to determine the edge direction at the pixel site, and an edge flag bit 1 is set at the corresponding location in the respective binary edge map. For example, if the direction at a pixel is horizontal, then this pixel has an edge flag in the horizontal edge map and no edge flag in the +45°, vertical, or −45° edge maps. Such a process is repeated for every pixel, making the entire image seamlessly scanned by the four filtering kernels. The four directional kernels thus produce four binary edge maps, which we call local features. The convolution value obtained at every pixel site is also preserved.

Local features contain a lot of redundant information. Therefore, GFE is performed after LFE to retain only salient features. This is done by selecting only a predetermined percentage (which we call the edge detecting threshold) of significant edge flags out of all pixels, namely, those that have larger gradient values than the rest. Such a selection is possible because the convolution values obtained from every pixel site are all preserved. The four edge maps having only the significant flags after the selection represent global features very well, and we call them the significant edge maps (SEMs). Fig. 1 shows the SEMs in four directions obtained by setting the edge detecting threshold at 20%; only the salient features in the original image are highlighted. The SEMs are used for both static image recognition and motion analysis.

For static image recognition, a feature vector representation algorithm called projected principal edge distribution, or averaged principal-edge distribution (APED), is employed to transform the four-directional SEMs into a single 64-D feature vector [52]. For motion analysis, the four SEMs are merged into a single edge map by taking their logical OR, and we call the result an MSEM.

B. MFE
The motion features are extracted from the sequence of MSEMs. The length of the sequence, defined as the number of frames between the starting frame and the end frame, is very important for extracting good motion features. If it is too short, pixel motion cannot be detected. If it is too long, the accuracy of local
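The LFE and GFE steps described above can be summarized in a short sketch. This is a minimal software model, not the chip's circuitry: the four 5 × 5 kernels below are illustrative ±1 stand-ins for the paper's directional kernels, and the function names are our own.

```python
import numpy as np

def lfe_gfe(img, threshold=0.20):
    """Model of LFE + GFE: four-directional 5x5 edge filtering,
    maximum-gradient direction selection, top-percentage selection
    into four SEMs, and OR-merging into one MSEM."""
    h, w = img.shape
    # Illustrative +1/-1 kernels, one per direction (stand-ins).
    k = np.zeros((4, 5, 5))
    k[0, :2, :], k[0, 3:, :] = 1, -1            # horizontal edge
    k[1, :, :2], k[1, :, 3:] = 1, -1            # vertical edge
    for i in range(5):
        for j in range(5):
            k[2, i, j] = 1 if i + j < 4 else (-1 if i + j > 4 else 0)  # +45 deg
            k[3, i, j] = 1 if i - j > 0 else (-1 if i - j < 0 else 0)  # -45 deg
    grad = np.zeros((h - 4, w - 4))
    direc = np.zeros((h - 4, w - 4), dtype=int)
    for y in range(h - 4):
        for x in range(w - 4):
            block = img[y:y + 5, x:x + 5]
            vals = [abs(np.sum(block * k[d])) for d in range(4)]
            direc[y, x] = int(np.argmax(vals))   # LFE: max-gradient direction
            grad[y, x] = max(vals)               # convolution value preserved
    # GFE: keep only the top `threshold` fraction of gradient values.
    cut = np.quantile(grad, 1.0 - threshold)
    sems = [(direc == d) & (grad >= cut) for d in range(4)]
    msem = sems[0] | sems[1] | sems[2] | sems[3]  # merge by logical OR
    return sems, msem
```

Because every pixel keeps its convolution value, the 20% selection can be done after the fact with a single global threshold, exactly as the text argues.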

motion detection degrades. Therefore, an optimum length of the MSEM sequence must be determined adaptively according to the speed of the moving objects.

Fig. 2 shows the process of MFE in detail. To extract the motion features in a self-speed-adaptive manner using MSEMs, an accumulated edge map (AEM) is first produced. At the beginning, an MSEM is generated from the starting frame (Frame 45) with a predetermined edge detecting threshold and recorded directly as the kth AEM. From the next frame (Frame 46), the MSEM is generated with the same edge detecting threshold, and logical OR is taken between this MSEM and the latest AEM to update the AEM, yielding the new AEM at k + 1. Then, the edge count in the new AEM is calculated and compared with the motion detection threshold (this threshold is also given as a percentage of the total number of pixels). If the edge count is smaller than the threshold value, it indicates that there has not been significant motion in the scene, and the accumulation is continued. Such an accumulation process continues until the edge count of the AEM becomes equal to or larger than the motion detection threshold, which indicates that there has been significant motion since the start of the accumulation.

In the example of Fig. 2, Frame 47 is the frame in which significant motion is detected. The MSEM of Frame 47 is extracted and merged with the AEM by taking OR again. Then, the difference between the AEM at Frame 47 and the MSEM of Frame 47 is calculated by taking the exclusive OR between them, generating an edge displacement map in which only the edge flags due to the motion having occurred in the MSEM sequence remain. This edge displacement map is utilized as the motion features for motion analysis. Since the number of edge flags remaining in the edge displacement map is set to a constant value, namely, [motion detecting threshold (%) − edge detecting threshold (%)] × total number of pixels, approximately corresponding to an equal amount of motion, the length of the MSEM sequence for generating an edge displacement map is automatically adjusted according to the speed of the moving objects. After this, the same procedure is restarted from the end frame of the previous MSEM sequence until the next edge displacement map is obtained.

III. VLSI ARCHITECTURE
A. Chip Configuration
Fig. 3 shows the chip configuration. This chip extracts image features, in particular the motion features, from moving images. A 100 × 100 DPS array captures 8-bit images, and the LFE module carries out the 5 × 5-pixel kernel convolutions of four-directional edge filtering in row-parallel processing. The SEM generation and the MFE, which is the most computationally expensive processing, are both conducted by the MFE module in fully pixel-parallel processing.

Fig. 3. Chip configuration for extracting both static and motion features.

B. DPS Pixel
Fig. 4(a) shows the basic concept of the DPS. The pixel unit consists of a photodiode, a reset transistor, a shutter transistor, a sampling metal–oxide–semiconductor (MOS) capacitor, an analog comparator, a readout buffer, and an 8-bit memory composed of DRAM cells. In the global reset process, both the reset transistor and the shutter transistor are turned ON, which forces the voltage V at the MOS capacitor to Vreset in all pixels. Then, the reset transistor is turned OFF while the shutter transistor stays ON, and the decrease in the MOS capacitor voltage yields the light intensity at each pixel. After a certain period of time (the photo integration time), all shutter transistors are turned OFF simultaneously, and the light intensity in each pixel is held as the analog voltage signal V.
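The self-speed-adaptive accumulation of Section II-B can be modeled in a few lines. This is a minimal sketch assuming boolean NumPy arrays as MSEMs and thresholds given as fractions of all pixels; the function name is our own.

```python
import numpy as np

def extract_edge_displacement(msems, motion_thr=0.30):
    """Model of the self-speed-adaptive MFE loop: OR-accumulate MSEMs
    into an AEM until the AEM edge count reaches the motion detection
    threshold, then XOR the AEM with the latest MSEM to obtain the
    edge displacement map."""
    total = msems[0].size
    aem = msems[0].copy()                 # starting frame recorded as the AEM
    for frame, msem in enumerate(msems[1:], start=1):
        aem |= msem                       # temporal integration by logical OR
        if aem.sum() >= motion_thr * total:
            # Significant motion detected: the flags that moved since the
            # start of the accumulation form the edge displacement map.
            displacement = aem ^ msem     # exclusive OR (temporal derivative)
            return frame, displacement
    return None, None                     # no significant motion in the clip
```

Note how the sequence length is not a parameter: fast motion fills the AEM in few frames, slow motion in many, which is exactly the self-speed-adaptive behavior described in the text.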

Fig. 4. (a) Circuits in one pixel and (b) 1-bit memory circuit.

At the beginning of ADC, an analog ramp signal and 8-bit digital ramp codes that are synchronized to each other are provided to all pixel units. The analog comparator in each pixel compares the voltage V with the analog ramp. While the analog ramp is below V, the comparator output (write enable) is 1 and the memory follows the digital ramp codes. At the timing when the analog ramp voltage exceeds V, the write enable changes to 0, and the digital code corresponding to V is stored in the memory; thus, the ADC of the pixel data is accomplished. Since this operation is conducted simultaneously at all pixels, massively parallel ADC is achieved. Different from the DPS pixel developed in [35], each bit of the memory, together with the readout buffer, has a separate control signal, and the 8-bit control signals select which bit in the pixel to read out. Such a design allows us to perform the block-readout scheme [69] of the DPS, as described in the following.

C. Block-Readout Scheme
Fig. 5(a) shows the block-readout scheme for row-parallel processing. Every four rows are grouped together and controlled by the same enable signal generated by the block-readout control circuits. In one read cycle, the same bits of image data in pixels from two consecutive groups of rows (eight rows, indicated in light gray, which contain 100 × 8 pixels) are simultaneously read out to 24 groups of LFE circuits bit-serially. Each LFE circuitry receives two 4 × 8 data blocks, from the left and from the right, which form an 8 × 8 data block. The data in light gray are sufficient to perform the 5 × 5 kernel calculations and the maximum gradient selection of the LFE at all dark gray pixel sites (96 × 4 pixel sites) immediately, with no extra buffer memories. Thus, LFE at 96 × 96 pixel sites is accomplished very efficiently.

Fig. 5. (a) Block-readout scheme of DPS. (b) LFE. (c) Horizontal edge filter.

To keep the LFE module rational in the interconnection and chip areas, bit-serial kernel convolutions are employed in this design, as shown in Fig. 5(b). Fig. 5(c) shows the horizontal edge filter in detail. In this filter, 8 × 8-pixel data are input to two 64-input adders via two 8 × 8 masking circuitries, which allow only the necessary data to go through the data go-through locations (masking patterns, indicated in white). The absolute difference of the adder outputs yields the gradient value. The masking patterns are generated in the kernel program control circuits (Fig. 3) using shift registers and broadcast to all groups of LFE circuits. By simultaneously shifting all masks 16 times, the kernel calculations are completed for all black pixel sites (24 pixel sites in total), which are processed by the 24 groups of LFE circuits simultaneously; the edge direction and the maximum gradient value are generated in each LFE circuitry.
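The in-pixel single-slope ADC described above can be modeled behaviorally. This is an illustrative sketch with a linear ramp and normalized voltages (our assumptions, not measured chip behavior): each pixel's memory follows the digital ramp code while the shared analog ramp is below its voltage V and freezes when the ramp crosses V.

```python
def single_slope_adc(v_pixels, v_max=1.0, bits=8):
    """Behavioral model of the massively parallel in-pixel ADC:
    one analog ramp and synchronized digital ramp codes are broadcast
    to all pixels; each pixel stores the last code seen while the ramp
    was still below its analog voltage V (write enable = 1)."""
    codes = [0] * len(v_pixels)               # per-pixel 8-bit memories
    steps = 2 ** bits
    for code in range(steps):                 # digital ramp code
        ramp = (code / (steps - 1)) * v_max   # synchronized analog ramp
        for i, v in enumerate(v_pixels):
            if ramp < v:                      # comparator: write enable = 1
                codes[i] = code               # memory follows the ramp code
            # once ramp >= v, write enable = 0 and the code stays frozen
    return codes
```

All pixels convert in the same 2^8 ramp steps regardless of array size, which is the point of performing the ADC in parallel inside every pixel.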

D. MFE Module
Fig. 6 shows the MFE module. The array of memory columns at the top stores the filtered image data provided from the LFE module, and the circuits below perform the MFE and data output. The array of 24 × 384 memory columns stores all the filtering results obtained from the 96 × 96 pixel sites in the 100 × 100 DPS array. Each memory column has a 13-bit static RAM (SRAM): 2 bits for the edge direction (maximum gradient direction) and 11 bits for the gradient value. A common signal is provided as the clock for timing control in pixel-parallel SRAM data access. Controlled by the Mode Select signal, the memory columns can be configured into 24 circulating shift registers to transfer the stored data outside the chip and restore these data after the transfer.

Fig. 6. MFE module.

When the LFE for all pixel sites finishes, the results have already been recorded on the memory array, and the MSEM can then be extracted. According to the rank-order filtering structure [71], an efficient binary sorting algorithm has been developed. After the binary sorting, a predetermined number of edge flags (like 20% of all pixel sites, for instance) having larger gradient values are selected and retained in the edge map. In this way, the rank ordering of all 9216 gradient values is accomplished in only 11 comparison cycles.

Fig. 7 illustrates the PE configuration, which is composed of two main parts: 1) the binary sorting circuits for MSEM generation and 2) the functional circuits for MFE and data output. The Thresholding Mode signal selects the output signal of the binary sorting circuits. Controlled by the Data Load signal and the Mode Select signal, the data in the 13-bit SRAM can be directly loaded to the General Reg.; when combined with the 2-bit direction data, the MSEM of this frame (Fig. 1) is obtained. The MSEM datum of each pixel site is always preserved in the Flag Reg. Controlled by the Frame Proc. signal and the Mode Select signal, the MSEM datum in the Flag Reg. can be either copied directly to, or ORed/EORed with, the AEM datum in the General Reg., which preserves the AEM datum.

Controlled by the Mode Select signal, the whole frame of the binary AEM can be directly transmitted to the full parallel adder; all General Regs. and the full parallel adder are fully connected. The summation result is compared with the motion detecting value by the comparator in Fig. 6, and the comparison result (feedback signal) is used to indicate whether there has been significant motion in the scene. If there is no significant motion, the MSEM of each frame is accumulated by taking OR among consecutive frames; the total number of edge flag bits in the AEM does not change much, but it does increase when there exists some motion. Such accumulation is continued until the total edge count in the AEM exceeds the motion detecting value. Then, the edge displacement map is generated by taking EOR between the AEM data in the General Regs. and the present MSEM data in the Flag Regs. As in the MFE process shown in Fig. 2, the trajectory of the moving parts is highlighted in a self-speed-adaptive manner, while the edge flags from the stationary background are erased. With this configuration, the MFE function has been developed for the first time in this paper. This LFE architecture was first implemented in a 0.18-μm technology in [52] for still image recognition with a 68 × 68 DPS array; in this paper, all digital circuits are implemented in a 65-nm technology.

Fig. 8. Chip photomicrograph and circuit layout of one pixel.
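The 11-cycle binary sorting can be understood as a bit-serial search for the gradient threshold, one bit per cycle from MSB to LSB, where each cycle needs only one global count (on chip, the full parallel adder over all PEs). The sketch below is our software rendering of this idea, not the circuit itself; the function name is hypothetical.

```python
def binary_sort_select(values, target, bits=11):
    """Bit-serial rank-order thresholding: in `bits` comparison cycles
    (11 for 11-bit gradients), find the largest threshold T such that
    at least `target` values satisfy v >= T. Edge flags are then set
    exactly where v >= T."""
    t = 0
    for b in range(bits - 1, -1, -1):
        trial = t | (1 << b)              # tentatively set this bit to 1
        count = sum(1 for v in values if v >= trial)   # one global count
        if count >= target:               # still enough values: keep the bit
            t = trial
    return t
```

Because the count of values above a threshold is monotone in the threshold, this greedy bit-by-bit search is exact, so selecting the top 20% of 9216 gradients really does take only 11 count-and-compare cycles (ties can make the selected set slightly larger than `target`).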

B. Object Detecting System
An experimental object recognition system was built using the fabricated chip and an FPGA board for demonstration. The DE2-70 FPGA board from Terasic [72] is used in the system. It contains an Altera Cyclone II 2C70 FPGA as the main device, as well as some auxiliary devices including the video graphics array (VGA) digital-to-analog converter (DAC). Fig. 13(a) shows a photo of the packaged chip mounted on the socket board, and Fig. 13(b) shows a photo of the demonstration system without the logic analyzer and the monitor. The socket board is put beneath the lens, and testing objects were put about 30 cm above the lens. Two 100-W lights are used as light sources. The chip was controlled by a pattern generator (Tektronix TLA715), and a monitor was used to display the results from the FPGA board. The chip extracts both the static features and the motion features and transfers them to the FPGA, in which feature vector generation (FVG) and pattern recognition are performed. A summary of the FPGA part in this system follows.

Fig. 9. Measured waveforms showing the feedback signal from the chip when the thresholding instruction is executed.
Fig. 10. Original photo taken by this VLSI.
Fig. 13. Experimental setup of the object recognition system built with the fabricated chip.

V. DISCUSSION
This chip was originally developed to perform the MFE algorithm introduced in Section II. However, both the LFE and MFE modules can be improved to implement vision sensor SoCs for other image processing algorithms.

A. Improvement of LFE Module

V. DISCUSSION

This chip was originally developed to perform the MFE algorithm introduced in Section II. However, both the LFE and the MFE modules can be improved to implement vision sensor SoCs for other image processing algorithms.

A. Improvement of LFE Module

In the row-parallel LFE module, eight masks are employed for directional filtering in four directions according to the LFE algorithm in Fig. 1(a). Because simple binary masks are used, there are only +1 and −1 in the filtering kernels in this paper. For applications that need more complex filtering kernels, such as Gaussian filters or Gabor filters [8], masks in which each position has a corresponding weight can be employed. These masks are programmable, and the number of masks, eight in this paper, can be changed accordingly. The block-readout scheme provides a direct connection from the image sensor to the LFE module, and the maximum-gradient selection circuitry [Fig. 5(b)] can be enhanced into more general-purpose processing circuits for other low-level image processing tasks. For example, the original photo in Fig. 10 was taken by activating only one mask, with only one position allowing the corresponding datum in the 8 × 8 data block to pass through.

B. Improvement of MFE Module

The MFE module is developed to extract the edge displacement map for the MFE algorithm introduced in [60], which needs edge features for either the horizontal or the vertical direction, but not both at the same time. By activating the two masks for one directional kernel and blocking all six other masks, this chip can extract motion features for horizontal edges only, and vice versa. However, experimental results in [60] show that motions in the vertical direction are more easily detected using the horizontal edges. The MFE module can be enhanced to perform MFE in two directions (or more) by increasing the number of bits in the SRAM columns and designing circuits in the PEs to write the data in the General Regs. back to the SRAM columns in pixel parallel. In this way, edge displacement maps in both the horizontal and the vertical directions can be efficiently implemented, and motion fields can be generated in both directions for motion analysis.

VI. CONCLUSION

A self-speed-adaptive MFE VLSI has been developed for a VLSI-implementation-friendly MFE algorithm and implemented in a standard 65-nm CMOS process. By employing the DPS, fully pixel-parallel ADC is achieved, allowing row-parallel processing in the directional edge filtering as well as fully pixel-parallel feature extraction with simple binary masks. A processing time of 0.9 ms for the MFE after capturing an image has been achieved when the chip was running at 20 MHz. The effectiveness of such an architecture was demonstrated by building an object detecting system that, based on the extracted motion features, can localize images of target objects only when they are in motion.
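The directional edge filtering with +1/−1 binary masks, maximum-gradient selection, and adaptive binarization described above can be sketched as follows. The 2 × 2 kernel layouts and the median-based threshold are illustrative assumptions, not the fabricated chip's exact mask set or circuitry:

```python
import numpy as np

# Illustrative sketch of four-direction +1/-1 filtering with
# maximum-gradient selection and a crude adaptive threshold.

def directional_edges(img, threshold=None):
    """Return -1 where no edge, else 0..3 for the direction
    (horizontal, vertical, 45 deg, 135 deg) of the strongest gradient."""
    img = img.astype(int)
    h  = img[:-1, :-1] + img[:-1, 1:] - img[1:, :-1] - img[1:, 1:]  # horizontal edge
    v  = img[:-1, :-1] + img[1:, :-1] - img[:-1, 1:] - img[1:, 1:]  # vertical edge
    d1 = img[:-1, 1:] - img[1:, :-1]                                # 45 deg edge
    d2 = img[:-1, :-1] - img[1:, 1:]                                # 135 deg edge
    g = np.stack([np.abs(h), np.abs(v), np.abs(d1), np.abs(d2)])
    gmax = g.max(axis=0)
    if threshold is None:
        threshold = np.median(gmax)  # stand-in for the adaptive binarization
    return np.where(gmax > threshold, g.argmax(axis=0), -1)

# Usage: a vertical step edge is labelled direction 1 (vertical).
img = np.zeros((8, 8), dtype=np.uint8)
img[:, 4:] = 255
print(np.unique(directional_edges(img)[:, 3]))  # [1]
```

Because every kernel weight is +1 or −1, each response reduces to additions and subtractions, which is what makes the row-parallel hardware implementation inexpensive.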

REFERENCES

[1] S. Mitra and T. Acharya, "Gesture recognition: A survey," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, no. 3, pp. 311–324, May 2007.
[2] Y. A. Ivanov et al., "Diamond sentry: Integrating sensors and cameras for real-time monitoring of indoor spaces," IEEE Sensors J., vol. 11, no. 3, pp. 593–602, 2011.
[3] S. Berman and H. Stern, "Sensors for gesture recognition systems," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 42, no. 3, pp. 277–290, May 2012.
[4] T. Starner, J. Weaver, and A. Pentland, "Real-time American sign language recognition using desk and wearable computer based video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 12, pp. 1371–1375, Dec. 1998.
[5] T. Shanableh, K. Assaleh, and M. Al-Rousan, "Spatio-temporal feature-extraction techniques for isolated gesture recognition in Arabic sign language," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 3, pp. 641–650, Jun. 2007.
[6] K. Huang, Y. Zhang, and T. Tan, "A discriminative model of motion and cross ratio for view-invariant action recognition," IEEE Trans. Image Process., vol. 21, no. 4, pp. 2187–2197, Apr. 2012.
[7] M. Ahmad and S.-W. Lee, "Human action recognition using shape and CLG-motion flow from multi-view image sequences," Pattern Recognit., vol. 41, no. 7, pp. 2237–2252, 2008.
[8] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Proc. IEEE 11th ICCV, Oct. 2007, pp. 1–8.
[9] T. Kobayashi and N. Otsu, "Three-way auto-correlation approach to motion recognition," Pattern Recognit. Lett., vol. 30, no. 3, pp. 212–221, 2009.
[10] K. Iwata, Y. Satoh, I. Yoda, and K. Sakaue, "Application of the unusual motion detection using CHLAC to the video surveillance," in Neural Information Processing (Lecture Notes in Computer Science), vol. 4985. New York, NY, USA: Springer-Verlag, 2008, pp. 892–901.
[11] J. Huang, S. P. Ponce, S. I. Park, Y. Cao, and F. Quek, "GPU-accelerated computation for robust motion tracking using the CUDA framework," in Proc. 5th Int. Conf. VIE, 2008, pp. 437–442.
[12] S. Kyo, S. Okazaki, and T. Arai, "An integrated memory array processor for embedded image recognition systems," IEEE Trans. Comput., vol. 56, no. 5, pp. 622–634, May 2007.
[13] W. Raab et al., "A 100-GOPS programmable processor for vehicle vision systems," IEEE Des. Test Comput., vol. 20, no. 1, pp. 8–15, 2003.
[14] S. Kyo, S. Okazaki, and T. Arai, "A 51.2-GOPS scalable video recognition processor for intelligent cruise control based on a linear array of 128 four-way VLIW processing elements," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1992–2000, Nov. 2003.
[15] A. A. Abbo et al., "Xetal-II: A 107 GOPS, 600 mW massively parallel processor for video scene analysis," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 192–201, Jan. 2008.
[16] H. Noda et al., "The design and implementation of the massively parallel processor based on the matrix architecture," IEEE J. Solid-State Circuits, vol. 42, no. 1, pp. 183–192, Jan. 2007.
[17] J. Tanabe et al., "Visconti: Multi-VLIW image recognition processor based on configurable processor [obstacle detection applications]," in Proc. IEEE CICC, Sep. 2003, pp. 185–188.
[18] D. Kim, K. Kim, J.-Y. Kim, S. Lee, and H.-J. Yoo, "An 81.6 GOPS object recognition processor based on NoC and visual image processing memory," in Proc. IEEE CICC, 2007, pp. 443–446.
[19] D. Kim et al., "A 81.6 GOPS object recognition processor based on a memory-centric NoC," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 3, pp. 370–383, Mar. 2009.
[20] K. Kim, S. Lee, J.-Y. Kim, M. Kim, and H.-J. Yoo, "A 125 GOPS 583 mW network-on-chip based parallel processor with bio-inspired visual attention engine," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 136–147, Jan. 2009.
[21] H. Kondo et al., "Design and implementation of a heterogeneous multicore SoC with nine CPUs and two matrix processors," IEEE J. Solid-State Circuits, vol. 43, no. 4, Apr. 2008.
[22] J. Oh et al., "A 320mW 342GOPS real-time moving object recognition processor for HD 720p video streams," in IEEE ISSCC Dig. Tech. Papers, Feb. 2012, pp. 220–221.

[23] Y. Tanabe et al., "A 464GOPS 620GOPS/W heterogeneous multi-core SoC for image-recognition applications," in IEEE ISSCC Dig. Tech. Papers, Feb. 2012, pp. 222–223.
[24] N. Massari and M. Gottardi, "A 100 dB dynamic-range CMOS vision sensor with programmable image processing and global feature extraction," IEEE J. Solid-State Circuits, vol. 42, no. 3, pp. 647–657, Mar. 2007.
[25] J. Dubois, D. Ginhac, M. Paindavoine, and B. Heyrman, "A 10 000 fps CMOS sensor with massively parallel image processing," IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 706–717, Mar. 2008.