Motion Estimation

ECE 569 – Spring 2010
Toan Nguyen, Shikhar Upadhaya

• What is new with motion estimation
• Four Step Search and Hexagon Search Algorithms
• Parallelization strategies
• Results and discussions

What is new with motion estimation?
• The familiar way – Full search
• Full search is not so efficient
• Some of the most popular fast search algorithms:
  - Diamond search
  - Hexagon search
  - Three-step search
  - Four-step search
  - Orthogonal search
  - And many more

So what is the best?
• Full search is the most accurate because the search is exhaustive, but it requires more time.
• Fast search is faster, but accuracy is reduced because the minimum is only estimated from a subset of candidate points.
• There is a trade-off between the run time and the accuracy.
• We implemented two of the most popular fast search algorithms for comparison:
  - Four Step Search
  - Hexagon Search
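
To make the full-search baseline concrete, here is a minimal sketch in Python. It assumes the image and the window are 2D lists of integers and uses the sum of absolute differences (SAD) as the matching cost. The function names and the tiny data set are illustrative assumptions, not the code used for the results in these slides.

```python
def sad(image, window, top, left):
    """Sum of absolute differences between `window` and the image patch at (top, left)."""
    return sum(
        abs(image[top + r][left + c] - window[r][c])
        for r in range(len(window))
        for c in range(len(window[0]))
    )


def full_search(image, window):
    """Exhaustively test every placement of `window` inside `image`."""
    rows = len(image) - len(window) + 1
    cols = len(image[0]) - len(window[0]) + 1
    best = None
    for top in range(rows):
        for left in range(cols):
            cost = sad(image, window, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best  # (minimum SAD, row, column)


# Tiny illustrative data (hypothetical, not from the slides).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 8, 7, 6],
    [5, 4, 3, 2],
]
window = [[7, 8], [7, 6]]
result = full_search(image, window)  # exact match at row 1, column 2
```

Every placement is visited, which is why full search is accurate but slow: the work grows with the number of placements times the window area.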

Four Step Search Algorithm
• Step 1: A minimum BDM point is found from a nine-checking-points pattern on a 5 x 5 window located at the center of the 15 x 15 searching area. If the minimum BDM point is found at the center of the search window, go to Step 4; otherwise go to Step 2.
• Step 2: The search window size is maintained at 5 x 5. However, the search pattern will depend on the position of the previous minimum BDM point.
  - If the previous minimum BDM point is located at the corner of the previous search window, five additional checking points as shown in Fig. 2(b) are used.
  - If the previous minimum BDM point is located at the middle of the horizontal or vertical axis of the previous search window, three additional checking points as shown in Fig. 2(c) are used.
  - If the minimum BDM point is found at the center of the search window, go to Step 4; otherwise go to Step 3.
• Step 3: The searching pattern strategy is the same as Step 2, but finally it will go to Step 4.
• Step 4: The search window is reduced to 3 x 3 as shown in Fig. 2(d), and the direction of the overall motion vector is considered as the minimum BDM point among these nine searching points.
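
The four steps above can be sketched as follows. This is a simplified illustration, not the implementation behind these slides: `cost(dy, dx)` is a hypothetical callback returning the block distortion (for example SAD) at a displacement, and instead of listing the three or five additional checking points explicitly, the sketch evaluates all nine points of each window and caches results so that points already checked are not recomputed.

```python
def four_step_search(cost, search_range=7):
    """Return (best_cost, (dy, dx)) found with the four-step pattern.

    `cost(dy, dx)` is an assumed callback giving the block distortion
    for a displacement; `search_range=7` matches a 15 x 15 search area.
    """
    cache = {}

    def evaluate(point):
        # Points shared with an earlier window are looked up, not recomputed.
        if point not in cache:
            cache[point] = cost(*point)
        return cache[point]

    def best_around(center, step):
        cy, cx = center
        candidates = [
            (cy + dy, cx + dx)
            for dy in (-step, 0, step)
            for dx in (-step, 0, step)
            if abs(cy + dy) <= search_range and abs(cx + dx) <= search_range
        ]
        # Put the center first so that it wins ties and the search can stop.
        candidates.sort(key=lambda p: p != center)
        return min(candidates, key=evaluate)

    center = (0, 0)
    for _ in range(3):                  # Steps 1 to 3: 5 x 5 window
        best = best_around(center, 2)
        if best == center:
            break                       # minimum at the center: go to Step 4
        center = best
    center = best_around(center, 1)     # Step 4: 3 x 3 window
    return evaluate(center), center


# Illustrative cost with its minimum at displacement (3, -5).
found = four_step_search(lambda dy, dx: (dy - 3) ** 2 + (dx + 5) ** 2)
```

With a step of 2 in Steps 1 to 3 and a step of 1 in Step 4, the largest reachable displacement is 2 + 2 + 2 + 1 = 7, which is why the pattern fits the 15 x 15 searching area.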

Four Step Search Example

Hexagon Search Algorithm
• Step 1: The large hexagon with seven checking points is centered at the center of a predefined search window in the motion field. If the MBD point is found to be at the center of the hexagon, proceed to Step 3; otherwise, proceed to Step 2.
• Step 2: With the MBD point in the previous search step as the center, a new large hexagon is formed. Three new candidate points are checked, and the MBD point is again identified. If the MBD point is still the center point of the newly formed hexagon, then go to Step 3; otherwise, repeat this step continuously.
• Step 3: Switch the search pattern from the large to the small size of the hexagon. The four points covered by the small hexagon are evaluated to compare with the current MBD point. The new MBD point is the final solution of the motion vector.
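
A minimal sketch of the three hexagon-search steps is given below. The hexagon offsets follow the commonly used layout (six points around the center for the large hexagon, four for the small one); the `cost(dy, dx)` callback, the `search_range` bound, and all names are assumptions made for illustration rather than details taken from these slides.

```python
# Offsets are (dy, dx) pairs relative to the current center.
LARGE_HEXAGON = [(0, -2), (0, 2), (-2, -1), (-2, 1), (2, -1), (2, 1)]
SMALL_HEXAGON = [(0, -1), (0, 1), (-1, 0), (1, 0)]


def hexagon_search(cost, search_range=7):
    """Return (best_cost, (dy, dx)); `cost` is an assumed distortion callback."""
    cache = {}

    def evaluate(point):
        # Only the new candidate points of each hexagon are actually computed.
        if point not in cache:
            cache[point] = cost(*point)
        return cache[point]

    def best_around(center, offsets):
        cy, cx = center
        # The center comes first so that it wins ties.
        candidates = [center] + [
            (cy + dy, cx + dx)
            for dy, dx in offsets
            if abs(cy + dy) <= search_range and abs(cx + dx) <= search_range
        ]
        return min(candidates, key=evaluate)

    center = (0, 0)
    while True:                          # Steps 1 and 2: large hexagon
        best = best_around(center, LARGE_HEXAGON)
        if best == center:
            break                        # MBD point at the center: go to Step 3
        center = best
    center = best_around(center, SMALL_HEXAGON)  # Step 3: small hexagon
    return evaluate(center), center


# Illustrative cost with its minimum at displacement (3, -5).
found = hexagon_search(lambda dy, dx: (dy - 3) ** 2 + (dx + 5) ** 2)
```

The loop always terminates because the center only moves to a point with a strictly smaller cost, and the search range contains a finite number of points.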

Hexagon Search Example

Design Implementation
• Parallelization is possible by dividing the image into small sub-image partitions.
• Each thread works on a sub-image independently using the chosen algorithm (e.g., Four Step Search or Hexagon Search).
• At the end, the minimum SAD of each sub-image is compared to get the final minimum SAD and avoid a local minimum.
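
The partitioning idea can be sketched on the CPU as shown here. A plain loop stands in for the GPU threads, each sub-image is searched exhaustively for brevity (a fast search could be swapped in), and the sketch assumes square tiles that divide the image evenly and are at least as large as the window. Names and data are illustrative assumptions.

```python
def sad(image, window, top, left):
    """Sum of absolute differences at one window placement."""
    return sum(
        abs(image[top + r][left + c] - window[r][c])
        for r in range(len(window))
        for c in range(len(window[0]))
    )


def search_tile(image, window, top0, left0, tile):
    """Best (SAD, row, col) among placements that fit inside one sub-image."""
    best = None
    for top in range(top0, top0 + tile - len(window) + 1):
        for left in range(left0, left0 + tile - len(window[0]) + 1):
            cost = sad(image, window, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best


def partitioned_search(image, window, tile):
    # One entry per sub-image; on a GPU each would be produced by its own thread.
    partial = [
        search_tile(image, window, top0, left0, tile)
        for top0 in range(0, len(image), tile)
        for left0 in range(0, len(image[0]), tile)
    ]
    # Final reduction: compare the per-sub-image minima.
    return min(partial)


# Tiny illustrative data (hypothetical, not from the slides).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 8, 7, 6],
    [5, 4, 3, 2],
]
window = [[7, 6], [3, 2]]
result = partitioned_search(image, window, tile=2)
```

One consequence of this design is that placements straddling two sub-images are never tested, so the result can differ from an unpartitioned search.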

Implementation Notes
• Since the number of threads we use is a multiple of 2, if the number of sub-images is not a multiple of 2, we need to pad the image with additional rows and columns, and we ignore the results from those extra sub-images.
• We excluded the time it takes to read a text file and store data into the window and image arrays when we compare the runtime for performance analysis.
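
One way the padding could look is sketched below. It assumes square tiles, a fill value of 0, and that "multiple of 2" means an even number of sub-images along each dimension; these choices and the function names are assumptions, since the slides do not give the details.

```python
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def pad_image(image, tile, fill=0):
    """Pad so that the sub-image count is even along each dimension."""
    rows, cols = len(image), len(image[0])
    tiles_down = round_up(round_up(rows, tile) // tile, 2)
    tiles_across = round_up(round_up(cols, tile) // tile, 2)
    new_rows, new_cols = tiles_down * tile, tiles_across * tile
    padded = [list(row) + [fill] * (new_cols - cols) for row in image]
    padded += [[fill] * new_cols for _ in range(new_rows - rows)]
    return padded


def original_tiles(rows, cols, padded, tile):
    """Origins of sub-images that start inside the original image.

    Sub-images made only of padding are skipped, so their results are ignored.
    """
    return [
        (top, left)
        for top in range(0, len(padded), tile)
        for left in range(0, len(padded[0]), tile)
        if top < rows and left < cols
    ]


# A 6 x 6 image with 2 x 2 tiles gives 3 tiles per side, padded up to 4.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
padded = pad_image(image, tile=2)
kept = original_tiles(6, 6, padded, tile=2)
```

Here the padded image is 8 x 8 (16 sub-images), and only the 9 sub-images that overlap the original data contribute to the final minimum.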

Simulation Results
• First we varied the number of threads per block to find the configuration that gives the best run time.
[Figure: Runtime of full search on various threads/block. Axes: Runtime (seconds) vs. Image Size]
  - 256 threads/block give the best performance.

Simulation Results (cont.)
• The runtimes of the serial versions and the parallel versions of the different algorithms are collected and compared to see what kind of performance improvement we achieved.
[Figures: Full Search serial vs. parallel (FSS_Serial, FSS_Parallel); Four Step Search serial vs. parallel (4SS_Serial, 4SS_Parallel); Hexagon Search serial vs. parallel (Hexagon_Serial, Hexagon_parallel). Axes: Runtime (seconds) vs. Image Size]
• We only see the performance improvement when the image size is 256x256 or bigger. Any image smaller than this will actually decrease the performance.

Simulation Results (cont.)
• So how much speedup do we get, and which algorithm is better: Full Search, Four Step Search, or Hexagon Search?
[Figure: Parallel vs. serial versions speedup (Speed_UP_FS, Speed_UP_4SS, Speed_UP_Hexagon). Axes: Speed up vs. Image size]

Simulation Results (cont.)
• Overall performance
[Table and chart: runtimes of Full_Serial, Full_Parallel, 4SS_Serial, 4SS_Parallel, Hexagon_Serial, and Hexagon_parallel for image sizes 16X16, 32X32, 64X64, 128X128, 256X256, 512X512, 1024X1024, 2048X2048, and 4096X4096]

Simulation Results (cont.)
• Performance comparison between NVIDIA 8400 GS and 9800 GT GPUs.
[Figure: NVIDIA 8400 GS vs. 9800 GT performance (Speed-up_FSS, Speed-up_4SS, Speed-up_Hexagon). Axes: Speed up vs. Image Size]

Simulation Results (cont.)
• Distortion measurement (motion estimation quality).
[Figure: Fast Search Distortion (4SS distortion, Hexagon distortion). Axes: Distortion vs. Image size]
[Figure: Min. SAD returned by different algorithms (Full-Step, 4SS, Hexagon). Axes: Min. SAD vs. Image size]

Result Analysis Summary
1. The parallel versions of motion estimation only improve performance when the image is large (256x256 or bigger). Smaller images reduce performance.
   - Larger image ~ greater speedup
2. Fast search algorithms outperform the full search algorithm, hence “fast”.
3. Parallelization of Four Step Search gives a slight edge over Hexagon Search.
4. The distortion we see with the two fast search algorithms is similar.

Result Conclusions
• Based on the data collected from the different algorithms, Four Step Search gives slightly better performance than Hexagon Search, while the distortion is very similar. Hence, Four Step Search is a better fast search algorithm than Hexagon Search.
• Only run motion estimation algorithms on the GPU if the image size is 256x256 or larger. Smaller images should be run serially on the CPU.

Limitations
• Image and window files are random.
• Shared memory is not used.

Other parallelization strategy
• After each step, the SAD of the new checking points has to be computed. We can parallelize by having threads compute the SADs of all the points in the sub-image up front.
• Then, when a step completes and the SADs for the new checking points are needed, we already have them computed by the threads in the previous step.
• Drawbacks of this strategy:
  - Not getting a considerable amount of speedup
  - Lots of data transfer between host and device
  - More complicated implementation
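
This alternative strategy can be illustrated with a precomputed SAD table. In the sketch below, a nested list comprehension stands in for the threads that would each compute one entry, and the search step then only looks values up. The function names, the sample checking points, and the data are assumptions made for illustration.

```python
def sad(image, window, top, left):
    """Sum of absolute differences at one window placement."""
    return sum(
        abs(image[top + r][left + c] - window[r][c])
        for r in range(len(window))
        for c in range(len(window[0]))
    )


def sad_map(image, window):
    """SAD at every placement; each entry is independent of the others."""
    rows = len(image) - len(window) + 1
    cols = len(image[0]) - len(window[0]) + 1
    return [
        [sad(image, window, top, left) for left in range(cols)]
        for top in range(rows)
    ]


def best_checking_point(table, points):
    """Pick the checking point with the smallest precomputed SAD."""
    return min(points, key=lambda p: table[p[0]][p[1]])


# Tiny illustrative data (hypothetical, not from the slides).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 8, 7, 6],
    [5, 4, 3, 2],
]
window = [[7, 8], [7, 6]]
table = sad_map(image, window)
step_points = [(0, 0), (0, 2), (1, 1), (1, 2), (2, 0)]
best = best_checking_point(table, step_points)
```

The table holds a SAD for every placement even though a fast search only reads a few of them, which is consistent with the limited speedup listed among the drawbacks.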

References
• Deepak Turaga, Mohamed Alkanhal. "Search Algorithms for Block-Matching in Motion Estimation". ECE, CMU. March 06, 2010 <http://www.ece.cmu.edu/~ee899/project/deepak_mid.htm>.
• Chen Lu, Wang. "Diamond Search Algorithm". ECE, U of Texas. March 06, 2010 <http://users.ece.utexas...en-lu-wang/presentation/sld012.htm>.
• Lai-Man Po, Wing-Chung Ma. "A Novel Four-Step Search Algorithm for Fast Block Motion Estimation". JUNE 1996.
• Xuan Jing, Lap-Pui Chau. "An Efficient Three-Step Search Algorithm for Block Motion Estimation". IEEE TRANSACTIONS ON MULTIMEDIA, JUNE 2004: 435-437.

Questions?