
Tracking Visual Object As an Extended Target

A Mini Project Work


Submitted in partial fulfilment of the requirements for the award of the degree
of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING

SUBMITTED
BY
THURAKA SAI - 20EG104155
GADDALA SHARATH - 20EG104119

Under the guidance of


Dr. B. PAVITRA
Assistant Professor

Department of ECE

Department of Electronics and Communication Engineering


ANURAG UNIVERSITY
Venkatapur(V), Ghatkesar(M), Medchal-Malkajgiri Dist-500088
2023-2024
ANURAG UNIVERSITY
Venkatapur(V), Ghatkesar(M), Medchal-Malkajgiri Dist-500088
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

CERTIFICATE

This is to certify that the project report entitled Tracking Visual Object As an Extended Target
being submitted by

THURAKA SAI - 20EG104155


GADDALA SHARATH - 20EG104119

in partial fulfilment of the requirements for the award of the Degree of Bachelor of Technology in
Electronics & Communication Engineering to Anurag University, Hyderabad, and is a record of
bonafide work carried out under my guidance and supervision. The results embodied in this
project report have not been submitted to any other University or Institute for the award of any
Degree or Diploma.

Dr. B. PAVITRA
Assistant Professor
Department of ECE

Prof. N. MANGALA GOURI
Head of the Department

External Examiner
ACKNOWLEDGEMENT

This project is an acknowledgement of the inspiration, drive, and technical assistance
contributed by many individuals. This project would never have seen the light of day without
the help and guidance we have received. We would like to express our gratitude to all the people
behind the scenes who helped us transform an idea into a real application.

It is our privilege and pleasure to express our profound sense of gratitude to Dr. B.
PAVITRA, Assistant Professor, Department of ECE, for her guidance throughout this
dissertation work.

We express our sincere gratitude to Prof. N. Mangala Gouri, Head of the Department,
Electronics and Communication Engineering, for her precious suggestions for the successful
completion of this project. She is also a great source of inspiration for our work.

We would like to express our deep sense of gratitude to Dr. V. Vijaya Kumar, Dean, School
of Engineering, Anurag University, for his tremendous support, encouragement, and
inspiration. Lastly, we thank the almighty and our parents and friends for their constant
encouragement, without which this project would not have been possible. We would also like
to thank all the other staff members, both teaching and non-teaching, who have extended their
timely help and eased our work.

BY

THURAKA SAI - 20EG104155


GADDALA SHARATH - 20EG104119
DECLARATION

We hereby declare that the results embodied in this project report entitled "Tracking
Visual Object As an Extended Target" were obtained by us during the year 2023-2024 in
partial fulfilment of the requirements for the award of the degree of Bachelor of Technology
in Electronics and Communication Engineering from ANURAG UNIVERSITY. We have not
submitted this project report to any other university or institute for the award of any degree.

BY

THURAKA SAI - 20EG104155

GADDALA SHARATH - 20EG104119


ABSTRACT
Most visual object tracking (VOT) algorithms treat the object as a single point in the output
score map, and the bounding box is estimated by a multi-scale search. Further, most of
them are based on the concept of tracking-by-detection, which focuses on the detection
step and ignores the tracking step and the object's dynamics. In this paper, we address these
limitations by developing a new VOT framework. Instead of a point object, we
mathematically model the shape of the extended visual object as an ellipse. We allow
multiple detections in the score map, and derive an elliptical gating method to discard
possible clutter. We apply a sophisticated extended target tracking algorithm to track the
object's kinematic state and shape simultaneously. Experimental results are provided to show
that the proposed algorithm outperforms several state-of-the-art methods.
CONTENTS

Acknowledgement
Abstract
List of Figures
CHAPTER 1: INTRODUCTION
CHAPTER 2: LITERATURE SURVEY
CHAPTER 3: INTRODUCTION TO IMAGE PROCESSING
3.1 Image pixels
3.2 Image file formats
3.3 Fundamental steps in DIP
3.4 Components of an image processing system
CHAPTER 4: DIGITAL IMAGE PROCESSING
4.1 Digital image processing
CHAPTER 5: SOFTWARE INTRODUCTION
5.1 Introduction to MATLAB
5.2 Graphical user interface (GUI)
CHAPTER 6: EXISTING METHODOLOGY
CHAPTER 7: PROPOSED SYSTEM
7.1 Analysis of the single-point CNN-based detector
7.2 Detectors with multiple detection points
CHAPTER 8: RESULT ANALYSIS
CHAPTER 9: CONCLUSION
Future scope
Applications
Appendix
LIST OF FIGURES

1.1 Tracked object and its pseudo labels with different σ
3.0 General image
3.1 Image pixel
3.2 Transparency image
3.3 Resolution image
3.4 Image fundamentals
3.5 Digital camera image
3.6 Digital camera cell
3.7 Image enhancement
3.8 Image restoration
3.9 Color & gray-scale image
3.10 RGB histogram image
3.11 Blur to deblur image
3.12 Image segmentation
3.13 Components of an image processing system


CHAPTER 1
INTRODUCTION

Recently, dynamic projection mapping, which projects images onto a real target object moving
freely, has been attracting considerable attention as a technique providing striking appearance
editing. In the field of entertainment, such as dance and stage plays, projection onto a moving
target can expand the range of expression, and such systems are being introduced gradually. This
projection technique is also expected to be applied in a wide range of fields, for example,
reviewing designs during product development, improving education by helping students
understand the internal structure of objects, and supporting medical practice through
high-presence simulation.
However, projecting images onto a moving target requires estimating the 3D posture of the
target in real time and with high accuracy. Although previous approaches have used a magnetic
sensor or an optical motion capture system for obtaining the posture, these approaches are not
suitable for projection mapping because attaching sensors or markers impairs the appearance of
the targets. Also, camera-based approaches using depth information or feature points of the
target without attaching sensors or markers hinder real-time processing, because feature point
detection and alignment require a large amount of computation. Therefore, in this research, we
propose a camera-based object tracking method using the contours of target objects that is
suitable for real-time processing, because the contour can be obtained at high speed by edge
detection. We make this tracking more robust against the shape of the target object and the
environment outside the target while maintaining the high-speed property of edge detection.
Consequently, we realize dynamic projection mapping that can be applied to various target
objects and is usable as a realistic image expression technique.

1.1 Proposed Method

➢ The proposed method extracts the accurate contours of the target as tracking output, which
achieves a better description of non-rigid objects while reducing background pollution
of the target model.

➢ Moreover, conventional level set models only emphasize regional intensity
consistency and consider no priors.

➢ Differently, the curve evolution of the proposed SLSM is object-oriented and supervised
by specific knowledge of the targets we want to track.

➢ Therefore, the SLSM can ensure a more accurate convergence to the exact targets in
tracking applications.

➢ In particular, we first construct the appearance model for the target in an online boosting
manner due to its strong discriminative power between the object and the background.

➢ Then, the learnt target model is incorporated to model the probabilities of the level set
contour in a Bayesian manner, leading the curve to converge to the candidate region with
the maximum likelihood of being the target.

Object tracking, which refers to the task of generating the trajectories of moving objects
in a sequence of images, is a challenging research topic in the field of computer vision.
The problem and its difficulty depend on several factors, such as the amount of prior
knowledge about the target object and the number and type of parameters being tracked, e.g.,
location, scale, or detailed contour. Although there has been some success with building trackers
for specific object classes, tracking generic real-world objects has remained challenging due to
unstable lighting conditions, pose variations, scale changes, viewpoint changes, camera
noise, etc.

Early tracking methods use a fixed appearance model to describe the target and are therefore
unable to track the target successfully over long periods. To overcome this drawback, some
tracking algorithms try to update the target appearance over time in an online manner.
The appearance models adopted by these methods include histograms, subspace models, and
sparse representation models. Besides, some researchers resort to discriminative learning
methods to make it easier for the trackers to distinguish the target from its background.
Methods based on boosting and SVMs show impressive performance and attract much
attention. In contrast with constructing two separate models for the target and background
respectively, classifier-learning-based approaches are more inclined to seize the most
discriminative properties between them.

Despite their promising performance, these traditional trackers face a practical problem:
they use a rectangular box or an oval to approximate the tracked target.

However, objects in practice may have complex shapes that cannot be well described by such
simple geometric shapes. Since the rectangular box used to represent the tracked
target directly determines the samples to be extracted in the subsequent target appearance
modeling/update step, it is a critical factor for tracking performance.

Inaccurate target representation easily results in performance loss due to the pollution of
non-object regions residing inside the rectangular box. In order to better fit the object shape,
some methods adopt a scale selection mechanism that aims to search for the best scale that
covers the target accurately.

An intuitive idea is to run the algorithm at different scales, then select the one maximizing the
objective function of the tracking algorithm. Further, this selection mechanism has also been
extended to orientation; by simultaneously controlling both the scale and the orientation, the
box can fit the target more tightly.

Ideally, a better way to describe the target is to use the accurate contour along the target's
boundary.

Fig 1.1 Tracked object and its pseudo labels with different σ

Many attempts in the literature have been made to use silhouette, contour, or segmentation
techniques for dynamic tracking. In contrast with the explicit representation of contours in
parametric active contour models, such as the snake model, the level set technique is an implicit
representation of contours and is able to deal with changes in topology. The basic idea of the
level set approach is to embed the contour as the zero level set of the graph of a
higher-dimensional function, and then evolve the graph so that this level set moves according to
the prescribed flow until it minimizes an image-based energy function.

The binary level set model uses a two-valued level set function to replace the signed distance
function used in the traditional Chan-Vese model. Since it avoids the reinitialization of the
level set function in each iteration as well as the cumbersome numerical realization, it greatly
improves computational efficiency and hence is more suitable for tracking tasks.
Nevertheless, from a performance perspective, the binary level set model is more inclined to
segment out regions with consistent intensity, similar to threshold segmentation
methods. Although some works have recently been proposed to apply level set models to visual
tracking, introducing prior target knowledge in a level set formalism remains
challenging, since the level set framework aims at optimally grouping regions whose pixels
have similar feature signatures.

This makes it difficult for level set approaches to reliably segment and track real-world,
multi-mode objects in front of complex, cluttered backgrounds. In this project, we present a
novel supervised level set model (named SLSM) for real-world object contour tracking. Instead
of acting in the intensity-consistent direction, the curve evolution of the SLSM is target-oriented
and supervised by knowledge of the specific targets in the tracking application. A boosting
approach is used for the online construction of the target appearance model due to its strong
ability to distinguish the target from its background. Then the learned target model is
incorporated to model the level set contour probabilities in a Bayesian manner, leading the
curve to converge to the candidate region with the maximum likelihood of being the tracked
target. Finally, samples extracted from the accurate target region are fed back to the boosting
procedure for target appearance update. We use the positive decrease rate to adjust the target
learning pace over time, which enables tracking to continue under partial and total occlusion.
Then we propose the generalized multi-phase SLSM for dealing with multi-target tracking
cases, and show some tracking examples of our method in various challenging scenarios.

CHAPTER 2
LITERATURE SURVEY

1. Ba-Ngu Vo and Ba-Tuong Vo, "Extended Object Tracking", IEEE Transactions on Pattern
Analysis and Machine Intelligence (TPAMI), 2019. This survey provides a comprehensive
overview of the state of the art in extended object tracking, discussing various algorithms,
models, and evaluation metrics.

2. Dirk Schulz and Michael Arens, "Tracking-by-Detection of Extended Objects", International
Journal of Computer Vision (IJCV), 2013. This paper introduces a tracking-by-detection
framework specifically designed for extended object tracking. It discusses the challenges and
presents a solution.

3. Piotr Dollár et al., "Part-Based Tracking of Humans as Corpora of Body Parts", Computer
Vision and Image Understanding (CVIU), 2019. Focusing on human tracking, this paper treats
humans as collections of body parts and uses this approach for improved tracking accuracy.

4. Stephen J. McKenna and Andrew D. Bagdanov, "Part-Based Tracking of Humans as Corpora
of Body Parts", Journal of Field Robotics, 2017. This paper presents a multi-object tracking
framework using quadratic programming, which can be applied to extended objects as well.

5. Amir Reza Khoshkangini et al., "Extended Object Tracking with Gaussian Sum Filter: Models
and Algorithms", IEEE Transactions on Intelligent Transportation Systems (TITS), 2019. This
work discusses the application of the Gaussian sum filter to extended object tracking and
provides insights into its performance.

6. Ba-Ngu Vo et al., "Tracking Multiple Extended Objects with the Multi-Bernoulli Filter",
Machine Vision and Applications (MVA), 2015. The authors propose a multi-object tracking
framework based on the multi-Bernoulli filter, which can handle multiple extended objects
efficiently.

7. Shuai Yi et al., "Deep Learning for Extended Object Tracking", Pattern Recognition (PR),
2015. Although not exclusively focused on extended objects, this survey discusses how deep
learning techniques are applied to multi-object tracking, which can be extended to handle
object extension.

8. Peter M. Kögler et al., "Extended Object Tracking in 3D Point Clouds", Sensors, 2016. This
paper explores extended object tracking in 3D point clouds, which is particularly relevant in
applications such as autonomous driving and robotics.

9. A. Alahi et al., "Sensor Fusion for Extended Object Tracking", IEEE Transactions on
Intelligent Transportation Systems (ITS), 2017. This entry discusses the fusion of radar and
camera data for tracking extended objects, which is crucial in advanced driver-assistance
systems.
CHAPTER 3
INTRODUCTION TO IMAGE PROCESSING

INTRODUCTION

3.1 IMAGE PIXELS

An image is a two-dimensional picture that has a similar appearance to some subject,
usually a physical object or a person.

An image may be two-dimensional, such as a photograph or a screen display, or
three-dimensional, such as a statue. Images may be captured by optical devices, such as cameras,
mirrors, lenses, telescopes, and microscopes, and by natural objects and phenomena, such as the
human eye or water surfaces.

The word image is also used in the broader sense of any two-dimensional figure such
as a map, a graph, a pie chart, or an abstract painting. In this wider sense, images can also be
rendered manually, such as by drawing, painting, or carving, rendered automatically by printing
or computer graphics technology, or developed by a combination of methods, especially in a
pseudo-photograph.

Fig 3.0 General image

An image is a rectangular grid of pixels. It has a definite height and a definite width counted in
pixels. Each pixel is square and has a fixed size on a given display. However, different computer
monitors may use different sized pixels. The pixels that constitute an image are ordered as a
grid (columns and rows); each pixel consists of numbers representing magnitudes of brightness
and color.

Fig 3.1 Image pixel

Each pixel has a color. The color is a 32-bit integer. The first eight bits determine the
redness of the pixel, the next eight bits the greenness, the next eight bits the blueness, and the
remaining eight bits the transparency of the pixel.

Fig 3.2 Transparency image
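
As a small illustration of this packing, the following MATLAB sketch unpacks one 32-bit pixel
value into its four 8-bit channels, assuming the most significant byte holds the red component
(the byte order in a real file format may differ):

c = uint32(hex2dec('FF8040C0'));           % example 32-bit pixel value
r = bitand(bitshift(c, -24), uint32(255)); % redness (top 8 bits, assumed)
g = bitand(bitshift(c, -16), uint32(255)); % greenness
b = bitand(bitshift(c,  -8), uint32(255)); % blueness
a = bitand(c, uint32(255));                % transparency (alpha)
fprintf('R=%d G=%d B=%d A=%d\n', r, g, b, a);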

3.2 IMAGE FILE FORMATS:

Image file formats are standardized means of organizing and storing images. This entry
is about digital image formats used to store photographic and other images. Image files are
composed of either pixel or vector (geometric) data, which are rasterized to pixels when
displayed (with few exceptions, such as in a vector graphic display). Including proprietary
types, there are hundreds of image file types. The PNG, JPEG, and GIF formats are most often
used to display images on the Internet.

Fig 3.3 Resolution image

In addition to straight image formats, Metafile formats are portable formats which can include both
raster and vector information. The metafile format is an intermediate format.
Most Windows applications open metafiles and then save them in their own native format.

3.3 FUNDAMENTAL STEPS IN DIGITAL IMAGE PROCESSING:

Digital image processing focuses on two major tasks: the improvement of pictorial
information for human interpretation, and the processing of image data for storage, transmission,
and representation for autonomous machine perception.

Fig 3.4 Image fundamental

3.4 Image Acquisition:

Image acquisition is to acquire a digital image. To do so requires an image sensor and
the capability to digitize the signal produced by the sensor. The sensor could be a monochrome
or color TV camera that produces an entire image of the problem domain every 1/30 s. The
image sensor could also be a line-scan camera that produces a single image line at a time; in
this case, the object's motion past the line

Fig 3.5 Digital camera image

scanner produces a two-dimensional image. If the output of the camera or other imaging sensor
is not in digital form, an analog-to-digital converter digitizes it.

The nature of the sensor and the image it produces are determined by the application.

Fig 3.6 Digital camera cell

3.7 Image Enhancement:

Image enhancement is among the simplest and most appealing areas of digital image
processing. Basically, the idea behind enhancement techniques is to bring out detail that is
obscured, or simply to highlight certain features of interest in an image. A familiar example of
enhancement is when we increase the contrast of an image because "it looks better." It is
important to keep in mind that enhancement is a very subjective area of image processing.

Fig 3.7 Image enhancement
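
As a hedged example of such contrast enhancement, the following MATLAB sketch uses the
Image Processing Toolbox demo image 'pout.tif'; the choice of image and methods is
illustrative only:

I = imread('pout.tif');          % low-contrast grayscale demo image
J = imadjust(I);                 % stretch intensities to the full range
K = histeq(I);                   % alternative: histogram equalization
figure; montage({I, J, K});      % compare original and enhanced versions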

3.8 Image restoration:

Image restoration is an area that also deals with improving the appearance of an image.
However, unlike enhancement, which is subjective, image restoration is objective, in the sense
that restoration techniques tend to be based on mathematical or probabilistic models of image
degradation.

Fig 3.8 Image restoration

Enhancement, on the other hand, is based on human subjective preferences regarding
what constitutes a "good" enhancement result. For example, contrast stretching is considered
an enhancement technique because it is based primarily on the pleasing aspects it might present
to the viewer, whereas removal of image blur by applying a deblurring function is considered
a restoration technique.
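
The following sketch illustrates this objective, model-based flavor of restoration in MATLAB:
a known motion blur is simulated and then removed by Wiener deconvolution. The blur kernel
and noise-to-signal ratio are illustrative assumptions.

I   = im2double(imread('cameraman.tif'));
PSF = fspecial('motion', 21, 11);            % assumed motion-blur kernel
B   = imfilter(I, PSF, 'conv', 'circular');  % simulate the degradation
R   = deconvwnr(B, PSF, 0.001);              % Wiener deconvolution (assumed NSR)
figure; montage({I, B, R});                  % original, blurred, restored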

3.9 Color image processing:

The use of color in image processing is motivated by two principal factors. First, color
is a powerful descriptor that often simplifies object identification and extraction from a scene.
Second, humans can discern thousands of color shades and intensities, compared to only about
two dozen shades of gray. This second factor is particularly important in manual image
analysis.

Fig 3.9 Color & Gray scale image

3.10 Wavelets and multiresolution processing:

Wavelets are the foundation for representing images in various degrees of resolution. Although
the Fourier transform has been the mainstay of transform-based image processing since the
late 1950s, a more recent transformation, called the wavelet transform, is now making it
even easier to compress, transmit, and analyze many images. Unlike the Fourier transform,
whose basis functions are sinusoids, wavelet transforms are based on small waves, called
wavelets, of varying frequency and limited duration.

Fig 3.10 RGB histogram image

Wavelets were first shown to be the foundation of a powerful new approach to signal
processing and analysis called multiresolution theory. Multiresolution theory incorporates and
unifies techniques from a variety of disciplines, including subband coding from signal
processing, quadrature mirror filtering from digital speech recognition, and pyramidal image
processing.
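
As a brief illustration (assuming the Wavelet Toolbox is available), the following sketch
performs a single-level 2-D Haar decomposition, producing one approximation subband and
three detail subbands:

I = im2double(imread('cameraman.tif'));
[cA, cH, cV, cD] = dwt2(I, 'haar');      % approximation + H/V/D detail subbands
figure;
subplot(2,2,1); imshow(cA, []); title('approximation');
subplot(2,2,2); imshow(cH, []); title('horizontal detail');
subplot(2,2,3); imshow(cV, []); title('vertical detail');
subplot(2,2,4); imshow(cD, []); title('diagonal detail');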

3.11 Compression:

Compression, as the name implies, deals with techniques for reducing the storage
required to save an image, or the bandwidth required to transmit it. Although storage
technology has improved significantly over the past decade, the same cannot be said for
transmission capacity. This is true particularly in uses of the Internet, which are characterized
by significant pictorial content. Image compression is familiar to most users of computers in
the form of image file extensions, such as the .jpg extension used in the JPEG (Joint
Photographic Experts Group) image compression standard.
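
A quick way to see this trade-off in MATLAB is to write the same image at different JPEG
quality settings and compare the resulting file sizes; the image and quality values below are
arbitrary examples:

I = imread('peppers.png');
imwrite(I, 'peppers_q90.jpg', 'Quality', 90);   % mild compression
imwrite(I, 'peppers_q10.jpg', 'Quality', 10);   % heavy compression
d90 = dir('peppers_q90.jpg'); d10 = dir('peppers_q10.jpg');
fprintf('q90: %d bytes, q10: %d bytes\n', d90.bytes, d10.bytes);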

3.12 Morphological processing:

Morphological processing deals with tools for extracting image components that are
useful in the representation and description of shape. The language of mathematical
morphology is set theory. As such, morphology offers a unified and powerful approach to
numerous image processing problems. Sets in mathematical morphology represent objects in
an image. For example, the set of all black pixels in a binary image is a complete morphological
description of the image.

Fig 3.11 Blur to deblur image

In binary images, the sets in question are members of the 2-D integer space Z², where
each element of a set is a 2-D vector whose coordinates are the (x, y) coordinates of a black (or
white) pixel in the image. Gray-scale digital images can be represented as sets whose
components are in Z³. In this case, two components of each element of the set refer to the
coordinates of a pixel, and the third corresponds to its discrete gray-level value.
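
For illustration, the following MATLAB sketch applies the two elementary morphological
operators, erosion and dilation, to a demo binary image with a disk-shaped structuring element:

BW = imread('circles.png');            % binary demo image in the toolbox
se = strel('disk', 5);                 % disk-shaped structuring element
er = imerode(BW, se);                  % erosion shrinks foreground objects
di = imdilate(BW, se);                 % dilation grows foreground objects
figure;
subplot(1,3,1); imshow(BW); title('original');
subplot(1,3,2); imshow(er); title('eroded');
subplot(1,3,3); imshow(di); title('dilated');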

3.13 Segmentation:

Segmentation procedures partition an image into its constituent parts or objects. In
general, autonomous segmentation is one of the most difficult tasks in digital image processing.
A rugged segmentation procedure brings the process a long way toward successful solution of
imaging problems that require objects to be identified individually.

Fig 3.12 Image segmentation

On the other hand, weak or erratic segmentation algorithms almost always guarantee eventual
failure. In general, the more accurate the segmentation, the more likely recognition is to
succeed.
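
A minimal segmentation sketch in MATLAB, using Otsu's global threshold on a toolbox demo
image, is shown below:

I  = imread('coins.png');              % grayscale demo image
t  = graythresh(I);                    % Otsu threshold in [0, 1]
BW = imbinarize(I, t);                 % segment foreground from background
figure; imshowpair(I, BW, 'montage');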

3.4 COMPONENTS OF AN IMAGE PROCESSING SYSTEM:

As recently as the mid-1980s, numerous models of image processing systems being sold
throughout the world were rather substantial peripheral devices that attached to equally
substantial host computers. Late in the 1980s and early in the 1990s, the market shifted to image
processing hardware in the form of single boards designed to be compatible with industry
standard buses and to fit into engineering workstation cabinets and personal computers. In
addition to lowering costs, this market shift also served as a catalyst for a significant number
of new companies whose specialty is the development of software written specifically for
image processing.

[Figure: block diagram of a general-purpose image processing system, showing the network,
image displays, computer, mass storage, hardcopy device, specialized image processing
hardware, image processing software, image sensors, and the problem domain.]

Fig 3.13 Components of an image processing system

Although large-scale image processing systems are still being sold for massive imaging
applications, such as the processing of satellite images, the trend continues toward the
miniaturizing and blending of general-purpose small computers with specialized image
processing hardware.
CHAPTER 4

DIGITAL IMAGE PROCESSING

4. Digital image processing

Background:

Digital image processing is an area characterized by the need for extensive experimental
work to establish the viability of proposed solutions to a given problem. An important
characteristic underlying the design of image processing systems is the significant level of
testing & experimentation that normally is required before arriving at an acceptable solution.
This characteristic implies that the ability to formulate approaches & quickly prototype
candidate solutions generally plays a major role in reducing the cost & time required to arrive
at a viable system implementation.

An image may be defined as a two-dimensional function f(x, y), where x & y are
spatial coordinates, & the amplitude of f at any pair of coordinates (x, y) is called the intensity
or gray level of the image at that point. When x, y & the amplitude values of f are all finite
discrete quantities, we call the image a digital image. The field of DIP refers to processing
digital images by means of a digital computer. A digital image is composed of a finite number
of elements, each of which has a particular location & value. The elements are called pixels.

Vision is the most advanced of our senses, so it is not surprising that images play the
single most important role in human perception. However, unlike humans, who are limited to
the visual band of the EM spectrum, imaging machines cover almost the entire EM spectrum,
ranging from gamma rays to radio waves. They can also operate on images generated by sources
that humans are not accustomed to associating with images.

There is no general agreement among authors regarding where image processing stops &
other related areas, such as image analysis & computer vision, start. Sometimes a distinction is
made by defining image processing as a discipline in which both the input & output of a process
are images.

This is a limiting & somewhat artificial boundary. The area of image analysis (image
understanding) is in between image processing & computer vision.
There are no clear-cut boundaries in the continuum from image processing at one end to
complete vision at the other. However, one useful paradigm is to consider three types of
computerized processes in this continuum: low-, mid-, & high-level processes. Low-level
processes involve primitive operations such as image preprocessing to reduce noise, contrast
enhancement, & image sharpening. A low-level process is characterized by the fact that both
its inputs & outputs are images.

Digital image processing plays a significant role in object tracking, a crucial task in computer
vision and surveillance applications. Object tracking involves locating and following objects
in a sequence of frames in a video or a set of images. Here's an overview of the steps involved
in object tracking using digital image processing:
1. Frame Acquisition:

Obtain a sequence of frames (images) from a video or a set of images. These frames will be
processed to track the objects within them.
2. Preprocessing:

Enhance the acquired frames to improve tracking accuracy. This step might involve operations
such as noise reduction, contrast enhancement, and image sharpening.
3. Object Detection and Segmentation:

Use techniques like object detection algorithms (e.g., YOLO, SSD, Faster R-CNN) to identify
and locate objects in the frames. This step may also involve object segmentation to isolate the
detected objects from the background.
4. Feature Extraction:

Extract features from the detected objects, such as color, shape, texture, or keypoints. These
features help in distinguishing and tracking the objects.
5. Object Tracking Algorithm:

Choose a tracking algorithm based on the application and requirements. Common algorithms
include:
Kalman Filter: Predicts the object's location based on previous information and corrects it
using the current measurements (see the Kalman filter sketch at the end of this section).
Mean-Shift Algorithm: Iteratively adjusts the object's position to maximize a density function.
Correlation Filters: Utilizes correlation-based approaches to track objects.

6. Tracking and Updating:

Apply the selected tracking algorithm to track the objects from frame to frame. Update the
object's position and other relevant parameters based on the tracking algorithm's output.
7. Postprocessing:

Refine the tracked object's position or trajectory to improve accuracy. This might involve
filtering, smoothing, or interpolation techniques.
8. Visualization and Analysis:

Display the tracking results on the frames, visualizing the path or trajectory of the tracked
objects. Analyze the tracking data for further insights or actions.
9. Object Re-Identification:

In cases where objects may leave the frame or become occluded, re-identification techniques
can be employed to associate the same object across different frames.
10. End-Goal Application:

Utilize the tracked object information for the specific application, such as surveillance, traffic
monitoring, or human-computer interaction.
Each step in object tracking involves various techniques and methodologies, and the choice
of methods depends on the specific tracking requirements and the nature of the tracked
objects.
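
For illustration of step 5, a minimal constant-velocity Kalman filter sketch in MATLAB follows;
the motion model, noise covariances, and synthetic measurements below are assumptions for
demonstration, not this project's tuned values.

dt = 1;                                       % frame interval
F  = [1 0 dt 0; 0 1 0 dt; 0 0 1 0; 0 0 0 1];  % constant-velocity transition
H  = [1 0 0 0; 0 1 0 0];                      % we measure position only
Q  = 0.01 * eye(4);                           % process noise (assumed)
R  = 4 * eye(2);                              % measurement noise (assumed)
numFrames  = 50;
detections = [linspace(100, 200, numFrames); ...
              linspace(80, 120, numFrames)] + randn(2, numFrames); % synthetic centers
x = [detections(:, 1); 0; 0];                 % initial state [x; y; vx; vy]
P = 10 * eye(4);                              % initial covariance
track = zeros(2, numFrames);
for k = 1:numFrames
    x = F * x;  P = F * P * F' + Q;           % predict
    z = detections(:, k);                     % detector output for this frame
    S = H * P * H' + R;                       % innovation covariance
    K = P * H' / S;                           % Kalman gain
    x = x + K * (z - H * x);                  % correct with the measurement
    P = (eye(4) - K * H) * P;
    track(:, k) = x(1:2);                     % store the filtered center
end
plot(detections(1,:), detections(2,:), '.', track(1,:), track(2,:), '-');
legend('raw detections', 'filtered track');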

CHAPTER 5
Software Introduction
5.1 Introduction to MATLAB

MATLAB is a high-performance language for technical computing. It integrates computation,
visualization, and programming in an easy-to-use environment where problems and solutions
are expressed in familiar mathematical notation. Typical uses include:

Math and computation

Algorithm development

Data acquisition

Modeling, simulation, and prototyping

Data analysis, exploration, and visualization

Scientific and engineering graphics

Application development, including graphical user interface building

MATLAB is an interactive system whose basic data element is an array that does not require
dimensioning. This allows you to solve many technical computing problems, especially those
with matrix and vector formulations, in a fraction of the time it would take to write a program
in a scalar non-interactive language such as C or FORTRAN.

The name MATLAB stands for MATrix LABoratory. MATLAB was originally written to
provide easy access to matrix software developed by the LINPACK and EISPACK projects.
Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state
of the art in software for matrix computation.

MATLAB has evolved over a period of years with input from many users. In university
environments, it is the standard instructional tool for introductory and advanced courses in
mathematics, engineering, and science. In industry, MATLAB is the tool of choice for
high-productivity research, development, and analysis.

MATLAB features a family of application-specific solutions called toolboxes. Very important
to most users of MATLAB, toolboxes allow you to learn and apply specialized technology.
Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the
MATLAB environment to solve particular classes of problems. Areas in which toolboxes are
available include signal processing, control systems, neural networks, fuzzy logic, wavelets,
simulation, and many others.

5.2 GRAPHICAL USER INTERFACE (GUI):

MATLAB's Graphical User Interface Development Environment (GUIDE) provides a rich set
of tools for incorporating graphical user interfaces (GUIs) in M-functions. Using GUIDE, the
processes of laying out a GUI (i.e., its push buttons, pop-up menus, etc.) and programming the
operation of the GUI are divided conveniently into two easily managed and relatively
independent tasks. The resulting graphical M-function is composed of two identically named
(ignoring extensions) files:

• A file with extension .fig, called a FIG-file, that contains a complete graphical description
of all the function's GUI objects or elements and their spatial arrangement. A FIG-file contains
binary data that does not need to be parsed when the associated GUI-based M-function is
executed.

• A file with extension .m, called a GUI M-file, which contains the code that controls the GUI
operation. This file includes the functions that are called when the GUI is launched and exited,
and the callback functions that are executed when a user interacts with GUI objects, for
example, when a button is pushed.

To launch GUIDE from the MATLAB command window, type

guide filename

where filename is the name of an existing FIG-file on the current path. If filename is omitted,
GUIDE opens a new (i.e., blank) window.

A graphical user interface (GUI) is a graphical display in one or more windows containing
controls, called components, that enable a user to perform interactive tasks. The user of the
GUI does not have to create a script or type commands at the command line to accomplish the
tasks. Unlike coding programs to accomplish tasks, the user of a GUI need not understand the
details of how the tasks are performed.

GUI components can include menus, toolbars, push buttons, radio buttons, list boxes, and
sliders, just to name a few. GUIs created using MATLAB tools can also perform any type of
computation, read and write data files, communicate with other GUIs, and display data as tables
or as plots.
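
As a tiny programmatic illustration (independent of GUIDE), the following sketch creates a
window with one push button whose callback fires when the user clicks it; the strings and
layout values are arbitrary:

f = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
uicontrol(f, 'Style', 'pushbutton', 'String', 'Track', ...
    'Position', [20 20 80 30], ...
    'Callback', @(src, evt) disp('Track button pushed'));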

5.3 MATLAB Image Processing Toolbox

Get Started
Learn the basics of Image Processing Toolbox

Import, Export, and Conversion


Image data import and export, conversion of image types and classes

Display and Exploration


Interactive tools for image display and exploration

Geometric Transformation and Image Registration


Scale, rotate, perform other N-D transformations, and align images using intensity correlation,
feature matching, or control point mapping

Image Filtering and Enhancement


Contrast adjustment, morphological filtering, deblurring, ROI-based processing

Image Segmentation and Analysis


Region analysis, texture analysis, pixel and image statistics

Deep Learning for Image Processing
Perform image processing tasks, such as removing image noise and performing image-to-
image translation, using deep neural networks (requires Deep Learning Toolbox™)

3-D Volumetric Image Processing


Filter, segment, and perform other image processing operations on 3-D volumetric data

Hyperspectral Image Processing


Import, export, process, and visualize hyperspectral data

Code Generation and GPU Support


Generate C code, HDL code, and MEX functions, and run image processing code on a graphics
processing unit (GPU)

CHAPTER 6
EXISTING METHODOLOGY
Tracking visual objects as extended targets requires specialized methodologies to overcome the
unique challenges presented by their size, shape, and dynamic behavior. Multiple Hypothesis
Tracking (MHT) is a probabilistic framework that concurrently considers multiple object
trajectory hypotheses, effectively handling uncertainties. Markov Chain Monte Carlo (MCMC)
methods, such as Gibbs sampling and Metropolis-Hastings algorithm, iteratively estimate
object states by sampling the state space based on acquired observations, making them adept
at handling complex motion models and uncertainties. Random Finite Set (RFS) filters, like
Probability Hypothesis Density (PHD) filters and Cardinalized PHD (CPHD) filters, enable
the tracking of multiple extended objects in cluttered environments by estimating object
numbers and states. Shape-based tracking algorithms use object shape information to match
observed and predicted shapes, aiding accurate tracking. Level Set Methods effectively
segment and track extended object boundaries by evolving their contours over time. Particle
Filters and Sequential Monte Carlo (SMC) methods use particles to represent possible object
states and update their weights based on observations, facilitating robust tracking even in non-
linear scenarios. Deep learning-based approaches, employing Convolutional Neural Networks
(CNNs) and Recurrent Neural Networks (RNNs), have significantly impacted extended object
tracking by enabling object detection, segmentation, and motion modeling. Model-based
tracking approaches utilize mathematical or physical models of object behavior and appearance
for state prediction and tracking. Lastly, Extended Kalman Filter (EKF) and Unscented Kalman
Filter (UKF) variants provide effective tracking for extended objects with non-linear dynamics
by iteratively updating the object state based on measurements. These methodologies can be
combined and tailored to specific tracking scenarios to achieve accurate and efficient tracking
of extended visual objects.

Tracking visual objects as extended targets involves specialized methodologies to handle the
unique challenges posed by the size, shape, and dynamics of these objects. Here is an overview
of existing methodologies in tracking extended targets:

1. Multiple Hypothesis Tracking (MHT)

MHT is a probabilistic framework that allows tracking of multiple potential object trajectories
simultaneously. It maintains multiple hypotheses about the object's state and updates them as
new data is acquired. MHT is effective for handling uncertainties and complex scenarios,
including extended object tracking.

2. Markov Chain Monte Carlo (MCMC) Methods

MCMC methods, like Gibbs sampling and Metropolis-Hastings algorithm, are used to estimate
the state of extended objects. They iteratively sample the state space to find the most probable
state given the observations. MCMC-based methods can handle complex motion models and
measurement uncertainties.

3. Random Finite Set (RFS) Filters

RFS filters, such as Probability Hypothesis Density (PHD) filters and Cardinalized PHD
(CPHD) filters, are used for tracking multiple extended objects in cluttered environments.
These filters estimate the number and states of objects, making them suitable for tracking
extended targets.

4. Shape-Based Tracking Algorithms

Some methodologies use shape information to track extended objects. This involves modeling
the shape of the object and utilizing shape-based metrics or models for tracking. Matching the
observed shape with the predicted shape helps in tracking the object accurately.
5. Level Set Methods

Level set methods are utilized to segment and track the contours or boundaries of extended
objects. By evolving the object boundary over time using level set equations, these methods
can handle shape variations and deformations.

6. Particle Filters and Sequential Monte Carlo (SMC) Methods

Particle filters and SMC methods are widely used for tracking extended objects. They handle
non-linear motion models and measurement models effectively. Each particle represents a
possible state of the object, and their weights are updated based on observations.
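
To make the propagate/weight/resample cycle concrete, the following is a minimal bootstrap
particle filter sketch in MATLAB for a toy 1-D random-walk state; the motion model, noise
levels, and measurements are illustrative assumptions:

numSteps     = 50;
R            = 1;                              % measurement noise variance (assumed)
truth        = cumsum(randn(1, numSteps));     % hidden random-walk state
measurements = truth + sqrt(R) * randn(1, numSteps);
N         = 500;                               % number of particles
particles = randn(1, N) * 5;                   % initial spread
estimate  = zeros(1, numSteps);
for k = 1:numSteps
    particles = particles + randn(1, N);       % propagate with the motion model
    w = exp(-0.5 * (measurements(k) - particles).^2 / R);  % likelihood weights
    w = w / sum(w);                            % normalize
    estimate(k) = sum(w .* particles);         % weighted-mean state estimate
    % systematic resampling to avoid weight degeneracy
    c = cumsum(w);
    u = rand/N + (0:N-1)/N;
    idx = zeros(1, N); j = 1;
    for i = 1:N
        while c(j) < u(i), j = j + 1; end
        idx(i) = j;
    end
    particles = particles(idx);
end
plot(1:numSteps, truth, 1:numSteps, estimate);
legend('truth', 'PF estimate');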

7. Deep Learning-Based Approaches

Deep learning has revolutionized extended object tracking. Convolutional Neural Networks
(CNNs) are employed for object detection and segmentation. Recurrent Neural Networks and
Long Short-Term Memory (LSTM) networks are used for object tracking over time.

8. Model-Based Tracking Approaches

Model-based approaches formulate mathematical or physical models of the object's behavior
and appearance. These models are then used to predict the object's state and track it over time.
They can incorporate aspects like shape, motion, and appearance.

9. Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF)

EKF and UKF are variants of the Kalman filter and are used for tracking extended objects with
non-linear dynamics. They estimate the state of the object, including position, velocity, and
shape parameters, by iteratively updating the state based on measurements.

These methodologies can be combined, customized, or adapted based on the specific tracking
scenario and requirements to achieve effective and robust tracking of extended visual objects.
The choice of methodology often depends on the nature of the object being tracked, the
available sensor data, and the desired level of accuracy and efficiency.

CHAPTER 7

PROPOSED SYSTEM

The proposed detection network is based on the popular ATOM tracker. Here, we use a
pre-trained ResNet as the backbone network, represented as φ(·), and the convolutional
classifier is trained online during the tracking process. The classifier is constructed from two
fully convolutional layers. The score map for each search image is calculated by a
convolutional operation, sθ(x) = wθ ∗ φ(x), where wθ is the weight of the convolution kernel
and φ(x) represents the feature extracted from the search image by the pre-trained backbone
network. wθ is learned online from a series of tracked target appearances and pseudo labels.
The loss function is the squared difference between the output score s and the pseudo label α
at each evaluated location y ∈ R²: L(s, α) = (s − α)². The pseudo labels are constructed using
a Gaussian function centered at y*, the location where the score map has the highest score:
α(y) = exp(−‖y − y*‖² / (2σ²)).
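
For illustration, a small MATLAB sketch of building such a Gaussian pseudo-label map
follows; the map size, σ, and peak location are assumed values, not the ones used in the
reported experiments:

H = 23; W = 23;                  % assumed score-map size
sigma = 2;                       % assumed label width
ys = 12; xs = 12;                % assumed peak location y* (row, column)
[X, Y] = meshgrid(1:W, 1:H);
alphaMap = exp(-((X - xs).^2 + (Y - ys).^2) / (2 * sigma^2));
% alphaMap peaks at (ys, xs) and decays with distance, so the online loss
% (s - alphaMap).^2 penalizes scores that deviate from the Gaussian label.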

7.1 Analysis of the Single-Point CNN-Based Detector

Since the whole network is built from fully convolutional layers, strict translation invariance
is maintained between the score map and the search image, which ensures that the pseudo
label works for both training and detection. A detector constructed on this concept has
inherent restrictions. Target-center detection is an ill-defined problem, due to appearance
changes, occlusion, and camera orientation, and the detected target center is not equivalent to
the center of the true extended target. A small σ encourages the classifier to output detections
only in a small area of the object and to ignore the other content. Padding in a deep CNN
destroys the translation invariance that exists only in non-padding networks, and it leads to a
bias in center-point estimation. As a result, detecting only the point with the highest score in
the score map may provide a biased estimate of the target center.

7.2 Detectors with Multiple Detection Points
To avoid these limitations, we propose a detector with multiple detection outputs instead of a
single detection. The multiple detection points are the cells of the score map whose values
exceed a certain threshold ζ. In this paper, the pseudo labels are created with a larger σ, so that
they cover the whole target area in the online training samples. From the raw score map, we
first apply thresholding to obtain multiple detections, which are projected back to the original
search image. The dimension of the score map is usually smaller than that of the search image,
so a certain translation is performed to project the detections back to the search image. The red
dots represent the detections in the search image, and the green ellipse is the extended target.
We can use the receptive-field concept to explain the detection process: the receptive field in
a fully convolutional network is the region of the input space that corresponds to a particular
cell in the final score map. The blue region is the target we want to distinguish in the search
image. If a cell in the score map is marked as a detection, the corresponding receptive field in
the input image has a high probability of overlapping with the object. The size of the receptive
field depends on the structure of the CNN. The visual object is modeled as an elliptical
extended target. The kinematic state r_k and the shape parameter p_k of the object at time k
are defined as

r_k = [m_k^T, ṁ_k^T]^T ∈ R^4,    p_k = [γ_k, l_k,1, l_k,2]^T ∈ R^3,

where m_k is the object center, ṁ_k its velocity, γ_k the orientation angle, and l_k,1, l_k,2 the
semi-axis lengths of the ellipse.
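
The following MATLAB sketch illustrates this thresholding-and-projection step; the score
map, threshold ζ, stride, and offset are illustrative assumptions, since the exact network
geometry is not reproduced here:

scoreMap = rand(23);                  % stand-in for the classifier output
zeta     = 0.9;                       % detection threshold
stride   = 16; offset = 8;            % assumed network stride / alignment
[r, c]   = find(scoreMap > zeta);     % score-map cells above the threshold
detX     = offset + (c - 1) * stride; % column index -> x in the search image
detY     = offset + (r - 1) * stride; % row index    -> y in the search image
detections = [detX.'; detY.'];        % 2 x N detection points, ready for gating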

CHAPTER 8
RESULT ANALYSIS

Fig 8.1: Input image from a video frame

Prediction: For a given frame, the first task is to predict the object's location.
Extract Search Image: extract a small patch as the search image from the entire image, as
illustrated in Fig 8.1.
Detection: The detection network uses this search image to get multiple detections.

Fig 8.2: Object tracking using elliptical gating


Gating: We use an elliptical gate to filter out clutter and other detections far away from
the predicted target center, and keep the detections inside the gate.
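
A minimal sketch of such an elliptical (Mahalanobis) gate in MATLAB follows; a detection z
is kept if (z − ẑ)' * inv(S) * (z − ẑ) ≤ γ, where ẑ is the predicted center and S the innovation
covariance. The numbers below are illustrative assumptions.

zhat  = [120; 95];                    % predicted target center
S     = [25 5; 5 16];                 % innovation covariance (assumed)
gamma = 9.21;                         % chi-square gate, ~99% for 2 DOF
Z     = [118 140 122; 97 60 93];      % candidate detections (2 x N)
diffs = Z - zhat;                     % implicit expansion (R2016b+)
d2    = sum(diffs .* (S \ diffs), 1); % squared Mahalanobis distance per column
gated = Z(:, d2 <= gamma);            % detections kept inside the gate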

Kinematic state update: the gated detections are then used to update both the kinematic and
shape states according to the MEM-EKF algorithm.
Shape state update: the shape representation captures the object's geometry, allowing for
more accurate tracking.

Fig 8.3: Tracking of an object with a bounding box

⚫ The above figures are obtained after converting the input video into frames using MATLAB.
From the video, we observe the change in the object's position across the different
frames. We select the object to track by drawing a small rectangular box, called the
bounding box, in one of the frames, and then track the selected object.

⚫ Tracking visual extended objects is challenging due to size variations, occlusions,
deformations, and multiple objects. Techniques involve motion models, sensor fusion,
deep learning, and shape estimation. Evaluation metrics include IoU and AP.
Applications include autonomous driving and surveillance. Challenges include
appearance changes and real-time tracking. Future directions focus on improving
accuracy and integrating 3D information.

CHAPTER 9

CONCLUSION

In summary, single-point CNN detectors like YOLO offer rapid, real-time object detection.
They excel in applications such as autonomous driving, surveillance, and retail. Despite
challenges with small object detection and precision limitations, their efficiency and
adaptability make them a key player in computer vision, likely to continue evolving and
transforming various industries.

This report also discusses the most suitable deep-learning models for real-time object
detection and recognition and evaluates the performance of these algorithms on the detection
and recognition of construction vehicles at a scaled site.

FUTURE SCOPE

• Future focus: Enhanced accuracy and speed in real-time processing.


• Attention mechanisms: Integration for better focus on relevant regions.
• Multi-modal integration: Combining data from different sources for richer
understanding.
• Object tracking: Integration for robust temporal understanding.
• Edge device optimization: Efficient deployment on devices with limited resources.
• Ethical considerations: Addressing biases and ensuring fairness in predictions.
• Continual learning: Models adapting and improving over time with new data.
• Customization: Easy adaptability and tailoring for specific applications.
• Few-shot learning: Effective object detection with limited annotated data.
• Context integration: Leveraging contextual information for more accurate detection.

APPLICATIONS

A Single Point CNN-Based Detector, often referred to as a single-shot object detector, is a type
of deep learning model designed for object detection tasks. Unlike traditional two-step
approaches that involve region proposal networks and classification, single point detectors can
detect and classify objects in a single forward pass through the neural network. One popular
example of a single-point CNN-based detector is the You Only Look Once (YOLO)
architecture. Below is an analysis of the advantages of single-point CNN-based detectors:

1. Efficiency and Speed:

One of the primary advantages of single-point CNN-based detectors is their efficiency and
speed. Since these detectors perform detection and classification in a single pass through the
network, they are significantly faster compared to two-step detection models. Real-time or
near-real-time object detection can be achieved, making them ideal for applications where
speed is crucial, such as in autonomous vehicles and surveillance systems.

2. Unified Framework:

Single point detectors provide a unified end-to-end framework for object detection. They
simultaneously predict the object's bounding box coordinates and class scores in a single
network, streamlining the detection process. This simplifies the architecture and makes training
and deployment more straightforward, enhancing overall model efficiency and manageability.

3. Improved Accuracy:

Single point detectors often achieve competitive accuracy and precision compared to
traditional methods, especially in detecting smaller objects or multiple objects within an image.
Advances in network architectures and optimization techniques have helped in enhancing the
accuracy of single point detectors, closing the performance gap with two-step approaches.

4. Reduced False Positives:

The unified nature of single point detectors can lead to fewer false positives since the model
considers objectness and class predictions together. This can result in more reliable and
accurate object detection, reducing the number of incorrect detections and improving the
model's performance in practical applications.

5. Object Localization:

Single point detectors excel in precise object localization, as they predict the object's
bounding box coordinates with high accuracy. This is vital in various applications, particularly
in scenarios where the accurate localization of objects is crucial, such as in medical imaging or
robotics.

6. Memory Efficiency:

Single point detectors typically have a fixed-size grid for predictions, resulting in a fixed
number of output tensors regardless of the number of objects in the image. This contributes to
memory efficiency during inference, allowing the model to be deployed on devices with limited
computational resources.

7. Scalability and Flexibility:

The single-point detection approach allows for easy scaling to different input image sizes
without significant modifications to the model architecture. This flexibility is essential for
adapting the model to various use cases and deployment scenarios.

8. Adaptability to Real-time Applications:

Due to their efficiency and speed, single-point CNN-based detectors are highly suitable for
real-time applications such as video analysis, object tracking, and live video processing. Their
ability to provide rapid and accurate detection makes them ideal for time-sensitive tasks.

In summary, single-point CNN-based detectors offer a compelling solution for object detection
tasks, balancing efficiency, speed, accuracy, and ease of implementation. Their advantages
make them a popular choice for a wide range of computer vision applications.

Single-point CNN-based detectors, such as YOLO (You Only Look Once), find applications in
various domains due to their speed, efficiency, and accuracy in object detection. Here are
notable applications:

1. Autonomous Driving and Vehicles:

Real-time object detection for collision avoidance, pedestrian detection, and traffic sign
recognition.

2. Surveillance and Security:

Intrusion detection, people counting, and monitoring suspicious activities in crowded areas.

3. Retail and Inventory Management:

Automated inventory tracking, retail analytics, and monitoring customer traffic within stores.

4. Medical Imaging:

Identifying anomalies in medical images, tumor detection, and assisting radiologists in
diagnostics.

5. Agriculture:

Crop monitoring, disease detection in plants, and automated pest control.

APPENDIX

%%% Fuzzy color difference histogram (FCDH) based object tracking

%% clear memory & command window
clc;
clear all;
close all;

%% initialize
input_video = VideoReader('6.avi');
data = get(input_video, 'numberOfFrames');    % total number of frames (not used below)
starting_frame = 2;
end_frame = 200;
matrix = 8;                                   % quantization level
img = read(input_video, starting_frame);
output = zeros(end_frame - starting_frame + 1, 2);
figure(); imshow(img);
centroid_data = round(getrect(figure(1)));    % user-drawn initial box [x y w h]
% or = [67,405,43,80];
corner_data = round([centroid_data(2) + centroid_data(4)/2, ...
                     centroid_data(1) + centroid_data(3)/2]);  % box center [row col]
% close Figure 1

%%
for loop = starting_frame:end_frame

    img = read(input_video, loop);
    [red, green, blue] = sarea(img, centroid_data);   % sample the target region
    I = qnt(img, matrix);                             % quantize the frame colors

    %% fuzzy color difference histogram (FCDH) target model
    if loop == starting_frame
        SD = liklihood_ratio(I, red, green, matrix);
        grouping = map(I, SD, blue);
        ac = avcol(img, grouping);                    % average color of the target
    else
        SDt = liklihood_ratio(I, red, green, matrix);
        Mt = map(I, SDt, blue);
        act = avcol(img, Mt);
        if abs(ac - act) > (0.05 * 256)               % update model on a large color change
            SD = SDt;
            ac = act;
        end
    end

    %% detection: iterate the centroid update a few times
    for mi = 1:5
        grouping = map(I, SD, blue);

        %% centroid of the detected pixels
        CDp = corner_data;
        [Cnr, Cnc, v] = find(grouping ~= 0);
        red_data = sum(Cnr);
        green_data = sum(Cnc);
        blue_data = sum(v);
        if blue_data == 0
            corner_data = CDp;                        % nothing detected: keep the old center
        else
            corner_data = round([green_data/blue_data red_data/blue_data]);
        end
        centroid_data = round([corner_data(1) - centroid_data(3)/2 ...
            corner_data(2) - centroid_data(4)/2 centroid_data(3) centroid_data(4)]);
        % Delta = norm(corner_data - CDp);
        % if Delta < 2
        %     break
        % end
    end
    % output(loop-starting_frame+1,:) = corner_data;

    %% show the tracked box and the segmented patch
    Im = im2double(img);
    Oi = grouping(centroid_data(2)+1:centroid_data(2)+centroid_data(4), ...
                  centroid_data(1)+1:centroid_data(1)+centroid_data(3));
    Im(1:centroid_data(4), 1:centroid_data(3), :) = double(cat(3, Oi, Oi, Oi));
    imshow(Im)
    hold on
    title([num2str(loop-starting_frame+1), '/', num2str(end_frame-starting_frame+1)]);
    rectangle('Position', [centroid_data(1), centroid_data(2), ...
        centroid_data(3), centroid_data(4)], 'LineWidth', 8, 'EdgeColor', 'r');
    % plot(output(1:loop-starting_frame+1,1), output(1:loop-starting_frame+1,2), ...
    %     'm', 'LineWidth', 2, 'LineStyle', '--')
    hold off
    pause(0.01)

end

% save output.mat output
