Course Details
❑ Course Description:
▪ Computer vision systems are becoming increasingly important for solving problems in a
variety of areas, including manufacturing and surveillance. It is therefore important
for future Computer Engineering graduates to have solid knowledge of this field. For
this purpose, the course “Computer Vision” provides students with both theoretical
knowledge and practical experience with fundamental and advanced Computer
Vision algorithms. Topics range from basic image processing techniques such as
image convolution and region and edge detection to more complex vision algorithms
for contour detection, depth perception, dynamic vision, and object recognition.
Moreover, core topics like color processing, texture analysis and visual geometry are
covered. In programming assignments, students gain practical insight into the
development of vision applications by implementing Computer Vision algorithms in
Python or MATLAB. Their final project is the development
of their own computer vision program that solves a given problem; this could be
a simple object recognition task.
Learning Objectives
❑ Upon completion of this course, students will:
▪ Be familiar with both the theoretical and practical aspects of computing
with images;
▪ Be able to describe the foundations of image formation, measurement, and
analysis;
▪ Have implemented common methods for robust image matching and
alignment;
▪ Understand the geometric relationships between 2D images and the 3D
world;
▪ Have gained exposure to object and scene recognition and categorization
from images;
▪ Grasp the principles of state-of-the-art deep neural networks; and
▪ Have developed the practical skills necessary to build computer vision
applications.
Useful References
❑ Reference Books
▪ Rafael C. Gonzalez and Richard E. Woods, “Digital Image Processing”, Pearson, 3rd edition, 2007.
▪ Richard Szeliski, “Computer Vision: Algorithms and Applications”, Springer, 2010-11.
Grading
❑ Grading:
▪ Midterm:
▪ Quizzes: 50
▪ Assignments:
▪ Project:
▪ Final: 50
Computer Vision
❑ Computer Vision is the science of building systems that can extract
certain task-relevant information from a visual scene.
❑ Such systems can be used for applications such as optical character
recognition, analysis of satellite and microscopic images, magnetic
resonance imaging, surveillance, identity verification, quality control in
manufacturing, etc.
Computer Vision
❑ In a way, Computer Vision can be considered the inverse of Computer
Graphics.
❑ A computer graphics system receives as its input the formal description
of a visual scene, and its output is a visualization of that scene.
❑ A computer vision system receives as its input a visual scene, and its
output is a formal description of that scene with regard to the system’s
task.
❑ Unfortunately, while a computer graphics task typically has a single
well-defined solution, computer vision tasks are often ambiguous, and it can
be unclear what the correct output should be.
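One source of this ambiguity can be illustrated with a minimal sketch, assuming an idealized pinhole camera (an assumption made here for illustration; the function name `project` is hypothetical). Projecting a 3D point to the image is a well-defined computation, but two different 3D points on the same viewing ray land on the same image point, so the image alone cannot tell them apart.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


# Two different scene points that lie on the same ray through the camera center.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # same direction, twice as far away

image_near = project(near)
image_far = project(far)

# Graphics direction (3D -> 2D) is unambiguous; the vision direction (2D -> 3D) is not.
print(image_near, image_far)  # (0.25, 0.5) (0.25, 0.5)
```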
What is (computer) vision?
❑ Related fields (figure): Artificial Intelligence, Machine Learning, Deep Learning,
Robotics, Human-Computer Interaction, Computational Photography, Medical Imaging, Optics
The goal of computer vision
❑ To bridge the gap between pixels and “meaning”
Computer Vision
❑ A simple two-stage model of computer vision:
[Figure: image processing stage feeding a scene analysis stage, with a feedback (tuning) path]
Computer Vision
❑ The image processing stage prepares the input image for the
subsequent scene analysis.
❑ Usually, image processing results in one or more new images that
contain specific information on relevant features of the input image.
❑ The information in the output images is arranged in the same way as in
the input image. For example, in the upper left corner in the output
images we find information about the upper left corner in the input
image.
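As a minimal sketch of this spatial correspondence, the example below applies a simple horizontal-difference filter to a tiny grayscale image stored as nested lists. The function name `horizontal_edges` and the choice of filter are illustrative assumptions, not methods taken from the lecture. The output has the same size as the input, and each output pixel describes the same location in the input.

```python
def horizontal_edges(image):
    """Return a same-sized image holding |I(r, c+1) - I(r, c)| at each pixel.

    The last column has no right-hand neighbor and is left at 0.
    """
    rows, cols = len(image), len(image[0])
    output = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols - 1):
            output[r][c] = abs(image[r][c + 1] - image[r][c])
    return output


# Dark region on the left, bright region on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

edges = horizontal_edges(image)
for row in edges:
    print(row)  # each row prints [0, 9, 0, 0]
```

The strong response in the second column marks the vertical boundary between the dark and bright regions at the same position where it occurs in the input.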
Computer Vision
❑ The scene analysis stage interprets the results from the image
processing stage.
❑ Its output completely depends on the problem that the computer vision
system is supposed to solve.
❑ For example, it could be the number of bacteria in a microscopic
image, or the identity of a person whose retinal scan was input to the
system.
❑ In the following lectures we will focus on the lower-level techniques, i.e.,
image processing.
❑ Later we will discuss a variety of scene analysis methods and
algorithms.
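The bacteria-counting example above can be sketched as a small scene analysis step. This sketch assumes the image processing stage has already produced a binary image (1 = bacterium pixel, 0 = background) and that blobs are separated under 4-connectivity. Both are assumptions for illustration, and `count_blobs` is a hypothetical name. Note that the output is a single number, not an image.

```python
def count_blobs(binary):
    """Count 4-connected regions of nonzero pixels using an explicit flood fill."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                count += 1  # found a pixel of a new, unvisited blob
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


binary = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]

print(count_blobs(binary))  # 3
```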
Computer Vision
❑ Have you ever used computer vision?
❑ How? Where?
Laptop: Biometrics auto-login (face recognition, 3D), OCR
Smartphones: QR codes, computational photography (Android Lens Blur, iPhone Portrait
Mode), panorama construction (Google Photo Spheres), face detection, expression
detection (smile), Snapchat filters (face tracking), Google Tango (3D reconstruction),
Night Sight (Pixel)
Web: Image search, Google Photos (face recognition, object recognition, scene
recognition, geolocalization from vision), Facebook (image captioning), Google Maps
aerial imaging (image stitching), YouTube (content categorization)
VR/AR: Outside-in tracking (HTC VIVE), inside-out tracking (simultaneous localization
and mapping, HoloLens), object occlusion (dense depth estimation)
Motion: Kinect, full body tracking of skeleton, gesture recognition, virtual try-on
Medical imaging: CAT / MRI reconstruction, assisted diagnosis, automatic pathology,
connectomics, endoscopic surgery
Industry: Vision-based robotics (marker-based), machine-assisted router (jig),
automated post, ANPR (number plates), surveillance, drones, shopping
Transportation: Assisted driving (everything), face tracking/iris dilation for drunkenness,
drowsiness, automated distribution (all modes)
Media: Visual effects for film, TV (reconstruction), virtual sports replay
(reconstruction), semantics-based auto edits (reconstruction, recognition)
The Four Rs of Computer Vision
▪ R1: Reconstruction
▪ R2: Registration
▪ R3: Reorganization
▪ R4: Recognition
R1: Reconstruction
❑ In computer vision, 3D reconstruction is the process of capturing the
shape and appearance of real objects.
❑ Multiview Geometry, 3D Vision, Shape-from-X
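One common route to 3D shape is stereo triangulation. The sketch below assumes a rectified pair of identical pinhole cameras, for which depth follows Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The function name, parameter names, and numeric values are hypothetical and chosen only for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# A point that shifts 50 pixels between the left and right images.
depth = depth_from_disparity(focal_px=800.0, baseline_m=0.25, disparity_px=50.0)
print(depth)  # 4.0

# Halving the disparity doubles the estimated depth: distant points shift less.
print(depth_from_disparity(800.0, 0.25, 25.0))  # 8.0
```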
R2: Registration
❑ Image registration is the process of transforming different sets of data
into one coordinate system. Data may be multiple images, data from
different sensors, times, depths, or viewpoints.
❑ Tracking, Alignment, Optical Flow, Correspondence
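A minimal sketch of registration, assuming the two images differ only by a whole-pixel translation: try every shift in a small window and keep the one with the lowest mean squared difference over the overlapping region. The names `best_shift` and `crop` are hypothetical, and the synthetic ramp scene is chosen only so that the correct shift is unique and easy to check by hand. Practical registration methods handle richer transformations and are considerably more efficient.

```python
def best_shift(reference, moving, max_shift=2):
    """Find (dy, dx) such that reference[r][c] best matches moving[r + dy][c + dx]."""
    rows, cols = len(reference), len(reference[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, n = 0, 0
            for r in range(rows):
                for c in range(cols):
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < rows and 0 <= cc < cols:  # only compare the overlap
                        diff = reference[r][c] - moving[rr][cc]
                        total += diff * diff
                        n += 1
            if n == 0:
                continue
            score = total / n  # mean squared difference over the overlap
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


def crop(scene, top, left, size=4):
    """Cut a size x size window out of a larger scene."""
    return [row[left:left + size] for row in scene[top:top + size]]


# Synthetic scene: an intensity ramp, viewed through two offset windows.
scene = [[10 * r + c for c in range(6)] for r in range(6)]
reference = crop(scene, top=2, left=2)
moving = crop(scene, top=1, left=0)

print(best_shift(reference, moving))  # (1, 2)
```

The recovered shift maps both windows into one coordinate system: pixel (r, c) of the reference corresponds to pixel (r + 1, c + 2) of the moving image.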
R3: Reorganization
❑ Clustering, Unsupervised Learning, Segmentation, Perceptual
Organization
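As a minimal sketch of clustering-based segmentation, the example below groups pixel intensities into two clusters with a two-center variant of k-means. The function name `two_means`, the initialization at the minimum and maximum intensity, and the sample values are assumptions made for illustration. No labels are supplied, so this is an instance of unsupervised learning.

```python
def two_means(pixels, iterations=10):
    """Cluster intensities into a dark and a bright group; return both centers."""
    dark_center, bright_center = float(min(pixels)), float(max(pixels))
    for _ in range(iterations):
        # Assignment step: each pixel joins the nearer center.
        dark = [p for p in pixels if abs(p - dark_center) <= abs(p - bright_center)]
        bright = [p for p in pixels if abs(p - dark_center) > abs(p - bright_center)]
        if not bright:  # all pixels identical: nothing to separate
            break
        # Update step: each center moves to the mean of its group.
        dark_center = sum(dark) / len(dark)
        bright_center = sum(bright) / len(bright)
    return dark_center, bright_center


pixels = [10, 12, 11, 200, 205, 198, 9, 202]
dark_center, bright_center = two_means(pixels)

# Segment: label 0 for the dark cluster, 1 for the bright cluster.
labels = [0 if abs(p - dark_center) <= abs(p - bright_center) else 1 for p in pixels]

print(dark_center, bright_center)  # 10.5 201.25
print(labels)                      # [0, 0, 0, 1, 1, 1, 0, 1]
```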
R4: Recognition
❑ Image recognition, in the context of machine vision, is the ability to
identify objects, places, people, writing and actions in images.
❑ Verification, Identification, Detection.
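The difference between verification and identification can be sketched as follows, assuming each face or object has already been reduced to a numeric feature vector. The feature values, names, threshold, and the use of Euclidean distance are all hypothetical choices for illustration. Verification answers "is this the claimed identity?" with a one-to-one comparison, while identification answers "who is this?" with a one-to-many search.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(probe, template, threshold):
    """One-to-one check: does the probe match the claimed identity's template?"""
    return distance(probe, template) <= threshold


def identify(probe, gallery):
    """One-to-many search: return the enrolled name whose template is nearest."""
    return min(gallery, key=lambda name: distance(probe, gallery[name]))


# Hypothetical enrolled templates and a new observation.
gallery = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}
probe = [0.85, 0.15, 0.35]

print(verify(probe, gallery["alice"], threshold=0.2))  # True
print(verify(probe, gallery["bob"], threshold=0.2))    # False
print(identify(probe, gallery))                        # alice
```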
Companies Using Computer Vision
Computer Vision Applications
Optical character recognition (OCR)
❑ Technology to convert images of text into text.
▪ If you have a scanner, it probably came with OCR software
▪ Example: live camera translation
Smile detection
How does it work?
Vision-based biometrics
Facial login without a password…
Object recognition (in mobile phones)
e.g., Google Lens
Human shape capture
Special effects: motion capture
Interactive Games
Object Recognition:
http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o
Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg
3D: http://www.youtube.com/watch?v=7QrnwoO1-8A
Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
Medical imaging
Autonomous cars - Uber bought CMU’s lab
Industrial robots
Vision in Spaaaaace
NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.
Skydio 2 drone
6x fisheye cameras for obstacle avoidance
Onboard NVIDIA GPU
Augmented Reality
Virtual Reality