
Computer vision, or CV for short, is a field of machine learning concerned with a machine's ability to receive and analyze visual data on its own and then make decisions about it. That data can include photos and videos, but more broadly might include "images" from thermal or infrared sensors, detectors, and other sources.

Until about a decade ago, computer vision worked only in a limited capacity. Thanks to advances in artificial intelligence and innovations in deep learning and neural networks, the field has taken great leaps in recent years and now surpasses humans at some tasks related to detecting and labeling objects.

One of the driving factors behind the growth of computer vision is the sheer amount of visual data we generate today, which is then used to train computer vision models and make them better.

How massive is the data, really?


“It would take a person 10 entire years to even look at all the photos shared on
Snapchat—in just the last hour”

Along with a tremendous amount of visual data, the computing power required to
analyze the data is now accessible.

How do neural nets work?


Although neural nets are supposed to “mimic the way the brain works,” nobody is quite sure whether that's actually true.

The same paradox holds for computer vision: since we haven't settled on how the brain and eyes process images, it's difficult to say how well the algorithms used in production approximate our own internal mental processes.

So what do these deep learning networks actually do?


On a certain level, computer vision is all about pattern recognition. So one way to train a computer to understand visual data is to feed it labeled images (thousands, or millions if possible) and then subject those to various software techniques, or algorithms, that allow the computer to hunt down patterns in all the elements that relate to those labels.
So, for example, if you feed a deep learning algorithm a million images of cats, it will analyze the colors in the photos, the shapes, the distances between the shapes, where objects border each other, and so on, building up a profile of what "cat" means. When it's finished, the computer will (in theory) be able to apply that experience to other, unlabeled images and find the ones that contain cats.
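The process described above can be sketched in miniature. The following toy example stands in for what a real deep network does at vastly greater scale: tiny invented 3x3 "images" of pixel brightness, a handful of labels, and a simple nearest-neighbor rule in place of a learned model. All the data and the cat/dog framing are made up for illustration.

```python
# Toy sketch of learning "cat" vs "dog" from labeled images.
# Real systems train deep neural networks on millions of photos; here a
# 1-nearest-neighbor rule over 3x3 grayscale grids stands in for the
# pattern-hunting the text describes. All data is invented.

def flatten(image):
    """Turn a 2D grid of pixel brightness values into a flat feature vector."""
    return [pixel for row in image for pixel in row]

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(image, labeled_examples):
    """Predict the label of the closest labeled example (1-nearest neighbor)."""
    features = flatten(image)
    best_label, _ = min(
        ((label, distance(features, flatten(example)))
         for label, example in labeled_examples),
        key=lambda pair: pair[1],
    )
    return best_label

# Tiny invented "training set": bright-centered patterns are cats,
# bright-cornered patterns are dogs.
training = [
    ("cat", [[0, 1, 0], [1, 9, 1], [0, 1, 0]]),
    ("cat", [[1, 0, 1], [0, 8, 0], [1, 0, 1]]),
    ("dog", [[9, 1, 9], [1, 0, 1], [9, 1, 9]]),
    ("dog", [[8, 0, 8], [0, 1, 0], [8, 0, 8]]),
]

# A new, unlabeled image: its pattern most resembles the "cat" examples.
unlabeled = [[0, 1, 1], [1, 9, 0], [0, 1, 0]]
print(classify(unlabeled, training))  # → cat
```

The point is not the algorithm itself but the shape of the workflow: labeled examples in, a pattern profile extracted, and that profile applied to unlabeled images afterward.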

More advanced CV-enabled computers today not only know that there are distinct objects in a given image but can actually identify what those objects are: not just faces but also vehicles, trees, buildings, birds, money, and on and on, even when those objects are partially obscured (a situation known as occlusion) or shown at an angle. It's still early days, but we've begun the process by which computers will gain a functional understanding of the world around them.

A simple yet powerful application


If you’ve ever used the Google Translate app, you may have discovered the ability
to point your smartphone’s camera at text in any number of languages and have
it translated into another language on screen almost instantly. That’s a form of
“augmented reality” (AR), in which computer vision -- specifically, optical character
recognition -- enables an accurate translation that’s then transformed into an
overlay onto the real world (essentially, the translated text in place of the original
text). It’s so simple and instant that it’s easy to forget this is a mind-blowing
capability in nearly everyone’s pocket, and a glimpse of the power of computer
vision in action.
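The pipeline behind that experience can be sketched as three stages: recognize the text in a camera frame, translate it, and compose the translation back over the original region. The functions below are placeholder stand-ins (a real app would use trained OCR models and a translation service); the names, the dictionary, and the frame structure are all invented for illustration, and only the pipeline shape is the point.

```python
# Hedged sketch of a camera-translation pipeline: OCR -> translate -> overlay.
# Every step here is a simplified stand-in for a real model or service.

def recognize_text(frame):
    """Stand-in for optical character recognition over a camera frame."""
    return frame["text_region"]  # a real OCR step would detect and read text

def translate(text, target_language):
    """Stand-in for a translation step, using a tiny invented phrasebook."""
    phrasebook = {("salida", "en"): "exit"}
    return phrasebook.get((text, target_language), text)

def overlay(frame, translated):
    """Compose the translated text back over the original region (the AR overlay)."""
    return {**frame, "text_region": translated}

# An invented "frame" with a Spanish sign in view.
frame = {"pixels": "...", "text_region": "salida"}
result = overlay(frame, translate(recognize_text(frame), "en"))
print(result["text_region"])  # → exit
```

In a production app each stage is a hard problem in its own right (text detection under perspective distortion, neural machine translation, rendering that matches font and lighting), but they compose in exactly this order.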

Commercial Usage:
Think of any futuristic scenario, and there’s likely a computer vision-related
solution that can or will someday be applied. Take those fancy Tesla cars you’ve
heard so much about: they rely on a host of cameras as well as sonar that not only
keep your car from drifting out of its lane but can also see what other objects and
vehicles are around you and read signs and traffic signals.

In the near future, computer vision will enable new ways of doing diagnostics, closer to Star Trek, analyzing X-ray, MRI, CAT, mammography, and other scans. (After all, some 90 percent of all medical data is image based.)

And computer vision will also help make robots and drones an ordinary part of
everyday life. Imagine fleets of firefighting drones and robots sent into wildfires to
cut down trees and guide water delivery. Or fleets of drones sent to search for lost
hikers, or earthquake survivors, or shipwrecked sailors. In fact, drones are being
used to help farmers keep tabs on crops, look for signs of drought or infestation,
perhaps even analyze soil types and weather conditions to optimize fertilization
and planting schedules.

In sports, computer vision is being applied to such tasks as play and strategy
analysis and on-field movement in games, ball and puck tracking for automated
camera work, and comprehensive evaluation of brand sponsorship visibility in
sports broadcasts, online streaming, and social media.

Immersive technology that makes the viewer feel physically transported is already
arriving in the form of virtual and augmented reality, familiar to anyone who has
witnessed frenzied Pokémon Go players searching for imaginary monsters in the
real world using their phones. That’s rudimentary tech, but it shows how
convincing and satisfying it can be; wait until we all own VR goggles.
