
Large Scale Image Processing with Hadoop
Brandyn White
bwhite@cs.umd.edu
Advisor: Prof. Larry Davis

Outline

'Big Data' in Computer Vision


Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval

'Big Data' in Vision


Traditional Vision: Focus on the model
o Pose Est.: 2D Image -> Virtual 3D model + Camera
Under-constrained, slow, sensitive to noise
o Object Recognition: SVM + features
Breaks with many classes (e.g., every flickr tag)
New Trend: Focus on the data
o DB of images (w/ metadata) -> query image
o Problem becomes similar image search
o Transfer metadata from DB images to query image
o KNN methods simple and scalable
Clustering, hashing, metric learning
NLP: rule-based models -> statistical models
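
The data-driven approach above (find similar DB images, then transfer their metadata to the query) can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: the `(feature_vector, tags)` database layout, the Euclidean distance, and the toy values are all assumptions.

```python
from collections import Counter
import math

def knn_transfer_tags(query, database, k=3):
    """Return tags ranked by how many of the k nearest DB images carry them.

    `database` is a list of (feature_vector, tags) pairs; both the layout and
    the Euclidean distance are assumptions made for this sketch.
    """
    nearest = sorted(database, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(tag for _, tags in nearest for tag in tags)
    return [tag for tag, _ in votes.most_common()]

# Toy database: 2-D "features" with flickr-style tags (made-up values).
db = [
    ((0.0, 0.0), ["beach", "sunset"]),
    ((0.1, 0.0), ["beach"]),
    ((5.0, 5.0), ["city"]),
    ((0.0, 0.2), ["beach", "sea"]),
]
ranked_tags = knn_transfer_tags((0.05, 0.05), db, k=3)
print(ranked_tags)
```

The brute-force sort is only for readability; at scale the nearest-neighbor step would be replaced by the clustering or hashing methods listed above.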

Example: Image Search -> Metadata

Query Image

Retrieved Images (flickr), each with metadata:
Tags
Location (GPS)
Title
Date
Groups
Comments
Owner
Views

Output Metadata (transferred to the query image):
Tags
Location (GPS)

Big Data in Vision: Pose Estimation


Goal: Given an image of a person, estimate 3D pose.

G. Shakhnarovich, P. Viola, and T. Darrell, "Fast pose estimation with parameter-sensitive hashing," October 2003.

Big Data in Vision: Scene Completion


Goal: Given an image and a selected region, fill the region with
a plausible texture.

J. Hays and A. A. Efros, "Scene completion using millions of photographs," in SIGGRAPH '07: ACM SIGGRAPH 2007 papers. New York, NY, USA: ACM, 2007, pp. 4+.

Big Data in Vision: IM2GPS


Goal: Given an image, guess where in the world it was taken.

J. Hays and A. A. Efros, "Im2gps: estimating geographic information from a single image," Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 0, pp. 1-8, 2008.

Big Data in Vision: Object Recognition


Goal: Given an image, select a noun that describes it.

A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large dataset for nonparametric object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11, pp. 1958-1970, May 2008.

Big Data in Vision: Pixel Annotation


Goal: Given an image, annotate every pixel (e.g., building).

C. Liu, J. Yuen, and A. Torralba, "Nonparametric scene parsing: Label transfer via dense scene alignment," Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 0, pp. 1972-1979, 2009.

Big Data in Vision: One Frame Motion


Goal: Given an image, estimate the pixel motion.

C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, "Sift flow: Dense correspondence across different scenes," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 28-42.

Outline

'Big Data' in Computer Vision


Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval

Hadoop+CV: No Reducer

(Diagram: three parallel Map tasks, no Reduce stage)
Example Maps
Object Detection (e.g., cars, faces)
Feature Computation (e.g., SIFT)
Sliding Windows (given a region+image)
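
A map-only job treats every input record independently, so the mapper output is the job output. The sketch below simulates that pattern in plain Python. The coarse grey-level histogram is a stand-in for a real detector or SIFT-style descriptor, and `histogram_mapper` and `run_map_only` are hypothetical names, not Hadoop APIs.

```python
def histogram_mapper(image_id, pixels, bins=4):
    """Map one image to one (key, value) record; no other record is needed.

    `pixels` is a flat list of 0-255 grey values; the coarse histogram is a
    stand-in for a real detector or SIFT-style descriptor.
    """
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    yield image_id, hist

def run_map_only(records, mapper):
    """Simulate a job with zero reducers: mapper output is the job output."""
    for key, value in records:
        yield from mapper(key, value)

# Two tiny made-up "images", each a flat list of grey values.
images = [("img_a", [0, 10, 200, 255]), ("img_b", [64, 64, 128, 192])]
output = dict(run_map_only(images, histogram_mapper))
print(output)
```

Because no record depends on any other, this pattern parallelizes across as many map tasks as there are input splits.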

Hadoop+CV: Model Creation

(Diagram: three parallel Map tasks feeding a single Reduce)

Map: Feature Computation
Reduce: Model Creation
Examples
Classifiers (e.g., SVM, Bayes)
Geometry Problems (e.g., RANSAC, SfM)

Hadoop+CV: Expectation Maximization


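(Diagram: input vectors Vec0, Vec1, Vec2 each go to a Map task; the current Parameter Estimate, shipped in the JAR or cache, is read by every Map; Map outputs feed a Reduce)

Map: Fit data to model given parameters (E-Step)
Reduce: Compute new model parameters given data (M-Step)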
Iterate until stopping conditions are met.
Examples
Clustering (e.g., K-Means)
Mixture Models (e.g., MoG)
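
The iterate-until-converged pattern can be sketched with K-Means, where the E-step assigns each point to its nearest mean and the M-step recomputes the means. This is an in-memory simulation under stated assumptions: the dictionary `groups` stands in for Hadoop's shuffle, and `max_iters` and `tol` are hypothetical stopping conditions chosen for illustration.

```python
import math
from collections import defaultdict

def e_step_map(point, means):
    """Map (E-step): assign one point to its nearest current mean."""
    nearest = min(range(len(means)), key=lambda i: math.dist(point, means[i]))
    yield nearest, point

def m_step_reduce(cluster, points):
    """Reduce (M-step): the new mean of every point assigned to `cluster`."""
    dim = len(points[0])
    return cluster, tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def kmeans(points, means, max_iters=20, tol=1e-6):
    """Alternate map and reduce until the means move less than `tol`."""
    means = list(means)
    for _ in range(max_iters):
        groups = defaultdict(list)          # stands in for the shuffle
        for point in points:
            for cluster, value in e_step_map(point, means):
                groups[cluster].append(value)
        new_means = list(means)             # an empty cluster keeps its mean
        for cluster, members in groups.items():
            _, new_means[cluster] = m_step_reduce(cluster, members)
        shift = max(math.dist(a, b) for a, b in zip(means, new_means))
        means = new_means
        if shift < tol:
            break
    return means

data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_means = kmeans(data, means=[(1.0, 1.0), (9.0, 9.0)])
print(final_means)
```

On a cluster, each loop iteration would be one full job, with the updated means redistributed to the mappers before the next pass.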

Outline

'Big Data' in Computer Vision


Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval

Image Retrieval with Hadoop


Analogies between image and text retrieval
o Bag of Words -> Bag of Features
o Document -> Image
o Visual Word: Cluster of similar visual features

Compute Local Image Features (e.g., SIFT)


Cluster Features (i.e., create visual words)
Find cluster medians
Make Hamming Embeddings (compact feature) [1]
o Efficient binary code (256 -> 8 Bytes per feature)
o Hamming Distance
o Benefit: Small size means more in memory
Inverted Index
[1] H. Jegou, M. Douze, and C. Schmid, "Hamming embedding and weak geometric consistency for large scale image search," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 304-317.
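
A minimal sketch of the binary code and its distance, assuming one bit per dimension set by comparing the feature against its cluster's median. With a 64-D feature this gives 64 bits, i.e. the 8 bytes mentioned above. The cited paper also projects the descriptor before thresholding; that step is omitted here, and the function names are hypothetical.

```python
def hamming_embed(feature, median):
    """Pack one bit per dimension: 1 where the feature exceeds the median."""
    code = 0
    for value, threshold in zip(feature, median):
        code = (code << 1) | (value > threshold)
    return code

def hamming_distance(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

# 4-D toy example with a made-up cluster median.
median = [0.5, 0.5, 0.5, 0.5]
code_a = hamming_embed([0.9, 0.1, 0.5, 0.7], median)
code_b = hamming_embed([0.9, 0.8, 0.2, 0.1], median)
print(bin(code_a), bin(code_b), hamming_distance(code_a, code_b))

# A 64-D feature packs into exactly 8 bytes.
packed = hamming_embed([1.0] * 64, [0.0] * 64).to_bytes(8, "big")
print(len(packed))
```

Comparing two codes is a single XOR plus a bit count, which is why the compact form is cheap to keep in memory and to search.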

Hadoop Job Workflow


(Database Images)
Image Features (SURF 64D)
Remove Dupes (Curr./Prev.)
K-Means Clustering (Initial)
K-Means Clustering
Median Computation
Hamming Embedding

Hadoop Job Workflow: Image Features


(Database Images)

Image Features (SURF 64D)

Map In: (image_url, image_hash, image_data, image_tags)


Map Out: (image_hash, image_url, image_features)

Hadoop Job Workflow: Remove Dupes


Image Features (SURF 64D)
Remove Dupes (Curr./Prev.)
Map In: [image_hash, image_url, image_features]
or
Map In: [image_hash] (for images already in the DB)
Map Out Key: image_hash
Map Out Val: image_features
Reduce Out: [image_hash, image_feature]
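
One way to read this job: keying by image_hash brings every copy of an image, plus any "already in the DB" marker, to the same reducer, which then emits the features at most once. The sketch below assumes that reading. The `IN_DB` marker, the one-record-per-feature output, and the example URLs are illustrative assumptions, not details from the slides.

```python
from collections import defaultdict

IN_DB = None  # marker value for hashes that are already in the database

def dedupe_map(record):
    """Key every record by image_hash; DB-only records carry the marker."""
    if len(record) == 1:                      # [image_hash]
        yield record[0], IN_DB
    else:                                     # [image_hash, image_url, image_features]
        image_hash, _url, features = record
        yield image_hash, features

def dedupe_reduce(image_hash, values):
    """Emit features for a hash once, and only if the DB does not have it."""
    if any(v is IN_DB for v in values):
        return
    for feature in values[0]:                 # first copy wins; repeats dropped
        yield image_hash, feature

def run(records):
    groups = defaultdict(list)                # stands in for the shuffle
    for record in records:
        for key, value in dedupe_map(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in dedupe_reduce(key, groups[key])]

records = [
    ["h1", "http://example.com/a.jpg", [(1, 2), (3, 4)]],
    ["h1", "http://example.com/a_copy.jpg", [(1, 2), (3, 4)]],  # same hash
    ["h2", "http://example.com/b.jpg", [(5, 6)]],
    ["h2"],                                                     # already in DB
]
result = run(records)
print(result)
```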

Hadoop Job Workflow: K-Means (init)


Remove Dupes (Curr./Prev.)
K-Means Clustering (Initial)
Map In: [image_hash, image_feature]
Map Out Key: random [0,1]
Map Out Val: image_feature (extended by 1 dim to get count)
1 Reducer (outputs once per cluster)
Reduce Out: [cluster_num, cluster_mean]
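
A random map key turns the shuffle's sort into a random ordering of the features, so a single reducer can take the first k records it sees as initial means. The sketch below assumes that interpretation and ignores the extra count dimension. The value of k, the seeded generator, and the function names are hypothetical.

```python
import random

def init_map(image_hash, feature, rng):
    """Tag each feature with a random key in [0, 1)."""
    yield rng.random(), feature

def init_reduce(sorted_pairs, k):
    """Single reducer: the first k features in key order become the means."""
    for cluster_num, (_key, feature) in enumerate(sorted_pairs):
        if cluster_num >= k:
            break
        yield cluster_num, feature

rng = random.Random(0)                        # seeded only to make runs repeatable
features = [("h%d" % i, (float(i), float(i))) for i in range(10)]
pairs = [kv for h, f in features for kv in init_map(h, f, rng)]
pairs.sort(key=lambda kv: kv[0])              # the shuffle sorts by key
initial_means = list(init_reduce(pairs, k=3))
print(initial_means)
```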

Hadoop Job Workflow: K-Means


K-Means Clustering (Initial)
K-Means Clustering

File: cluster_means
Map In: [image_hash, image_feature]
Map Out Key: cluster_num (nearest cluster)
Map Out Val: image_feature (extended by 1 dim to get count)
Reduce Out: [cluster_num, cluster_mean]
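
Appending a 1 to each feature lets the reducer recover the mean from a plain vector sum: the last element of the total is the number of points. Because summation is associative, the same sum can also run as a combiner on partial data. A minimal sketch with hypothetical function names:

```python
import math

def kmeans_map(feature, means):
    """Emit (nearest cluster, feature + (1,)); the trailing 1 is the count."""
    nearest = min(range(len(means)), key=lambda i: math.dist(feature, means[i]))
    yield nearest, tuple(feature) + (1,)

def vector_sum(values):
    """Element-wise sum; usable as a combiner because addition is associative."""
    return tuple(sum(column) for column in zip(*values))

def kmeans_reduce(cluster_num, values):
    """Sum the extended vectors, then divide the feature part by the count."""
    *totals, count = vector_sum(values)
    yield cluster_num, tuple(t / count for t in totals)

means = [(0.0, 0.0), (10.0, 10.0)]
mapped = [v for f in [(0.0, 0.0), (2.0, 2.0)] for _, v in kmeans_map(f, means)]
partial = vector_sum(mapped)                  # what a combiner would emit
new_mean = list(kmeans_reduce(0, [partial, (4.0, 4.0, 1)]))
print(new_mean)
```

In the last line the reducer receives one pre-summed record covering two points and one raw record, and still produces the mean of all three.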

Hadoop Job Workflow: Medians


K-Means Clustering
Median Computation
File: cluster_means
Map In: [image_hash, image_feature]
Map Out Key: cluster_num (nearest cluster)
Map Out Val: image_feature
Reduce Out: [cluster_num, cluster_median]
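
Unlike the mean, an exact median cannot be built from partial sums, which is consistent with the map output here carrying the raw feature and no count dimension. A minimal reducer sketch, assuming a per-dimension median and that all of a cluster's features fit in the reducer's memory:

```python
from statistics import median

def median_reduce(cluster_num, features):
    """Per-dimension median of every feature assigned to this cluster."""
    yield cluster_num, tuple(median(column) for column in zip(*features))

# Three made-up 2-D features that all mapped to cluster 0.
cluster_features = [(1, 10), (3, 30), (2, 20)]
cluster_median = list(median_reduce(0, cluster_features))
print(cluster_median)
```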

Hadoop Job Workflow: Ham. Emb.


Median Computation
Hamming Embedding

File: cluster_means, cluster_medians


Map In: [image_hash, image_feature]
Map Out Key: cluster_num (nearest cluster)
Map Out Val: hamming_embedding
Reduce Out: [cluster_num, hamming_embedding]

Image Retrieval Overview: Query


(Query Image)
Image Features (SURF 64D)
For each feature...
Find Nearest Cluster
Compute hamming embedding
(using cluster median)
Vote (tf-idf) for a DB image if one of its features
has hamming dist < Thresh
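
The query steps above can be sketched against an in-memory inverted index. The index layout (`cluster_num -> [(image_hash, code), ...]`), the idf-only vote weight, and all names and values are assumptions for illustration; the query features are taken to be already quantized and embedded.

```python
import math
from collections import defaultdict

def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

def query(query_words, index, num_images, thresh):
    """Score DB images from (cluster_num, code) pairs of the query features.

    `index` maps cluster_num -> list of (image_hash, code). Weighting each
    vote by idf is a simplification of full tf-idf scoring.
    """
    scores = defaultdict(float)
    for cluster_num, code in query_words:
        postings = index.get(cluster_num, [])
        if not postings:
            continue
        images_with_word = len({image for image, _ in postings})
        idf = math.log(num_images / images_with_word)
        for image, db_code in postings:
            if hamming(code, db_code) < thresh:   # only close codes may vote
                scores[image] += idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy index over a 3-image database with 4-bit codes (made-up values).
index = {0: [("imgA", 0b1010), ("imgB", 0b0101)], 1: [("imgA", 0b1111)]}
ranking = query([(0, 0b1011), (1, 0b1110)], index, num_images=3, thresh=2)
print(ranking)
```

Only the posting list of the query feature's own cluster is scanned, so the Hamming test refines the visual-word match rather than replacing it.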

Outline

'Big Data' in Computer Vision


Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval

Current Work: PC Help Doc. Retrieval


Goal: Take a screenshot and retrieve books and websites
that provide relevant help documentation.

Tom Yeh, Brandyn White, Larry Davis, and Boris Katz

Outline

'Big Data' in Computer Vision


Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval

Conclusion

Vision has 'Big Data' applications


Many image search applications
Common design patterns for M/R+Vision
Hadoop is useful for image search

References
[1] P. Duygulu, K. Barnard, J. de Freitas, and D. Forsyth, "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary," in Computer Vision ECCV 2002, ser. Lecture Notes in Computer Science, 2002, ch. 7, pp. 349-354.
[2] A. Makadia, V. Pavlovic, and S. Kumar, "A new baseline for image annotation," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 316-329.
[3] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, "Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation," ICCV 2009.
[4] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large dataset for nonparametric object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11, pp. 1958-1970, May 2008.