
Modelling visual complexity of real-world scenes

Fintan Nagle

Imperial College London

f.nagle@imperial.ac.uk
About me

• Cognitive scientist and developer
• Experience with deep learning and building interactive websites

Research interests:

• Temporal visual search & dynamic object recognition (PhD work)
• Visual complexity (postdoc work)
• How is episodic memory implemented in connectionist networks? (opposing Gallistel and King's view; talk next week at GEM Bochum)
• How distrust and misinformation spread in online communities; how critical thinking develops online (co-PI, grant with NCH & Northeastern)
• Knowledge representation and question-answering: NLP vs. semantic networks and knowledge networks
Visual complexity and perceptual load

Espeseth 2010 PLOS One; Molloy 2015 J Neurosci


Visual complexity and perceptual load

How do we estimate the perceptual load of real-world scenes?


Visual complexity and perceptual load

Which is more visually complex?


Visual complexity and perceptual load

Which is more visually complex?


Modelling visual complexity: previous work

Machado et al 2006 Acta Psychologica
• Model: feedforward network with one hidden layer to predict complexity from 329 image features
• 800 images of several classes
• Spearman’s ρ rank correlation: .83 on unseen validation set

Corchs et al 2016 PLOS One
• Model: particle swarm optimised regression on features
• 49 real-world scenes
• r = .81 on unseen validation set of 49 images

➤ Previous work has used classifiers on small feature vectors and has not demonstrated wide applicability to real-world scenes.
How subjective is complexity?

We asked 4 observers to choose the more complex image from 100 randomly
selected pairs of PASCAL VOC images.

• 48% of pairs were judged consistently by all four observers
• 33.5% were agreed upon by 3 out of 4 observers
• 18.5% were tied at 2 versus 2 observers

This indicates a good degree of agreement between observers’ judgements of natural scene images: the percept of visual complexity is not so subjective that it cannot be modelled.
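The breakdown above can be computed by tallying, for each image pair, the size of the largest voting faction among the four observers. A minimal sketch (the vote data below is hypothetical, just to show the tallying logic):

```python
from collections import Counter

def agreement_breakdown(votes):
    """Classify each pair's 2AFC votes (one 'A'/'B' choice per observer)
    by the size of the largest voting faction: 4-0, 3-1, or 2-2."""
    labels = {4: "unanimous", 3: "majority", 2: "tied"}
    counts = Counter()
    for pair_votes in votes:
        top = Counter(pair_votes).most_common(1)[0][1]  # largest faction size
        counts[labels[top]] += 1
    return counts

# Hypothetical data: 4 observers vote on 3 image pairs.
votes = [["A", "A", "A", "A"],   # all agree    -> unanimous
         ["A", "A", "A", "B"],   # 3 vs 1       -> majority
         ["A", "A", "B", "B"]]   # split 2 vs 2 -> tied
breakdown = agreement_breakdown(votes)
```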
Data collection

Data collected using a custom Web platform (Flask server in Python, hosted on Amazon Web Services)

• 4,000 images
• 62 observers (11 male), mean age 24.9
• 75,020 comparisons (1,210 per observer on average)

[Figure: continuum of images ordered by complexity]
TrueSkill convergence

Convergence of the TrueSkill algorithm after rating of the first 2,000 images (iterations 0-4) and after the addition of the second 2,000 images (iterations 5-8).

Final distribution of complexity ratings for all 4,000 images.

2AFC judgements matched the TrueSkill ratings in 76.1% of comparisons.
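TrueSkill treats each image's complexity rating as a Gaussian (μ, σ) and updates both images after every 2AFC comparison: the chosen ("more complex") image wins. A stdlib-only sketch of the standard win/lose update for one comparison (we used an existing TrueSkill implementation; β and the μ=25, σ=25/3 prior below are the conventional defaults, and the draw case is omitted):

```python
import math

def phi(x):   # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf
    return (1 + math.erf(x / math.sqrt(2))) / 2

def trueskill_update(winner, loser, beta=25 / 6):
    """One TrueSkill-style update after a 2AFC comparison.
    Each rating is (mu, sigma); returns the updated (winner, loser)."""
    mu_w, s_w = winner
    mu_l, s_l = loser
    c = math.sqrt(2 * beta ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c
    v = phi(t) / Phi(t)        # mean-shift factor
    w = v * (v + t)            # variance-shrink factor
    new_w = (mu_w + s_w ** 2 / c * v,
             s_w * math.sqrt(max(1 - s_w ** 2 / c ** 2 * w, 1e-9)))
    new_l = (mu_l - s_l ** 2 / c * v,
             s_l * math.sqrt(max(1 - s_l ** 2 / c ** 2 * w, 1e-9)))
    return new_w, new_l

# Both images start at the default prior; image A is judged more complex.
(a_mu, a_sigma), (b_mu, b_sigma) = trueskill_update((25.0, 25 / 3), (25.0, 25 / 3))
```

After the update the winner's mean rises, the loser's falls, and both uncertainties shrink, which is why repeated comparisons converge on a stable ranking.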
Visual complexity and perceptual load

Which is more visually complex?

(TrueSkill complexity ratings: 18.7 vs 30.1)
Visual complexity and perceptual load

Which is more visually complex?

(TrueSkill complexity ratings: 16.9 vs 30.7)
Visual complexity and perceptual load

Which is more visually complex?

(TrueSkill complexity ratings: 34.0 vs 19.27)
Modelling

[Scatter plot: predicted vs perceived complexity, r = 0.83]
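The r = 0.83 figure is a Pearson correlation between the network's predictions and the TrueSkill ratings on held-out images. A minimal sketch of that evaluation step (the predicted values below are made up purely to illustrate the computation):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between predicted and perceived complexity."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy held-out set: TrueSkill ratings vs hypothetical network predictions.
perceived = [18.7, 30.1, 16.9, 30.7, 34.0, 19.27]
predicted = [20.0, 28.5, 18.0, 29.0, 33.0, 21.0]
r = pearson_r(predicted, perceived)
```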
Object taggings and their influence on complexity

[Figures: image count; correlations with perceived complexity ratings of predictions from the PASCAL VOC object taggings]
Complexity and feature congestion

Original image vs contrast clutter map (Rosenholtz et al 2007 JoV)
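The contrast channel of feature congestion measures local luminance variability. A toy stand-in using a sliding-window standard deviation (the actual Rosenholtz model works on CIELab channels across multiple scales and pools them, which this sketch ignores):

```python
import numpy as np

def contrast_clutter(img, win=3):
    """Rough stand-in for a contrast clutter map: standard deviation of
    luminance in a win x win neighbourhood around each pixel."""
    h, w = img.shape
    r = win // 2
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            patch = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            out[y, x] = patch.std()
    return out

# A flat region has zero contrast clutter; a textured one does not.
flat = np.full((8, 8), 0.5)
textured = (np.indices((8, 8)).sum(0) % 2).astype(float)  # checkerboard
```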
Other features

• Object taggings from Mask R-CNN
• Edge ratio
• Salience models

Replication of Corchs et al: we replicated the proposed linear feature combination, running it on our images. Result: r = .47. This feature combination does not work well on a diverse set of real-world images.
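A linear feature combination of the Corchs kind can be fitted by ordinary least squares. A sketch on synthetic data (the actual image features and fitted weights from the paper are not reproduced here; the feature matrix and weights below are invented):

```python
import numpy as np

# 49 images x 3 features (e.g. edge ratio, clutter, object count),
# with ratings generated from known weights plus noise -- all synthetic.
rng = np.random.default_rng(0)
features = rng.random((49, 3))
true_w = np.array([2.0, -1.0, 0.5])
ratings = features @ true_w + 0.1 * rng.standard_normal(49)

# Fit the linear combination (with intercept) by least squares.
X = np.column_stack([features, np.ones(49)])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
predicted = X @ coef
```

On real images the fit quality depends entirely on how much of perceived complexity the chosen features capture, which is the limitation the r = .47 replication result illustrates.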
Predicting complexity from image statistics and features
Complexity maps: occlusion method
Moving from local complexity to global complexity

Perceived complexity maps:
• Images split into discs
• Discs rated via 2AFC (example disc ratings)
• Correlations between disc and whole-image ratings
Complexity maps: interpolation and blurring
Complexity maps: bicubic interpolation
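Going from per-disc ratings to a full-image complexity map is an upsample-and-smooth step. A numpy-only sketch using block upsampling plus a box blur (the maps in the talk use bicubic interpolation instead, and the 2x2 ratings grid here is a toy example):

```python
import numpy as np

def complexity_map(disc_ratings, scale=4):
    """Turn a coarse grid of per-disc complexity ratings into a smooth
    full-size map: nearest-neighbour block upsample, then box blur."""
    up = np.kron(disc_ratings, np.ones((scale, scale)))  # block upsample
    k = scale                      # blur radius tied to upsampling factor
    padded = np.pad(up, k, mode="edge")
    out = np.zeros_like(up)
    h, w = up.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + 2 * k + 1, x:x + 2 * k + 1].mean()
    return out

ratings = np.array([[18.7, 30.1],
                    [16.9, 34.0]])  # toy 2x2 grid of disc ratings
m = complexity_map(ratings)
```

The box blur is just a stand-in for the smoothing stage; swapping in bicubic interpolation changes the kernel, not the overall local-to-global pipeline.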
Conclusions

• Observers show good agreement on 2AFC decisions on visual complexity.

• A one-dimensional embedding explains these decisions well (agreeing with 76.1% of comparisons).

• Visual complexity can be effectively predicted by a convolutional network (r = 0.83) on a large, diverse set (3,600 training, 400 test) of indoor and outdoor real-world (natural and man-made) images. This approach outperforms the previous state of the art (feature-based models).

• Our model allows evaluation of the complexity of natural scene images and investigation of the relationship between visual complexity and visual perceptual load.

• Some preliminary work in this area: Chech & Lavie 2019 ECVP (poster)
Thanks

fintan.nagle@ucl.ac.uk
