
Let us consider two ideal filters that give responses R1 and R2 to a single presentation of a movie with
speed S1. By ideal I mean filters that are perfectly tuned to speed, apart from some stochastic noise in
their response. Their joint response to the presentation of the movie would look something like the
following (red dot).

[Figure: a single joint response (red dot) in the (R1, R2) plane.]

If we present the same movie repeatedly to these ideal speed discrimination filters (R1, R2), we will get a set
of responses that are similar but carry some inherent noise. The responses to all presentations would look
like the following (similar to a 2D Gaussian with σ1 = σ2).

[Figure: cloud of responses to repeated presentations in the (R1, R2) plane.]

The ergodic hypothesis says that the response of one set of filters (R1, R2) to multiple presentations of the
same movie is the same as the response of multiple copies of the filters (Ri1, Ri2), (Rii1, Rii2), ..., (Rn1, Rn2)
to a single presentation of the movie.

[Figure, two panels in the (R1, R2) plane. Left: multiple presentations of the same movie. Right: single
presentation of the movie to multiple sets of filters; the response of each set is plotted in the ideal
(R1, R2) space.]
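The two panels can be mimicked with a small simulation. This is only a sketch under stated assumptions: the noise is taken to be additive, isotropic, and Gaussian, and the names `noisy_responses`, `mean`, and `sigma` and their values are hypothetical, not taken from the notes. Because both samples are drawn from the same distribution, the code encodes the ergodic hypothesis rather than testing it; it only shows what "the same response" means in terms of sample statistics.

```python
import numpy as np

def noisy_responses(mean, sigma, n, rng):
    """Draw n joint responses (R1, R2): a fixed mean plus isotropic Gaussian noise."""
    return mean + sigma * rng.standard_normal((n, 2))

rng = np.random.default_rng(0)
mean = np.array([1.0, 1.0])   # assumed noise-free response to the movie
sigma = 0.2                   # assumed noise scale, with sigma1 = sigma2

# One filter pair, 5000 presentations of the same movie.
repeated = noisy_responses(mean, sigma, 5000, rng)
# 5000 independent copies of the filter pair, a single presentation.
copies = noisy_responses(mean, sigma, 5000, rng)

# Under the hypothesis, both clouds share the same mean and covariance.
print(repeated.mean(axis=0), copies.mean(axis=0))
print(np.cov(repeated.T))
print(np.cov(copies.T))
```

Both covariance estimates should be close to sigma² times the identity, which is the round cloud with σ1 = σ2.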

Now let us go back to one set of ideal receptors and present a different movie multiple times, but with the
same latent variable (same speed). What differentiates these two movies is something that is not captured by
the filters. Because of the extrinsic fluctuations coming from the movies, the response now would be different,
although the latent variable is the same. The response would look something like this.

[Figure: response clouds for Movie 1 and Movie 2 in the (R1, R2) plane.]

*** For simplicity, we assume that the noise model is additive: the mean changes, the spread remains the
same. But this might not be so essential.

If we present several such movies, each several times, we would get a response distribution like the
following.

[Figure: response distribution for movies with speed S1 in the (R1, R2) plane.]

The idea here is that although all the movies have the same speed, and the filters are tuned to speed, the
response of the filters to different movies is not a Gaussian with zero correlation. The inherent variability
in different movies (due to local contrast, illuminance, etc.) leads to a correlated output.

There is nothing new here; we already know all this.
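The correlated spread across movies can be reproduced with a toy simulation. This is a minimal sketch, not the author's model: it assumes the additive noise model mentioned in the notes, and it further assumes that each movie's non-latent variable shifts the mean response along one fixed direction in the (R1, R2) plane. The function `pooled_responses`, the direction `[1, 1]`, and the scales `sigma` and `tau` are hypothetical choices made for illustration.

```python
import numpy as np

def pooled_responses(n_movies, n_repeats, sigma, tau, direction, rng):
    """Responses of shape (n_movies, n_repeats, 2) for movies sharing one speed.

    Each movie shifts the mean along `direction` by a random amount
    (extrinsic variability, scale tau); every presentation adds
    isotropic Gaussian noise (intrinsic variability, scale sigma).
    """
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    shift = tau * rng.standard_normal(n_movies)       # one value per movie
    means = shift[:, None] * direction[None, :]       # (n_movies, 2)
    noise = sigma * rng.standard_normal((n_movies, n_repeats, 2))
    return means[:, None, :] + noise

rng = np.random.default_rng(1)
resp = pooled_responses(200, 50, sigma=0.2, tau=0.6, direction=[1.0, 1.0], rng=rng)

# Within a movie: subtract each movie's own mean, then correlate R1 with R2.
residuals = resp - resp.mean(axis=1, keepdims=True)
within_corr = np.corrcoef(residuals.reshape(-1, 2).T)[0, 1]

# Across movies: pool every response and correlate R1 with R2.
pooled_corr = np.corrcoef(resp.reshape(-1, 2).T)[0, 1]

print(f"within-movie correlation: {within_corr:.2f}")
print(f"pooled correlation:       {pooled_corr:.2f}")
```

Within a single movie the two responses stay nearly uncorrelated, while pooling over movies stretches the cloud along the shift direction and produces a strong correlation, which is the elongated ellipse for S1.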

For a different set of movies with speed S2, the response would be different.

[Figure: response ellipses for speeds S1 and S2 in the (R1, R2) plane.]

I have intentionally chosen the ellipses to be like mirror images.


[Figure: response ellipses for speeds S1 and S2 in the (R1, R2) plane, with red lines marked.]

Let us now present a movie with one of the speeds (S1 or S2, but unknown), and assume that the response is
a point on one of the red lines shown in the figure. If we calculate the likelihood of such a response, it
will not be possible to identify whether the speed is S1 or S2.
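The ambiguity can be made concrete with Gaussian likelihoods. The sketch below rests on assumptions that the notes do not state: both ellipses are modeled as zero-mean Gaussians sharing a common center, and the mirror images are taken to be covariances with correlation +rho and -rho. The names `gaussian_logpdf`, `cov_s1`, `cov_s2`, and the value of `rho` are hypothetical. Under these assumptions the two likelihoods are equal exactly where R1 · R2 = 0, which is one possible reading of the red lines.

```python
import numpy as np

def gaussian_logpdf(r, mean, cov):
    """Log-density of a multivariate Gaussian at the response r."""
    diff = np.asarray(r, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    return -0.5 * (maha + logdet + len(mean) * np.log(2 * np.pi))

mean = np.zeros(2)                                # assumed common center
rho = 0.8                                         # assumed correlation strength
cov_s1 = np.array([[1.0, rho], [rho, 1.0]])       # ellipse for S1
cov_s2 = np.array([[1.0, -rho], [-rho, 1.0]])     # mirror-image ellipse for S2

on_line = np.array([1.5, 0.0])    # a response with R1 * R2 = 0
off_line = np.array([1.0, 1.0])   # a response along the long axis of S1

for name, r in [("on line", on_line), ("off line", off_line)]:
    ll1 = gaussian_logpdf(r, mean, cov_s1)
    ll2 = gaussian_logpdf(r, mean, cov_s2)
    print(f"{name}: log L(S1) - log L(S2) = {ll1 - ll2:.3f}")
```

For the response on the line the log-likelihood difference vanishes, so a single response there cannot separate S1 from S2. For the other response S1 is clearly favored.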

In fact, if we have multiple such speed ellipses, then given any response one can always find two ellipses
(two speeds) that assign it the same likelihood.

[Figure: multiple speed ellipses, including S1 and S3, in the (R1, R2) plane.]

Let us invoke ergodicity once again. As a reminder, the ergodic hypothesis means that the response of one set
of filters (R1, R2) to multiple presentations of the same movie is the same as the response of multiple copies
of the filters (R1, R2) to a single presentation of the movie.

[Figure, two panels in the (R1, R2) plane. Left: multiple presentations of the same movie. Right: single
presentation of the movie to multiple sets of filters; the response of each set is plotted in the ideal
(R1, R2) space.]

Let's go back to the case where we presented two movies multiple times.
Here, the difference in the mean response for the two movies comes from fluctuations in variables that are
not captured by the filters (R1, R2); these could be local illuminance, contrast, texture, etc.

[Figure: response clouds for Movie 1 and Movie 2 in the (R1, R2) plane.]

Now I am going to make an assumption. Suppose I have two sets of filters, (P1, P2) and (Q1, Q2). Both sets
code for the same latent variable (speed), but they differ from (R1, R2). One could think of (P1, P2) and
(Q1, Q2) as noisy versions of (R1, R2) in which some of the pixels of (R1, R2) got flipped.
The assumption is this: the response of the ideal filter (R1, R2) to two movies that differ by a non-latent
variable (say, local contrast) is the same as the response of the two filters (P1, P2) and (Q1, Q2) to either
of the movies, where (P1, P2) and (Q1, Q2) code the same latent variable as (R1, R2), namely speed, but differ
in the variable that differentiates the two movies (local contrast).

[Figure, two panels in the (R1, R2) plane. Left: responses of the ideal filter to Movie 1 and Movie 2.
Right: responses of (P1, P2) and (Q1, Q2), each projected on (R1, R2), to Movie 1 or Movie 2.]

Now suppose I have a population of neurons that code the same latent variable (speed) but are not identical
(that is, they differ from the ideal speed filter because they code local contrast, illuminance, texture,
etc. differently). If we present a movie with some speed (S1 or S2) to such a population, the population
response would be something like the following:

[Figure: population response in the (R1, R2) plane. Colors represent filters with different non-latent
variables.]


This will provide a unique decoding of the speed.


Notice that by population, I do not mean multiple identical copies of the ideal filter. (That would only
suppress intrinsic noise.) By population I mean a set of filters that code the same latent variable, but are
not identical.

[Figure: population response in the (R1, R2) plane. Colors represent filters with different non-latent
variables.]
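A rough numerical sketch of why a heterogeneous population could decode the speed uniquely is given below. It leans on the assumption made in these notes (the population's responses to one movie are distributed like the ideal filter's responses across movies) and adds further assumptions for illustration only: zero-mean Gaussian ellipses with a common center and mirror-image covariances. The names `total_loglik`, `cov_s1`, `cov_s2`, and all numeric values are hypothetical.

```python
import numpy as np

def total_loglik(responses, cov):
    """Sum of zero-mean Gaussian log-likelihoods over the rows of `responses`."""
    _, logdet = np.linalg.slogdet(cov)
    # Row-wise squared Mahalanobis distance x^T cov^{-1} x.
    maha = np.einsum("ij,ij->i", responses @ np.linalg.inv(cov), responses)
    return -0.5 * np.sum(maha + logdet + 2 * np.log(2 * np.pi))

rho = 0.8                                        # assumed correlation strength
cov_s1 = np.array([[1.0, rho], [rho, 1.0]])      # ellipse for S1
cov_s2 = np.array([[1.0, -rho], [-rho, 1.0]])    # mirror-image ellipse for S2

# A single response with R1 * R2 = 0 is equally likely under both speeds.
single = np.array([[1.5, 0.0]])
tie = np.isclose(total_loglik(single, cov_s1), total_loglik(single, cov_s2))

# A population of 50 non-identical filters sees one movie with speed S1.
# By assumption its responses trace out the S1 ellipse.
rng = np.random.default_rng(2)
population = rng.multivariate_normal(np.zeros(2), cov_s1, size=50)
ll_s1 = total_loglik(population, cov_s1)
ll_s2 = total_loglik(population, cov_s2)
decoded = "S1" if ll_s1 > ll_s2 else "S2"

print("single response ambiguous:", bool(tie))
print("population decodes:", decoded)
```

The point of the sketch is that the population returns a whole cloud rather than one point, and the shape of the cloud, not any individual response, is what separates the two speeds.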
