From natural scene statistics to models of neural coding and representation (part 1)

1 From natural scene statistics to models of neural coding and representation (part 1) Bruno A. Olshausen Helen Wills Neuroscience Institute, School of Optometry and Redwood Center for Theoretical Neuroscience UC Berkeley

2 Review article: What Natural Scene Statistics Can Tell Us about Cortical Representation. Bruno A. Olshausen & Michael S. Lewicki. To appear in: The New Visual Neurosciences, Chalupa and Werner, Eds., MIT Press

3 Today's talk
- Why natural scene statistics? (a bit about biology)
- Theory of Redundancy Reduction
- Sparse Coding

4 What are the principles of computation and representation governing this system?

5 Natural images are full of ambiguity

6 Natural images are full of ambiguity

7 Vision as inference [Figure labels: World, lens, Image, Model]

8 Visual cortical areas - macaque monkey

9 Visual cortical areas [Figure: diagram of visual areas 46, TF, TH, STPa, 7a, FEF, STPp, AITd, CITv, AITv, CITd (faces/objects); VIP, LIP, MSTd, MSTl, FST, PITd, PITv, DP, VOT, MDP, MIP, PO, MT, V4t, V4 (intermediate-level vision); PIP, V3A, V2, V1] (courtesy of Jeff Hawkins)

13 1 mm² of cortex analyzes a ca. 14 x 14 array of retinal sample nodes and contains 100,000 neurons (Anderson & Van Essen, 1995)

17 Anatomy of a synapse

18 The evolution of eyes Land & Fernald (1992)

20 Efficient Coding: represent the most relevant visual information with the fewest physical and metabolic resources

21 Theory of Redundancy Reduction
Attneave (1954) Some Informational Aspects of Visual Perception
Barlow (1961) Possible Principles Underlying the Transformations of Sensory Messages
- nervous system should reduce redundancy
- makes more efficient use of neural resources
- enables storing information about prior probabilities, since $P(x) = \prod_i P(x_i)$
- suspicious coincidences
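To make the factorial-code condition concrete, the following is a minimal sketch that measures redundancy as the sum of the marginal entropies minus the joint entropy. That quantity is zero exactly when P(x) factorizes into the product of its marginals. The two toy distributions and the function names are assumptions made up for illustration; they do not come from the talk.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zeros are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def redundancy(joint):
    """Sum of marginal entropies minus joint entropy for a 2-D joint table."""
    joint = np.asarray(joint, dtype=float)
    p1 = joint.sum(axis=1)  # marginal of x_1
    p2 = joint.sum(axis=0)  # marginal of x_2
    return entropy(p1) + entropy(p2) - entropy(joint)

# Hypothetical toy distributions over two binary variables.
correlated = np.array([[0.4, 0.1],
                       [0.1, 0.4]])
independent = np.outer([0.5, 0.5], [0.5, 0.5])

print(redundancy(correlated))   # positive: the variables share information
print(redundancy(independent))  # zero (up to rounding): P(x) factorizes
```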

22 From theory to models
- Laughlin (1981) - histogram equalization
- Laughlin, Srinivasan and Dubs (1982) - Predictive Coding: A Fresh View of Inhibition in the Retina
- Field (1987) - natural images have $1/f^2$ power spectra
- Atick & Redlich (1992); van Hateren (1992; 1993) - whitening
- Dan, Atick, and Reid (1996) - LGN whitens natural movies
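Histogram equalization can be sketched as follows: if a neuron's response function is the cumulative distribution of its inputs, every response level gets used equally often. The exponential "contrast" distribution and the helper name `equalizing_response` are assumptions for illustration, not data or code from Laughlin (1981).

```python
import numpy as np

def equalizing_response(samples):
    """Return a function mapping inputs to [0, 1] via the empirical CDF."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float))
    n = sorted_samples.size

    def response(x):
        # Fraction of observed samples that are <= x.
        return np.searchsorted(sorted_samples, x, side="right") / n

    return response

rng = np.random.default_rng(0)
# Hypothetical skewed "contrast" distribution standing in for natural inputs.
contrasts = rng.exponential(scale=1.0, size=20000)
response = equalizing_response(contrasts)
outputs = response(contrasts)

# Each of the 10 output bins should hold roughly 10% of the samples.
counts, _ = np.histogram(outputs, bins=10, range=(0.0, 1.0))
print(counts / contrasts.size)
```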

23 Natural scene statistics and visual coding [Figure: panels A-F] (Field 1987)

24 Whitening (or decorrelation) theory (Atick & Redlich, 1992) [Figure: three plots of amplitude vs. frequency: 1/f image x whitening filter = decorrelated image]
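The whitening idea can be illustrated with a small frequency-domain sketch: a filter whose gain rises in proportion to f cancels a 1/f amplitude falloff and leaves neighboring pixels decorrelated. The synthetic random-phase image is an assumption standing in for a natural image, and the sketch omits the low-pass stage that a practical filter would need to avoid amplifying high-frequency noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Radial spatial frequency for every FFT coefficient (cycles/pixel).
fx = np.fft.fftfreq(n)
f = np.sqrt(fx[:, None] ** 2 + fx[None, :] ** 2)
f[0, 0] = 1.0  # placeholder to avoid dividing by zero at DC

# Synthetic stand-in for a natural image: random phase, 1/f amplitude.
noise = rng.standard_normal((n, n))
spectrum = np.fft.fft2(noise) / f
spectrum[0, 0] = 0.0  # zero mean
image = np.real(np.fft.ifft2(spectrum))

# Whitening filter: gain proportional to f cancels the 1/f falloff.
whitened = np.real(np.fft.ifft2(np.fft.fft2(image) * f))

def neighbor_correlation(img):
    """Correlation between horizontally adjacent pixels."""
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]

print(neighbor_correlation(image))     # strongly positive
print(neighbor_correlation(whitened))  # near zero
```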

27 Robust Coding. Doi & Lewicki (2006) - A Theory of Retinal Population Coding
[Figure: model with image s -> optical blur H -> observation x -> encoder W -> representation r -> decoder A -> reconstruction ŝ, with sensory noise ν and channel noise δ. Panels (a)-(d): magnitude vs. spatial frequency at 20 dB, 10 dB, 0 dB, -10 dB.]

28 Robust Coding. Karklin & Simoncelli (2011) - Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
[Figure panels a, b, c; labels: ON center, OFF center]
Figure 1: a. Schematic of the model (see text for description). The go[...] transfer between images x and the neural response r, subject to met[...] Information about the stimulus is conveyed both by the arrangement of the [...] neural nonlinearities. Top: two neurons encode two stimulus c[...] an image, x_1 and x_2) with linear filters (black lines) whose output is p[...] functions (thick color lines; thin color lines show isoresponse con[...] levels). The steepness of the nonlinearities specifies the precision [...] represented: regions of steep slope correspond to finer partitioning o[...] uncertainty about the input. Bottom: joint encoding leads to binning o[...] the isoresponse lines above. Grayscale shading indicates the level of u[...] of the input (lighter shades correspond to higher uncertainty). Efficien[...] subject to input distribution, noise levels, and metabolic costs on the [...]
Parameter λ_j specifies the trade-off between information gained by fi[...] of generating them. It is difficult to obtain a biologically valid est[...] ultimately, the value of sensory information gained depends on the b[...] [26]. Alternatively, we can use λ_j as a Lagrange multiplier to enfor[...]

29 Beyond efficient coding
- Redundancy reduction (RR) is appropriate when there is a bottleneck. But V1 expands dimensionality: it has many more neurons than inputs.
- The real goal of sensory representation is to model the redundancy in images, not necessarily to reduce it (Barlow 2001).
- What we desire is a meaningful representation.
- RR provides a valid probabilistic model only when the world can be described in terms of statistically independent components.
- To understand cortical representation we must appeal to a different principle.

30 V1 is highly overcomplete [Figure: LGN afferents in layer 4 of cortex (IVb, IVc); scale bar 0-1 mm] Barlow (1981)

31 Sparse, distributed representation [Figure: image $I(x,y)$ and coefficients $a_i$]
- Provides a way to group things together so that the world can be described in terms of a small number of events at any given moment.
- Converts higher-order redundancy in images into a simple form of redundancy.

32 Sparse vs. dense vs. grandmother cell codes
Dense codes (ASCII): + High combinatorial capacity ($2^N$); - Difficult to read out
Sparse, distributed codes: + Decent combinatorial capacity ($\sim N^K$); + Still easy to read out
Local codes (grandmother cells): - Low combinatorial capacity ($N$); + Easy to read out
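A quick arithmetic sketch of the three capacities, assuming binary units and a sparse code with exactly K of N units active. The values N = 100 and K = 5 are hypothetical. For fixed K the exact count, N choose K, grows like N^K / K!, which is the ~N^K scaling in the comparison.

```python
from math import comb

N = 100  # hypothetical number of binary units
K = 5    # hypothetical number of active units in the sparse code

dense_capacity = 2 ** N        # any subset of units may be active
sparse_capacity = comb(N, K)   # exactly K of N units active (about N**K / K!)
local_capacity = N             # exactly one unit active

print(dense_capacity)
print(sparse_capacity)
print(local_capacity)
```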

33 Gabor-filter response histogram

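34 Sparse coding
External world -> image; internal model -> image model:
$I(x,y) = \sum_i a_i \phi_i(x,y) + \epsilon(x,y)$
where $I(x,y)$ is the image, $\phi_i(x,y)$ are the features, $a_i$ are the neural activities (sparse), and $\epsilon(x,y)$ is other stuff.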
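A minimal sketch of this image model: an image is synthesized as a weighted sum of features with only a few non-zero activities, plus a small residual. The random dictionary, the sizes, and the L1 form of the sparsity penalty in `energy` are all assumptions for illustration; the slide does not specify a cost function.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_features = 64, 128  # hypothetical sizes (2x overcomplete)

# Random unit-norm features stand in for a learned dictionary.
Phi = rng.standard_normal((n_pixels, n_features))
Phi /= np.linalg.norm(Phi, axis=0)

# Sparse activities: only 5 of the 128 coefficients are non-zero.
a = np.zeros(n_features)
active = rng.choice(n_features, size=5, replace=False)
a[active] = np.array([1.0, -1.0, 1.0, -1.0, 1.0])

# Image = weighted sum of features + small residual ("other stuff").
image = Phi @ a + 0.01 * rng.standard_normal(n_pixels)

def energy(image, Phi, a, lam=0.1):
    """Squared reconstruction error plus an L1 sparsity penalty (assumed form)."""
    residual = image - Phi @ a
    return 0.5 * residual @ residual + lam * np.sum(np.abs(a))

print(energy(image, Phi, a))                      # low: sparse and accurate
print(energy(image, Phi, np.zeros(n_features)))  # higher: nothing explained
```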

35 Learned dictionary Φ (critically sampled) [Figure: learned basis functions; plot of orientation bandwidth (degrees) vs. radial bandwidth (octaves)]

36 Effect of overcompleteness and hard sparsity (Rehn and Sommer 2006)

37 Learned dictionary 10x overcomplete (joint work with David Warland, UC Davis)

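38 [Figure: grid of learned dictionaries by sparsity (10%, 5%, 1%) and over-completeness (2.5x, 5x, 10x), each with a plot of orientation bandwidth (degrees) vs. radial bandwidth (octaves). SNR values as listed: 27 dB, 20 dB, 8.2 dB; 19 dB, 13 dB, 5.3 dB; 13 dB, 8.5 dB, 3.2 dB]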

39 10x, 1%

40 Blob Ridge-like Grating

41 [Figure: same grid by sparsity (10%, 5%, 1%) and over-completeness (2.5x, 5x, 10x), with regions labelled blob, grating, and ridge-like. SNR values as listed: 27 dB, 20 dB, 8.2 dB; 19 dB, 13 dB, 5.3 dB; 13 dB, 8.5 dB, 3.2 dB]

42 Solutions are stable

43 Tiling properties: Blobs (highest spatial-frequency band only) [Figure: y position vs. x position]

44 Tiling properties: Ridge-like [Figure: orientation and spatial position]

45 Sparse coding of time-varying images
$I(x,t) = \sum_i a_i(t) \ast \phi_i(x,t) + \epsilon(x,t)$
[Figure: plots of speed vs. direction and speed (pixels/frame) vs. spatial frequency (cy/pixel)]

46 Learned basis: space-time basis functions (200 bfs, 12 x 12 x 7)

47 Sparse coding and reconstruction [Figure: amplitude vs. time (sec) for the convolution and sparsified coefficient signals]

48 Extensions to color, disparity: Wachtler, Lee and Sejnowski (2001); Hoyer & Hyvärinen (2000)

49 Nonlinear encoding
Solutions may be computed by a network of leaky integrators and threshold units (Rozell et al. 2008)
Explaining away [Figure: feedforward response vs. sparsified response]
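The network of leaky integrators and threshold units can be sketched roughly as below. Each unit integrates its feedforward drive, leaks toward zero, and is inhibited by active units with overlapping features, which is the explaining-away step. The soft-threshold nonlinearity, the Euler integration, the random dictionary, and every parameter value are assumptions chosen for illustration rather than the settings of Rozell et al. (2008).

```python
import numpy as np

def soft_threshold(u, lam):
    """Threshold unit: zero below lam, shrunk toward zero above it."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca(x, Phi, lam=0.1, tau=10.0, n_steps=500):
    """Euler-integrated leaky-integrator dynamics with lateral inhibition."""
    b = Phi.T @ x                               # feedforward response
    G = Phi.T @ Phi - np.eye(Phi.shape[1])      # competition between units
    u = np.zeros_like(b)                        # internal (membrane) states
    for _ in range(n_steps):
        a = soft_threshold(u, lam)              # sparsified response
        u += (b - u - G @ a) / tau              # drive - leak - inhibition
    return soft_threshold(u, lam)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128))            # hypothetical random dictionary
Phi /= np.linalg.norm(Phi, axis=0)
a_true = np.zeros(128)
a_true[[3, 40, 77]] = [1.0, -1.5, 2.0]
x = Phi @ a_true

a_hat = lca(x, Phi)
print(np.count_nonzero(Phi.T @ x))   # feedforward response: all units respond
print(np.count_nonzero(a_hat))       # sparsified response: only a few remain
```

The feedforward response is dense because every feature overlaps the input a little; the lateral inhibition then suppresses units whose drive is already accounted for by stronger ones.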

50 Nonlinear encoding: explaining away explains nCRF effects. Lee et al. (2007), Zhu & Rozell (2010)
- end-stopping [Figure: response vs. bar length (pixels)]
- contrast-invariant tuning [Figure: response vs. orientation (deg) for c = 0.1, 0.2, 0.3, 0.4, 0.5]
- surround suppression [Figure: response vs. surround contrast and vs. center contrast, for center, iso surround, orth surround, and uniform conditions]

51 Energy-based models. Osindero, Welling and Hinton (2005)
$P(x) = \frac{1}{Z} e^{-E(x)}$
$E(x) = \sum_{i=1}^{M} \alpha_i \log\left(1 + \tfrac{1}{2}(J_i x)^2\right)$
[Figure: panels A (1.0x), B (ICA), C (1.7x), D (2.4x); Beta = 5, Beta = 2, Beta = 1/...]
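A small sketch of evaluating such an energy, assuming the Student-t form E(x) = sum_i alpha_i log(1 + (J_i x)^2 / 2), so that the unnormalized log-probability is -E(x). The filter matrix J, the weights alpha, and the sizes are random placeholders, not learned values from Osindero, Welling and Hinton.

```python
import numpy as np

def energy(x, J, alpha):
    """E(x) = sum_i alpha_i * log(1 + 0.5 * (J_i . x)^2) (assumed form)."""
    y = J @ x                       # filter outputs J_i x
    return float(np.sum(alpha * np.log1p(0.5 * y ** 2)))

rng = np.random.default_rng(0)
n_inputs, n_filters = 16, 32        # hypothetical sizes (2x overcomplete)
J = rng.standard_normal((n_filters, n_inputs))
alpha = np.ones(n_filters)          # hypothetical weights

x = rng.standard_normal(n_inputs)
print(energy(np.zeros(n_inputs), J, alpha))  # 0.0: the most probable input
print(energy(x, J, alpha))                   # positive: less probable
```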
