
An attractive IT

Jeff Markowitz | March 12, 2009

Oh how beautiful!

Most researchers presume that the meat of visual object recognition occurs in inferotemporal cortex (IT), though there is nothing near a consensus on how this is done (i.e., ahem, how the meat is prepared). Some claim that the firing of IT cells, in particular cells in anterior IT (ITa), represents categories of objects. That is, one cell might fire for cats and another for dogs, each responding in the same way to different retinal images from its category. This simplistic view seems approximately correct given the volume of data amassed over the past 30 years in monkey electrophysiology, but the evidence remains frustratingly indirect. Only a few things are certain: (1) ITa cells love “complex” objects (i.e., something more complicated than an oriented bar) and (2) they appear to have large receptive fields relative to those in striate cortex. How these characteristics lead to the formation of category representations in IT is a mystery, and it will probably stay that way until we find better ways to look at IT cells, perhaps using two-photon calcium imaging. Current electrophysiological methods can record from only tens of nearby cells at most, and imaging methods don’t have the resolution to tell us what particular cells are doing at the millisecond time scale.

A recent article from Akrami et al. attempts to approach this issue through both electrophysiology and modeling. Whereas most physiology studies in IT use a large set of natural images separated into categories (e.g. horses or planes), Akrami et al. took predetermined pairs of images and tested which of the two was more effective at driving an IT cell. Then, they used MorphX, a simple open-source program, to create images that are varying combinations of the effective and ineffective stimulus in each pair. For instance, a pair might include a boxing glove and a face, which would lead to the presentation of a series of images that morph from the boxing glove to the face. The effect is difficult to describe, so I highly recommend checking out some demos on the MorphX website.

They found a very curious property in IT cells: their response co-varied almost linearly with the degree of morph from the effective toward the ineffective stimulus. So, if a cell responds preferentially to the boxing glove, then its firing rate dies down the further the stimulus is morphed toward the face. The result is quite intuitive and supports the hypothesis that IT cells code for prototypes of visual categories, with a firing rate that falls off in proportion to the distance from that prototype. If the face happened to be inordinately ugly and looked like a boxing glove, then the firing rate might have changed only slightly across the morphed images. The study is quite clever in this regard, though the concept of a morphed image is ambiguous. To start, how should we quantify the degree of morphing? Some have dreamt up interesting metrics for faces, but it’s hard to wrap your head around the idea of a “face space”, let alone an “object space”. To constrain a mathematical model, it seems necessary to have an actual measurement of the degree of morphing, some kind of number, rather than “this picture is more morphed than that picture.” Still, I don’t want to get too wrapped up in this detail at the expense of the broader implications of the study.
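To make the "some kind of number" point concrete, here is a minimal sketch of one crude way to put a number on morphing: a pixelwise cross-dissolve with a blend weight alpha. This is an assumption for illustration only. I am not claiming this is how MorphX or the study built its stimuli, and the two "images" below are random arrays standing in for a glove and a face. The function names are made up for this example.

```python
import numpy as np

def cross_dissolve(effective, ineffective, alpha):
    """Blend two same-sized grayscale images.
    alpha=0 gives the effective stimulus, alpha=1 the ineffective one."""
    return (1.0 - alpha) * effective + alpha * ineffective

def pixel_distance(a, b):
    """Euclidean distance between two images treated as flat vectors."""
    return float(np.linalg.norm(a.ravel() - b.ravel()))

# Stand-in images: random 90 x 90 grayscale arrays, NOT real stimuli.
rng = np.random.default_rng(0)
glove = rng.random((90, 90))
face = rng.random((90, 90))

full = pixel_distance(glove, face)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    d = pixel_distance(glove, cross_dissolve(glove, face, alpha))
    print(f"alpha={alpha:.2f}  distance/full={d / full:.2f}")
```

For a pure cross-dissolve the distance from the effective stimulus is exactly alpha times the full glove-to-face distance, so the blend weight doubles as a linear distance measure. Any morph that also warps geometry would not have this neat property, which is part of why the quantification question is a real one.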

Coding visual prototypes could provide a way to compress mounds and mounds of retinal images into a distributed representation of a category in IT. Similar to the use of membership in fuzzy set theory, each IT cell could code the degree to which a retinal image falling within its receptive field participates in its prototype. This could potentially explain the difficulty we all have identifying morphed images that lie “halfway” between two other images. In terms of firing rate or whatever code IT uses, there may be no clear winner in the electrical activity, thus making recognition of that code difficult. To simplify a bit, if two cells each fire at 15 Hz for two different objects, then how well could a simple classifier, maybe prefrontal cortex (PFC), determine the category?
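The 15 Hz tie can be played out in a toy readout. The sketch below assumes two cells with linear tuning along a glove-to-face morph line and a winner-take-all classifier that abstains when the two rates are too close. The 30 Hz peak rate, the 5 Hz margin, and the function names are all made up for illustration, not taken from the study.

```python
PEAK_HZ = 30.0  # assumed peak firing rate; not a figure from the study

def firing_rates(alpha):
    """Hypothetical linear tuning of two cells along a morph line.
    alpha=0 is the pure glove, alpha=1 the pure face."""
    return {"glove": PEAK_HZ * (1.0 - alpha), "face": PEAK_HZ * alpha}

def classify(rates, min_margin_hz=5.0):
    """Winner-take-all readout that abstains (returns None) when the top
    two rates are closer than min_margin_hz, an arbitrary threshold."""
    ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    (best, r1), (_, r2) = ranked[0], ranked[1]
    return best if r1 - r2 >= min_margin_hz else None

for alpha in (0.1, 0.5, 0.9):
    rates = firing_rates(alpha)
    print(alpha, rates, classify(rates))
```

At the halfway morph both cells sit at 15 Hz, the margin is zero, and the readout has nothing to go on. Near either end of the morph line the winner is obvious.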

The famed Lorenz attractor

Akrami et al. use these ideas to push for an attractor network, which implements the notion of prototypes using the mathematical formalisms of dynamical systems. In such networks, first explicitly realized as Hopfield nets (though very similar work was conducted almost a decade beforehand by Grossberg), a memory can be a simple point in an N-dimensional space, and that memory is more strongly evoked the closer an input is to that point. So, an image of 90 x 90 pixels could be a visual prototype, encoded as a point in 8100-dimensional space (if the RGB values are averaged). Then, an image from the retina is represented as another point in this ridiculously high-dimensional space, with the distance between the two points acting as a reflection of their similarity. In terms of the experiment, the further one morphs an image away from the effective stimulus, the greater the distance between a cell’s memory and the retinal image in this high-dimensional space, a distance perhaps encoded by an IT cell.
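For readers who have not met a Hopfield net, here is a minimal sketch of the textbook version: binary units, Hebbian outer-product weights, and synchronous updates. It is not the slightly modified model Akrami et al. simulate. To keep it small, the "prototypes" are two toy 8 x 8 patterns (points in 64-dimensional space) rather than 90 x 90 images, and they are chosen to be orthogonal so the recall is easy to follow.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian outer-product rule; patterns is (num_patterns, N) of +1/-1."""
    n = patterns.shape[1]
    weights = patterns.T @ patterns / n
    np.fill_diagonal(weights, 0.0)  # no self-connections
    return weights

def recall(weights, state, max_steps=10):
    """Apply synchronous updates until the state stops changing."""
    state = state.copy()
    for _ in range(max_steps):
        new_state = np.where(weights @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

# Two toy 8 x 8 "prototypes": a left/right split and a top/bottom split.
side = 8
left_right = np.ones((side, side), dtype=int)
left_right[:, side // 2:] = -1
top_bottom = np.ones((side, side), dtype=int)
top_bottom[side // 2:, :] = -1
patterns = np.stack([left_right.ravel(), top_bottom.ravel()])

weights = train_hopfield(patterns)

# Corrupt the first prototype by flipping its first 10 pixels.
probe = patterns[0].copy()
probe[:10] *= -1

recovered = recall(weights, probe)
print("pixels wrong before:", int(np.sum(probe != patterns[0])))
print("pixels wrong after: ", int(np.sum(recovered != patterns[0])))
```

The corrupted probe starts 10 pixels away from the stored prototype and is pulled back onto it, which is the sense in which a stored memory acts as an attractor: inputs that land close enough to the point fall into it.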

Their simulations with slightly modified attractors model the data rather convincingly; that is, the activity of each node in the attractor network looks like the firing rate of an IT cell during the experiment. All the same, it’s hard to say what in fact modelers should model in the case of IT. If we assume that IT cells code one prototype each, then what would these prototypes look like if we could decode their synaptic memories? If there is some fundamental unit in IT like the oriented bar for V1, then is it like a boxing glove? In other words, if we use some nodes in a neural network to code a prototype, some point in N-dimensional space, then what would that point be for IT? Akrami et al.’s study provides a good (though certainly not final) step in identifying IT as a visual memory center that uses prototypes to represent each memory. The next masochistically difficult step is to find out exactly what those prototypes are.

Cerebral Cortex, 2009. DOI: 10.1093/cercor/bhn125

(First image from me using Matlab, GIMP, and boredom; second image is licensed under GNU FDL from the Wikipedia entry for Lorenz attractor)
