What IT does
Jeff Markowitz | June 9, 2009
First, a hearty welcome to Ethan: you’re starting to make this whole enterprise a little less incestuous! Anyway, your recent post raises a number of interesting issues regarding inferotemporal cortex (IT), most prominently: how does IT learn to do what we think it does?
I’d first like to address what we think IT does, which is a step I find myself skipping quite a lot (awful scientist am I!). Based on a number of classical studies that compared lesions of IT with lesions of parietal cortex, for example, it was determined that IT mediates some form of visual discrimination and perhaps limited ‘size constancy’, or at least is a key pathway to whatever area in fact does this (see here, here, and here, for instance). The presumption, based on newer electrophysiology in macaque TE and TEO (analogous to anterior IT, ITa, and posterior IT, ITp, respectively), is that IT performs some sort of hashing to signal the presence of an object across sizes, retinal translations, clutter conditions, whatever.
Even if the firing rate of an IT cell changes across these different visual transformations, it is assumed that the rank order of its firing rates across objects is what carries some sort of invariant signal. I don’t know quite what to think about this, since the preservation of rank ordering in firing rate across transformations could be due to a number of things other than “this cell prefers object A over object B and is thus a hash for object A.” In other words, this doesn’t show, to me at least, that IT is building an object code as opposed to simply responding in a precise way to the visual stimulus as it is presented via V4 and V2 (though newer anatomy shows the connectivity to be frustratingly complex).
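To make the rank-order worry concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the firing rates are invented, and the "peripheral" response is modeled as nothing more than a gain change plus a baseline shift applied to the foveal response. The point is only that a plain monotonic rescaling, with no object code behind it, already gives a perfect rank correlation.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented firing rates (spikes/s) of one hypothetical cell to five objects.
foveal = [42.0, 30.0, 18.0, 11.0, 5.0]

# Assumption: the same cell in the periphery is just a weaker, shifted copy.
peripheral = [0.4 * rate + 3.0 for rate in foveal]

rho = spearman(foveal, peripheral)
print(f"rank correlation across positions: {rho:.2f}")
```

The absolute rates differ a lot between the two conditions, yet the ranking of objects is untouched, so rank preservation on its own cannot separate "object hash" from "scaled filter output."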
In the absence of lesion studies that show deficits in these specific capacities, it seems that considering IT the locus of visual recognition invariance is a conjecture. It rests on IT, at least anteroventral IT, being the last ‘purely visual’ area in the ventral stream; on the intuition that we can recognize objects no matter where they are in the visual field, which might not be completely true either; and on a lot of hard-to-interpret single-unit and optical recordings. Further, the primate physiology shows an enormous amount of variance, with really no constraints on what sort of stimuli to use, which have ranged from oriented bars to Fourier descriptors to toys. This particular point makes the data especially hard to interpret from a computational standpoint. Does the area provide a hash table of objects to support recognition invariance, or might we think of it as another ‘filter’ (in the loosest, we-call-V1-a-filter, sense of the word)? What about the persistence of mildly stimulus-selective activity in IT? Does that mean IT is a memory-related area, or does it simply support visual memory in perirhinal cortex, entorhinal cortex, or the inferior convexity of prefrontal cortex (PF)? Does it switch from a simple filter to a memory mode when visual working memory is required? Maybe it’s a lot more general purpose than anyone has yet considered. Without a comprehensive investigation of the stimulus and task space for IT, I take it that there simply isn’t enough data out there to really nail the function of IT. (Unfortunately, like any unchecked cynic, I can’t back this cynicism up with a prescription for how to fix things!)
That being said, temporal contiguity learning seems to be a fascinating variant on an older line of research on persistent activity in IT (here, here, and here) and PF (here and here). That is, during a delayed matching-to-sample task, after the presentation of a sample stimulus (i.e. during the blank interval), IT cells showed stimulus-selective activity. Perhaps a visual representation of some sort is maintained after the presentation of a stimulus for comparison by IT or another brain area. Anyway, perhaps some form of plasticity allows a persistently active cell to learn the representation of a subsequently presented stimulus, as shown by Li and DiCarlo. What is not obvious, to me at least, is whether this form of learning underlies invariant recognition. The Li and DiCarlo study demonstrates that a cell’s preference might change in the case of a mid-saccade switch if a “preferred” object is first presented in the periphery, but what about the case of smooth tracking? Keeping an object in the foveal region is a likely way for some brain area to form associations between views of the same object, but I haven’t seen any data to show where this may happen. Indeed, the study shows that IT cells can be highly plastic in a way I haven’t seen before, but does this show that IT leverages temporal contiguity to form invariant object representations, as hypothesized in a few older models of the ventral stream, e.g. Wallis and Rolls’ work?
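For readers who haven’t met the temporal contiguity idea in model form, here is a toy sketch of a trace-style Hebbian rule, loosely in the spirit of those older ventral stream models. It is not a reproduction of any published model: the one-hot "views," the single unit, the parameter values, the weight normalization, and the trace reset between objects are all assumptions chosen to keep the example small. The unit keeps a decaying trace of its recent activity, and the weight change is driven by that trace rather than by the instantaneous response.

```python
def train_trace_rule(sequences, weights, alpha=0.1, eta=0.6, epochs=20):
    """Train one linear unit with a trace-style Hebbian rule (toy version).

    trace_t = (1 - eta) * y_t + eta * trace_{t-1}
    w      += alpha * trace_t * x_t, followed by normalization to unit length.
    Setting eta=0 reduces this to a plain Hebbian rule with no memory.
    """
    w = list(weights)
    for _ in range(epochs):
        for sequence in sequences:
            # Assumption: a gap between objects lets the trace decay to zero.
            trace = 0.0
            for x in sequence:
                y = sum(wi * xi for wi, xi in zip(w, x))
                trace = (1 - eta) * y + eta * trace
                w = [wi + alpha * trace * xi for wi, xi in zip(w, x)]
                norm = sum(wi * wi for wi in w) ** 0.5
                w = [wi / norm for wi in w]
    return w


def one_hot(index, size=6):
    """Stand-in for a 'view': a vector with a single active input."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical stimuli: views 0-2 belong to object A, views 3-5 to object B.
object_a = [one_hot(0), one_hot(1), one_hot(2)]
object_b = [one_hot(3), one_hot(4), one_hot(5)]

# The unit starts out responding to a single view of object A only.
initial = one_hot(0)

w_trace = train_trace_rule([object_a, object_b], initial, eta=0.6)
w_hebb = train_trace_rule([object_a, object_b], initial, eta=0.0)

print("trace rule weights:", [round(v, 3) for v in w_trace])
print("plain Hebb weights:", [round(v, 3) for v in w_hebb])
```

With the trace, the unit picks up weight on the other views of object A because they follow its preferred view in time, while the views of object B stay at zero; without the trace, it never generalizes beyond its starting view. Note that this only shows the mechanism can link views under idealized timing, which is exactly the part the question above is asking the physiology to demonstrate.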
Image from Flickr user mandj98






