Object recognition in mobile robots

By Massimiliano Versace | June 15, 2012

If you want to design robots that can interact with the real world in a useful way, you will eventually bump into the problem of implementing robust object recognition, where by robust I mean able to recognize objects irrespective of (or at least able to tolerate variation in) distance from the object, its orientation, illumination conditions, etc.

This post describes work done at the Neuromorphics Lab, using the Cog Ex Machina software platform to recognize objects on an iRobot Create platform.

* Thanks to Jasmin Leveille for producing all the simulations, robotic demos, and much of the text below

First, a disclaimer: I work in that lab. Second, a bit of history may be useful. Many of you will recall the DARPA SyNAPSE project, which initially sponsored HP, IBM and HRL in the design of brain-inspired hardware. Along with my colleagues at the Neuromorphics Lab, I was (and still am) collaborating with HP on the design of a software framework to exploit massively parallel processors (e.g., GPUs and GPU clusters) via the Cog Ex Machina (or Cog) software platform (although our group and HP parted ways with SyNAPSE some time ago).

Now, the interesting stuff. After our initial effort to implement learning in a virtual world with Cog, we have turned our attention to learning to recognize objects. Among candidate learning models for complex stimuli such as objects, one that marries simplicity with performance is Contrastive Divergence. Well... this is not properly an "object recognition" algorithm, but rather a learning algorithm that lacks the additional, and wonderful, machinery that the brain uses to perceive, segment, group, and in short build a meaningful set of features that can be used to really make sense of complex visual scenes. But in simple environments and fairly simple cases, Contrastive Divergence does the job.

Contrastive Divergence is a recently proposed method to train Products of Experts (PoE) models by approximating the gradient of the log-likelihood (Hinton, 2002). A restricted Boltzmann machine (RBM) is an example of a PoE in which each hidden unit corresponds to one expert. The topology of an RBM can be displayed as a two-layer neural network, as in Fig.1.

Fig.1. A restricted Boltzmann machine.

How are units (or neurons) described in binary RBMs? The activity of each binary unit $y_j$ is set to 1 with probability:

$$p(y_j = 1) = \sigma\Big(\sum_i w_{ij} x_i + b_j\Big) \tag{1}$$

where $\sigma$ is the logistic function, $x_i$ is the input from unit i, $w_{ij}$ is the synaptic weight from unit i to unit j and $b_j$ is a bias (Hinton and Salakhutdinov, 2006). Synaptic weights can be trained by following the gradient of the log-likelihood L of the data:

$$\frac{\partial L}{\partial w_{ij}} = \langle x_i y_j \rangle_{Q^0} - \langle x_i y_j \rangle_{Q^\infty} \tag{2}$$

where the first and second terms correspond to the expectations over the data distribution ($Q^0$) and over the equilibrium distribution ($Q^\infty$), respectively. Unfortunately, evaluating the second term is very inefficient, because sampling from the equilibrium distribution requires running a long Markov chain. Hinton (2002) proposed instead computing the expectation over the distribution of the one-step reconstruction $Q^1$, that is:

$$\Delta w_{ij} \propto \langle x_i y_j \rangle_{Q^0} - \langle x_i y_j \rangle_{Q^1} \tag{3}$$

Although Eq.3 does not strictly follow the gradient of the log-likelihood (Carreira-Perpiñán and Hinton, 2005), Contrastive Divergence learning has been shown to learn useful representations in various classification tasks (e.g., Hinton and Salakhutdinov, 2006), which is the main reason we used this algorithm as a first attempt!
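To make the one-step update concrete, here is a minimal NumPy sketch of a single Contrastive Divergence (CD-1) step for a binary RBM. It is not the Cog implementation. The function name `cd1_step`, the visible bias `b_vis`, the learning rate, and the use of hidden probabilities (rather than binary samples) in the statistics are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(w, b_vis, b_hid, x, rng, lr=0.1):
    """One CD-1 update on a batch x of shape (batch, n_vis) with binary units.

    w has shape (n_vis, n_hid). b_vis is an assumed visible bias; the text
    above only defines the hidden bias b_j.
    """
    # Data phase: hidden probabilities and binary samples given the clamped input.
    p_y = sigmoid(x @ w + b_hid)
    y = (rng.random(p_y.shape) < p_y).astype(float)
    # One-step reconstruction of the input, then hidden probabilities again.
    p_r = sigmoid(y @ w.T + b_vis)
    r = (rng.random(p_r.shape) < p_r).astype(float)
    p_z = sigmoid(r @ w + b_hid)
    n = x.shape[0]
    # Data correlations minus reconstruction correlations, averaged over the batch.
    grad_w = (x.T @ p_y - r.T @ p_z) / n
    w = w + lr * grad_w
    b_vis = b_vis + lr * (x - r).mean(axis=0)
    b_hid = b_hid + lr * (p_y - p_z).mean(axis=0)
    return w, b_vis, b_hid

# Toy usage on random binary vectors (illustrative data only).
rng = np.random.default_rng(0)
x = (rng.random((32, 6)) < 0.5).astype(float)
w, b_vis, b_hid = 0.01 * rng.standard_normal((6, 4)), np.zeros(6), np.zeros(4)
for _ in range(10):
    w, b_vis, b_hid = cd1_step(w, b_vis, b_hid, x, rng)
```

Note that the update needs four quantities at once: the input, the hidden activity it produces, the reconstruction, and the hidden activity the reconstruction produces.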

Despite its simplicity, Eq.3 requires that all four quantities (i.e. the input and output activities under the two different distributions) be available at the same time for one iteration of learning. This leads to subtle difficulties when implemented on a software architecture – such as Cog – in which “transmission” delays are present.

Fig.2 shows how delay lines can be used to synchronize the four activities needed by Eq.3. Here, the two-layer Boltzmann machine is in fact implemented as a four-layer network, whose activities are denoted respectively as I, y, r and z. I and r correspond to the activity of the input layer when an input data vector is respectively clamped and reconstructed. y and z correspond to the activity of the hidden layer computed from the input data and the reconstructed data, respectively. Each of the four layers is assigned a delay line of specific length (1, 2, 3 and 4 respectively for z, r, y and I). The activity of a given layer x at a particular step of its delay line in response to the vector presented at time t is indicated in Fig.2 as x(t). Unlike the activity of the various layers, a single weight matrix w is used throughout all computations.

Fig.2. Overview of the implementation of the Contrastive Divergence algorithm on Cog.

As shown in Fig.2, the net effect of the delay lines is to synchronize the respective activities prior to a weight update.
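The synchronization can be illustrated with a small simulation. The sketch below is not Cog code: it assumes a one-tick transmission delay between successive layers, uses mean-field (probability) activities without biases, and tags every activity with the time stamp of the input that caused it. The `DelayLine` class and the `run` function are hypothetical names.

```python
from collections import deque
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DelayLine:
    """Fixed-length FIFO: an item pushed at tick t comes back out at tick t + length."""
    def __init__(self, length):
        self.buf = deque([None] * length, maxlen=length)

    def push(self, item):
        oldest = self.buf[0]
        self.buf.append(item)  # maxlen drops the oldest entry automatically
        return oldest

def run(inputs, w, lr=0.1):
    """Pipeline I -> y -> r -> z with an assumed one-tick delay per hop."""
    lines = {"I": DelayLine(4), "y": DelayLine(3), "r": DelayLine(2), "z": DelayLine(1)}
    prev = {"I": None, "y": None, "r": None}  # what each layer emitted on the previous tick
    aligned_stamps = []
    for t, x in enumerate(inputs):
        # Each layer only sees what its upstream layer emitted one tick earlier.
        cur_I = (t, x)
        cur_y = None if prev["I"] is None else (prev["I"][0], sigmoid(prev["I"][1] @ w))
        cur_r = None if prev["y"] is None else (prev["y"][0], sigmoid(prev["y"][1] @ w.T))
        cur_z = None if prev["r"] is None else (prev["r"][0], sigmoid(prev["r"][1] @ w))
        out = {k: lines[k].push(v) for k, v in zip("Iyrz", (cur_I, cur_y, cur_r, cur_z))}
        prev = {"I": cur_I, "y": cur_y, "r": cur_r}
        if all(v is not None for v in out.values()):
            # All four activities leaving the delay lines stem from the same input.
            aligned_stamps.append(tuple(v[0] for v in out.values()))
            I, y, r, z = (out[k][1] for k in "Iyrz")
            w = w + lr * (np.outer(I, y) - np.outer(r, z))  # single shared weight matrix
    return w, aligned_stamps

rng = np.random.default_rng(0)
w, stamps = run(rng.random((7, 5)), np.zeros((5, 3)))
print(stamps)
```

Under these assumptions, the first weight update happens four ticks after the first input is presented, and from then on every tick yields one aligned set of activities.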

To verify that the proposed delay lines lead to a suitable implementation of the Contrastive Divergence algorithm, one RBM was simulated on MNIST data. Fig.3 shows the resulting weights, which are comparable to those reported in previous implementations (Hinton, 2002).

Fig.3. Weights learned in a single RBM trained on MNIST data.

Given this initial success in learning digits, we decided to implement more realistic object recognition. The video below shows an RBM network successfully learning 8 objects in Cog Ex Machina. The scorekeeper indicates percent correct. The field "BP" is the output of the network, and should match the desired activity shown in the field to its right (shown as "De"). Sample images and extracted boundaries are shown at the top (they are extracted with some delay, which means the images don't match in this display).

Finally, and more crucially, we transferred this to the robot. We have chosen our favorite platform, the iRobot Create, for this task. In this example, the robot is told to rotate until it finds the lizard, one of the objects the robot has been trained to recognize. When the robot recognizes the lizard in its field of view, it goes forward for a few seconds toward it. Training is done here offline with standard backpropagation on eight object classes plus a dummy class for the black background. The model is a feedforward deep network (one input layer [i.e. the image], two hidden layers, and one output layer). The input size is 160x160 (relatively big compared to many published object recognition works), each hidden layer is 30x30, and the output layer is 9x1.
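As a rough illustration of the network shape and the rotate-until-found behavior, here is a minimal NumPy sketch. Only the layer sizes come from the description above. The logistic hidden units, the softmax output, the confidence threshold, and the names `init_params`, `forward` and `choose_action` are assumptions, and the actual robot commands are not shown.

```python
import numpy as np

# Image, two hidden layers, and 8 objects + 1 background class.
LAYER_SIZES = [160 * 160, 30 * 30, 30 * 30, 9]

def init_params(rng, sizes=LAYER_SIZES, scale=0.01):
    """Small random weights and zero biases for each pair of consecutive layers."""
    return [(scale * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, image):
    """Forward pass of the feedforward network; returns class probabilities."""
    a = image.reshape(-1)
    for i, (w, b) in enumerate(params):
        z = a @ w + b
        if i < len(params) - 1:
            a = 1.0 / (1.0 + np.exp(-z))   # logistic hidden units (assumed)
        else:
            e = np.exp(z - z.max())
            a = e / e.sum()                # softmax output (assumed)
    return a

def choose_action(class_probs, target_index, threshold=0.5):
    """Rotate until the target class wins with enough confidence, then go forward."""
    if class_probs.argmax() == target_index and class_probs[target_index] >= threshold:
        return "forward"
    return "rotate"
```

In a control loop, each camera frame would be passed through `forward`, and `choose_action` would be called with the index of the lizard class (whichever index it was given during training).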


  • Carreira-Perpiñán, M. Á. and Hinton, G. E. (2005). On contrastive divergence learning. Proc. 10th Int. Workshop on Artificial Intelligence and Statistics (AISTATS 2005), 59-66.
  • Hinton, G. (2002). Training Products of Experts by minimizing contrastive divergence. Neural Computation, 14, 1771-1800.
  • Hinton, G. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504-507.
About Massimiliano Versace

Massimiliano Versace is co-founder and CEO of Neurala Inc. and founding Director of the Boston University Neuromorphics Lab. He is a pioneer in researching and bringing to market large-scale, deep learning neural models that allow robots to interact and learn in real time in complex environments. He has authored approximately forty journal articles, book chapters, and conference papers, holds several patents, and has been an invited speaker at dozens of academic and business meetings, research and national labs, and companies, including NASA, Los Alamos National Laboratory, Air Force Research Labs, Hewlett-Packard, iRobot, Qualcomm, Ericsson, BAE Systems, Mitsubishi, and Accenture, among others. His work has been featured in over thirty articles, news programs, and documentaries, including IEEE Spectrum, New Scientist, Geek Magazine, CNN, MSNBC and others. Massimiliano is a Fulbright scholar and holds two Ph.D.s: Experimental Psychology, University of Trieste, Italy; Cognitive and Neural Systems, Boston University, USA. He obtained his BS from the University of Trieste, Italy.
