Session: Spotlights
Session Chair: Chris Williams
A Bayesian Framework for Cross-Situational Word-Learning
Michael C Frank · Noah Goodman · Josh Tenenbaum
For infants, early word learning is a chicken-and-egg problem. One way to learn a word is to observe that it co-occurs with a particular referent across different situations. Another way is to use the social context of an utterance to infer the intended referent of a word. Here we present a Bayesian model of cross-situational word learning, and an extension of this model that also learns which social cues are relevant to determining reference. We test our model on a small corpus of mother-infant interaction and find it performs better than competing models. Finally, we show that our model accounts for experimental phenomena including mutual exclusivity, fast-mapping, and generalization from social cues.
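As a rough illustration of the cross-situational idea only (not the authors' full Bayesian model), the sketch below tallies word-object co-occurrences across situations and normalizes them into a crude referent distribution; the toy corpus and all word and object names are invented for the example.

    # Minimal sketch of cross-situational co-occurrence statistics: each
    # situation pairs the words of an utterance with the objects present,
    # and referents are guessed from normalized counts (toy data only).
    from collections import defaultdict

    situations = [
        ({"look", "dog"}, {"DOG", "BALL"}),
        ({"nice", "dog"}, {"DOG", "CUP"}),
        ({"get", "ball"}, {"BALL", "CUP"}),
    ]

    counts = defaultdict(lambda: defaultdict(int))
    for words, objects in situations:
        for w in words:
            for o in objects:
                counts[w][o] += 1

    def referent_posterior(word):
        """Normalize co-occurrence counts into a crude P(object | word)."""
        total = sum(counts[word].values())
        return {o: c / total for o, c in counts[word].items()}

    print(referent_posterior("dog"))   # DOG should dominate after two situations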
Comparing Bayesian models for multisensory cue combination without mandatory integration
Ulrik Beierholm · Konrad P Kording · Ladan Shams · Wei Ji Ma
Bayesian models of multisensory perception traditionally address the problem of estimating a variable that is assumed to be the underlying cause of two sensory signals. The brain, however, has to solve a more general problem: it has to establish which signals come from the same source and should be integrated, and which ones do not and should be segregated. In the last couple of years, a few models have been proposed to solve this problem in a Bayesian fashion. One of these models has the strength of formalizing the causal structure of sensory signals. We describe these models and conduct an experiment to test human performance in an auditory-visual spatial localization task in which integration is not mandatory. We find that the causal Bayesian inference model accounts for the data better than the other models.
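The causal-inference idea can be sketched as follows: under simplified Gaussian assumptions, an observer weighs the evidence that both signals share one source against the evidence for two independent sources, and then model-averages the fused and unimodal estimates. The parameter values and the particular formulation below are illustrative assumptions, not taken from the paper.

    # Sketch of causal inference for audio-visual localization under Gaussian
    # assumptions (all parameter values are illustrative).
    import numpy as np

    sigma_a, sigma_v, sigma_p = 8.0, 2.0, 15.0  # auditory, visual, prior widths (deg)
    p_common = 0.5                              # prior probability of a single cause
    s = np.linspace(-60, 60, 2001)              # grid over candidate source positions
    ds = s[1] - s[0]

    def gauss(x, mu, sd):
        return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

    def auditory_estimate(x_a, x_v):
        prior = gauss(s, 0.0, sigma_p)
        # Marginal likelihood of the pair under one common source vs. two sources
        like_one = (gauss(x_a, s, sigma_a) * gauss(x_v, s, sigma_v) * prior).sum() * ds
        like_two = ((gauss(x_a, s, sigma_a) * prior).sum() * ds *
                    (gauss(x_v, s, sigma_v) * prior).sum() * ds)
        post_one = (p_common * like_one /
                    (p_common * like_one + (1 - p_common) * like_two))
        # Optimal position estimates under each causal structure (reliability-weighted)
        w = np.array([1 / sigma_a**2, 1 / sigma_v**2, 1 / sigma_p**2])
        fused = np.dot(w, [x_a, x_v, 0.0]) / w.sum()
        alone = (x_a / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_p**2)
        # Model averaging: partial, not mandatory, integration of the two signals
        return post_one * fused + (1 - post_one) * alone

    print(auditory_estimate(10.0, 2.0))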
Congruence between model and human attention reveals unique signatures of critical visual events
Robert J Peters · Laurent Itti
Current computational models of bottom-up and top-down components of attention are predictive of eye movements across a range of stimuli and of simple, fixed visual tasks (such as visual search for a target among distractors). However, to date there exists no computational framework which can reliably mimic human gaze behavior in more complex environments and tasks, such as driving a vehicle through traffic. Here, we develop a hybrid computational/behavioral framework, combining simple models for bottom-up salience and top-down relevance, and looking for changes in the predictive power of these components at different critical event times during 4.7 hours (500,000 video frames) of observers playing car racing and flight combat video games. This approach is motivated by our observation that the predictive strengths of the salience and relevance models exhibit reliable temporal signatures during critical event windows in the task sequence; for example, when the game player directly engages an enemy plane in a flight combat game, the predictive strength of the salience model increases significantly, while that of the relevance model decreases significantly. Our new framework combines these temporal signatures to implement several event detectors. Critically, we find that an event detector based on fused behavioral and stimulus information (in the form of the model's predictive strength) is much stronger than detectors based on behavioral information alone (eye position) or image information alone (model saliency maps). This approach to event detection, based on eye tracking combined with computational models applied to the visual input, may have useful applications as a less-invasive alternative to other event detection approaches based on neural signatures derived from EEG or fMRI recordings.
Learning Visual Attributes
Vittorio Ferrari · Andrew Zisserman
We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as 'red', 'striped', or 'spotted'. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
Object Recognition by Scene Alignment
Bryan C Russell · Antonio Torralba · Ce Liu · Rob Fergus · William Freeman
Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge in object recognition. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a deceptively simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images (LabelMe). This provides us with a retrieval set of images, which supplies hypotheses for object identities and locations. We then transfer the labelings from the retrieval set. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
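A schematic sketch of the retrieval-and-transfer step follows; a trivial block-average descriptor stands in for the paper's scene representation, and small synthetic arrays stand in for LabelMe images and labels.

    # Schematic sketch of retrieval-based label transfer (synthetic data only):
    # retrieve the nearest labeled scenes under a crude global descriptor and
    # pool their labels as object hypotheses for the query image.
    import numpy as np
    from collections import Counter

    def descriptor(image):
        """Block-average to a coarse 8x8 grid; a stand-in for gist-like features."""
        h, w = image.shape
        small = image.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
        return small.ravel() / (np.linalg.norm(small) + 1e-8)

    rng = np.random.default_rng(0)
    train_images = [rng.random((64, 64)) for _ in range(50)]   # labeled training set
    train_labels = [["car", "road"] if i % 2 else ["tree", "sky"] for i in range(50)]
    train_desc = np.stack([descriptor(im) for im in train_images])

    def label_hypotheses(query_image, k=5):
        d = descriptor(query_image)
        dists = np.linalg.norm(train_desc - d, axis=1)
        neighbors = np.argsort(dists)[:k]                      # retrieval set
        votes = Counter(l for i in neighbors for l in train_labels[i])
        return votes.most_common()                             # object hypotheses

    print(label_hypotheses(rng.random((64, 64))))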
Retrieved context and the discovery of semantic structure
Vinayak Rao · Marc Howard
Semantic memory refers to our knowledge of facts and relationships between concepts. A successful semantic memory depends on inferring relationships between items that are not explicitly taught. Recent mathematical modeling of episodic memory argues that episodic recall relies on retrieval of a gradually-changing representation of temporal context. We show that retrieved context enables the development of a global memory space that reflects relationships between all items that have been previously learned. When newly-learned information is integrated into this structure, it is placed in some relationship to all other items, even if that relationship has not been explicitly learned. We demonstrate this effect for global semantic structures shaped topologically as a ring, and as a two-dimensional sheet. We also examined the utility of this learning algorithm for learning a more realistic semantic space by training it on a large pool of synonym pairs. Retrieved context enabled the model to “infer” relationships between synonym pairs that had not yet been presented.
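A minimal retrieved-context sketch (an illustration of the general mechanism, not the paper's model): items are bound to a slowly drifting context vector, and cueing with one item retrieves a context that lends support to items it was never studied alongside. All sizes and constants below are arbitrary.

    # Illustrative retrieved-context mechanism with outer-product associations.
    import numpy as np

    n_items, dim, rho = 6, 6, 0.7          # rho controls how slowly context drifts
    items = np.eye(n_items, dim)           # orthonormal item vectors
    M_item_to_ctx = np.zeros((dim, dim))   # item -> context associations
    M_ctx_to_item = np.zeros((dim, dim))   # context -> item associations
    context = np.zeros(dim)

    for i in range(n_items):               # study the items in a fixed sequence
        f = items[i]
        context = rho * context + (1 - rho) * f   # context drifts toward the item
        context /= np.linalg.norm(context)
        M_item_to_ctx += np.outer(context, f)     # Hebbian outer-product learning
        M_ctx_to_item += np.outer(f, context)

    def relatedness(i, j):
        """Cue with item i, retrieve its context, read out the support for item j."""
        retrieved_ctx = M_item_to_ctx @ items[i]
        return float(items[j] @ (M_ctx_to_item @ retrieved_ctx))

    print(relatedness(0, 2) > relatedness(0, 5))  # nearby items end up more related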
Sequential Hypothesis Testing under Stochastic Deadlines
Peter Frazier · Angela Yu
Most models of decision-making in neuroscience assume an infinite horizon, which yields an optimal solution that integrates evidence up to a fixed decision threshold. However, under most experimental as well as naturalistic behavioral settings, the decision has to be made before some finite deadline, which is often experienced as a stochastic quantity, due either to variable external constraints or to internal timing uncertainty. In this work, we formulate this problem as sequential hypothesis testing under a stochastic horizon. We use dynamic programming tools to show that, for a large class of deadline distributions, the Bayes-optimal solution requires integrating evidence up to a threshold that declines monotonically over time. We use numerical simulations to illustrate the optimal policy in the special cases of a fixed deadline and one that is drawn from a gamma distribution.
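The qualitative result can be illustrated with a small backward-induction sketch over a discretized belief state; the Bernoulli likelihoods, costs, and horizon below are arbitrary choices for the fixed-deadline special case, not the paper's parameters.

    # Backward induction for a binary decision with a fixed deadline: the region
    # where stopping is optimal widens, i.e. the decision threshold on the
    # posterior collapses toward 0.5 as the deadline approaches.
    import numpy as np

    p0, p1 = 0.45, 0.55                   # Bernoulli observation likelihoods
    c, deadline_cost, T = 0.001, 1.0, 40  # per-step cost, miss cost, horizon
    b = np.linspace(0.0, 1.0, 501)        # grid over posterior P(H1 | data)

    def posterior(belief, x):
        like1 = p1 if x == 1 else 1 - p1
        like0 = p0 if x == 1 else 1 - p0
        return belief * like1 / (belief * like1 + (1 - belief) * like0)

    V = np.full_like(b, deadline_cost)    # cost of still deliberating at t = T
    for t in range(T - 1, -1, -1):
        stop = np.minimum(b, 1 - b)       # expected error cost of answering now
        cont = np.zeros_like(b)
        for x in (0, 1):
            p_x = b * (p1 if x else 1 - p1) + (1 - b) * (p0 if x else 1 - p0)
            cont += p_x * np.interp(posterior(b, x), b, V)
        cont += c
        V = np.minimum(stop, cont)
        upper = b[(stop <= cont) & (b >= 0.5)].min()   # upper decision threshold
        if t % 10 == 0:
            print(f"t={t:2d}  threshold on P(H1) = {upper:.3f}")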
Subspace-Based Face Recognition in Analog VLSI
Miguel E Figueroa · Gonzalo Carvajal · Waldo Valenzuela
We describe an analog-VLSI neural network for face recognition based on subspace methods. The system uses a dimensionality-reduction network whose coefficients can be either programmed or learned on-chip to perform PCA, or programmed to perform LDA. A second network with user-programmed coefficients performs classification with Manhattan distances. The system uses on-chip compensation techniques to reduce the effects of device mismatch. Using the ORL database with 12x12-pixel images, our circuit achieves up to 85% classification performance (98% of the performance of an equivalent software implementation).
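For reference, a software sketch of the pipeline the chip implements, namely projection onto a learned or programmed subspace followed by nearest-neighbor classification under Manhattan distance; the random arrays below merely stand in for 12x12-pixel face images from a gallery such as ORL.

    # Software stand-in for the subspace recognition pipeline (synthetic data).
    import numpy as np

    rng = np.random.default_rng(1)
    train = rng.random((40, 144))              # 40 gallery images, 12x12 = 144 pixels
    train_ids = np.repeat(np.arange(10), 4)    # 10 subjects, 4 images each

    mean = train.mean(axis=0)
    # PCA basis from the top right singular vectors (the chip learns or is
    # programmed with these dimensionality-reduction coefficients)
    _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
    W = Vt[:20]                                # keep a 20-dimensional subspace
    train_proj = (train - mean) @ W.T

    def classify(image):
        z = (image - mean) @ W.T
        d = np.abs(train_proj - z).sum(axis=1) # Manhattan (L1) distances
        return train_ids[np.argmin(d)]

    print(classify(train[7] + 0.01 * rng.standard_normal(144)))  # expect subject 1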
Theoretical Analysis of Learning with Reward-Modulated Spike-Timing-Dependent Plasticity
Robert Legenstein · Dejan Pecevski · Wolfgang Maass
Reward-modulated spike-timing-dependent plasticity (STDP) has recently emerged as a candidate for a learning rule that could explain how local learning rules at single synapses support adaptive changes in complex networks of spiking neurons. However, the potential and limitations of this learning rule have so far only been tested through computer simulations. This article provides tools for an analytic treatment of reward-modulated STDP, which allow us to derive concrete conditions under which the convergence of reward-modulated STDP can be predicted. In particular, we can produce in this way a theoretical explanation and a computer model for a fundamental experimental finding on reinforcement learning in monkeys by Fetz and Baker. We also report results of computer simulations that have tested further predictions of this theory.
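A minimal trace-based sketch of reward-modulated STDP (illustrative constants and Poisson inputs, not the article's analytical treatment): spike pairings accumulate an eligibility trace, and a global reward signal gates whether that trace is written into the synaptic weight.

    # Single synapse with reward-gated STDP on Poisson spike trains (toy example).
    import numpy as np

    A_plus, A_minus, tau_stdp = 0.01, 0.012, 20.0  # STDP window parameters (ms)
    tau_e = 500.0                                  # eligibility-trace time constant (ms)
    dt, T = 1.0, 2000                              # 1 ms steps, 2 s of simulated time

    rng = np.random.default_rng(0)
    pre = rng.random(T) < 0.02                     # ~20 Hz Poisson presynaptic spikes
    post = rng.random(T) < 0.02                    # ~20 Hz Poisson postsynaptic spikes
    reward = (rng.random(T) < 0.001).astype(float) # sparse global reward signal

    w, elig = 0.5, 0.0
    x_pre, x_post = 0.0, 0.0                       # low-pass traces of the spike trains
    for t in range(T):
        x_pre += dt * (-x_pre / tau_stdp) + pre[t]
        x_post += dt * (-x_post / tau_stdp) + post[t]
        stdp = A_plus * x_pre * post[t] - A_minus * x_post * pre[t]
        elig += dt * (-elig / tau_e) + stdp        # candidate change, not yet applied
        w += reward[t] * elig                      # reward gates the actual update
    print(w)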