Learning Saliency Models

The aim of this project is to learn a visual saliency model from human eye movement data. The project is ongoing until the end of 2010.



Above: A visualization of the confusion matrix on one particular video dataset. I am currently investigating other datasets that do not contain many high-level objects, in the hope of identifying good bottom-up saliency regions. For example, one hypothesis explored in "A Nonparametric Approach to Bottom-Up Visual Saliency" (W. Kienzle, NIPS 2006) is that people tend to look at center-surround regions.



Above: ICA basis computed on random patches (left) and on patches that people looked at (right). There are clear qualitative differences: mainly, the gaze basis contains more center-surround elements.
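
The two patch sets compared above could be collected roughly as follows. This is only a minimal sketch, not the code used in the project: the function names, the grayscale list-of-lists image layout, the (row, col) fixation format, and the choice of one random patch per gaze patch are all assumptions made for illustration.

```python
import random


def extract_patch(image, row, col, size):
    """Return the size x size patch centred on (row, col).

    `size` is assumed to be odd. Returns None if the patch would
    cross the image border.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    if row - half < 0 or col - half < 0:
        return None
    if row + half >= height or col + half >= width:
        return None
    return [image[r][col - half:col + half + 1]
            for r in range(row - half, row + half + 1)]


def gaze_and_random_patches(image, fixations, size=5, seed=0):
    """Collect patches at fixated locations plus an equal number of
    patches at uniformly random locations (hypothetical sampling scheme)."""
    rng = random.Random(seed)
    half = size // 2
    height, width = len(image), len(image[0])
    gaze, rand = [], []
    for row, col in fixations:
        patch = extract_patch(image, row, col, size)
        if patch is None:
            continue  # skip fixations too close to the border
        gaze.append(patch)
        # Draw a random centre that is guaranteed to fit inside the image.
        r = rng.randint(half, height - half - 1)
        c = rng.randint(half, width - half - 1)
        rand.append(extract_patch(image, r, c, size))
    return gaze, rand
```

Either set can then be flattened into vectors and passed to an ICA implementation, or labeled (gaze = 1, random = 0) to train a classifier.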


Above: The classifier can be used to give a probabilistic estimate of saliency for every patch in the image.
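
As a rough illustration of how a per-patch classifier turns into a saliency map, the sketch below slides a window over the image and scores each patch. The logistic model, the `weights` and `bias` parameters, and raw pixel intensities as features are assumptions for the example; the page does not say which classifier or features the project actually uses.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def patch_features(image, row, col, size):
    """Flatten the size x size patch centred on (row, col) into a list."""
    half = size // 2
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def saliency_map(image, weights, bias, size=3):
    """Score every interior pixel with a (hypothetical) logistic classifier.

    Border pixels, where a full patch does not fit, are left at 0.0,
    meaning "no estimate" rather than "not salient".
    """
    half = size // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(half, height - half):
        for c in range(half, width - half):
            x = patch_features(image, r, c, size)
            score = bias + sum(w * v for w, v in zip(weights, x))
            out[r][c] = sigmoid(score)
    return out
```

With hand-picked center-surround weights such as `[-1, -1, -1, -1, 8, -1, -1, -1, -1]`, an isolated bright pixel scores close to 1 while its neighbors score below 0.5. In practice the weights would come from training on gaze and random patches.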
Preliminary details

This is a directed research project course that I am currently taking with Prof. Nando de Freitas.

We are interested in building a classifier that can identify salient patches in an image. Some of the questions that we wish to explore are:
- How feasible is the overall approach?
- Is it better to train the classifier on a discrete set of images or on video? Does it matter?
- Can the temporal information in the video boost classification accuracy?
- How does classification performance change if we apply a retina transform to our patches, averaging pixels that are farther from the gaze location?
- Are there differences in the ICA bases of random and gaze patches? Can we quantify them?
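
To make the retina transform idea concrete, here is one possible sketch. It is not the MATLAB code from this project: the square averaging window, the linear `growth` rule, and the assumption that the gaze location is the patch center are all choices made for this example.

```python
import math


def retina_transform(patch, growth=0.5):
    """Blur a patch more strongly the farther a pixel is from the centre.

    Each output pixel is the mean of a square window whose half-width is
    int(growth * distance_from_centre). The centre pixel is therefore kept
    at full resolution, while the periphery is averaged.
    """
    height, width = len(patch), len(patch[0])
    centre_r, centre_c = (height - 1) / 2.0, (width - 1) / 2.0
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            eccentricity = math.hypot(r - centre_r, c - centre_c)
            k = int(growth * eccentricity)  # half-width of averaging window
            # Clip the window to the patch so border pixels still get a value.
            r0, r1 = max(0, r - k), min(height - 1, r + k)
            c0, c1 = max(0, c - k), min(width - 1, c + k)
            values = [patch[i][j]
                      for i in range(r0, r1 + 1)
                      for j in range(c0, c1 + 1)]
            out[r][c] = sum(values) / len(values)
    return out
```

On a 7 x 7 checkerboard patch, the center pixel and its immediate neighbors pass through unchanged, while the corners are averaged over a 3 x 3 window and lose their fine detail. The transformed patch can then be flattened into a feature vector in place of the raw one.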

Preliminary results on the two datasets I have run so far:
1. Applying a retina transform to a patch is a good idea, as is any other way of devoting more of the feature vector's modelling effort to the center of the patch.
2. Including color consistently boosts classification accuracy.
3. Gaze patch statistics are very different from random patch statistics and contain more center-surround structure.

The third result is interesting: if ICA trained on gaze data shows more center-surround filters than ICA trained on random patches, it might point to one source of the discrepancies observed between filters learned by ICA and those present in V1 cells of macaque monkeys (see "Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex").

I am currently collecting more quantitative data on the conclusions above, as well as conducting more experiments on different datasets, such as the one recently collected at MIT by Tilke Judd for her ICCV 2009 paper "Learning to predict where humans look".

Code

I uploaded the retina transform code I used for this project to MATLAB Central.

vision data-driven saliency bottom-up eyetracking