Research Themes

My primary research goal is to model the physical and semantic structure of the world, so that computers can better understand scenes from images. A good visual system should interpret images in terms of a physical space in which objects interact. Our research focuses on modeling relations among objects, categories, and surfaces, so that the computer gradually becomes better at learning, predicting, and describing.

Recognizing Objects and their Properties

Improve generalization, robustness to unfamiliarity, and scalability of recognition by modeling objects in terms of their properties.

Spatial Understanding from Images

Recover a spatial layout of the scene and its objects from an image.

Physically Grounded Scene Interpretation

Interpret objects and their interactions within the physical space of the scene.

Sample of Projects

Describing Objects by their Attributes

We shift the goal of recognition from naming to describing. By considering an attribute based representation, we are able to say more about familiar objects and gain information about unfamiliar objects. We introduce a new dataset for learning and evaluating attribute classifiers, which is available for download.
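The idea can be illustrated with a toy sketch. Everything here is invented for illustration (the attribute names, features, and threshold classifiers are not the paper's actual models): per-attribute classifiers let us describe a familiar object and still say something about an unfamiliar one.

```python
# Toy sketch of attribute-based description. The attributes, features,
# and "classifiers" below are made up for illustration only.

# Each classifier maps a feature dict to a score in [0, 1].
ATTRIBUTE_CLASSIFIERS = {
    "has_wheels": lambda f: 1.0 if f.get("circular_parts", 0) >= 2 else 0.0,
    "is_furry":   lambda f: f.get("texture_energy", 0.0),
    "has_wings":  lambda f: 1.0 if f.get("symmetric_appendages", 0) >= 2 else 0.0,
}

def describe(features, threshold=0.5):
    """Return the sorted list of attributes predicted present."""
    return sorted(name for name, clf in ATTRIBUTE_CLASSIFIERS.items()
                  if clf(features) > threshold)

# A familiar object (a car) and an unfamiliar one can both be described:
car = {"circular_parts": 4, "texture_energy": 0.1}
unknown_animal = {"texture_energy": 0.9, "symmetric_appendages": 2}

print(describe(car))            # ['has_wheels']
print(describe(unknown_animal)) # ['has_wings', 'is_furry']
```

Even though `unknown_animal` matches no known category, the attribute layer still yields a useful description, which is the point of shifting from naming to describing.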

Recovering the Spatial Layout of Indoor Scenes

We consider the problem of recovering the spatial layout of cluttered indoor scenes from monocular images. We gain robustness to clutter by modeling the global room space with a parametric 3D “box” and by iteratively localizing clutter and refitting the box. Our dataset is available for download.
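The alternation between fitting and clutter localization can be sketched with a 1-D analogy (invented for this example, not the paper's actual model): estimate a wall position from noisy edge votes, alternately flagging outlier votes as "clutter" and refitting on the rest.

```python
# 1-D analogy of the "fit a box, re-localize clutter, refit" loop.
# Values and tolerances are invented for illustration.

def fit(votes, clutter):
    """Fit the wall position using only votes not flagged as clutter."""
    inliers = [v for i, v in enumerate(votes) if i not in clutter]
    return sum(inliers) / len(inliers)

def localize_clutter(votes, estimate, tol=4.0):
    """Flag votes that disagree strongly with the current fit."""
    return {i for i, v in enumerate(votes) if abs(v - estimate) > tol}

votes = [10.1, 9.8, 10.3, 25.0, 9.9]   # the 25.0 vote comes from clutter
clutter = set()
for _ in range(3):                      # alternate fitting and clutter steps
    estimate = fit(votes, clutter)
    clutter = localize_clutter(votes, estimate)

print(sorted(clutter))  # [3] -- the clutter vote is excluded from the fit
```

The first fit is pulled toward the clutter; once that vote is flagged, the refit settles near the true wall position, mirroring how localizing clutter stabilizes the box estimate.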

Closing the Loop on Scene Interpretation

We model the interactions among occlusion, surface, object, and viewpoint inference, leading to modest quantitative and more substantial qualitative improvements. We also extend Automatic Photo Pop-up with occlusion, object, and viewpoint information, leading to more robust and detailed models. Finding more robust and flexible models for integrating diverse and numerous inference processes is a long-term interest.

Recovering Occlusion Boundaries from a Single Image

We recover occlusion boundaries and figure/ground labels by reasoning about region similarity, 3D geometry, boundaries, and junctions. Code is available.

Putting Objects in Perspective

We propose a method for modeling objects, viewpoint, and surfaces in an integrated fashion. With a simple belief propagation framework, the viewpoint of the camera is recovered with high accuracy, and object detection performance improves considerably.
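The coupling between objects and viewpoint rests on simple ground-plane geometry, which can be sketched as follows (a simplified, zero-pitch case; the variable names and numbers are ours, chosen for illustration).

```python
# Ground-plane geometry linking image measurements to world size
# (simplified zero-pitch case, for illustration only).
# With image rows increasing downward, an object standing on the ground
# with bottom row v_b and pixel height h_i has approximate world height
#     H = y_c * h_i / (v_b - v0),
# where y_c is the camera height and v0 is the horizon row.

def world_height(h_i, v_b, v0, y_c):
    """Approximate world height of a ground-plane object."""
    return y_c * h_i / (v_b - v0)

# Example: horizon at row 200, camera 1.6 m high; a detection whose
# bottom is at row 360 and which spans 180 pixels is person-sized:
print(round(world_height(h_i=180, v_b=360, v0=200, y_c=1.6), 2))  # 1.8
```

Because the same relation constrains every object in the image, detections vote for a consistent horizon and camera height, and the recovered viewpoint in turn suppresses detections of implausible size.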

Recovering Surface Layout from a Single Image

We extend our framework from Automatic Photo Pop-up by subclassifying vertical regions, providing extensive quantitative evaluation, and demonstrating the usefulness of the geometric labels as context for object detection.

Automatic Photo Pop-up

Our system estimates which regions of an outdoor image correspond to ground, vertical objects, and sky. These geometric labels allow us to construct a coarse 3D model of the scene.

Music Identification

We apply computer vision techniques to identify songs from a short, noisy audio snippet. This research is performed with Yan Ke and Rahul Sukthankar.
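The vision-flavored idea can be illustrated with a toy example (invented for this page; real systems learn discriminative filters over spectrograms): treat the spectrogram as an image, binarize it into a compact fingerprint, and match noisy snippets by Hamming distance.

```python
# Toy illustration of audio identification as a vision problem.
# The tiny "spectrograms" and the binarization scheme are made up
# for illustration; they are not the actual published method.

def fingerprint(spectrogram):
    """Binarize each time-frequency cell against its row mean."""
    fp = []
    for row in spectrogram:
        mean = sum(row) / len(row)
        fp.append(tuple(1 if x > mean else 0 for x in row))
    return tuple(fp)

def distance(a, b):
    """Hamming distance between two binary fingerprints."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

database = {
    "song_a": fingerprint([[1, 9, 1], [8, 1, 8]]),
    "song_b": fingerprint([[9, 1, 9], [1, 8, 1]]),
}

# A noisy snippet of song_a still matches it:
noisy = fingerprint([[2, 8, 1], [7, 2, 9]])
print(min(database, key=lambda k: distance(database[k], noisy)))  # song_a
```

Binarizing against a local mean discards absolute loudness, so the fingerprint survives the kind of noise and gain changes a short real-world snippet would carry.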

Sound Detection and Identification

Our goal is to detect and identify sound objects, such as car horns or dog barks, in audio. Our system, called SOLAR (sound object localization and retrieval), is, to our knowledge, the first capable of finding a large variety of sounds in audio data from movies and other complex audio environments.

Object-Based Image Retrieval

The goal is to retrieve images based on the appearance of the objects, such as elephants or race cars, contained in them. Our approach is to extend popular techniques for object detection to support online training with only a few examples. The key idea is that, once the underlying statistical structure of the image domain is modeled, that structure and its parameters can dramatically improve results in a probabilistic object-based image retrieval system.