Holistic Scene Understanding

Stephen Gould, Tianshi Gao, Pawan Kumar, Ben Packer, Daphne Koller


Figure 1. Example Scene Decompositions with Semantic and Geometric Overlays.

Much work in computer vision addresses problems that are key components of the scene interpretation task: object recognition, image segmentation, and 3D reconstruction. However, the vast majority of this work has focused on solving each component in isolation. This divide-and-conquer approach has let researchers concentrate on engineering solutions to each problem, resulting in dramatic improvements on every one of them, but it has two main limitations. First, because there is no global understanding of the scene structure, many methods make errors that appear ridiculous to a human, such as segmenting a person's head as part of the background or detecting a cow in the shadows on the grass. Second, only a consistent set of answers to all of these problems yields a coherent interpretation of an entire scene. In this project, we develop an integrated probabilistic model that provides a consistent semantic interpretation of all components of an outdoor scene.

Figure 2. Example 3D Geometric Reconstructions.

related publications

P. Kumar and D. Koller. Efficiently Selecting Regions for Scene Understanding. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
S. Gould, T. Gao, and D. Koller. Region-based Segmentation and Object Detection. Proceedings of Advances in Neural Information Processing Systems (NIPS), 2009.
S. Gould, R. Fulton, and D. Koller. Decomposing a Scene into Geometric and Semantically Consistent Regions. Proceedings of the International Conference on Computer Vision (ICCV), 2009.
G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded Classification Models: Combining Models for Holistic Scene Understanding. Proceedings of Advances in Neural Information Processing Systems (NIPS), 2008.

datasets and other resources

Stanford Background Dataset, containing 715 images of outdoor scenes, from Gould et al. (ICCV 2009).