Current Projects


Systematic Analysis of Breast Cancer Morphology Uncovers Stromal Features Associated with Survival

about The morphological interpretation of histologic sections forms the basis of diagnosis and prognostication for cancer. In the diagnosis of carcinomas, pathologists perform a semiquantitative analysis of a small set of morphological features to determine the cancer’s histologic grade. Physicians use histologic grade to inform their assessment of a carcinoma’s aggressiveness and a patient’s prognosis. Nevertheless, the determination of grade in breast cancer examines only a small set of morphological features of breast cancer epithelial cells, which has been largely unchanged since the 1920s. A comprehensive analysis of automatically quantitated morphological features could identify characteristics of prognostic relevance and provide an accurate and reproducible means for assessing prognosis from microscopic image data. We developed the C-Path (Computational Pathologist) system to measure a rich quantitative feature set from the breast cancer epithelium and stroma (6642 features), including both standard morphometric descriptors of image objects and higher-level contextual, relational, and global image features. These measurements were used to construct a prognostic model. We applied the C-Path system to microscopic images from two independent cohorts of breast cancer patients [from the Netherlands Cancer Institute (NKI) cohort, n = 248, and the Vancouver General Hospital (VGH) cohort, n = 328]. The prognostic model score generated by our system was strongly associated with overall survival in both the NKI and the VGH cohorts (both log-rank P < 0.001). This association was independent of clinical, pathological, and molecular factors. Three stromal features were significantly associated with survival, and this association was stronger than the association of survival with epithelial characteristics in the model. These findings implicate stromal morphologic structure as a previously unrecognized prognostic determinant for breast cancer.
people Andrew H. Beck, Ankur R. Sangoi, Samuel Leung, Robert J. Marinelli, Torsten O. Nielsen, Marc J. van de Vijver, Robert B. West, Matt van de Rijn, Daphne Koller

Holistic Scene Understanding

about Many works in computer vision attempt to tackle problems that form key components in the scene interpretation task: object recognition, image segmentation and 3D reconstruction. However, the vast majority of the work has focused on solving each component in isolation. This approach has allowed researchers to focus their efforts on engineering solutions to each of these key problems, resulting in dramatic improvements in our ability to tackle them. However, this divide-and-conquer approach has two main limitations. First, because there is no global understanding of the scene structure, many methods make errors that appear ridiculous to a human, such as segmenting a person's head as part of the background, or detecting a cow in the shadows in the grass. Second, it is only by providing a consistent set of answers to all of these problems that we can provide a coherent interpretation for an entire scene. In this project, we develop an integrated probabilistic model that provides a consistent, semantic interpretation of all components of an outdoor scene.
people Stephen Gould, Tianshi Gao, Pawan Kumar, Ben Packer, Daphne Koller

Pathways Understanding

about Significant insight about biological networks arises from the study of network motifs, small wiring patterns that are overly abundant in the network. However, wiring patterns, like a street map, only reflect the set of potential routes within a cellular network, but not when and how they are used within different cellular processes. Here, we introduce activity motifs, which, like traffic flow, reflect dynamic patterns that are abundant relative to the given network, and use them to study the timing of transcriptional regulation in Saccharomyces cerevisiae metabolism. Specific timing activity motifs, reflecting ordered transcription, are enriched in cellular responses to changing conditions. Linear pathways are enriched for forward activation patterns, which produce metabolic compounds efficiently; for backward activation, which rapidly initiates the production of a critical substrate; and for backward shutoff, which rapidly stops production of a detrimental product. Branching pathways are enriched for synchronized activation of dependent co-production. We validate our model by measuring protein abundance over a time course, showing that our inferred mRNA timing motifs also occur at the protein level. We also find binding activity motifs, where the genes in a linear chain have ordered binding strength to a particular transcription factor; these binding activity motifs overlap significantly with the timing activity motifs, suggesting a specific biochemical mechanism for ordered transcription. The results show that finely timed transcriptional regulation is abundant in the yeast metabolic network, and is likely to play a role in its adaptation to new environmental conditions. More generally, the framework of activity motifs is applicable for analyzing a variety of biological networks and functional data, and may be useful in elucidating a broad range of cellular functions. See the accompanying webpage.
people Gal Chechik, Daphne Koller

Pathways Reconstruction

about The set of cellular metabolic reactions forms a complex network of interactions, but even in well-studied organisms the resulting pathways contain many unidentified enzymes. We study how 'structural' relations between genes in the yeast metabolic pathway are manifested in functional properties of genes and their products, including mRNA expression, protein domain content and cellular localizations. We develop compact and interpretable probabilistic models for representing protein-domain co-occurrences and gene expression time courses. We use these models to complete unidentified enzymes in the pathways, achieving accuracy that is significantly superior to existing state-of-the-art approaches.
people Gal Chechik, Daphne Koller

Study of Protein-Protein Interactions Using Probabilistic Graphical Models

about Protein-protein interactions are central to all cellular processes. Discovering the mechanisms underlying protein interaction networks will allow for meaningful predictions about the functions of cellular proteins, with possible applications to drug design. We are using probabilistic models to extract patterns from genomic data and make accurate predictions about protein-protein interactions.
people Haidong Wang, Daphne Koller
more info Protein-protein interactions project page

Shape Models for Object Recognition

about We consider the important challenge of recognizing a variety of deformable object classes in images. Of fundamental importance and particular difficulty in this setting is the problem of "outlining" an object, rather than simply deciding on its presence or absence. A major obstacle in learning a model that will allow us to address this task is the need for hand-segmented training images. In this paper we present a novel landmark-based, piecewise-linear model of the shape of an object class. We then formulate a learning approach that allows us to learn this model with minimal user supervision. We circumvent the need for hand-segmentation by transferring the shape "essence" of an object from drawings to complex images. We show that our method is able to automatically and effectively learn, detect and localize a variety of object classes.
people Geremy Heitz, Gal Elidan, Daphne Koller

Past Projects


Acting Rationally with Incomplete Utility Information

about Traditional decision theory assumes a probability distribution over possible states and full knowledge of the user's utility function over these states. In many problems, however, the utility information is unavailable or too complex to be elicited fully. We extend the notion of rational decision making to deal with such cases.
people Urszula Chajewska, Daphne Koller
more info Urszula's home page

Active Learning

about In active learning, the learner is free to choose the data instances that it considers most relevant for learning a particular task. We are investigating how active learning can substantially reduce the amount of data needed for classification, density estimation and causal structure discovery.
people Simon Tong, Daphne Koller
more info DAGS Active Learning Page
Simon Tong's Research Page
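
The idea of letting the learner choose its own training instances can be illustrated with a toy pool-based example. The sketch below is not the project's method: it assumes a hypothetical 1-D threshold concept (label 0 below a hidden threshold, 1 at or above it) and a hypothetical `oracle` function that returns a label on request. At each step the learner queries the pool point that splits the remaining consistent hypotheses most evenly, so it needs roughly log2(n) labels rather than n.

```python
def active_learn_threshold(pool, oracle, budget):
    """Pool-based active learning of a 1-D threshold classifier.

    `oracle(x)` is a hypothetical labeling function returning 0 or 1.
    Returns (smallest pool point believed positive or None, queried points).
    """
    points = sorted(pool)
    # Invariant: the index of the first positive point lies in [low, high];
    # high == len(points) means "no positive point in the pool".
    low, high = 0, len(points)
    queries = []
    while low < high and len(queries) < budget:
        mid = (low + high) // 2  # the most informative point still in doubt
        x = points[mid]
        queries.append(x)
        if oracle(x) == 1:
            high = mid       # the boundary is at or before x
        else:
            low = mid + 1    # the boundary is after x
    estimate = points[low] if low < len(points) else None
    return estimate, queries


# Usage: 100 unlabeled points, hidden threshold at 37 (made up for the demo).
threshold, asked = active_learn_threshold(
    list(range(100)), lambda x: int(x >= 37), budget=10
)
print(threshold, len(asked))
```

A passive learner would need labels for points on both sides of the boundary across the whole pool; here every query is chosen because its label cannot yet be inferred from the labels already seen.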

Continuous Time Bayesian Networks

about Continuous time Bayesian networks describe structured stochastic processes that evolve over continuous time. The state of the system is decomposed into a set of local variables whose values change over time. The dynamics of the system are described by specifying the behavior of each local variable as a function of its parents in a directed (possibly cyclic) graph. The model specifies, at any given point in time, the distribution over two aspects: when a local variable changes its value and the next value it takes. These distributions are determined by the variable's current value and the current values of its parents in the graph.
people Uri Nodelman, Christian Shelton, Daphne Koller
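
The description above can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not the project's implementation: the two-variable model, its variable names, and all rate values are invented for illustration. Each variable stores, for every assignment to its parents and every current value, the rates of moving to each other value. Sampling then follows the two aspects named above: the waiting time until some variable changes is exponential in the total rate, and the specific change is chosen in proportion to its rate.

```python
import random

# Hypothetical model, not from the project: A has three values and no
# parents; B is binary and its parent is A.
PARENTS = {"A": (), "B": ("A",)}

# RATES[var][parent_values][current_value] = {next_value: transition rate}
RATES = {
    "A": {(): {0: {1: 1.0, 2: 0.5}, 1: {0: 0.3, 2: 0.7}, 2: {0: 2.0, 1: 1.0}}},
    "B": {
        (0,): {0: {1: 0.5}, 1: {0: 3.0}},
        (1,): {0: {1: 4.0}, 1: {0: 0.2}},
        (2,): {0: {1: 1.0}, 1: {0: 1.0}},
    },
}


def candidate_transitions(state):
    """List (variable, next_value, rate) for every move allowed in `state`."""
    moves = []
    for var, parents in PARENTS.items():
        parent_values = tuple(state[p] for p in parents)
        for next_value, rate in RATES[var][parent_values][state[var]].items():
            moves.append((var, next_value, rate))
    return moves


def sample_trajectory(initial_state, end_time, rng):
    """Sample a list of (time, state) pairs on the interval [0, end_time)."""
    t, state = 0.0, dict(initial_state)
    trajectory = [(t, dict(state))]
    while True:
        moves = candidate_transitions(state)
        total_rate = sum(rate for _, _, rate in moves)
        t += rng.expovariate(total_rate)  # waiting time until the next change
        if t >= end_time:
            return trajectory
        # Choose which single variable changes, and to which value,
        # with probability proportional to the transition rate.
        var, next_value, _ = rng.choices(moves, weights=[m[2] for m in moves])[0]
        state[var] = next_value
        trajectory.append((t, dict(state)))


trajectory = sample_trajectory({"A": 0, "B": 0}, end_time=5.0, rng=random.Random(0))
for time, state in trajectory:
    print(f"{time:.3f} {state}")
```

Because the rates of B are looked up from the current value of A, a change in A immediately alters how quickly B moves, which is how the parent relation shapes the dynamics in this sketch.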

Game Theory

about Game theory is a framework for describing the interrelated behavior of multiple agents acting rationally. We are interested in compact representations for structured games, including Multi-Agent Influence Diagrams (MAIDs). We are developing algorithms to exploit this structure in order to compute equilibria efficiently for large games, of the sort that might occur in real-world settings.
people Ben Blum, Daphne Koller, Christian Shelton
more info Game Tracer Software

Hybrid Bayesian Networks

about Many real-world problems are naturally described as hybrid systems, which contain both discrete and continuous components. Examples include fault diagnostics in physical systems, tracking human motions and more. We are exploring methods to deal with the challenging problems of representation, inference and learning that come up in these systems.
people Uri Lerner, Daphne Koller
more info Uri Lerner's Publications Page

Learning Models of Biological and Medical Data

about We are developing probabilistic models for analyzing biological data using Probabilistic Relational Models (PRMs), an extension of Bayesian networks to a relational setting, where we have multiple interdependent objects. Using PRMs, we can incorporate multiple sources of data such as gene expression patterns, experimental or clinical data, cellular phenotypes, sequence data, protein 3D structural information, functional information and more, into the analysis. This enables us to build richer models that are more suitable for this complex domain.
more info DAGS Learning Models of Biological and Medical Data Page

Markov Decision Processes

about Markov Decision Processes are formal models for problems in planning, control, and sequential decision making under uncertainty. In our work, we are mainly concerned with learning optimal controls from data and with exploiting structure for efficient computation. We focus on multi-agent systems, partial observability, and continuous states.
people Carlos Guestrin, Christian Shelton, Daphne Koller
more info DAGS MDP Page
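
For readers new to the formalism, a small fully observed, single-agent MDP can be solved with textbook value iteration. The sketch below is a generic illustration rather than the project's algorithms, which target multi-agent, partially observable, and continuous settings; the two-state problem, its state and action names, and its rewards are made up for the example.

```python
def q_value(state, action, values, transitions, rewards, gamma):
    """Expected discounted return of taking `action` in `state`."""
    expected_next = sum(p * values[s2] for p, s2 in transitions[state][action])
    return rewards[state][action] + gamma * expected_next


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Return (optimal values, greedy policy) for a finite MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward for taking action a in state s.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            best = max(
                q_value(s, a, values, transitions, rewards, gamma)
                for a in transitions[s]
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once a full sweep changes no value noticeably
            break
    policy = {
        s: max(
            transitions[s],
            key=lambda a: q_value(s, a, values, transitions, rewards, gamma),
        )
        for s in transitions
    }
    return values, policy


# Hypothetical two-state problem: staying in "b" pays 1, everything else pays 0.
transitions = {
    "a": {"stay": [(1.0, "a")], "go": [(1.0, "b")]},
    "b": {"stay": [(1.0, "b")], "go": [(1.0, "a")]},
}
rewards = {"a": {"stay": 0.0, "go": 0.0}, "b": {"stay": 1.0, "go": 0.0}}

values, policy = value_iteration(transitions, rewards, gamma=0.5)
print(values, policy)
```

With a discount of 0.5, staying in "b" forever is worth 1 / (1 - 0.5) = 2, and moving there from "a" is worth 0.5 * 2 = 1, so the greedy policy goes from "a" to "b" and then stays.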

Probabilistic Relational Models

about Probabilistic Relational Models (PRMs) are a language based on relational logic for describing statistical models of structured data. PRMs model complex domains in terms of entities, their properties, and the relations between them. These models represent the uncertainty over the properties of an entity, capturing its probabilistic dependence both on other properties of that entity and on properties of related entities. PRMs can also represent uncertainty over the relational structure itself.
people Nir Friedman, Lise Getoor, Daphne Koller, Uri Nodelman, Avi Pfeffer, Eran Segal, Ben Taskar
more info DAGS PRMs Page