Learning Models of Biological Data

Advances in DNA microarray technology and sequencing techniques are producing a wealth of biological datasets on a genome-wide scale. A key challenge is developing methodologies that are both statistically sound and computationally tractable for inferring biological insights from these large datasets. We are developing probabilistic models for analyzing biological data using probabilistic relational models (PRMs), an extension of Bayesian networks to a relational setting in which there are multiple interdependent objects. Using PRMs, we can incorporate multiple sources of data into the analysis, such as gene expression patterns, experimental or clinical data, cellular phenotypes, sequence data, protein 3D structural information, and functional information. This enables us to build richer models that are better suited to this complex domain.
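
To make the relational idea concrete, here is a minimal sketch in Python of how a PRM-style template might be grounded into an ordinary Bayesian network. It is an illustration under stated assumptions, not the models developed in this project: the classes (Gene, Array, Expression), the binary attributes (cluster, condition, level), and every probability value are hypothetical. The point it shows is parameter sharing: one conditional distribution is defined at the class level and reused for every expression measurement, whichever gene and array it links.

```python
from dataclasses import dataclass
from itertools import product

# Class-level (template) parameters, shared by every object of a class.
# All numbers are invented for illustration only.
P_CLUSTER = {0: 0.5, 1: 0.5}      # P(Gene.cluster)
P_CONDITION = {0: 0.7, 1: 0.3}    # P(Array.condition)
# P(Expression.level = "high" | Gene.cluster, Array.condition)
P_HIGH = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.9}


@dataclass(frozen=True)
class Expression:
    """One measurement object, linking a gene object to an array object."""
    gene: str
    array: str
    level: str  # "high" or "low"


def joint_probability(gene_cluster, array_condition, measurements):
    """Probability of a full assignment in the grounded network.

    Each object contributes one factor, and every factor of the same
    kind reuses the same class-level table.
    """
    p = 1.0
    for cluster in gene_cluster.values():
        p *= P_CLUSTER[cluster]
    for condition in array_condition.values():
        p *= P_CONDITION[condition]
    for m in measurements:
        p_high = P_HIGH[(gene_cluster[m.gene], array_condition[m.array])]
        p *= p_high if m.level == "high" else 1.0 - p_high
    return p


def cluster_posterior(gene, gene_names, array_condition, measurements):
    """P(gene.cluster | observed conditions and levels), by brute-force
    enumeration over the hidden cluster of every gene (toy sizes only)."""
    weights = {0: 0.0, 1: 0.0}
    for values in product([0, 1], repeat=len(gene_names)):
        assignment = dict(zip(gene_names, values))
        weights[assignment[gene]] += joint_probability(
            assignment, array_condition, measurements
        )
    total = weights[0] + weights[1]
    return {cluster: w / total for cluster, w in weights.items()}


# A tiny relational skeleton: two genes, two arrays, three measurements.
genes = ["g1", "g2"]
arrays = {"a1": 1, "a2": 0}  # observed Array.condition values
data = [
    Expression("g1", "a1", "high"),
    Expression("g1", "a2", "low"),
    Expression("g2", "a1", "low"),
]
posterior_g1 = cluster_posterior("g1", genes, arrays, data)
```

Adding further data sources in this style would mean adding new classes or attributes with their own shared tables. Exhaustive enumeration is used here only to keep the sketch short; it does not scale to genome-wide data.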

Models for Genomic Data (Gene Expression, Sequence, Protein-Protein Interaction)