Research Interests
My main research interest is in identifying
causal interactions in biological systems from whole-genome studies. We
use two approaches to infer causation: (1) the chronology of events
observed in dynamical systems, especially time-dependent gene
expression, and (2) non-Gaussian patterns in joint probability
distributions.
Timing of cell-cycle regulated gene transcription.
We develop
computational methods for analyzing time-course gene expression data,
based on maximum a posteriori (MAP) optimization and the Maximum Entropy principle. We have
designed and implemented an algorithm that deconvolves the measured
culture-average profiles and recovers the single-cell
expression profile of each cell-cycle regulated gene. Peaks
of transcripts regulated by the yeast cell cycle were recovered with
a precision an order of magnitude better than the resolution of the
source data. We have
identified a previously undescribed, pre-replicative (G1/P) wave of
transcription of cell-cycle genes. Our results have provided new
insight into the assembly and dynamics of molecular complexes involved
in mitotic cell division (e.g. MCM, ORC), and have revealed
transcriptional regulation of genes previously thought to
be constitutively expressed, such as Cdc28/Cdk1, the master cell cycle regulator.
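
The deconvolution idea can be illustrated with a small synthetic example. This is a minimal sketch under stated assumptions, not the published algorithm: the culture-average profile is modelled as the single-cell profile blurred by a circular Gaussian kernel (an assumed stand-in for the loss of synchrony in the culture), and the MAP estimate minimizes a least-squares misfit plus an entropy penalty relative to a flat prior level. The function names, kernel width, entropy weight, step size, and iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def circular_gaussian_kernel(n, width):
    """Row-normalized blur kernel on a circular (cell-cycle) axis.

    Assumption: the spread of cell-cycle phases in the culture is Gaussian.
    """
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)  # distance measured around the cycle
    k = np.exp(-0.5 * (d / width) ** 2)
    return k / k.sum(axis=1, keepdims=True)


def map_deconvolve(observed, kernel, prior_level,
                   entropy_weight=1e-3, step=0.3, n_iter=5000):
    """Gradient descent on
        J(f) = 0.5 * ||K f - g||^2 + w * sum(f * log(f / m) - f + m)
    with f = exp(u), so the estimate stays strictly positive.
    """
    u = np.full(observed.shape, np.log(prior_level))
    for _ in range(n_iter):
        f = np.exp(u)
        misfit_grad = kernel.T @ (kernel @ f - observed)
        entropy_grad = entropy_weight * np.log(f / prior_level)
        u -= step * f * (misfit_grad + entropy_grad)  # chain rule: dJ/du = f * dJ/df
    return np.exp(u)


# Synthetic single-cell profile: a narrow transcript peak on a flat baseline.
n, true_peak = 48, 20
phase = np.arange(n)
single_cell = 0.1 + np.exp(-0.5 * ((phase - true_peak) / 1.5) ** 2)

# Simulated culture-average measurement: blurred profile plus small noise.
kernel = circular_gaussian_kernel(n, width=4.0)
rng = np.random.default_rng(0)
observed = kernel @ single_cell + rng.normal(0.0, 0.002, n)

estimate = map_deconvolve(observed, kernel, prior_level=observed.mean())

print("true peak index:     ", true_peak)
print("observed peak index: ", int(np.argmax(observed)))
print("estimated peak index:", int(np.argmax(estimate)))
print("observed / estimated peak height:", observed.max(), estimate.max())
```

In this toy setting the recovered profile is sharper than the measured one because the blur is modelled explicitly; the entropy term keeps the estimate positive and close to the prior level wherever the data carry little information.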
Cell-cycle regulation in different species and conditions.
We apply the deconvolution method to compare
the temporal organization of cell-cycle events in different species and
under different experimental conditions (e.g. high and low
nutrient levels, healthy and disease states, cell cultures representing
individuals of different ages). We identify the regulatory modules that are
preserved across them. By analyzing the data in the context of regulatory motifs in the
untranslated regions of the genes, we reconstruct the transcription
factor activity and its evolution or dependence on environmental
factors.
Spatiotemporal Organization of Somitogenesis.
The generation of somites in a vertebrate embryo
depends on waves of gene expression that are periodic in the
spatiotemporal domain. We are developing data analysis methods
tailored to microarray data collected in such systems. The algorithms
include methods for detecting regulated genes and for
reconstructing the underlying spatiotemporal expression patterns
using a maximum a posteriori deconvolution procedure similar to the one applied to the yeast cell cycle.
Inferring causation in protein networks from non-Gaussian probability distributions
Reconstructing protein networks is important for
selecting candidate biomarkers and targets for drugs. The task is
facilitated if the directionality (or causality) of interactions is
known. We are working on inferring causal interactions in protein
networks without the need for experimental interventions, by
identifying asymmetric features in the joint distributions of expression
levels of gene pairs, measured across a large number of conditions. We
select and calibrate various statistical measures of asymmetry,
using known interactions in yeast and human protein networks as
training sets.
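
As an illustration of what a measure of asymmetry can look like, the sketch below uses a third-moment statistic of the kind found in the literature on linear non-Gaussian causal models. It is a swapped-in example, not necessarily one of the measures calibrated in this work. It assumes a linear relation between the two variables, a skewed (non-Gaussian) cause, and independent noise; the function names and the simulated "regulator" and "target" data are hypothetical.

```python
import numpy as np


def standardize(v):
    """Return v shifted to zero mean and scaled to unit variance."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def skew_direction_score(x, y):
    """Antisymmetric third-moment score for a pair of variables.

    Under the stated assumptions (linear relation, skewed cause,
    independent noise), a positive score favours x -> y and a
    negative score favours y -> x.
    """
    x, y = standardize(x), standardize(y)
    # Orient both variables so that their sample skewness is positive.
    x = x * (np.sign(np.mean(x ** 3)) or 1.0)
    y = y * (np.sign(np.mean(y ** 3)) or 1.0)
    rho = np.mean(x * y)
    return rho * np.mean(x ** 2 * y - x * y ** 2)


# Simulated pair: a skewed "regulator" x drives a "target" y.
rng = np.random.default_rng(1)
n_conditions = 20000
x = rng.exponential(1.0, n_conditions)
y = 0.7 * standardize(x) + np.sqrt(0.51) * rng.normal(size=n_conditions)

score_xy = skew_direction_score(x, y)
print("score(x, y):", score_xy)
print("score(y, x):", skew_direction_score(y, x))
```

Because the score changes sign when its arguments are swapped, a single number per gene pair encodes the suggested direction. Calibration against known interactions would still be needed to decide how large a score must be before it is trusted.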
Online Tools for Analysis of Time Course Gene Expression Profiles
SCEPTRANS is a comprehensive online tool for the analysis
of microarray data from cell-division and metabolic cycles in
budding yeast. We are expanding this project into a general
repository of periodic expression profiles, including data from
different processes, such as circadian rhythms, sleep phases and
organism development, supplemented with relations based on
ontologies, evolutionary homology, regulatory motifs and profile
clustering.
Other
I am also interested in numerical and statistical methods for model building and phasing in X-ray crystallography.