A key bottleneck in today’s biology is the interpretation and
integration of exponentially growing genomic data. We are interested
both in developing computational methods and tools for analyzing
genomic data and in experimentally testing (through collaborations) the
predictions of our models. Our research focuses on the analysis of
microarray data and on understanding the regulation of large-scale
cellular processes, especially the eukaryotic cell cycle.

Multilevel study of eukaryotic cell cycle regulation.
Misregulation of the cell cycle is implicated in many diseases and
biological problems. Control of cell division is particularly
important: uncontrolled and persistent cell division is observed in
cancer, whereas rapid and controlled cell division is essential in wound
healing. Eukaryotic cell division is regulated at many levels: gene
transcription, protein production, localization, modification, and
degradation. Our previous work revealed the precise timeline of
cell-cycle gene expression. We now want to gain
similarly precise insight into the temporal orchestration of
cell-cycle transcription factor binding in vivo. We are also
interested in investigating, both computationally and experimentally,
proteome dynamics during the cell cycle and in studying all the layers of
cell cycle regulation together. I am also interested in the evolution of
cell cycle regulation and in the general principles of regulation of
large-scale cellular processes.

Identification of basic regulatory modules in gene expression data.
Cellular phenotypes are often determined by multiple genetic and
environmental factors. In many cases the underlying basic mechanisms cannot
be identified by experimentally isolating these factors. For
example, observed gene expression levels reflect not only a given
biological process but also the cell line (or strain) used and the growth
conditions. Such environmental or experimental differences can hinder
the analysis of the thousands of data sets already gathered in databases.
I plan to develop methods that computationally dissect and
identify responses to different experimental factors and the
activation of transcription factor complexes. Our approach is based
on Independent Component Analysis (ICA), a relatively new and
powerful statistical method for revealing hidden factors that
underlie sets of measurements. ICA is especially well suited to
data that contain strongly non-Gaussian components, such
as typical microarray data.
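
As a rough illustration of the idea, the standard linear ICA model can be written as follows. The notation is ours and is not taken from any specific implementation: X is assumed to be an n x m expression matrix (n experiments, m genes), S a k x m matrix whose rows are the hidden, statistically independent components, and A an n x k mixing matrix that describes how strongly each component contributes to each experiment.

```latex
X = A\,S, \qquad \hat{S} = W X, \qquad W \approx A^{+}
```

Here W is the estimated unmixing matrix and A^{+} denotes the pseudo-inverse of A. The components can be recovered only up to permutation and scaling, and only if at most one of them is Gaussian, which is why non-Gaussian data are a natural fit for the method.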

Understanding protein stability and modeling proteome dynamics.
Protein regulation through selective degradation is a key cellular
mechanism. Large-scale experimental measurements of protein stability
(protein half-lives in vivo) have become available only very recently.
Consequently, the relationship between protein sequence and stability is
not well understood. Among the questions we are interested in are: How is
a protein's stability encoded in its sequence? Is there evolutionary
pressure to conserve protein stability? Global analysis of mRNA
expression is widely
performed, but because of differences in translation rates and
stability between proteins, mRNA levels only roughly approximate the
levels of the corresponding proteins. Protein half-lives in vivo,
either measured or predicted from sequence, can be used,
together with readily available expression data, to model the dynamic
abundances of the corresponding proteins. Even without taking
posttranslational modifications into account, we expect such a model to
provide significantly better predictions of proteome dynamics than
implicitly relying on mRNA abundances as surrogates,
which is common practice today.
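
The modeling idea above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are ours, not a description of the actual model: protein synthesis is taken to be proportional to the mRNA level, degradation is first order with a rate derived from the half-life, and the mRNA level is held constant between consecutive time points. The function names and the synthesis constant `k_syn` are hypothetical.

```python
import math


def degradation_rate(half_life):
    """First-order degradation rate constant for a given half-life."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return math.log(2) / half_life


def simulate_protein(mrna, times, half_life, k_syn=1.0, p0=0.0):
    """Protein abundance over time for dP/dt = k_syn * m(t) - k_deg * P.

    Assumption: the mRNA level measured at times[i-1] holds until times[i],
    so each interval has an exact closed-form solution.
    """
    if len(mrna) != len(times):
        raise ValueError("mrna and times must have the same length")
    k_deg = degradation_rate(half_life)
    protein = [p0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Level the protein would approach if mRNA stayed at this value.
        steady = k_syn * mrna[i - 1] / k_deg
        # Exponential relaxation toward that level over the interval.
        decay = math.exp(-k_deg * dt)
        protein.append(steady + (protein[-1] - steady) * decay)
    return protein


if __name__ == "__main__":
    # Hypothetical example: one transient mRNA pulse, two half-lives (minutes).
    times = list(range(0, 121, 5))
    mrna = [1.0 if 20 <= t < 40 else 0.0 for t in times]
    unstable = simulate_protein(mrna, times, half_life=5.0)
    stable = simulate_protein(mrna, times, half_life=60.0)
    for t, m, u, s in zip(times, mrna, unstable, stable):
        print(f"{t:4d}  mRNA={m:.1f}  unstable={u:8.3f}  stable={s:8.3f}")
```

In this sketch the same mRNA pulse yields a short-lived protein that tracks the transcript closely and a long-lived protein that persists well after the transcript is gone, which illustrates why mRNA abundance alone can be a poor surrogate for protein abundance.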