My research combines
mathematics, computer science, and
statistics to develop
algorithms with improved accuracy for
large-scale and complex estimation problems in
phylogenomics, multiple sequence alignment, and metagenomics.
I work especially on the hardest computational
problems in these areas, where
large dataset sizes and model complexity
make existing approaches insufficiently
accurate. For these problems, I develop
innovative strategies (often including
graph-theoretic algorithms that employ
divide-and-conquer, combined with
machine learning methods), develop software, analyze
biological datasets (in collaboration with biologists around the world), and
prove theorems about the methods we develop.
Recent and current NSF grants to support my work include:
I also work in
Historical Linguistics, which seeks to estimate how language
families (e.g., Indo-European) evolved.
We use real data and perform massive
simulations to evaluate the performance
of methods that we develop, and also
collaborate closely with biologists and linguists
in data analysis.
Our current collaborations include the
Transcriptome Project) and the
These collaborations include data analysis and the
development of new methods for
estimating alignments and trees (both gene trees and species trees).
We welcome collaborations with biologists who have data
that are difficult to analyze, either because the
datasets are too large for current methods, or because
current methods are not sufficiently accurate.
A great deal of my work involves exploration of the design space
of the algorithms we develop, which in turn depends
very much on the availability of
substantial computational resources.
At the University of Illinois, I have been
able to do these analyses using the
Illinois Campus Cluster Program as well
as Blue Waters.
For more about my research:
list of publications, or