CS 581: Algorithmic Genomic Biology
Jan 17: Introduction to course
January 19 to Feb 2: Phylogenetic trees
- Reading assignment:
- January 19: Sections 1.1-1.4 from textbook
- January 24: Sections 1.5-1.8 from textbook
- January 26: Sections 8.1-8.7 from textbook
- January 31: Sections 4.1-4.3 and 8.8 from textbook
- February 2: Sections 11.1 and 11.2 from textbook
- January 19: Is it just the data? (PDF)
and Introduction to trees
- January 24, 26, 31, and Feb 2: Phylogenetic tree estimation
(January 24 slides 1-46,
January 26 slides 47-71,
January 31 slides 72-88,
February 2 slides 89-111)
- We also discussed strict and majority consensus trees,
why they always exist and are unique,
and how to construct trees from sets of
bipartitions (i.e., binary characters).
See Sections 4.4.1, 6.2.1, and 6.2.2 for this material.
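As a small illustration of the consensus and tree-construction ideas above, here is a minimal Python sketch (not the textbook's code). It assumes every input tree is already given as its set of nontrivial bipartitions on one common leaf set, with each bipartition encoded by the half that excludes a fixed reference leaf; the function names (`normalize`, `compatible`, `strict_consensus`, `majority_consensus`, `tree_from_bipartitions`) are hypothetical. The majority consensus always exists because any two bipartitions that each appear in more than half of the trees must co-occur in at least one tree, and the bipartitions of a single tree are pairwise compatible; a pairwise compatible set of bipartitions then determines a unique tree.

```python
from collections import Counter
from itertools import combinations

def normalize(side, leaves):
    """Encode the bipartition side|rest by the half that excludes a fixed reference leaf."""
    ref = min(leaves)
    side = frozenset(side)
    return leaves - side if ref in side else side

def compatible(x, y):
    """Two normalized splits (both missing the reference leaf) are compatible iff nested or disjoint."""
    return not (x & y) or x <= y or y <= x

def strict_consensus(trees):
    """Bipartitions that appear in every input tree."""
    return set.intersection(*(set(t) for t in trees))

def majority_consensus(trees):
    """Bipartitions that appear in more than half of the input trees."""
    counts = Counter(s for t in trees for s in set(t))
    return {s for s, c in counts.items() if 2 * c > len(trees)}

def _newick(cluster, clusters):
    """Newick for a cluster: its maximal sub-clusters become subtrees, uncovered taxa become leaves."""
    inside = [c for c in clusters if c < cluster]
    maximal = [c for c in inside if not any(c < d for d in inside)]
    covered = set().union(*maximal)
    parts = [_newick(c, clusters) for c in sorted(maximal, key=sorted)]
    return "(" + ",".join(parts + sorted(cluster - covered)) + ")"

def tree_from_bipartitions(splits, leaves):
    """Unrooted tree (Newick) whose nontrivial bipartitions are exactly `splits`."""
    if not all(compatible(x, y) for x, y in combinations(splits, 2)):
        raise ValueError("bipartitions are not pairwise compatible")
    ref = min(leaves)
    inner = _newick(leaves - {ref}, list(splits))  # hang everything else below the reference leaf
    return "(" + ref + "," + inner[1:-1] + ");"

leaves = frozenset("abcde")
trees = [
    {normalize("ab", leaves), normalize("abc", leaves)},
    {normalize("ab", leaves), normalize("abd", leaves)},
    {normalize("ab", leaves), normalize("abc", leaves)},
]
print(tree_from_bipartitions(strict_consensus(trees), leaves))    # (a,(c,d,e),b);
print(tree_from_bipartitions(majority_consensus(trees), leaves))  # (a,((d,e),c),b);
```

In this toy input only {a,b}|{c,d,e} occurs in every tree, while {a,b,c}|{d,e} occurs in two of the three, so it enters the majority consensus but not the strict consensus.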
- Feb 7: Pairwise sequence alignment
Sections 9.1-9.4 from Computational Phylogenetics.
DP algorithm for pairwise edit distance
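The edit-distance dynamic program can be sketched as follows. This minimal Python version assumes unit costs for substitutions, insertions, and deletions (the lecture may use more general costs). It fills a table where D[i][j] is the minimum cost of transforming the first i letters of s into the first j letters of t, using D[i][j] = min(D[i-1][j-1] + [s_i != t_j], D[i-1][j] + 1, D[i][j-1] + 1), and then traces back one optimal pairwise alignment. The names `edit_table` and `align` are hypothetical.

```python
def edit_table(s, t):
    """D[i][j] = minimum number of edits turning s[:i] into t[:j] (unit costs)."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i  # delete all of s[:i]
    for j in range(1, m + 1):
        D[0][j] = j  # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + sub,  # match or substitution
                          D[i - 1][j] + 1,        # deletion (gap in t)
                          D[i][j - 1] + 1)        # insertion (gap in s)
    return D

def align(s, t):
    """Return (edit distance, aligned s, aligned t) by tracing back through the table."""
    D = edit_table(s, t)
    i, j = len(s), len(t)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return D[len(s)][len(t)], "".join(reversed(top)), "".join(reversed(bottom))

print(align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table has (n+1)(m+1) entries, each filled in constant time, so the sketch runs in O(nm) time and space.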
Feb 9 and 14: Hidden Markov models and
their applications in bioinformatics.
- Reading assignment:
- Feb 9: Sections 9.6-9.7.2 from Computational Phylogenetics,
Mark Paskin's Introduction to Probability Theory (PDF),
- Feb 14:
Sections 9.7.3 and 9.7.4 from
Mona Singh's course notes on profile HMMs
Introduction to HMMs
The Viterbi Algorithm
If time permits, I will also talk
about "Ensembles of HMMs".
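The Viterbi algorithm listed above finds the most probable hidden-state path for an observed sequence by dynamic programming over time steps. The sketch below is a generic Python illustration, not taken from the course notes; it works in log space to avoid numerical underflow. The two-state HMM in the usage lines (states `H` and `L`, symbols `x` and `y`, and all of the probabilities) is made up purely for illustration.

```python
import math

def _log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start, trans, emit):
    """Return (most probable state path, its log-probability) for the observations."""
    # V[t][s] = best log-probability of any state path ending in s after emitting obs[:t+1]
    V = [{s: _log(start[s]) + _log(emit[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # best predecessor state r for being in state s at time t
            r = max(states, key=lambda r: V[t - 1][r] + _log(trans[r][s]))
            V[t][s] = V[t - 1][r] + _log(trans[r][s]) + _log(emit[s][obs[t]])
            back[t][s] = r
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):  # follow back-pointers to time 0
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last]

states = ("H", "L")
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit = {"H": {"x": 0.9, "y": 0.1}, "L": {"x": 0.1, "y": 0.9}}
path, logp = viterbi("xxxyyy", states, start, trans, emit)
print("".join(path))  # HHHLLL
```

With k states and n observations the sketch takes O(nk^2) time, compared with the k^n paths a brute-force search would have to score.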
Feb 16: Attend bioinformatics
talks from 11-12 in CSL room B02 (Mike Nute will present
HIPPI, research he
did for a prior version of this course).
Register for the CSL conference
at this website.
Feb 21: Multiple sequence alignment.
- Reading assignment:
- Feb 21: Sections 9.5 and 9.7.5 from
- Feb 21: Tree Alignment
and Multiple sequence alignment techniques
Feb 23-March 16: Species tree estimation
- Reading assignment:
- Feb 23: Sections 9.11-9.13 from textbook
- Feb 28: Sections 9.15-9.19 from textbook.
- March 2: Sections 10.1-10.4 from textbook.
Read papers from the list below.
Also, during the March 2-16 period, each student
will present a paper from this list in class.
Before each presentation, make sure you have read the selected paper
and submitted questions about it via Moodle
at least 24 hours in advance.
The boldfaced papers below are
those papers selected by students,
I will present.
E.S. Allman, J.H. Degnan, and J.A. Rhodes. Identifying the rooted species tree from the distribution
of unrooted gene trees under the coalescent. Journal of Mathematical Biology, 62:833-862,
E. Avni, R. Cohen, and S. Snir. Weighted quartets phylogenetics. Systematic Biology, 2014.
M.S. Bansal, G. Banay, J.P. Gogarten, and R. Shamir. Detecting highways of horizontal gene
transfer. Journal of Computational Biology, 18(9):1087-1114, 2011.
M.S. Bayzid, T. Hunt, and T. Warnow. Disk covering methods improve phylogenomic analyses.
BMC Genomics, 15(Suppl 6):S7, 2014. A preliminary version appeared in RECOMB-Comparative
V. Berry and O. Gascuel.
Inferring evolutionary trees with strong combinatorial evidence.
Theoretical Computer Science 240 (2000), 271-298.
B. Boussau, G.J. Szöllősi, and L. Duret. Genome-scale coestimation of species and
gene trees. Genome Research, 23(2):323-330, December 2013
M. Brinkmeyer, T. Griebel, and S. Böcker. Polynomial supertree methods revisited. Advances
in Bioinformatics, 2011. Article ID 524182, doi:10.1155/2011/524182.
D. Bryant and J. Lagergren.
Compatibility of unrooted phylogenetic trees is FPT.
Theoretical Computer Science 351 (2006), 296-302.
D. Bryant and M.A. Steel. 2001. Constructing optimal trees from quartets. Journal of
Algorithms, 38, 237-259.
J. Chifman and L. Kubatko. 2014. Quartet inference from SNP data under the coalescent.
Bioinformatics, 30(23), 3317-3324.
J. Chifman and L. Kubatko. 2015. Identifiability of the unrooted species tree topology
under the coalescent model with time-reversible substitution processes, site-specific
rate variation, and invariable sites. Journal of Theoretical Biology, 374, 35-47.
G. Dasarathy, R. Nowak, and S. Roch. 2015. Data requirement for phylogenetic inference
from multiple loci: a new distance method. IEEE/ACM Transactions on Computational
Biology and Bioinformatics, 12(2), 422-432.
M.I. DeGiorgio and J.H. Degnan. 2010. Fast and consistent estimation of species trees
using supermatrix rooted triples. Molecular Biology and Evolution, 27(3), 552-569.
L.S. Kubatko and J.H. Degnan. 2007. Inconsistency of phylogenetic estimates from concatenated
data under coalescence. Systematic Biology, 56, 17.
- F.-J. Lapointe and G. Cucumel. The average consensus procedure: combination of weighted
trees containing identical or overlapping sets of taxa. Systematic Biology, 46(2):306-312, 1997.
- B. Larget, S.K. Kotha, C.N. Dewey, and C. Ané. 2010. BUCKy: Gene tree/species tree
reconciliation with the Bayesian concordance analysis. Bioinformatics, 26(22),
S. Mirarab and T. Warnow. 2015. ASTRAL-II: coalescent-based species tree estimation
with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12), i44-i52.
- E. Mossel and S. Roch. 2011. Incomplete lineage sorting: consistent phylogeny estimation
from multiple loci. IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 7(1), 166-171.
- E. Mossel and S. Roch. 2015. Distance-based species tree estimation:
trade-off between number of loci and sequence length under the coalescent.
arXiv preprint. arXiv:1504.05289v1.
S. Roch and S. Snir. 2013. Recovering the tree-like trend of evolution despite extensive
lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology,
S. Roch and M.A. Steel. Likelihood-based tree reconstruction on a concatenation
of aligned sequence data sets can be statistically inconsistent. Theoretical Population
Biology 2015, 100, pp. 56-62.
S. Roch and T. Warnow. 2015. On the robustness to gene tree estimation error (or lack
thereof) of coalescent-based species tree methods. Systematic Biology,
- C. Scornavacca and N. Galtier. Incomplete lineage sorting in mammalian phylogenomics. Syst Biol 2017; 66 (1): 112-120. doi: 10.1093/sysbio/syw082
J. Tonini, A. Moore, D. Stern, M. Shcheglovitova, and G. Orti.
Concatenation and species tree methods have
statistically indistinguishable accuracy under a range of
PLOS Currents: Tree of Life, 2015.
- Y. Yu, J. Dong, K. Liu, and L. Nakhleh. 2014. Maximum likelihood inference of reticulate
evolutionary histories. Proceedings of the National Academy of Sciences (USA),
T. Zimmermann, S. Mirarab, and T. Warnow. 2014. BBCA: Improving the scalability of
*BEAST using random binning. BMC Genomics, 15(Suppl 6), S11. Proceedings of
RECOMB-CG (Comparative Genomics).
- March 4: Section 10.5 from textbook.
- Feb 23:
Coalescent-based species tree estimation (PDF)
- Feb 28: Introduction to supertree methods
- March 2: Introduction to
species tree estimation methods (PDF)
- March 7: Sarah Christensen, Yunan Luo, and
Sarah will present
"Constructing optimal trees from quartets", by
Bryant and Steel, Journal of
Algorithms 2001, 38, 237-259.
Yunan will present
"Weighted quartets phylogenetics" by
Avni, Cohen, and Snir, Systematic Biology, 2015, 64(2):232-242.
Muhammad will present
"Recovering the tree-like trend of evolution despite
extensive lateral genetic transfer: a probabilistic analysis"
by Roch and Snir, Journal of Computational Biology 2013, 20, 93-112.
- March 9: Shoham Das,
Jeremy Kemball, and Ben Kurtovic.
- March 14: Daewon Seo,
Thien Le, and Syed Shalan Naqvi.
- March 16: Qing Ye,
Ehsan Saleh, and
March 20-24: Spring Break
March 28: Preparation for midterm.
Reading assignment: Appendix B.10 from
and Research Ethics.
Also see the following three presentations about
multiple sequence alignment methods, in practice
(PDF) (specifically look at slides 23, 25, 26, 30, 31, 42-44)
(PDF) (about solutions to Generalized
(PDF) (about the impact of
guide trees, and using SATé and PASTA to boost methods)
(PDF) (about method evaluation,
March 30: Distribution of midterm and discussion of the
- April 4: Review of midterm
- April 6: Guest lecture by Erin Molloy
- April 11-13: Large-scale tree estimation
- April 11:
Sections 11.3-11.4 and 8.11-8.12.
- Introduction to chordal graphs and tree construction
- Why divide-and-conquer is helpful
- April 13:
- April 18.
Introduction to historical linguistics.
- April 20.
Computing perfect phylogenies.
- April 25.
Mohammed El-Kebir will talk on
"Combinatorial Algorithms in Tumor Phylogenetics."
Cancer is a genetic disease in which cell division, mutation, and selection produce a heterogeneous tumor composed of distinct clones, i.e., different subpopulations of cells with different complements of mutations. In the later stages of cancer progression, cancerous cells from the primary tumor migrate and seed metastases at distant anatomical sites. As with the evolutionary history of species, we can represent the cell division and mutation history of an individual tumor by a character-based phylogenetic tree, where characters are mutations and taxa are clones. With bulk sequencing data from cancers, however, we do not directly observe the leaves (taxa) of the phylogenetic tree. Instead, we are given variant allele frequencies that correspond to a mixture of unknown leaves in unknown proportions. The task in the Perfect Phylogeny Mixture Deconvolution Problem is to infer a two-state perfect phylogeny and mixing proportions of its leaves that together explain the given allele frequencies. I will introduce algorithms for solving this problem based on a combinatorial characterization of perfect phylogeny trees as a restricted class of spanning trees in a graph, a characterization that also demonstrates the computational complexity of the problem. In addition, I will introduce a novel theoretical framework for analyzing the history of migrations of cells between anatomical sites in metastatic cancers. Using these methods, I analyze several cancers and identify tumor phylogenies and migration histories that are more biologically plausible than those reported in previous analyses.
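The abstract models a tumor with a two-state perfect phylogeny in which characters are mutations and taxa are clones. As a hedged illustration of that model only (it does not solve the mixture deconvolution problem and is not the speaker's algorithm), the Python sketch below checks the classic condition for a rooted two-state perfect phylogeny whose root carries no mutations and in which each mutation arises once and is never lost: the clone sets of any two mutations must be nested or disjoint. It also reads off the parent of each mutation in the resulting tree. The input format (a 0/1 matrix with one row per clone and one column per mutation) and the function names are assumptions.

```python
def clone_sets(matrix):
    """For each mutation (column), the set of clones (rows) that carry it."""
    return [frozenset(i for i, row in enumerate(matrix) if row[c])
            for c in range(len(matrix[0]))]

def is_perfect_phylogeny(matrix):
    """Rooted two-state test: every pair of clone sets must be nested or disjoint."""
    cols = clone_sets(matrix)
    for a in range(len(cols)):
        for b in range(a + 1, len(cols)):
            A, B = cols[a], cols[b]
            if A & B and not (A <= B or B <= A):
                return False
    return True

def mutation_tree(matrix):
    """Parent of each mutation: the mutation with the smallest clone set containing it (None = root).
    Assumes is_perfect_phylogeny(matrix) is True, so the supersets of any column form a chain."""
    cols = clone_sets(matrix)
    order = sorted(range(len(cols)), key=lambda c: (-len(cols[c]), c))  # largest clone sets first
    parent = {}
    for k, c in enumerate(order):
        above = [d for d in order[:k] if cols[c] <= cols[d]]
        parent[c] = above[-1] if above else None  # last superset in the chain is the smallest
    return parent

matrix = [
    [1, 0, 0],  # clone 0 carries mutation 0 only
    [1, 1, 0],  # clone 1 carries mutations 0 and 1
    [1, 0, 1],  # clone 2 carries mutations 0 and 2
]
print(is_perfect_phylogeny(matrix))                    # True
print(mutation_tree(matrix))                           # {0: None, 1: 0, 2: 0}
print(is_perfect_phylogeny([[1, 1], [1, 0], [0, 1]]))  # False
```

In the usage lines, mutation 0 sits on the edge below the root and mutations 1 and 2 branch from it; the second matrix fails because mutations 0 and 1 share clone 0 while each also has a clone the other lacks.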
- April 27.
Introduction to metagenomics.
- May 2 (last day of class:
turn in your final projects)