## CS 598 AGB, Spring 2016 Course Schedule

2016 Course website. For your homework problems, please use the old version of the textbook, which is here.

• January 19, 2016. Introduction to course.
(PPTX) (PDF)

• January 21, 2016. Introduction to stochastic models of sequence evolution, using the Cavender-Farris-Neyman (CFN) model as an example. Phylogeny estimation under the CFN model.
(PPT) (PDF)
For details about distance-based methods, see these: (PPT) (PDF)
Reading before class: Chapters 1-3 from textbook.

• January 26, 2016. The Newick string representation of rooted trees. Representation of rooted trees using subtrees, distances, clades and bipartitions. Constructing rooted trees from clades by constructing Hasse Diagrams. Constructing unrooted trees from unrooted four-leaf trees using the All Quartets Algorithm.
(PPT) (PDF)
Reading before class: Chapter 4.1-4.4, 4.6, and Chapter 5
Homework #1: Do at least 5 of the following 11 problems from the textbook: 3.3(7), 3.3(8), 3.3(9).1-2, 3.3(17), 4.1(1), 4.2(8), 4.3(1), 4.3(2), 4.6(1), 5.1(4), and 5.2(3).
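
As an illustration of the clade representation from this lecture, here is a minimal Python sketch. It is not course material: the function names `parse_newick`, `clades`, and `hasse_parents` are hypothetical, and the parser assumes a Newick string with leaf labels only (no branch lengths, no internal labels). Each clade's parent in the Hasse diagram of the containment order is taken to be its smallest proper superset, which is well defined when the clades come from a single rooted tree.

```python
def parse_newick(s):
    """Parse a label-only Newick string into nested tuples; leaves are strings."""
    s = "".join(s.split()).rstrip(";")  # drop whitespace and the trailing ";"
    pos = 0

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return parse_node()


def clades(tree):
    """Return the set of clades of a rooted tree, each as a frozenset of leaves."""
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            leaf_set = frozenset([node])
        else:
            leaf_set = frozenset().union(*(leaves_below(c) for c in node))
        found.add(leaf_set)
        return leaf_set

    leaves_below(tree)
    return found


def hasse_parents(clade_set):
    """Map each clade to its smallest proper superset (its parent in the tree)."""
    parents = {}
    for c in clade_set:
        supersets = [d for d in clade_set if c < d]
        if supersets:  # the full leaf set (the root) has no parent
            parents[c] = min(supersets, key=len)
    return parents


tree = parse_newick("((a,b),(c,d));")
parent_of = hasse_parents(clades(tree))
```

Reading the `parent_of` map from the root downward recovers the rooted tree, which is the idea behind building a tree from its clades.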

• January 28 and February 2, 2016. Note: no office hour on February 2. Please come to my office hour on February 1, from 12-1, instead.
January 28: Maximum parsimony (MP): computational complexity and dynamic programming solution for fixed tree variant.
February 2: Parsimony-informative characters and why MP is not statistically consistent under the CFN model (the Felsenstein Zone).
(PPT) (PDF)
Reading before January 28 class: Chapters 6.1-6.3, 6.5, 6.6, 6.8, and 9.13; also the classic paper "Cases in which parsimony and compatibility methods will be positively misleading", by Joseph Felsenstein, Systematic Zoology, Volume 27, No. 4 (1978), pp. 401-410.
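
For the fixed-tree variant of maximum parsimony, one standard dynamic program is Fitch's algorithm. The sketch below is an illustration under stated assumptions and may differ from the formulation used in lecture: it handles a single character with unit-cost changes on a rooted binary tree given as nested tuples, and the name `fitch_score` is hypothetical.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character on a fixed rooted binary tree.

    tree   -- nested 2-tuples with leaf names (strings) at the tips
    states -- dict mapping each leaf name to its character state
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:
            return common  # the children agree on at least one state: no change needed
        changes += 1  # the children disagree: one change on an edge below this node
        return left | right

    state_set(tree)
    return changes


tree = (("a", "b"), ("c", "d"))
score_grouped = fitch_score(tree, {"a": 0, "b": 0, "c": 1, "d": 1})  # 1 change
score_mixed = fitch_score(tree, {"a": 0, "b": 1, "c": 0, "d": 1})    # 2 changes
```

The two patterns score differently on this tree, which is what makes such a character parsimony-informative for four leaves.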

• February 4, 2016. Problem solving in class (very similar to homework 2), using these problems. Pranjal Vachaspati and Ashu Gupta, guest lecturers.

• February 9, 2016. Analyzing sets of trees. Consensus methods and supertree methods. The Aho, Sagiv, Szymanski, and Ullman algorithm.
(PPTX)
Homework #2: Do at least 5 of the following problems: 6.2(1), 6.3(2), 6.4(2), 6.4(3), 6.8(5), 7.2(1), 7.3(1), and 7.4(2).

• February 11, 2016. Statistical gene tree estimation, theory and practice.
(PPTX) (PDF)
Reading before class: Chapter 9.1-9.12 and The Hobgoblin of Phylogenetics, by David M. Hillis, John P. Huelsenbeck, and David L. Swofford (Nature, Vol. 369, 2 June 1994), one of the classic papers in phylogenetics.
Homework #3: one-page paper (PDF) discussing the assigned paper by Hillis et al.

• February 16, 2016. Multiple sequence alignment. 1) Insertions, deletions, and pairwise sequence alignment. 2) Edit distances. 3) The Needleman-Wunsch algorithm, which can be phrased in terms of maximizing the score (http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm or http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html) or minimizing the edit distance (http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/). 4) Multiple sequence alignment optimization problems. 5) MSA methods in practice.
Edit distances and pairwise alignment: (PDF)
MSA methods in practice: (PDF) (PPTX)
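
The edit-distance phrasing of pairwise alignment can be sketched as a short dynamic program. This is a minimal illustration, assuming unit costs for insertions, deletions, and substitutions; the function name `edit_distance` is hypothetical, and the lecture may use different costs or the score-maximizing form instead.

```python
def edit_distance(s, t):
    """Unit-cost edit distance between sequences s and t, computed row by row."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


d = edit_distance("ACGT", "AGT")  # one deletion suffices
```

Keeping only two rows gives the distance in O(len(t)) memory; recovering an actual alignment would require storing the full table or traceback pointers.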

• February 18, 2016. No office hours Monday, Feb 22.
Class discussion of papers selected for Homework #4.
Homework #4: Select a paper (from 2000 to present) that shows a comparison of MSA or tree estimation methods on simulated or biological datasets (e.g., one of the papers from page 6 of the presentation for February 11). Write a paper of 2-5 pages with your discussion of the paper. Make sure to (a) provide full bibliographic information about the paper, (b) summarize the paper, and (c) discuss whether you agree with the conclusions and why. Suggest a follow-up experiment or study, or identify a question that was not answered by the study. (The point is to be critical.)
In class: Give a 3-minute presentation about the paper you selected and what you thought of it.

• February 23, 2016. Hidden Markov Models (HMMs) and their use in multiple sequence alignment.
(PPTX) (PDF)
Reading before class: Chapter 10.4, and http://www.cs.princeton.edu/~mona/Lecture/HMM1.pdf.
Homework #5: Everyone (including biologists):
• Download all software for the tutorial on February 25 from this location.
• (7.5 pts) Do problems 1-17 from Review Questions. You will receive credit for the best 15 problems from 1-17.
• (2.5 pts) Do at least one of problems 18-25 from the Review Questions, OR read one of the papers selected by one of the other students from February 18, and write your own 2-3 page review of the paper.
Extra credit: Do one or more of Problems 3.3(18)-3.3(22) from the textbook.

• February 25, 2016. Tutorial by Dr. Nam-phuong Nguyen on PASTA (co-estimation of multiple sequence alignment and tree), computing distances between trees, distances between alignments, and visualizing trees and alignments.

• March 1, 2016. Ensembles of HMMs and their use in biomolecular sequence analysis. Guest lecture by Dr. Nam Nguyen (PPTX).
Reading before class: UPP paper and TIPP paper.
Homework #6: Do at least 13 problems from Chapters 8-10, with at least four problems in each chapter. You will receive credit for the best 10 problems.

• March 3, 2016. Phylogenomics (genome-scale phylogeny estimation): Inferring species trees in the presence of Incomplete Lineage Sorting. (PDF)
Background material: (PDF) (PPTX)
Extra credit: Use PASTA or UPP to compute at least two multiple sequence alignments and phylogenies for some biological sequence dataset of at least 20 sequences using at least two different pipelines (vary the multiple sequence alignment method, and/or vary the method for computing the phylogeny given an alignment). Compare alignments if you can, and compare trees using bipartition distances (also called RF distances). For the comparison of trees, it will be helpful if you estimate branch support (using bootstrapping or some other technique), so that the significance of the differences can be appreciated. Comment on the differences you observe. Write this up! (Also, once you know how to do this, you might look and see what happens when you take two nearly identical sequence datasets, where the first is obtained by replacing one of the sequences in the dataset by a random sequence. How different are the two trees you obtain?)
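
The bipartition (RF) distance mentioned in the extra credit can be sketched as follows. This is an illustration only, assuming both trees have the same leaf set and are given as nested tuples; the names `bipartitions` and `rf_distance` are hypothetical. It counts the bipartitions found in exactly one of the two trees, whereas some tools report half of this value or a normalized version.

```python
def bipartitions(tree):
    """Nontrivial bipartitions of the leaf set, treating the tree as unrooted."""
    sides = []

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(c) for c in node))
        sides.append(below)
        return below

    all_leaves = leaves_below(tree)
    result = set()
    for side in sides:
        other = all_leaves - side
        if len(side) >= 2 and len(other) >= 2:  # skip trivial splits and the root
            result.add(frozenset([side, other]))
    return result


def rf_distance(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))


t1 = (("a", "b"), "c", ("d", "e"))
t2 = (("a", "c"), "b", ("d", "e"))
dist = rf_distance(t1, t2)  # ab|cde and ac|bde differ, de|abc is shared
```

In practice the tree-comparison tools from the February 25 tutorial would be used instead; the sketch only shows what the distance measures.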

• March 8, 2016. New methods for species tree estimation in the presence of ILS. Guest lecture: Pranjal Vachaspati (or perhaps Jed Chou) (Pranjal's PDF) (Jed's PDF)
Reading before class: Homework #7: (a) Write a 2-5 page critique of one paper either providing a new method for species tree estimation from multi-locus data, or comparing methods for species tree estimation. (b) Prepare a 5-7 minute presentation (in PDF format) of the paper and your critique. Submit the critique either in class or by email to Tandy Warnow, and submit the PDF presentation by email to Tandy.
No office hours Monday and Tuesday; will reschedule for Friday.

• March 10, 2016. Class presentations, discussing papers about methods for species tree estimation from multi-locus data. Here is Siavash's presentation from ISMB about ASTRAL-2.

• March 15, 2016. We will discuss the midterm. Due today (in class): 2-3 page document (PDF) describing one or two final projects you might want to do. Note: survey papers are fine, but research projects are doable and more fun. If you want to do a research project, you can do this with another student in the class; otherwise, you should work by yourself. You must also list two papers (related to your final project) that you have already read, and that you would be willing to present in class. See this list for suggestions of possible final projects.

• March 17, 2016. Class discussion: final projects.
Each person should present their plans for a final project. This does not require any PDF/PPT, but do be prepared to stand up and talk about what you are thinking about doing.
Homework #8: Send PDF (by email to me) of the paper you will present during the March 31 to April 14 period. The paper needs to be related to your final project. Your presentation should be 20 minutes long, and you will need to send me your presentation (in PDF or PPTX/PPT format) at least 48 hours before your presentation date. Your presentations and the paper you are presenting will be posted to the class webpage so that the other students in the class can see both before your talk. Also, you will receive questions from the other students in the class 24 hours before your presentation. I will assign you a date to present the paper by March 19.

• March 21-25: Spring vacation

• March 29, 2016. Midterm papers due in class by 11:10 AM (or emailed to me in PDF format, or delivered to Elaine Wilson before then).
Solutions to Parts 1 and 2.

• March 31, 2016. Student presentations of midterm projects.

• April 5, 2016. Student presentations.
• Mike Nute will talk about "Joint Bayesian estimation of alignment and phylogeny" by B. Redelings and M. Suchard, Systematic Biology 54(3):401-418, 2005. (PDF)
Homework #9: for each presentation, write a one-paragraph summary of the paper and provide two questions for the student presenting the paper. These homeworks are due by email at least 48 hours before the student presentations (i.e., by Sunday at 11 AM), and will be forwarded to the students.

• April 7, 2016. Student presentations.
• Martin Hellwig will give a talk about "Full modeling versus summarizing gene-tree uncertainty: Method choice and species-tree accuracy" by L.L. Knowles et al., Molecular Phylogenetics and Evolution 65 (2012): 501-509. (PPTX)
• Danielle Campbell will talk about "Comparing two Bayesian methods for gene tree/species tree reconstruction: simulations with incomplete lineage sorting and horizontal gene transfer" by Chung and Ané, Systematic Biology (2011): syr003. (PDF)
Homework #10: for each presentation, write a one-paragraph summary of the paper and provide two questions for the student presenting the paper. These homeworks are due by email at least 48 hours before the student presentations (i.e., by Tuesday at 11 AM), and will be forwarded to the students.

• April 12, 2016. Student presentations.
Homework #11: for each presentation, write a one-paragraph summary of the paper and provide two questions for the student presenting the paper. These homeworks are due by email at least 48 hours before the student presentations, and will be forwarded to the students.

• April 14, 2016. Student presentations.
• Kajori Banerjee will talk about "FastTree: computing large minimum evolution trees with profiles instead of a distance matrix" by MN Price, PS Dehal, and AP Arkin. Molecular Biology and Evolution 26(7), 1641-1650. (PDF)
• Jordan Luber will give a talk about "MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform" by K. Katoh et al. Nucleic Acids Research 30.14 (2002): 3059-3066. (PDF) (PPTX)
Homework #12: for each presentation, write a one-paragraph summary of the paper and provide two questions for the student presenting the paper. These homeworks are due by email at least 48 hours before the student presentations, and will be forwarded to the students.

• April 19, 2016. Advanced topic: New approaches for supertree estimation. Pranjal Vachaspati, guest lecturer. (PDF)
No office hours April 18-22

• April 21, 2016. Advanced topic: New methods for co-estimation of gene trees and species trees. Ashu Gupta, guest lecturer (PPTX)
Homework #13: write a one-page summary of the 4/19 presentation, and include one question.

• April 26, 2016. Advanced topic: Computational Historical linguistics (constructing phylogenetic trees and networks from linguistic data) (PDF) (PPTX)
Homework #14: write a one-page summary of the 4/21 presentation, and include one question.

• April 28, 2016. BBCA: Improving *BEAST using random binning (PDF)
Note: no more regularly scheduled office hours; if you wish to meet with me, we can arrange one by appointment.

• May 3, 2016. LAST CLASS DAY. Jian Peng, guest lecturer, will speak about computational methods for predicting protein structure.

• May 5, 2016. FINAL PROJECTS DUE by email (anytime before midnight). Note: If you wish to get feedback on an early draft, submit it by email by May 1, 2016. If you need an extension, please request it in advance.