PLPTH 890
Introduction to Genomic Bioinformatics
Spring 2007
K-State Online
Course summary
This course, offered in spring of odd-numbered years, is oriented to graduate-level students in the biological sciences who seek an introduction to the principles and practice of computational analysis of genomic data. Students from computer-science and other analytical backgrounds are welcome, but should be aware that much of the material is targeted to the level of analytical aptitude and training of biology students. For this reason the course focuses less on developing tools than on using them appropriately and communicating intelligently with bioinformatics specialists.
What bioinformatics is and isn't
It's usually said that bioinformatics is the computational analysis of genomic data. But if you scan the journal Bioinformatics, you'll see very few biological results. What you'll see is a lot of work on computational methods and tools, built on foundations of mathematics and statistics. Few students in biological courses of study want this, so this course might better be called something like "Introduction to bioinformatics tools." Whatever the name, its aim is to help you become a biologist better equipped to use genomic data and, for a few students, to show the beauty and power of deeper approaches.
Programming is not required in this course.
Working with a programmer is!
In the first three years the course was offered, it featured a required series of Perl lectures, labs, and assignments. Programming is a basic means of expression in bioinformatics (and, in my opinion, in most of the sciences), and a course purporting to cover bioinformatics ought to expose students to Perl. However, as a rule about half the students can handle this material, while the other half are baffled by it and sometimes even drop out to avoid it. In fact you can do lots of bioinformatics without writing a line of code -- but you can be much more effective if you know how, or have a colleague who does. In this course, at least one of those conditions will be met.
Two-track instruction plan
All students will attend the biweekly lectures. In addition, a set of four optional sessions on Perl programming will be arranged, beginning in the first month. Problem sets will contain exercises requiring Perl knowledge. Students will be assigned partners, at least one of whom will be able to tackle the programming exercises. Partners may change during the semester. Exercises will overlap; some will need to be done jointly and some independently, but not all will be required.
As a sample, an exercise set might contain:
Required problems (50-80% of credit)
Elective problems (40-70% of credit)
with both groups containing problems requiring some or no programming. I will expect independent work where indicated, and will encourage collaborative work elsewhere.
Prerequisites
Most basically: you must be comfortable with using computers for composing documents and WWW browsing. Beyond this, a familiarity with the biochemistry and genetics of nucleic acids and proteins is assumed. Most bioinformatics texts devote a chapter or two to reviewing these basics. This isn't a biology course and won't teach you the subject from scratch, although we will try not to use jargon unnecessarily. If ORFs, UTRs, ESTs, STSs, cDNA, mRNA, RFLPs, BACs, YACs, cosmids, promoters, mitochondria, ADH, ATP, poly-A tails, 5' ends, hybridization, Southern blots, contigs, denaturing, retrotransposons, LTRs, PCR, homology, paralogy, physical maps, and microarrays (to list just a few) are new to you, you risk getting lost. For a molecular biologist these things are core knowledge for bioinformatics work. So if most of these terms are mysteries to you, do consider taking a couple of biology courses first. Completion of Biology 450 (Modern Genetics) at KSU, its equivalent in Animal Science, or the equivalent at another university, should be enough. If you just need a refresher, have a look at this molecular-biology tutorial, this view of computational molecular biology, and this introduction to DNA structure!
Topics
More computational
More biological
Textbooks
Textbooks are not required, but I don't recommend that you rely only on the lectures to learn bioinformatics. Below are listed some books and my impressions of them.
Bioinformatics: Sequence and Genome Analysis, 2nd ed. D. W. Mount; Cold Spring Harbor Laboratory Press, 2004
The author's motive is to explain the algorithms that underlie sequence alignment and database searching; reconstruction of phylogenies, genes, and RNA and protein structures; and genome analysis with coverage of microarrays and pathways. Local bookstores have the book in stock, or it costs about $80 plus shipping if ordered from CSHL. I recommend that students acquire this widely adopted book. It covers a wide range of bioinformatics topics more comprehensively than other texts, if sometimes not as lucidly as might be wished.
Bioinformatics and Molecular Evolution. P. Higgs, T. Attwood; Blackwell, 2005
I'm impressed with this book too; I find the explanations clearer and the end-of-chapter problems more interesting. But it's more phylogenetics-centered than Mount, and there are several areas, like gene prediction, genome assembly, and physical mapping, that are touched on only very lightly.
Bioinformatics and Functional Genomics. J. Pevsner; Wiley, 2003
I haven't examined it yet.
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition. Baxevanis & Ouellette, eds.; Wiley Interscience, 2004
I think there's a new edition.
Fundamental Concepts of Bioinformatics
This is a shorter book than Mount's but contains solidly useful coverage of the key concepts. I found especially informative the chapters on phylogenetics and protein structure analysis. As a textbook it's designed for an undergraduate-level course -- we don't have one at KSU yet.
Discovering Genomics, Proteomics, & Bioinformatics. A. M. Campbell, L. J. Heyer; Benjamin Cummings, 2002
This book is unlike others in touching only lightly on algorithms (I don't think it even mentions the sequence-alignment problem) and focusing on practical discovery with existing tools and databases, with principal attention to human genetics and diseases. I would class this too as best adapted to an undergraduate course for future medical or molecular-biological professionals. This is in no way to slight its rich content (Chapters 7 and 8, for example, give a 50+ page coverage of genomic circuitry and its dissection, probably the foundation stone of post-genomic research), but the students in our 890 class are likely to be a mix of 1) plant genetics researchers and 2) crossover students from CS and other computing-heavy fields interested in computational approaches.
Developing Bioinformatics Computer Skills. Gibas & Jambeck; O'Reilly, 2001
Cheaper but less comprehensive.
Microarray Bioinformatics. D. Stekel; Cambridge University Press, 2003
Perl
There are plenty of these.
Learning Perl, 3rd edition. Schwartz & Christiansen; O'Reilly
Good starter; another is Programming Perl.
Bioinformatics, Biocomputing and Perl. M. Moorhouse, P. Barry; Wiley, 2004
Haven't seen it.
Perl Programming for Biologists. D. C. Jamison; Wiley, 2003
Haven't seen it.
Beginning Perl for Bioinformatics. J. Tisdall, B. Waliszewski; O'Reilly, 2001
I've only glanced at this, but it appears to be a manual of practical Perl programming, and well worth a look. The followup book, which introduces BioPerl in Ch. 9 (free for online reading), is...
Mastering Perl for Bioinformatics, 1st ed. J. Tisdall; O'Reilly, 2003
Genomic Perl: from Bioinformatics Basics to Working Code. R. A. Dwyer; Cambridge University Press, 2002
Written by a computer scientist and too advanced for beginners.
Computational biology
The following books go much deeper than we do in the course.
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Durbin, Eddy, Krogh, and Mitchison; Cambridge University Press, 1998
Explains hidden-Markov models (HMMs).
Bioinformatics: the Machine Learning Approach
Relates Bayesian, information-theoretic, likelihood-maximizing, and energy-minimizing measures applied to lots of problems, including HMMs, phylogenetic trees, and microarrays.
Exercises/projects
Exercises and problem sets will be assigned each week. A research project will be required and may be developed from your own research data or interests.
More organizational details
WWW resources
Here is CN's current bookmarks list, a source of ideas, tutorials, readings, and exercises for the course.
Journals
A highly regarded printed journal is Bioinformatics. CN receives this in print and online and will lend issues or help students obtain articles. Many more journals, including online-only resources like BMC Bioinformatics, appear in this list.
Former-student work
Demonstration of alignment by dynamic programming
Animated tutorial on profile construction
About the animation
The animated GIF at top was made with a RasMol script that rotates a myoglobin molecule 360° in 10° increments about each of the three axes in turn, occasionally changing the drawing parameters and saving an image to disk after each rotation. The 108 image files were then assembled with Animagic's GIF Animator into the single GIF image file that is invoked from this WWW page. Much more elaborate movies are possible, showing biochemically interesting aspects of the molecule being displayed. An operation that a few years ago required a graduate-level scientist and a high-end workstation is now an hour's work for a bright ten-year-old on a desktop computer...
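The frame count follows from the rotation scheme: 36 steps of 10° about each of three axes gives 108 images. As a rough illustration only (this is not the script actually used for the animation), here is a minimal Python sketch that generates a RasMol script of this general kind. It relies on RasMol's `load`, `rotate`, and `write gif` commands; the input file name `myoglobin.pdb`, the `frame` output prefix, and the function name are hypothetical, and the occasional changes of drawing parameters are left out.

```python
def rasmol_rotation_script(pdb_file="myoglobin.pdb", step_degrees=10,
                           axes=("x", "y", "z"), prefix="frame"):
    """Build the lines of a RasMol script that rotates a molecule a full
    turn about each axis in turn, writing one GIF image per rotation step.

    The file name and output prefix are placeholders, not the originals.
    """
    if step_degrees <= 0 or 360 % step_degrees != 0:
        raise ValueError("step_degrees must divide 360 evenly")

    steps_per_axis = 360 // step_degrees      # 36 steps for 10-degree increments
    lines = [f"load {pdb_file}"]
    frame = 0
    for axis in axes:
        for _ in range(steps_per_axis):
            lines.append(f"rotate {axis} {step_degrees}")
            frame += 1
            # One numbered image per rotation step, e.g. frame001.gif
            lines.append(f"write gif {prefix}{frame:03d}.gif")
    return lines, frame


script_lines, frame_count = rasmol_rotation_script()
print(frame_count, "frames")
print("\n".join(script_lines[:5]))
```

Saving the generated lines to a file and running it from RasMol's `script` command should leave a numbered series of images ready for any GIF assembler.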