d Vorlesung Bioinformatik I (SS01) Lehr- und Forschungseinheit für Theoretische Informatik,
Institut für Informatik der Ludwig-Maximilians-Universität München

Vorlesung Algorithmische Bioinformatik I (SS01)

Inhaltsverzeichnis dieser Seite

Organisatorisches

Abbildung eines Beta-Strands (mit Rasmol)

Organisatorisches

Vorlesung:: PD Dr. Rolf Backofen,; Bioinformatik-Grundstudiumsvorlesung: Algorithmische Bioinformatik I,; 4-stündig, Mo 14-16, Di 14-16, Raum Z1.09, Oettingestr. 67; Beginn 2. Woche
Übung:: PD Dr. Rolf Backofen,; Di 9-11, Raum Z1.09
Vorkenntnisse:: Gute Erfahrung mit C/C++ oder Java (Datenstrukturen), siehe auch unter Übung, Wahrscheinlichkeitslehre. Grundkenntnisse der Biologie sind hilfreich, aber nicht unentbehrlich, da biologische Aspekte in der Vorlesung behandelt werden. Skriptum in englisch.
Hörerkreis: :: Studierende im Grundstudium der Bioinformatik; Studierende im Hauptstudium der Informatik; Studierende mit Nebenfach Informatik
Schein:: Ausreichende Mitarbeit in der Übung und Prüfung.

zurück zum Inhaltsverzeichnis dieser Seite

Unter dem Begriff Bioinformatik versteht man die Anwendung von Informatik auf biologischen Fragestellungen. Was uns in dieser Vorlesung hauptsächlich interessiert ist die Entwicklung von Algorithmen für die Behandlung von ungenauen Daten (biologische Daten enthalten immer Fehler). Die Vorlesung orientiert sich weitgehend an dem Buch "Computational Molecular Biology: An Introduction".

Nach einem Überblick über biologische/chemische Grundlagen der Bioinformatik werden wir in der Vorlesung einzelne Problembreiche der Bioinformatik näher behandeln. Zum einen handelt es sich um die Datenverarbeitung genetischer Information mit den dazugehörigen mathematischen/informatischen Konzepten (dynamisches Programmieren und Sequenzalignment, physikalische Karte des Genoms und PQ-Bäume, Wahrscheinlichkeitslehre, Markovprozesse, stochastische kontextfreie Grammatiken und Faltung von RNA, Mustererkennung in Nuklein- und Aminosäuresequenzen mit hidden Markov Models.) Desweiteren diskutieren wir die algorithmische und mathematische Behandlung phylogenetischer Bäume und zugrundeliegende Evolutionsmodelle.
Ausserdem behandeln wir die Problematik der Vorhersage der dreidimenisionalen Struktur eines Proteins aus seiner Aminosäurensequenz. Für diese Aufgabe werden Verfahren wie genetische Algorithmen, Monte Carlo Verfahren und Constraint Programming eingesetzt. In der Vorlesung werden verschiedene Algorithmen zur Bestimmung der Proteinfaltung vorgestellt.

zurück zum Inhaltsverzeichnis dieser Seite

Übersicht

In the past, biologists generally grouped living organisms into two distinct life forms or domains:

prokaryotes, represented by cyanobacteria (blue-green algae) and common bacteria such as Escherichia coli, not having a nuclear membrane to separate genomic DNA from the cytoplasm, and
eukaryotes, represented by unicellular organisms such as the flagellum Trypanosoma, common brewer's yeast, and multicellular organisms, all of which have a nuclear membrane.

Methanococcus jannaschii is a methanogenic archaebacterium, first collected in 1982 by the Woods Hole submersible Alvin near white smokers from a hot spot of the sea floor of the Pacific Ocean at a depth of 2600 meters. In August 1996, the 1.66 megabase pair genome of M. jannaschii was published by Bult et al. in Science, where it was asserted that more than 56% of its 1738 genes are completely new, unlike any genes in existent databases. A small initial portion of the DNA sequence, consisting of over 1.6 million characters, is given as follows.

TACATTAGTGTTTATTACATTGAGAAACTTTATAATTAAAAAAGATTCATGTAAATTTCT
TATTTGTTTATTTAGAGGTTTTAAATTTAATTTCTAAGGGTTTGCTGGTTTGATTGTTTA
GAATATTTAACTTAATCAAATTATTTGAATTTTTGAAAATTAGGATTAATTAGGTAAGTA
AATAAAATTTCTCTAACAAATAAGTTAAATTTTTAAATTTAAGGAGATAAAAATACTCTG
TTTTATTATGGAAAGAAAGATTTAAATACTAAAGGGTTTATATTATGAAGTAGTTACTTA
CCCTTAGAAAAATATGGTATAGAAAAGCTTAAATATTAAGAGTGATGAAGTATATTATGT

Analysis of the DNA sequence of M. jannaschii provided solid evidence for a startling hypothesis advanced two decades earlier by Carl Woese: there is a third domain of life called Archaea, which is distinct from Prokarya and Eukarya.

How can one determine the (hypothetical) genes of M. jannaschii from its 1.66 megabase pair genome? Obviously this must be done by a computer program, but if the majority of the (hypothetical) genes in this new life form have no homology to known genes, then how does the program work?

The TIGR group of Bult et al. used the commercial software GenMark, which implements a 5-th order Markov model. We'll study the theory and then implement Hidden Markov Models (HMM), currently used in determining genes, intron/exon splice sites, parts of the genome wrapped around nucleosomes, etc.

Sequence similarity between the new genes of M. jannaschii and those in existent databases were determined by programs. How do these programs work?

Computational biology is a new field, rapidly expanding, and concerns itself with the development of algorithms for

sequence analysis and comparison,
construction of phylogeny trees,
machine learning,
simulation,
mathematical and statistical analysis.

zurück zum Inhaltsverzeichnis dieser Seite

Topics of course

Overview of molecular biology for non-biologists: nucleic acids, proteins, shotgun sequencing, PCR, physical maps, double digest problem
Some probability theory, Markov chains, Shannon entropy
Sequence alignment: Smith-Waterman, global and local alignment
All about Eve, the mitochondrial Eve hypothesis and phylogeny trees, synteny, clustering methods, maximum likelihood
Hidden Markov models and their applications, Baum-Welch (EM) reparametrization and Baldi-Chauvin gradient descent
Combinatorial optimization using Monte Carlo with simulated annealing, application to the study of optimality of the genetic code
Genetic algorithms, application to protein folding in lattice models (Unger-Moult)
How to compute RNA secondary structure, "neutral networks" and mathematical evolution theory
Computing with biomolecules: Adleman's solution of an NP-complete problem using oligonucleotides, further results

zurück zum Inhaltsverzeichnis dieser Seite

Literatur

Bioinformatik Links

Hier sind einige Links für Bioinformatik.

Bioinformatik Bücher

Peter Clote, Rolf Backofen. Computational Molecular Biology: An Introduction, John Wiley & Sons, (2000).
Michael Waterman, Introduction to Computational Biology , Chapman & Hall, London, (1995).
Ausgezeichnete Einführung in die Bioinformatik von einem der Begründer dieses Forschungsbereich. Viel Mathematik (Statistik, und Kombinatorik). Die WWW-Seite für Fehlerkorrekturen ist Introduction to Computational Biology.
J. Setubal and J. Meidanis, Introduction to Computational Molecular Biology , PWS Publishing Co, (1997).
Wesentlich einfacher als Waterman zu lesen (daher weniger Stoff). Gesichtspunkt ist (theoretische) Informatik eher als Statistik.
M. Nei, Molecular Evolutionary Genetics , Columbia University Press (1987).
Einführung in Population Genetics, aber gute Behandlung der statistischen Modellen für punktweise Mutation im Genom, sowie der elementaren Methoden für Sequence Alignment und Stammbaumkonstruktion.
J.D. Watson et al., Molecular Biology of the Gene , 3-rd edition, Benjamin/Cummings Publishing Co, (1987).
Einführung in Biologie.
J. Brandon und C.Tooze, Introduction to Protein Structure , Garland Pub, NY, (1991).
Ein Forschungsbereich der Bioinformatik besteht in der 3-dimensionalen Strukturvorhersage eines Proteins von der Aminosäurenkette. Informatiker finden die biologische Grundlagen in diesem Buch.
Lewin, Genes V , (oder eine spätere Ausgabe).
Klassischer Text für die Genetik.

Artikel

K.S. Booth, G.S. Lueker, Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-tree algorithms, JCSS 13 (1976) 335-379.
Paper zur Einführung der PQ-Bäme als Datenstruktur u.a. zur effizienten Implementierung eines Consecutive Ones-Algorithmus (in Bioinformatik z.B. für Restriktionsanalyse verwendbar).

zurück zum Inhaltsverzeichnis dieser Seite

Folien

Folien zu den Vorlesungen als gezippte Postskriptdateien (nur intern verfügbar, Passwort gibts in der Vorlesung)

zurück zum Inhaltsverzeichnis dieser Seite

Übung

In der Übung sollen einige Algorithmen in C/C++ implementiert werden. Deshalb haben wir hier ein paar C/C++-Einführungen zusammengestellt. Es wäre sicher hilfreich sich diese mal anzuschauen, um eventuell verschüttete oder gar kaum vorhandene Programmier- oder C/C++-kenntnisse aufzufrischen bzw. zu erwerben.

Einige für die Übung relevante Bioinformatiklinks.

Übungsblätter

Blatt 1: Bio-Datenbanken und Wahrscheinlichkeitsrechnung.
Blatt 2: DNA Erzeugung und Analyse.
- L�sungsvorschlag zu Aufgabe 1: DNA-Generierung
- L�sungsvorschlag zu Aufgabe 2: DNA-Analyse
Blatt 3: Lokales Sequenzalignment. Für die Übung am 19.06.01.
Blatt 4: Guide RNA, Markovketten. Wird in der Übung am 2.7.01 besprochen.
Blatt 5: Restriktionskarten, Matrixmultiplikation. Wird in der Übung am 17.07.01 behandelt.

zurück zum Inhaltsverzeichnis dieser Seite

Lehr- und Forschungseinheit

Institut

Universität

Rolf Backofen

Last modified: Mon Jul 23 13:38:39 CEST 2001

Vorlesung Algorithmische Bioinformatik I (SS01)

Kurzbeschreibung

Bioinformatik Links

Bioinformatik Bücher

Artikel

Übungsblätter