*Instructor:* Dr. István MIKLÓS

*Text:* Durbin, Eddy, Krogh, Mitchison: *Biological Sequence Analysis:
Probabilistic Models of Proteins and Nucleic Acids*, plus handouts.

*Prerequisite:* None, but elementary probability theory and some
mathematical maturity are needed for this course. The course starts with
a short overview of the necessary mathematics and biology.

*Course description:* Bioinformatics is a young and rapidly growing
discipline. It is strongly application oriented, but it also rests on an
elegant theoretical foundation that combines combinatorics, probability
theory, statistics and the theory of algorithms. This course is a computer
science flavoured introduction to the mathematical background of
bioinformatics, with special emphasis on problem solving and applications.

*Topics*:

**Basics**: Models in biology. Biological sequences. RNA secondary
structures and pseudo-knotted structures. Protein folding. Evolutionary
trees. Basic concepts of evolutionary and comparative biology. Introduction
to statistical inference: the likelihood function, maximum likelihood
estimation, expectation maximization, Bayes' theorem, Bayesian statistics.

**Sequence alignment**: The classical and automaton approaches to
aligning sequences. Hidden Markov Models (HMMs): aligning sequences to
a structure. Aligning sequences with pair-HMMs.

**Stochastic grammars**: The Chomsky hierarchy. Stochastic regular
grammars are equivalent to HMMs. Stochastic Context-Free Grammars (SCFGs)
and their applications in RNA structure prediction. Algorithms for
stochastic regular grammars and SCFGs.

**Evolutionary trees**: Concepts for inferring trees. Stochastic
models of evolutionary trees. Kingman's coalescent.

**Continuous-time Markov models**: Substitution models for nucleotides
and amino acids. Insertion-deletion models. Statistical sequence alignment.
Comparative bioinformatics.

*Optional topics (depending on available time)*:

**Markov chain Monte Carlo**: The concept of MCMC. Metropolis-Hastings.
The Gibbs sampler. Partial Importance Sampler. Simulated Annealing. Parallel
Tempering. Applications: Bayesian statistics of evolutionary trees, multiple
sequence alignment, genome rearrangement.

**RNA structures (advanced)**: Stochastic grammars for inferring
pseudo-knotted structures. Folding simulations. Co-transcriptional folding.