Leiden Institute for Advanced Compuet Science
Faculty of Science, MSc in Computer Science
accademic year 2012-2013, fall semester

STRING, PATTERN AND MOTIFS SEARCH ALGORITHMS

Dr. Nadia Pisanti, email: pisanti-AT-liacs.nl



[]

NEWS

[]


  • [08/01/2013] The examination re-try will take place on January 25th at 10 a.m. in room 412 of Snellius building.
  • [12/04/2012] The examination will take place on December 17th at 14:00 in room 402 of Snellius building.
  • [10/31/2012] The lecture of Monday November 5th will take place at 13:00.
  • [09/17/2012] The examination will be a written test taking place on December 17th at 14:00.
  • [09/17/2012] On October 15th there will be no lecture
  • [09/16/2012] The Book Chapter for Sequences Alignments is available.

    []

    Course Description

    []

    This course is worth 6 EC.

    Lectures take place in room 405 of Snellius Building on Mondays from 13h45 until 15h30.

    Objectives of the Course

    The goal of this course is to give the student an overview of algorithms on strings. We will range from basic problems to more specific ones, and we will show different solutions and corresponding example for the same (or related) problem highlighting how, besides theoretical complexity issues, things change in practice, leading the student to have hands-on the difference between worst case theoretical complexity of a problem, practical behavior of algorithms, and fixed parameter tractability.

    Description

    String algorithms are well suited for educational purposes as they nicely combine elegant combinatorial results with practical applications of growing common interest. Very important data of huge size can be treated as a text. One example is the web, for which indexing and search issues have clear practical impact and deserve the use of the most sophisticated algorithmic techniques. Another main application is the treitment of biological sequences for which storage and analysis tasks raise many challenging algorithmic problems.

    Program

    References

    []

    LECTURES

    []

    September 10th. Introduction to the course: description of the program. Comparing strings: Hamming Distance and Edit Distance. Exponential number of possible pairwise sequences alignments. Recalling dynamic programming.

    September 17th. The dynamic programming approach to global alignment of strings: description of the algorithm, its complexity, and and some examples.

    September 24th. The local alignemt of two strings: dynamic programming solution, example. The semiglobal alignment of two strings: dynamic programming solution and an example.

    October 1st. Optimizing pairwise alignments: alignment of similar strings, and alignments in linear space. How to reconstruct the alignment when using linear space. Computing the longest common subsequence of two strings with dynanic programming.

    October 8th. Global pairwise alignments with affine gap penalty function: new way to count on an optimal substructure and consequent dynamic programming solution. An example.

    October 22nd. Some possible approaches to multiple alignments: sum of pairs, progressive alignments possibly using trees, and star alignments. Introduction to the problem of exact pattern matching: definition, brute-force approach, and idea of the Morris Pratt algorithm.

    October 29th. The algorithm of Morris and Pratt: the idea, the algorithm, an example, correctness proof, complexity analysis.

    November 5th. The algorithm of Knuth, Morris, and Pratt: the idea, an example. Comparison with Morris-Pratt. The algorithm of Boyer and Moore. The algorithm of Karp and Rabin. Some words on the Aho-Corasick automata.

    November 12th. Pattern matching with preprocessing of the text: indexing. The suffix tree: definition, properties, and examples. Generalized suffix tree.

    November 19th The suffix array: definition and example. Properties and use for exact pattern matching. Some words on approximate pattern matching techniques. Motifs extraction: the problem, a basic solution with suffix tree.

    November 26th The KMR algorithm (naming technique) to find exact motifs: algorithm, complexity, an example. Finding approximated motifs on the suffix trees: approach description.

    December 3rd Finding approximated motifs on the suffix tree: an example, and complexity analysis. Finding approximate motifs using a degenerate alphabet: the KMRC algorithm.