Student research opportunities
Computing Optimal Genome Edit Distances via Heuristic Search
Project Code: CECS_796
This project is available at the following levels:
CS single semester, Honours, Summer Scholar, Masters
Supervisor:
Dr Patrik HaslumOutline:
Comparison of the arrangement of genes in mitochondria or chloroplasts is a tool used in determining the evolutionary history of life on Earth. One way to measure similarity between genomes is by their edit distance: the minimal (weighted) sequence of rearrangement events needed to transform one genome into another. An edit distance metric is defined by a set of edit operations (and their relative weights). While polynomial-time algorithms for calculating the optimal edit distance are known when the only operation considered is inversion, no such algorithm is known for more general sets of edit operations, such as the commonly considered inversion, transposition and transversion operation set.
The genome rearrangement problem under these operations (and, indeed, under many other edit operation sets) is naturally formulated as a search problem. Thus, we can try to solve it optimally using optimal informed search algorithms like A*, together with admissible heuristics. However, the genome edit distance problem challenges existing heuristic search methods in ways that other puzzle-type problems (which are often used as benchmarks in research into heuristic search) don't. A* search, even with state-of-the-art admissible heuristics such as pattern databases, does not scale up to solve problems as large as we would like.
Goals of this project
The next step in this research project is to examine more advanced optimal search enhancements, such as symmetry reduction, partial expansion, and others.
Requirements/Prerequisites
Good programming skills. A basic understanding of optimal heuristic search is an advantage.


