UC Davis Mathematics

Mathematics Colloquia and Seminars

Return to Colloquia & Seminar listing

Combinatorial Optimization Problems from Population Genomics

Algebra & Discrete Mathematics

Speaker: Dan Gusfield, UC-Davis (Computer Science)
Location: 2112 MSB
Start time: Fri, Apr 11 2008, 1:10PM

Geneticists have long dreamed of determining the genetic basis of disease susceptibility by comparing variations in the human genome sequences of a large number of individuals (P.Y. Kwok). Now that high-throughput genomic technologies are available (for sequencing, resequencing, finding SNPs and haplotypes, micro-array screening of sequences etc.), the dream of comparing sequence variations at the population level is becoming a reality. Moreover, population-scale sequence variations have applications beyond gene-finding, and can be used to address evolutionary and historical questions such as the movement of populations, and even questions concerning molecular genetic processes such as the mechanics of mutation, recombination and repair. In a sense, nature and history, through mutation, recombination, gene-conversion, genome rearrangements, lateral gene transfer, admixture of populations, random drift, etc. have conducted a huge number of experiments in which genes and gene alleles have been mixed in different ways to create a large variety of mosaic genomes among individuals who can be sampled and studied today. The grand challenge is to exploit these natural experiments by finding patterns in and among these different mosaic genomes that have significant and biologically meaningful associations with important traits of interest.

Many fundamental and applied algorithmic problems make meeting this grand challenge difficult. In this talk I will discuss several central combinatorial optimization problems that arise from the goals of obtaining and analyzing population variation data. These relate to deducing ``haplotypes" from genotype data, or relate to deducing occurrences of recombination's in the derivation of current genomic sequences. The methods involve graph theory, matroid theory, and integer programming.