Algorithms and bioinformatics Comparative genomics Anthony Labarre December 11 and 18, 2014


(2) Outline: Introduction; Permutations; Signed permutations; Strings; Other models; Alternative approaches; Beyond pairwise comparisons.
Context and motivations
- Biologists are interested in comparing species, for example:
  - in order to classify them;
  - in order to explain evolution by reconstructing scenarios;
- (Dis)similarity measures are needed;
- These are usually based on the sequenced genomes;

(3) At the nucleotide level
- Most comparisons take place at the nucleotide level;
- Example (sequence alignment): [Figure: two sequences S1 and S2 aligned column by column, with gaps written as −]
- Alignment columns are matches, substitutions, insertions and deletions;
- These correspond to mutations;
- (recall Gregory Kucherov’s classes);

(4) At the “gene” level
- Some mutations act on segments of nucleotides;
- Those large-scale mutations are called genome rearrangements;
- Sequence alignment becomes unfit;
- Example (genomes as sequences of segments): [Figure: genome (A) transformed into genome (B) by genome rearrangements]

(5) Genome rearrangements
- Our problem:
  Problem (pairwise genome rearrangement)
  Input: genomes G1, G2, a set S of mutations;
  Goal: find a shortest sequence of elements of S that transforms G1 into G2.
- Related, simpler problem: compute the evolutionary distance d_S(G1, G2) (i.e. just the length of a shortest sequence);
- Many variants, depending on how genomes are modelled, what (and how) mutations are taken into account, etc.;

(6) Section outline (Permutations): The model; Exchanges; Larger-scale transformations; The directed breakpoint graph; Breakpoints; The undirected breakpoint graph.
Modelling genomes as permutations
- Genomes are seen as permutations if:
  1. they form ordered sequences of genes (or other segments), and
  2. they only differ by order (no duplications or deletions).
- Example (genomes → permutations):
  (A) 5 1 2 4 7 3 6
  (B) 1 2 3 4 5 6 7
  where (A) and (B) are related by genome rearrangements.

(7) Genome rearrangements for permutations
- Segments can be numbered as we wish, so we assume either genome is the identity permutation ι = ⟨1 2 ⋯ n⟩;
- We wish to sort the other genome;
- Our problem:
  Problem (genome rearrangement (permutations))
  Input: a permutation π in S_n, a set S of (per)mutations;
  Goal: find a shortest sorting sequence of elements of S for π.
- Again, we can also focus on merely computing d_S(π), the length of an optimal sorting sequence;
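
The renumbering argument can be sketched in Python. This is a minimal illustration that is not part of the slides: the function name `relabel` and the gene names are hypothetical, and both genomes are assumed to contain the same genes exactly once.

```python
def relabel(source, target):
    """Rename genes so that `target` becomes the identity 1 2 ... n.

    Returns `source` written as a permutation of 1..n.
    """
    if len(set(target)) != len(target) or sorted(source) != sorted(target):
        raise ValueError("genomes must contain the same genes, without duplicates")
    rank = {gene: i + 1 for i, gene in enumerate(target)}  # gene -> new label
    return [rank[gene] for gene in source]


# Hypothetical gene names, for illustration only.
print(relabel(["e", "a", "b", "d", "c"], ["a", "b", "c", "d", "e"]))  # [5, 1, 2, 4, 3]
```

Sorting the returned permutation is then the same problem as transforming `source` into `target`.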

(8) Notation and definitions pertaining to permutations
- Permutations can be written in one- or two-row notation:
  $\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 1 & 6 & 2 & 5 & 3 \end{pmatrix} = \langle 4\ 1\ 6\ 2\ 5\ 3 \rangle$
- We deal exclusively with [n] = {1, 2, …, n};
- All permutations of [n] with composition form the symmetric group S_n;
- Composition: the usual ◦, which means that in π ◦ σ, σ is applied first;
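
The composition convention is easy to get backwards, so here is a small Python sketch of it. The helper name `compose` is an assumption made for this illustration; permutations are stored in one-row notation as Python lists.

```python
def compose(pi, sigma):
    """Return pi ∘ sigma in one-row notation: (pi ∘ sigma)(i) = pi(sigma(i))."""
    return [pi[s - 1] for s in sigma]


pi = [4, 1, 6, 2, 5, 3]
sigma = [2, 1, 3, 4, 5, 6]  # the exchange of 1 and 2

print(compose(pi, sigma))  # [1, 4, 6, 2, 5, 3]: positions 1 and 2 of pi are exchanged
print(compose(sigma, pi))  # [4, 2, 6, 1, 5, 3]: values 1 and 2 of pi are exchanged
```

With σ applied first, composing on the right acts on positions, while composing on the left acts on values.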

(9) Sorting permutations by exchanges
- Simple operation: exchange any two elements:
  Example: 4 1 6 2 5 3 → 2 1 6 4 5 3
- So we want to sort a permutation by performing as few such exchanges as possible;

(10) Sorting permutations by exchanges
- Here’s a more complete example:
  Example (a sorting sequence for ⟨4 1 6 2 5 3⟩):
  4 1 6 2 5 3 → 1 4 6 2 5 3 → 1 2 6 4 5 3 → 1 2 3 4 5 6
- It works... but can we do better?

(11) Sorting permutations by exchanges
- Our goal: each element should be “at the right place”;
- Some elements are already where they should be, so they won’t move;
- Strategy: read the permutation from left to right, and:
  - if π_i = i, pass;
  - otherwise, exchange π_i with i;
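
The left-to-right strategy can be sketched as follows. This is a minimal Python version under the assumption that the input is a permutation of 1..n; the function name `sort_by_exchanges` is introduced here for illustration.

```python
def sort_by_exchanges(perm):
    """Sort a permutation of 1..n, returning the exchanges as pairs of 1-based positions."""
    p = list(perm)
    pos = {value: index for index, value in enumerate(p)}  # value -> 0-based position
    moves = []
    for i in range(len(p)):
        target = i + 1
        if p[i] == target:          # already at the right place: pass
            continue
        j = pos[target]             # where the element i + 1 currently sits
        p[i], p[j] = p[j], p[i]     # exchange pi_i with i + 1
        pos[p[j]] = j
        pos[target] = i
        moves.append((i + 1, j + 1))
    return moves


print(sort_by_exchanges([4, 1, 6, 2, 5, 3]))  # [(1, 2), (2, 4), (3, 6)]
```

On ⟨4 1 6 2 5 3⟩ this visits 1 4 6 2 5 3, then 1 2 6 4 5 3, then the identity, which is the three-step sequence shown on the previous slide.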

(12) Sorting permutations by exchanges
- The algorithm obviously terminates;
- At every step, we “fix” one or two positions;
- We use the minimum number of exchanges;
- On the other hand, we’d like to be able to compute the distance without sorting;

(13) Cycles
- Computing the distance requires using the cycles of the permutation;
- Those cycles are obtained by iterating the permutation’s action on {1, 2, …, n}, stopping when all elements have been visited;
- Example (cycles of ⟨4 1 6 2 5 3⟩): [Figure: the elements 1 2 3 4 5 6 drawn above their images 4 1 6 2 5 3]

(14) Disjoint cycle decomposition of permutations
- Each permutation decomposes into disjoint cycles:
  $\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 1 & 6 & 2 & 5 & 3 \end{pmatrix} = (1, 4, 2)(3, 6)(5)$
- The graph of the permutation π, denoted by Γ(π), pictures this decomposition: [Figure: Γ(π) for π = ⟨4 1 6 2 5 3⟩]
- The number of cycles of π is written c(π);
- 1-cycles are sometimes omitted;
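
As a sketch of how the decomposition is obtained by iterating the permutation, the following Python function follows i → π(i) until it returns to its starting point. The name `cycles` is an assumption of this illustration.

```python
def cycles(perm):
    """Disjoint cycle decomposition of a permutation of 1..n given in one-row notation."""
    n = len(perm)
    seen = [False] * (n + 1)
    result = []
    for start in range(1, n + 1):
        if seen[start]:
            continue
        cycle, x = [], start
        while not seen[x]:          # iterate i -> pi(i) until the cycle closes
            seen[x] = True
            cycle.append(x)
            x = perm[x - 1]
        result.append(tuple(cycle))
    return result


print(cycles([4, 1, 6, 2, 5, 3]))  # [(1, 4, 2), (3, 6), (5,)]
```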

(15) Cycles and sorting
- Cycles of length 1 correspond to sorted elements; all other cycles consist of elements that are misplaced; [Figure: Γ(π) for π = ⟨4 1 6 2 5 3⟩]
- Sorting comes down to splitting cycles until we only have cycles of length 1;
- Our algorithm repeatedly splits k-cycles into a 1-cycle and a (k − 1)-cycle;

(16) Computing the “exchange distance” exc(·)
- At each step, we can always create a new cycle if π ≠ ι, so: exc(π) ≤ n − c(π).
- And we can’t do better, since an exchange changes the number of cycles by exactly one, so: exc(π) ≥ n − c(π).
- Therefore: exc(π) = n − c(π).
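
The formula lets us compute the distance without sorting. A minimal Python sketch, where the function name `exchange_distance` is an assumption of this illustration:

```python
def exchange_distance(perm):
    """exc(pi) = n - c(pi), where c(pi) counts the cycles of pi (1-cycles included)."""
    n = len(perm)
    seen = [False] * (n + 1)
    cycle_count = 0
    for start in range(1, n + 1):
        if not seen[start]:
            cycle_count += 1
            x = start
            while not seen[x]:      # mark every element of this cycle
                seen[x] = True
                x = perm[x - 1]
    return n - cycle_count


print(exchange_distance([4, 1, 6, 2, 5, 3]))  # 3, since pi = (1, 4, 2)(3, 6)(5) has 3 cycles
```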

(17) Lessons from sorting by exchanges
- Note that ι is the only permutation with n cycles;
- The formula exc(π) = n − c(π) expresses:
  - the difference between the number of cycles we have and the number of cycles we want; and
  - the fact that at each step, we can obtain exactly one new cycle.
- This point of view will be crucial to sorting problems;

(18) Block-interchanges and transpositions
- The mutations we observe in evolution (may) act on intervals;
- We’ll now look at two generalisations of exchanges:
  1. block-interchanges;
  2. transpositions;
- Block-interchanges exchange two disjoint intervals in a permutation:
  1 [2 3 4] 5 [6 7] 8 9 → 1 [6 7] 5 [2 3 4] 8 9
- Transpositions displace an interval of the permutation:
  1 [2 3 4] 5 6 7 8 9 10 → 1 5 6 7 8 [2 3 4] 9 10
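
Both operations can be sketched on Python lists. The function names and the 0-based, half-open index convention are assumptions of this illustration; a transposition is written as the block-interchange of two adjacent intervals.

```python
def block_interchange(perm, i, j, k, l):
    """Exchange the disjoint intervals perm[i:j] and perm[k:l], with i < j <= k < l."""
    if not (0 <= i < j <= k < l <= len(perm)):
        raise ValueError("need 0 <= i < j <= k < l <= n")
    return perm[:i] + perm[k:l] + perm[j:k] + perm[i:j] + perm[l:]


def transposition(perm, i, j, k):
    """Displace perm[i:j] so that it ends up just after perm[j:k]."""
    return block_interchange(perm, i, j, j, k)


print(block_interchange([1, 2, 3, 4, 5, 6, 7, 8, 9], 1, 4, 5, 7))
# [1, 6, 7, 5, 2, 3, 4, 8, 9]
print(transposition([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1, 4, 8))
# [1, 5, 6, 7, 8, 2, 3, 4, 9, 10]
```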

(19) Computing the associated distances
- Does the disjoint cycle technique “work”?
- No: let π = ⟨n/2 + 1  n/2 + 2  ⋯  n  1  2  ⋯  n/2⟩; then bid(π) = td(π) = 1, but π has n/2 cycles of length two.
- We’ll need something else;
Definition. The directed breakpoint graph of ⟨π_1 π_2 ⋯ π_n⟩ is defined by:
  1. an ordered vertex set V = (π_0 = 0, π_1, π_2, …, π_n);
  2. a bicoloured arc set A = A_B ∪ A_G, where:
     2.1 A_B = {(π_i, π_{(i−1) mod (n+1)}) : 0 ≤ i ≤ n};
     2.2 A_G = {(i, (i + 1) mod (n + 1)) : 0 ≤ i ≤ n};

(20) The “directed breakpoint graph”
- (the term “breakpoint” will be explained later);
- Let’s build the directed breakpoint graph of π = ⟨4 1 6 2 5 7 3⟩:
  1. build the ordered vertex set (π_0 = 0, π_1, π_2, …, π_n);
  2. add black arcs for every ordered pair (π_i, π_{(i−1) mod (n+1)});
  3. add grey arcs for every ordered pair (i, (i + 1) mod (n + 1));
  [Figure: DBG(π) drawn on a circle with vertices 0 4 1 6 2 5 7 3]
- DBG(π) decomposes in a unique way into alternating cycles.
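
The alternating cycles can be extracted by following a black arc, then a grey arc, and repeating. The Python sketch below assumes the arc definitions of the directed breakpoint graph given earlier; the function name `dbg_cycles` and the choice to list, for each cycle, the vertices where a black arc starts are assumptions of this illustration.

```python
def dbg_cycles(perm):
    """Alternating cycles of the directed breakpoint graph of a permutation of 1..n.

    Each cycle is the list of vertices from which one of its black arcs leaves,
    so its length is the number of black arcs it contains.
    """
    n = len(perm)
    seq = [0] + list(perm)                       # ordered vertex set (pi_0 = 0, ..., pi_n)
    pos = {v: i for i, v in enumerate(seq)}
    seen, result = set(), []
    for start in seq:
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            u = seq[(pos[v] - 1) % (n + 1)]      # black arc (pi_i, pi_{i-1})
            v = (u + 1) % (n + 1)                # grey arc (u, u + 1)
        result.append(cycle)
    return result


print(dbg_cycles([4, 1, 6, 2, 5, 7, 3]))  # [[0, 4, 1, 5, 3], [6, 2, 7]]
print(len(dbg_cycles([1, 2, 3, 4])))      # 5: the identity has n + 1 cycles
```

Uniqueness of the decomposition shows up in the code as the absence of any choice: each vertex has exactly one outgoing black arc and one outgoing grey arc.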

(21) Intuitions behind DBG(π) – 1
- Both “monochromatic” cycles represent an ordering:
  1. the black one represents the one we have (“reality”);
  2. the grey one represents the one we want to obtain (“desire”);
  [Figure: the black cycle and the grey cycle of DBG(π), drawn separately on the vertices 0 4 1 6 2 5 7 3]

(22) Intuitions behind DBG(π) – 2
- The alternating cycles are a blend of “reality” and “desire”, and we must act on those cycles to turn “reality” into “desire”;
- When we are done, we have the largest number of cycles;
  [Figure: DBG(π) with vertices labelled π_0, …, π_7 next to DBG(ι) with vertices labelled ι_0, …, ι_7]

(23) Proving lower bounds
- One way of proving lower bounds on distances: be optimistic:
  1. find out the “best case”;
  2. pretend we’re always in that case;

(24) Lower bounding the block-interchange distance
- A block-interchange increases the number of cycles in DBG by at most 2:
  Theorem ([Christie, 1996]) For all π in S_n: $\mathrm{bid}(\pi) \ge \dfrac{n + 1 - c(\mathrm{DBG}(\pi))}{2}$.

(25) Lower bounding the transposition distance
- A transposition increases the number of odd cycles in DBG by at most 2:
  Theorem ([Bafna and Pevzner, 1998a]) For all π in S_n: $\mathrm{td}(\pi) \ge \dfrac{n + 1 - c_{\mathrm{odd}}(\mathrm{DBG}(\pi))}{2}$.

(26) Proving upper bounds
- One way of proving upper bounds on distances: be pessimistic:
  1. find out the “worst case”;
  2. pretend we’re always in that case;
- For better upper bounds: be less pessimistic:
  - case analyses of varying difficulty;
  - look at sequences of moves;

(27) Upper bounds
- For block-interchanges, we have:
  Theorem ([Christie, 1996]) For all π in S_n: $\mathrm{bid}(\pi) \le \dfrac{n + 1 - c(\mathrm{DBG}(\pi))}{2}$.
- This equals the lower bound and is therefore the exact distance;
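
Since the two bounds coincide, the block-interchange distance can be read off the cycle count. A minimal Python sketch, assuming the directed breakpoint graph definition from the earlier slides; the function names are introduced for this illustration only.

```python
def count_dbg_cycles(perm):
    """Number of alternating cycles of the directed breakpoint graph of perm."""
    n = len(perm)
    seq = [0] + list(perm)
    pos = {v: i for i, v in enumerate(seq)}
    seen, count = set(), 0
    for start in seq:
        if start in seen:
            continue
        count += 1
        v = start
        while v not in seen:
            seen.add(v)
            # black arc to the predecessor in seq, then grey arc to its successor value
            v = (seq[(pos[v] - 1) % (n + 1)] + 1) % (n + 1)
    return count


def block_interchange_distance(perm):
    """bid(pi) = (n + 1 - c(DBG(pi))) / 2, following the two bounds stated above."""
    return (len(perm) + 1 - count_dbg_cycles(perm)) // 2


print(block_interchange_distance([1, 6, 7, 5, 2, 3, 4, 8, 9]))  # 1
print(block_interchange_distance([4, 1, 6, 2, 5, 7, 3]))        # 3
```

The first call uses the permutation obtained by a single block-interchange in the earlier example, so a distance of 1 is what we expect.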

(28) Upper bounds
- For transpositions:
  Theorem ([Bafna and Pevzner, 1998a]) For all π in S_n: $\mathrm{td}(\pi) \le \frac{3}{4}\,(n + 1 - c_{\mathrm{odd}}(\mathrm{DBG}(\pi))) \le \frac{3}{2}\,\mathrm{OPT}$.
- The last inequality holds because the upper bound is 3/2 times the lower bound, which is at most OPT;
- Current best approximation: 11/8 [Elias and Hartman, 2006].
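
The two transposition bounds can be evaluated together. The Python sketch below only computes the bounds, not the distance itself; the function names are assumptions of this illustration, and the upper bound is rounded down because the distance is an integer.

```python
def dbg_cycle_lengths(perm):
    """Lengths (in black arcs) of the alternating cycles of the directed breakpoint graph."""
    n = len(perm)
    seq = [0] + list(perm)
    pos = {v: i for i, v in enumerate(seq)}
    seen, lengths = set(), []
    for start in seq:
        if start in seen:
            continue
        length, v = 0, start
        while v not in seen:
            seen.add(v)
            length += 1
            v = (seq[(pos[v] - 1) % (n + 1)] + 1) % (n + 1)  # black arc, then grey arc
        lengths.append(length)
    return lengths


def transposition_bounds(perm):
    """Return (lower, upper) with lower <= td(pi) <= upper."""
    odd = sum(1 for length in dbg_cycle_lengths(perm) if length % 2 == 1)
    gap = len(perm) + 1 - odd        # always even: the cycle lengths sum to n + 1
    return gap // 2, (3 * gap) // 4


print(transposition_bounds([1, 5, 6, 7, 8, 2, 3, 4, 9, 10]))  # (1, 1)
print(transposition_bounds([4, 1, 6, 2, 5, 7, 3]))            # (3, 4)
```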

(29) Reversals
- Another type of mutation occurs frequently: reversals, which reverse the order of elements on an interval of the permutation;
  Example: 4 [1 6 2 5] 3 → 4 [5 2 6 1] 3
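
A reversal is a one-liner on Python lists. The function name `reversal` and the 0-based, half-open interval convention are assumptions of this sketch.

```python
def reversal(perm, i, j):
    """Reverse the order of the elements of perm[i:j] (0-based, half-open interval)."""
    return perm[:i] + perm[i:j][::-1] + perm[j:]


print(reversal([4, 1, 6, 2, 5, 3], 1, 5))  # [4, 5, 2, 6, 1, 3]
```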

(30) Reversals
- Here’s an optimal sorting sequence (obtained using GRIMM: http://grimm.ucsd.edu/cgi-bin/grimm.cgi):
  Example (optimal sorting sequence of reversals): [Figure: the sorting sequence]
- Bad news: stuff seen so far doesn’t work;

(31) Breakpoints
Definition. A breakpoint in a permutation π is an ordered pair (π_i, π_{i+1}) with |π_{i+1} − π_i| ≠ 1 (otherwise it’s an adjacency).
- Example (breakpoints of ⟨3 1 5 4 2 8 6 7⟩): 3•1•5 4•2•8•6 7
- This notion characterises elements that are “relatively misplaced”: they’re not consecutive in ι, nor in ⟨n n−1 ⋯ 2 1⟩.

(32) Breakpoints
- Note that ι = ⟨1 2 ⋯ n⟩ has no breakpoint;
- That’s also the case of ⟨n n−1 ⋯ 2 1⟩;
- To distinguish them, we frame permutations: ⟨π_1 π_2 ⋯ π_n⟩ ↦ ⟨0 π_1 π_2 ⋯ π_n n+1⟩
- Those artificial elements are denoted by π_0 and π_{n+1};
- Which leads to the following definition:
  Definition. The number of breakpoints of a permutation π in S_n is b(π) = |{(π_i, π_{i+1}) | 0 ≤ i ≤ n and |π_{i+1} − π_i| ≠ 1}|.
- Example: ⟨0 • 3 • 1 • 5 4 • 2 • 8 • 6 7 • 9⟩ ⇒ b(π) = 7;
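
Counting breakpoints of the framed permutation takes a few lines of Python. The function name `breakpoints` is an assumption of this sketch.

```python
def breakpoints(perm):
    """b(pi): number of pairs of neighbours in 0 pi_1 ... pi_n n+1 that differ by more than 1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(b - a) != 1)


print(breakpoints([3, 1, 5, 4, 2, 8, 6, 7]))  # 7
print(breakpoints([1, 2, 3, 4]))              # 0
print(breakpoints([4, 3, 2, 1]))              # 2: framing separates it from the identity
```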

(33) Usefulness of breakpoints
- Observation: reversals can “fix” breakpoints:
  Example (framed permutations with their breakpoints marked •):
  0•3•1•5 4•2•8•6 7•9
  0•3•1•5 4•8•2•6 7•9
  0•3•1 2•8 7 6 5 4•9
  0•3•1 2•8•4 5 6 7•9

(34) Lower bound
- A reversal can fix at most two breakpoints;
- If we’re lucky, we are always in that case;
- This yields the following lower bound:
  Theorem ([Kececioglu and Sankoff, 1995]) For every permutation π: rd(π) ≥ b(π)/2.

(35) Upper bound
- On the other hand we can always fix at least one breakpoint (details: [Kececioglu and Sankoff, 1995]).
- So the algorithm is a 2-approximation: b(π)/2 ≤ rd(π) ≤ b(π).
- Can we do better?
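
To see the two bounds at work on a small instance, the sketch below compares them with the exact reversal distance. The exact value is obtained by brute-force breadth-first search, which is a stand-in chosen for this illustration and not the algorithm of [Kececioglu and Sankoff, 1995]; it is only usable for very small n. All function names are assumptions.

```python
from collections import deque


def breakpoints(perm):
    """Number of breakpoints of the framed permutation 0 pi_1 ... pi_n n+1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(b - a) != 1)


def reversal_distance(perm):
    """Exact reversal distance by breadth-first search (exponential, small n only)."""
    start, goal = tuple(perm), tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for i in range(len(cur)):
            for j in range(i + 2, len(cur) + 1):     # intervals of length >= 2
                nxt = cur[:i] + cur[i:j][::-1] + cur[j:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise ValueError("input is not a permutation")


pi = [4, 1, 6, 2, 5, 3]
b = breakpoints(pi)
lower, upper = (b + 1) // 2, b       # ceil(b / 2) <= rd(pi) <= b
print(lower, reversal_distance(pi), upper)
```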

(36) The undirected breakpoint graph
- Yes, but we need a more appropriate structure:
  Definition. The undirected breakpoint graph of the permutation π in S_n, written UBG(π) = (V, E), is defined by:
  - V = (π_0 = 0, π_1, π_2, …, π_n, π_{n+1} = n + 1);
  - $E = \underbrace{\{\{\pi_i, \pi_{i+1}\} \mid 0 \le i \le n\}}_{\text{black edges}} \cup \underbrace{\{\{i, i+1\} \mid 0 \le i \le n\}}_{\text{grey edges}}$.

(37) The undirected breakpoint graph: example
- Let us build the undirected breakpoint graph of π = ⟨3 1 5 4 2 8 6 7⟩:
  Example: 0 3 1 5 4 2 8 6 7 9
  1. frame the permutation;
  2. build the ordered vertex set (π_0 = 0, π_1, π_2, …, π_{n+1} = n + 1);
  3. add black edges for every pair {π_i, π_{i+1}};
  4. add grey edges for every pair {i, i + 1};

(38) Decomposition
- That graph decomposes into cycles:
  Example: [Figure: a cycle decomposition of the undirected breakpoint graph on 0 3 1 5 4 2 8 6 7 9]
- ... but the decomposition is no longer unique!
  Example: 0 2 1 3 has either one 3-cycle or a 1-cycle and a 2-cycle.

(39) Cycle decompositions
- We use decompositions to derive lower bounds;
- A reversal acts on one or two cycles and can split / merge cycles:
  π_{i−1} π_i ⋯ π_j π_{j+1}  →  π_{i−1} π_j ⋯ π_i π_{j+1}
- So we’re tempted to say: rd(π) ≥ n + 1 − c(UBG(π)).

(40) Decomposition
- ... but recall that the decomposition is not unique!
- The more cycles we have, the closer we are to UBG(ι);
- Therefore, we have in fact:
  Theorem ([Bafna and Pevzner, 1996]) For all π in S_n: rd(π) ≥ n + 1 − c*(UBG(π)), where c*(UBG(π)) is the number of cycles in a maximum cardinality decomposition.
- Unfortunately, finding a maximum cardinality decomposition is NP-hard [Caprara, 1999b];

(41) Results on sorting permutations
- Here’s a nonexhaustive summary on sorting permutations using various operations:

Operation             | Sorting / distance               | Best approximation
exchange              | O(n) [Knuth, 1995]               | 1
block-interchange     | O(n) [Christie, 1996]            | 1
double cut-and-joins  | NP-hard [Chen, 2010]             | ?
reversal              | NP-hard [Caprara, 1999c]         | 11/8 [Berman et al., 2002]
transposition         | NP-hard [Bulteau et al., 2012b]  | 11/8 [Elias and Hartman, 2006]
prefix exchange       | O(n) [Akers et al., 1987]        | 1
prefix reversal       | NP-hard [Bulteau et al., 2012a]  | 2 [Fischer and Ginzinger, 2005]
prefix transposition  | ? / ?                            | 2 [Dias and Meidanis, 2002]

- Let us move on to our next model: signed permutations;

(42) Section outline (Signed permutations): Signed reversals; Adapting the theory to “unsigned problems”; Double cut-and-join (DCJ).
Motivation
- Permutations lack realism: DNA segments are oriented;
- We need to take orientation into account;
- Therefore, two DNA segments match if:
  - they are the same, or
  - one is the reverse complement of the other.
  [Figure: picture by Madeleine Price Ball, taken from Wikimedia]

(43) Motivation
- So instead of permutations:
  Example (genomes → permutations):
  (A) 5 1 2 4 7 3 6
  (B) 1 2 3 4 5 6 7
- We now have signed permutations:
  Example (genomes → signed permutations):
  (A) −5 +1 +2 +4 −7 −3 +6
  (B) +1 +2 +3 +4 +5 +6 +7
- In both cases, (A) and (B) are related by genome rearrangements.

(44) Signed (per)mutations
- Note: this does not mean that everything you know about unsigned comparisons is useless:
  1. orientation information is not always available;
  2. ideas from unsigned comparisons lead to ideas for signed comparisons;
- Mutations may now act on a segment’s place and orientation;

(45) Tools
- As expected, tools we’ve seen previously cannot be used here because they do not take signs into account;
- Then again, some ideas can be adapted;

(46) Notation and definitions pertaining to signed permutations
- We deal exclusively with {±1, ±2, …, ±n};
- Convention: π(−i) = −π(i) for 1 ≤ i ≤ n;
- Permutations can be written in one- or two-row notation:
  $\pi = \begin{pmatrix} -4 & -3 & -2 & -1 & 1 & 2 & 3 & 4 \\ 2 & 4 & -1 & 3 & -3 & 1 & -4 & -2 \end{pmatrix} = \langle -3\ 1\ -4\ -2 \rangle$
- We will restrict ourselves to the mapping of positive elements;
- Composition works as before;
- The corresponding group is the hyperoctahedral group S_n^±;

(47) Signed reversals
- Reversals can be generalised to signed reversals, which not only reverse an interval but also flip signs along the interval;
  Example (signed reversal): −5 1 [2 4 −7 −3] 6 → −5 1 [3 7 −4 −2] 6
- As before, we’re interested in sorting a given signed permutation using as few signed reversals as possible (or merely computing the length of a shortest sequence);
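
A signed reversal differs from an unsigned one only by the sign flip. A minimal Python sketch, where the function name `signed_reversal` and the 0-based, half-open interval convention are assumptions:

```python
def signed_reversal(perm, i, j):
    """Reverse perm[i:j] and flip the sign of every element in that interval."""
    return perm[:i] + [-x for x in reversed(perm[i:j])] + perm[j:]


print(signed_reversal([-5, 1, 2, 4, -7, -3, 6], 2, 6))  # [-5, 1, 3, 7, -4, -2, 6]
```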

(48) Example: sorting by signed reversals
  π = −5 +1 +2 +4 −7 −3 +6
      −5 +1 +2 −4 −7 −3 +6
      −5 +1 +2 −4 −7 −6 +3
      −5 +1 +2 −4 −3 +6 +7
      −5 +1 +2 +3 +4 +6 +7
      −5 −4 −3 −2 −1 +6 +7
  σ = +1 +2 +3 +4 +5 +6 +7
- Hence srd(π, σ) ≤ 6.
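
The scenario above can be replayed mechanically. In the Python sketch below, the intervals were read off by comparing consecutive rows of the example; the helper name and the 0-based, half-open convention are assumptions of this illustration.

```python
def signed_reversal(perm, i, j):
    """Reverse perm[i:j] and flip the sign of every element in that interval."""
    return perm[:i] + [-x for x in reversed(perm[i:j])] + perm[j:]


pi = [-5, 1, 2, 4, -7, -3, 6]
# One (i, j) interval per step, inferred from the rows of the example.
scenario = [(3, 4), (5, 7), (4, 7), (3, 5), (1, 5), (0, 5)]

current = pi
for i, j in scenario:
    current = signed_reversal(current, i, j)
    print(current)

print(current == [1, 2, 3, 4, 5, 6, 7])  # True: six signed reversals suffice
```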

(49) Solving the problem
- How do we attack this problem?
- Breakpoints can be generalised to the signed setting ...
- ... but you already know / guess that this will at best provide an approximation;
- Instead, we’re going to adapt the breakpoint graph to the signed setting;

(50) The breakpoint graph
- The breakpoint graph in the signed case is slightly different:
  π = −5 +1 +2 +4 −7 −3 +6
  π′ = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15
  1. double π’s elements (i ↦ {2|i| − 1, 2|i|}) and add 0 and 2n + 1;
  2. elements of π′ = vertices;
  3. black edges connect distinct adjacent genes;
  4. grey edges connect distinct consecutive genes;
  [Figure: the breakpoint graph drawn on a circle, with vertices labelled π′_0, …, π′_15]
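
The four construction steps can be sketched in Python. The orientation rule used here (a positive element i becomes 2i − 1, 2i and a negative one becomes 2i, 2i − 1) is inferred from the π′ shown on the slide; the function names are assumptions of this illustration.

```python
def double(perm):
    """Map a signed permutation to its unsigned doubled form, framed by 0 and 2n + 1."""
    seq = [0]
    for x in perm:
        low, high = 2 * abs(x) - 1, 2 * abs(x)
        seq += [low, high] if x > 0 else [high, low]
    seq.append(2 * len(perm) + 1)
    return seq


def breakpoint_graph(perm):
    """Return (vertices, black_edges, grey_edges) of the signed breakpoint graph."""
    seq = double(perm)
    # Black edges join neighbours in seq that come from different genes.
    black = [(seq[i], seq[i + 1]) for i in range(0, len(seq), 2)]
    # Grey edges join consecutive values 2i and 2i + 1, also from different genes.
    grey = [(2 * i, 2 * i + 1) for i in range(len(perm) + 1)]
    return seq, black, grey


seq, black, grey = breakpoint_graph([-5, 1, 2, 4, -7, -3, 6])
print(seq)    # [0, 10, 9, 1, 2, 3, 4, 7, 8, 14, 13, 6, 5, 11, 12, 15]
print(black)  # [(0, 10), (9, 1), (2, 3), (4, 7), (8, 14), (13, 6), (5, 11), (12, 15)]
```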

(51) Using the breakpoint graph
- The breakpoint graph is 2-regular and decomposes as such into alternating cycles in a unique way;
- The breakpoint graph of ⟨1 2 ⋯ n⟩ contains the largest number of cycles:
  [Figure: the breakpoint graph of the example next to the breakpoint graph of the identity]
- ⇒ goal: create new cycles in as few moves as possible;

(52) Overview of Hannenhalli and Pevzner’s solution
- [Hannenhalli and Pevzner, 1999] came up with the first polynomial-time algorithm for this problem:
  1. Transform π into π̃ (simpler, does not affect distance);
  2. Find an optimal sorting sequence for π̃:
     2.1 identify “good” and “bad” cycles in BG(π̃);
     2.2 identify “good” and “bad” components in BG(π̃);
     2.3 “sort” those components to optimality;
  3. Convert it back to an optimal sorting sequence for π;

(53) Transformation into simple permutations
- A permutation π is simple if BG(π) contains only cycles of length ≤ 2;
- The transformation was introduced to simplify analysis;
- Nice property: the transformation preserves the distance: if π̃ is the “simplified” version of π, then srd(π̃) = srd(π);
- So we can assume from now on that the permutation to sort is simple;
- [Gog and Bader, 2008] give fast algorithms to achieve the conversions;

(54) A lower bound on the signed reversal distance
- A signed reversal involves black edges belonging to at most two cycles;
- The only way to increase c(BG(π)) is to split cycles:
  π′_{2i} π′_{2i+1} ⋯ π′_{2j} π′_{2j+1}  →  π′_{2i} π′_{2j} ⋯ π′_{2i+1} π′_{2j+1}
- Therefore, for all π in S_n^±: srd(π) ≥ n + 1 − c(BG(π)).
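
The lower bound can be evaluated by counting alternating cycles. The Python sketch below assumes the doubling used for the breakpoint graph earlier (positive i becomes 2i − 1, 2i; negative i becomes 2i, 2i − 1); the function names are introduced for this illustration.

```python
def bg_cycle_lengths(perm):
    """Lengths (in black edges) of the alternating cycles of the signed breakpoint graph."""
    seq = [0]
    for x in perm:
        low, high = 2 * abs(x) - 1, 2 * abs(x)
        seq += [low, high] if x > 0 else [high, low]
    seq.append(2 * len(perm) + 1)

    black = {}
    for i in range(0, len(seq), 2):              # black edge {seq[i], seq[i + 1]}
        black[seq[i]], black[seq[i + 1]] = seq[i + 1], seq[i]

    seen, lengths = set(), []
    for start in seq:
        if start in seen:
            continue
        length, v = 0, start
        while v not in seen:
            w = black[v]                          # follow the black edge at v
            seen.update((v, w))
            length += 1
            v = w ^ 1                             # follow the grey edge {2i, 2i + 1} at w
        lengths.append(length)
    return lengths


def srd_lower_bound(perm):
    """n + 1 - c(BG(pi)), a lower bound on the signed reversal distance."""
    return len(perm) + 1 - len(bg_cycle_lengths(perm))


pi = [-5, 1, 2, 4, -7, -3, 6]
print(bg_cycle_lengths(pi))  # [7, 1]
print(srd_lower_bound(pi))   # 6
```

For this π the bound is 6, and the earlier worked example sorts π with six signed reversals, so in this instance the bound is attained.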

(55) Good and bad cycles
- However, we cannot always split cycles:
  π′_{2i} π′_{2i+1} ⋯ π′_{2j} π′_{2j+1}  →  π′_{2i} π′_{2j} ⋯ π′_{2i+1} π′_{2j+1}
- Hence the inequality: we can split “good” cycles, and we cannot split “bad” cycles;
  - standard terminology: “good” = oriented, “bad” = unoriented;

(56) Handling good cycles
- Things are actually more complicated than that;
- Even when we don’t have bad cycles, the order in which we do things matters!
  Example (careless and careful cycle splitting):
  (bad)  0 6 5 10 9 2 1 7 8 3 4 11 → 0 1 2 9 10 5 6 7 8 3 4 11
  (good) 0 6 5 10 9 2 1 7 8 3 4 11 → 0 6 5 10 9 8 7 1 2 3 4 11

(57) Handling bad cycles
- “Bad” cycles are not so bad if we can make them “good” (see previous example);
- But sometimes we can’t:
  Example (a minimal permutation with only bad cycles): 0 5 6 3 4 1 2 7

(58) Components
- Although reversals only modify the “contents” of a single cycle, they may also modify the configuration of some other cycles;
- This suggests that cycles are not the right “unit” to deal with;
- We need to consider collections of cycles, or components instead;

(59) Interleaving cycles
- Grey edge {π′_i, π′_j} spans the interval [i, j] in π′;
- Two grey edges interleave if their spans intersect properly;
- Two cycles interleave if they contain interleaving edges;
- The interleaving graph I_π is defined by:
  - V(I_π) = cycles of BG(π);
  - E(I_π) = pairs of interleaving cycles in BG(π);
- A component of the breakpoint graph is a connected component of the interleaving graph;

(60) The interleaving graph I_π
  [Figure: an interleaving graph (source: [Hannenhalli and Pevzner, 1999])]

(61) Good and bad components
- A component is bad if it contains only bad cycles, and good otherwise;
- Although we must be careful (as seen before), good components are not a problem:
  - they contain good cycles, which can be split;
  - applying a signed reversal on a 2-cycle C reverses the orientation of the cycles interleaving with C (and also changes their interleaving relationships);
  - so we just need to make sure at each step that we can keep splitting cycles afterwards;

(62) Hurdles
- “Bad” (unoriented) components are called hurdles and are a problem;
- There are two ways of getting rid of them:
  1. “cutting” them;
  2. “merging” them;
- Either way, one move must be wasted for each hurdle to turn it into a “good” component;
- Therefore, for all $\pi$ in $S_n^\pm$: $\mathrm{srd}(\pi) \ge n + 1 - c(BG(\pi)) + h(\pi)$.
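The cycle term of this bound is easy to compute. The sketch below assumes the usual encoding of a signed permutation as an unsigned one (+x becomes 2x−1, 2x; −x becomes 2x, 2x−1; the result is framed by 0 and 2n+1), with black edges joining positions 2i and 2i+1 and grey edges joining values 2k and 2k+1. The function names are hypothetical, and hurdles are not detected, so only n + 1 − c(BG(π)) is returned.

```python
def breakpoint_graph_cycles(signed_perm):
    """Count the alternating cycles of the breakpoint graph of a signed permutation."""
    n = len(signed_perm)
    seq = [0]
    for x in signed_perm:
        # +x -> (2x-1, 2x), -x -> (2|x|, 2|x|-1)
        seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    seq.append(2 * n + 1)
    pos = {v: i for i, v in enumerate(seq)}

    unvisited = set(seq)
    cycles = 0
    while unvisited:
        v = next(iter(unvisited))
        cycles += 1
        while v in unvisited:
            unvisited.discard(v)
            w = seq[pos[v] ^ 1]   # follow the black edge (positions 2i, 2i+1)
            unvisited.discard(w)
            v = w ^ 1             # follow the grey edge (values 2k, 2k+1)
    return cycles


def cycle_lower_bound(signed_perm):
    """n + 1 - c(BG(pi)): the lower bound without the hurdle term."""
    return len(signed_perm) + 1 - breakpoint_graph_cycles(signed_perm)
```

On +3 +2 +1, whose unsigned image is 0 5 6 3 4 1 2 7, this finds two cycles and hence a bound of 2; the hurdle term accounts for the remaining gap to the actual distance.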

(63) An actual formula
- All this (and many concealed details) leads to a formula for the signed reversal distance:

Theorem ([Hannenhalli and Pevzner, 1999]) For all $\pi$ in $S_n^\pm$:
$$\mathrm{srd}(\pi) = n + 1 - c(BG(\pi)) + \underbrace{h(\pi)}_{\text{number of hurdles}} + \underbrace{f(\pi)}_{\text{special ``fortress'' case}}.$$

(64) Last time
- We saw an algorithm and a formula for computing the signed reversal distance;
- Ingredients:
  - a simplifying transformation;
  - a careful analysis of “good” and “bad” cycles and components;
  - a separate treatment of those components according to their type;

(65) Sorting by signed reversals: wrap-up
- The presented solution runs in time O(n^4);
- Improvements were subsequently given both for the sorting problem and for computing the related distance:
  - an optimal sequence can be found in O(n^{3/2}) time [Han, 2006];
  - the distance can be computed in O(n) time [Bader et al., 2001];
- It is also possible to bypass the breakpoint graph [Bergeron et al., 2002];

(66) A word on “unsigned problems”
- Similar complications arise in the analysis of sorting unsigned permutations by various operations;
- A natural idea would then be to try and recycle Hannenhalli and Pevzner’s ideas for signed reversals;
- Unfortunately, the combinatorics of the related problems are not always that nice;
- Let’s see what happens in the case of sorting by transpositions;

(67) Simple permutations
- Simplifying permutations simplifies the analysis;
- Unfortunately, there are problems:

The signed case:
- cycles of length ≤ 2;
- cycles are either disjoint or connected;
- splitting a cycle “flips” the connected cycles’ orientations;
- the transformation preserves the distance;

The unsigned case:
- cycles of length ≤ 3;
- several ways to connect two cycles;
- splitting a cycle may or may not affect the connected cycles;
- the transformation only preserves the lower bound;

(68) Operations on cycles affecting other cycles
- Transpositions acting on cycles may or may not affect interleaving cycles;

Example (crossing cycles [Bafna and Pevzner, 1998b])
0 3 2 5 4 1 6
0 3 4 1 2 5 6

Example (noninterfering cycles [Bafna and Pevzner, 1998b])
0 3 5 1 4 2 6

(69) Detecting bad cycles or components
- As in the problem of sorting by signed reversals, some bad cycles are “planar”;
- Other bad cycles / components may not be easily detectable;

Example (“invisible” problems ... [Christie, 1998])
0 7 5 3 1 8 6 4 2 9

Example (... that become visible in π^{−1} [Christie, 1998])
0 4 8 3 7 2 6 1 5 9

(70) Closing remarks on transpositions
- Adapting ideas from sorting by signed reversals is not a dead end;
- But it is not straightforward;
- “(...) detecting hurdles may not be any easier than determining the transposition distance (...)” [Christie, 1998]
- The sorting and distance computation problems are NP-hard even on 3-permutations [Bulteau et al., 2012b];

(71) Other operations
- Sorting by signed reversals is (computationally) easy;
- But genomes do not evolve solely by one type of mutation (be it reversals or something else);
- So we should take other mutations into account;
- This seems like a daunting task, but the following operation allows us to do that in a very convenient way;

(72) Double cut-and-join (DCJ)
- The double cut-and-join (DCJ) operation aims at taking more mutations into account in a simple way;
- As the name suggests, we cut two black edges {a, b} and {c, d}, then join vertices using two new black edges;

[Figure: the DCJ operation] (source: [Yancopoulos et al., 2005])

(73) DCJs in action
Example (two cuts, two possible joins)
Cut the black edges {a, b} and {c, d}:
  −5 a|b +1 +2 +4 −7 c|d −3 +6
- join {a, c} and {b, d} (signed reversal):
  −5 a|c +7 −4 −2 −1 b|d −3 +6
- join {a, d} and {b, c} (extract a circular chromosome):
  −5 a|d −3 +6 (linear) and b +1 +2 +4 −7 c (circular)
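The two possible joins of this example can be reproduced with a small sketch. It assumes a single linear chromosome stored as a list of signed integers and two cut points given as list indices; the function name and interface are hypothetical and cover only this special case of DCJ.

```python
def dcj_linear(chrom, i, j):
    """Cut `chrom` before index i and before index j (i < j); return both joins.

    Returns (reversal, (linear_part, circular_part)):
    - reversal: the middle segment is reversed and its signs are flipped;
    - the alternative join excises the middle segment as a circular chromosome.
    """
    left, middle, right = chrom[:i], chrom[i:j], chrom[j:]
    reversal = left + [-g for g in reversed(middle)] + right
    excision = (left + right, middle)
    return reversal, excision


reversal, (linear, circular) = dcj_linear([-5, 1, 2, 4, -7, -3, 6], 1, 5)
```

With the genome of the example, `reversal` is −5 +7 −4 −2 −1 −3 +6, `linear` is −5 −3 +6 and `circular` is (+1 +2 +4 −7).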

(74) Pros and cons of DCJ
✓ Computationally easy;
✓ Conceptually easier than signed reversals (just keep splitting cycles);
✓ Encompasses several operations:
  - signed reversals (1 DCJ),
  - block-interchanges (2 DCJs, circular intermediate),
  - translocations (1 DCJ).
× May be “too” general;
× Weights are not taken into account (recall discussion on probability of a type of mutation);

(75) Motivation
- We saw a model for representing genomes without directionality;
- We saw another model for taking directionality into account;
- Both of them lack realism in a crucial way: they don’t allow duplications;
- And duplications / insertions / deletions account for a very large part of what happens in evolution [Ohno, 1970];

(76) Two examples of duplications
Example (tandem duplications)
[Figure] (source: K. Aainsqatsi on Wikimedia)

Example (whole genome duplication)
[Figure] (source: Eric Lyons on CoGePedia)

(77) Today
- Models that take duplications into account;
- Other approaches to solving the corresponding problems;
- Other models for those cases where only partial information is available or relevant;

(78) Strings
- Since duplications pervade genomes, we should take them into account;
- We now see genomes as strings on an alphabet Σ;
- Be careful: similar segments have been identified, so Σ = {segments} and not {A, C, G, T};
- Our goal is still to explain evolution using most parsimonious scenarios made of fixed transformations;

(79) Strings
- Note: the restriction to sorting problems does not work anymore;
  - if you have two A’s, which one should be “number one”?
- So we really are interested in transforming one string into another, which is not equivalent to sorting another string;
- Sorting problems have been considered in that model, but they’re just a special case of a more general problem;

(80) Strings
- We can distinguish between several approaches based on gene contents;
- Either we have exactly the same contents in both genomes (and duplications are of course allowed);
- Or we have duplications but with different amounts of repetitions (e.g. three 1’s in genome A but only two in genome B);
- This time the breakpoint graph cannot save us anymore, since we would not know how to connect elements or decompose the graph;

(81) Balanced strings
- The number of occurrences of a character c in a string S is denoted by occ(c, S);

Definition (balanced strings) Two strings S and T on an alphabet Σ are balanced if: ∀ c ∈ Σ : occ(c, S) = occ(c, T).

- Basically, S and T are anagrams;
- Straightforward generalisation of permutations: we have duplications, but we actually still have the same content in both genomes;
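As a quick illustration, the definition amounts to comparing character counts. The helper name below is hypothetical; `collections.Counter` plays the role of occ(·, S).

```python
from collections import Counter


def are_balanced(s, t):
    """True if every character occurs equally often in s and in t."""
    return Counter(s) == Counter(t)
```

The same check works on lists of segment identifiers, since `Counter` accepts any iterable of hashable items.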

(82) Comparing balanced strings
- One way of relating genomes’ contents is to identify common segments;
- In other words, we want to partition genomes into the same set of segments;
  - this is how we obtained (signed) permutations;
  - but now we want to partition the resulting sequences;

(83) Generalising breakpoints
- Recall that, for permutations:
  - adjacencies are pairs of adjacent elements in π that are also adjacent in ι = ⟨1 2 ··· n⟩ (or χ = ⟨n n−1 ··· 1⟩ for reversals);
  - breakpoints are pairs that are not adjacencies;
- Recall that, for signed permutations:
  - adjacencies are pairs of adjacent elements in π that are also adjacent in ι = ⟨1 2 ··· n⟩ (or χ = ⟨−n −(n−1) ··· −1⟩ for signed reversals);
  - breakpoints are pairs that are not adjacencies;
- Those can be generalised to any pair of permutations;
- And we can do the same thing for strings;

(84) Minimum common string partition
- A partition of a string S is a set of strings that can be concatenated to obtain S;
- A common partition of two strings S and T is a set of strings that can be concatenated to obtain both S and T;

Example (common string partitions) Here’s a common partition of “dictionary” and “indicatory”:
  dictionary = dic · t · i · o · n · a · ry = S1 S2 S3 S4 S5 S6 S7
  indicatory = i · n · dic · a · t · o · ry = S3 S5 S1 S6 S2 S4 S7
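Checking that two block sequences form a common partition is straightforward. The sketch below uses a hypothetical helper that takes the blocks in the order in which they appear in each string and verifies that both orders spell the right string and use the same multiset of blocks.

```python
from collections import Counter


def is_common_partition(s, t, blocks_s, blocks_t):
    """True if blocks_s spells s, blocks_t spells t, and both use the same blocks."""
    return ("".join(blocks_s) == s
            and "".join(blocks_t) == t
            and Counter(blocks_s) == Counter(blocks_t))


blocks_s = ["dic", "t", "i", "o", "n", "a", "ry"]
blocks_t = ["i", "n", "dic", "a", "t", "o", "ry"]
ok = is_common_partition("dictionary", "indicatory", blocks_s, blocks_t)
```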

(85) Minimum common string partition
- A common string partition is minimum if there is no smaller common string partition for the two strings under consideration;
- This leads to the following decision problem:

Problem (minimum common string partition (mcsp))
Instance: balanced strings S and T, a bound k ∈ ℕ;
Question: is there a common partition of S and T with at most k blocks?

(86) Relation(s) to rearrangement problems
- Recall that breakpoints were pairs of elements adjacent in one genome but not in the other;
- Common string partitions generalise that point of view to an arbitrary number of elements in each part;
- So if we have a minimum common string partition for S and T, we get the number of breakpoints between strings S and T;

(87) About mcsp
- Bad news about mcsp:
  - NP-hard, even if only one gene family is nontrivial [Blin et al., 2004];
  - APX-hard, even if no character appears more than twice [Goldstein et al., 2005];
- Good news about mcsp:
  - fixed-parameter tractable: a solution of size k can be found in time f(k) · poly(n) (n = |S| = |T|) [Bulteau and Komusiewicz, 2014];
- Greedy approach [Goldstein and Lewenstein, 2011]: repeatedly select a longest common substring (LCS) without any marked letter;
  ✓ simple and fast (runs in O(n) time);
  × approximation ratio between Ω(n^0.43) and O(n^0.69) [Kaplan and Shafrir, 2006];
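The greedy idea can be illustrated with a deliberately naive sketch. It is not the linear-time algorithm of [Goldstein and Lewenstein, 2011]: it searches for a longest common substring of unmarked letters by brute force at every step, and the function name and tie-breaking rule (first match found) are assumptions made for the illustration.

```python
from collections import Counter


def greedy_common_partition(s, t):
    """Greedy common partition: repeatedly take a longest common unmarked substring."""
    if Counter(s) != Counter(t):
        raise ValueError("strings must be balanced")
    free_s = [True] * len(s)   # unmarked positions of s
    free_t = [True] * len(t)   # unmarked positions of t
    blocks = []
    while any(free_s):
        best = (0, 0, 0)  # (length, start in s, start in t)
        for i in range(len(s)):
            for j in range(len(t)):
                k = 0
                # Extend the match while both letters are unmarked and equal.
                while (i + k < len(s) and j + k < len(t)
                       and free_s[i + k] and free_t[j + k]
                       and s[i + k] == t[j + k]):
                    k += 1
                if k > best[0]:
                    best = (k, i, j)
        k, i, j = best
        blocks.append(s[i:i + k])
        for d in range(k):     # mark the letters of the selected block
            free_s[i + d] = free_t[j + d] = False
    return blocks
```

On “dictionary” and “indicatory” this selects “dic”, then “ry”, then five single letters, i.e. 7 blocks.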

(88) Minimum common string partition: variants
- One can also consider signed strings: each segment is then equivalent up to a reversal;
- Or equivalence under full reversals: a partition of S is also a partition of T if one can concatenate its elements to obtain T or its reverse;
- Those variants are still hard, but the positive results do not straightforwardly generalise [Bulteau and Komusiewicz, 2014];

(89) Unbalanced strings
- Of course, we are not always so lucky that our genomes are just anagrams;
- Most of the time, duplications are not balanced;
- So, what do we do?

(90) Arbitrary strings
- One idea is to try and match different copies of the same gene across two genomes;
- Three general approaches have been proposed:
  1. the exemplar model;
  2. the intermediate model;
  3. the full model;
- All three are based on a notion of matching;

(91) Matching and pruning
Definition (gene matching) A gene matching between two strings S and T is a set of disjoint pairs {S_i, T_j} such that S_i = T_j for every such pair (1 ≤ i ≤ |S|, 1 ≤ j ≤ |T|).

Definition (pruning) Given two strings S and T and a gene matching M, the M-pruning is the pair (S′, T′) obtained by removing all unmatched characters from S and T and relabelling the remaining characters according to M.

(examples to appear shortly)
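A pruning can be sketched as follows. The relabelling scheme is an assumption made for the illustration: matched pairs are numbered 1, 2, ... in the order in which they occur in S, so that S′ becomes the identity and T′ a permutation of it. Indices are 0-based, unlike in the definition above, and the function name is hypothetical.

```python
def prune(s, t, matching):
    """Return the pruning (s', t') induced by `matching`.

    `matching` is a list of 0-based index pairs (i, j) with s[i] == t[j].
    """
    rows = [i for i, _ in matching]
    cols = [j for _, j in matching]
    if len(set(rows)) != len(rows) or len(set(cols)) != len(cols):
        raise ValueError("pairs of a matching must be disjoint")
    if any(s[i] != t[j] for i, j in matching):
        raise ValueError("matched characters must be equal")

    label_s, label_t = {}, {}
    for new_label, (i, j) in enumerate(sorted(matching), start=1):
        label_s[i] = new_label   # label matched pairs in their order in s
        label_t[j] = new_label
    s_pruned = [label_s[i] for i in sorted(label_s)]
    t_pruned = [label_t[j] for j in sorted(label_t)]
    return s_pruned, t_pruned
```

With the made-up strings S = abcab and T = bacba and the matching {(0, 1), (1, 0), (2, 2)}, the unmatched trailing characters are dropped and the result is S′ = 1 2 3, T′ = 2 1 3.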

(92) Matching(s) and pruning(s)
- Matchings will depend on the model we use;
- Since prunings are derived from matchings, they will also vary depending on the underlying model;
- Let us review them on examples;

(93) Exemplar matching / pruning
- In the exemplar model, we match only one copy of each gene:

Example (exemplar matching / pruning)
S = 1 2 −4 −2 3 1 4 −3
T = 4 1 −3 −2 2 1 2 4
[Figure: an exemplar matching between S and T and the resulting pruning (S′, T′)]

(94) Intermediate matching / pruning
- In the intermediate model, we match at least one copy of each gene:

Example (intermediate matching / pruning)
S = 1 2 −4 −2 3 1 4 −3
T = 4 1 −3 −2 2 1 2 4
[Figure: an intermediate matching between S and T and the resulting pruning (S′, T′)]

(95) Full matching / pruning
- In the full model, we match as many copies of each gene as possible:

Example (full matching / pruning)
S = 1 2 −4 −2 3 1 4 −3
T = 4 1 −3 −2 2 1 2 4
[Figure: a full matching between S and T and the resulting pruning (S′, T′)]

(96) Using matchings and prunings
- Once we’ve pruned our input strings, we can compare them as if they were permutations;
- This gives rise to many variations on the following theme:

Problem (“(M, d)-comparison”)
Input: two strings S and T
Goal: find an “M matching” such that the resulting “M pruning” (S′, T′) minimises d(S′, T′)

- Here M ∈ {exemplar, intermediate, full}, and d is any distance on S_n or S_n^± (with n = |S′| = |T′|);

(97) Strings
- This is not “just a matching problem”:
  - in matching problems, every edge is given a weight, and we have to optimise a function that takes all weights into account;
  - while here, we look for a matching that optimises a quantity, but the edge weights are not fixed to begin with;
- In other words: in matching problems we can compute the cost of a partial solution, while here we must have a full matching before we can even begin to compute the cost;

(98) Strings: extensions
- Strings can of course be signed to take directionality into account;
- They can also be circular;
- And of course we could have a mix of both to represent different chromosomes;

(99) Other models
- We’ve mostly seen (signed) permutations and strings so far;
- Other models may be more suitable, according to:
  - the data we have;
  - the relations we want to take into account;
- We mention briefly the following structures:
  - posets;
  - set systems;

(100) The need for other models
- Most genomes consist of several chromosomes:
[Figure]

(101) Posets
- Recall that genomes are not directly copied from a long string of DNA to a drive:
  1. “small” subsequences called reads are identified;
  2. then those reads are assembled to form the target genome;
- We still want to be able to compare genomes even if only partial gene order information is available;
- This naturally leads us to compare posets instead of permutations or strings;

(102) Posets
- Informally, although we may not know the complete ordering, we may know parts of it;
- So segments are partially ordered, and genomes may be represented by directed acyclic graphs, where:
  - vertices stand for segments;
  - arc (u, v) means “segment u precedes segment v”;
- In this regard, permutations are paths of maximal length;

Example (a genome as a poset)
[Figure: a directed acyclic graph on the segments 1, −2, 3, −5, 6, 9, 10, 12]

(103) Comparing genomes as posets
- Comparing genomes G1 and G2 represented as posets is based on permutations:
  - find linear extensions L1 and L2 that minimise d(L1, L2);
- Another way of trying to aggregate their contents is by:
  - merging them into a conflict-free graph;
  - finding a linear extension of that graph;
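For very small posets, the first approach can be carried out by exhaustive search. The sketch below is only meant to make the definition concrete: it enumerates all linear extensions of each directed acyclic graph, which takes exponential time in general, and the function names and the (vertices, arcs) representation are assumptions.

```python
def linear_extensions(vertices, arcs):
    """Yield every ordering of `vertices` compatible with all arcs (u, v)."""
    preds = {v: set() for v in vertices}
    for u, v in arcs:
        preds[v].add(u)

    def extend(prefix, placed):
        if len(prefix) == len(vertices):
            yield list(prefix)
            return
        for v in vertices:
            # v may come next once all its predecessors have been placed.
            if v not in placed and preds[v] <= placed:
                prefix.append(v)
                placed.add(v)
                yield from extend(prefix, placed)
                prefix.pop()
                placed.remove(v)

    yield from extend([], set())


def closest_extensions(poset1, poset2, d):
    """Return a pair (L1, L2) of linear extensions minimising d(L1, L2)."""
    pairs = ((l1, l2)
             for l1 in linear_extensions(*poset1)
             for l2 in linear_extensions(*poset2))
    return min(pairs, key=lambda pair: d(*pair))
```

Any distance on permutations can be passed as `d`; a simple stand-in for experimenting is the number of positions at which the two orderings differ.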

(104) Finding an “agreement” for posets
[Figure: the posets G1 and G2 and the merged graph G1 ∪ G2]

(105) Set systems and the syntenic distance
- Recall that chromosomes are ordered sets of genes;
- Sometimes we’re not interested in order, but in the fact that two segments belong to the same chromosome;
- So we view a genome as a family of (unordered) sets of genes;

(106) Set systems and the syntenic distance
- Three operations are taken into account in that setting; starting from {{a, b, c}, {p, q, r}, {x, y}}:
  - fission: {{a, b}, {c}, {p, q, r}, {x, y}};
  - fusion: {{a, b, c, x, y}, {p, q, r}};
  - translocation: {{a, p}, {b, c, q, r}, {x, y}};
- The syntenic distance between two genomes is then the minimum number of such operations that are needed to transform one genome into the other;
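The three operations can be mimicked on sets directly. In this sketch a genome is a frozenset of frozensets; the helper names and argument conventions are hypothetical, and no check is made on whether a translocation leaves both resulting chromosomes nonempty.

```python
def genome(*chromosomes):
    """Build a genome (a set of gene sets) from iterables of genes."""
    return frozenset(frozenset(c) for c in chromosomes)


def fusion(g, a, b):
    """Replace chromosomes a and b by their union."""
    return (g - {a, b}) | {a | b}


def fission(g, c, part):
    """Split chromosome c into `part` and the rest of c."""
    part = frozenset(part)
    if not part or not part < c:
        raise ValueError("part must be a nonempty proper subset of c")
    return (g - {c}) | {part, c - part}


def translocation(g, a, b, a1, b1):
    """Replace a and b by (a1 ∪ b1) and ((a − a1) ∪ (b − b1))."""
    a1, b1 = frozenset(a1), frozenset(b1)
    return (g - {a, b}) | {a1 | b1, (a - a1) | (b - b1)}


g = genome("abc", "pqr", "xy")
abc, pqr, xy = frozenset("abc"), frozenset("pqr"), frozenset("xy")
```

Applied to `g`, the three functions reproduce the three genomes of the example above.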

(107) Set systems and the syntenic distance
- There is a compact representation that allows us to assume that:
  1. our input is {S_1, S_2, ..., S_k} (subsets of {1, 2, ..., n});
  2. our target is {{1}, {2}, ..., {n}};
- So we want to obtain that genome using as few fissions, fusions and translocations as possible;
- Syntenic genes are simply genes that belong to the same chromosome;

(108) Synteny graph
- A graph-theoretic approach for attacking the problem was proposed:

Definition The synteny graph of an instance S(n, k) is defined by:
- V = {S_1, S_2, ..., S_k};
- E = {{S_i, S_j} | S_i ∩ S_j ≠ ∅, 1 ≤ i ≠ j ≤ k};

- The synteny graph of our target {{1}, {2}, ..., {n}} has n components;

(109) Mutations and the synteny graph
- Translocations, fusions and fissions affect the graph in different ways:
  - translocations (may) disconnect adjacent vertices;
  - fissions split vertices into two nonadjacent vertices;
  - fusions: opposite of fissions;
- Our goal is to obtain n components;
- It can be proved that the distance is at least n − p (where p is the number of components in our instance’s graph);
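The bound n − p is cheap to evaluate. The sketch below builds the synteny graph implicitly (two sets are adjacent when they intersect) and counts its connected components with a union-find structure; the function names are hypothetical.

```python
from itertools import combinations


def synteny_components(sets):
    """Number of connected components of the synteny graph of `sets`."""
    parent = list(range(len(sets)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(sets)), 2):
        if sets[i] & sets[j]:        # S_i and S_j share a gene: edge {S_i, S_j}
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(sets))})


def syntenic_lower_bound(sets, n):
    """Lower bound n - p on the syntenic distance to {{1}, ..., {n}}."""
    return n - synteny_components(sets)
```

For the made-up instance {{1, 2}, {2, 3}, {4, 5}} with n = 5, the graph has p = 2 components, so at least 3 operations are needed.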

(110) About the syntenic distance
- The synteny graph dictates that we want to increase the number of connected components;
- In that regard, restricting oneself to “intra-component moves” seems optimal;
- But any approach that does this is a 2-approximation [Liben-Nowell, 2001];
- No better approximation is known;
- And computing the distance or an optimal scenario is NP-hard [DasGupta et al., 1998];

(111) Today’s models: wrap-up
- As soon as we have duplications, most problems become hard (to solve exactly, or even to approximate within a reasonable factor);
- As soon as we forget about order (partially or completely), we also end up with difficult problems;
- Yet the problems still have to be solved;

(112) Alternative approach: sat solvers
- sat solvers are highly-optimised programs for solving the well-known NP-complete satisfiability problem [Cook, 1971]:

Problem (satisfiability (sat))
Input: a Boolean formula φ in conjunctive normal form.
Question: is there a satisfying assignment for φ?

- Idea: take advantage of these solvers;

(113) Alternative approach: sat solvers
- The workflow is as follows:
  PROBLEM INSTANCE → (translation) → BOOLEAN FORMULA → SAT SOLVER → SATISFYING ASSIGNMENT → SOLUTION
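To make the middle of this workflow concrete, here is a toy stand-in for the solver step. It is not one of the highly-optimised solvers mentioned above, just an exhaustive search over all assignments, and it assumes clauses are written as lists of nonzero integers (k for variable k, −k for its negation), a convention borrowed from the DIMACS CNF format. The function name is hypothetical.

```python
from itertools import product


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment {variable: bool} or None if there is none."""
    for bits in product([False, True], repeat=num_vars):
        # A clause is satisfied if at least one of its literals is true.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return {v + 1: bits[v] for v in range(num_vars)}
    return None
```

A translation from a rearrangement instance would produce `clauses`, and a second translation would turn the returned assignment back into a solution of the original problem.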

(114) Alternative approach: linear and pseudo-boolean programming
- Linear programs are of the form: maximise c^T x subject to Ax ≤ b and x ≥ 0;
- Pseudo-boolean programs: same form, but the function to optimise maps {0, 1}^n to ℝ (versus {0, 1} for boolean functions);
- Specialised solvers also exist for those and were used to solve rearrangement problems on strings [Angibaud et al., 2007] and posets [Angibaud et al., 2009];

(115) Comparative genomics wrap-up
- Here we talked mostly about computing “edit distances” between genomes;
- Other measures of similarity exist that are not associated to mutations;
- Many hard problems;
- Much remains to be done in order to satisfy biologists:
  - realistic models;
  - software;
  - ...

(116) Beyond pairwise comparisons
- The genome rearrangement problems we’ve seen were formulated in a pairwise fashion;
- But actually, more than two genomes can be taken into account;
- Unsurprisingly, most problems become hard in that setting;

(117) Why more than two genomes?
- A sequence does not yield enough information for ancestral genome reconstruction:
  [Figure: G1 and G2]
- Taking an additional genome into account restricts our choices:
  [Figure: G1, G2 and G3]
- What’s more, it’s ultimately one of our goals;

(118) Median problems
- Measures of similarities between genomes are useful in reconstructing phylogenies;

Example (phylogeny from distance matrix)
     a  b  c  d  e
  a  0  2  3  6  6
  b  2  0  3  6  6
  c  3  3  0  5  5
  d  6  6  5  0  4
  e  6  6  5  4  0
[Figure: the corresponding tree. Leaves a and b hang from a common internal node by edges of weight 1; that node and leaf c are each joined to a second internal node by an edge of weight 1; the second internal node is joined by an edge of weight 2 to a third one, from which leaves d and e hang by edges of weight 2.]

- (The matrix must satisfy some conditions [Buneman, 1971]);
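One classical condition of this kind is the four-point condition: for every four taxa, the two largest of the three pairwise sums d(x, y) + d(z, w), d(x, z) + d(y, w), d(x, w) + d(y, z) must be equal. The sketch below assumes that this is the condition meant by the reference above and that the matrix is symmetric with a zero diagonal; the function name is hypothetical.

```python
from itertools import combinations


def satisfies_four_point_condition(d):
    """Check the four-point condition on a symmetric distance matrix d."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        if sums[1] != sums[2]:   # the two largest sums must coincide
            return False
    return True


matrix = [
    [0, 2, 3, 6, 6],   # a
    [2, 0, 3, 6, 6],   # b
    [3, 3, 0, 5, 5],   # c
    [6, 6, 5, 0, 4],   # d
    [6, 6, 5, 4, 0],   # e
]
```

The matrix of the example passes the check, which is consistent with the fact that a weighted tree realising it exists.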

(119) Median problems
- Parsimony again: search for a tree that minimises the total number of evolutionary events (i.e. the sum of all edge weights);
- In its simplest form, the problem we want to solve is:

Problem (median of three)
Given: π, σ, τ in S_n^±; a distance d : S_n^± × S_n^± → ℕ.
Find: a permutation µ in S_n^± that minimises w(µ) = d(π, µ) + d(σ, µ) + d(τ, µ).

- Can be generalised to more than three input permutations;

(120) Generic bounds [Siepel and Moret, 2001]
- Generic lower and upper bounds for any distance:
  [Figure: π, σ and τ at the corners of a triangle with sides d(π, σ), d(π, τ) and d(σ, τ), and µ in the middle at distances d(π, µ), d(µ, σ) and d(µ, τ)]
- Upper bound:
  $$w(\mu) \le \min\{\underbrace{d(\pi,\sigma) + d(\pi,\tau)}_{\text{if } \mu=\pi},\ \underbrace{d(\pi,\sigma) + d(\sigma,\tau)}_{\text{if } \mu=\sigma},\ \underbrace{d(\pi,\tau) + d(\sigma,\tau)}_{\text{if } \mu=\tau}\}$$
- Lower bound:
  $$2w(\mu) = d(\pi,\mu) + d(\pi,\mu) + d(\sigma,\mu) + d(\sigma,\mu) + d(\tau,\mu) + d(\tau,\mu) \ge d(\pi,\sigma) + d(\pi,\tau) + d(\sigma,\tau)$$
  (triangle inequalities).
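Both bounds depend only on the three pairwise distances, so they can be evaluated before any median search. The helper below is a hypothetical illustration; rounding the lower bound up is justified under the assumption, made in the problem statement, that d takes integer values.

```python
def median_bounds(d_pi_sigma, d_pi_tau, d_sigma_tau):
    """Return (lower, upper) bounds on the weight w(mu) of an optimal median."""
    total = d_pi_sigma + d_pi_tau + d_sigma_tau
    lower = (total + 1) // 2          # ceil(total / 2), from 2 w(mu) >= total
    upper = min(d_pi_sigma + d_pi_tau,        # taking mu = pi
                d_pi_sigma + d_sigma_tau,     # taking mu = sigma
                d_pi_tau + d_sigma_tau)       # taking mu = tau
    return lower, upper
```

For pairwise distances 2, 3 and 3, for example, an optimal median has weight 4 or 5.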

(121) Results on median problems
- What has been done:

  Operation or measure          Median of three             Best approximation
  breakpoint                    NP-hard [Bryant, 1998]      5/3 [Caprara, 2002]
  signed breakpoint             NP-hard [Bryant, 1998]      7/6 [Pe’er and Shamir, 2000]
  exchange                      ?                           ?
  signed reversal               NP-hard [Caprara, 2003]     4/3 [Caprara, 1999a]
  signed double-cut-and-join    NP-hard [Caprara, 2003]     4/3 [Caprara, 1999a]
  transposition                 NP-hard [Bader, 2011]       ?

- What could be done:
  1. complexity of the exchange median problem? (trivial for 2 permutations, NP-hard for ≥ 4; what about 3?)
  2. better approximations;
  3. “median clouds” [Eriksen, 2009];
- More on phylogenetics: next two sessions;

(122) Further topics

- Other topics could have been discussed:
  - what to do in the presence of multiple optimal sequences?
  - what can be said about the distribution of those distances?
  - how else can we assess the quality of the solutions?
  - how do we modify them if they’re unsatisfactory?
  - what other biological constraints can we add?
  - ...

(123) References I

Akers, S. B., Krishnamurthy, B., and Harel, D. (1987). The star graph: An attractive alternative to the n-cube. In ICPP’87, pages 393–400. Pennsylvania State University Press.

Angibaud, S., Fertin, G., Rusu, I., and Vialette, S. (2007). A pseudo-boolean framework for computing rearrangement distances between genomes with duplicates. Journal of Computational Biology, 14(4):379–393.

Angibaud, S., Fertin, G., Thévenin, A., and Vialette, S. (2009). Pseudo boolean programming for partially ordered genomes. In Ciccarelli, F. and Miklós, I., editors, RECOMB-CG, volume 5817 of Lecture Notes in Computer Science, pages 126–137. Springer.

Bader, D. A., Moret, B. M. E., and Yan, M. (2001). A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. Journal of Computational Biology, 8(5):483–491.

Bader, M. (2011). The transposition median problem is NP-complete. Theoretical Computer Science, 412(12-14):1099–1110.

(124) References II

Bafna, V. and Pevzner, P. A. (1996). Genome rearrangements and sorting by reversals. SIAM J. Comput., 25(2):272–289.

Bafna, V. and Pevzner, P. A. (1998a). Sorting by transpositions. SIAM Journal on Discrete Mathematics, 11(2):224–240.

Bafna, V. and Pevzner, P. A. (1998b). Sorting by transpositions. SIAM Journal on Discrete Mathematics, 11(2):224–240.

Bergeron, A., Heber, S., and Stoye, J. (2002). Common intervals and sorting by reversals: a marriage of necessity. Bioinformatics, 18(Suppl 2):S54–S63.

Berman, P., Hannenhalli, S., and Karpinski, M. (2002). 1.375-approximation algorithm for sorting by reversals. In ESA’02, volume 2461 of LNCS, pages 200–210. Springer-Verlag.

Blin, G., Fertin, G., Chauve, C., et al. (2004). The breakpoint distance for signed sequences. In 1st Conference on Algorithms and Computational Methods for biochemical and Evolutionary Networks (CompBioNets’04), volume 3, pages 3–16.

(125) References III

Bryant, D. (1998). The complexity of the breakpoint median problem. Technical report, Université De Montréal.

Bulteau, L., Fertin, G., and Rusu, I. (2012a). Pancake flipping is hard. In MFCS, volume 7464 of Lecture Notes in Computer Science, pages 247–258. Springer.

Bulteau, L., Fertin, G., and Rusu, I. (2012b). Sorting by transpositions is difficult. SIAM Journal on Discrete Mathematics, 26(3):1148–1180.

Bulteau, L. and Komusiewicz, C. (2014). Minimum common string partition parameterized by partition size is fixed-parameter tractable. In Proc. 25th SODA. To appear.

Buneman, P. (1971). The recovery of trees from measures of dissimilarity. Mathematics in the Archaeological and Historical Sciences, pages 387–395.

Caprara, A. (1999a). Formulations and hardness of multiple sorting by reversals. In RECOMB’99, pages 84–93, New York, NY, USA. ACM.

(126) References IV

Caprara, A. (1999b). Sorting permutations by reversals and Eulerian cycle decompositions. SIAM Journal on Discrete Mathematics, 12(1):91.

Caprara, A. (1999c). Sorting permutations by reversals and Eulerian cycle decompositions. SIAM Journal on Discrete Mathematics, 12(1):91–110 (electronic).

Caprara, A. (2002). Additive bounding, worst-case analysis, and the breakpoint median problem. SIAM Journal on Optimization, 13:508–519.

Caprara, A. (2003). The reversal median problem. INFORMS Journal on Computing, 15:93–113.

Chen, X. (2010). On sorting permutations by double-cut-and-joins. In COCOON’10, volume 6196 of LNCS, pages 439–448. Springer-Verlag.

Christie, D. A. (1996). Sorting permutations by block-interchanges. Inf. Process. Lett., 60(4):165–169.
