• Aucun résultat trouvé

Sorting With Forbidden Intermediates

N/A
N/A
Protected

Academic year: 2022

Partager "Sorting With Forbidden Intermediates"

Copied!
39
0
0

Texte intégral

(1)Sorting With Forbidden Intermediates Carlo Comin. Anthony Labarre. Romeo Rizzi. Stéphane Vialette. February 15th, 2016. 1.

(2) Genome rearrangements for permutations I. Permutations model genomes with the same “contents” without duplication: 5. 1. 2. 4. 7. 3. 6. (A). genome rearrangements (only affect order). 1. 2. 3. 4. 5. 6. 7. (B). I. The actual numbering is irrelevant, so we assume either genome is the identity permutation ι = h1 2 · · · ni;. I. The classical (family of) problem(s):. genome rearrangement (permutations) Input: a permutation π in Sn , a set S of (per)mutations; Goal: find a shortest sorting sequence of elements of S for π..

(3) Three examples I. Let us sort π = h3 2 1 9 8 7 6 5 4i using three different sets of operations:. Reversals h3 2 1 9 8 7 6 5 4i h1 2 3 9 8 7 6 5 4i h1 2 3 4 5 6 7 8 9i. Block-interchanges h3 2 1 9 8 7 6 5 4i h1 9 8 7 6 5 2 3 4i h1 2 3 4 8 7 6 5 9i h1 2 3 4 5 7 6 8 9i h1 2 3 4 5 6 7 8 9i. Block-transpositions h3 2 1 9 8 7 6 5 4i h3 4 2 1 9 8 7 6 5i h1 9 8 3 4 2 7 6 5i h1 2 7 6 9 8 3 4 5i h1 2 7 8 3 4 5 6 9i h1 2 3 4 5 6 7 8 9i. I. All these sequences are optimal (proofs omitted);. I. The distance of π is the length of an optimal sequence;. 3.

(4) Issues with the model. I. The overall approach is criticised for various reasons: I. I. I. I. I. permutations are too restricted ; ... but many other models exist operations are too restricted; ... but we can consider several of them at once complexity issues; ... but we have SAT and LP solvers if need be .... Another critical issue needs addressing (next slide);.

(5) Phylogenies I. One motivation for measuring similarities between genomes is to reconstruct ancestral genomes and phylogenies;. source: https://commons.wikimedia.org/wiki/File:Drynarioid_phylogeny.png. 5.

(6) Phylogenies I. One motivation for measuring similarities between genomes is to reconstruct ancestral genomes and phylogenies;. source: https://commons.wikimedia.org/wiki/File:Drynarioid_phylogeny.png (except the •’s). I I. But some mutations are lethal; Which means some ancestors cannot exist and therefore cannot have led to present-day species;.

(7) A more realistic model I. We must therefore forbid some intermediate configurations in our search for a sorting sequence;. I. Our problem becomes:. guided sorting (permutations) Input: a permutation π in Sn , a set S of (per)mutations, a set F of forbidden permutations; Goal: find a shortest sorting sequence of elements of S for π that avoids all elements of F ;. I. Here “shortest” means “as if F were empty”;. I. Note: we do not try to restrict operations themselves or the structure of genomes;.

(8) Example I. If π = h2 3 1 4i, S = {exchanges} and F = {h1 3 2 4i, h3 2 1 4i}:. h2 3 1 4i. h1 3 2 4i. h3 2 1 4i. h1 2 3 4i. the black paths are optimal but do not avoid F. 8.

(9) Example I. If π = h2 3 1 4i, S = {exchanges} and F = {h1 3 2 4i, h3 2 1 4i}: h4 3 1 2i. h1 3 2 4i. h2 3 1 4i. h4 3 2 1i. h3 2 1 4i. h4 2 3 1i. h1 2 3 4i. the black paths are optimal but do not avoid F the blue path avoids F but is not optimal. 9.

(10) Example I. If π = h2 3 1 4i, S = {exchanges} and F = {h1 3 2 4i, h3 2 1 4i}: h4 3 1 2i. h2 3 1 4i. h1 3 2 4i. h3 2 1 4i. h4 3 2 1i. h2 1 3 4i. h4 2 3 1i. h1 2 3 4i. the black paths are optimal but do not avoid F the blue path avoids F but is not optimal the green path avoids F and is optimal 10.

(11) In this talk. I. We focus on “exchanges” (i.e. algebraic transpositions); I I. I. strongly connected to cycles of permutations; hopefully some connections carry on to cycles in breakpoint graphs;. We give a polynomial-time algorithm for solving the problem on involutions;.

(12) Obvious and generic solution: Cayley graph. Definition The Cayley graph G of Sn with generating set S is defined by: 1. V (G ) = {π | π ∈ Sn }; 2. E (G ) = {{π, σ} | dS (π, σ) = 1}.. I. Here’s a straightforward solution to all variants of guided sorting: 1. build the part of the Cayley graph we are interested in; 2. find a shortest path between π and ι (e.g. Dijkstra);. 12.

(13) The Cayley graph approach in action I. Here’s what would happen using our previous example:. 2341. 1342. 1423. 2143. 2413. 2314. 1243. 3142. 2431. 1324. 3421. 3124. 1432. 3241. 2134. 1234. 4123. 4312. 3412. 3214. 4132. 4231. 4213. 4321.

(14) The Cayley graph approach in action I. Here’s what would happen using our previous example:. 2341. 1342. 1423. 2143. 2413. 2314. 1243. 3142. 2431. 1324. 3421. 3124. 1432. 4123. 3241. 2134. 4312. 3412. 3214. 4132. 4213. 4231. 1234. I. Obviously, the approach does not scale (O(n!) vertices, O(n!|S|) edges);. 4321.

(15) Involutions. I. An involution is a permutation π such that π = π −1 ;. I. Equivalently: all its cycles have length ≤ 2;. Example. 3. 2. 1. 5. 4. 9. 10. 8. 6. 7. 15.

(16) A simpler view of sorting by exchanges I. Involutions are “conceptually simpler” to sort: I 1-cycles are left alone; I a single exchange splits a 2-cycle, and we can always find one;. Example (split 2-cycles from left to right). 3. 2. 1. 5. 4. 9. 10. 8. 6. 7. 1. 2. 3. 5. 4. 9. 10. 8. 6. 7. 1. 2. 3. 4. 5. 9. 10. 8. 6. 7. 1. 2. 3. 4. 5. 6. 10. 8. 9. 7. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 16.

(17) From involutions to the hypercube graph I. π is an involution ⇒ we only need to worry about forbidden involutions whose 2-cycles appear in π;.

(18) From involutions to the hypercube graph I I. π is an involution ⇒ we only need to worry about forbidden involutions whose 2-cycles appear in π; We map (π, F ) onto ([k], F 0 ), where: I I. k is the number of 2-cycles of π; F 0 is a collection of forbidden subsets of [k];.

(19) From involutions to the hypercube graph I I. π is an involution ⇒ we only need to worry about forbidden involutions whose 2-cycles appear in π; We map (π, F ) onto ([k], F 0 ), where: I I. k is the number of 2-cycles of π; F 0 is a collection of forbidden subsets of [k];. Example. π= 2. 1. 4. 3. 6. 5. φ1 = 2. 1. 3. 4. 5. 6. φ2 = 1. 2. 4. 3. 5. 6. φ3 = 2. 1. 4. 3. 5. 6. φ4 = 2. 1. 3. 4. 6. 5. F =.

(20) From involutions to the hypercube graph I I. π is an involution ⇒ we only need to worry about forbidden involutions whose 2-cycles appear in π; We map (π, F ) onto ([k], F 0 ), where: I I. k is the number of 2-cycles of π; F 0 is a collection of forbidden subsets of [k];. Example 1 π= 2. 1. φ1 = 2. φ2 = 1. 1. φ3 = 2. 1. 2 1. 4. 1. 3. 2. 4. 1. 1. 3 3. 6. 5. 4. 5. 6. 3. 5. 6. 4. 3. 5. 3. 4. 6. 2. 2. F =. φ4 = 2. 3. 6. 5.

(21) From involutions to the hypercube graph I I. π is an involution ⇒ we only need to worry about forbidden involutions whose 2-cycles appear in π; We map (π, F ) onto ([k], F 0 ), where: I I. k is the number of 2-cycles of π; F 0 is a collection of forbidden subsets of [k];. Example 1 π= 2. 1. φ1 = 2. φ2 = 1. 1. 2 1. 4. 1. 3. 2. 4. 1. 1. 3. {1, 2, 3} = [3]. 3. 6. 5. 4. 5. 6. 3. 5. 6. 4. 3. 5. 6. 3. 4. 6. 2. 2. F =. φ3 = 2. φ4 = 2. 1. 3. 5. F0 {1, 2}. {1, 3}. {2, 3}. {1}. {2}. {3}. ∅.

(22) Properties of involutions and exchanges. I. Involutions behave nicely with respect to exchanges;. I. Only elements of F whose 2-cycles all appear in π need to be considered;. I. guided sorting under these hypotheses reduces to the following problem:. (s, t)-paths in hypercube network Input: the set [k] = {1, 2, . . . , k}, a collection F of subsets of [k]; Goal: find a sequence of element deletions for [k] that empties it while avoiding F.. 22.

(23) Overview of the main algorithm I. We follow the Cayley graph approach but avoid its explicit construction; I. I. otherwise: O(2k ) vertices and O(2k−1 k) edges;. The main algorithm goes as follows: {1, 2, . . . , k}. ∅. 23.

(24) Overview of the main algorithm I. We follow the Cayley graph approach but avoid its explicit construction; I. I. otherwise: O(2k ) vertices and O(2k−1 k) edges;. The main algorithm goes as follows: {1, 2, . . . , k}. S. 1. S ← {[k]}, T ← {∅};. T. ∅. 24.

(25) Overview of the main algorithm I. We follow the Cayley graph approach but avoid its explicit construction; I. I. otherwise: O(2k ) vertices and O(2k−1 k) edges;. The main algorithm goes as follows: {1, 2, . . . , k} BFS↓ S. 1. S ← {[k]}, T ← {∅}; 2. launch a double BFS on S and T ;. T BFS↑ ∅. 25.

(26) Overview of the main algorithm I. We follow the Cayley graph approach but avoid its explicit construction; I. I. otherwise: O(2k ) vertices and O(2k−1 k) edges;. The main algorithm goes as follows: {1, 2, . . . , k} BFS↓ S. 1. S ← {[k]}, T ← {∅}; 2. launch a double BFS on S and T ; 3. if a solution exists: return it;. T BFS↑ ∅. 26.

(27) Overview of the main algorithm I. We follow the Cayley graph approach but avoid its explicit construction; I. I. otherwise: O(2k ) vertices and O(2k−1 k) edges;. The main algorithm goes as follows: {1, 2, . . . , k} BFS↓ S. 1. S ← {[k]}, T ← {∅}; 2. launch a double BFS on S and T ; 3. if a solution exists: return it; 4. if no solution exists: return NO;. T BFS↑ ∅. 27.

(28) Overview of the main algorithm I. We follow the Cayley graph approach but avoid its explicit construction; I. I. otherwise: O(2k ) vertices and O(2k−1 k) edges;. The main algorithm goes as follows: {1, 2, . . . , k} BFS↓ S. 1. S ← {[k]}, T ← {∅}; 2. launch a double BFS on S and T ; 3. if a solution exists: return it; 4. if no solution exists: return NO;. T BFS↑. 5. otherwise: compress S and T and go back to 2;. ∅ I. The compression phase ensures the running time remains polynomial; 28.

(29) The double BFS phase. I. Classical breadth-first searches, skipping elements from F: 1. one upwards from the current bottom; 2. one downwards from the current top;. I. To keep the running time polynomial, searches stop when we have O(|F|dn) vertices (d is the difference in cardinality between S and T );.

(30) Obvious case where a solution exists I. If S ∩ T 6= ∅, then a solution exists; {1, 2, . . . , k}. S T. ∅. 30.

(31) Obvious case where a solution exists I. If S ∩ T 6= ∅, then a solution exists; {1, 2, . . . , k}. S T. ∅. 31.

(32) Obvious case where a solution exists I. If S ∩ T 6= ∅, then a solution exists; {1, 2, . . . , k}. S T. ∅. 32.

(33) Obvious cases where no solution exists I. If S or T is empty, then no solution exists;. I. If we’ve gone “deep (resp. high) enough” and S ∩ T is empty, then no solution exists: {1, 2, . . . , k}. S T. ∅. 33.

(34) The other cases I. We may have collected enough vertices to stop the BFS’s, but S and T don’t intersect yet: {1, 2, . . . , k}. S. T. ∅ I. In this case, we may either compute a solution, or launch the compression and keep going; 34.

(35) Interlude: Lehman and Ron’s theorem We need the following result1 .. Theorem (r ). Given n, m ∈ N, consider two families of sets R ⊆ Hn and (s) S ⊆ Hn where |R| = |S| = m and 0 ≤ r < s ≤ n. Assume there exists a bijection ϕ : S → R such that ϕ(S) ⊂ S for every S ∈ S. Then there exist m vertex-disjoint directed paths in Hn whose union contains all the subsets in S and R. In other words: S. S. ⇒. R 1. R. E. Lehman and D. Ron, ”On Disjoint Chains of Subsets ”, Journal of Combinatorial Theory, Series A, 94(2):399–404, 2001..

(36) Finding a solution with Theorem 1. 1. Build a bipartite graph B with: I I. vertex set S ∪ T ; edges connecting each element s of S with an element t of T if t ⊂ s;. 2. Compute a maximum matching M of B; 3. If |M| > |F|, there is at least one (S, T )-path that avoids F (thanks to Lehman and Ron’s theorem); 4. Otherwise, we keep going but reduce the size of T by removing “non essential” vertices;.

(37) The compression phase. I. Compute a minimum vertex cover X = XS ∪ XT of B; I. (XS = X ∩ S, XT = X ∩ T ). I. Since X is a vertex cover, no relevant path from S \ XS to T \ XT exists;. I. We then search for a solution using XS and T , and repeat the process until we find one or reach the threshold of |F|dn vertices; If no solution has been found, we return to the main algorithm S (i) with T 0 = i XT ;. I. I. (i). (XT is the XT computed at the i th iteration).

(38) Summary of results. I. We can solve guided sorting by exchanges on involutions in time: p I. I. I. O(min(p|F| d k, |F|) |F|2 d 4 k 2 ) (“decision version”); O(min( |F| d k, |F|) |F|2 d 4 k 2 + |F|5/2 k 3/2 d) (“search version”);. (k is the number of 2-cycles in π);. 38.

(39) Future work. I. Complexity of (variants of) guided sorting?. I. Other tractable cases?. I. What if we relax “optimal” to “minimal”?. I. Do the algorithms generalise?. I. Can we compute or benefit from an “implicit” encoding of F ? I I I. F = Avn (some patterns); F =< some generators > \ some small set; ....

(40)

Références

Documents relatifs

Using this culture system, it has been previously shown that the absence of estrogens or the presence of ER antagonists resulted in an impaired development of CD11c + CD11b int DCs

We study the complexity of reducing (proof-nets representing) λ -terms typed in the elementary or light linear logic and relate it to the cost of naive reduction.. In particular,

So, the second step to ensure that our work reaches the right audience is drawing attention to our work, creating a community, and engaging the public.. As written by

Since this difference in expected remaining run- ning time is much larger than the fitness difference, the optimal RLS and (1 + 1) EA variants use mutation rates that are larger

tinuous contact with Member States and Headquarters ensure that communicable disease information of international importance witlun a region is conveyed to health

It follows that the optimal decision strategy for the original scalar Witsenhausen problem must lead to an interim state that cannot be described by a continuous random variable

We rst give some general results such as aggregation, Mertens decomposition of the value process, Skorokhod conditions sat- ised by the associated non decreasing processes; then,

(b) The second algorithm computes at each step the shortest path accord- ing to the value of the uncovered edges and the value of the unknown edges set to 0, and uncovers an edge