
Part I Combinatorial Algorithms

7 Shortest Superstring

7.2 Improving to factor 3

Notice that any superstring of the strings α(c_i), i = 1, ..., k, is also a superstring of all strings in S. Instead of simply concatenating these strings, let us make them overlap as much as possible (this may sound circular, but it is not!).

Let X be a set of strings. We will denote by ||X|| the sum of the lengths of the strings in X. Let us define the compression achieved by a superstring s as the difference between the sum of the lengths of the input strings and |s|, i.e., ||S|| − |s|. Clearly, maximum compression is achieved by the shortest superstring. Several algorithms are known to achieve at least half the optimal compression. For instance, the greedy superstring algorithm, described in Section 2.3, does so; however, its proof is based on a complicated case analysis.

For a less efficient algorithm, see Section 7.2.1. Either of these algorithms can be used in Step 3 of Algorithm 7.5.
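For concreteness, the greedy superstring algorithm mentioned above can be sketched as follows. This is a minimal sketch, not the book's own code: it assumes no input string is a substring of another, and the function names are mine.

```python
def overlap(s, t):
    """Length of the longest suffix of s that is a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0

def greedy_superstring(strings):
    """Repeatedly merge the pair of distinct strings with the largest
    overlap until a single string (a superstring of all inputs) remains."""
    strings = list(strings)
    while len(strings) > 1:
        best = None  # (overlap length, index of s, index of t)
        for i, s in enumerate(strings):
            for j, t in enumerate(strings):
                if i != j and (best is None or overlap(s, t) > best[0]):
                    best = (overlap(s, t), i, j)
        k, i, j = best
        merged = strings[i] + strings[j][k:]
        strings = [x for idx, x in enumerate(strings) if idx not in (i, j)]
        strings.append(merged)
    return strings[0]
```

For example, on {"abc", "bcd", "cde"} the algorithm merges the overlapping pairs down to the length-5 superstring "abcde", achieving the full compression of 4 on this instance.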


Algorithm 7.5 (Shortest superstring – factor 3)

1. Construct the prefix graph corresponding to strings in S.

2. Find a minimum cycle cover of the prefix graph, C = {c_1, ..., c_k}.

3. Run the greedy superstring algorithm on {α(c_1), ..., α(c_k)} and output the resulting string, say τ.

Let OPT_α denote the length of the shortest superstring of the strings in S_α = {α(c_1), ..., α(c_k)}, and let r_i be the representative string of c_i.

Lemma 7.6 |τ| ≤ OPT_α + wt(C).

Proof: Assume w.l.o.g. that α(c_1), ..., α(c_k) appear in this order in a shortest superstring of S_α. The maximum compression that can be achieved on S_α is given by

∑_{i=1}^{k−1} |overlap(α(c_i), α(c_{i+1}))|.

Since each string α(c_i) has r_i as a prefix as well as a suffix, by Lemma 7.3,

|overlap(α(c_i), α(c_{i+1}))| < wt(c_i) + wt(c_{i+1}).

Hence, the maximum compression achievable on S_α is at most 2 · wt(C), i.e., ||S_α|| − OPT_α ≤ 2 · wt(C).

The compression achieved by the greedy superstring algorithm on S_α is at least half the maximum compression. Therefore,

||S_α|| − |τ| ≥ (||S_α|| − OPT_α)/2.

Therefore,

2(|τ| − OPT_α) ≤ ||S_α|| − OPT_α ≤ 2 · wt(C).

The lemma follows. □

Finally, we relate OPTu to OPT.

Lemma 7.7 OPT_α ≤ OPT + wt(C).

Proof: Let OPT_r denote the length of the shortest superstring of the strings in S_r = {r_1, ..., r_k}. The key observation is that each α(c_i) begins and ends with r_i. Therefore, the maximum compression achievable on S_α is at least as large as that achievable on S_r, i.e.,

||S_α|| − OPT_α ≥ ||S_r|| − OPT_r.


Clearly, ||S_α|| = ||S_r|| + wt(C). This gives OPT_α ≤ OPT_r + wt(C).

The lemma follows by noticing that OPT_r ≤ OPT. □

Combining the previous two lemmas we get:

Theorem 7.8 Algorithm 7.5 achieves an approximation factor of 3 for the shortest superstring problem.
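The chain of inequalities behind the theorem can be written out explicitly. It combines Lemmas 7.6 and 7.7 with the fact, established earlier in the chapter, that the minimum cycle cover weighs no more than the shortest superstring, i.e., wt(C) ≤ OPT:

```latex
|\tau| \;\le\; \mathrm{OPT}_\alpha + wt(C)        % Lemma 7.6
       \;\le\; \mathrm{OPT} + 2 \cdot wt(C)       % Lemma 7.7
       \;\le\; 3 \cdot \mathrm{OPT}.              % since wt(C) \le \mathrm{OPT}
```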

7.2.1 Achieving half the optimal compression

We give a superstring algorithm that achieves at least half the optimal compression. Suppose that the strings to be compressed, s_1, ..., s_n, are numbered in the order in which they appear in a shortest superstring. Then, the optimal compression is given by

∑_{i=1}^{n−1} |overlap(s_i, s_{i+1})|.

This is the weight of the traveling salesman path 1 → 2 → ... → n in the overlap graph, H, of the strings s_1, ..., s_n. H is a directed graph that has a vertex v_i corresponding to each string s_i, and contains an edge (v_i → v_j) of weight |overlap(s_i, s_j)| for each i ≠ j, 1 ≤ i, j ≤ n (H has no self-loops).

The optimal compression is upper bounded by the cost of a maximum traveling salesman tour in H, which in turn is upper bounded by the cost of a maximum cycle cover. The latter can be computed in polynomial time using matching, similar to the way we computed a minimum weight cycle cover. Since H has no self-loops, each cycle has at least two edges. Remove the lightest edge from each cycle of the maximum cycle cover to obtain a set of disjoint paths. The sum of weights of edges on these paths is at least half the optimal compression. Overlap strings s_1, ..., s_n according to the edges of these paths and concatenate the resulting strings. This gives a superstring achieving at least half the optimal compression.
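The steps above can be sketched in Python. This is a brute-force sketch, not the polynomial-time algorithm: the maximum cycle cover is found by exhaustive search over fixed-point-free permutations rather than by matching, so it is only meant for tiny inputs, and the function names are mine.

```python
from itertools import permutations

def overlap(s, t):
    """Length of the longest suffix of s that is a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0

def half_compression_superstring(strings):
    n = len(strings)
    if n == 1:
        return strings[0]
    # Maximum cycle cover of the overlap graph H: over all permutations pi
    # with no fixed points (H has no self-loops), maximize the total
    # overlap weight sum_i |overlap(s_i, s_pi(i))|.
    best_pi, best_w = None, -1
    for pi in permutations(range(n)):
        if any(pi[i] == i for i in range(n)):
            continue
        w = sum(overlap(strings[i], strings[pi[i]]) for i in range(n))
        if w > best_w:
            best_pi, best_w = pi, w
    # Decompose the permutation into cycles and drop the lightest edge of
    # each cycle, leaving vertex-disjoint paths.
    seen, paths = set(), []
    for start in range(n):
        if start in seen:
            continue
        cycle, j = [start], best_pi[start]
        seen.add(start)
        while j != start:
            cycle.append(j)
            seen.add(j)
            j = best_pi[j]
        l = len(cycle)
        drop = min(range(l),
                   key=lambda j: overlap(strings[cycle[j]],
                                         strings[cycle[(j + 1) % l]]))
        # The remaining path starts right after the dropped edge.
        paths.append([cycle[(drop + 1 + j) % l] for j in range(l)])
    # Overlap the strings along each path, then concatenate the paths.
    pieces = []
    for path in paths:
        merged = strings[path[0]]
        for prev, cur in zip(path, path[1:]):
            merged += strings[cur][overlap(strings[prev], strings[cur]):]
        pieces.append(merged)
    return "".join(pieces)
```

On {"cde", "abc", "bcd"} the maximum cycle cover is the single cycle abc → bcd → cde → abc; dropping its lightest edge (cde → abc, weight 0) leaves the path abc → bcd → cde, which merges into "abcde".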

7.3 Exercises

7.1 Show that Lemma 7.3 cannot be strengthened to |overlap(r, r′)| < max{wt(c), wt(c′)}.

7.2 (Jiang, Li, and Du [155]) Obtain constant factor approximation algorithms for the variants of the shortest superstring problem given in Exercise 2.16.

7.4 Notes

The algorithms given in this chapter are due to Blum, Jiang, Li, Tromp, and Yannakakis [28].

8 Knapsack

In Chapter 1 we mentioned that some NP-hard optimization problems allow approximability to any required degree. In this chapter, we will formalize this notion and will show that the knapsack problem admits such an approximability.

Let Π be an NP-hard optimization problem with objective function f_Π.

We will say that algorithm A is an approximation scheme for Π if on input (I, ε), where I is an instance of Π and ε > 0 is an error parameter, it outputs a solution s such that:

• f_Π(I, s) ≤ (1 + ε) · OPT if Π is a minimization problem.

• f_Π(I, s) ≥ (1 − ε) · OPT if Π is a maximization problem.

A will be said to be a polynomial time approximation scheme, abbreviated PTAS, if for each fixed ε > 0, its running time is bounded by a polynomial in the size of instance I.

The definition given above allows the running time of A to depend arbitrarily on ε. This is rectified in the following more stringent notion of approximability. If the previous definition is modified to require that the running time of A be bounded by a polynomial in the size of instance I and 1/ε, then A will be said to be a fully polynomial time approximation scheme, abbreviated FPTAS.

In a very technical sense, an FPTAS is the best one can hope for an NP-hard optimization problem, assuming P ≠ NP; see Section 8.3.1 for a short discussion of this issue. The knapsack problem admits an FPTAS.

Problem 8.1 (Knapsack) Given a set S = {a_1, ..., a_n} of objects, with specified sizes and profits, size(a_i) ∈ Z⁺ and profit(a_i) ∈ Z⁺, and a "knapsack capacity" B ∈ Z⁺, find a subset of objects whose total size is bounded by B and whose total profit is maximized.
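Since sizes are integers, the problem can be solved exactly by a classical dynamic program over capacities, running in O(n · B) time. This pseudo-polynomial routine (a standard sketch, not the rounding-based scheme developed later in this chapter) makes the problem statement concrete:

```python
def knapsack(sizes, profits, B):
    """Maximum total profit of a subset of objects whose total size
    is at most B.  best[b] = best profit achievable with capacity b."""
    best = [0] * (B + 1)
    for size, profit in zip(sizes, profits):
        # Iterate capacities downward so each object is used at most once.
        for b in range(B, size - 1, -1):
            best[b] = max(best[b], best[b - size] + profit)
    return best[B]
```

For instance, with sizes (1, 3, 4), profits (15, 50, 60), and B = 4, the optimum is 65, taking the objects of sizes 1 and 3.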

An obvious algorithm for this problem is to sort the objects by decreasing ratio of profit to size, and then greedily pick objects in this order. It is easy to see that as such this algorithm can be made to perform arbitrarily badly (Exercise 8.1).
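A concrete bad family of instances for the greedy rule, with illustrative numbers of my own choosing: one tiny object with the best profit-to-size ratio, and one huge object carrying almost all the profit. Greedy takes the tiny object first, which then blocks the big one.

```python
def greedy_by_ratio(sizes, profits, B):
    """Pick objects in order of decreasing profit/size ratio,
    skipping any object that no longer fits."""
    order = sorted(range(len(sizes)),
                   key=lambda i: profits[i] / sizes[i], reverse=True)
    total_size = total_profit = 0
    for i in order:
        if total_size + sizes[i] <= B:
            total_size += sizes[i]
            total_profit += profits[i]
    return total_profit

B = 1000
sizes, profits = [1, B], [2, B]
# Ratios are 2 and 1, so greedy takes the size-1 object (profit 2);
# the size-B object no longer fits.  The optimum takes it alone (profit B),
# so the gap B/2 grows without bound as B does.
```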

V. V. Vazirani, Approximation Algorithms

© Springer-Verlag Berlin Heidelberg 2003
