HAL Id: hal-00726287
https://hal.archives-ouvertes.fr/hal-00726287
Preprint submitted on 29 Aug 2012
Choose Outsiders First: a mean 2-approximation random algorithm for covering problems.
Etienne Birmele
To cite this version:
Etienne Birmele. Choose Outsiders First: a mean 2-approximation random algorithm for covering problems. 2012. ⟨hal-00726287⟩
Choose Outsiders First: a mean 2-approximation random algorithm for covering problems. ∗
Etienne Birmelé
Laboratoire Statistique et Génome, CNRS, Université d’Evry, France
Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Lyon 1, France
INRIA Rhône-Alpes, Montbonnot Saint-Martin, France
etienne.birmele@genopole.cnrs.fr
August 29, 2012
Abstract
A large number of discrete optimization problems, including Vertex Cover, Set Cover and Feedback Vertex Set, can be unified into the class of covering problems. Several of them were shown to be inapproximable by deterministic algorithms. This article proposes a new random approach, called Choose Outsiders First, which consists in randomly selecting elements that are excluded from the cover. We show that this approach leads to random outputs whose mean size is at most twice that of an optimal solution.
In his landmark paper in complexity theory [11], R. Karp provides a list of 21 NP-complete problems from which most NP-completeness results are deduced. Among them are the extensively studied Vertex Cover, Set Cover, Feedback Vertex (or Arc) Set and Hitting Set problems, which belong to the class of covering problems. Covering problems ask how large a certain combinatorial structure has to be to cover another one, and have a wide range of applications in all areas involving combinatorial optimization, including VLSI systems [10], routing [6] and scheduling [7]. In the last decades, they have also become central in computational biology [12], as parsimony is often taken as the criterion for choosing between the different evolutionary scenarios explaining the observations [9].
Most covering problems are NP-complete, so that they need to be solved using heuristics. The proposed algorithms can mainly be classified into two families. The first one consists of the primal-dual approaches, which are based on the formulation of covering problems as integer linear programming problems [13]. The second one consists of the local ratio techniques, which solve a problem locally and extend the solution [2, 4]. A common measure of the quality of those heuristics is their approximation factor. The literature about approximation results for covering problems is huge, and an overview can be found in [1]. The main covering problems listed above were shown to be APX-hard. Set Cover is not even approximable within better than a logarithmic factor, whereas the constant-factor approximability of the Hitting Set and Directed Feedback Vertex (or Arc) Set problems is still an open question. The best known solutions for Vertex Cover and Undirected Feedback Vertex Set have an approximation ratio of 2.
∗ This work was supported by the ERC Advanced Grant SISYPHE.
One way to reach better approximation results is to use random algorithms and to study the mean approximation ratio of their outputs. A random local ratio approach proposed in [3] yields, for instance, a mean approximation ratio of 2 for the Vertex Cover problem, and a mean approximation ratio equal to the maximum size of the sets for the Set Cover and Hitting Set problems.
In this paper, we propose a new random algorithm for covering problems. Its main difference from previously studied heuristics is that the aim is not to select good candidates for the cover but to randomly exclude elements from the cover. This corresponds to assigning a random order to the elements and considering them in increasing order. An element is then added to the cover if and only if it has to be added in order not to miss a structure which has to be covered. This idea was introduced in the case of the unweighted Vertex Cover in [8] and was proved to yield a mean 2-approximation for this particular covering problem [5]. We show that this approach, which we call Choose Outsiders First, is in fact much more general, in the sense that it can be applied to any covering problem and yields a mean approximation ratio of 2. This is, to our knowledge, the first approximation result for which the ratio is independent of the input for problems like Set Cover or Directed Feedback Vertex Set.
1 The algorithm
Following Bar-Yehuda’s [3] formalism, a covering problem is a triple $(U, f, \omega)$ with $f : 2^U \to \{0, 1\}$ and $\omega : U \to \mathbb{R}_+$, where $U$ is a finite set, $f$ is monotone, i.e., $A \subseteq B \Rightarrow f(A) \le f(B)$, and $f(U) = 1$. For a set $C \subseteq U$, $\omega(C) = \sum_{x \in C} \omega(x)$ is called the weight of $C$. A set $C \subseteq U$ is a cover if $f(C) = 1$. The problem is then to find a cover of minimum weight, that is, a set $C^* \subseteq U$ such that

$$\omega(C^*) = \min\{\omega(C) : C \subseteq U \text{ and } f(C) = 1\}.$$
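To make the formalism concrete, here is a minimal Python sketch of one instance, assuming Vertex Cover as the covering problem. The names `vertex_cover_f` and `min_weight_cover`, as well as the three-vertex path used as input, are hypothetical and chosen only for illustration; the brute-force search is exponential and only meant for tiny instances.

```python
from itertools import combinations

def vertex_cover_f(edges):
    """Build f for Vertex Cover: f(C) = 1 iff every edge has an endpoint in C."""
    def f(C):
        return int(all(u in C or v in C for u, v in edges))
    return f

def min_weight_cover(U, f, weight):
    """Brute-force a minimum-weight cover (illustration only, exponential time)."""
    best = None
    for r in range(len(U) + 1):
        for C in combinations(sorted(U), r):
            if f(set(C)) == 1:
                w = sum(weight[x] for x in C)
                if best is None or w < best[0]:
                    best = (w, set(C))
    return best

# Hypothetical instance: the path a - b - c with unit weights.
U = {"a", "b", "c"}
edges = [("a", "b"), ("b", "c")]
f = vertex_cover_f(edges)
weight = {x: 1.0 for x in U}
optimum = min_weight_cover(U, f, weight)
```

On this instance, `f` is monotone and `f(U) = 1`, as the definition requires, and the only cover of weight 1 is `{"b"}`.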
To do so, we consider the algorithm Choose Outsiders First, which relies on the idea that if the optimal cover is small, a randomly chosen vertex has a high probability of not being contained in the optimal solution. Therefore, two sets $OUT$ and $IN$ are maintained and, at each step, a vertex is randomly chosen and put into $OUT$, that is, it is considered not to be in the cover. However, from time to time, a structure which has to be covered has seen all its elements but one put into $OUT$. This last element then has to be put into the cover and is added to the set $IN$. Once all the elements of $U$ have been classified into $OUT$ or $IN$, the set $IN$ is a cover and is output by the algorithm.
The pseudo-code of Choose Outsiders First is given in Algorithm 1. At each step of the algorithm, we say that a vertex is available if it has not been classified yet, and we denote by $A$ the set of available vertices, that is, $A = U \setminus (OUT \cup IN)$. The pseudo-code of Algorithm 1 is written using $A$, $IN$ and $OUT$ at each step for better readability, but in practice the algorithm can be written by updating only $A$ and $IN$, or $OUT$ and $IN$, the union of the three sets always being $U$. Note that if the conditions of Line 2 are checked in polynomial time, which is the case if the problem is in NP, the total running time is polynomial.
The probability distribution used to choose the excluded vertex at each step is the one proportional to the weights of the available vertices. Elements of small weight are therefore excluded with lower probability and thus favored to be in the output. Note that in the case of an unweighted covering problem, the algorithm picks the excluded vertex uniformly at random.
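As a small worked illustration of this weighting, consider Vertex Cover on the path $a - b - c$ with $\omega(a) = \omega(c) = 1$ and $\omega(b) = 3$. This instance and its weights are an assumption made here for the example; they do not come from the paper. If $b$ is excluded first, both $a$ and $c$ are forced into the cover; if $a$ or $c$ is excluded first, $b$ is forced into the cover.

```latex
\begin{align*}
P(\text{$b$ excluded first}) &= \frac{\omega(b)}{\omega(A)} = \frac{3}{5}
  &&\Rightarrow\ IN = \{a, c\},\ \omega(IN) = 2, \\
P(\text{$a$ or $c$ excluded first}) &= \frac{\omega(a) + \omega(c)}{\omega(A)} = \frac{2}{5}
  &&\Rightarrow\ IN = \{b\},\ \omega(IN) = 3, \\
\mathbb{E}[\omega(IN)] &= \frac{3}{5} \cdot 2 + \frac{2}{5} \cdot 3 = \frac{12}{5},
  &&\omega(C^*) = \omega(\{a, c\}) = 2.
\end{align*}
```

The heavy vertex $b$ is the most likely to be excluded, so the lighter cover $\{a, c\}$ is the most likely output, and the mean weight $12/5$ on this instance is below twice the optimum.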
Algorithm 1: Choose Outsiders First
1  $IN = \emptyset$, $OUT = \emptyset$, $A = U$;
2  while $A \neq \emptyset$ do
3      Pick randomly $u \in A$ with probability $\omega(u) / \omega(A)$;
4      $OUT = OUT \cup \{u\}$;
5      for $v \in U \setminus (IN \cup OUT)$ such that $f(U \setminus (OUT \cup \{v\})) = 0$ do
6          $IN = IN \cup \{v\}$
7      end
8      $A = U \setminus (OUT \cup IN)$;
9  end
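The pseudo-code above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `choose_outsiders_first`, the `rng` parameter, and the small Vertex Cover instance are hypothetical, and all weights are assumed to be strictly positive so that the weighted draw is well defined.

```python
import random

def choose_outsiders_first(U, f, weight, rng=random):
    """One run of Choose Outsiders First; returns the set IN."""
    U = set(U)
    IN, OUT = set(), set()
    A = sorted(U)  # sorted so that a seeded rng gives reproducible runs
    while A:
        # Exclude an available element u with probability weight[u] / weight(A).
        u = rng.choices(A, weights=[weight[x] for x in A], k=1)[0]
        OUT.add(u)
        # Elements whose exclusion would now break coverage are forced into IN.
        for v in U - IN - OUT:
            if f(U - (OUT | {v})) == 0:
                IN.add(v)
        A = sorted(U - IN - OUT)
    return IN

# Hypothetical instance: Vertex Cover on the path a - b - c, unit weights.
edges = [("a", "b"), ("b", "c")]
U = {"a", "b", "c"}
f = lambda C: int(all(x in C or y in C for x, y in edges))
weight = {x: 1.0 for x in U}
covers = [choose_outsiders_first(U, f, weight, random.Random(s)) for s in range(200)]
```

On this instance a run returns `{"b"}` when `a` or `c` is excluded first and `{"a", "c"}` when `b` is excluded first, so the smallest output has the optimal size 1.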
The size of the output cover is a random variable, which we call CoverSize. To assess the efficiency of the algorithm, we have to relate the values of CoverSize to the size of an optimal solution. Let us first show that the latter is equal to min(CoverSize).
Theorem 1. Any optimal cover $C^*$ has a nonzero probability of being output by Choose Outsiders First. Hence, the optimal size of a cover is min(CoverSize).
Proof. Let $C^*$ be an optimal cover. Consider a run of the algorithm in which, whenever possible, the randomly picked vertex is chosen in $U \setminus C^*$. Let us show by induction that at each step, $OUT \cap C^* = \emptyset$ and $IN \subseteq C^*$. Note that this is trivially true at the beginning of the algorithm.
Suppose now that it is true at some point just before a random vertex is picked, and suppose that no vertex in $U \setminus C^*$ is available. Then $A \subseteq C^*$, $IN \subseteq C^*$ and $OUT \cap C^* = \emptyset$, that is, $U \setminus OUT = C^*$. But if there is a vertex $v$ in $A$, it was not put into $IN$ in the previous round, which means that the condition at Line 5 was not satisfied. Hence, $U \setminus (OUT \cup \{v\}) = C^* \setminus \{v\}$ is a cover, which contradicts the minimality of $C^*$. Consequently, a vertex of $U \setminus C^*$ has to be available, and it is such a vertex which is chosen. Thus the two desired set relations are still valid after Line 4.