Frequency-driven tabu search for the maximum s -plex problem

(1)

Frequency-driven tabu search for the maximum s-plex problem

Yi Zhou

^a

and Jin-Kao Hao

^a,b,

∗

aLERIA, University of Angers, 2 bd Lavoisier, 49045 Angers, France

bInstitut Universitaire de France, Paris, France

Computers and Operations Research, http://dx.doi.org/10.1016/j.cor.2017.05.005

Abstract

The maximum s-plex problem is an important model for social network analysis and other studies. In this study, we present an effective frequency-driven multi- neighborhood tabu search algorithm (FD-TS) to solve the problem on very large networks. The proposed FD-TS algorithm relies on two transformation operators (Add and Swap) to locate high-quality solutions, and a frequency-driven perturbation operator (Press) to escape and search beyond the identified local optimum traps. We report computational results for 47 massive real-life (sparse) graphs from the SNAP Collection and the 10th DIMACS Challenge, as well as 52 (dense) graphs from the 2nd DIMACS Challenge (results for 48 more graphs are also provided in the Appendix). We demonstrate the effectiveness of our approach by presenting comparisons with the current best-performing algorithms.

Keywords: Clique relaxation; Heuristic; Massive network;s-plex.

1 Introduction

Given a simple undirected graph G = (V, E) with a set of vertices V and a set of edges E, let N(v) denote the set of vertices adjacent to v in G. Then, an s-plex for a given integer s ≥ 1 (s ∈ Z⁺) is a subset of vertices C ⊆ V that satisfies the following condition: ∀v ∈ C, |N(v)∩C| ≥ |C| −s. Thus, each vertex of an s-plexC must be adjacent to at least |C| −s vertices in the subgraph G[C] = (C, E∩(C×C)) induced by C.

∗ Corresponding author.

Email address: hao@info.univ-angers.fr(Jin-Kao Hao).

(2)

The maximum s-plex problem involves finding, for a fixed value of s, an s- plex of maximum cardinality among all possible s-plexes of a given graph. As indicated in [3], the maximum s-plex problem can be formulated as a binary linear program as follows:

max ω_s(G) = ^P_i∈V x_i

s.t.^P_j∈V\(N(i)∪{i})x_j ≤ (s−1)x_i+ ¯d_i(1−x_i),∀i∈V , x_i ∈ {0,1},∀i∈V

(1)

where x_i is the binary variable associated with vertex i, such that x_i = 1 if vertexiis in ans-splex,x_i = 0 otherwise. Also, ¯d_i =|V \N(i)| −1 denotes the degree of vertex i in the complement graph ¯G = (V,E). Note that¯ i /∈ N(i) by definition.

The s-plex concept was first introduced for graph-theoretic social network studies [27]. The decision version of the maximum s-plex problem with any fixed positive integer s is known to be NP-complete [3]. Whens equals 1, the maximum s-plex problem reduces to the popular maximum clique problem, the decision version of which was among Karp’s 21 NP-complete problems [14].

The maximums-plex problem is often referred to as aclique relaxation model [23, 24]. Other clique relaxation models include s-defective clique [37], quasi- clique [8,21,22], andk-club [7], which are defined by relaxing the edge number, the edge density, and the pairwise distance of vertices in an induced subgraph, respectively. In addition to studies of social networks, the maximum s-plex problem has also been investigated in other contexts [5, 6, 10]. For instance, an interesting application of the maximum s-plex model was described in [6], where the maximum s-plex algorithm of [29] was used to find profitable diversified portfolios on the stock market.

Similar to a clique, an s-plex C has the heredity property, which means that every subset of vertices C⁰ ⊂C remains an s-plex, i.e., the subgraph induced by C⁰ always has the property of an s-plex [29]. The most successful combinatorial algorithms for s-plex essentially rely on the heredity property and a polynomial feasibility verification procedure. For example, a powerful exact algorithmic framework was introduced in [29] for detecting optimal hereditary structures (s-plex and s-defective clique), which is based on the maximum clique algorithm proposed in [20]. This algorithm performed well on the maximum s-plex problem for graphs in the 2nd DIMACS Challenge and popular large-scale social networks. Other exact algorithms for the s-plex problem include the following. A branch-and-cut algorithm was introduced in [3] based on a polyhedral study of the s-plex problem. Two branch-and-bound algorithms were presented in [16], which are based on popular exact algorithms for the maximum clique problem [9,20]. In [19], exact combinatorial algorithms

(3)

were investigated using methods from parameterized algorithmics. Finally, a parallel algorithm for listing all the maximals-plexes was introduced in [36].

However, given the computational complexity of the maximum s-plex problem, any exact algorithm is expected to require an exponential computational time to determine the optimal solution in the general case. Thus, it is useful to investigate heuristic approaches, which aim to provide satisfactory solutions within an acceptable time frame, but without a provable optimal guarantee for the solutions obtained. However, our literature review found only two heuristics for the maximum s-plex problem [12, 17], which are based on the general GRASP method [26]. This situation contrasts sharply with the huge body of heuristics for the conventional maximum clique problem [34] and other clique relaxation problems [24]. We note that exact and heuristic approaches may complement each other, and together they can enlarge the classes of problem instances that can be solved effectively. Moreover, they can even be combined within a hybrid approach, as exemplified in [17] where the GRASP heuristic was used to enhance the exact algorithm proposed in [3] to solve very large social network instances.

In this study, we aim to partially fill the gap in terms of heuristic methods for solving the maximums-plex problem by introducing an effective heuristic approach. The main contributions of this study can be summarized as follows.

• From an algorithmic perspective, this is the first study to employ the tabu search metaheuristic [11] to solve the maximum s-plex problem (Section 2). Thus, the proposed frequency-driven tabu search algorithm (FD-TS) integrates several original components. First, FD-TS jointly employs three dedicated move operators calledAdd,Swap, andP ress, two of which (Swap and P ress) are applied for the first time to the maximum s-plex problem.

Second, we introduce a frequency-based mechanism for perturbation and constructing initial solutions, which is proven to be more effective than a random mechanism. We also apply a peeling procedure to dynamically reduce the graph with the best identified lower bound. Finally, specific design decisions are made in order to handle very large networks with thousands and even millions of vertices.

• From a computational perspective, our experimental results indicate that the proposed algorithm performs very well with both sparse and dense graphs (Section 4). For 47 very large networks from the SNAP collection and the 10th DIMACS Challenge benchmark set, our algorithm success- fully obtained or improved the best-known results from previous studies for s = 2,3,4,5. Our algorithm even proved the optimality of many instances for the first time using the peeling procedure. For 52 dense graphs from the collection used in the 2nd DIMACS Challenge, our algorithm also obtained or improved the best-known results for s = 2,3,4,5. To comprehensively assess the performance of our algorithm, we compared FD-TS with several

(4)

cutting edge algorithms, including the commercial CPLEX solver (version 12.6.1). Results of 48 additional graphs for the s-plex problem are also presented for the first time in the Appendix.

The remainder of this paper is organized as follows. Section 2 presents the FD-TS algorithm. Section 3 discusses the implementation and complexity issues related to FD-TS. Section 4 presents the computational results obtained on benchmark instances and provides comparisons with state-of-the-art algorithms. In the final section, we give our conclusions and discuss future research.

2 FD-TS algorithm for the maximum s-plex problem

2.1 General procedure

The general scheme of the proposed FD-TS algorithm is shown in Algo- rithm 1. FD-TS starts from an initial feasible solution (s-plex) built using the Init Solution() procedure (Section 2.4), before entering the main multi- neighborhood local search procedure, Freq Tabu Search(), to improve the initial solution (Section 2.5). A vectorf req, which records the number of times each vertex is moved in the last round of the Freq Tabu Search() procedure, is initialized as a null vector (Algorithm 1, line 3). This vector is used by theInit Solution() procedure as well as the perturbation method explained in Section 2.5.3. If the solution returned by tabu search is better than the current best solutionC^∗, C^∗ is updated (Algorithm 1, lines 7–8). The new lower bound |C^∗| is then given to the Peel() procedure (Section 2.6) to reduce the current graph (Algorithm 1, line 9). IfPeel() returns a reduced subgraph with fewer vertices than|C^∗|, thenC^∗ must be an optimal solution and the overall algorithm stops. Otherwise, the algorithm enters a new round of search to build a new starting solution with Init Solution(), before improving the new starting solution withFreq Tabu Search() and reducing the graph withPeel() if this is possible. The algorithm continues until a given stopping condition (e.g., a cut-off time limit) is met.

2.2 Preliminary definitions

Given G = (V, E), s ∈ Z⁺, let C ⊆ V be a subset of vertices and N(v) the set of vertices adjacent tov. The following definitions are provided, which are useful for the description of our algorithm.

We say that C is a (feasible) solution or an s-plex if ∀v ∈ C,|N(v)∩C| ≥

(5)

Algorithm 1: Main framework of Frequency Driven Tabu Search

Input: Problem instance (G, s), predefined sample size q, maximum allowed iterations in tabu searchL.

Output: The largest s-plex ever found begin

1

C^∗ ← ∅; /* the best solution found so far */

2

f req(v)←0 for all v ∈V; /* frequency count of vertex moves */

3

while the stopping condition is not metdo

4

{C, f req} ←Init Solution(G, s, f req, q); /* §2.4 */

5

{C, f req} ←F req T abu Search(G, s, C, f req, L); /* §2.5 */

6

if |C|>|C^∗|then

7

C^∗ ←C;

8

G←P eel(G, s,|C^∗|); /* §2.6 */

9

if |V| ≤ |C^∗| then

10

returnC^∗ ; /* return the best solution found */

11

end

12

return C^∗

13

|C| − s; otherwise, C is an infeasible solution (i.e., ∃v ∈ C,|N(v) ∩C| <

|C| − s). For a vertex v ∈ C, we say that v is saturated (first introduced in [29]) if |N(v) ∩ C| = |C| − s. If |N(v)∩ C| < |C| − s, v is deficient.

Obviously, whenever a deficient vertex exists inC,C is an infeasible solution.

The saturated set S of set C is defined as the set of all saturated vertices in C, i.e., S ={v ∈C :|N(v)∩C|=|C| −s}. We can see that if C is a 1-plex (i.e., a clique), then all the vertices in C are saturated. The search space Ω of G includes all s-plexes, Ω = {C ⊆V :∀v ∈ C,|N(v)∩C| ≥ |C| −s}. For brevity, we also use N(C) to denote the set of vertices in V \C with at least one adjacent vertex in C, N(C) = ^S_v∈CN(v)\C.

In Figures 1 and 2, we provide examples ofSandN(C), as well as their uses in the definitions of the Add and Swap operators introduced in the next section.

Finally, the quality of any candidate solution (s-plex) C ∈Ω is evaluated by its cardinality|C|. Thus, given two candidate solutionsC⁰ and C,C⁰ is better than C if |C⁰|>|C|.

2.3 Move operators

Our FD-TS algorithm explores the search space Ω by jointly applying three move (or transformation) operators,Add, Swap, and P ress, to generate new solutions in Ω from the current solution (ors-plex). If we letC be the incumbent solution, then each move operator transfers one vertex v ∈N(C) inside

(6)

0 5 10 15 20 2.5

3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5

0

1 2

3

Fig. 1. Suppose s = 2, C = {0,1,2} is the incumbent solution, S = {1,2} is the saturated set ofC, thenM1 ={3} and C∪ {3} is an extendeds-plex.

0 2 4 6 8 10 12

2 3 4 5 6 7 8 9 10

0

1 2

3

4

5

Fig. 2. Suppose s= 2,C ={0,1,2,3} is the incumbent solution, S ={1,2} is the saturated set ofC,N(C) ={4,5}. Then for theSwapoperator, we getA={4}and exchangeable pair<4,1>;B={5}and exchangeable pairs<5,0>and<5,3>.

C and eliminates zero, one, or more vertices from C to keep C feasible. If we let OP be a move operator, then we use C⁰ ← C ⊕OP(v, X) to denote the new (neighboring) solution obtained by applying OP toC (X represents the subset of vertices eliminated from C, which can possibly be empty). The details of these operators are described as follows. For simplicity, when the subset of eliminated verticesX is empty or a singleton, the set notation of X is ignored.

(1) Add(v): This operator extends the incumbent solution C by including a new vertex from N(C). Clearly, each application of this operator will increase the cardinality of the solution by one, which always leads to a better solution. However, we must take special care to ensure that the extended solution remains an s-plex. Thus, we identify the following

(7)

vertex subset M₁ ⊆ N(C), which has the required feasibility property (first introduced in [29]):

M₁ ={v ∈N(C) :|N(v)∩C| ≥ |C| −s+ 1, S\N(v) =∅} (2) By definition (2), if a vertexv inN(C) is adjacent to at least|C|−s+1 vertices inC and adjacent to all the saturated vertices of C, then adding v toC yields a new solutionC⁰, the size of which is increased by one (see Fig. 1 for an example).

The set of neighboring solutions ofC induced by Add(v) is then given by:

N_Add ={C⁰ :C⁰ ←C⊕Add(v), v ∈M₁} (3) TheAddoperator is used by the search algorithm to improve the quality of the incumbent solution.

(2) Swap(v, u): This operator exchanges a vertex v ∈ N(C) with another vertex u in the incumbent solutionC (u∈C), while keeping the quality of the solution unchanged. Similar toAdd, to ensure the feasibility of the transformed solutions, we need to identify the set of suitable candidate pairs < v, u > ∈ N(C)×C. Considering the definition of s-plex, a pair of vertices < v, u > is eligible for exchange only if it satisfies one of the following two conditions.

• First,v is adjacent to at least|C| −s vertices inC and uis the unique saturated vertex that is not adjacent to v (i.e., S\N(v) = {u}).

• Second,v is adjacent to exactly|C| −s vertices in C and these |C| −s vertices must include all the saturated vertices (i.e.,S\N(v) = ∅), and u is an arbitrary vertex from C\N(v).

We use setsA and B to denote the candidate sets of v that satisfy the two conditions above, respectively, (see Fig. 2 for an example of these two types of vertices):

A={v ∈N(C) :|N(v)∩C| ≥ |C| −s,|S\N(v)|= 1}

B ={v ∈N(C) :|N(v)∩C|=|C| −s, S\N(v) =∅} (4) The set of neighboring solutions induced by Swap(v, u) is then given by:

NSwap={C⁰ :C⁰ ←C⊕Swap(v, u),(v ∈A, u∈S\N(v))∨(v ∈B, u∈C\N(v))}

(5) If we let M₂ = A∪B, a practical method for generating a suitable pair < v, u > is to first build the set M2, then pick a vertex v from M2, and finally determine an appropriate vertexufromS\N(v) orC\N(v).

The Swap operator is used by the search algorithm to visit neighboring

(8)

solutions of equal quality, so a transition withSwap is also called a side- walk move.

(3) P ress(v, X): If we let M₃ =N(C)\(M₁ ∪M₂), this operator adds one vertex v ∈ M3 to C and then eliminates two or more vertices from C \ N(v) so thes-plex structure of the new solution is maintained. Obviously, given a vertex v /∈ M₁, the set C⁰ = C ∪ {v} is an infeasible solution because v or the vertices in S\N(v) become deficient (see Section 2.2) inG[C⁰]. Therefore, to restore the feasibility of the transformed solution, this operator iteratively eliminates vertices from (C⁰ \ {v})\N(v) (i.e., C\N(v)) until the solution becomes ans-plex. Preference is given to the deficient vertices inC\N(v) and if no deficient vertex exists inC\N(v), the vertices to be eliminated are selected randomly from C. All of the eliminated vertices are collected in X. The set of neighboring solutions induced by the P ress(v, X) operator is given by:

N_{P ress} ={C⁰ :C⁰ ←C⊕P ress(v, X), v ∈V \C, X ⊆C\N(v)} (6) By definition, P ress eliminates at least two vertices from the solution (i.e., |X| >1). Thus, the application of P ress always degrades the quality of the current solution. Hence, this operator is only used by the perturbation procedure when the Add andSwap operators are no longer applicable, or when the search stagnates in local optima.

Next, we provide some additional comments about these operators and discuss some implementation issues.

• AddandSwaphave been used in several algorithms for the maximum clique problem and the equivalent maximum independent set problem [4,13,25,35].

However, given the generality of the maximum s-plex problem, the definitions of these operators are different and more complex in the present study;

in particular, the saturated setS must be involved. TheP ressoperator can be treated as a dedicated application to the s-plex problem of the P U SH operator introduced in [38] for the maximum weight clique problem.

• It should be noted that traditional maximum clique benchmark graphs, such as those from the 2nd DIMACS Challenge, include many dense graphs.

Thus, most maximum clique algorithms typically operate on the complement graph ¯Gto search for the maximum independent sets in terms of run- time efficiency [13, 25, 33]. The FD-TS algorithm proposed in the present study is designed to handle very large real-world networks, which are typically sparse graphs. Consequently, FD-TS operates directly on the original graph based on its adjacency-list representation. In addition, we restrict the candidate vertices considered by the three move operators to N(C) instead of V \C becauseN(C) is much smaller thanV \C for very large networks.

Indeed, given the size of the networks considered (up to millions of ver-

(9)

tices), even a simple operation such as scanning all the vertices of V \C becomes too expensive, and it can slow down the algorithm considerably.

Thus, special care is taken to avoid ineffective or unpromising examinations.

• When Swapand P ress are applied, each dropped vertex is forbidden from rejoining the solution during a number of subsequent iterations to avoid revisiting previously examined solutions. This is achieved by using a tabu list (see Section 2.5.4).

2.4 Constructing the initial solutions

Each round of the FD-TS algorithm requires a starting solution (see Algorithm 1, line 5). In general, the starting solutions can be generated by any method that ensures the s-plex property. In our method, we employ the following construction procedure, which applies the Addoperator while considering the frequency information for vertex moves.

From a random sample ofqvertices (q∈[50,150]), we use the vertex with the minimum frequency (ties are broken randomly) to create a singleton set C.

From the singleton s-plex C, the procedure repeats the following three steps to extend the current solution: 1) generate the M₁ set from N(C), 2) select one vertex with the minimum frequency from M₁ (ties are broken randomly), and 3) add the selected vertex toC. We repeat this process untilM₁ becomes empty. The final s-plex C is returned as the initial solution.

The intuitive assumption that less frequently moved vertices are preferred is intended to make the initial solution as diverse as possible. Moreover, using a sample of q vertices instead of the whole set V for seeding the solution helps to reduce the computational overheads in the initialization procedure. This is particularly true for massive graphs because scanning all the vertices of the graph can be time consuming in this case. In Section 4.2, we discuss the calibration of the parameter q.

2.5 FD-TS

2.5.1 General procedure

The key search procedure employed in the FD-TS algorithm (see Algorithm 2) combines a double-neighborhood search procedure (we refer to this procedure as TS²; Algorithm 2, lines 11–24) to facilitate intensification (to obtain local optima) and a frequency-based perturbation procedure (we refer to this procedure as PERTURB; Algorithm 2, lines 25–30) for diversification (to escape from local optima).

(10)

Algorithm 2: Frequency driven tabu search

Input: Problem instance (G, s), current solutionC, max. allowed iterationsL

Output: The largests-plex foundCbest

begin 1

Cbest←C;

2

tabu list← ∅;

3

l←0 ; /* Counter of cycles of TS² + PERTURB runs */

4

f ix← ∅; /* The vertices that are forbidden to drop */

5

f req(v)←0 for allv∈V ; /* Reset frequency records */

6

λ←0 ; /* Counter of consecutive non-improving iterations */

7

// Run TS² + PERTURB a maximum of L iterations

whilel≤L do

8

// Start the two neighborhood tabu search procedure - TS²

Updating the saturated subsetS; /* Sect. 2.2 */

9

Decompose setN(C) intoM₁⁰={v∈M1:v /∈tabu list∨ |C|+ 1>|C_best|},

10

M₂⁰ ={v∈M2:v /∈tabu list},M₃⁰={v∈M3:v /∈tabu list}; /* Sect. 2.3 */

ifM₁⁰ 6=∅then 11

v←a random vertex fromM₁⁰;

12

C←C⊕Add(v);

13

f req(v)←f req(v) + 1;

14

else ifM₂⁰6=∅then 15

(v, u)←two random exchangeable vertices fromM₂⁰×N(C); /* See Sect. 2.5.2 for

16

selection rule */

C←C⊕Swap(v, u);

17

Addutotabu listwith tabu tenureTu; /* Sect. 2.5.4 */

18

f req(v)←f req(v) + 1,f req(u)←f req(u) + 1;

19

if|C|>|C_best|then 20

Cbest←C;

21

λ←0;

22

else 23

λ←λ+ 1;

24

// Run the PERTURB procedure

if(No feasibleAddandSwapoperation)∨(λ > s∗ |C_best|)∧M₃⁰ 6=∅) then

25

v←a vertex with maximumf req(v) fromM₃⁰, break ties randomly;

26

C←C⊕P ress(v, X) ; /* X collects the removed vertices from C, Sect. 2.3 */

27

f ix← {v}, λ←0;

28

f req(v)←0,f req(u)←f req(u) + 1 for alluinX; 29

Add eachu∈X totabu listwith tabu tenureTu; /* Sect. 2.5.4 */

30

l←l+ 1 31

end 32

ReturnC_best,f req;

33

Based on a given initial solution, TS² usesAdd and Swapto improve the current solution until search stagnation occurs. PERTURB then appliesP ressto modify (perturb) the current local optimum and passes the modified solution to TS² for further improvement. FD-TS iterates this TS²+PERTURB process a maximum ofLtimes and then starts the next round of its search procedure.

In addition to the three move operators (Add,Swap, andP ress), FD-TS employs a tabu mechanism (see Section 2.5.4) [11] and a frequency technique to ensure the effective exploration of the search space. A tabu list (tabu list) is used to mark the vertices that are forbidden from joining the current solution during a specific number of iterations. Information related to the move frequency of each vertex v is collected, where f req(v) records the number of times that v is operated upon by a move operator in the recent history. Ini- tially, tabu list is empty and the frequency of each vertex is set to 0. The

(11)

f ixset (a singleton set used by the PERTURB procedure) records an added vertex, which is forbidden from moving out of the current solution in TS² and the counter l counts the completed iterations, the upper limit of which is given by the parameter L. Moreover, in order to avoid being trapped by local optima, a counter λ records the consecutive iterations that have passed since the last improvement of the current solution. Each time that the counter λ reaches a threshold, the perturbation procedure is triggered to modify the current solution using theP ress operator.

At the beginning of each iteration, set C is checked to identify the saturated subset S. Next, N(C) is decomposed into three disjoint subsets: M₁⁰, M₂⁰, and M₃⁰. These sets correspond to M₁, M₂, and M₃ (defined in Section 2.3), respectively, but they exclude the vertices in the tabu list. However, a vertex v ∈ M1 is always retained in M₁⁰ if adding v to C leads to a new solution that is better than the best solution found previously, i.e., |C|+ 1 > |C_best|, regardless of the tabu status of the vertex.

2.5.2 Solution improvement with Add and Swap

The current solution is transformed successively by applying Add and Swap.

Preference is given to theAddoperator. Thus, wheneverM₁⁰ is not empty,Add is applied to improve the current solution by adding one vertex of M₁⁰ to the solution (Algorithm 2, lines 11–14). If no vertex can be added to the solution (M₁⁰ =∅), butM₂⁰ is not empty, then the search continues with theSwap(v, u) operator by visiting solutions of equal quality (Algorithm 2, lines 15–19).

To provide Swap(v, u) with an appropriate exchangeable pair < v, u >, v is first selected randomly fromM₂⁰, anduis then selected from the candidate set defined by the rules given in Section 2.3, while excluding the vertex recorded in f ix. In particular, if v is a vertex of type (set) A, then u is selected from S\(N(v)∪f ix) (which is a trivial set with zero or one vertex); and if v is a vertex of type (set) B, u is selected randomly fromC\(N(v)∪f ix). If there is no eligible candidate for u (S\N(v) = f ix or C \N(v) = f ix), then we simply give up attempting to applySwap(v, u) and move on to the PERTURB procedure (Algorithm 2, line 25).

2.5.3 Perturbation with Press

Inevitably, at a certain search stage, no candidate vertex v or candidate pair

< v, u >is available for the Add(v) orSwap(v, u) operator (i.e., both M₁⁰ and M₂⁰ are empty or no eligible vertex can be found foru), or the search stagnates on the Swap(v, u) operator. In the latter case, search stagnation occurs when the current solution has not been improved fors∗|C_best|consecutive iterations.

The self-adaptive threshold, s∗ |C_best|, is identified based on the assumption

(12)

that if the current s-plex cannot be improved even after the replacement of each of its vertices at least s times by the Swap operator, then no better solution can be found in the current search region. To continue the search, the algorithm triggers the PERTURB procedure in order to move the search to a distant new region. This procedure applies theP ress(v, X) operator, where v is the vertex fromM₃⁰ with the largestf req value and X collects the dropped vertices from C \N(v) to recover a feasible solution (Algorithm 2, lines 26–

27). It is important to reset the frequency of vertex v (Algorithm 2, line 29), or the accumulated frequency of this vertex could dominate other vertices in subsequent cycles. The dropped vertices in X are added to the tabu list (Algorithm 2, line 30) and they will not be considered during the forbidden period, as explained in the next section.

2.5.4 Tabu tenure and management

As mentioned above, to avoid revisiting recently examined solutions, we use a tabu list to record the vertices dropped from the current solution in order to exclude them from consideration during a number of consecutive iterations.

According to the definitions in Section 2.3, each application of Swap(v, u) or P ress(v, X) removes one or more vertices from the current s-plex. Each dropped vertex u will be kept in the tabu list for the next T_u iterations (the tabu tenure), which is set according to the following two rules:







T_u = 10 +random(0,|M₂|), uis dropped by Swap(v, u) T_u = 7, u∈X is dropped byP ress(v, X)

(7)

where random(0, I) is a random integer in {0, . . . , I}. These rules are based on previous studies of the related maximum clique problem [13, 35]. The first rule estimates the forbidden period for vertices for side-walk moves with the Swapoperator, which ensures that a dropped vertex will not be reconsidered for at least 10 iterations, and the second rule for theP ressoperator prevents any dropped vertex from being reconsidered for a small number of iterations (seven in this case). Based on experiments, we observed that other values around these tabu tenures obtained similar performance. Thus, we selected the values used in [13, 35].

Finally, the tabu list is more useful when the number of candidate vertices for Swap or P ress is limited, because a dropped vertex u will have a high probability of being added again to the solution if it is not prohibited. However, if numerous candidate vertices exist (such as in massive graphs), there is little chance of a dropped vertex being re-selected immediately. Thus, the tabu mechanism is more useful for graphs of limited size than massive graphs.

(13)

2.6 Reducing large (sparse) graphs

Given a graph G= (V, E) and a parameters, suppose that after tabu search (Algorithm 1, line 6), the current bests-plex has cardinality|C^∗|(lower bound of the maximum s-plex ofG). Clearly, to further improve C^∗, considering any vertex in V with a degree smaller than or equal to |C^∗| − s would not be beneficial because such a vertex cannot extend C^∗. Thus, these vertices can be safely removed from the graph [1]. In FD-TS, we explore this strategy using the P eel(G, s,|C^∗|) procedure (Algorithm 1, line 9), which recursively deletes the vertices (and their incident edges) with a degree less than or equal to

|C^∗| −s until no such vertex exists. Finally, if the subgraphs obtained after P eel(G, s,|C^∗|) have fewer vertices than |C^∗|, then C^∗ must be an optimal solution because no better solution can exist.

For very dense graphs, the P eel procedure may not reduce the graph size greatly because the degrees of most vertices will remain larger than |C^∗| −s.

However, this technique is highly effective when it is applied to large sparse graphs such as massive real-world complex networks. As shown in Section 4.3, by using the high-quality lower bound |C^∗| provided by our tabu search procedure, this pruning technique can effectively reduce large sparse graphs to very small graphs (even the null graph).

We note that the idea of removing unpromising vertices was used previously in a GRASP heuristic for detecting dense subgraphs (quasi-cliques) in massive sparse graphs [1], as well as in several exact algorithms for the maximum clique and s-plex problems [3, 29, 32].

3 Implementation and time complexity

To effectively implement FD-TS, we maintain two structures: the vectordegC[v]

(i.e.,deg_C[v] =|N(v)∪C|, v ∈V) and the setN(C) (i.e.,N(C) =^S_v∈CN(v)\ C), which are updated whenever the current solution C changes. Thus, each time a vertex (say u) is added to or removed from C by a move operator, we increase or decrease deg_C[v] by one for each v ∈ N(u). Since the set N(C) must only contain vertices with deg_C[v]>0, it is also adjusted when deg_C[v]

changes.

Next, we discuss the time complexity of the main components of the proposed algorithm. First, we consider the procedure for constructing initial solutions (Section 2.4). In each iteration, we need to build the subset M1 from N(C) and update deg_C[v] and N(C) after adding a vertex, which can be achieved inO(|N(C)|+ ∆) (∆ = maxv∈V{|N(v)|}).

(14)

The efficiency of TS² (Section 2.5) and PERTURB (Section 2.5.3) is closely related to the method used for building setsM₁⁰,M₂⁰, andM₃⁰ fromN(C) from scratch in each iteration. In our implementation, we build the saturated set S (Algorithm 2, line 9) from C in a time of O(|C|) at the very beginning of each iteration. Then, for each vertex inN(C) (e.g., u), we count the number of saturated vertices in the set of vertices adjacent to u, i.e., |S∩N(u)| (the saturated connectivity of u). Obviously, if the saturated connectivity of u is 0, S\N(u) =∅; and if the saturated connectivity of u is 1, |S\N(u)| = 1.

According to the definitions of M₁, M₂, and M₃, once the saturated connectivity of u and degC[u] is known, it is trivial to identify u as an element of M₁⁰, M₂⁰ orM₃⁰. Consequently, decomposing N(C) (Algorithm 2, line 10) can be achieved in O(|N(C)| ∗∆).

Next, we consider the time complexity when employing the move operators.

First, for theAdd operator, we only need to update the vectordeg_C[v] and set N(C) after reallocating a vertexv, which can be achieved inO(∆). Second, to apply Swap with a vertexv ∈N(C), we first need to identify the other vertex u ∈ C. According to the rule defined in Section 2.5.2, the sets S\N(v) and C\N(v) can be identified by traversing setsCandN(v) respectively. Thus, the time required to identifyuis bounded byO(|C|+∆) =O(2∗∆+s) = O(∆+s) (because |C| ≤ ∆ + s), while updating deg_C[v] and N(C) is bounded by O(2∗∆) =O(∆) because two vertices are displaced during each application of Swap. Finally, each application of the P ress operator can be achieved in O(∆²).

Overall, one operator (Add,SwaporP ress) is applied during one iteration, so the total time complexity of TS² and PERTURB for each iteration is bounded byO(|C|+|N(C)|∗∆+∆²). We note that for sparse graphs,|C|,N(C), and ∆ are usually extremely small compared with the number of vertices in a graph.

4 Computational assessment

4.1 Benchmarks

In this section, we present computational evaluations of the proposed FD-TS algorithm for the maximum s-plex problem based on the following three sets of 79 (19, 17, and 43, respectively) benchmark graphs.

• Stanford Large Network Dataset Collection (SNAP)¹. SNAP provides a large range of large-scale social and information networks [15], in-

1 http://snap.stanford.edu/data/

(15)

cluding graphs retrieved from social networks, communication networks, citation networks, web graphs, product co-purchasing networks, Internet peer-to-peer networks, and Wikipedia networks. Some of these networks are directed graphs, so we simply ignored the direction of each edge and eliminate self-looping and duplicated edges.

• The 10th DIMACS Implementation Challenge Benchmark (10th DIMACS)². This testbed contains many large networks, including artifi- cial and real-world graphs from different applications. This benchmark set is popular for testing graph clustering and partitioning algorithms. More information about the graphs can be obtained from [2].

• The 2nd DIMACS Implementation Challenge Benchmark (2nd DIMACS)³. This set is from the 2nd DIMACS Implementation Challenge for the maximum clique problem. These instances cover real-world problems (e.g., coding theory, fault diagnosis, and the Steiner triple problem) and random graphs. The instances range from small graphs (50 vertices and 1,000 edges) to large graphs (4,000 vertices and 5,000,000 edges). These DIMACS graphs are very popular and they are generally used as a testbed for evaluating clique and s-plex algorithms. Unlike the instances in the two first benchmark sets, most of these instances are dense graphs.

4.2 Experimental protocol and parameter tuning

The proposed FD-TS algorithm was implemented in C++⁴ and compiled by g++ with optimization option ‘-O3’. All experiments were conducted on a computer with an AMD Opteron 4184 processor (2.8 GHz and 2 GB RAM) running CentOS 6.5. When we solved the DIMACS machine benchmarking program fdmax.c⁵ without compilation optimization flag, the run time on our machine was 0.40, 2.50, and 9.55 seconds for graphs r300.5, r400.5, and r500.5, respectively.

An interesting feature of FD-TS is that it has very few parameters. In addition to the tabu tenure discussed in Section 2.5.4, the parameter q (the sample number of vertices in Section 2.4) was set to 100. As indicated in Section 2.4, q is not a sensitive parameter so we simply fixed it to the middle value in the range of [50,150]. However, according to our experiments, the best value of parameterL, which specifies the maximum number of iterations in each round of tabu search, was highly dependent on the instance considered. We compared different values in{10, 100, 1000, 5000}forLwiths = 2,3,4,5 for six selected instances from the 2nd DIMACS benchmark set (MANN a27, brock400 2,

2 http://www.cc.gatech.edu/dimacs10/downloads.shtml

3 http://www.cs.hbg.psu.edu/txn131/clique.html

4 We will make our program available.

5 ftp://dimacs.rutgers.edu/pub/dsj/clique/

(16)

brock800 2, C1000.9, keller6, p hat1500-3). As a trade-off, we retained L = 1000 because the algorithm achieved the best average solution quality in most cases. We also observed that fine tuning L for each pair (instances, s) could improve the performance. However, to report our computational results, we used the fixed settingL= 1000, which allowed the algorithm to obtain highly competitive results. Given the stochastic nature of our algorithm, we ran FD- TS to solve each instance 20 times for each givens, where each run was limited to a maximum of 180 CPU seconds (3 minutes).

4.3 Computational results for very large networks from SNAP and the 10th DIMACS Challenge

Table 1 shows the performance of FD-TS on 47 instances taken from the SNAP and 10th DIMACS benchmarks, namely, all37 instances tested in [29] for the s-plex problem, as well as 10 instances from the recent literature [32]. Note that in [32], only the best clique size was reported, which is just a lower bound of the maximum s-plex for s = 2,3,4,5. To be complete, we also report our results for the remaining 20 instances from [32] in the Appendix (Table A.1).

For each instance, the columns “Instance,” “|V|,” and “|E|” indicate basic information for the name, number of vertices, and number of edges, respectively. For eachs= 2,3,4,5, column “BKV” denotes the best-known objective values collected from [29] (for the first 37 instances) and [32] (for the last 10 instances). For the items in this column, an additional symbol “*” indicates that this objective value was proved to be optimal in [29], and the symbol ω shows that this value is the maximum clique size mentioned in [32] (which is a lower bound of the maximum s-plex). The “max” column shows the best objective value found by FD-TS among its 20 trials and the “time” column indicates the average time (in seconds) required for runs to obtain the best objective value (excluding the time spent reading the graph). The “|V⁰|” column shows the number of remaining vertices in the reduced subgraph after execut- ing the P eel(G, s, max) procedure (see Section 2.6). As mentioned earlier, if the number of vertices in the reduced subgraph (column “|V⁰|”) is less than or equal to the lower bound (column “max”), then the latter is guaranteed to be the optimal solution. In these cases, we put a “*” beside the value.

Moreover, for instances where the optimum could not be determined in ei- ther [29] or FD-TS, we conducted an additional experiment with the CPLEX solver (version 12.6.1). First, we reduced the original graph by applying the P eel(G, s, max) procedure. Then, for the reduced subgraph and a given s ∈ {2,3,4,5}, a cutoff time of one CPU hour was used when running CPLEX with the mathematical model (1) presented in Section 1. Experiments were conducted using the same machine employed for running FD-TS. The best

(17)

objective values found by CPLEX are listed in the “cplex” column, where “*”

indicates the optimal values. If CPLEX was unable to load the model, the entry is marked by “N/A.”

Table 1 shows that FD-TS always obtained the same or better objective values compared with the current best-known results (BKV values) (better objective values are highlighted with a bold font). In particular, FD-TS improved the best-known results for more instances as s increased (FD-TS found better solutions for 7,15,19,21 instances with s = 2,3,4,5, respectively). This observation indicates that the instances become more challenging for exact algorithms with a largers. Using the P eel procedure, FD-TS was also provably optimum for the first time for 6,6,6,10 instances and s = 2,3,4,5, respectively. For the cases where the number of vertices in the reduced subgraph remained larger than the lower bound given by FD-TS, the majority of these cases were manageable when the subgraph had less than 10,000 vertices (ex- ceptions include the instances 333SP, cage15, cit-Pattens, and wiki-Talk for s = 2). In terms of the computational time, FD-TS was able to obtain the best solutions in less than one second in most cases, including instances with millions of vertices. The average time required by FD-TS on cit-Patents for s= 3 was the longest but still less than one minute. Unfortunately, and sim- ilarly to CPLEX, it could not determine any solution better than that found by FD-TS in one hour when given the reduced subgraph. However, for the instances rgg n 2 17 s0 with s = 5, and rgg n 2 20 s0 with s = 4 and 5, optimal solutions were also obtained by CPLEX. Finally, we note that for the instances coPapersCiteseer, coPapersDBLP, and cond-mat-2005, the size of the maximum clique was the same as the size of the maximum s-plex for s= 2,3,4,5.

(18)

Table1 ComputationalresultsofFD-TSon47largenetworksfromtheSNAPCollectionandthe10thDIMACSImplementationChallenge. Instance|V||E|

s=2s=3s=4s=5 BKVmaxtime|V0|cplexBKVmaxtime|V0|cplexBKVmaxtime|V0|cplexBKVmaxtime|V0|cplex adjnoun1124256*60.00102-8*80.00102-8*80.0089-10*100.00102- celegansmetabolic453202510*100.00313-11*110.00429-13*130.00240-14*140.00429- dolphins621596*60.0053-7*70.0036-7*70.0045-9*90.0045- email1133545112*120.01238-12*120.02349-12*120.01349-13*130.01848- football11561310*100.01114-11*110.01115-12*120.00115-12*120.00115- jazz198274230*300.00130-30*300.00164-30*300.00127-30*300.00127- karate34786*60.0022-6*60.0010-8*80.0010-11a9*0.000- netscience1589274220*200.0050-20*200.00137-20*200.00158-2020*0.0020- polblogs14901671523*230.02541-27*270.03894-29*290.04489-32320.0329332* polbooks1054417*70.00103-9*90.00103-10*100.00105-11*110.00105- power494165946*60.0036-6*60.00231-680.01128*890.01129* PGPgiantcompo106802431629*290.03115-31*310.0243-33*330.0241-35*350.0241- as-22july06229634843619*190.02117-21*210.02110-22*220.04110-24*240.09104- astro-ph1670612125157*570.06113-57*570.05113-57*570.07113-57*570.07165- caidaRouterLevel19224460906620*200.312039-22230.5012901323240.7412901223260.82109814 cnr-2000325557273896985*85*6.750-86*8610.510-86*86*7.6686-86*86*8.0986- coAuthorsCiteseer22732081413487*87*0.2187-87*87*0.2187-87*87*0.2287-87*87*0.2287- coAuthorsDBLP299067977676115*115*0.27115-115*115*0.26115-115*115*0.27115-115*115*0.27115- cond-mat-20054042117569130*30*0.0830-30*30*0.0730-30*30*0.0830-30*300.0857- memplus177585419697*97*0.0897-97*97*0.1197-97*97*0.1197-97*97*0.1097- rggn217s013107272875316*16*0.150-17*17*0.150-18*18*0.140-18180.143418* rggn219s0524288326976619*19*0.690-19*19*0.6619-20*20*0.6419-2021*0.6619- rggn220s01048576689162018*181.4259-19*191.3759-19201.365920*19201.3317220* cit-HepPh3454642087724*240.374768-25270.273498525300.1923441125320.30180422 cit-HepTh2777035228528*281.204815-31310.703999N/A31340.812922631370.72172623 email-EuAll26521436448119*190.151357-22220.2211982022250.2210552327270.2698826 p2p-Gnutella0410876399945*50.226089-570.0954334690.10485757100.014857N/A p2p-Gnutella2426518653695*50.079271-560.029271N/A580.107480N/A590.087480N/A p2p-Gnutella2522687547055*50.037892-560.017892N/A580.0160915510*0.010- soc-Epinions17587940574028*280.0941561228320.0937911228370.0932801228390.16317810 soc-Slashdot08117736046918031*310.053665-33340.063188636380.122596736400.1024167 soc-Slashdot09028216850423032*320.053669-35350.063185636400.102435936420.1022877 web-BerkStan6852306649470202*2024.19392-202*2024.47392-202*2024.21392-202*2026.55392- web-Google875713432205146*46*1.130-47*47*1.240-48*48*1.340-48*48*1.2948- web-NotreDame3257291090108155*1550.741367-155*1550.711367-155*1550.841367-155*1550.991367- web-Stanford281903199263664*643.86741-64*645.61985-64654.779856564666.8098565 wiki-Vote711510076221*210.102098-24240.1120051226270.1219251627280.13192517 333SP3712815111086334(ω)51.192261408N/A4(ω)60.712261408N/A4(ω)70.512261408N/A4(ω)80.462261408N/A belgium.osm144129515499703(ω)5*0.660-3(ω)5*0.355-3(ω)6*0.375-3(ω)7*0.345- cage155154859470223466(ω)62.485135355N/A6(ω)85.685134115N/A6(ω)106.025091619N/A6(ω)1126.715091619N/A coPapersCiteseer43410216036720845(ω)845*7.92845-845(ω)845*7.20845845845(ω)845*7.01845-845(ω)845*8.51845- coPapersDBLP54048615245729337(ω)337*2.83337-337(ω)337*3.03337-337(ω)337*3.18337-337(ω)337*3.28337- amazon0312400727234986911(ω)12*0.540-11(ω)13*0.580-11(ω)14*0.610-11(ω)15*0.600- amazon0505410236243943711(ω)12*0.500-11(ω)13*0.580-11(ω)14*0.620-11(ω)15*0.640- amazon0601403394244340811(ω)12*0.750-11(ω)13*0.820-11(ω)14*0.790-11(ω)15*0.740- cit-Patents37747681651894711(ω)1721.2529585N/A11(ω)2151.0214717N/A11(ω)2616.236293N/A11(ω)3111.0936045 wiki-Talk2394385465956526(ω)321.1811327N/A26(ω)363.609814N/A26(ω)411.518376N/A26(ω)443.727719N/A Notea:Thevalueof11reportedin[29]for‘karate’iswronglyclaimedtobetheoptimalsolution.