
Automatic Design of Hybrid Stochastic Local Search Algorithms - Analysis and Application

Thesis submitted by Federico PAGNOZZI in fulfilment of the requirements of the PhD Degree in Docteur en Sciences de l'Ingénieur

Academic year 2018-2019

Supervisor: Prof. Thomas STÜTZLE


Acknowledgments

As with all things in life, when you reach one of your goals you feel the need to thank and acknowledge all the people that supported and helped you. This is that time for me. In the first place, I want to thank my family for all the support they gave me and for always encouraging me to pursue my interests. Secondly, I want to give special thanks to Garazi: sharing the last part of this journey with her has been a delight, and I cannot wait to share with her what comes next.

My path towards the beginning of this PhD has not been a straight line. After my master's degree, as I started working, I thought that my chances had already passed and that starting a PhD was not something I could do anymore. I owe my thanks to Gianpiero for convincing me otherwise.

I would like to extend special thanks to my supervisor, Thomas Stützle, for providing me with the best supervision a student could ask for. I will always try to reach the standard he sets as a scientist and, in general, as an amazing human being.

Iridia is a special place: working there, you become part of an amazing community always ready to help when you need it. So I would like to thank all the Iridians I was lucky enough to meet during these years. In particular, I would like to extend special thanks to Alberto for being the first user, beta tester and guinea pig of the EMILI framework.


Contents

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Publications
  1.4 Outline

2 Automatic design of stochastic local search algorithms
  2.1 Combinatorial optimization problems
    2.1.1 Computational complexity
    2.1.2 Permutation flowshop problem
  2.2 Solving combinatorial optimization problems
    2.2.1 Exact methods
    2.2.2 Approximate methods
    2.2.3 General SLS methods
  2.3 Automatic algorithm configuration
  2.4 Automatic algorithm design
  2.5 Summary

3 The EMILI framework
  3.1 Introduction
  3.2 Algorithmic components
    3.2.1 SLS templates
    3.2.2 Problem Independent Components
  3.3 Algorithm building
    3.3.1 Algorithm description language
    3.3.2 Parser architecture
    3.3.3 Instantiating SLS algorithms
  3.4 Permutation Flowshop
    3.4.1 Problem definition and objectives
    3.4.2 Permutation flowshop variants
    3.4.3 Problem modeling
    3.4.4 Algorithmic Components
  3.5 Summary

4 Applying automatic algorithm design to the permutation flowshop problem
  4.1 Introduction
  4.2 Scheduling Problems
  4.3 Automated SLS algorithm design with EMILI
    4.3.1 High-level view of EMILI
    4.3.2 Automatic design of SLS algorithms with EMILI
    4.3.3 Initial solution
    4.3.4 Iterative Improvement
    4.3.5 Neighborhood
    4.3.6 Termination criterion
    4.3.7 Perturbation
    4.3.8 Acceptance criterion
    4.3.9 Tabu Search
  4.4 Experimental Results
    4.4.1 PFSP: Makespan
    4.4.2 PFSP: Sum of completion times
    4.4.3 PFSP: Total tardiness
  4.5 Discussion and conclusions

5 Considering the PFSP with additional constraints
  5.1 Introduction
  5.2 Permutation Flowshop with additional constraints
  5.3 Automatic algorithm design
    5.3.1 Grammar based AAD with the EMILI framework
    5.3.2 Algorithmic components
  5.4 Experimental Results
  5.5 SDST
    5.5.1 Makespan
    5.5.2 Sum of completion times
    5.5.3 Total tardiness
  5.6 NO-IDLE
    5.6.1 Makespan
    5.6.2 Sum of completion times
  5.7 Conclusions

6 Analyzing algorithm complexity
  6.1 Introduction
  6.3 Methodology
    6.3.1 Automatic algorithm design
    6.3.2 The EMILI framework
    6.3.3 Grammars
    6.3.4 DAG based complexity metric
  6.4 Experimental Results
  6.5 Discussion and conclusions

7 Conclusions
  7.1 Summary of the contribution
  7.2 Future works
    7.2.1 The EMILI framework
    7.2.2 Automatic algorithm design

A Speeding up local search for the insert neighborhood in the weighted tardiness permutation flowshop problem
  A.1 Introduction
  A.2 Background
  A.3 Approximation based speed-up
  A.4 Experimental results
  A.5 Conclusions

B An iterated greedy algorithm with optimization of partial solutions for the makespan permutation flowshop problem
  B.1 Introduction
  B.2 Iterated greedy for the PFSP
  B.3 IG with local search on partial solutions
  B.4 Experimental Evaluation
    B.4.1 Experimental Setup
    B.4.2 Number of jobs to remove in the destruction phase
    B.4.3 Temperature for the acceptance criterion
    B.4.4 Comparison of the IG algorithms
  B.5 Conclusion


List of Figures

2.1 A general scheme of an Automatic Algorithm Design system
2.2 Part of a context-free grammar describing how to build different SLS algorithms that also allows hybridization.
3.1 Class diagram displaying the core classes of the EMILI framework for algorithm building
3.2 Class diagram displaying the SLS methods implemented in the EMILI framework
3.3 Class diagram displaying the problem independent termination criteria implemented in the EMILI framework
3.4 Class diagram displaying the problem independent acceptance criteria implemented in the EMILI framework
3.5 Class diagram displaying the problem independent perturbation methods implemented in the EMILI framework
3.6 Class diagram displaying the base Neighborhood implemented in the EMILI framework
3.7 The base grammar rules used to parse the components and build the algorithm
3.8 The grammar rules to build the base algorithm definitions defined in Section 3.2.1
3.9 The grammar rules used to build the base problem independent components defined in Section 3.2.2
3.10 Class diagram displaying the parser structure in the EMILI framework
3.11 The grammar rules of the components that can be built by ABuilder for the problem A
3.12 Sequence diagram showing the function calls involved in the parsing of the algorithm shown in Section 3.3.2
3.13 Class diagram displaying the classes modeling the different PFSP variants and their relation
3.14 Class diagram displaying the classes modeling different objectives of the PFSP
4.1 Context-free grammar that contains the rules used to build algorithm templates for this study. Note that the rules ILS and LocalSearch together define a recursion that can be exploited to generate hybridizations of various algorithms.
4.2 PFSPMS comparison on the Taillard benchmark: average RPD and 95% confidence intervals of IGall and IGirms for T = 60 (left), T = 120 (center) and T = 240 (right).
4.3 PFSPMS comparison on the VRF benchmark: average RPD and 95% confidence intervals of IGall and IGirms for T = 60 (left), T = 120 (center) and T = 240 (right).
4.4 Average RPD and 95% confidence intervals of IGA and ALGirtct for T = 60 (left), T = 120 (center) and T = 240 (right).
4.5 Average RDI and 95% confidence intervals of TSM63, IGRLS and ALGirtt for T = 60 (left), T = 120 (center) and T = 240 (right).
5.1 Context-free grammar that contains the rules used to build algorithm templates for this study. Note that the rules ILS and LocalSearch together define a recursion that can be exploited to generate hybrids combining various algorithms.
5.2 Average ARPD and 95% confidence intervals of EMBO, MRSILS, IGrs and IRstms for T = 60 (left), T = 120 (center) and T = 240 (right).
5.3 Average ARPD and 95% confidence intervals of IGrs and IRsttct for T = 60 (left), T = 120 (center) and T = 240 (right).
5.4 Average ARPD and 95% confidence intervals of IGrs and IRsttt for T = 60 (left), T = 120 (center) and T = 240 (right).
5.5 Average RDI and 95% confidence intervals of MANEH, GVNS and IRnims for T = 60 (left), T = 120 (center) and T = 240 (right).
5.6 Average RPD and 95% confidence intervals of VigDE and IRnitct for T = 60 (left), T = 120 (center) and T = 240 (right).
5.7 Average RDI and 95% confidence intervals of DTLM and IRnitt for T = 60 (left), T = 120 (center) and T = 240 (right).
6.1 Algorithmic components implemented in the EMILI framework that were used in this study
6.2 Number of parameters generated for each objective when converting the grammar with values of Rc from one to three.
6.3 Template of the context-free grammar used for this study.
6.4 A simple context-free grammar comprising five non-terminal and eleven terminal symbols.
6.5 Graph representing the parameters and the dependencies generated from the grammar in Figure 6.4
6.6 Two possible configurations for the parameters derived from the grammar in Figure 6.4
6.7 Correlation between ARPD and algorithm complexity for the PFSPMS considering three levels of maximum allowed recursion. Circles indicate algorithms generated without allowing any hybridization; with the triangles one level is allowed and with the squares two levels are allowed.
6.8 Correlation between ARPD and algorithm complexity for the PFSPTCT considering three levels of maximum allowed recursion. Circles indicate algorithms generated without allowing any hybridization; with the triangles one level is allowed and with the squares two levels are allowed.
6.9 Correlation between ARDI and algorithm complexity for the PFSPTT considering three levels of maximum allowed recursion. Circles indicate algorithms generated without allowing any hybridization; with the triangles one level is allowed and with the squares two levels are allowed.
A.1 Approximation mean speed-up when varying Ta with k = 1. Ta is a parameter that applies the approximation only to jobs that are inserted at or after the index Ta · n.
A.2 Approximation mean speed-up when varying k with Ta = 0. The parameter k indicates the number of machines used for the approximation of the completion times.
A.3 Mean speed-up for best improvement (left plot) and first improvement (right plot) in the insert neighborhood, using the approximation technique and delta evaluation over the complete calculation of the objective function value, on 12 different instance sizes.
A.4 Mean speed-up for best improvement (left plot) and first improvement (right plot) in the exchange neighborhood, using the approximation technique and delta evaluation over the complete calculation of the objective function value, on 12 different instance sizes.
B.1 Average RPD and 95% confidence intervals for different numbers of jobs removed (1 to 10), over the set of training instances, for IGLSPS−s (left) and IGLSPS−l (right).
B.2 Average RPD and 95% confidence intervals for different values of the parameter Tp in the acceptance criterion (from 0.0 to 1.5), over the set of training instances, for IGLSPS−s (left) and IGLSPS−l (right).
B.3 Average RPD and 95% confidence intervals for the two variants of LSPS, and the original algorithm, over the set of Taillard instances, for T = 60.
B.4 Average RPD and 95% confidence intervals for IGRS, IGLSPS−l, IGtb, IGtb+LSPS, IGnls, IGnls+LSPS, and IGall, in this order, on Taillard's benchmark set, for stopping criteria with T = 60 (top), T = 120 (middle) and T = 240 (bottom).
B.5 Average RPD and 95% confidence intervals for IGRS, IGLSPS−l, IGtb, IGtb+LSPS,

List of Tables

2.1 Instance of the PFSP composed of five jobs and three machines.
3.1 The heuristics to generate the initial solution implemented in EMILI for the PFSP
3.2 Neighborhood components implemented for PFSP problems
3.3 Type definition and parameters of the Perturbation components implemented for PFSP problems
3.4 Type definition and parameters of the Acceptance criteria implemented for PFSP problems
3.5 Type definition and parameters of the TabuMemory implemented for PFSP problems
3.6 Type definition and parameters of the LocalSearch components implemented for PFSP problems
4.1 Heuristics implemented for the generation of the initial candidate solution.
4.2 Iterative Improvement algorithms implemented
4.3 Neighborhood implementations
4.4 Termination criteria used to generate algorithms in this study
4.5 Perturbations used to generate algorithms in this study
4.6 Acceptance criteria used to generate algorithms in this study
4.7 Parameter settings for IGirms
4.8 ARPD results of IGall and IGirms for the two running times. If an algorithm is statistically significantly better, according to the Wilcoxon signed-rank test with a 95% confidence, this is shown in bold face.
4.9 Average RPD values obtained by IGall and IGirms on the different instance sizes of the VRF-large benchmark set, using three different values for T. The algorithm that is significantly better than the other according to the Wilcoxon signed-rank test with a 95% confidence is shown in bold face.
4.10 Parameter settings for ALGirtct
4.11 Average RPD results of IGA and ALGirtct for the three running times. The algorithm that is significantly better than the other, according to the Wilcoxon signed-rank test with a 95% confidence, is shown in bold face.
4.12 Parameter settings for ALGirtt
4.13 Average RDI results of TSM63, IGRLS and ALGirtt for the three running times. The algorithm that is significantly better than the others, according to the Wilcoxon signed-rank test with a 95% confidence corrected using Bonferroni, is shown in bold face.
5.1 Algorithmic components implemented in the EMILI framework that were used in this work.
5.2 Average RPD results of EMBO, MRSILS, IGrs and IRstms. If the result of one of the algorithms is in bold face, it is statistically significantly better than the others according to the Wilcoxon signed-rank test with a 95% confidence using the Bonferroni correction to take into account multiple comparisons.
5.3 Parameter settings for IRstms
5.4 Average RPD results of IGrs and IRsttct. If an algorithm is statistically significantly better according to the Wilcoxon signed-rank test with a 95% confidence, the result is shown in bold face.
5.5 Parameter settings for IRsttct
5.6 Average RPD results of IGrs and IRsttt. If an algorithm is statistically significantly better according to the Wilcoxon signed-rank test with a 95% confidence, the result is shown in bold face.
5.7 Parameter settings for IRsttt
5.8 Average RPD results of MANEH, GVNS and IRnims. If an algorithm is statistically significantly better according to the Wilcoxon signed-rank test with a 95% confidence with Bonferroni correction, the result is shown in bold face.
5.9 Parameter settings for IRnims
5.10 Average RPD results of VigDE and IRnitct. If an algorithm is statistically significantly better according to the Wilcoxon signed-rank test with a 95% confidence, the result is shown in bold face.
5.11 Parameter settings for IRnitct
5.12 Average RDI results of DTLM and IRnitt. If an algorithm is statistically significantly better according to the Wilcoxon signed-rank test with a 95% confidence, the result is shown in bold face.
5.13 Parameter settings for IRnitt
B.1 Average RPD values obtained by IGRS and IGall on the different instance


1 Introduction

1.1 Motivation

Combinatorial optimization problems can be found in many aspects of manufacturing, computer science, logistics and many other fields. These problems consist in combining a finite set of elements so that a cost measure is minimized or a quality measure is maximized. One of the best-known examples of such problems is the traveling salesman problem, where a salesman has to visit a set of cities in an order that minimizes the distance traveled. Another example, the permutation flowshop problem, involves finding the best schedule for the execution of a group of jobs in a shop.

Despite the great interest generated by their many practical applications, combinatorial optimization problems can be quite hard to solve. In fact, many combinatorial optimization problems, like the traveling salesman problem and the permutation flowshop problem, belong to a class of problems called N P-hard. To this day, no known algorithm is guaranteed to find the optimal solution of the problems in this class in polynomial time. The techniques used to solve these problems can be grouped into two classes, exact methods and approximate methods. Exact methods are guaranteed to eventually find the optimal solution, yet the time needed to find it may be impractical. On the contrary, approximate methods are not guaranteed to find the optimal solution but, in most cases, can find solutions of near-optimal quality in little time. Among these methods, local search algorithms have proven to be very successful.

A local search algorithm starts from some given solution and tries to find better solutions in a properly defined neighborhood of the current solution. In case a better solution is found, the current solution is replaced and the algorithm continues. One of the simplest local search algorithms, iterative improvement, keeps applying this strategy until it reaches a local optimum. A major disadvantage of this algorithm is that it may stop in a poor-quality local optimum. Being able to escape from local optima would greatly increase the performance of a local search, since the algorithm would be able to look for better solutions. One strategy is to use a larger neighborhood: the chances of finding a better solution would increase, but so would the time needed to evaluate the neighbors. Another strategy is to restart the algorithm from a randomly generated solution. Ultimately, these strategies are quite inefficient considering that the search space typically contains a huge number of local optima.

Stochastic local search (SLS) has been proposed to overcome these disadvantages. SLS algorithms can be defined as local search algorithms that use some degree of randomness in the way they explore the solution space [96]. These algorithms employ different escape strategies, such as moving, with a certain probability, to worse solutions, using memory to avoid previously visited solutions, or making visited solutions less attractive by using penalties. SLS algorithms include many of the most widely known high-performance algorithms for hard combinatorial optimization problems such as the traveling salesman problem [142, 89], the permutation flowshop problem [55, 161] and vehicle routing problems [171, 42].

In general, algorithms expose parameters that allow the user to adapt the algorithm to different problems and application scenarios. Properly setting these parameters can greatly improve the performance of a given SLS algorithm. The problem of finding the best setting for all the parameters of an algorithm has historically been solved using a manual process of trial and error. Manual approaches based on systematic experiments and statistical analysis have been proposed, but these methods become impractical when the number of parameters grows. Automatic algorithm configuration (AAC) [93, 59, 126] has been developed to solve this problem. Given a properly defined application scenario, automatic configuration tools treat the problem of finding the best parameter settings as an optimization problem, using techniques such as SLS algorithms [101], racing algorithms [131] or model-based approaches [103] to find the best configuration for a given algorithm.

Implementing an SLS algorithm to solve a given problem has usually been a manual engineering process. A designer would use their knowledge of the problem and their experience to choose one particular SLS algorithm, adapt the chosen SLS to the problem and, finally, find the best setting for its parameters. Ultimately, this manual process is open to inefficiencies, since it relies on the designer choosing the most appropriate SLS algorithm for the problem. It is possible to automate this process by using automatic configuration tools with a configurable algorithmic framework. Such frameworks implement one or more SLS algorithms in a modular way, where an algorithm is composed of different algorithmic components. For each aspect of the algorithm, a parameter is used to select among the different available components.

In other words, different algorithms can be instantiated by setting the parameters of the framework. Consequently, given a configurable algorithmic framework capable of instantiating SLS algorithms for some problem, using automatic configuration tools to find the best parameter settings leads to the generation of a high-performing algorithm for the considered problem. This process, called automatic algorithm design (AAD), can follow a top-down or a bottom-up approach. In the former, the framework can instantiate only one SLS algorithm; in the latter, the framework can instantiate more than one SLS algorithm, and different SLS algorithms can be combined to form hybrids. The top-down approach is easier to implement, because only one SLS algorithm is implemented, and the number of parameters is relatively small, with only one parameter for each aspect of the algorithm. The bottom-up approach, instead, requires a framework able to instantiate and combine several SLS algorithms. Moreover, the ability to instantiate different SLS algorithms, together with hybridization, allows the generation of completely new algorithms. The downside is that defining the parameters is not straightforward, since different SLS algorithms may require different numbers and types of components. These rules can be specified using a context-free grammar [137] that is then converted into parameters. Automatic algorithm design has been shown to generate high-performing algorithms that are able to outperform the state of the art on several combinatorial optimization problems [137, 114, 47, 21].
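To illustrate the idea, the following small Python sketch shows how the production rules of a grammar can be flattened into categorical parameters that an AAC tool can then configure. The grammar, the component names and the parameter encoding are all made up for this illustration; they are not the grammar or encoding used later in this thesis.

# Made-up grammar fragment, for illustration only.
grammar = {
    "<ls>":      ["ii(<neigh>)", "ils(<ls>, <perturb>)"],
    "<neigh>":   ["insert", "exchange"],
    "<perturb>": ["random_restart", "random_moves"],
}

def derive(symbol, params, depth=0, max_depth=2):
    """Expand a non-terminal, reading one categorical parameter per
    (symbol, depth) pair; max_depth bounds the <ls> recursion."""
    options = grammar[symbol]
    if symbol == "<ls>" and depth >= max_depth:
        options = options[:1]   # cut the recursion: no further hybridization
    production = options[params[(symbol, depth)] % len(options)]
    for nonterminal in grammar:
        if nonterminal in production:
            expansion = derive(nonterminal, params, depth + 1, max_depth)
            production = production.replace(nonterminal, expansion)
    return production

# One parameter setting, as an AAC tool could produce it, instantiates
# one (possibly hybrid) algorithm:
params = {("<ls>", 0): 1, ("<ls>", 1): 0,
          ("<neigh>", 2): 1, ("<perturb>", 1): 0}
print(derive("<ls>", params))   # -> ils(ii(exchange), random_restart)

In this toy encoding, each (non-terminal, depth) pair becomes one categorical parameter, so the configuration task over these parameters is exactly the design task over the grammar's derivations.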

1.2 Contribution

In this thesis, we expand the work done on grammar-based automatic design of stochastic local search algorithms. In particular, we present a new algorithmic framework, EMILI. This new framework improves over previous frameworks thanks to its modular design and its ability to instantiate algorithms at run time. Using AAD, we present new state-of-the-art algorithms for the major objectives of the permutation flowshop problem (PFSP) and for PFSP variants with additional constraints. While working on the PFSP, we introduce a new speed-up mechanism for the calculation of the objective function of the PFSP with the weighted tardiness objective, as well as a new state-of-the-art algorithm for the PFSP with the makespan objective. Finally, we analyze how algorithm complexity affects the performance of automatically generated SLS algorithms.

EMILI framework

One of the core components of an automatic algorithm design system is the framework used to instantiate algorithms. Such a framework should have the flexibility to instantiate different algorithms for different problems while at the same time providing a way to define problem-specific components. In addition, it has to be easy to expand, so that new components can be added with minimal effort. Finally, it should offer a reasonable level of performance, so that the instantiated algorithms can be compared with state-of-the-art algorithms.

(21)

We present the EMILI framework, which was implemented specifically to support the automatic design of SLS algorithms. In developing this framework, we tried to find a good trade-off between flexibility and overall performance. In particular, EMILI uses a modular object-oriented architecture that allows the definition of general components that can be reused across several problems, as well as problem-specific components that are tailored to a specific problem variant or implement specific speed-up techniques. Furthermore, the framework uses a parser to instantiate algorithms at run time. In this way, the time needed by the automatic algorithm design process is greatly reduced compared to previous frameworks, which needed to be compiled for each algorithm instantiation [137].

Iterated greedy extension to solve permutation flowshop

Permutation flowshop problems (PFSPs) form one of the most widely studied classes of scheduling problems [68], arguably the most studied variant being the minimization of a schedule's makespan [65]. Other widely studied variants include those minimizing the sum of the completion times of the jobs [166] or the sum of the jobs' tardiness, if due dates are considered. As these variants (with few exceptions, such as the two-machine case for makespan minimization [107]) are N P-hard, much of the research on these problems focuses on heuristic and metaheuristic algorithms.

In recent years, for the most widely studied PFSP variant, the minimization of the makespan, only limited improvements over the structurally rather simple iterated greedy (IG) algorithm by Ruiz and Stützle [180] have been made, which mainly refine some details of this algorithm. (In fact, this IG algorithm itself has been shown to outperform other high-performance metaheuristic algorithms for the makespan PFSP; for details see [180].) These refinements include tie-breaking rules in insertion heuristics [64] and improvements to the initial solution [172].

We propose an extension of the IG algorithm that considers the local optimization of partial solutions. Our experimental evaluation of this extension shows that it leads to IG algorithms for the PFSP with makespan minimization that improve over both the original IG algorithm and the other refinements mentioned above. Moreover, our experimental results show that an IG algorithm combining our proposal with the aforementioned refinements constitutes a new state-of-the-art IG algorithm for the PFSP with makespan minimization.

Applying automatic algorithm design to permutation flowshop

Over the years, a large number of high-performing algorithms have been proposed for the most studied variants of the permutation flowshop problem [65, 68, 166], often obtained after a significant manual algorithm engineering effort. Even if these efforts often seem rather disconnected from each other, for a number of basic PFSP variants the best available algorithms share some similarities. For example, many recent high-performing algorithms rely on iterated local search or iterated greedy type algorithms [65, 180, 55, 166, 122, 111].

We show that for the PFSP variants with the makespan, sum of completion times and total tardiness objectives we can automatically generate new state-of-the-art algorithms from the same code base and without human intervention in the algorithm design process. The main ingredients we use for this design process are a flexible algorithmic framework whose set of pre-programmed algorithmic components can be combined to generate algorithms, a coherent way of generating stochastic local search (SLS) algorithms from the framework, and automatic algorithm configuration tools.

A comparison of our automatically generated algorithms to the state of the art for each objective shows that in all cases our algorithms are clear improvements. Thus, our results indicate a new way of generating high-performing algorithms for a set of scheduling problems that have so far been tackled by extensive, manual algorithm engineering efforts.

Applying automatic algorithm design to permutation flowshop with additional constraints

In addition to the standard permutation flowshop problem, several additional constraints have been proposed in the literature to take different scenarios into account. Often, machines have to be set up before being able to process a job; for instance, a machine may need to be cleaned or calibrated before processing another job. The setup time may depend not only on the job that has to be processed, but also on the setup done for the previous job. The permutation flowshop problem with sequence-dependent setup times (PFSPsdst) has been introduced to model this scenario, and it has been shown to be N P-hard even when there is only one machine [80]. The no-idle permutation flowshop (PFSPni) is another such variant: it models a scenario where machines cannot have idle times. In fact, in some contexts (e.g., the steel industry) some machines cannot be stopped, and keeping them running requires a lot of resources. PFSPni is also an N P-hard problem [10]. For each PFSP variant, the minimization of the makespan, the sum of completion times and the total tardiness is considered. Although less studied than the standard PFSP, several metaheuristics have been proposed to solve these problems [179, 181, 218, 195, 192, 149].

Using the automatic design system based on the grammar representation and the EMILI framework, we generate six algorithms covering the three considered objectives for the two considered PFSP variants. The results show that the generated algorithms outperform the state of the art.

Analysis of the structure of automatically generated SLS algorithms

The hybridization of different kinds of stochastic local search algorithms has been shown to generate state-of-the-art algorithms when applied to the permutation flowshop [161, 137]. However, allowing hybridization can produce huge parameter spaces and generate algorithms with a complex nested structure.

We try to understand whether this complexity is really needed by using the AAD system presented in [161] to generate algorithms, with varying levels of allowed hybridization, for the three most studied objectives of the permutation flowshop problem (PFSP): makespan (PFSPMS), total completion time (PFSPTCT) and total tardiness (PFSPTT). The complexity of the generated algorithms is analyzed and compared.

For each objective we allow no hybridization, one level of hybridization or two levels. For each of these levels we generate 10 algorithms that are compared in terms of solution quality and algorithm complexity. The experiments show that our AAD system generates a more complex algorithm only if it performs similarly to or better than a less complex one. Furthermore, the results show that the huge parameter spaces produced by allowing hybridization do not seem to lead to worse algorithms in the cases where simple algorithms perform better.

Speed-up technique for local search algorithms solving the permutation flowshop problem with the weighted tardiness objective

Many algorithms for minimizing the weighted tardiness in the permutation flowshop problem rely on local search procedures. An increase in the efficiency of evaluating the objective function for neighboring candidate solutions therefore directly improves the performance of such algorithms. We introduce a speed-up of the evaluation of the weighted tardiness while exploring the insert neighborhood of a solution. To discard non-improving neighbors and avoid the full computation of the objective function, we use an approximation of the weighted tardiness. The experimental results show that the technique delivers a consistent speed-up that increases with instance size. Furthermore, we show that it is possible to apply the same approximation technique to the exchange neighborhood, achieving again a significant, but smaller, speed-up.

1.3 Publications

The work described in this thesis has been presented in several publications.


• F. Pagnozzi and T. Stützle. “Speeding up Local Search for the Insert Neighborhood in the Weighted Tardiness Permutation Flowshop Problem”. In: Optimization Letters 11 (2017), pp. 1283–1292.

• J. Dubois-Lacoste, F. Pagnozzi, and T. Stützle. “An Iterated Greedy Algorithm with Optimization of Partial Solutions for the Permutation Flowshop Problem”. In: Computers & Operations Research 81 (2017), pp. 160–166.

• F. Pagnozzi and T. Stützle. “Automatic design of hybrid stochastic local search algorithms for permutation flowshop problems”. In: European Journal of Operational Research 276 (2 2019), pp. 409–421.

• F. Pagnozzi and T. Stützle. “Automatic design of hybrid stochastic local search algo-rithms for permutation flowshop problems with additional constraints”. Submitted to: Computers & Operations Research.

Other publications related to the topic of this thesis were published during the same period.

• P. Alfaro-Fernández, R. Ruiz, F. Pagnozzi and T. Stützle. “Automatic Algorithm Design for Hybrid Flowshop Scheduling Problems”. Accepted for publication in: European Journal of Operational Research.

• L. Pérez Cáceres, F. Pagnozzi, A. Franzin, and T. Stützle. “Automatic Configuration of GCC Using Irace”. In: EA 2017: Artificial Evolution. Ed. by E. Lutton, P. Legrand, P. Parrend, N. Monmarché, and M. Schoenauer. Vol. 10764. Lecture Notes in Computer Science. Springer, Heidelberg, Germany, 2018, pp. 202–216.

Finally, two more publications are currently in preparation. The first is a journal paper on the EMILI framework and its architecture, which will be submitted to a software engineering journal. The second is a conference paper on the analysis of the structure of automatically generated SLS algorithms, which will be submitted to the GECCO conference next year.

1.4 Outline

This work is organized in seven chapters. Chapter 2 introduces the main concepts at the base of the work presented in this thesis. We briefly introduce combinatorial problems and problem complexity, and the permutation flowshop problem is presented as an example of an N P-hard problem. Next, we introduce exact and approximate solving methods and, in particular, SLS algorithms, outlining the SLS methods most relevant to this study. Subsequently, we introduce automatic algorithm configuration and briefly outline the main AAC tools available. The chapter concludes with an introduction to automatic SLS design. We start by explaining how AAC tools can be used to design algorithms together with flexible algorithmic frameworks. We discuss the two main design approaches for algorithmic frameworks, top-down and bottom-up, and focus in particular on the latter, which consists of generating SLS algorithms by combining algorithmic components.

In Chapter 3, we present the EMILI framework, implemented specifically to support the automatic design of SLS algorithms. First, we present the framework and the reasons that led to its development. In particular, we describe how SLS algorithms are decomposed into components and how the components interact. Then we discuss how SLS algorithms are represented in the framework. Subsequently, we show how algorithms are assembled through the use of a parser. We conclude by showing the application of the framework to the permutation flowshop problem, describing the problem-specific components implemented.

In Chapter 4, grammar-based automatic algorithm design with the EMILI framework is applied to generate SLS algorithms for the most widely studied variants of the permutation flowshop problem. We list the components of the EMILI framework that are used in the experiments and outline the grammar used to define the building rules. The generated algorithms are compared with state-of-the-art algorithms on the main benchmarks used in the literature.

In Chapter 5, we use automatic algorithm design to tackle two of the most studied additional constraints for the permutation flowshop problem: sequence-dependent setup times and no-idle. We describe the additional components that have been added to adapt the system to the new problems while keeping the grammar structure and the experimental setup intact. We describe the generated algorithms and compare them with the best algorithms available.

In Chapter 6, we investigate the impact of algorithm hybridization on automatic algorithm design by creating grammars that allow the combination of two algorithms at most two times, one time or not at all. The grammars are used to generate algorithms for the three most studied objectives of the permutation flowshop problem. The generated algorithms are compared using a quantitative measure of complexity based on a similarity metric. Finally, in Chapter 7 we summarize the contributions of this work and outline directions for future research.

This thesis contains two annexes. In Annex A, we present the speed-up technique for evaluating the insert neighborhood for the permutation flowshop with the weighted tardiness objective. In Annex B, we present the work done on introducing a local search on partial solutions in the iterated greedy algorithm for the PFSP with the makespan objective.


2 Automatic design of stochastic local search algorithms

Stochastic local search algorithms have been shown to be very successful in solving hard combinatorial problems [96]. Usually, such algorithms are developed through a careful, manual engineering process in order to reach high performance. Automatic algorithm design represents an automated alternative to this manual process that uses automatic configuration techniques together with a flexible algorithmic framework. In this chapter, we provide an introduction to the concepts at the base of this work, such as optimization problems, stochastic local search, automatic parameter tuners and automatic design. The chapter starts with a definition of combinatorial problems, where the permutation flowshop problem is used as an example. We continue by giving a brief introduction to the concepts of N P-completeness and to the general methods used to solve such problems. Stochastic local search (SLS) methods include many of the most widely known high-performance algorithms for hard combinatorial problems. Notable examples of such algorithms are the Lin-Kernighan algorithm for the traveling salesman problem [124], as well as general methods such as iterated local search [16, 17], simulated annealing [117, 213], tabu search [76, 77], ant colony optimization [52, 51] and evolutionary algorithms [91, 78]. We give an outline of the main SLS methods, focusing on those most relevant to this work. In particular, we describe iterated local search, iterated greedy and large neighborhood search [193], variable neighborhood search [145], simulated annealing, tabu search, and the greedy randomized adaptive search procedure [61].

Subsequently, we discuss automatic algorithm configuration (AAC) tools. When applying an SLS algorithm to a problem, several aspects of the algorithm need to be properly configured. This configuration is a key aspect to consider when building high-performing SLS algorithms for any given problem. In the past years, several AAC tools have been proposed to automate this process [101, 103, 131]. We give a brief presentation of the main AAC tools and how they work.

In the final part of the chapter, we discuss automatic algorithm design (AAD). AAD systems are based on using AAC tools with algorithmic frameworks. Such frameworks implement one or more SLS algorithms so that most aspects of the algorithm behavior are exposed as parameters and can be easily configured using AAC tools. When the configuration of an SLS algorithm includes design choices that influence the algorithm behavior, the result of using AAC tools is a new algorithm. We present an outline of how AAD systems are composed and, in particular, we present the AAD system used in this work. To this purpose, we explain how context-free grammars and AAC tools, together with a flexible algorithmic framework, are used to generate effective SLS algorithms.

2.1 Combinatorial optimization problems

This thesis concerns the automatic generation of SLS algorithms to solve combinatorial optimization problems. These problems arise in many areas of computer science and, consequently, in many other disciplines where computer science is applied, such as operations research, artificial intelligence and bioinformatics [96]. Examples of combinatorial problems include finding the best scheduling for a set of operations in a shop, finding the shortest round trips in graphs, assigning resources to positions so as to minimize distribution costs, or finding the best routing for vehicles transporting goods or people.

In a nutshell, combinatorial problems can be defined as the determination of schedules, groupings or assignments of a discrete set of objects satisfying certain conditions, so that a certain quality measure is maximized or minimized. Therefore, a combinatorial problem admits as many solutions as there are ways to arrange the objects characterizing it.

Before moving forward, it is necessary to define the difference between a problem and a problem instance as intended in this work. A problem is characterized by several variables that need to assume a value in order for the problem to be solved; for instance, the time required to process each operation in a shop or the total number of operations to process. A problem definition provides the general specification of the problem: what the variables are, what constitutes a solution and how the quality of a solution is measured.

A problem instance provides a setting for all the variables. More formally, a problem instance can be defined as a pair (f, S), where S is the set of all candidate solutions and f : S → R is the function defining the quality measure, also called the objective function. The goal of an optimization problem is to find a solution s∗ ∈ S that either maximizes or minimizes f. Without loss of generality, we always assume we are dealing with minimization problems, since maximization problems can be turned into minimization problems with little effort. The solution s∗ satisfies f(s∗) ≤ f(s′) ∀ s′ ∈ S and is also called the optimal solution or global optimum.

Intuitively, a simple way to solve any given instance (f, S) would be to just enumerate all the solutions in S. Unfortunately, this approach is not practical, since for many combinatorial problems the size of S grows more than polynomially with the instance size. The complexity of combinatorial problems is further discussed in the next section.

2.1.1 Computational complexity

Computational complexity theory is concerned with decision problems, that is, problems where the objective function returns either true or false and the goal is to find a solution for which the objective function is verified. Combinatorial optimization problems can easily be expressed as decision problems: it suffices to choose a value L reasonably close to the optimal objective value and to change the goal to finding a solution s such that f(s) < L. Therefore, conclusions drawn regarding decision problems can be applied to combinatorial optimization problems.

In computational complexity theory, problems are classified as easy or hard. Problems classified as easy are those that can be solved efficiently. Problem complexity is evaluated by considering the number of steps (or the time) needed to reach the optimal solution. Considering all the instances of a problem, the complexity is expressed as a function of the instance size. Consequently, a problem can be considered efficiently solvable when its complexity can be bounded by a polynomial function of the instance size.

On the contrary, when the bounding function is more than polynomial, the problem is considered hard. The theory of N P-completeness formalizes this distinction by dividing problems into two basic classes, P and N P. The first class, P, contains all the problems that can be solved in polynomial time. The second class, N P, contains the problems that can be solved in polynomial time by a non-deterministic algorithm. Such an algorithm can be seen as composed of two phases, a guessing phase and a verification phase. In the first phase, the algorithm produces a solution by a hypothetical non-deterministic method (i.e., a method that is able to guess correctly for certain decisions [96]); in the second phase, the solution is evaluated using a deterministic algorithm that verifies the guessed solution in polynomial time.

Any problem that can be solved in polynomial time by a deterministic algorithm can also be solved by a non-deterministic algorithm. Thus, we can infer that P ⊆ N P, but the exact relation between these two classes is not known. In fact, the question whether P = N P is, today, one of the most important questions in theoretical computer science. A problem is considered intractable if it belongs to N P \ P. Although the question whether N P \ P = ∅ is still open, it is possible to prove that a problem is in N P \ P, assuming N P ≠ P, using the concept of polynomial-time reducibility.

A problem Π is polynomially reducible to a problem Π′ if there is a deterministic polynomial-time algorithm that maps each instance of Π to an instance of Π′. Consequently, if such an algorithm exists and Π′ can be solved in polynomial time, then Π can also be solved in polynomial time. Using this concept, it has been proven that any decision problem that can be solved by a non-deterministic polynomial-time algorithm, that is, any problem in N P, can be polynomially reduced to the satisfiability problem of propositional logic, for which no deterministic polynomial-time algorithm is known [44]. This problem is the first one shown to belong to the class of N P-complete problems. A problem in N P is considered N P-complete if another N P-complete problem can be polynomially reduced to it. The problems in this class represent the hardest problems in N P, since finding a deterministic polynomial-time algorithm for one of the N P-complete problems would mean that all N P problems could be solved in polynomial time, proving P = N P.

Combinatorial optimization problems whose decision-problem counterpart is N P-complete are qualified as N P-hard, which means that these problems are at least as hard as the N P-complete problems. In this thesis we focus on a very well studied N P-hard problem, the permutation flowshop problem.

2.1.2 Permutation flowshop problem

The flowshop scheduling problem arises in environments like chemical plants or steel rolling mills. The problem consists of a set of n jobs that have to be processed on m machines. Each job Ji consists of (at most) m operations, where each operation has a non-negative processing time pij on machine Mj. All jobs are released at time zero and must be processed on the machines M1, M2, ..., Mm in the same canonical order. All jobs are processed on all machines in the same order, and preemption is not allowed. A solution is a schedule specifying the execution order of all jobs and can be represented as a permutation π = (π(1), π(2), ..., π(n)) of the job indices, leading to the permutation flowshop problem (PFSP).

The objective in the classic PFSP is to minimize the completion time of the schedule, also called makespan, that is, the time required to finish the processing of the last job of the schedule on the last machine. Over the years, several different objectives have been introduced. Among the most studied are the minimization of the total completion time and the minimization of the total tardiness. In the first, the goal is to minimize the sum of the completion times of all the jobs, while the second concerns the minimization of the jobs' tardiness.

The completion time of the job at position i on machine j is given by

    Ci,j = max(Ci−1,j, Ci,j−1) + pπ(i),j

where C0,j = Ci,0 = 0.

          J1   J2   J3   J4   J5
    M1     3    3    4    2    3
    M2     2    1    3    3    1
    M3     4    2    1    2    3

Tab. 2.1: Instance of the PFSP composed of five jobs and three machines.

Given a solution π, the makespan is evaluated by calculating Cn,m, while the sum of completion times is evaluated by calculating ∑_{i=1}^{n} Ci,m. To compute the total tardiness, a due date dj is added to the problem for each job, so that the tardiness of job j can be computed as Tj = max(0, Cj,m − dj).
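To make these definitions concrete, the following minimal Python sketch (written for this text as an illustration; it is not code from the EMILI framework, and the due dates are hypothetical) computes the three objectives for the instance of Table 2.1 using the recurrence above.

# Minimal sketch (not EMILI code) of the objective computations above.

def completion_times(perm, p):
    """p[job][machine]: processing times; perm: 0-based job permutation.
    Returns C with C[i][j] = completion time of the job at position i
    on machine j, via C[i][j] = max(C[i-1][j], C[i][j-1]) + p[perm[i]][j]."""
    n, m = len(perm), len(p[0])
    C = [[0] * m for _ in range(n)]
    for i, job in enumerate(perm):
        for j in range(m):
            prev_pos = C[i - 1][j] if i > 0 else 0   # same machine, previous job
            prev_mach = C[i][j - 1] if j > 0 else 0  # same job, previous machine
            C[i][j] = max(prev_pos, prev_mach) + p[job][j]
    return C

# Processing times of Table 2.1 (rows: jobs J1..J5, columns: M1..M3).
p = [[3, 2, 4], [3, 1, 2], [4, 3, 1], [2, 3, 2], [3, 1, 3]]

C = completion_times([0, 1, 2, 3, 4], p)           # schedule J1,J2,J3,J4,J5
makespan = C[-1][-1]                               # C_{n,m}: 21 for this schedule
total_completion_time = sum(row[-1] for row in C)  # 73 for this schedule

d = [10, 12, 14, 16, 18]  # hypothetical due dates (identity schedule here,
                          # so position i holds job i)
total_tardiness = sum(max(0, C[i][-1] - d[i]) for i in range(len(d)))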

When minimizing the makespan, the problem has been shown to be N P-hard for any instance with more than two machines [73], and for other objectives, such as the sum of completion times, it has been proved to be intractable even in the two-machine case [73]. The PFSP has been extensively studied [65, 166], with hundreds of papers published on the makespan objective alone. The high attention from the scientific community stems from the fact that the problem is simple to define and understand and is relevant to industry, yet quite hard to solve exactly.

2.2 Solving combinatorial optimization problems

Due to their practical importance, there has been great interest in developing solution methods for combinatorial optimization problems. The existing methods can be divided into two main classes: exact methods and approximate methods. Exact methods are complete, that is, they are guaranteed to find the optimal solution. Approximate methods, instead, are incomplete: there is no guarantee that they find the optimal solution.

2.2.1 Exact methods

The most straightforward of these methods consists of enumerating all the solutions. Due to the exponential growth of the solution space, this method can be applied only to very small instances. To increase efficiency, modern methods use pruning to exclude the regions of the solution space where the optimal solution cannot be found. The most notable examples of such methods for optimization problems are branch-and-bound algorithms [120, 121], branch-and-cut [160], branch-and-price [11] and dynamic programming [18]. Exact methods have the advantage of returning the optimal solution, but the required running time still grows rather fast when increasing the instance size.


For example, considering the PFSP, minimizing the makespan of instance Ta111 of the Taillard benchmark [200], consisting of 500 jobs and 20 machines, required close to 19 days with a branch-and-bound algorithm.¹ For comparison, an approximate method running on a comparable machine can produce in 60 seconds a solution that is just 0.7% worse than the optimal one [161]. Moreover, some instances have a solution space that is very difficult to prune efficiently: for instances Ta051-Ta060 of the Taillard benchmark, consisting of 50 jobs and 20 machines, no optimal solution is known.

2.2.2 Approximate methods

Approximate methods are often able to find solutions close to the optimum in a short amount of time but, unlike exact algorithms, offer no guarantee of finding the optimal solution in a finite amount of time. These methods iteratively generate and evaluate candidate solutions exploiting heuristic information about the problem, instead of systematically evaluating all candidate solutions. Approximate algorithms can be classified as constructive or perturbative, according to the way they manipulate solution components to generate solutions.

Constructive algorithms build a solution starting from an empty set and iteratively add solution components until a candidate solution is generated. In the case of the PFSP, a constructive algorithm would start with an empty schedule and then iteratively add jobs until all jobs are in the schedule. Constructive algorithms are typically very fast, but the quality of the solutions they generate is significantly worse than that of the solutions generated by perturbative methods.

Perturbative algorithms start with a complete solution and at each iteration modify the current solution by applying some kind of modification to its components. For instance, in the PFSP, these algorithms would start with a random schedule and then generate new solutions by changing the position of one or more jobs in the schedule. Perturbative algorithms can be seen as following a path through the space of candidate solutions, going from one solution to another. Often a constructive algorithm is used to generate the starting solution. The most successful approximate algorithms for N P-hard combinatorial problems are local search algorithms. Local search algorithms, starting from an initial candidate solution, explore the solution space by iteratively moving from one candidate solution to another, where the next candidate solution is selected among the solutions local to the current one according to a neighborhood relation. For a definition of neighborhood relation and a discussion of this algorithm, we refer to the dedicated section later in this chapter.

¹ http://mistic.heig-vd.ch/taillard/problemes.dir/ordonnancement.dir/flowshop.dir/best_lb_up.txt


Algorithm 1: A construction heuristic

Input: a problem definition π
Output: a solution s

s := ∅
while s is not a complete solution do
    c := GreedySelection(π, s)
    s := ConstructionRule(s, c)
end while
return s

Although incomplete, due to their flexibility and simplicity there are several scenarios where local search algorithms may be preferred to exact methods. In particular, when the problem instances to solve are particularly hard or considerably large (as seen in the previous section), exact methods may be unable to find solutions within a reasonable amount of time or with reasonable computational resources. Furthermore, when the time to generate a solution is a strict constraint, exact methods may simply not have enough time to find one.

In the following, we first introduce construction heuristics and then discuss local search algorithms, focusing in particular on stochastic local search.

Construction heuristics

The initial solution of a local search can be generated either randomly or using a construction heuristic. Using a construction heuristic is generally a good idea, since it allows the local search to start its exploration from a better position in the solution space and to reach a better local minimum in less time. These algorithms generally work as constructive methods, but they can also use local search procedures to improve partial solutions while building the solution. A construction heuristic can be structured in two phases. The first phase consists of a greedy selection function that, given a set of solution components, returns one component according to some greedy heuristic. The second phase consists of a construction rule, which describes how a solution component is added to the partial solution.

The outline of a generic construction heuristic is shown in Algorithm 1. Construction heuristics can be further divided into deterministic and stochastic. In a deterministic heuristic, the selection function and the construction rule base their choices only on the instance data. As a result, when applied to one instance, a deterministic heuristic will always build the same solution. On the contrary, a stochastic heuristic includes some degree of randomness in the greedy function or in the construction rule (or both). Consequently, if used to build several solutions for one problem instance, a stochastic heuristic will tend to build a different solution every time. A deterministic heuristic can be made stochastic by inserting some randomness in the greedy approach used to choose the solution components.


Algorithm 2: Local search

Input: a problem definition π
Output: the best solution found s∗

s := init(π)
s∗ := s
while the termination criterion is not met do
    s := selectNeighbor(π, s)
    if f(s) < f(s∗) then
        s∗ := s
    end if
end while
return s∗

Another distinction is the one between adaptive and non-adaptive heuristics. In an adaptive heuristic, the solution component returned by the greedy function depends on the partial solution. In a non-adaptive heuristic, instead, the greedy function returns the solution components according to an order established at the beginning of the algorithm. Adaptive heuristics generally need longer running times than non-adaptive heuristics, but they are usually able to generate better solutions.

An example of a construction heuristic for the PFSP is a heuristic that generates a solution by adding jobs to an empty schedule, always selecting the job with the minimum sum of processing times. In this case, the greedy selection function evaluates the sum of processing times of each unscheduled job and returns the job with the minimum sum. The construction rule simply appends the job selected by the greedy selection to the end of the partial solution.
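As an illustration, the following is a minimal C++ sketch of this shortest-total-processing-time heuristic; it is not taken from an actual implementation and all names are illustrative. In the terms introduced above, this heuristic is deterministic and non-adaptive, since the job order is fixed by the instance data alone.

  #include <algorithm>
  #include <numeric>
  #include <vector>

  // p[j][m] = processing time of job j on machine m.
  std::vector<int> sptHeuristic(const std::vector<std::vector<int>>& p) {
      std::vector<int> jobs(p.size());
      std::iota(jobs.begin(), jobs.end(), 0);
      // Greedy selection: order the jobs by increasing sum of processing times.
      std::sort(jobs.begin(), jobs.end(), [&p](int a, int b) {
          long sa = std::accumulate(p[a].begin(), p[a].end(), 0L);
          long sb = std::accumulate(p[b].begin(), p[b].end(), 0L);
          return sa < sb;
      });
      // Construction rule: each selected job is appended to the end of the
      // partial solution, so the sorted order is the final schedule.
      return jobs;
  }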

Local Search

Local search algorithms start from a given candidate solution and explore the search space moving from one candidate solution to another. The algorithm selects the next solution among the locally reachable solutions according to a neighborhood relation. Considering a problem instance π and the set of all candidate solutions S(π), we can define a neighborhood relation as a function N : S(π) → 2^{S(π)} that maps a candidate solution s ∈ S(π) to the set of its neighbors N(s) ⊆ S(π); the set N(s) is also called the neighborhood of s. One of the most widely used neighborhood relations, the k-exchange neighborhood, considers two solutions as neighbors if they differ by at most k solution components. Local search is a very simple and easy-to-implement algorithm and, in general, it is able to find better solutions than constructive algorithms.
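To make the k-exchange neighborhood concrete, the following minimal C++ sketch (illustrative names, not the thesis implementation) enumerates the 2-exchange neighborhood of a permutation such as a PFSP schedule: two schedules are neighbors if they differ in the positions of exactly two jobs.

  #include <cstddef>
  #include <utility>
  #include <vector>

  // Enumerate the 2-exchange (swap) neighborhood of a permutation s.
  std::vector<std::vector<int>> swapNeighborhood(const std::vector<int>& s) {
      std::vector<std::vector<int>> neighbors;
      for (std::size_t i = 0; i + 1 < s.size(); ++i) {
          for (std::size_t j = i + 1; j < s.size(); ++j) {
              std::vector<int> n = s;   // copy the current solution
              std::swap(n[i], n[j]);    // exchange two solution components
              neighbors.push_back(n);   // |N(s)| = n(n-1)/2 for n jobs
          }
      }
      return neighbors;
  }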

The algorithm bases its decisions on the evaluation of a relatively small set of solutions. Due to this narrow view, in general the algorithm will not return the optimal solution but only the best solution found. In fact, when it is unable to move to better solutions, the algorithm may remain stuck in a point of the solution space known as a local minimum. A local minimum is a point in the solution space where all the neighbors of the current solution have an equal or worse objective function value. More formally, considering a solution s and its neighborhood N(s), s is a local minimum if ∀s' ∈ N(s) : f(s) ≤ f(s'). A local minimum is strict if ∀s' ∈ N(s) : f(s) < f(s'), that is, if all neighboring solutions are strictly worse. In general, when the search reaches a local minimum, no conclusions can be drawn about how good the local minimum is with respect to all the other solutions in the solution space. This is the reason why strategies to escape local minima are of great importance for SLS methods.

An outline of a simple local search algorithm is shown in Algorithm 2. The algorithm starts with a solution generated by the init procedure. In the main loop, the local search updates the best candidate solution whenever the selectNeighbor procedure returns an improving neighbor, and it stops when a termination criterion is satisfied. For example, we can define a local search algorithm for the PFSP by setting the init procedure to generate a random schedule of the jobs, using a 2-exchange neighborhood relation to generate neighbors in the selectNeighbor procedure, and stopping as soon as a local minimum is encountered.

Stochastic Local Search

Stochastic local search (SLS) algorithms can be defined, in general, as local search algorithms that make use of randomized choices when generating or selecting candidate solutions [96]. Furthermore, SLS algorithms may use additional memory to store information about recently visited solutions. A high-level distinction among SLS algorithms is between single-solution and population-based SLS. Single-solution SLS methods manipulate only one candidate solution in each search iteration, while population-based SLS methods maintain several candidate solutions. Population-based SLS methods provide a simple means of enhancing the exploration of the solution space. Furthermore, methods that generate a solution by combining promising features of multiple solutions are easier to implement using the population paradigm. Notable examples of such algorithms are ant colony optimization [52], inspired by the way real ants explore their environment, and evolutionary algorithms [91, 78], inspired by the natural evolution of biological species. In the following, we will focus mainly on SLS methods that manipulate only one solution at a time.

Iterative improvement

Iterative improvement is one of the most basic SLS algorithms. It typically starts from a randomly generated solution or from one generated by a construction heuristic.


Algorithm 3 Iterative Improvement
Input: a problem definition π
Output: the best solution found s*
  s := init(π)
  s* := s
  while ¬ termination criterion do
    s := selectNeighbor(π, s, pivoting rule)
    if f(s) < f(s*) then
      s* := s
    end if
  end while
  return s*

Algorithm 4 VND
Input: a problem definition π, a set of neighborhood relations N of size i_max
Output: the best solution found s*
  i := 1
  s := Init(π)
  s* := s
  repeat
    s := MostImprovingNeighbor(π, s, N_i)
    if f(s) < f(s*) then
      s* := s
      i := 1
    else
      i := i + 1
    end if
  until i > i_max
  return s*

At each iteration, the algorithm chooses an improving solution from the neighborhood. The criterion used to choose among improving solutions is called the pivoting rule. Typically one chooses between two main pivoting rules, best improvement and first improvement. With best improvement, the algorithm evaluates all the neighbors of the current solution and chooses the one with the best objective function value. With first improvement, instead, the algorithm stops generating neighbors as soon as an improving solution is found. The algorithm typically stops when it cannot find any more improving solutions, that is, when it reaches a local minimum. The outline of this algorithm is shown in Algorithm 3. The main difference with the outline of a simple local search lies in the selectNeighbor function, which in iterative improvement follows the policy dictated by the pivoting rule.
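As a concrete illustration, the following minimal C++ sketch shows a single iterative-improvement step under each pivoting rule, using the swap neighborhood sketched earlier; the type aliases and function names are illustrative assumptions, not the thesis implementation.

  #include <cstddef>
  #include <functional>
  #include <utility>
  #include <vector>

  using Solution = std::vector<int>;
  using Objective = std::function<double(const Solution&)>;

  // First improvement: apply the first strictly improving swap found.
  // Returns false when no swap improves, i.e., s is a local minimum.
  bool firstImprovementStep(Solution& s, const Objective& f) {
      const double fs = f(s);
      for (std::size_t i = 0; i + 1 < s.size(); ++i)
          for (std::size_t j = i + 1; j < s.size(); ++j) {
              std::swap(s[i], s[j]);
              if (f(s) < fs) return true;  // accept and stop scanning
              std::swap(s[i], s[j]);       // undo the move and continue
          }
      return false;
  }

  // Best improvement: scan the whole neighborhood and apply the best move.
  bool bestImprovementStep(Solution& s, const Objective& f) {
      double best = f(s);
      std::size_t bi = 0, bj = 0;
      bool improved = false;
      for (std::size_t i = 0; i + 1 < s.size(); ++i)
          for (std::size_t j = i + 1; j < s.size(); ++j) {
              std::swap(s[i], s[j]);
              const double v = f(s);
              std::swap(s[i], s[j]);
              if (v < best) { best = v; bi = i; bj = j; improved = true; }
          }
      if (improved) std::swap(s[bi], s[bj]);
      return improved;
  }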


Escaping local minima

When in a local minimum, the iterative improvement algorithm gets trapped and is unable to find better solutions, because according to the neighborhood the algorithm is exploring there are no better solutions. Several strategies can be adopted to try to escape local minima. For an iterative improvement algorithm that generates the initial solution in a stochastic fashion (i.e., by either generating a random solution or using a stochastic construction heuristic), the simplest way to escape a local minimum is just restarting the algorithm.

Another simple escape strategy is changing the local view of the algorithm by changing the neighborhood once a local minimum is reached. The reasoning is that a solution that is a local minimum in one neighborhood may not be one in another. The variable neighborhood descent (VND) algorithm implements this strategy; its outline can be found in Algorithm 4. Once the algorithm manages to find an improving solution, it goes back to the first neighborhood. The algorithm stops when it reaches a solution that is a local minimum for all the neighborhoods.
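A minimal C++ sketch of this scheme (illustrative names; each entry of steps performs a descent in one neighborhood and reports whether it improved the solution):

  #include <cstddef>
  #include <functional>
  #include <vector>

  using Solution = std::vector<int>;
  // A descent step in one neighborhood: returns true if it improved s.
  using DescentStep = std::function<bool(Solution&)>;

  // Variable neighborhood descent over an ordered list of neighborhoods.
  void vnd(Solution& s, const std::vector<DescentStep>& steps) {
      std::size_t i = 0;
      while (i < steps.size()) {
          if (steps[i](s)) i = 0;  // improvement: back to the first neighborhood
          else ++i;                // local minimum in N_i: try the next one
      }
  }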

In general, a good way to escape local minima is to introduce some degree of randomness in the exploration of the search space. An example of this strategy is the simulated annealing algorithm, presented later in this chapter, which at each iteration selects a random neighbor of the current solution. A further strategy consists in moving towards worse solutions for a certain number of search iterations. This strategy is employed in SLS algorithms like tabu search and dynamic local search [216]. The former, which we will describe in more detail later, uses memory to escape local minima by avoiding solutions already visited in the past. The latter, once in a local minimum, adds weights to the objective function to penalize some solution components; the idea is that the penalties make the local minimum less attractive, so that the algorithm starts to move to non-improving solutions.

We can refer to all these strategies to escape local minima as diversification strategies. Similarly, we can define intensification as always moving towards an improving solution. Keeping the right balance between intensification and diversification is very important for an SLS algorithm: an algorithm with too much diversification will struggle to find improving solutions, while an algorithm with too much intensification is more likely to get stuck in a low-quality local minimum.


Algorithm 5 Simulated Annealing
Input: a problem definition π
Output: the best solution found s*
  s := Init(π)
  T := InitTemp(π)
  s* := s
  while ¬ termination criterion do
    s' := selectNeighbor(π, s)
    if f(s') ≤ f(s) then
      s := s'
    else
      s := s' with probability exp((f(s) − f(s'))/T)
    end if
    if f(s) < f(s*) then
      s* := s
    end if
    update T
  end while
  return s*

2.2.3 General SLS methods

In this section, we present some of the best-known SLS algorithms, all of which can be instantiated by the automatic design system discussed in this thesis. Each of these algorithms uses a different way of handling the intensification/diversification trade-off.

Simulated annealing

This algorithm is inspired by the annealing process of solids, where a solid starting from a high temperature is slowly cooled [117]. In simulated annealing, a non-improving solution can be accepted with a probability that starts high and decreases during the execution of the algorithm until it reaches a minimum value. The outline of the algorithm is shown in Algorithm 5. At each iteration, the algorithm typically generates a random neighbor s' of the current solution s. The new solution is accepted with a probability P_a calculated according to the Metropolis condition [143], shown in Equation 2.1:

  P_a = 1                        if f(s') ≤ f(s)
  P_a = exp((f(s) − f(s'))/T)    otherwise        (2.1)

If the new solution s' is improving or non-worsening, it is immediately accepted. Otherwise, the probability of accepting s' depends on the temperature T and on the difference in quality between s and s'. In a typical simulated annealing algorithm, the temperature parameter starts at a large value and decreases during the execution until it reaches a minimum value.
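A minimal C++ sketch of this acceptance decision (the function name and the use of a standard random generator are illustrative assumptions):

  #include <cmath>
  #include <random>

  // Metropolis acceptance (Equation 2.1): fCur = f(s), fNew = f(s').
  bool metropolisAccept(double fCur, double fNew, double T, std::mt19937& rng) {
      if (fNew <= fCur) return true;  // improving or equal: always accept
      std::uniform_real_distribution<double> u(0.0, 1.0);
      // Worsening move: accept with probability exp((f(s) - f(s')) / T).
      return u(rng) < std::exp((fCur - fNew) / T);
  }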


Algorithm 6 Tabu Search
Input: a problem definition π
Output: the best solution found s*
  s := Init(π)
  s* := s
  while ¬ termination criterion do
    s := selectNotTabuNeighbor(π, s, pivoting rule, tabu list)
    update tabu list
    if f(s) < f(s*) then
      s* := s
    end if
  end while
  return s*

In this way, at the beginning the algorithm accepts non-improving solutions with higher probability, favoring exploration, while towards the end of the execution the lower temperature favors intensification.

Determining the initial temperature and the cooling schedule are key aspects of the algorithm. The most common cooling scheme is geometric cooling [117], where the temperature is updated following the formula

  T_{k+1} = α · T_k,   with 0 < α < 1,

and the initial temperature is determined after a series of experiments.
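For illustration, a minimal C++ sketch of geometric cooling; the lower bound Tmin reflects the minimum temperature mentioned above, and all names are illustrative:

  #include <algorithm>

  // Geometric cooling: T_{k+1} = alpha * T_k, bounded below by Tmin.
  double coolGeometric(double T, double alpha, double Tmin) {
      return std::max(T * alpha, Tmin);
  }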

The reader can find overviews of simulated annealing in several papers [58, 90, 177]. This SLS algorithm has been applied to many problems such as scheduling problems [159, 157, 119], assignment problems [14, 98], graph problems [109, 110], function optimization [26] and multi-objective problems [189]. Moreover, an interesting analysis of the algorithm from an automatic algorithm design point of view can be found in [69].

Tabu search

This SLS algorithm is based on the systematic use of memory to guide the search process. The key idea is to keep track of the last visited solutions and use this information to avoid already explored regions of the search space. An outline of the algorithm is shown in Algorithm 6. The outline is very similar to the one of iterative improvement, with the difference that tabu search selects the next neighbor among those that are not in the memory. The memory is used to avoid cycling when the search moves away from a local minimum by accepting non-improving solutions. The type of information stored in the memory, also called the tabu list, can greatly influence the algorithm. One may want to record complete solutions, but this requires memory and checking whether two solutions are equal may be computationally expensive; for this reason, the tabu list typically stores solution attributes or recently applied moves instead.
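A minimal C++ sketch of such an attribute-based memory (the move representation and the tenure parameter are illustrative assumptions, not the thesis implementation):

  #include <cstddef>
  #include <deque>
  #include <utility>

  // A tabu list storing recently applied swap moves (pairs of job positions)
  // instead of complete solutions.
  class TabuList {
      std::deque<std::pair<int, int>> moves;  // most recent moves, oldest first
      std::size_t tenure;                     // how many moves stay tabu
  public:
      explicit TabuList(std::size_t tenure) : tenure(tenure) {}
      bool isTabu(int i, int j) const {
          for (const auto& m : moves)
              if ((m.first == i && m.second == j) ||
                  (m.first == j && m.second == i))
                  return true;
          return false;
      }
      void add(int i, int j) {
          moves.push_back({i, j});
          if (moves.size() > tenure) moves.pop_front();  // forget the oldest move
      }
  };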
