
8.2 Boosting Branching in Cplex for MIP

8.2.3 Dataset

In order to obtain a solver that works well for generic MIP problems, instances are collected from many different datasets: miplib2010 [63], fc [7], lotSizing [9], mik [8], nexp [6], region [68], and pmedcapv, airland, genAssignment, scp, and SSCFLP [100]. From an initial dataset of about 900 instances, those for which all our solvers timed out at 1,800 s are filtered out. Also removed are the easy instances, solved entirely during Cplex presolving or in less than 1 s by every heuristic. The final dataset is composed of 341 instances with the desired properties. From this dataset, 180 instances are randomly selected for the training set and 161 for the testing set.
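The filtering and split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the runtime records are hypothetical, and presolve-solved instances are modeled here as near-zero runtimes; only the 1,800 s cutoff and the 180/161 split come from the text.

```python
import random

TIMEOUT = 1800  # s, the cutoff used in the text


def filter_instances(runtimes):
    """Keep instances that are neither too hard nor too easy.

    runtimes: dict mapping instance name -> list of per-solver runtimes (s).
    Drops instances on which every solver timed out, and instances
    solved in under 1 s by each heuristic.
    """
    kept = []
    for name, times in runtimes.items():
        if all(t >= TIMEOUT for t in times):  # all solvers timed out
            continue
        if all(t < 1.0 for t in times):       # trivially easy
            continue
        kept.append(name)
    return kept


def split(instances, n_train, seed=0):
    """Random train/test split, as in the 180/161 split of the text."""
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Toy example with three hypothetical instances
runtimes = {"a": [1800, 1800], "b": [0.2, 0.5], "c": [12.0, 1800]}
print(filter_instances(runtimes))  # ['c']
```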

When the training data is clustered, the distribution of instances per cluster can be seen in Table 8.1. Each row is normalized to sum to 100 %; thus, for Cluster 1, 25 % of the instances are from the airland dataset. From this table one can observe that there are not enough clusters to perfectly separate the different datasets into unique clusters. This, however, is not the objective: we are interested in capturing similarities between instances, not in splitting benchmarks. Observe that the region100 and region200 instances are grouped together. Cluster 4, meanwhile, logically groups the lotSizing and SSCFLP instances together. Finally, instances from the miplib, which are supposed to be an overview of all problem types, are spread across all clusters.
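The row normalization of Table 8.1 is straightforward to reproduce; the counts below are made up for illustration (they merely echo the 25 % airland example above):

```python
def normalize_rows(counts):
    """Convert per-cluster dataset counts into row percentages.

    counts: dict cluster -> dict dataset -> instance count.
    Each row is scaled to sum to 100, as in Table 8.1.
    """
    table = {}
    for cluster, row in counts.items():
        total = sum(row.values())
        table[cluster] = {ds: 100.0 * n / total for ds, n in row.items()}
    return table


counts = {1: {"airland": 5, "miplib": 15}}
print(normalize_rows(counts))  # {1: {'airland': 25.0, 'miplib': 75.0}}
```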

This clustering therefore demonstrates both that this is a diverse set of instances and that the features are representative enough to automatically notice interesting groupings.

8.3 Numerical Results

The above heuristics are embedded in the state-of-the-art MIP solver Cplex version 12.5. Note that only the branching strategy is modified through the implementation of a branch callback function. When compared with default Cplex

Table 8.2 Solving times on the testing set

Solver      Avg    Par10   % solved
BSS         315     1321   93.8
RAND 1      590     4414   77.0
RAND 2      609     5137   72.7
VBS         225      326   99.4
VBS_Rand    217      217   100

an empty branch callback is used to ensure the comparability of the approaches.1 The empty branch callback causes Cplex to compute the branching constraint using its internal default heuristic. None of the other Cplex behavior is changed; the system uses all the standard features, such as presolving, cutting planes, etc. Also, the search strategy, i.e., which open node to consider next, is left to Cplex. Note, however, that when Cplex dives in the tree, it first considers the first node returned by the branch callback, so it does make a difference whether a variable is rounded up or down first.
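The Par10 column in Table 8.2 penalizes each timeout at ten times the cutoff. A minimal sketch, assuming the 1,800 s cutoff from Sect. 8.2.3 (the runtimes below are toy values, not the reported results):

```python
TIMEOUT = 1800  # s, cutoff assumed from Sect. 8.2.3


def par10(runtimes):
    """Average runtime where each timeout counts as 10 * TIMEOUT."""
    penalized = [t if t < TIMEOUT else 10 * TIMEOUT for t in runtimes]
    return sum(penalized) / len(penalized)


def pct_solved(runtimes):
    """Percentage of instances solved within the cutoff."""
    return 100.0 * sum(t < TIMEOUT for t in runtimes) / len(runtimes)


# Toy example: three solved instances and one timeout
times = [100, 200, 300, 1800]
print(par10(times))       # (100 + 200 + 300 + 18000) / 4 = 4650.0
print(pct_solved(times))  # 75.0
```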

With the described methodology, the main question that needs to be addressed is whether switching heuristics can indeed be beneficial to the performance of the solver. To test this, each of the implemented heuristics is evaluated on every instance in the test set without allowing any switching. Following this, two versions of a solver that switches between heuristics uniformly at random are also evaluated.

The first version switches between all heuristics, while the second switches only among the four best heuristics. The results are summarized in Table 8.2.

What can be observed is that neither of the random switching solvers performs very well by itself. However, the performance of the virtual best solver2 that employs these new solvers shows that switching can push performance beyond what is possible when always sticking to the same heuristic. The question now becomes: if performance can be improved just by switching between heuristics randomly, can we do even better by switching intelligently?
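The virtual best solver can be computed directly from per-instance runtimes: for each instance it simply takes the fastest strategy. A sketch with made-up numbers:

```python
def vbs_runtimes(per_solver_times):
    """Oracle (VBS) runtimes from a runtime matrix.

    per_solver_times: dict solver -> list of runtimes, all lists in the
    same instance order. Returns the per-instance minimum, i.e. the
    runtime of an oracle that always picks the fastest strategy.
    """
    columns = zip(*per_solver_times.values())
    return [min(col) for col in columns]


times = {"h1": [10, 500, 30], "h2": [40, 20, 90]}
print(vbs_runtimes(times))  # [10, 20, 30]
```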

To answer this question, it is first necessary to set a few parameters of the solver. In particular, to what depth should the solver be allowed to switch heuristics, and at what interval? For this, the extended dataset is clustered. This dataset includes both the original training instances and the subproblems observed during search. A total of 10 clusters are formed. The result of projecting the feature space into two dimensions using Principal Component Analysis (PCA) [2] is presented in Fig. 8.1.
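A two-dimensional PCA projection like the one in Fig. 8.1 can be computed via an SVD of the centered feature matrix. This is a generic sketch, not the book's code; random data stands in for the real instance features:

```python
import numpy as np


def pca_2d(X):
    """Project feature vectors onto the first two principal components.

    X: (n_samples, n_features) array. Returns an (n_samples, 2) array of
    coordinates along PC1 and PC2.
    """
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T     # rows of Vt are principal directions


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))  # 50 subproblems, 8 features (placeholder)
coords = pca_2d(X)
print(coords.shape)  # (50, 2)
```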

Here, the cluster boundaries are represented by the solid lines, and the best heuristic for each cluster is represented by a unique symbol at its center. Also shown in these figures is the typical way in which features change as the problem is solved with a particular heuristic. The nodes are colored based on the depth of the tree, with (a) showing all the observed subproblems and (b) a single branch.

Fig. 8.1 Position of a subproblem in the feature space based on depth (panels titled "DASH − Solving Time − PCA"; axes PC1 and PC2, nodes colored by depth). (a) Multi-path evolution. (b) Single-path evolution

Table 8.3 Solving times on the testing set

Solver      Avg    Par10   % solved
BSS         315     1321   93.8

1 Note that Cplex switches off certain heuristics as soon as branch callbacks, even empty ones, are used, so that the entire search behavior is different.

2 VBS is an oracle solver that for every instance always uses the strategy that results in the shortest runtime.

What this figure shows is that the features change gradually. This means that there is no need to check the features at every decision node; the subproblem features are therefore only checked at every third node. Similarly, this figure and others like it show that a switching depth of 10 is reasonable, as in most cases the nodes do not span more than two clusters.
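The resulting switching policy, re-examining features only at every third node and only down to depth 10, can be stated as a small predicate. The parameter names here are ours, not from the implementation:

```python
MAX_SWITCH_DEPTH = 10  # below this depth, keep the current heuristic
CHECK_INTERVAL = 3     # recompute features at every third node


def may_switch(depth, nodes_seen):
    """True if the solver should recompute features and possibly
    switch heuristics at this node."""
    return depth <= MAX_SWITCH_DEPTH and nodes_seen % CHECK_INTERVAL == 0


print(may_switch(4, 6))   # True
print(may_switch(12, 6))  # False: too deep to switch
print(may_switch(4, 7))   # False: not a check point
```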

GGA is used to tune the parameters of DASH, computing the best heuristic for each cluster. The results are presented in Table 8.3, which compares the approach to a vanilla ISAC approach that for a given instance chooses the single best heuristic and then does not allow any switching. What can be observed is that DASH performs much better than its more rigid counterpart. However, it is also necessary to allow for the possibility that switching heuristics might not be the best strategy for every instance. To cover this case, DASH+ is introduced, which first clusters the original instances using ISAC and then allows each cluster to independently decide whether it wants to use dynamic heuristic switching.
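DASH+'s per-cluster choice can be sketched as picking, for each ISAC cluster, whichever policy (static best heuristic or dynamic switching) performed better during training. The comparison criterion and the numbers below are illustrative assumptions, not the book's tuning procedure:

```python
def dashplus_policies(train_times):
    """Choose a per-cluster policy from training performance.

    train_times: dict cluster -> {"static": t, "dynamic": t}, the average
    training runtimes of each policy. Each cluster independently keeps
    whichever policy was cheaper.
    """
    return {c: min(times, key=times.get) for c, times in train_times.items()}


times = {1: {"static": 120.0, "dynamic": 90.0},
         2: {"static": 45.0, "dynamic": 60.0}}
print(dashplus_policies(times))  # {1: 'dynamic', 2: 'static'}
```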

Additionally, information gain is used as a filtering technique; this method is based on calculating the entropy of the data as a whole and within each class. When applied to ISAC and DASH+, the resulting solvers are referred to as ISAC_filt and DASH+filt, respectively, and filtering improves performance in both cases. In particular, DASH+filt performs considerably better than everything else.
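The information gain of a feature is the entropy of the class labels minus the expected entropy of the labels after splitting on that feature; features with low gain are the ones filtered out. A minimal sketch for discrete features, with hypothetical labels (which heuristic was best):

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) minus the weighted entropy within each feature group."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# A feature that perfectly separates the classes has maximal gain;
# one independent of the classes has zero gain.
labels = ["best_h1", "best_h1", "best_h2", "best_h2"]
print(information_gain([0, 0, 1, 1], labels))  # 1.0
print(information_gain([0, 1, 0, 1], labels))  # 0.0
```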

Table 8.3 finally shows the performance of a virtual best solver allowed to use DASH. Observe that even though the current implementation cannot overtake the VBS, future refinements to the portfolio techniques may achieve performance much better than techniques that rely purely on sticking to a single heuristic.

8.4 Chapter Summary

This chapter introduced the idea of using an offline algorithm tuning tool to learn an assignment of branching heuristics to clusters of training data. During search, these assignments are used dynamically to determine a preferable branching heuristic for each encountered subproblem. This approach, named DASH, was evaluated on a set of highly diverse MIP instances. We found that the approach clearly outperforms the Cplex default and also the best pure branching heuristic considered here. While not limited to a single heuristic itself, DASH comes very close to matching the performance of an oracle that magically tells us which pure branching heuristic to use for each individual instance.

The chapter concluded that mixing branching heuristics can be very beneficial, yet care must be taken when learning when to choose which heuristic, as early branching decisions determine the distribution of subproblems that must be dealt with deeper in the tree. This problem was addressed by using the offline algorithm tuning tool GGA to determine a favorable synchronous assignment of heuristics to clusters so that instances can be solved most efficiently.

Chapter 9

Evolving Instance-Specific Algorithm Configuration

One of the main underlying themes of the previous chapters has been to demonstrate that there is no single solver that performs best across a broad set of problem types and domains. It is therefore necessary to develop algorithm portfolios, where, when confronted with a new instance, the solver selects the approach best suited for satisfying the desired objective. This process can then be further refined to intelligently create portfolios of diverse solvers through the use of instance-oblivious parameter tuners. However, all of the approaches described thus far take a static view of the learning process.

Yet there exists a class of problems where it is not appropriate to tune or train a single portfolio and then rely on it indefinitely in the future. Businesses grow, problems change, and objectives shift. Therefore, as time goes on, the problems that were used for training may no longer be representative of the types of problems being encountered. One example of such a scenario could be a business whose task it is to route a small fleet of delivery vehicles. When the business starts off, there may be only a small number of vehicles and customers, which means that the optimization problems are very small. As the business becomes increasingly successful, more customers arrive and the increased revenue allows for the purchase of more vehicles, which in turn means that a larger area can be covered. It is unlikely that a solver that has been configured to solve the original optimization problem will still be efficient at solving the new more complex problems.

This chapter, therefore, focuses on a preliminary approach that demonstrates that it is necessary to retrain a portfolio as new instances become available. This is done by identifying when the incoming instances become sufficiently different from everything observed before, thus requiring a retraining step. The chapter shows how to identify this moment and how, by doing so efficiently, we can create a portfolio that improves on the train-once methodology.
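One simple way to flag that retraining is needed, in the spirit described above, is to track the distance of each incoming instance's feature vector to its nearest training-cluster center and trigger retraining once that distance grows too large. This is a hypothetical sketch: the centers, the distance measure, and the threshold are all placeholders, not the method developed in this chapter:

```python
import math


def nearest_center_dist(x, centers):
    """Euclidean distance from feature vector x to its closest center."""
    return min(math.dist(x, c) for c in centers)


def needs_retraining(x, centers, threshold):
    """True if x is farther from every known cluster than the threshold,
    i.e. it no longer resembles anything seen during training."""
    return nearest_center_dist(x, centers) > threshold


centers = [(0.0, 0.0), (5.0, 5.0)]  # placeholder training-cluster centers
print(needs_retraining((0.5, 0.5), centers, threshold=2.0))   # False
print(needs_retraining((10.0, 0.0), centers, threshold=2.0))  # True
```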

© Springer International Publishing Switzerland 2014. Y. Malitsky, Instance-Specific Algorithm Configuration, DOI 10.1007/978-3-319-11230-5_9

