
Clustering the Instances

In the document Instance-Specific Algorithm Configuration (Page 25-29)

There are many clustering techniques available in recent research [14]. This section, however, first shows how to define the distance metric, which is important regardless of the clustering method employed. It then presents the two clustering approaches initially tested for ISAC.

3.1.1 Motivation

One of the underlying assumptions behind ISAC is that there are groups of similar instances, all of which can be solved efficiently by the same solver. Here we further postulate that these similarities can be identified automatically. Figure 3.1 highlights the validity of these assumptions. The figures are based on the standard 48 SAT features (which will be introduced in detail in Chap. 4) for 3,117 instances from the 2002 to 2012 SAT Competitions [1]. The features were normalized and, using PCA, projected onto two dimensions. We ran 29 solvers available in 2012 with a 5,000 s timeout and recorded the best possible time for each instance. Figure 3.1 shows the performance of two of these solvers (CCASat [27] and lingeling [19]).

In the figure, an instance is marked as a green cross if the runtime of the solver on this instance was at most 25 % more than the best recorded time for that instance. All other instances are marked with a black circle, unless the solver timed out, in which case the instance is marked with a red triangle.

What is interesting to note here is that there is a clear separation between the instances on which each solver does not time out. This is likely because CCASat was designed to solve randomly generated instances, while lingeling is better at industrial instances. It is therefore no surprise that on the instances where one solver does well, the other is likely to time out. What is also interesting is that the instances on which either solver does not time out appear to be relatively well clustered. This complementary, cluster-like behavior is also evident for the other 27 solvers, and it is the motivation behind embracing a cluster-based approach.

3.1.2 Distance Metric

The quality of a clustering algorithm strongly depends on how the distance metric is defined in the feature space. Features are not necessarily independent. Furthermore, important features can range over small values, while features with larger ranges could be less important. Finally, some features can be noisy, or worse, completely useless and misleading. For the current version of ISAC, however, it is assumed that the features are independent and not noisy. Chapter 10 will show how to handle situations where this is not the case.


Fig. 3.1 Performance of CCASat and lingeling on 3,117 SAT instances. A feature vector was computed for each instance and then projected onto 2D using PCA. Green crosses mark good instances, on which the solver performs no worse than 25 % slower than the best solver on that instance. An ok instance (black circle) is one that is more than 25 % worse than the best solver. An instance that takes more than 5,000 s is marked as a timeout (red triangle)

A weighted Euclidean distance metric can handle the case where not all features are equally important to a proper clustering. This metric also handles the case where the ranges of the features vary wildly. To automatically set the weights for the metric, an iterative approach is needed. Here all the weights can first be set to 1 and the training instances clustered accordingly. Once the solvers have been tuned for each cluster, the quality of the clusters is evaluated. To this end, for each pair of clusters i ≠ j, the difference is computed between the performance on all instances in cluster i that is achieved by the solver for that cluster and by the solver of the other cluster. The desired distance between an instance a in cluster C_i and the center of gravity of cluster C_j is then the maximum of this regret and 0. Using these desired distances, the feature metric is adjusted, and the process continues to iterate until the feature metric stops changing.
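As a small illustration of the metric itself (not of the iterative weight-learning loop), a weighted Euclidean distance can be sketched as follows; the function name and the choice of NumPy are ours, not the book's:

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance between feature vectors x and y.

    Features with larger weights contribute more to the distance;
    setting all weights to 1 recovers the plain Euclidean metric,
    which is the starting point of the iterative approach above.
    """
    return np.sqrt(np.sum(w * (x - y) ** 2))
```

With unit weights this reduces to the ordinary Euclidean distance, and zeroing a weight removes that feature's influence entirely.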

This iterative approach works well when improving a deterministic value like the solution quality, where it is possible to perfectly assess algorithm performance.

The situation changes when the objective is to minimize runtime. This is because parameter sets that are not well suited for an instance are likely to run for a very long time, necessitating the introduction of a timeout. This then implies that the real performance is not always known, and all that can be used is the lower bound.

This complicates learning a new metric for the feature space. In the experiments, for example, it was found that most instances from one cluster timed out when run with the parameters of another. This not only leads to poor feature metrics, but also costs a lot in terms of processing time. Furthermore, because runtime is often a noisy measurement, it is possible to encounter a situation where instances oscillate between two equally good clusters. Finally, this approach is very computationally expensive, requiring several retuning iterations which can take CPU days or even weeks for each iteration.

Consequently, for the purpose of tuning the speed of general solvers, this chapter suggests a different approach. Instead of learning a feature metric over several iterations, the features are normalized using translation and scaling so that, over the set of training instances, each feature spans exactly the interval [-1, 1]. That is, for each feature there exists at least one instance for which this feature has value -1 and at least one instance where the feature value is 1. For all other instances, the value lies between these two extremes. By normalizing the features in this manner, features with large and small ranges are given equal consideration during clustering. Furthermore, the assumption that there are no noisy or bad features does not result in bad clusterings. However, Chap. 10 shows how filtering can be applied to further improve performance.
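The normalization described above can be sketched in a few lines; the function name and the guard for constant features are our own additions:

```python
import numpy as np

def normalize_features(X):
    """Scale each feature column of X (n_instances x n_features)
    so that, over the training set, it spans exactly [-1, 1]."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # guard against constant features to avoid division by zero
    span = np.where(hi > lo, hi - lo, 1.0)
    return 2.0 * (X - lo) / span - 1.0
```

After this transformation, every non-constant feature attains -1 on at least one training instance and 1 on at least one other, so features with originally large and small ranges weigh equally in the Euclidean distance used for clustering. Note that the same translation and scaling (lo, span) must be reused when normalizing unseen test instances.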

3.1.3 k-Means

One of the most straightforward clustering algorithms is Lloyd’s k-means [72].

As can be seen in Algorithm 1, the algorithm first selects k random points in the feature space. It then alternates between two steps until some termination criterion is reached. The first step assigns each instance to a cluster according to the shortest distance to one of the k points that were chosen. The next step then updates the k points to the centers of the current clusters.
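The two alternating steps can be sketched as follows; this is a minimal version of Lloyd's k-means, with the random initialization and convergence check chosen by us for illustration:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal sketch of Lloyd's k-means on X (n_instances x n_features)."""
    rng = np.random.default_rng(seed)
    # step 0: select k random instances as the initial points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # step 1: assign each instance to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 2: move each center to the mean of its current cluster
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # termination: centers stopped moving
            break
        centers = new
    return labels, centers
```

On ISAC's normalized features the distance here is the plain Euclidean metric, consistent with the normalization discussed in Sect. 3.1.2.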

While this clustering approach is very intuitive and easy to implement, the problem with k-means clustering is that it requires the user to specify the number of clusters k explicitly. If k is too low, some of the potential for tuning parameters more precisely for different parts of the instance feature space is lost.

On the other hand, if there are too many clusters, the robustness and generality of the parameter sets that are optimized for these clusters are sacrificed. Furthermore, for most training sets, it is unreasonable to assume that the value of k is known.

3.1.4 g-Means

In 2003, Hamerly and Elkan proposed an extension to k-means that automatically determines the number of clusters [44]. This work proposes that a good cluster exhibits a Gaussian distribution around the cluster center. The algorithm, presented in Algorithm 2, first considers all inputs as forming one large cluster. In each iteration, one of the current clusters is picked and assessed for whether it is already sufficiently Gaussian. To this end, g-means splits the cluster into two by running 2-means clustering. All points in the cluster can then be projected onto the line that runs through the centers of the two sub-clusters, giving a one-dimensional distribution of points. g-means then checks whether this distribution is normal using the widely accepted Anderson-Darling statistical test. If the current cluster does not pass the test, it is split into the two previously computed clusters, and the process continues with the next cluster.
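The projection-and-test step can be sketched as below, using SciPy's Anderson-Darling implementation; the function name and the choice of the 1 % critical value are our own assumptions, not the book's:

```python
import numpy as np
from scipy.stats import anderson

def looks_gaussian(cluster, c1, c2, level_idx=-1):
    """Sketch of the g-means test for one cluster.

    Projects the cluster's points onto the line through the two
    sub-cluster centers c1 and c2 (found by 2-means), then applies
    the Anderson-Darling normality test to the 1-D projections.
    level_idx=-1 selects the most lenient (1 %) critical value.
    """
    v = c2 - c1
    v = v / np.linalg.norm(v)
    proj = cluster @ v  # one-dimensional distribution of points
    result = anderson(proj)  # tests against the normal distribution
    # keep the cluster whole only if normality is not rejected
    return result.statistic < result.critical_values[level_idx]
```

If this returns False, the cluster is replaced by the two sub-clusters from the 2-means split, and the procedure moves on to the next cluster in the queue.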

It was found that the g-means algorithm works very well for our purposes, except that the resulting clusters can sometimes be very small, containing only a few instances. To obtain robust parameter sets, we do not allow clusters that contain fewer instances than a manually chosen threshold, a value which depends on the size of the dataset. Beginning with the smallest cluster, the corresponding instances are redistributed to the nearest clusters, where proximity is measured by the Euclidean distance of each instance to a cluster's center.
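The redistribution of undersized clusters can be sketched as follows, under the assumption that labels are integers in 0..k-1 indexing the rows of a centers array; the function name and this interface are ours:

```python
import numpy as np

def enforce_min_size(X, labels, centers, min_size):
    """Dissolve clusters smaller than min_size, starting with the
    smallest, reassigning their instances to the nearest remaining
    cluster center by Euclidean distance."""
    labels = labels.copy()
    while True:
        ids, counts = np.unique(labels, return_counts=True)
        if len(ids) <= 1 or counts.min() >= min_size:
            return labels
        small = ids[counts.argmin()]          # current smallest cluster
        keep = ids[ids != small]              # clusters that survive
        for i in np.where(labels == small)[0]:
            d = np.linalg.norm(centers[keep] - X[i], axis=1)
            labels[i] = keep[d.argmin()]      # nearest surviving center
```

A possible refinement, not shown here, would be to recompute the surviving centers after each dissolution before redistributing the next small cluster.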
