
Figure 5.9: Comparison of hashing and map strategies. With the hashing technique, there may be collisions between classes.

Unlike a map, the hash table need not store the patterns themselves, so better use may potentially be made of memory. As the number of classes increases, there is a graceful degradation of performance.

Unfortunately, the memory available on workstations is not enough to prevent a general failure to cope with the number of patterns created by large windows. This can be seen by comparing the 7×7 and 9×9 windows in figure 5.6; the exponential explosion will eventually overwhelm any reasonable hash table. In addition, consider that when the database is small (and 257 games is small for this sort of learning), the number of observed instances of each pattern becomes vanishingly small; noise overwhelms the signal.
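To get a sense of the scale (a rough figure of my own, assuming each point of a window takes one of three states, black, white, or empty, and ignoring symmetry reduction and edge encoding), the number of distinct raw window contents grows exponentially in the number of points:

    3^{49} \approx 2 \times 10^{23} \;\; (7\times7 \text{ window}), \qquad 3^{81} \approx 4 \times 10^{38} \;\; (9\times9 \text{ window})

so even the tiny fraction of these actually observed dwarfs any realistic hash table.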

A word of warning: I found, naturally enough, that the collision rate was extremely sensitive to the particular hash function. I experimented with a variety of functions before settling on one which seemed to have a minimal collision rate. In particular, I extract patterns by first transforming them to an intermediate string representation, which is then hashed. Because these strings often differ by only a single character for different classifications, the hash function had to be chosen carefully.
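As an illustration of the kind of sensitivity involved (the string format and both functions below are my own sketch, not the code used in this work), a multiplicative hash spreads a single-character difference over the whole table, while simply summing character codes does not:

    def pattern_to_string(window):
        # Flatten a window (a list of rows of 'b', 'w', '.') into a string key.
        return "".join("".join(row) for row in window)

    def weak_hash(s, table_size):
        # Summing character codes spreads similar strings poorly: changing a
        # single character moves the index by at most a few dozen slots, so
        # near-identical pattern strings cluster in the table.
        return sum(ord(c) for c in s) % table_size

    def better_hash(s, table_size):
        # A polynomial (multiplicative) hash lets a single-character change
        # affect the whole index, giving a far lower collision rate between
        # near-identical pattern strings.
        h = 0
        for c in s:
            h = (h * 131 + ord(c)) % table_size
        return h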

One attempt was made to circumvent the collision effect by adding a quality component to each hash entry.

Figure 5.10: A study plot, showing the effect of collisions on hash table method effectiveness. As hash table size increases, the probability of collisions between patterns decreases, so the table is a better discriminator between patterns, approaching the effectiveness of the complete map for a large enough hash table.

Each pattern was hashed into the table five times using double hashing, and the ultimate evaluation, rather than simply being the value found in the table, was:

    e = \frac{\sum_h q_h v_h}{\sum_h q_h}

where v_h is the value found in the table for hash h, q_h is the quality component located at that entry, and the sum runs over the five hash locations. All values are initialized to 0.0, as in the previous technique, but the quality components are initialized to 1.0. The evaluation shown above is a weighted average of the values found at the hashed entries; the quality component is the weight a value should have when it is combined with the other values. Initialized to 1.0, this amounts to simple averaging. Over time, the quality components should go to zero at entries suffering many damaging collisions, and increase at entries whose colliding patterns call for similar values.

The gradients for this method can be found by chaining from the previous method; for an evaluation e and hash entry i,

    \frac{\partial e}{\partial v_i} = \frac{q_i}{\sum_h q_h}, \qquad
    \frac{\partial e}{\partial q_i} = \frac{v_i - e}{\sum_h q_h}

and these gradients can be followed as before. This didn't have much effect: the addition of a quality component resulted in negligible improvement.
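A minimal sketch of this scheme, assuming five probes by double hashing and gradient descent on both the values and the quality weights (the class name, probe sequence, and learning rate below are illustrative assumptions, not the actual implementation):

    class QualityHashTable:
        # Each slot holds a value (initialized to 0.0) and a quality weight
        # (initialized to 1.0).  A pattern is probed PROBES times by double
        # hashing and evaluated as the quality-weighted average of the values
        # found at those slots.
        PROBES = 5

        def __init__(self, size):
            self.size = size
            self.value = [0.0] * size
            self.quality = [1.0] * size

        def _slots(self, key):
            h1 = hash(key) % self.size
            h2 = 1 + hash((key, "step")) % (self.size - 1)
            return [(h1 + k * h2) % self.size for k in range(self.PROBES)]

        def evaluate(self, key):
            slots = self._slots(key)
            total_q = sum(self.quality[i] for i in slots)
            return sum(self.quality[i] * self.value[i] for i in slots) / total_q

        def update(self, key, d_error_d_e, rate=0.01):
            # Gradient step, chaining through e = sum(q_i v_i) / sum(q_i):
            #   de/dv_i = q_i / sum(q)      de/dq_i = (v_i - e) / sum(q)
            # (a real implementation would also keep the weights positive).
            slots = self._slots(key)
            total_q = sum(self.quality[i] for i in slots)
            e = sum(self.quality[i] * self.value[i] for i in slots) / total_q
            grads = [(self.quality[i] / total_q, (self.value[i] - e) / total_q)
                     for i in slots]
            for i, (gv, gq) in zip(slots, grads):
                self.value[i] -= rate * d_error_d_e * gv
                self.quality[i] -= rate * d_error_d_e * gq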

Figure 5.5 shows the very large window (9×9 square) on encountering data for a second time; it has managed to nearly memorize the data, in spite of the collisions that must occur among the enormous number of patterns observed in the window. For this reason, and from the experience with the addition of a quality component, it would appear that collisions are not the primary mechanism losing discriminatory ability; one would expect the addition of a quality component to have had a noticeable constructive effect if collisions were a problem.

5.4 Pattern cache

Another approach to decreasing memory use is to selectively forget patterns. The idea is to maintain a working set of patterns which are effective and toss out patterns which aren't helping to lower the error much. How should the patterns to be forgotten be chosen? Here are a few ways:

The old standby: always replacing the least recently used pattern, akin to the standard methods used for paging memory (LRU). This has the advantage of being easy to implement, but has little to recommend it from a go perspective; we wouldn't want a pattern of great utility to be forgotten just because an instance hadn't been encountered recently. On the positive side, patterns of high utility are likely to be so precisely because they are frequently encountered, so LRU would tend to keep them anyway.

Replacing patterns whose values are close to the default value used when a pattern is created. The motivation is that if we delete only patterns whose value is close to the default chosen when a pattern is introduced into the mapping, then if a future evaluation encounters such a pattern again, the value it uses won't be too different from what it was at the time we replaced it. In my code, the default value was zero (that is, all patterns are "initialized" to zero, the same value as a pass), so this means preferentially replacing patterns with low values over high ones.

Replacing patterns that have recently contributed in a negative manner to the overall error. The program could keep track of the difference in effect each pattern would have had on the error if its value were the default instead of whatever it happens to be at the moment of each evaluation. Some kind of running average of this effect would allow replacing those patterns that are not contributing to the "greater good" (a sketch of this bookkeeping appears below).
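A minimal sketch of the bookkeeping this third policy would need (the interface and decay constant are illustrative assumptions):

    import collections

    class ErrorContributionTracker:
        # Running average of how much each pattern helps: the caller reports,
        # for each evaluation, what the error was and what it would have been
        # had the pattern still held its default value.  Patterns with the
        # lowest averages become the replacement candidates.
        def __init__(self, decay=0.99):
            self.decay = decay
            self.benefit = collections.defaultdict(float)

        def record(self, pattern, error_with_value, error_with_default):
            # Positive when the learned value lowers the overall error
            # relative to falling back to the default.
            gain = error_with_default - error_with_value
            self.benefit[pattern] = (self.decay * self.benefit[pattern]
                                     + (1.0 - self.decay) * gain)

        def replacement_candidates(self, k):
            # The k patterns contributing least to the "greater good".
            return sorted(self.benefit, key=self.benefit.get)[:k]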

I tried implementing the second technique. One way to do this would be with a priority queue, so that the pattern with the lowest value is always available on demand. To avoid the space taken by a splay tree or other fast data structure, I implemented a simple flushing technique, similar to that used to approximate LRU in paging systems. A maximum and minimum cache size are given by the user. Once per evaluation, the cache is checked to see if it exceeds the maximum size. If it does, all patterns whose value falls below a certain threshold are discarded; the threshold is chosen by estimating the value that would reduce the cache to the minimum size, assuming a uniform distribution of values between the minimum and maximum values in the cache.

This causes the cache size to slowly grow until it reaches the maximal value, at which time enough patterns are deleted to bring it near the minimum size. The choice of a cropping value based on an assumed uniform distribution allows deletion only of patterns which would likely be deleted by a “pure” priority technique.
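A minimal sketch of this flushing cache (the names are illustrative, and for simplicity the size check here happens on every store rather than once per evaluation):

    class PatternCache:
        # Map from pattern keys to learned values (default 0.0), flushed down
        # to roughly min_size entries whenever it grows past max_size.
        def __init__(self, min_size, max_size):
            self.min_size = min_size
            self.max_size = max_size
            self.values = {}

        def value(self, pattern):
            return self.values.get(pattern, 0.0)

        def store(self, pattern, value):
            self.values[pattern] = value
            self._maybe_flush()

        def _maybe_flush(self):
            if len(self.values) <= self.max_size:
                return
            lo = min(self.values.values())
            hi = max(self.values.values())
            # Assuming the values are spread uniformly between lo and hi, pick
            # the threshold that should leave about min_size entries above it.
            keep_fraction = self.min_size / len(self.values)
            threshold = hi - keep_fraction * (hi - lo)
            self.values = {p: v for p, v in self.values.items() if v >= threshold}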

This did not work well in comparison to hashing. This can be explained by the nature of collisions between patterns. When a flush occurs in a pattern cache and patterns are wiped out, the loss is relatively catastrophic. In comparison, when collisions occur in the hash table, patterns share a value; if their optimal separate values are similar, this may even be constructive, and if one pattern occurs frequently with respect to another, the shared value will tend towards a weighted average of the individual values. In short, the hashing technique degrades gracefully; the pattern cache doesn't.

5.5 Improvements

I tried two techniques to further improve performance: conditional window expansion and liberty encoding.

When a fixed window is used, it suffers from overgeneralization for sparse positions (those with few nearby stones). This was seen in the poor performance of small fixed windows at the beginning of the game. Conditional window expansion uses a fixed-shape window, such as a diamond, but adjusts the size to take advantage of information such as the relative position of the edge when pieces are far apart. This was implemented by modifying the parameters given to the diamond window extractor; instead of extracting a window whose radius is fixed and specified by a parameter, the new parameters are the minimum and maximum window sizes and the minimum number of stones which must be seen in the window. In this scheme, extraction proceeds by starting with a window of the minimum radius. This window is incrementally increased until either the minimum-number-of-stones criterion is met or the maximum window size is reached.
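A sketch of the expansion rule (the extractor and stone-counting functions are passed in as stand-ins, since their real interfaces are not shown here):

    def extract_expanding_diamond(board, point, min_radius, max_radius,
                                  min_stones, extract_diamond, count_stones):
        # Start from a diamond of radius min_radius and grow it one step at a
        # time until it contains at least min_stones stones or the radius
        # reaches max_radius.  extract_diamond and count_stones stand in for
        # the fixed-shape window extractor and a stone counter.
        radius = min_radius
        window = extract_diamond(board, point, radius)
        while count_stones(window) < min_stones and radius < max_radius:
            radius += 1
            window = extract_diamond(board, point, radius)
        return window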

Another experiment was to add liberty information to the window for groups which are only partly inside it; whether a group has more liberties outside the observation window is very important for tactical relationships. For this, an additional parameter was added: the liberty count of a stone was encoded clamped to this parameter (the smaller of the two was used), so that two otherwise identical positions that differ only in the number of liberties of a group will not be categorized differently past some set limit of liberties. The rationale for this is that once a group is alive, it is alive, and the difference between having five and six liberties isn't likely to be important; however, having or not having a single liberty is extremely important: it is the definition of the life or death of a group.
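For example, with the limit parameter called liberty_cap (an illustrative name):

    def encode_liberties(liberty_count, liberty_cap=2):
        # Clamp the liberty count at liberty_cap: with a cap of 2, a group in
        # atari (one liberty) is distinguished from any group with two or more
        # liberties, while five and six liberties encode identically.
        return min(liberty_count, liberty_cap)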

Figure 5.11 shows that the effect of allowing conditional expansion is dramatic. The point of conditional expansion is to avoid the exponential explosion of classifications in crowded conditions, while still achieving the benefits of a large window at the beginning of the game. The addition of liberty information was a slight improvement, but only for discriminating between groups with a single liberty and those with two or more.

Figure 5.11: Study plot for end of training data, for: a simple diamond of radius two; a diamond conditionally allowed to expand to a maximum radius of five until a stone is seen; and with the addition of identification of groups with a single liberty.

Figure 5.12: NRM plot for the best found classification: conditionally expanded diamond window with liberty information added. (Axes: move number versus error.)

