
Machine Learning: Much Promise, Many Problems

It might strike you as a bit odd to see this chapter on machine learning pop up in a book concerned with the practical application of AI. Well, your first impression would be appropriate if this chapter reviewed the theories and mechanisms that are generally taken to constitute the AI subfield of machine learning, but it will not. My goal in this chapter is to convince you that machine learning mechanisms will be a necessary part of many AI systems, and that this necessity brings with it additional aggravation for those of us who wish to engineer AI software. I do this not because I think that engineering artificially-intelligent software is a well understood and boring procedure in need of an injection of excitement—far from it. I do this because the potential power of machine learning mechanisms is something that we should be well aware of, even though it introduces further problems on top of those that we already face. It may be wise, if not essential, to have the full picture in mind even though work on just one part of it gives us quite enough problems to be going on with.

Self-adaptive software

Mention of the possibility of self-adaptive software should, for all the old-timers in the software world, conjure up the spectre of self-modifying code. A favored strategy, in the days when programming was largely the pastime of grappling with a mass (or better, morass) of machine code instructions in order to save a few bytes of memory or to shave a few milliseconds off the running time of a program, was to devise ingenious ways of reusing memory locations and blocks of instructions by overwriting certain critical instructions with new ones as the program executed. Thus the code that actually constitutes the program will vary as the program runs through an execution sequence. And, moreover, the details of how it actually varies are dependent upon the details of a given execution sequence.

Debugging software creations has never held much appeal for creative people, system designers and builders. Looking for errors in such creations (either one's own or, worse, someone else's) is an exercise that few can gather great enthusiasm to pursue. Add to this inherent lack of appeal the further negative incentive of machine code, especially the early ones with (mis)mnemonic labels and glorious perceptual opacity, and it is not surprising that the queues of qualified applicants to maintain software systems have never been long. And on top of all this, in the presence of self-modifying code, the maintenance person could not even while away the hours staring at the listing, hoping for inspiration, in the sure knowledge that all the relevant information is in front of him or her. For the listing, in effect, changes as the program executes. In this thoroughly dismal scenario the only serious options are to abandon the task or to collect strategically chosen snapshots of the program code and data. Octal core dumps were one favored representation of this information (favored by the machines, that is, and in those days it was the machines that were the scarce and expensive resource). The advent of high-level languages (among other things) has done much to improve this bleak picture, but another significant improvement was introduced early on; it was a programming principle to abide by if you wanted to stand a good chance of debugging your creative efforts. The principle was:

Never write self-modifying code.

So to persons well aware of this principle and the very good reasons for its existence, a call for self-adaptive software systems is likely to be greeted with much the same enthusiasm as ham sandwiches at a Jewish wedding. I hope to convince you that such a response would be uncalled for. But first, why do we need to contemplate the undoubted extra problems that self-adaptive software systems will bring with them?


The promise of increased software power

Way back in Chapter 1 I presented four aspects of how we might approach the problem of increasing software power—and each aspect involved AI. As it happens, not only does each of these four aspects involve AI, but they also each imply a need for machine learning in their more impressive manifestations—i.e. a need for self-adaptive software.

The central notion here is that many significant enhancements of software power require a move from static, context-free systems to dynamic, context-sensitive systems. Intelligence is not a context-free phenomenon, and AI cannot be either. "Machine learning," as Roger Schank has succinctly put it, "is the quintessential AI issue." This does not mean that all AI software must involve mechanisms for machine learning. But it does mean that our options in the absence of robust and reliable mechanisms for machine learning will be severely limited.

The need for self-adaptive software derives from several sources: there is a need for software that is reactive to changing circumstances, and, at the more mundane level, there is a need for mechanisms to lessen the difficulty of the task of incrementally upgrading knowledge bases (as argued by Michalski, Carbonell and Mitchell, 1983, in their volume entitled Machine Learning).

The threat of increased software problems

Given that there are some not unreasonable reasons why we might want self-adaptive software systems, what does this new need entail? In a word: problems.

To begin with, there is the old problem of the state of a system changing over time—i.e., the program that you are asked to debug or enhance is defined by the specification plus its usage history. Now, if the old-timers were smart enough to realize that this type of software system is primarily a short cut to disaster, do we need to re-embark on this learning experience?

I think not; we must clearly not rush in, but times have changed, and, I shall argue, they have changed sufficiently for the possibility of self-adaptive software to be a viable proposition, provided we operate with caution. It is the differences more than the similarities between old-style machine-code programming and modern software system design that are most obvious—the two processes bear little resemblance except at some very basic level. So what are the modern innovations that suggest to me that self-adaptive software need not carry with it the curse of self-modifying code?

• Systems are designed and developed more systematically using appropriate abstract representations which reduce the effective complexity by several orders of magnitude.

• With the development of elaborate machine-manipulable data structures, program complexity can be traded out of the algorithm and into the data structures; as a result substantial software adaptivity can be gained by merely changing data values and keeping the actual algorithm unchanged (a minimal sketch of this idea follows the list).

• Software can now be developed within a sophisticated support environment, which can remove a myriad of trivial considerations from the concern of the programmer, and this again significantly reduces the effective complexity of the overall task.

• Principles of structured system design capture the hard-won wisdom of the intervening years, and when adhered to make a well-structured system much more conceptually manageable than the equivalent monolithic list of machine code instructions.

• Within the comprehensive, multi-levelled and highly-structured framework implied by the first three points, we can constrain and limit the scope of any self-adaptive mechanisms employed.
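
By way of illustration, here is a minimal sketch, in Python, of the data-structure point above: behavior is adapted by editing a data table while the algorithm itself stays unchanged. All of the names and the toy response table are my own invention, not a recipe from any particular system.

    # Response knowledge held as data, not code: adapting the system
    # means changing this table, never rewriting the algorithm below.
    RESPONSES = {
        "greeting": "Hello. How can I help?",
        "farewell": "Goodbye.",
        "unknown":  "I did not understand that.",
    }

    KEYWORDS = {"hello": "greeting", "hi": "greeting", "bye": "farewell"}

    def classify(utterance: str) -> str:
        # A crude keyword classifier; it, too, is table-driven.
        for word in utterance.lower().split():
            if word in KEYWORDS:
                return KEYWORDS[word]
        return "unknown"

    def respond(utterance: str) -> str:
        # The fixed algorithm: classify, then look up the response.
        return RESPONSES[classify(utterance)]

    print(respond("Hi there"))               # -> Hello. How can I help?
    RESPONSES["greeting"] = "Welcome back!"  # adaptation by data change alone
    print(respond("Hi there"))               # -> Welcome back!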

I am not saying that all machine learning mechanisms are now fair game for the zealous software engineer. I am saying that within the constraints of a well-engineered software system there is scope for the careful deployment of self-adaptive mechanisms that will enhance the power of the software without necessarily wrecking its maintainability, and I shall provide you with a couple of examples in the final section of this chapter.

In particular, some types of machine learning are more inherently controllable than others, and judicious encapsulation of these mechanisms within a software system will keep the overall system intellectually manageable—especially if appropriate software management tools are developed and added to the support environment at the same time as the particular mechanisms of machine learning.

Machine learning does not have to be self-modifying code, and it should not be—at least, not in the undisciplined way that was customary in the bad old days. So what are the options for mechanisms to implement self-adaptive software?


The state of the art in machine learning

Machine learning (ML), the umbrella term for most mechanisms of self-adaptation in computer programs, is a very broad field with surprising depths here and there. I do not propose to subject you to a comprehensive survey, for apart from being inordinately long, it would for the most part address strategies that show little promise of being viable practical mechanisms in the near future. So I shall be highly selective, and for those readers who doubt the appropriateness of my selection (or would just like to see the big picture) I can indicate a number of comprehensive sources of information. Most general AI books are not very good on this topic, perhaps because it has blossomed so much in recent years and they have yet to catch up. But putting modesty aside, you'll find no more comprehensive coverage than the 100-page chapter on ML in Partridge (1991) A New Guide to AI. Up-to-date details on individual projects can be found in the periodic edited collections entitled Machine Learning, published by Morgan Kaufmann (vols I, II and III published), and in the journal Machine Learning, published by Kluwer.

The ML mechanisms with some promise of near-term practical utility can be divided into the classical ones—such as inductive generalization—and the network learning models (the connectionistic ones—to use the common but universally disliked general label)—such as back propagation of an error signal.

In both categories, mechanisms are designed to exploit the accumulated experience of the system within a given operating environment; the major differences are the degree to which experience can be automatically exploited, and the nature of the system modifications employed in this exploitation process. Let's take the connectionistic (or parallel distributed processing—hence PDP) models first.

A PDP system is typically a network of primitive processing elements that receive 'activity' values from other elements that are directly connected into them, accumulate the received activity values according to some specified function, and then pass on through further links to other elements some function of the activity which they have accumulated. A further, and crucial, feature of these models is that the activity transfer operations occur in parallel. The obvious analogy for those readers who are grappling with this description is with the brain as a network of neurons each receiving and passing on electrical pulses.

Finally, each of the links between network elements has an associated 'weight' which is usually involved in the activity-transfer function through the link. So by changing these link weights, activity flow paths can be effectively opened up and closed down. Hence link-weight adjustment is the typical learning mechanism, which is, of course, totally different from learning in classical models (as you will soon see if this is not immediately obvious to you). But this sort of information can be classed as merely implementation detail (a widely misused phrase, but I think that I'm on relatively safe ground this time) and thus not of prime importance to the software designer; it is the general learning strategies that are of major concern.
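
Before we consider learning itself, a minimal sketch may help to fix this general picture. It is written in Python, follows no particular published model, and its activity values, link weights and squashing function are invented purely for illustration.

    import math

    def logistic(x: float) -> float:
        # A typical 'squashing' function for accumulated activity.
        return 1.0 / (1.0 + math.exp(-x))

    def propagate(activities, weights):
        # One layer-to-layer activity transfer: each output element
        # accumulates the weighted activity arriving on its input links
        # and passes on a function of that sum. Conceptually, all of
        # these transfers happen in parallel.
        return [logistic(sum(w * a for w, a in zip(row, activities)))
                for row in weights]

    input_activity = [1.0, 0.0, 0.5]
    link_weights = [[0.2, -0.4, 0.9],    # links into output element 0
                    [-0.7, 0.3, 0.1]]    # links into output element 1
    print(propagate(input_activity, link_weights))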

Unfortunately, the general strategy for ML in PDP systems is one of error-signal feedback (and individual mechanisms dictate how to adjust link weights to reduce the observed error). This means that the capacity for self-adaptation is restricted to the training of given networks to exhibit some well-defined functionality, and not the process of learning to respond appropriately to specific, but changing, environmental idiosyncrasies. PDP models can be trained (and with formal guarantees of convergence, although the training itself may be a very lengthy process) to exhibit some 'correct' behavior, but this is not really the type of self-adaptation that we are looking for, although it undoubtedly has some useful practical applications (e.g. in pattern recognition applications; see WISARD, Aleksander, 1983). We'll return briefly to this topic at the end of the chapter in order to view the tantalizing possibility of 'black-box' software.
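
To give the flavor of such error-signal feedback, here it is in its very simplest form: the delta rule applied to a single linear element (full back propagation extends the same idea through intermediate layers of elements). The training set, an OR-like function, and the learning rate are invented for illustration.

    # Each example pairs input activities with the desired output.
    examples = [([0.0, 0.0], 0.0),
                ([0.0, 1.0], 1.0),
                ([1.0, 0.0], 1.0),
                ([1.0, 1.0], 1.0)]

    weights = [0.0, 0.0]
    bias = 0.0
    rate = 0.5

    for _ in range(50):                        # training may take many passes
        for inputs, target in examples:
            output = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - output            # the fed-back error signal
            for i, x in enumerate(inputs):
                weights[i] += rate * error * x # adjust each link weight so as
            bias += rate * error               # to reduce the observed error

    print(weights, bias)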

So PDP models, although currently generating much excitement and hope, fall a long way short of offering us the mechanisms of self-adaptation that we might use to build powerful AI systems of the type discussed earlier. This leaves us with only the classical models, but there are some bright prospects here.

Classical ML (the category that embraces all the other possibilities) has had some three decades to proliferate and promulgate the resultant diversity of alternative strategies. There is thus a wide selection of mechanisms to consider: learning from analogy, rote learning, advice taking, learning from examples, explanation-based learning, apprentice learning, etc. But while there is a wealth of research projects emanating from this AI subfield, very few of the current mechanisms can boast the robustness and reliability that practical software systems demand.

Hence the set of actual mechanisms to consider is quite small; it is basically the inductive learning schemes (i.e. the automatic generation of new information by abstracting generalities from a set of instances of some phenomenon). In addition, I shall mention an up-and-coming contender from the area of deductive learning schemes—explanation-based learning (EBL)—which shows some promise of practical utility and has a great advantage in its deductive nature. For induction suffers from the inherent weakness that it cannot be guaranteed correct. Except in totally artificial domains, induction, working as it does from the particular to the general, is always liable to generate errors. No matter how many white swans you see, you can never be sure that the inductive generalization 'all swans are white' is correct. Induction is a powerful mechanism (and we humans exploit it to the full) but it is one that comes with no guarantees. Nevertheless, it is the basic mechanism that has proved useful in practical software systems.

The general strategy for self-adaptation based on inductive generalization is for a system to amass a collection of instances of, say, a specific user's interaction with the system, and then to generate from this set of instances a general model of the preferred style of interaction of this user for use when he or she next appears. To the user of such a system, it should appear that the system has tailored its style of interaction to his or her personal idiosyncrasies. And furthermore, if the system continued to collect instances of the interactions with this user, subsequent inductive generalizations could be used to either fine tune this specific example of human-computer interaction or to track the changing habits of this user. Such a software system would clearly be well on the road to the greater software power that we discussed in the opening chapter of this book.
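
As a concrete (and deliberately naive) sketch of this strategy, consider the following Python fragment. The attributes recorded for each interaction, and their values, are entirely my own invention; a practical system would need far richer instances and a far more sophisticated generalization algorithm.

    # One record per observed interaction with a specific user.
    instances = [
        {"verbosity": "terse", "confirms": False, "input": "keyboard"},
        {"verbosity": "terse", "confirms": False, "input": "keyboard"},
        {"verbosity": "terse", "confirms": True,  "input": "keyboard"},
    ]

    def generalize(instances):
        # Induce a user model by keeping, for each attribute, the value
        # seen most often. Like all induction, the result comes with no
        # guarantee of correctness.
        model = {}
        for attr in instances[0]:
            values = [inst[attr] for inst in instances]
            model[attr] = max(set(values), key=values.count)
        return model

    user_model = generalize(instances)
    print(user_model)   # consulted when this user next appears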

An even better mechanism would take each new instance of interaction with this user and examine it to see if it implied any change in the general strategy of interaction being used for this user. The former, what we might call 'batch-type' inductive generalization (called non-incremental generalization in ML), is mechanistically easier but behaviorally less satisfactory; the latter is incremental inductive generalization, which can adjust the generalized information after each new instance of interaction—i.e., it can offer just the functionality that we require.
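
The incremental alternative might be sketched as follows: the generalized model is adjusted after every new interaction rather than being re-induced from the whole accumulated batch. Again, the representation (running value-counts per attribute) is a simple invention for illustration.

    from collections import defaultdict

    class IncrementalUserModel:
        def __init__(self):
            # counts[attribute][value] -> number of times observed
            self.counts = defaultdict(lambda: defaultdict(int))

        def update(self, instance):
            # Fold one new interaction into the generalization.
            for attr, value in instance.items():
                self.counts[attr][value] += 1

        def preference(self, attr):
            # The current best generalization for one attribute; note
            # that it can drift, tracking the user's changing habits.
            values = self.counts[attr]
            return max(values, key=values.get)

    model = IncrementalUserModel()
    model.update({"verbosity": "terse"})
    model.update({"verbosity": "verbose"})
    model.update({"verbosity": "verbose"})
    print(model.preference("verbosity"))   # -> 'verbose'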

What represents an instance, and what are we aiming for with a generalization? Well, an instance is determined by the specific problem being addressed. Thus, in my example of a system customizing its user interface to accommodate a specific user's idiosyncrasies, an instance (or a training example in EBL terminology) is a particular example of an interaction with this user. But, although this may clarify things a little, it does lead us on to a number of salient subsidiary problems: What specific aspects of an actual interaction sequence constitute an instance to learn from? What does the system need to learn from the specific instances? What are the goals of this self-adaptivity? Much of the ML literature seems to assume that the world is neatly divided into 'instances', and moreover that these instances are not only clearly demarcated but are also conveniently labeled to tell us what general behavior they are instances of—reality, even the limited reality of software systems, is just not structured in this handy way. Answers to the above questions can then be viewed as imposing the necessary structure on an unstructured reality.

The first question raises considerations of what to abstract from the individual interactive sessions, i.e. what sort of abstract representation will the inductive generalization algorithm be operating on? Surely we will not want to record minor typing errors that the user makes? Or will we? In trying to answer this question we are led on to a consideration of the others. If one of the goals of the overall self-adaptivity of the interface is to adapt to user deficiencies, then we might well want to record all of the user's slips and mistakes in the hope that the system is able to produce a general response strategy for this particular user that is tolerant of such minor flaws in communication. There are many (very many) different structures that can be abstracted from any given session of actual behavior, and so one of the major initial decisions is which aspects of an interactive exchange will be recorded and taken to comprise an instance of behavior for the inductive generalization process.
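
To see how much hangs on this decision, compare two hypothetical abstractions of the same interactive session, one discarding the user's slips and one recording them. The session, the command names and the reply strings are all invented.

    raw_session = [
        ("copyy f1 f2", "error: unknown command"),   # a typing slip
        ("copy f1 f2", "ok"),
        ("delte f1", "error: unknown command"),      # another slip
        ("delete f1", "ok"),
    ]

    def abstract_minimal(session):
        # Record only the commands that succeeded.
        return [cmd for cmd, reply in session if reply == "ok"]

    def abstract_with_slips(session):
        # Also record the slips, so that later generalization can learn
        # a response strategy tolerant of this user's habitual mistypings.
        return {"commands": abstract_minimal(session),
                "slips": [cmd for cmd, reply in session if reply != "ok"]}

    print(abstract_minimal(raw_session))
    print(abstract_with_slips(raw_session))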

Even earlier issues to be addressed are the goals of the self-adaptivity within the software system. It's all very well to decide that a user-responsive interface to a piece of software is desirable, but then there are many consequent issues of the exact form that responsiveness is to take. And the final decisions here will rest on notions of exactly how and to what purpose the software system is to be used: the self-adaptivity might be seen as a means to train humans to use the underlying system as efficiently and as effectively as possible, or it might be viewed as a tolerant interface that will accept (whenever possible) the efforts of casual and sloppy (by the pedantic standards of most software systems) users and deal with them as best it can. So there are clearly many tricky decisions to be made even when it has been decided that self-adaptivity would be beneficial and that inductive generalization is the route to take.

The other major question broached earlier, but as yet unaddressed, is what form of general information can we hope that our inductive generalization mechanism will produce? Currently, there are just two answers in systems that have found practical application: decision trees and IF-THEN-type rules.

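Here, finally, is a sketch of those two forms as they might look for our interface example. The rules, the tree and their contents are invented for illustration; only the two representations themselves are the point.

    # IF-THEN-type rules, held as data.
    rules = [
        ({"slips": "high"},      "enable tolerant command matching"),
        ({"verbosity": "terse"}, "suppress explanatory messages"),
    ]

    def apply_rules(user_model, rules):
        return [action for condition, action in rules
                if all(user_model.get(k) == v for k, v in condition.items())]

    # The same sort of knowledge as a decision tree: a node is
    # (attribute, {value: subtree-or-leaf}); a leaf is an action.
    tree = ("slips",
            {"high": "enable tolerant command matching",
             "low":  ("verbosity",
                      {"terse":   "suppress explanatory messages",
                       "verbose": "default interface"})})

    def classify(user_model, node):
        if isinstance(node, str):           # reached a leaf
            return node
        attribute, branches = node
        return classify(user_model, branches[user_model[attribute]])

    model = {"slips": "low", "verbosity": "terse"}
    print(apply_rules(model, rules))   # -> ['suppress explanatory messages']
    print(classify(model, tree))       # -> 'suppress explanatory messages'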
