Self-Organizing Multi-Agent Systems for the Control of Complex Systems


Open Archive Toulouse Archive Ouverte (OATAO) is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.

This is an author's version published in: http://oatao.univ-toulouse.fr/19154

To cite this version: Boes, Jérémy and Migeon, Frédéric. Self-Organizing Multi-Agent Systems for the Control of Complex Systems. (2017) Journal of Systems and Software, 134. 12-28. ISSN 0164-1212

DOI: https://doi.org/10.1016/j.jss.2017.08.038

Any correspondence concerning this service should be sent to the repository administrator: tech-oatao@listes-diff.inp-toulouse.fr


Self-organizing multi-agent systems for the control of complex systems

Jérémy Boes, Frédéric Migeon

IRIT, University of Toulouse, 118 route de Narbonne, F-31062 Toulouse Cedex 9, France

Abstract

Because of the law of requisite variety, designing a controller for a complex system implies designing a complex system. In software engineering, usual top-down approaches become inadequate to design such systems. The Adaptive Multi-Agent Systems (AMAS) approach relies on the cooperative self-organization of autonomous micro-level agents to tackle macro-level complexity. This bottom-up approach provides adaptive, scalable, and robust systems. This paper presents a complex system controller that has been designed following this approach, and shows results obtained with the automatic tuning of a real internal combustion engine.

Keywords:

Multi-agent systems, Control, Self-organization, Complex systems, Internal combustion engines

1. Introduction

Controlling a system means being able to perform the adequate modifications on its inputs in order to set the outputs to a desired state. Over the course of history, humans have made tremendous efforts to control systems that are more and more complex: nonlinear, dynamic, noisy, with a large number of inputs and outputs, and so on. Yet, the law of requisite variety (Ashby, 1956) implies that the complexity of a controller has to be greater than or equal to the complexity of the target system. Thus, the design of a controller involves the design of a complex system. This is a challenge for engineering.

Complexity is often tackled a posteriori, to study existing systems. On the contrary, methods enabling the design of complex systems that meet strict requirements are quite rare. The main feature of a complex system is that its behavior cannot be easily predicted (Heylighen, 2008). Usual design methods, for instance in software engineering, seek to eliminate a priori any unexpected event. The design process must ensure that everything will be smooth at runtime. But, like any other complex system, complex programs sometimes have unexpected, unpredictable behaviors, and these classical methods fail.

For instance, in the field of system control, the usual methods in the industry rely on the construction of a fine mathematical model of the target system, which is later used to compute the commands to perform, given some setpoints. The cost and difficulty of the construction (and the tuning) of a mathematical model is high. An often used alternative is machine learning. Giving the ability to learn to a controller enables it to learn the behavior of the target system and build a model from data. However, this method shows its limits when used with complex systems. Nonlinearities in the learnt model lead to overcostly or impossible computations in the control system. Another possibility exists: directly learning the adequate commands, instead of a model that will later lead to the said commands. We then focus only on the inputs and outputs of the controlled system, without trying to decipher its internal mechanisms.

∗ Corresponding author. E-mail addresses: boes@irit.fr (J. Boes), migeon@irit.fr (F. Migeon).

Another difficulty is scalability. While various control methods exist, they (almost) all fail to scale when a large number of inputs and outputs are involved. Most advanced solutions rely on the distribution of the control. Instead of letting a central controller handle all the inputs, each input is controlled by one local controller, and all controllers try to cooperate to control the whole system.

Multi-Agent Systems (MASs), composed of autonomous entities, are naturally distributed. They can be very useful to the problem of the control of complex systems, for instance with multi-objective optimization (Khamis and Gomaa, 2014). Moreover, they bring innovative design methods. In particular, Adaptive Multi-Agent Systems (AMASs) are designed to be able to self-adapt at runtime to any unexpected event. Instead of wasting time trying to cope with any possible event during the design phase, we let the system deal with the unexpected at runtime. Driven by cooperation principles, agents self-organize locally to produce and maintain the desired global function.


This paper presents experimental results obtained with an AMAS designed to control complex systems, and applied to the calibration of real heat engines. This system is fully described in English for the first time in this paper. Able to learn and control simultaneously, it provides a generic and robust solution to the problem of control. It is a good example of the ability of AMASs to be efficient in real-life conditions.

Section 2 gives a quick background on control. Section 3 introduces our approach and Section 4 presents our system. Results, obtained in simulated as well as in real conditions, are shown in Section 5. Section 6 concludes with our perspectives.

2. Related works

Our work is at the crossroads of the fields of complex systems, control, and machine learning. It is inspired by the ideas of Edgar Morin on complexity (Morin, 2008), which we apply here to the design of self-adaptive control systems.

2.1. Complex systems

The notion of complexity reflects the difficulty to analyze a system and to forecast its behavior. Nonlinearities, inner feedback loops, a large number of inputs/outputs/inner parts, uncertainty on the measures, and unpredictable behaviors are some of the recurring features of complex systems. However, there is no common agreement on a definition. For instance, Kolmogorov defines the complexity of a string as the length of the shortest description of said string (Kolmogorov, 1998). While it is largely accepted, this measure implies that a purely random string is of maximal complexity, as it can only be described by its full enumeration. However, this contradicts one of the key features of complexity: it is situated somewhere between total order and total chaos (Heylighen, 2008). Moreover, a complex system is dynamic: it is able to spontaneously change its state. It is important not to neglect this aspect during the analysis or the design of a system. Measures such as Kolmogorov complexity give too much attention to static, structural features of systems, and not enough to their dynamics. To this end, dynamical depth is based on the idea that the degree of complexity of a system is not given by its parts and their causal relations, but by the imbrication of the different dynamics that drive its behavior (Deacon and Koutroufinis, 2014).

Furthermore, the general system theory states that the classical analytical approach can only be applied on systems whose parts are linear and share negligible interactions (Von Bertalanffy, 1968). This lets a lot of systems out of its scope, in particular complex systems. We need to follow a different approach than the reductionist top-down analysis for complex systems control as well as for complex systems design. The Adaptive Multi-Agent Systems theory is being developed in this regard.

2.2. Control

Control approaches also find their limits when faced with complexity. Artificial Intelligence (AI), and in particular machine learning, are used to overcome these limits.

The objective with AI in control is to automatically learn either the model of the target system, the tuning of the model, the calibration of the controller, or directly control laws from observations. For instance, Jesus and Barbosa (2013) use a genetic algorithm to learn the optimal tuning of PIDs. This approach gives excellent results but needs a large number of iterations. Moreover, if the behavior of the controlled system changes over time (for instance, because of mechanical wear), the tuning must be entirely redone: it is not adaptive.

The biggest difficulty of dual control is to find the correct balance between probe actions and control actions. A way to do this is to use neural networks to learn this balance from data (Fabri and Bugeja, 2013). This approach is limited to control-affine systems, i.e. systems that react linearly to modifications on their inputs.

The most promising approach for scaling up, i.e. for controlling a large number of inputs with many criteria on many outputs, is to distribute the control. For instance, Bull et al. (2004) and Choy et al. (2006) control road traffic junction signals on several crossroads. In these approaches, there is no central controller that handles all the traffic junctions: each crossroad is controlled by a local controller. Bull et al. (2004) use learning classifier systems, while Choy et al. (2006) use a combination of neural networks, genetic algorithms and fuzzy logic. They obtained very interesting results, but the difficulty to instantiate their approaches to real-life problems is a severe drawback.

Our approach uses feedback loops to learn not the model of the controlled system but the control laws themselves, and distributes a controller on each controlled input. Inner feedback loops ensure an adequate balance between exploration and exploitation of the model.

2.3. Machine learning

A program learns when it is able to improve its functionality using its experience, i.e. data acquired during its execution (Mitchell, 2006). Machine learning has been heavily influenced by the way we think the human mind works. The two well-known methods for machine learning are supervised learning and unsupervised learning, depending on whether examples of the expected results are presented to the learning program or not. However, this distinction is merely technical and does not highlight the fundamental differences between machine learning algorithms. We prefer the following five categories: Behaviorism, Cognitivism, Connectionism, Evolutionism, and Constructivism.

Behaviorism considers the learner as a black box. Learning occurs when the observed behavior changes in response to the dynamics of the environment. In machine learning, the behavior is then a product of the initial state of the program and its progressive conditioning by its environment through a feedback loop. Reinforcement learning can be considered as a behaviorist machine learning approach. It is notably popular in robotics (Kober et al., 2013). Its most well-known algorithm is Q-Learning (Watkins and Dayan, 1992).

On the contrary, cognitivists consider that what is important is not what the learner does but what it knows. Cognitivist machine learning algorithms classically rely on symbol manipulation, and thus on a predefined set of symbols, which is not adequate when dealing with complexity (Raghavan et al., 2016).

Connectionism considers learning at a lower level in the brain: the dynamic interconnection of neurons. In machine learning, it regroups all the artificial neural network algorithms, from backpropagation perceptrons to the more recent Kohonen maps (Astudillo and Oommen, 2014) and deep learning algorithms (Deng and Yu, 2014). They show impressive results but need a huge amount of data and computing power.

Evolutionism considers learning at the scale of a species rather than an individual. Evolutionary algorithms evolve a population of solutions towards better solutions by evaluating them, mutating them, and crossing the best individuals. These algorithms are interesting because they can tackle problems for which there is no known solution, but they are time-consuming and the fitness function can be difficult to obtain (Bongard, 2013).

Constructivism is the idea that humans have the ability to construct knowledge in their own mind through interactions with the environment. Constructivist artificial intelligence aims at designing self-constructive systems (Thórisson, 2012). In such systems, not only the knowledge but the means to acquire it are learned. The focus is made on self-organization and bottom-up design methods.

Note that there are no hard boundaries between these categories. Most advanced machine learning algorithms actually draw simultaneously from several of them. For instance, Learning Classifier Systems stem from Behaviorism since they are reinforcement learning algorithms, but they also incorporate an evolutionary component (Urbanowicz and Moore, 2009).

The Adaptive Multi-Agent Systems approach is constructivist: it focuses on self-organization and shares the same long-term goal of designing a fully self-constructed artificial intelligence. It also has a link with connectionism, with the idea that a complex task can be achieved by a set of several simple entities.

3. Approach

Top-down classical methods have severe shortcomings when it comes to complexity: scale, integration, and flexibility (Thórisson, 2012). This section presents the Adaptive Multi-Agent Systems (AMAS) theory, which aims at overcoming these limitations thanks to the natural modularity of MASs and the cooperative self-organization of agents.

3.1. Adaptive multi-agent systems

Wooldridge defines an agent as follows: "An agent is a computer system that is situated in some environment, and that is capable of autonomous action in this environment in order to meet its delegated objectives" (Wooldridge, 2009). "Autonomous action" means an agent takes its own decision on what to do and when to do it. It indefinitely follows a lifecycle of perception, decision and action without any external control.

A system composed of several agents in interaction in the same environment is called a Multi-Agent System (MAS) (Ferber, 1999). Knowledge, computation, and control are distributed among the agents of a MAS. Such systems are based on collective problem solving, the idea that local behaviors within a group can ensure the achievement of a given global task. Multi-agent systems provide interesting features when dealing with complexity, such as scalability, robustness and adaptivity (Ren and Cao, 2013).

The function of a MAS is dependent on its organization (the agents, their relations, their behavior). A change in the organization of the MAS is a change of its global function. When agents decide themselves to dynamically change their behavior or their relations, the system is self-organizing. Di Marzo Serugendo et al. define self-organization as the process with which a system changes its structure without any external control to respond to changes in its operating conditions and its environment (Di Marzo Serugendo et al., 2011). It is very natural and powerful for a MAS to perform learning and self-adaptation this way.

The Adaptive Multi-Agent Systems approach aims at facilitating the design of multi-agent systems for solving complex problems by designing simple agents that self-organize to generate a complex global function (Georgé et al., 2011). In this approach, the process of self-organization is driven by cooperation principles. Local decisions from each agent may provoke local changes that in turn lead to changes in the global function of the system.

This approach is based on the theorem of functional adequacy (Georgé et al., 2011). Applied to MASs, one of the consequences of this theorem is the assurance that the global function of a system is adequate if all agents maintain interactions with their environment that are favorable to themselves and to their environment (they are said to be in a cooperative state). Then, the challenge is to find the behavior for each agent that enables each of them to remain in a cooperative state despite changes in their environment.

To this end, each agent has two sets of rules. Nominal rules enable an agent to achieve its function when it is already in a cooperative state. However, it is highly probable that the agent eventually finds itself unable to achieve its function, due to changes in its environment, or to a simple lack of knowledge. Such cases are called Non-Cooperative Situations (NCSs) and are probable causes of failure for the global system to achieve its task. There are seven types of NCSs:

• Incomprehension: the agent is not able to extract information from the perceived signal.

• Ambiguity: the agent can interpret the perceived signal in several different manners.

• Incompetence: the agent is not able to decide anything based on its current knowledge and skills.

• Unproductiveness: the decision of an agent is to do nothing.

• Concurrence: the agent thinks its action will have the same effects as the action of another agent.

• Conflict: the agent thinks its action is discordant regarding the action of another agent.

• Uselessness: the agent thinks that its action will have no consequences on its environment.

When a NCS occurs, the involved agents switch from their nominal behavior rules to their cooperative rules, which seek to solve the NCS by provoking changes in the MAS (in other words, by triggering self-organization). An agent has several means to solve a NCS: tuning internal parameters, reorganizing its relations with other agents, creating a new agent, or self-destructing.

In the current state of the approach, the AMAS designer has to design the cooperative behavior for each NCS. A methodology named ADELFE (French acronym for Toolkit for Developing Software with Emergent Functionalities) guides the design of AMASs (Bonjean et al., 2014). It is a bottom-up and iterative design process that encourages the designer to focus on the local function of each agent, and to forget the global function of the system. A strong focus is put on decomposing the problem instead of the solution. The resulting agents will often be following simple (yet intricate) reactive behavioral rules, and thus will seem too simple to solve anything. It is the point of our approach: dodging complexity by thinking exclusively within a local scope. If agents behave according to the AMAS principles of cooperation, the emerging global function shall be adequate. Originally based on the Rational Unified Process (Kruchten, 2004), ADELFE incorporates specific steps and guidelines to help identify the entities of the problem and which ones should become agents, and find their NCSs and their cooperative behaviors. It has been used for the development of the system presented in Section 4.

3.2. Objectives in terms of control

Other than pushing forward the experimental verification of the AMAS approach, the main objective of this work is to design a system able to learn in real time how to put a complex system in a desired state. In our case, the controlled system may have multiple inputs and outputs (MIMO), and the desired state is described as a combination of criteria. A criterion may affect one or several inputs or outputs. There are three types of criteria:

• Constraint: a threshold to meet

• Setpoint: a target value

• Optimization: a value to minimize or to maximize

An additional requirement is that the controller must be easy to implement for real-life complex systems. In particular, this means the controller should not need a heavy tuning and should not require any predefined model. In other words, prerequisite knowledge on the controlled system has to be minimal.


Fig. 1. A view of all the agents of ESCHER.

Moreover, the learning process has to be perpetual and in real time. It has to occur simultaneously with the control, so the controller adapts itself to changes in the controlled system (such as failures, wear, etc.). Our controller sees the controlled system as a black box: it only has access to the inputs and the outputs of the black box, not to the internal processes that drive its behavior.

4. ESCHER, an adaptive MAS to learn the control of complex systems

In this section we present a multi-agent system called ESCHER, for Emergent Self-adaptive Controller for Heat Engine calibRation. Thanks to cooperative self-organization, it is able to learn in real time the control of a system. It has been designed and tested during a project revolving around automotive thermal engines, but has been designed under the assumption that nothing is known about the controlled system, except its number of inputs and outputs.

The goal is to make the controller generic enough to be used on any other system without any modifications other than the interface. Following a "black box" approach of the control, ESCHER plays with the inputs of the controlled system, observes the effects on the outputs and infers the actions that will lead to compliance with the user-defined criteria.

4.1. System overview

The environment of ESCHER is composed of the controlled system and of the user-defined criteria. This means that ESCHER observes the inputs and outputs of the controlled system, and also the control criteria defined by the users. Among the inputs of the controlled system, there may be some that are not controlled by ESCHER but have an impact on the controlled system. For instance, the atmospheric pressure cannot be controlled but can significantly alter the output of a thermal engine. If such a sensor is available, it can be taken into account by ESCHER.

ESCHER itself is composed of four types of agents:

• Variable Agents are the eyes of the system; there is one Variable Agent for each input and output of the controlled system.

• Criterion Agents represent user-defined criteria, the desired state of the controlled system.

• Context Agents can be seen as the memory of the system; they represent a part of the state space of the environment for which the consequences of a given action are known.

• Controller Agents are the hands of the system; they interact with a set of Context Agents to find the most adequate action to perform in the environment.

Fig. 1 shows an overview of the system, with the links between the four types of agents. Note that this view is intended for the reader; agents do not have a global view of the system.
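As a reading aid, the four roles above can be sketched as plain data structures. The class names follow the paper's vocabulary, but every field and method below is an illustrative assumption, not the actual ESCHER implementation.

```python
from dataclasses import dataclass, field

@dataclass
class VariableAgent:
    """One per input/output of the controlled system: perceives a value and broadcasts it."""
    name: str
    value: float = 0.0

@dataclass
class CriterionAgent:
    """Turns a variable value into a critical level in [0, 100] (setpoint-style here)."""
    variable: str
    target: float
    def critical_level(self, value: float) -> float:
        return min(100.0, abs(value - self.target))

@dataclass
class ContextAgent:
    """Memorizes an action, its forecasted critical levels, and the validity ranges
    (one (low, high) interval per variable) in which the forecasts are relevant."""
    action: float
    forecasts: dict = field(default_factory=dict)
    validity: dict = field(default_factory=dict)
    def is_valid(self, values: dict) -> bool:
        return all(lo <= values[v] <= hi for v, (lo, hi) in self.validity.items())

@dataclass
class ControllerAgent:
    """One per controlled input: selects among the suggestions of its Context Agents."""
    input_name: str
    contexts: list = field(default_factory=list)
```

The link structure of Fig. 1 is reflected in the fields: a Controller Agent only knows its own Context Agents, and the only shared information is the broadcast variable values and critical levels.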

4.1.1. Context agents and controller agents

Each Controller Agent is coupled with a set of Context Agents whose memorized action is related to the effector associated to this same Controller Agent. The Controller Agent selects the next action to perform among the received suggestions and notifies the Context Agents which have sent a suggestion. There is no direct interaction between Context Agents, nor between Controller Agents. The only link between them is through the environment: the action of a Controller Agent will have an impact on the controlled system, which will be perceived by other Controller Agents through Variable Agents and Criterion Agents.

A Controller Agent and its set of Context Agents can be seen as an autonomous MAS. Its environment would be made of Variable Agents and Criterion Agents. A Context-Controller "sub-MAS" is able to synchronize its actions with the other sub-MASs by observing the controlled system's inputs and outputs variations. A Controller Agent does its best to decrease the critical levels by performing actions on only one input, locally, without caring about how the other inputs are handled. There is no global decision process to find the adequate actions on each input at once. This feature is the key to scalability. Moreover, the distribution of control makes ESCHER modular. The addition or the removal of a Controller Agent does not impact the others.

4.1.2. Variable agents and criterion agents

To fulfill its function, each agent besides Variable Agents needs to know the current state of the controlled system. This is why Variable Agents send value updates to every other type of agent (the relevant Criterion Agents, every Context Agent, every Controller Agent). This broadcast may seem harmful for scalability, but it is not. Indeed, agents of ESCHER are not physically distributed, so the cost of message sending is very low. On the contrary, the cost of reading the value of a physical sensor is high, since it involves external systems, and probably networking. Hence, it is far more efficient to have an agent per sensor, broadcasting its value to others, than to give access to a sensor to every agent needing this particular value.

Fig. 2. Examples of criticality functions.

Criterion Agents transform the variable values into critical levels, representing the satisfaction of the criteria (i.e. an idea of how far the current state of the controlled system is from the desired state). Variable Agents and Criterion Agents give ESCHER a complete representation of its environment.

At a given moment, if every agent in the system is able to properly perform its function, then ESCHER is in a cooperative state and its global function is adequate. However, numerous cases exist where at least one of the agents is unable to execute its function. These cases are the Non-Cooperative Situations, which are presented in Section 4.3.

4.2. Function and nominal behavior of ESCHER agents

This section presents a decomposition of the activity of control in elementary tasks. Agents in charge of these tasks are detailed.

4.2.1. Observing the controlled system

The first thing we need when it comes to controlling a system from a "black box" standpoint is to be able to observe it. A specific type of agent is in charge of the perception of the controlled system: Variable Agents. To each input and output of the system is associated a Variable Agent. During its lifecycle, a Variable Agent perceives the value of its designated variable on the controlled system and forwards it to the other agents which may need this information. If necessary, a Variable Agent may embed a noise filtering algorithm.

4.2.2. Representing control criteria

The controller needs to have an internal representation of the objectives of the user, of the desired state for the controlled system. Giving such a representation is the function of Criterion Agents. There are three types of Criterion Agents:

• Threshold: the agent expresses the will to maintain a variable either above or below a user-defined threshold.

• Setpoint: the agent expresses the will to set a variable to a user-defined value.

• Optimization: the agent expresses the will to minimize or to maximize the value of a variable.

Each Criterion Agent receives updates from the relevant Variable Agents, computes a critical level, and sends it to other agents which may need this information. This critical level reflects the satisfaction of the criterion represented by the agent. The critical level ranges from zero (the criterion is fully satisfied) to 100 (the criterion is far from being satisfied).

Fig. 2 shows examples of criticality functions used by Criterion Agents to compute their critical level. For instance, the threshold criticality function returns zero if the threshold is met, otherwise a value up to 100. The criticality function for a setpoint returns zero only when the target value has been reached. The criticality function of an optimization criterion is asymptotic to zero. The curves of these functions can be adjusted by the user to define the relative significance of their needs.
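The three shapes can be sketched as follows. The exact curves in ESCHER are user-adjustable, so the linear threshold/setpoint forms and the logistic optimization form below are illustrative assumptions, not the authors' formulas.

```python
import math

def threshold_criticality(value, threshold, above=True, scale=1.0):
    """Zero when the threshold is met, otherwise a value growing up to 100."""
    violation = (threshold - value) if above else (value - threshold)
    return max(0.0, min(100.0, violation * scale))

def setpoint_criticality(value, target, scale=1.0):
    """Zero only when the target value is reached."""
    return min(100.0, abs(value - target) * scale)

def optimization_criticality(value, minimize=True, scale=1.0):
    """Asymptotic to zero: strictly positive for any finite value,
    so an optimization criterion is never fully satisfied."""
    x = value if minimize else -value
    return 100.0 / (1.0 + math.exp(-x * scale))
```

The `scale` parameters play the role of the user adjustment mentioned above: steeper curves give a criterion more relative weight against the others.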

Criterion Agents apply a transformation from the space of the controlled system variables to the space of the criteria. The critical levels decrease when their criterion is being satisfied. Hence, agents perceiving critical levels seek to decrease them. The only way to do so is to perform adequate actions on the inputs of the controlled system. Finding these adequate actions requires the analysis of the current state of variables and criteria to try to understand the dynamics of the system.

4.2.3. Analyzing the state of the environment

With the Variable Agents and the Criterion Agents, ESCHER has an internal distributed representation of its environment. To be able to decide which actions to perform, an analysis of this environment is needed. This is the function of Context Agents.

A Context Agent memorizes the effect, on each critical level, of a particular action performed on a particular effector. The agent also memorizes the state of the environment when the action is performed. This provides information about the expected consequences of a particular action if the action is performed while the environment is in a particular state.

Concretely, a Context Agent is composed of:

• an action, i.e. an offset to be performed on an input of the controlled system,

• a set of forecasts, which contains a value for each Criterion Agent, representing the expected variations of critical level,

• a set of validity ranges, which contains a value range for each Variable Agent, representing the state of the controlled system.

A Context Agent receives value updates from Variable Agents, and critical level updates from Criterion Agents. When the current value of each Variable Agent is inside its corresponding validity range, the Context Agent is said to be valid. This means the controlled system is in a state in which the forecasts of the agent are relevant. When a Context Agent becomes valid, it sends a notification which contains its action and its forecasts. This notification is actually an action suggestion. Let p be a suggestion, given by Eq. (1):

p := (a, F)    (1)

where a is an action and F is a set of critical level forecasting functions. Thus, a function f_i ∈ F returns the critical level of Criterion Agent i forecasted by the Context Agent if a is performed.


Such a function can be expressed as Eq. (2):

f_i(a) = c_i + δ_i(a)    (2)

where c_i is the current critical level of Criterion Agent i, and δ_i is a function resulting from the learning of the Context Agent. In practice, a Context Agent sends an action suggestion together with a set of values f_i(a), not a set of computable functions f_i. We only show expression (2) to make explicit a part of the learning of Context Agents, which will be discussed later.
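A valid Context Agent's suggestion can be sketched from Eqs. (1) and (2). The paper only states that δ_i results from the agent's learning; approximating it by a learned per-criterion slope is an assumption made here for illustration.

```python
def make_suggestion(action, current_levels, learned_slopes):
    """Build the suggestion p = (a, F) of Eq. (1), where each forecast applies
    Eq. (2): f_i(a) = c_i + delta_i(a). Here delta_i(a) is a linear term
    slope_i * a (illustrative assumption), clamped to the [0, 100] range."""
    forecasts = {
        i: max(0.0, min(100.0, c + learned_slopes[i] * action))
        for i, c in current_levels.items()
    }
    return action, forecasts
```

For instance, with a current critical level of 40 for one criterion and a learned slope of −5, an action of +1.0 yields a forecasted critical level of 35: this set of values f_i(a), not the functions themselves, is what the Context Agent sends.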

A notification is also sent when the Context Agent becomes non-valid, so its suggestion is withdrawn. These suggestions and notifications are received by the Controller Agent in charge of the affected effector. This new type of agent is presented in the next paragraphs.

4.2.4. Performing the most adequate action

A Controller Agent is associated to each input controlled by ESCHER. The function of a Controller Agent is to perform the most adequate action on this input, i.e. the action which will provoke the greatest decrease of critical level. An action may be increasing, decreasing, or maintaining the value of the input.

Let u_t be the current value of the input controlled by the Controller Agent, and a_t the action performed by the Controller Agent at its lifecycle t. The next value of the input is given by Eq. (3):

u_{t+1} = u_t + a_t    (3)

At each lifecycle t, the Controller Agent chooses a_t according to its internal representations, which are composed of:

• C_t, the set of critical levels, updated at lifecycle t;

• P_t, the set of action suggestions from valid Context Agents at lifecycle t.

Among P_t (the received suggestions), the Controller Agent chooses the action associated with the greatest decrease of the highest critical level. If the highest critical level is not expected to vary, according to the forecasts, then the Controller Agent seeks to decrease the second highest critical level, and so on.

Hence, for each suggestion p^k_t ∈ P_t, the Controller Agent looks at f^k_max ∈ F^k_t, the function which returns the highest critical level (in other words, the function corresponding to the most critical Criterion Agent). This function is defined by Eq. (4):

f^k_max := f^k_t ∈ F^k_t such that f^k_t(a^k_t) = max_{f ∈ F^k_t} f(a^k_t)    (4)

The chosen a_t is the action from the suggestion with the lowest f_max(a), while being lower than the current highest critical level (Eq. (5)):

a_t := a_i ∈ A_t such that f^i_max(a_i) = min_k (f^k_max(a_k)) and f^i_max(a_i) ≤ max C_t    (5)

where A_t is the set of actions contained in the suggestions from P_t.
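The decision of Eqs. (4) and (5) can be sketched directly: each suggestion carries its forecasted critical levels, the maximum of which plays the role of f^k_max(a_k). The representation of a suggestion as an (action, forecasts) pair is an assumption for illustration.

```python
def select_action(suggestions, critical_levels):
    """Eqs. (4)-(5): among the suggestions (pairs (action, forecasts)), pick the
    action whose forecasted highest critical level is lowest, provided it does
    not exceed the current highest critical level. Returns None when no
    suggestion qualifies, i.e. when condition (6) fails (unproductiveness)."""
    current_max = max(critical_levels.values())
    best_action, best_fmax = None, None
    for action, forecasts in suggestions:
        f_max = max(forecasts.values())  # Eq. (4): most critical forecast
        if f_max <= current_max and (best_fmax is None or f_max < best_fmax):
            best_action, best_fmax = action, f_max
    return best_action
```

Note that the sketch returns None instead of an action when every forecast exceeds the current highest critical level; in ESCHER this case triggers the cooperative behavior of Section 4.3 rather than a silent fallback.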

The Controller Agent then performs the action a_t and sends:

• an acceptance notification to the currently valid Context Agents whose action has been selected and performed,

• a rejection notification to the currently valid Context Agents whose action has not been selected,

• in case the current action is different from the action of the previous step, a waiver notification to the Context Agents which suggested the previous action.

Of course, at any given time, a Controller Agent may not be able to make a good decision (i.e. a decision that will lead to the decrease of critical levels), because of false or incomplete information. These cases are Non-Cooperative Situations (NCSs). They occur when ESCHER has not sufficiently learned and is not fully adapted to its environment. For instance, if condition (6) is not met, then Eq. (5) cannot be applied:

∃ p^i_t ∈ P_t, f^i_max ∈ F^i_t, f^i_max(a_i) ≤ max C_t    (6)

The occurrence of a NCS triggers a specific behavior (the cooperative behavior) of the involved agents to solve it and set the agents in a cooperative state. Solving NCSs drives the whole system towards a state of functional adequacy. NCSs and their resolution are presented in Section 4.3.

4.3. Non-cooperative situations

This section explains how agents detect and solve NCSs. Since they provoke changes in the organization of the system, NCSs and their resolution are the key to the self-adaptiveness of AMASs. Each agent locally solves the NCSs it detects, thanks to specific actions. In ESCHER, NCSs mainly occur for Context Agents and Controller Agents. They motivate the system to self-organize, in particular by creating, modifying, or deleting Context Agents.

4.3.1. NCS 1: Controller Agent incompetence

Detection: When a Controller Agent does not receive any action suggestion, P_t = ∅, hence A_t = ∅. In this situation, the agent is not able to choose an adequate action using Eq. (5): it finds itself in a NCS of incompetence.

Resolution: The resolution of this NCS has two steps. First, the Controller Agent has to choose an action on its own. Its choice is based on the effects of its previous action. If the critical levels are increasing, the new action is chosen as the opposite of the previous action, otherwise the previous action is repeated (Eq. (7)):

a_t := a_{t−1} if max C_t < max C_{t−1}, −a_{t−1} otherwise    (7)

If t = 0, then the new action is randomly chosen.

If the previous action had been selected from P_{t−1} and is continued, the Controller Agent does not send a waiver notification to the Context Agents that had suggested it at t−1, even if they are now non-valid. They may need this information to learn (see NCS 6).

Otherwise, after having determined its new action, but before performing it, the Controller Agent creates a new Context Agent. This new Context Agent is initialized with the new action, and memorizes the current value of all variables. While the highest critical level decreases, the Controller Agent continues the same action. During this time, the new Context Agent observes the variations of all critical levels to set its forecasts. Finally, when the action is abandoned, the Context Agent sets its validity ranges with the minimum and maximum observed on each variable.
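The keep-or-reverse heuristic of Eq. (7) can be sketched as follows, assuming (as we do throughout these sketches, not as a statement about the authors' Java implementation) that actions are signed increments on the controlled input, with −a the opposite of a:

```python
import random

def ncs1_action(prev_action, max_crit_now, max_crit_prev):
    """NCS 1 resolution (sketch): with no suggestion available, the
    Controller Agent repeats its previous action while the highest
    critical level decreases, and reverses it otherwise (Eq. (7))."""
    if prev_action is None:          # t = 0: no history yet
        return random.choice([-1.0, 1.0])
    if max_crit_now < max_crit_prev:
        return prev_action           # it worked: keep going
    return -prev_action              # it failed: try the opposite
```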

4.3.2. NCS 2: Controller Agent unproductiveness

Detection: When none of the received action suggestions contains forecasts of a decrease of the highest critical level (condition (6) is not met), the Controller Agent is in a NCS of unproductiveness. Its nominal decision process (select the action associated to the biggest decrease of the highest critical level) does not produce any action. There are two ways of solving this NCS, depending on the received suggestions. Let A be the set of all possible actions for the Controller Agent; at each time step t we have A_t ⊆ A.

Resolution 1: If A_t = A, in other words if every type of action (increment, decrement, stay) has been suggested, the Controller Agent thinks that the highest critical level cannot be decreased, whatever the agent may do. Then, the agent attempts to decrease the second highest critical level (without increasing the highest critical level). If that is not possible, it will look at the third highest critical level, and so on. If there is no forecasted decrease at all, the agent chooses the least harm: the action associated with the smallest increase of the highest critical level is chosen (Eq. (8)).

a_t := a_i ∈ A_t such that f_max^i(a_i) = min_k (f_max^k(a_k))    (8)

Resolution 2: The second case is when A_t ≠ ∅ ∧ A_t ≠ A. It means that some actions have not been suggested: they have not been tested in the current state of the environment. Since none of the received action suggestions contains forecasts of a decrease of the highest critical level, they actually contain actions to avoid. Let A_c = A − A_t be the set of candidate actions, i.e. actions that are not currently suggested. The Controller Agent then decides to select an action among these candidate actions. The selection of the new action is similar to the resolution of NCS 1 but is, this time, conditioned by the presence of this action in A_c (Eq. (9)).

a_t := { a_{t−1}      if a_{t−1} ∈ A_c ∧ max C_t < max C_{t−1}
       { −a_{t−1}     if −a_{t−1} ∈ A_c ∧ max C_t ≥ max C_{t−1}
       { rand(A_c)    otherwise                                  (9)
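Eq. (9) can be sketched in the same illustrative style as before (signed-increment actions, hypothetical names, not the authors' implementation):

```python
import random

def ncs2_candidate_action(all_actions, suggested, prev_action,
                          max_crit_now, max_crit_prev):
    """NCS 2, resolution 2 (sketch): when some actions were never
    suggested in the current state, pick one of these candidates,
    reusing the keep/reverse heuristic of NCS 1 (Eq. (9)).
    Precondition: A_t != A, so the candidate set is non-empty."""
    candidates = set(all_actions) - set(suggested)   # Ac = A - At
    if prev_action in candidates and max_crit_now < max_crit_prev:
        return prev_action
    if -prev_action in candidates and max_crit_now >= max_crit_prev:
        return -prev_action
    return random.choice(sorted(candidates))         # rand(Ac)
```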

With the same conditions as in NCS 1, the Controller Agent may create a new Context Agent, initialized in the same manner.

4.3.3. NCS 3: Controller Agent conflict

Detection: When a Controller Agent applies an action suggested by a Context Agent, it expects the critical levels to vary in the way indicated by the forecasts. If the Controller Agent notices that this is not the case, it thinks that the action that has just been performed may be harmful: it is a conflict NCS.

Resolution: The action must be stopped. The Controller Agent abandons the action and notifies the Context Agents which had suggested it when it was selected. Moreover, if the Context Agents which were wrong are still valid, they will be temporarily ignored in future steps.

4.3.4. NCS 4: Context Agent conflict (false forecasts)

Detection: When the action of a valid Context Agent is being performed, said agent observes the variations of critical levels. When the action is terminated, the agent compares the observed variations with its forecasts. There is a conflict NCS if at least one of the observed variations contradicts the forecast (their directions of variation differ).

Resolution: An error in the direction of variation of a forecast is probably more than a simple mistake in the initial observation; it is not a problem of forecast adjustment. This rather indicates that the Context Agent should not have sent its suggestion: it should not have been valid. To correct this situation, the Context Agent will reduce its validity ranges, bringing the nearest bound closer to the current value of the corresponding variable.

4.3.5. NCS 5: Context Agent conflict (inaccurate forecasts)

Detection: This NCS is similar to NCS 4. But this time, the observed variations are in the same direction as the forecasts, yet not of the same amount. This kind of observation is sensitive to noise on the perception of variable values, hence small differences (under 5% of criticality) are ignored.

Resolution: An error in the amplitude of variation is less serious than an error in the direction of variation. The Context Agent only needs to adjust its forecast. Thus, in this case, the agent does not change its validity ranges, but rather increases or decreases the erroneous forecasts so they fit its observations.

4.3.6. NCS 6: Context Agent incompetence

Detection: It happens that a Context Agent whose action is being performed becomes non-valid, but does not receive any rejection nor waiver notification from the Controller Agent (this is a possible outcome of NCS 1). The Context Agent is then in an incompetence NCS: this situation is not covered by its nominal behavior.

Resolution: From its standpoint, this situation means that the Controller Agent considered that its action can be kept a little longer. Hence, to keep sending what could be a good suggestion, the Context Agent extends the validity ranges that make it non-valid.

4.3.7. NCS 7: Context Agent uselessness

Detection: Sometimes, after several NCS 4, some validity ranges of a Context Agent have been shrunk so much that their amplitude is near zero. If the amplitude of at least one validity range falls under the threshold of minimal size, the Context Agent is in a uselessness NCS: its chances of being valid are too low. By default, the threshold is equal to one hundredth of the domain of the variable. This NCS is ignored for unbounded variables.

Resolution: A useless Context Agent can do nothing else than delete itself to solve this situation. Indeed, a Context Agent can only learn if its action is selected while valid. If the agent is never valid, it never brings information to the system and never learns. By deleting itself, the agent frees computation resources. This NCS is not pivotal for ESCHER: the presence of useless agents does not prevent the adaptation and functional adequacy of the whole system. But this NCS avoids an overabundance of Context Agents. To avoid too many deletions and a loss of memory, we advise to keep this threshold small.

4.3.8. NCS 8: Context Agent unproductiveness (validity ranges)

Detection: This NCS concerns a Context Agent which has been valid, selected, then became non-valid, and observed a decrease of critical levels. This is an ideal case: everything went fine. This is why a Context Agent in this situation considers that its action may still be relevant, even if the agent itself is now non-valid. This is an unproductiveness NCS: the nominal decision process results in doing nothing (since the agent is not valid), while there are good chances that sending an action suggestion would be a good thing to do.

Resolution: The Context Agent expands the validity ranges that make it non-valid, so it is now valid. The agent also sends an action suggestion. If the agent was wrong to send a suggestion, a NCS 4 will occur and the ranges will be shrunk. As with NCS 7, this situation is not crucial for the system, but it enables a finer adaptation for a limited risk.
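The range adjustments used by NCSs 4, 6 and 8 can be sketched as follows. In the actual system these bounds are tuned by Adaptive Value Trackers (Section 4.4); here a fixed, hypothetical `factor` stands in for that mechanism:

```python
def shrink_range(low, high, value, factor=0.5):
    """NCS 4 resolution (sketch): bring the bound nearest to the
    current value of the variable closer to it, so the agent stops
    being valid in states where its forecasts proved wrong."""
    if abs(value - low) <= abs(high - value):
        low = low + factor * (value - low)
    else:
        high = high - factor * (high - value)
    return low, high

def expand_range(low, high, value):
    """NCS 6/8 resolution (sketch): extend the range just enough to
    contain the current value, so the agent becomes valid again."""
    return min(low, value), max(high, value)
```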

4.3.9. NCS 9: Context Agent unproductiveness (suggested action)

Detection: A Context Agent whose action has been selected several times in a row considers itself in an unproductiveness NCS. Indeed, the agent thinks that in the ideal case its action would provoke a greater decrease of critical level, so it would only have to be performed once. The Context Agent hence seeks to adjust the amplitude of the suggested action, in a way to maximize the decrease (or minimize the increase) of critical levels.

Resolution: The adjustment of the amplitude of the action is based on the estimation of the effects of the variation of the amplitude on the variation of critical levels. The idea is to increase or decrease the amplitude of the action in a way to accelerate the decrease (or slow down the increase) of critical levels. To this end, a Context Agent which has been selected several times in a row slightly and randomly changes the amplitude of the suggested action and correlates this variation with the speed variation of critical levels. Hence, if the highest critical level is decreasing:

• quicker while the amplitude has been increased: the Context Agent keeps increasing the amplitude;
• quicker while the amplitude has been decreased: the Context Agent keeps decreasing the amplitude;
• slower while the amplitude has been increased: the Context Agent decreases the amplitude;
• slower while the amplitude has been decreased: the Context Agent increases the amplitude.

Fig. 3. Typical convergence of an adaptive value tracker.

The Context Agent does the exact opposite if the highest critical level is increasing, although this rarely happens since it is not frequent that an action is continued if it has provoked a rise of the highest critical level. Note that a maximal amplitude can be set in order to avoid overly brutal actions.
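The four cases above reduce to one rule: keep the last direction of amplitude change if the decrease accelerated, reverse it otherwise. A sketch for the decreasing case, with illustrative names and defaults (in the actual system the amplitude is tuned by an AVT, Section 4.4):

```python
def adapt_amplitude(amplitude, last_delta, speed_now, speed_prev,
                    step=0.1, max_amplitude=10.0):
    """NCS 9 resolution (sketch): correlate the last change of the
    action amplitude with the change in decrease speed of the highest
    critical level. Returns (new amplitude, applied delta)."""
    direction = 1.0 if last_delta >= 0 else -1.0
    if speed_now < speed_prev:       # decrease slowed down: reverse
        direction = -direction
    delta = direction * step
    # clamp to avoid overly brutal actions
    new_amplitude = max(-max_amplitude, min(max_amplitude, amplitude + delta))
    return new_amplitude, delta
```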

4.3.10. Conclusion on non-cooperative situations

This section has presented the NCSs encountered by the agents of ESCHER. In particular, the resolution of these situations provokes the creation, the deletion, and the modification of Context Agents, which are the memory of the system. In other words, NCSs provoke the memorizing, the forgetting, and the correction of knowledge based on observations of the real system: their resolution enables ESCHER to learn and self-adapt.

NCSs 1 and 2 correspond to the acquisition of new information. They occur when ESCHER is discovering a new part of the state space of its environment. They open the system as they add new Context Agents.

NCS 3 enables ESCHER not to persist in error. It is partially solved thanks to the reorganization of the relations between a Controller Agent and some of its Context Agents. Indeed, the Controller Agent ignores some of the Context Agents if they have been wrong.

Context Agents always self-evaluate. Hence, NCSs 4 to 9 are detected if one of their parts is no longer adapted to the environment. They are solved by the adjustment of the agents (except for NCS 7, which is solved thanks to openness). Hence, ESCHER is always self-evaluating and self-adapting.

4.4. Learning and adjustment

A large part of the learning of the system relies on the tuning of Context Agents' internal parameters during the resolution of a NCS. All these parameters are tuned thanks to Adaptive Value Trackers (AVT, (Lemouzy et al., 2011)). These parameters are: the boundaries of the validity ranges, the amplitude of the suggested action, and the values of the forecasts.

An AVT converges towards a value thanks to binary feedbacks: lower if the real value is lower, or greater if the real value is greater. Both the value and the variation step of the tracker are dynamically tuned. The variation step is increased when two consecutive feedbacks are equal, and decreased otherwise. These variations follow user-defined coefficients. Fig. 3 shows an example of the variation of the value of an AVT with standard settings (two equal consecutive feedbacks double the variation step, two different consecutive feedbacks divide the variation step by three). A plus sign means the AVT received a greater feedback, a minus sign means it received a lower feedback.

A Context Agent transforms its observations and received notifications into feedbacks for its numerous AVTs. For instance, a Context Agent in NCS 5 observing a greater variation of critical levels than what its forecast indicates will send a greater feedback to the corresponding AVT. The tracker then increases its value. Of course, the new value of the forecast may not be equal to the observation. But given the dynamics of the environment and the inevitable noise on real sensors, perfectly fitting the observations is not desirable.

AVTs quickly converge toward a value, are able to stabilize, and to move again quickly toward a new, further value. They match our needs, as the parameters of agents often have to change, often drastically.
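The AVT mechanism with the standard settings above (step doubled on two identical consecutive feedbacks, divided by three on two different ones) can be sketched as follows; class and attribute names are ours, not those of the reference implementation of Lemouzy et al. (2011):

```python
class AdaptiveValueTracker:
    """Sketch of an AVT. Feedback is +1 ("the real value is greater")
    or -1 ("the real value is lower")."""

    def __init__(self, value=0.0, step=1.0, min_step=1e-6):
        self.value = value
        self.step = step
        self.min_step = min_step
        self.last_feedback = None

    def update(self, feedback):
        if self.last_feedback is not None:
            if feedback == self.last_feedback:
                self.step *= 2.0                  # same direction: accelerate
            else:
                self.step = max(self.min_step,
                                self.step / 3.0)  # overshoot: slow down
        self.value += feedback * self.step
        self.last_feedback = feedback
        return self.value
```

With these dynamics the tracker overshoots the target, then oscillates around it with a geometrically shrinking step, which matches the convergence pattern shown in Fig. 3.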

4.5. Comparison with existing approaches

ESCHER has been presented as a control system because it has been designed to control. Nevertheless, learning plays a crucial role in this system. This section explores these two complementary sides of our system and their links, through comparisons with the Dual Control Theory and with Learning Classifier Systems.

4.5.1. Comparison with dual control
In the Dual Control Theory, the controlled system is partially known. The controller applies either probe actions to learn and refine its model of the controlled system, or control actions to put the controlled system in the desired state (Feldbaum, 1961). Too many probes hamper the control, but too many control actions yield only a small gain. Finding the balance between probe actions and control actions requires solving the difficult Bellman equation, which is not easily feasible in real cases.

Like dual controllers, ESCHER faces unknown systems and learns from its actions. However, it learns from all of its actions, and all of its actions seek to put the controlled system in the desired state. All of its actions are probes and control actions at the same time. Moreover, unlike dual controllers, ESCHER does not need a predefined model that is later adjusted by learning.

The need to lower the critical levels (even when no agent indicates how to do it), combined with the fact that ESCHER learns from each of its actions, can be seen as an approach to solve the problem of balance between probes and control actions. The control process drives the learning process towards interesting states of the environment, while getting closer to the desired state (and thus preventing straying away and visiting uninteresting distant states).

4.5.2. Comparison with learning classifier systems
Learning Classifier Systems (LCSs) are reinforcement learning systems (Urbanowicz and Moore, 2009). They are composed of a set of behavior rules, a pairing system which matches states of the environment with rule conditions, a selection mechanism between simultaneously triggered rules, and a genetic algorithm to tune the set of rules.

There are several similarities between a LCS and a Controller Agent coupled with its set of Context Agents. Context Agents play the same role as the pairing system (with their validity ranges) and the set of rules (each Context Agent can be seen as a behavior rule, since it suggests an action under certain conditions). The Controller Agent plays a similar role to the selection mechanism, choosing an action among several suggestions from valid Context Agents.

The main difference comes from the fact that Context Agents are autonomous: they learn by themselves. On the contrary, the rules of a LCS are processed by a genetic algorithm, to withdraw the weakest and generate new and presumably more adapted rules. The fitness function of this algorithm is usually a reward signal, perceived from the environment. A great difficulty in the instantiation of a LCS is to adequately split the reward between the different rules. This difficulty does not exist in ESCHER, because

Table 1
Parameters of ESCHER and their significance.

Parameters                          Significance
Number of controlled variables      Important
Number of observed variables        Important
Variables references                Important
Criticality functions               Important
Variation ranges                    Optional
Maximal size of an action           Incidental
Minimal size of a validity range    Incidental
Minimal step of an AVT              Incidental
Coefficients of an AVT              Incidental

of the autonomy of Context Agents. They evaluate their adequacy themselves, and adjust themselves if needed. In some respects, the notion of critical levels may be assimilated to the reward signal, as it enables the evaluation of the adequacy of the rules.

By self-adjusting, Context Agents suggest actions that are more and more adequate, with a more and more adequate timing, along with more and more reliable forecasts. Thus, the learning process feeds the control process.

4.6. Settings

For ESCHER to be easy to instantiate to a particular system, the number of parameters has to be as low as possible, and setting them should not require the use of elaborate calibration methods.

The only knowledge about the controlled system that ESCHER needs is quite simple:

• the number of controlled variables, and their references;
• the number of observed variables, and their references.

It is possible to give the lower and upper bound for each variable. ESCHER works without this information, but it can be of use for the criticality functions. In any case, this is basic knowledge about the controlled system; it is not an obstacle.

The only difficulty in the instantiation of ESCHER is the definition of the criticality functions. Controller Agents focus on the most critical Criterion Agent. This means that the compromise between several criteria is expressed through the definition of the criticality functions. For instance, in an absurd case, if we want to maximize and minimize the same variable, ESCHER will stabilize on the value where the two criticality functions meet. This knowledge concerns not only the controlled system, but also the objectives of the user.

Finally, some other parameters are secondary. They have a very limited impact on the overall performance of the system; they do not require to be specifically set each time, and their default values work fine. This is, for instance, the case of the minimal size of validity ranges (that triggers NCS 7), the maximal size of an action (to prevent ESCHER from performing brutal actions, for safety reasons), or the internal parameters of AVTs. The strong and quick adaptiveness of the agents reduces the impact of these parameters. Table 1 shows all the parameters of ESCHER and their significance.
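To make Table 1 concrete, an instantiation could look like the following. This is purely illustrative: the real prototype is a Java application, and every name below is ours, not part of its actual interface.

```python
# Hypothetical configuration of ESCHER for a two-input, two-output
# system, mirroring the parameters of Table 1.
config = {
    # Important parameters:
    "controlled_variables": ["fuel_mass", "ignition_advance"],  # inputs acted on
    "observed_variables":   ["imep", "co_concentration"],       # outputs perceived
    "criteria": {
        "imep": {"kind": "maximize"},
        "co_concentration": {"kind": "threshold", "value": 3.0},
    },
    # Optional:
    "variation_ranges": {"fuel_mass": (5.0, 15.0)},
    # Incidental (defaults usually fine):
    "max_action_size": 0.5,
    "min_validity_range": 0.01,   # fraction of the variable's domain
}
```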

5. Experiments: real-time control of combustion engines

The first experiments presented in this section have been conducted on automatically generated synthetic black boxes. Then, experiments on a real combustion engine are shown. The implementation of ESCHER used for these experiments is a prototype written in Java 1.7 using Eclipse and a component-based multi-agent architecture generator called Make Agents Yourself (Noël, 2012). It runs on a laptop with an Intel i7 2.67 GHz CPU and 4 GB of RAM. The duration of a lifecycle of ESCHER (i.e. a lifecycle of each of its agents) depends mainly on the number of agents. It is approximately 20 ms with 10 agents, and 500 ms with 800 agents. This is something that should be improved by code optimization, but this is not the immediate concern for ESCHER. Here the goal is to show that the agents are indeed able to learn how to control several inputs of an unknown system, regarding several criteria.

5.1. Criticality functions

The function¹ used in our experiments to compute critical levels is defined over ℝ as follows (Eq. (10)):

f(x) =
  100                                                        if x ≤ 0
  γ(x − η)²/(2η) + γ(x − η) + δ                              if 0 < x ≤ η
  γ(x − η)²/(2(η − ε)) + γ(x − η) + δ                        if η < x ≤ ε
  0                                                          if ε < x ≤ sup − ε
  γ(sup − x − η)²/(2(η − ε)) + γ(sup − x − η) + δ            if sup − ε < x ≤ sup − η
  γ(sup − x − η)²/(2η) + γ(sup − x − η) + δ                  if sup − η < x ≤ sup
  100                                                        if sup < x
                                                             (10)

with γ = −2·100/ε and δ = −γ(ε − η)/2.

Parameters sup, ε and η are defined by the user. The curve of this function is symmetrical with respect to the center of [0; sup]: it decreases on [0; ε], and increases on [sup − ε; sup]. Parameter η defines the inflection point, sup acts as the upper bound of the function (above this value the critical level is always 100), and ε defines the interval [ε; sup − ε] where the critical level is always zero.

In our implementation, it is possible to shift the function so the slopes happen in an arbitrary interval instead of [0; sup]. It is also possible to make the function asymmetrical by defining different ε and η for each half of the interval. For instance, by setting ε = 0 for the left half only, we obtain a curve similar to the threshold one from Fig. 2.

It is worth remembering that each Criterion Agent has its own function, set differently. It is up to an expert of the controlled system domain to set the parameters of each criticality function. This is how the balance between all criteria is expressed to ESCHER, as it always tries to lower the most critical criterion before the others. However, our prototype has a simplified procedure regarding the experiments. The user does not have to directly manipulate Eq. (10); she or he only needs to select each critical variable and indicate whether the wanted function is a threshold, a setpoint, a minimization or a maximization. After specifying the threshold value or the setpoint value, ε and η are generated automatically.
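Eq. (10) transcribes directly into code. The sketch below is our reading of the piecewise definition: parameter names mirror the equation, and the sign conventions are fixed by requiring continuity at ε and at sup − ε (f(0) = 100, f(ε) = 0), with 0 < η < ε ≤ sup/2.

```python
def criticality(x, sup, eps, eta):
    """Criticality function of Eq. (10) (sketch): decreases smoothly
    from 100 on [0; eps], is 0 on [eps; sup - eps], and rises back to
    100 on [sup - eps; sup]; eta is the inflection point of each slope.
    """
    gamma = -2.0 * 100.0 / eps
    delta = -gamma * (eps - eta) / 2.0

    def half(u):
        # one decreasing branch, for u in [0; eps]
        if u <= eta:
            return gamma * (u - eta) ** 2 / (2 * eta) + gamma * (u - eta) + delta
        return gamma * (u - eta) ** 2 / (2 * (eta - eps)) + gamma * (u - eta) + delta

    if x <= 0 or x > sup:
        return 100.0
    if eps < x <= sup - eps:
        return 0.0
    # the rising half is the mirror image of the decreasing half
    return half(x) if x <= eps else half(sup - x)
```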

5.2. Experiments on synthetic black boxes

The use of a black box generation tool (Boes et al., 2013) enabled us to test ESCHER over 50 cases of various complexity, with up to dozens of inputs and outputs. We present here two very simple cases to provide a better understanding of how ESCHER reaches a compromise between several criteria, and how it is robust to perturbations. In these experiments, a cycle corresponds to a lifecycle of each agent followed by a simulation step of the black box.

¹ Function whose formula was proposed by our colleague Sophie Jan, at the Toulouse Institute of Mathematics.


Fig. 4. Optimization of two criteria.

5.2.1. Optimizing two criteria

In this experiment, the black box has one input (I1) and two outputs (O1 and O2) varying from zero to 100. The setpoint on both outputs is 50. There are two Criterion Agents, one for each output, each with the same criticality function. Hence, both criteria have the same weight. However, this setpoint is not reachable on both outputs at the same time: there is no value for the input that puts both outputs at 50. ESCHER has to find a compromise, i.e. to minimize the highest critical level.

Fig. 4 shows the variations of the input and outputs of the controlled black box, of the number of Context Agents in the system, and of the critical levels. The input is initialized to 1.1, which sets O1 to 21.8 and O2 to 1.8. O2 is further from the setpoint than O1; its critical level is therefore higher. ESCHER has no preliminary knowledge of the black box. Its action at the first step is a mistake: ESCHER slightly increases the input, which provokes a small increase of both critical levels. A Context Agent for this action is created. The following step, ESCHER corrects this mistake, and finds the action which pushes the outputs towards the setpoint. A second Context Agent is created, whose action is kept until the highest critical level stops decreasing.

The critical level of O1 reaches 0 at lifecycle 76. However, the critical level of O2 is then at 26.1, and still decreasing. The action is continued, since the highest critical level is decreasing, even though the other critical level is increasing.

At lifecycle 96, the critical level of O1 becomes higher than the critical level of O2. In consequence, ESCHER modifies its action, and the critical levels cross again. A series of oscillations follows, during which 3 new Context Agents are created. Finally, the value of the input is stabilized, slightly oscillating around 3. O1 oscillates around 60 and O2 around 40. Both critical levels oscillate around 5. ESCHER has reached the best compromise (according to the criticality functions), since the highest critical level is the lowest possible.

This experiment shows how a Controller Agent is able to deal with an input that controls several outputs with antinomic criteria. Different criticality functions would have led to a different compromise. For instance, one can prioritize one output over the other by making its criticality function always greater than the other.

Fig. 5. Robustness to perturbations at runtime.

5.2.2. Robustness

This experiment shows how ESCHER reacts to perturbations in its environment. Here, ESCHER controls two of the three inputs (I1 and I2) of a black box. The third input (I3) is manually controlled. These three inputs have an influence on the same output (O1), on which a setpoint criterion is applied. First, we let ESCHER make O1 meet the setpoint by modifying I1 and I2. Then, we manually change the value of I3, provoking a perturbation on O1, which abruptly goes away from the setpoint. ESCHER must adapt itself to this modification by finding new values for I1 and I2.

Fig. 5 shows the variations of the inputs and output of the black box, along with the number of Context Agents and the critical level of the setpoint criterion. Inputs are initialized to 1, which sets the output to 68. The setpoint is 50. ESCHER reaches the setpoint in less than 100 lifecycles by increasing I2 only.

At lifecycle 160, I3 is manually set to 50. This makes O1 decrease, jumping out of the setpoint, resulting in a peak of critical level, which rises from 0 to 12. This is absorbed by ESCHER, which decreases I2 until the setpoint is reached again.

I3 is once again modified at lifecycle 220, from 50 to 100. This provokes a huge increase of the output, therefore a rise of critical level (from 0 to 72). Once again, ESCHER self-adapts to this perturbation. First, I1 is increased, then I2. The critical level is brought back to 0 at lifecycle 350, while new Context Agents have been created. Two other perturbations are later performed. Each time, ESCHER is able to bring the output back to the setpoint.

This experiment shows that ESCHER is able to react to perturbations on the controlled system. It self-adapts to changes to maintain an adequate control. Here, each perturbation is big enough to provoke the creation of new Context Agents.

5.3. Experiments in real conditions

The results presented in this section have been obtained during tests performed on a 125 cc single-cylinder gasoline engine. The engine was instrumented so ESCHER has access to temperatures, pressures, and other measurements, via the Engine Control Unit (ECU) and a gas analyzer.

Fig. 6. Experimental set-up for the tests on a real engine.

Fig. 8. Inputs and critical levels during a multi-objective optimization.

The link between the engine and the ECU is ensured by various specific instruments. A Controller Area Network (CAN) bus enables the communication of external systems with the ECU. CAN buses are widely used in the automotive industry. A computer software called ControlDesk enables the reading of the ECU (in particular of the variables measured by the sensors), the computation of values from read variables, and the modification of parameters (such as the ignition advance). ESCHER is connected to ControlDesk via a specific communication protocol, MCD-3 (which stands for Measurement, Calibration, Diagnostics), over Ethernet, enabling our system to read and write values on the ECU. Finally, a gas analyzer is plugged onto the engine exhaust. It measures the gas concentration of various pollutants (carbon monoxide, for instance), and sends data via a serial output (RS232/DB25) interfaced with the USB port of the computer on which ESCHER runs. Fig. 6 shows this set-up. For these experiments, ESCHER had to be slowed down and wait at least 10 s between each lifecycle in order to let the engine stabilize after changing its parameters. For the second experiment, ESCHER had to wait 10 more seconds between each lifecycle for the gas analyzer to provide data.

5.3.1. Torque optimization

In this experiment, the engine is put at 5000 rpm, with a load of 870 mbar in the intake manifold. ESCHER controls the total injected fuel mass and the ignition advance. The only control criterion is to maximize the indicated mean effective pressure (IMEP), which reflects the torque.


Fig. 9. Engine outputs during a multi-objective optimization.

The injected fuel mass is measured in milligrams per shot (mg/shot), and the ignition advance in crankshaft degrees (°c), i.e. the position of the piston in the cylinder when the combustion is triggered. IMEP is measured in bars. IMEP is a very unstable variable, in particular with single-cylinder engines. Working at high rpm and high load, as is the case in this experiment, reduces the instability.

The criticality function is strictly decreasing (since we want to maximize IMEP). We do not know a priori what the maximal reachable IMEP is, therefore we cannot set the criticality function in a way that it returns 0 when the maximal IMEP is reached. Thus, we do not expect the critical level to be zero at the end of the test, but we do expect it to be lower at the end than at the start. This is true for every criticality function used with the real engine.

Fig. 7 shows the variations of the controlled inputs, the optimized output, the number of Context Agents and the critical level. At the start, the injected fuel mass is low (7 mg/shot) regarding the current operating point. The engine is on the verge of stalling. Of course, ESCHER, which does not have any knowledge about the engine, is not aware of this fact. Its first action is a mistake: ESCHER decreases both parameters, which leads to a drop of IMEP (and a rise of critical level).

ESCHER quickly finds a way to make the critical level decrease, by increasing first the injected fuel mass, then the ignition advance. IMEP finally reaches its maximum (about 9 bars); the critical level stops decreasing. ESCHER stabilizes itself at 11.50 mg/shot of injected fuel, with a 24 °c ignition advance. The decrease of these inputs at lifecycle 24 is explained by noise on the IMEP. But the system quickly corrects itself.

ESCHER managed to improve the IMEP by 3 bars in 9 lifecycles (about 90 s), reaching the maximal IMEP possible for the considered operating point. Obtaining the same result takes a skilled engineer, used to this particular engine, around 20 min with usual methods.

5.3.2. Multi-Objectiveoptimization

Forthistest,theengineisputinanotheroperatingpoint(2500 rpm,750mbar). ESCHERcontrolstheinjectedfuelmassthe igni-tionadvance,butalsothestartofinjection(SOI).Thisnew param-eteristhetiming oftheinjectionrelativelyto thepositionofthe piston, itismeasured incrankshaftdegrees. Thereare criteriaon fouroutputs:

• IMEPmustbemaximized;

• fuelconsumption,measureding/kWh,mustbeminimized; • hydrocarbons(HC)emissionmustbeunder500 ppm(partspar

million);

• carbonmonoxideconcentration(CO)mustbeunder3%.

The last three criteriaare contradictory withthefirst one. In-deed,themostefficientwaytoimproveIMEPistoinjectmorefuel. However,thisalsoincreasefuelconsumptionandpollutants emis-sions.WeneedtoadjustignitionadvanceandSOItoextractmore powerfromthecombustion.ThisiswhatESCHERhastolearn.

Fig. 8 showsthevariationsofthecontrolledparametersandthe criticallevels,whileFig. 9 showsthevariationsoftheoutputs.At the beginning,the highestcriticallevels isthat offuel consump-tion.Thus,ESCHERseekstodecreasethefuelconsumptioncritical level inpriority. Thesystemmanagesto dosoduringthefirst 20 lifecycles,inparticularbyincreasingtheignitionadvancefrom10 to26◦CandbydecreasingtheSOIfrom-150to−400C,whilethe

fuelinjectionoscillatesbetween6and7 mg/shot.

At lifecycle 10, IMEP maximization becomes the mostcritical criterion, however,its criticallevelisdecreasing,so thesame ac-tions are continued. At lifecycle 20, the COthreshold is crossed, its criticallevel rises. ESCHER exploresnewactions to solve this problem.ItcontinuestodecreaseSOIbutstarttodecreaseignition advance. Thisleadto a peakofconsumption anda dropofIMEP betweenlifecycles45and50,alongwithsmallexcessesof hydro-carbons. Finally, after some oscillations, ESCHER manages to put thepollutantsundertheirrespectivethresholds,whilemaintaining ahighIMEPandalowconsumption.

Attheendofthetest,IMEPisaround8 bar(2 barhigherthan the begining),whilefuel consumptionisaround 275 g/kWh(165 g/kWhlessthantheinitialvalue).Pollutantsemissionsarehigher thantheirinitialvalues,buttheymeettheirthreshold.ESCHERhas successfully completed a standard engine optimization (i.e. opti-mizingtorqueandconsumptionwhilerespectingpollution thresh-olds)withouthavinganypriorknowledgeaboutengines.Thistest lasted 123 lifecycles,around 41 min(ESCHERhas towait forthe gazanalyzer).Thisisabouttwiceasfastthanahumanexpertwith usualmethodsforasimilarendresult.

6. Conclusionandperspectives

ThisarticlepresentedESCHER,asystemthatillustratesthe con-tributions of the AMAS approach to the field of control systems andcalibration.Thisarticlefocusedonthefullpresentationofthe system, and showed results obtained both with unrelated black-box simulations andrealengines. The goal withtheexperiments onblack-boxeswastoillustratehowESCHERworksonbasiccases. Experimentsontherealengineshowits applicabilityinreal con-ditionsanditsrobustnesstonoisydata.Overall,theautomatic cal-ibration performedbyESCHERisfasterthanmethodsusedinthe industryforasimilarresutlt.Howevertheseexperimentshighlight alimitationofESCHER.Wehadtomakeitwaitbetweenits lifecy-clesfortheenginetostabilizeandforthegazanalyzertoprovide data. Thisisduetoits inabilityto correlateactionsandeffects if

theeffectsbecome sensibletoolongafter theaction.Further pa-perswillpresentcomparisonswithotherlearningmethods, detail-ingtheadvantagesandlimitationsofeach.

TheAMAS approach breakswiththe traditionaltop-down de-sign of artificial systems. It focuses on the local behavior of agents,leaving them the taskof controlling their own organiza-tion. An adequate global function emerges from this local self-organizationprocess.Wehopethisisthefirststeptowardsafully self-reconfigurableECU.

OtherAMASshavetackledtheproblemoflearningandcontrol withsimilar Context Agents, for instance withmodel generation (Nigon et al., 2016 )andambientrobotics(Verstaevel et al., 2016 ). ContextAgents arebeinggeneralizedandstandardizedtobecome apatternforcontextlearninginamulti-agentsystem(Boes et al., 2015 ).

AMASsareayoungtechnologycomparedtothemajorityofAI methods used inintelligent control, such asartificial neural net-worksor geneticalgorithms. Our futurework must focus on the formalization of the approach to enable a priori proofs ofAMAS properties.Thisisaworkinprogress,whichfirststepshavebeen madewithEvent-B(Graja et al., 2014 )andcontinuous approxima-tion(Stuker et al., 2014 ).

References

Ashby, W.R., 1956. An Introduction to Cybernetics. Chapman & Hall, London, UK.

Astudillo, C.A., Oommen, B.J., 2014. Topology-oriented self-organizing maps: a survey. Pattern Anal. Appl. 17 (2), 223–248.

Boes, J., Glize, P., Migeon, F., 2013. Mimicking complexity: automatic generation of models for the development of self-adaptive systems. In: Proceedings of International Conference on Simulation and Modeling Methodologies, Technologies and Applications. INSTICC Press, Reykjavik, Iceland, pp. 243–250.

Boes, J., Nigon, J., Verstaevel, N., Gleizes, M.-P., Migeon, F., 2015. The self-adaptive context learning pattern: overview and proposal. In: Proceedings of International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT). Springer, Larnaca, Cyprus, pp. 91–104.

Bongard, J.C., 2013. Evolutionary robotics. Commun. ACM 56 (8), 74–83.

Bonjean, N., Mefteh, W., Gleizes, M.-P., Maurel, C., Migeon, F., 2014. ADELFE 2.0. In: Cossentino, M., Hilaire, V., Molesini, A., Seidita, V. (Eds.), Handbook on Agent-Oriented Design Processes. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 19–63.

Bull, L., Sha'Aban, J., Tomlinson, A., Addison, J.D., Heydecker, B.G., 2004. Towards distributed adaptive control for road traffic junction signals using learning classifier systems. In: Applications of Learning Classifier Systems. Springer, pp. 276–299.

Choy, M.C., Srinivasan, D., Cheu, R.L., 2006. Neural networks for continuous online learning and control. IEEE Trans. Neural Netw. 17 (6), 1511–1531.

Deacon, T., Koutroufinis, S., 2014. Complexity and dynamical depth. Information 5 (3), 404–423.

Deng, L., Yu, D., 2014. Deep learning: methods and applications. Found. Trends® in Signal Process. 7 (3–4), 197–387.

Di Marzo Serugendo, G., Gleizes, M.-P., Karageorgos, A., 2011. Self-organising systems. In: Di Marzo Serugendo, G., Gleizes, M.-P., Karageorgos, A. (Eds.), Self-Organising Software: From Natural to Artificial Adaptation. Springer Berlin Heidelberg, pp. 7–32.

Fabri, S.G., Bugeja, M.K., 2013. Kalman filter-based estimators for dual adaptive neural control: a comparative analysis of execution time and performance issues. In: Proceedings of the 10th International Conference on Informatics in Control, Automation and Robotics. INSTICC Press, Reykjavik, Iceland, pp. 169–176.

Feldbaum, A.A., 1961. Dual control theory, I–IV. Automation Remote Control 21–22, 874–880, 1–12, 109–121, 1033–1039.

Ferber, J., 1999. Multi-Agent Systems: An Introduction to Distributed Artificial Intelligence. Addison-Wesley Reading.

Georgé, J.-P., Gleizes, M.-P., Camps, V., 2011. Cooperation. In: Di Marzo Serugendo, G., Gleizes, M.-P., Karageorgos, A. (Eds.), Self-Organising Software: From Natural to Artificial Adaptation. Natural Computing Series. Springer Berlin Heidelberg, pp. 193–226.

Graja, Z., Migeon, F., Maurel, C., Gleizes, M.-P., Laibinis, L., Regayeg, A., Kacem, A.H., 2014. A pattern based modelling for self-organizing multi-agent systems with Event-B. In: Proceedings of International Conference on Agents and Artificial Intelligence. INSTICC Press, Angers, France, pp. 223–236.

Heylighen, F., 2008. Complexity and self-organization. In: Bates, M.J., Maack, M.N. (Eds.), Encyclopedia of Library and Information Sciences, 3rd Edition. Taylor and Francis, pp. 1215–1224.

Jesus, I.S., Barbosa, R.S., 2013. Tuning of fuzzy fractional PDβ+I controllers by genetic algorithm. In: Proceedings of the 10th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2013). INSTICC Press, Reykjavik, Iceland, pp. 282–287.


Khamis, M.A., Gomaa, W., 2014. Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework. Eng. Appl. Artif. Intell. 29, 134–151.

Kober, J., Bagnell, J.A., Peters, J., 2013. Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32 (11), 1238–1274.

Kolmogorov, A.N., 1998. On tables of random numbers. Theor. Comput. Sci. 207 (2), 387–395.

Kruchten, P., 2004. The Rational Unified Process: An Introduction. Addison-Wesley Professional.

Lemouzy, S., Camps, V., Glize, P., 2011. Principles and properties of a MAS learning algorithm: a comparison with standard learning algorithms applied to implicit feedback assessment. In: Proceedings of 2011 International Conference on Web Intelligence and Intelligent Agent Technology. Springer, Lyon, France, pp. 228–235.

Mitchell, T.M., 2006. The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, Machine Learning Department.

Morin, E., 2008. On Complexity. Hampton Press.

Nigon, J., Gleizes, M.-P., Migeon, F., 2016. Self-adaptive model generation for ambient systems. In: Proceedings of the 7th International Conference on Ambient Systems, Networks and Technologies (ANT 2016). Elsevier, pp. 675–679.

Noël, V., 2012. (Ph.D. thesis). Université de Toulouse, Toulouse, France.

Raghavan, V.V., Gudivada, V.N., Govindaraju, V., Rao, C.R., 2016. Cognitive Computing: Theory and Applications. Elsevier.

Ren, W., Cao, Y., 2013. Distributed Coordination of Multi-Agent Networks: Emergent Problems, Models, and Issues. Springer Publishing Company, Incorporated.

Stuker, S., Adreit, F., Couveignes, J.-M., Gleizes, M.-P., 2014. Continuous approximation of a discrete situated and reactive multi-agent system: contribution to agent parameterization. In: Dam, H.K., Pitt, J., Xu, Y., Governatori, G., Ito, T. (Eds.), Proceedings of the 17th International Conference on Principles and Practice of Multi-Agent Systems (PRIMA 2014), December 1–5, 2014. Springer International Publishing, Gold Coast, Australia, pp. 365–380.

Thórisson, K.R., 2012. A new constructivist AI: from manual methods to self-constructive systems. In: Theoretical Foundations of Artificial General Intelligence. Springer, pp. 145–171.

Urbanowicz, R.J., Moore, J.H., 2009. Learning classifier systems: a complete introduction, review, and roadmap. J. Artif. Evol. Appl. 2009, 1.

Verstaevel, N., Régis, C., Gleizes, M.-P., Robert, F., 2016. Principles and experimentations of self-organizing embedded agents allowing learning from demonstration in ambient robotics. Future Gener. Comput. Syst. 64, 78–87.

Von Bertalanffy, L., 1968. General System Theory: Foundations, Development, Applications. George Braziller, New York.

Watkins, C., Dayan, P., 1992. Q-learning. Mach. Learn. 8 (3–4), 279–292.

Wooldridge, M., 2009. An Introduction to Multiagent Systems, Second Edition. John Wiley & Sons.

Figure

Fig. 1. A view of all the agents of ESCHER.
Fig. 2. Examples of criticality functions.
Fig. 3. Typical convergence of an adaptive value tracker.
Fig. 4. Optimization of two criteria.
