A Polling Layer for Belief Instrumentation

Agent-based NM applications rely on a large set of beliefs about the status and behavior of the network. These beliefs, referred to as network beliefs, need to be created and updated from the existing instrumentation, mostly SNMP agents. Many NM skills define network beliefs that have to be filled using polling operations. The duplication of polling functionality across many skills results in a large number of polling threads, leading to heavy CPU usage. Moreover, the heavy monitoring activity required for high-level and sophisticated management functions needs to be rationalized in order to optimize polling operations and the use of network resources.

Consequently, SNMP polling operations are encapsulated into a highly optimized module called the polling layer. Agent skills use the services of the polling layer to instrument the necessary network beliefs through an API that is higher-level than instantiating polling threads and handling SNMP polling packets. In addition, the polling layer uses a small number of polling threads for the instrumentation of all agent network beliefs.
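To make the idea concrete, the following is a minimal sketch of a shared polling schedule, written in Python for illustration only. It is not the polling layer described in Appendix A: the class name PollingLayer, the methods instrument and run_due, and the injected fetch function (a stand-in for an SNMP GET) are all hypothetical. The sketch only shows how several skills that need the same variable can share a single poll, and how one schedule can serve every registered belief instead of one thread per skill.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (host, variable name); both are illustrative


class PollingLayer:
    """Toy shared poller: one schedule serves every registered belief."""

    def __init__(self, fetch: Callable[[str, str], object]):
        # `fetch` is a hypothetical stand-in for an SNMP GET operation.
        self._fetch = fetch
        self._intervals: Dict[Key, float] = {}
        self._callbacks: Dict[Key, List[Callable[[object], None]]] = {}
        self._queue: List[Tuple[float, Key]] = []  # heap of (due time, key)

    def instrument(self, host: str, oid: str, interval: float,
                   on_value: Callable[[object], None], now: float = 0.0) -> None:
        """Ask for a variable to be polled and delivered to `on_value`."""
        if interval <= 0:
            raise ValueError("interval must be positive")
        key = (host, oid)
        if key in self._intervals:
            # Already polled for another skill: share it, keep the shortest period.
            self._intervals[key] = min(self._intervals[key], interval)
        else:
            self._intervals[key] = interval
            self._callbacks[key] = []
            heapq.heappush(self._queue, (now, key))
        self._callbacks[key].append(on_value)

    def run_due(self, now: float) -> int:
        """Poll every variable that is due at `now`; return the number of polls."""
        polled = 0
        while self._queue and self._queue[0][0] <= now:
            _, key = heapq.heappop(self._queue)
            value = self._fetch(*key)
            for callback in self._callbacks[key]:
                callback(value)  # each interested skill updates its belief
            heapq.heappush(self._queue, (now + self._intervals[key], key))
            polled += 1
        return polled
```

In this sketch a single worker would call run_due periodically with the current time. Two skills that register the same host and variable trigger only one request per period, which is the kind of saving the polling layer is meant to provide.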

The polling layer is based on sophisticated optimization mechanisms and provides a powerful interface for NM applications. Its details fall outside the scope of this chapter and are presented in Appendix A.

4.9 Summary and Conclusion

Our agent architecture is different from other NM-oriented agent approaches. First, our skill-based architecture is not restricted to a single type of NM application. Most of the agent applications surveyed in the previous chapter are designed to tackle specific NM issues and are therefore not reusable for other types of management applications. Our approach is open to all NM functional areas and is designed to answer global NM requirements rather than those of specific applications. Management competences can be developed in skill modules, which allow agents to undertake new management roles and functions.

Second, our agent architecture is not restricted to certain techniques of software agents. Instead, it is designed so that new agent features can be encapsulated into skills and supplied to the agents, thus providing them with new behaviors and capabilities. The behavior of each agent can be customized, and each agent can be endowed with the necessary intelligence to achieve its roles. This is also different from the agent approaches presented in the previous chapter, which commit to a certain type of agent, thus inevitably limiting the potential application of the resulting agents. We have also shown that our architecture offers the basic agent services to allow agent communications, while more sophisticated services can be provided by dedicated skills. Therefore, our approach allows the organization of the agent system to be customized exactly as needed, in contrast to other approaches that impose a certain type of agent organization, for example around a facilitator agent, or within administrative groups that constrain inter-agent communications.

Our architecture supports the development of highly dynamic management applications. Any task instantiated from a skill capability can subscribe to relevant changes in the agent belief database, and dynamically adapt to these changes. Hence, a domain-based monitoring task can automatically integrate the monitoring of newly added elements in the domain and stop the monitoring of the removed elements. This support for dynamism is not constrained to a single agent, but can occur between different agents as well. The communication mechanism allows an agent to transparently adapt its behavior according to the beliefs of another agent. On-the-fly skill plugging while the agent is running is another feature that promotes dynamism. An agent can therefore acquire new management functions and new agent capabilities without interrupting its operation. This is important in future-generation NMSs that have to support frequent updates to the management functionality.
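The subscription mechanism can be illustrated with a short sketch, again in Python and under stated assumptions. The names BeliefDatabase, DomainMonitoringTask, subscribe, add_element and remove_element are hypothetical and do not reproduce the actual interface of the agent belief database. The sketch only shows the pattern: a monitoring task registers a callback for a domain, and the set of elements it monitors follows the changes made to that domain.

```python
from typing import Callable, Dict, List, Set

ChangeCallback = Callable[[str, str], None]  # (change kind, element name)


class BeliefDatabase:
    """Toy belief store that records which elements belong to each domain."""

    def __init__(self) -> None:
        self._domains: Dict[str, Set[str]] = {}
        self._subscribers: Dict[str, List[ChangeCallback]] = {}

    def elements(self, domain: str) -> Set[str]:
        """Return a copy of the elements currently believed to be in `domain`."""
        return set(self._domains.get(domain, set()))

    def subscribe(self, domain: str, callback: ChangeCallback) -> None:
        self._subscribers.setdefault(domain, []).append(callback)

    def add_element(self, domain: str, element: str) -> None:
        members = self._domains.setdefault(domain, set())
        if element not in members:
            members.add(element)
            self._notify(domain, "added", element)

    def remove_element(self, domain: str, element: str) -> None:
        members = self._domains.get(domain, set())
        if element in members:
            members.remove(element)
            self._notify(domain, "removed", element)

    def _notify(self, domain: str, change: str, element: str) -> None:
        for callback in list(self._subscribers.get(domain, [])):
            callback(change, element)


class DomainMonitoringTask:
    """Task whose monitored set follows the membership of one domain."""

    def __init__(self, beliefs: BeliefDatabase, domain: str) -> None:
        # Start from the current membership, then track later changes.
        self.monitored: Set[str] = beliefs.elements(domain)
        beliefs.subscribe(domain, self._on_change)

    def _on_change(self, change: str, element: str) -> None:
        if change == "added":
            self.monitored.add(element)      # start monitoring the new element
        else:
            self.monitored.discard(element)  # stop monitoring the removed one
```

The task never polls the database for membership changes. It is notified when an element is added to or removed from its domain, and changes in other domains do not reach it.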

Concrete case studies using this agent architecture will provide further evidence of these advanced properties. These case studies are the subject of the next two chapters.

Chapter 5

The First Case Study: When Management Agents Become Autonomous, How to Insure Their Reliability?

The first case study has two distinct aspects. The first is an NM aspect that shows, in a simplified way, how a global management task can be dynamically distributed amongst a set of agents using domain-based delegation. The second aspect is a more fundamental problem in using autonomous agents to achieve critical NM tasks. If agents are deployed to undertake sensitive management responsibilities, how can we be sure that these agents remain reliable? This chapter proposes an answer to this question.

5.1 Rationale

The problem of the reliability of management agents has not been a major issue in classical management paradigms. Such management agents did not undertake important management responsibilities, and all decisions were taken at the manager level. Therefore, there was no real danger that a particular agent would perform uncontrolled management operations that might compromise the overall operation of the network. Moreover, the management protocols employed to communicate with these agents intrinsically ensured the reliability of the agent. For example, the SNMP protocol mainly uses confirmed communications for SET operations, or polling-based monitoring in which each GET REQUEST query expects a GET RESPONSE reply.

However, these conditions are no longer ensured when a distributed NMS is based on highly autonomous agents. Autonomous management agents are capable of making high-level decisions. They have the authority to execute sensitive management operations without direct control from the network administrator. Therefore, it is not straightforward to detect the unreliability of an autonomous agent that might make wrong management decisions, or might not even react to critical changes in the network. Such abnormal behaviors may compromise the overall security and performance of the managed network.

Since the agents themselves make use of the managed network, which is fault prone, autonomous management agents may become unreliable due to network or system faults. Therefore, an agent-based NMS must be able to promptly detect the possible unreliability of its intelligent management agents. When an agent is detected to be unreliable, the other agents should be able to cooperate in order to take over the management tasks that were previously assigned to the unreliable agent. The agents must therefore be capable of dynamically undertaking new management tasks during their operation.

The purpose of this first experiment with our skill-based agent architecture is to build a prototype of an agent-based NMS in which autonomous agents, each assigned to a distinct domain of the network, are able to mutually test each other and to detect the agents that become unreliable due to network, system or service failures. These so-called domain agents can dynamically redistribute their management tasks in the case of an agent failure, so that all these tasks continue to be performed.

We use a distributed diagnosis algorithm to detect the unreliability of the autonomous agents. This algorithm is based on System Level Diagnosis (SLD) [Ber96], which will be presented in the next section. As an example of a distributed management task that the agent system may achieve, we choose the monitoring of NEs for fault detection purposes. Each autonomous agent is assigned to a network domain composed of a dynamic set of NEs that the domain agent has to monitor.
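The redistribution requirement can be pictured with the following Python sketch. It is neither the SLD algorithm nor the redistribution scheme used by the domain agents, both of which are presented later in the chapter. It assumes that the set of faulty agents has already been diagnosed, and it uses a simple least-loaded policy chosen purely for illustration. The function name redistribute and the agent and domain names are hypothetical.

```python
from typing import Dict, List, Set


def redistribute(assignments: Dict[str, List[str]],
                 faulty: Set[str]) -> Dict[str, List[str]]:
    """Return a new assignment in which healthy agents take over orphaned domains.

    `assignments` maps each agent to the domains it monitors; `faulty` is the
    set of agents diagnosed as unreliable (assumed to be given).
    """
    # Copy the lists so that the caller's assignment is left untouched.
    healthy = {agent: list(domains)
               for agent, domains in assignments.items()
               if agent not in faulty}
    if not healthy:
        raise RuntimeError("no healthy agent left to take over the domains")

    orphaned = [domain
                for agent in sorted(faulty)
                for domain in assignments.get(agent, [])]

    for domain in orphaned:
        # Illustrative policy: give the domain to the least-loaded healthy
        # agent, breaking ties by agent name so the result is deterministic.
        target = min(healthy, key=lambda agent: (len(healthy[agent]), agent))
        healthy[target].append(domain)
    return healthy
```

Whatever policy is used, the property that matters is that every domain of a faulty agent ends up with a healthy agent, so that all monitoring tasks continue to be performed.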

After detailing the principle of SLD, the subsequent sections present how we identify the required agent roles and the corresponding skills. Each identified skill is then presented, along with the intra- and inter-agent interactions involved. The experimental results and conclusions about this case study close the chapter.
