Current Network Management Challenges - Software agents in network management

1.1.1 Dynamism

Probably one of the most intricate problems facing the whole NM field is that of keeping pace with network evolution. Currently, several months (if not years) are required to build an operational Network Management System (NMS) using a management platform. The resulting NMS is very inflexible and hard to maintain. It is therefore difficult to adapt the NMS to rapid changes in the structure and topology of the managed network. There are at least two problems related to the dynamic evolution of managed networks. The first problem is how to instrument new network components. In general, hardware devices are supplied with SNMP agents that provide the basic instrumentation for monitoring and quite limited control. Conversely, applications and services are rarely instrumented. In some cases, a proprietary management tool is supplied. Such tools rarely adopt a management standard that would allow them to be easily integrated into an NMS.

The second problem is how to integrate changes into the deployed NMS. In general, this is a mundane task that requires loading the new MIB into the NMS, initiating the day-to-day monitoring, setting alarms, handling these alarms and correlating them with the other alarms, etc. There is no mechanism that allows the NMS to adapt automatically to changes in the network topology and services.

1.1.2 Evolving Management Requirements

Related to the dynamism of the managed network is the challenge of constantly evolving management requirements. New types of services such as e-commerce, VoIP (Voice over IP) and application outsourcing cannot be managed using the usual network management solutions, which are mostly restricted to element- and network-level management.

Increasingly, new applications, software add-ons, and stand-alone tools are necessary to meet these management requirements. But in order to be cost-effective, these stand-alone tools have to interoperate with legacy management solutions. This is not easy to achieve, as the following two examples of new management requirements show.

1.1.2.1 Service-level Agreements

Market deregulation has necessitated contracts between network service providers and consumers. Service-level Agreements (SLA) [Lew99] are contracts that specify the services supplied to the customer, as well as their quality parameters and the liabilities associated with violation of the contract.

Service-level agreements require multiple functions to be included in NMSs. First, agreements should be specified and fed into the NMS using a certain formalism. Afterwards, a capability has to map the abstract agreement specifications into precise QoS parameters to be monitored. The NMS then has to initiate monitoring operations in order to check whether these parameters are within acceptable ranges. Finally, the NMS has to take corrective actions and generate the necessary alarms if the agreement is violated.
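The four functions listed above (specification, mapping to QoS parameters, monitoring, and alarm generation) can be illustrated with a minimal Python sketch. It is not the approach of [Lew99]; the dictionary-based "formalism", the names `QosRange`, `map_sla_to_qos` and `check_sla`, the service classes, and all numeric ranges are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QosRange:
    """Acceptable range for one monitored QoS parameter (hypothetical model)."""
    parameter: str
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Step 1: the agreement, written in an illustrative formalism (a plain dict).
SLA_SPEC = {"customer": "example-customer", "service_class": "gold"}

# Step 2: mapping from abstract service classes to concrete QoS ranges.
# Classes and numbers are invented for this sketch.
SERVICE_CLASS_TO_QOS = {
    "gold": [QosRange("latency_ms", 0.0, 50.0),
             QosRange("availability_pct", 99.9, 100.0)],
    "silver": [QosRange("latency_ms", 0.0, 150.0),
               QosRange("availability_pct", 99.0, 100.0)],
}


def map_sla_to_qos(spec: dict) -> list[QosRange]:
    """Translate the abstract agreement into the parameters to monitor."""
    return SERVICE_CLASS_TO_QOS[spec["service_class"]]


def check_sla(spec: dict, measurements: dict[str, float]) -> list[str]:
    """Steps 3 and 4: compare monitored values with the ranges, return alarms."""
    alarms = []
    for qos in map_sla_to_qos(spec):
        value = measurements.get(qos.parameter)
        if value is None:
            alarms.append(f"{qos.parameter}: no measurement available")
        elif not qos.contains(value):
            alarms.append(
                f"{qos.parameter}={value} outside [{qos.low}, {qos.high}]"
            )
    return alarms
```

Calling `check_sla(SLA_SPEC, {"latency_ms": 80.0, "availability_pct": 99.95})` returns a single alarm for the latency parameter. Corrective actions are deliberately left out, since they depend on the NMS at hand.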

Beyond the issues that these functions involve, one major problem remains: how to integrate SLA functions within an NMS that was not designed with such functions in mind. In general, SLA functions have to be supported by independent dedicated tools and cannot be plugged into an integrated NMS in a straightforward way [Lew99, pp 210–212].

1.1.2.2 Application-level and End-to-end QoS Management

Application-level management is a recent challenge for NM. One of the major problems with application-level management is that dependencies between the user applications and the network- and resource-level elements have to be properly identified. Yet the currently available instrumentation rarely goes beyond low-level information, and there is no common agreement on how applications should declare the services and resources on which they depend.
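To make the dependency problem concrete, the following Python sketch shows one possible way such declarations could be represented and used once they exist. The declaration format, the names `DEPENDS_ON`, `all_dependencies` and `affected_by`, and the example applications and elements are all assumptions made for illustration; no such standard format is described in the text.

```python
from __future__ import annotations

# Hypothetical declarations: each entry lists what an item directly depends on.
DEPENDS_ON = {
    "voip-service": ["sip-proxy", "wan-link"],
    "sip-proxy": ["server-1"],
    "web-shop": ["server-1", "database"],
    "database": ["server-2"],
}


def all_dependencies(item: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every element that `item` depends on, directly or indirectly."""
    seen: set[str] = set()
    stack = list(graph.get(item, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def affected_by(failed: str, graph: dict[str, list[str]]) -> list[str]:
    """List the declared items that would be impacted if `failed` went down."""
    return sorted(
        item for item in graph if failed in all_dependencies(item, graph)
    )
```

With these example declarations, `affected_by("server-1", DEPENDS_ON)` returns `['sip-proxy', 'voip-service', 'web-shop']`, which is the kind of application-level impact view that low-level instrumentation alone does not provide.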

Another aspect of the problem is end-to-end management, which is required to efficiently manage services such as IP telephony. End-to-end management requires spanning management views from different network providers. Only the combination of all these views can provide an overall end-to-end management view of the delivered service. This requires mechanisms for heterogeneous management applications of different network and service providers to exchange information and to cooperate, which is another challenge for today’s NM.
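As a rough illustration of combining per-provider views, the Python sketch below aggregates two metrics along a path. It assumes that the provider domains are traversed in series and fail independently, so that one-way delays add up and availabilities multiply; this is a simplification. The `DomainView` type, its fields, and the function name are hypothetical, and the sketch ignores the harder problem raised in the text, namely how heterogeneous management applications would exchange these views in the first place.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class DomainView:
    """Management view exported by one provider (hypothetical fields)."""
    provider: str
    one_way_delay_ms: float
    availability: float  # fraction in [0, 1]


def end_to_end_view(path: list[DomainView]) -> dict:
    """Combine per-domain views into one end-to-end view of the service.

    Assumes serial, independent domains: delays are summed and
    availabilities are multiplied.
    """
    return {
        "providers": [d.provider for d in path],
        "one_way_delay_ms": sum(d.one_way_delay_ms for d in path),
        "availability": prod(d.availability for d in path),
    }
```

For example, three domains with delays of 10, 25 and 5 ms yield an end-to-end delay of 40 ms, a figure that no single provider can compute from its own view.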

1.1.3 Integration and Interoperability

In general, multiple independent management tools are used simultaneously. This is necessary to cover all, or at least the most needed, management functions. These management tools are often heterogeneous, i.e. developed by different parties, and it is difficult to make them communicate and coordinate. Every tool uses its own internal database and its own monitoring and report generation functions, which results in needlessly duplicated resources and operations. The reports generated by the different tools do not have a uniform format, which makes analysis and interpretation harder for the network administrator.

[HAN99] describes different levels of possible integration. The simplest integration occurs at the level of the user interface. Different management functions can be invoked from the same user interface, while still performed by completely independent tools.

Proxies and gateways can provide increased integration between different management tools by performing such tasks as information model conversion and data change propagation. At a higher level of integration, a uniform management information base is provided. The different management tools rely on this information base and comply with the same management information model. This avoids inconsistency and data duplication. Finally, the highest level of integration implies that the different tools can interoperate seamlessly. The management services provided by one tool can be directly used by another tool or management module. With this full integration, each management module concentrates exclusively on the management functionality for which it is deployed.

1.1.4 Management Automation

Currently, a large part of the management operations provided by available NMSs is limited to monitoring. Only limited control is possible, and the administrator has to perform most control operations manually. In [Che99], the author identifies three reasons for the lack of automation in today’s NM.

1. There is currently no powerful mechanism that allows the network administrator to specify an explicit and complete description of the expected normal behavior of the network. Disparate thresholds and alarm patterns are not sufficient for an NMS to work out how to bring the network from an abnormal state back to normal operation.

2. Similarly, there is no way to provide the NMS with a description of the effects of management actions. In order to autonomously initiate the adequate management operations when necessary, the NMS has to be aware of the impacts and side-effects of these management operations.

3. Current standards-based instrumentation fails to provide complete control of the managed NEs. In many cases, network components include dedicated RPC, telnet or Web-based control interfaces. Therefore, control operations have to deal with low-level details and cope with heterogeneous control interfaces, which makes the generation of automatic control procedures very complex.
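The first two reasons can be made more tangible with a small Python sketch of what an NMS could do if it were given both descriptions: a declared normal state and the declared effects of each management action, side effects included. This is a toy greedy selection, not a mechanism proposed in [Che99]; the state variables, action names, and the `Action`, `deviations` and `plan_repair` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A management action with its declared effects (side effects included)."""
    name: str
    effects: dict  # state variables this action sets


def deviations(state: dict, normal: dict) -> dict:
    """Variables whose current value differs from the declared normal value."""
    return {k: v for k, v in normal.items() if state.get(k) != v}


def plan_repair(state: dict, normal: dict, actions: list[Action]) -> list[str]:
    """Greedily pick actions until the state matches the normal description.

    At each step the action leaving the fewest deviations is applied.
    Raises ValueError when no declared action improves the state.
    """
    state = dict(state)  # work on a copy
    plan: list[str] = []
    while True:
        remaining = deviations(state, normal)
        if not remaining:
            return plan
        best = min(
            actions,
            key=lambda a: len(deviations({**state, **a.effects}, normal)),
            default=None,
        )
        if best is None or len(
            deviations({**state, **best.effects}, normal)
        ) >= len(remaining):
            raise ValueError("no declared action improves the state")
        state.update(best.effects)
        plan.append(best.name)
```

Given a normal state `{"link_up": True, "queue": "ok"}` and an action `reboot_router` whose declared effects are `{"link_up": False, "queue": "ok"}`, the sketch will prefer an action without that side effect when one exists. Without such declarations, the NMS has no basis for making the choice.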

The dream of a self-healing network nevertheless still tempts many in industry. Efforts to provide “plug-and-play” networks based on JINI (http://www.jini.org), and the “zero administration initiative for Windows (ZAW)” [Mic97], show that management automation remains the ultimate target in NM, although apparently unreachable in the near term.

1.1.5 Proactive Management

Reducing the down time of the managed network requires proactive management that detects, prevents, and repairs faults before they actually occur and seriously affect the status of the network. Proactive management is currently reduced to setting early thresholds that make it possible to detect the possible onset of network anomalies such as congestion and disk failures. The problem with early thresholds is that they significantly increase the number of false alarms and the number of alarm events in general. The values of such thresholds have to strike a balance between sufficiently early detection and the number of false alarm events. Moreover, early thresholds can detect only a limited range of network faults.
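The trade-off between early detection and false alarms can be seen on a small invented example. The Python sketch below evaluates three threshold values against a made-up utilization series in which a real degradation starts at sample 5, while samples 1 and 3 are harmless transient spikes. The data, the labelled degradation start, and the function name `evaluate_threshold` are assumptions for illustration only.

```python
from __future__ import annotations

# Invented utilization samples (percent). The real degradation starts at
# index 5; the peaks at indices 1 and 3 are transient and harmless.
SAMPLES = [40, 55, 42, 71, 45, 60, 72, 81, 88, 95]
DEGRADATION_START = 5


def evaluate_threshold(
    samples: list[float], degradation_start: int, threshold: float
) -> tuple[int, int | None]:
    """Return (false_alarms, first_detection_index) for one threshold value.

    An alarm is raised for every sample at or above the threshold. Alarms
    before the degradation starts count as false alarms; the first alarm
    at or after it is the detection point (None if never detected).
    """
    alarms = [i for i, value in enumerate(samples) if value >= threshold]
    false_alarms = sum(1 for i in alarms if i < degradation_start)
    detections = [i for i in alarms if i >= degradation_start]
    return false_alarms, (detections[0] if detections else None)


for threshold in (90, 70, 50):
    print(threshold, evaluate_threshold(SAMPLES, DEGRADATION_START, threshold))
```

On this series, a threshold of 90 raises no false alarm but detects the problem only at sample 9, whereas a threshold of 50 detects it at sample 5 at the cost of two false alarms. A threshold of 70 sits in between, with one false alarm and detection at sample 6.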

To promote proactive management, the NMS has to be aware [OL98] of the network applications and services. In addition, it has to be able to correlate different distributed views of the network, possibly provided by several management entities.

1.1.6 Other Challenges: Adaptable Scalability and Reliability

In general, problems related to scalability in NMSs are tackled using a hierarchical organization. Such an organization relieves the central manager station of the whole processing required for management operations by pushing this processing down to intermediate entities. There are still ongoing research and standardization efforts to provide standard frameworks for such management distribution. The remaining problem is that the hierarchical organization lacks the flexibility required to support dynamic network configurations [HAN99, pp 117]. Adaptable scalability would make an NMS scale automatically by changing the organization of its managers so as to minimize management overhead as the topology of the network changes.
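A deliberately simple Python sketch of the idea follows. It recomputes the assignment of managed elements to intermediate managers whenever the set of elements changes, using the smallest number of intermediate managers that respects a per-manager capacity. Treating the number of managers as the overhead to minimize, the fixed capacity, and the names `reorganize` and `mid-manager-N` are all assumptions; a real scheme would also have to account for topology and for the cost of moving elements between managers.

```python
from __future__ import annotations

from math import ceil


def reorganize(elements: list[str], capacity: int) -> dict[str, list[str]]:
    """Assign elements to as few intermediate managers as capacity allows.

    Elements are spread round-robin so that no manager handles more than
    `capacity` elements. Manager names are generated for illustration.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    n_managers = ceil(len(elements) / capacity)
    groups: dict[str, list[str]] = {
        f"mid-manager-{i}": [] for i in range(n_managers)
    }
    for index, element in enumerate(sorted(elements)):
        groups[f"mid-manager-{index % n_managers}"].append(element)
    return groups
```

With a capacity of 2, five elements are spread over three intermediate managers; when the network grows to seven elements, calling `reorganize` again yields four managers without any manual reconfiguration.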

With the proliferation of distributed management, a reliability problem arises. In distributed management, such as hierarchical management, the central manager relies on intermediate entities to perform local processing of management information and to automate some management tasks. The reliable operation of the central manager, which is the most critical part of the NMS, becomes dependent on the reliability of the intermediate managers. There is currently no standard way to ensure the reliability of these intermediate managers, and serious reliability gaps may occur if this problem is not correctly addressed.
