
9th IEEE/ACIS International Conference on Computer and Information Science

An Architecture of Extended Network Management System: Autonomous Cooperation between Knowledge Resource and Network Equipments
Kazuto Sasai (1), Naoyuki Tanji (2), Yusuke Takahashi (2), Gen Kitagata (1), Tetsuo Kinoshita (1)

(1) Research Institute of Electrical Communication, Tohoku University, Japan
(2) Graduate School of Information Sciences, Tohoku University, Japan

Abstract: To reduce the workload of network administrators in tasks such as fault detection and recovery, performance analysis, and security maintenance, a knowledge-based intelligent support system for network administrators based on active information resources (AIR-NMS) has been proposed. In this paper, we describe design patterns and implemented examples of activated information resources (AIRs) for network administration. A design pattern consists of a classification policy for information resources, such as the state of network equipment, and a representation of information utilization knowledge, such as the expertise and heuristics of administrators. Based on these AIRs, we present a prototype architecture for cooperation among AIRs and integration of their functions. The architecture realizes flexible and scalable fault recovery support as a parallel, distributed procedure.

Keywords: network management support system; network state information; network management knowledge; active information resource; knowledge-based approach

[Figure 1 diagram: routers, servers, PCs, and a printer in Subnet A and Subnet B, monitored by I-AIRs holding status information; K-AIRs holding management knowledge drawn from a knowledge base (labeled KUS, FUS). Labeled flows: fault detection, acquisition of device info., diagnosis request, detection of cause, derivation of countermeasure, and the countermeasure presented to the administrator.]

Figure 1. Construction of AIR-NMS

I. INTRODUCTION

With the rapid growth of communication networks, network administration has become more demanding and data-intensive, and solutions that automate network management tasks have become more necessary. Many network management systems (NMSs) have been proposed [1], and some of them have actually been implemented in particular infrastructure systems. A typical approach among them is centralized, static, polling-based management, which requires high-capacity computing resources. However, in view of the dynamic nature of evolving network systems, future management solutions should be flexible, adaptable, and intelligent without increasing the burden on network resources. Since failures in communication networks are unavoidable, quick detection and identification of their causes can make these systems more robust and reliable, ultimately increasing confidence in the services they provide [2], [3].

Motivated by these conditions, we have proposed a network management system based on active information resources (AIR-NMS) to realize an intelligent, adaptive, and autonomous network management support paradigm for various network systems [4], [5]. An active information resource (AIR) is an information resource to which knowledge supporting its utilization has been added, together with functions for cooperating with other AIRs and for maintaining its own information resources [6]. By applying the AIR formalization to various information resources on the network and to management task knowledge held in administrators' heads and/or in knowledge sources such as the web, books, and databases, an NMS can be extended to act on behalf of network administrators. So far, we have proposed several concrete AIRs for information resources on the network and confirmed their functionality. In this paper, we present an architecture of AIR-NMS, namely a scheme of autonomous cooperation between information on network equipment and the expertise of administrators, to improve the flexibility, adaptability, and robustness of network systems. Section II presents examples of introducing AIRs into network management systems, Section III presents the design of autonomous cooperation in AIR-NMS, and Section IV gives conclusions and remarks.

978-0-7695-4147-1/10 $26.00 © 2010 IEEE, DOI 10.1109/ICIS.2010.107

II. DESIGN OF ACTIVATED INFORMATION RESOURCES

In this section, we describe the network management system based on active information resources (AIR-NMS). AIRs for communication network management fall into two categories: status information AIRs (I-AIRs) and management knowledge AIRs (K-AIRs).

Figure 1 shows a conceptual diagram of AIR-NMS. I-AIRs manage status information, which can be classified into two types: static and dynamic. For instance, the mapping between IP addresses and MAC addresses, host names, domain names, IP routing, etc., are static network information, while dynamic information includes packet traffic counts, RMON-MIB, SNMPv2-MIB, logs of network services, and so on. I-AIRs are responsible for monitoring the operational conditions of the network, detecting conditions that should raise alarms, and inspecting and reporting conditions in response to administrator requests. K-AIRs manage the heuristics and expertise of expert administrators, which can be utilized as generic knowledge for network management tasks. K-AIRs and I-AIRs interact with each other to deal with problems given to or detected in the management task.

A. Example of I-AIRs

Conventionally, management tools on individual network equipment and/or service-providing systems collect status information through periodic polling, and the administrator aggregates this information and judges the operational condition of the network system using his or her expertise and heuristics. The I-AIR is introduced to partially support this empirical procedure: distributed and effective monitoring of the network system, detection of network failures, processing of the collected information according to the failure, and more reliable detection, recognition, and specification of failures through cooperation among AIRs. Table I shows concretely implemented I-AIR examples. I-AIRs use two information resource formats, plain text and RDF/XML, to represent and manage status information. For instance, log information is acquired through Syslog (a standard logging facility on UNIX and Linux systems) in plain-text format, and the I-AIR extracts various types of log information and converts them to RDF/XML.
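The plain-text-to-RDF/XML conversion performed by a log-handling I-AIR can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's converter: it assumes the traditional BSD-style syslog line layout, and the vocabulary namespace `http://example.org/air-nms#`, its property names (`timestamp`, `host`, `program`, `pid`, `message`), and the `urn:air-nms:log:` resource URIs are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
AIR_NS = "http://example.org/air-nms#"  # hypothetical vocabulary, not from the paper
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("air", AIR_NS)

# Assumed BSD-style layout: "Mmm dd hh:mm:ss host program[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$"
)

def syslog_to_rdf(lines, base_uri="urn:air-nms:log:"):
    """Convert plain-text syslog lines into an RDF/XML document returned as a string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for i, line in enumerate(lines):
        m = SYSLOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed layout
        desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                             {f"{{{RDF_NS}}}about": f"{base_uri}{i}"})
        for key, value in m.groupdict().items():
            if value is not None:  # pid is optional in syslog lines
                ET.SubElement(desc, f"{{{AIR_NS}}}{key}").text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    log = ["Jan  1 00:11:22 pcB1 postfix/smtp[1234]: connect to smtp.example.jp: Connection refused"]
    print(syslog_to_rdf(log))
```

Each matched line becomes one `rdf:Description`; lines in other formats are skipped rather than guessed at, which keeps the converted resource limited to what the I-AIR actually understood.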
Table I. Examples of implemented I-AIRs

No.  Name (Role)
1    Network disconnection detector
2    NIC configuration failure detector
3    SPAM mail detector
4    MSBlaster attack detector
5    Mail send/receive error detector
6    TCP/IP stack failure checker
7    NIC configuration failure checker
8    HUB failure checker
9    Router failure checker
11   DNS server process checker
12   SMTP server process checker
13   POP server process checker
14   DNS connection checker
15   Network route to host checker
16   Kernel information checker
17   Lease IP address checker
18   Mail server error checker
19   Number of SPAM mail

Besides the information resource itself, an I-AIR holds knowledge about that resource together with functions for handling the collected information. The essential components of the knowledge represented in an I-AIR are as follows:

AIR Identification Knowledge (ID): an identification number, the task number of the I-AIR, etc.
Knowledge about Information Resource (IR): the type, update time, format type, etc.
Knowledge about Failure Inspection (FI): two types of knowledge used to inspect failures, namely text to be detected in logs and thresholds such as packet counts.
Knowledge about Periodic Investigation Process Control Method (CM): the polling interval and other conditions for updating the information resource.
Knowledge about Cooperation Protocol (CP): protocol sequences for cooperating with other AIRs.

The ID, IR, and CP knowledge contained in an I-AIR is mainly required to operate on the information resource and to support communication and cooperation among I-AIRs. The most important characteristic of the I-AIR is its autonomous monitoring mechanism, supported by FI and CM, which inspects and investigates obstacles that hinder normal network operation.

The I-AIRs mentioned above mainly handle the dynamic, temporal status of network equipment. Another kind of I-AIR is designed for static information on network equipment, that is, atemporal information such as host names, IP addresses, service configurations, and device information. These information resources help the administrator and other AIRs to specify and collect equipment characteristics and properties. Figure 3 shows an example of a static I-AIR represented in XML. When a static I-AIR receives an information acquisition request message, it adds its own information related to the requested content if the message satisfies its activation condition. By circulating information acquisition messages among static I-AIRs, a sufficient amount of information about the network equipment is collected. The information collection function of I-AIRs plays an important role in supporting the inference process of K-AIRs, examples of which are presented in the next subsection.
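The ID/IR/FI/CM/CP knowledge and the FI-based inspection step can be pictured with a small data structure. The sketch below is an illustration only: the class `IAIRKnowledge`, the function `inspect`, and the field names are hypothetical, the CP knowledge is reduced to a tuple of protocol names, and the periodic execution of the CM method is left out. The sample values are modelled on the I-AIR No.15 example (Figure 2), which runs ping and reports NIC_problem when one of its check strings appears in the output.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class IAIRKnowledge:
    # ID: identification knowledge
    air_id: str
    task_id: str
    # IR: knowledge about the information resource
    info_type: str
    format_type: str
    # FI: knowledge about failure inspection
    failure_name: str
    check_strings: Tuple[str, ...]          # text whose presence signals a failure
    packet_threshold: Optional[int] = None  # optional numeric threshold
    # CM: control method for periodic investigation
    method: Tuple[str, ...] = ()
    interval_ms: int = 60000
    # CP: cooperation protocols (names only in this sketch)
    protocols: Tuple[str, ...] = ()

def inspect(k: IAIRKnowledge, collected_text: str,
            packet_count: Optional[int] = None) -> Optional[str]:
    """Apply the FI knowledge to freshly collected information.

    Returns the failure name to report, or None if no failure is detected."""
    if any(s in collected_text for s in k.check_strings):
        return k.failure_name
    if (k.packet_threshold is not None and packet_count is not None
            and packet_count > k.packet_threshold):
        return k.failure_name
    return None

# Values modelled on I-AIR No.15 (Figure 2)
NO15 = IAIRKnowledge(
    air_id="i-air@w1:pcB1.example.com",
    task_id="0123456789",
    info_type="ping_result",
    format_type="text",
    failure_name="NIC_problem",
    check_strings=("100% packet loss", "Destination net unreachable",
                   "TTL expired in transit", "Ping request could not find unknown host"),
    method=("ping", "-c", "4", "172.17.1.2"),
    interval_ms=60000,
)
```

In a fuller version, a CM-driven loop would run `method` every `interval_ms` milliseconds, pass the output to `inspect`, and use the CP knowledge to decide which AIRs to inform.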


  (ID :air_id       i-air@w1:pcB1.example.com
      :workplace_id w1:pcB1.example.com
      :task_id      0123456789)
  (IR :info_type    ping_result
      :path         stdout
      :format_type  text
      :time         2010/01/01/00:11:22)
  (FI :failure_name NIC_problem
      :check_name   Ping_NIC_pcB
      :check_string ("100% packet loss" "Destination net unreachable"
                     "TTL expired in transit" "Ping request could not find unknown host")
      :check_info   (CI:exit yes))
  (CM :method_name  ping
      :arguments    (-c 4 172.17.1.2)
      :trigger_info (TI:interval 60000))
  (CP :protocol     (Inform failure Protocol) (Report Protocol))

Figure 2. Example of I-AIR (No.15)

  KSC-1  S: unable to send mail
         C: unable to name resolve
         C: network connection failure
         C: sendable mail size over
  KSC-2  S: unable to name resolve
         C: setting error of resolver
         C: setting error of DNS server
         C: network connection failure
  KCD-1  C: sendable mail size over
         DM: 1. Check error code 522 in the MTA's error message.
         DR: A size of sent mail is over a size of sendable mail which is set to SMTP server Srv-X.
  KCD-2  C: sendable mail size over
         DM: 1. Check a size of sent mail. 2. Check a size of sendable mail of SMTP server. 3. Compare these sizes.
         DR: A size of sent mail, 3000KB, is over a size of sendable mail, 2000KB, which is set to SMTP server Srv-X in subnet Subnet-X.
  KCD-3  C: network connection failure
         DM: 1. Check a network topology. 2. Check responses to ping at each path.
         DR: There is a connection problem between pc-A1 and rt-1.
  KCM-1  C: sendable mail size over
         M: SMTP server Srv-X1 can change settings, because its OS is CentOS and its MTA is Postfix, by following operations:
  KCM-2  C: sendable mail size over
         M: Reduce the size of sent mail to under 2000KB.
  (S: Symptom, C: Cause, DM: Diagnosis Method, DR: Diagnosis Report, M: Measure)

Figure 4. Knowledge represented as K-AIRs

  <?xml version="1.0" encoding="Shift_JIS"?>
  <subnet>
    <subnetName>Subnet A</subnetName>
    <domain>example.jp</domain>
    <addrspace>172.20.2.0/24</addrspace>
    <gateway>172.20.2.1</gateway>
    <firewall>active</firewall>
    <adminName>Mr. Noname</adminName>
    <adminMail>noname@example.jp</adminMail>
    <server>
      <service>SMTP</service>
      <ipaddress>172.20.0.2</ipaddress>
      <name>smtp.a_lab.example.jp</name>
      <process>Postfix2.1</process>
    </server>
  </subnet>

Figure 3. Example of static I-AIR
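The activation rule for static I-AIRs (add your own information to an acquisition request only if the request matches you) can be sketched with the standard library's ElementTree. The request format (a dictionary with `service` and a list of collected `answers`), the functions `handle_request` and `circulate`, and the second subnet document are hypothetical; only the Subnet A document comes from Figure 3, with its XML declaration omitted because the string is parsed directly in memory.

```python
import xml.etree.ElementTree as ET

# Static information from Figure 3 (XML declaration omitted for in-memory parsing)
SUBNET_A = """<subnet>
  <subnetName>Subnet A</subnetName>
  <domain>example.jp</domain>
  <addrspace>172.20.2.0/24</addrspace>
  <gateway>172.20.2.1</gateway>
  <firewall>active</firewall>
  <adminName>Mr. Noname</adminName>
  <adminMail>noname@example.jp</adminMail>
  <server>
    <service>SMTP</service>
    <ipaddress>172.20.0.2</ipaddress>
    <name>smtp.a_lab.example.jp</name>
    <process>Postfix2.1</process>
  </server>
</subnet>"""

# Hypothetical second static I-AIR that has no SMTP server
SUBNET_B = "<subnet><subnetName>Subnet B</subnetName><domain>example.jp</domain></subnet>"

def handle_request(static_xml, request):
    """Add this I-AIR's matching server entries to the circulating request."""
    root = ET.fromstring(static_xml)
    for server in root.iter("server"):
        if server.findtext("service") == request["service"]:  # activation condition
            request["answers"].append({
                "subnet": root.findtext("subnetName"),
                "name": server.findtext("name"),
                "ipaddress": server.findtext("ipaddress"),
                "process": server.findtext("process"),
            })
    return request

def circulate(request, static_iairs):
    """Pass one acquisition request through every static I-AIR in turn."""
    for xml_text in static_iairs:
        request = handle_request(xml_text, request)
    return request

if __name__ == "__main__":
    req = circulate({"service": "SMTP", "answers": []}, [SUBNET_A, SUBNET_B])
    print(req["answers"])
```

I-AIRs whose documents do not satisfy the condition leave the request unchanged, so the collected answers grow only where relevant information exists.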

B. Example of K-AIRs

Our K-AIR design for the prototype system focuses on fault resolution support. To make the constitution of K-AIRs suitable for this purpose, we have proposed a representation and cooperation scheme for network management knowledge AIRs. A network fault resolution operation after a fault appears consists of the following three processes:
1) Cause assuming: assuming the conceivable causes from observed symptoms or detected faults.
2) Cause diagnosing: diagnosing the exact causes of the faults and presenting a diagnosis report to the network administrators.
3) Measure planning: planning measures against the identified causes and presenting them to the network administrators.
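The three processes can be sketched as a small pipeline over the knowledge pieces labeled KSC, KCD, and KCM in Figure 4 (their representation is described in the text that follows). This is an illustrative sketch, not the prototype's implementation: the class and function names, the use of Python callables for diagnosis methods, the `str.format` templates, and fact keys such as `send_mail_size` and `ping_loss_percent` are assumptions; the symptoms, causes, and measure text follow Figure 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Facts = Dict[str, object]  # information gathered from I-AIRs (hypothetical keys)

@dataclass(frozen=True)
class KSC:  # cause assuming: a symptom and its conceivable causes
    symptom: str
    causes: Tuple[str, ...]

@dataclass(frozen=True)
class KCD:  # cause diagnosing: a cause, a diagnosis method, a report template
    cause: str
    method: Callable[[Facts], bool]
    report: str

@dataclass(frozen=True)
class KCM:  # measure planning: a cause and a measure template
    cause: str
    measure: str

def assume_causes(symptom: str, kscs: List[KSC],
                  seen: Optional[Set[str]] = None) -> List[str]:
    """1) Cause assuming; a cause that is itself the symptom of another KSC is expanded."""
    seen = set() if seen is None else seen
    found: List[str] = []
    for ksc in kscs:
        if ksc.symptom != symptom:
            continue
        for cause in ksc.causes:
            if cause not in seen:  # each cause is assumed only once
                seen.add(cause)
                found.append(cause)
                found.extend(assume_causes(cause, kscs, seen))
    return found

def resolve(symptom: str, facts: Facts, kscs: List[KSC], kcds: List[KCD], kcms: List[KCM]):
    """2) Cause diagnosing and 3) measure planning for every conceivable cause."""
    results = []
    for cause in assume_causes(symptom, kscs):
        for kcd in kcds:
            if kcd.cause == cause and kcd.method(facts):
                measures = [m.measure.format(**facts) for m in kcms if m.cause == cause]
                results.append((cause, kcd.report.format(**facts), measures))
    return results

# Knowledge modelled on Figure 4
KSCS = [
    KSC("unable to send mail", ("unable to name resolve", "network connection failure",
                                "sendable mail size over")),
    KSC("unable to name resolve", ("setting error of resolver", "setting error of DNS server",
                                   "network connection failure")),
]
KCDS = [
    KCD("sendable mail size over",
        lambda f: f["max_mail_size"] < f["send_mail_size"],
        "A size of sent mail, {send_mail_size}KB, is over a size of sendable mail, "
        "{max_mail_size}KB, which is set to SMTP server {server}."),
    KCD("network connection failure",
        lambda f: f.get("ping_loss_percent", 0) == 100,  # hypothetical check
        "There is a connection problem between pc-A1 and rt-1."),
]
KCMS = [KCM("sendable mail size over", "Reduce the size of sent mail to under {max_mail_size}KB.")]

if __name__ == "__main__":
    facts = {"send_mail_size": 3000, "max_mail_size": 2000, "server": "Srv-X"}
    for cause, report, measures in resolve("unable to send mail", facts, KSCS, KCDS, KCMS):
        print(cause, "|", report, "|", measures)
```

Because each piece is looked up by its cause string at run time rather than wired into a fixed procedure, the network connection failure knowledge is written once yet is reachable from both KSC entries.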

Based on these operations, the proposed scheme divides the knowledge of a network fault resolution procedure and classifies the divided parts into the three operation types. For each type of classified knowledge part, the following knowledge elements are extracted:

KSC (Symptom, Cause): these knowledge elements are extracted from a part classified as the cause assuming operation. They describe a symptom/fault and its conceivable causes. In most cases, more than one cause is conceivable for a symptom/fault.
KCD (Cause, Diagnosis method, Diagnosis report): these three knowledge elements are extracted from a part classified as the cause diagnosing operation. The diagnosis method describes the procedure for examining the cause, and the diagnosis report is a template for producing the report on the diagnosis.
KCM (Cause, Measure): these two knowledge elements are extracted from a part classified as the measure planning operation. The measure is a template for presenting the network administrators with a practical measure against the cause.

These three types of knowledge element sets (KSC, KCD, and KCM) are regarded as information resources, structured as K-AIRs, and added to AIR-NMS individually. Figure 4 shows examples of classified network management knowledge elements. KSC-1 in Figure 4 consists of a symptom (unable to send mail) and its conceivable causes (unable to name resolve, network connection failure, and sendable mail size over). KCD-2 consists of an assumed cause (sendable mail size over), its diagnosis method, and the diagnosis report template. The diagnosis method is a series of commands (acquire the size of the sent mail, acquire the sendable size limit, and compare the two sizes) for examining the assumed cause. The diagnosis report template is filled in with actual information (3000, 2000, and srv-A) acquired in the diagnostic process, producing a concrete report. KCM-1 consists of an identified cause and a measure template. The measure template is filled in with actual information (srv-A, CentOS, and Postfix) acquired in the diagnostic process, producing the description of a practical measure (a procedure for changing the server setting).

Generally, a single cause may lead to different network faults, and/or one network fault may lead to another. Therefore, the same operation is often included in different network fault resolution procedures. If network management knowledge were installed in AIR-NMS as complete fault resolution procedures, the knowledge of the same operation would have to be described and added redundantly. In contrast, the K-AIR design pattern divides and classifies the knowledge of a network fault resolution procedure, which makes it possible to reuse and install the knowledge for each classified part. Consequently, this scheme avoids redundantly describing and adding the knowledge of the same operation. Furthermore, when classified knowledge parts are installed in AIR-NMS, the proposed scheme does not explicitly and statically specify the relations among them, which gives each classified knowledge part independence. As a result, there is no need to consider the classified knowledge parts already installed in AIR-NMS, which makes it easier to design and modify classified knowledge parts. As described above, the proposed scheme would reduce the load of installing network management knowledge in AIR-NMS.

  Example of KSC
    <sc symptom="unable to send mail">
      <cause>unable to name resolve</cause>
      <cause>network connection failure</cause>
      <cause>sendable mail size over</cause>
    </sc>

  Example of KCD
    <cd cause="sendable mail size over">
      <dm>
        <p>request #//send_mail_size# to #source#</p>
        <p>request #//client/mta/servername# to #source#</p>
        <p>request #//max_mail_size# to #//client/mta/servername#</p>
        <p>true(#//max_mail_size# -lt #//send_mail_size#)</p>
      </dm>
      <dr>
        A size of mail, #//send_mail_size#B, which is sent by #source#, is over a size of sendable mail, #//max_mail_size#B, which is set to SMTP server #//server/mta/name#.
      </dr>
    </cd>

  Example of KCM
    <cm cause="sendable mail size over">
      <m>
        SMTP server #//client/mta/servername# can change settings, because its OS is #//host/os/name@//client/mta/servername#=(CentOS,Fedora) and its MTA is #//server/mta/name@//client/mta/servername#=(Postfix), by following operations:
      </m>
    </cm>
    (1) Information request destinations (the @//client/mta/servername parts)
    (2) Conditional statements (the =(CentOS,Fedora) and =(Postfix) parts)

Figure 5. Classification example of K-AIRs

The five types of knowledge elements (Symptom, Cause, Diagnosis method, Diagnosis report, and Measure) compose the three types of knowledge element sets (KSC, KCD, and KCM). We use XML and XPath to turn these knowledge element sets into information resources for K-AIRs. Figure 5 shows examples of representations of network management knowledge. In KSC, Symptom is represented as the attribute symptom, whose value is the subject symptom/fault (unable to send mail). Cause is represented as the element <cause>, whose contents are the conceivable causes of the subject fault (unable to name resolve, network connection failure, and sendable mail size over). In KCD, Cause is represented as the attribute cause, whose value is the subject cause (sendable mail size over). Diagnosis method is represented as the element <dm>, and the series of commands for examining the subject cause is given as the contents of the <p> elements in the <dm>.
The content of the first <p>, request #//send_mail_size# to #source#, is the command that requests the size of the sent mail from the I-AIR of the mail client, and the content of the last <p>, true(#//max_mail_size# -lt #//send_mail_size#), is the command that checks whether the sendable size limit is less than the sent mail size. The words enclosed in #...# in the contents of <p> are variables; they are substituted with actual information acquired from I-AIRs and other K-AIRs by using the K-AIR features (KUS, FUS). In this example, #source# is substituted with the ID of the mail client's I-AIR, and #//send_mail_size# is substituted with the sent mail size acquired from #source#. Diagnosis report is represented as the element <dr>, whose content is a diagnosis report template. The diagnosis report template may also include variables, which are substituted with actual information acquired in the diagnostic process. In KCM, Measure is represented as the element <m>, whose content is a measure template, which may also include variables. In addition, information request destinations can be designated, as shown in Figure 5 (1), and conditional statements can be used in the measure template, as shown in Figure 5 (2).
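The variable mechanism can be illustrated with a small interpreter for the <p> steps of Figure 5. This is a hedged sketch: the `BINDINGS` dictionary stands in for the values that the KUS/FUS features would obtain from I-AIRs and other K-AIRs, only the two step forms used in the KCD example (`request ... to ...` and `true(... -lt ...)`) are handled, and the I-AIR identifier and byte sizes are hypothetical example values. The `-lt` operator is read as "less than", as in the shell `test` command.

```python
import re
from typing import Dict, List, Tuple

VARIABLE = re.compile(r"#([^#]+)#")

def substitute(template: str, bindings: Dict[str, object]) -> str:
    """Replace every #name# whose name is bound; leave unbound variables untouched."""
    return VARIABLE.sub(lambda m: str(bindings.get(m.group(1), m.group(0))), template)

def run_steps(steps: List[str], bindings: Dict[str, object]) -> Tuple[List[Tuple[str, str]], bool]:
    """Interpret <p> steps: collect 'request X to Y' pairs and evaluate 'true(A -lt B)'."""
    requests: List[Tuple[str, str]] = []
    verdict = True
    for step in steps:
        req = re.fullmatch(r"request (#[^#]+#) to (.+)", step)
        if req:
            # the requested item stays a name; the destination is resolved to a value
            requests.append((req.group(1).strip("#"), substitute(req.group(2), bindings)))
            continue
        cond = re.fullmatch(r"true\((\S+) -lt (\S+)\)", substitute(step, bindings))
        if cond is None:
            raise ValueError(f"unsupported step: {step}")
        # float() raises ValueError if a variable was left unbound
        verdict = verdict and float(cond.group(1)) < float(cond.group(2))
    return requests, verdict

STEPS = [  # the <dm> of the KCD example in Figure 5
    "request #//send_mail_size# to #source#",
    "request #//client/mta/servername# to #source#",
    "request #//max_mail_size# to #//client/mta/servername#",
    "true(#//max_mail_size# -lt #//send_mail_size#)",
]
BINDINGS = {  # hypothetical values acquired from I-AIRs
    "source": "i-air@w1:pcA1.example.jp",
    "//client/mta/servername": "Srv-X",
    "//server/mta/name": "Srv-X",
    "//send_mail_size": 3000000,
    "//max_mail_size": 2000000,
}

if __name__ == "__main__":
    requests, diagnosed = run_steps(STEPS, BINDINGS)
    print(requests, diagnosed)
```

The same `substitute` function can then fill the <dr> and <m> templates once the diagnosis succeeds.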

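[Figure 6 diagram: KSC-AIRs (S-C), KCD-AIRs (C-DM-DR), and KCM-AIRs (C-M) for causes C-a to C-f, each type activated in its own K-AIR workplace, together with I-AIRs. In [STEP 1], a diagnosis request (Msg-S) reaches the KSC-AIRs either from the manager (request-based) or from I-AIRs (alarm-based). The remaining steps ([STEP 2], [STEP 3a], [STEP 3b], [STEP 4]) exchange diagnosis requests (Msg-C), information requests to I-AIRs (Msg-I), and generation requests among the K-AIRs. The two phases are labeled "Organization (Integration and Utilization of Management Knowledge)" and "Organization (Acquisition and Utilization of Device Information)". Legend: S: Symptom, C: Cause, DM: Diagnosis Method, DR: Diagnosis Report, M: Measure.]

Figure 6. Organization of K-AIRs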

As described above, with the proposed representation scheme the knowledge element sets are highly adaptable, and diverse diagnosis methods, concrete diagnosis reports, and practical measures are produced from them as appropriate for individual network situations. Because the scheme improves the reusability of the classified parts of network management knowledge, it reduces the load of installing network management knowledge in AIR-NMS.

III. DESIGN OF AUTONOMOUS COOPERATION

This section presents an example of a cooperation architecture among information resources. To realize flexible and adaptable cooperative activity, we design interaction schemes between information on network equipment and network management knowledge.

A. Example of Interaction among AIRs

The K-AIRs described in the previous section give a complete fault resolution procedure to the administrator according to the failure findings. To form a complete fault resolution procedure, the classified parts of network management knowledge (i.e., the knowledge element sets KSC, KCD, and KCM) must be selected and composed appropriately. If all of this were left to the network administrators, utilizing the classified knowledge parts would impose a heavy load on them. To solve this difficulty, the proposed scheme structures KSC, KCD, and KCM as K-AIRs and lets them organize themselves autonomously to compose a complete fault resolution procedure and carry it out for individual network situations. Figure 6 shows the scheme of K-AIR organization, where KSC, KCD, and KCM structured as K-AIRs are denoted by KSC-AIR, KCD-AIR, and KCM-AIR, respectively. Each type of K-AIR is activated in its own workplace, which is a working environment for AIRs. For K-AIRs to organize themselves autonomously, the message expressions and the message exchange procedure for cooperation among K-AIRs must be designed.

  Msg-S ::= <task id> <symptom> <source> <detail info>*
  Msg-C ::= <k-air id> <task id> <cause> <source> <cooperator>+ <detail info>*
  Msg-I ::= <k-air id> <request info>+ <destination>

Figure 7. Message expressions among AIRs

Figure 7 shows the message expressions among K-AIRs. The details of these expressions are as follows:

Msg-S: this expression is used in diagnosis request messages (for symptoms/faults), which are broadcast from the UI agent or I-AIRs to KSC-AIRs. <task id> is the identifier of each fault resolution task. When K-AIRs organize themselves for a fault resolution task, a unique identifier is assigned to <task id>, and it is held and shared among the K-AIRs to avoid redundantly processing the same task. <symptom> is the description of the symptom/fault to be diagnosed; this value is mandatory to start a diagnosis. <source> is the host name, network name, or IP address at which the symptom/fault is observed or detected; this value is also mandatory to start the diagnosis. <detail info> is optional information that can be used in the fault resolution task.

Msg-C: this expression is used in diagnosis request messages (for conceivable causes), which are broadcast from KSC-AIRs to KCD-AIRs and other KSC-AIRs. It is also used in measure planning request messages (for identified causes), which are broadcast from KCD-AIRs to KCM-AIRs. <k-air id> is the identifier of the K-AIR sending the message. When K-AIRs organize themselves, <k-air id>s are used in cooperation among the K-AIRs and I-AIRs to send acknowledgments and replies to the message senders. <cause> describes a conceivable cause in a diagnosis request message and an identified cause in a measure planning request message. <source> is inherited from the diagnosis request message (Msg-S). <cooperator> lists the identifiers of the K-AIRs that have joined the organization for the fault resolution task. <detail info> is the same as in Msg-S.

Msg-I: this expression is used in information request messages, which are broadcast from KCD-AIRs or KCM-AIRs to I-AIRs. <k-air id> is the same as in Msg-C. <request info> specifies the information needed by the KCD-AIR or KCM-AIR. <destination> specifies the device (host name or IP address) from which the information is requested. If an I-AIR receives an information request message and has information matching the <request info> and <destination> in the message, it sends that information to the message sender identified by <k-air id>.

As mentioned in Section II-B, the proposed representation scheme does not explicitly and statically specify the relations among the K-AIRs (KSCs, KCDs, and KCMs). Nevertheless, with the proposed cooperation scheme, these K-AIRs organize themselves autonomously and appropriately in AIR-NMS. Consequently, a complete fault resolution procedure is composed and carried out for individual network situations, which would effectively achieve exhaustive utilization of a variety of network management knowledge. Furthermore, most of the processes in the proposed cooperation scheme can be conducted in parallel, so the processing efficiency of network fault resolution is expected to improve in distributed environments such as distributed multi-agent systems. Therefore, the proposed scheme would achieve efficient utilization of network management knowledge in AIR-NMS.

B. Evaluation of functionality

The prototype AIR-NMS is implemented on the distributed agent framework ADIPS/DASH [7]. This framework provides a useful reuse architecture, the agent repository. Figure 8 shows a screenshot taken while the system is inferring a resolution procedure. When administrators find a fault in the network system, they can obtain a suitable measure simply by entering the symptom and any available detailed information.

[Figure 8 screenshot. Labeled elements: Symptom; Detected site; Detailed information; Execute diagnosis; Report of diagnosis; Present a countermeasure.]

Figure 8. A screenshot of prototype AIR-NMS

IV. CONCLUSION AND REMARKS

This paper presented the architecture of the cooperation/interaction scheme in AIR-NMS. First, we described design examples of activated status information on network equipment (I-AIRs) and of generalized knowledge about the expertise and heuristics of system administrators (K-AIRs). To improve flexibility, adaptability, and scalability, we proposed K-AIR organization and interaction with I-AIRs. As mentioned in this paper, real networks continuously change their topological configuration and update their service environments. Therefore, the workload required to keep up with such changes in network state should be investigated within the framework of AIR-NMS as a knowledge-based network management system. As future work, we will observe the behavior of the prototype AIR-NMS in an experimental environment closer to real situations and will design the interaction between AIR-NMS and administrators/consumers.

REFERENCES
[1] M. Consens and M. Hasan, "Supporting network management through declaratively specified data visualizations," in Proc. IEEE/IFIP 3rd International Symposium on Integrated Network Management, pp. 725-738, 1993.
[2] J. P. Martin-Flatin, S. Znaty, and J. P. Hubaux, "A survey of distributed enterprise network and systems management paradigms," Journal of Network and Systems Management, vol. 7, no. 1, pp. 9-26, 1999.
[3] R. Stephan, P. Ray, and N. Paramesh, "Network management platform based on mobile agent," International Journal of Network Management, vol. 14, pp. 59-73, 2003.
[4] S. Konno, Y. Iwaya, T. Abe, and T. Kinoshita, "Design of Network Management Support System based on Active Information Resource," in Proc. 18th Int. Conf. Advanced Information Networking and Applications (AINA 2004), vol. 1, pp. 102-106, IEEE, Los Alamitos, 2004.
[5] S. Konno, A. Sameera, Y. Iwaya, and T. Kinoshita, "Effectiveness of Autonomous Network Monitoring Based on Intelligent-Agent-Mediated Status Information," in H. G. Okuno and M. Ali (eds.), IEA/AIE 2007, LNCS (LNAI), vol. 4570, pp. 1078-1087, Springer, Heidelberg, 2007.
[6] B. Li and T. Kinoshita, "Active Support for Using Academic Information Resource in Distributed Environment," Int. J. Computer Science and Network Security, vol. 7, no. 6, pp. 69-73, 2007.
[7] DASH Group, "DASH: Distributed Agent System based on Hybrid architecture," [Online]. Available: http://www.agenttown.com/dash/
