
AEMS High Availability

ALPHION INDIA PRIVATE LTD


#302, 3rd Floor ‘A’ wing, Bonanza,
Sahar Plaza Complex, JB Nagar,
Near Hotel Kohinoor Continental,
Andheri-Kurla Road, Andheri(E),
Mumbai-400 059

www.alphion.in
COPYRIGHT
Copyright © 2018 Alphion India Pvt Ltd.

All Rights Reserved. Printed in India.

AGEMS-NBI

TRADEMARKS
All of the Alphion names, brand names, and product names referred to in this Document, in particular, the
name “Alphion” and its logo, are either registered trademarks or trademarks of the Alphion India Private Ltd.
All other registered trademarks or trademarks are the property of their respective owners.

LIMITED WARRANTY
Alphion warrants that this Document has been delivered free of all rightful claims of any third person by way of
infringement or the like of any copyright, trade secret, or trademark. THIS DOCUMENT AND THE
PRODUCTS DESCRIBED THEREIN (COLLECTIVELY, THE “DELIVERABLES”) ARE PROVIDED “AS
IS” AND ALPHION MAKES NO OTHER WARRANTIES, EXPRESSED OR IMPLIED, AND DISCLAIMS
ANY AND ALL OTHER WARRANTIES WITH RESPECT TO THE DELIVERABLES, OR ANY
MODIFICATIONS THERETO, IN WHOLE OR IN PART, INCLUDING, WITHOUT LIMITATION, ANY
IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO
EVENT SHALL ALPHION OR ANY ALPHION EMPLOYEE BE LIABLE FOR THE ACCURACY OR
COMPLETENESS OF THE DELIVERABLES.

EXCLUSION OF CONSEQUENTIAL DAMAGES; LIMITATION OF LIABILITY


ALPHION SHALL NOT, UNDER ANY CIRCUMSTANCES, BE LIABLE TO BUYER FOR
CONSEQUENTIAL, INCIDENTAL, SPECIAL OR INDIRECT DAMAGES ARISING OUT OF OR
RELATED TO THE DELIVERABLES, EVEN IF ALPHION HAS BEEN APPRISED OF THE
LIKELIHOOD OF SUCH DAMAGES. IN NO EVENT SHALL ALPHION'S LIABILITY TO BUYER FOR
DAMAGES ARISING OUT OF OR RELATED TO THE DELIVERABLES EXCEED THE AGGREGATE
PRICE OF THE DELIVERABLES.
Contents
COPYRIGHT
TRADEMARKS
LIMITED WARRANTY
EXCLUSION OF CONSEQUENTIAL DAMAGES; LIMITATION OF LIABILITY
Alphion Element Management System (AEMS)
AEMS Architecture
South-Bound Interface
North-Bound Interface
EMS Core Components
AEMS High Availability
AEMS High availability components, Organization & Working
Organization & Working
Disaster Recovery setup
Organization & Working
Use Cases For Clustering
Primary Cluster Use Cases
Disaster Recovery Site Use Cases
Technical Specification for EMS server & Client for 100,000 ONTs and 1,000 OLTs
Existing Cluster Deployments
BSNL Cluster Deployment
BSNL South Zone Setup
BSNL West Zone Setup
Alphion Element Management System (AEMS)
The Alphion EMS provides a complete view of the network elements and their
interconnecting links. Network elements and links can be included in the
visual/graphical map of the domain. The map displays each element and link in a color
that reflects its status: green for healthy, amber/yellow for degraded, and red for
unhealthy.

The EMS gives operators a view of selected sub-networks/rings controlled by the EMS.
By zooming in or using GUI commands, operators can drill down to the card level in
each NE (network element) for configuration and fault management. From the domain map,
clicking the icon of a network element also lets operators drill down to the individual
element, then to the subsystem, then to the card, and finally to the port-level
configuration template.

Alphion also provides an LCT (local craft terminal). Both the Alphion LCT and the
central EMS terminals can display:

• Graphical network topology
• All alarms and messages of the entire network
• Color-coded graphical fault display

Alphion EMS uses a MySQL database server to store network element data along with
log and performance information.

The EMS server will be installed at the centralized NOC, with the servers configured in
High Availability cluster mode. The EMS server will also be installed at a DR site,
likewise configured in High Availability cluster mode.

AEMS Architecture
Alphion EMS adopts a client-server architecture. The server component is based on J2EE
and the client is a Swing-based thick client. The client communicates with the server
using RMI calls and JMS (Java Message Service). The client makes RMI calls to send
requests to the server, while JMS is used to pass messages such as traps, events and
EMS messages to clients asynchronously.

The high level AEMS architecture is captured in the block diagram below.
Figure 1: AEMS Architecture

The AEMS client is a Java Swing-based thick client that encapsulates the user
interfaces for operations on the network elements as well as on the EMS itself. The EMS
client supports all FCAPS (fault, configuration, accounting, performance and security)
functionality. User-friendly menus and interfaces are exposed to users for efficient
and easy usage.

The server tier runs in the JBoss application container. It contains the following
sub-elements:

• South-Bound Interface
• North-Bound Interface
• EMS Core components

South-Bound Interface

This layer handles communication between the EMS server and the network elements.
SNMP (Simple Network Management Protocol) is the primary means of retrieving
information from, and sending information to, the network elements. The TFTP and FTP
protocols are used to send files to and retrieve files from the network elements.

North-Bound Interface

The north-bound interface layer consists of CORBA components that handle
communication with Network Management Systems (NMS). The north-bound interface is
developed based on the TMF814 standard. Its components talk to the south-bound
interface and the EMS core components to perform tasks requested by the NMS.

EMS Core Components

The EMS Core components perform the following tasks:

• Scheduling tasks
• Database operations
• EMS client communication
• JMS communication with clients to send trap/event information
• Business logic for all FCAPS-related functions


AEMS High Availability
A highly available system is designed to deliver a higher level of uptime than a
standard, non-redundant deployment. High availability keeps the system up and running
even in case of hardware failures and power outages. Single points of failure are
eliminated, so regular operations are not affected.
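
As a rough illustration of why removing a single point of failure raises uptime, the
sketch below computes the combined availability of redundant components. It assumes
that the components fail independently of each other, which real deployments only
approximate. The function names and the sample availability figure are hypothetical
and are not taken from any AEMS measurement.

```python
def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` identical components running in parallel.

    The group is down only when every copy is down at the same time,
    which assumes the copies fail independently of each other.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1.0 - (1.0 - component_availability) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours over a 365-day year."""
    return (1.0 - availability) * 365 * 24
```

With a hypothetical per-server availability of 0.99, `redundant_availability(0.99, 2)`
evaluates to about 0.9999, which cuts the expected downtime from roughly 87.6 hours to
roughly 0.88 hours per year under the independence assumption.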

High availability of the AEMS system is achieved using operating-system-level
clustering. The AEMS software is deployed on multiple servers, each of which has
redundant hardware modules. The database is deployed on central storage configured
with RAID, which keeps data available in case of disk failures.

To counter situations where a complete data centre has to be taken offline, or goes
offline due to conditions such as a power and UPS outage, a flood or a fire, the AEMS
system can also be set up at a disaster recovery (DR) site. If the configuration at the
DR site is kept similar to that of the primary site, the AEMS application will be
highly available at the DR location as well.

The block diagram below describes the components of an AEMS high availability setup
and how they are interconnected.
Figure 2: AEMS High Availability components

AEMS High availability components, Organization & Working


The AEMS high availability setup consists of the following components:

• Two high-end servers
• A storage device
• Fibre channel adapters & cables
• Network adapters & cables


Servers are selected so that they deliver optimum performance and each server
individually provides redundant components that support failover. For example, each
server has redundant power supply units so that redundant power sources, if available,
can be used. Similarly, servers that support redundant network adapters and fibre
channel port adapters are selected so that, if any one of these components fails,
cluster operations keep running and users stay connected to the AEMS system.

The storage device is also selected for its built-in failover capabilities. Dual power
supplies, a dual-controller configuration, storage capacity and RAID support are the
main considerations when selecting a storage device.

Organization & Working

The AEMS server software resides on each of the EMS servers. The database files
reside on the storage device. RAID configurations protect the data on the storage
device so that the failure of any single disk does not stop end users from performing
their regular operations.
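
As an illustration of how RAID can survive a single disk failure, the following sketch
assumes RAID 5 (the level named in the use cases and the technical specification later
in this document) and shows XOR parity rebuilding a lost block. It works on a single
stripe held in memory; real arrays do this in controller firmware and rotate parity
across the disks. The function names and sample data are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (used here as the stripe parity)."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks: Sequence[bytes]) -> bytes:
    """Recover the one missing block of a stripe.

    XOR-ing all surviving data blocks together with the parity block
    cancels out everything except the block that was lost.
    """
    return xor_blocks(surviving_blocks)
```

For example, if a stripe holds three data blocks plus their parity and the disk holding
the second data block fails, passing the other two data blocks and the parity block to
`rebuild_missing` returns the lost block. A second simultaneous disk failure in the
same stripe cannot be recovered this way.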

The servers are connected to the storage device with fibre cables. Multipathing
software defines multiple fibre channel paths from each server to the storage. This
enables the server to spread operations across paths to improve performance, or to
select an alternate path if any one path fails.
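
The path selection described above can be sketched as a round-robin over healthy
paths. This is a generic illustration under simplifying assumptions, not the algorithm
of any particular multipathing product; the class names and path labels are
hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FibrePath:
    name: str              # hypothetical path label, e.g. "hba0"
    healthy: bool = True   # set to False when the path is detected as failed


class MultipathDevice:
    """Spreads I/O across healthy paths and skips failed ones."""

    def __init__(self, paths: List[FibrePath]):
        if not paths:
            raise ValueError("at least one path is required")
        self._paths = paths
        self._next = 0  # index of the path to try first on the next request

    def select_path(self) -> Optional[FibrePath]:
        """Return the next healthy path, or None if every path has failed."""
        for offset in range(len(self._paths)):
            index = (self._next + offset) % len(self._paths)
            candidate = self._paths[index]
            if candidate.healthy:
                self._next = (index + 1) % len(self._paths)
                return candidate
        return None
```

While both paths are healthy, successive calls alternate between them. Once one path is
marked unhealthy, every call returns the remaining path, which corresponds to the
failover behaviour in the fibre channel use case.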

The servers are interconnected with multiple network cables for the cluster heartbeat
mechanism.

Clustering software is installed on both servers, and the cluster and resource group
configurations are made. A logical cluster IP address is configured so that EMS
clients can connect to the EMS server using a single IP address and need not be aware
of the underlying configuration.
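
The reason for running the heartbeat over multiple network links can be sketched as
follows: the peer node is presumed failed only when heartbeats have stopped on every
link. This is a generic illustration, not the protocol used by Solaris Cluster or any
other clustering product; the link names and the timeout value are hypothetical.

```python
import time
from typing import Dict, Iterable, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat received from the peer node on each link."""

    def __init__(self, links: Iterable[str], timeout_seconds: float):
        self._timeout = timeout_seconds
        # None means no heartbeat has been seen on that link yet.
        self._last_seen: Dict[str, Optional[float]] = {link: None for link in links}

    def record_heartbeat(self, link: str, now: Optional[float] = None) -> None:
        """Note that a heartbeat arrived on `link` (timestamps are injectable for tests)."""
        self._last_seen[link] = time.monotonic() if now is None else now

    def link_alive(self, link: str, now: Optional[float] = None) -> bool:
        """A link is alive if its last heartbeat is within the timeout."""
        current = time.monotonic() if now is None else now
        last = self._last_seen[link]
        return last is not None and (current - last) <= self._timeout

    def peer_alive(self, now: Optional[float] = None) -> bool:
        """The peer is presumed alive while at least one link is still alive."""
        current = time.monotonic() if now is None else now
        return any(self.link_alive(link, current) for link in self._last_seen)
```

With two links, losing heartbeats on one of them leaves `peer_alive` true, so a single
network adapter or cable failure does not by itself trigger a switchover.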

Disaster Recovery setup


To mitigate environmental and other critical conditions that could render both servers
at a particular location unusable, a similar working environment can be set up at a
geographically different location. This is a standby setup that keeps AEMS downtime
low if issues arise at the primary location. The setup is a replica of the one at the
primary location.

Organization & Working
The EMS server application resides on both servers at the DR location. The database
server instance runs on one of the servers, with the database files on the shared
storage. The servers are clustered.

Real-time database replication is set up between the database instances at the primary
and disaster recovery locations. If the primary location fails, the application on the
active DR node is started manually. Clients can then connect to the DR location and
continue with regular operations.

The DR location has a different IP address from the primary location, so clients must
know the secondary IP address in order to connect after a switchover.
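
One way a client that knows both addresses could behave is sketched below: it tries the
primary address first and falls back to the DR address if the connection fails. This is
an assumption for illustration only; the source does not state that the AEMS client
performs such a fallback automatically. The function name, the `connect` callable and
the addresses used in the example are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def connect_with_fallback(
    addresses: Sequence[str],
    connect: Callable[[str], object],
) -> Tuple[str, object]:
    """Try each server address in order.

    Returns (address, session) for the first address that accepts the
    connection, and raises ConnectionError if none of them does.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect(address)
        except ConnectionError as exc:
            errors.append(f"{address}: {exc}")
    raise ConnectionError("no EMS server reachable; " + "; ".join(errors))
```

A caller would pass the primary logical IP address followed by the DR address, for
example `connect_with_fallback(["192.0.2.10", "198.51.100.10"], open_session)`, where
both addresses are documentation placeholders and `open_session` stands for whatever
routine actually opens the client session.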

Use Cases For Clustering

Primary Cluster Use Cases

Use Case Id    1
Use Case       Power Source failure
Assumption     1. There are 2 power sources provisioned for the cluster setup
               2. The cluster is a 2 node (server) cluster
Description    One of the power sources to the cluster fails.
Result         The cluster runs on the alternative power source without disrupting
               user operations.

Use Case Id    2
Use Case       Power Source component failure on server
Assumption     1. There are 2 power sources provisioned for the cluster setup
               2. The cluster is a 2 node (server) cluster
Description    One of the power components on the active server fails.
Result         The cluster runs on the alternative power component on the same active
               server without disrupting user operations.

Use Case Id    3
Use Case       Total Power component failure on server
Assumption     1. There are 2 power sources provisioned for the cluster setup
               2. The cluster is a 2 node (server) cluster
Description    All the power components on the active server fail.
Result         A cluster switchover happens and the second server starts responding to
               client requests. The switchover is transparent and user operations are
               not affected.

Use Case Id    4
Use Case       A hard disk on the shared storage fails
Assumption     1. The cluster is a 2 node (server) cluster connected to a shared storage
               2. RAID 5 is configured on the storage
Description    A hard disk on the shared storage fails.
Result         The cluster continues to run uninterrupted. The disk has to be identified
               based on the alarm and replaced.

Use Case Id    5
Use Case       A power component on the shared storage fails
Assumption     1. The cluster is a 2 node (server) cluster connected to a shared storage
               2. There are multiple power sources connected to the shared storage
Description    One of the power components on the shared storage has failed.
Result         The cluster continues to run uninterrupted. The second power component
               takes over from the first.

Use Case Id    6
Use Case       A controller on the shared storage fails
Assumption     1. The cluster is a 2 node (server) cluster connected to a shared storage
Description    One of the controllers on the shared storage has failed.
Result         The cluster continues to run uninterrupted. The second controller
               component takes over from the first.

Use Case Id    7
Use Case       Fibre channel failure
Assumption     1. The cluster is a 2 node (server) cluster connected to a shared storage
Description    One of the fibre channel paths from the active node to the storage fails.
Result         The cluster continues to run uninterrupted. The alternate fibre channel
               path from the active node is activated.

Use Case Id    8
Use Case       Network Adapter failure
Assumption     1. The cluster is a 2 node (server) cluster connected to a shared storage
Description    One of the network paths from the active node to the standby node has
               failed.
Result         The cluster continues to run uninterrupted. The alternate network adapter
               is activated and the heartbeat mechanism is activated on the alternate
               path.

Disaster Recovery Site Use Cases

Use Case Id    1
Use Case       Total failure at primary site
Assumption     1. Database replication is running from the Primary site to the DR site
Description    The Primary site has crashed due to environmental factors or a total power
               outage (including UPS failure).
Result         The administrator learns of the total disruption and manually triggers a
               switchover to the DR site. The administrator notifies all users.

All use cases from the “Primary Cluster Use Cases” section also apply to the DR
location.
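
The use cases above can be summarised as a lookup from failure type to the expected
response. The sketch below is only a restatement of those tables in code form; the
dictionary keys and enum names are hypothetical labels chosen for this example and are
not identifiers from the AEMS software.

```python
from enum import Enum


class Response(Enum):
    REDUNDANT_COMPONENT = "cluster keeps running on the redundant component"
    CLUSTER_SWITCHOVER = "automatic switchover to the second server"
    MANUAL_DR_SWITCHOVER = "administrator manually switches over to the DR site"


# One entry per use case; keys are hypothetical labels.
EXPECTED_RESPONSE = {
    "power_source": Response.REDUNDANT_COMPONENT,             # primary use case 1
    "server_power_component": Response.REDUNDANT_COMPONENT,   # primary use case 2
    "server_total_power": Response.CLUSTER_SWITCHOVER,        # primary use case 3
    "storage_disk": Response.REDUNDANT_COMPONENT,             # primary use case 4 (RAID 5)
    "storage_power_component": Response.REDUNDANT_COMPONENT,  # primary use case 5
    "storage_controller": Response.REDUNDANT_COMPONENT,       # primary use case 6
    "fibre_channel_path": Response.REDUNDANT_COMPONENT,       # primary use case 7
    "network_adapter": Response.REDUNDANT_COMPONENT,          # primary use case 8
    "primary_site": Response.MANUAL_DR_SWITCHOVER,            # DR site use case 1
}


def expected_response(failure: str) -> Response:
    """Look up the documented response for a failure type."""
    try:
        return EXPECTED_RESPONSE[failure]
    except KeyError:
        raise ValueError(f"unknown failure type: {failure}") from None
```

Only two of the nine cases involve a switchover: a total power failure on the active
server is handled automatically within the cluster, while the loss of the whole primary
site requires manual action by the administrator.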
Technical Specification for EMS server & Client for 100,000 ONTs and 1,000 OLTs

SL. No.  Description                 Requirements
1        Server Hardware             Qty: 2
a        Processor                   2 Processors with 32 Cores each, 4.13 GHz
b        Server Model/Family         SPARC T7-2 (preferable; other vendors can also be
                                     selected if desired)
c        RAM                         256 GB
d        HDD                         600 GB * 4

2        Storage                     Qty: 1 Set
a        Database Storage            Minimum 8 TB
b        With RAID 5 Configuration   Minimum 8 TB
c        Storage Model               StorageTek 2540 M2 Array/Oracle FS1-2/HP MSA 2040

3        Server Software             Qty: 2
a        OS                          Solaris 10/RHEL
b        Application Software        JBoss 4.3 EAP
c        Database                    MySQL Enterprise Edition
d        Java                        JDK 1.6
e        Cluster                     Solaris Cluster Software/HP Cluster Software

Existing Cluster Deployments

1. Clustering has been deployed at BSNL separately for two different zones. This
includes two Primary setups and two Disaster Recovery setups.

2. Clustering has also been deployed at MTNL Delhi with only a Primary setup (as per
requirements).

BSNL Cluster Deployment

Figure 3: BSNL Cluster Deployment on M5000 Servers


BSNL South Zone Setup

Primary Site at Bangalore

1. Servers: 2 * Sun M5000 servers deployed

2. Storage: Sun Storage 2540M-2 with 2 TB of storage space (Dual controller)

3. Database: MySQL

4. OS: Sun Solaris 10, Solaris Cluster Suite.

Disaster Recovery Site at Pune

1. Servers: 2 * Sun M5000 servers deployed

2. Storage: Sun Storage 2540M-2 with 2 TB of storage space (Dual controller)

3. Database: MySQL

4. OS: Sun Solaris 10, Solaris Cluster Suite.

Active Replication setup between Bangalore & Pune sites

BSNL West Zone Setup

Primary Site at Pune

1. Servers: 1 * Sun M5000 servers deployed

2. Storage: Sun Storage 2540M-2 with 2 TB of storage space (Dual controller)

3. Database: MySQL

4. OS: Sun Solaris 10, Solaris Cluster Suite.

Disaster Recovery Site at Bangalore

1. Servers: 2 * Sun M5000 servers deployed

2. Storage: Sun Storage 2540M-2 with 2 TB of storage space (Dual controller)

3. Database: MySQL

4. OS: Sun Solaris 10, Solaris Cluster Suite.

Active Replication setup between Bangalore & Pune sites