
Operations Service Manual - <Unix>

Prepared By: <Devesh Kumar Singh>


Date: 06.03.2019

Document Revision History
Program Name: Bpost Unix KT
Document Status (e.g. Draft, Final, Release #): Draft

Change Request# (Optional)   Version      Date   Modified By          Section, Page(s) and Text Revised   Document Approval
NA                           Draft V4.0   NA     Devesh Kumar Singh   Entire Document

TABLE OF CONTENTS:
1 OVERVIEW OF BPOST UNIX ARCHITECTURE.................................................................................. 4
1.1 ABOUT BPOST ............................................................................................................................... 4
1.2 DATA CENTERS AND REMOTE SITES ........................................................................................ 4
1.3 SERVER COUNT ............................................................................................................................ 5
1.4 SERVER MONITORING ................................................................................................................. 4
2 GENERAL SERVER MANAGEMENT ................................................................................................. 11
2.1 SERVER NAMING CONVENTION ............................................................................... 11
2.2 SERVER BUILD AND DECOMMISSION PROCESS ................................................................... 11
2.3 USER ACCOUNT MANAGEMENT ............................................................................................... 52
2.4 UNIX ENVIRONMENT .................................................................................................................. 56
2.5 BEST PRACTICES ....................................................................................................................... 59
2.6 RED HAT STORAGE ENVIRONMENT ....................................................................... 60
2.7 HARDWARE SUPPORT .............................................................................................................. 84
2.8 REMOTE CONNECTIVITY / CONSOLE MANAGEMENT ........................................................... 85
3 NETWORK SERVICE MANAGEMENT ............................................................................................... 86
3.1 NTP SERVER................................................................................................................................ 86
3.2 DNS ............................................................................................................................................... 86
4 PATCH MANAGEMENT ...................................................................................................................... 86
5 SECURITY MANAGEMENT ................................................................................................................ 87
5.1 HARDENING ON FEXT LINUX VM'S ........................................................................................... 87
5.2 ANTIVIRUS MANAGEMENT ........................................................................................................ 87
6 BACKUP MANAGEMENT ................................................................................................................... 87
6.1 BACKUP ........................................................................................................................................ 87
7 BPOST PROCESS .............................................................................................................................. 85
8 NETBACKUP MASTER HIGH AVAILABILITY AND FAILOVER PROCEDURE………………………… 86
8.1 NETBACKUP MASTERS ON NETAPP DISKS……………………………………………….... 86

9 MISCELLANEOUS ………………………………………………………………………………………….......87
9.1 GRAFANA ……………………………………………………………………………………………....87
9.2 FLEXERA……………………………………………………………………………………………....87
9.3 NIMSOFT……………………………………………………………………………………………....87

1 OVERVIEW OF BPOST UNIX ARCHITECTURE

1.1 ABOUT BPOST


• Bpost, also known as the Belgian Post Group, is a leading postal operator in the European market,
with headquarters at the Muntcenter in Brussels, Belgium.
• Bpost offers mail, financial, insurance and electronic services in the European market, as well as
value-added services (VAS).

1.2 DATACENTER DETAILS

• Bpost has two data centers, Muizen and Roosendaal. The bpost ICT datacenter setup is based on
two datacenters which are interconnected with a fiber-optic ring (DWDM). This allows replicated
storage and data availability, while virtual server clusters can be stretched over both datacenters.
High availability and disaster recovery solutions are based on this topology.
• DC Roosendaal
- Roosendaal hosts half of the production infrastructure.
- SLA availability > 99,99 %
- Server names ending with an odd number are normally located in Roosendaal.

• DC Muizen
- Muizen hosts all non-production servers and half of the production infrastructure.
- SLA availability > 99,99 %
- Server names ending with an even number are normally located in Muizen.

Site Roosendaal                   Site Muizen

Roosendaal                        DXC Muizen
Colt Technology Services BV       Smisstraat 48
Argonweg 9                        2812 Muizen (Mechelen)
4706 NR Roosendaal                Belgium

1.3 SERVER COUNT

• The bpost Unix server environment consists of approx. 1253 hosted servers.
• Of these, approx. 57 are physical servers and 1196 are virtual machines.
• The foundation of the bpost virtual platform relies on virtualization technology from VMware,
physical servers from HP, backend storage from NetApp and networking from Cisco, including the
Nexus 1000V virtual switch.

- Virtualization Technology: VMware


- Operating System: Linux (Redhat)

1.4 SERVER MONITORING

• Nimsoft is the enterprise monitoring solution set up at bpost. Current version: 8.4, Service Pack 1.
• The main users of this product are the control room, which provides 24/7 alarm handling and callouts,
and system and application engineers; it also provides additional availability and performance reporting.
• All Nimsoft servers run on Windows Server 2012 R2, while the back-end runs on
Microsoft SQL Server 2014. JVM-based monitoring is installed on RHEL 5 servers.
• Nimsoft robot installation on RHEL: see the embedded document 'Nimsoft robot installation on RHEL.pdf'.

• Open the Alarm Console in Nimsoft: https://monitoring.netpost/


• Log in using your UID and password.

• Select the BPost Community and choose the Default option.

• You need to acknowledge the alarm before closing the ticket in SM9.

• Connecting to Nimsoft using the Infrastructure Manager:

• Using Remote Desktop (on Windows: open the Run dialog box and enter mstsc), connect to the
jumpserver vwpr834.
• Log in using your AD credentials.
• Open the Infrastructure Manager by:
- Clicking on the Windows icon in the bottom left corner. This opens the Start Menu.
- Clicking on the downwards arrow in the bottom left corner to open the full Start Menu.
- Finding and clicking the Infrastructure Manager icon.
- Entering your AD credentials and pressing the 'Advanced' button.

• Enter the IP address of the login hub, 10.199.102.50, and press the TAB key.
• You should see the login hub show up; then press OK.

• You should now see the Infrastructure Manager open.

Remove the server from monitoring permanently:

• This situation occurs during the decommissioning of a server or during an upgrade activity on a server.
For example: a media server upgrade from RHEL 6 to RHEL 7.
• Open the Infrastructure Manager via mstsc on vwpr834.

• Use Ctrl+P to open the Probe utility.
• Once it opens, use the drop-down to select removerobot in the Probe commandset, enter the server
name and click the green (play) button to remove the server from monitoring.
• Start and Stop the nimsoft service:
#/etc/init.d/nimbus start
#/etc/init.d/nimbus stop
• To set maintenance mode:
- To put 1 server in maintenance for 60 minutes (the default)
# u517189@cops:~>esm_put_server_in_maintenance vlds467 60

- To put all servers starting with vl in maintenance mode


# u517189@cops:~>esm_put_server_in_maintenance vl

- To put a server with a dash (-) in maintenance mode (eg. bmb-1012001):


# u517189@cops:~>esm_put_server_in_maintenance bmb%-1012001

• To leave maintenance mode:

- To leave maintenance, just invoke maintenance mode for 1 minute. (0 minutes will actually
configure maintenance mode till 2099)
# u517189@cops:~>esm_put_server_in_maintenance vlds467 1

2 GENERAL SERVER MANAGEMENT

2.1 SERVER NAMING CONVENTION


• Both virtual and physical servers adhere to a strict naming convention.
• Focusing on physical servers, the following distinctions are being made:
• BMB - Bare Metal Box
• ENC - Enclosure
• BLD - Blade Server
• NET - Network appliance or tooling server
• The following naming convention is in place: [BMB/ENC/BLD]-YYMMnnn where YYMM are year and
month of purchase. Example: BMB-131001, physical server ordered in October 2013.
• Each physical server or blade has also been given a hostname, which is the name used in the
server’s configuration (netbios, dns,…): [T][O][EE][xnn] where T is the type, O the operating
system, EE the environment and xnn a sequential number. Virtual servers adhere to this same
naming convention (a small decoding sketch follows the examples below):
• T: type is S for physical, V for virtual, Z for Solaris zone.
• O: operating system is W for Windows, L for Linux, S for Solaris, O for special OS.
• EE: environment is DS for development & test, AC for acceptance, PR for production, TR for
training, PT for Perftest, DR for disaster recovery, BD for build & deploy.
• xnn: sequential number, for the moment max = ‘Z99’. ‘x’ represents a number or a character, ‘n’
represents a number.
Example: slac915 – physical linux acceptance server n° 915; vwtra26: virtual windows training server
n°a26.
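As a quick aid for reading these hostnames, the following minimal shell sketch decodes a name such as slac915 into its components. It is illustrative only; the script name and the mappings are simply taken from the convention above, and a lowercase hostname is assumed.

#!/bin/bash
# decode_hostname.sh - hypothetical helper, not part of the standard build
# Splits a hostname built as [T][O][EE][xnn] into type, OS, environment and sequence.
h=${1:?"usage: decode_hostname.sh <hostname>"}      # e.g. slac915
t=${h:0:1}; o=${h:1:1}; e=${h:2:2}; seq=${h:4}

case "$t" in s) type=physical ;; v) type=virtual ;; z) type="solaris zone" ;; *) type=unknown ;; esac
case "$o" in w) os=windows ;; l) os=linux ;; s) os=solaris ;; o) os="special OS" ;; *) os=unknown ;; esac
case "$e" in
  ds) env="development & test" ;; ac) env=acceptance ;; pr) env=production ;;
  tr) env=training ;; pt) env=perftest ;; dr) env="disaster recovery" ;; bd) env="build & deploy" ;;
  *) env=unknown ;;
esac

echo "$h: $type / $os / $env / sequence $seq"

Example: './decode_hostname.sh slac915' prints 'slac915: physical / linux / acceptance / sequence 915'.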

2.2 SERVER BUILD AND DECOMMISSION PROCESS


• Server provisioning is based on Red Hat Satellite 6. Bpost currently supports the following:
- 2 resource providers: VMware and bare metal.
- 3 operating systems: RHEL 5, 6 and 7.
• Installation is triggered by an ISO file mounted on the server and kickstart profiles generated at server
creation.
• Neither PXE boot nor golden images are involved.
• Provisioning is done through a dedicated network interface and servers are always provisioned to
the latest patch level.
• Default sizing of physical servers (small – medium – large)

• The figures below are indicative, as default sizes are not provisioned. Instead, project-specific
requirements on CPU and memory are requested.

Blade servers
                               Small Blade               Medium Blade               Large Blade
CPU type *                     Intel Xeon 2667 / 5500    AMD Opteron 6276 16-core   Intel(R) Xeon(R) E5-2698 v3 16-Core @ 2.30GHz
Memory (GB)                    36                        256                        512
% of total number of servers   60%                       15%                        25%
* CPU type is indicative and is according to the market standard at the moment of purchase

Default sizing of virtual servers (small – medium – large)


                                       Small VM   Medium VM   Large VM
vCPU                                   2          4           8
Memory (MB)                            4096       16384       32768
% of total number of virtual machines  75%        17%         8%

• Installation Guide for RHEL 5 on physical server: see the embedded document
'Installation Guide for RHEL 5 on physical server.pdf'.

• Installation Guide for RHEL 6 on physical server: see the embedded document
'Installation Guide for RHEL 6 on physical server.pdf'.

• Installation steps for RHEL 7 on physical server:

Pre-requisites:
• Support for kickstarting from the network with VLAN and bonding interfaces starts with RHEL 7.
• Network configuration (VLAN ID / IP addresses / netmask and gateway)
• VLAN: vlan101 (primary VLAN)
• BMB: bmb-1108004
• MAC address of at least 1 connected interface
• Slave interface: eth0, MAC address: b4:99:ba:5d:44:80
• Slave interface: eth2, MAC address: b4:99:ba:5d:44:20
• Environment of the server (pr, ac, dv, st, pt, tr)
• Agent software to be installed (nimsoft, netbackup, uc4)

Steps:
• Find out the BMB number and console IP. You can find the BMB number and console IP in the
CMDB.
#nslookup slpr092.hardware.netpost
• Create the ISO. You can create the ISO on the Satellite server (vlpr269).
[root@vlpr269 ~]# /usr/local/scripts/create_bmb_iso.sh --bmb bmb-1108004 --name
slpr092 --vlan 101 --mac b4:99:ba:5d:44:80 --env pr
• Connect to vwpr528 using your Windows credentials.
• Download the created ISO from the Satellite server: http://vlpr269.netpost/pub/

• Put the server in maintenance mode
#esm_put_server_in_maintenance slpr092 1200
• Stop the nimsoft agent
#/etc/init.d/nimbus stop
• Remove the server from Nimsoft Infrastructure manager
• Deactivate the node and clean out its certificates on the puppet master server (vlpr305)
#puppet node deactivate slpr092
#puppet node clean slpr092
• Stop the puppet agent
#/etc/init.d/puppet stop
• Upgrade the iLO firmware version before installing the OS.
• Open a browser and connect to the iLO web interface of server slpr092 using the BMB name of the
machine (e.g. https://bmb-1108004.hardware.netpost/index.html). Use the RSA password stored in
the passwordvault.

• Open a remote console.

• Select Virtual Drives -> Image File CD-ROM/DVD

• Select the ISO file (HP Service Pack for ProLiant 2016.10)


• Reboot the server and make sure that you boot from the CD; the installation will start
automatically.
• Once the firmware upgrade is complete, the console will be disconnected.
• Select the RHEL 7 ISO (created earlier) from the console.
• Click on Virtual Drives -> select the RHEL 7 image file and boot the server. The installation will start
automatically.

• If the ISO doesn't work, we need to create a new ISO using the other interface's MAC address. Before
that, first delete the old host entry from the Satellite server.
[root@vlpr269 ~]#hammer host delete --name slpr092.netpost
[root@vlpr269 ~]#hammer host delete --name slpr092-back.netpost
For example: [root@vlpr269 ~]# /usr/local/scripts/create_bmb_iso.sh --bmb bmb-1108004 --
name slpr092 --vlan 101 --mac b4:99:ba:5d:44:20 --env pr

• Download the ISO from the Satellite server (https://vlpr269.netpost/pub) again and boot it.
• You can follow the process on the remote console. When this is completed, the server will reboot
automatically.
• Before the reboot, the server will configure:
- Subscriptions
- Repositories
- Nimsoft
- Puppet

• After installation we need to manually configure the network interfaces.


• Sometimes the optional repo is not available after installation. In that case we need to subscribe
the repo with the optional rpms.
#subscription-manager repo-override --repo=rhel-7-server-optional-rpms --add=sslverify:0
# subscription-manager repo-override --repo=rhel-7-server-rpms --add=sslverify:0
• Trigger a puppet run (in no-noop mode).
#puppet agent -t --no-noop

• Update the packages


#yum update
• Install the hpacucli RPM
#rpm -ivh hpacucli-9.20-9.0.x86_64.rpm
# ln -s ssacli hpacucli
• Reboot the server one final time.
• End maintenance mode
#esm_put_server_in_maintenance slpr092 1

Linux Installation Standard


• This document ('Linux Installation Standard') describes general standards for all Linux servers.
All Linux servers at Bpost which are provisioned, supported and maintained by the
Infrastructure Unix Systems department must adhere to these standards.

• This document covers both physical and virtual systems. Items or topics which are not relevant
for either physical or virtual systems will be clearly indicated.

HARDWARE STANDARD

STANDARD FOR PHYSICAL SERVERS

Server platform
Physical Linux servers will standardize as much as possible on HP ProLiant servers. Only server models which
are certified by the OS vendor may be used (see Hardware compatibility).

For support and management reasons, the Unix department prefers the following specific server models:

• HP Proliant DL360 (small server)


• HP Proliant DL380p (medium-range server)

If the above models are no longer available or do not match the application requirements, another model
should be selected in cooperation with the Infrastructure Unix department. Preference should be given to models
whose spare parts are compatible with parts of the above standard server models (reason: limit and re-use
the spare parts stock). After a new model has been chosen, it should be added to the list above.

Each server must be equipped with the following specifications:


• CPU: in alignment with application requirements.
• Memory: in alignment with application requirements.
• Internal storage: RAID controller with at least 4 disks
o 2 disks (of at least 73GB + size of physical memory) are used exclusively for operating system and crash
dump location.
o Remaining disks (size is depending on the application needs) are used for application.
• FC storage:
o 2 QLogic 8Gb FC HBA
• Network:
o For 10GbE connectivity: 2x Emulex OneConnect 10Gb NIC
o For 1GbE connectivity: 2x Intel Pro 1000e 1Gb NIC
• Management: Remote access card
• Power: redundant power supply.

Hardware compatibility
Only parts which are supported/certified by the hardware vendor may be used.
• HP: Search for Maintenance & Service Guide of the server model on
http://www8.hp.com/us/en/troubleshooting.html
• Red Hat: https://hardware.redhat.com/

Firmware standards
For each server model a standard set of firmwares is defined. These firmwares will be applied by default on every
physical server.

Firmware will NOT be upgraded proactively when the server vendor releases an upgrade (because this is a
hazardous and time-consuming operation). However, there are 2 exceptions for which firmware upgrades will be
applied:

• In order to remain compliant with a vendor compatibility list (e.g. firmware level of FC HBA)

• If the hardware vendor advises to install an upgraded firmware in order to remediate an incident or problem

In both cases the new firmware version will be reflected in the standard set for that server model. New servers of that
model will by default be provisioned with this new standard set. During the subsequent OS patching, the firmware will
be upgraded on the existing servers (because the standard set of firmwares clearly has an issue that has caused
impact).

Remote access card


Each physical server must be equipped with a remote access card (e.g. HP iLO or IBM RSA) to allow the system
engineers to perform certain operations (such as hard reset, power on, hardware diagnostics) without having to be
present in the data center.

Physical disk configuration


As described in the server specifications above, each physical server must be equipped with at least 4 internal disks.
• Disk 0 and 1 will be configured in a RAID-1 volume (volume 0) and used exclusively for the operating system and
crash dump location.
• The remaining disks will be configured in RAID-1 or RAID-5 and must be used for the application and its data.
o RAID-1: If there is an even number of disks, multiple RAID-1 volumes must be created. All these
volumes are concatenated at OS level (LVM).
o RAID-5 should only be used for 3 or more application disks and when write performance is not important.
This RAID level should never be used for SSD disks.

Example of configuration with only RAID-1 or RAID-1 (OS) and RAID-5 (APP).

In case additional internal diskspace must be added, the following strategy will be applied:
1. Add new disks in the available free disk slots, configure these disks in the appropriate RAID level and add them to
the operating system.

2. If the server does not have enough free disk slots, the existing disks of the proper RAID volume are replaced one
by one while allowing sufficient time to rebuild the RAID volumes. The unused space on the physical disks is then used
to create a new volume, which is later added to the appropriate LVM volume group (see the sketch below).
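As an illustration of the final stage of step 2 above, the new RAID volume would typically be added to LVM roughly as follows. The device names, the 50G size and the logical volume name lvapp-pr are purely hypothetical; verify the actual device with lsblk before running anything.

# Assume the new RAID volume appears as /dev/sdc and a single partition /dev/sdc1 has been created on it (fdisk/parted)
pvcreate /dev/sdc1                       # turn the partition into an LVM physical volume
vgextend vgdata /dev/sdc1                # grow the application volume group
lvextend -L +50G /dev/vgdata/lvapp-pr    # grow an existing application logical volume (example name and size)
resize2fs /dev/vgdata/lvapp-pr           # grow the ext3/ext4 file system online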

Fiber Channel storage


Physical servers which need access to diskspace served by an FC storage array must be equipped with 2 QLogic
8Gbps FC HBAs. Each of these HBAs is connected to a different FC fabric.
FC storage LUNs must be made available via both fabrics. Storage multipath software inside the operating system
must be configured to provide redundant access to the FC LUNs.
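A minimal sketch of what the multipath configuration could look like with the stock device-mapper-multipath package (the WWID and alias below are placeholders, and the real Bpost multipath settings may differ):

/etc/multipath.conf (excerpt)
defaults {
    user_friendly_names yes
    find_multipaths     yes
}
multipaths {
    multipath {
        wwid  360a98000xxxxxxxxxxxxxxxxxxxxxxxx    # placeholder WWID of the FC LUN
        alias appdata01                            # placeholder alias
    }
}

# reload the configuration and inspect the redundant paths
/etc/init.d/multipathd reload      # RHEL 5/6; on RHEL 7: systemctl reload multipathd
multipath -ll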

Network configuration
Physical servers will have at least 2 network interfaces for redundancy reasons. Both interfaces must be of the same
brand and type and use the same driver on Linux operating system level. These interfaces must be configured in an
ether channel (bonding). The different networks (VLANs) in which the server must have an IP address are configured
as virtual VLAN interfaces on top of the ether channel. The ether channel should be configured as described in XXX
The network interfaces are either 1GbE (Intel Pro 1000) or 10GbE (Emulex XXX), depending on the need of the
application running on the server.
Network cabling is defined by the Infrastructure team responsible for data center management.
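For reference, a hedged sketch of what such a bonded VLAN setup could look like in RHEL 5/6-style ifcfg files. The interface names, bonding mode, VLAN ID and addresses are examples only; the actual values come from the network documentation referenced above.

/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="mode=active-backup miimon=100"    # example mode
ONBOOT=yes
BOOTPROTO=none

/etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

/etc/sysconfig/network-scripts/ifcfg-bond0.101    # VLAN 101 used as an example
DEVICE=bond0.101
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.10       # placeholder address
NETMASK=255.255.255.0
GATEWAY=192.0.2.1       # placeholder gateway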

STANDARD FOR VIRTUAL SERVERS

Virtualization platform
The standard platform for virtualization is VMware vSphere. The version of the vSphere platform is defined by the
Infrastructure department responsible for the virtual platform.

Disk configuration
A virtual server will be provisioned with at least 2 virtual disks (VMDK):
• The 1st will serve the operating system and swap. The size depends on the operating system: For RHEL5 the OS
disk must be 8GB. For RHEL6 the OS disk must be 16GB.
• The 2nd will serve the application and its data. The size will depend on the needs of the application.

The SCSI emulation used for the virtual disks will depend on the operating system inside the virtual server:
• RHEL5: The operating system disk(s) are served via a standard LSI Logic Parallel controller, while the remaining
application disk(s) are served via the optimized pvscsi controller. (Reason: Red Hat does not support booting from
disks attached to a pvscsi controller.)
• RHEL6: All disks are served via the optimized pvscsi controller.

Network configuration
A virtual server is by default provisioned with 2 virtual interfaces (vNICs):
• public or front-end interface, which must be used by the application.
• private or back-end interface, which must be used to mount remote (NFS/CIFS) shares and server management.
Exceptions or deviations from this standard are possible in the following cases:
• Virtual servers located in specific network zones (e.g. secure financial zone, external DMZ) can have only 1
interface.
• Virtual servers running Oracle DBs must have 3 network interfaces (front-end, back-end and storage/dNFS).
All virtual interfaces are of type 'vmxnet3' as this offers an optimized driver.

VMware Hardware version


• The standard VMware Hardware version (or VM Version) is defined by the Infrastructure department responsible
for the virtual platform.
• In case the standard version is changed, new virtual servers will be provisioned with this new version. An
upgrade plan will be prepared to upgrade the existing virtual servers.

CPU hot-plug
• CPU hot-plug allows to increase the number of vCPU allocated to a virtual server without having to restart the
virtual server.
• This functionality is supported by all 64-bit Linux versions at Bpost and will be enabled as such.

For 32-bit virtual servers a restart will be required in order to change the amount of allocated vCPU.

Memory hot-add
Memory hot-add allows to allocate more physical memory to a virtual server without having to restart the virtual
server.
This is supported by all 64-bit Linux versions at Bpost and will be enabled as such.

There is however 1 technical limitation: A virtual server that is configured with 3GB or less of memory cannot be
increased to above 3GB of memory without a restart.

For 32-bit virtual servers a restart will be required in order to change the amount of allocated memory.

Use of additional VMware vSphere features


VMware vSphere offers a number of additional features, such as VMware HA, VMware DRS, etc... These features are
managed by the Infrastructure department responsible for the virtual platform.
VMware HA is responsible to restart a virtual server in case a physical ESX host crashes.
VMware DRS can be configured via DRS rules to:
• keep virtual servers together on the same physical ESX hosts (use case: 2 virtual servers with heavy exchange of
network packets)
• keep virtual servers on separate physical ESX hosts (use case: load balanced or clustered virtual servers)
• run virtual server(s) on a subset of the pool of physical ESX hosts
By default no DRS rules are created. If such rules are required, they must be requested explicitly via ISR.
The Infrastructure department will evaluate the request before executing it.

Operating system standards

LINUX DISTRIBUTION
The Linux Operating System consists of the Linux kernel, a set of system libraries and a set of supporting packages.
Several vendors or open-source communities package a specific version of the kernel, system libraries and supporting
packages and refer to such a package as a 'distribution'. The most well-known examples are Red Hat Enterprise
Linux, CentOS, Fedora, SuSE Linux Enterprise Server, Ubuntu, Debian, etc.
Each server implements a particular processor instruction set architecture (e.g. Intel x86 32-bit, Intel x86 64-bit, Sun
SPARC).
At Bpost, the standard Linux distribution is "Red Hat Enterprise Linux". By default the version for the 64-bit architecture
will be installed. For applications that can't run (or are not supported by their ISV) on 64-bit, a 32-bit version will be
installed; however, 32-bit induces a memory limitation (both a technical and a support limitation).
The previous standard distribution (SuSE Linux Enterprise Server) is no longer supported and marked in the ICT
Landscape as to be decommissioned. No new servers will be installed with this distribution.

LINUX VERSION
The following Linux versions are standard at Bpost:
• Red Hat Enterprise Linux 5
• Red Hat Enterprise Linux 6
• Red Hat Enterprise Linux 7

Below are the support life cycle dates for these standards. A description of each of the different product support stages
can be found on the Red Hat support website (https://access.redhat.com/site/support/policy/updates/errata/#Overview)

Red Hat Enterprise Linux 7

General Availability   End of Production 1   End of Production 2   End of Production 3 (End of Production Phase)   End of Extended Life Phase
June 10, 2014          Q4 of 2019            Q4 of 2020            June 30, 2024                                   N/A

The table below shows the technological and support limitations of RHEL5, RHEL6 and RHEL7.

                               RHEL5    RHEL5     RHEL6    RHEL6     RHEL7    RHEL7
                               x86      x86_64    x86      x86_64    x86      x86_64
Logical CPUs*                  32       160       32       384       -        384
Memory                         16GB     1TB       16GB     12TB      -        12TB
Max number of block devices    1024               8192               10,000
ext3 file system size          16TB               16TB               16TB
ext3 file size                 2TB                2TB                2TB
ext4 file system size          -                  16TB               50TB
ext4 file size                 -                  16TB               16TB
* Red Hat defines a logical CPU as any schedulable entity. So every core/thread in a multicore/thread processor is a
logical CPU.

(Source: https://access.redhat.com/articles/rhel-limits)

OPERATING SYSTEM SUPPORT


All Linux servers must be covered by a license agreement with the operating system vendor.
RHEL licenses come in 2 flavors:
1. RHEL for Virtual Datacenter: Each of these licenses covers 2 physical CPU sockets on which an unlimited number of
virtual machines can be hosted.
2. RHEL Server: Each of these licenses covers either a physical server with 2 CPU sockets or 2 virtual machines.
Physical servers with more than 2 sockets will use multiple licenses from the pool.
Licenses for both flavours can be purchased with 2 levels of support: standard or premium. The main differences
between both levels are support coverage hours (8/5 vs 24/7) and response time SLA. (See
https://access.redhat.com/site/support/offerings/production/sla for more details)

By default all servers will be covered by a standard support contract.


A number of business critical production servers (Oracle RAC, Jboss applications) are covered by premium support
contract.

OPERATING SYSTEM PATCHING


The vendor of an operating system performs extensive testing on each version of their operating system. Nevertheless
there will still be software errors ("bugs"). When bugs are encountered, the vendor will typically modify the source

code of the operating system (or related libraries) in order to fix the bug. This update will be released in form of a
software patch.
Patches can be categorized as following:
• Security patches: The bug allows either a software program (e.g. an application) or an individual to obtain
more privileges (e.g. full administrative access) on the system than anticipated. These additional privileges could be
used to compromise the stability of the server (and all applications running on it), to install additional software in
such a way that the program or individual can retain these privileges even after the bug has been fixed, or to gain
access to information that the program or individual shouldn't have access to.
• Performance patches will improve the performance of a particular operation or component of the operating system.
(e.g. a patch to the network driver increases the network throughput)
• Stability patches will enhance the fault recovery when an unexpected condition occurs. This could either limit the
impact of the fault condition (fault is isolated to a single process) or result in a more graceful termination of the
system.
On a regular basis the Infrastructure Unix department will bundle all patches released by Red Hat and apply them on
all Linux systems installed and supported by the Infrastructure Unix department. After application of the patches, the
server is rebooted to activate the updated kernel and ensure that all applications make use of the patched version of
libraries. The application of patches is performed in a phased manner (development/systest > acceptance >
production/training). The maintenance window during which the servers are patched is discussed with Change
Management.
Under very exceptional cases, servers can be excluded from the standard patch process:
• There is a known issue with a patch. In this case, the Infrastructure Unix team will contact Red Hat support in
order to obtain a new patch which does not exhibit the issue. The patch will be tested before being applied.
• The application running on the server is not supported by the ISV of the application. In this case, the
Infrastructure Unix team will ask and insist that the team responsible for the application contacts the ISV to get a
solution.
Under no circumstances can a server be permanently excluded from the standard patch process. Exceptions will be
reviewed at the beginning of each new patch deployment cycle.

DISK CONFIGURATION

Operating system disks


The first OS disk will be configured with following partition layout:
For RHEL5:
Partition Size Purpose
1 100MB Boot file system
2 1GB Root file system
3 512MB Swap
4 rest Physical volume for vgos
For RHEL6:
Partition Size Purpose
1 250MB Boot file system
2 2GB Root file system
3 rest Physical volume for vgos

For RHEL7:

Partition Size Purpose


1 250MB Boot file system
2 2GB Root file system
3 rest Physical volume for vgos

The remaining OS disks will all be configured with a single partition and added to the LVM volume group vgos.

On virtual servers, special attention should be given to make sure the first partition on a virtual disk is properly
aligned according to storage array vendor recommendations.
In case of a virtual disk on NetApp NFS datastore, the first partition must start at offset 64. (See TR-3747 'Best
Practices for File System Alignment in Virtual Environments')

Application disks
All application disks will be configured with a single partition and added to the LVM volume group vgdata.

On virtual servers, special attention should be given to make sure the first partition on a virtual disk is properly
aligned according to storage array vendor recommendations.
In case of a virtual disk on NetApp NFS datastore, the first partition must start at offset 64. (See TR-3747 'Best
Practices for File System Alignment in Virtual Environments')

FC storage disks

FILE SYSTEMS

Default file system type


By default, all file systems are either ext3 (RHEL5) or ext4 (RHEL6 and RHEL7) formatted.

Default file system layout


Each server will have a number of standard system file systems.
For RHEL5 server
Mountpoint Size File system Partition/LV
/boot 100MB ext3 DISK0-PART0
Swap 512MB swap DISK0-PART1
/ 1GB ext3 DISK0-PART2
/usr 3GB ext3 /dev/vgos/lvusr
/opt 1GB ext3 /dev/vgos/lvopt
/home 512MB ext3 /dev/vgos/lvhome
/tmp 512MB ext3 /dev/vgos/lvtmp
/var 1GB ext3 /dev/vgos/lvvar
Swap See section on swap space swap /dev/vgos/lvswap.X
For RHEL6 server
Mountpoint Size File system Partition/LV
/boot 250MB ext4 DISK0-PART0
/ 21GB ext4 DISK0-PART1
Swap see section on swap space swap /dev/vgos/lvswap.X
/usr 43GB ext4 /dev/vgos/lvusr
/opt 1GB ext4 /dev/vgos/lvopt
/home 512MB ext4 /dev/vgos/lvhome
/tmp 512MB ext4 /dev/vgos/lvtmp
/var 2GB ext4 /dev/vgos/lvvar
/var/crash size of memory ext4 /dev/vgos/lvcrash

For RHEL7 server
Mountpoint Size File system Partition/LV
/boot 250MB ext4 DISK0-PART0
/ 2 GB ext4 DISK0-PART1
Swap see section on swap space swap /dev/vgos/lvswap.X
/usr 40GB ext4 /dev/vgos/lvusr
/opt 1GB ext4 /dev/vgos/lvopt
/home 512MB ext4 /dev/vgos/lvhome
/tmp 10GB ext4 /dev/vgos/lvtmp
/var 5GB ext4 /dev/vgos/lvvar
/var/crash size of memory ext4 /dev/vgos/lvcrash

Applicative file systems


File systems for applications will reside on separate LVM logical volumes, which are created in the LVM volume group
vgdata.
The name of the logical volumes must adhere to following naming convention:
lv<application acronym>-<environment>-<optional id>
Examples: lvauy-pr, lvfnt-pr-1, lvrom-ac-4
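A minimal sketch of creating such an applicative file system (the acronym 'auy' and the 10G size are reused from the examples above; the /opt/auy mount point is a hypothetical addition):

# Create a 10G logical volume for application 'auy' in production, format and mount it
lvcreate -L 10G -n lvauy-pr vgdata
mkfs.ext4 /dev/vgdata/lvauy-pr                                    # mkfs.ext3 on RHEL5
mkdir -p /opt/auy                                                 # hypothetical mount point
echo "/dev/vgdata/lvauy-pr /opt/auy ext4 defaults 1 2" >> /etc/fstab
mount /opt/auy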

SWAP SPACE
Each Linux servers requires a certain amount of swap space. Swap space in Linux is used when the amount of physical
memory (RAM) is full. If the system needs more memory resources and the RAM is full, inactive pages in memory are
moved to the swap space. While swap space can help machines with a small amount of RAM, it should not be
considered a replacement for more RAM. Swap space is located on hard drives, which have a slower access time than
physical memory.
Swap space will be allocated to each server as specified in the table below:
Physical memory Swap space
< 4GB 2GB
4 - 16GB 4GB
16 - 64GB 8GB
64 - 256GB 16GB
> 256GB 32GB

When the physical memory of a server is modified, the amount of swap space should be updated accordingly.
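For example, if a server's memory grows from 12GB to 24GB, its swap space must grow from 4GB to 8GB per the table above. A hedged sketch of adding an extra 4GB swap volume following the lvswap.X convention from the file system tables above:

# Add a second 4G swap LV so the total matches the 16 - 64GB memory tier
lvcreate -L 4G -n lvswap.1 vgos
mkswap /dev/vgos/lvswap.1
echo "/dev/vgos/lvswap.1 swap swap defaults 0 0" >> /etc/fstab
swapon /dev/vgos/lvswap.1
swapon -s      # verify the active swap devices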

SYSTEM SETTINGS

Magic SysRQ keyboard sequences


On all Linux servers the possibility to reboot the system by the CTRL-ALT-DEL sequence must be disabled.
On all Linux servers the Magic SysRQ keystrokes must be enabled to allow initiating a crash dump when a server is not
responsive anymore.
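A sketch of how these two settings are typically applied (RHEL 5/6 style; on RHEL 7 the CTRL-ALT-DEL behaviour is handled by systemd rather than /etc/inittab):

# Enable the Magic SysRq keys
echo "kernel.sysrq = 1" >> /etc/sysctl.conf
sysctl -p

# RHEL 5/6: disable the CTRL-ALT-DEL reboot trap by commenting out the 'ca::ctrlaltdel:' line
sed -i 's/^ca::ctrlaltdel:/#&/' /etc/inittab
init q                                   # make init re-read /etc/inittab

# RHEL 7: mask the corresponding systemd target instead
systemctl mask ctrl-alt-del.target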

Regional settings
All Linux servers must be configured to use US English as language and localization.

Timezone
All Linux servers must be configured in timezone Europe/Brussels. The hardware clock however is kept in UTC.
Time synchronisation will be installed and configured – refer to section 'Time Synchronisation' under 'System Service
Configuration'.

Kernel parameters
Following kernel parameters must be configured in /etc/sysctl.conf:
• Disable forwarding of network packets between interfaces (cfr network router)

net.ipv4.ip_forward = 0

• Block "syn flood attack" (denial of service attach)

net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_synack_retries = 6

• Disable source routing

net.ipv4.conf.default.accept_source_route = 0

• Disable ICMP redirect messages from being accepted or sent

net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

• Send TCP keepalive probes after a TCP connection has been idle for 30 minutes. The OS default is 2hrs, but our
firewalls drop connections after being idle for 1hr. Send 9 probes at 75sec interval to determine if the connection is
broken. See TCP Keepalives for more details.

net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75

The limits on system resources that can be used by a user session are left as configured by the vendor, unless
an exception is requested explicitly by the vendor of the application running on the server. Exceptions must be added in
/etc/security/limits.d/<application-name>.conf (see the example after the default listings below). The vendor defaults are:
• For RHEL5:

core file size (blocks, -c) unlimited


data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31567

max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 31567
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

• For RHEL6:

core file size (blocks, -c) unlimited


data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1549116
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1549116
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
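As an illustration of such an exception file (the application name, account and values are purely hypothetical):

/etc/security/limits.d/cup-ac.conf (example)
# Raise the open files and process limits for the application account only
cupacc    soft    nofile    8192
cupacc    hard    nofile    8192
cupacc    soft    nproc     4096
cupacc    hard    nproc     4096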

Default boot level


All Linux servers must be configured to boot by default into runlevel 3.
This setting is configured in /etc/inittab.

Default umask
Umask defines the default permissions a file is assigned when it is created.
All Linux servers must be configured to create files with permissions 0644 and directories with permissions 0755. The
default umask must thus be set to 022 (for administrative accounts) and 002 (for user accounts). This is the RHEL
default.
This setting is configured in /etc/bashrc.

Login banner
Upon login the following banner must be presented.

#=============================================================================#
| This system is for the use of authorized users only. Individuals using this |
| computer system without authority, or in excess of their authority, are |
| subject to having all of their activities on this system monitored and |
| recorded by system personnel. |
| In the course of monitoring individuals improperly using this system, or in |
| the course of system maintenance, the activities of authorized users may |
| also be monitored. |
| Anyone using this system expressly consents to such monitoring and is |
| advised that if such monitoring reveals possible evidence of criminal |
| activity, system personnel may provide the evidence of such monitoring to |
| law enforcement officials. |
#=============================================================================#

This banner should be configured in /etc/motd.

NETWORK CONFIGURATION

Network interfaces
All relevant network interfaces must be configured with static IP address(es), subnet mask and default gateway
information provided by the department responsible for management of Network and Telecom.

DNS configuration
By default all Linux servers must have the DNS resolving client configured. Exceptions are possible in case the server
is located in a network zone for which access to the internal Bpost DNS servers is not allowed.
Parameter Value
Search domains • netpost
• bpgnet.net
• post.bpgnet.net

Nameservers • 10.192.200.11
• 10.192.200.4

These settings are configured in /etc/resolv.conf.


In order to speed up DNS queries and reduce load on internal DNS server, the name caching service will be installed
and configured on all Linux servers. This service will be configured as described in the table below:
Parameter Value
Caching for service passwd disabled
Caching for service group disabled
Caching for service hosts (DNS) enabled
Timeout of successful DNS queries 1800 sec
Timeout of failed DNS queries 20 sec
Make cache persistent across reboots yes
Share the cache memory between nscd and client yes
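Translated into the name caching daemon's configuration, the table above corresponds roughly to the following /etc/nscd.conf directives (a sketch of the relevant lines only):

/etc/nscd.conf (relevant excerpt)
enable-cache            passwd      no
enable-cache            group       no
enable-cache            hosts       yes
positive-time-to-live   hosts       1800
negative-time-to-live   hosts       20
persistent              hosts       yes
shared                  hosts       yes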

SYSTEM SERVICE CONFIGURATION

Remote login
Administrative access is provided via secure shell (ssh). Insecure methods to connect to the servers (such as telnet,
rlogin, rsh) must be disabled and if possible uninstalled.
Secure shell must be configured as described below. These settings should be configured in /etc/ssh/sshd_config.
• Allow only the SSHv2 protocol,

• Delegate authentication/authorization to PAM libraries,
• Use the AUTHPRIV syslog facility, which is redirected to a logfile that can only be accessed by the root account,
• Enable X11 forwarding,
• Enable SSH key agent forwarding.
The file /etc/hosts.equiv must be removed as it enables the use of the BSD-style r-commands (rcp, rsh, rlogin) which
are insecure and unsafe.
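A hedged sketch of the corresponding /etc/ssh/sshd_config directives (standard OpenSSH option names; older OpenSSH releases shipped with RHEL 5 may not know every option, and the actual hardening baseline prevails):

/etc/ssh/sshd_config (relevant excerpt)
Protocol 2                      # allow only SSHv2
UsePAM yes                      # delegate authentication/authorization to PAM
SyslogFacility AUTHPRIV         # log to the root-only authpriv logfile
X11Forwarding yes
AllowAgentForwarding yes

# remove the BSD r-command trust file
rm -f /etc/hosts.equiv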

Mail relay
Applications running on Linux servers sometimes require the possibility to send emails. To support this, the postfix mail
relay will be installed and configured to forward all locally sent email to the internal mail relay at Bpost
(smtpgate.netpost).
This local mail relay will be configured not to accept remote connections (to avoid being an open mail relay).
These settings should be configured in /etc/postfix/main.cf.
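A minimal sketch of the corresponding postfix settings (standard main.cf parameters, with values following the description above):

/etc/postfix/main.cf (relevant excerpt)
relayhost = smtpgate.netpost        # forward all locally generated mail to the central relay
inet_interfaces = localhost         # listen on loopback only, so no remote connections are accepted
mynetworks = 127.0.0.0/8

# apply the change
service postfix reload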

Time synchronisation
Time on all Linux servers will be synchronized with the central time source at Bpost (ntp.netpost). The status of the
time sync daemon must be monitored and an incident must be generated in case it is not running.
This setting should be configured in /etc/ntp.conf.
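A sketch of the relevant /etc/ntp.conf line and a quick status check (assuming ntpd, as shipped with RHEL 5/6):

/etc/ntp.conf (relevant excerpt)
server ntp.netpost iburst

# verify the synchronisation status
ntpq -p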

System logging
The standard system logging and kernel message trapping daemon that is provided with the operating system
(sysklogd for RHEL5 - rsyslogd for RHEL6) will be configured as described below:
• /var/log/messages is the default location for informational or higher messages. Authentication related messages are
excluded as these logs could include security sensitive information (e.g. a user accidentally enters a password at the
username prompt)
• /var/log/secure contains authentication related messages.
• /var/log/mail contains all postfix related logging.
• /var/log/cron contains all cron related logging.
System logging must be configured to forward the following facilities to the central syslog server (a forwarding
sketch follows the list):
• kern.*
• user.*
• daemon.*
• auth,authpriv.*
• syslog.*
• lpr.*
• uucp.*
• local0,local1,local2,local3,local4,local5,local6,local7.*
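A hedged sketch of what the forwarding rule could look like in rsyslog (RHEL 6 syntax; the central syslog server name syslog.netpost is a placeholder, as the real name is not listed in this section):

/etc/rsyslog.conf (relevant excerpt)
# forward the facilities listed above to the central syslog server over UDP
kern.*;user.*;daemon.*;auth,authpriv.*;syslog.*;lpr.*;uucp.*;local0.*;local1.*;local2.*;local3.*;local4.*;local5.*;local6.*;local7.*   @syslog.netpost

# local destinations
*.info;authpriv.none;mail.none;cron.none    /var/log/messages
authpriv.*                                  /var/log/secure
mail.*                                      /var/log/maillog
cron.*                                      /var/log/cron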

Logrotation
Automatic and regular rotation of system logs must be configured.
The default rotation settings are:
• Weekly rotation
• Rotated log files are compressed and date is appended to log filename.
• Rotated log files are retained on the server for 12 rotations.
The following logfiles must be rotated (a logrotate sketch of the defaults follows the list):
• /var/log/btmp
• /var/log/wtmp
• /var/log/messages
• /var/log/secure
• /var/log/maillog
• /var/log/spooler
• /var/log/boot.log
• /var/log/cron
• /var/log/yum.log
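In logrotate terms, the defaults above correspond roughly to the following /etc/logrotate.conf settings (a sketch; the per-logfile stanzas normally live under /etc/logrotate.d/):

/etc/logrotate.conf (relevant excerpt)
weekly          # weekly rotation
rotate 12       # keep 12 rotated copies on the server
compress        # compress rotated log files
dateext         # append the date to the rotated filename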

RHN Satellite Server client


A RHN Satellite Server client is used to install operating system patches onto the Linux servers. It will be installed and
configured on all Linux servers.
Upon registration the correct RHN Satellite Server activation key should be used in order to assign the correct software
channel (containing the patches) and server group to the server.

VMware tools

This section is only applicable for VMware virtual servers.

VMware tools are a set of drivers and guest tools which provide better integration and enhanced performance of the
virtual server on the VMware vSphere platform.
VMware tools will be installed and configured on each virtual server. The tools will be kept in sync and up-to-date with
the version of the VMware platform. The VMware tools are configured against the currently running kernel version.
Upon kernel upgrade the VMware tools will be reconfigured against the new kernel.
Monitoring will be configured in such a way that an incident is generated when VMware tools are not running.

Command scheduling
By default the following services will be enabled to allow commands to be executed on pre-scheduled timeframes:
• crond
• atd

NFS related services


Servers that mount remote shares via the NFS protocol will be configured with the following services:
• netfs
• nfslock
• portmap

Disabled services
The following services will by default be disabled on all Linux servers (a disabling sketch follows the list):
• avahi-daemon
• avahi-dnsconfd
• conman
• cpuspeed
• gpm
• ipmi
• ipmievd
• iscsi
• iscsid
• mdmpd
• mcstrans
• netplugd
• pcscd
• rawdevices
• rdisc
• restorecond
• rpcgssd
• rpcidmapd
• rpcsvcgssd
• saslauthd
• setroubleshoot
• watchdog
• winbind
• xfs
• ypbind
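A minimal sketch of how these services are typically switched off on RHEL 5/6 (on RHEL 7, systemctl disable would be used instead); the service list is abbreviated here:

# Disable and stop the unneeded services (abbreviated list)
for svc in avahi-daemon avahi-dnsconfd cpuspeed gpm iscsi iscsid; do
    chkconfig "$svc" off
    service "$svc" stop
done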

SYSTEM ACCOUNTS

Password policies
The server must be configured to require a password of at least 6 characters and to ensure that passwords can't be re-used
until 4 password changes have passed.
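A hedged sketch of where these two policies usually end up on RHEL (the exact PAM module and file differ per release; this is illustrative only and not the Bpost hardening baseline):

/etc/login.defs (excerpt)
PASS_MIN_LEN   6

/etc/pam.d/system-auth (excerpt)
# remember the last 4 passwords so they cannot be re-used
password    sufficient    pam_unix.so sha512 shadow use_authtok remember=4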

Root account
The Infrastructure Unix department has ownership of the root account.
Depending on the application and security requirements, the Infrastructure Unix department will grant a number of
members from other departments root access to the servers in order to install, configure and support the
application(s) on the server. Under no circumstance is it allowed for anyone outside of the Infrastructure Unix
department to provide root access to any other user.
The root password is managed by the Infrastructure Unix department and should under no circumstance be changed
by a member of any other department. Upon server creation, the root account of the new server will be registered in
the passwordvault, which will change the password on a regular basis.

STANDARD SOFTWARE PACKAGES


Linux servers are installed with a minimal set of packages as this reduces the footprint of the operating system,
reduces the risk of exploitable bugs and reduces the management overhead.
Following packages are available on all Linux servers:
• Shells: bash, sh, ksh
• Compression tools: bzip2, bunzip2, gzip, gunzip
• Performance tools: dstat, vmstat, iostat, iotop
• File tools: diffutils
• Process troubleshooting: strace, ltrace,
• Network troubleshooting tools: tcpdump, traceroute, ping, nmap, nc
• Scheduling: cron, at
• Scripting languages: perl, python, ruby
• Security related: sudo
• Text editor: vim
The versions of the above packages are managed by the Infrastructure Unix department and should not be upgraded or
replaced by members of other departments.
Only official Red Hat RPM packages or RPM packages from the EPEL repository that match the major operating
system version will be installed by the Infrastructure Unix department. Under NO circumstances will RPM packages for
other distributions (e.g. SuSE Linux) be installed, due to possible incompatibility with official RPM packages.

ADDITIONAL SOFTWARE

Configuration management
The Infrastructure Unix department uses puppet to automate and increase the efficiency of managing the configuration
on its systems.
All Linux servers must therefore have a puppet client installed and configured as described in Puppet Design.

Backup
By default only production servers are backed up. Non-production servers will only be backed up upon request.
Backups will be performed using Symantec NetBackup.
On physical servers, the NetBackup client must be installed to perform the backup.
On virtual servers the default method of backup is integration with VMware vSphere, which allows a backup to be
taken without the need for a backup client.
See NetBackup Standard & Policies for more details.

Monitoring client
All Linux servers will be provisioned with a Nimsoft monitoring client that will be configured as described in Linux OS
monitoring.

Symantec Anti-Virus client


Linux servers can be provisioned with a Symantec Anti-Virus client, however this will not be part of the default
installation. The AV client must be requested by an Architect or Designer.
The criteria that should be met to install an AV client are described in Symantec Anti-Virus - Installation & upgrades.

Application start/stop scripts.


The Infrastructure Unix department strongly advises that start/stop scripts are installed and activated for all
applications installed on a server.
These scripts should adhere to following guidelines:
• They can be written in any scripting language that is by default installed on the server (shell, perl, python).
• They should not be interactive (eg. prompt for password)
• Upon system shutdown, the application should be stopped first.
• Upon system startup, the application should be started after all system related services have started.

If during an intervention a start/stop script interferes with the proper restart of a server, the system engineer
will interrupt the start/stop script and may disable the script altogether so interactive login into the system is
possible again. The engineer will inform the team responsible for the application start/stop script of this event
via email so the issue can be remediated for future interventions. The engineer will log this event in the Server boot
issue list. For recurring issues, the engineer will ask for an incident ticket to be created to ensure proper follow
up is done to resolve the issue.

The start/stop scripts are the responsibility of the team supporting the application.

MONITORING
A basic server monitoring will be configured as described in Linux OS monitoring.

SECURITY

SELINUX
Red Hat Enterprise Linux comes with an extensive security enforcing mechanism called selinux.
Due to the complexity of selinux, lack of sufficient knowledge and potential impact on applications, selinux will be
disabled on all Linux servers.
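For completeness, a sketch of how this is typically set (a reboot is required for the change to take full effect):

/etc/selinux/config (excerpt)
SELINUX=disabled

# check the current state
getenforce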

SUDO
In the event that a regular, non-privileged user needs to be able to execute certain commands with privileged credentials,
the Infrastructure Unix department will evaluate the request and, if approved, make the necessary changes so the
regular user can execute the command using sudo.

HOST-BASED FIREWALL
Like most operating systems, Red Hat Enterprise Linux comes with an integrated host-based firewall (iptables). This
firewall will not be enabled or configured on the Linux servers.

AUTHENTICATION FOR SINGLE USER MODE


By default a Linux server will not request authentication when it is booted in single user mode. Obviously this could
be a security risk if physical access and console access are not properly secured. However, any attacker that can obtain
physical access or access to the console of the server can obtain privileged access to the server, even when authentication
for single user mode is configured. At best it will slow down the attacker. As such the advantage is minor and it will not be
configured.

AUDITING
By default basic auditing will be configured on all Linux servers. This auditing includes:
• Successful and failed logon attempts by users
• Users logging out
• Use of sudo to execute commands with alternative credentials (successful and failed attempts).
• Password changes
These audit logs are stored on the local server for up to 12 weeks and are also sent to the central syslog server.
More enhanced auditing can be activated on an ad-hoc basis. This auditing is capable of logging every executable that
is executed by any user on the server.

SERVER ROLES
In a collaborative effort between the Infrastructure Unix and other departments at Bpost, a server role can be defined
for standardized platforms/applications.
A server role can consist of the following:
• List of additional system settings (eg. kernel parameters, network tunables, ...)
• List of additional software packages to be installed (eg. java, system libraries, ...)
• List of additional operating system user(s) and/or group(s).
• List of additional file systems.
• List of additional remote file shares (NFS or CIFS).
The Infrastructure Unix department will describe these roles in the standard configuration management tool so the
settings are automatically applied. By preference the particular role is specified in the request for the server so the
settings can be applied immediately during server provisioning.

USER MANAGEMENT
Under NO circumstances shall it be allowed to run applications with full administrative ('root') privileges on a server.

CREATION OF USERS
New users on Linux servers can be requested via the standard ISR process (See intake channels on Infrastructure
sharepoint site).
Following information is required in order to process the request:
• Requested username (preferred: <application acronym>-<environment>)
• Preferred user id (UID) (if applicable)
• Group to which the user should belong.
• Preferred home directory (by default: /home/<username>)
• The amount of dedicated diskspace that should be provided in the home directory (if applicable). If no diskspace is
mentioned, no dedicated space is allocated and the user shares space with the other users on the system.
Personal user accounts will not be created on the Linux servers.

CHANGE OR DELETION OF USERS
Changes to users or removal of users from a server should be requested via standard ISR process (See intake
channels on Infrastructure sharepoint site).

ROLES & RESPONSIBILITIES


The roles and responsibilities concerning Unix servers are described in 'Unix Servers - Roles and responsibilities'.

VIRTUAL APPLIANCES

In certain cases, applications come as part of a virtual appliance package. These are pre-configured virtual machines
which are installed on Bpost's VMware virtual platform. Due to the open-source nature of Linux, many of these
appliances run a Linux operating system.
Since the operating system on these appliances is not installed by the Infrastructure Unix department, we cannot
assume responsibility or be held responsible for any of these virtual appliances. It is the responsibility of the team that
provides operational support for the application running inside the appliance to support the appliance, install any
operating system patches, monitor the appliance and make sure the appliance is in line with company security policies.
The Infrastructure Unix department will however provide best effort support under the following conditions:
• The appliance runs a Linux operating system.
• The vendor of the virtual appliance explicitly confirms that no warranty or support will be voided by intervention from
the Infrastructure Unix department.
• If console or remote login is necessary to provide support, the Infrastructure Unix department is provided with the
necessary credentials. The department will not execute any actions to gain privileged access via any means that
would void the vendor's warranty (e.g. hacking).
• The Infrastructure Unix department will not take the lead in troubleshooting.

Redhat VM Creation
• The task to create a new VM must always come in via an OR. See the OCM-planning tool
(https://infra.planningcenter.netpost/pst_appl_isr_/isr_menu/isr_index.php?appl=isr_&init=init)
• Open the OR document describing the VMs to be created.
• For servers in MGMT or JUMP zones make sure that the -back DNS is an alias of the hostname.
• For RHEL 5/6 VM, follow the steps described in section #RHEL 5/6/7

RHEL 5/6/7
1. Activate maintenance mode in ESM for the new VM (to avoid the control room immediately creating incidents
while the VM is still being created). This can be done by executing the following command on cops.

$ esm_put_server_in_maintenance vlds322 120

2. Open RDP connection to VWPR528.


3. Start the script CreateVM in the folder D:\RS_Unix\VmProvisioning\script.
4. Enter the information as requested.

----------------------------------
Virtual Machine provisioning wizard
----------------------------------

# Connecting to vCenter...

## vCenter server
1. vwpr621

Select vCenter server: 1

vCenter server => vwpr621

Connection to vCenter 'vwpr621' opened.

# Retrieving information...

## Hostname
Enter hostname: vlac291

Hostname => vlac291

## OR
Enter OR: 8483

OR => OR8483

## Application
Enter application acronym: CUP

Application card => CUP

## Environment
* st
* tr
* pr
* ac
* pt
* dv
* sb
Enter environment: ac

Environment => ac

## Platform

Platform => esx

## OS Type

0. rhel-5-server-i386
1. rhel-5-server-x86_64
2. rhel-6-server-i386
3. rhel-6-server-x86_64
Select OS Type: 3

OS Type => rhel-6-server-x86_64

ESX GuestId => rhel6_64Guest

KS Hostgroup => rhel-6-server-x86_64-ac

## Virtual machine name

VM name => VLAC291

## Cluster
0. FOAC001_TIM
1. FOAC100_SQL
2. FOAC200_SHARED
3. FOAC500_ORACLE_ABS
4. FOAC520_ORACLE_BI
5. FOAC550_ORACLE_ONP
6. FOPR100_GEO
7. FOPR200_SHARED
8. FOPR250_SAN
9. FOPR300_BODS
10. FOPR400_MGT
11. FOPR500_ORACLE_ABS
12. FOPR520_ORACLE_BI
13. FOPR700_SHARED_UNIX
Select cluster: 2

Cluster => FOAC200_SHARED

## Datacenter

Datacenter => NON-PROD

## Resource pool

0. UAT
1. DEV-PT
2. TST-POC
Select resource pool: 0

Resource pool => UAT

## Datastore
# Name Capacity Free Free %
--- -------------------- ---------- ---------- ----------
0. VNASAC001_LAC01 2,300.00 G 288.44 G 12.54 %
1. VNASAC002_DMZLAC01 300.00 G 154.67 G 51.56 %
2. VNASAC002_LAC01 2,050.00 G 261.44 G 12.75 %
3. VNASDV001_LDS01 2,550.00 G 361.46 G 14.17 %
4. VNASDV002_DMZLDS01 220.00 G 192.77 G 87.62 %
5. VNASDV002_LDS01 2,200.00 G 295.54 G 13.43 %
Select datastore: 0

Datastore => VNASAC001_LAC01

## Folder
1. \Unix Servers\Applications
2. \Unix Servers\Applications\DEV
3. \Unix Servers\Applications\TST
4. \Unix Servers\Applications\UAT
5. \Unix Servers\Applications\UAT\PCE
6. \Unix Servers\Infrastructure
7. \Unix Servers\Infrastructure\Middleware
8. \Unix Servers\Infrastructure\Storage
9. \Unix Servers\Infrastructure\Systems
10. \Unix Servers\Oracle
11. \Unix Servers\Oracle\DEV_TST
12. \Unix Servers\Oracle\PRV
13. \Unix Servers\Oracle\UAT
Select folder: 7

Folder => Middleware

## vCPU
Enter # vCPU: 2

vCPU => 2

## Memory
Enter # vMemory (in GB): 4

Memory => 4

## Diskspace
Enter # vDiskspace for vgdata (in GB): 10

Diskspace => 10

## Network configuration
WARNING: The output of the command produced distributed virtual switch objects. This behavior is
obsolete and may change in the future. To retrieve distributed switches, use Get-VDSwitch cmdlet in
the VDS component. To retrieve standard switches, use -Standard.
WARNING: The output of the command produced distributed virtual portgroup objects. This behavior is
obsolete and may change in the future. To retrieve distributed portgroups, use Get-VDPortgroup
cmdlet in the VDS component. To retrieve standard portgroups, use -Standard.
Network interface: eth0 (DNS: vlac291)
IP address (10.195.67.11):
Subnetmask: 255.255.255.0
Gateway: 10.195.67.1
Vlan: 313

Virtual portgroup: ACC_SHA_OTH_0313


Network interface: eth1 (DNS: vlac291-back)
IP address (10.199.226.99):
Subnetmask: 255.255.240.0
Gateway:
Vlan: 106

Multiple Portgroups found. Select the appropriate portgroup:


1. MGT_BACK_ACC_0106-P on switch VSM-NON-PROD
2. MGT_BACK_ACC_0106 on switch VSM-NON-PROD
Note: In case of back-end VLANs, don't use the nx-Vl<VLAN_ID>-P portgroups unless explicitly indicated.
Select portgroup: 2

# Create VM...

VM provisioning will be performed on host soac209.netpost


VM VLAC291 created.

# Making VM Configuration changes...

## Virtual hardware
- Attribute 'Application' set to CUP.
- Attribute 'Environment' set to ac.

## Attaching Virtual network interfaces

- Attaching - Network Adapter '1' to VM.


WARNING: Specifying a distributed port group name as network name is no longer supported. Use the
-Portgroup parameter.
Network Adapter '1' succesfully attached to VM.

- Attaching - Network Adapter '2' to VM.


Network Adapter '2' succesfully attached to VM.

## Virtual disks
- Attaching new vDisk with capacity 10485760 KB

New vDisk with capacity 10485760 KB successfully attached.

## Configure CPU and Memory HotAdd


- Enabled CPU and Memory HotAdd

# Prepare VM and Satellite for kickstart...

## Executing hammer command script on kickstart server


- Hammer script executed successfully.

## Executing command script on kickstart server to fix host registration


- Script executed successfully.

## Create ISO on Kickstart server


Size of boot image is 4 sectors -> No emulation
Total translation table size: 2048
Total rockridge attributes bytes: 0
Total directory bytes: 290
Path table size(bytes): 10
Max brk space used 0
342 extents written (0 MB)

- KickstartPreparations-CreateISO script executed successfully.

## Copy ISO from kickstart server and attach to VM


- Copy ISO to VNASAC001_LAC01 successful.
- AttachISO successful.

VM kickstart preparations has been executed successfully


Installation will start.

## Starting VM
- VM VLAC291 powered on successfully.
## Waiting for Guest OS to be booted
- Not yet booted (NotRunning).
- Not yet booted (NotRunning).
- Not yet booted (NotRunning).

- Not yet booted (NotRunning).
- Not yet booted (NotRunning).

5. Wait a few minutes until the VM has rebooted successfully. This process might take a while as we are installing
updates, configuring monitoring agents, applying puppet profiles...

Sometimes the machine gets stuck while booting; connect to vCenter and open a console.
If you see this kind of message, it means that the installation process has hung. You can safely reset the VM.

6. Log in to the VM from root@cops.netpost. Accept the SSH host-key confirmation.

[root@vlpr095 ~]# ssh root@vlds322


Connecting to vlds322-back as user root...
The authenticity of host 'vlds322-back (10.199.244.91)' can't be established.
RSA key fingerprint is 86:2f:c4:1e:2f:38:89:a4:76:2f:01:3d:1c:68:9e:39.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'vlds322-back,10.199.244.91' (RSA) to the list
of known hosts.
Last login: Fri Jul 4 13:25:03 2014 from 10.199.240.1
#==============================================================================#
| This system is for the use of authorized users only. Individuals using this  |
| computer system without authority, or in excess of their authority, are      |
| subject to having all of their activities on this system monitored and       |
| recorded by system personnel.                                                |
| In the course of monitoring individuals improperly using this system, or in  |
| the course of system maintenance, the activities of authorized users may     |
| also be monitored.                                                           |
| Anyone using this system expressly consents to such monitoring and is        |
| advised that if such monitoring reveals possible evidence of criminal        |
| activity, system personnel may provide the evidence of such monitoring to    |
| law enforcement officials.                                                   |
#==============================================================================#

===========================================================
TODO:
- Configure VMware tools (vmware-config-tools.pl -d)
- Deploy public keys via user management.
(http://wiki.netpost/display/UNIXSYS/UMF+-+Administrators+guide)
- Add server to pwvault.
(https://passwordvault.netpost/PasswordVault/)
- Check Nimsoft for errors.
(http://esm.netpost/web/)
- Update CMDB & OCM.
- Final puppet run (puppet agent -t --no-noop)

===========================================================

Hostname : vlds322 // Red Hat Enterprise Linux Server release 6.5 (Santiago)
Kernel : 2.6.32-431.11.2.el6.x86_64

Managed with puppet v3.2.4 on puppet-ds.netpost.

[root@vlds322 ~]#

7. Execute the actions noted in the TODO section of the MOTD.

===========================================================
TODO:
- Configure VMware tools (vmware-config-tools.pl -d)
- Deploy public keys via user management.
(http://wiki.netpost/display/UNIXSYS/UMF+-+Administrators+guide)
- Add server to pwvault.
(https://passwordvault.netpost/PasswordVault/)
- Check Nimsoft for errors.
(http://esm.netpost/web/)
- Update CMDB & OCM.
- Final puppet run (puppet agent -t --no-noop)

===========================================================

8. Trigger a puppet run (in no-noop mode).

[root@vlds322 ~]# puppet agent -t --no-noop


Info: Retrieving plugin
Info: Loading facts in
/var/lib/puppet/lib/facter/windows_common_appdata.rb
Info: Loading facts in /var/lib/puppet/lib/facter/graphite_server.rb
Info: Loading facts in /var/lib/puppet/lib/facter/iptables_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/netconsole.rb
Info: Loading facts in /var/lib/puppet/lib/facter/ip6tables_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppi_projects.rb
Info: Loading facts in /var/lib/puppet/lib/facter/lvm_support.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/rsyslog_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/iptables_persistent_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/activemq_server.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/concat_basedir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/last_run.rb
Info: Loading facts in /var/lib/puppet/lib/facter/network_zone.rb
Info: Loading facts in /var/lib/puppet/lib/facter/rhn_address.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppetmaster.rb
Info: Caching catalog for vlds322.netpost
Info: Applying configuration version '1406108964'
...
Notice: Finished catalog run in 11.55 seconds

If there are errors, make sure to verify the message and take corrective actions.
Error messages such as the one below are caused by a bug in puppet, triggered by auditing a file that exists on the
server but is encountered for the first time by puppet. The solution is to run the puppet agent again.
"Error: undefined method `rjust' for nil:NilClass
Error: /Stage[main]/Profile::Base::Redhat/Logrotate::Rule[syslog]/File[/etc/logrotate.d/syslog]:
Could not evaluate: undefined method `send_log' for nil:NilClass"

9. Continue with the common post create actions.

DEPLOYING APPLICATION RELATED KEYS

ADD THE SERVER TO EXISTING UMF HOSTGROUPS (IF APPLICABLE)


You have 2 options:
1. Request a server name from which we will clone the group membership. This server should have the exact same
role as the freshly provisioned server. If the requester is unsure about the role, go to option 2. Clone the group
membership (e.g. vlpr298 should have the exact same rights as vlpr292).

/usr/local/umf/bin/list_hostgroups_of_host.pl vlpr298-back
vlpr298-back: HG_MON_UNX_ALL_ENV HG_SYS_UNX_ALL_ENV

/usr/local/umf/bin/list_hostgroups_of_host.pl vlpr292-back
vlpr292-back: HG_MON_UNX_ALL_ENV HG_MW_ESB_PR
HG_SYS_UNX_ALL_ENV

/usr/local/umf/bin/add_host_to_hostgroup.pl vlpr298-back HG_MW_ESB_PR

2. Search for the application trigram inside the existing hostgroups and select the correct environment. If none is found,
you might need to create a new hostgroup.

#/usr/local/umf/bin/list_hostgroups.pl | grep ${APPLICATION}

/usr/local/umf/bin/list_hostgroups.pl | grep -i ESB


- HG_MW_ESB_AC
- HG_MW_ESB_DVST
- HG_MW_ESB_PR

/usr/local/umf/bin/add_host_to_hostgroup.pl vlpr298-back HG_MW_ESB_PR
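In either case, the result can be double-checked with the same listing script shown above (output is illustrative):

/usr/local/umf/bin/list_hostgroups_of_host.pl vlpr298-back
vlpr298-back: HG_MON_UNX_ALL_ENV HG_SYS_UNX_ALL_ENV HG_MW_ESB_PR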

SINGLE NETWORK INTERFACE VMS


The VM creation process can handle single network interface VMs in the MGMT and JUMP zones. However, if you need to
provision a single-interface VM in other zones, follow this procedure.
The first step is to create the VM using the temporary IP address (and interface).
After it is created and online, execute the following actions:
1. De-activate and remove the secondary network interface.

$ ssh root@vlds327
# ifdown eth1
# rm /etc/sysconfig/network-scripts/ifcfg-eth1

2. Connect to vSphere client and remove the network adapter from the VM settings.
3. Adjust RH satellite connection settings: replace the serverURL parameter in the configuration file with rhn.netpost.

$ ssh root@vlds327
# vi /etc/sysconfig/rhn/up2date
...
serverURL=https://rhn.netpost/XMLRPC
...

4. Trigger a puppet run and use the correct puppet server.

$ ssh root@vlds327
# puppet agent -t --server puppet.netpost
# puppet agent -t --server puppet.netpost --no-noop

5. Follow the action described in the #Common post create actions.

CLEANUP ON FAILURE
When VM creation fails, you might need to clean up some resources before you can relaunch the script:
• Delete the VM from the ESX farm: connect to the vCenter GUI, stop the VM and delete it from disk.
• Delete the host from Satellite: connect to vlpr269 and run a command following the example below.

hammer host delete --name $FQDN

COMMON POST CREATE ACTIONS


• Add the server to user management (UMF - Administrators guide)
• Add the server to passwordvault (https://passwordvault.netpost/PasswordVault/)
• Update CMDB: Set CI status to 'OPERATIONAL'.
• Open ESM console and check for errors. (http://esm.netpost/web/)
• Double-check OR document for customizations or special requests (shares, file systems, users, ...)
• Close OR task by filling in the 'done' date in the OCM-planning tool

Location of the VM creation script snapshot:

Example:
Please find the output attached for jboss vm provisioning.

BP-039 Jboss baseline requirements for new VM.pdf

OR 11072 [ODP] Jboss servers.docx

Checklist for Decommissioning of Linux server:
• The following items should be checked when decommissioning a Linux server:
• Did the request enter via ISR/OR? If not, ask the requestor to enter an ISR (and put Infra
Planning center in CC). (Note: by insisting on an official request, a trace is created; the
requestor will not be able to deny the request if it later turns out that the server should not
have been decommissioned.)
• Have you checked if there are still application related processes on the system?
• If a final backup of the server is required, verify with backup engineer if this backup has
been taken.
• Has the system been configured in ESM maintenance mode?
• Has the system been removed from ESM monitoring?
• If the system was running a UC4 agent, has the system been deconfigured in UC4 console?
• For SAN connected system, has a list of LUNs been captured so it can be communicated to
storage engineers?
• Has the network configuration (IP addresses) been captured?
• Has the system been powered down?
• In case of a VM, has the system been removed from disk? If the system has to remain
powered down before deleting it, configure a reminder in Outlook to delete the VM from disk.
• Has the system been removed from passwordvault?
• Has the system been removed from Red Hat Satellite Server?
• Has the system been removed from COPS?
• Has the info of this system in CMDB been updated?

• Decommission a unix/linux server

Process Steps | Main Activity | Best Practices

Planning | Plan execution of the OR task. |
1. Open ISR tool (ISR)
2. Open the OR task
3. Insert a planned date.
4. Save the OR task.

Gather info | Collect IP and storage related information. |

IP configuration
10. Login onto the server

$ ssh root@<server>

--- Example
$ ssh root@vlds604

11. Get IP information and copy-paste into Info field of OR task.

# ip a | grep 'inet ' | grep -v '127.0.0.1'

--- Example
# ip a | grep 'inet ' | grep -v '127.0.0.1'
inet 10.194.193.13/24 brd 10.194.193.255 scope global eth0
inet 10.199.244.59/20 brd 10.199.255.255 scope global eth1
inet 10.194.253.20/24 brd 10.194.253.255 scope global eth2

Storage (NAS)
1. Get NAS mounted shares (NFS & CIFS) and copy-paste into info field of OR tasks

# mount | awk '$5 ~ /nfs|cifs/'

--- Example
# mount | awk '$5 ~ /nfs|cifs/'
vsnasdv99-473-1:/boe/co8/temp/co8v1 on /remote/db/CO8V1/temp type nfs
(rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,noint
r,addr=10.194.253.16)
vsnasdv99-473-2:/boe/co8/data_arch/co8v1 on
/remote/db/CO8V1/data/archive type nfs
(rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,noint
r,addr=10.194.253.17)
vsnasdv99-473-2:/boe/co8/arch/co8v1 on /remote/db/CO8V1/archlogs type
nfs
(rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,noint
r,addr=10.194.253.17)
vsnasdv99-473-2:/boe/co8/data_actv/co8v1 on
/remote/db/CO8V1/data/active type nfs
(rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,noint
r,addr=10.194.253.17)
vsnasdv99-473-1:/boe/co8/data_indx/co8v1 on /remote/db/CO8V1/data/index
type nfs
(rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,noint
r,addr=10.194.253.16)
vsnasdv99-473-2:/boe/co8/redo/co8v1 on /remote/db/CO8V1/redologs type
nfs

(rw,bg,vers=3,tcp,timeo=600,retrans=2,rsize=32768,wsize=32768,hard,noint
r,addr=10.194.253.17)

Storage (SAN) - ONLY FOR PHYSICAL SERVERS.

1. Connect to your personal account on COPS:


2. Get the list of SAN disks and copy-paste into Info field of OR task.

$ storage_mgmt.pl -t <server> -s asmdisk -c list -v storage

--- Example
$ storage_mgmt.pl -t slds821 -s asmdisk -c list -v storage
SERVER | NAME/WWID | LDEV | SIZE | TYPE | ARRAY
| REPL | FC PORTS
slds821-back | 360000970000292602150533030303542 | 00:5B | 0.00 G |
gk | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030303543 | 00:5C | 0.00 G |
gk | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030303544 | 00:5D | 0.00 G |
gk | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030303545 | 00:5E | 0.00 G |
gk | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030303546 | 00:5F | 0.00 G |
gk | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030303630 | 00:60 | 0.00 G |
gk | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030383942 | 08:9B | 60.00 G |
other | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030383943 | 08:9C | 60.00 G |
other | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030433334 | 0C:34 | 15.00 G |
other | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533030434333 | 0C:C3 | 15.00 G |
other | EMC 150 | - | 219918 - 219924
slds821-back | 360000970000292602150533031313142 | 11:1B | 30.00 G |
other | EMC 150 | - | 219918 - 219924
...

Pre-checks | Running processes |
1. Check the server for running processes which are not system-related (e.g. application, database, ...);
a quick way to do this is shown in the sketch below. If necessary, check with the application or database
team before powering down the server.
2. If the OR indicates a full backup should be taken before removal of the server, check with the backup
team that this backup has been taken.
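A possible quick check for non-system processes is sketched below; the list of accounts to exclude is an assumption and should be adapted to the local standard:

# Show processes not owned by common system accounts (illustrative filter only)
ps -eo user,pid,cmd --sort=user | egrep -v '^(root|daemon|bin|nobody|ntp|postfix|rpc|rpcuser|dbus) '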

Decom actions | Actions for physical server only |
3. Change the root password to 'd'.

# passwd root

4. Move the network configuration to avoid a duplicate IP in case the server boots accidentally (e.g. during re-install).

# mv /etc/sysconfig/network-scripts/ifcfg-* /root

Action for all servers | Nimsoft agent |
6. Activate maintenance mode for 8 days (this prevents alarms generated by the removal from reaching the
control room). The typical decom procedure states that the server is first powered down for a week, after
which it is physically removed (either removed from disk or from the DC rack).

$ esm_put_server_in_maintenance <server> 11520

--- Example
$ esm_put_server_in_maintenance vlds604 11520
200 OK
Host vlds604 has been put in maintenance for 11520 minutes.

7. Stop Nimsoft agent on the server.

$ ssh root@<server>
# service nimbus stop

--- Example
$ ssh root@vlds604
# service nimbus stop
NIMBUS_USER: root
Stopping NimBUS:
waiting for program termination...
waiting for program termination...
waiting for program termination...

8. Remove the server from Nimsoft central console.


o Login in Nimsoft Infrastructure Manager (via jumpstation VWPR834)
o Search the server.
o Right click it and click remove.
o Check the section with alarms for alarms related to the server and acknowledge
them.

Action for all Red Hat servers | Satellite server

• Unregister the server

$ ssh root@<server>
# subscription-manager unregister

--- Example
$ ssh root@vlds604
# subscription-manager unregister
System has been unregistered

Shutdown | All servers | Shutdown

• Stop the server.

# shutdown -h now "Decom of server"

Check Nimsoft
• Check Nimsoft console (https://monitoring.netpost/) and acknowledge alarms for the
removed server.

PART B
Process Steps | Main Activity | Best Practices
Clean up | Action for all physical Linux servers | Clear RAID configuration
• Start the server.
• Upon boot, enter the RAID controller BIOS.
• Wipe the RAID configuration.
• Power the server down.

Action for all virtual Linux servers | Remove the server

• Open VMware vCenter
• Search for the VM
• Delete the VM from disk.

Action for all Red Hat servers | Puppet

See Removal of a Puppet agent

Satellite
Delete host profile

$ ssh root@vlpr269
# hammer host delete --name $FQDN

--- Example
$ ssh root@vlpr269
# hammer host delete --name vlds604.netpost

Action for ALL servers | COPS

3. Login on cops as root

$ ssh root@cops

4. Execute removal script. (Since the server is already powered down, the
public keys will not actually be removed from the server)

# /usr/local/umf/bin/remove_host.pl <server> <server>-back

--- Example
# /usr/local/umf/bin/remove_host.pl vlds604 vlds604-back

5. Check if server is removed from COPS inventory.

# /usr/local/umf/bin/list_hosts.pl | grep <server>

--- Example
# /usr/local/umf/bin/list_hosts.pl | grep vlds60

Passwordvault
• Login on passwordvault (https://passwordvault.netpost/PasswordVault/)
• Search the server.
• Open server details.
• Click Delete and confirm.

Nimsoft
• Check Nimsoft console (https://monitoring.netpost/) and acknowledge
alarms for the removed server.

Administration | CMDB
• Open CMDB module in Peregrine
• Search the server
• Change CI status to 'RETIRED' (if the current status is still 'REQUESTED', first
change it to 'OPERATIONAL' and then change it to 'RETIRED').
• Save changes.

ISR tool
3. Close OR task(s).

2.3 USER ACCOUNT MANAGEMENT
• User roles and access are based on a custom framework called UMF (User Management Framework), built
around a central gateway server (COPS).
• The unsecure key pair grants a user access from his/her workstation to his/her account on cops. This key
pair, however, does not grant access to any of the accounts on any of the servers.
• The private key of the secure key pair is stored on cops. Each user account on cops has a personal
secure key pair which is generated upon creation of the user account.
• The public key of the secure pair is distributed to all accounts to which the user has access.
• The gateway offers different services:
- SSH
- SCP
- X11
- SSH-agent forwarding
• Please see the attached document for the complete User Management procedure.

User
Management.pdf

ADD USER:
• To add a new user, the user must send a filled-in user template with manager approval.
• Once the manager has approved, the user must send the public key to the shared mailbox.
• His/her mail must contain the department. Upon arrival of this mail, first send a reply back to
the user mentioning that the request was received and that the procedure to create his/her account has
been started.
• Copy the public key from the user to COPS.
Ex: #vi u519178_rsa_gk.pub

• The public key sent by the user can be either in PuTTY or in OpenSSH format. Gatekeeper requires
the OpenSSH format.

• A key in PuTTY format begins with '---- BEGIN SSH2 PUBLIC KEY ----'.
• A key in OpenSSH format is similar to:
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAIEApIQfGeQ87oH/hEWOFNtA22jmukpgKUMpH9YFJ
iT42oObh/49VWstxR8fdWO3ptaeiAiSMiAy8a+gYQ07QufFMWMMG7d+2GNwzRNCq9qsSdagD5
IkJA50Uc9SUDDzFZF6Iagfer3W7YoMVOQNT1Eh619oliq5jdB3p0qnLaBGtTE=
• A PuTTY key can be converted to the OpenSSH format with the following command.
# ssh-keygen -i -f u519178_rsa_gk.pub > /usr/local/umf/PKD/u519178_rsa_gk.pub
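As a quick sanity check (not part of the official procedure), the converted key can be validated by displaying its fingerprint:

# ssh-keygen -l -f /usr/local/umf/PKD/u519178_rsa_gk.pub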

• Add an identifier to the public key.


#vi /usr/local/umf/PKD/u519178_rsa_gk.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAIEApIQfGeQ87oH/hEWOFNtA22jmukpgKUMpH9YFJ
iT42oObh/49VWstxR8fdWO3ptaeiAiSMiAy8a+gYQ07QufFMWMMG7d+2GNwzRNCq9qsSdagD5
IkJA50Uc9SUDDzFZF6Iagfer3W7YoMVOQNT1Eh619oliq5jdB3p0qnLaBGtTE=
u519178@dummy

• Start the wizard.

• Select option 2 to add a user.

• Enter user: <uid>
• Enter the username.
• Enter user: u519178

• Enter full username (First LASTNAME):

• Select roles for the user. Press enter to finish selecting roles. After each selected role, the access
lists on the servers will be updated accordingly.
• Enter role:
• You can assign the same roles as the reference user mentioned in the filled-in user template.
• After assigning the roles, the key will be deployed on the servers according to the role.

REMOVE USER:
• Select option 17 from the UMF wizard.

• Enter the user uid.

• Once it is finished, delete the user's uid from /etc/passwd and /etc/group, and remove the home directory.

• To transfer files to/from a server via SFTP using WinSCP, a tunnel has to be opened.
For ex:

• To provide the tunnel facility to a user, we need to add the user's uid inside
/usr/local/umf/tunnels/etc/passwd,
/usr/local/umf/tunnels/etc/shadow and /usr/local/umf/tunnels/etc/group, as sketched below.
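A minimal sketch, assuming these files follow the standard passwd/shadow/group syntax (the UID/GID and other field values below are placeholders; in practice copy an existing entry and adapt it):

# Placeholder values - align the uid/gid with the user's account on cops
echo 'u519178:x:10519:10519::/home/u519178:/bin/false' >> /usr/local/umf/tunnels/etc/passwd
echo 'u519178:*:17000:0:99999:7:::' >> /usr/local/umf/tunnels/etc/shadow
echo 'u519178:x:10519:' >> /usr/local/umf/tunnels/etc/group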

• User access template

2.4 UNIX ENVIRONMENT
• The bpost server environment consists of approx. 1253 hosted Unix servers, of which approx.
>95% are virtual. All servers run the Red Hat Linux operating system.
• A virtual Linux VM ranges from 1 vCPU with 2 GB RAM to 24 vCPU with 600 GB RAM, but on
average a virtual Linux VM consumes about 2.11 vCPUs and 9 GB RAM.
• The hardware has been provided over the years by Compaq, HP, Dell & IBM, with a mixture of blade
servers & 1U form factors, but has been standardized on HP hardware for over 5 years. The physical
systems are currently connected to the LAN and to the SAN infrastructure.
• Bpost has 3 vCenter servers and therefore 3 separate vSphere environments, ‘Shared’,
‘FEXT’ and ‘VDI’.

• The ‘FEXT’ environment connects directly towards the FEXT firewall (outer boundary of
network) and contains non-domain joined internet facing virtual servers. The ‘VDI’
environment contains virtual desktops provisioned and managed by Citrix. All other virtual
servers are hosted on the ‘Shared’ environment.
• The standard server hardware is the HP BL460c Gen9 equipped with 512 GB RAM; however, there
are some exceptions: for the Oracle and SQL environments HP DL380 Gen9 servers are used, and for
Oracle BI the DL580 Gen8.

• For monitoring the vSphere environment several products are used:
- vCenter has a powerful alerting and notification system built in.
- On all vCenter servers, a Nimsoft VMware monitoring probe has been installed to handle
all common monitoring and data collection tasks on VMware vCenter and ESXi servers.
• For the Oracle New Platform environment, vRealize Operations Manager (the monitoring tool for
VMware) and vRealize Log Insight (to collect and analyze the ESXi and vCenter logs) are
configured. bpost also intends to extend these tools to the entire vSphere environment.

Physical linux server hardware:

The physical servers are mostly dedicated servers for a specific purpose, consolidated platforms themselves, or
simply not capable of running in a virtualised environment. What follows is a list of the server groups, their hardware
specs and future plans.

Jboss
This group consists of 11 servers; most are HP DL380 Gen8, the 2 oldest IBM X3850 M2. All have about 192 GB RAM and
between 12 and 16 cores each.
The near-future plan (2016/2017) is to virtualize this group to allow more security zoning, OpenShift development and a
mixture of the different RHEL versions (RHEL 5, 6 and 7). The proposed hardware for this project is 7 fairly
heavy blades.

Oracle RAC
Oracle RAC is running on ONP – Virtualization of oracle databases over dNFS on VMware ESX.

Netbackup infrastructure
This group consists of 8 servers (and 2 test systems); the media servers are HP DL385 Gen7 with dual 10G and dual
fibre connections to the PTL, and the masters are HP DL380 Gen8.

Type | No | Configuration item
Master server | 1 | sspr007 (slpr086)
Standby Master server | 1 | slpr087
Media servers | 6 | slpr088/slpr089/slpr090/slpr091/slpr092/slpr093

Red Hat Storage Gluster

This group consists of 12 identical servers of type HP SL4540 Gen8 in a dual configuration, with dual 10G connections per
server; this group hosts about 900 TB of NAS storage which is used exclusively for VTL purposes.

Internal tooling
This group consists of servers for trending, awstat, CUPS and BDF, and is composed of 8 HP DL380 Gen8 servers. They are
not virtualised because of very high I/O load (BDF, awstat and trending) or because the application does not work
virtualized (CUPS).

2.5 BEST PRACTICES

• Configuration of RAID on physical servers:

Configuration of
RAID on physical servers .pdf

• Configuration of UEFI/BIOS on physical servers:

Configuration of
UEFIBIOS on physical servers.pdf

• Configuration of RSA/ILO card:

Configuration of
RSAILO card.pdf

• Hardware repairs:

Hardware
repairs.pdf

• Remove a CIFS share:

Remove a CIFS
share.pdf

2.6 RED HAT STORAGE ENVIRONMENT
• Description & facts
• Configuration of Redhat Storage Servers :
• 25 x 4TB SATA disks per node
• 2 nodes per box
• 2 x 10G per node (1 port is actually a 40G port, but an adapter is included to reduce it
to 10G)
• 5-year support
• Design

(Design diagram: Site A Muizen - Site B Roosendaal)

• The network setup is based on isolating this type of storage in one single VLAN (VLAN
120). This was done for several reasons:

• Containment of this storage so it cannot be used for other purposes.
• Bandwidth capping because these machines, especially clustered, can deliver more
bandwidth than the network.
• The current setup:

BMB | Serial | Chassis | Host | Location | Rack | ILO IP | ILO DNS | OS IP
BMB-1409001 | CZ3433ASJT | CZ3433ASJM | slpr600 | Muizen | AT32 | 10.199.198.71 | bmb-1409001.hardware.netpost | 10.199.190.32
BMB-1409002 | CZ3433ASJW | CZ3433ASJM | slpr602 | Muizen | AT32 | 10.199.198.73 | bmb-1409002.hardware.netpost | 10.199.190.34
BMB-1409003 | CZ3433ASJN | CZ3433ASJL | slpr604 | Muizen | AT32 | 10.199.198.74 | bmb-1409003.hardware.netpost | 10.199.190.36
BMB-1409004 | CZ3433ASJR | CZ3433ASJL | slpr606 | Muizen | AT32 | 10.199.198.75 | bmb-1409004.hardware.netpost | 10.199.190.38
BMB-1409005 | CZ3433AS4K | CZ3433AS3Y | slpr601 | Roosendaal | CB23 | 10.199.198.77 | bmb-1409005.hardware.netpost | 10.199.190.33
BMB-1409006 | CZ3433AS4N | CZ3433AS3Y | slpr603 | Roosendaal | CB23 | 10.199.198.78 | bmb-1409006.hardware.netpost | 10.199.190.35
BMB-1409007 | CZ3433AS40 | CZ3433AS3X | slpr605 | Roosendaal | CB23 | 10.199.198.79 | bmb-1409007.hardware.netpost | 10.199.190.37
BMB-1409008 | CZ3433AS42 | CZ3433AS3X | slpr607 | Roosendaal | CB23 | 10.199.198.81 | bmb-1409008.hardware.netpost | 10.199.190.39
BMB-1501008 | CZ3451KDA5 | CZ3451KDA4 | slpr608 | Muizen | AV35 | 10.199.198.162 | bmb-1501008.hardware.netpost | 10.199.190.40
BMB-1501009 | CZ3451KDA7 | CZ3451KDA4 | slpr610 | Muizen | AV35 | 10.199.198.163 | bmb-1501009.hardware.netpost | 10.199.190.41
BMB-1501010 | CZ3451KDAC | CZ3451KDA9 | slpr609 | Roosendaal | CB20 | 10.199.198.170 | bmb-1501010.hardware.netpost | 10.199.190.42
BMB-1501011 | CZ3451KDAA | CZ3451KDA9 | slpr611 | Roosendaal | CB20 | 10.199.198.171 | bmb-1501011.hardware.netpost | 10.199.190.43

Note: We used one KVM connection per box to configure the ILO of each node; even though there is only one ILO
cable in VLAN 128, both ILOs work. You cannot switch the KVM from one blade to another; this type of box is not a
bladecenter, as the nodes have no common parts except the ILO module.

We set up the replication like this:

Host Brick Brick Host


Slpr600 Brick1 <= Replica => Brick1 Slpr601
Slpr600 Brick2 <= Replica => Brick2 Slpr603
Slpr602 Brick1 <= Replica => Brick1 Slpr603
Slpr602 Brick2 <= Replica => Brick2 Slpr601
Slpr604 Brick1 <= Replica => Brick1 Slpr605
Slpr604 Brick2 <= Replica => Brick2 Slpr607
Slpr606 Brick1 <= Replica => Brick1 Slpr607
Slpr606 Brick2 <= Replica => Brick2 Slpr605
slpr608 Brick1 <= Replica => Brick1 slpr609
slpr608 Brick2 <= Replica => Brick2 slpr611

slpr610 Brick1 <= Replica => Brick1 slpr611
slpr610 Brick2 <= Replica => Brick2 slpr609
For 2 reasons:
• This allows us to add one node in each datacenter.
• This avoids stressing one (mirror) host in case of a host failure.

Installation and Configuration:


• Please follow the below attached document for installation.

RHSS 3 on HP SL4540 using B120i RAID.pdf

• Our current standard - described in the Red Hat Storage Description - is based on a
proprietary HP machine, the HP SL4540 Gen8.
• This machine has 2 vital components which do not have a driver in the Red Hat Storage
distribution, namely the Mellanox 10 (and 40) Gb network adapters and the B120i RAID
controller (for the OS disks).

Installation procedure:
• First: configure the ILO via the KVM (DSview => port and KVM should be in CMDB). Press
F8 during boot when you see the message about the ILO.
• IP: IP address, netmask, gateway
• Mode: Static (it defaults to DHCP).
• Add User => USERID / the password can be found in the passwordvault.
• Reboot; connect a web interface to the ILO from a jumpserver.
• You will need an ILO license (in the administration section) to enable the remote console
(KVM).
• At this point you don't need the KVM/DSVIEW anymore, and you can have the DCM team
attach the dongle to another node.
• After applying the license, open a console (via a .NET or a Java plugin).
• Attach to the HP service pack for firmware updates & standardization:
• \\r200a\software\Software_Systems\07.hardware\02.hp\01. HP ServicePack for
Proliant\HP_Service_Pack_for_ProLiant_2014.09.0_792934_001_spp_2014.09.0-
SPP2014090.2014_0827.10
• Start like in the PDF above; follow ALL steps until you need to configure the large RAID6 disk
arrays.
• The web interface works fine for the B120i controller; but doesn't work for the smartarray
410i.
• Reboot the server.

• Use the CLI of the Smart Array 410i (F5 for the CLI).
• Create 2 diskgroups, each with 12 disks in a RAID 6 (ADG) configuration.
• For the second diskgroup (disks 13 - 24), add a hot spare (disk 25).
• Save the setup.
• Reboot; attach the customized DVD image (
\\r200a\software\Software_Systems\06.storage\RHS3.0\rhs-3.0-rhel-6-x86_64-dvd-post )
• Boot from the image; follow the PDF.
• Configure one network interface in VLAN 120, with a single IP, on a port that you know has link. Simple
setup: IPv4 only, static configuration. In fact you only need to add something that has link
to make the installer continue; afterwards you need to reconfigure it anyway.
• When the graphical installer comes up; press ctrl-alt-f2
• You need to insert the hpvsa driver; otherwise the local disks are not recognized. It can be
found on the installation ISO under /mnt/stage2/hpvsa. I found it best to copy the entire
directory to a newly created directory (mkdir -p /mnt/hp; cp /mnt/stage2/hpvsa/* /mnt/hp).
• Insert the hpvsa module (insmod /mnt/hp/hpvsa.ko)
• ctrl-alt-f1 to the GUI; continue the installation like in the PDF above.
• You can now detect the local disks to install the operating system.
• Keep all choices default except the hostname => slpr60X in our case (slpr600, slpr601...
until slpr607 currently).
• Root password: keep it super simple, as you will need to use it a few times.
• Once the installation finishes copying packages; break to the console again with Ctrl-Alt-F2.
• Attach the USB image which can be found along with the RHS Image:
SL4540_hpvsa_mlnx_driver.iso
• Mount the USB image: mkdir /mnt/stick; mount /dev/usb /mnt/stick
• copy everything from the stick to the local disk: cp /mnt/stick/* /mnt/usb
• Chroot to the freshly installed OS
• chroot /mnt/sysimage
• rpm -i /mnt/usb/kmod-hpvsa-1.2.10-120.rhel6u5.x86_64.rpm => local disk drives
• rpm -i /mnt/usb/kmod-mellanox-mlnx-en-1.5.7.2-1.rhel6u2.x86_64.rpm => 10G cards
• Untar the slpr600.tgz in /root.
• Fix the networking interfaces, with LACP, VLAN tagging and a bonded interface with 2
slaves (see the sketch after this list). Don't forget the default gateway in /etc/sysconfig/network.
• Exit the chroot: exit

• Reboot the server; if everything goes well, it should reboot completely with redundant 10G
networking.
• Repeat the steps above PER node and sync them all. They should all now be in the same state.
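As a rough illustration only of what that networking step produces (the real configuration comes from the slpr600.tgz archive mentioned above; interface names, bond options and the netmask below are assumptions), a bonded, VLAN-tagged setup on RHEL 6 typically looks like this:

# /etc/sysconfig/network-scripts/ifcfg-bond0  (bonded interface, LACP = mode 802.3ad)
DEVICE=bond0
BONDING_OPTS="mode=802.3ad miimon=100"
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0  (repeat for ifcfg-eth1, the second 10G slave)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-bond0.120  (VLAN 120 tagged on top of the bond)
DEVICE=bond0.120
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.199.190.32   # per-node OS IP from the table above
NETMASK=255.255.255.0  # assumption - verify against the actual subnet

# /etc/sysconfig/network  (default gateway in VLAN 120)
GATEWAY=10.199.190.1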

Gluster Installation
• Verify correct boot on all nodes - they must be identical except for hostname and IP.

• Verify correct networking - redundant with both links up. (cat /proc/net/bonding/bond0 &
ping 10.199.190.1 (default gateway in vlan 120 )
• If not OK on ALL nodes => fix
• deploy COPS keys
• Sync the time config; verify it's OK. (NTP is very important for gluster) : for i in 0 1 2 3 4 5
6 7 ; do echo slpr60$i; ssh root@slpr60$i "date"; done
• Sync the /etc/resolv.conf and /etc/hosts file for each node
• Verify connectivity between nodes => from SLPR600 :
ping slpr601
ping slpr602
ping slpr603
ping slpr604
ping slpr605
ping slpr606
ping slpr607
• set up the gluster cluster => from slpr600 :
gluster peer probe slpr601;
gluster peer probe slpr602;
gluster peer probe slpr603;
gluster peer probe slpr604;
gluster peer probe slpr605;
gluster peer probe slpr606;
gluster peer probe slpr607;

• Configure the disks (from cops; all at once; keeps it nice and synchronized):
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "cat
/sys/block/sda/queue/optimal_io_size"; done
for i in 0 1 2 3 4 5 6 7 ; do ssh root@slpr60$i "cat /sys/block/sda/queue/optimal_io_size";
done
for i in 0 1 2 3 4 5 6 7 ; do ssh root@slpr60$i "cat /sys/block/sdb/queue/optimal_io_size";
done
for i in 0 1 2 3 4 5 6 7 ; do ssh root@slpr60$i "cat
/sys/block/sda/queue/physical_block_size"; done
for i in 0 1 2 3 4 5 6 7 ; do ssh root@slpr60$i "cat
/sys/block/sdb/queue/physical_block_size"; done
for i in 0 1 2 3 4 5 6 7 ; do ssh root@slpr60$i "parted -a optimal /dev/sda mkpart primairy
5120s 100%; parted -a optimal /dev/sdb mkpart primairy 5120s 100%"; done
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "parted -l | grep 40.0"; done
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "pvcreate --dataalignment
2560k /dev/sda1; pvcreate --dataalignment 256 0k /dev/sdb1"; done

for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "pvs"; done
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "vgcreate -s64 vgbrick1
/dev/sda1; vgcreate -s64 vgbrick2 /dev/sdb1";done
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "lvcreate -l 100%VG -n brick1
vgbrick1; lvcreate -l 100%VG -n brick2 vg brick2"; done
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "mkfs.xfs -i
size=512,maxpct=0 -n size=8192 -d su=256k,sw=10,agcount=300 /dev/vgbrick1/brick1";
done
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "mkfs.xfs -i
size=512,maxpct=0 -n size=8192 -d su=256k,sw=10,agcount=300 /dev/vgbrick2/brick2";
done
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "mkdir /mnt/brick1; mkdir
/mnt/brick2"; done
• Add the following lines to all /etc/fstab's:
/dev/vgbrick1/brick1 /mnt/brick1 xfs inode64,allocsize=4k,logbsize=256k,noatime,nodiratime,nobarrier 1 1
/dev/vgbrick2/brick2 /mnt/brick2 xfs inode64,allocsize=4k,logbsize=256k,noatime,nodiratime,nobarrier 1 1

Then mount the bricks:


for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "mount -a; df -h"; done
for i in 0 1 2 3 4 5 6 7 ; do echo slpr60$i; ssh root@slpr60$i "mount -a; df -h"; done

• Check the status of all nodes. They should now each have 2 x 40 TB volumes available.
• Start the gluster CLI on one node => "gluster"
• Check the status of the peers:
• peer status => you should see ALL nodes except localhost
• pool list => status of all nodes including localhost
• Everything should be OK & connected.

• Create the volume:


• volume create rhs replica 2 transport tcp slpr600:/mnt/brick1 slpr601:/mnt/brick1
slpr600:/mnt/brick2 slpr603:/mnt/brick2 slpr602:/mnt/brick1 slpr603:/mnt/brick1
slpr602:/mnt/brick2 slpr601:/mnt/brick2 slpr604:/mnt/brick1 slpr605:/mnt/brick1
slpr604:/mnt/brick2 slpr607:/mnt/brick2 slpr606:/mnt/brick1 slpr607:/mnt/brick1
slpr606:/mnt/brick2 slpr605:/mnt/brick2
• volume start rhs
• volume status rhs => check that all is OK (the third column should show only "Y", and all bricks should
be visible).

• This creates the following setup (<=> means replica):
Brick slpr600:/mnt/brick1/rhs <=> Brick slpr601:/mnt/brick1/rhs
Brick slpr600:/mnt/brick2/rhs <=> Brick slpr603:/mnt/brick2/rhs
Brick slpr602:/mnt/brick1/rhs <=> Brick slpr603:/mnt/brick1/rhs
Brick slpr602:/mnt/brick2/rhs <=> Brick slpr601:/mnt/brick2/rhs
Brick slpr604:/mnt/brick1/rhs <=> Brick slpr605:/mnt/brick1/rhs
Brick slpr604:/mnt/brick2/rhs <=> Brick slpr607:/mnt/brick2/rhs
Brick slpr606:/mnt/brick1/rhs <=> Brick slpr607:/mnt/brick1/rhs
Brick slpr606:/mnt/brick2/rhs <=> Brick slpr605:/mnt/brick2/rhs

• apply the profile for high-throughput : tuned-adm profile rhs-high-throughput
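To verify afterwards which profile is active, tuned-adm can report it (the output shown here is illustrative):

# tuned-adm active
Current active profile: rhs-high-throughput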


Additional steps
• Register with satellite server :
rhnreg_ks --force --activationkey 1-AK-RHS3.0 --serverUrl http://rhn.netpost/XMLRPC

• Install additional packages :


#yum install nimsoft-robot.x86_64.rpm (don't forget to configure the correct IP in
/opt/nimsoft/robot/robot.cfg )
#yum install hpssacli-2.0-23.0.x86_64.rpm (HP tool for RAID controllers & wear levelling)
#yum install hp-ams-2.0.0-1372.35.rhel6.x86_64.rpm (HP Agentless management service;
allow SNMP traps via the out-of-band ILO card)
#yum install hpacucli-9.40-12.0.x86_64.rpm (HP CLI to interact with the RAID controller)
#yum install net-snmp.x86_64 net-snmp-utils.x86_64 net-snmp-libs.x86_64 (SNMP
functionality)

General Operations:
• Disk & hardware failures
You can check for disk failures simply like this:

for i in slpr600 slpr601 slpr602 slpr603 slpr604 slpr605 slpr606 slpr607 ; do echo $i; ssh
root@$i "hpacucli ctrl all show config | grep "physicaldrive"| grep -v OK"; done

This command returns output like the following if any disk has failed or is in predictive failure:
slpr605
physicaldrive 2I:1:21 (port 2I:box 1:bay 21, SATA, 4000.7 GB, Predictive Failure)
• This means that on server slpr605 one disk is in "predictive failure" state; it is still working,
but one SMART parameter has gone out of range and told the server that this disk has a much
higher failure chance.
• If disks fail, open a case with HP and let HP replace the disk.
• If nodes fail - or part of a single node - it is better to leave the node up if it is functional; gluster is
designed to handle machine failure. In one case we had a system failure which lasted about
11 days, causing a delta of more than 6 000 000 files. To clean this up, I executed:
#gluster volume heal rhs full
• The "full" behind this command says to gluster to make it a higher priority to heal. In this
mode; healing took about 2 days; while in normal mode it would have taken more than a
week. In our case, this is advisable; since our 10G lines are capped on 3Gb/s, and we only
have 4 clients (the media servers), our CPU/memory usage is extremely low, so healing at a
higher rate is advisable because the customer doesn't notice.
• As always; it is needed to create a case with HP for each failure (Disk, Hardware ...). We
have paid for 5y 24h support.

• In case of a node crash, you can use the gluster command on the other (remaining) servers to
gather info.
Examine Gluster health:
#gluster volume info rhs
Volume Name: rhs
Type: Distributed-Replicate
Volume ID: 6df43fd2-3f88-4631-8fb5-35bf65df0d73
Status: Started
Snap Volume: no
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: slpr600:/mnt/brick1/rhs
Brick2: slpr601:/mnt/brick1/rhs
Brick3: slpr600:/mnt/brick2/rhs
Brick4: slpr603:/mnt/brick2/rhs
Brick5: slpr602:/mnt/brick1/rhs
Brick6: slpr603:/mnt/brick1/rhs

Brick7: slpr602:/mnt/brick2/rhs
Brick8: slpr601:/mnt/brick2/rhs
Brick9: slpr604:/mnt/brick1/rhs
Brick10: slpr605:/mnt/brick1/rhs
Brick11: slpr604:/mnt/brick2/rhs
Brick12: slpr607:/mnt/brick2/rhs
Brick13: slpr606:/mnt/brick1/rhs
Brick14: slpr607:/mnt/brick1/rhs
Brick15: slpr606:/mnt/brick2/rhs
Brick16: slpr605:/mnt/brick2/rhs
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
• #gluster volume status rhs => this will show you the current status of the bricks
Status of volume: rhs
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick slpr600:/mnt/brick1/rhs 49152 Y 2053
Brick slpr601:/mnt/brick1/rhs 49152 Y 44406
Brick slpr600:/mnt/brick2/rhs 49153 Y 2052
Brick slpr603:/mnt/brick2/rhs 49152 Y 7193
Brick slpr602:/mnt/brick1/rhs 49152 Y 15494
Brick slpr603:/mnt/brick1/rhs 49153 Y 7204
Brick slpr602:/mnt/brick2/rhs 49153 Y 15505
Brick slpr601:/mnt/brick2/rhs 49153 Y 44417
Brick slpr604:/mnt/brick1/rhs 49152 Y 9799
Brick slpr605:/mnt/brick1/rhs 49152 Y 42660
Brick slpr604:/mnt/brick2/rhs 49153 Y 9810
Brick slpr607:/mnt/brick2/rhs 49152 Y 29163
Brick slpr606:/mnt/brick1/rhs 49152 Y 61727
Brick slpr607:/mnt/brick1/rhs 49153 Y 29174
Brick slpr606:/mnt/brick2/rhs 49153 Y 61738
Brick slpr605:/mnt/brick2/rhs 49153 Y 42671
NFS Server on localhost 2049 Y 15519
Self-heal Daemon on localhost N/A Y 15525

NFS Server on 10.199.190.32 2049 Y 2062
Self-heal Daemon on 10.199.190.32 N/A Y 2067
NFS Server on slpr604 2049 Y 9824
Self-heal Daemon on slpr604 N/A Y 9830
NFS Server on slpr606 2049 Y 61753
Self-heal Daemon on slpr606 N/A Y 61752
NFS Server on slpr603 2049 Y 7218
Self-heal Daemon on slpr603 N/A Y 7224
NFS Server on slpr605 2049 Y 42686
Self-heal Daemon on slpr605 N/A Y 42691
NFS Server on slpr601 2049 Y 44432
Self-heal Daemon on slpr601 N/A Y 44437
NFS Server on slpr607 2049 Y 29197
Self-heal Daemon on slpr607 N/A Y 29203

Task Status of Volume rhs


------------------------------------------------------------------------------
There are no active volume tasks

• #gluster volume heal rhs info => this will show you how many files are not
synchronized.
Brick slpr600:/mnt/brick1/rhs/
Number of entries: 0

Brick slpr601:/mnt/brick1/rhs/
Number of entries: 0

Brick slpr600:/mnt/brick2/rhs/
Number of entries: 0

Brick slpr603:/mnt/brick2/rhs/
Number of entries: 0

Brick slpr602:/mnt/brick1/rhs/
Number of entries: 0

Brick slpr603:/mnt/brick1/rhs/
Number of entries: 0

Brick slpr602:/mnt/brick2/rhs/
Number of entries: 0

Brick slpr601:/mnt/brick2/rhs/
Number of entries: 0

Brick slpr604:/mnt/brick1/rhs/
Number of entries: 0

Brick slpr605:/mnt/brick1/rhs/
Number of entries: 0

Brick slpr604:/mnt/brick2/rhs/
Number of entries: 0

Brick slpr607:/mnt/brick2/rhs/
Number of entries: 0

Brick slpr606:/mnt/brick1/rhs/
Number of entries: 0

Brick slpr607:/mnt/brick1/rhs/
Number of entries: 0

Brick slpr606:/mnt/brick2/rhs/
Number of entries: 0

Brick slpr605:/mnt/brick2/rhs/
/
<gfid:a50b7113-a854-4a97-94fa-61ad18c33f2b>
<gfid:659df82e-a414-47d4-8053-910ff965f93a>
Number of entries: 3

• => The last lines were generated during a recabling of slpr604 and slpr606. This means
slpr605:brick2 has a few files not yet synchronized with slpr606:brick2. This disappeared a
minute later as the auto-heal daemon fixed the issue.

• =>To "follow" the recovery after a prolonged failure after you have executed the "gluster
volume heal rhs full" command; the "gluster volume heal rhs info" command takes too long
to execute; as it makes a list of files which need to be synchronized. I worked around that
problem like this :
#gluster volume heal rhs info >> rhs_heal &
#cat rhs_heal | grep "Number"

This will show you the amount of to-be-synched files. If you do this over longer periods of
time; you will see the number reducing untill it reaches zero; CQ healing is finished.
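If a single running total is easier to follow than the per-brick list, the counts in that file can be summed with a small awk sketch:

# Sum all per-brick "Number of entries" values into one total
grep "Number of entries" rhs_heal | awk -F': ' '{s+=$2} END {print s}'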

Commands to be performed for maintenance:


- Halt the system gracefully with the shutdown command from the ILO: slpr601 slpr603
slpr605 slpr607 slpr609 slpr611
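For reference, a graceful halt issued on a node itself (rather than through the ILO virtual power buttons) is simply the usual shutdown command; the message text is only an example:

# shutdown -h now "Planned Gluster maintenance"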

- After the systems have been powered-on and are online, make sure that the firewall
service is stopped and disabled:

# systemctl status firewalld

If it is not the case:

# systemctl stop firewalld


# systemctl disable firewalld

- Check the status of glusterd service:

# systemctl status glusterd.service

Here is an example from slpr601, note that each brick has its own process

[root@slpr601 ~]# systemctl status glusterd.service


● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset:
disabled)
Active: active (running) since Sat 2018-05-26 02:02:14 CEST; 4 months 24 days ago
Main PID: 1595 (glusterd)
CGroup: /system.slice/glusterd.service
+- 1595 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
+-23170 /usr/sbin/glusterfsd -s slpr601 --volfile-id rhs.slpr601.mnt-brick1 -p
/var/run/gluster/vols/rhs/slpr601...
+-23189 /usr/sbin/glusterfsd -s slpr601 --volfile-id rhs.slpr601.mnt-brick2 -p
/var/run/gluster/vols/rhs/slpr601...
+-23209 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glusters...

[root@slpr601 ~]# ps -ef|grep gluster


root 1595 1 0 May26 ? 00:57:16 /usr/sbin/glusterd -p /var/run/glusterd.pid --
log-level INFO
root 23170 1 15 Oct18 ? 03:55:03 /usr/sbin/glusterfsd -s slpr601 --volfile-id
rhs.slpr601.mnt-brick1 -p /var/run/gluster/vols/rhs/slpr601-mnt-brick1.pid -S
/var/run/gluster/19a1b61d078fdef8c2f13b6a05aaa0f5.socket --brick-name /mnt/brick1 -l
/var/log/glusterfs/bricks/mnt-brick1.log --xlator-option *-posix.glusterd-uuid=442de2c2-
c447-4a4b-9bf9-c1a50fc7ed34 --brick-port 49152 --xlator-option rhs-server.listen-
port=49152
root 23189 1 17 Oct18 ? 04:27:51 /usr/sbin/glusterfsd -s slpr601 --volfile-id
rhs.slpr601.mnt-brick2 -p /var/run/gluster/vols/rhs/slpr601-mnt-brick2.pid -S
/var/run/gluster/963093c6bff315df42193a66e2736841.socket --brick-name /mnt/brick2 -l
/var/log/glusterfs/bricks/mnt-brick2.log --xlator-option *-posix.glusterd-uuid=442de2c2-
c447-4a4b-9bf9-c1a50fc7ed34 --brick-port 49153 --xlator-option rhs-server.listen-
port=49153
root 23209 1 17 Oct18 ? 04:27:30 /usr/sbin/glusterfs -s localhost --volfile-id
gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/3884db1ceddea6a7d8b74b7c2d625aba.socket --xlator-option
*replicate*.node-uuid=442de2c2-c447-4a4b-9bf9-c1a50fc7ed34
root 52894 52519 0 12:50 pts/0 00:00:00 grep --color=auto gluster
[root@slpr601 ~]#

- If not already started, start the glusterd service above using the following command:

# systemctl start glusterd.service

- Check the status of rhs volume:

# gluster volume status rhs

# gluster volume info rhs

There should be "Y" in the column Online, meaning that the brick is visible. Any node can
see the bricks from the other 11 nodes

[root@slpr601 ~]# gluster volume status rhs
Status of volume: rhs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick slpr601:/mnt/brick1 49152 0 Y 23170
Brick slpr600:/mnt/brick1 49152 0 Y 6816
Brick slpr601:/mnt/brick2 49153 0 Y 23189
Brick slpr600:/mnt/brick2 49153 0 Y 6858
Brick slpr603:/mnt/brick1 49152 0 Y 60718
Brick slpr602:/mnt/brick1 49152 0 Y 10959
Brick slpr603:/mnt/brick2 49153 0 Y 60737
Brick slpr602:/mnt/brick2 49153 0 Y 10978
Brick slpr605:/mnt/brick1 49152 0 Y 15867
Brick slpr604:/mnt/brick1 49152 0 Y 29853
Brick slpr605:/mnt/brick2 49153 0 Y 15887
Brick slpr604:/mnt/brick2 49153 0 Y 29872
Brick slpr607:/mnt/brick1 49152 0 Y 51959
Brick slpr606:/mnt/brick1 49154 0 Y 46510
Brick slpr607:/mnt/brick2 49153 0 Y 51978
Brick slpr606:/mnt/brick2 49155 0 Y 46532
Brick slpr609:/mnt/brick1 49152 0 Y 3272
Brick slpr608:/mnt/brick1 49152 0 Y 32728
Brick slpr609:/mnt/brick2 49153 0 Y 3279
Brick slpr608:/mnt/brick2 49153 0 Y 32747
Brick slpr611:/mnt/brick1 49152 0 Y 3593
Brick slpr610:/mnt/brick1 49152 0 Y 16265
Brick slpr611:/mnt/brick2 49153 0 Y 3602
Brick slpr610:/mnt/brick2 49153 0 Y 16284
NFS Server on localhost N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 23209
NFS Server on slpr607 N/A N/A N N/A
Self-heal Daemon on slpr607 N/A N/A Y 51998
NFS Server on slpr604 N/A N/A N N/A
Self-heal Daemon on slpr604 N/A N/A Y 29892
NFS Server on slpr605 N/A N/A N N/A
Self-heal Daemon on slpr605 N/A N/A Y 15910
NFS Server on slpr609 N/A N/A N N/A

Self-heal Daemon on slpr609 N/A N/A Y 2825
NFS Server on slpr603 N/A N/A N N/A
Self-heal Daemon on slpr603 N/A N/A Y 60757
NFS Server on slpr608 N/A N/A N N/A
Self-heal Daemon on slpr608 N/A N/A Y 32767
NFS Server on slpr611 N/A N/A N N/A
Self-heal Daemon on slpr611 N/A N/A Y 3248
NFS Server on slpr610 N/A N/A N N/A
Self-heal Daemon on slpr610 N/A N/A Y 16304
NFS Server on slpr606 N/A N/A N N/A
Self-heal Daemon on slpr606 N/A N/A Y 46560
NFS Server on slpr600 N/A N/A N N/A
Self-heal Daemon on slpr600 N/A N/A Y 6884
NFS Server on slpr602 N/A N/A N N/A
Self-heal Daemon on slpr602 N/A N/A Y 10998

Task Status of Volume rhs


------------------------------------------------------------------------------
Task : Rebalance
ID : b86021bc-75ce-40c6-aa9b-307ca7236a5f
Status : completed

[root@slpr601 ~]# gluster volume info rhs

Volume Name: rhs


Type: Distributed-Replicate
Volume ID: 98000f38-f99d-437f-aee4-16d31c7e6b46
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: slpr601:/mnt/brick1
Brick2: slpr600:/mnt/brick1
Brick3: slpr601:/mnt/brick2
Brick4: slpr600:/mnt/brick2
Brick5: slpr603:/mnt/brick1

Brick6: slpr602:/mnt/brick1
Brick7: slpr603:/mnt/brick2
Brick8: slpr602:/mnt/brick2
Brick9: slpr605:/mnt/brick1
Brick10: slpr604:/mnt/brick1
Brick11: slpr605:/mnt/brick2
Brick12: slpr604:/mnt/brick2
Brick13: slpr607:/mnt/brick1
Brick14: slpr606:/mnt/brick1
Brick15: slpr607:/mnt/brick2
Brick16: slpr606:/mnt/brick2
Brick17: slpr609:/mnt/brick1
Brick18: slpr608:/mnt/brick1
Brick19: slpr609:/mnt/brick2
Brick20: slpr608:/mnt/brick2
Brick21: slpr611:/mnt/brick1
Brick22: slpr610:/mnt/brick1
Brick23: slpr611:/mnt/brick2
Brick24: slpr610:/mnt/brick2
[root@slpr601 ~]#

- On each node, check that bricks are mounted with findmnt, e.g.:

[root@slpr601 ~]# findmnt /mnt/brick1


TARGET SOURCE FSTYPE OPTIONS
/mnt/brick1 /dev/mapper/vgbrick1-brick1 xfs
rw,noatime,nodiratime,seclabel,attr2,nobarrier,inode64,allocsize=4k,logbsize=25
[root@slpr601 ~]# findmnt /mnt/brick2
TARGET SOURCE FSTYPE OPTIONS
/mnt/brick2 /dev/mapper/vgbrick2-brick2 xfs
rw,noatime,nodiratime,seclabel,attr2,nobarrier,inode64,allocsize=4k,logbsize=25
[root@slpr601 ~]#

- Check from one node that the other 11 nodes are connected. It should display "State: Peer in
Cluster (Connected)" for each of the nodes.

# gluster peer status

[root@slpr601 ~]# gluster peer status
Number of Peers: 11

Hostname: slpr603
Uuid: b7b8e8f0-a91e-4df3-8929-6311e90e6821
State: Peer in Cluster (Connected)

Hostname: slpr611
Uuid: 908bd808-8484-49d1-9358-f3dc282a774f
State: Peer in Cluster (Connected)

Hostname: slpr606
Uuid: 48edebf4-6a61-4eb5-8709-0ed029ea4e4e
State: Peer in Cluster (Connected)

Hostname: slpr602
Uuid: 8727399a-322f-492a-8dd8-726b039e543c
State: Peer in Cluster (Connected)

Hostname: slpr610
Uuid: 736ff260-5e60-4b8b-beae-9cc1d5837ad7
State: Peer in Cluster (Connected)

Hostname: slpr608
Uuid: a0530e71-303c-4292-85e4-a7ef9a67d055
State: Peer in Cluster (Connected)

Hostname: slpr604
Uuid: b1cfc465-059c-4247-9d2d-3a3330319f18
State: Peer in Cluster (Connected)

Hostname: slpr607
Uuid: 3699c742-656a-47be-9788-078597de5b82
State: Peer in Cluster (Connected)

Hostname: slpr605
Uuid: 3970e458-9cc9-422f-867f-a0378fcc0203

State: Peer in Cluster (Connected)

Hostname: slpr609
Uuid: f9b3ecde-5af6-4c15-9844-c1f7135dcecf
State: Peer in Cluster (Connected)

Hostname: slpr600
Uuid: 6a362cbc-d992-4738-ae66-8d486a17ac05
State: Peer in Cluster (Connected)
[root@slpr601 ~]#

- Check that new files are being created correctly in /mnt/brick1 and /mnt/brick2 on each RHS node, e.g.:
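
A quick spot-check (a sketch; the most recent file names depend on the backup workload, and in a
distributed volume a given file only lands on some of the bricks):

[root@slpr601 ~]# ls -lrt /mnt/brick1 | tail -5
[root@slpr601 ~]# ls -lrt /mnt/brick2 | tail -5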

- Check the status of gluster auto-healing: as systems have been brought down, there will be some
files that need to be synchronised.
Repeat the check a few times (e.g. after 30 minutes) to validate that the number of entries
in the output below goes down.

[root@slpr601 ~]# gluster vol heal rhs statistics heal-count


Gathering count of entries to be healed on volume rhs has been successful

Brick slpr601:/mnt/brick1
Number of entries: 176

Brick slpr600:/mnt/brick1
Number of entries: 0

Brick slpr601:/mnt/brick2
Number of entries: 0

Brick slpr600:/mnt/brick2
Number of entries: 0

Brick slpr603:/mnt/brick1
Number of entries: 0

Brick slpr602:/mnt/brick1
Number of entries: 0

Brick slpr603:/mnt/brick2
Number of entries: 0

Brick slpr602:/mnt/brick2
Number of entries: 0

Brick slpr605:/mnt/brick1
Number of entries: 0

Brick slpr604:/mnt/brick1
Number of entries: 0

Brick slpr605:/mnt/brick2
Number of entries: 0

Brick slpr604:/mnt/brick2
Number of entries: 0

Brick slpr607:/mnt/brick1
Number of entries: 0

Brick slpr606:/mnt/brick1
Number of entries: 0

Brick slpr607:/mnt/brick2
Number of entries: 0

Brick slpr606:/mnt/brick2
Number of entries: 0

Brick slpr609:/mnt/brick1
Number of entries: 0

Brick slpr608:/mnt/brick1
Number of entries: 0

Brick slpr609:/mnt/brick2
Number of entries: 0

Brick slpr608:/mnt/brick2
Number of entries: 0

Brick slpr611:/mnt/brick1
Number of entries: 0

Brick slpr610:/mnt/brick1
Number of entries: 0

Brick slpr611:/mnt/brick2
Number of entries: 0

Brick slpr610:/mnt/brick2
Number of entries: 0
[root@slpr601 ~]#

Alternatively, the command "date;gluster vol heal rhs info" can be used to get more details
on the files involved in the healing process.

- In case of issues, browse through the log files located in /var/log/glusterfs, for example:
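
For example, to list the most recently written logs and search one of them for error-level
messages (log file names vary with the gluster version, so treat this as an illustration):

[root@slpr601 ~]# ls -lrt /var/log/glusterfs/
[root@slpr601 ~]# grep ' E ' /var/log/glusterfs/glusterd.log | tail -20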

Red Hat Storage: management:


Some basic guidelines:
• We treat this Red Hat Storage cluster as an appliance: it is not included in our regular Linux
patching.
• It is ONLY connected to the media servers now, which means the number of mount points is
VERY limited.

• Any hostname in the cluster is a good target for a mount point. It doesn't matter which host
you take, as the first thing Gluster does is send a manifest with the complete cluster layout
(this is also how the mount point can survive a complete site failure); see the example client
mount after this list.
• Subscribe any host that needs a gluster mount to the RHEL5_64_pr_RHS channel, as this
contains the up-to-date gluster client packages.
• In case of failure & recovery, let the cluster heal itself.
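
Example client mount (a minimal sketch: the volume name "rhs" and the server names come from the
output above, while the mount point /mnt/rhs and the fstab options are assumptions to adapt to the
actual media server configuration):

# mount -t glusterfs slpr601:/rhs /mnt/rhs

or persistently via /etc/fstab:

slpr601:/rhs  /mnt/rhs  glusterfs  defaults,_netdev  0 0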

Red Hat Storage: managing disk arrays:

HOW TO INTERACT WITH THE STORAGE ARRAY:


• On all Red Hat Storage servers, hpacucli should be installed. It is a standardised command-
line utility that allows monitoring of, and interaction with, the RAID controller subsystem.
• There are 2 ways to execute commands towards the RAID controller: interactively from the
utility's own prompt, or directly from the shell.

[root@slpr603 ~]# hpacucli


HP Array Configuration Utility CLI 9.40.12.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=>

=> ctrl slot=0 show status


Dynamic Smart Array B120i RAID in Slot 0 (Embedded)
Controller Status: OK
=>

And

[root@slpr603 ~]# hpacucli ctrl slot=0 show status


Dynamic Smart Array B120i RAID in Slot 0 (Embedded)

Controller Status: OK
[root@slpr603 ~]#

The same command can thus be executed in 2 different ways. It doesn't make much difference; just be aware of
which mode you are working in.

HPACUCLI: COMMANDS + EXAMPLES:

Utility keyword abbreviations:
chassisname = ch
controller = ctrl
logicaldrive = ld
physicaldrive = pd
drivewritecache = dwc

hpacucli utility:
# hpacucli
# hpacucli help

Note: you can use the hpacucli command in a script
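
For example, a small health-check script built from the commands listed below (slot 0 is taken from
the examples in this section; adjust per server):

#!/bin/bash
# Report overall controller, logical drive and physical drive status
hpacucli ctrl all show status
hpacucli ctrl slot=0 ld all show status
hpacucli ctrl slot=0 pd all show status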


Controller Commands
Display (detailed) hpacucli> ctrl all show config
hpacucli> ctrl all show config detail
Status hpacucli> ctrl all show status
Cache hpacucli> ctrl slot=0 modify dwc=disable
hpacucli> ctrl slot=0 modify dwc=enable
Rescan hpacucli> rescan

Note: detects newly added devices since the last rescan


Physical Drive Commands
Display (detailed) hpacucli> ctrl slot=0 pd all show
hpacucli> ctrl slot=0 pd 2:3 show detail

Note: you can obtain the slot number by displaying the controller configuration (see above)
Status hpacucli> ctrl slot=0 pd all show status
hpacucli> ctrl slot=0 pd 2:3 show status
Erase hpacucli> ctrl slot=0 pd 2:3 modify erase
Blink disk LED hpacucli> ctrl slot=0 pd 2:3 modify led=on
hpacucli> ctrl slot=0 pd 2:3 modify led=off
Logical Drive Commands

Display (detailed) hpacucli> ctrl slot=0 ld all show [detail]
hpacucli> ctrl slot=0 ld 4 show [detail]
Status hpacucli> ctrl slot=0 ld all show status
hpacucli> ctrl slot=0 ld 4 show status
Blink disk LED hpacucli> ctrl slot=0 ld 4 modify led=on
hpacucli> ctrl slot=0 ld 4 modify led=off
re-enabling failed drive hpacucli> ctrl slot=0 ld 4 modify reenable forced
Create # logical drive - one disk
hpacucli> ctrl slot=0 create type=ld drives=1:12 raid=0

# logical drive - mirrored


hpacucli> ctrl slot=0 create type=ld drives=1:13,1:14 size=300 raid=1

# logical drive - raid 5


hpacucli> ctrl slot=0 create type=ld drives=1:13,1:14,1:15,1:16,1:17 raid=5

Note:
drives - specific drives, all drives or unassigned drives
size - size of the logical drive in MB
raid - type of raid 0, 1 , 1+0 and 5
Remove hpacucli> ctrl slot=0 ld 4 delete
Expanding hpacucli> ctrl slot=0 ld 4 add drives=2:3
Extending hpacucli> ctrl slot=0 ld 4 modify size=500 forced
Spare hpacucli> ctrl slot=0 array all add spares=1:5,1:7

HPACUCLI: PRACTICAL EXAMPLE: HOW TO MOVE A SPARE DISK FROM ONE ARRAY TO
ANOTHER?
• Since we have 25 disks per server, we have 2 arrays of 12 disks each in a RAID 6 configuration.
That makes 24 disks. The 25th disk is the hot spare.
• But a hot spare can only be assigned to one array at a time. So what if a disk fails in the
other array, where there is no hot spare?

[root@slpr603 ~]# hpacucli ctrl slot=1 show config


Smart Array P420i in Slot 1 (sn: PCFBB%%LM6S0LE)
array A (SATA, Unused Space: 0 MB)
logicaldrive 1 (36.4 TB, RAID 6, OK)
physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA, 4000.7 GB, OK)

physicaldrive 2I:1:9 (port 2I:box 1:bay 9, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:10 (port 2I:box 1:bay 10, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:11 (port 2I:box 1:bay 11, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:12 (port 2I:box 1:bay 12, SATA, 4000.7 GB, OK)
array B (SATA, Unused Space: 0 MB)
logicaldrive 2 (36.4 TB, RAID 6, OK)
physicaldrive 2I:1:14 (port 2I:box 1:bay 14, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:15 (port 2I:box 1:bay 15, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:16 (port 2I:box 1:bay 16, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:17 (port 2I:box 1:bay 17, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:18 (port 2I:box 1:bay 18, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:19 (port 2I:box 1:bay 19, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:20 (port 2I:box 1:bay 20, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:21 (port 2I:box 1:bay 21, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:22 (port 2I:box 1:bay 22, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:23 (port 2I:box 1:bay 23, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:24 (port 2I:box 1:bay 24, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:25 (port 2I:box 1:bay 25, SATA, 4000.7 GB, OK)
physicaldrive 2I:1:13 (port 2I:box 1:bay 13, SATA, 4000.7 GB, OK, spare)
Enclosure SEP (Vendor ID HP, Model SL454x.2) 378 (WWID: 500143801370AD5B, Port: 2I, Box: 1)
Expander 380 (WWID: 500143801370AD2E, Port: 2I, Box: 1)
SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 379 (WWID: 5001438026F127AF)

In our scenario, we have a hot spare in array B: disk 2I:1:13.

Suppose a disk fails in array A; we then need to remove the hot spare from array B:

[root@slpr603 ~]# hpacucli ctrl slot=1 array B remove spares=all


[root@slpr603 ~]#

This command is fairly harmless, as it fails if you try to remove spares from the wrong array. Now disk 13 has
become unassigned.

[root@slpr603 ~]# hpacucli ctrl slot=1 show config


...
unassigned
physicaldrive 2I:1:13 (port 2I:box 1:bay 13, SATA, 4000.7 GB, OK)
...

Next, assign it to the proper array:

[root@slpr603 ~]# hpacucli ctrl slot=1 array A add spares=allunassigned
[root@slpr603 ~]#

That's it; if a drive has failed, it should now start to rebuild on the hot spare.

Side note: you can check the way the array interacts with the hot spare with this command:

[root@slpr603 ~]# hpacucli ctrl slot=0 show config detail


Dynamic Smart Array B120i RAID in Slot 0 (Embedded)
...
Spare Activation Mode: Activate on drive failure
...

You can change this too, but that implies you need to trigger hot-spare activation manually; a sketch of the
command is shown below.
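
If you do want to change the behaviour, hpacucli has a spare activation mode setting. The exact
keyword below is an assumption; verify it with "help" on the installed hpacucli version before using it:

=> ctrl slot=1 modify spareactivationmode=predictive
=> ctrl slot=1 modify spareactivationmode=failure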

2.7 HARDWARE SUPPORT


• Support conditions are negotiated and chosen per purchase, depending on the purpose of
the server.
• The most recent purchases all come by default with a 5-year support contract. However, for
specific components the support and warranty may differ from the default service.
• Repairs are done by engineers of the supplier, who therefore need to be granted access to the
datacenter. bpost engineers do not execute any repairs on machines under warranty.
• Default support and warranty for HP machines is currently “HP Foundation Care 5Y 24x7
Service”:
• Coverage window: 24x7: Service is available 24 hours per day, 7 days per week including
HP holidays.
• 4-hour onsite response: For incidents with covered hardware that cannot be resolved
remotely, HP will use commercially reasonable efforts to respond onsite within 4 hours. An
HP authorized representative will arrive at the Customer’s site during the coverage window
to begin hardware maintenance service within four hours of the call having been received
and acknowledged by HP. Onsite response time specifies the period of time that begins when
the initial call has been received and acknowledged by HP, as described in the ‘General
provisions/Other exclusions’ section. The onsite response time ends when the HP authorized
representative arrives at the Customer’s site, or when the reported event is closed with the
explanation that HP has determined it does not currently require an onsite intervention.

• DC Address and Access Procedure:

Muizen DC:
Address:
Smisstraat 48
2812 Muizen (Mechelen)
Belgium
H&E intervention: SM9 ticket in E-INCFLS-GDO-DC-BRU-ONSITESUP; if urgent or OOO, confirm
by calling +32 4955 723 29 / +32 15 454 531
Delivery announcement (preferably 24h in advance): mail to dc-onsite-support-mui@dxc.com

Roosendaal DC:
Address:
Colt Technology Services BV
Argonweg 9
4706 NR Roosendaal
H&E intervention: SM9 ticket in E-INCFLS-GDO-DC-ROS-ONSITESUP; if urgent or OOO, confirm
by calling +31 165 527 717
Delivery announcement (preferably 24h in advance): mail to dc-onsite-support-ros@dxc.com
Note: If you want to request access for the Muizen or Roosendaal Datacenters yourself, use the
appropriate tooling: https://sam.itcs.houston.dxccorp.net/home

2.8 REMOTE CONNECTIVITY / CONSOLE MANAGEMENT


• Connect to the physical server web console:
  - Find the BMB DNS name in the CMDB:
    - Open the HP OpenView console
    - Search for the CI in Configuration Management
    - Navigate to the Links tab to retrieve the BMB name
  - Connect to a jump station and connect to the web console:
    - Open an RDP session to VWPR366 or VWPR528
    - Open a web browser to 'http://<server>.hardware.netpost'
    - Log in using user USERID (the password is available in pwvault)
  - In case of an HP iLO card:
    - Click on 'Remote Console' in the menu on the left side of the window
    - Click 'Launch' under .NET Integrated Remote Console

• From Citrix OMC you can jump to Windows station vwpr528, and from that Windows server we can launch
the vSphere client; from there we can search for the CI and launch the console for the VM.
  - vwpr621 => production ESX vCenter
  - vwpr616 => DMZ vCenter

  - vwds526 => Lab vCenter

3 NETWORK SERVICE MANAGEMENT

3.1 NTP SERVER


• Standard NTP server used in Bpost: ntp.netpost (10.192.200.100). An example client configuration is shown below.
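
A minimal sketch of the client side (the exact /etc/ntp.conf template should follow the build
standard; the line below only illustrates pointing a server at the bpost NTP source):

# /etc/ntp.conf
server ntp.netpost iburst

Verify synchronisation with:

# ntpq -p
# ntpstat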

3.2 DNS
• Two nameservers are used in Bpost (an example resolv.conf follows the addresses below):
nameserver 10.192.200.11
nameserver 10.192.200.4
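
An example /etc/resolv.conf using these nameservers (the search domain netpost is an assumption;
use the domain defined in the build standard):

search netpost
nameserver 10.192.200.11
nameserver 10.192.200.4

A quick resolution check:

# nslookup ntp.netpost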

4 PATCH MANAGEMENT
• Patching is managed by Satellite 6. There are currently 5 different lifecycle environments:
- Sandbox
- System Test (DEV/TST Patching Cycle)
- Development
- User Acceptance Test (UAT Patching Cycle)
- Production (PRD Patching Cycle)

• At creation, every server is linked to one and only one environment. The server will be patched depending on
which environment it belongs to; a quick way to check this is shown after this list.
• Every year, one or multiple patching campaigns are scheduled. Each patching campaign consists of three
different phases with a 1-month validation period between each:
• DEV/TST Patching Cycle > 1 month > UAT Patching Cycle > 1 month > PRD Patching Cycle
• Because some systems cannot be patched together (members of clusters), and to make sure that the
corresponding service/application is kept available, some servers are separated into different waves which
are patched at different moments during the cycle.
• The patch list is sent to the Planning Coordinator, who will check with the application owners and provide the
downtime window. After patching is completed we need to inform the coordinator, who will ask the application
owners to test the servers.
• Patching calendar is prepared by Planning Coordinator - VAN DE VLIET Jan
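
To check which Satellite 6 organisation and lifecycle environment a server is registered to
(a sketch; the host must be registered with subscription-manager):

# subscription-manager identity

The output lists the organisation and, for Satellite-registered hosts, the environment the host is attached to.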

• Patching of RHEL systems with Satellite 6: see the attached document.

PatchingofRHELsystemwithSatellite6.pdf

5 SECURITY MANAGEMENT

5.1 HARDENING ON FEXT LINUX VM'S


• Please find the attached doc for Hardening On Fext Linux VM’s

HardeningonFEXTLinuxVM's.pdf

5.2 ANTIVIRUS MANAGEMENT


• Symantec Anti-Virus - Installation & upgrades

Symantec Anti-Virus
- Installation and upgrades.pdf

• Symantec Anti-Virus – Troubleshooting

Symantec Anti-Virus
- Troubleshooting.pdf

6 BACKUP MANAGEMENT

6.1 BACKUP
• Please find the attached doc for detailed Backup Management.

EC2
AS_IS__Backup.docx

7 BPOST PROCESS
• Incident, Problem and Change Process:
SOPs & training material on ServiceNow (attachments):
- bpost Incident Management Training Guide_Version 1 0.pdf
- bpost_Incident Management SOP_User Training Pack_v1.0.pdf
- bpost Change Management Training Guide_Version 1 0.pdf
- bpost_Change Management SOP_User Training Pack_v1.0.pdf
- bpost Problem Management Training Guide_Version 1 0.pdf
- bpost_Problem Management SOP_User Training Pack_v1.0.pdf
- BP1 Incident & Problem Management.docx
- BP2 Change Management.docx

• P1 and crisis P1 incident handling

Below are the standard steps which all partners are to follow during a P1.

In case of a P1
1) The P1 will be assigned to an assignment group by DCC.
2) DCC will do a warm handover by phone call / Skype call.
3) If the team needs to assign the ticket to another assignment group, they need to call DCC for a warm handover.
4) If the resolver team needs to reassign the ticket to some other assignment group, they need to get in touch with DCC
by phone / Skype call to inform them about the same.
5) Every reassignment to another assignment group needs to be done through DCC with a warm handover call.
6) After receiving the information, DCC will ensure a warm handover / call-out to the respective team/assignment group.

7) The resolver team/assignment group needs to ensure the P1 ticket is updated every 30 minutes, until the issue is
resolved, with a status that is understandable to any non-technical resource.
8) Once the issue is fixed, the resolver team needs to fill in the preliminary RCA in SNOW as soon as possible.

In case of Major Incidents


1) The P1 will be assigned to an assignment group by DCC.
2) DCC will inform MIM over a phone / Skype call if it is a Major Incident.
3) MIM will open a bridge and involve the required stakeholders to troubleshoot the issue until it is fixed.
4) MIM will send out communication as the incident progresses and will also reach out to escalation points as and when required.
5) If the team needs to assign the ticket to another assignment group, they need to call DCC for a warm handover.
6) If the resolver team needs to reassign the ticket to some other assignment group, they need to get in touch with DCC
by phone / Skype call to inform them about the same.
7) Every reassignment to another assignment group needs to be done through DCC with a warm handover call.
8) After receiving the information, DCC will ensure a warm handover / call-out to the respective team/assignment group.
9) The resolver team/assignment group needs to ensure the P1 ticket is updated every 30 minutes, until the issue is
resolved, with a status that is understandable to any non-technical resource.
10) Once the issue is fixed, the resolver team needs to provide the retrospective / preliminary RCA as soon as possible.

8 NETBACKUP MASTER HIGH AVAILABILITY AND FAILOVER PROCEDURE

INTRODUCTION

With the move from Solaris to Linux, we needed a new highly available platform for the NetBackup master.
To do this we opted for boot-from-SAN, so we can move the NetBackup master seamlessly between datacenters. This also avoids other
dependencies, which increases resiliency during crisis situations, in which you need to be able to restore servers, databases,
applications, and so on.

SETUP

We have 2 identical servers: slpr086 and slpr087. Slpr086 is located in the Muizen DC; slpr087 is located in the Roosendaal DC.
Both servers are identical in hardware and connectivity.
The UEFI and boot-from-SAN setup is somewhat special, but it is identical on both servers, and both servers actually have a list of
LUNs that they can boot from. Due to the masking, each server can only "see" one set of bootable disks, while the other server can
only "see" the other boot disks.
To access the UEFI you will need some form of KVM functionality; the simplest is to use the server's iLO. This will be needed
anyway, because you need to be able to reboot, power on and power off the 2 servers.

NEVER make a disk visible on both sites at the same time, as this will cause disk corruption.

Below is a design showing the details of the storage setup on the NetApp. The disks shown in orange are the boot disks (OS) of the
servers; the NBU disk holds the catalog and the binaries of the NetBackup master.

FAILOVER PROCEDURE

FROM NORMAL OPERATIONS TO DR MODE:


In normal circumstances, the master is running in Muizen on slpr086 (now called SSPR007), on BMB-1510001 with serial
CZJ5400HR2.
The standby is running the standby OS (without the NetBackup disk) on slpr087, on BMB-1510002 with serial CZJ54006ZP. This is
needed to verify the status of the hardware: as long as the standby OS is running, the hardware can also be used by the master.

The following procedure swaps their places, so SSPR007 (the actual master) is moved to Roosendaal, and the standby is moved to
Muizen.

A very important requirement is that the operating systems on
BOTH servers have to be DOWN!
Turning the servers off is best done via the iLO. You can do this via the SSH shell or via the web interface. The password can be found
in the password vault; the procedures on how to use these can be found in the wiki.

You also need to log into CNAS1100 and CNAS1200 (the heads from the metrocluster).

cnas1100::> lun unmap -vserver v08_vsnaspr74 -path /vol/v08_nbu_master/NBU_disk_OS -igroup SSPR007_NOS


cnas1100::> lun unmap -vserver v08_vsnaspr74 -path /vol/v08_nbu_master/NBU_disk_CAT -igroup SSPR007_NOS
cnas1100::> lun mapping show -vserver v08_vsnaspr74 -igroup SSPR007_*
These commands unmap the OS disk and the NetBackup disk from the server in Muizen, and show you
the result.

cnas1200::> lun unmap -vserver v08_vsnaspr75 -path /vol/v08_nbu_stby/NBU_disk_OS -igroup SSPR007_WOL


cnas1200::> lun mapping show -vserver v08_vsnaspr75 -igroup SSPR007_*

These commands unmap the standby OS disk from the server in Roosendaal, and show you
the result.

cnas1200::> lun map -vserver v08_vsnaspr74 -volume v08_nbu_master -lun NBU_disk_OS -igroup SSPR007_WOL -lun-id 0
cnas1200::> lun map -vserver v08_vsnaspr74 -volume v08_nbu_master -lun NBU_disk_CAT -igroup SSPR007_WOL -lun-id 1
cnas1200::> lun mapping show -vserver v08_vsnaspr74 -igroup SSPR007_*

These commands map the OS and NetBackup disks to the other server's initiator group, so the server in
Roosendaal can boot from them.

cnas1200::> lun map -vserver v08_vsnaspr75 -volume v08_nbu_stby -lun NBU_disk_OS -igroup SSPR007_NOS -lun-id 0
cnas1200::> lun mapping show -vserver v08_vsnaspr75 -igroup SSPR007_*

These commands map the OS of the standby to the server in Muizen.

After you have verified that the correct disks are mapped to the correct igroups, both servers can be started. Slpr086 (the standby)
will boot in Muizen, and SSPR007 (the master) will boot in Roosendaal.
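
Once both servers are up, a quick sanity check on the master can be done as follows (a sketch;
/usr/openv is the default NetBackup installation path and is an assumption here):

# hostname
# lsblk
# df -h /usr/openv
# /usr/openv/netbackup/bin/bpps -a

The hostname should be the master's, both the OS LUN and the catalog LUN should be visible, the NetBackup
filesystem should be mounted, and bpps should list the running NetBackup processes.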

FROM DR-MODE TO NORMAL OPERATIONS


The same remarks apply as for the procedure above (normal operations to DR mode).

cnas1100::> lun unmap -vserver v08_vsnaspr74 -path /vol/v08_nbu_master/NBU_disk_OS -igroup SSPR007_WOL


cnas1100::> lun unmap -vserver v08_vsnaspr74 -path /vol/v08_nbu_master/NBU_disk_CAT -igroup SSPR007_WOL
cnas1100::> lun mapping show -vserver v08_vsnaspr74 -igroup SSPR007_*

cnas1200::> lun unmap -vserver v08_vsnaspr75 -path /vol/v08_nbu_stby/NBU_disk_OS -igroup SSPR007_NOS


cnas1200::> lun mapping show -vserver v08_vsnaspr75 -igroup SSPR007_*

cnas1100::> lun map -vserver v08_vsnaspr74 -volume v08_nbu_master -lun NBU_disk_OS -igroup SSPR007_NOS -lun-id 0
cnas1100::> lun map -vserver v08_vsnaspr74 -volume v08_nbu_master -lun NBU_disk_CAT -igroup SSPR007_NOS -lun-id 1
cnas1100::> lun mapping show -vserver v08_vsnaspr74 -igroup SSPR007_*

cnas1200::> lun map -vserver v08_vsnaspr75 -volume v08_nbu_stby -lun NBU_disk_OS -igroup SSPR007_WOL -lun-id 0
cnas1200::> lun mapping show -vserver v08_vsnaspr75 -igroup SSPR007_*

IN AN ACTUAL DISASTER
Make sure one server stays down, is destroyed, or is powered down hard.
Shut down the remaining server, the one that you want to become the master.
Unmap these disks if they are mapped in the wrong datacenter: /vol/v08_nbu_master/NBU_disk_OS and
/vol/v08_nbu_master/NBU_disk_CAT
Unmap the disk of the standby (and keep it unmapped).
Map these disks to the server that is left: /vol/v08_nbu_master/NBU_disk_OS and
/vol/v08_nbu_master/NBU_disk_CAT
Boot the server and the master will be back. A sketch of the corresponding commands is shown below.

This is a one-sided situation: make sure the other side doesn't suddenly boot, because if the disks are still
mapped there, the master will go corrupt. The disks can be mapped to several igroups, but they should
be mapped to only ONE.
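
A minimal command sketch for one scenario, assuming the Roosendaal side is lost and the master has to
come up on the surviving server in Muizen (igroup SSPR007_NOS); the commands mirror the failback
sequence above and must be adapted to the actual situation and to whichever cluster head is still reachable:

cnas1100::> lun unmap -vserver v08_vsnaspr74 -path /vol/v08_nbu_master/NBU_disk_OS -igroup SSPR007_WOL
cnas1100::> lun unmap -vserver v08_vsnaspr74 -path /vol/v08_nbu_master/NBU_disk_CAT -igroup SSPR007_WOL
cnas1100::> lun map -vserver v08_vsnaspr74 -volume v08_nbu_master -lun NBU_disk_OS -igroup SSPR007_NOS -lun-id 0
cnas1100::> lun map -vserver v08_vsnaspr74 -volume v08_nbu_master -lun NBU_disk_CAT -igroup SSPR007_NOS -lun-id 1
cnas1100::> lun mapping show -vserver v08_vsnaspr74 -igroup SSPR007_*

Keep the standby OS disk (/vol/v08_nbu_stby/NBU_disk_OS) unmapped until the situation is resolved.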

8.1 NETBACKUP MASTER ON NETAPP DISKS

Introduction
As we are eliminating our SAN infrastructure, the boot-from-SAN high-availability framework will cease to work.
Hence, it was decided to move the NetBackup master to a NetApp disk backend.
This changes a few things, however.

Current situation:
The details of our current setup (values given as BMB-1510001 / BMB-1510002):

Server: BMB-1510001 (CZJ5400HR2) / BMB-1510002 (CZJ54006ZP)
Location: Muizen / Roosendaal
Hostname (normal situation): SLPR086 / SSPR007 (SLPR087)
Local disks SSPR007 (SLPR087): 1A06 (R1), catalog disk to be added / 1C81 (R2), SRDF-catalog disk to be added
Local disk SLPR086: 1EFE (R2) / 1C82 (R1)
Connected MAC addresses: 3c:a8:2a:15:65:68, 3c:a8:2a:15:65:69 / 3c:a8:2a:0d:a6:c4, 3c:a8:2a:0d:a6:c5
ILO: 10.199.199.23 (bmb-1510001.hardware.netpost) / 10.199.199.24 (bmb-1510002.hardware.netpost)

The goal is to be able to move SSPR007 between datacenters as simply as possible.
SLPR086 is only "alive" to test the hardware of the box not running SSPR007.
Future situation:

Server: BMB-1510001 (CZJ5400HR2) / BMB-1510002 (CZJ54006ZP)
Location: Muizen / Roosendaal
Hostname (normal situation): SLPR086 / SSPR007 (SLPR087). Remark: must be kept.
Local disks SSPR007 (SLPR087): wwid XYZ, MetroCluster HA for both disks; catalog disk to be added.
Remark: with NetApp we use the MetroCluster facilities, which means one disk is visible in both DCs.
Local disk SLPR086: MetroCluster wwid ABC. Remark: same as above.
Connected MAC addresses: 3c:a8:2a:15:65:68, 3c:a8:2a:15:65:69 / 3c:a8:2a:0d:a6:c4, 3c:a8:2a:0d:a6:c5. Remark: fibers are kept.
ILO: 10.199.199.23 (bmb-1510001.hardware.netpost) / 10.199.199.24 (bmb-1510002.hardware.netpost)

The issues we need to test are the following:

- Can the same wwid (or LUN) be visible via the Brocades, coming from 2 different sources?
- How do we do I/O fencing, i.e. prevent a disk from being booted if it is already booted elsewhere?

Critical contacts and communication plan for outage handling:


First contact would be Kalyanasundaram, Selvakumar (ARL) & Karunakaran, Shalini (Backup Team)

9 MISCELLANEOUS

9.1 GRAFANA

Please refer to the attached PDF.

Grafana_.pdf

Diamond collector
installation v1.0.docx

9.2 FLEXERA

Please refer to the attached document for the Flexera installation.

Flexera
installation.docx

9.3 NIMSOFT
Please refer to the attached document for the Nimsoft installation.

NIMSOFT
Installation.v.1.docx

