
AIX DAY-TO-DAY OPERATIONS


Backups:
 - Take a system backup (mksysb) regularly, at least once a week, and keep it on another server, preferably the NIM server (a sample weekly cron entry is sketched after this list).
 - Verify the mksysb with "lsmksysb -l -f /mksysbimg" (check the size).
 - Check /etc/exclude.rootvg to see whether any important filesystem/directory is excluded from the mksysb backup.
 - Ensure non-rootvg file system backups are taken as per your company's backup software (e.g. TSM or NetBackup).
 - Take a system snap every week (make a cron entry) and keep the log file on a different server or make a copy on your desktop.
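
A minimal sketch of such a weekly cron entry (the /mksysb path, the "myhost" file name, and the "nimserver" host are assumptions; adapt them to your environment):

  # root crontab: weekly mksysb every Sunday at 02:00, then copy the image to the NIM server
  0 2 * * 0 /usr/bin/mksysb -i /mksysb/myhost.mksysb && /usr/bin/scp /mksysb/myhost.mksysb nimserver:/export/mksysb/
  # verify the image afterwards
  lsmksysb -l -f /mksysb/myhost.mksysb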

System Consistency Checks:

 - Ensure the current OS level is consistent: "oslevel -s; oslevel -r; instfix -i|grep ML; instfix -i|grep SP; lppchk -v" (see the sketch after this list). [If the OS is inconsistent, first bring the OS level to a consistent state and then proceed with the change.]
 - Proactively remediate compliance issues.
 - Check your firm's policies on server uptime and arrange for reboots accordingly; some organizations fix this at < 90 days or < 180 days.
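
A quick way to run these checks in one pass (the interpretation comments are general expectations, not guaranteed output):

  oslevel -s              # service pack level of the running system
  oslevel -r              # technology level
  instfix -i | grep ML    # each ML should report "All filesets ... found"
  instfix -i | grep SP    # same for service packs
  lppchk -v               # no output means fileset levels are consistent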

Troubleshooting issues:
1. Don't fix an issue without a proper incident record.
2. Engage the relevant parties while working on the issue.
3. Always try to get information about the issue from the user (requestor) with questions like "what, when, where".
4. Look at errpt first.
5. Check 'alog -t console -o' to see if it is a boot issue.
6. The log files mentioned in "/etc/syslog.conf" may also give more information for the investigation (a few first-look commands are sketched after this list).
7. Check backups if you are looking at configuration change issues.
8. If you are running out of time, involve your next-level team and managers.
9. Take help from vendors like IBM, EMC, Symantec if necessary.
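
A few first-look commands, as a sketch (the <error_id> placeholder comes from the errpt summary; verify the flags on your AIX level):

  errpt | more                     # recent error log entries, newest first
  errpt -a -j <error_id>           # detailed view of a specific entry
  alog -t console -o | tail -50    # console log, useful for boot problems
  grep -v "^#" /etc/syslog.conf    # where syslog messages are being written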

P1 issues:
If it is a priority 1 (P1) issue, you may need to consider a few additional points apart from the above.

1. On sev1 issues, update the SDM (Service Delivery Manager) in the ST/Communicator multi-chat at regular intervals.
2. On the conference voice call (bridge call), if they verbally request you to perform any change, get the confirmation in writing in the multi ST chat.
3. Update the incident record (IR) at regular intervals.
4. Update your team with the issue status (via mail).
5. Document any new learnings (from issues/changes) and share them with the team.
Working on a Change:
Thumb rule: changes should go in sequence through DEV ==> UAT/QA ==> PROD environment servers.

1. Make sure the change record is fully approved; otherwise don't start any of your tasks.
2. Ensure a properly validated CR procedure is in place: Precheck -> Installation -> Backout -> Post-Verification.
3. Suppress alerts if needed.
4. Remember that Application/Database teams are responsible for their application/database backup/restore and stop/start. Therefore alert the application teams.
5. Check the history of the servers (CRs or IRs) to see if there were any issues or change failures for these servers.
6. EXPECT THE UNEXPECTED: ensure you have a proper back-out plan in place.
7. Ensure you are on the right server ('uname -n' / 'hostname') before you perform the change.
8. Make sure your id as well as the root id is not expired and is working.
9. Ensure no one else from your team is working on the same task, to avoid one change being performed by multiple SAs. It's better to verify with the 'who -u' command to see if any SAs are already working on the server.
10. Remember: one change at a time; multiple changes could cause problems and complicate troubleshooting.
11. Ensure there are no other conflicting changes from other departments such as SAN, network, firewall, application, which could derail your change.
12. Maintain/record the commands run and console output in a notepad named after the change (see the 'script' sketch after this list).
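
A simple sketch using the 'script' command to capture the session (the log file name is a placeholder):

  script /tmp/CR_number_$(hostname).log   # start recording; name the log after the change record
  uname -n                                # confirm you are on the right server
  who -u                                  # check whether other SAs are already logged in
  # ... perform the change steps ...
  exit                                    # stop recording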

If it is a configuration change:

 - Take a backup of the pre and post values and document them (a diff-based sketch follows this list).
 - Take screenshots if you are comfortable doing so.
 - If you are updating the configuration of a file, take a timestamped copy first:
   # cp -p <filename> <filename>_`date +"%m_%d_%Y_%H:%M"`
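
A small sketch for recording pre and post values of a file change (the file name and change number are placeholders):

  cp -p /etc/some.conf /tmp/some.conf.pre_CR12345                # copy before the edit
  vi /etc/some.conf                                              # make the change
  cp -p /etc/some.conf /tmp/some.conf.post_CR12345               # copy after the edit
  diff /tmp/some.conf.pre_CR12345 /tmp/some.conf.post_CR12345    # record this output with the change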

If it is a change to reboot or update software:

1. Check if the server is running any cluster (HACMP/PowerHA); if so, you have to follow a different procedure.
2. Always make sure three essential things are in place before you perform any change: "backup (mksysb); system information; console".
3. Take system configuration information (sysinfo script).
4. Check the LV/filesystem consistency with "df -k" (df should not hang); all LVs should be in a synced state: "lsvg -o|lsvg -il".
5. Check errpt and 'alog -t console -o' to see if there are any errors.
6. Ensure the latest mksysb (OS image backup) is kept on the relevant NIM server.
7. Ensure non-rootvg file system backups are taken.
8. Verify the boot list and boot device: "bootlist -m normal -o", "ipl_varyon -i" (a consolidated pre-check sketch follows this list).
9. Log in to the HMC console.
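
A consolidated pre-change check, as a sketch (the interpretation comments are general expectations; save the output with the change record):

  df -k                    # should complete quickly; a hang needs investigation before the change
  lsvg -o | lsvg -il       # LV STATE should show "syncd", with no stale copies
  errpt | head -20         # no fresh hardware/software errors expected
  bootlist -m normal -o    # confirm the normal-mode boot list
  ipl_varyon -i            # confirm the boot disk and boot record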

Additional points for reboot:

1. Put the servers in maintenance mode (stop alerts) to avoid unnecessary incident alerts.
2. Check the filesystem count with "df -g|wc -l"; verify the count after the migration or reboot.
3. Ensure there are no scheduled reboots in crontab. If there are any, comment them out before you proceed with the change.
4. If the system has not been rebooted for a long time (> 100 days), perform 'bosboot', then reboot the machine (verify the filesystems/application filesystems after the reboot), and then commence with the migration/upgrade. [Don't reboot the machine if the bosboot fails!] A sketch follows this list.
5. Read the log messages carefully; don't ignore warnings.
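
A minimal sketch of the bosboot-then-reboot step (flags are the common ones; verify against your environment):

  bosboot -ad /dev/ipldevice   # rebuild the boot image; do NOT reboot if this fails
  bootlist -m normal -o        # re-confirm the boot device
  shutdown -Fr                 # fast reboot once bosboot completes cleanly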

Additional points for OS & S/W upgrades:

1. Ensure hd5 (bootlv) is 32 MB with contiguous PPs [very important for migration].
2. For OS updates, initiate the change on the console. If there is any network disconnection during the change, you can reconnect to the console and get the menus back.
3. Ensure there is enough free filesystem space (/usr, /var, /) required for the change.
4. Have the patches/filesets pre-downloaded and verified.
5. Check/verify the repositories on the NIM/central server; check whether these repositories were tested/used earlier.
6. If there are two disks in rootvg, perform an alt disk clone onto one disk. This is the fastest and safest back-out method in case of any failure. Even if you perform the alt disk clone, take a mksysb as well (a sketch follows this list).
7. For a migration change, check if SAN storage (IBM/EMC, etc.) is used; if so, follow the procedure of exporting the VGs and uninstalling the sdd* filesets, then after the migration reinstalling the sdd* filesets, reimporting the VGs, etc.
8. Perform a preview (TL/SP upgrade) before you perform the actual change; see if any errors are reported in the preview (look for the keywords 'error' / 'fail'); read the tail/summary of the messages.
9. Even if the preview reports 'ok' at the header, still look through the messages and read the tail/summary of the preview.
10. If the preview reports any dependency/requisite filesets missing, have those downloaded as well.
11. Ensure you have enough free space in rootvg: a minimum of 1-2 GB free (TL upgrade/OS migration).
12. Ensure the application team has tested their application on the new TL/SP/OS to which you are upgrading the system.
13. If you have multiple PuTTY sessions open, name the sessions accordingly [PuTTY -> Window -> Behaviour -> window title]; this will help you quickly get to the right session, or else use PuTTYCM (PuTTY Connection Manager).
14. For TL upgrades, go TL by TL; shortcutting directly to the target TL can sometimes cause problems.
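
A hedged sketch of the clone-then-preview approach from points 6 and 8 (hdisk1 and the /updates directory are assumptions):

  alt_disk_copy -d hdisk1              # clone rootvg to the second disk as the back-out copy
  install_all_updates -d /updates -p   # preview only (-p); read the summary at the end
  # or via SMIT: smitty update_all, with "PREVIEW only?" set to yes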

What if you are crossing the change window?

 - Inform the relevant application teams and SDMs and get the window extended with proper approvals.
 - Raise an incident record to support the issue.

What if the change fails?

 - Inform the relevant application teams and SDMs.
 - Close the record with the facts.
 - Attend the change review calls for the failed changes.

Successful Change:

 - If possible, send the success status to the relevant parties with artifacts.
 - Update the change request with the relevant artifacts and close it.
