where the servers are, which gateways are used, what naming conventions the databases follow, where to concentrate, where to look up a particular error and how to resolve it, what the escalation steps are, and whom to email about an issue. He also told me where the password file (database.kdb) is stored and what its password is. He explained the purpose of each database and had me identify which are the production databases and which are the development databases. Duane keeps me updated on the work done on the databases, for example patching for the DST issues, the weekly maintenance and recycling the databases. Duane sent me many documents with instructions on how to get into the tool, how to monitor, and so on. He explained what the Foglight messages are and which ones to keep and which ones to delete. Duane usually drives, and sometimes I drive, digging into the issues under his guidance. What I have described so far is just the essence of our discussions.

First of all, we need to make a VPN connection to the DecisionOne VPN server to access the servers at DecisionOne. The following tasks need to be performed on a daily basis:
1. Foglight Monitoring: Monitor the health of the production databases using the Foglight Operations Console (FOC).
2. Control-M Monitoring: Monitor the daily backup scripts scheduled within Control-M.
3. BCTOP Database Status: Confirm that the BCTOP database data load has finished successfully.
4. Crystal Reports: Send the Crystal Reports scheduled on the reports server.

Let us look at the above tasks in detail.

1. Foglight Monitoring:
Monitoring is mainly needed for the production servers, which run Oracle databases. Duane sent me a document that describes in detail the messages for monitoring from the Foglight tool.
For example, it explains which messages need attention and which messages should be cleared from the alert area. We noticed that one alert message (Checkpoint Not Complete) came every day from the development database d1mid01 on server dsdbfra002. To resolve this error, we first planned to change a parameter: on 14th Feb we changed the init parameter LOG_CHECKPOINT_INTERVAL to a high value and restarted the database with the new parameter file. The alert message continued to appear, so we planned to add log groups and increase the size of the existing redo log files. On 21st Feb we increased the redo log file size from 1M to 20M and added 3 redo log groups, each with 2 files of 20M, to resolve the issue.
Another warning message also appears frequently: “Log files are not copied more than one location”. Duane said that DecisionOne does not have a policy of copying the log files to more than one location, so on 21st Feb we removed the mail-sending option from the agent's Ora_Archive_Multiplex rule. We updated the file “Fog light Product Change Log” in the shared directory with the changes made to the tool. We need to check the health of the databases on a daily basis, and for this we use FOC.
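The redo log change from 21st Feb can be expressed as Oracle DDL. The Python sketch below only prints the statements for review; it does not connect to the database. The directory /u01/oradata/d1mid01, the file-naming scheme and the starting group number 4 are hypothetical placeholders, not the actual d1mid01 layout. Note that Oracle cannot resize an existing online redo log file in place, so "increasing the size" is normally done by adding new, larger groups and dropping the old 1M groups once they are INACTIVE.

```python
"""Print Oracle DDL for adding multiplexed redo log groups (review only)."""

# Hypothetical values: replace with the real d1mid01 redo log location and
# the next unused group number (check V$LOG) before using any statement.
REDO_DIR = "/u01/oradata/d1mid01"
FIRST_NEW_GROUP = 4


def add_logfile_group_ddl(group: int, members: list[str], size_mb: int) -> str:
    """Return one ALTER DATABASE ADD LOGFILE GROUP statement."""
    quoted = ", ".join(f"'{member}'" for member in members)
    return f"ALTER DATABASE ADD LOGFILE GROUP {group} ({quoted}) SIZE {size_mb}M;"


def plan_new_groups(first_group: int, count: int, members_per_group: int,
                    size_mb: int, directory: str) -> list[str]:
    """Build one statement per new group, each with members suffixed a, b, ..."""
    statements = []
    for group in range(first_group, first_group + count):
        members = [
            f"{directory}/redo{group:02d}{chr(ord('a') + i)}.log"
            for i in range(members_per_group)
        ]
        statements.append(add_logfile_group_ddl(group, members, size_mb))
    return statements


if __name__ == "__main__":
    # Same shape as the 21st Feb change: 3 new groups, 2 files each, 20M.
    for statement in plan_new_groups(FIRST_NEW_GROUP, 3, 2, 20, REDO_DIR):
        print(statement)
```

Before dropping an old group, a DBA would typically run ALTER SYSTEM SWITCH LOGFILE (and ALTER SYSTEM CHECKPOINT) and confirm in V$LOG that the group's status is INACTIVE.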
1. Foglight Monitoring: Foglight Install
DBA Script - Foglight monitoring.doc
1- Review critical errors (a published list has not yet been compiled; one may be developed as we go through the review of this document).
2- Review functionality.
3- Monitoring is needed only for production databases.
4- When an issue is worked, document the resolution and send an email to firstname.lastname@example.org and david.baker@DecisionOne.com.
5- To get the description of an error, right-click (rc) on the message and click ‘show details’.
6- To clear the error (like the one above), rc the error and select ‘clear event’.
7- To look at an agent’s rules (like oracle_FS80PR), highlight the agent, rc, select ‘edit’, and click on ‘rules’.
8- This screen shows all of the rules defined.
9- Double-click on the Ora_Can_Connect rule to see how the rule is defined. You will notice the different warning levels: Normal / Warning / Critical / Fatal. Many of these messages have been modified specifically for D1.
10- Go back to the oracle_FS80PR agent, double-click / actions / ASPs.
11- These are the Automated Startup Parameters for the agent, and are specific to the agent.
12- Close that out. On the main Foglight Operations Console, click ‘Tools’, then ‘Foglight Registry’.
13- Each of the areas has information to maintain, but we normally work in the ‘World’ area.
14- I showed you how to change the DBA_PAGER variable.
15- Here you can see that the ‘Production’ definition, which is under the ‘WORLD’ directory, defines each of our production Unix servers (located in Auburn Hills). This allows us to override the CPUWarning, CPUFatal and CPUCritical variables for the production servers only.
16- On the Development side, we override the value of DBA_PAGER with an email ID (an invalid one). As a result, no pages are sent to the pager for the development servers.
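The effect of steps 15 and 16 can be modeled as a simple lookup: a value set on a definition (Production or Development) shadows the WORLD-level value, and CPU readings are graded against whichever CPUWarning / CPUCritical / CPUFatal values apply. This is a minimal Python sketch of that idea only, not the Foglight registry API; all threshold values and addresses are hypothetical.

```python
"""Model of WORLD defaults shadowed by definition-level overrides.

Not the Foglight registry API; every value here is a hypothetical placeholder.
"""
from __future__ import annotations

WORLD = {
    "DBA_PAGER": "dba-pager@example.com",
    "CPUWarning": 70,
    "CPUCritical": 85,
    "CPUFatal": 95,
}

# Definitions under WORLD that override selected variables for their servers.
OVERRIDES = {
    "Production": {"CPUWarning": 80, "CPUCritical": 90, "CPUFatal": 98},
    # An invalid address means development alerts never reach the pager.
    "Development": {"DBA_PAGER": "no-page@invalid"},
}


def resolve(variable: str, definition: str) -> int | str:
    """Return the definition's override if it has one, else the WORLD value."""
    return OVERRIDES.get(definition, {}).get(variable, WORLD[variable])


def cpu_level(cpu_pct: float, definition: str) -> str:
    """Grade a CPU reading as Normal / Warning / Critical / Fatal."""
    for level, variable in (("Fatal", "CPUFatal"),
                            ("Critical", "CPUCritical"),
                            ("Warning", "CPUWarning")):
        if cpu_pct >= float(resolve(variable, definition)):
            return level
    return "Normal"


if __name__ == "__main__":
    print(cpu_level(88, "Development"))  # WORLD thresholds apply: Critical
    print(cpu_level(88, "Production"))   # Production override applies: Warning
    print(resolve("DBA_PAGER", "Development"))
```

The same reading lands at a different level depending on the definition, which is exactly why the production servers get their own CPU thresholds.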
2. Control-M Monitoring:
Control-M - Xsession Procedures Used to Access the Software
DBA Script - Control-m monitor.doc
Hummingbird Exceed Session
1. Configure the X terminal (Xterm) software. I use Hummingbird Exceed, which runs on my laptop.
2. Double-click the software to start up a session.
Hummingbird Xsession Control-M Logon Commands
1. On SMF001, start a Unix session.
2. Log in with: ecs
3. Password: xxxx
4. The DISPLAY variable must be set. If this system parameter is not set, you will get a "Cannot open display" error.
NOTE: The user profile for most Unix accounts has been changed to set the DISPLAY variable automatically, so for the most part step 4 is done for you. If an error message says that DISPLAY has not been set, do the following: find the IP address of the laptop / PC you are on (you can obtain it by running ipconfig at a command prompt). Then, in the smf001 session, at the prompt, enter "export DISPLAY=xxx.xxx.xxx.xxx:0.0", where xxx.xxx.xxx.xxx is your ISP-assigned TCP/IP address.
5. At the prompt, enter: root_menu
6. Enter the ECS Username: ecs
7. Enter the Password: password
8. Select Option 1 from the ENTERPRISE/CS ROOT MENU.
9. The Enterprise Controlstation menu appears (no action is needed on this menu).
10. When the ‘Load Net’ menu appears, click the middle ‘LOAD’ icon to open it.
11. When the ‘SHOW’ menu appears, click the upper left-hand icon to close it; it is not needed.
12. The ECS gui (Enterprise Controlstation Network View) display window can be resized by dragging the sides of the window.
13. Here is the window size that focuses on the active job list (jobs loaded for today’s schedule).
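Step 4 can be partly scripted on the laptop. The Python sketch below, assuming Windows ipconfig output in the usual "IPv4 Address. . . : a.b.c.d" layout (or the older "IP Address" label), pulls the IPv4 addresses from that output and builds the export line to type into the smf001 session. The function names are hypothetical helpers; when several adapters are listed (for example the DecisionOne VPN adapter and a home network), choosing the right address is still up to you.

```python
import ipaddress
import re

# Matches "IPv4 Address. . . : 192.0.2.15" (newer Windows) and
# "IP Address. . . : 192.0.2.15" (older Windows) lines in ipconfig output.
_ADDRESS_LINE = re.compile(r"IP(?:v4)? Address[ .]*:\s*([\d.]+)")


def ipv4_addresses_from_ipconfig(text: str) -> list[str]:
    """Return every IPv4 address found in ipconfig output, in order."""
    return [match.group(1) for match in _ADDRESS_LINE.finditer(text)]


def display_export_command(laptop_ip: str, display: int = 0, screen: int = 0) -> str:
    """Build the 'export DISPLAY=...' line to type at the smf001 prompt."""
    # Raises ValueError on a malformed address instead of producing a DISPLAY
    # value that would later fail with "Cannot open display".
    address = ipaddress.IPv4Address(laptop_ip)
    return f"export DISPLAY={address}:{display}.{screen}"


if __name__ == "__main__":
    sample = (
        "Ethernet adapter Local Area Connection:\n"
        "   IPv4 Address. . . . . . . . . . . : 192.0.2.15\n"
        "   Subnet Mask . . . . . . . . . . . : 255.255.255.0\n"
    )
    for ip in ipv4_addresses_from_ipconfig(sample):
        print(display_export_command(ip))
```

Validating the address before typing it avoids chasing a "Cannot open display" error that is really a typo.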
Control-M – Displaying a Job
1. Drill down to the job detail level in Control-M.
2. Right-click on the job. On the intermediate menu, select ‘Details’. This view shows the detail of the current job from the active job file (a scheduled job).
3. Back on the Enterprise Controlstation Network View screen, right-click and click on ‘View Sysout List’. This screen shows the list of job output from the scheduled job. There are multiple entries if the job has been rerun.
Double-click on this entry to review the output of this backup job.
4. To rerun a job - after review of the sysout, and correcting any errors, go back to the ECS Display. Right-click the job and click RERUN. The job will be re-submitted under the job owner (oracle8i in this example).
Control-M - Forcing a Job to Run
From the first ECS Menu, click Scheduling, then Scheduling Definitions
Double-click the TABLE definition you want.
Highlight the job you want to rerun.
Select Force from the menu.
Control-M then comes back with an intermediate screen. Click “Confirm” to run the job.
The job will show the following summary.
Back in the Enterprise Controlstation Network View window, right-click the job that was resubmitted and select ‘Why’. An error message about a missing condition is OK.
Highlight this condition and click ‘Add Prerequisite Condition’. When the intermediate screen comes up, click ‘Confirm’.
Close the Why screen. The job should now be running; yellow indicates that the job is running.
In the Enterprise Controlstation Network View window, right-click the item and select Free. The selected job should turn yellow and execute.
Color indication:
Green – completed
Yellow – executing
Red – error, needs to be fixed
Grey – awaiting resource; right-click the job and select ‘WHY’
Blue – awaiting resource; right-click the job and select ‘WHY’
Resolve the WHY condition if necessary.
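The color legend and the WHY / Add Prerequisite Condition / Free steps follow one rule: a job runs only when nothing is blocking it. The Python sketch below is a deliberately simplified model of that rule, not a Control-M interface; the job name "nightly_backup" and the condition name "backup_ready" are hypothetical.

```python
"""Simplified model of why a Control-M job waits (not a Control-M API)."""
from __future__ import annotations

from dataclasses import dataclass

# Color legend from the notes above.
COLOR_MEANING = {
    "Green": "completed",
    "Yellow": "executing",
    "Red": "error, needs to be fixed",
    "Grey": "awaiting resource",
    "Blue": "awaiting resource",
}


def needs_why(color: str) -> bool:
    """Grey and Blue jobs are waiting: right-click them and run WHY."""
    return COLOR_MEANING.get(color) == "awaiting resource"


@dataclass
class Job:
    name: str
    in_conditions: set[str]  # prerequisite conditions the job waits for
    held: bool = False


def why(job: Job, existing_conditions: set[str]) -> list[str]:
    """List the blocking reasons, roughly what the WHY screen reports."""
    reasons = [f"Missing condition: {c}"
               for c in sorted(job.in_conditions - existing_conditions)]
    if job.held:
        reasons.append("Job is held: right-click and select Free")
    return reasons


if __name__ == "__main__":
    conditions: set[str] = set()
    job = Job("nightly_backup", {"backup_ready"}, held=True)
    print(why(job, conditions))     # both blockers are listed
    conditions.add("backup_ready")  # 'Add Prerequisite Condition' + 'Confirm'
    job.held = False                # 'Free'
    print(why(job, conditions))     # empty list: nothing blocks the job
```

In this model, an empty WHY list corresponds to the job turning yellow and executing.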