Platform: Sun Solaris 10
Database: 10g R2 (10.2.0.4)

Problem Description:
A problem was faced at a client site where one of the ARCH processes on the primary database would hang intermittently while archiving a redo log sequence to a remote standby. This stalled any further shipping of archive logs to the standby database. Managed recovery on the standby would then stall for want of new archive logs, compromising the availability of an up-to-date standby database in the event of a primary site failure. The error was reported about once a fortnight, on average.

Alert log shows:
ARC1: Evaluating archive log 1 thread 1 sequence 3087
ARC1: Beginning to archive log 1 thread 1 sequence 3087
Creating archive destination LOG_ARCHIVE_DEST_4: 'ABCD'
Fri Feb 19 07:14:32 2010
ARC0: Evaluating archive log 1 thread 1 sequence 3087
Fri Feb 19 07:14:32 2010
Thread 1 advanced to log sequence 3089
Fri Feb 19 07:14:32 2010
ARC0: Unable to archive log 1 thread 1 sequence 3087
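Since the hang shows up in the alert log as an "Unable to archive log" line, the detection step could in principle be scripted rather than left to a morning check. The following is only a minimal sketch in Python, assuming the alert log text has already been read into a string. The function name `find_archive_failures` is hypothetical and the pattern is derived solely from the excerpt above, not from any Oracle specification.

```python
import re

# Pattern based only on the alert-log excerpt in this post, e.g.
#   ARC0: Unable to archive log 1 thread 1 sequence 3087
# (assumption: other Oracle versions may word this line differently)
UNABLE_RE = re.compile(
    r"^(ARC\w+): Unable to archive log (\d+) thread (\d+) sequence (\d+)$"
)


def find_archive_failures(alert_log_text):
    """Return a (process, log group, thread, sequence) tuple per failure line."""
    failures = []
    for raw_line in alert_log_text.splitlines():
        match = UNABLE_RE.match(raw_line.strip())
        if match:
            process, group, thread, sequence = match.groups()
            failures.append((process, int(group), int(thread), int(sequence)))
    return failures


if __name__ == "__main__":
    sample = (
        "ARC1: Beginning to archive log 1 thread 1 sequence 3087\n"
        "Fri Feb 19 07:14:32 2010\n"
        "ARC0: Unable to archive log 1 thread 1 sequence 3087\n"
    )
    for failure in find_archive_failures(sample):
        print(failure)
```

A script like this could be run periodically against the tail of the alert log and wired to whatever alerting the site already uses. It only reports the symptom and does not touch the ARCH process.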
Trace File shows:
*** SESSION ID:(770.1) 2010-02-19 02:38:24.668
Maximum redo generation record size = 156160 bytes
Maximum redo generation change vector size = 150676 bytes
tkcrrsarc: (WARN) Failed to find ARCH for message (message:0x10)
tkcrrpa: (WARN) Failed initial attempt to send ARCH message (message:0x10)
*** 2010-02-19 07:14:32.108
Warning: log write time 760ms, size 1KB
*** 2010-02-19 11:07:22.313
Warning: log write time 1100ms, size 1KB
*** 2010-02-19 11:21:02.449
Warning: log write time 1120ms, size 1KB
*** 2010-02-19 13:12:55.537
Warning: log write time 1270ms, size 1KB
*** 2010-02-19 13:15:50.042
Warning: log write time 1140ms, size 1KB
*** 2010-02-19 14:58:26.613

On research, it was found that Oracle suggests killing the hung ARCH process that holds the lock on the archive log file. This releases the lock held by ARC1 on redo log sequence 3087, allowing the log to be archived by another ARCH process. A new ARCH process is spawned automatically. When implemented, this seemed to work: the newly spawned ARCH process was able to continue shipping archived logs to the standby database. Oracle had advised this solution for 9.2.0.6, but it worked in our case on 10g as well.

However, this approach had a serious shortcoming. It required manual intervention to detect and resolve the problem. If the error occurred during the night, it might go unnoticed until the next morning, seriously compromising the utility of the standby database.

Unsatisfied with this approach, we dug further into the issue and analyzed the other trace files generated while the ARCH process was hung. It was then that we observed the following entries in one of the trace files.

Second Trace File shows:
*** 2010-02-19 23:13:13.179
ABC: tkrsf_al_read: No mirror copies to re-read data
*** 2010-02-19 23:28:13.344
ABC: tkrsf_al_read: No mirror copies to re-read data
*** 2010-02-19 23:43:13.018
ABC: tkrsf_al_read: No mirror copies to re-read data
*** 2010-02-19 23:58:12.920
ABC: tkrsf_al_read: No mirror copies to re-read data
*** 2010-02-20 00:13:13.292
ABC: tkrsf_al_read: No mirror copies to re-read data
*** 2010-02-20 00:28:12.883
ABC: tkrsf_al_read: No mirror copies to re-read data
*** 2010-02-20 00:43:12.854
ABC: tkrsf_al_read: No mirror copies to re-read data
*** 2010-02-20 00:58:12.780
…………………….

Our research led to the conclusion that both trace files were being generated at the same time. Based on the above search, it was found that this is an issue specific to version 10.2.0.4. The suggested workaround is either to replace 'ARCH SYNC' with 'LGWR ASYNC' in the related log_archive_dest_n parameter or to apply patch 7136489 with OPatch, if it is available for the platform. However, the Oracle note did not talk about hanging of the ARCH process for a standby database. It only mentioned the above error entries in one of the trace files. Since the patch was available for our platform (Solaris), we applied it on both the primary and standby nodes. More than 3 months after applying the patch, the problem has not been reported again.

Summary:
Even though Oracle suggested the patch for a different purpose, the circumstances in our case encouraged us to try the solution. It worked perfectly, and the team was spared the effort of manually killing the ARCH process every time the problem occurred. Further, the given solution ensures better availability in case of failure.

References:
Metalink Doc ID: 364342.1
Metalink Doc ID: 748425.1
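As a footnote to the second trace file: the "No mirror copies to re-read data" entries in it are spaced roughly 15 minutes apart. Readers who want to check their own traces for the same recurring pattern could use something like the sketch below. It is an illustration only. The function name `marker_intervals` is hypothetical, and the timestamp format is assumed from the excerpts in this post (`*** YYYY-MM-DD HH:MM:SS.mmm`), with the message either on the same line as the timestamp or on the line after it.

```python
import re
from datetime import datetime

# Message text copied from the second trace file in this post.
MARKER = "tkrsf_al_read: No mirror copies to re-read data"

# Timestamp header as it appears in the excerpts (an assumed format).
TS_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def marker_intervals(trace_text):
    """Return the gaps, in minutes, between consecutive marker entries."""
    last_timestamp = None
    hits = []
    for raw_line in trace_text.splitlines():
        line = raw_line.strip()
        match = TS_RE.match(line)
        if match:
            last_timestamp = datetime.strptime(
                match.group(1), "%Y-%m-%d %H:%M:%S.%f"
            )
            # Keep whatever follows the timestamp on the same line, if anything.
            line = line[match.end():].strip()
        if MARKER in line and last_timestamp is not None:
            hits.append(last_timestamp)
    return [
        round((later - earlier).total_seconds() / 60, 1)
        for earlier, later in zip(hits, hits[1:])
    ]


if __name__ == "__main__":
    sample = (
        "*** 2010-02-19 23:13:13.179\n"
        "ABC: tkrsf_al_read: No mirror copies to re-read data\n"
        "*** 2010-02-19 23:28:13.344\n"
        "ABC: tkrsf_al_read: No mirror copies to re-read data\n"
    )
    print(marker_intervals(sample))
```

A steady interval suggests a periodic retry and not a one-off read error, which is the kind of detail worth including when searching the support notes.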