This is the final instalment of a four-part series covering my “top 10” performance strategies for Oracle databases. In part one, we looked at methodology, database and application design, and indexing. In part two, we covered the essential tuning tools, the SQL optimizer, and strategies for tuning SQL and PL/SQL. In the third instalment we looked at contention, memory management and IO optimization. In this final instalment we’ll consider the performance optimization of Oracle Real Application Clusters (RAC). RAC is central to Oracle’s grid architecture, and is a significant and important technological advantage for Oracle. I always advise Oracle DBAs to get familiar with RAC and to know how to get the most out of it. RAC performance optimization is a big topic and I can only provide an introduction in this article; you can find more information on each of these topics in my book Oracle Performance Survival Guide.
RAC is a shared-disk clustered database: every instance in the cluster has equal access to the database’s data on disk. This is in contrast to the shared-nothing architecture employed by other RDBMS clusters. In a shared-nothing architecture, each instance is responsible for a certain subset of data; whenever a session needs that data, the appropriate instance must be involved in serving it up. The main challenge in the shared-disk architecture is to establish a global memory cache across all the instances in the cluster: otherwise the clustered database becomes IO bound. Oracle establishes this shared cache via a high-speed private network referred to as the cluster interconnect. All the instances in a RAC cluster share access to datafiles on shared disk, though each has private redo logs and undo segments. Each instance has its own SGA and background processes, and each session that connects to the cluster database connects to a specific instance in the cluster.
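As a quick orientation, the instances that make up the cluster – and the hosts they run on – can be listed from any node. The query below is a minimal sketch against the standard GV$INSTANCE view (run it in SQL*Plus on any instance):

```
-- List each RAC instance along with its host and status
SELECT inst_id, instance_name, host_name, status
  FROM gv$instance
 ORDER BY inst_id;
```

Because the GV$ views aggregate data from every instance, a single connection is enough to see the whole cluster; the same pattern underlies most of the queries in this article.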
© Quest Software, 2010
Figure 1 High level RAC architecture

RAC will perform well, and scale well, if the following are true:

• The time taken to request a block across the interconnect (Global Cache requests) is much lower – say ten times less – than the time to retrieve a block from the disk. Global Cache requests are intended to avoid the necessity of a disk read, and sometimes the disk read must occur even after the Global Cache request.
• The overhead incurred through cluster activities is a small proportion of the total database time. We want our RAC database to be a database first, and a cluster second.
• The cluster is well balanced, or at least there are no overloaded instances in the cluster. Since so many RAC operations involve two or three instances, an overloaded instance might cause problems for its neighbours as well as itself.

Measuring Cluster overhead

We can see the overall contribution of cluster related waits in comparison to other high level time categories in the following query:
SQL> SELECT wait_class time_cat, ROUND ( (time_secs), 2) time_secs,
  2         ROUND ( (time_secs) * 100 / SUM (time_secs) OVER (), 2) pct
  3    FROM (SELECT wait_class wait_class,
  4                 SUM (time_waited_micro) / 1000000 time_secs
  5            FROM gv$system_event
  6           WHERE wait_class <> 'Idle'
  7             AND time_waited > 0
  8           GROUP BY wait_class
  9          UNION
 10          SELECT 'CPU',
 11                 ROUND ((SUM (VALUE) / 1000000), 2) time_secs
 12            FROM gv$sys_time_model
 13           WHERE stat_name IN ('background cpu time', 'DB CPU'))
 14   ORDER BY time_secs DESC;

Time category         TIME_SECS    PCT
-------------------- ---------- ------
CPU                       21554  43.45
Cluster                 7838.82  15.80
Other                   6322.23  12.75
Application             5077.09  10.24
System I/O              3387.06   6.83
User I/O                3302.49   6.66
Commit                     1557   3.14
Concurrency              371.59    .75
Network                   142.5    .29
Configuration             49.06    .10

As a rule of thumb, we might expect that cluster-related waits comprise less than 10% of total database time. Waits above 20% certainly warrant investigation.

Reducing Global Cache latency

The RAC architecture requires and expects that instances will fetch data blocks across the interconnect as an alternative to reading those blocks from disk. The performance of RAC is therefore going to be very sensitive to the time it takes to retrieve a block from the Global Cache, which we will call Global Cache latency.

Some documents or presentations suggest that Global Cache latency is primarily or exclusively interconnect latency: the time it takes to send the block across the interconnect network. Interconnect latency is certainly an important part of overall Global Cache latency, but it’s not the only part. Oracle processes such as the Global Cache Service (LMS) have to perform a significant amount of CPU-intensive processing each time a block is transferred, and this CPU time is usually at least as significant as any other factor in overall Global Cache latency. In certain circumstances non-CPU operations – such as flushing redo entries to disk – will also contribute to Global Cache latency.

To measure Global Cache latency, we use the wait interface as exposed by GV$SYSTEM_EVENT. The following query reports on average times for each of the Global Cache request types as well as single-block read time (for comparison):
SQL> SELECT event, SUM(total_waits) total_waits,
  2         ROUND(SUM(time_waited_micro) / 1000000, 2) time_waited_secs,
  3         ROUND(SUM(time_waited_micro) / 1000 / SUM(total_waits), 2) avg_ms
  4    FROM gv$system_event
  5   WHERE wait_class <> 'Idle'
  6     AND (   event LIKE 'gc%block%way'
  7          OR event LIKE 'gc%multi%'
  8          OR event LIKE 'gc%grant%'
  9          OR event = 'db file sequential read')
 10   GROUP BY event
 11  HAVING SUM(total_waits) > 0
 12   ORDER BY event;

                                          Total Time  Avg Wait
Wait event                         Waits      (secs)      (ms)
------------------------------ --------- ----------- ---------
db file sequential read          283,192       1,978      6.99
gc cr block 2-way                356,193         396      1.11
gc cr block 3-way                162,158         214      1.32
gc cr grant 2-way                141,016          25       .18
gc cr multi block request        503,265         242       .48
gc current block 2-way           325,065         227       .70
gc current block 3-way           117,913          93       .79
gc current grant 2-way            45,580          20       .44
gc current grant busy            168,459         296      1.76
gc current multi block request    91,690          42       .46

This example output provides reason for concern. The average wait for Global Cache consistent read requests (as shown by ‘gc cr block 2-way’ and ‘gc cr block 3-way’) is more than 1 millisecond, and more than 1/10th of the time for a db file sequential read. While the Global Cache is still faster than disk, it’s taking longer than we’d expect if the interconnect and RAC were fully optimized.

Tuning the interconnect

When Global Cache waits are high, we should first determine if the latency is primarily the result of interconnect network waits. The best way to determine the interconnect contribution to overall performance is to use the ping utility to measure latency independently of the Oracle stack. Ping packet handling is not identical to RAC packet handling, but if ping latency is high then you can confidently assume that network responsiveness is an issue.

In Oracle 10g the view X$KSXPIA shows the private and public IP addresses being used by the current instance. In Oracle 11g this information is available in the view GV$CLUSTER_INTERCONNECTS. The following query shows us the private interconnect IP address plus other identifying information for the current instance (this query must be run as SYS):
SQL> SELECT instance_number, instance_name, host_name,
  2         name_ksxpia network_interface, ip_ksxpia private_ip
  3    FROM x$ksxpia CROSS JOIN v$instance
  4   WHERE pub_ksxpia = 'N';

Inst # Host Name                 INSTANCE_NAME   Net IFace Private IP
------ ------------------------- --------------- --------- ------------
     3 melclul32.melquest.dev.me MELRAC3         eth1      192.168.0.12
       l.au.qsft

We can then ping the IP address from another node in the cluster to determine average latency. On a Linux system, we can use the “-s 8192” flag to set an 8K packet size so as to align with the block size of this Oracle database. On Windows the appropriate flag is “-l”:

$ ping -c 5 -s 8192 192.168.0.12
PING 192.168.0.12 (192.168.0.12) 8192(8220) bytes of data.
8200 bytes from 192.168.0.12: icmp_seq=0 ttl=64 time=0.251 ms
8200 bytes from 192.168.0.12: icmp_seq=1 ttl=64 time=0.263 ms
8200 bytes from 192.168.0.12: icmp_seq=2 ttl=64 time=0.260 ms
8200 bytes from 192.168.0.12: icmp_seq=3 ttl=64 time=0.265 ms
8200 bytes from 192.168.0.12: icmp_seq=4 ttl=64 time=0.260 ms

--- 192.168.0.12 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.251/0.259/0.265/0.020 ms, pipe 2

The ping output above indicates very low latency – about .25 ms. Aside from high latencies – as exposed by the ping command – interconnect issues can show up as “lost” or congested blocks. There are a few things we can do at the network level to optimize the interconnect:

• Use NIC bonding to aggregate the bandwidth of multiple network cards
• Use a faster protocol – maybe 10 Gigabit Ethernet or Infiniband
• Enable Ethernet “Jumbo” frames
• Increase the UDP packet size

High Global Cache latencies can also occur if the remote instance is very busy: often balancing the cluster (as outlined in the next section) is the solution.

Balancing the cluster

Achieving balance in a RAC configuration is important for scalability, manageability and performance. In an unbalanced cluster the following undesirable situations can arise:

• Sessions on busy instances get poor service time
• Sessions on idle instances wait for blocks from busy instances
• Benefits of adding new instances may not be realized
• Tuning is harder because each instance has different symptoms
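Session counts give a rough first indication of balance. The query below is a simple sketch using the standard GV$SESSION view; a heavily skewed count across instances is an early hint that the workload is not being spread evenly:

```
-- Count connected user sessions on each instance
SELECT inst_id, COUNT(*) AS user_sessions
  FROM gv$session
 WHERE type = 'USER'
 GROUP BY inst_id
 ORDER BY inst_id;
```

Session counts alone can mislead – a few busy sessions can outweigh many idle ones – so they should be confirmed against time-based measures such as DB time and CPU time.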
We can assess cluster balance fairly easily: the following query reports on CPU and DB time on each instance within the cluster since startup:

SQL> WITH sys_time AS (
  2     SELECT inst_id, SUM(CASE stat_name WHEN 'DB time'
  3                         THEN VALUE END) db_time,
  4            SUM(CASE WHEN stat_name IN ('DB CPU', 'background cpu time')
  5                THEN VALUE END) cpu_time
  6       FROM gv$sys_time_model
  7      GROUP BY inst_id )
  8  SELECT instance_name,
  9         ROUND(db_time/1000000,2) db_time_secs,
 10         ROUND(db_time*100/SUM(db_time) over(),2) db_time_pct,
 11         ROUND(cpu_time/1000000,2) cpu_time_secs,
 12         ROUND(cpu_time*100/SUM(cpu_time) over(),2) cpu_time_pct
 13    FROM sys_time
 14    JOIN gv$instance USING (inst_id);

Instance    DB Time   Pct of   CPU Time   Pct of
Name         (secs)  DB Time     (secs) CPU Time
-------- ---------- -------- ---------- --------
MELRAC3    3,444.99    23.41   1,010.23    14.44
MELRAC1    5,150.48    35.00   1,705.85    24.38
MELRAC2    6,119.30    41.59   4,278.96    61.17

In this example it is clear that MELRAC2 is being subjected to a disproportionate level of CPU load: if this is not addressed, increasing cluster workload will almost certainly lead to performance degradation as MELRAC2 becomes the bottleneck for the entire cluster.

Quest Software’s Spotlight on RAC – now available in the Toad DBA Suite RAC edition – probably has the most advanced RAC balance monitoring. Spotlight on RAC displays cluster balance from a number of perspectives and performs a statistical analysis to determine if the imbalance is systematic or due to short term random fluctuations.
Figure 2 Spotlight on Oracle RAC cluster balance

There are a number of techniques that you can try to balance your cluster database. Oracle provides client side, server side and application level load balancing facilities that you can tweak to get a better balance of workload across the cluster, and these can be either the cause and/or the cure of cluster imbalances. In particular, Services allow you to allocate workloads to specific instances within a cluster. Services can help you share a RAC cluster across multiple applications, some of which may have different service level objectives. Services serve two main purposes in RAC:

• By allocating more instances in the cluster to a specific service, we effectively allocate the service a bigger share of cluster resources.
• By partitioning certain types of workload to certain instances, we can reduce the amount of Global Cache traffic, since similar workloads are most likely to utilize similar data blocks.

Minimizing Global Cache traffic

As we saw earlier, Global Cache requests are integral to RAC and represent both the “cost” of the RAC architecture and the basis of its scalability. Avoiding a disk read by fetching a needed block from another instance prevents RAC databases from becoming IO bound. However, each Global Cache request adds overhead: it’s far better to find the data you want in the local buffer cache than to retrieve it from another instance.
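The workload-partitioning approach relies on services. As an illustration, a service can be confined to a subset of instances using srvctl; the database, service and instance names below are hypothetical:

```
$ srvctl add service -d RACDB -s reporting -r RACDB1,RACDB2 -a RACDB3
$ srvctl start service -d RACDB -s reporting
```

Sessions that connect via the reporting service will then run only on the preferred (-r) instances, falling over to the available (-a) instance if a preferred instance fails – keeping that workload’s hot blocks concentrated in a small number of buffer caches.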
To determine how often the database needs to make Global Cache requests, we can compare the number of blocks fetched across the interconnect with the total number of blocks accessed (e.g. the number of logical reads). The following query performs that calculation, as well as determining the ratio of physical to logical reads (yes, the notorious Buffer Cache Hit Ratio):

SQL> WITH sysstats AS (
  2    SELECT inst_id,
  3           SUM(CASE WHEN name LIKE 'gc%received'
  4               THEN VALUE END) gc_blocks_recieved,
  5           SUM(CASE WHEN name = 'session logical reads'
  6               THEN VALUE END) logical_reads,
  7           SUM(CASE WHEN name = 'physical reads'
  8               THEN VALUE END) physical_reads
  9      FROM gv$sysstat
 10     GROUP BY inst_id)
 11  SELECT instance_name, logical_reads, gc_blocks_recieved, physical_reads,
 12         ROUND(physical_reads*100/logical_reads,2) phys_to_logical_pct,
 13         ROUND(gc_blocks_recieved*100/logical_reads,2) gc_to_logical_pct
 14    FROM sysstats JOIN gv$instance
 15   USING (inst_id);

Instance     Logical   GC Blocks   Physical Phys/Logical GC/Logical
name           Reads    Received      Reads          Pct        Pct
-------- ----------- ----------- ---------- ------------ ----------
MELRAC3   15,353,366   1,730,614    438,882         2.86      11.27
MELRAC2  148,792,531   1,756,099  1,903,730         1.28       1.18
MELRAC1   21,730,311   1,882,438    330,471         1.52       8.66

Note how in the above example it’s the least busy instances (in terms of logical reads) that have the highest Global Cache/Logical request ratio: the less busy an instance is, the more likely it is that the blocks it needs are in the memory of another, more busy, instance.

If every instance in the cluster is fighting over the same set of “hot blocks” then we will see very high Global Cache traffic. We can attempt to reduce the amount of inter-instance traffic through one of the following techniques:

• Isolating workloads to a particular instance or groups of instances. We can do this through services configuration as discussed earlier.
• Isolating sessions that are likely to work on the same data. This is similar to isolating workloads, but instead of isolating specific transaction types, we isolate sessions that are likely to work on the same sets of data. Range or list partitioning the segments in conjunction with isolation of user populations can also be considered.
• Partitioning the segments with the highest levels of Global Cache activity. Hash partitioning can split up the hot blocks, hopefully reducing Global Cache contention for those blocks.
• Reverse key indexes can help relieve hot Global Cache index leaf and branch blocks.
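As a sketch of the last two techniques (all object names are hypothetical), hot blocks can be spread with hash partitioning, and a hot index can be rebuilt as a reverse key index:

```
-- Spread a hot table's blocks across 16 hash partitions
CREATE TABLE orders_hashed (
    order_id   NUMBER PRIMARY KEY,
    order_date DATE,
    status     VARCHAR2(10))
 PARTITION BY HASH (order_id) PARTITIONS 16;

-- Reverse the key bytes so that consecutive key values
-- land in different leaf blocks
ALTER INDEX order_pk_ix REBUILD REVERSE;
```

Both techniques trade some single-instance efficiency (range scans in particular suffer with reverse key indexes) for reduced inter-instance contention, so they are best applied only to segments that the Global Cache statistics identify as hot.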
Summary

The most significant difference between a RAC and a single instance database is the use of Global Cache requests to fetch blocks from other instances in the cluster rather than to read them from disk. RAC will scale and perform well, providing that:

• Global Cache latency is much less than disk read latency. Achieving this involves both optimizing the interconnect network, and making sure that no instances get too busy to respond to Global Cache requests in a timely manner.
• The rate of Global Cache requests is reasonable. In particular, “hot” blocks that are in constant contention across the cluster should be eliminated.
• The cluster is reasonably well balanced. In particular, no instance should be overloaded: an overloaded instance is likely to cause performance problems both for itself and for other instances in the cluster.

I hope you’ve found this article – and the top 10 tuning strategies series as a whole – useful. I’ve generally only been able to scratch the surface of the myriad of tuning issues and opportunities presented by Oracle: more detail can be found in my book Oracle Performance Survival Guide or in the Oracle documentation.

If you like the approach we’ve taken in these articles, then you’ll probably like what we’ve provided for you in the Toad suites. Quest products – such as those found in the Toad DBA suite – are designed to help you maximize your efficiency and effectiveness when working through Oracle performance issues. Spotlight on Oracle RAC is now available within the Toad DBA suite for Oracle RAC edition, and implements all of the RAC tuning ideas outlined in this article.

Figure 3 Spotlight on Oracle RAC
Guy Harrison is a Director of Research and Development at Quest Software, is an Oracle ACE, and has over 20 years’ experience in application and database administration, performance tuning and software development. Guy is the author of Oracle Performance Survival Guide (Prentice Hall, 2009) and MySQL Stored Procedure Programming (O’Reilly, with Steven Feuerstein), as well as other books, articles and presentations on database technology. Guy is the architect of Quest’s Spotlight® family of diagnostic products and has contributed to the development of other Quest products, such as Toad®. Guy can be found on the Internet at www.guyharrison.net, on email at guy.harrison@quest.com, and is @guyharrison on twitter.