
Reason and Prerequisites

All too often in recent years, storage subsystems have been sized by storage volume rather than by I/O operations per second (IOPS). Over the last decade the data volume that fits on a single disk has grown rapidly, but the time needed for a single block read from disk has improved only slowly. Even in 2011 we need to assume that a rotating disk with 15K RPM delivers only around 150-180 random reads of 8K per second; spindles with 10K RPM deliver proportionally less. As a result, storage deployments often need to be sized primarily by IOPS and only to a lesser degree by storage volume. Due to the workload characteristics of SAP applications, the RDBMS underneath an SAP application can produce significantly high volumes of IOPS. A lack of throughput on the storage side always expresses itself as increased I/O latency, and high I/O latencies in turn severely impact the performance of your SAP application. Hence it is necessary to:
- Dimension the storage backend for a certain number of IOPS as well
- Keep I/O latencies within certain ranges
- Check and monitor I/O latency frequently, especially if performance issues frequently occur with the SAP application
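As a rough illustration of sizing by IOPS rather than by volume, the sketch below compares how many spindles a workload would need under each criterion. It uses the 150-180 random 8K reads per second quoted above for a 15K RPM disk. The function names, the 4,000 IOPS target, the 2 TB database size and the 300 GB disk capacity are illustrative assumptions, and RAID overhead and SAN caching are deliberately ignored.

```python
import math

# Per-spindle range quoted in this note for a 15K RPM disk (random 8K reads/sec).
IOPS_PER_15K_SPINDLE_LOW = 150
IOPS_PER_15K_SPINDLE_HIGH = 180


def spindles_for_iops(target_iops: int, iops_per_spindle: int) -> int:
    """Disks needed to serve target_iops, ignoring RAID and cache effects."""
    if target_iops < 0 or iops_per_spindle <= 0:
        raise ValueError("target_iops must be >= 0 and iops_per_spindle > 0")
    return math.ceil(target_iops / iops_per_spindle)


def spindles_for_volume(volume_gb: float, disk_capacity_gb: float) -> int:
    """Disks needed to hold volume_gb, ignoring RAID overhead."""
    if volume_gb < 0 or disk_capacity_gb <= 0:
        raise ValueError("volume_gb must be >= 0 and disk_capacity_gb > 0")
    return math.ceil(volume_gb / disk_capacity_gb)


if __name__ == "__main__":
    target_iops = 4000       # hypothetical peak random-read rate
    volume_gb = 2000         # hypothetical database size
    disk_capacity_gb = 300   # hypothetical disk capacity

    by_volume = spindles_for_volume(volume_gb, disk_capacity_gb)
    by_iops_worst = spindles_for_iops(target_iops, IOPS_PER_15K_SPINDLE_LOW)
    by_iops_best = spindles_for_iops(target_iops, IOPS_PER_15K_SPINDLE_HIGH)

    print(f"Spindles needed for volume: {by_volume}")
    print(f"Spindles needed for IOPS:   {by_iops_best} to {by_iops_worst}")
```

With these assumed numbers the IOPS requirement, not the storage volume, determines the number of disks, which is the situation this note describes.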

Solution
1. What are our I/O latency targets?
Analyzing the I/O pattern of SQL Server under SAP workload and the way SQL Server works, we need to look at two critical I/O paths:
- Reading from the SQL Server data files: High I/O latency in this path will slow down query execution. Especially where queries need to read hundreds or thousands of pages and the data can't be found in the SQL Server buffer pool, I/O latency will have a massive impact on the performance of such a query.
- Writing into the SQL Server transaction log: High I/O latency in this path will delay commits, extend the duration of locks held and, with that, dramatically impact concurrency. As a side effect, work processes on the SAP application will remain occupied for a longer time and response times for the users will increase.

Looking at the read and write sizes in these two I/O paths, we usually see:
- Predominantly 8K random I/O patterns reading from data files. Exceptions usually are SQL Server systems under SAP BW, where a lot of 64K random reads might also show up and, rarely, 256K reads will emerge. Only during backups against disk backup devices will 1MB sized reads be observed.
- Writes into the log files, which usually are somewhere between 10K and 60K. Depending on the current SAP functionality and workload, other numbers of bytes read and written can be found as well. Only in exceptional cases do we see writes smaller than 10K.
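Because the I/O sizes listed above differ by more than a factor of 100, the same IOPS rate translates into very different bandwidth. The sketch below is a plain arithmetic illustration and not part of any SAP or SQL Server tooling. It converts IOPS and I/O size into MB/sec and shows how many requests per second of each size would fill a link that sustains roughly 100 MB/sec, the figure this note quotes further down for a single FC card. The function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# I/O sizes named in this note: data file reads and backup reads.
IO_SIZES = {
    "8K random read": 8 * KIB,
    "64K random read (SAP BW)": 64 * KIB,
    "256K read (rare)": 256 * KIB,
    "1MB backup read": 1 * MIB,
}


def throughput_mb_per_sec(iops: float, io_size_bytes: int) -> float:
    """Bandwidth in MB/sec produced by `iops` requests of `io_size_bytes` each."""
    return iops * io_size_bytes / MIB


def iops_to_saturate(link_mb_per_sec: float, io_size_bytes: int) -> float:
    """Requests per second of a given size that fill a link of the given bandwidth."""
    return link_mb_per_sec * MIB / io_size_bytes


if __name__ == "__main__":
    link_mb_per_sec = 100  # assumed sustained link throughput
    for label, size in IO_SIZES.items():
        limit = iops_to_saturate(link_mb_per_sec, size)
        print(f"{label}: about {limit:.0f} requests/sec fill {link_mb_per_sec} MB/sec")
```

Small 8K reads are therefore mostly limited by the number of spindles, while large BW scans and backup reads can reach the bandwidth limit of a link at a fairly modest request rate.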

In order to perform well, SQL Server should ideally experience the following I/O latencies with this type of read and write workload:
- Reading from the data files: <=10ms is the optimal zone; the lower the better. Latency of up to 15ms is hardly noticeable when only a page here and there needs to be read. However, when massive amounts of data are scanned from disk, latencies in the range of 10-20ms can already result in noticeable performance degradation. Latencies >20ms will certainly result in noticeable performance degradation for all types of reads from disk and will severely impact scans from disk, which can occur massively with SAP BW.

- Writing into the transaction log: In times of large SAN caches, writes should show latencies of <=5ms, which represents the optimal zone. Latencies <=10ms are not yet critical under typical SAP workload and usually don't show impact. Latencies >10ms should definitely trigger investigations since the impact might become noticeable. Latencies >20ms for writing into the transaction log will impact the work of the SAP application considerably.

Remember that we are talking about average values. Hence it is normal that an individual I/O operation here and there takes longer, or that some shorter periods of time exceed the averages. However, one should make sure that these optimal latencies are met during the times of highest workload or the most time critical processing. The best daily average of e.g. 6ms in reading from SQL Server's data files is worth nothing if the average in the 30min per day where the most time critical business is running doesn't stay within the optimal zones and as a result slows down the time critical processing.

There are no special considerations or latency ranges which are acceptable due to storage replication. On the contrary, when using storage replication, one should try to configure it in a way and over a distance such that these latencies can be kept in the optimal zone as defined above.

2. How do we measure or monitor the I/O latencies SQL Server is experiencing?
There are two methods of measuring I/O performance. One method is to use what DBACockpit has to offer; the other is to use Windows Performance Monitor directly on the hosting operating system. We will describe both methods here. SAP in their DBACockpit implementation uses a SQL Server DMV which we will name in more detail. SQL Server, also for internal reasons, measures the latency of every single I/O operation. SQL Server performs these measurements on a per-file basis.

a) SAP DBA Cockpit
At the startup page of DBACockpit, in the left-hand pane go to 'Performance' --> 'IO Performance'. A list will appear which shows the key performance indicators for each data file and log file of the SAP application database. The values are calculated based on SQL Server's DMV sys.dm_io_virtual_file_stats. The view also allows looking at performance data of the tempdb database (tempdb performance is largely irrelevant for the average SAP system, except those based on SAP BW).

The interesting columns representing latencies are 'ms/Op' (IO Stall ms/Operation), which gives the average latency for read and write operations combined, and the next two columns 'ms/Read' (IO Stall ms/Read Request) and 'ms/Write' (IO Stall ms/Write Request), which give the average latency for reads and writes separately. Unlike in Windows Performance Monitor, the values are presented in milliseconds and hence are easily comparable to our latency target recommendations above.

It is very important to note that the numbers initially displayed are calculated on SQL Server's accumulation since SQL Server was started. This means those numbers give a great idea of what the general average was over the last weeks or months since SQL Server has been up, but they don't give a good overview of a specific time period in the past.

The 'IO Performance' view also allows immediate measurements. At the top of the view there is a button 'Reset' which resets the values at display (the values in the SQL Server DMV). After 'Reset' is pressed, a new button 'Since Reset' shows up and accordingly allows looking at the I/O performance values since the reset. You will now get a new snapshot of the last few minutes of I/O performance. The button 'Current' will display values since the last snapshot executed by the SAP Database Collector.

Other conclusions one can draw from the data displayed in this area are:
- Balance of the data files. The expectation is that the SAP application was deployed with a certain number of data files of the same size. The expectation also is that the files were grown over time in equal amounts (see this paper on the optimal deployment of SAP Netweaver applications on SQL Server: http://www.sdn.sap.com/irj/sdn/mss?rid=/library/uuid/4ab89e84-0d01-0010-cda282ddc3548c65 ). As a result one should see that the I/O load expressed in the columns 'KB Rd/sec' and 'KB Wrt/sec' is roughly the same for each of the data files. If this is not the case, we are looking at an imbalance of load across the data files, which to a degree might explain differences in latencies between the files. Because SQL Server stripes data in 64K chunks over the different data files, the data file with the highest latency usually impacts the general throughput dramatically. Unbalanced data files will also make the correction of high latencies more difficult and complex.
- Sizes of I/Os. The columns 'KB Rd/Req' and 'KB Wrt/Req' show the average read and write size issued by SQL Server. However, keep in mind that some of those values might be a bit on the higher side since the reads of SQL Server backups are counted as well. Since those reads can be as large as 1MB/request when backing up against disk devices, the value for 'KB Rd/Req' might appear high.
- The number of I/Os against each database file. The columns 'Reads/sec' and 'Wrt/sec' show the number of IOPS against each data and log file. As we expect the same workload against each data file, the IOPS values against each data file should be roughly the same.

b) Performance History
In order to look at the I/O performance of certain time periods in the past, the SAP monitoring framework collects a snapshot of the values in sys.dm_io_virtual_file_stats every 20min. These snapshots are saved and can be used to look at the I/O performance of past days and to pick certain time periods out of the past few days. They also allow looking at the individual performance within the 20 minutes covered by a snapshot. To get to those snapshots in DBACockpit, go to the left-hand pane and choose 'Performance' --> 'History' --> 'Database Collector Data'. In this functionality, use the category 'Virtual Filestats (File)' to view the history of the IO counters in the columns beginning with 'IOStall...'. You can individually select columns from this view and graph them in order to have a visual representation of the I/Os over time. Looking at a range of those time periods and having a graph drawn, it might become obvious that despite the latency looking good on average, it looks very bad in some critical time periods.

Another place within SAP where the latency of I/O operations is displayed is the CCMS monitor (RZ20), under the items 'SAP SQL Server Monitor' and 'SQL Server'. Again, these values are built on the SQL Server DMVs. As mentioned, the base data used by the DBACockpit logic comes out of SQL Server's DMV sys.dm_io_virtual_file_stats.

3. Windows Performance Monitor
Windows Performance Monitor is a more generic tool to measure and monitor a lot of different angles of hardware and software. One area it measures is I/O performance data around disk volumes. To monitor the disk performance counters it usually is best to look at the 'LogicalDisk' performance counter group. With the storage systems of today, physical disks usually are part of a logical volume administered by a SAN or NAS device. This means physical disks and their performance data might or might not be accurately displayed to Windows. Therefore it doesn't make sense to check the performance counters under 'PhysicalDisk' these days. The key counters to observe under 'LogicalDisk' are:
- Avg. Disk sec/Read
- Avg. Disk sec/Write
- Avg. Disk Bytes/Read
- Avg. Disk Bytes/Write
- Disk Read Bytes/sec
- Disk Write Bytes/sec

One might be surprised not to find the Disk Queue Length counters listed above. But the goal is to have good latency numbers, which are expressed by Avg. Disk sec/Read and Avg. Disk sec/Write. If those two values are in the regions defined as good earlier in this note, the queues on the disks can be sky high and are with that meaningless. The other counters, besides the two counters giving disk latency numbers, are listed to achieve parity with the data you can get out of DBA Cockpit.

4. RZ20 and EarlyWatch Report Rating Criteria on I/O latency
Please be aware that the rating values which are categorized in RZ20 and can trigger an alert are different from the values we gave as guidance in this OSS note. The reason is that RZ20 takes the real-time actual data and hence can easily show much higher short-term values than the ones we discussed here as longer term averages for time critical periods. Also, EarlyWatch reports so far were based on different values which indicate I/O latency only indirectly. Therefore values shown in EW reports might differ a bit and might also indicate other categories. We will adapt the EW reports over time.

5. What can I do if I/O latency is high and impacts my performance?
Independent of whether such a situation is developing over time or whether it happens here and there, the areas to look into are:
- Is there an increase or accumulation of workload which simply has the potential to drive the I/O subsystem to its limits?
- Is such an increase or accumulation of workload incidental, is it systematic and triggered by special periodical processing, or is it simply the continuing tendency triggered by onboarding of more users and/or functionality?
- Is the workload optimally supported by indexes on the DBMS so that unnecessary I/O activity is avoided?
- Is the cache hit ratio of SQL Server's buffer pool sufficiently high (ideally >98%), and with it the memory allocated to SQL Server, so that the storage is not overburdened? Keep in mind that a reduction of the cache hit ratio from 99% (1 out of 100 pages looked up needs to be read from disk) to 98% (2 out of 100 pages looked up need to be read from disk) already doubles the IOPS rate against the storage.
- Are there enough rotating spindles supporting the number of IOPS which are asked of the system? Keep the rough limits of those devices named earlier in mind. Don't be fooled by large SAN/NAS caches. Those at best show cache hit rates of 30-40% and are not comparable to the DBMS cache in their efficiency. As a general rule, Fiber Channel disks within SAN/NAS devices are still superior in performance to SATA/FATA equipped SAN/NAS devices as of the beginning of 2011.
- Are the components in the storage path balanced? Are there components which might act as a bottleneck? One would look at whether enough throughput can be achieved through the iSCSI or Fiber Channel links. E.g., one FC card of 1 Gbit/sec will deliver only around 100MB/sec throughput, similar to a 1-GBit Ethernet card.
- Is synchronous SAN/NAS replication adding latency in the I/O path, based on the distance which needs to be covered and the algorithms of replication?

In general, the best path to improving your I/O performance is to involve your storage hardware partner. SAP and Microsoft can only provide you these counters to help identify when I/O is becoming a bottleneck, but we cannot provide the detailed analysis and recommendations on how to improve your performance via storage tuning.
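The per-file latency figures discussed in this note can also be reproduced by hand from two snapshots of the cumulative counters in sys.dm_io_virtual_file_stats. The sketch below is a minimal illustration under stated assumptions and not necessarily how DBACockpit computes its columns. The field names mirror the DMV columns num_of_reads, io_stall_read_ms, num_of_writes and io_stall_write_ms. The FileStats container, the function names, the rating labels and the sample numbers are hypothetical. The thresholds are the latency targets given in this note.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FileStats:
    """Cumulative counters for one database file at one point in time."""
    num_of_reads: int
    io_stall_read_ms: int
    num_of_writes: int
    io_stall_write_ms: int


def _avg_ms(stall_delta: int, ops_delta: int) -> Optional[float]:
    # The counters accumulate since SQL Server start, so a negative delta
    # means the snapshots straddle a restart and cannot be compared.
    if stall_delta < 0 or ops_delta < 0:
        raise ValueError("counters went backwards between snapshots")
    return stall_delta / ops_delta if ops_delta else None


def interval_latency_ms(
    older: FileStats, newer: FileStats
) -> Tuple[Optional[float], Optional[float]]:
    """Average ms/Read and ms/Write for the interval between two snapshots."""
    ms_read = _avg_ms(
        newer.io_stall_read_ms - older.io_stall_read_ms,
        newer.num_of_reads - older.num_of_reads,
    )
    ms_write = _avg_ms(
        newer.io_stall_write_ms - older.io_stall_write_ms,
        newer.num_of_writes - older.num_of_writes,
    )
    return ms_read, ms_write


def rate_data_file_read(ms: float) -> str:
    """Rate an average data file read latency against the targets of this note."""
    if ms <= 10:
        return "optimal"
    if ms <= 20:
        return "noticeable for scans"
    return "degraded"


def rate_log_write(ms: float) -> str:
    """Rate an average transaction log write latency against the targets of this note."""
    if ms <= 5:
        return "optimal"
    if ms <= 10:
        return "not yet critical"
    if ms <= 20:
        return "investigate"
    return "considerable impact"


if __name__ == "__main__":
    # Made-up counter values for one data file, taken 20 minutes apart.
    before = FileStats(1000, 8000, 500, 2000)
    after = FileStats(3000, 32000, 1500, 5000)
    ms_read, ms_write = interval_latency_ms(before, after)
    print("ms/Read:", ms_read, rate_data_file_read(ms_read))
    print("ms/Write:", ms_write)
```

Working with the difference between two snapshots rather than with the raw counters matters because the raw counters are averages since SQL Server start and can hide a bad 30 minute window during the most time critical processing.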