
End-to-end performance and capacity testing from the PCRF vantage point.

CISCO Policy Manager


Performance and Capacity Testing

CISCO and Openet


Version History
| Date | Version | Author | Change Description |
|---|---|---|---|
| 01/07/2011 | 1.0 | AS/PS | Orig |
| 01/10/2011 | 1.1 | AS/PS | After review with AT&T |
| 01/22/2011 | 1.2 | AS/PS | Populate test results |

Document References
| Description | Version |
|---|---|
| Hardware_Estimations_for_ATT_-_LTE_v1_15_PCRF_site_PS_model.doc | 1.15 |
| Cisco HA Architecture for ESC 0.16.doc | 0.16 |
| Cisco Policy Manager (PCRF) Business Integration Analysis (BIA) | 0.8 |
| pLTE SBP Dual APN TOL 010411.xlsx - Policy Performance_EDP4 tab | 010411 |
| 01-04_01.03.13.PM__3G_Subs_PCRF_500k_Perf3_1800TPS__RID-33.xls (Spirent sample report) | |


Contents
INTRODUCTION
    Purpose
    Goals
    Stakeholders
    Scope
    Terminology
    Roles and Responsibilities
PROJECT SUMMARY
    Description
    Test Environment
    Hardware and Software
        PCRF and DB Hosts
        SAN
        Software
    Testing Approach
        Description
        Measurement
        Logging
        Statistics
    Test Execution
        Performance Tests
        Capacity Tests
Test Scenarios
    Performance Test 1 - Gx TPS on PHONE APN
    Results of Performance Test 1 - Gx TPS on PHONE APN
    Performance Test 2 - Gx TPS on BROADBAND APN
    Results for Performance Test 2 - Gx TPS on BROADBAND APN
    Performance Test 3 - Gx and Sy TPS on BROADBAND APN (3G)
    Results of Performance Test 3 - Gx/Sy TPS on BROADBAND APN (3G)
    Performance Test 4 - Gx/Sy TPS on BROADBAND APN (4G)
    Results for Performance Test 4 - Gx/Sy TPS on BROADBAND APN (4G)
    Capacity Testing Scenarios
        Capacity Test 1 - 5 Million Concurrent 3G Phone APN Users
        Results Capacity Test 1 - 5 Million Concurrent 3G Phone APN Users
        Capacity Test Scenario 1 - 5 Million Concurrent LTE Subscribers
        Results of Capacity Test Scenario 1 - 5 Million Concurrent LTE Subscribers
    Performance Results at a Glance
    Capacity Results at a Glance
    Performance Acceptance Criteria
        Performance Targets
        Capacity Targets
Deliverables
    Reporting
Testing Tools
    Spirent Call Generator
    Performance Statistics
Appendix A Statistics
    Gx Statistics - PCRF
    Sy Statistics - PCRF
    Sp Statistics - PCRF
    Database Statistics - PCRF
    Policy Manager API - PCRF
    Session Store - PCRF
    Spirent


INTRODUCTION
Purpose
This document covers performance and capacity testing of the PCRF solution in a production-like setup, where the PCRF servers are connected to real network elements on all external interfaces.

Goals
The goal of this testing exercise is to meet the performance targets set out in the contract. Performance targets fall into three categories:
- Transactions per second
- Latency
- Number of concurrent sessions

Performance target details for each testing scenario are provided in the Test Scenarios section.

Stakeholders
- CISCO AS
- Openet Services
- AT&T CTO Group

Scope
Identify whether or not the performance criteria are met. Collect all standard performance test metrics, including the following:
- Throughput in transactions per second (TPS on Gx, Sy and Sp)
- Latency (Gx, SPR Lookup and LDAP interface average latencies), plus Sy latency from the NetScout raw data, if available
- Machine statistics (I/O statistics, memory usage, CPU usage, etc.)
- Number of concurrent subscriber sessions

Collection, evaluation and definition of performance-related test objectives and data for nodes other than the PCRF application servers and PCRF database servers is out of scope.


Terminology
| Symbol | Description |
|---|---|
| AAA | Authentication, Authorization, and Accounting |
| APN | Access Point Name |
| ATTM | AT&T Mobility |
| AVP | Attribute Value Pair |
| BM | Balance Manager, the network element that stores subscriber volume usage counters |
| CCA | Credit Control Answer (PCRF → PCEF), with types corresponding to the request type: CCA-I, CCA-U, CCA-T |
| CCR | Credit Control Request (PCEF → PCRF), of three types: initial (CCR-I), update (CCR-U), and terminate (CCR-T) |
| CLI | Command-Line Interface |
| CSG | Cisco Content Services Gateway |
| CSG2 | Second-Generation Cisco Content Services Gateway |
| Diameter | A networking protocol for AAA; a successor to RADIUS |
| DPE | Dynamic Provisioning Environment |
| FFA | First Field Application (formerly FOA, First Office Application) |
| IMS | IP Multimedia Subsystem (used at ATTM for VSC) |
| JRD | Joint Requirements Document |
| LDAP | Lightweight Directory Access Protocol |
| MAG | Mobile Application Gateway platform by Openwave |
| MIND | Master Integrated Network Directory, an Openwave LDAP server that stores subscriber information at AT&T Mobility |
| MRC | Monthly Recurring Charge; postpaid subscribers that had been migrated to the Balance Manager for monthly volume cap tracking |
| Netcool | SNMP management system used by ATTM for SNMP traps |
| NGG | Next Generation Gateway platform by Ericsson |
| OCS | Online Charging Service |
| OCG | Openet Charging Gateway |
| OFCS | Offline Charging Service |
| PCC | Policy and Charging Control |
| PCEF | Policy and Charging Enforcement Function (used interchangeably with eGGSN for the purposes of this document) |
| PCRF | Policy and Charging Rules Function (used interchangeably with Cisco Policy Manager for the purposes of this document) |
| P-CSCF | Proxy Call Session Control Function (part of IMS) |
| PDP | Packet Data Protocol |
| PO | Postpaid, a billing plan that uses offline charging |
| PR | Prepaid, a billing plan that uses online, real-time charging |
| QoS | Quality of Service |
| RAA | Re-Authentication Answer (PCEF → PCRF, as ACK only) |
| RADIUS | Remote Authentication Dial-In User Service (an AAA protocol) |
| RAR | Re-Authentication Request (PCRF → PCEF) |
| SBP | Session Based Pricing |
| SBP pLTE | SBP pre-LTE, migration of existing 3G SBP services |
| SBP ST | SBP Speed-Tiers, next phase of the SBP/PCRF functionality |
| SL | Smart Limits, a billing plan that uses online charging systems for subscriber usage limits enforcement, but is actually charged offline |
| SMPP | Short Message Peer-to-Peer protocol |
| SMSC | Short Message Service Center |
| SNMP | Simple Network Management Protocol |
| SOAP | Simple Object Access Protocol |
| SPR | Subscriber Profile Repository |
| XML | eXtensible Markup Language |
| VCS | Veritas Cluster Server by Symantec |
| VSC | Video Share Calling |


Roles and Responsibilities


The following table identifies the roles and responsibilities of the people involved in this process.

| Organization | Name | Role | Responsibility |
|---|---|---|---|
| AT&T | Leslie Moore | Engineer | MIND |
| AT&T | David Henseler | Engineer | MIND |
| AT&T | John Crocker | Engineer | MIND |
| CISCO | Sondra Nevitt | Project Manager | Overall Project Management |
| CISCO | Aaron Cunningham | Lead Engineer | Direction of test design and execution; test planning, tracking and reporting; test design and execution |
| CISCO | Gagan Kumar | DBA | Database Analysis and Testing (PCRF Oracle 11gR2) |
| CISCO | Tom Nguyen | Engineer | CSG support |
| CISCO | Jiming Shen | Engineer | CSG support |
| CISCO | Mo Miri | Engineer | S/P Gateway |
| CISCO | Arghya Mukherjee | Engineer | S/P Gateway |
| CISCO | Landon Holy | Engineer | Spirent design, development, implementation and support |
| Openet | Ron Roades | Project Manager | CPM Project Manager |
| Openet | Ilja Maslov | Technical Lead | Test consultation and support (PCRF Cisco Policy Manager 2.0) |
| Openet | Dennis Freeman | DBA | Database Support and analysis (PCRF Oracle 11gR2) |
| Openet | Oleg Popenko | Engineer | Testing support |
| Openet | Niall Byrne | Product Engineering | Testing support and analysis of test data |
| Openet | Jim Daniel | Engineer | Technical Support (BM 2.0h) |
| Openet | Abhijit Biswas | Engineer | Technical Support (BM 2.0h) |


PROJECT SUMMARY
Description
The Cisco Policy Manager will have interfaces with:
- the SPR system, over the Sp interface, to perform subscriber profile retrieval
- the PCEF system, over the Gx interface, to perform policy enforcement
- the counter store (Balance Manager), over the Diameter Sy interface

[Figure: solution context. The PAS provisions policies to the PCRF; the SPR (MIND, ED & HSS) supplies subscriber info; the counter store is reached over Sy; the PCRF holds session and subscriber info and pushes policies to the PCEF, which serves mobiles and services over Gn/Gp and VPN; the OFCS (FusionWorks Mediation) receives CDRs and the OCS (FusionWorks Charging) handles quota.]

Performance testing of the Policy Manager is focused on the performance of the Diameter protocol messages (TPS and latency) over the Gx Interface.


Test Environment


The traffic will be driven by 10 Spirent servers towards 3 Calico zones, each containing 3 PGW and 3 CSG2 active blades (9 in total). 3G messaging will use the Gn interface between Spirent and the P-GW; LTE messaging will use the S11 interface between Spirent and the S-GW. The 3 service zones, with a total of 9 CSG2 peers, will evenly distribute requests across the 5 PCRF application servers. Each of the 5 PCRF application servers will connect to a load balancer for MIND queries, with 2 active MIND servers behind the load balancer. The 5 PCRF application servers will connect to 4 Sy Balance Manager peers to send AAR/STR requests and 4 Sy Balance Manager peers from which to receive Sy RARs; all Balance Manager peer instances run on 2 physical servers. Only control plane traffic is generated by the Spirent; CCR-Us may be generated by simulating PLMN or RAT change events. Because CCR-U TPS is difficult to apply consistently across all tests, the expected CCR-U TPS will be evenly distributed between CCR-I and CCR-T. For example, a 1800, 400, 1800 (CCR-I, CCR-U, CCR-T) traffic flow becomes 2000, 2000 (CCR-I, CCR-T), as the sketch below illustrates.
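As a quick check of that redistribution, here is a minimal sketch; the function name is illustrative and not part of any project tooling:

```python
def fold_ccr_u(ccr_i_tps, ccr_u_tps, ccr_t_tps):
    """Fold an expected CCR-U rate evenly into the CCR-I and CCR-T rates.

    Total Gx TPS is preserved; only the message mix changes.
    """
    half = ccr_u_tps / 2
    return ccr_i_tps + half, ccr_t_tps + half

# The example from the text: 1800/400/1800 (CCR-I/CCR-U/CCR-T) becomes 2000/2000.
print(fold_ccr_u(1800, 400, 1800))  # (2000.0, 2000.0)
```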


Hardware and Software

PCRF and DB Hosts

SAN


Software
All hosts run Solaris 10 10/09. Database servers run Oracle Database Enterprise Edition in a RAC configuration with Oracle Grid, all version 11.2.0.1. PCRF application servers run the Oracle 11.2.0.1 client and Java JDK 6 Update 20 (both 32- and 64-bit). PCRF application servers have the following Cisco Policy Manager software and patches installed (timestamp, action, version, status):

Thu Jan 06 12:36:39 EST 2011, EndInstall, FW_6.1B2734, SUCCESS
Thu Jan 06 12:36:51 EST 2011, EndPatch, FW_6.1.0.6B1, SUCCESS
Thu Jan 06 12:36:58 EST 2011, EndPatch, FW_6.1.0.9B1, SUCCESS
Thu Jan 06 12:37:04 EST 2011, EndPatch, FW_6.1.0.14B1, SUCCESS
Thu Jan 06 12:37:10 EST 2011, EndPatch, FW_6.1.0.35B1, SUCCESS
Thu Jan 06 12:37:16 EST 2011, EndPatch, FW_6.1.0.45B1, SUCCESS
Thu Jan 06 12:46:07 EST 2011, EndInstall, PM_3.0.2B79, SUCCESS
Thu Jan 06 12:46:20 EST 2011, EndPatch, FW_6.1.0.25B2, SUCCESS
Thu Jan 06 12:46:27 EST 2011, EndPatch, FW_6.1.0.30B4, SUCCESS
Thu Jan 06 12:46:39 EST 2011, EndPatch, PM_3.0.2.2B1, SUCCESS
Thu Jan 06 12:46:45 EST 2011, EndPatch, PM_3.0.2.3B1, SUCCESS
Thu Jan 06 12:49:21 EST 2011, EndInstall, PRDE_3.0.2, SUCCESS
Thu Jan 06 12:51:02 EST 2011, EndUpgrade, PRDE_3.0.2.1, SUCCESS
Thu Jan 06 12:51:09 EST 2011, EndPatch, PRDE_3.0.2.2B10, SUCCESS
Thu Jan 06 12:52:43 EST 2011, EndUpgrade, PRDE_3.0.2.3, SUCCESS


Testing Approach

Description


Performance tests will be run according to the performance scenarios; performance and capacity data will be methodically sampled as the system scales to the maximum load.

Measurement
Spirent measurements are provided in Appendix A Statistics; a sample XLS report file is also referenced in Document References. Round-trip latencies (Gx, Sp, Sy) are measured by analyzing binary network traces taken on the respective interfaces by NetScout. Product latencies are measured from collected product statistics. All latency figures used are averages unless otherwise stated. Database measurements are obtained via AWR snapshots. Transaction Audit Logging (TAL) disk space consumption is recorded after each run, and the TAL partition is then cleaned.
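For illustration only, the round-trip computation over a trace reduces to pairing each Diameter request with its answer and averaging the deltas. The tuple format below is hypothetical; an actual NetScout export would first be parsed into it:

```python
def average_round_trip_ms(records):
    """records: iterable of (timestamp_s, hop_by_hop_id, is_request) tuples
    parsed from a trace export; returns mean request-to-answer latency in ms."""
    pending, latencies = {}, []
    for ts, hbh, is_request in sorted(records):
        if is_request:
            pending[hbh] = ts
        elif hbh in pending:
            latencies.append((ts - pending.pop(hbh)) * 1000.0)
    return sum(latencies) / len(latencies) if latencies else None

# Two CCR/CCA pairs with 40 ms and 48 ms round trips average to 44.0 ms.
print(average_round_trip_ms([(0.000, 1, True), (0.040, 1, False),
                             (0.010, 2, True), (0.058, 2, False)]))
```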

Logging
The Unified Logs will be written to the Logging Database. Log levels will be set to ERROR, FATAL and WARN for the test run. The same logging configuration is expected to be used in production.

Statistics
Please see Appendix A for the available product and custom statistics. The AT&T requirement for statistics periodicity in production is 900 seconds. In order to collect a sufficient number of samples, statistics will be reconfigured to a 60-second period for this performance exercise and then reverted to 900 seconds upon completion.


Test Execution

Performance Tests


Each stage of the performance test will generate the exact same sustained TPS load for 15 minutes. The team will take performance measurements at the end of the 15 minutes, and then stop the stage to prepare for the next stage at a higher TPS level. Higher CCR-I and CCR-T rates will be used in lieu of introducing CCR-Us into the Spirent tests. The following Gx TPS stages will apply to all Gx performance tests:
- Generate 1000 TPS (500 TPS CCR-I, 500 TPS CCR-T) for 15 minutes, collect data, and then stop.
- Generate 2000 TPS (1000 TPS CCR-I, 1000 TPS CCR-T) for 15 minutes, collect data, and then stop.
- Generate 3000 TPS (1500 TPS CCR-I, 1500 TPS CCR-T) for 15 minutes, collect data, and then stop.
- Generate 4000 TPS (2000 TPS CCR-I, 2000 TPS CCR-T) for 15 minutes, collect data, and then stop.

The following Sy TPS stages will apply to all Sy performance tests. It is expected that the Gx TPS performance scenarios will generate sufficient Sy TPS, so separate test runs will not be required to reach the Sy TPS targets:
- Generate 250 TPS (125 TPS AAR, 125 TPS STR) for 15 minutes, collect data, and then stop.
- Generate 500 TPS (250 TPS AAR, 250 TPS STR) for 15 minutes, collect data, and then stop.
- Generate 750 TPS (375 TPS AAR, 375 TPS STR) for 15 minutes, collect data, and then stop.
- Generate 1000 TPS (500 TPS AAR, 500 TPS STR) for 15 minutes, collect data, and then stop.

After the component has been configured, Spirent Landslide will generate GTP-C messages on either the Gn or the S11 interface, depending on the particular test scenario. The following high-level procedure will be used for each test run (a sketch of the collection wrapper in step 7 follows this list):
1. Verify test setup at low TPS
2. Turn down logging levels to ERROR/FATAL
3. Turn off session validation
4. Start the data collection script
5. Run traffic for 15 minutes for TPS tests and up to 30 minutes for capacity tests
6. Stop traffic
7. Collect results using the script kicked off in step 4: it collects the AWR report, product statistics for the test run, machine statistics output, and TAL disk utilization, and adds everything to a tar archive
8. Record the test run start/stop time and manually collect NetScout network traces for that period
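The data collection script itself is not reproduced here; the sketch below only illustrates the shape of step 7 (run a few OS commands, save their output, tar the results). All names, paths, and commands are hypothetical:

```python
import pathlib
import subprocess
import tarfile

def collect_run_artifacts(run_name, commands, out_dir="/var/tmp/perf"):
    """Illustrative stand-in for the collection script: run each named OS
    command, save its output to a file, and bundle everything into a tar."""
    out = pathlib.Path(out_dir) / run_name
    out.mkdir(parents=True, exist_ok=True)
    for name, cmd in commands.items():
        with open(out / (name + ".txt"), "w") as f:
            subprocess.run(cmd, shell=True, stdout=f, stderr=subprocess.STDOUT)
    archive = out.with_suffix(".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(out, arcname=run_name)
    return archive

# Example usage (machine statistics only; the AWR report and product
# statistics would be exported by their own tooling before archiving):
# collect_run_artifacts("1_4000_1", {"iostat": "iostat -x 30 2",
#                                    "tal_disk": "df -k"})
```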


Capacity Tests
Subscriber sessions will be gradually created over a period of time, and measurement snapshots taken when the concurrent session targets for a particular stage have been reached. TPS numbers will be collected and reported, but the objective of this test is to reach the target numbers for concurrent subscriber sessions.
- Generate 1 Million concurrent sessions, verify the system is stable, then continue
- Generate 2 Million concurrent sessions, verify the system is stable, then continue
- Generate 3 Million concurrent sessions, verify the system is stable, then continue
- Generate 4 Million concurrent sessions, verify the system is stable, then continue
- Generate 5 Million concurrent sessions, collect performance data, then stop

The TPS used to reach 5 million concurrent users will be the highest possible rate for the lowest-performing component, which is currently estimated at 1000 Gx CCR-I TPS. At this rate the test will take about 1 hour and 30 minutes to reach 5 million, and an equal amount of time to gracefully terminate all sessions (see the worked check below).
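The duration estimate is simple division over the session setup rate; a worked check:

```python
target_sessions = 5_000_000
setup_rate = 1000            # CCR-I per second, the lowest-performing component
ramp_minutes = target_sessions / setup_rate / 60
print(ramp_minutes)          # ~83 minutes to attach all sessions, and roughly
                             # the same again to gracefully terminate them
```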


Test Scenarios
Performance Test 1 - Gx TPS on PHONE APN
Objective: To reach target Gx TPS in 4 stages, and measure the Cisco Policy Manager and Oracle database performance with high TPS, without MIND and without Balance Manager.

Components: PCEF and PCRF only.

[Diagram: Spirent → S/P-GW → CSG → CPM → Oracle; MIND and BalMgr are not exercised in this test.]

| Scenario Name | Gx TPS on PHONE APN |
|---|---|
| Interfaces | Gx |
| Objectives | Reach target Gx TPS in 4 stages |
| Policy Configuration | None, only Billing-Plan-Name returned |
| Threshold Type | TPS Threshold |
| Message Flow | CCR-I, CCR-T |
| TPS | Gx: 1000, Gx: 2000, Gx: 3000, Gx: 4000 (Gx: 8000 included in test, but beyond solution requirements) |
| MSISDN Range | |
| APN | phone |
| Counters | none |
| Decision Tables | Command Level: 15 rows x 3 input cols; APN Mapping: 10 rows x 1 input col; SPR Mapping: 12 rows x 4 input cols |


Results of Performance Test 1 - Gx TPS on PHONE APN


| Test Name | TPS - Gx | Subs attached per 1 second | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| Baseline | 0 | 0 | 1% | 3.5 GB | n/a | n/a | n/a |
| 1_1000_3 | 1000 | 500 | 3% | 9.0 GB | n/a | n/a | 38 ms |
| 1_2000_1 | 2000 | 1000 | 10% | 9.7 GB | n/a | n/a | 44 ms |
| 1_3000_2 | 3000 | 1500 | 14% | 8.7 GB | n/a | n/a | 38 ms |
| 1_4000_1 | 4000 | 2000 | 19% | 8.8 GB | n/a | n/a | 44 ms |
| 1_8000_1 | 8000 | 4000 | 43% | 8.9 GB | n/a | n/a | 67 ms |

For each test, the system utilization is averaged across 30-second windows; the Highest Average is the single highest 30-second average seen on any single system. For every 15-minute test described in this document, there are 330 sampled averages of each measurement across the platform; the Highest Average is the worst of those 330 samples.
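A sketch of that reduction, assuming per-second utilization samples collected on each host (the host count is not stated in the text; 330 samples over 30 windows per run would imply 11 monitored systems):

```python
def highest_window_average(samples, window=30):
    """Worst non-overlapping 30-second average over one host's per-second
    samples; a 15-minute run (900 samples) yields 30 windows per host."""
    chunks = [samples[i:i + window] for i in range(0, len(samples), window)]
    return max(sum(c) / len(c) for c in chunks if c)

def platform_highest_average(per_host_samples, window=30):
    """Single worst windowed average seen on any host in the platform
    (e.g. 11 hosts x 30 windows = 330 samples, the worst of which is kept)."""
    return max(highest_window_average(h, window) for h in per_host_samples)
```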

Load Profile of Oracle Session Server #1 at 8000 Gx TPS

| Metric | Per Second | Per Transaction | Per Exec | Per Call |
|---|---|---|---|---|
| DB Time(s) | 6.5 | 0.0 | 0.00 | 0.00 |
| DB CPU(s) | 3.8 | 0.0 | 0.00 | 0.00 |
| Redo size | 4,224,774.6 | 2,948.7 | | |
| Logical reads | 25,029.3 | 17.5 | | |
| Block changes | 23,195.4 | 16.2 | | |
| Physical reads | 0.6 | 0.0 | | |
| Physical writes | 212.2 | 0.2 | | |
| User calls | 6,430.7 | 4.5 | | |
| Parses | 12.8 | 0.0 | | |
| Hard parses | 0.0 | 0.0 | | |
| W/A MB processed | 0.0 | 0.0 | | |
| Logons | 0.1 | 0.0 | | |
| Executes | 4,308.9 | 3.0 | | |
| Rollbacks | 0.2 | 0.0 | | |
| Transactions | 1,432.8 | | | |


Performance Test 2 - Gx TPS on BROADBAND APN

Objective: To reach target Gx TPS in 4 stages, and measure the Cisco Policy Manager and Oracle database performance with high TPS, with MIND but without Balance Manager.

Components: PCEF, PCRF and MIND

[Diagram: Spirent → S/P-GW → CSG → CPM → Oracle, plus MIND; BalMgr is not exercised in this test.]

| Scenario Name | Gx TPS on BROADBAND APN, POST subscribers |
|---|---|
| Interfaces | Gx, Sp |
| Objectives | Reach target Gx TPS in 4 stages |
| Policy Configuration | None, only Billing-Plan-Name returned |
| Threshold Type | TPS Threshold |
| Message Flow | CCR-I, LDAP search, CCR-T |
| TPS | Gx: 1000, Gx: 2000, Gx: 3000, Gx: 4000 |
| MSISDN Range | |
| APN | broadband |
| Counters | none |
| Decision Tables | Command Level: 15 rows x 3 input cols; APN Mapping: 10 rows x 1 input col; SPR Mapping: 12 rows x 4 input cols |


Results for Performance Test 2 - Gx TPS on BROADBAND APN

| Test | TPS - Gx | Subs attached per 1 second | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| 2_1000_2 | 1000 | 500 | 16% | 8.9 GB | 121 ms | n/a | 173 ms |
| 2_2000_7 | 2000 | 1000 | 29% | 9.3 GB | 119 ms | n/a | 172 ms |
| 2_3000_1 | 3000 | 1500 | 45% | 9.5 GB | 121 ms | n/a | 179 ms |
| 2_4000_1 | 4000 | 2000 | 99% | 9.5 GB | 122 ms | n/a | 186 ms |

Investigation revealed that 9,000 binds were being sent by the PCRF, one for each LDAP timeout; this does not scale when the F5 or MIND has an issue. An issue was opened with the development team. Both the F5 and MIND were reported to have several thousand open connections to the PCRFs; by design there should be only 1,000. These excessive connections caused MIND and F5 outages. The LDAP search latency statistic is based on successful searches and does not reveal the timeouts. It was also discovered that the F5 in front of the MIND servers was adding 100 ms to each LDAP search; an issue was opened with the F5 team.

Load Profile of Oracle Session Server #1 at 4000 TPS

| Metric | Per Second | Per Transaction | Per Exec | Per Call |
|---|---|---|---|---|
| DB Time(s) | 2.0 | 0.0 | 0.00 | 0.00 |
| DB CPU(s) | 1.9 | 0.0 | 0.00 | 0.00 |
| Redo size | 2,185,992.4 | 2,972.2 | | |
| Logical reads | 13,818.3 | 18.8 | | |
| Block changes | 12,093.1 | 16.4 | | |
| Physical reads | 0.1 | 0.0 | | |
| Physical writes | 146.4 | 0.2 | | |
| User calls | 3,304.1 | 4.5 | | |
| Parses | 3.9 | 0.0 | | |
| Hard parses | 0.0 | 0.0 | | |
| W/A MB processed | 0.0 | 0.0 | | |
| Logons | 0.1 | 0.0 | | |
| Executes | 2,210.3 | 3.0 | | |
| Rollbacks | 0.0 | 0.0 | | |
| Transactions | 735.5 | | | |


Performance Test 3 - Gx and Sy TPS on BROADBAND APN (3G)

Objective: To reach target Gx TPS in 4 stages, and measure the Cisco Policy Manager and Oracle database performance with high TPS, with MIND and with Balance Manager, using the 3G SBP Policy.

Components: PCEF, PCRF, MIND and Balance Manager

[Diagram: Spirent → S/P-GW → CSG → CPM → Oracle, plus MIND and BalMgr.]

| Scenario Name | Gx and Sy TPS on BROADBAND APN, SBP subscribers (3G) |
|---|---|
| Interfaces | Gx, Sp, Sy |
| Objectives | Reach target Gx and Sy TPS in 4 stages |
| Policy Configuration | 2 Service Groups, 4 Services |
| Threshold Type | TPS Threshold |
| Message Flow | CCR-I, LDAP Search, AAR, Gx RAR, CCR-T, STR |
| TPS | Gx: 250 / Sy: 250; Gx: 500 / Sy: 500; Gx: 750 / Sy: 750; Gx: 1000 / Sy: 1000 |
| MSISDN Range | |
| APN | broadband |
| Counters | 2 |
| Decision Tables | Command Level: 15 rows x 3 input cols; Service Status: 4 rows x 3 input cols; APN Mapping: 10 rows x 1 input col; SPR Mapping: 12 rows x 4 input cols |


Results of Performance Test 3 - Gx/Sy TPS on BROADBAND APN (3G)

| Test | TPS - Gx | Subs attached per 1 second | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| 3_250_1 | 240 | 1200 | 6% | 9.2 GB | 103 ms | 96 ms | 174 ms |
| 3_500_1 | 500 | 2500 | 7% | 9.3 GB | 6 ms | 93 ms | 78 ms |
| 3_750_1 | 750 | 3690 | 11% | 9.4 GB | 6 ms | 77 ms | 71 ms |
| 3_1000_2 | 1000 | 5000 | 19% | 9.4 GB | 66 ms | 89 ms | 133 ms |

In tests 500 and 750, the F5 team found a parameter to remove the 100 ms latency during LDAP searches. The LDAP search latency using the primary MIND VIP improved significantly, going from 107 ms to 5 ms, and this is reflected in the session connect times. In test 1000 we discovered that the secondary path to the MIND VIP, through the F5 OAM load balancer, also needed to be modified to remove the 100 ms latency; this is reflected in the higher session connect times. To accommodate the Sy signaling, the Spirent test case was modified to wait 10 seconds rather than 1 second before sending a disconnect. This increased the number of concurrent sessions for any 1-second measurement period, as the check below illustrates.
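The effect of the longer hold time follows Little's Law (concurrent sessions ≈ setup rate × mean hold time); a quick check against the table above, for illustration:

```python
def concurrent_sessions(setup_rate_per_s, hold_time_s):
    # Little's Law: L = lambda * W
    return setup_rate_per_s * hold_time_s

# 3_1000_2 runs 1000 Gx TPS, i.e. 500 CCR-I/s; a 10 s hold before the
# disconnect gives ~5000 concurrent sessions, matching the measured column.
print(concurrent_sessions(500, 10))  # 5000
```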

Load Profile of Oracle Session Server #1 at 1000 TPS

| Metric | Per Second | Per Transaction | Per Exec | Per Call |
|---|---|---|---|---|
| DB Time(s) | 1.5 | 0.0 | 0.00 | 0.00 |
| DB CPU(s) | 1.5 | 0.0 | 0.00 | 0.00 |
| Redo size | 1,411,308.9 | 4,520.6 | | |
| Logical reads | 11,311.2 | 36.2 | | |
| Block changes | 8,051.1 | 25.8 | | |
| Physical reads | 0.2 | 0.0 | | |
| Physical writes | 210.5 | 0.7 | | |
| User calls | 2,696.7 | 8.6 | | |
| Parses | 7.2 | 0.0 | | |
| Hard parses | 0.0 | 0.0 | | |
| W/A MB processed | 0.5 | 0.0 | | |
| Logons | 0.1 | 0.0 | | |
| Executes | 1,984.8 | 6.4 | | |
| Rollbacks | 0.0 | 0.0 | | |
| Transactions | 312.2 | | | |


Performance Test 4 - Gx/Sy TPS on BROADBAND APN (4G)

Objective: To reach target Gx TPS in 4 stages, and measure the Cisco Policy Manager and Oracle database performance with high TPS (1000 Gx and Sy), with MIND and with Balance Manager, using the 4G SBP Policy.

Components: PCEF, PCRF, MIND and Balance Manager

[Diagram: Spirent → S/P-GW → CSG → CPM → Oracle, plus MIND and BalMgr.]

| Scenario Name | Gx and Sy TPS on BROADBAND APN, MRC subscribers (LTE) |
|---|---|
| Interfaces | Gx, Sp, Sy |
| Objectives | Reach target Gx and Sy TPS in 4 stages |
| Policy Configuration | 2 Service Groups, 4 Services, 18 QoS categories |
| Threshold Type | TPS Threshold |
| Message Flow | CCR-I, LDAP Search, AAR, Gx RAR, CCR-T, STR |
| TPS | Gx: 1000 / Sy: 1000 |
| MSISDN Range | |
| APN | broadband |
| Counters | 2 |
| Decision Tables | Command Level: 15 rows x 3 input cols; Service Status: 4 rows x 3 input cols; APN Mapping: 10 rows x 1 input col; SPR Mapping: 12 rows x 4 input cols |


Results for Performance Test 4 - Gx/Sy TPS on BROADBAND APN (4G)

| Test | TPS - Gx | Subs attached per 1 second | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| 4_1000_1 | 1000 | 5000 | 20% | 9.5 GB | 51 ms | 90 ms | 132 ms |

On 4_1000, the F5 OAM load balancer did not have the delayed-ACK timer disabled, and all LDAP searches were delayed by 100 ms. This increased the session connect time significantly.

Load Profile of Oracle Session Server #1 at 1000 TPS

| Metric | Per Second | Per Transaction | Per Exec | Per Call |
|---|---|---|---|---|
| DB Time(s) | 1.5 | 0.0 | 0.00 | 0.00 |
| DB CPU(s) | 1.4 | 0.0 | 0.00 | 0.00 |
| Redo size | 2,737,317.0 | 9,501.2 | | |
| Logical reads | 22,921.9 | 79.6 | | |
| Block changes | 18,248.3 | 63.3 | | |
| Physical reads | 0.6 | 0.0 | | |
| Physical writes | 336.4 | 1.2 | | |
| User calls | 2,489.8 | 8.6 | | |
| Parses | 5.0 | 0.0 | | |
| Hard parses | 0.1 | 0.0 | | |
| W/A MB processed | 0.4 | 0.0 | | |
| Logons | 0.0 | 0.0 | | |
| Executes | 1,830.7 | 6.4 | | |
| Rollbacks | 0.0 | 0.0 | | |
| Transactions | 288.1 | | | |


Capacity Testing Scenarios

Capacity Test 1 - 5 Million Concurrent 3G Phone APN Users

Objective: To reach target concurrent sessions in 5 stages with the PHONE APN (3G subscribers) and collect PCRF utilization statistics.

Components: PCEF, PCRF, MIND and Balance Manager
[Diagram: Spirent → S/P-GW → CSG → CPM → Oracle, plus MIND and BalMgr.]

| Scenario Name | Concurrent sessions on PHONE APN |
|---|---|
| Interfaces | Gx |
| Objectives | Reach target concurrent sessions in 5 stages |
| Policy Configuration | None, only Billing-Plan-Name returned |
| Threshold Type | Concurrent sessions |
| Message Flow | CCR-I, CCR-T |
| Capacity levels | 5 Million |
| MSISDN Range | |
| APN | phone |
| Counters | 2 |
| Decision Tables | Command Level: 15 rows x 3 input cols; APN Mapping: 10 rows x 1 input col; SPR Mapping: 12 rows x 4 input cols |


Results Capacity Test 1 - 5 Million Concurrent 3G Phone APN Users

| Test | TPS - Gx | Subs attached (growing) | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| 5_2000_2 | 4000 | 1,000,000 | | | | | 62 ms |
| 5_2000_2 | 4000 | 2,000,000 | | | | | 63 ms |
| 5_2000_2 | 4000 | 3,000,000 | | | | | 118 ms |
| 5_2000_2 | 4000 | 4,000,000 | | | | | 80 ms |
| 5_2000_2 | 4000 | 5,000,000 | 15% | 11.5 GB | n/a | n/a | 191 ms |
The test took about 2 hours to run, and a few short periods of network issues were seen, one creating more than 40k session errors. Troubleshooting was not performed, and those 30 seconds were excluded from the Spirent connect time average. SGSN, CSG and RADIUS retransmit timers need to be modified so as not to create a retransmission storm when latency is encountered; Cisco has an action item to review these settings in the lab.

Load Profile of Oracle Session Server #1 with 5 Million 3G Phone APN Subscribers

| Metric | Per Second | Per Transaction | Per Exec | Per Call |
|---|---|---|---|---|
| DB Time(s) | 0.6 | 0.0 | 0.00 | 0.00 |
| DB CPU(s) | 0.4 | 0.0 | 0.00 | 0.00 |
| Redo size | 491,817.8 | 4,222.9 | | |
| Logical reads | 4,998.4 | 42.9 | | |
| Block changes | 2,362.6 | 20.3 | | |
| Physical reads | 0.5 | 0.0 | | |
| Physical writes | 943.6 | 8.1 | | |
| User calls | 234.2 | 2.0 | | |
| Parses | 17.8 | 0.2 | | |
| Hard parses | 0.1 | 0.0 | | |
| W/A MB processed | 0.1 | 0.0 | | |
| Logons | 0.1 | 0.0 | | |
| Executes | 134.8 | 1.2 | | |
| Rollbacks | 4.2 | 0.0 | | |
| Transactions | 116.5 | | | |


Capacity Test Scenario 1 - 5 Million Concurrent LTE Subscribers

Objective: To reach target concurrent sessions in 5 stages with LTE subscribers and collect PCRF utilization statistics.

Components: PCEF, PCRF, MIND and Balance Manager
[Diagram: Spirent → S/P-GW → CSG → CPM → Oracle, plus MIND and BalMgr.]

| Scenario Name | Concurrent sessions on BROADBAND APN, MRC subscribers (LTE) |
|---|---|
| Interfaces | Gx, Sp, Sy |
| Objectives | Reach target concurrent sessions in 5 stages |
| Policy Configuration | 2 Service Groups, 4 Services, 18 QoS categories |
| Threshold Type | Concurrent sessions |
| Diameter Interactions | CCR-I, LDAP Search, AAR, Gx RAR, CCR-T, STR |
| Capacity levels | 1 Million; 2 Million; 3 Million; 4 Million; 5 Million |
| MSISDN Range | |
| APN | broadband |
| Counters | 2 |
| Decision Tables | Command Level: 15 rows x 3 input cols; Service Status: 4 rows x 3 input cols; APN Mapping: 10 rows x 1 input col; SPR Mapping: 12 rows x 4 input cols |


Results of Capacity Test Scenario 1 - 5 Million Concurrent LTE Subscribers

| Test | TPS - Gx | Subs attached (growing) | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| 6_1000_1 | 2000 | 1,000,000 | | | | | 84 ms |
| 6_1000_1 | 2000 | 2,000,000 | | | | | 94 ms |
| 6_1000_1 | 2000 | 3,000,000 | | | | | 90 ms |
| 6_1000_1 | 2000 | 4,000,000 | | | | | 87 ms |
| 6_1000_1 | 2000 | 5,000,000 | 35% | 10.5 GB | 55 ms | 100 ms | 82 ms |

To achieve a 5 million subscriber mix of 3G and LTE, 3 million 3G subscribers were used with 2 million 4G subscribers. In order to minimize the impact on other testing teams and to avoid overloading any network component during the test, an activation rate of no more than 2,000 sessions per second was used.

Load Profile for Oracle Session Server #1 during the 5 Million LTE Subscriber Tests

| Metric | Per Second | Per Transaction | Per Exec | Per Call |
|---|---|---|---|---|
| DB Time(s) | 2.3 | 0.0 | 0.00 | 0.00 |
| DB CPU(s) | 1.9 | 0.0 | 0.00 | 0.00 |
| Redo size | 2,440,966.5 | 4,920.0 | | |
| Logical reads | 18,649.5 | 37.6 | | |
| Block changes | 12,177.4 | 24.5 | | |
| Physical reads | 13.8 | 0.0 | | |
| Physical writes | 1,051.0 | 2.1 | | |
| User calls | 2,784.5 | 5.6 | | |
| Parses | 10.2 | 0.0 | | |
| Hard parses | 0.0 | 0.0 | | |
| W/A MB processed | 0.3 | 0.0 | | |
| Logons | 0.1 | 0.0 | | |
| Executes | 2,091.1 | 4.2 | | |
| Rollbacks | 0.0 | 0.0 | | |
| Transactions | 496.1 | | | |


Performance Results at a Glance


Table 1 - Gx TPS on Phone APN

| Test Name | TPS - Gx | Subs attached per 1 second | Front End Highest Avg CPU % | Front End Highest Avg Memory | LDAP Search Latency | Sy Latency AAR/AAA | Session Connect Time |
|---|---|---|---|---|---|---|---|
| Baseline | 0 | 0 | 1% | 3.5 GB | n/a | n/a | n/a |
| 1_1000_3 | 1000 | 500 | 3% | 9.0 GB | n/a | n/a | 38 ms |
| 1_2000_1 | 2000 | 1000 | 10% | 9.7 GB | n/a | n/a | 44 ms |
| 1_3000_2 | 3000 | 1500 | 14% | 8.7 GB | n/a | n/a | 38 ms |
| 1_4000_1 | 4000 | 2000 | 19% | 8.8 GB | n/a | n/a | 44 ms |
| 1_8000_1 | 8000 | 4000 | 43% | 8.9 GB | n/a | n/a | 67 ms |

Table 2 - Gx TPS with Broadband APN

| Test | TPS - Gx | Subs attached per 1 second | Front End Highest Avg CPU % | Front End Highest Avg Memory | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time |
|---|---|---|---|---|---|---|---|
| 2_1000_2 | 1000 | 500 | 16% | 8.9 GB | 121 ms | n/a | 173 ms |
| 2_2000_7 | 2000 | 1000 | 29% | 9.3 GB | 119 ms | n/a | 172 ms |
| 2_3000_1 | 3000 | 1500 | 45% | 9.5 GB | 121 ms | n/a | 179 ms |
| 2_4000_1 | 4000 | 2000 | 99% | 9.5 GB | 122 ms | n/a | 186 ms |

Table 3 - Gx and Sy TPS on Broadband APN (3G)

| Test | TPS - Gx | Subs attached per 1 second | Front End Highest Avg CPU % | Front End Highest Avg Memory | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time |
|---|---|---|---|---|---|---|---|
| 3_250_1 | 240 | 1200 | 6% | 9.2 GB | 103 ms | 96 ms | 174 ms |
| 3_500_1 | 500 | 2500 | 7% | 9.3 GB | 6 ms | 93 ms | 78 ms |
| 3_750_1 | 750 | 3690 | 11% | 9.4 GB | 6 ms | 77 ms | 71 ms |
| 3_1000_2 | 1000 | 5000 | 19% | 9.4 GB | 66 ms | 89 ms | 133 ms |

Table 4 - Gx and Sy TPS on Broadband APN (4G)

| Test | TPS - Gx | Subs attached per 1 second | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| 4_1000_1 | 1000 | 5000 | 20% | 9.5 GB | 51 ms | 90 ms | 132 ms |


Capacity Results at a Glance


Table 5 - Capacity performance of 5 Million 3G Phone APN subscribers

| Test | TPS - Gx | Subs attached (growing) | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| 5_2000_2 | 4000 | 5,000,000 | 15% | 11.5 GB | n/a | n/a | 174 ms |

Table 6 - Capacity performance of 5 Million 3G and LTE subscribers

| Test | TPS - Gx | Subs attached (growing) | Front End Highest Avg CPU % | Front End Highest Avg Memory Used | LDAP Search Latency (success) | Sy Latency AAR/AAA | Session Connect Time (Spirent) |
|---|---|---|---|---|---|---|---|
| 6_1000_1 | 2000 | 5,000,000 | 35% | 10.5 GB | 55 ms | 100 ms | 88 ms |


Performance Acceptance Criteria

Performance Targets


A TPS is defined as a pair of send and receive messages, such as:

| Interface | Request, Answer |
|---|---|
| Gx | CCR-I, CCA-I |
| Gx | RAR, RAA |
| Sy | AAR, AAA |
| Sy | STR, STA |

One 3G Phone APN user will create 1 Gx TPS during session initiation and 1 Gx TPS during session termination (and no Sy and no Sp traffic). One LTE SBP subscriber will create 2 Gx and 1 Sy TPS during session initiation and 1 Gx and 1 Sy TPS during session termination, as the sketch below illustrates.
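A sketch of that per-subscriber transaction model (the function is illustrative; rates are attaches per second, with detaches assumed to run at the same rate):

```python
def offered_tps(attach_3g_per_s, attach_lte_per_s):
    """Aggregate Gx and Sy TPS implied by the per-subscriber model above,
    assuming every attach is matched by a detach at the same rate."""
    gx = attach_3g_per_s * (1 + 1) + attach_lte_per_s * (2 + 1)
    sy = attach_lte_per_s * (1 + 1)
    return gx, sy

# 500 LTE SBP attaches (and detaches) per second, no 3G traffic:
print(offered_tps(0, 500))  # (1500, 1000) -> 1500 Gx TPS and 1000 Sy TPS
```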

| Interface | Requirement | Result |
|---|---|---|
| Gx TPS across 5 PCRF nodes | 1800 CCR-I + 400 CCR-U + 1800 CCR-T, or 2000 CCR-I + 2000 CCR-T | 4000 Gx.CCR-I + 4000 Gx.CCR-T |
| Sy TPS across 5 PCRF nodes | 500 AAR + 500 STR | 500 Sy.AAR + 500 Sy.STR |
| PCRF processing latency with no external interface latency | 50 ms | 44 ms @ 4000 TPS |

Capacity Targets
5,000,000 concurrent subscriber sessions across all PCRF nodes.


Deliverables
Reporting
A progress report will be generated nightly and distributed to stakeholders and project management. A tar file will be generated for each test run and stored in TBD. Each tar file will contain the following:
- Machine statistics collected during the test run (iostat, prstat, top, etc.)
- Product statistics containing a subset of the available product statistics
- Raw statistics containing a full database dump of all statistics collected during the test run
- Oracle AWR (Automatic Workload Repository) report
- TAL mount point utilization
- Text file containing information about the test run

Results will be collected per single test run and mapped to the specific tests as outlined in QC, and QC will be updated daily. The specific tests contained within QC will be replaced by the tests outlined in this document. The overall test report will be written and stored in the following location: TBD


Testing Tools
In NDC-1, performance is measured using the following tools:

Spirent Call Generator


In NDC-1, Spirent Landslide version 9.0.0 GA, TAS version 9.0.0.10, is used to drive control traffic. Landslide is designed to conduct performance and scalability testing for a wide range of wireless technologies and applications, including LTE. Landslide tests the core components for each technology in either stand-alone or end-to-end configurations. For 3G testing scenarios, Spirent will emulate a 3G SGSN and send transactions towards the P-GW over the Gn interface. For LTE testing scenarios, Spirent will emulate an MME and send transactions towards the S-GW over the S11 interface. Landslide provides measurement of TPS and latency for round-trip transactions.

Performance Statistics
Openet uses scripts to capture the Performance Statistics for each run. Sample scripts will be made available to this project, but will take some time to adjust to the particulars of the NDC-1 environment. The scripts take an initial snapshot of the environment and capture the ending statistics, so that the difference forms a snapshot of the statistics during the performance run, as sketched below.
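The snapshot-and-delta idea reduces to differencing counters taken before and after the run; a minimal sketch using one of the Appendix A counter names:

```python
def stats_delta(start_snapshot, end_snapshot):
    """Difference of two {counter_name: value} snapshots taken before and
    after a run, so only activity during the run is reported."""
    return {name: end_snapshot[name] - start_snapshot.get(name, 0)
            for name in end_snapshot}

start = {"Count_CCR_INITIAL_REQUEST": 1_200}
end = {"Count_CCR_INITIAL_REQUEST": 451_200}
print(stats_delta(start, end))  # {'Count_CCR_INITIAL_REQUEST': 450000}
```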


Appendix A Statistics
Gx Statistics - PCRF
| Statistic Name | Measurement |
|---|---|
| Latency_CCR_INITIAL_REQUEST | The latency of CCR-I messages |
| Latency_CCR_TERMINATION_REQUEST | The latency of CCR-T messages |
| Latency_CCR_UPDATE_REQUEST | The latency of CCR-U messages |
| Count_CCR_INITIAL_REQUEST | Total CCR-I count |
| Count_Failed_CCR_INITIAL_REQUEST | Failed CCR-I count |
| Count_CCR_UPDATE_REQUEST | Total CCR-U count |
| Count_Failed_CCR_UPDATE_REQUEST | Failed CCR-U count |
| Count_CCR_TERMINATION_REQUEST | Total CCR-T count |
| Count_Failed_CCR_TERMINATION_REQUEST | Failed CCR-T count |

Sy Statistics - PCRF
| Statistic Name | Measurement |
|---|---|
| Latency_AAR | May be only applicable to synchronous EBMI |
| Latency_RAA | Is this for Sy? |
| Latency_Sy_AA-Answer | Sy AA-Answer latency |
| Latency_Sy_Re-Auth-Request | Sy RAR latency |
| Latency_Sy_Session-Termination-Answer | Sy Session-Termination-Answer latency |
| Count_AAR | Total AAR count |
| Count_Failed_AAR | Failed AAR count |

Sp Statistics - PCRF
| Statistic Name | Measurement |
|---|---|
| Latency_SPR_Lookup | Total latency of the SPR Lookup from the PRDE perspective |
| Latency_SPR_Roaming_Classification_Lookup | SPR API latency of Roaming Classification |
| Latency_SPR_LDAP_Search | SPR API latency of the LDAP Search procedure |
| Latency_SPR_LDAP_Search_Plugin | SPR API latency of the LDAP Search operation at the plugin level |
| Count_SPR_Lookup | Total SPR Lookup count from the PRDE perspective |
| Count_SPR_Roaming_Classification_Lookup | Total SPR API Roaming Classification count |
| Count_Failed_SPR_Roaming_Classification_Lookup | Failed SPR API Roaming Classification count |
| Count_SPR_LDAP_Search | Total SPR API LDAP Search count |
| Count_Failed_SPR_LDAP_Search | Failed SPR API LDAP Search count |


Database Statistics - PCRF


| Statistic Name | Measurement |
|---|---|
| Latency_DB_COMMIT | The latency of database commits |

Policy Manager API - PCRF


| Statistic Name | Measurement |
|---|---|
| Latency_PMAPI_DoDecisionTableLookup | Latency of decision table lookup |
| Latency_PMAPI_GetPeerQOSParameters | Latency of QOS parameters lookup |
| Latency_PMAPI_GetPeerTriggerInfo | Latency of peer trigger lookup |
| Latency_PMAPI_PCCRuleGroupLookup | Latency of PCC Rule group lookup |
| Latency_PMAPI_PCCRuleLookupByName | Latency of PCC Rule lookup by name |

Session Store - PCRF


| Statistic Name | Measurement |
|---|---|
| Latency_Session_DeleteAllAppliedRules | Latency of deleting all applied rules |
| Latency_Session_DeleteAllAppliedServiceStatuses | Latency of deleting all applied service statuses |
| Latency_Session_DeleteSession | Latency of deleting a session |
| Latency_Session_InsertAppliedRule | Latency of inserting an applied rule |
| Latency_Session_InsertAppliedServiceStatus | Latency of inserting a service status |
| Latency_Session_InsertSession | Latency of inserting sessions into the store |
| Latency_Session_SelectAppliedRules | Latency of retrieving applied rules |
| Latency_Session_SelectAppliedServiceStatuses | Latency of retrieving service statuses |
| Latency_Session_SelectBySessionId | Latency of retrieving by session id |
| Latency_Session_UpdateAppliedRule | Latency of updating an applied rule |
| Latency_Session_UpdateAppliedServiceStatus | Latency of updating applied service statuses |
| Latency_Session_UpdateSession | Latency of updating a session |

Spirent
- Attempted Connect Rate (Sessions/Second)
- Attempted Disconnect Rate (Sessions/Second)
- Actual Connect Rate (Sessions/Second)
- Actual Disconnect Rate (Sessions/Second)
- Sessions Established
- Session Errors
- Attempted Session Connects
- Attempted Session Disconnects
- Actual Session Connects
- Actual Session Disconnects
- Average Session Connect Time
- Average Session Disconnect Time
- Minimum Connect Time
- Maximum Connect Time
- User Authentication Failure
