Professional Documents
Culture Documents
0
Capacity Monitoring Guide
(BSC6910-based)
Issue 04
Date 2014-01-20
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not
be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all
statements, information, and recommendations in this document are provided "AS IS" without warranties,
guarantees or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
1.1 Purpose
Growing traffic in mobile networks, especially in newly deployed networks, requires more
and more network resources, such as radio and transmission resources. Lack of network
resources will affect user experience. Therefore, monitoring network resources, locating
bottlenecks, and performing capacity expansion in real time are critical to the provision of
high quality services.
This document describes how to monitor the usage of various network resources and locate
network resource bottlenecks.
This document applies to BSC6910s and 3900 series base stations.
Radio Network Controllers (RNCs) mentioned in this document refer to Huawei BSC6910s.
For details about the MML commands, parameters, alarms, and performance counters, see section
"Operation and Maintenance" in BSC6910 UMTS Product Documentation or 3900 Series WCDMA
NodeB Product Documentation.
For details about flow control, see Flow Control Feature Parameter Description in the RAN
Feature Documentation.
1.2 Organization
This document is organized as follows:
Chapter Description
1 Overview Introduces network resources and monitoring methods.
2 Network Resource Provides capacity monitoring principles, monitoring
Monitoring methods, and optimization suggestions.
3 Network Resource Provides methods for analyzing and locating network
Troubleshooting congestion and overload problems.
4 Metrics Definitions Provides a table that includes all metrics currently in use.
5 Reference Documents Lists the documents referenced within the text and provides
the document name, document package, and document
package download path at http://support.huawei.com.
Change History
The latest document issue contains all changes made in previous issues.
04 (2014-01-20)
This is the fourth issue of RAN15.0.
Compared with Issue 03 (2013-11-15), Issue 04 (2014-01-20) includes the following changes:
Correct the counter name in chapter 4 .
03 (2013-11-15)
This is the third issue of RAN15.0.
Compared with Issue 02 (2013-06-25), Issue 03 (2013-11-15) optimizes figures in this
document.
02 (2013-06-25)
This is the second issue of RAN15.0.
Compared with Issue 01 (2013-05-04), Issue 02 (2013-06-25) includes the following changes:
Added section 2.13 "HSPA Users."
Updated the monitoring methods and optimization suggestions in sections 2.10 "CE
Usage" and 2.12 "NodeB CNBAP Load."
01 (2013-05-04)
This is the first issue of RAN15.0.
Compared with Draft A (2013-01-30), Issue 01 (2013-05-04) includes the following changes:
Added section 2.3 "RMP CPU Load."
Updated the monitoring principles, monitoring methods, and optimization suggestions in
chapter 2 "Network Resource Monitoring."
Deleted section 2.13 "NodeB CPU Load."
Deleted chapter 3 "HSPA Related Resources."
Draft A (2013-01-30)
This is a draft of RAN15.0.
Contents
1 Overview......................................................................................................................................... 5
1.1 Network Resources .......................................................................................................................................... 5
1.2 Monitoring Methods ......................................................................................................................................... 7
1 Overview
SCU provides the function of inter-subrack information exchange in the RNC. When the
traffic volume of inter-subrack communication approaches the overload threshold, voice
service quality, data service quality, and network KPIs deteriorate, causing the system to
become unstable.
For details about other resources, counters, and monitoring thresholds, see the corresponding
sections.
This procedure applies to most resource monitoring scenarios, but sometimes the system may
be overloaded because of other abnormalities instead of traffic increases. Some of these
abnormalities can be discovered by resource cross-checking. If more extensive resource
cross-checking is unsuccessful, more advanced troubleshooting is required. Chapter 3
"Network Resource Troubleshooting" addresses the more advanced troubleshooting
procedures.
For the sake of simplicity, it is assumed that there are no other abnormalities on the network
during the proactive monitoring process described in chapter 2 "Network Resource
Monitoring."
NOTE
In effective mode, run the MML command LST BRD to query information about a specific board,
for example, whether the board is an interface board.
In ineffective mode, find the MML command ADD BRD in the RNC configuration file. If the value
of the BRDCLASS parameter is INT, the board is an interface board. You can obtain information
about this interface board using this command.
NOTE
For details about the Transmission Resource Pool in RNC feature, see Transmission Resource Pool in
RNC Feature Parameter Description in the RAN feature documentation.
Inappropriate LA planning will cause PCH overload because paging messages are
broadcast across the entire LA.
− If paging messages are not retransmitted, the paging message loss rate is 5% when
the PCH usage reaches 60%. In this scenario, analyze the cause of paging message
loss or replan the LAs.
− If paging messages are retransmitted once or twice, the paging message loss rate is
1% when the PCH usage reaches 70%. In this scenario, analyze the cause of paging
message loss or replan the LAs.
FACH overload
If the FACH usage reaches 70% during peak hours for three consecutive days in a week,
the following measures are recommended:
− If the network is configured with only one SCCPCH, add a second SCCPCH to carry
FACHs.
− If SCCPCHs cannot be added, add carriers, NodeBs, or micro NodeBs.
RACH overload
If the RACH usage reaches 70% during peak hours for three consecutive days in a week,
add carriers.
The downlink cell load is indicated by the mean utility ratio of transmitted carrier power in a
cell.
The mean utility ratio of the transmitted carrier power for non-HSPA users in a cell
(including non-HSPA users on CCHs) is calculated using the following formula:
MeanTCP (NonHS) Usage = MeanNonHSTCP /MAXTXPOWER x 100%
The mean utility ratio of the transmitted carrier power for all users in a cell is calculated
using the following formula:
MeanTCP Usage = MeanTCP/MAXTXPOWER x 100%
NOTE
To obtain MAXTXPOWER, run the LST UCELL command to query the value of the Max
Transmit Power of Cell parameter, and convert the parameter value from the unit "0.1 dBm" to
"watt."
The NodeB measures the RTWP on each receive channel in each cell. The cell RTWP
obtained by the RNC is the linear average of the RTWPs measured on all receive channels in
a cell under the NodeB. The RTWP indicates the interference to a NodeB and the signal
strength on the RX port on the RF module.
The uplink cell capacity is restricted by the rise over thermal (RoT), which equals the RTWP
minus the cell's background noise. The formula is as follows:
The relationship between the RoT and the uplink load factor
UL is as follows:
1
RoT 10 log( )
1 UL
For example, a 3 dB noise increase corresponds to 50% of the uplink load and a 6 dB noise
increase corresponds to 75% of the uplink load.
Figure 2-3 Relationship between RTWP, noise increase, and uplink load
A large RTWP value in a cell is caused by traffic overflow, hardware faults (for example, poor
quality of antennas or feeder connectors), or external interference. If the RTWP value is too
large, the cell coverage shrinks, the quality of admitted services declines, or new service
requests are rejected.
If the value of the VS.MeanRTWP counter is greater than -90 dBm during peak hours for
three consecutive days in a week, there may be hardware faults or external interference.
Locate and rectify the faults and attempt to eliminate the interference. If the value of the
VS.MeanRTWP counter is still greater than -90 dBm after hardware faults are rectified
and external interference is eliminated, enable the following features as required:
− WRFD-140215 Dynamic Configuration of HSDPA CQI Feedback Period
The system reserves code resources for HSDPA services, and these code resources can be
shared among HSDPA services. Therefore, HSDPA services do not require admission control
based on cell code resources.
Figure 2-7 shows NodeB-controlled dynamic code allocation.
Enable the WRFD-010631 Dynamic Code Allocation Based on NodeB feature if this
feature has not been enabled. Preferentially allocate idle codes to HSDPA UEs to
improve the HSDPA UE throughput.
Add a carrier or split the sector.
Enable the WRFD-010653 96 HSDPA Users per Cell feature if it is not enabled.
NOTE
For details about how to enable the WRFD-010631 Dynamic Code Allocation Based on NodeB feature
and the WRFD-010653 96 HSDPA Users per Cell feature, see HSDPA Feature Parameter Description in
the RAN Feature Documentation.
2.10 CE Usage
2.10.1 Monitoring Principles
CE resources are baseband processing resources in a NodeB. The more CEs a NodeB supports,
the stronger the NodeB's service processing capability. If a new call arrives but there are not
enough CEs (not enough baseband processing resources), the call will be blocked.
Uplink CE resources can be shared in an uplink resource group, but not between uplink
resource groups. Downlink CE resources are associated with the baseband processing
boards where a cell is set up. CE resources allocated by licenses are shared among services on
the NodeB. (CE resources are shared on a per operator basis in MOCN scenarios.)
The NodeB sends the RNC a response message that carries its CE capability. The NodeB's CE
capability is limited by both the installed hardware and the configured software licenses.
The methods of calculating the credit resource usage of admitted UEs are different before and
after the CE Overbooking feature is enabled. Table 2-3 describes the details.
Table 2-3 Credit resources consumed by admitted UEs before and after CE Overbooking is
enabled
NOTE
For details about CE Overbooking, see CE Overbooking Feature Parameter Description.
CCHs do not require extra CE resources because the RNC reserves CE resources for services
on these channels. Signaling carried on an associated channel of the DCH does not consume
extra CE resources, either. One CE can be consumed by a 12.2 kbit/s voice call.
Table 2-4 to Table 2-9 provide the number of CEs consumed by different services.
UL 3.4 256 1 2 2 4
13.6 64 1 2 2 4
8 64 1 2 2 4
16 64 1 2 2 4
32 32 1.5 3 2 4
64 16 3 6 3 6
128 8 5 10 5 10
144 8 5 10 5 10
256 4 10 20 10 20
384 4 10 20 10 20
DL 3.4 256 1 1
13.6 128 1 1
8 128 1 1
16 128 1 1
32 64 1 1
64 32 2 2
128 16 4 4
144 16 4 4
256 8 8 8
384 8 8 8
Table 2-6 CEs consumed by an HSUPA service (10 ms TTI, SRB over DCH)
UL 32 64 32 1 2 2 4
64 128 16 2 4 2 4
128 256 8 4 8 4 8
608 608 4 8 16 8 16
Table 2-7 CEs consumed by an HSUPA service (2 ms TTI, SRB over DCH)
UL 608 608 4 8 16 8 16
1280 1280 2SF4 16 32 16 32
2720 2720 2SF2 32 64 32 64
Table 2-8 CEs consumed by an HSUPA service (10 ms TTI, SRB over HSUPA)
UL 16 64 32 1 2 2 4
32 128 16 2 4 2 4
128 256 8 4 8 4 8
608 608 4 8 16 8 16
1280 1280 2S 16 32 16 32
F4
1800 1800 2S 32 64 32 64
F2
Table 2-9 CEs consumed by an HSUPA service (2 ms TTI, SRB over HSUPA)
UL 608 608 4 8 16 8 16
1280 1280 2SF4 16 32 16 32
2720 2720 2SF2 32 64 32 64
NOTE
Table 2-6 to Table 2-9 apply only to WBBPb, WBBPd, and WBBPf boards in 3900 series base stations.
HSDPA services do not consume downlink CEs allocated to R99 services. HSUPA and R99
services share uplink CEs.
Iub bandwidth congestion will cause RRC connection setup and RAB setup failures. Table
2-10 describes Iub bandwidth congestion scenarios.
Scenario Description
The physical Admission control is performed based on the sufficiency of
transmission transmission resources. If transmission resources are
resources are sufficient, the admission succeeds. Otherwise, the
sufficient, but the transmission fails. Admission control prevents excessive
admission fails. service admission and ensures the quality of admitted
services.
The physical The physical transmission bandwidth between the RNC
transmission and NodeB is insufficient.
bandwidth is
insufficient.
IP Transmission
On an IP transmission network, the RNC and NodeB can monitor the average uplink and
downlink loads on their interface boards. The Iub bandwidth usage is represented by the ratio
of the average uplink or downlink load to the configured Iub bandwidth. In addition, the RNC
and NodeB can dynamically adjust the bandwidth of a service based on the QoS requirements
of the service and user priority and improve the Iub bandwidth usage using the reserve
pressure algorithm on their interface boards.
1. Bandwidth-based admission success rate
The counters for monitoring the bandwidth-based admission success rate are as follows:
VS.ANI.IP.Conn.Estab.Succ: Number of Successful IP Connections for IP Transport
Adjacent Node
VS.ANI.IP.Conn.Estab.Att: Number of IP Connection Setup Requests Received by IP
Transport Adjacent Node
VS.ANI.IP.FailResAllocForBwLimit: Number of Failed Resource Allocations Due to
Insufficient Bandwidth on the IP Transport Adjacent Node
The IP connection setup success rate is calculated using the following formula:
IP connection setup success rate = VS.ANI.IP.Conn.Estab.Succ/VS.ANI.IP.Conn.Estab.Att
If the IP connection setup success rate is less than 99%, bandwidth congestion has probably
occurred on the user plane.
If the value of the VS.ANI.IP.FailResAllocForBwLimit counter is not zero, bandwidth
congestion has occurred on the user plane.
2. Physical bandwidth usage
(1) Control plane
The counters for monitoring the number of bytes transmitted or received on an SCTP link
(carrying NCP, CCP, and ALCAP) are as follows:
VS.SCTP.TX.BYTES: Number of IP bytes transmitted on an SCTP link
VS.SCTP.RX.BYTES: Number of IP bytes received on an SCTP link
The uplink and downlink traffic volumes of the SCTP links on the control plane over the Iub
interface are calculated using the following formulas:
SCTP UL KBPS = VS.SCTP.RX.BYTES x 8/<SP>/1000
SCTP DL KBPS = VS.SCTP.TX.BYTES x 8/<SP>/1000
If ALM-21542 SCTP Link Congestion is reported on the Iub interface, bandwidth congestion
has occurred on the control plane.
(2) User plane
In transmission resource pool networking, the number of IP bytes transmitted or received on
the user plane by each adjacent node is measured by the following counters:
VS.IPPOOL.ANI.IPLAYER.TXBYTES: Number of IP bytes transmitted on the user
plane of an adjacent node
VS.IPPOOL.ANI.IPLAYER.RXBYTES: Number of IP bytes received on the user plane
of an adjacent node
The Iub bandwidth usage of a transmission resource pool is calculated using the following
formulas:
UL Iub Usage on User Plane = VS.IPPOOL.ANI.IPLAYER.RXBYTES x
8/<SP>/1000/RX BW_CFG
DL Iub Usage on User Plane = VS.IPPOOL.ANI.IPLAYER.TXBYTES x
8/<SP>/1000/TX BW_CFG
If the uplink or downlink bandwidth usage of the user plane on the Iub interface reaches 70%,
bandwidth congestion has occurred on the user plane. Then, capacity expansion is required.
NOTE
The activity factor equals the ratio of actual bandwidth occupied by a UE to the bandwidth allocated to
the UE during its initial access. It is used to predict the bandwidth required by admission. Each NodeB
can be configured with its own activity factor. The default activity factor for voice services is 70%, and
the default activity factor for PS BE services is 40%.
----End
Figure 3-1 Basic call flow chart and possible block/failure points
Step 5 If the RNC is congested when receiving the RRC connection setup request, the RNC may
drop the request message. See block point #3.
Step 6 If the RNC receives the RRC connection setup request and does not discard it, the RNC
determines whether to accept or reject the request. The request may be rejected due to
insufficient resources, as shown in block point #4.
Step 7 If the RNC accepts the request, the RNC instructs the UE to set up an RRC connection. The
UE may not receive RNC's response message or may find that the configuration does not
support RRC connection setup. See failure points #5 and #6.
Step 8 After the RRC connection is set up, the UE sends NAS messages to negotiate with the CN
about service setup. If the CN determines to set up a service, the CN sends an RAB
assignment request to the RNC.
Step 9 The RNC accepts or rejects the RAB assignment request based on the resource usage of RAN.
See block point #7.
Step 10 If the RNC accepts the RAB assignment request, it initiates an RB setup process. During the
process, the RNC sets up radio links (RLs) to the NodeB over the Iub interface and sends an
RB setup message to the UE over the Uu interface. The RL or RB setup process may incur
failures. See failure points #8 and #9.
----End
In most cases, the troubleshooting procedure includes detecting abnormal KPIs, selecting top
N cells, and analyzing abnormal KPIs.
In the case of system bottlenecks, accessibility-related KPIs are usually checked first because
most of the access congestion issues are caused by insufficient system resources.
Figure 3-3 shows key points for bottleneck analysis. The following sections then describe
these key points.
As shown in Figure 3-4, if the CP CPU load exceeds 50%, analyze the cause of the problem
and prevent the CPU load from increasing further. In addition, perform an in-depth analysis of
the high CPU load, for example, check whether the parameter configurations are appropriate.
If the high CPU load is caused by an increase in traffic volume, it is recommended that you
expand hardware capacity.
CP/UP flexible deployment can adjust the hardware resources in the CP and UP pools so that
these pools have basically the same average load. Because the non-pooled load in the CP
cannot be shared, dynamic cell allocation is required to balance this load in the CP pool.
If the traffic model changes, the load can be balanced between CP and UP pools to a certain
degree but cannot be completely balanced. For example, if the CP CPU load is lower than the
UP CPU load but the remaining CP resources are insufficient to bear new NodeBs and cells, it
is recommended that you reduce the number of cells in the RNC or add new boards.
If the UP CPU load exceeds 60%, analyze the cause of the problem and prevent the CPU load
from increasing further. If the high CPU load is caused by an increase in traffic volume, it is
recommended that you expand hardware capacity.
CE resources can be shared within a resource group. Therefore, CE usage on the NodeB must
be calculated to determine whether CE congestion occurs in a resource group or the NodeB. If
CE congestion occurs in a resource group, reallocate CE resources among resource groups. If
CE congestion occurs in the NodeB, perform capacity expansion.
Generally, adding a carrier is the most effective means of relieving uplink power congestion.
If no additional carrier is available, add a NodeB or reduce the downtilt of the antenna.
4 Metrics Definitions
Congestion-related Metrics
5 Reference Documents
This chapter lists the documents referenced within the text and provides the document name,
document package, and document package download path at http://support.huawei.com.