
N2-METAWIN

Measurement-based Optimization of the GPRS/UMTS Core Network


Technical Report FTW-TR-2005-009

About this document

This is a document of the N2-METAWIN project. Its scope is to report on the exploitation of the data measured by the METAWIN system for the optimization of the Core Network. Specifically, we address the problem of finding the optimal assignment of Gb links to SGSNs. The optimality criteria are twofold: balance the peak number of MSs attached to each SGSN, and at the same time minimize the volume of inter-SGSN Routing Area Updates. We present a method to extract the relevant input data (e.g. the mobility matrix) from the packet traces collected by the METAWIN system. The traces are collected by passive monitoring of Gb links. We show how to manipulate the measured data (e.g. error clean-up, outlier filtering) and prepare them to serve as the input of the optimization process. We provide a novel Integer Linear Programming (ILP) formulation for the problem under study. Finally, we present numerical results for a case study derived from traces collected in October 2004 from the operational network of mobilkom austria AG & Co KG. For any comment and/or request for clarification, please contact the author. This report and future updated versions are available from: http://publications.ftw.tuwien.ac.at/publications/project summary.asp?projectno=310

Author Fabio Ricciato (ftw, ricciato@ftw.at).

Document versions:
v1.0, 25.9.2005: Preliminary version (internal)
v1.2, 30.9.2005: First public version (work in progress)

DOCUMENT STATUS: PUBLIC

Contents
1 Problem Statement and Organization of the Document
2 Problem Formulation
2.1 Introduction to the problem
2.2 Notation
2.3 ILP Formulation
3 Input data: extraction and preparation
3.1 Extraction of the Mobility Matrix $b_{i,j}(k)$
3.2 Extraction of the Attachment Vector $a_i(k)$
3.3 Exploratory analysis
3.4 Outlier filtering
4 Numerical Results
5 Application to real networks
6 Possible refinements and extensions
6.1 Support heuristic for large networks
6.2 Multi-period optimization
6.3 Refined filtering process
A List of Acronyms

1 Problem Statement and Organization of the Document

The scheme of the GPRS network is given in Figure 1, which highlights the main building blocks of interest for this study [footnote 1]. The PCUs (Packet Control Units) are the logical components within the BSCs (Base Station Controllers) that are in charge of handling packet-switched traffic. Conventionally the BSCs (hence the PCUs) are considered part of the Radio Access Network (RAN), while the SGSNs are in the Core Network (CN); the Gb links between them therefore represent the frontier between the two domains. Each Gb link connects one PCU to the associated SGSN. In the simplest case one PCU covers a single Routing Area (RA), but it is also possible that one RA is split among multiple PCUs, or that a single PCU covers a conglomerate of RAs. The design of the radio coverage (i.e. the placement of cells and the assignment of cells to RAs) as well as the association between RAs and PCUs (RA-PCU mapping) are part of the RAN planning process, which is out of the scope of this work. Here we assume that the configuration of the radio access network is given, and the problem is to find an optimal assignment of PCUs to SGSNs. In other words, we consider the problem of finding the optimal PCU-SGSN mapping given a certain RA-PCU mapping and the measured mobility patterns. As a by-product, the optimization method outputs the minimum number of SGSNs required to support given load requirements. We do not cover here the problem of dimensioning individual Gb links, which is decoupled from the assignment problem.

Figure 1: Scheme of the GPRS Core Network.

The optimization problem is complicated by the fact that two concurrent goals have to be considered:
- balance the number of attached MSs among the SGSNs, or equivalently minimize the maximum number of MSs attached to any single SGSN;
- minimize the rate of inter-SGSN Routing Area Updates.

These two objectives are generally conflicting, and a trade-off exists between them. The overall optimization problem is formally defined in Section 2.

In engineering tasks of this type, based on measurements, one has to cope with the volatility of the data at hand. We will show how to find an optimal solution given a certain mobility matrix. But the mobility process of real users is not deterministic: it is subject to daily variations and random fluctuations, not to mention long-term trends and sporadic anomalies. Also, measurement data are subject to errors. The question is then: how to distill a meaningful mobility matrix for the specific problem under study? We try to answer this question in Section 3.

We have applied these methods to a case study based on one week of traces collected in October 2004 from the operational GPRS network of Mobilkom Austria. At that time, the monitoring system covered a very high fraction of the RAs attached to the SGSNs located at the Arsenal site. The numerical results and exemplary figures derived from this dataset are presented in Section 4.
1 We assume that the reader is familiar with the fundamental aspects of a GPRS network and the related terminology.

In our case study we made a number of simplifying assumptions (e.g. the RA-PCU mapping being 1:1, absence of geographical constraints on the PCU-SGSN mapping). In general, applying the optimization method to real production networks requires the introduction of a number of additional constraints into the formulation (e.g. the actual RA-PCU mapping, geographical constraints). These can be easily handled from the conceptual point of view, but in practice their implementation involves non-negligible effort (e.g. availability of a complete RA-PCU database). These aspects are covered in Section 5. The methods proposed here can be used satisfactorily to support the planning process of real networks. In Section 6 we discuss some limitations of the proposed methods and identify some directions for further research.

2 Problem Formulation

2.1 Introduction to the problem

In order to minimize the resource consumption within each individual SGSN, it is desirable to balance the total workload across the set of available SGSNs. Generally speaking, the workload on the SGSN has a multi-dimensional nature, since different dimensions of physical and logical resources (CPU cycles, buffers, bandwidth, memory states, etc.) are consumed by different types of traffic units (e.g. signaling messages, data packets). However, to make the problem tractable we consider a one-dimensional metric for the SGSN workload, namely the number of simultaneously attached MSs. This is justified by the following factors:
- the level of SGSN resource consumption along each dimension is roughly proportional to the instantaneous number of attached MSs;
- the capacity of commercial SGSNs is typically expressed in terms of the maximum number of supportable attached MSs.

Denote by $\alpha_m$ the peak number of MSs attached to SGSN $m$. The primary objective of the optimization is then to minimize the maximum value $\alpha = \max_m \alpha_m$.

An additional optimization objective is to minimize the incidence of inter-SGSN Routing Area Updates (iRAU for short). Recall that when an MS moves from RA $i$ to RA $j$ it performs a RAU procedure. If both RAs are attached to the same SGSN, the RAU consumes resources exclusively within that SGSN. If instead the two RAs are attached to different SGSNs, this leads to a more resource-consuming procedure, namely the inter-SGSN RAU. Each iRAU involves four different elements: the two SGSNs, the GGSN and the HLR. Each iRAU consumes a certain amount of resources within each of these elements, and at the same time generates communication overhead between them (signaling traffic on the Gn network). Therefore, it is desirable to minimize the rate of iRAU messages across the network, hereafter denoted by $\beta$ (see Section 2.2 for the exact definition).

In general, the two goals of minimizing $\beta$ and balancing $\alpha$ are conflicting, and a trade-off exists between them. To see this, consider that the minimum possible value of $\beta$ is obtained by attaching all Gb links to a single SGSN (leading to $\beta = 0$), but that would maximize the value of $\alpha$ (no balancing). The optimization process must be designed to find the optimal trade-off curve between these two quantities. Based on it, the network engineers will select the best operating point by taking into account external information (e.g. equipment features) and assigning different weights to each objective.

In the above discussion we treated $\alpha$ and $\beta$ as scalar values. However, in a real-world network any traffic component, including the number of users attached to each RA and the instantaneous iRAU rate between two RAs, varies in time and should be regarded as a function of time. For the sake of simplicity we consider a discrete time axis, and therefore handle discrete time series (e.g. $a_i(k)$ and $b_{i,j}(k)$). This matches the nature of the measured data, i.e. counters for time bins of fixed length $T$. From the point of view of network planning it is important to minimize the peak values of the quantities related to resource consumption. Within the scope of this work we adopt the static optimization approach (see the discussion in Section 6.2), therefore we preliminarily reduce each time series to a single scalar value representing its peak (i.e. $a_i(k) \rightarrow \hat{a}_i$ and $b_{i,j}(k) \rightarrow \hat{b}_{i,j}$). This involves a procedure somewhat more complex than a plain maximization, since some pre-filtering is applied to the time series to eliminate outliers and anomalous peaks (see Section 3).

For the problem under analysis we can consider the RA as the smallest traffic-generating unit. However, in a real network there are some constraints that must be considered when designing the PCU-SGSN mapping. For instance, an RA cannot be attached to multiple SGSNs; therefore, if the RA is split among multiple PCUs, all such PCUs must be associated with the same SGSN. Additionally, some PCUs might preferably be attached to certain SGSNs based on geographical proximity. For ease of discussion, in the rest of this section we make the following simplifying assumptions:
- the mapping between PCUs and RAs is strictly 1:1;
- there are no preferential RA-SGSN associations.

In Section 5 we discuss how to handle such constraints when optimizing real-world networks.
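The trade-off between the two objectives can be illustrated with a minimal sketch that evaluates a given RA-to-SGSN mapping. The function name `evaluate_mapping`, the dictionary layout and the toy numbers are assumptions made for illustration only; they are not taken from the METAWIN tools or dataset. The sketch computes the maximum per-SGSN load and the maximum per-SGSN cumulated (inbound plus outbound) iRAU rate.

```python
from collections import defaultdict

def evaluate_mapping(peak_attached, peak_rau, sgsn_of):
    """Return (max SGSN load, max per-SGSN iRAU rate) for one mapping.

    peak_attached: {ra: peak number of attached MSs}      (hypothetical data)
    peak_rau:      {(ra_from, ra_to): peak RAU rate}      (hypothetical data)
    sgsn_of:       {ra: sgsn identifier}
    """
    load = defaultdict(float)   # attached MSs per SGSN
    irau = defaultdict(float)   # outbound + inbound iRAU rate per SGSN
    for ra, peak in peak_attached.items():
        load[sgsn_of[ra]] += peak
    for (i, j), rate in peak_rau.items():
        if sgsn_of[i] != sgsn_of[j]:      # RAU crosses an SGSN border
            irau[sgsn_of[i]] += rate      # outbound for the source SGSN
            irau[sgsn_of[j]] += rate      # inbound for the destination SGSN
    return max(load.values()), max(irau.values(), default=0.0)

# Toy instance: four RAs, arbitrary numbers.
peak_attached = {"A": 100, "B": 80, "C": 60, "D": 40}
peak_rau = {("A", "B"): 10, ("B", "A"): 8, ("B", "C"): 5, ("C", "D"): 7}

single_sgsn = {ra: 0 for ra in peak_attached}       # everything on one SGSN
balanced = {"A": 0, "B": 1, "C": 1, "D": 0}         # perfectly balanced load

print(evaluate_mapping(peak_attached, peak_rau, single_sgsn))
print(evaluate_mapping(peak_attached, peak_rau, balanced))
```

In this toy instance the single-SGSN mapping has no iRAU at all but concentrates the whole load on one node, while the balanced mapping halves the maximum load at the cost of a non-zero iRAU rate.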

2.2 Notation

For the sake of clarity, we index the RAs with $i, j$ and the SGSNs with $m$. The total numbers of RAs and SGSNs are denoted by $N$ and $M$ respectively. The index $k$ is used for the discrete time intervals. The following notation is introduced.
- $a_i(k)$ is the measured number of MSs attached to RA $i$ at time $t_k$; the complete set $\{a_i(k)\}$ is referred to as the Attachment Vector (AV).
- $b_{i,j}(k)$ is the measured average intensity of the RAU flow from RA $i$ to RA $j$ in the $k$th time bin $\tau_k$; the complete set of measured $\{b_{i,j}(k)\}$ is referred to as the Mobility Matrix (MM).
- $\hat{a}_i \approx \max_k a_i(k)$ and $\hat{b}_{i,j} \approx \max_k b_{i,j}(k)$ denote the peak values of the series $a_i(k)$ and $b_{i,j}(k)$ respectively (the symbol $\approx$ recalls that some data cleaning is applied before maximization, as explained in Section 3). Note that the time axes of the two signals $a_i(k)$ and $b_{i,j}(k)$ are independent of each other, although we use the same index $k$ for both. In other words, the set of instants $\{t_k\}$ at which the number of attached MSs is sampled does not need to be synchronized with the set of time bins $\{\tau_k\}$.
- $x_{i,m}$ is the binary variable accounting for the RA-SGSN mapping: $x_{i,m} = 1$ means that RA $i$ is connected to the $m$th SGSN.
- $\alpha_m$ is the peak number of MSs attached to the $m$th SGSN; $\alpha = \max_m \{\alpha_m\}$ is the maximum number of MSs attached to a single SGSN.
- $\beta^+_m$ and $\beta^-_m$ are the peak intensities of the outbound / inbound iRAU flow from / to the $m$th SGSN; $\beta = \max_m \{\beta^+_m + \beta^-_m\}$ is the maximum rate of cumulated iRAU flow (inbound and outbound) for a single SGSN.
- $y^+_{i,j,m}$ and $y^-_{i,j,m}$ are binary support variables used in the ILP formulation; they are defined only for RA pairs with non-null RAU flow (i.e. $\forall i, j : \hat{b}_{i,j} \neq 0$); the value $y^+_{i,j,m} = 1$ [resp. $y^-_{i,j,m} = 1$] means that the RAU flow from RA $i$ to RA $j$ contributes to the outbound [resp. inbound] iRAU flow for the $m$th SGSN.

2.3 ILP Formulation

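The goal of the optimization is to jointly minimize both the maximum SGSN load $\alpha$ and the maximum per-SGSN iRAU rate $\beta$, subject to the constraints formulated below. As usual with dual-objective optimization, two strategies are possible. One strategy adopts a combined cost function $C = w\,\alpha + (1 - w)\,\beta$, where the relative weight factor $w$ serves as a tuning knob to trade off between the minimization of the two objective variables. The alternative strategy is a two-step optimization. In Step I the primary objective $\alpha$ is minimized and its absolute optimal value, denoted by $\alpha^*$, is found. In Step II the primary variable is constrained within a neighborhood of the absolute optimal value defined by a slack factor $\epsilon$ (i.e. $\alpha \le (1+\epsilon)\,\alpha^*$), and the secondary variable $\beta$ is minimized. With this approach the value of the parameter $\epsilon$ serves as the tuning knob in the trade-off between the two objective variables. In this work we have preferred the two-step strategy. The following set of constraints defines an Integer Linear Programming (ILP) formulation of the problem:

minimize: $\alpha$ (in Step I) or $\beta$ (in Step II)
subject to (constraint 1 is present only in Step II):

\begin{align}
\alpha &\le (1+\epsilon)\,\alpha^{*} \tag{1} \\
\sum_{m=1}^{M} x_{i,m} &= 1, \qquad \forall i = 1..N \tag{2} \\
\sum_{m=1}^{M} y^{+}_{i,j,m} &= \sum_{m=1}^{M} y^{-}_{i,j,m}, \qquad \forall i,j : \hat{b}_{i,j} \neq 0 \tag{3} \\
\sum_{m=1}^{M} y^{+}_{i,j,m} \le 1, &\quad \sum_{m=1}^{M} y^{-}_{i,j,m} \le 1, \qquad \forall i,j : \hat{b}_{i,j} \neq 0 \tag{4} \\
x_{i,m} + y^{-}_{i,j,m} &= x_{j,m} + y^{+}_{i,j,m}, \qquad \forall i,j : \hat{b}_{i,j} \neq 0,\ \forall m = 1..M \tag{5} \\
y^{+}_{i,j,m} + y^{-}_{i,j,m} &\le 1, \qquad \forall i,j : \hat{b}_{i,j} \neq 0,\ \forall m = 1..M \tag{6} \\
\beta^{-}_{m} &= \sum_{i,j : \hat{b}_{i,j} \neq 0} \hat{b}_{i,j}\, y^{-}_{i,j,m}, \qquad \forall m = 1..M \tag{7} \\
\beta^{+}_{m} &= \sum_{i,j : \hat{b}_{i,j} \neq 0} \hat{b}_{i,j}\, y^{+}_{i,j,m}, \qquad \forall m = 1..M \tag{8} \\
\beta &\ge \beta^{+}_{m} + \beta^{-}_{m}, \qquad \forall m = 1..M \tag{9} \\
\alpha_{m} &= \sum_{i=1}^{N} \hat{a}_{i}\, x_{i,m}, \qquad \forall m = 1..M \tag{10} \\
\alpha &\ge \alpha_{m}, \qquad \forall m = 1..M \tag{11} \\
x_{i,m},\ y^{+}_{i,j,m},\ y^{-}_{i,j,m} &\ \text{binary} \tag{12}
\end{align}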
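The two-step strategy described above can be sketched on a toy instance by exhaustive enumeration instead of an ILP solver. This is only an illustration of the Step I / Step II logic, viable for a handful of RAs: the function names, the toy data and the multiplicative form of the slack (maximum load at most `(1 + eps)` times the Step I optimum) are assumptions of this sketch, not details confirmed by the report.

```python
from itertools import product

def evaluate(a_hat, b_hat, assign, M):
    """Max SGSN load and max per-SGSN (outbound + inbound) iRAU rate."""
    load = [0.0] * M
    irau = [0.0] * M
    for i, m in enumerate(assign):
        load[m] += a_hat[i]
    for (i, j), rate in b_hat.items():
        if assign[i] != assign[j]:        # flow i -> j crosses SGSNs
            irau[assign[i]] += rate       # outbound contribution
            irau[assign[j]] += rate       # inbound contribution
    return max(load), max(irau)

def two_step(a_hat, b_hat, M, eps):
    """Brute-force two-step search (assumed slack form: (1 + eps) * optimum)."""
    N = len(a_hat)
    scored = [(evaluate(a_hat, b_hat, x, M), x)
              for x in product(range(M), repeat=N)]
    # Step I: absolute optimum of the primary objective (max load).
    load_star = min(load for (load, _), _ in scored)
    # Step II: minimize the iRAU objective within the slack neighborhood.
    feasible = [s for s in scored if s[0][0] <= (1 + eps) * load_star]
    (load, irau), x = min(feasible, key=lambda s: (s[0][1], s[0][0]))
    return load_star, load, irau, x

# Toy instance: peak attached MSs per RA and peak RAU rates (arbitrary).
a_hat = [100, 80, 60, 40]
b_hat = {(0, 1): 10, (1, 0): 8, (1, 2): 5, (2, 3): 7}

print(two_step(a_hat, b_hat, M=2, eps=0.0))
print(two_step(a_hat, b_hat, M=2, eps=0.3))
```

With no slack the search is forced onto a perfectly balanced mapping; allowing some slack lets it pick a mapping that cuts only the weakest RAU flow. The number of candidate mappings grows as M to the power N, which is why a real instance needs the ILP formulation (or a heuristic) rather than enumeration.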

Constraint 2 forces each RA to be attached to exactly one SGSN. Constraints 3-4 enforce iRAU flow conservation. Constraints 5-6 are the central component of the ILP formulation: together with constraints 3-4 they drive the support variable $y^+_{i,j,m} = 1$ [resp. $y^-_{i,j,m} = 1$] if and only if $x_{i,m} = 1$ and $x_{j,m} = 0$ [resp. $x_{i,m} = 0$ and $x_{j,m} = 1$]. Recall that the binary support variables $y^+_{i,j,m}$ and $y^-_{i,j,m}$ are key to the formulation: they account for the contribution of the RAU subflow from RA $i$ to RA $j$ to the outbound / inbound iRAU flow from / to the $m$th SGSN. Constraints 7-8 build up the inbound / outbound iRAU flow for each SGSN from the values of the support variables $y^-_{i,j,m}$ and $y^+_{i,j,m}$. Constraints 10 and 11 define the number of MSs attached to each SGSN, $\alpha_m$, and its maximum $\alpha$.
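The role of constraints 5-6 can be checked by enumerating all binary cases for a single RA pair and a single SGSN. The sketch below assumes that constraint 5 reads "x_i,m + y-_i,j,m = x_j,m + y+_i,j,m" and that constraint 6 reads "y+_i,j,m + y-_i,j,m <= 1"; the function name `feasible_support` is hypothetical.

```python
from itertools import product

def feasible_support(x_i, x_j):
    """All binary (y_plus, y_minus) pairs satisfying constraints (5) and (6)
    for fixed mapping variables x_i = x_{i,m} and x_j = x_{j,m}."""
    return [
        (y_plus, y_minus)
        for y_plus, y_minus in product((0, 1), repeat=2)
        if x_i + y_minus == x_j + y_plus     # constraint (5), assumed form
        and y_plus + y_minus <= 1            # constraint (6)
    ]

for x_i, x_j in product((0, 1), repeat=2):
    print(f"x_i={x_i} x_j={x_j} -> feasible (y+, y-): {feasible_support(x_i, x_j)}")
```

Each of the four cases admits exactly one feasible pair, so the support variables are fully determined by the mapping variables: the outbound flag is set only when RA i is on the SGSN and RA j is not, and the inbound flag only in the opposite case.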

3 Input data: extraction and preparation

In this section we describe the process of collecting measurements and transforming them into input data for the optimization problem formulated in Section 2. We present numerical results and exemplary figures derived for a case study based on real measurements. The measurements were collected from the operational network of mobilkom austria AG & Co KG with the METAWIN monitoring system [1]. The dataset used here includes one full week of measurements in October 2004 (from Monday 00:00 to Sunday 24:00). We monitored only a subset of the Gb links, specifically those attached to the $K$ SGSNs co-located at a single site. The dataset includes 127 different RAs, representing a fraction $F$ of the total network coverage.

Note: For proprietary reasons we cannot disclose the values of $K$ and $F$, nor provide absolute quantitative values such as traffic volumes, number of MSs, number of Gb links, etc. Therefore, in the following graphs the values of the vectors $b_{i,j}(k)$ and $a_i(k)$ have been re-scaled by an arbitrary undisclosed factor.

Figure 2: Monitored RAs

3.1 Extraction of the Mobility Matrix $b_{i,j}(k)$

Whenever an MS moves from RA $i$ to a new RA $j$ it generates a RAU message directed to the SGSN covering $j$. The RAU message seen on Gb contains the identifiers of both the new and the old RA. A specific software module (hereafter denoted as HRO) was developed during the project to parse all the RAU messages and count the occurrences of $(i, j)$ pairs in time bins of length $T_b$ (unless otherwise specified we used $T_b = 5$ min). Note that a new set of counters is produced for each new pair $(i, j)$ seen in the RAUs.

Non-monitored RAs. Our monitoring system covers only a fraction of the Gb links, hence only a fraction of the PCUs and RAs. For a monitored RA (e.g. x and y in Figure 2) it is possible to derive the inbound RAU flows from any other RA, including non-monitored RAs. Instead, the outbound RAU flows towards a non-monitored RA (e.g. z in Figure 2) cannot be measured. In summary, with reference to Figure 2, we can measure the RAU flow $b_{z,x}$ but not $b_{x,z}$. Note that foreign RAs (e.g. RAs of another operator) are also visible as source RAs when the MS moves into a monitored RA. These can be discriminated by the prefix value of the RA identifier. For our purposes we merge all the non-monitored RAs into a single virtual external RA; therefore, for each RA $i$ we are able to measure the inbound flow from the external RA, $b_{ext,i}$, but not the outbound flow $b_{i,ext}$. In Section 5 we discuss further how to handle such RAU flows in the problem formulation.
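The counting step and the merging of non-monitored source RAs can be sketched as follows. This is not the HRO module itself: the event tuple layout, the function name `build_mobility_series`, the label `"ext"` and the decision to skip same-RA updates are assumptions made for illustration.

```python
from collections import defaultdict

def build_mobility_series(rau_events, monitored, bin_seconds=300):
    """Count RAU messages per (old RA, new RA) pair and time bin.

    rau_events: iterable of (timestamp_seconds, old_ra, new_ra)  (assumed layout)
    monitored:  set of RA identifiers covered by the monitoring system
    Returns {(i, j): {bin_index: count}}; every non-monitored source RA
    is merged into the single virtual RA "ext".
    """
    series = defaultdict(lambda: defaultdict(int))
    for ts, old_ra, new_ra in rau_events:
        if new_ra not in monitored:
            continue          # RAUs towards non-monitored RAs are not observable
        src = old_ra if old_ra in monitored else "ext"
        if src == new_ra:
            continue          # assumption: same-RA updates are not mobility events
        series[(src, new_ra)][int(ts // bin_seconds)] += 1
    return series

# Toy trace: RAs "x" and "y" are monitored, "z" and "q" are not.
events = [(10, "x", "y"), (200, "x", "y"), (310, "z", "x"),
          (320, "q", "x"), (330, "x", "z"), (400, "y", "y")]
series = build_mobility_series(events, monitored={"x", "y"})
for pair, bins in series.items():
    print(pair, dict(bins))
```

A new inner counter is created lazily the first time a pair is seen, which mirrors the remark that a new set of counters is produced for each new pair found in the RAU messages.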

Erroneous identifiers. Recall that the identifier of the destination RA $j$ is filled in by the BSC, and can therefore always be trusted, assuming no configuration errors in the BSC. Instead, the identifier of the source RA $i$ is advertised directly by the MS. As a consequence, it is subject to errors by the MS, which could advertise a wrong or non-existent RA. For instance, in our dataset there was a very large number of RAU messages advertising the invalid value 0-0-0-0 as the old RA. Additionally, sporadic errors by the MSs build up a large number of useless counters, since the HRO generates a vector of counters $b_{i,j}(k)$ for each new pair $(i, j)$ seen on Gb.

Sporadic elements. It can be expected that the mobility matrix is sparse, i.e. many elements are null. For instance, RAs that are distant from each other do not exchange RAU flow. This is an advantage for our optimization, since the size of the ILP problem (specifically the number of constraints) depends on the number of non-null peak elements $\hat{b}_{i,j} > 0$. In some cases we found pairs of RAs with very sporadic RAU messages, i.e. $b_{i,j}(k)$ holds non-null values only in very few time bins. This is probably the case for distant RAs that occasionally come into proximity (e.g. due to fluctuations in the radio coverage) or for RAs within a region with very low mobility (e.g. a rural area distant from major roads). For our purpose it is convenient to ignore these spurious elements in order to reduce the problem size, with a negligible impact on the quality of the final solution.

Anomalous peaks. The problem formulation given in Section 2.3 takes as input the peak values of the RAU flow vector $b_{i,j}(k)$. However, for some $(i, j)$ pairs the time series $b_{i,j}(k)$ displays anomalous peaks that are considerably higher than the normal peaks we want to consider in the network planning. Anomalous peaks can be caused by occasional events such as BSC failures or pre-planned reboots. In fact, when RA $i$ is turned off, all MSs attached to $i$ migrate instantaneously to a neighboring RA, say $j$, generating a high peak in the RAU flow $b_{i,j}(k)$. Conversely, when the RA is turned on again, most MSs (but not all, due to hysteresis) return to $i$. If the peak value $\hat{b}_{i,j}$ is computed by simple maximization of the time series $b_{i,j}(k)$, such anomalous events inflate its value and ultimately lead to a large over-estimation. To counteract this effect, the anomalous peaks must be identified and filtered away before applying the maximization. This requires outlier-filtering methods, which are discussed in detail in Section 3.4.

Data preparation. From the above discussion it should be clear that the HRO output must be cleaned up in post-processing in order to derive a meaningful $b_{i,j}$ dataset. The following actions must be performed:
- The $b_{i,j}(k)$ samples associated with source RA identifiers that are known to be wrong, e.g. the well-known 0-0-0-0, must be dropped from the dataset. Notably, this leads to a slight under-estimation of the real RAU load, since in general RAU messages are processed by the SGSNs despite an invalid old-RA identifier.
- Sporadic non-null elements of the $b_{i,j}(k)$ dataset must be filtered away. This can be achieved by a simple threshold-based approach: for a specific $(i, j)$ pair, the full vector $b_{i,j}(k)$ is dropped from the dataset if its maximum value $\max_k b_{i,j}(k)$ is below a certain threshold (e.g. 1 message/min). In this way sporadic elements due to occasional errors by the MSs are eliminated as well.
- Outlier filtering (see Section 3.4).
- Merging of foreign RAs into a single external RA.

We applied these procedures to our dataset and derived a cleaned-up version of the $b_{i,j}(k)$ vectors for the 128 RAs (127 real RAs plus one external RA). From it, we derived the peak elements $\hat{b}_{i,j}$ of the Mobility Matrix by simple maximization. A graphical representation of this matrix is given in Figure 3. Note that the index 128 is associated with the external RA, which is only present as a source RA.
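The first two clean-up actions and the final maximization can be sketched as below. The function name `clean_mobility`, the data layout and the default threshold are assumptions: a threshold of 5 messages per bin corresponds to the example value of 1 message/min only if counts are taken over 5-minute bins. Outlier filtering and the merging of foreign RAs are deliberately left out of this sketch.

```python
INVALID_SOURCES = {"0-0-0-0"}   # source identifiers known to be wrong

def clean_mobility(series, min_peak=5, invalid_sources=INVALID_SOURCES):
    """Reduce per-pair RAU count series to a sparse peak matrix.

    series: {(i, j): [count in each time bin]}   (assumed layout)
    Pairs with an invalid source RA are dropped, as are pairs whose
    maximum over all bins stays below min_peak (sporadic elements).
    Returns {(i, j): peak count}.
    """
    peaks = {}
    for (i, j), counts in series.items():
        if i in invalid_sources:
            continue                      # erroneous old-RA identifier
        peak = max(counts, default=0)
        if peak < min_peak:
            continue                      # sporadic element, ignored
        peaks[(i, j)] = peak
    return peaks

# Toy input with one invalid source and one sporadic pair.
raw = {
    ("0-0-0-0", "a"): [50, 60, 55],
    ("a", "b"): [0, 12, 7],
    ("c", "d"): [0, 1, 0],
}
print(clean_mobility(raw))
```

Only pairs that survive both checks contribute variables and constraints to the optimization, which is how the clean-up keeps the problem size down.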

3.2 Extraction of the Attachment Vector $a_i(k)$

From the traces on Gb we can extract the number of attached users in each RA at each instant. This requires tracking the Attach and Detach messages of each MS (identified through the IMSI), and implementing a simplified state machine for each MS in order to trigger the implicit detachment after a specific timeout (Routing Area Update timeout). We have developed ad-hoc stateful code that is scalable enough to track millions of different MSs. An internal vector of counters indicates the number of MSs attached to each RA. The counters are sampled at regular intervals (e.g. 1 minute) in order to generate the discrete signal $a_i(k)$. Note that the time series $a_i(k)$ has a different sampling period than $b_{i,j}(k)$ (1 min vs. 5 min). Similarly to the RAU flow signal $b_{i,j}(k)$, the signal $a_i(k)$ also requires outlier filtering to eliminate anomalous peaks before maximization. This issue is discussed in detail in Section 3.4. Some sample graphs of $a_i(k)$ are given in Figure 4.
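The per-MS state tracking described above can be sketched as a small class. This is not the code developed in the project: the class name `AttachmentTracker`, the message kinds, and the rule that any message other than a detach refreshes the state of the MS are assumptions, and the timeout value is arbitrary.

```python
class AttachmentTracker:
    """Simplified per-MS state machine with implicit detach on timeout."""

    def __init__(self, timeout):
        self.timeout = timeout   # implicit-detach timeout in seconds (assumed value)
        self.state = {}          # imsi -> (current RA, time of last message)

    def on_message(self, ts, imsi, ra, kind):
        """kind is 'detach' or any other message type (e.g. 'attach', 'rau')."""
        if kind == "detach":
            self.state.pop(imsi, None)        # explicit detach
        else:
            self.state[imsi] = (ra, ts)       # attached (or still attached) in ra

    def sample(self, now):
        """Expire silent MSs, then return {ra: number of attached MSs}."""
        expired = [imsi for imsi, (_, last) in self.state.items()
                   if now - last > self.timeout]
        for imsi in expired:
            del self.state[imsi]              # implicit detach
        counts = {}
        for ra, _ in self.state.values():
            counts[ra] = counts.get(ra, 0) + 1
        return counts

tracker = AttachmentTracker(timeout=100)
tracker.on_message(0, "imsi-A", "ra1", "attach")
tracker.on_message(10, "imsi-B", "ra1", "attach")
tracker.on_message(20, "imsi-C", "ra2", "attach")
tracker.on_message(50, "imsi-B", "ra2", "rau")      # B moves to ra2
tracker.on_message(60, "imsi-C", "ra2", "detach")   # C detaches explicitly
first = tracker.sample(60)
second = tracker.sample(120)                        # A has been silent too long
print(first, second)
```

Calling `sample` at regular intervals yields one value of the attachment signal per RA and per sampling instant. A side effect of this design is that packets lost by the monitoring system look like silence and can trigger a spurious implicit detach.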

3.3 Exploratory analysis

In this section we explore the data at hand in order to highlight the need for outlier filtering. In Figure 5-top we plot the estimated number of attached MSs in RA 50/89 [footnote 2] and the number of inbound / outbound RAUs for the same RA (lower curves). We notice two major events, labelled A and B and zoomed in Fig. 5-bottom, which are discussed below. As a general remark, it can be seen that the daily peak in the number of attached MSs is not synchronized with the peak in the inbound / outbound RAUs. This should be taken into account in the planning process: the concept of a unique peak hour does not suffice for all the dimensions involved in the optimization.

The case of missing data. A first notch (A) at time 710-719 is due to missing data. There are neither inbound nor outbound RAU messages during a period of 45 minutes, and we verified that periodic RAUs and other signaling messages were also missing for the same period. Interestingly, this has an impact on the estimation of the number of attached MSs. In fact, the HRO that estimates the number of attached MSs based on the Gb traces relies on packets from / to the terminal to decide whether a terminal is attached or not: basically it tries to approximately reconstruct the state machine within the SGSN in order to find the instant of implicit detach. As a consequence, missing packets might cause the HRO to consider the terminal as implicitly detached. In those cases where the packet was present in the network but was then lost by the monitoring system, this automatically led to a temporary underestimation of the number of attached terminals.

2 In the rest of this document we label each RA with a pair of arbitrary indices, in the format xx/yy, in order not to disclose the real Routing Area Identifier (RAI).

Figure 3: Mobility Matrix (peak values, re-scaled).
[Figure 4 plots: six panels titled "Attached users in RA70", "RA34", "RA38", "RA56", "RA125" and "RA69"; x-axis: time [5 min bins]; y-axis: Num. attached users [5 min bins]; legend: Measured Data, Filter Median UP.]

Figure 4: Number of MSs attached to some sample RAs (one week in October 2004). Each graph includes the signal envelope after outlier filtering and the corresponding peak (note: numerical values on the Y axis are re-scaled by an undisclosed factor).


A first case of off/on cycle. The second event (B) occurs at time 850 (around 11:00 pm on Wednesday) and is clearly associated with a switch-off / switch-on cycle of the RA; since it occurred at night, it might be due to a planned maintenance intervention. When the RA goes down (B1 in Fig. 5-bottom), the terminals immediately move to other neighboring RAs. Similarly, when it comes up again (B2) after a few minutes, most terminals return. This is evident from the zoomed graph in Fig. 5-bottom. However, the level of attached MSs after the off/on cycle is lower than before (specifically, with a gap of approximately 1000 MSs out of approximately 6000). This can be explained by the hysteresis in the cell selection process (some terminals remain attached to the last visited RA if the power gain is not sufficient to justify a handover procedure). As can be seen from the figure, the effect of the off/on cycle in terms of a reduced number of attached MSs is still visible after several hours: in this specific case, it takes approximately 8 hours (period C) for the attached MSs curve to realign with the normal daily behavior.

It is important to explain why the curve of (estimated) attached MSs does not reach the zero level during the RA off period. First, some terminals cannot jump to another RA since they are not under the radio coverage of other stations. Second, those terminals that are temporarily out of radio coverage remain attached to the current RA, that is, the state in the SGSN and in the terminal itself is kept alive, until the expiration of the timeout for implicit detachment [footnote 3].

Impact on neighboring RA. The off/on cycle of RA 50/89 induced a peak in the inbound RAU flow of the neighboring RA 43/5, which is located within the same Location Area (recall that RAs are subsets of LAs). The weekly behavior of the latter is reported in Figure 6. The small peak in the attached MSs (upper graph) indicates that most of the MSs migrated to RA 43/5 during the off/on cycle of RA 50/89.

[Figure 5 plots: "Attached users RA50" (y-axis: Number of users) and "Inbound RA89 / Outbound RA89" (y-axis: RAU msg / minute); x-axis: time [5 min bins]; top: full week (bins 0-2000) with events A and B; bottom: zoom on bins 700-900 with events A, B1 and B2.]

Figure 5: Measured data for RA 50/89 (re-scaled).


3 This raises an interesting paradox: for off/on cycles shorter than the timeout, the terminals that are out of radio coverage remain attached to the RA, while those with good radio coverage immediately migrate to other RAs.


[Figure 6 plots: "Attached users RA43" (y-axis: Number of users) and "Inbound RA5 / Outbound RA5" (y-axis: RAU msg / minute); x-axis: time [5 min bins]; full week (bins 0-2000) and zoom on bins 700-900.]

Figure 6: Measured data for RA 43/5 (re-scaled).

A second case of off/on cycle. Another case of an anomalous RAU peak caused by an off/on cycle of an RA is found for RA 16/85. Its weekly behavior is plotted in Fig. 7. It shows a peak in the inbound RAU flow at time 150. The detailed analysis suggests that this RA had a short off/on cycle but, differently from the previous case (RA 50/89), no large outbound peak is seen at the beginning of the off period. There are two possible explanations:
- The MSs could not migrate to other neighboring RAs, perhaps because of non-overlapping radio coverage, and therefore remained hanging.
- The MSs migrated to an external (non-monitored) RA.

When the RA comes up again, the MSs return, thus originating a peak in the inbound flow and recovering the previous level of attached MSs.

[Figure 7 plots: "Attached users RA16" (y-axis: Number of users) and "Inbound RA85 / Outbound RA85" (y-axis: RAU msg / minute); x-axis: time [5 min bins]; full week (bins 0-2000) and zoom on bins 80-200.]

Figure 7: Measured data for RA 16/85 (re-scaled).

An off/on cycle of a non-monitored RA. Further inspection of the data revealed the case of RA x/104, whose outbound RAU flow is plotted in Figure 8. This is a foreign RA from another operator and therefore could not be directly monitored: we can measure neither the number of attached MSs nor the inbound RAU flow. However, we can measure the cumulated outbound RAU flow towards the neighboring monitored RAs, shown in Figure 8. Shortly before 3:00 am on Thursday this RA injected a massive RAU flow into our network, more than 10 times the normal daily peak. The hypothesis that the RA went through an off/on cycle is supported by the fact that after the peak the outbound flow dropped to zero for a few minutes (label D in Fig. 8-bottom). Since the event occurred at night, it is likely that the off/on event was a pre-planned O&M intervention. Additionally, we note incidentally that the daily profile changed permanently after this event, with a lower level of RAU flow: this suggests that the O&M intervention was not just a reboot but involved some permanent reconfiguration (e.g. a reduction of the RA coverage, or the installation of an additional RA absorbing some of the MSs in the area) [footnote 4].

3.4 Outlier filtering

The exploratory analysis of the data has revealed several cases of anomalous short-term peaks. In this section we describe the method adopted to identify and eliminate them automatically. We have applied this method to each individual vector $b_{i,j}(k)$ and $a_i(k)$.

Given a discrete-time vector $x(k)$, we consider a moving window of length $W$. We denote by $m_W(k)$ and $\sigma_W(k)$ respectively the median and the standard deviation of the samples in the range $(k - W/2, k + W/2)$. Note that we use the median rather than the mean as the indicator of the mass of the distribution, because it is more stable in the presence of very large outliers: they bias the mean but have almost no impact on the median. The generic sample at $k$ is marked as an outlier if it exceeds the value $m_W(k) + 3\sigma_W(k)$. In this case its value is set down to $m_W(k)$.

The window length $W$ must be chosen carefully. Ideally, the window is short enough that the signal can be assumed stationary within it, and long enough that the number of samples suffices to estimate the first moments of the amplitude distribution. In our study we assumed a window length equal to 1 hour.
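The filtering rule described above can be sketched in a few lines of Python. This is only an illustration, not the MATLAB implementation used in the project: the function name `filter_outliers`, the truncation of the window at the edges of the series and the use of the population standard deviation are all assumptions. The window length is expressed in samples (e.g. 12 samples for 1 hour of 5-min bins).

```python
from statistics import median, pstdev

def filter_outliers(x, window):
    """Set samples above (local median + 3 * local std) to the local median."""
    half = window // 2
    cleaned = list(x)
    for k in range(len(x)):
        # Window centred on k, truncated at the edges of the series (assumption).
        lo = max(0, k - half)
        hi = min(len(x), k + half + 1)
        win = x[lo:hi]  # statistics are always taken on the original samples
        m = median(win)
        s = pstdev(win)
        if x[k] > m + 3 * s:
            cleaned[k] = m
    return cleaned

# Synthetic series: flat level with one short-term spike.
series = [10.0] * 30
series[15] = 500.0
cleaned = filter_outliers(series, window=12)
```

Because the statistics are computed on the original vector rather than on the partially cleaned one, a corrected sample does not influence the decision taken for its neighbours.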

4 As a side comment, this example shows how much information one can infer - also about foreign networks - from the detailed analysis of the mobility signaling.

[Plots: inbound and outbound RAU messages per minute for RA104 versus time (5-min bins); full week and zoom on bins 850-950.]

Figure 8: Outbound RAU flow for RA x/104 - foreign RA from another operator.

4 Numerical Results

In this chapter we provide numerical results and exemplary figures derived for a case study based on real measurements. The input measurements were collected from the operational network of mobilkom austria AG & Co KG with the METAWIN monitoring system. The dataset used here includes one full week of measurements in October 2004 (from Monday 00:00 to Sunday 24:00). We monitored only a subset of the Gb links, specifically those attached to the K SGSNs co-located at a single site. The dataset includes 127 different RAs, representing a fraction F of the total network coverage.

Note: For proprietary reasons we cannot disclose the values of K and F, nor provide absolute quantitative values like traffic volumes, number of MSs, number of Gb links, etc. Therefore in the following graphs the values of the vectors $b_{i,j}(k)$ and $a_i(k)$ have been re-scaled by an arbitrary undisclosed factor.

The number of attached users for each RA (sampling period: 1 min) and the RAU flows (counted in time bins of 5 min) were extracted with ad-hoc extraction code (HRO) developed during the project. The resulting data were cleaned up according to the methods described in Section 3, implemented in MATLAB scripts. The optimization problem formulated in Section 2.3 was implemented in the AMPL language [2] and solved with CPLEX [3], a well-known commercial tool for mathematical optimization. We remark that the whole resolution process involves several stages using different software modules and tools (HRO, MATLAB, AMPL, CPLEX), and that each stage is performed manually.

In our case study we have made a number of simplifying assumptions:

- We ignore the real RA-PCU mapping, and assume a simple 1:1 mapping. In other words, each RA can be freely assigned to any SGSN without considering the super-RA clusters introduced later (see Section 5).
- We do not consider the RAU flow from/to the external-RA (see Section 3.1).

[Plot: RAU flow (rescaled, Y-axis) versus max attached MSs (rescaled, X-axis, in units of 10^5), with one curve for each of M = 2, 3, 4, 5 and vertical lines marking the optimum for each M.]

Figure 9: Optimal curve for the case-study (rescaled values).

Several problem instances were solved with different values of the parameter M (number of SGSNs) and of the slack factor. The latter serves as a tuning knob for the trade-off between the two minimization objectives: the maximum number of attached MSs per SGSN and the iRAU flow. For each combination of these parameters, the optimal RA-SGSN assignment is found and the associated values of the iRAU flow (Y-axis) and of the maximum number of attached MSs (X-axis) are reported in Figure 9. The curve represents the optimal trade-off region between the two objectives.

Clearly, the minimum value of the maximum number of attached MSs (computed with the slack factor set to 0, marked by vertical lines in Figure 9) depends on the number M of available SGSNs. With more SGSNs one can achieve lower values by distributing the MSs into more subsets, but in most cases that comes at the cost of a higher iRAU flow. For large values of the slack factor the iRAU flow minimization dominates the overall optimization process, and the solutions for M = X overlap with those for M = X - h (h = 1, 2, ...), meaning that h out of M SGSNs are left unused in the optimal solution. Interestingly, we note that in some cases the availability of one more SGSN allows a better placement of the RAs, yielding a smaller iRAU flow for the same maximum number of attached MSs (e.g. compare M = 5 vs. M = 4 at 1.6e5, and similarly M = 4 vs. M = 3 at 2.1e5).

The curve in Figure 9 can be used by the network staff to find the most convenient operating point based on external cost factors, and to derive the number M of SGSNs required to implement the optimal solution.

The method proposed above can be directly used to optimize the PCU-SGSN assignment for a real network or part of it. For example, a sub-problem of practical interest is to optimize the PCU-SGSN assignment limited to the SGSNs co-located at a single physical site.


5 Application to real networks

The numerical results presented in Section 4 were obtained assuming complete freedom in the assignment of RAs to SGSNs. Such a simplifying assumption is fine to illustrate the overall optimization process in the first place. However, applying the above approach to the planning of a real network would require taking into account a number of additional constraints. Below we discuss how to adapt the optimization process to address such aspects.

Physical constraints. Geographical proximity and considerations related to the cost of the wired infrastructure interconnecting the PCUs to the SGSNs would probably dictate the assignment of certain PCUs (hence RAs) to some pre-defined SGSNs. Such constraints can be easily implemented by fixing the values of the routing variables $x_{i,m}$ associated to these RAs.

Logical constraints. The RA-to-PCU assignment is not necessarily as simple as 1:1. It is possible that several RAs are attached to the same PCU (i.e. BSC): in this case, all these RAs will be assigned to the same SGSN. It is also possible that one RA is split among multiple PCUs on a cell-by-cell basis (this is the case for very large RAs). However, the 3GPP specifications forbid splitting a single RA between multiple SGSNs. As a consequence, all the PCUs sharing the same RA must be linked to the same SGSN. These aspects introduce mutual constraints between the associations of RAs to SGSNs, which can be used to reduce the size of the optimization problem. In fact, if n different RAs are forced to be attached to the same SGSN, we can group them under a single virtual entity, called super-RA, with a single set of routing variables $x_{i,m}$ instead of n (in this case the index i refers to the super-RA). In practice, it is required to pre-process the set of RAs and associated PCUs in order to define the minimum set of super-RAs; then the optimization problem given in Section 2.3 is applied to the super-RA set.
An example of super-RA associations is given in Figure 10: the RAs r1 and r2 share the same PCU p1, therefore they will be merged into the same super-RA; but since r2 is also attached to PCU p2, all the RAs attached to p2 must be merged into the same super-RA to avoid splitting r2 between multiple SGSNs. In the same example, RAs r5 and r6 do not share any PCU with other RAs, therefore each can be mapped 1:1 to an individual super-RA. In the end, we have reduced the number of entities from 9 RAs to only 4 super-RAs.

The pre-processing can be automated by means of a very simple algorithm. We build an undirected graph as follows: assign a node to each RA and to each PCU, and draw a link between each RA and its attached PCUs. The resulting graph will be a collection of sub-graphs, called connected components, separated from each other. It can be easily shown that each sub-graph corresponds to a super-RA. While the determination of the super-RAs is conceptually very simple, its practical implementation requires the setup of a database collecting all the RA identifiers and associated PCUs. In fact, the number of entities to be handled is typically in the order of several hundreds, and it is impractical to rely on manual computation.

Implementation requirements. In order to achieve a meaningful global solution, it is required to collect the data for the complete network, i.e. to monitor all RAs. In some cases the network staff is interested in solving a partial sub-problem: given a subset of SGSNs, typically co-located at the same physical site, re-optimize the PCU-SGSN assignment for these only. In this case it is not required to monitor the RAs attached to other SGSNs, since they are excluded a priori from the problem. However, we maintain the requirement to monitor all the RAs attached to the set of SGSNs under consideration.

Handling non-monitored RAs. The iRAU flow from/to external SGSNs (i.e. SGSNs that are not considered in the optimization, like foreign SGSNs) contributes to the iRAU load of each monitored SGSN, therefore it should be taken into account in the optimization process. This can be fitted into the structure of the ILP formulation by defining an additional virtual RA, called external RA, that embeds all non-monitored RAs. Recall that we have assumed above that non-monitored RAs do not share an SGSN with the monitored RAs. In other words, for a generic SGSN, either we monitor all RAs potentially attached to it or none of them. From this assumption it follows that the RAU flows from/to the external-RA must always be considered as inter-SGSN RAUs. The inbound iRAU flow from the external-RA towards each other RA is known, but the outbound flow (towards the external-RA) is not. A possible workaround would be to assume, for each RA, that the external outbound flow equals the inbound one.

Figure 10: Scheme of RA-PCU mapping and associated set of super-RA.
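The super-RA pre-processing described in this section (one node per RA and per PCU, one link per attachment, one super-RA per connected component) can be sketched in Python as follows. The function name `find_super_ras` is hypothetical, and the RA-PCU mapping below is an invented example that is merely consistent with the textual description (r1 and r2 share p1, r2 is also attached to p2, r5 and r6 are isolated, 9 RAs yield 4 super-RAs); it is not a transcription of Figure 10.

```python
from collections import defaultdict

def find_super_ras(ra_to_pcus):
    """Group RAs into super-RAs (connected components of the RA-PCU graph)."""
    adjacency = defaultdict(set)
    for ra, pcus in ra_to_pcus.items():
        ra_node = ("RA", ra)
        adjacency[ra_node]  # make sure the RA node exists even without PCUs
        for pcu in pcus:
            pcu_node = ("PCU", pcu)
            adjacency[ra_node].add(pcu_node)
            adjacency[pcu_node].add(ra_node)

    visited = set()
    super_ras = []
    for ra in ra_to_pcus:
        start = ("RA", ra)
        if start in visited:
            continue
        # Depth-first traversal of the component containing this RA.
        visited.add(start)
        stack = [start]
        members = []
        while stack:
            node = stack.pop()
            kind, name = node
            if kind == "RA":
                members.append(name)
            for neighbour in adjacency[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    stack.append(neighbour)
        super_ras.append(sorted(members))
    return sorted(super_ras)

# Hypothetical mapping: RA name -> list of attached PCUs.
mapping = {
    "r1": ["p1"], "r2": ["p1", "p2"], "r3": ["p2"], "r4": ["p2"],
    "r5": ["p3"], "r6": ["p4"],
    "r7": ["p5"], "r8": ["p5", "p6"], "r9": ["p6"],
}
groups = find_super_ras(mapping)
```

Each resulting group would then be treated as a single entity with its own set of routing variables, with the attachment vectors and RAU flows of its member RAs aggregated accordingly.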


6 Possible refinements and extensions

In this section we identify possible directions for progressing this work.

6.1 Support heuristic for large networks

The resolution of the ILP formulation with CPLEX through the standard branch-and-bound method on a standard PC takes a few hours for the case study under analysis: 128 RAs, up to M = 5 SGSNs. It would be impractical to attack larger problem instances by direct resolution of the full problem. For larger networks, it would be required to develop decomposition heuristics in order to drive the optimization process towards a good sub-optimal solution in a reasonable time. With this approach, the problem is decomposed into a set of smaller sub-problems that are solved sequentially. A further phase of iterations can be added in order to improve the quality of the final solution. Such heuristics can be implemented directly in the AMPL language. A similar approach was adopted in some previous works on ILP optimization applied to networks [4] [5].

We remark however that in most practical cases the size of the problem would still be attackable by direct branch-and-bound resolution. In fact, although the number of RAs is certainly much larger than in our case study (probably by one order of magnitude), the physical and logical constraints discussed in Section 5 would probably make the number of free routing variables dramatically lower.

6.2 Multi-period optimization

The term seasonality indicates pseudo-periodicity in a time series. Traffic data typically display strong seasonality at two levels: daily and weekly. Daily cycles reflect the changes in user activity at different hours, with large differences between morning, evening and night. Weekly cycles mainly reflect differences between working days and weekends. In general, different network sections display different daily/weekly patterns. For instance, business areas carry higher traffic during the morning than in the evening, while the opposite is true for residential areas. In principle, it is possible to exploit such differences to save resources: by coupling traffic flows with opposite daily behaviour on the same network resources, the total maximum load can be kept below the sum of the individual maxima.

More formally, consider two traffic flows $x_1(t)$ and $x_2(t)$ with peak values $p_1 = \max_t x_1(t)$ and $p_2 = \max_t x_2(t)$ sharing the same network resource. A planning process based on the individual peak values will reserve a total amount of resource equal to $p_{static} = p_1 + p_2 = \max_t x_1(t) + \max_t x_2(t)$, which is never smaller, and in general larger, than the maximum traffic value $\max_t (x_1(t) + x_2(t))$. By reducing the input signals $x_i(t)$ to a single scalar value $p_i$, the full time profile of the process is lost, hence we shall refer to such an approach as static optimization. A more sophisticated approach would consider the full time profile of the process, and feed the entire vector $x_i(t)$ into the optimization process rather than just its peak. In this way, one can achieve a better estimation of the required resource. This approach, referred to as time-varying optimization - or multi-period optimization - has been explored in some past papers [4] [5]. From a mathematical point of view, the problem formulation given in Section 2.3 could be easily extended towards multi-period optimization.

However, in this work we do not consider this option, and leave the application of multi-period optimization to Core Network planning for future study. Here we adopt the classical (and simpler) approach of static optimization, which is similar in principle to the peak-hour methods that are traditionally used in network planning. The reasons are the following:

- The effective gain in resource saving between the static and the multi-period approach heavily depends on the data at hand. When most of the network areas follow quasi-synchronized daily patterns, the gain is negligible.
- A multi-period optimization is much heavier to solve than a static one, since several variables and constraints must be duplicated for each time period.
- Because of the above point, one has to develop heuristic methods to derive a solution for networks of practical size. The implementation of heuristics in AMPL is time-consuming and would require a considerable amount of manpower.


The higher complexity in the implementation and resolution of the multi-period formulation would be justified only if it yielded a considerable resource saving. As a probing activity, we implemented a mixed formulation where only the seasonality of the $a_i(k)$ signal (attached MSs) was considered. Preliminary results applied to our data showed a rather poor saving with respect to static optimization - less than 10% - that does not justify further efforts in this direction for the time being. In summary, multi-period optimization is left out of this work, but might be resumed as an interesting direction for further research when new data become available (see Section 5).
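The difference between static and multi-period dimensioning discussed in this section can be illustrated with a small numerical example in Python. The two daily profiles below are invented for illustration (a "business" flow peaking in the morning and a "residential" flow peaking in the evening), and the function name `peak_requirements` is hypothetical.

```python
def peak_requirements(flows):
    """Return (p_static, p_multi) for a list of equally long profiles x_i(t).

    p_static: sum of the individual peaks, max_t x_1(t) + max_t x_2(t) + ...
    p_multi : peak of the aggregate,       max_t (x_1(t) + x_2(t) + ...)
    """
    p_static = sum(max(x) for x in flows)
    p_multi = max(sum(samples) for samples in zip(*flows))
    return p_static, p_multi

# Hypothetical daily profiles sampled in six 4-hour bins.
business = [10, 80, 90, 40, 20, 10]
residential = [20, 15, 30, 60, 85, 40]

p_static, p_multi = peak_requirements([business, residential])
saving = 1 - p_multi / p_static  # fraction of resource saved by multi-period
```

With these invented profiles the static approach reserves 175 units while the aggregate never exceeds 120. When the profiles are quasi-synchronized the two values get close, which is the situation in which the saving becomes negligible.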
[Plot "Attached users in RA124": number of attached users versus time (5-min bins); legend: Measured Data, Filter Median UP.]

Figure 11: RA with recurrent short-term peaks in the RAU flow (note: numerical values on the Y-axis are re-scaled by an undisclosed factor).

6.3 Refined filtering process

The filtering process described in Section 3.4 is meant to filter out short-term anomalous peaks. The implementation is relatively simple, but it has some limitations. For instance, short-term spikes that recur over many days should not be regarded as anomalies - an example is given in Figure 11. These peaks are cut by the filtering method proposed above, because it identifies an anomalous peak at $t$ based only on the local behavior of the signal in the window $(t - W/2, t + W/2)$. A more refined approach would also consider the signal characteristics at the same hour of previous days, i.e. in the windows $(t - W/2 - nD, t + W/2 - nD)$ (with $D$ = 24 h and $n$ = 1, 2, ...). Such a refinement has been left out of the scope of this paper. In fact, in our study recurrent spikes like those found in Figure 11 are very rare (they only appear in a couple of RAs), and ultimately their impact on the final solution of the optimization problem is negligible.
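One possible way to realize the refinement outlined above is sketched below in Python. The decision rule is an assumption introduced here for illustration and was not implemented in this study: a sample exceeding the local threshold is kept if every available window at the same hour of the previous `n_days` days also contains a sample above that threshold. The function name and parameters are hypothetical; `day` is the day length $D$ expressed in samples.

```python
from statistics import median, pstdev

def refined_filter(x, window, day, n_days=2):
    """Local median filter that spares spikes recurring on previous days."""
    half = window // 2
    cleaned = list(x)
    for t in range(len(x)):
        win = x[max(0, t - half): t + half + 1]
        m = median(win)
        threshold = m + 3 * pstdev(win)
        if x[t] <= threshold:
            continue  # not a candidate outlier
        # Highest sample in the same window of each available previous day.
        previous_peaks = []
        for n in range(1, n_days + 1):
            lo = t - half - n * day
            hi = t + half + 1 - n * day
            if lo >= 0:
                previous_peaks.append(max(x[lo:hi]))
        recurrent = bool(previous_peaks) and all(
            p > threshold for p in previous_peaks
        )
        if not recurrent:
            cleaned[t] = m
    return cleaned

# Three synthetic days of 48 bins: a spike recurring at bin 20 of each day,
# plus one isolated spike on the third day.
signal = [10.0] * 144
for spike in (20, 68, 116, 130):
    signal[spike] = 500.0
result = refined_filter(signal, window=12, day=48)
```

Samples in the first day have no history, so the sketch falls back to the plain local rule for them; the first occurrence of a recurrent spike is therefore still cut, which a fuller implementation might handle by also looking at the following days.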


A List of Acronyms

BSC: Base Station Controller
iRAU: inter-SGSN Routing Area Update
GGSN: Gateway GPRS Support Node
MS: Mobile Station
PCU: Packet Control Unit
RA: Routing Area
RAU: Routing Area Update
SGSN: Serving GPRS Support Node

References

[1] METAWIN home page. http://www.ftw.at/ftw/research/projects/ProjekteFolder/N2.
[2] AMPL: A Modeling Language for Mathematical Programming. http://www.ampl.com.
[3] ILOG CPLEX. http://www.ilog.com/products/cplex/.
[4] F. Ricciato, S. Salsano, M. Listanti. Off-line Configuration of a MPLS over WDM Network under Time-Varying Offered Traffic. IEEE INFOCOM'02, New York, June 2002.
[5] F. Ricciato, U. Monaco. Routing Demands with Time-Varying Bandwidth Profiles on a MPLS Network. Journal of Computer Networks (Elsevier), 47(1), January 2005.
