
Socially-aware Management of New Overlay Application Traffic with Energy Efficiency in the Internet

European Seventh Framework Project FP7-2012-ICT-317846-STREP

Deliverable D2.2 Report on Definitions of Traffic Management Mechanisms and Initial Evaluation Results

The SmartenIT Consortium

Universität Zürich, UZH, Switzerland
Athens University of Economics and Business - Research Center, AUEB, Greece
Julius-Maximilians Universität Würzburg, UniWue, Germany
Technische Universität Darmstadt, TUD, Germany
Akademia Gorniczo-Hutnicza im. Stanislawa Staszica w Krakowie, AGH, Poland
Intracom SA Telecom Solutions, ICOM, Greece
Alcatel Lucent Bell Labs, ALBLF, France
Instytut Chemii Bioorganicznej PAN, PSNC, Poland
Interoute S.P.A, IRT, Italy
Telekom Deutschland GmbH, TDG, Germany

© Copyright 2013, the Members of the SmartenIT Consortium

For more information on this document or the SmartenIT project, please contact:

Prof. Dr. Burkhard Stiller
Universität Zürich, CSG@IFI
Binzmühlestrasse 14
CH-8050 Zürich
Switzerland

Phone: +41 44 635 4331 Fax: +41 44 635 6809 E-mail: info@smartenit.eu


Document Control

Title: Report on Definitions of Traffic Management Mechanisms and Initial Evaluation Results

Type: Public

Editor(s): Valentin Burger

E-mail: valentin.burger@informatik.uni-wuerzburg.de

Author(s): Thomas Bocek, Valentin Burger, Paolo Cruschelli, George Darzanos, Manos Dramitinos, Zbigniew Dulinski, Jakub Gutkowski, Gerhard Haßlinger, David Hausheer, Tobias Hoßfeld, Fabian Kaup, Sylvaine Kerboeuf, Roman Lapacz, Andri Lareida, Lukasz Lopatowski, Sergios Soursos, Guilherme Sperb Machado, Ioanna Papafili, Patrick Poullie, Sabine Randriamasy, George D. Stamoulis, Rafal Stankiewicz, Michael Seufert, Corinna Schmitt, Matthias Wichtlhuber, Mateusz Wielgosz, Krzysztof Wajda, Piotr Wydrych

Doc ID: D2.2-v1.3

AMENDMENT HISTORY

Version | Date | Author | Description/Comments
V0.1 | November 1, 2012 | Burkhard Stiller | First version
V0.2 | May 7, 2013 | Valentin Burger, Michael Seufert, Tobias Hoßfeld | Draft for TOC
V0.3 | May 31, 2013 | Valentin Burger, Ioanna Papafili, Matthias Wichtlhuber | Addressed comments on TOC, included vINCENT
V0.4 | June 7, 2013 | Valentin Burger | Responsibilities, List of TMS
V0.5 | June 17, 2013 | Valentin, Paolo, Michael, Ioanna, Patrick, Piotr | Included Traffic Management Solutions with bullet points, tables that give an overview of TM solutions and TM mechanisms
V0.6 | July 8, 2013 | Corinna, Roman, Ioanna, George D., George S., Manos | Included more Traffic Management Solutions
V0.7 | July 22, 2013 | Ioanna, Roman | Included MPLS solution, sections for models
V0.8 | August 2, 2013 | Sabine, Sylvaine, Ioanna, Patrick, Andri, Matthias, Michael, Piotr, Valentin | Included Sections 4.13 and 4.14, provided Chapter 3 and solutions in Chapter 4 in text form
V0.9 | September 9, 2013 | Lukasz, Patrick, Ioanna, Paolo, Mateusz, Corinna, Thomas, Fabian, Andri | Completed missing solutions in Chapter 4, Chapter 5 added, game-theoretic model and energy models added
V1.0 | October 7, 2013 | Sergios, Krzysztof, Rafal, Lukasz, Valentin, Patrick, Thomas, Guilherme, Andri, Ioanna, Michael, Fabian, Matthias, David, Piotr, Gerhard, Sabine | Executive summary added, document format .docx, sections in Chapter 4 revised, Section 5 added, mappings to SmartenIT architecture added, two additional traffic management solutions added, revision of Section 3, introduction added, conclusion added, references formatted
V1.1 | October 21, 2013 | Burkhard, Spiros, George, Chris, Valentin | Document reviewed and commented, reviews merged
V1.2 | October 29, 2013 | Valentin, all contributors | Addressed reviewer comments, revised contributions, merged revisions, final formatting
V1.3 | October 31, 2013 | George, Sergios, Chris, Fabian, Valentin | Addressed comments, revision of summary, finalization

Legal Notices

The information in this document is subject to change without notice. The Members of the SmartenIT Consortium make no warranty of any kind with regard to this document, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The Members of the SmartenIT Consortium shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.


Table of Contents

1 Executive Summary
2 Introduction
  2.1 Purpose of this Document
  2.2 Document Outline
3 Definition of Relevant Applications and Related Models
  3.1 Selection of Relevant Applications for SmartenIT
  3.2 Models for Addressed Applications
    3.2.1 Simulation Models
    3.2.2 Theoretical Models
4 SmartenIT Traffic Management Solutions
  4.1 Home Router Sharing based on Trust
    4.1.1 Addressed Scenarios
    4.1.2 Definition of SmartenIT Traffic Management Mechanisms
    4.1.3 Identification of Key Influence Factors
    4.1.4 Key Performance Metrics
    4.1.5 Initial Evaluation Results and Optimization Potential
    4.1.6 Mapping of Mechanism to SmartenIT Architecture
    4.1.7 Example Instantiation of Mechanism
  4.2 Socially-aware TM for Efficient Content Delivery
    4.2.1 Addressed Scenarios
    4.2.2 Definition of SmartenIT Traffic Management Mechanism
    4.2.3 Identification of Key Influence Factors
    4.2.4 Key Performance Metrics
    4.2.5 Initial Evaluation Results and Optimization Potential
    4.2.6 Mapping of Mechanism to SmartenIT Architecture
    4.2.7 Example Instantiation of Mechanism
  4.3 Mechanism for Inter-Cloud Communication
    4.3.1 Addressed Scenarios
    4.3.2 Definition of SmartenIT Traffic Management Mechanisms
    4.3.3 Identification of Key Influence Factors
    4.3.4 Key Performance Metrics
    4.3.5 Initial Evaluation Results and Optimization Potential
    4.3.6 Mapping of Mechanism to SmartenIT Architecture
    4.3.7 Example Instantiation of Mechanism
  4.4 Dynamic Traffic Management
    4.4.1 Addressed Scenarios
    4.4.2 Definition of SmartenIT Traffic Management Mechanisms
    4.4.3 Identification of Key Influence Factors
    4.4.4 Key Performance Metrics
    4.4.5 Initial Evaluation Results and Optimization Potential
    4.4.6 Mapping of Mechanism to SmartenIT Architecture
    4.4.7 Example Instantiation of Mechanism
  4.5 RB-Tracker: User Traffic Management
    4.5.1 Addressed Scenarios
    4.5.2 Definition of SmartenIT Traffic Management Mechanisms
    4.5.3 Identification of Key Influence Factors
    4.5.4 Key Performance Metrics
    4.5.5 Initial Evaluation Results and Optimization Potential
    4.5.6 Mapping of Mechanism to SmartenIT Architecture
    4.5.7 Example Instantiation of Mechanism
  4.6 Selection Mechanism for Storage Providers
    4.6.1 Addressed Scenarios
    4.6.2 Definition of SmartenIT Traffic Management Mechanisms
    4.6.3 Identification of Key Influence Factors
    4.6.4 Key Performance Metrics
    4.6.5 Initial Evaluation Results and Optimization Potential
    4.6.6 Mapping of Mechanism to SmartenIT Architecture
    4.6.7 Example Instantiation of Mechanism
  4.7 Static Resource Allocation in the IaaS Federation
    4.7.1 Addressed Scenarios
    4.7.2 Definition of SmartenIT Traffic Management Mechanisms
    4.7.3 Identification of Key Influence Factors
    4.7.4 Key Performance Metrics
    4.7.5 Initial Evaluation Results and Optimization Potential
    4.7.6 Mapping of Mechanism to SmartenIT Architecture
    4.7.7 Example Instantiation of Mechanism
  4.8 Optimized Upgrade and Planning Processes in Load Balancing Networks
    4.8.1 Addressed Scenarios
    4.8.2 Definition of SmartenIT Traffic Management Mechanisms
    4.8.3 Identification of Key Influence Factors
    4.8.4 Key Performance Metrics
    4.8.5 Initial Evaluation for Stepwise Upgrades in Full Mesh Core Networks
    4.8.6 Mapping of Mechanism to SmartenIT Architecture
    4.8.7 Example Instantiation of Mechanism
  4.9 vINCENT
    4.9.1 Addressed Scenarios
    4.9.2 Definition of SmartenIT Traffic Management Mechanisms
    4.9.3 Identification of Key Influence Factors
    4.9.4 Key Performance Metrics
    4.9.5 Initial Evaluation Results and Optimization Potential
    4.9.6 Mapping of Mechanism to SmartenIT Architecture
    4.9.7 Example Instantiation of Mechanism
  4.10 ALTO-driven Application Quality Aggregation System
    4.10.1 Addressed Scenarios
    4.10.2 Definition of SmartenIT Traffic Management Mechanisms
    4.10.3 Identification of Key Influence Factors
    4.10.4 Key Performance Metrics
    4.10.5 Initial Evaluation Results and Optimization Potential
    4.10.6 Mapping of AQAS to SmartenIT Architecture
    4.10.7 Example Instantiation of AQAS
  4.11 Multi-Criteria Application End-Point Selection
    4.11.1 Addressed Scenarios
    4.11.2 Definition of SmartenIT Traffic Management Mechanisms
    4.11.3 Identification of Key Influence Factors
    4.11.4 Key Performance Metrics
    4.11.5 Initial Evaluation Results and Optimization Potential
    4.11.6 Mapping of Mechanism to SmartenIT Architecture
    4.11.7 Example Instantiation of Mechanism
  4.12 QoE and Energy Aware Mobile Traffic Management
    4.12.1 Addressed Scenarios
    4.12.2 Definition of SmartenIT Traffic Management Mechanisms
    4.12.3 Identification of Key Influence Factors
    4.12.4 Key Performance Metrics
    4.12.5 Initial Evaluation Results and Optimization Potential
    4.12.6 Mapping of Mechanism to SmartenIT Architecture
    4.12.7 Example Instantiation of Mechanism
5 Configuration and Communication Frameworks
  5.1 Inter-ALTO Communication Framework
    5.1.1 Specification of the Framework
    5.1.2 Application to Traffic Management Solutions
    5.1.3 Evaluation Environment and Initial Results
  5.2 OpenFlow-based Network Configuration Framework
    5.2.1 Specification of the Framework
    5.2.2 Application to Traffic Management Solutions
    5.2.3 Evaluation Environment and Initial Results
  5.3 MPLS-based Network Configuration Framework
    5.3.1 Specification of the Framework
    5.3.2 Application to Traffic Management Solutions
    5.3.3 Evaluation Environment and Initial Results
6 Synergies between Mechanisms
  6.1 Adherence to the Scenarios
  6.2 Properties of Mechanisms
  6.3 Observation and Decision Metrics
  6.4 Discussion of Synergies between Mechanisms
7 Summary and Conclusions
  7.1 Key Outcomes and Lessons Learnt
  7.2 Next Steps
8 Smart Objectives
9 References
10 Abbreviations
11 Acknowledgements
12 Appendices
  12.1 OpenFlow Test Configuration Details
  12.2 MPLS Test Configuration Details
  12.3 Mapping of Mechanism to SmartenIT Architecture

List of Figures

Figure 1: Interaction of users on Facebook during a day
Figure 2: Buffered playtime while streaming a YouTube video
Figure 3: Finite state machine for a video streaming source traffic model
Figure 4: Cumulative distribution of measured YouTube video bit-rates and video sizes
Figure 5: Cumulative distribution of measured block sizes
Figure 6: Cumulative distribution of pre-buffered playtime
Figure 7: Video buffer while streaming a video with limited network data rate [58]
Figure 8: Most important use cases for Dropbox
Figure 9: MOS as a function of waiting times for four different task scenarios: initialization, storage, retrieval, and multi-device sync. Figures are taken from
Figure 10: Mapping functions of stalling parameters to MOS. Video duration is fixed at 30 s. No initial delay is introduced. Parameters are given in Table 7
Figure 11: Simple QoE model maps a number N of stalling events of average length L to a MOS value, f(L, N) = 3.50 · e^(−(0.15·L + 0.19)·N) + 1.50 [58]
Figure 12: N = 4000 traffic traces and the estimated 95-th percentile
Figure 13: Difference of the 95-th percentile minus actual traffic
Figure 14: Throughput under a specific transit charge, with and without the scheduling mechanism operating ideally and with perfect information
Figure 15: Basic HORST functionality
Figure 16: Potential of caching on the end-user device for response-time and energy consumption
Figure 17: Video hosted on Facebook video server
Figure 18: Video hosted on YouTube video server
Figure 19: Prefetching accuracy vs. number of watched videos
Figure 20: Prefetching accuracy vs. number of pre-fetched videos
Figure 21: Inter-AS traffic generated due to
Figure 22: Total inter-AS traffic
Figure 23: Total inter-AS traffic during one simulation
Figure 24: Cloud
Figure 25: Instantiation of the ICC mechanism in the case of a cloud
Figure 26: Sample network model for the use-cases description
Figure 27: Cost functions used for accounting cost of inter-domain traffic
Figure 28: A cost map as a function of traffic volume on both inter-domain links and cost optimization potential
Figure 29: Illustration of a traffic compensation mechanism
Figure 30: Comparison of traffic growth on links during the accounting period with (green curve) and without (red) DTM
Figure 31: Traffic pattern in link 2 with and without DTM mechanism
Figure 32: Initial experiments show no traffic peak reduction in a random preference scenario [11]
Figure 33: Deployment example of RB-Tracker
Figure 34: Optimized load balancing often includes NP-complete problems, e.g., Bin-Packing
Figure 35: Options for algorithms and graphical views provided by the TE-Scout tool
Figure 36: Cost and energy optimization by stepwise upgrades in an 8-node full mesh
Figure 37: Timing and Cost Decrease for Stepwise Upgrades
Figure 38: vINCENT – Infrastructure
Figure 39: Virtual Node concept of vINCENT
Figure 40: Measurement of existing P2P streaming
Figure 41: Energy Efficiency of end-devices
Figure 42: Application quality aggregation system for an ALTO guided population of ALTO Endpoint QoE Cost
Figure 43: Example deployment of MUCAPS: Multi-Cost ALTO Client block integrated in an ISP DNS resolver and coupled with (i) an automated Application Metric Mapping function, (ii) an automated metric weight tuning
Figure 44: Prototype and example scenario for MUCAPS-based AEP selection
Figure 45: Video streaming application with 3 candidate AEPs and 2
Figure 46: Architecture of the Network Optimizer from [67]
Figure 47: Inter-ALTO communication framework architecture
Figure 48: Example instantiation of the inter-ALTO framework –
Figure 49: Example instantiation of the inter-ALTO framework – communication schemes
Figure 50: Example instantiation of the inter-ALTO framework – resulting data
Figure 51: Topology used during simulations. N× means that there are N links of an indicated category between given
Figure 52: Average traffic on the link between AS5 and
Figure 53: An OpenFlow communication between OpenFlow-enabled switch and Controller (ONF, OpenFlow Switch Specification 1.0.0, Dec. 31, 2009)
Figure 54: OpenFlow evolution [25]
Figure 55: OpenFlow switch 1.3.0 (ONF, OpenFlow Switch Specification 1.3.0, June 25, 2012)
Figure 56: OpenFlow test domain topology
Figure 57: Multi-domain network for MPLS tests
Figure 58: Topological view of architecture with added scenario overlay (based on component map taken from D3.1)
Figure 59: Detailed MPLS test topology
Figure 60: Mapping of HORST to SmartenIT architecture
Figure 61: Mapping of SECD to SmartenIT architecture
Figure 62: Mapping of ICC to SmartenIT architecture
Figure 63: Mapping of DTM to SmartenIT architecture
Figure 64: Mapping of RB-Tracker to SmartenIT architecture
Figure 65: Mapping of SMSP to SmartenIT architecture
Figure 66: Mapping of MRA to SmartenIT architecture
Figure 67: Mapping of OptiPlan to SmartenIT architecture
Figure 68: Mapping of vINCENT to SmartenIT architecture
Figure 69: Mapping of AQAS to SmartenIT architecture
Figure 70: Mapping of MUCAPS to SmartenIT architecture
Figure 71: Mapping of QoEnA to the SmartenIT architecture


List of Tables

Table 1: Results of the service relevance survey. Sum over all criteria. Green: very relevant, red: not relevant at all
Table 2: Extended distribution of video categories in YouTube to include 19 categories. Assignment of popularity values to the 2 new categories (i.e. 18 and 19) and normalization of the 10 most popular categories so that popularities of all 19 categories sum up to 100%
Table 3: Bandwidth statistics
Table 4: State transition matrix for the state transition function δ: S x Σ → S
Table 5: Characteristics of Dropbox accounts from the 49 volunteers [1]
Table 6: Characteristics of User Profiles (B=Beginners [22% of users], S=Synchronization Users [30%], P=Power Users [48%]) [1]
Table 7: Parameters of mapping functions (see Figure 10) of stalling parameters to MOS together with coefficient of determination R² as goodness-of-fit measure [59]
Table 8: Users and their Cloud services' preference ranking
Table 9: The steps to combine the Cloud services' preference ranking of each user
Table 10: Probabilistic Values for the Preference
Table 11: Local application scenario before shaping
Table 12: Local application scenario after shaping
Table 13: Savings in stepwise link upgrade cycles
Table 14: Adherence of mechanisms to the scenarios
Table 15: Overview of proposed TMS w.r.t. scenarios. Absolute values. 3 (dark green): TMS mainly addresses scenario, 0 (white): TMS does not address the scenario
Table 16: Summary of mechanisms' properties
Table 17: Overview of mechanisms' decision-taking process and envisioned innovation
Table 18: Overall SmartenIT SMART objective addressed (Source: [110])
Table 19: Theoretical SmartenIT SMART objectives addressed (Source: [110])


1 Executive Summary

This document is the deliverable D2.2 “Report on Definitions of Traffic Management Mechanisms and Initial Evaluation Results” of Work Package 2 “Theory and Modelling” within the ICT SmartenIT Project 317846. The main objectives of Deliverable D2.2 are as follows:

Objective 1: Propose and justify traffic management mechanisms which overcome current limitations of existing services in communication networks. In particular, existing solutions do not take the different viewpoints of the involved stakeholders into account. In SmartenIT the viewpoints of the stakeholders, specifically Internet Service Providers (ISPs), Cloud Providers, and End-Users, are addressed in order to avoid unnecessary costs, e.g., by saving energy or expensive inter-domain traffic, while providing good service quality in terms of Quality-of-Experience to end users. More details on the addressed stakeholders and scenarios can be found in D1.1 and D1.2, respectively. Furthermore, current traffic management solutions do not take into account social awareness, which can be utilized to predict demand for content and analyze user interaction.

In order to collect propositions for a subsequent SmartenIT traffic management solution, a set of potential mechanisms with innovative features has to be defined, together with initial evaluation results and an assessment of their positioning against the SmartenIT scenarios and architecture, thus providing evidence as to whether each traffic management mechanism should be further pursued and assessed.

Objective 2: Provide models for the evaluation of the proposed mechanisms. For subsequent performance evaluation of the proposed mechanisms, theoretical models and simulation models are necessary. The models need to take into account the different stakeholders involved in the service delivery chain and their goals. In particular, models need to be developed for mechanisms that specifically address the applications selected by SmartenIT.

Addressing objective 1, a broad set of traffic management solutions was proposed, on the basis of both the scope of the project and the overview of existing overlay traffic management solutions given in D2.1 [109]. For each proposed solution, the scenarios defined in D1.2 [108] that it addresses were identified, namely inter-cloud communication, global service mobility, social awareness, and energy efficiency.

Since considerable overlaps among the four initially defined scenarios were identified, it was decided to merge these scenarios as the project progressed. This established a) the operator focused scenario, which covers the perspective of ISPs and cloud operators and solutions with decision metrics at data centers or in the backbone network, and b) the end user focused scenario, which covers traffic management solutions that address the perspective of the end user and deploy decision metrics at end devices or access networks, as documented in D1.2, which evolved in parallel with the present deliverable.

Therefore, the traffic management solutions introduced and studied in this deliverable were also mapped to the operator focused and the end user focused scenarios. The mapping shows that the traffic management solutions mainly addressing inter-cloud communication constitute the operator focused scenario, whereas solutions addressing mainly global service mobility and social awareness are covered by the end user focused scenario.

For subsequent performance evaluation of traffic management solutions, the factors that have a key influence on the associated mechanism were identified, as well as key performance metrics. An initial evaluation for each proposed traffic management mechanism was provided, as far as possible at this stage of the project. The evaluations were based on initial estimations and argumentation, and/or were taken from the literature, and address whether the relevant solution is worth detailed examination.

Addressing objective 2, relevant applications for a SmartenIT solution were identified in the present deliverable. For this purpose, a service relevance survey and an application relevance survey of services and applications considered for a SmartenIT traffic management solution were conducted. The results of the surveys show that video-on-demand and file storage are the most relevant services for SmartenIT, and that YouTube and Dropbox are the most relevant applications for the respective services. For evaluation purposes, we use existing theoretical and simulation models from the literature, but also develop new models to cope with proposed modifications in the protocols and algorithms. Since applications like Dropbox have emerged only recently, appropriate models for such applications barely exist and may have to be developed.

Furthermore, models for the Quality-of-Experience perceived by end-users for each specific application were defined, in order to assess the performance of the mechanisms from the end-user perspective. As a further step towards a complete evaluation framework, it was shown for each simulation and theoretical model how it can be deployed in a broader environment to evaluate traffic management solutions. This was done by identifying models that should complement them in a complete evaluation framework for the proposed traffic management solutions and addressed scenarios, which will be developed in task T2.4.

The synergies of the proposed mechanisms were studied extensively to find possible overlaps and complementarities. The mechanisms were grouped into 5 different categories reflecting the main characteristics of the mechanisms, namely “Content Placement”, “Delivery Scheduling”, “Ranking”, “Communication Protocols” and “Configuration Frameworks”. The categories show traffic management mechanisms that share the same characteristics. The main outcome of the deliverable is that certain proposed solutions share the same goal. The identified synergies allow grouping traffic management solutions that can be combined to fit a particular use-case. Thus, a basis for deliverable D2.3 is provided where the use-cases and their parameters will be defined.

Finally, this deliverable also provides a basis for the decision on which traffic management solutions will be further evaluated by analysis and simulation in WP2. The analysis of the mechanisms as well as that of their synergies will also serve as the basis for limiting the set of mechanisms whose implementations will be integrated in the system architecture developed in WP3.


2 Introduction

SmartenIT targets an incentive-compatible cross-layer network management scheme for network and cloud operators, cloud service providers, and end-users, as denoted in [110]. Specifically, SmartenIT aims to address load and traffic patterns as well as special application requirements accordingly, and to employ Quality-of-Experience (QoE)-awareness. Additionally, one of the key targets of SmartenIT is the exploitation of social awareness (in terms of user social relationships and interests) as an extra channel of information to characterize the end-users of cloud services, and thus, to predict demand. As a result, efficient content placement and pre-fetching can be supported, as well as migration of workload and Virtual Machines (VMs), etc.

Moreover, one of the key objectives of SmartenIT is energy efficiency on both the provider and the end-user side. Therefore, SmartenIT aims to design Traffic Management (TM) mechanisms that will achieve energy efficiency, i.e., keep energy consumption low in data centers, networks, and end-users' mobile devices. Thus, energy efficiency with respect to both end-user devices and the underlying networking and application provisioning infrastructure is tackled to ensure operationally efficient management. Nonetheless, incentive-compatibility of network management mechanisms for improving metrics in all layers and among all players will serve as the major mechanism to deal with real-life scenarios. Furthermore, the major overlay applications whose traffic is to be tackled by SmartenIT, as selected by WP1, include video streaming and online storage applications; major representatives of these are YouTube and Dropbox, respectively.

Regarding the design of appropriate TM mechanisms, which inevitably deal with TM of inter-domain flows at the Internet scale, a major challenge has been how to assure that the data/content transfers will indeed attain the desired Quality-of-Service (QoS) properties and, thus, the relevant QoE goals for the end user. To this end, three alternatives appear:

a. Pure IP and the current EGP (e.g., BGP) / IGP (e.g., OSPF) set-up.

b. IP network management/control plane enhancements allowing inter-domain QoS-related prioritization mechanisms or novel TE products, so as to have some control over the per-AS statistical performance of the selected routes.

c. Sub-IP layer routing and/or internal and external routing protocols configuration.

Regarding the first option, this is the most general approach, ensuring the applicability of the proposed mechanisms at Internet scale; it is thus a worst-case assumption but also the most generic approach. The network topology and routing are taken as given, and routing decisions cannot be affected at inter-domain scale. In particular, the respective mechanisms rely on overlay decisions regarding the placement of caches, load balancing techniques relying on DNS or overlay information, decisions on the scheduling of data transfers, and their respective sending rates along with shaping.

Regarding the second option, on the IP layer the softest approach is the Differentiated Services (DiffServ) mechanism, if applied across different interconnected domains. However, the involved network operators have to agree on similar DiffServ classes, accept marking at the Points of Interest (PoIs), and respect commonly defined QoS policies and classification schemes. Technically, this is supported by the use of DiffServ brokers. However, network operators do not seem to have enough incentives to deploy such brokers, and therefore we can conclude that DiffServ is currently not supported in the inter-domain context.


The third and final option is complementary to the first one. Mechanisms defined to work on top of the Internet could be further enhanced with TE, allowing for better engineered routes for content delivery: i) in the sub-IP layer, i.e., properly configured and provisioned MPLS tunnels, or ii) by altering IGP/EGP configurations using Intelligent Route Control techniques [40]. Such solutions are applicable only to multi-homed networks and result in significant gains only for large networks with multiple neighboring networks.

The SmartenIT mechanisms proposed in this deliverable have mostly been designed to function over pure IP. This means that no additional functionality is required for the proposed TM solutions to work and improve any considered service. This decision is due to the fact that the large-scale IP network is the dominant paradigm today, i.e., networks across multiple administrative domains simply exchange BGP information and data, and do not implement inter-domain QoS.

Finally, another interesting issue that affects the design of the SmartenIT mechanisms presented in this deliverable is the awareness regarding the type of traffic for which the mechanisms are applicable. A major constraint here is that inter-domain traffic over peering and transit links is by definition a service-agnostic aggregation of both elastic and inelastic traffic. Although DPI is deployed in certain cases, its existence cannot be taken for granted. In general, it is inherently too costly for networks to "examine" the composition of those traffic aggregates and try to treat the various constituent traffic streams differently. Thus, this is a major constraint that also complicates the mechanism design as well as the actual potential of intervention of the SmartenIT mechanisms on the inter-domain network layer. Concluding, the deployment of smart overlays or cross-layer mechanisms that combine information from both the network and the cloud layer at the network edges is promising in this context.

2.1 Purpose of this Document

The main goals of this deliverable are as follows:

Proposal and specification of incentive-based TM mechanisms and their intelligence for the efficient handling of traffic generated by overlay applications in an energy efficient manner.

Definition of related scenarios and use-cases for each proposed TM mechanism and identification of key influence factors and evaluation metrics.

Development of theoretical and simulation models for the evaluation of the specified TM mechanisms, as well as models employing game-theoretic aspects to investigate behaviors of different stakeholders when adopting such TM mechanisms.

Description of preliminary results of the evaluation of the various TM mechanisms (where applicable).

Deliverable D2.2 is the second deliverable of WP2 and sets the basis for the development of the intelligence, mechanisms, and models that will constitute the heart of the SmartenIT solutions. The work presented in Deliverable D2.2 will be further evolved in the next phases of the project, and it will be finalized and concluded in Deliverable D2.4 "Report on Definitions of Traffic Management Mechanisms and Initial Evaluation Results (Final Version)", which will be delivered at the end of Year 2 of the project.


2.2 Document Outline

This document is organized as follows:

Chapter 3 initially summarizes work performed in WP1 on the selection of overlay applications whose traffic is to be tackled by SmartenIT. Then, appropriate models developed within SmartenIT for the assessment of QoE/QoS as well as other metrics related to the selected application categories are provided.

Chapter 4 provides the main contribution of this document, which is the specification of the various TM mechanisms proposed by SmartenIT, the main scenarios that they address, the key influence factors, i.e., parameters that have significant impact on their performance, the key performance metrics that should be monitored and are aimed to be improved by each mechanism, and finally, some preliminary evaluation results, where already available.

Chapter 5 provides communication and configuration frameworks that might be useful for a SmartenIT solution. For each framework, its specification, its potential applicability to TMS and initial theoretical or functional evaluation results are provided.

Chapter 6 addresses the potential synergies among the specified mechanisms and aims to qualitatively assess the impact of their operation when combined, so as to address more complex use cases, or use cases that are not sufficiently addressed by each of the mechanisms alone.

Chapter 7 summarizes the deliverable and draws the major conclusions on the specified TM mechanisms and next steps of the investigations of SmartenIT WP2.

Chapter 8 reports which SMART objectives, as described in SmartenIT's Description of Work (DoW) [110], have been addressed by the work performed in WP2 and reported in D2.2.


3 Definition of Relevant Applications and Related Models

In this chapter we define relevant applications considered for a SmartenIT solution. Relevant applications were identified in a two-stage survey among partners. For more details on the application selection survey, please refer to deliverable D1.2 [108].

For the selected applications we define simulation models and theoretical models for the evaluation of the traffic management solutions proposed in Chapter 4. To be able to evaluate the solutions with respect to the scope of the project, we further define models for energy efficiency, Quality-of-Experience, and game theory.

3.1 Selection of Relevant Applications for SmartenIT

The number of cloud service applications provided in the Internet is constantly increasing. To be able to focus on a manageable subset of applications to work on in SmartenIT, a cloud application survey has been conducted among the project partners. The purpose of the survey was to identify the applications most relevant for SmartenIT, based on carefully selected criteria.

The cloud application survey was performed among the partners in two steps. The first step was a service relevance survey; its goal was to identify the most relevant services for SmartenIT. In the second step, the most relevant applications were selected from the most relevant service categories.

For each service, the relevance with respect to 13 criteria was rated with a value from 1 to 5, ranging from "not relevant at all" to "very relevant for SmartenIT". Table 1 shows the services with the highest sum over all criteria. For end-users, the most relevant services in the survey are "File Storage and File Sharing" and "Video on Demand". Among the service-enabling technologies, Data Centers are rated most relevant for SmartenIT.

Based on the most relevant services, "File Storage" and "Video on Demand", a subsequent survey on applications relevant for SmartenIT was conducted. The criteria were limited to 8, in order to cover only those criteria which differ for the considered applications.

The "Video / Music on Demand" application most relevant for SmartenIT is YouTube, since it has the highest mean scores, is very popular, and produces a high traffic volume. The problem with applications like YouTube is the limited intervention potential: the clients are based on HTML5/Flash players and are proprietary.

The "File Storage" application most relevant for SmartenIT is Dropbox, since it has the highest mean scores, is very popular, and produces a high traffic volume. The intervention potential of Dropbox is not expected to be high, but still has to be investigated. Zettabox is an application similar to Dropbox which was developed by a project partner and therefore has a high intervention potential. OwnCloud is an open-source Dropbox clone and also gives the opportunity to modify a Dropbox-like file storage application.


Table 1: Results of the service relevance survey. Sum over all criteria. Green: very relevant, red: not relevant at all.


It was agreed within the project to have YouTube and Dropbox as the primary applications that represent the services "Video on Demand" and "File Storage", respectively. If a modification of the client/server functionality is needed for a solution, the corresponding open-source implementations Zettabox, PiCsMu, and OwnCloud for "File Storage" and VLC for "Video on Demand" will be used.

3.2 Models for Addressed Applications

In order to evaluate the performance of existing applications and newly proposed traffic management solutions, models of the addressed applications and their key metrics are needed. Thereby, the foundations are laid for analytical and simulative evaluations. However, all models have to cope with a trade-off between accuracy and complexity. The more accurately a model represents the application behavior, the more complex it is to get results, and vice versa.

By modifying the application models, insights into possible optimization approaches are provided. Moreover, these modified models can be used to predict the expected gain with respect to a certain metric. In this section initial models for the simulation of the addressed applications are presented, and theoretical models which are relevant for optimization are described.

3.2.1 Simulation Models

In this subsection, we present models developed to serve the need of simulating and evaluating some of the traffic management mechanisms proposed in Chapter 4.

3.2.1.1 Model for Video Dissemination among the Users of an OSN

The model for video dissemination among the users of an OSN was developed in order to evaluate the Socially-aware mechanism for Efficient Content Delivery (SECD) (presented in Section 4.2), which employs a P2P overlay and a Social Proxy Server (SPS) to assist video delivery among the users of an OSN, and to compare it with an existing approach in the literature, i.e., SocialTube [76]. Therefore, we designed and implemented a complete evaluation framework to simulate an OSN whose users are consumers of a video service offered either by the servers of the OSN or by the server of a third-party-owned CDN/video platform; the designed evaluation framework can also be used to evaluate other similar approaches. To do so, we did not model the evolution of the P2P overlay accurately; rather, we focused on the definition of demand and supply models, i.e., video viewing and video uploading/sharing respectively, video distributions per interest category, users' distribution per AS, etc. Below, we provide the constituent elements of the evaluation model.

3.2.1.1.1 Time considerations

In our evaluation, time is slotted into 20-minute slots, in order to be consistent with the fact that a user is active on Facebook for about 20 minutes per day on average [31]. Thus, a user will be active only in one of these 20-minute slots, and each of his activities will occur within this interval. Regarding the users' activity, we assume that only 50% of the OSN users are active daily. We choose randomly the users that will be active on each given day, but users with more friends have a higher probability to be active. Specifically, a weight is assigned to each user denoting the user's probability to be active within a day; this weight is calculated according to the formula:

weight_user = 1 + friends_user / 1000,

where friends_user denotes the number of friends of the user. If a user is active on Facebook on a given day, he is considered to be active only for 20 minutes in a selected 20-minute slot. For the selection of the slot, we perform a weighted random choice based on the information extracted from Figure 1.


Figure 1: Interaction of users on Facebook during a day [91].

Furthermore, we assume that each user is active on the Internet for 140 minutes per day [32] (i.e., seven 20-minute slots). We assume that these seven timeslots are contiguous, and thus, when a user logs in to Facebook, he does so in the middle of any of these timeslots. Regardless of a user's activity on Facebook within a specific day, that user can seed content which he has stored while active on Facebook in previous timeslots.
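For illustration, the daily activity sampling described above could be sketched as follows. This is only a minimal example (the evaluation framework itself is implemented in MATLAB, cf. Section 3.2.1.1.5); the function names are illustrative, the weight formula is the one given above, and the flat slot profile is a placeholder for the diurnal distribution of Figure 1.

import random

def pick_active_users(users, friends_of, active_share=0.5):
    """Select roughly active_share of all users for one day; users with more
    friends get a higher weight (w_u = 1 + |friends(u)| / 1000, as above)."""
    weights = [1.0 + len(friends_of[u]) / 1000.0 for u in users]
    target = int(active_share * len(users))
    active = set()
    while len(active) < target:
        active.add(random.choices(users, weights=weights, k=1)[0])
    return active

def pick_activity_slot(slot_weights):
    """Pick one of the 72 daily 20-minute slots via weighted random choice;
    slot_weights would be derived from Figure 1 (placeholder values here)."""
    slots = list(range(len(slot_weights)))
    return random.choices(slots, weights=slot_weights, k=1)[0]

# Example usage with a synthetic friendship graph and a flat diurnal profile.
users = list(range(1000))
friends_of = {u: random.sample(users, random.randint(5, 300)) for u in users}
active_today = pick_active_users(users, friends_of)
slot_of = {u: pick_activity_slot([1.0] * 72) for u in active_today}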

3.2.1.1.2 Users’ characteristics

We have assigned 4 video interest categories to each user, while each user is considered to share and watch videos only out of these 4 categories. To decide which 4 categories a user is interested in, we used a weighted random choice and chose 4 categories out of 19 total interest categories. Based on the popularity of the video categories of YouTube as reported in [24], we extended the list of categories by two more and assigned popularities to the categories following the power-law distribution [21], as shown in Table 2.


Table 2: Extended distribution of video categories in YouTube to include 19 categories. Assignment of popularity values to the 2 new categories (i.e. 18 and 19) and normalization of the 10 most popular categories so that popularities of all 19 categories sum up to 100%.

Category | Percentage of videos
Entertainment | 25.3% (-0.1)
Music | 24.7% (-0.1)
Comedy | 8.6% (-0.1)
People & Blogs | 8.6% (-0.1)
Films & Animation | 8.5% (-0.1)
Sports | 7.5% (-0.1)
News & Politics | 3.5% (-0.1)
Autos & Vehicles | 3.3% (-0.1)
How-to & Style | 2.3% (-0.1)
Pets & Animals | 1.6% (-0.1)
Travel & Events | 1.6%
Education | 1.1%
Science & Technology | 1.0%
Unavailable | 0.8%
Nonprofits & Activism | 0.3%
Gaming | 0.2%
Removed | 0.2%
Added Category 18 | 0.5%
Added Category 19 | 0.5%

Additionally, we assume that each user is located in one specific AS by assigning him an AS id. We have assigned to each AS a rank that denotes the popularity of the AS, assuming that ASes with higher popularity have more users. Then, in order to distribute the OSN users among the ASes, we used the Zipf distribution.
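The two assignments just described (four interest categories per user, weighted by the category popularity, and the Zipf-based placement of users onto ASes) could, for example, be implemented as in the following sketch. The category popularities would be taken from Table 2; the Zipf exponent is an assumed parameter, since the text does not specify it, and all identifiers are illustrative.

import random

def assign_interest_categories(category_popularity, k=4):
    """Weighted random choice of k distinct categories out of the 19 of Table 2.
    category_popularity: dict mapping category name -> popularity in percent."""
    names = list(category_popularity)
    weights = [category_popularity[n] for n in names]
    chosen = []
    while len(chosen) < k:
        c = random.choices(names, weights=weights, k=1)[0]
        if c not in chosen:
            chosen.append(c)
    return chosen

def zipf_as_assignment(num_users, num_ases=4, exponent=1.0):
    """Assign each user an AS id in 1..num_ases; the share of the AS with rank r
    is proportional to 1/r^exponent (exponent = 1.0 is an assumption)."""
    as_ids = list(range(1, num_ases + 1))
    weights = [1.0 / (r ** exponent) for r in as_ids]
    return [random.choices(as_ids, weights=weights, k=1)[0] for _ in range(num_users)]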

3.2.1.1.3 Categorization of viewers

We categorize the viewers of an uploader into three categories, as follows:

Followers: are considered to be the 1-hop or 2-hop friends who watch over 80% of the videos uploaded by the uploader.

Non-followers: are assumed to be the 1-hop or 2-hop friends who watch less than 80% but more than 30% of the videos uploaded by the uploader.

Other viewers: are assumed to be the 1-hop or 2-hop friends who watch less than 30% but more than 20% of the videos uploaded by the uploader, since every viewer of an uploader is assumed to watch at least 20% of the videos uploaded by the latter.

Based on the aforementioned categorization of users and the observations in [31] on the number of users watching specific percentages of uploaders' videos, we assume the following for the viewer categories within 1 and 2 social hops: 90% of viewers are at most two social hops away, while the remaining 10% are at three or more hops; moreover, having at least one common interest with the users that they follow is a prerequisite for viewers. Thus, according to these percentages, the assignment of users at 1 and 2 hops (i.e., 90% of viewers) is a random choice from users with at least one common interest, as follows:

Followers

33% of viewers are characterized as Followers at 1-hop.

2% of viewers are characterized as Followers at 2-hops.

Non-followers

37% of viewers are characterized as Non-followers at 1-hop.

12% of viewers are characterized as Non-followers at 2-hops.

Other viewers

2% of viewers are characterized as other viewers at 1-hop.

6% of viewers are characterized as other viewers at 2-hops.
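As a simple illustration, the shares listed above can be used as weights for classifying the viewers of an uploader. The sketch below lumps the remaining share (viewers at three or more social hops, roughly the 10% mentioned above) into a single class; the class names and the treatment of the remainder are assumptions.

import random

# (role, social hops, share of viewers) as listed above; the remainder is
# treated as viewers at three or more hops.
VIEWER_CLASSES = [
    ("follower", 1, 0.33), ("follower", 2, 0.02),
    ("non-follower", 1, 0.37), ("non-follower", 2, 0.12),
    ("other", 1, 0.02), ("other", 2, 0.06),
    ("other", 3, 1.0 - 0.92),
]

def classify_viewers(viewer_ids):
    """Draw a (role, hops) class for every viewer of a given uploader."""
    weights = [share for _, _, share in VIEWER_CLASSES]
    return {v: random.choices(VIEWER_CLASSES, weights=weights, k=1)[0][:2]
            for v in viewer_ids}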

3.2.1.1.4 Video viewing and related parameters

We consider a pool of videos in order to simulate a video platform like YouTube. Since each user is active on Facebook for 20 minutes on average per day, and taking into account that the average length of a video is 4 minutes [24], we assume that a user may watch 1 to 5 videos in this 20-minute interval, where the number of videos watched follows the uniform distribution. Each user can have access to the videos published by his 1-hop friends because of the privacy settings of Facebook; however, we assume that he watches only videos related to his interests. As expected, videos of top interest for users, as well as videos with the highest popularity, are more likely to be watched.

Moreover, we assume that the number of videos uploaded daily in our system is equal to 1/20 of the total number of users in our system. Each day we decide which users will be uploaders, i.e., which users upload and share videos. For this choice we use a Bernoulli distribution, where the users are chosen uniformly at random. Additionally, each user can upload none, one, or more videos, but only within the 20-minute slot in which he is active on Facebook. Finally, the probability that a user re-shares a video that he has already watched from a friend, i.e., a video stored on Facebook's servers, is 11.8%, while the probability that he uploads a video watched on the video server of a third party is 88.2% [76].

Finally, each user is considered to be able to push only one video prefix through his messaging overlays on any given day. We make this assumption because a user rarely uploads even one video per day, so there is no point in trying to push more video prefixes. In the case where a user uploads more than one video, say two, he is considered to push only one video prefix within that day, while he pushes the video prefix of the remaining un-pushed video on the next day.
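Putting the parameters of this subsection together (1 to 5 watched videos per active user, uploader selection with probability 1/20 per user and day, and the 11.8%/88.2% split between re-shares and third-party videos), one day of demand and supply could be generated roughly as follows. All identifiers are illustrative and error handling is omitted.

import random

def one_day_demand_and_supply(active_users, all_users, reshare_candidates, video_pool):
    """Generate viewing counts, uploaders and uploads for one simulated day.
    reshare_candidates: dict user -> videos already watched from friends."""
    # Each active user watches a uniformly distributed number of 1..5 videos.
    views = {u: random.randint(1, 5) for u in active_users}

    # Bernoulli choice of uploaders with p = 1/20, i.e. about |all_users|/20 uploads per day.
    uploaders = [u for u in all_users if random.random() < 1 / 20]

    uploads = {}
    for u in uploaders:
        if reshare_candidates.get(u) and random.random() < 0.118:
            uploads[u] = ("reshare", random.choice(reshare_candidates[u]))
        else:
            uploads[u] = ("third-party", random.choice(video_pool))
    return views, uploaders, uploads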

3.2.1.1.5 Implementation details

We assume that each user is available to serve the local P2P overlay, i.e., a P2P overlay network which is built per video and per AS to support the dissemination of that specific video, as a leecher during the 4-minute slot in which he is active on Facebook, while each user is available to serve the local P2P overlay as a seeder during a 20-minute slot in which he is online more generally in the Internet. Then, the estimation of the intra-AS traffic generated by a user watching a video is based on the percentage of seeders and leechers that are active during that 4-minute slot and, additionally, are located within the same AS. On the other hand, the estimation of the inter-AS traffic generated by a user watching a video is based on the percentage of seeders and leechers that are active during this 4-minute slot but are located in other ASes, plus the contribution of the external server where the video is hosted.

Furthermore, we use the upload bandwidth available to each user from the other users in the swarm as well as from the SPS (or the external server in the case of SocialTube) as a proxy for users' QoE. Our main objective is to keep this available bandwidth for each user higher than the (average) bit rate of the video being watched in order to assure high (or at least adequate) QoE. In order to estimate both traffic and QoE, we assign a UL and a DL bandwidth to every user in our system; the assignment is based on the statistics presented in Table 3.

Table 3: Bandwidth statistics [23].


Next, we describe the framework setup for our evaluation. First, we created 3963 nodes and defined their social relationships based on the SNAP dataset [80]. Second, we distributed the users (nodes) over 4 different ASes of varying sizes using the Zipf distribution; specifically, we assume that the AS with id 1 has rank 1 and thus the highest number of users assigned to it, while the AS with id 4 has rank 4 and thus the fewest users of all 4 ASes.

Moreover, we created a pool of 9000 videos and we assigned to each video an interest category and a popularity value following the methodology described in Section 4.2.4. Additionally, each video has been considered to have a random size from 20 to 30 MB (uniform selection), and the bit rate of each video has been set equal to 330 Kbps.

Furthermore, we set the cache size of each user equal to 300 MB, which can be considered a rather low value taking into account the TBs of storage available (at low cost) on users' premises, and the cache size of each one of the four SPSs, one SPS per AS, to be proportional to the number of users assigned to the respective AS. For each user connected to the SPS, the SPS increases the size of its cache by one prefix and one video, i.e., 33 MB in total for a prefix and a video. The simulation lasted 30 cycles corresponding to 30 days. Finally, we implemented our evaluation framework in MATLAB.
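For reference, the setup parameters of this subsection can be summarized in a single configuration structure. The sketch below is in Python purely for illustration (the actual framework was implemented in MATLAB), and the type and field names are not taken from the implementation.

from dataclasses import dataclass

@dataclass
class SecdEvaluationSetup:
    """Framework parameters of Section 3.2.1.1.5 (illustrative field names)."""
    num_users: int = 3963            # nodes with social ties from the SNAP dataset [80]
    num_ases: int = 4                # users distributed over ASes via a Zipf law
    num_videos: int = 9000           # size of the video pool
    video_size_mb_range: tuple = (20, 30)   # uniform video size in MB
    video_bitrate_kbps: int = 330    # bit-rate of every video
    user_cache_mb: int = 300         # cache per user
    sps_cache_mb_per_user: int = 33  # one prefix + one video per connected user
    num_cycles: int = 30             # simulated days

setup = SecdEvaluationSetup()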

3.2.1.2 Simulation Model for HTTP Video Streaming

In February 2012, YouTube introduced the Range Algorithm to control the data flow for HTTP video streaming. In contrast to the previously used Throttling Algorithm, the Range Algorithm only requests a block of data when the pre-buffered playtime drops below a certain threshold. Therefore, network bandwidth is consumed only when needed. Especially when users do not watch a video until the end, or if they jump through the video, this saves bandwidth compared to the Throttling Algorithm. The requests of the Throttling Algorithm only depend on the bit-rate of the video and do not consider the pre-buffered playback time.

In this subsection we present our results from experiments with the YouTube video playback buffer to investigate the Range Algorithm. The goal of the experiments is to reverse-engineer the Range Algorithm in order to develop a model for HTTP video streaming. For the study, measurements of 100 randomly chosen popular videos were conducted. Each video was replayed at least 12 times in different resolutions, so that more than 2400 samples were obtained. Samples with measurement errors were omitted from the analysis. The YouTube monitoring tool YoMo was used to monitor and record the buffered playtime while the videos were played back in the measurements.


Figure 2: Buffered playtime while streaming a YouTube video.

Figure 2 shows the pre-buffered playtime of a YouTube video as a function of the playtime. The buffered playtime increases sharply at the beginning of the playback in order to pre-buffer playtime; hence, blocks are requested immediately after the previous block has completed. After a threshold of 50 seconds of pre-buffered playtime is exceeded, blocks are only requested when the buffered playtime drops below that threshold. Once the last block is reached, the rest of the video is downloaded and can be played back.

This behavior can be modeled by a finite state machine. Figure 3 shows a simple finite state machine which models the YouTube player requests and hence the YouTube source traffic. A finite state machine is defined by a quintuple (Σ, S, s_0, δ, F).

Table 4: State transition matrix for the state transition function δ:S x Σ → S

Input \ Current State | pb == 0 | pb += s/γ | pb -= c
a, pb > β | pb += s/γ | pb -= c | pb -= c
a, pb < β | pb += s/γ | pb += s/γ | pb += s/γ
!a, pb > β | pb == 0 | pb == 0 | pb -= c
!a, pb < β | pb == 0 | pb == 0 | pb += s/γ



Figure 3: Finite state machine for a video streaming source traffic model.

The input alphabet is Σ = {a, !a} x {pb>β, pb<β}, where a indicates that blocks are available, pb is the pre-buffer size, and β is the block request threshold. Thus, the input alphabet indicates whether blocks are available and whether the buffer is above or below the threshold. Moreover, the set of states is S = {pb==0, pb+=s/γ, pb-=c}, which represent an empty buffer, an ongoing download (the buffer is increased by the block size s divided by the block bit-rate γ), and the playback state (the buffer is decreased by the update time c). The initial state is s_0 = {pb==0}, where the buffer is empty, the set of final states F is empty, and the state transition function δ is described by the state transition matrix in Table 4.

The parameters defining the model are the block request threshold β, the block-bitrate γ, the block-size s and the video size S. To parameterize a model for YouTube we measure YouTube video downloads in different qualities. In the following we describe the results of the initial measurements.
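To illustrate how the state machine of Figure 3 and Table 4 can be turned into a source traffic model, the following sketch advances the pre-buffered playtime pb in discrete update intervals: a block is fetched whenever blocks are still available and pb is below the threshold β, and pb otherwise drains at playback speed. Consistent with the basic model, downloads are treated as instantaneous; the function name and the default parameter values are examples only.

def simulate_range_buffer(num_blocks, s_bit=1.78 * 8e6, gamma=5e5, beta=50.0,
                          c=1.0, num_steps=600):
    """Trace pb over time for the Range Algorithm model.
    s_bit: block size [bit], gamma: block bit-rate [bit/s] (one block adds
    s_bit/gamma seconds of playtime), beta: request threshold [s],
    c: update interval [s]."""
    pb, downloaded, trace = 0.0, 0, []
    for _ in range(num_steps):
        if downloaded < num_blocks and pb < beta:
            pb += s_bit / gamma            # state "pb += s/γ": fetch the next block
            downloaded += 1
        elif pb > 0.0:
            pb = max(0.0, pb - c)          # state "pb -= c": playback drains the buffer
        # else: state "pb == 0", the buffer is empty and nothing is left to fetch
        trace.append(pb)
    return trace

# Example: a video of 10 blocks, observed over 600 update intervals.
buffer_trace = simulate_range_buffer(num_blocks=10)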


Figure 4: Cumulative distribution of measured YouTube video bit-rates and video sizes.


Figure 4 shows the cumulative distribution of measured YouTube video bit-rates and video sizes for three different resolutions: 240p, 360p, and 480p. As expected, the video bit-rate increases with the resolution of the video. Videos of the same resolution can have different bit-rates, since for recent codecs the video bit-rate highly depends on the amount of self-information in the video. E.g., videos with frequently changing scenes need a higher bit-rate than still images, because more information has to be encoded. For these initial measurements, a uniform distribution of the video bit-rates can be assumed. Additional measurements are required to get a better assessment of the video bit-rate distribution for more videos. Furthermore, the distribution of block bit-rates has to be measured, which is needed as parameter γ for the model.

Corresponding to the higher bit-rates, video sizes tend to be larger for higher video resolutions. The video size distribution has a moderate tail, e.g., for resolution 480p more than 95% of the measured videos are smaller than 50 MB and few videos are larger than 90 MB. This suggests using an Erlang-k distribution to model the video size distribution of YouTube videos. The parameters of the Erlang-k distribution to fit YouTube video sizes still have to be determined.


Figure 5: Cumulative distribution of measured block sizes.


Figure 6: Cumulative distribution of pre-buffered playtime.

Figure 5 shows the cumulative distribution of block sizes of YouTube video streams in three different resolutions. The block size distributions are depicted separately for middle blocks and the last block of a video. A YouTube video can consist of zero or more middle blocks and one last block. The middle blocks have constant size, and the size of the last block is simply the size of the rest of the video. For resolutions 240p and 360p a middle block of a YouTube video stream is about 1.78 MB. For 480p resolution the middle block of a YouTube video stream is about 2.46 MB. The last blocks containing the rest of the video are assumed to be distributed uniformly with lower bound 0 and upper bound 1.78 MB for resolutions 240p/360p or upper bound 2.46 MB for resolution 480p. Hence, for YouTube video streaming we can define the block size parameter s for a video with size S:

s_i = C   for i = 1, …, ⌊S/C⌋,      s_(⌊S/C⌋+1) = S − ⌊S/C⌋·C.

C is a constant which is 1.78 MB for 240p/360p and 2.46 MB for 480p resolution for YouTube video streaming. Hence, a video with size S has ⌊S/C⌋ + 1 blocks. The video duration D can be determined by dividing the block sizes s_i by the block bit-rates γ_i:


D = Σ_(i=1)^(⌊S/C⌋+1) s_i / γ_i
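For illustration, the block sizes and the resulting duration can be computed directly from S, C, and the per-block bit-rates. This is a minimal sketch with our own function names, assuming the block bit-rates are given as a list.

```python
import math

def block_sizes(S, C):
    """Split a video of size S into floor(S/C) full blocks of size C plus one last block."""
    n_full = math.floor(S / C)
    return [C] * n_full + [S - n_full * C]

def video_duration(S, C, gamma):
    """Duration D as the sum of block sizes divided by their block bit-rates gamma_i."""
    return sum(s_i / g_i for s_i, g_i in zip(block_sizes(S, C), gamma))

# Example: a 10 MB video at 480p (C = 2.46 MB) with illustrative per-block bit-rates in MB/s.
print(video_duration(10.0, 2.46, [0.08, 0.08, 0.08, 0.08, 0.08]))
```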

Figure 6 shows the buffered playtime at the time of a block request for different video resolutions. The buffered playtime is 50 seconds on average when a new block is requested. The playtime when a block is requested varies and can be fit with a normal distribution. Hence, in a detailed model for YouTube the threshold β can be modeled as a random variable that follows a normal distribution with mean 50 seconds.

This basic model does not consider the available bandwidth and the download speed of the blocks. The key influence factor on HTTP video streaming QoE is stalling. Stalling occurs if the video buffer drops below a certain threshold, such that fluent playback is no longer possible. The buffer of a video that stalls during playback is depicted in Figure 7. The video is initially pre-buffered until the video buffer hits a certain threshold. Then the video playback starts. In this case the network data rate is slower than the video bit-rate, due to a bottleneck which could be limited bandwidth. The video buffer decreases until it drops below a threshold and the video stalls. The video stops playing until the buffer exceeds the playing threshold and starts running again.


Figure 7: Video buffer while streaming a video with limited network data rate [59].

The basic model could be adapted by including the network data rate, such that the state pb==0 is reached if the data rate is too slow, which means that the video is stalling. A more detailed model for HTTP video streaming should also consider dynamic adaptive changes of the video resolution, which could be modeled by adding a dimension to the state machine, i.e., one state machine per resolution, connected to each other.
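A possible extension along these lines is sketched below: a simple flow-level buffer simulation in which the buffered playtime grows with an assumed network data rate and is drained during playback, so that stalling corresponds to the buffer running empty. All parameter values are illustrative and not taken from the measurements above.

```python
def simulate_buffer(duration, video_bitrate, net_rate, beta, dt=0.1):
    """Flow-level sketch: returns the list of stalling lengths (initial delay excluded)."""
    ratio = net_rate / video_bitrate            # playtime seconds downloaded per second
    buffered = played = downloaded = 0.0
    playing, started, current_stall, stalls = False, False, 0.0, []
    while played < duration - 1e-9:
        fetch = min(ratio * dt, duration - downloaded)   # stop fetching at the video end
        downloaded += fetch
        buffered += fetch
        if playing:
            drain = min(dt, buffered, duration - played)
            buffered -= drain
            played += drain
            if buffered < 1e-9 and played < duration - 1e-9:
                playing, current_stall = False, 0.0      # buffer empty -> stalling starts
        else:
            current_stall += dt
            if buffered >= beta or downloaded >= duration - 1e-9:
                if started:
                    stalls.append(current_stall)         # record the stalling length
                playing, started = True, True            # (re)start playback
    return stalls

# Example: a 60 s video at 1 Mbit/s streamed over a 0.8 Mbit/s bottleneck, threshold 5 s.
print(simulate_buffer(60, 1.0, 0.8, beta=5))
```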

The simulation model for HTTP video streaming can be used as a source traffic model for video traffic. As such, it can be implemented in any flow- or packet-level simulation environment including video sources, which could be servers in data centers of a CDN or


peers in a P2P video streaming system. For a complete evaluation model of an HTTP video streaming system we need QoE models for HTTP streaming, which are described in Section 3.2.2.2. To evaluate video streaming in the context of the whole service infrastructure, which consists of several servers, and to take into account social awareness, we further need models for the video-CDN infrastructure and for the propagation and requests of videos in online social networks. There have been various measurement studies which analyze the CDN infrastructure of YouTube [3][116][4]. A model for video requests and the video popularity in online social networks is provided in Section 3.2.1.1 and in [77].

3.2.1.3 Simulation Model for Dropbox

To derive typical usage scenarios and QoE influence factors of cloud storage services for a subsequent simulation model, we conducted a Dropbox survey. For this survey, a dedicated application was installed on the participants' Dropbox accounts in order to gather information on available and used storage capacity. Depending on users' goals and specific purposes for using Dropbox, their personal characteristics and the usage situation, the impact of influence factors on Dropbox QoE may differ. Therefore, the information collected in this survey is used to define user profiles and groups of QoE influence factors by using the Expectation Maximization (EM) cluster algorithm. For modeling Dropbox QoE depending on the actual usage context and situation, we analyse the connection between these clusters to map user groups to sets of QoE influence factors.

For the Dropbox survey 49 volunteers were recruited. Table 5 depicts the percentage of volunteers with different Dropbox account sizes and amounts of stored data (i.e., used space). The table shows that 17.31% of the volunteers only have the initial amount of data stored (example files and folders with a total size of 1.4 MB) in their Dropbox folder. For 71.15%, the available account size is between 3 GB and 10 GB of Dropbox space.

Table 5: Characteristics of Dropbox accounts from the 49 volunteers [1]

Used Space          Percentage of volunteers
Initial amount      17.31%
100 MB              67.31%
1 GB                40.38%

Account Size        Percentage of volunteers
2 GB                17.31%
3 GB                71.15%
10 GB               57.69%

We defined a user profile by taking into account: (a) the usage duration of Dropbox (a few days, up to one year, more than a year), (b) the number of linked devices (1, …, 5, more than 5), (c) experience with in-conflict files, and (d) especially the main use case / reason to use Dropbox (backup, synchronization, collaboration, file sharing, and version control). We used the Expectation Maximization (EM) cluster algorithm of the machine learning software WEKA to determine different user groups. This approach resulted in six clusters, including two empty clusters and one with only three respondents who did not answer some of the questions. These three clusters were excluded from further analysis.
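The clustering step can be reproduced with any EM implementation. The snippet below is a minimal sketch using scikit-learn's Gaussian mixture model (the original analysis used WEKA); the feature encoding and the synthetic data are illustrative and not taken from the survey.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Illustrative features for 49 respondents: usage duration (months), number of
# linked devices, in-conflict files experienced (0/1), main use case id (0..4).
features = np.column_stack([
    rng.integers(1, 36, 49),
    rng.integers(1, 6, 49),
    rng.integers(0, 2, 49),
    rng.integers(0, 5, 49),
]).astype(float)

# EM clustering into six candidate user groups, as in the study.
gmm = GaussianMixture(n_components=6, random_state=0).fit(features)
labels = gmm.predict(features)
print(labels)   # cluster assignment per respondent
```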

In the following we will refer to the three remaining clusters as beginners, synchronization users, and power users.


Table 6: Characteristics of User Profiles (B=Beginners [22% of users], S=Synchronization Users [30%], P=Power Users [48%]) [1]

Characteristic            Beginners B      Sync. Users S       Power Users P
Ratio                     22%              30%                 48%
Usage Time                Up to 1 Year     > 1 Year            > 1 Year
Avg. Number of Devices    1.4              3.8                 2.6
OS (Main Device)          Windows          Windows, MacOS      Windows, MacOS
'In-conflict' Files       10.0%            0.0%                68.2%

In Table 6 some characteristics of the different user profiles are shown. The beginners cluster contains 10 respondents who have been using Dropbox for up to one year. The synchronization users cluster consists of 14 users and is characterized by a common Dropbox usage time of more than a year (78.6%). 22 users are part of the power users cluster. In this cluster all respondents have been using Dropbox for more than a year. Further, the table shows that synchronization users have the most linked devices (mean = 3.8), while power users use 2.6 devices on average.

Figure 8 depicts the main usage for the different clusters. The beginners use Dropbox mainly for collaboration (50.0%) and file sharing (40.0%), while the synchronization users make greater use of it for synchronization (64.3%) and backup (14.3%). For the power users synchronization is still dominating (40.9%), but the use purposes are more balanced than in the other clusters. Moreover, Table 6 shows that 10.0% of the beginners and 68.2% of the power users have experienced in-conflict files. This can be explained by the use of Dropbox for collaboration by the beginners and power users and the overall usage duration of the power users. [1]


Figure 8: Most important use cases for Dropbox [1].

For a complete evaluation model of a cloud storage system we need the QoE model specified in the next section, 3.2.2.1. Furthermore, we need models for the number of files stored on the file storage systems and their file sizes. Finally, we need to study and model the propagation of files in collaboration networks and the file-sharing behavior of cloud


storage users. This includes the formation of groups which share a set of files dependent on file-size and file-type.

To simulate Dropbox traffic at flow-level we need to further investigate the Dropbox-P2P protocol and its dissemination strategy.

3.2.2 Theoretical Models

For evaluating the user perceived quality of the selected applications in SmartenIT, QoE models are required which allow mapping objectively measurable parameters like download time onto QoE. In this section, theoretical models will be described which can be used for further evaluations and studies.

3.2.2.1 Quality-of-Experience Models for Dropbox

The cloud-based file storage service QoE model was described in detail in [1], from which the following material is taken:

A subjective lab study was conducted at the premises of Telecommunications Research Center Vienna (FTW) in order to quantify the impact of waiting times on QoE for cloud storage and file synchronization services like Dropbox considering the following tasks:

initialization, storage, retrieval, and multi-device synchronization. Figure 9 depicts the obtained results in terms of overall quality. The study results not only show that perception (and rating) is highly non-linear and exhibits only limited saturation effects (similar to file downloads), but also that end-user sensitivity depends on the task context. For example, users tend to be more tolerant with slow storage operations than with retrieval operations, as observed in Figure 9b and Figure 9c, and are even more patient with multi-device file synchronization, as depicted in Figure 9d. In addition, saturation effects are different for the storage/retrieval scenarios: a slight saturation effect occurs for file retrieval after 2 seconds, which is not observed in the case of storage. For more details on this study please refer to [19].

According to the actual situation S, including the actual task and conditions like the number of files, different shapes of curves are observed in Figure 9. Thus, the QoE model function f_S(t) provides the MOS for this situation depending on the short-term influence factor waiting time t; an appropriate such function is f_S(t) = a·log(t) + b. However, the overall QoE Q(S, t, F) also needs to take into account the long-term influence factors F, which provide an upper bound for Q. Additional degradations during the usage of the service, i.e. through waiting times, may occur. For example, security is not affected during the usage of Dropbox, but it may define an upper bound for QoE. In contrast, during the usage of Dropbox mainly waiting times, but also the appearance of in-conflict files, shape the user perception, and thus should be taken into account. For the sake of simplicity, we focus on waiting times only as short-term influence factor. Thus,

Q(S, t, F) = f_S(t) + Σ_(i∈F) w_i·b_i,

and for t = 0 it is

Q(S, 0, F) = Σ_(i∈F) w_i·b_i.

Thus, the importance of an annoyance factor i is reflected by the weight w_i. For the example f_S(t) = a·log(t) + b, we arrive at Q(S, t, F) = a·log(t) + Σ_(i∈F) w_i·b_i, with b = Σ_(i∈F) w_i·b_i.
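A small numerical sketch of this combined model is given below, assuming the logarithmic short-term function and an additive, weighted offset contributed by the long-term factors; all parameter values and factor names are purely illustrative and not taken from the study.

```python
import math

def qoe(t, a, factors):
    """Overall QoE Q(S, t, F) for waiting time t (s) with situation-specific slope a.

    `factors` maps each long-term annoyance factor i to a pair (w_i, b_i); the
    weighted sum of the b_i acts as the offset b of f_S(t) = a*log(t) + b.
    """
    b = sum(w * b_i for w, b_i in factors.values())
    if t <= 0:                     # no waiting time: upper bound set by long-term factors
        return b
    return max(1.0, min(5.0, a * math.log(t) + b))   # clip to the MOS scale

# Illustrative values: negative slope (longer waiting -> lower MOS), two factors.
print(qoe(4.0, a=-0.8, factors={"security": (0.6, 4.8), "conflicts": (0.4, 4.2)}))
```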


(a) Client initialization

(b) File storage (upload)


(c) File retrieval (download)

(d) Multi-device synchronization

Figure 9: MOS as a function of waiting times for four different task scenarios: initialization, storage, retrieval, and multi-device synchronization. Figures are taken from [1].

For a holistic file storage QoE model, the different usage scenarios and the user profiles have to be taken into account, which is the case for Q(S, t, F). However, future research is needed on how to quantify and measure long-term influence factors. Moreover, it is not clear whether long-term and short-term influence factors interact and whether the model parameters interact (e.g., a and b). Future research should also provide guidance on how to integrate long- and short-term factors in a QoE model and how to integrate several short-term factors like waiting times and in-conflict files. Such a holistic model would enable the precise evaluation of the effects of our traffic management solutions on the QoE of Dropbox.

3.2.2.2 Quality-of-Experience Models for HTTP-Streaming

A QoE model of HTTP video streaming was presented in [59], from which the most important material is taken.


User perceived quality of video streaming applications in the Internet is influenced by a variety of factors. As a common denominator, four different categories of influence factors [56], [101] are distinguished, which are influence factors on context, user, system, and content level.

- The context level considers aspects like the environment where the user is consuming the service, the social and cultural background, or the purpose of using the service like time killing or information retrieval.

- The user level includes psychological factors like expectations of the user, memory and recency effects, or the usage history of the application.

- The technical influence factors are abstracted on the system level. They cover influences of the transmission network, the devices and screens, but also of the implementation of the application itself like video buffering strategies.

- For video delivery, the content level addresses the video codec, format, resolution, but also duration, contents of the video, type of video and its motion patterns.

In this section, a simple QoE model for YouTube is presented whose primary focus is its application for QoE monitoring (within the network or at the edge of the network).

Therefore, we take a closer look at objectively measurable influence factors, especially on the system and content level. For this purpose, subjective user studies are designed that take into account these influence factors; for more details refer to [59].

The identification of key influence factors has shown that YouTube QoE is mainly determined by stalling frequency and stalling length. To quantify YouTube QoE and derive an appropriate model for QoE monitoring, we first provide mapping functions from stalling parameters to MOS values. Then, we provide a simple model for YouTube QoE monitoring under certain assumptions. Finally, we highlight the limitations of the model.

3.2.2.2.1 QoE Mapping Functions

As the fundamental relationship between the stalling parameters and QoE, we utilize the IQX hypothesis [37], which relates QoE and QoS impairments x with an exponential function f(x) = α·e^(−β·x) + γ. In [57], concrete mapping functions for the MOS values depending on the two stalling parameters, i.e. the number N of stalling events and the length L of a single stalling event, were derived. To be more precise, YouTube videos of 30 s length were considered in a bottleneck scenario leading to periodical stalling events. In order to determine the parameters α, β, γ of the exponential function, nonlinear regression was applied by minimizing the least-squared errors between the exponential function and the MOS of the user ratings. This way we obtain the best parameters for the mapping functions with respect to goodness-of-fit.

However, the aim here is to derive a model for monitoring YouTube QoE. Therefore, we reduce the degrees of freedom of the mapping function and fix the parameters α and γ. If we consider as QoS impairment x either the number of stalling events or the stalling duration, we observe the following upper and lower limits for the QoE f(x): lim_(x→0) f(x) = α + γ and lim_(x→∞) f(x) = γ, respectively. In case of no stalling, i.e. x = 0, the video perception is not disturbed and the user perceives no stalling. As we asked the user "Did you experience these stops as annoying?", the maximum MOS value is obtained, i.e. α + γ = 5. In case of strong impairments, however, i.e. x → ∞, a well-known rating scale effect in subjective studies occurs. Some users tend to not completely utilize the entire scale, i.e. avoiding ratings at the edges, leading to minimum MOS values around 1.5 [115]. Hence, we assume α = 3.5 and γ = 1.5 and derive the unknown parameter β from the subjective user


tests. The obtained mapping functions as well as the coefficient of determination R² as goodness-of-fit measure are given in Table 7. In particular, the mapping function f_L(N) returns the MOS value for a number N of stalling events which have a fixed length L. It can be seen that R² is close to one, which means a very good match between the mapping function and the MOS values from the subjective studies.
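The fitting step can be sketched as follows: with α and γ fixed, only β remains to be estimated per stalling length L by nonlinear least squares. This is a minimal illustration using SciPy; the MOS data points are invented for demonstration and are not the ratings from the study.

```python
import numpy as np
from scipy.optimize import curve_fit

ALPHA, GAMMA = 3.5, 1.5

def f(N, beta):
    """IQX-type mapping function with alpha and gamma fixed."""
    return ALPHA * np.exp(-beta * N) + GAMMA

# Invented MOS ratings for one stalling length and N = 0..6 stalling events.
N = np.arange(7)
mos = np.array([5.0, 3.1, 2.2, 1.9, 1.7, 1.6, 1.55])

beta_hat, _ = curve_fit(f, N, mos, p0=[0.5])
print(beta_hat[0])   # estimated beta for this stalling length
```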

Figure 10 depicts the MOS values for 1 s, 2 s, 3 s, and 4 s stalling length for a varying number of stalling events together with exponential fitting curves (as discussed in [37]). The x-axis denotes the number of stalling events, whereas the y-axis denotes the MOS rating. The results show that users tend to be highly dissatisfied with two or more stalling events per clip. However, for the case of a stalling length of one second, the user ratings are substantially better for the same number of stalling events. Nonetheless, users are likely to be dissatisfied in case of four or more stalling events, independent of the stalling duration. As outlined in [102], most of the users accept a quality above 3 on the ACR scale, i.e. a fair quality.

Table 7: Parameters of mapping functions (see Figure 10) of stalling parameters to MOS together with coefficient of determination R² as goodness-of-fit measure. [59]

event length L (in s)    mapping function depending on number N of stalling events    coefficient of determination R²
1                        f_1(N) = 3.50·e^(−0.35·N) + 1.50                              0.941
2                        f_2(N) = 3.50·e^(−0.49·N) + 1.50                              0.931
3                        f_3(N) = 3.50·e^(−0.58·N) + 1.50                              0.965
4                        f_4(N) = 3.50·e^(−0.81·N) + 1.50                              0.979


Figure 10: Mapping functions of stalling parameters to MOS. The video duration is fixed at 30 s and no initial delay is introduced. Parameters are given in Table 7.


It has to be noted that it is not possible to characterize the stalling pattern by the total stalling duration T = L·N alone, as the curves for f_L(N) depending on the total stalling duration T = L·N differ significantly [60]. Therefore, stalling frequency and stalling length have to be considered separately in the QoE model.

3.2.2.2.2 Simple Model for QoE Monitoring

Going beyond the pure mapping functions, we next develop an appropriate QoE model for monitoring. The intention of the monitoring is to provide means for QoE management [60] for ISPs or the video streaming service provider. Hence, the model has to consider an arbitrary number N of stalling events and stalling event length L, while the subjective user studies and the provided mapping functions f_L(N) in the previous section only consider a finite number of settings, i.e. L ∈ {1, 2, 3, 4} s. As a result of the regression analysis in the previous section, the parameter β_L of the exponential mapping function f_L(N) = 3.5·e^(−β_L·N) + 1.5 is obtained as given in Table 7.

The parameter β_L of the obtained mapping function for a given length L of a single stalling event can be fitted with a linear approximation, which yields a goodness-of-fit R² close to 1. The linear relationship can easily be found as β(L) = 0.15·L + 0.19.

As a simple QoE model f(L, N), we therefore combine our findings, i.e. f_L(N) and β(L), into a single equation taking the number of stalling events N and the stalling length L as input:

f(L, N) = 3.50·e^(−(0.15·L + 0.19)·N) + 1.50   for L ∈ ℝ+, N ∈ ℕ.   (QoE)

Figure 11 illustrates the obtained model for YouTube QoE monitoring as a surface plot. On the x-axis the number N of stalling events is depicted, on the y-axis the stalling event length L, while the actual MOS value f(L, N) according to Eq. (QoE) is plotted on the z-axis. The figure clearly reveals that the number of stalling events mainly determines the QoE. Only for very short stalling events in the order of 1 s, two stalling events are still accepted by the user with a MOS value around 3. For longer stalling durations, only single stalling events are accepted.
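For completeness, the monitoring model can be evaluated directly in code; the following sketch reproduces the surface of Figure 11 numerically (the helper name is our own).

```python
import math

def qoe_mos(L, N):
    """MOS estimate for N stalling events of average length L seconds, Eq. (QoE)."""
    return 3.50 * math.exp(-(0.15 * L + 0.19) * N) + 1.50

# Two short stalling events of 1 s each are still rated around "fair".
print(round(qoe_mos(1, 2), 2))   # ~3.27
```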


Figure 11: Simple QoE model which maps a number N of stalling events of average length L to a MOS value, f(L, N) = 3.50·e^(−(0.15·L + 0.19)·N) + 1.50. [59]


For other influence factors and limitations of this model please refer to [59]. In the context of SmartenIT, this QoE model can be exploited in order to estimate the influence of networking conditions on the user perceived quality. To this end, the stalling pattern as perceived on application layer, i.e. the number of stalling events and the average length of stalling events, are required as input. This application layer information can be easily extracted from the application directly or by analyzing the network traces on IP layer as described in [59].

3.2.2.3 Simple Model for the Estimation of Transit Cost

Considering the inter-cloud communication scenario (as initially described in Deliverable D1.1 [1]), we expect that traffic flows between different clouds may span multiple ASes, and that the traffic generated by inter-cloud communication, including:

- data/content replication and placement generated by, e.g., video streaming platforms such as YouTube,

- data storage and replication for fault tolerance employed by online storage systems like Dropbox,

- workload and VM migration within data centers of one or more cloud operators, etc.,

will cross expensive inter-domain links at the ISP level. Therefore, in this subsection, we propose a simple model to investigate the impact of a traffic management mechanism performing scheduling; nevertheless, the proposed model can be employed to assess the impact of any traffic management mechanism addressing inter-domain traffic.

In particular, we use the model for a first evaluation of the mechanism for Inter-Cloud Communication (ICC), which is described in Section 4.3. The ICC mechanism is considered to exploit the discontinuity of the 95-th percentile rule, which is applied to estimate transit costs between ISPs, in order to route larger traffic volumes without a simultaneous increase of the inter-connection costs; a similar investigation has also been performed in [71]. The increase of the inter-connection costs can be avoided by "hiding" the extra traffic volumes, i.e., sending them within 5-minute intervals in which traffic is lower than the expected 95-th percentile of a given period, i.e. one month. To this end, we address neither how the expected 95-th percentile is to be predicted, nor the scheduling mechanism itself; we only address the optimization potential of such a mechanism, if it would operate ideally and with perfect information.

For this investigation, we generated traffic traces using the Pareto distribution [70] for N = 4000 5-minute intervals. Then, we calculated the resulting 95-th percentile of the N samples, and the difference of the 95-th percentile and the actual traffic measurement at every 5-minute interval:

T_extra = max{T_percentile − T_instantaneous, 0}.

Figure 12 depicts the generated traffic traces following the Pareto distribution, while Figure 13 shows the difference of the 95-th percentile minus the traffic samples, whenever positive. Note that a negative difference would imply that the specific traffic sample exceeds the 95-th percentile, i.e., it belongs to the upper 5% of the observed samples. Then, further investigation is needed: if only the top 4% of values are increased, the 95-th percentile is not affected; otherwise, if the 95-th observation increases, then an increase of the inter-domain transit cost can be expected.
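The optimization potential of such a scheme can be estimated with a few lines of code: generate synthetic 5-minute traffic samples, compute the 95-th percentile, and sum up the headroom T_extra available below it. The Pareto parameters below are illustrative and not those used for Figures 12, 13, and 14.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 4000                                     # number of 5-minute intervals
traffic = rng.pareto(a=2.5, size=N) * 0.05   # synthetic traffic samples (illustrative)

p95 = np.percentile(traffic, 95)             # charged volume under the 95-th percentile rule
extra = np.maximum(p95 - traffic, 0)         # T_extra per interval (headroom below the percentile)

baseline = traffic.sum()
print(f"95-th percentile: {p95:.3f}")
print(f"throughput gain with ideal scheduling: {100 * extra.sum() / baseline:.1f}%")
```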

Figure 12: N = 4000 traffic traces and the estimated 95-th percentile.

Figure 13: Difference of the 95-th percentile minus actual traffic traces.

Finally, Figure 14 illustrates the total volume of traffic that can be routed through an inter-domain link where the 95-th percentile rule is applied, when the scheduling mechanism is not employed (blue bar) and when it is (green bar). As can be observed, throughput is more than 80% higher when such a scheduling mechanism is in place.

Additional significant benefit can be attained by an algorithm that transmits more traffic, up to the capacity of the link, within the 5-minute intervals whose load is already higher than the 95-th percentile; if these intervals could be predicted accurately, then such an intervention would not affect the interconnection cost either. In general, the transmission of data volumes within the 5-minute intervals whose traffic is similar to that of the 95-th percentile must be performed carefully, and requires accurate estimation of the expected traffic during these intervals.



Figure 14: Throughput under a specific transit charge, with and without the scheduling mechanism operating ideally and with perfect information.

Next steps of this investigation involve the development of a model to accurately predict, e.g., based on historical data, the amount of traffic passing through a transit link at time t, as well as the expected 95-th percentile; such a model will allow the ICC mechanism to schedule the data flows of the inter-cloud communication more efficiently.

3.2.2.4 Energy Models

Energy models may be used to estimate the energy consumption of networked devices based on their device state. This is beneficial as no direct power measurement is required after the model generation phase, and the estimation of the energy consumption may be executed on an entity other than the device to be gauged. This gauging entity then needs the knowledge of the energy model as well as the device state to infer the target devices' energy consumption.

To allow a holistic view on the network, energy models for all networked devices used within SmartenIT are required. These models can be taken from the literature if the publication is recent and the changes in technology since then are small. Otherwise, models have to be generated by measuring the power consumption of a representative device while stressing different aspects of the hardware. Most likely, this might be done by loading the network or varying the number of connected devices, but idle modes must also be considered.

The generation of models for individual devices, in particular end-user devices, is a linear process. Generating models of cloud instances requires knowledge of the cloud data center configuration, its relative usage, the placement of VMs on the physical machines, the services running and their demands on CPU, hard drives, and network, as well as the energy consumption of the individual servers. As the available energy on mobile devices is limited, and in general scarce, this section focuses on the improvement of the energy efficiency of mobile devices and on traffic control that satisfies the requirements of the mobile user while reducing the absolute energy consumed on the mobile device. Below, energy models for mobile devices, NaDas, and network infrastructure devices, and their implications for the exemplary applications within SmartenIT, are discussed.


Energy measurements on mobile devices may be conducted by directly measuring the power consumption between the battery and the device, or by modeling the power consumption of the mobile device based on system parameters. This approach is proposed in [123]. The model generation is a two-step process. First, the power consumption of the device is recorded together with device parameters like CPU usage, network activity, display status, brightness, and others. These measurements are then evaluated and the influence of each component is derived in a regression-based approach.
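The second step of such a model generation can be sketched as a simple linear regression of measured power draw against the recorded system parameters. The feature set and the numbers below are illustrative placeholders, not measurements.

```python
import numpy as np

# Recorded samples: [cpu_load (0..1), net_kBps, display_on (0/1), brightness (0..1)]
X = np.array([[0.1,   0, 1, 0.3],
              [0.6, 200, 1, 0.8],
              [0.9, 800, 0, 0.0],
              [0.3,  50, 1, 0.5]])
power_mW = np.array([450.0, 1450.0, 1900.0, 800.0])   # measured battery drain

# Least-squares fit: power ~ w0 + w1*cpu + w2*net + w3*display + w4*brightness
A = np.hstack([np.ones((len(X), 1)), X])
coeffs, *_ = np.linalg.lstsq(A, power_mW, rcond=None)

def estimate_power(cpu, net, display, brightness):
    """Estimate the power draw (mW) from the device state using the fitted coefficients."""
    return float(coeffs @ np.array([1.0, cpu, net, display, brightness]))

print(estimate_power(0.5, 100, 1, 0.6))
```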

As the measurements in [123] are relatively old, and the available devices and their feature sets have evolved, models must be recreated for the devices used within SmartenIT. These models can then be used to estimate the power consumption of the mobile devices accurately, based only on the system parameters. Gross et al. show in [45] that the error in most cases is below 5%. Still, more experiments must be conducted to reliably determine the error margin.

Similar methods are also possible for wired network devices. The measurement of the power models for switching and routing hardware is straight-forward. Hlavacs et al. [54] suggest modeling the power consumption as a constant. This is justified, because the influence of the traffic moved through the device is low. Under some circumstances, it might even be beneficial to move large amounts of traffic. The power consumption of the optical Internet backbone is analyzed in [53]. For OpenFlow hardware, no models are available yet.

The energy consumption of cellular networks can be modeled for conventional base stations as well as for microcells. Arnold et al. have developed a model for both types of GSM and UMTS cells in [10]. Guo et al. have simulated the energy consumption of a 4G network in a London case study [46], also making use of microcells to show possible energy optimizations.

For video streaming, Hinton et al. [53] separate the influences on the power consumption into network transmission and storage. Hinton et al. also suggest storing multiple copies of popular videos throughout the network to reduce the power consumption of the network. The modeling of video on demand, according to [53], is straight-forward, by adding the fixed power consumption of the servers to the transmission cost. Still, they disregard the need for an increase in server capacity for highly popular content.

In the case of UNaDas, first measurements indicate almost constant power consumption while running [16]. Still, more measurements are necessary to develop a fully qualified model. The network throughput as well as the power consumption of connected storage is not considered yet.

The above described power models can be combined with throughput models to derive the energy cost for individual users or data transmissions. This is possible by simply dividing the power consumption by the number of users or the number of bytes transmitted. These models need to be combined in the SmartenIT Traffic Manager to derive the optimal routing and caching decisions.

As far as the hardware is standardized, or differences between different hardware models are expected to be low, measurements may be omitted. Instead, the analytical models described in the literature might be used to calculate the energy consumption of the hardware components. If these models are known, and proven to be accurate, traffic measurements, or even the number of connected users suffice to model the energy consumption of the full network with reasonable accuracy. It is part of current work to develop these models and energy consumption functions.


Knowing the energy models of the individual components, the energy consumption of different services can be estimated. To accomplish this, traces of the application-generated traffic must be recorded. These traces can then be combined with the energy models to derive the power consumption of the network components and the end-user device. After establishing the state-of-the-art power consumption, experiments can be run to optimize the energy efficiency. This can be accomplished using the models only. Possible optimizations are improved routing or scheduling of connections. Here, the applications under consideration, YouTube and Dropbox, offer different optimization potential. As Dropbox can, in general, be considered delay-tolerant, scheduling of connections to more energy-efficient time periods or connection types is possible. In the case of YouTube, the margins are much smaller, but as the same content is consumed by different users, in-network caching or pre-fetching may be used to store a copy of the content close to the user, hence optimizing the delivery time to the mobile device.

3.2.2.5 Model on Resource Allocation

Data centers offer resources, such as CPU, RAM, disk space, and bandwidth, to customers, e.g., end-users or cloud service providers, which consume different amounts of these, in particular in different ratios. For example, one customer may consume 2 GHz, 2 GB RAM, 1 TB disk space, and 1 GB/s of bandwidth while another consumes 1 GHz, 1 GB RAM, 3 TB disk space, and 1 GB/s, which makes it hard to say which customer consumes "more" resources. In academic environments, where different customers, e.g., chairs, do not have to pay for the resources of, e.g., a cluster rented by their university, ensuring fairness is important. However, it may also be desirable to integrate fairness guarantees into the SLAs of commercial data centers, which usually offer their resources either in static shares or in a nontransparent best-effort manner.

Subsequently, a definition to make consumption profiles comparable is presented which, if used to guide resource allocations, allows making share guarantees while also enabling statistical multiplexing, i.e., the advantages of allocating resources statically and dynamically are combined. In particular, every consumption profile is mapped to a number that can be associated with the greediness of that customer: if the number is positive, the customer "consumes beyond his means" without appropriately ceding resources of what would be his share to other consumers, and, if the number is negative, the opposite is the case. This number is therefore referred to as the greediness of a customer. If resource scarcity occurs, constraining greedy customers more strongly in their resource usage will enforce fairness. Not trimming resources of the share of a customer who has a greediness less than or equal to zero will implement resource guarantees.

Such a sharing policy can be integrated into the SLAs offered by commercial data centers, as it allows combining the attractiveness of guarantees with the attractiveness of on-demand resource shares. However, such a policy also makes sense for non-commercial resource sharing. When resources are not to be paid for, customers are usually asked by the administration for their expected needs in order to rent a corresponding infrastructure. When the infrastructure is deployed subsequently, resources that were claimed by a customer should be guaranteed to him. On the other hand, when resources are not used by a customer, they should be reallocated to optimize costs.

Let d_i,j ≥ 0 be what customer c_i consumes of resource r_j, e.g., CPU used or RAM allocated. For any resource r_j, let e(c_i, r_j) be the endowment of customer c_i of resource r_j. For example, if resources are divided equally among all customers, we have e(c_i, r_j) = q(r_j) / m for all c_i and r_j, where m is the number of customers and q(r_j) is what is available of resource r_j. Then, d_i,j − e(c_i, r_j) is the amount of resource r_j that customer c_i consumes beyond his


endowment (if the difference is negative, c_i is willing to release some of his endowment). If d_i,j > e(c_i, r_j), other customers have to release some of their endowment of r_j in order to cover c_i's additional demand, i.e., what c_i demands beyond his endowment. Therefore the additional demand should be added to the greediness of c_i. If d_i,j ≤ e(c_i, r_j), customer c_i's greediness should decrease; however, in this case, only to the extent that other customers benefit from the release, which applies when other customers request r_j beyond their endowment. This notion can be formalized as follows. Define α(r_j) as the sum

of what customers demand beyond their endowment of r_j, i.e.,

α(r_j) = Σ_i max{d_i,j − e(c_i, r_j), 0},

and β(r_j) as the sum of what customers release of r_j, i.e.,

β(r_j) = Σ_i max{e(c_i, r_j) − d_i,j, 0}.

The greediness of customer c_i is defined as

greediness(c_i) = Σ_j b(d_i,j),

where b(d_i,j) is the balance for consumption d_i,j and defined as

b(d_i,j) = d_i,j − e(c_i, r_j)                                if d_i,j ≥ e(c_i, r_j),
b(d_i,j) = (d_i,j − e(c_i, r_j)) · min{α(r_j)/β(r_j), 1}      otherwise.

Note that in case β(r_j) is 0, the else-part of the above definition is never reached and therefore α(r_j) is never divided by 0.

A data center operator can monitor the different resources consumed by customers and then apply the definition to identify heavy/greedy customers. In case scarcity occurs in the data center, the scarce resources allocated to greedy customers can be trimmed more strongly than those allocated to non-greedy customers. In this way, the greediness of customers is aligned and thereby fairness is achieved.
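The following minimal sketch illustrates this accounting on a small example, assuming the per-resource balances and the α/β correction as given above; function and variable names as well as the example values are our own.

```python
def greediness(demand, endowment):
    """demand[i][j], endowment[i][j]: consumption / endowment of customer i for resource j."""
    n_cust, n_res = len(demand), len(demand[0])
    scores = [0.0] * n_cust
    for j in range(n_res):
        over  = sum(max(demand[i][j] - endowment[i][j], 0) for i in range(n_cust))  # alpha(r_j)
        under = sum(max(endowment[i][j] - demand[i][j], 0) for i in range(n_cust))  # beta(r_j)
        for i in range(n_cust):
            diff = demand[i][j] - endowment[i][j]
            if diff >= 0:
                scores[i] += diff                          # consumption beyond the endowment
            else:
                scores[i] += diff * min(over / under, 1)   # release, credited only if absorbed
    return scores

# Two customers, two resources, equal endowments of one unit each.
print(greediness(demand=[[1.5, 0.2], [0.5, 1.0]], endowment=[[1, 1], [1, 1]]))
```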

3.2.2.6 Model on Federation of Clouds

A federation of clouds or data centers is formed by smaller players to increase geographic diversity and offer a richer resource pool, thereby enabling competition with larger players. In order to allow customers of a federation to take the best possible advantage of the federation's diversity, they must be able to choose the data centers within the federation from which they consume resources, i.e., they are not forced to connect to a certain data center that then relays their requests. However, if a customer is allowed to request resources from any data center in the federation, individual data centers may easily be overloaded, wherefore load balancing between data centers becomes necessary. Furthermore, fairness between customers may have to be enforced. If this is done by each data center independently (each data center enforces internal/local fairness), customers are forced to consume resources evenly on all data centers in the federation. Such enforcement of local fairness is inefficient if customers have complementary demands, which may be the case, for example, for geographical reasons or because the resources of some data center are better suited for a customer (faster disks, better graphic rendering, faster CPU, etc.). To illustrate the difference between local and global fairness, consider a federation of two data centers D1 and D2 with two customers C1 and C2 (with equal endowments). Local


fairness would require both customers to receive 50% of D1's resources and 50% of D2's resources (shares of different resources inside a data center may be different). Global fairness, however, would, for example, also allow allocating 100% of D1's resources to C1 and 100% of D2's resources to C2 (assuming that both data centers provide an equal amount of resources). Note that an allocation that is locally fair for every data center is also globally fair, but not vice versa, wherefore global fairness is a relaxation that allows for more flexible and thus more efficient resource allocation.

To allow a data center federation to arrive at a globally fair allocation and also give an incentive to load balancing throughout the federation, each data center has to account for the resources consumed by customers. This local consumption is then announced to all data centers, such that all data centers can calculate the global consumption of customers, apply shaping based on this information, and thereby ensure global fairness. (In Section 4.7, it will be argued why the greediness metric presented there is suited to exchange such information in a compact manner.) When this scenario is modeled game-theoretically, two kinds of self-interested agents exist: data centers and customers. It is assumed that customers pay a flat fee to use the federation's resources. In particular, even if volume-based charging were assumed, this charging model would not take into account the location of the consumed resources. Therefore, customers will try to consume as many resources from their preferred data centers as possible, while they have no inherent interest in load balancing (Tussle I). This tussle can only be resolved by close cooperation of data centers and the use of a suitable consumption metric. Depending on whether the federation splits revenues by a contribution-based scheme or a fixed scheme, the interest of data centers is to use their capacity to the highest possible degree (as this increases their worth for the federation and therefore the payments they receive) or to not have their resources used at all (as this decreases their energy costs). Therefore, data centers have an incentive to over-report used resources or to not provide resources to customers (Tussle II). In order to expose such strategic data centers, other data centers can introduce dummy customers to the system and monitor how many resources they receive. Since a malicious data center does not know which customers are the dummy ones, it will misreport resource usage for these customers as well. By comparing received and reported resources for dummy customers, malicious data centers can be identified easily.

3.2.2.7 Game-theoretic Model

In order to sort out game-theoretic models for the strategic situation of SmartenIT stakeholders, three aspects are needed: the players, their strategies, and the payoffs they achieve. Deliverable D1.1 [1] proposed a terminology and relation model describing stakeholders and their interests (crucial in assessing payoffs). This document provides us with strategies (traffic management mechanisms) and offers insights into our understanding of potential payoffs.

It is important to understand that not only the goals of stakeholders differ, but also their willingness to employ new strategies may be different. For ISPs, cloud service providers, and content providers, even a small change in energy cost or a small reduction of network congestion or transit traffic can add up to a considerable improvement of revenue. Also, they can be expected to be rational players. This may not be the case for end-users. Users may evaluate a new application or traffic solution via a larger set of metrics, including, for example, habits that reduce their willingness to try out new applications. Therefore, strong incentives may be required when approaching users. The subjective study (Section 3.2.2.1) shows that the relation of users' QoE to objective measures is non-linear, which may mitigate the perception of improvements brought by SmartenIT solutions.


Therefore, some of the traffic management mechanisms presented in this document can be considered as "vertical games", i.e., games between different types of stakeholders involving different metrics for the players. Others, "horizontal games", will involve strategic decisions among stakeholders of the same type, strategizing in a similar manner.

Users may be indifferent to traffic solutions that keep traffic local, unless this brings them a significant improvement in QoE. Thus, localization of traffic can be treated as a game where all players are interested in the reduction of transit traffic. This makes it easier for all players to coordinate their strategies.

A trust-based solution like HORST (Section 4.1), while beneficial to all SmartenIT scenarios, can potentially introduce an interesting aspect from a game-theoretic standpoint. If a high trust score ensures benefits from resources shared by other HORST users, then the game is competitive, with players putting in effort to gain a high score.

If a solution requires coordination of actions from different types of stakeholders, it will be essential for the solution to satisfy their differentiated interests in order for coordination to be expected. Game-theoretic evaluation models of each TM solution will require outcomes in metrics relevant for the involved parties.

This section provides the basis for a detailed definition of game-theoretic models. It is part of future work to investigate which game-theoretic models can be applied to the selected traffic management solutions and use-cases, which will be defined in D2.3.


4 SmartenIT Traffic Management Solutions

In this section we propose and justify traffic management solutions. A traffic management solution is composed of one or more traffic management mechanisms. The scenarios addressed by the traffic management solutions are described to enable classifying the proposed solutions and identifying common use-cases in task T2.2.

The structure of the subsections is recurring, with the aim to provide all means for a subsequent performance evaluation of the solutions. Besides the description of the traffic management solutions, the factors that have a key influence on the associated mechanisms are identified. To assess the performance of the mechanisms, key performance metrics are defined. Finally, initial evaluation results for each proposed traffic management mechanism are provided.

4.1 Home Router Sharing based on Trust

According to [20], the number of mobile-connected devices will exceed the world's population in 2013 and mobile data traffic is ever increasing. To handle the growth and reduce the load on the mobile networks, offloading to WiFi has come to the center of industry thinking [119]. In 2012, already 33% of total mobile data traffic was offloaded onto the fixed network through WiFi or femtocells, and the number of public WiFi hotspots is increasing to several million. Additionally, there is a much larger number of private WiFi hotspots, which could also be utilized for data offloading.

With users increasingly sharing their lives in online social networks (OSNs) and content spreading along connected friends (so called social cascades), there is a new reason to utilize private home routers. Social awareness, i.e. the collection and exploitation of social signals, can be used to predict social cascades, i.e. the propagation of content links in OSNs, and thus specify where and by whom content will be requested. As home routers are/can be equipped with storage capacities, a socially-aware traffic management mechanism is possible which proactively sends the content to a router at which it is/will be requested.

Home router sharing based on trust (HORST) is such a mechanism which addresses three use cases: data offloading, content caching/prefetching, and content delivery. Our solution consists of a firmware for a home router, an OSN application, and a mobile device app. The firmware sets up two WiFi networks (SSIDs) - one for private usage and one for sharing. Additionally, a user-owned nano data center (UNaDa) is established on the home router. The owner of the home router uploads the WiFi access information of the shared WiFi to the OSN application. Each user can share his WiFi information to other trusted users via the app and request access to other shared WiFis. As the application knows the position of the users, it can recommend WiFis near to the users, or automatically request access and connect the users for data offloading. Social cascades will be used to predict which content will be requested by which user. As the application also knows about the current and future users of each WiFi, the UNaDa on the home router can be used to cache or prefetch delay-tolerant content which will be delivered when the user is connected to the WiFi. Finally, HORST federates all UNaDas to form an overlay content delivery network (CDN), which allows for efficient content placement and traffic management.


Figure 15: Basic HORST functionality.

4.1.1 Addressed Scenarios

The main aspect of HORST is providing a ubiquitous Internet access via WiFi to all participating users. Additionally, it is a socially-aware traffic management solution which utilizes network resources more efficiently and improves users' QoE.

Figure 16 shows the basic functionality of HORST. HORST uses personal data, friendship relations, and communication patterns from OSNs to compute trust relations which, e.g., can be based on common trusted connections, or reliability, or recent cooperative behavior. Furthermore, the OSN can provide information about the popularity of content and about the interest of specific users. The OSN may also provide geo-location information about the users, which allows for recommendation of nearby shared WiFi access points of trusted users, and prediction of content request locations. HORST uses this information a) to authorize access to WiFi for data offloading, b) to enable content caching and prefetching, and c) to efficiently manage an overlay CDN.

Thus, HORST addresses all four SmartenIT scenarios: social awareness (data from OSNs and position information are used to compute trust scores, to predict user locations, and to enable pre-fetching), global service mobility (global WiFi access is provided, and delay-tolerant data can be made available globally via NaDas), energy efficiency (data offloading to WiFi reduces load on 3G links, and idle home routers can reach a higher utilization), and inter-cloud communication (content distribution). In the following paragraphs the covered use cases within these scenarios are described in more detail.


4.1.1.1 Data Offloading

As users can access shared WiFis through HORST, they can use the WiFi network to connect to the Internet. This reduces the load on the 3G link and will eventually lead to less cost for mobile network operators. End users experience a higher bandwidth and their mobile device has lower energy consumption because of lower signal strength and faster data transmission. Additionally, HORST can guide users to the nearest WiFi and manage the access request based on the trust between them and the owners. This leverages the scheduling of both upload and download transmissions for delay-tolerant content, such that they are conducted when the user is connected to a WiFi network.

4.1.1.2 Content Caching and Prefetching

OSN and UNaDas provide information such as social cascades and access history, which can be used to calculate and predict temporal and spatial popularity of content. Based on that, it is possible to decide which content to prefetch to which local cache, and how long to keep the content in the cache for best performance. As a result, users who want to access content which is shared via OSN (social cascade) or which is requested frequently via their friends' home routers will often find that their home router already stores the wanted content. Additionally, users can indicate that they want to access content at a later time, e.g., when they have WiFi coverage or when they are back at home, and in the meantime HORST will prefetch this content to the specified UNaDa. Thus, it is possible to download content which will not be consumed immediately in off-peak hours to avoid congestion. When users request content via WiFi from their router, they can access it with much less delay and a higher bandwidth resulting in a higher QoE for almost all services. At the same time cloud operators benefit from reduced load on servers, decreased storage demand, and improved Quality-of-Service.

4.1.1.3 Content Delivery

Content which is about to become popular for end users will be requested from the original content provider and distributed via the UNaDa-induced overlay CDN. In doing so, HORST will efficiently utilize the network resources, e.g., if possible request and distribute content only in non-peak hours. If a user requests content which is not yet stored on her UNaDa, HORST will decide from which resource the content is requested. The selected resource can be, e.g., another UNaDa or the server of a cloud service provider, depending on different metrics, e.g., network (latency, congestion), cost (inter-AS traffic), cloud (data center workload), overlay (location of users and stored content), energy (offloading savings), or QoE (user satisfaction) metrics. Thus, HORST can flexibly integrate various metrics (if available) in order to perform traffic management and optimize the overlay.

4.1.2 Definition of SmartenIT Traffic Management Mechanisms

The traffic management mechanism will consist of the following components which will be described briefly: home router firmware, online social network, and mobile device application. Additionally, a decision entity is needed which decides on traffic management based on different metrics. However, the placement (i.e., centralized or decentralized) of this entity, the scope of decisions, and the concrete decision algorithms are not finalized yet, and thus, this entity is omitted here and will be covered by future work.

4.1.2.1 Home Router Firmware

Due to legal issues, a shared WiFi, which is separated from the private WiFi network, is required for home router sharing. To host a nano data center and to contribute to an


efficient CDN overlay, further requirements arise. The home routers need to push or pull content from another node in the overlay network (which includes other UNaDas as well as original content providers, e.g., a cloud service). Then, they need to be able to intercept requests from end users to directly serve cached content. Based on the load and location of the home router, content requests can also be redirected to other nodes in the overlay. Thus, load balancing and traffic management can be established, and service quality is assured.

4.1.2.2 Online Social Network Application

The major innovation of HORST is the OSN application, which provides input for all traffic management mechanisms. It allows for the utilization of the convenient and well-known user management of the OSN. Thus, to participate in HORST users simply log on with their OSN credentials and grant permissions to the application. The required permissions include access to personal data, communication data, and position data. Moreover, users have to specify information about their home router, i.e. WiFi SSID, WiFi and UNaDa access passwords, home router position, and IP address.

Furthermore, the OSN application provides a mechanism to compute trust scores. This may be an explicit rating of other users, i.e. a user indicates which other users she does or does not trust, or an implicit mechanism in which the application computes trust scores based on OSN topology, personal data, and communication data. Users could then set a rule, e.g., to automatically trust all users which have a score above a certain threshold. Moreover, also a combination of explicit and implicit mechanisms is possible, e.g., a system which recommends trustworthy users which have to be confirmed explicitly.
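As an illustration of how such an implicit score could be combined with explicit ratings, consider the following sketch; the chosen signals (mutual friends, interaction count), the weights, and the saturation constants are illustrative assumptions and not part of the HORST design.

```python
def trust_score(mutual_friends, interactions, explicit_rating=None,
                w_social=0.5, w_comm=0.3, w_explicit=0.2):
    """Combine OSN topology, communication data, and an optional explicit rating
    into a score in [0, 1]; all weights and constants are illustrative."""
    social = min(mutual_friends / 20.0, 1.0)   # saturates at 20 mutual friends
    comm = min(interactions / 50.0, 1.0)       # saturates at 50 recent interactions
    if explicit_rating is None:                # no explicit rating: renormalize weights
        return (w_social * social + w_comm * comm) / (w_social + w_comm)
    return w_social * social + w_comm * comm + w_explicit * explicit_rating

# Example rule: grant WiFi access automatically above a user-defined threshold.
print(trust_score(mutual_friends=12, interactions=30, explicit_rating=1.0) > 0.6)
```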

Users who want to get access to another WiFi have to send a request to the owner. If the owner trusts the user, she can get the WiFi credentials and access the new WiFi. While users are moving, the application can analyze their position data and recommend or automatically request access to near WiFis. An incentive mechanism which rewards users for sharing of their home router still has to be developed, e.g., a credit point system in which users gain credit points for each share, but have to pay some credit points for each request. This mechanism should also take into account users who have no router to share but also want to participate in HORST for improved QoE.

With the OSN application, information about users (interests, preferences, position) and content (popularity, social cascades) can be gathered and exploited for enhanced traffic management. First, popular content can be detected and distributed over the network of UNaDas on the home routers to minimize delay and increase reliability. Additionally, content access patterns can be taken into account, such that the content can be distributed more efficiently for network operators, e.g., during non-peak hours. Second, depending on the importance and sensitivity of the content, it is possible to share and distribute content only to UNaDas of trusted users. Finally, the same mechanisms can be used in combination with users' location data to prefetch or cache content, which is interesting for a specific user, on that home router to which she already is or soon will be connected. Here again, it would be possible that a user explicitly indicates the content she wants to consume, the time of consumption, and the home router to which it shall be transported.

To put it in a nutshell, based on the information from the OSN, HORST allows for efficient user-centric content placement which minimizes the distance between content and users, reduces loading times, and thus increases the users' QoE. Additionally, it takes into account when and where the content is accessed, which makes it possible to utilize the resources of network operators more efficiently.

4.1.2.3 Mobile Device Application

Instead of using the OSN application in a browser, a mobile device application makes the usage of HORST more natural. The mobile device automatically provides the needed data to the HORST application (e.g., position), such that the user can benefit without the need to be constantly engaged and manually upload information. Moreover, the mobile device application not only requests the WiFi credentials from the OSN application, but also stores them on the device for automatic connection to the WiFi network. It manages the handover between different interfaces (3G, WiFi) or between different access points. Finally, it includes a transmission scheduler which determines, for both upload and download, whether content is more or less delay-tolerant and whether it can be offloaded to a WiFi network.
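
One way such a scheduler could decide is sketched below; the binary delay-tolerance classification and the maximum waiting time are assumptions used only to make the idea concrete.

    # Illustrative sketch of a transmission scheduler: offload delay-tolerant
    # transfers to WiFi when possible. The time limit is an assumption.
    import time

    def schedule_transfer(item, connected_to_wifi, max_wait_s=3600):
        """item: dict with 'delay_tolerant' (bool) and 'queued_at' (epoch seconds)."""
        if connected_to_wifi:
            return "send_now_over_wifi"
        if not item["delay_tolerant"]:
            return "send_now_over_3g"          # interactive traffic cannot wait
        waited = time.time() - item["queued_at"]
        if waited > max_wait_s:
            return "send_now_over_3g"          # waited too long, give up on offloading
        return "defer_until_wifi"

    # Example: a delay-tolerant upload queued 10 minutes ago, currently on 3G
    decision = schedule_transfer({"delay_tolerant": True, "queued_at": time.time() - 600},
                                 connected_to_wifi=False)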

4.1.3 Identification of Key Influence Factors

As HORST brings benefits by applying a caching/prefetching mechanism for content, its performance is mainly influenced by content demand and popularity. Therefore, spatial, temporal, and topic-dependent characteristics of the demand should be considered when distributing the content via the UNaDas. Second, HORST's ability to save inter-AS traffic depends on the distribution of cloud service resources, UNaDas, and users over ISPs. If many actors are in the same AS, locality can be exploited by HORST. Third, the geographic distribution and movement patterns of users influence the performance of HORST. In particular, they determine where content will be requested and by which access technology, i.e., whether offloading is possible (in case the user has WiFi coverage) or not. This has a strong influence on the energy savings introduced by HORST. Finally, the upload/download bandwidth of cloud resources, UNaDas, and end user devices influences the content distribution speed and can thus influence the resulting quality perceived by the end user.

4.1.4 Key Performance Metrics

The performance of HORST can be measured by several metrics. As HORST will optimize the overlay application service, the improvement in terms of QoS parameters and QoE parameters of that overlay application must be taken into account. Moreover, HORST-specific performance metrics can be applied. As HORST is a caching approach, its cache-hit rate measures its performance in exploiting social information and predicting the content demand correctly. Next, energy savings (savings of data center resources as well as better utilization of home routers) and traffic savings (inter- and intra-AS traffic and offloading potential) have to be considered.

4.1.5 Initial Evaluation Results and Optimization Potential

Nano data centers (NaDa) are a distributed computing platform on ISP-controlled home gateways, which were first presented in [12], [13]. They can be used for content delivery and have been shown to be significantly more energy efficient than traditional data centers. Thus, NaDas can form a CDN on their own but also facilitate the deployment of applications such as peer (NaDa) assisted video on demand streaming [14]. In [15], it is described that shared WiFi routers could be utilized for pre-fetching of content, which reduces the perceived delay by up to 50%.

In [69] the end device is used as a NaDa and serves as its own cache. Both content that is globally popular and content that is of personal interest are cached overnight directly on the end device. Figure 16 shows that there is a high potential to reduce both response time and energy consumption across different access technologies.

Figure 16: Potential of caching on the end-user device for response-time and energy consumption

Social awareness is a novel approach to traffic management on the Internet. With socially-aware caching, future access to user generated content (e.g., videos) shall be predicted based on information from OSNs. Hints are generated for replica placement and/or cache replacement, which have been shown to increase cache performance. In [16], the classical approach of placing replicas based on the access history is improved. To this end, social cascades are identified in an OSN, and the locations of potential future users (i.e., OSN friends of previous users) are taken into account. In [17], standard cache replacement strategies are augmented with geo-social information from OSNs. Again, social cascades are analyzed to recognize locally popular content and keep it longer in the cache. Specialized solutions [18], [19] exist for video streaming, which exploit social relationships, interest similarity, and access patterns for efficient pre-fetching to improve users' QoE.

4.1.6 Mapping of Mechanism to SmartenIT Architecture

The components of HORST can be seen in Figure 60. HORST works mainly on the end user level where it employs an overlay management and a social monitor. The cloud is considered as a (fallback) content source and thus also part of HORST. The HORST mechanism basically relies on social awareness and traffic management, but an improved decision algorithm taking into account more metrics is possible.

4.1.7 Example Instantiation of Mechanism

The HORST mechanism eases data offloading to WiFi by sharing WiFi networks among trusted friends. Moreover, it places the content near to the end user such that users can access it with less delay and higher speed, which generally results in a higher Quality-of-Experience. In order to participate, a user needs a flat rate Internet access at home, has to install the HORST firmware on his home router, and needs to install an application on his mobile device.

The HORST firmware establishes two separate WiFis (a private and a shared WiFi) and manages the local storage of the home router as a cache. It forms an overlay with HORST systems on other routers to exchange overlay information (such as home router location, cache content, prefetch commands) and conducts (active or passive) traffic measurements between the end points. The application on the mobile device of a user sends social information (location, activity patterns, and interests) to the HORST system on his own router. Thus, private data of the user stays on his own devices. Additionally, the HORST router has a social monitor component to collect social information from an online social network about the router's owner and his trusted friends. If a user approaches the home router of a trusted friend, he is provided with access data via the mobile application to connect to the shared WiFi. After the user has connected, the friend's home router sends a notification to the user's own router.

Every HORST system predicts the content consumption (i.e., when and where which content will be requested) of its owner based on location, activity patterns, interests, and information from the online social network such as content popularity and spreading. If a predicted content item is not yet available in the local cache, it will be prefetched. If the user is connected to a friend's home router, a prefetch command is sent to the HORST system on the friend's router. For prefetching as well as for actual requests which cannot be served locally, HORST chooses the best source (either another home router or a cloud source) based on overlay information and traffic measurements, and fetches the desired content. At regular intervals, HORST checks whether the content in its own local cache is still relevant (either for local consumption or as a source for content delivery) and decides whether to keep or replace it.
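
The periodic relevance check could, for example, be realized along the lines of the following sketch; the relevance score, its inputs, and the eviction threshold are purely illustrative assumptions.

    # Illustrative sketch: periodic check whether cached items are still relevant.
    # The relevance score and the eviction rule are assumptions.
    def relevance(item, now):
        """item: dict with 'predicted_local_demand', 'remote_request_rate', 'stored_at'."""
        age_days = (now - item["stored_at"]) / 86400.0
        freshness = max(0.0, 1.0 - age_days / 7.0)     # relevance decays over one week
        return freshness * (item["predicted_local_demand"] + item["remote_request_rate"])

    def revise_cache(cache, now, keep_threshold=0.1):
        keep, evict = [], []
        for item in cache:
            (keep if relevance(item, now) >= keep_threshold else evict).append(item)
        return keep, evict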

As the development phase is ongoing, the basic usage of HORST might be altered in some of the described aspects. Moreover, it shall be noted that the HORST system leaves room for improvement in several ways, e.g., by performing explicit traffic management between the home routers, by taking into account external network information, by introducing an incentive system for sharing resources (such that users without a home router or a flat rate could also participate), or by collaborating with cloud services, ISPs, or OSNs.

4.2 Socially-aware TM for Efficient Content Delivery

There is plenty of work in the literature examining the distribution of content published on OSNs, either popular or long-tailed, as well as the exploitation of information by the OSNs to accommodate the dissemination of this content. Two recent approaches, i.e., SocialTube [76] and WebCloud [124], which have been overviewed in Deliverable D2.1 [109], have comprised the basis for the development of an innovative TM mechanism that sufficiently addresses the requirements set by SmartenIT (also described in [109]).

Therefore, in this section, we propose a traffic management mechanism, called Socially-aware mechanism for Efficient Content Delivery (SECD), in order to efficiently deliver videos published on OSN websites. First, we describe the use-cases of video viewing, which are addressed by the proposed socially-aware mechanism, and we proceed with the specification of the proposed mechanism and its components. Then, we identify key influence factors and key performance metrics for our mechanism and its evaluation, while finally, we provide evaluation results of the proposed mechanism and compare it to an existing approach in the literature.

4.2.1 Addressed Scenarios

The proposed socially-aware mechanism for efficient content delivery addresses two use- cases, which can be categorized under the more generic scenario of the exploitation of social information for efficient content delivery described in Deliverable D1.1 [1].

We take Facebook (as the most popular OSN) and videos published on Facebook as a case study for our approach. Video viewing is an increasingly popular application on Facebook. Most users upload videos to their profiles; the videos uploaded are hosted either by a Facebook video server or by an external video platform like YouTube.

4.2.1.1 Video viewing case 1

In the first use-case, we consider that a user shares a video on his profile on Facebook which is uploaded to and hosted on a Facebook video server; this is the case for 14% of the videos shared on Facebook [31]. Then, users can view the video directly from the Facebook video server, as depicted in Figure 17.

Figure 17: Video hosted on Facebook video server.

4.2.1.2 Video viewing case 2

In the second use-case, we assume that a user copies a link to a video from an external site, e.g., pointing to a YouTube server, and posts this link on his wall in Facebook; this is the case for up to 80% of videos [31]. Users can then view that video by clicking on this link and being redirected to the external server that hosts the video. Finally, the viewer will download and watch the video from that server, as depicted in Figure 18.

Figure 18: Video hosted on YouTube video server.

4.2.2 Definition of SmartenIT Traffic Management Mechanism

The SECD mechanism exploits social relationships, interest similarities with respect to content, and the locality of content exchange in the OSN in order to enhance the content delivery that takes place on top of OSNs. The proposed mechanism has been initially designed for enabling efficient video dissemination; nevertheless, it provides the capability to efficiently handle any type of content shared in OSNs.

The basic constituent elements of our mechanism include:

a. a socially-aware messaging overlay for alerting potential viewers of a video, so that they can request its prefix (first chunk).

b. a Social Proxy Server (SPS) located in every local region (e.g., Autonomous System – AS) in order to enable social awareness and achieve traffic localization. Each user of the OSN is considered to be connected to the SPS of his AS (region).

c. a content-based P2P overlay to perform video dissemination, both at intra- and inter-AS level if peering links between ISPs exist.

d. a two-level caching strategy employed both in the caches of the users and in the cache of the SPS.

Moreover, the successful operation of the SECD mechanism is heavily based on the functionality of the following algorithms:

i. the socially-aware messaging overlay construction algorithm: the algorithm creates clusters of nodes (users) who are potential viewers of an uploader, to which messages will later be disseminated by a pull-based prefetching algorithm (a sketch of this step is given after this list). This algorithm runs as soon as a video is published/posted (a future adjustment is to run it periodically), and its purpose is to determine which users are potential viewers of the uploader for each of the categories of his interest. The algorithm can run either in the end-users' clients, e.g., as browser add-ons, or in the local SPS.

ii. the socially-aware pull-based prefetching algorithm: the algorithm uses the messaging overlay to push an alert message and triggers the prefetching for the nodes (users) which received the alert message. This algorithm runs whenever a user uploads a video, and pushes the alert message to all users in the cluster corresponding to the video's category of interest. Part of the algorithm runs on the users' clients and the other part on the local SPS. The prefetching algorithm is pull-based because users are considered to request the prefix from their local SPS, which then pushes it to them; we follow this strategy in order to avoid multiple downloads of the same video prefix within the same AS.

iii. the content-based local P2P overlay construction algorithm: the algorithm is activated when a “watch” activity for a video occurs. The algorithm creates a local P2P overlay, e.g., a BitTorrent-like overlay, with nodes (users) which have already viewed and stored the video and assist the SPS in video sharing (seeders), and with users watching the video at this moment (leechers). If a swarm already exists for a specific video, the algorithm just updates the set of users that constitute the swarm. The algorithm runs in each local SPS.
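
The following sketch illustrates, under simplifying assumptions, the kind of cluster construction described in item i: followers are added unconditionally, while non-followers are added only if the video category is among their interests. The data structures (friend records with an 'is_follower' flag and an 'interests' set) are illustrative and not part of the SECD specification.

    # Illustrative sketch of the messaging-overlay cluster construction (item i).
    # The data structures and the follower classification are assumptions.
    def build_clusters(uploader_id, friends, interest_categories):
        """friends: list of dicts with 'id', 'is_follower' (bool), 'interests' (set).
        Returns one cluster (set of user ids) per interest category of the uploader."""
        clusters = {cat: set() for cat in interest_categories}
        for cat in interest_categories:
            for f in friends:
                if f["is_follower"] or cat in f["interests"]:
                    clusters[cat].add(f["id"])
        return clusters

    # Example: two friends, uploader interested in "sports" and "music"
    friends = [{"id": "u1", "is_follower": True,  "interests": {"news"}},
               {"id": "u2", "is_follower": False, "interests": {"sports"}}]
    clusters = build_clusters("uploader", friends, ["sports", "music"])
    # clusters == {"sports": {"u1", "u2"}, "music": {"u1"}}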

The objective of our mechanism is to improve the QoE of OSN users in terms of decreased latency, i.e. time to start watching the video, as well as to reduce inter-AS traffic and therefore, reduce transit inter-connection costs for the ISP.

4.2.3 Identification of Key Influence Factors

In this section, we summarize the set of parameters which are considered significant in the context of the proposed mechanism and which have also been used in our evaluation studies; note that a detailed description of this set of parameters is provided in Section 3.2.1.1.

As described in Subsection 3.2.1.1.1, in our evaluation time is divided into slots of 20 minutes, in order to be consistent with the fact that a user is active on Facebook for about 20 minutes per day on average. Additionally, we take the average video length to be 4 minutes [24]; thus, we assume that a user may watch from 1 to 5 videos in this 20-minute interval, where the number of videos watched follows a uniform distribution. As expected, videos of top interest for users, as well as videos with the highest popularity, are more likely to be watched.

Regarding the users' activity, we assume that only 50% of the OSN users are active daily. We randomly choose the users that will be active on each given day, but users with more friends have a higher probability of being active. If a user is active on Facebook on a given day, he is considered to be active only for the duration of a selected 20-minute slot. Furthermore, we assume that each user is active in the Internet (in general) for 140 minutes (i.e., seven 20-minute slots). We assume that these seven timeslots are contiguous, and thus when a user logs in to Facebook, he does so in the middle of any of these timeslots. Regardless of a user's activity on Facebook within a specific day, that user can seed content which he has stored while active on Facebook in previous timeslots.

Concerning users' interests in video categories, we have assigned 4 video interest categories to each user, and each user is considered to share and watch videos only out of these 4 categories. To decide which 4 categories a user is interested in, we used a weighted random choice over the 19 total interest categories.

Additionally, we considered a distribution of users across ASes; specifically, we assumed that each user is located in one specific AS by assigning an AS id to each user. Then, in order to distribute the OSN users among the ASes, we used the Zipf distribution.
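
A compact sketch of this part of the evaluation setup (AS assignment and interest selection) is given below; the Zipf exponent and the category weights are placeholders for the concrete values of Section 3.2.1.1 and Table 2, and the data structures are illustrative.

    # Illustrative sketch of the user setup: Zipf-distributed AS assignment and
    # weighted random choice of 4 out of 19 interest categories per user.
    # Exponent and weights are placeholders for Section 3.2.1.1 / Table 2 values.
    import random

    def zipf_weights(n_ases, exponent=1.0):
        w = [1.0 / (rank ** exponent) for rank in range(1, n_ases + 1)]
        total = sum(w)
        return [x / total for x in w]

    def setup_users(n_users, n_ases, category_weights, n_interests=4):
        as_probs = zipf_weights(n_ases)
        categories = list(range(len(category_weights)))
        users = []
        for uid in range(n_users):
            as_id = random.choices(range(n_ases), weights=as_probs, k=1)[0]
            interests = set()
            while len(interests) < n_interests:   # weighted choice without repetition
                interests.add(random.choices(categories, weights=category_weights, k=1)[0])
            users.append({"id": uid, "as": as_id, "interests": interests})
        return users

    # Example: 1000 users, 10 ASes, 19 equally weighted categories (placeholder weights)
    users = setup_users(1000, 10, [1.0] * 19)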

Depending on the users' viewing behavior, we categorized them into followers, non-followers, and other viewers. Furthermore, taking into account the social graph and the number of hops between a user (uploader) and one of his viewers, we extended this categorization to followers, non-followers, and other viewers that are 1 or 2 (social-graph) hops away from the uploader of a video.

Regarding the popularity of videos (potentially) disseminated within the OSN, we created a pool of videos in order to simulate a video platform like YouTube, and assigned to each of them a popularity following the power-law distribution and an interest category by weighted random choice, using as weights the percentages of each interest category as they appear in Table 2.

Moreover, we assume that the number of videos uploaded daily in our system is equal to 1/20 of the total number of users in our system. On each day we decide which users will upload/share videos, where the probability that a user becomes an uploader is modeled by a Bernoulli distribution. Additionally, each user can upload none, one, or more videos watched from the video server of a third party, or re-share a video that he has already watched from a friend's wall, but only within the 20-minute slot in which he is active on Facebook.

Finally, concerning the number of prefixes that can be uploaded by a user, we assume that each user is able to push only one video prefix through his messaging overlays on any given day. We make this assumption because a user rarely uploads more than one video per day, so there is no point in trying to push more video prefixes. In the case where a user uploads more than one video, say two, he is considered to push only one video prefix within that day, while he pushes the video prefix of the remaining un-pushed video on the next day.

4.2.4 Key Performance Metrics

In order to evaluate the SECD mechanism and to be able to compare it with other approaches in literature, such as SocialTube proposed in [76], we define a set of performance metrics of interest. Specifically, we consider and monitor during our simulations the following metrics:

Inter/Intra AS traffic: traffic generated by video dissemination (including prefetching) both in the intra-AS and inter-AS links.

Contribution of server hosting the video: the percentage of traffic handled by the external server, e.g., YouTube or Facebook server, where the video is hosted.

Caching accuracy of SPS: the percentage of video prefixes or videos, which had already been stored in the cache of the SPS, when a user requested it.

Accuracy of prefetching: the percentage of video prefixes stored in the cache when a user requested to watch the corresponding video.

Useless prefetching: the number of video prefixes pushed to and never used by the users who received them.

Redundant prefetching: redundant prefetching occurs when the same prefix is pushed to a user from multiple sources, i.e., two or more of his friends.
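
To make the metric definitions above more concrete, the following sketch shows how prefetching accuracy, useless prefetching, and redundant prefetching could be derived from simple simulation logs; the log format (lists of push and watch events) is an assumption, not the format used by our simulation framework.

    # Illustrative sketch: deriving prefetching metrics from simulation logs.
    # The log format is an assumption.
    def prefetch_metrics(pushed, watched):
        """pushed: list of (user, video) prefix pushes (may contain duplicates);
        watched: set of (user, video) pairs actually watched."""
        pushed_pairs = set(pushed)
        hits = sum(1 for pair in watched if pair in pushed_pairs)
        accuracy = 100.0 * hits / len(watched) if watched else 0.0
        useless = len({p for p in pushed_pairs if p not in watched})
        redundant = len(pushed) - len(pushed_pairs)  # same prefix pushed to a user twice+
        return accuracy, useless, redundant

    # Example: 3 pushes (one duplicate), 2 watches, 1 of them prefetched
    acc, useless, redundant = prefetch_metrics(
        pushed=[("u1", "v1"), ("u1", "v1"), ("u2", "v2")],
        watched={("u1", "v1"), ("u3", "v3")})
    # acc == 50.0, useless == 1, redundant == 1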

4.2.5 Initial Evaluation Results and Optimization Potential

In this section, we provide preliminary evaluation results obtained by means of the simulation framework described in Subsection 3.2.1.1.

First, we evaluated the SECD mechanism w.r.t. the prefetching accuracy. In particular, we observed that both the proposed mechanism and SocialTube [76] achieve a high overall prefetching accuracy of around 88%. This is due to the fact that both mechanisms follow the same philosophy in constructing the clusters of users to which a prefix will be pushed. Both mechanisms add to the clusters of a user his followers and those of his non-followers that have the corresponding interest. In other words, both mechanisms follow the same approach in prefetching. Their difference is that in our mechanism the users send an alert message through the messaging overlay and the prefix is then pushed by the local SPS, while in SocialTube the prefix is pushed from the users directly to the viewers in their clusters.

In Figure 19, the prefetching accuracy of SECD for an increasing number of watched videos is depicted. We observe that as the number of videos watched by a user increases, our mechanism achieves an improved prefetching accuracy. Next, Figure 20 depicts the prefetching accuracy of SECD as the number of video prefixes pushed to a user increases. Again, the results indicate that as the number of pushed prefixes increases, our mechanism achieves a higher improvement.

Figure 19: Prefetching accuracy vs. number of watched videos.

Figure 20: Prefetching accuracy vs. number of pre-fetched videos.

One of the main targets of our mechanism is to achieve a reduction of the inter-AS traffic generated by the dissemination of videos and prefixes. The results in Figure 21 and Figure 22 show that SECD indeed achieves a significant reduction of inter-AS traffic. As shown in Figure 21, if we assume that prefetching in the current OSN video sharing system architecture, i.e., following a client-server architecture, generates 100% inter-AS traffic, then SocialTube is found to generate 69% inter-AS traffic, while SECD only 18%; namely, an 82% reduction of inter-AS traffic is achieved. This high reduction of inter-AS traffic achieved by SECD is due to the fact that the prefix of each video is downloaded only once per AS and is then cached by the local SPS. Thus, SECD minimizes the redundant traffic due to multiple downloads of the same prefixes on inter-AS links.

Figure 21: Inter-AS traffic generated due to prefetching.

D2.2 – Definitions of Traffic Management Mechanisms

Public

Seventh Framework STREP No. 317846

Moreover, in Figure 22, we can see that SECD achieves a high reduction of the total inter-AS traffic (i.e., traffic created by a "watch" activity of a user on a video) generated by the video dissemination. If we consider again the client-server architecture, where a user downloads the complete video from the video server on which it is hosted, we assume that the inter-AS traffic created is 100%. For the SocialTube mechanism, the inter-AS traffic created is 90% of the total traffic, while for SECD the inter-AS traffic created accounts for 18% of all traffic. Figure 23 illustrates the inter-AS traffic generated by each mechanism within a full day; as expected, traffic is higher under all approaches when users' activity is higher (see Section 4.2.3).

Figure 22: Total inter-AS traffic generated.

[Figure 23 plots the inter-AS traffic in MB per 20-minute interval over one day for the client-server architecture, SocialTube, and the proposed mechanism.]

Figure 23: Total inter-AS traffic during one simulation day.

Thus, concerning the preliminary evaluation of the proposed mechanism, we summarize below major results obtained by means of simulations. Specifically, the socially-aware traffic management mechanism:

Improves the QoE of OSN users by achieving a high overall prefetching accuracy of ~88%.

Achieves high reduction of the inter-domain traffic by keeping (to high extent) the video dissemination locally within the boundaries of the AS that deploys it, and thus, may decrease the potentially high transit inter-connection costs.

Next steps in this evaluation include the measurement of the server contribution, as well as the reduction of redundant traffic due to pre-fetching.

4.2.6 Mapping of Mechanism to SmartenIT Architecture

The SECD TM mechanism addresses mainly the Network and End User layers, and partly the Cloud layer. SECD aims to enhance content delivery by improving end-users' QoE and reducing traffic redundancy on the links of the network layer. SECD will employ the Overlay Management component to handle the local P2P overlays, the Social Monitor to capture the social activities of the users of the OSN, and the Social Awareness component to exploit useful information from the data derived by the Social Monitor. Moreover, the QoE Monitor is needed to identify poor end-users' QoE, so as to enable SPS participation in the video dissemination. Additionally, in the network layer, Traffic Monitoring is employed on intra- and (most importantly) inter-domain links, as well as the Incentives & Economics component to estimate inter-connection costs. The mapping of the SECD mechanism to the SmartenIT architecture is depicted in Figure 61.

4.2.7 Example Instantiation of Mechanism

In order to provide an instantiation of the SECD mechanism, we consider video viewing case 2 (see Subsection 4.2.1.2). The SECD mechanism involves the construction of a socially-aware messaging overlay per user and per interest category; specifically, each cluster consists of this user's 1-hop and 2-hop friends, i.e., all of his followers or those non-followers with a common interest (as defined in Section 4.2.1).

Then, the socially-aware pull-based prefetching algorithm is performed; according to it, whenever a user wants to upload or share a video, he pushes an alert message to the respective cluster, which contains the link of the video that is about to be published. Then, any user who receives this alert message requests to prefetch the first prefix of the video from the Social Proxy Server (SPS) of his region, i.e., his NSP. When the local SPS receives a prefix request, it downloads the prefix of the video from the third-party video server where the video is hosted and pushes the prefix to the users who have asked to prefetch it, while the latter store the video prefix in their local cache.
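
A minimal sketch of this SPS behaviour on a prefix request is given below, assuming a simple in-memory cache and a generic fetch interface; it only illustrates the idea that each prefix is downloaded from the third-party server once per AS and then served locally, and is not the actual SECD implementation.

    # Illustrative sketch of SPS prefix handling: download once per AS, then serve
    # all local requesters from the SPS cache. Interfaces are assumptions.
    class SocialProxyServer:
        def __init__(self, origin_fetch):
            self.cache = {}                   # video_id -> prefix bytes
            self.origin_fetch = origin_fetch  # callable: video_id -> prefix bytes

        def request_prefix(self, video_id, requesters):
            """Serve a prefix request from 'requesters' (user ids in this AS)."""
            if video_id not in self.cache:
                # First request in this AS: fetch from the third-party video server.
                self.cache[video_id] = self.origin_fetch(video_id)
            prefix = self.cache[video_id]
            return {user: prefix for user in requesters}   # push to each requester

    # Example with a dummy origin server
    sps = SocialProxyServer(origin_fetch=lambda vid: b"prefix-of-" + vid.encode())
    sps.request_prefix("video42", ["userA", "userB"])   # one origin fetch, two pushes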

As a final step, a local content-based P2P overlay is created in order to assist the video delivery among the users that request to watch it. The SPS of each NSP operates both as a P2P tracker and a cache, while for each video a local P2P overlay is constructed with:

Users (seeders) who have already viewed and stored the video in order to assist the SPS in video sharing, and

Users (leechers) who are watching the video in present time.

Note that if the available total upload bandwidth per user exceeds a specific threshold, e.g., the bit rate of the video that is served, then the SPS stops acting as a cache and the video is delivered only by other users in the overlay. On the other hand, if the total upload bandwidth per user is rather low, the SPS serves most of the chunk requests of the leechers.

4.3 Mechanism for Inter-Cloud Communication

In this section, we propose a TM mechanism to address Inter-Cloud Communication (ICC). Specifically, we consider communication among multiple cloud operators, the data centers of each of which are in general placed in geographically distributed locations served by multiple different Network Service Providers (NSPs), each of which consists of one or more Autonomous Systems (ASes).

4.3.1 Addressed Scenarios

The ICC mechanism addresses mainly the Inter-Cloud Communication scenario and the Collaboration for Energy Efficiency scenario, while some use-cases also fall under the Global Service Mobility scenario, as identified in [1]. Currently, after the consolidation of the four initial scenarios identified in [1] into the two new scenarios described in [108], we consider that the proposed mechanism sufficiently addresses most aspects of the first one. Below we provide two indicative concrete use cases:

A. Use Case 1: Bulk data transfer service for cloud operators

We consider the case of N clouds and N network operators, each of which provides connectivity to one specific cloud operator. Therefore, the traffic of each cloud operator is handled by its home NSP from which IP connectivity was purchased. The home NSP is typically a Tier-2 or Tier-3 NSP; thus, in order to deliver global Internet connectivity, it relies on purchasing transit from a Tier-1 NSP. Therefore, the inter-domain traffic of the clouds is delivered to Tier-1 NSP(s) through transit links; the respective traffic is typically charged under the 95th-percentile rule.
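
Since 95th-percentile charging is central to this use case, the sketch below shows one common way to compute it over 5-minute traffic-rate samples: the top 5% of samples are discarded and the highest remaining rate is billed. The exact percentile convention varies between operators, and the per-Mbps price used here is a placeholder.

    # Illustrative sketch of 95th-percentile transit charging over 5-minute samples.
    # The percentile convention is one common variant; the price is a placeholder.
    def percentile_95(samples_mbps):
        """samples_mbps: traffic rate samples (e.g., one per 5-minute interval)."""
        ordered = sorted(samples_mbps)
        index = int(0.95 * len(ordered)) - 1     # discard (roughly) the top 5% of samples
        return ordered[max(index, 0)]

    def monthly_transit_charge(samples_mbps, price_per_mbps=5.0):
        return percentile_95(samples_mbps) * price_per_mbps

    # Example: a month of samples where a few peaks are not billed
    samples = [100] * 8000 + [900] * 64          # 64 peak samples (< 5% of all samples)
    charge = monthly_transit_charge(samples)     # billed at 100 Mbps, not at 900 Mbps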

In this case, we assume that bulk data of the cloud (customers), e.g., static content of a Content Provider like CNN web-site or online personal storage of end-users like Dropbox, are periodically (e.g., every 24 hours) replicated to a backup facility, i.e. another cluster located in a different physical location, in order to increase redundancy and security. In the case of Dropbox, we also consider personal data replication to a secondary DC to meet demand by, e.g., users with high mobility; this is another case following the GSM scenario described in [1]. An alternative instantiation of this use case would be that the periodic massive bulk data transfers are performed due to the need (and respective agreement) for sharing data, for business or scientific purposes. For instance, astronomic observation data could be exchanged periodically among space agencies clouds so as to collaborate on scientific projects on deep space exploration.

The service model for this use case is push-based, while a traffic management mechanism would be necessary in order to perform destination selection, i.e., in which cloud (data center) to replicate the data, and scheduling, i.e., when to perform the replication, possibly taking into account both cost metrics, e.g., the cost of interconnection between NSPs and the cost of energy consumption by the data centers, and QoS metrics, e.g., transfer rate, latency, etc.

For simplicity reasons and without loss of generality, we assume that for each cloud involved in the bulk data transfer service there is a single known Point of Interconnect where the respective traffic either originates from or must be delivered to.

B. Use case 2: Federation of small/medium-sized clouds

This use case is an extension of the previous one: in particular, we consider N clouds, each of which purchases connectivity from a different Tier 2/3 NSP. The N clouds are considered to be small/medium-size and geographically dispersed; therefore, we assume that they have established a federation in order to expand their footprint, i.e. offer their (complementary) services to remote customers, to meet demand through economies of scale, i.e. “renting” on-demand resources of others in the federation, and thus, to be able to compete against large clouds. In particular, we assume that service provision is performed over the federation of clouds, where the resources of the individual clouds are combined according to business policy rules so as to create a large virtual pool of resources, at multiple network locations to the mutual benefit of all participants (see Figure 24).

These business policy rules will inevitably be a product of negotiation among the interested parties, while depending on them different types of federation may exist [30], e.g., i) only technical standardization, ii) technical standardization and information sharing, or iii) alliance for joint business. In this use case, we assume that the cloud federation belongs to the third category; thus, the federation will allow the creation of large “virtual” clouds that can efficiently provision their services over large geographical regions and across multiple networks. Such a cloud federation is in accordance with the idea of collaboration among CDNs, the so-called CDNi approach [87].

Figure 24: Cloud federation.

Specifically, and without loss of generality, we consider here the case of personal online storage (such as Dropbox or Zettabox) being offered over the federation. We consider a cloud provider F in France and two cloud providers I1 and I2 in Italy. If they all belong to the federation, then the online storage service initially provided by F can now be served in Italy by either I1 or I2; in other words, the customers of F in Italy can store their content in the data center of either I1 or I2. Then, F has to make a decision on which of them will serve a customer's request. This decision can be made either jointly with optimal resource and/or path selection for the content, or separately.

Additionally, both bulk data transfers (e.g., for fault-tolerance) and QoE enhancement services as discussed in the previous subsections can be considered also in the current setup, as long as the respective incentives of the individual cloud operators are aligned.

Regarding the network layer, similarly to the two previous use cases, the traffic generated by the inter-cloud communication is handled by their home NSPs by means of pure IP TE and is delivered to upper tier NSP(s) through transit links, where traffic is typically charged according to the 95-th percentile rule.

4.3.2 Definition of SmartenIT Traffic Management Mechanisms

We propose a TM mechanism for Inter-Cloud Communication (ICC) to address the aforementioned cases of inter-cloud communication, as described in Section 4.3.1. Nevertheless, the ICC mechanism can be extended to address dynamic services, e.g., VoD services or SaaS applications offered to end-users; in such a case, QoS requirements (e.g., in terms of latency) would be stricter. The ICC mechanism involves two layers: the network layer and the cloud layer. The constituent elements of the mechanism include:

SmartenIT Information Service (SmaS): SmaS is provided by a centralized server in the network layer, and is responsible for characterizing a set of destinations for a specific amount of data to be replicated within a specific time interval. Specifically, SmaS receives as input from the cloud layer: i) the amount of data to be transferred in MBs, ii) the priority level for this amount of data declared as an integer which is mapped to a certain QoS level in terms of duration in seconds/minutes, and iii) the set of candidate destinations (i.e. IP addresses of data centers of the same or different cloud operator).

Periodically, SmaS gathers values of several cost metrics related to the underlying IP network such as latency (i.e., seconds), congestion (i.e., number of timeouts or packets dropped), available bandwidth, number of hops, geographical distance and BGP information as a proxy for cost related to interconnection agreements (e.g., transit or peering). Then, the cost characterization of the network path leading to each specific destination is performed by calculating the minimum "weight" or cost (corresponding to network cost) P_d for each considered destination d, based on BGP hop count and the aforementioned criteria that are related to network load statistics at a given time. It is worth mentioning that the aforementioned cost metrics need to be computed over multiple (source, destination) pairs corresponding to paths of ICC and time epochs. The values of the cost metrics are greatly affected by the volatility of the network conditions compared to the time duration of the ICC data transfers. This volatility is known to be high enough to prevent QoS extensions of BGP such as qBGP from working properly in terms of information accuracy and scalability; hence, even for small N the SmaS service is important in order to gather accurate information that is required by the mechanism in order to make educated decisions and optimize resource consumption over cost minimization. SmaS is consistent with the ALTO approach [61] and the FI design principles [92].

Cloud Scheduler (CloS): CloS is a centralized service running in the cloud layer and is responsible for making decision of where to allocate data, i.e. to which cloud(s) to send data, e.g., either for fault-tolerance, or for QoE enhancement. In the case where no federation is established, a CloS instance runs in each cloud, else the federated clouds are assumed to trust a third-party entity running CloS, which is responsible for scheduling data exchanges within the federation. In the latter case, in each cloud operator a Cloud Information Service (CloI) is installed. The role of CloI is to feed CloS with the necessary cloud-related information as described above.

CloS provides as input to SmaS a list of destinations and receives back a list where a weight (representing network cost) is assigned to each one of them. CloS makes the final decision d* for a specific traffic flow of a given cloud operator based on the total cost C_d that characterizes each potential destination, where C_d includes i) the network cost P_d, i.e., the weight provided by SmaS, and ii) the cloud cost L_d associated to each destination d by the scheduler. Ultimately, the CloS may be aiming to achieve either an optimum for that specific flow:

d* = arg min_d C_d(P_d, L_d),

or an optimum for a broader set of flows, or even for the entire cloud federation, considering at the same time the other flows as well and the impact of each decision on them.

Note that CloS treats the weights provided by SmaS as network cost. Nonetheless, SmaS reveals only relative values (this is also why they are called "weights") and not actual network costs, as the latter might result in revealing critical information to competitors and outsiders.
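
Combining the two elements above, the following sketch shows how SmaS-style relative weights could be derived from normalized network metrics and how CloS could then apply the decision rule d* = arg min_d C_d(P_d, L_d). The metric set, the normalization, and the additive combination C_d = P_d + L_d are assumptions made only for illustration.

    # Illustrative sketch: SmaS-style weight computation and CloS destination choice.
    # Metric set, normalization, and the additive total cost are assumptions.
    def smas_weights(destinations):
        """destinations: dict dest -> dict of normalized network metrics in [0, 1].
        Returns dict dest -> relative weight P_d (only relative values are exposed)."""
        def raw(m):
            return (m["latency"] + m["congestion"] + m["bgp_hops"]
                    + m["interconnection_cost"])
        raws = {d: raw(m) for d, m in destinations.items()}
        worst = max(raws.values()) or 1.0
        return {d: r / worst for d, r in raws.items()}   # relative weights, not real cost

    def clos_select(weights, cloud_costs):
        """Decision rule d* = arg min_d C_d(P_d, L_d) with C_d = P_d + L_d (assumed)."""
        return min(weights, key=lambda d: weights[d] + cloud_costs[d])

    # Example: two candidate data centers
    nets = {"dc_italy_1": {"latency": 0.2, "congestion": 0.1, "bgp_hops": 0.3,
                           "interconnection_cost": 0.4},
            "dc_italy_2": {"latency": 0.5, "congestion": 0.4, "bgp_hops": 0.3,
                           "interconnection_cost": 0.2}}
    p = smas_weights(nets)                       # network weights P_d
    l = {"dc_italy_1": 0.6, "dc_italy_2": 0.1}   # cloud costs L_d (e.g., DC load, energy)
    d_star = clos_select(p, l)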

Moreover, the successful operation of the proposed mechanism also relies on the following supportive or complementary functionality:

o Inter-SmaS communication protocol: This protocol supports the exchange (or exposure) of information between the SmaSs of the NSPs involved in the inter-cloud communication. This communication protocol can follow and extend the inter-ALTO IETF approach [29].

o (Cross-layer) CloS-to-SmaS communication protocol: This protocol is required for the communication of the network layer (SmaS) and the cloud layer (CloS).

o CloS-to-Federated Cloud communication protocol: This protocol facilitates the periodic feed of CloS by CloIs with overlay information necessary to make its scheduling decisions.