School of Computer Science Master’s dissertation

Integration of software components in an heterogeneous distributed environment

Author: Edouard Swiac November 10, 2011

Contents

Introduction

I Distributed computing
1 Foundations
1.1 Why?
1.2 Design and challenges
1.2.1 Heterogeneity
1.2.2 Openness
1.2.3 Security
1.2.4 Scalability
1.2.5 Failure handling
1.2.6 Concurrency
1.2.7 Transparency
1.2.8 Quality of service
1.3 Fallacies
2 Trends and future
2.1 The rise of cloud computing
2.2 How Google uses distributed systems

II 2600hz, an heterogeneous distributed platform
3 Presentation of the solution
3.1 Whistle, the cornerstone of cloud-enabled VoIP
3.1.1 Applications
3.2 2600hz, a powerful distributed communication stack
3.3 Solving vertical scaling limitations with distributed systems
4 Programming distributed systems
4.1 Erlang, concurrent programming for distributed applications
4.1.1 Open Telecom Platform
4.1.2 Strengths
4.1.3 Weaknesses
4.1.4 Erlang in industry-level communication systems
5 Architecture and integration
5.1 Functional overview
5.2 Messaging
5.3 Redundancy and fault tolerance
5.4 Distribution of data
5.4.1 CAP Theorem
5.4.2 Eventual consistency
5.5 Unlimited concurrency
5.6 Directed events
5.7 Schema flexibility
5.8 Strong supervision
5.9 Speed for adding features
5.10 Fast server provisioning
5.11 Avoid downtimes
6 Functional and technical specification of a module
6.1 What is hot desking?
6.2 Integration in 2600hz

III Actors of today’s cloud-enabled VoIP
7 2600hz, a disruptive startup company
7.1 Presentation

Conclusions

IV Personal insight
7.2 San Francisco, startups and entrepreneurship
7.3 Erlang and distributed systems, from theory to practice

List of Figures
1.1 Constant, linear, logarithmic and quadratic running times
3.1 Whistle components
3.2 Overview of 2600hz
4.1 Shared-memory and message-passing models
4.2 Erlang scalability on multi-core CPUs
5.1 Functional view of 2600hz
5.2 Data partitioning in BigCouch
5.3 Visual guide to NoSQL systems
5.4 AMQP Model
5.5 Directed Events using AMQP
5.6 Supervision of Erlang processes
5.7 Hot code swapping
6.1 Hot desk functional specification: Login
6.2 Hot desk functional specification: Logout
6.3 Hot desk call flow (simplified)
7.1 Startup building process

Abstract

Most Voice over IP service providers create home-grown tools to manage their systems. As their needs and customer demands grow, their toolset falls behind and becomes a pain point. A system built to grow with millions of users in mind must be designed to be scalable from the start, and that scalability can be achieved by relying on a distributed architecture. Moreover, distributed systems benefit from the cloud: automatic scaling with on-demand servers and distribution of service across the globe.

Distributed systems are hard to build, because they demand close attention to several principles that ensure communication between heterogeneous components, scalability and fault tolerance. The architecture and integration of components are very important, and engineers will benefit from reusing technologies across components, for example by writing different components in the same programming language or by using software built on open standards.

Building such a system frequently requires different components, and each must be chosen carefully so that it does not become a bottleneck. Inspecting each component from a distributed perspective helps to determine whether the component is appropriate, or whether it has weaknesses that could threaten the system in the future.


Acknowledgments
I joined 2600hz, a software startup, as a software engineer, thanks to CEO Darren Schreiber and COO Patrick Sullivan, founders of the company. I would like to thank my two mentors, James Aimonetti and Karl Anderson, Senior software engineers at 2600hz for their support, teaching, team spirit and vision. Thanks to all my peers at 2600hz, it was a wonderful experience!

Introduction
To overcome the limitations of client-server architecture (scalability, performance), we may look at distributed architecture. This type of architecture may help organizations to solve scalability problems by using horizontal scaling instead of vertical scaling. But distributed systems have to be designed carefully and thoroughly, and so do the criteria for choosing their components: distribution capacities, scaling factor, level of integration.

The first part draws the reader’s attention to the advantages, principles and challenges of distributed systems, as well as the fallacies and weaknesses of this architecture. Distributed architectures also benefit from the elastic nature and the ubiquity of the cloud, which is becoming an infrastructure of choice for deploying distributed applications.

The second part gives a thorough analysis of a distributed architecture implemented in a scalable communication platform, together with the decisions made from an architect’s point of view when choosing the components: scalability perspective, integration in a distributed environment, cloud compatibility.

Lastly, we focus on the cloud VoIP market. As 2600hz, the distributed platform discussed here, is aimed at VoIP and unified communications, a closer look at its creators and at how competitors do cloud VoIP may help the reader to understand where 2600hz, the company, is headed.


Part I Distributed computing


Chapter 1 Foundations
1.1 Why?

A distributed system is one in which autonomous components located at networked computers communicate and coordinate their actions only by passing messages. Software runs in concurrent processes on different computers and on different processors. There are multiple points of control, and multiple points of failure. This definition leads to the following significant characteristics of distributed systems: concurrency of components, lack of a global lock and independent failures of components.

Concurrency: In a network of computers, concurrent program execution is the norm. The capacity of the system to handle shared resources can be increased by adding more resources (for example, computers) to the network.

Absence of global lock: When programs need to cooperate they coordinate their actions by exchanging messages.

Independent failures: All computer systems can fail, and it is the responsibility of system designers to plan for the consequences of possible failures. Distributed systems can fail in new ways.

This approach is different from centralized systems, where only one component with non-autonomous parts is running on a single machine. This single component is shared by users all the time and runs in a single process. It is ruled by a single point of control, and thus represents a single point of failure.


The challenges arising from the construction of distributed systems are the heterogeneity of their components, openness (which allows components to be added or replaced), security, scalability – the ability to work well when the load or the number of users increases – failure handling, concurrency of components, transparency and providing quality of service.

1.2 Design and challenges

Significant challenges are encountered when designing distributed systems. Not every system has to face all of them, but the broader the scope and the scale of a distributed system, the more likely we are to encounter all of these challenges at once.

1.2.1 Heterogeneity

The Internet enables users to access services and run applications over a heterogeneous collection of computers and networks. Heterogeneity (variety and difference) applies to all of the following:
• networks
• computer hardware
• operating systems
• programming languages
• implementations by different developers

Although the Internet consists of many different sorts of network, their differences are masked by the fact that all of the computers attached to it use the Internet protocols (like TCP) to communicate with one another.

Data types such as integers may be represented in different ways on different sorts of hardware; for example, there are two alternatives for the byte ordering of integers (endianness). These differences in representation must be dealt with if messages are to be exchanged between programs running on different hardware.
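The byte-ordering issue can be illustrated with a minimal Python sketch. The value and the helper name decode_network_u32 are chosen for illustration only; the point is that both sides must agree on one order (conventionally big-endian, or "network order") before exchanging integers.

```python
import struct

value = 0x01020304

# The same 32-bit integer serialized with the two possible byte orders.
big = struct.pack(">I", value)     # most significant byte first (network order)
little = struct.pack("<I", value)  # least significant byte first


def decode_network_u32(data: bytes) -> int:
    """Decode a 4-byte unsigned integer that was sent in big-endian order."""
    return struct.unpack(">I", data)[0]


# Reading big-endian bytes as if they were little-endian yields another number.
misread = struct.unpack("<I", big)[0]
```

A receiver that always decodes with the agreed order recovers the original value whatever its own hardware uses internally.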


Although the operating systems of all computers on the Internet need to include an implementation of the Internet protocols, they do not necessarily all provide the same application programming interface to these protocols. For example, the calls for exchanging messages in UNIX are different from the calls in Windows.

Different programming languages use different representations for characters and data structures such as arrays and records. These differences must be addressed if programs written in different languages are to be able to communicate with one another.

Programs written by different developers cannot communicate with one another unless they use common standards, for example, for network communication and the representation of primitive data items and data structures in messages. For this to happen, standards need to be agreed and adopted, as the Internet protocols have been.

Middleware: The term middleware applies to a software layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, hardware, operating systems and programming languages. The Common Object Request Broker Architecture (CORBA) is an example. Some middleware, such as Java Remote Method Invocation (RMI), supports only a single programming language. Most middleware is implemented over the Internet protocols, which themselves mask the differences of the underlying networks, but all middleware deals with the differences in operating systems and hardware. Middleware communication is discussed further in the AMQP section of this paper.

In addition to solving the problems of heterogeneity, middleware provides a uniform computational model for use by the programmers of servers and distributed applications. Possible models include remote object invocation, remote event notification, remote SQL access and distributed transaction processing. The implementation of remote object invocation, for instance, hides the fact that messages are passed over a network in order to send the invocation request and its reply.

The virtual machine approach provides a way of making code executable on a variety of host computers: the compiler for a particular language generates code for a virtual machine instead of a particular hardware instruction set. For example, the Erlang compiler produces code for an Erlang virtual machine, which executes it by interpretation. The Erlang virtual machine needs to be implemented once for each type of computer to enable Erlang programs to run.

1.2.2 Openness

The openness of a computer system is the characteristic that determines whether the system can be extended and reimplemented in various ways. The openness of distributed systems is determined primarily by the degree to which new resource-sharing services can be added and made available for use by a variety of client programs.

Openness cannot be achieved unless the specification and documentation of the key software interfaces of the components of a system, such as APIs (public or not), are made available to software developers. In short, the key interfaces for communicating with the system are published. However, the publication of interfaces is only the starting point for adding and extending services in a distributed system. The challenge is to tackle the complexity of distributed systems consisting of many components engineered by different people.

Systems that are designed to support resource sharing in this way are termed open distributed systems to emphasize the fact that they are extensible. They may be extended at the hardware level by the addition of computers to the network and at the software level by the introduction of new services and the reimplementation of old ones, enabling application programs to share resources. A further benefit that is often cited for open systems is their independence from individual vendors.

To summarize:
• Open systems are characterized by the fact that their key interfaces (or APIs) are published.
• Open distributed systems are based on the provision of a uniform communication mechanism and published interfaces for access to shared resources.
• Open distributed systems can be constructed from heterogeneous hardware and software, possibly from different vendors. But the conformance of each component to the published standard must be carefully tested and verified if the system is to work correctly.


1.2.3 Security

Many of the information resources that are made available and maintained in distributed systems have a high value to their users. Their security is therefore of considerable importance. Security for information resources has three components: confidentiality (protection against disclosure to unauthorized individuals), integrity (protection against alteration or corruption), and availability (protection against interference with the means to access the resources).

In a distributed system, clients send requests to access data managed by servers, which involves sending information in messages over a network. For example:
• A doctor might request access to hospital patient data or send additions to that data.
• In electronic commerce and banking, users send their credit card numbers across the Internet.

In both examples, the challenge is to send sensitive information in a message over a network in a secure manner. But security is not just a matter of hiding the contents of messages from outsiders; it also involves knowing for sure the identity of the user on whose behalf a message was sent. In the first example, the server needs to know that the user is really a doctor, and in the second example, the user needs to be sure of the identity of the shop or bank with which they are dealing. The second challenge here is to identify a remote user or other agent correctly. Both of these challenges can be met by the use of encryption techniques developed for this purpose.
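As an illustration of the integrity and identification challenges, the following Python sketch attaches a keyed message authentication code (HMAC) to a request. It assumes that both parties already share a secret key, and the key value and request string are hypothetical. It does not provide confidentiality, which would require encrypting the message as well.

```python
import hashlib
import hmac

# Assumption: client and server agreed on this secret out of band.
SHARED_KEY = b"demo-key-not-for-production"


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that only holders of the key can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Check that the message is unaltered and was signed with the same key."""
    return hmac.compare_digest(sign(message, key), tag)


request = b"GET /patients/42"  # hypothetical request
tag = sign(request)
```

A server that receives the request together with its tag can reject it if the content was altered in transit or if the sender does not hold the key.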

1.2.4 Scalability

Distributed systems operate effectively and efficiently at many different scales, ranging from a small intranet to the Internet. A system is described as scalable if it remains effective when there is a significant increase in the number of resources and the number of users. The number of computers and servers on the Internet has increased dramatically in recent years. The design of scalable distributed systems presents the following challenges:


Controlling the cost of physical resources: As the demand for a resource grows, it should be possible to extend the system, at reasonable cost, to meet it. It must be possible to add server computers to avoid the performance bottleneck that would arise if a single server had to handle all requests. In general, for a system with n users to be scalable, the quantity of physical resources required to support them should be at most O(n)[1], that is, proportional to n. For example, if a single server can support 20 users, then two such servers should be able to support 40 users. Although that sounds like an obvious goal, it is not necessarily easy to achieve in practice.

Controlling the performance loss: Consider a set of data whose size is proportional to the number of users in the system. Algorithms that use hierarchic structures (like trees or heaps) scale better than those that use linear structures (lists). But even with hierarchic structures an increase in size will result in some loss in performance: the time taken to access hierarchically structured data is O(log n), where n is the size of the set of data. For a system to be scalable, the maximum performance loss should be no worse than this.

Preventing software resources running out: An example of lack of scalability is shown by the numbers used as Internet (IP) addresses (computer addresses in the Internet). In the late 1970s, it was decided to use 32 bits for this purpose, and now the supply of available Internet addresses is running out. For this reason, a new version of the protocol with 128-bit Internet addresses is being adopted, and this will require modifications to many software components. It is difficult to predict the demand that will be put on a system years ahead.

Some shared resources are accessed very frequently; for example, many users may access the same resource, causing a decline in performance. Caching and replication may be used to improve the performance of resources that are very heavily used.

Ideally, the system and application software should not need to change when the scale of the system increases, but this is difficult to achieve. The issue of scale is a dominant theme in the development of distributed systems. The main techniques are the use of replicated data, the use of caching and the deployment of multiple servers to handle commonly performed tasks, enabling several similar tasks to be performed concurrently.

[1] http://en.wikipedia.org/wiki/Big_O_notation


Figure 1.1: Comparison of constant O(1), linear O(n), logarithmic O(log n) and quadratic O(n^2) running times
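The difference between linear and logarithmic growth can be made concrete with a short Python sketch. It counts the elements examined by a front-to-back scan and by a binary search over sorted data; binary search stands in here for the hierarchic structures mentioned above, which is a simplification, and the function names are illustrative.

```python
def linear_steps(sorted_items, target):
    """Count the elements examined by a front-to-back scan."""
    steps = 0
    for item in sorted_items:
        steps += 1
        if item == target:
            break
    return steps


def binary_steps(sorted_items, target):
    """Count the elements probed by a binary search on sorted data."""
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


# Look up the last element: the worst case for the linear scan.
for n in (1_000, 1_000_000):
    data = list(range(n))
    print(n, linear_steps(data, n - 1), binary_steps(data, n - 1))
```

Multiplying the data size by a thousand multiplies the work of the scan by a thousand, while the binary search needs only about ten more probes.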


1.2.5 Failure handling

Computer systems fail. Not sometimes, but always. Whether it is a bug in the software, a bug in the underlying system software, a hardware failure, someone pulling the plug on the server, a power outage or a natural disaster wiping out the Eastern seaboard, something will take a system down. Failures in a distributed system are partial, that is, some components fail while others continue to function. Therefore the handling of failures is particularly difficult.

Detecting failures: Some failures can be detected. For example, checksums can be used to detect corrupted data in a message or a file. It is difficult or even impossible to detect some other failures, such as a remote crashed server in the Internet. The challenge is to manage in the presence of failures that cannot be detected but may be suspected.

Tolerating failures: It would not be practical for most of the services of a distributed system to attempt to detect and hide all of the failures that might occur. Their clients can be designed to tolerate failures, which generally involves the users tolerating them as well.

Recovery from failures: Recovery involves the design of software so that the state of permanent data can be recovered or "rolled back" after a server has crashed.

Masking failures: Some failures that have been detected can be hidden or made less severe. Just dropping a message that is corrupted is an example of making a fault less severe, since it could be retransmitted. Two examples of hiding failures:
1. Messages can be retransmitted when they fail to arrive (as with retransmission in data networks)[2].
2. File data can be written to a pair of disks so that if one is corrupted, the other may still be correct (redundancy).

Redundancy: Services can be created to tolerate failures by the use of redundant components.

Distributed systems are amongst the systems that provide a high degree of availability in the face of hardware faults. The availability of a system is a measure of the proportion of time that it is available for use. When one of the components in a distributed system fails, only the work that was using the failed component is affected.

[2] http://en.wikipedia.org/wiki/Retransmission_(data_networks)
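Detection by checksum and masking by retransmission can be combined in a few lines of Python. This is a minimal sketch under stated assumptions: the framing format (a CRC-32 prefix), the function names and the flaky_channel test double are all hypothetical and do not describe a real protocol.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 so the receiver can detect corruption."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unframe(data: bytes):
    """Return the payload, or None if the checksum does not match."""
    checksum, payload = int.from_bytes(data[:4], "big"), data[4:]
    return payload if zlib.crc32(payload) == checksum else None


def send_with_retry(payload: bytes, channel, max_attempts: int = 3):
    """Retransmit until an intact frame gets through or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        received = unframe(channel(frame(payload)))
        if received is not None:
            return received, attempt
    raise ConnectionError("message could not be delivered intact")


def flaky_channel():
    """Hypothetical channel that corrupts only the first frame it carries."""
    calls = {"n": 0}

    def channel(data: bytes) -> bytes:
        calls["n"] += 1
        if calls["n"] == 1:
            return data[:-1] + bytes([data[-1] ^ 0xFF])  # flip the last byte
        return data

    return channel
```

With this channel, the first frame is rejected by the checksum and the second attempt succeeds, so the caller never sees the fault.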

1.2.6 Concurrency

Both services and applications provide resources that can be shared by clients in a distributed system. Therefore there is a possibility that several clients will attempt to access a shared resource at the same time. The process that manages a shared resource could take one client request at a time. But that approach limits throughput. Therefore services and applications generally allow multiple client requests to be processed concurrently. Any shared resource in a distributed system must be responsible for ensuring that it operates correctly in a concurrent environment.
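A shared resource that protects itself against concurrent access might look like the following Python sketch. The Counter class and the run_clients helper are illustrative names, and threads stand in for remote clients.

```python
import threading


class Counter:
    """A shared resource that serializes its updates with a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread may update at a time
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_clients(counter, clients=8, requests_per_client=1000):
    """Simulate several clients sending requests to the resource concurrently."""

    def client():
        for _ in range(requests_per_client):
            counter.increment()

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Because every update goes through the lock, no increment is lost, whatever the interleaving of the clients.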

1.2.7 Transparency

Transparency is defined as hiding from the user and the programmer the separation of components in a distributed system, so that the system is perceived as a whole rather than as a collection of independent components (as stated in our definition of a distributed system). We can identify eight forms of transparency, some being more meaningful than others depending on the nature of the system:

Access transparency enables local and remote resources to be accessed using identical operations.

Location transparency enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).

Concurrency transparency enables several processes to operate concurrently using shared resources without interference between them.

Replication transparency enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.


Failure transparency hides faults, allowing users and applications to complete their tasks despite the failure of hardware or software components.

Mobility transparency allows the movement of resources and clients within a system without affecting the operation of users or programs.

Performance transparency allows the system to be reconfigured to improve performance as loads vary.

Scaling transparency allows the system and applications to expand in scale without any change to the system structure or the application algorithms.

The two most important transparencies that characterize distributed systems are access and location transparency; their presence or absence most strongly affects the utilization of distributed resources. They are sometimes referred to together as network transparency. Transparency hides and makes anonymous the resources that are not of direct relevance to the task in hand for users and application programmers.

1.2.8 Quality of service

The main nonfunctional properties of systems that affect the quality of the service experienced by clients and users are reliability, security and performance. Adaptability to meet changing system configurations and resource availability has been recognized as a further important aspect of service quality. Reliability and security issues are critical in the design of most computer systems. The performance aspect of quality of service was originally defined in terms of responsiveness and computational throughput, but it has been redefined in terms of the ability to meet timing guarantees.

Some applications, including multimedia applications like telephony systems, handle time-critical data: streams of data that are required to be processed or transferred from one process to another at a fixed rate. For example, a phone service might consist of a client program that retrieves from the server an audio stream representing the voice of the other person. For a satisfactory result, the successive audio data need to be streamed to the user in real time and without choppiness or interference, so that the conversation is audible and of good quality.


In fact, the abbreviation QoS has effectively been commandeered to refer to the ability of systems to meet such deadlines. Its achievement depends upon the availability of the necessary computing and network resources at the appropriate times. This implies a requirement for the system to provide guaranteed computing and communication resources that are sufficient to enable applications to complete each task on time. The networks commonly used today have high performance, but when they are heavily loaded their performance can deteriorate, and no guarantees are provided. QoS applies to operating systems as well as networks. Each critical resource must be reserved by the applications that require QoS, and there must be resource managers that provide guarantees, for example by keeping additional servers available to handle spikes in load.
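What it means for a stream to meet its deadlines can be sketched in Python as follows. The model is a deliberate simplification: packets are produced at a fixed interval and played out after a fixed buffering delay. The 20 ms interval, the 60 ms buffer and the sample arrival times are illustrative assumptions, not measurements.

```python
def late_packets(arrival_ms, interval_ms=20.0, buffer_ms=60.0):
    """Return the indexes of packets that arrive after their playout deadline.

    Packet i is due at buffer_ms + i * interval_ms, measured from the arrival
    of the first packet. Both default values are illustrative assumptions.
    """
    start = arrival_ms[0]
    late = []
    for i, arrived in enumerate(arrival_ms):
        deadline = start + buffer_ms + i * interval_ms
        if arrived > deadline:
            late.append(i)
    return late


# Hypothetical arrival times in milliseconds; the fourth packet is delayed.
arrivals = [0, 21, 39, 150, 81, 100]
```

A packet that misses its deadline is useless to the listener even if it eventually arrives intact, which is why QoS is expressed as guarantees in time rather than as average throughput.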

1.3 Fallacies

Inexperienced programmers and external observers tend to make wrong assumptions about distributed systems and their usage of the network:
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn’t change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

The network is not reliable: it may be congested at peak hours, resulting in slow traffic, latency or packet loss. It may be shut down by an operator without notice. The path of a packet goes through different network equipment and various topologies that can alter or block the message being sent: proxies, firewalls, etc. Incorrect use of message retransmission when packets are lost because of a misuse of the network may lead to wasted bandwidth and bottlenecks, and thus to a higher bandwidth bill.

Network security is an entire art in itself, as the network is an open battlefield. Malicious users and programs continually adapt to new security measures to eavesdrop on network traffic using various techniques: man-in-the-middle attacks, spoofing, etc. Multiple administrators may institute conflicting policies of which senders of network traffic must be aware in order to complete their desired paths.
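One common way to avoid the retransmission misuse described above is to wait longer after each failed attempt instead of resending immediately. The following Python sketch computes such a schedule (exponential backoff with optional random jitter). The technique is not taken from this paper, and the base delay, cap and attempt count are illustrative assumptions.

```python
import random


def backoff_delays(max_attempts=5, base=0.5, cap=8.0, jitter=True, rng=None):
    """Return the waits, in seconds, to observe before each retransmission.

    The wait doubles with every attempt up to `cap`. With jitter enabled, each
    wait is drawn uniformly between 0 and that bound, so that many senders do
    not retry at the same instant.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        bound = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, bound) if jitter else bound)
    return delays
```

Bounding the number of attempts also matters: a sender that retries forever on a congested link only adds to the congestion.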

Chapter 2 Trends and future
2.1 The rise of cloud computing

Distributed systems are undergoing a period of significant change and this can be found in a number of influential trends: • the emergence of ubiquitous computing1 • the increasing demand for multimedia services • the view of distributed systems as a utility, or cloud computing With the increasing maturity of distributed systems infrastructure, a number of companies are promoting the view of distributed resources as a commodity or utility, drawing the analogy between distributed resources and other utilities such as water or electricity. With this model, resources are provided by appropriate service suppliers and effectively rented rather than owned by the end user. This model applies to both physical resources and more logical services: Physical resources such as storage and processing can be made available to networked computers, removing the need to own such resources on their own. At one end of the spectrum, a user may opt for a remote storage facility for file storage requirements (for example, for multimedia data such as photographs,
1

http://www.ubiq.com/ubicomp/

16

CHAPTER 2. TRENDS AND FUTURE

17

music or video) and/or for backups. Similarly, this approach would enable a user to rent one or more computational nodes, either to meet their basic computing needs or indeed to perform distributed computation. At the other end of the spectrum, users can access sophisticated data centers (networked facilities offering access to repositories of often large volumes of data to users or organizations) or indeed computational infrastructure using the sort of services now provided by companies such as Amazon and Google. Operating system virtualization is a key enabling technology for this approach, implying that users may actually be provided with services by a virtual rather than a physical node. This offers greater flexibility to the service supplier in terms of resource management and operational costs. Software services can also be made available across the Internet using this approach. Indeed, many companies now offer a comprehensive range of services for effective rental, including services such as email and distributed calendars. Google, for example, bundles a range of business services under the banner Google Apps2 . The term cloud computing is used to capture this vision of computing as a utility. A cloud is defined as a set of Internet-based application, storage and computing services sufficient to support most users’ needs, thus enabling them to largely or totally dispense with local data storage and application software. The term also promotes a view of everything as a service, from physical or virtual infrastructure through to software, often paid for on a per-usage basis rather than purchased. Note that cloud computing reduces requirements on users’ devices, allowing very simple desktop or portable devices to access a potentially wide range of resources and services. Clouds are generally implemented on cluster computers to provide the necessary scale and performance required by such services. 
A cluster computer is a set of interconnected computers that cooperate closely to provide a single, integrated highperformance computing capability.

2.2

How Google uses distributed systems

Google, the market leader in web search technology, has put significant effort into the design of a sophisticated distributed system infrastructure to support search
2

http://www.google.com/a

CHAPTER 2. TRENDS AND FUTURE

18

(and indeed other Google applications and services such as Google Earth). This represents one of the largest and most complex distributed systems installations of the history of computing and hence demands close examination that are unfortunately out of scope of this paper. The reader can find below the highlights of this distributed infrastructure: • An underlying physical infrastructure consisting of very large numbers of networked computers located at data centers all around the world3 • A distributed file system designed to support very large files and heavily optimized for the style of usage required by search and other Google applications (especially reading from files at high and sustained rates)4 • An associated structured distributed storage system that offers fast access to very large datasets5 • A lock service that offers distributed system functions such as distributed locking and agreement6 • A programming model that supports the management of very large parallel and distributed computations across the underlying physical infrastructure7

3 http://www.datacenterknowledge.com/archives/2011/08/01/report-google-uses-about-900000-servers/
4 http://labs.google.com/papers/gfs.html
5 http://labs.google.com/papers/bigtable.html
6 http://labs.google.com/papers/chubby.html
7 http://labs.google.com/papers/mapreduce.html

Part II 2600hz, an heterogeneous distributed platform


Chapter 3 Presentation of the solution
3.1 Whistle, the cornerstone of cloud-enabled VoIP

I worked as a software engineer on Whistle, a scalable, distributed, cloud-based software platform used to build powerful telephony applications through a rich set of APIs. It is the cornerstone of cloud-enabled VoIP networks, and can be described as a logic layer for telephony soft switches. As a component, Whistle is part of a bigger, heterogeneous system: 2600hz (which is also the name of the company that provides Whistle). 2600hz allows VoIP integrators and resellers to build Class 5¹ grade switching applications quickly and easily on any infrastructure. This platform is the vision of experienced VoIP engineers who came up with a response to the scalability problems faced by VoIP providers and integrators. Most VoIP service providers create home-grown tools to manage their systems. As their needs and customer demands grow, their toolset falls behind and becomes a pain point.

Whistle, the component of 2600hz I developed modules for, is composed of two parts: Ecallmanager and Applications (or WhApps).
• WhApps are the logic blocks of Whistle: self-contained units of functionality. Each WhApp is in charge of a particular task in the system: trunking, users, call flows, devices, voicemail boxes, etc.
1

http://en.wikipedia.org/wiki/Class_5_telephone_switch


• Ecallmanager is a loosely coupled2 translation layer between Applications and the underlying soft switch. Each soft switch (YATE, FreeSWITCH, Asterisk) needs its own translation layer to be able to communicate with WhApps.

Whistle is written in Erlang, a concurrent programming language designed to build massively scalable, distributed applications. It is an open-source, API-enabled platform, meaning anyone can consume Whistle’s APIs and integrate their own WhApp into the system to solve their problem and add value to the product.

Figure 3.1: Whistle is made of two components: Ecallmanager and Applications (WhApps). WhApps are the logic blocks of Whistle.

The source code of Whistle is available on GitHub3.

3.1.1

Applications

The following is a selection of the WhApps bundled with Whistle4 and the role each one plays:
Callflow Control mechanisms for handling calls.
2 http://en.wikipedia.org/wiki/Coupling_(computer_science)
3 https://github.com/2600hz/whistle
4 https://github.com/2600hz/whistle/tree/master/whistle_apps/apps


CDR Call Detail Records are generated by FreeSWITCH at the end of a call. They contain information such as who called whom, which number, the duration of the call, etc. Those records are useful for billing.
Conference The Conference WhApp is used to create, modify or delete conference rooms.
Crossbar Crossbar is a general-purpose interface layer for Whistle, exposing Whistle APIs as an HTTP REST API.
DTH Integration with a VoIP billing software provider5
Registrar Stores and caches the registrations of SIP devices in the network.
Trunkstore Brings the ability to start a trunking platform with ease.

3.2

2600hz, a powerful distributed communication stack

As stated previously, Whistle is a layer, and needs to be connected to other components to form a heterogeneous distributed system: 2600hz. The components Whistle needs are:
• SIP servers
• Telephony soft switches
• Messaging bus
• Data stores
I describe 2600hz as heterogeneous because it contains many different kinds of hardware and software working together in a cooperative fashion to solve problems.6 The components used in the reference implementation of 2600hz are the following:
5 http://dthvoipbilling.com/
6 http://infolab.stanford.edu/~burback/dadl/node95.html


SIP server OpenSIPS acts as a load balancer for SIP requests. The goal is to minimize the number of public network interfaces that clients and carriers need to be informed of, by pointing them to the load balancers (usually two for redundancy). Adding capacity becomes as easy as informing OpenSIPS of the new switch to balance on.
Switch FreeSWITCH is used in this layer because of the tight integration obtained between Whistle and FreeSWITCH via the mod_erlang_event7 module. As FreeSWITCH is already a carrier-grade switch on its own, bringing the clustering features of Whistle on top results in a high-quality cluster of switches on which to build the platform.
Control layer Whistle provides an abstraction layer over the underlying switching layer. Application developers can program their applications against the Whistle APIs and know that Whistle will take care of the details. Application developers also benefit from Whistle’s ability to distribute processing amongst the servers in the switching layer. To the application developer, Whistle is one logical switch. Whistle is composed of Ecallmanager and WhApps.
Message bus Conversations between servers are primarily conducted using a standard protocol named AMQP. RabbitMQ8 was chosen as the AMQP implementation because it is a full implementation of the protocol and is written in Erlang. This makes it possible to keep everything in native Erlang data types, pass things around quickly, and cluster Whistle and WhApps servers easily. It also implements all the needed brokers out of the box9.
Data store BigCouch10 is a CouchDB (a document-oriented database) cluster. It is a prime choice of data store because of the distribution and redundancy capabilities BigCouch offers natively. With proper configuration, data is automatically partitioned and replicated across the different nodes.

7 http://wiki.freeswitch.org/wiki/Mod_erlang_event#Introduction
8 http://www.rabbitmq.com/
9 Brokers are discussed in section 5.6
10 https://github.com/cloudant/bigcouch


Figure 3.2: Whistle’s components, Ecallmanager and WhApps, never communicate directly: they exchange messages using an AMQP bus. Several types of communication take place in the system: AMQP messages, Erlang messages and other traffic such as TCP.

3.3

Solving vertical scaling limitations with distributed systems

Scalability is the ability of a system to handle growing amounts of work in a graceful manner, or its ability to be enlarged to accommodate that growth.11 2600hz was
11

Definition from wikipedia http://en.wikipedia.org/wiki/Scalability


built as a solution to scalability problems12 in telephony applications. Usually, vertical scaling is the way to scale an application running on a single server: current servers or hardware are replaced by more powerful ones to handle the growth. But this way of scaling offers few expansion possibilities, since hardware has physical limitations. Horizontal scaling involves adding more computational power (nodes) to the system to handle the additional load. As an example, if we have 3 FreeSWITCH servers, each able to handle 25,000 concurrent calls, and the number of concurrent calls across the whole system grows to 80,000 on average, we should be able to deploy a fourth FreeSWITCH server, increasing the system’s capacity to 100,000 concurrent calls. Distributed systems, built from nodes that communicate by message passing, lend themselves to horizontal scaling better than client-server architectures, which explains why 2600hz’s architecture is distributed.
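The arithmetic of the FreeSWITCH example can be written down as a small capacity calculation. The following is a minimal sketch in Python (chosen for illustration only; Whistle itself is written in Erlang). The function name `servers_needed` is hypothetical, and the sketch assumes identical servers and perfectly even load balancing, which a real deployment will only approximate.

```python
def servers_needed(peak_calls: int, calls_per_server: int) -> int:
    """Smallest number of identical servers whose combined capacity covers peak_calls."""
    if calls_per_server <= 0:
        raise ValueError("calls_per_server must be positive")
    # Integer ceiling division: -(-a // b) rounds a / b up without using floats.
    return max(1, -(-peak_calls // calls_per_server))


# Figures taken from the example in the text.
current_servers = 3
calls_per_server = 25_000
expected_peak = 80_000

required = servers_needed(expected_peak, calls_per_server)
servers_to_add = max(0, required - current_servers)
total_capacity = required * calls_per_server
```

With the figures above, one more server has to be added; scaling further only means repeating the same calculation with a new peak value.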

12

Refer to section 1.2.4

Chapter 4 Programming distributed systems
A massive distributed infrastructure that must also be scalable is not easy to conceive. It must be designed from the start to be distributed and scalable; these are not properties that can be added along the way. In the case of Whistle, scalability must not be a blocker, which is why the selected architecture is distributed instead of client-server. As a distributed architecture brings challenges to the design of the application (as I showed in section 1.2), the choice of the main programming language for the development of the platform is critical.

Few programming languages are truly built for distributed programming. Any language capable of communicating over the network can be used to build a distributed program, but some languages are more appropriate because they include features aimed at distributed systems (messaging, fault tolerance, portability, resilience, asynchronous communication, node discovery). The most used languages for industry-level back-end systems (telecom, banking, aeronautics) are C++ and Java. These languages were not built for distributed programming: their concurrency and communication models are based on shared memory. The features they were missing, especially messaging, were added later, either at the language level or as an implementation of a standardized communication protocol:
• RMI1 and Jini2 are APIs that allow distributed Java programming, but are available only for Java programs.
1 http://en.wikipedia.org/wiki/Java_remote_method_invocation
2 http://en.wikipedia.org/wiki/Jini


• CORBA (Common Object Request Broker Architecture)3 is a standard that allows software components written in different programming languages to communicate. CORBA does not meet all the requirements of distributed computing, such as redundancy.
• Message Passing Interface (MPI)4 is a communication protocol based on message passing for the purpose of parallel programming, with implementations and bindings available for C++, C#, Java, Python, etc.

I would like to point out the difference between parallel computing and distributed computing.
• In parallel computing, all processors have access to a shared memory. Shared memory can be used to exchange information between processors.
• In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.

Figure 4.1: In a shared memory environment (a), all CPUs read and write data from the same memory unit, while in a message-passing environment (b) each CPU is bound to a local memory, never reads or alters the memory of the others, and communicates by sending messages over a message bus.

Shared-memory parallel programming is therefore impractical in a distributed environment, since physical memory is not shared among all the nodes of the system.
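The two models of Figure 4.1 can be contrasted with a small sketch. It is written in Python for illustration and uses threads and an in-process queue as stand-ins for CPUs and a message bus, which is an assumption made to keep the example self-contained; the function names are hypothetical. In model (a) every worker updates the same variable and must take a lock; in model (b) each worker computes on private data and only sends its result as a message.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """(a) Shared memory: all workers update one variable, so access is locked."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for value in chunk:
            with lock:  # without the lock, concurrent updates could be lost
                total += value

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


def message_passing_sum(values, workers=4):
    """(b) Message passing: private state per worker, results sent as messages."""
    mailbox = queue.Queue()

    def work(chunk):
        mailbox.put(sum(chunk))  # private computation, then a single message

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # The coordinator only ever sees messages, never the workers' memory.
    return sum(mailbox.get() for _ in range(workers))
```

Both functions return the same result, but only the second one would still work if the workers were moved to different machines, provided the queue is replaced by a network transport.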
3 http://en.wikipedia.org/wiki/Corba
4 http://en.wikipedia.org/wiki/Message_Passing_Interface


4.1

Erlang, concurrent programming for distributed applications

A language other than C++ or Java was chosen for the development of Whistle: Erlang. Erlang is a general-purpose, concurrent, garbage-collected programming language and runtime system. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing. It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. It supports hot swapping, so that code can be changed without stopping a system. Hot swapping is particularly suitable for high-availability systems such as telephony systems. It was developed to meet the "time-to-market" requirements of distributed, fault-tolerant5, massively concurrent6, soft real-time systems, and was designed with the aim of improving the development of telephony applications.

Erlang presents several advantages for the development of distributed, concurrent applications, and it is portable. The low-level nature of C++ does not offer this level of portability: developing Whistle in C++ would have required a separate build for each computer architecture and operating system Whistle is supposed to run on. A distributed system can be heterogeneous at the hardware and operating system levels, and portable code is highly desirable so that efforts can be focused on development rather than on solving portability problems.

4.1.1

Open Telecom Platform

OTP is simultaneously a framework, a set of libraries, and a methodology for structuring applications; it’s an extension of Erlang. These are some of the main advantages of OTP:
Productivity Using OTP makes it possible to produce production-quality systems in a very short time using behaviors.
Stability Code written on top of OTP can focus on the logic and avoid error-prone implementations of the typical things that every real-world system needs: process management, servers, state machines, etc.
5 See section 1.2.5
6 See section 1.2.5


Supervision The application structure provided by the framework makes it simple to supervise and control the running systems, both automatically and through graphical user interfaces.
Upgradability The framework provides patterns for handling code upgrades in a systematic way.
Reliable code base The code for the OTP framework is rock solid and has been thoroughly battle tested.

OTP behaviors are a formalization of process design patterns. They are implemented in library modules that are provided with the standard Erlang distribution. These library modules do all of the generic process work and error handling. The specific code is written by the programmer.
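The split between generic and specific code can be illustrated outside of Erlang. The sketch below is a Python analogy, not OTP code: `GenericServer` plays the role of a behavior library module (mailbox, receive loop, error handling), while `Counter` is the callback module written by the programmer. The callback name `handle_call` is loosely modeled on the gen_server behavior; everything else is a hypothetical name chosen for the example.

```python
import queue
import threading


class GenericServer:
    """Generic part: owns the mailbox, the receive loop and the error handling."""

    def __init__(self, callbacks, initial_state):
        self._callbacks = callbacks
        self._state = initial_state
        self._mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def call(self, request, timeout=1.0):
        """Send a request to the server and wait for its reply."""
        reply_box = queue.Queue(maxsize=1)
        self._mailbox.put((request, reply_box))
        return reply_box.get(timeout=timeout)

    def _loop(self):
        while True:
            request, reply_box = self._mailbox.get()
            try:
                reply, self._state = self._callbacks.handle_call(request, self._state)
            except Exception as exc:
                # Generic error handling: report the failure, keep the old state.
                reply = ("error", repr(exc))
            reply_box.put(reply)


class Counter:
    """Specific part: only the logic, no process management."""

    @staticmethod
    def handle_call(request, state):
        if request == "increment":
            return state + 1, state + 1  # (reply, new state)
        if request == "get":
            return state, state
        raise ValueError(f"unknown request: {request}")
```

A server is started with `GenericServer(Counter, 0)` and queried with `call("increment")` or `call("get")`; a new kind of server only requires a new callback class, while the loop and the error handling are reused as they are.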

4.1.2

Strengths

One of the greatest strengths of Erlang for scalable, soft real-time communication systems is its concurrency model. Erlang concurrency is fast and scalable, and can handle high loads with little or no degradation in throughput, even during sustained peaks. Telephony systems must be reliable and must not suffer from call peaks. Rather than providing threads that share memory, Erlang gives each process its own memory space, with its own heap and stack. Processes can’t interfere with each other. There are no locks, no synchronized methods, and no possibility of shared memory corruption, since there is no shared memory. This ensures a high stability of the platform, and hence less downtime.

Erlang programs can be made of thousands to millions of extremely lightweight processes that can run on a single processor, on a multicore processor, or on a network of processors. Unlike in Java or C++, concurrency in Erlang belongs to the programming language and not to the operating system, which makes Erlang portable across different platforms (UNIX/Linux, Windows, Mac OS) and architectures (x64, x86).

I must stress the importance of multicore CPUs for Erlang and the scalability advantage they give to Whistle. Erlang scales according to the number of cores available on the machine it is running on. Nowadays, CPU vendors have reached the frequency limit a single-core CPU can reach. Instead of a single, high-frequency core, CPU vendors are creating CPUs with 4 or more cores of lesser


frequency. Some servers on the market have up to 32 cores. Scaling with the number of cores this easily is uncommon among mainstream programming languages, and programmers are only now starting to learn how to program multi-core systems to keep up with the market of multi-core processors.7

Figure 4.2: Because Erlang handles concurrency at the language level and not at the OS level, distributing processing over several CPU cores is achieved without extra effort. The X-axis is the number of connected clients in an Erlang application; the Y-axis is the average execution time.

Lastly, functional programming discourages code with side effects, and Erlang’s single assignment and immutable data make the purely functional parts of a program referentially transparent. Referential transparency means that an expression yields the same result for a given set of inputs at any point in time; a referentially transparent expression is therefore deterministic by definition. Bugs and errors in a deterministic system are easier to reproduce than in a non-deterministic one.
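The difference between a referentially transparent function and one with side effects can be shown in a few lines. This is a minimal sketch in Python with hypothetical billing functions invented for the example; the first function depends only on its arguments, while the second also depends on hidden state, so the same call can return different results.

```python
def call_cost(duration_seconds: int, rate_per_minute: float) -> float:
    """Referentially transparent: the result depends only on the arguments."""
    minutes = -(-duration_seconds // 60)  # round up to whole minutes
    return minutes * rate_per_minute


_calls_billed = 0  # hidden state shared by every call to the function below


def call_cost_with_promotion(duration_seconds: int, rate_per_minute: float) -> float:
    """Not referentially transparent: reads and updates module-level state."""
    global _calls_billed
    _calls_billed += 1
    factor = 0.5 if _calls_billed % 2 == 0 else 1.0  # every second call is half price
    return call_cost(duration_seconds, rate_per_minute) * factor
```

A bug in `call_cost` can be reproduced from its inputs alone, whereas reproducing a bug in `call_cost_with_promotion` also requires knowing how many calls were billed before.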

4.1.3

Weaknesses

Erlang also has weaknesses, although most seem subjective from a developer’s point of view.
7

http://www.intel.com/intelpress/sum_mcp.htm


Syntax Erlang’s syntax is based on Prolog, which can be daunting at first and has little in common with the syntax of mainstream programming languages like C++, Java, PHP or Python.
Strings Erlang has no built-in type for strings. Strings are represented as lists of integers, which can degrade performance on large data sets and makes Erlang slower at intensive data processing. C++ is better suited to this type of computation.
Code organization Erlang’s namespace is flat, which can lead to naming conflicts or a messy code base.
Documentation The documentation is complete but lacks examples. A person wanting to learn Erlang will have to buy books, because the documentation only covers the essentials and is not learner-friendly.

4.1.4

Erlang in industry-level communication systems

• T-Mobile uses Erlang in its SMS and authentication systems.
• Motorola is using Erlang in call processing products in the public-safety industry.
• Ericsson uses Erlang in its support nodes, used in GPRS and 3G mobile networks worldwide.
• Facebook is using Erlang in the back-end of its chat system.
The fact that web services, retail and commercial banking, computer telephony, messaging systems, and enterprise integration, to mention but a few, happen to share the same requirements as telecom systems explains why Erlang is gaining headway in these sectors.

Chapter 5 Architecture and integration
2600hz is distributed and connected. Such a system requires a thorough design process and must be architected from a high-level point of view to ensure the correct implementation of distributed principles, especially those that add value from a business perspective1:
• Scalability
• Openness and extensibility
• Supervision and monitoring
Software projects have a long history of failure2, and naturally, the more complex the system is, the higher the rate of failure, especially after the software has been brought to market and problems arise in production: the system cannot scale beyond a certain point, or is hard to maintain or extend.

1 Which features will be sold to customers
2 http://www.galorath.com/wp/software-project-failure-costs-billions-better-estimation-planning-can php


5.1

Functional overview

Figure 5.1: Once Whistle (included in the designated cloud engine) is integrated in 2600hz, and 2600hz is fully operational, the platform acts as the basis of a communication platform.

2600hz is designed to make it easy to build and scale a VoIP cloud infrastructure, with a capacity of 1 billion calls per month. This vision of a distributed VoIP system came from its creators, who are experienced VoIP engineers, as the best architecture to confront scalability problems. While distributed systems have their own requirements and principles, a resellable communication platform has its business requirements and features too, because this is what customers are going to


build their businesses on, and buy training and support for. Architectural choices are made according to the business objectives to fulfill and the requirements to meet in order to build such a system: high scalability, reliability and massive concurrency on the one hand, and unified communications, app store capabilities and reduced operational costs on the other.

From an architect’s point of view, a scalable, distributed platform is characterized by:
Messaging A way for programs across different servers to talk to each other and/or know what other servers are doing
Redundancy The ability to have copies of everything (data, software, etc.) all working together at the same time, with copies coming online/offline at any given time
Distribution of data The ability to break information into pieces and spread it across multiple computers, allowing for adding/removing computers as demand requires
Unlimited concurrency There should be no limit to how much is happening on the platform overall

The above are common characteristics. But telecommunication systems have additional requirements that will be implemented as features in the platform:
Directed Events The ability for message queues across servers to be set up and torn down quickly and to act as a tunnel between different explicit services without disrupting other nodes
Schema Flexibility The ability to frequently upgrade data and variable structures within the entire system without bringing down clusters
Strong Supervision The ability to detect failures very quickly and re-spawn nodes and processes just as fast. In web servers, delays and failures of 50ms or more are acceptable; in voice applications, they are not
Speed for Adding Features Telecom is growing extremely rapidly. The ability to expose new features quickly, in a reliable, scalable, distributed way is paramount to a successful platform


Fast Server Provisioning The ability to handle spikes in the cloud by procuring and provisioning additional resources (from servers to circuits to DIDs) in an instant
Avoid downtimes Downtime is critical in live communication systems. Not even upgrades to software should result in downtime.

As Whistle has to integrate seamlessly with the other components of 2600hz, some of the problems we have to solve with this distributed architecture are:
• How will 2600hz use messaging to communicate with other components?
• How will 2600hz handle failure?
• How will 2600hz scale?
• How will 2600hz achieve high concurrency?

5.2

Messaging
A way for programs across different servers to talk to each other and/or know what other servers are doing.

Message passing is the form of interprocess communication that is the most suitable for distributed systems. In traditional concurrent applications built in C++ or Java, the interprocess communication model is largely based on shared memory, that is, simultaneous access to the same data in memory by different processes. Even without specifically referring to distributed systems, the shared-memory model is known for its complexity and its problems under high concurrency. Although solutions exist (mutexes, semaphores) to control who can access and modify the same information, problems quickly arise, such as deadlocks, race conditions or memory corruption. In certain cases, the existence of a global lock eases concurrent programming by removing synchronization and locking problems, but it also severely degrades the performance of the system, especially on multicore architectures. As an example, the GIL3 in the C implementation of Python degrades the performance of multithreaded Python programs on a multicore platform.
3

http://en.wikipedia.org/wiki/Global_Interpreter_Lock


Processes using shared memory for communication are very hard to scale, since the information is located at a single point in the network: scaling the processes means scaling the memory at the same time, leading to other problems: state distribution, coherence of the replicated data, consistency, failure of replication.

Message passing can be used in two modes: synchronous and asynchronous.
• Synchronous messages are sent from process A to B. Process A waits (blocks) for a response from B before continuing. Synchronous exchange is used for transactions, confirmation or cooperation between processes.
• Asynchronous messages are sent from process A to B. As soon as the message is sent, A continues its execution: it doesn’t block. B receives the message and does not have to acknowledge its reception. It can be seen as a fire-and-forget process.

Although Erlang has language-level primitives for exchanging messages, using an open, standard message-oriented protocol, AMQP, ensures the platform stays open to third-party developers and prevents Erlang from becoming a blocker to the extension of the system. In a distributed environment like 2600hz, message passing is clearly the way for processes to exchange and communicate. For example, Ecallmanager and Applications may run on different servers. They communicate by exchanging messages.
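The two modes can be sketched with an in-process mailbox. The example below is written in Python for illustration; the queue stands in for the message bus, and the names `send_async`, `send_sync` and `process_b` are hypothetical. It does not use AMQP or any RabbitMQ API. The synchronous sender attaches a private reply queue to its message and blocks on it, while the asynchronous sender returns as soon as the message is queued.

```python
import queue
import threading

inbox = queue.Queue()  # mailbox of process B
processed = []         # what B has handled so far, kept for inspection


def process_b():
    """Receive loop of process B."""
    while True:
        payload, reply_to = inbox.get()
        processed.append(payload)
        if reply_to is not None:  # synchronous request: the sender is waiting
            reply_to.put(f"ack:{payload}")


def send_async(payload):
    """Fire and forget: returns immediately, no reply is expected."""
    inbox.put((payload, None))


def send_sync(payload, timeout=1.0):
    """Blocks until B replies, or raises queue.Empty after the timeout."""
    reply_to = queue.Queue(maxsize=1)
    inbox.put((payload, reply_to))
    return reply_to.get(timeout=timeout)


threading.Thread(target=process_b, daemon=True).start()
```

The timeout in `send_sync` is a deliberate design choice: in a distributed system the receiver may be down, so a synchronous sender should never wait forever.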

5.3

Redundancy and fault tolerance
The ability to have copies of everything (data, software, etc.) all working together at the same time, with copies coming online/offline at any given time.

Distributed systems are hard to design, especially because a failure can happen on any node of the system. As discussed in the section on the design of distributed systems, reliability is a key requirement. Users expect their phone to be usable at any time of the day, regardless of the number of users already connected to the system and of the load that could slow global traffic.


Whistle uses redundancy at the physical and software levels. Physical redundancy is the easiest to achieve, especially with loosely coupled components like Ecallmanager and WhApps. Ecallmanager can be installed on different physical servers, and set up to communicate with the same or different FreeSWITCH servers. Which Ecallmanager responds is decided on a first-come, first-served basis: the first Ecallmanager to pick up a request responds.

A good example of software redundancy in Whistle is the use of BigCouch. BigCouch is a cluster of CouchDB4 databases distributed over an arbitrary number of servers. Each node of BigCouch is capable of replicating with other nodes, which ensures data redundancy as nodes go up and down.
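The first-come, first-served behavior of redundant Ecallmanager instances can be modeled as several consumers reading from one shared queue. The following Python sketch is an illustration under that assumption; the function and handler names are hypothetical, and an in-process queue replaces the real message bus. Each request is taken by exactly one handler, whichever picks it up first.

```python
import queue
import threading


def run_redundant_handlers(requests, handler_names):
    """Dispatch requests to redundant handlers; return (request, handler) pairs."""
    shared_queue = queue.Queue()
    handled = []
    handled_lock = threading.Lock()

    def handler(name):
        while True:
            request = shared_queue.get()
            if request is None:  # sentinel: no more work for this handler
                break
            with handled_lock:
                handled.append((request, name))

    threads = [threading.Thread(target=handler, args=(name,))
               for name in handler_names]
    for thread in threads:
        thread.start()
    for request in requests:
        shared_queue.put(request)
    for _ in threads:  # one sentinel per handler
        shared_queue.put(None)
    for thread in threads:
        thread.join()
    return handled
```

Which handler serves a given request is not deterministic, and it does not need to be: removing a handler from `handler_names` only reduces capacity, the remaining ones keep serving the queue.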

5.4

Distribution of data
The ability to break information into pieces and spread it across multiple computers, allowing for adding/removing computers as demand requires and failures appear.

When a system scales, each of its components must scale, otherwise the scalability of the whole system will be limited by that of its least scalable component (the weakest link). Distribution of data is hard to achieve by nature: it requires synchronization and replication amongst data stores. The choice of the data store is thus critical. Distribution of data is built into BigCouch, the data store used by Whistle. Each node can be set up on a different server, and BigCouch will automatically spread data across those nodes using a consistent hashing algorithm, which ensures data is evenly spread and quickly accessible; this is sharding, or horizontal partitioning of databases. BigCouch uses 3 constants to allow fine-grained configuration of data distribution for performance, consistency and durability.
• N - replication constant. N copies of each document will be written to the data store, on N different nodes.
4

CouchDB is a document oriented database


• R - read quorum constant. N writes have occurred for each document, as noted above. When a read is requested, N reads are sent to the N nodes that store the particular document. The system returns the document to the requesting client once R successful reads have returned and agree on the version. Lower R values often result in faster reads at the expense of consistency. Higher R values usually result in slower reads, but in more consistent, agreed-upon data values being returned.
• W - write quorum constant. When writing the N copies, the data store responds to the writing client after W successful writes have completed. The remaining N-W writes are still attempted in the background, but the client receives a 201 Created status and can resume execution. Lower W values mean more write throughput, and higher W values mean more data durability.

Figure 5.2: BigCouch spreads data across its nodes using consistent hashing.
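The interplay of the N, R and W constants can be explored with a toy model. The sketch below is written in Python for illustration and is not BigCouch's implementation: the class `QuorumStore`, its methods and the `reachable` parameter (used to simulate unreachable nodes) are all hypothetical, versions are plain counters, and writes are applied synchronously instead of in the background.

```python
from collections import Counter


class QuorumStore:
    """Toy replicated store with N replicas, read quorum R and write quorum W."""

    def __init__(self, n=3, r=2, w=2):
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("R and W must be between 1 and N")
        self.n, self.r, self.w = n, r, w
        # Each replica maps key -> (version, value); values must be hashable here.
        self.replicas = [dict() for _ in range(n)]

    def write(self, key, value, reachable=None):
        """Store the value on every reachable replica; succeed if at least W acked."""
        reachable = range(self.n) if reachable is None else reachable
        version = 1 + max(rep.get(key, (0, None))[0] for rep in self.replicas)
        acks = 0
        for index in reachable:
            self.replicas[index][key] = (version, value)
            acks += 1
        return acks >= self.w

    def read(self, key, reachable=None):
        """Return the newest value that at least R reachable replicas agree on."""
        reachable = range(self.n) if reachable is None else reachable
        answers = [self.replicas[i][key] for i in reachable if key in self.replicas[i]]
        agreed = [pair for pair, votes in Counter(answers).items() if votes >= self.r]
        if not agreed:
            return None  # quorum not reached
        return max(agreed, key=lambda pair: pair[0])[1]
```

With N=3, R=2 and W=2, a write that only reaches one node is reported as failed, and a later read still returns the last value that two replicas agree on; lowering R to 1 would make reads succeed more often at the price of possibly returning a value that other replicas have not seen.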

The simplicity of horizontal scaling becomes clear: the more nodes there are, the greater the storage capacity, without impacting lookup times. This kind of distribution does not exist natively in traditional databases like MySQL or SQL Server, and has to be implemented by hand as sharding. Achieving horizontal scalability with those data stores is hard since they are not built for data distribution. Sharding and distribution are achieved using master-slave replication, with the typical problems of client-server architectures: locks, replication speed, etc. The following two notions, the CAP theorem and eventual consistency, allowed us to make assumptions and trade-offs on our data distribution and consistency over the long term.
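Why adding a node does not disturb lookups can be seen on a small consistent hashing ring. This is a generic sketch of the technique in Python, not the hashing scheme BigCouch actually uses: the class `HashRing`, the use of SHA-256 and the number of virtual nodes are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(label: str) -> int:
        return int(hashlib.sha256(label.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each node is placed at several positions to even out the distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the first node found clockwise from the key's position."""
        if not self._ring:
            raise LookupError("the ring has no nodes")
        index = bisect.bisect_right(self._ring, (self._position(key), ""))
        return self._ring[index % len(self._ring)][1]
```

A lookup is a binary search over the sorted ring, so its cost grows only logarithmically with the number of nodes. When a node is added, the only documents that change owner are those that now fall on the new node; the others stay where they were, which is what makes growing the cluster cheap.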

5.4.1

CAP Theorem

In 2000, Eric Brewer stated5 that it is impossible for a distributed system to provide the three following guarantees at the same time:
Consistency All nodes see the same data at the same time
Availability Every request receives a response, whether it was successful or failed
Partition Tolerance The system continues to operate despite arbitrary message loss

According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three. BigCouch was chosen among NoSQL solutions for its ability to scale horizontally thanks to its data-partitioning scheme. This leaves either Consistency or Availability as a second guarantee. BigCouch is partition-tolerant by nature (replication), and trades consistency for availability. Since data must be replicated to nodes and clients do not wait for updates to propagate before reading from the database, the system is always available, though not always consistent. If the clients had to wait for a consistent state before querying the database, availability would be traded for consistency. BigCouch instead relies on eventual consistency to ultimately be consistent.

5

http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf


Figure 5.3: Visual guide to NoSQL systems from the perspective of the CAP theorem. A distributed system can achieve only two of the three following: availability, consistency, partition-tolerance. Source: http://blog.nahurst.com/visual-guide-to-nosql-systems

5.4.2

Eventual consistency

I would like to point out here that consistency can still be attained in an AP (available, partition-tolerant) system like BigCouch. Given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system, and all the replicas will be consistent. CouchDB achieves eventual consistency6 between multiple databases by using incremental replication. Incremental replication is a process where document changes are periodically copied between servers.

6 http://guide.couchdb.org/draft/consistency.html
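The idea of incremental replication can be sketched with two toy replicas. The Python example below is a strong simplification and not CouchDB's replication protocol: the class `Replica`, the integer revisions and the "highest revision wins" rule are assumptions made for the illustration, and conflict handling is ignored. Each replica logs its changes in order, and a pull only copies what was changed since the previous pull.

```python
class Replica:
    """Toy document store that records every change in an append-only log."""

    def __init__(self, name):
        self.name = name
        self.docs = {}         # doc_id -> (revision, body)
        self.changes = []      # doc ids, in the order they were changed
        self.checkpoints = {}  # source name -> number of its changes already pulled

    def put(self, doc_id, body):
        """Create or update a document locally."""
        revision = self.docs.get(doc_id, (0, None))[0] + 1
        self._store(doc_id, revision, body)

    def _store(self, doc_id, revision, body):
        self.docs[doc_id] = (revision, body)
        self.changes.append(doc_id)

    def pull_from(self, source):
        """Copy the changes made on source since the last pull; return how many."""
        start = self.checkpoints.get(source.name, 0)
        copied = 0
        for doc_id in source.changes[start:]:
            revision, body = source.docs[doc_id]
            if revision > self.docs.get(doc_id, (0, None))[0]:
                self._store(doc_id, revision, body)
                copied += 1
        self.checkpoints[source.name] = len(source.changes)
        return copied
```

Between two pulls the replicas may differ, which is the inconsistency window accepted by an AP system; once no new changes arrive, one more pull makes them identical.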

5.5

Unlimited concurrency
There should be no limit to how much is happening on the platform overall

Using Erlang allows Whistle to handle massive concurrency. As Erlang concurrency is based on lightweight processes (300 words at minimum), spawning a huge number of processes has little impact on the system’s performance, and can improve performance considerably when running on a multicore processor. Erlang does not rely on the underlying OS’s threads and processes to manage its own process pool and thus does not suffer from their limitations. In Erlang:
• Creating and destroying processes is very fast
• Sending messages between processes is very fast
• Processes behave the same way on all operating systems
• There can be a very large number of processes
Due to the immutability of Erlang data structures, there is no need for locks, which makes it easier to parallelize tasks. While executing tasks in Whistle (processing calls, data processing), the ability to spawn several processes drastically reduces execution time, and avoiding blocking is critical in a soft real-time system. Ulf Wiger, CTO of Erlang Solutions, a consulting firm for Erlang, stated7:

In order for concurrency to be useful as a fundamental modeling paradigm, the programmer must feel that processes are cheap enough you can efficiently create as many processes as the problem calls for. If there is any single defining characteristic of Erlang-style concurrency, it is that you should be able to model your application after the natural concurrency patterns present in your problem. If creating a process is perceived as expensive, programmers will reuse existing processes instead; if message passing is perceived to be expensive, various techniques will be invented to avoid sending messages.

7 http://ulf.wiger.net/weblog/2008/02/06/what-is-erlang-style-concurrency/

As indicated by the chart showing how Erlang scales on multi-core architectures, concurrency can be scaled further by using multi-core CPUs in the system’s servers. The benefits of multi-core CPUs will be noticeable, since two major components of the system are written in Erlang: Whistle, the logic layer, and BigCouch, the data store.
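The modelling style described in this section, one cheap process per concurrent activity communicating only through messages, can be approximated with a short sketch. It is written in Python with `asyncio` tasks and queues rather than in Erlang, and it is only an analogy: `asyncio` tasks are scheduled cooperatively on a single thread, whereas Erlang processes are scheduled preemptively across cores. The names `call_handler` and `run_calls`, and the message values, are made up for the example.

```python
import asyncio


async def call_handler(call_id, mailbox, results):
    """One lightweight task per call; it owns its state and reacts to messages."""
    events = []
    while True:
        message = await mailbox.get()
        if message == "hangup":
            results[call_id] = events
            return
        events.append(message)


async def run_calls(n_calls):
    """Spawn one task per call, send each a few messages, and collect results."""
    results = {}
    mailboxes = {i: asyncio.Queue() for i in range(n_calls)}
    tasks = [
        asyncio.create_task(call_handler(i, box, results))
        for i, box in mailboxes.items()
    ]
    for box in mailboxes.values():
        for message in ("ring", "answer", "hangup"):
            box.put_nowait(message)
    await asyncio.gather(*tasks)
    return results
```

Running `asyncio.run(run_calls(1000))` creates a thousand handlers, one per simulated call, without any shared mutable state between them: each handler only sees what arrives in its own mailbox.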

5.6 Directed events

The ability for message queues across boxes to be spun up and down quickly and to act as a tunnel between different explicit services without disrupting other nodes.

Directed events in Whistle are based on the messaging capabilities of the system. AMQP (Advanced Message Queuing Protocol) handles message orientation, queueing, and routing (point-to-point or publish/subscribe) in a reliable and secure way. Several AMQP implementations are available; I recommend RabbitMQ because it is open source and written in Erlang. This avoids having to manage components written in yet another language, and lets the members of the team apply their Erlang expertise to problems that may arise with RabbitMQ. Other implementations are either commercial (StormMQ) or do not fully implement the AMQP specification (ZeroMQ).

As messaging is used across organizations and inside systems, different patterns have emerged to structure the way messages are exchanged, much like design patterns in software design. Messaging patterns are reusable solutions to common problems, and were first catalogued in the Enterprise Integration Patterns⁸ book. The AMQP model describes four different entities:
• Broker
• Queues
• Exchanges
• Messages

⁸ http://eaipatterns.com/Messaging.html

The broker is the server that manages queues and exchanges. A queue acts like a mailbox and holds the messages sent by producers until they are consumed. Exchanges are logical switches that define the routing and filtering of messages. Messages are sent from producers to consumers over the network.

Figure 5.4: AMQP describes different entities that compose the system

I used three different exchange types while developing modules for Whistle and interacting with Ecallmanager:
• Fanout: 1:N delivery involving no routing.
• Direct: 1:1 straight delivery.
• Topic: routes messages to consumers using a key, based on their subscription to that key.

A particularly interesting example of how AMQP is used is the communication between Ecallmanager and WhApps. Different WhApps need different data, while at the same time several WhApps need to be notified of the same event. When a caller leaves a new voicemail, FreeSWITCH transfers this message to Ecallmanager. Different WhApps, such as the ones in charge of storing the message in the data store or sending a text transcription of that message to the callee, are bound to the exchange where Ecallmanager publishes voicemail events. By using a routing key to subscribe only to new messages, these WhApps automatically receive the new message.

Figure 5.5: Events can be routed to the right consumer using routing keys
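The behaviour of the three exchange types can be sketched with a small in-memory model. This is not RabbitMQ code and uses no AMQP client library: the `Exchange` class, the use of plain lists as queues, and the routing keys such as `voicemail.new` are assumptions made for illustration. The topic matching follows the AMQP convention in which keys are dot-separated words, `*` matches exactly one word and `#` matches zero or more words.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb zero words, or one word and stay active
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), routing_key.split("."))


class Exchange:
    """Toy exchange: queues are plain lists bound with a binding key."""

    def __init__(self, kind):
        if kind not in ("fanout", "direct", "topic"):
            raise ValueError(f"unknown exchange type: {kind}")
        self.kind = kind
        self.bindings = []  # list of (binding_key, queue) pairs

    def bind(self, queue, binding_key=""):
        self.bindings.append((binding_key, queue))

    def publish(self, routing_key, message):
        for binding_key, queue in self.bindings:
            if self.kind == "fanout":
                deliver = True  # every bound queue gets a copy
            elif self.kind == "direct":
                deliver = binding_key == routing_key
            else:
                deliver = topic_matches(binding_key, routing_key)
            if deliver:
                queue.append(message)
```

In the voicemail scenario above, the storage WhApp and the transcription WhApp would each bind their own queue to the same topic exchange, so a single published event reaches both without the publisher knowing who is listening.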

5.7 Schema flexibility

The ability to frequently upgrade data and variable structures within the entire system without bringing down clusters.

The choice of a NoSQL solution like BigCouch (a CouchDB cluster) was based on the kind of data to store and on our needs. Contrary to traditional SQL solutions like MySQL, PostgreSQL or Oracle Database, CouchDB stores data in documents as JSON structures, and those structures can be retrieved using the unique document ID generated at insertion time. As users’ needs evolve rapidly, being tied to a particular data schema could be a drawback because of its low flexibility. Data that is not needed today could be needed tomorrow or next year. I considered and compared different NoSQL solutions; as with programming languages, there is a right solution for each need, and neither NoSQL nor SQL is a silver bullet.

Whistle’s data is loosely coupled: a phone is associated with a user, a call flow is associated with a device, a voicemail message is associated with a voicemail box, and so on. We only need basic relationships, similar to foreign keys in SQL systems. A fully relational system brings many features that are not needed (joins, keys). As all this complexity is not needed in Whistle, choosing a lighter, more scalable solution is appropriate. I would like to point out that CouchDB (and BigCouch by extension) also commits to the ACID⁹ properties, which brings some of the benefits of SQL into the NoSQL world from the transaction perspective.

⁹ Properties of transactions in traditional SQL systems: Atomicity, Consistency, Isolation, Durability
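To make the schema flexibility argument concrete, here is a minimal sketch in Python. A dictionary of JSON strings stands in for the document database; the `insert` and `get` helpers and every field name (`owner_id`, `hotdesk`, and so on) are hypothetical and do not reflect Whistle’s actual documents. It shows a loose relationship expressed by storing another document’s ID, and a new field being added to one document without any migration.

```python
import json
import uuid

store = {}  # doc_id -> JSON text, standing in for a document database


def insert(doc):
    """Store a document under a generated ID and return that ID."""
    doc_id = uuid.uuid4().hex
    store[doc_id] = json.dumps(doc)
    return doc_id


def get(doc_id):
    """Retrieve a document by its unique ID."""
    return json.loads(store[doc_id])


# A user, and a device that references its owner by document ID
user_id = insert({"type": "user", "name": "Alice"})
device_id = insert({"type": "device", "owner_id": user_id, "model": "desk phone"})

# Later, a new field is added to one document without any schema migration
user = get(user_id)
user["hotdesk"] = {"id": "1001", "pin": "4321"}
store[user_id] = json.dumps(user)
```

The device document is untouched by the change made to the user document, and documents of the same type are free to carry different fields.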

5.8 Strong supervision

The ability to detect failures *very* quickly and re-spawn nodes and processes just as fast. In web servers, delays and failures of 50ms or more are acceptable - in voice applications, they’re not.

Erlang, the language used to build Whistle, provides supervision through its OTP framework. Erlang follows a "let it crash" philosophy: with correct process supervision, a crashed process is simply restarted. Processes in a supervision tree are of two types: workers and supervisors. A supervisor can supervise workers as well as other supervisors. I discussed in section 1.2.5 how important failure tolerance is, and Erlang is well suited to this role.

Figure 5.6: Crashed Erlang processes can be restarted by supervisors according to a restart strategy

Different restart strategies exist, but two are important: one for one and one for all. In a one for one strategy, if a child process terminates, only that process is restarted. In a one for all strategy, if a child process terminates, all the other children under this supervisor are terminated, and then all children are restarted. A restart frequency can also be defined, so that a child that constantly crashes because of an unmanageable problem is put aside and reported.

Why let processes crash instead of trying to handle all possible exceptions and restarting manually? First, a programmer cannot foresee all the problems that may arise in a system. Moreover, when a process crashes, the source of the error is logged and, due to the immutability of data structures, the error can be replayed and quickly fixed.
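The two restart strategies can be sketched as follows. This is a toy model in Python, not OTP’s supervisor behaviour: the `Supervisor` class, its `child_crashed` callback and the event log are hypothetical, and the restart frequency is simplified to a plain counter with no time window.

```python
class Supervisor:
    """Toy supervisor: restarts children according to a restart strategy."""

    def __init__(self, strategy, max_restarts=3):
        if strategy not in ("one_for_one", "one_for_all"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.strategy = strategy
        self.max_restarts = max_restarts
        self.restarts = 0
        self.children = {}   # name -> factory that starts the child
        self.running = {}    # name -> current child instance
        self.log = []        # (event, child name) pairs, e.g. ("start", "a")

    def start_child(self, name, factory):
        self.children[name] = factory
        self._start(name)

    def _start(self, name):
        self.running[name] = self.children[name]()
        self.log.append(("start", name))

    def child_crashed(self, name):
        """Called when a child terminates abnormally."""
        self.log.append(("crash", name))
        self.restarts += 1
        if self.restarts > self.max_restarts:
            # Too many restarts: give up and report instead of looping forever
            self.running.clear()
            self.log.append(("give_up", name))
            return
        if self.strategy == "one_for_one":
            self._start(name)
        else:
            # one_for_all: stop the surviving siblings, then restart everyone
            for sibling in self.children:
                if sibling != name:
                    self.log.append(("stop", sibling))
            for child in self.children:
                self._start(child)
```

With `one_for_one`, a crash of child `a` leaves its sibling `b` untouched; with `one_for_all`, the same crash stops `b` and restarts both, which is useful when the children depend on each other’s state.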


5.9 Speed for adding features

Telecom is growing extremely rapidly. The ability to expose new features quickly, in a reliable, scalable, distributed way is paramount to a successful platform.

This is a direct implementation of the principle of openness presented in section 1.2.2. As the market and user needs grow, the platform must remain open to extension. Customers’ needs and expectations are important, and customers will turn to competitors if what they want is not implemented quickly. A distributed system must grow in a manageable fashion, because development and the addition of features never stop. As time passes, technical debt must be limited by being careful with the code added to the product, through unit testing and integration testing.

The modularity of Whistle is a key principle that allows features to be added quickly and exposed through a public API. Whistle APIs are exposed via a REST API because of the ubiquity of HTTP-capable programming languages. REST (REpresentational State Transfer) has gained significant popularity in recent years¹⁰, mainly because of its simplicity and the very wide range of programming languages able to implement this type of API. Any client capable of making HTTP requests is capable of making REST calls. SOAP clients, by contrast, need an implementation of the SOAP specification, which can limit the capabilities of the client: out-of-date implementations, missing parts of the specification, incompatibilities between versions.

The WhApp in charge of exposing the public API is called Crossbar. Crossbar acts as a general-purpose layer between Whistle APIs and the outside world. Clients may choose to use the API as they want: from a browser-based user interface, a desktop application, a script, a phone application, etc.
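The claim that any HTTP-capable client can make REST calls can be illustrated with nothing more than Python’s standard library. The sketch below only builds the request and does not send it. The base URL, the resource path and the choice of `PUT` for updates are assumptions for the example and are not taken from the Crossbar documentation.

```python
import json
import urllib.request


def build_request(base_url, resource, payload=None):
    """Build (but do not send) an HTTP request for a JSON REST resource."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # Assumption for this sketch: reads use GET, updates use PUT
    method = "GET" if payload is None else "PUT"
    return urllib.request.Request(
        f"{base_url}/{resource}", data=data, headers=headers, method=method
    )


# Hypothetical endpoint and resource path, for illustration only
request = build_request("http://crossbar.example.com/v1", "accounts/ACCOUNT_ID/users")
# urllib.request.urlopen(request) would perform the call against a live server
```

No generated stubs or protocol-specific toolkit are involved: the client only needs to speak HTTP and to encode or decode JSON.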

5.10 Fast server provisioning

The ability to handle spikes “in the cloud” by procuring and provisioning additional resources in an instant

¹⁰ http://blog.programmableweb.com/2010/08/13/api-anti-patterns-how-to-avoid-common-rest-mistakes/

As distributed systems are more and more seen as a utility, real-time provisioning is important. Deploying Whistle is easy for clients: they can deploy more servers for the components of the system that are under high load (Ecallmanager, BigCouch, WhApps or FreeSWITCH), and rearrange their provisioning as they see fit. 2600hz is deployable on major cloud-hosting providers, like Amazon EC2¹¹ or Rackspace¹², using the deployment tool developed by 2600hz¹³. One of the advantages of deploying applications in the cloud instead of maintaining a custom infrastructure is cost reduction, in the order of 20%¹⁴. Moreover, it is important to note that the client does not need to know how to set up, manage or scale a server. He only needs to deploy his software using a deployment tool, and the cloud hosting provider takes care of the rest. Hiring an operations team can be costly, especially for small startups and for companies whose primary business is not IT.

5.11 Avoid downtimes

The ability to avoid all downtimes. Not even upgrades to software should result in downtime.

Time has shown that distributed systems built in Erlang are very resilient and reliable. The language has been battle tested in real large-scale industrial products, like the AXD301 switch in 1998, containing over a million lines of Erlang and reported to achieve a reliability of 99.999% (5.26 minutes of downtime per year). Whistle has not yet been deployed on large-scale systems and I do not have numbers to report, but the performance other companies have achieved with Erlang looks promising for Whistle.

Outages are costly on all levels: the system is not available to customers, customers get a bad image of the product and the company, and money stops flowing in. The situation is even more dangerous when other companies rely on your service to provide theirs, as is the case for hosting providers. Amazon’s cloud-hosting service EC2 was down for 3 hours last year, costing Amazon $55,000 per minute of outage¹⁵.

Usually, upgrading a system requires downtime: disconnect users, set up the update, shut down, restart, and ensure that everything is going well and that the update was successfully deployed. These downtimes are also costly, and can threaten the system if the deployed update has problems and has not been thoroughly tested in a testing environment before deployment.

Erlang features hot code swapping. Hot code swapping allows, in theory, any Erlang distributed system to run indefinitely. Updated code can be deployed in a running Whistle system without interrupting it. Note that code swapping is only possible for Erlang modules that keep the same interface. Behind the scenes, processes using the old module are redirected to the new one, which is possible thanks to the immutability of Erlang data structures.

¹¹ http://aws.amazon.com/ec2/
¹² http://www.rackspace.com/
¹³ Available at http://apps.2600hz.com/
¹⁴ http://www.informationweek.com/news/cloud-computing/software/229218825

Figure 5.7: Erlang modules can be swapped during runtime, allowing uninterrupted upgrades
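The idea behind hot code swapping can be approximated outside Erlang with a dispatch table. The Python sketch below is only an analogy and does not reproduce the Erlang code server: the `handlers` registry, the `Worker` class and the `upgrade` function are hypothetical. It shows a long-lived worker that looks up the current implementation on every call, so that replacing the implementation takes effect at the next message while the worker’s state is preserved.

```python
handlers = {"greeting": lambda state, name: f"Hello {name}"}  # version 1


class Worker:
    """Long-lived worker that keeps its state across code upgrades."""

    def __init__(self):
        self.handled = 0  # state that must survive the upgrade

    def handle(self, name):
        # The handler is looked up on every call, so a swap takes effect
        # at the next message without restarting the worker.
        self.handled += 1
        return handlers["greeting"](self.handled, name)


def upgrade(name, new_handler):
    """Replace a handler; the new version must keep the same interface."""
    handlers[name] = new_handler
```

As in the text above, the swap only works because the new handler accepts the same arguments as the old one; changing the interface would require coordinating the upgrade with every caller.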

¹⁵ http://www.socaltech.com/catchpoint__amazon_down_for_3_hours/s-0029569.html
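The downtime figures quoted in section 5.11 can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers given in the text (99.999% availability, a 3 hour outage, $55,000 per minute); the function names are made up for the example, and a non-leap year of 365 days is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def yearly_downtime_minutes(availability):
    """Expected downtime per year for a given availability (0.99999 = five nines)."""
    return (1 - availability) * MINUTES_PER_YEAR


def outage_cost(duration_minutes, cost_per_minute):
    """Total cost of an outage billed at a constant cost per minute."""
    return duration_minutes * cost_per_minute


five_nines = yearly_downtime_minutes(0.99999)  # about 5.26 minutes
ec2_outage = outage_cost(3 * 60, 55_000)       # 9,900,000 dollars
```

At the quoted rate, the 3 hour outage therefore amounts to roughly $9.9 million, which puts the value of a few extra "nines" of availability into perspective.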

Chapter 6 Functional and technical specification of a module
6.1 What is hot desking?

One of my developments was the implementation of a hot desking module. Hot desking allows itinerant users, like salespeople, to register on any phone on the network. All calls made to that person are then routed to this phone, and his voicemail box is accessible from it. How can this be achieved using the openness and component distribution of 2600hz?

The user dials a feature code on the phone, like *91. He is prompted for an ID and a PIN, and is logged in if the credentials match his profile in the database. When he is done, the user dials another feature code, *92, and is logged out of this phone.

The implementation of this feature must be considered in terms of how it integrates with the different components of the system (phone switches, data stores, messaging bus, Whistle):
• It needs to read the numbers dialed by the user: ID and PIN
• It needs to send sounds to the user’s phone: "Login incorrect", "You have been logged in"
• It must be able to locate the user in the data store and read his profile, to compare the stored ID and PIN with the user input
• It needs to redirect calls to the phone the user registered on.
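The requirements listed above can be summarized in a minimal sketch. It is written in Python with an in-memory dictionary instead of the real data store, and it is not the module implemented in Whistle: the function names, the `logged_in_on` field, the sample credentials and the "You have been logged out" prompt are all assumptions made for illustration.

```python
users = {
    # hot desk ID -> profile; values are made up for the example
    "1001": {"name": "Alice", "pin": "4321", "logged_in_on": None},
}


def login(hotdesk_id, pin, device):
    """Handle the login feature code: check credentials, then bind the device."""
    profile = users.get(hotdesk_id)
    if profile is None or profile["pin"] != pin:
        return "Login incorrect"
    profile["logged_in_on"] = device
    return "You have been logged in"


def logout(hotdesk_id):
    """Handle the logout feature code: unbind whatever device was in use."""
    users[hotdesk_id]["logged_in_on"] = None
    return "You have been logged out"


def route_call(hotdesk_id):
    """Return the device that should ring for this user, or None."""
    return users[hotdesk_id]["logged_in_on"]
```

Each function maps to one requirement: `login` reads the dialed ID and PIN and compares them with the stored profile, the returned strings are the prompts to play, and `route_call` answers the question of which phone should ring.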

6.2 Integration in 2600hz

The hot desk profile of a user (ID, PIN, roaming) is configured using the Crossbar WhApp. Crossbar exposes Whistle APIs through a REST interface, accessible from any HTTP-enabled client. The hot desk profile needs to be persisted and associated with the user’s profile information. BigCouch is in charge of storing this data, and stores it with the User record.

Any action that involves dialing a number and choosing options (press 1 for this, press 2 for that) is represented as a module of the Call flow WhApp. A call flow module acts like a state machine and describes the succession of events that will be proposed to the user. A number has to be associated with a call flow before it can be dialed. Any call flow module can retrieve the user’s actions on the dial pad and stream sounds, such as prompts, to the phone thanks to Ecallmanager. Call flow modules communicate with Ecallmanager by sending AMQP messages through the messaging bus, and listen for the messages sent by FreeSWITCH to Ecallmanager (which contain the user’s actions, for example).
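The state machine nature of a call flow module can be sketched as follows. This Python class is hypothetical and much simpler than a real Call flow module: in Whistle the actions would travel as AMQP messages to Ecallmanager, whereas here they are simply returned as tuples, and the state names and the "Enter your ID" and "Enter your PIN" prompts are invented for the example.

```python
class HotDeskLogin:
    """Call flow module as a state machine driven by dial pad events."""

    def __init__(self, check_credentials):
        self.check_credentials = check_credentials  # callable(id, pin) -> bool
        self.state = "ask_id"
        self.hotdesk_id = None

    def start(self):
        """First action sent when the feature code is dialed."""
        return ("play", "Enter your ID")

    def on_digits(self, digits):
        """Consume the digits collected from the caller; return the next action."""
        if self.state == "ask_id":
            self.hotdesk_id = digits
            self.state = "ask_pin"
            return ("play", "Enter your PIN")
        if self.state == "ask_pin":
            if self.check_credentials(self.hotdesk_id, digits):
                self.state = "done"
                return ("play", "You have been logged in")
            # Wrong credentials: start over from the ID prompt
            self.state = "ask_id"
            return ("play", "Login incorrect")
        return ("hangup", None)
```

Because the module only reacts to events and returns actions, it does not need to know how digits are collected or how sounds are streamed; those concerns stay with the components that own them.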

Figure 6.1: Flow diagram of actions required to log in a hot desk

Figure 6.2: Flow diagram of actions required to log out a hot desk

Figure 6.3: Functional diagram of hot desking integration. Once a user has registered his hot desk profile using the hot desk Crossbar module, he can be reached on any phone he logged into using his hot desk credentials. The device he is logged into is looked up from the call flow number.

Part III Actors of today’s cloud-enabled VoIP


Chapter 7 2600hz, a disruptive startup company
7.1 Presentation
2600hz is focused on massively scalable cloud telecom infrastructures. Geared towards integrators and carrier resellers, the company provides a resilient, secure and cost-effective alternative to expensive and limited traditional phone systems. It also provides open-source communications software that allows any company to become its own reliable telecom carrier while cutting operational costs in half. 2600hz is a startup company headquartered in San Francisco. Founded in 2009 by CEO Darren Schreiber and COO Patrick Sullivan, the company aims to provide open-source communication software and telephony consulting.

Private content removed


Conclusions
The integration of software components in a heterogeneous distributed environment is challenging and requires a well-thought-out design and architecture. From the switch to the HTTP APIs, 2600hz successfully managed to integrate components of various natures (phone switches, data stores, messaging bus) while ensuring that the platform can scale and that no component becomes a bottleneck as the platform and the number of users grow. No business wants to fail because it cannot handle a massive influx of clients caused by the success of its product. The core principles of distributed systems (failure tolerance, concurrency and messaging) were built into the architecture from the start. Erlang proved to be an excellent choice because of the distribution-oriented features it offers at the language level: concurrency and supervision. AMQP, as an open standard used by banks and major institutions, is a de facto reference for messaging in distributed environments. It keeps 2600hz open to future extension, especially through third-party applications, which can bring useful features to the platform and create an app store ecosystem. Failure is well tolerated by Whistle, thanks to redundancy at all levels, from switches to data store nodes.

My learnings and decisions in the design and integration of modules in Whistle were successful, because my research and the solutions I proposed were innovative (like the JSON schema validator). As I joined the Erlang team while the project was already well advanced, I supported this effort by adding more features while applying the distributed design and integration principles I demonstrated to my own work, because each developer has to embrace the same vision for the system to continue to grow in the same direction.

Knowing how to architect distributed systems is a highly sought-after skill, because it is relevant not only to VoIP but to any software solution that could benefit from a distributed design when facing scalability problems. Whistle is still young, yet its future is bright, because the VoIP market segment it is in has few direct competitors, and the innovations it brings besides scalability (distributed architecture, openness, marketplace) are in high demand among resellers and integrators.

Part IV Personal insight

7.2 San Francisco, startups and entrepreneurship

Working in San Francisco, in the tech industry and in a startup, is an enriching and unique experience that cannot be had elsewhere. San Francisco and Silicon Valley are major centers of innovation in the US, and these ecosystems and the way they work inspired me for the future. They raised my interest in startups and entrepreneurship. Everything goes so fast within a startup: what would take 10 years to build in a big company can be built in less than half that time in a startup, thanks to the absence of bureaucracy and to individual initiative. I find this way of working more appealing and exciting. While learning about startups, I read Paul Graham’s essays¹ about the entrepreneurial spirit and how to grow it, and learned a great deal from them.

In the short term, I want to continue to work in San Francisco or Silicon Valley. I still have things to learn, on the programming side as well as on the startup side, about how to turn ideas into reality. We had entrepreneurship courses at SUPINFO, from which I learned a lot about the theory, but I now need real experience. I have to meet tech entrepreneurs and listen to their stories and advice to get inspiration. In a few years, when I am able to identify a problem to solve and bring my ideas to reality, I will probably launch my own startup, whether in the US or in France. What encourages me to continue this way is that you can change the world and the way people think and act. I would be very proud to have this kind of impact, and glad to help other people with their problems.

With $10,000 you can launch your own company. The tech industry is lucky, because you do not need to acquire many physical assets to launch a business, only an idea and good programmers using free and open-source technologies. The majority of tech companies in San Francisco and Silicon Valley run on open-source software, which drastically lowers costs.
As I am interested in launching my own company in the future, being confronted with the successes and problems a startup can meet in such a dynamic environment is a priceless experience.

¹ http://www.paulgraham.com/articles.html

Figure 7.1: Startups are in a continuous learning process: from a minimum viable product, they receive user feedback and measure users’ needs to refine the product in the next iteration

I discovered that failure and risk taking are major parts of the entrepreneurial spirit of Silicon Valley, contrary to France, where failure is seen as a breakdown and severely compromises your chances of launching your next project as an entrepreneur. Startup companies are very agile and adapt constantly, and do not hesitate to pivot if they realize that their initial vision needs refinement or is not adapted to customers’ needs. The unit of progress for entrepreneurs here is learning, not execution. As a continuous learner, I recognize myself in this profile. As Eric Ries, the author of The Lean Startup, said:

"Most technology start-ups fail not because the technology doesn’t work, but because they’re making something that there is no market for."

Before, I tended to think that technology and feature building played a great part in a startup’s success, and that an updated version of an existing product, with a new technology and/or more features, would work. I was wrong. You will be able to sell your product or service only if customers need it, and if it solves a problem they have and are willing to pay to solve.

7.3 Erlang and distributed systems, from theory to practice

Before joining 2600hz, I knew some theory about distributed systems, partly from SUPINFO and partly from personal learning. When I joined the project, I had little to no experience developing in Erlang. People at 2600hz trusted me and my motivation, and I have been able to contribute to this project. I learned Erlang to the point where I can implement my own modules. I learned Erlang/OTP and how to architect a software solution the way Whistle has been conceived: distributed and connected. As more and more applications are going to be designed in a distributed way to benefit from all the advantages I presented, the skills I acquired are invaluable and will serve as a foundation for my future, either as a programmer or as a startup founder.

I faced limitations though, mainly in core VoIP knowledge. I am not very strong at VoIP, and sometimes, when implementing VoIP features, I felt frustrated by the VoIP requirements and by my lack of the advanced skills needed to implement the feature in Erlang. VoIP was more of a limiting factor than Erlang. As my internship was in an American company, I improved my English skills, both written and spoken. I learned to communicate as efficiently as possible and to be clear in my thoughts so that I am understood.

Bibliography
[1] 2600hz company’s blog. http://blog.2600hz.com/.
[2] 2600hz company’s wiki. http://wiki.2600hz.com.
[3] Joe Armstrong. Making reliable distributed systems in the presence of software errors. PhD thesis, Swedish Institute of Computer Science, 2003.
[4] Francesco Cesarini and Simon Thompson. Erlang Programming. O’Reilly Media, Inc., 2009.
[5] George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair. Distributed Systems, Concepts and Design. Addison-Wesley, 5th edition, 2011.
[6] Google Code University. Introduction to distributed system design, http://code.google.com/edu/parallel/dsd-tutorial.html.
