Design and Implementation of a SIP-based Centralized Multimedia Conferencing System

Author: Arun G., M.E. (C.S.E.), Vinayaka Mission Kirupananda Variyar Engineering College, Salem.

Multimedia conferencing has become an active topic in communications in recent years. A few multimedia conferencing products based on H.323 already exist. SIP, which is a more feasible protocol, is now being considered as the call signaling protocol for conferencing. Most research on SIP-based multimedia conferencing, however, has remained at the level of theory or experiment. In this paper, we propose a feasible framework for SIP-based centralized multimedia conferencing that meets the requirements of the standards and builds on the XCON framework proposed by the IETF. We also present an actual implementation of the centralized conferencing server, built on open source software, together with several accessory components for multimedia and data collaboration applications that were also implemented in our laboratory. This paper introduces the overall architecture of the practical system and describes the conference process flows in detail.

Multimedia conferencing has been on the research agenda for many years. It is attractive to people who do not want to spend the time and money to fly around the world for meetings that require face-to-face contact. International standardization bodies have defined protocols to meet the pressing need for multimedia conferencing. The International Telecommunication Union (ITU) produced the H.323 standard [1], and the Internet Engineering Task Force (IETF) developed SIP [2]. Comparing the two protocols [3], H.323 is widely deployed and more mature because of its early adoption by the market, but it has several problems, including scalability issues caused by the inadequate T.124 database replication protocol and its restriction to the binary ASN.1 format. SIP, in contrast, is more lightweight, flexible, and extensible. It is a text-based protocol that can easily interact with other Internet protocols, and it is gradually becoming popular because of these characteristics. The IP Multimedia Subsystem (IMS), a standardized next generation networking (NGN) architecture, uses SIP as its core protocol. It is therefore reasonable to enrich the SIP conferencing architecture with features that H.323 lacks.

SIP Conferencing
The conferencing requirements [9] and framework [10] published by the SIPPING working group predate those of the XCON working group. We refer to the system described by the SIPPING working group as SIP conferencing; it uses SIP for session management and conference control. As shown in Fig. 1, the SIP conferencing architecture consists of a centralized conference server and a number of participants. A focus is a SIP user agent responsible for managing the conference using the SIP signaling protocol. The focus, which is addressed by a conference URI, maintains a SIP signaling relationship with each participant and handles requests from participants by referring to the conference policies. The conference policy, stored in the policy server, contains the rules that guide the decisions of the focus when it handles the various conference requests from participants. A conference notification service is a logical function provided by the focus. The focus can act as a notifier [11], accepting subscriptions to the conference state and notifying subscribers about changes to that state. A mixer is responsible for handling the multimedia streams and generating output streams that can be distributed to participants. It is also controlled by the focus.

II. DESIGN OF CONFERENCING ARCHITECTURE

In this section, we first list the characteristics of this conferencing system and then explain the design of the conferencing framework. The SIP-based centralized multimedia conferencing system architecture we designed is a development of the XCON model mentioned previously. Because of the strengths described in Section 1, we select SIP as the call signaling protocol.

A. Characteristics
To build a usable centralized multimedia conferencing system, we introduce four major functions.

Besides the scenario in which participants dial in to a conference on their own initiative, which has already been implemented in earlier work, the conference can also dial out to users who are already registered with the server. The dial-out list can be determined at the beginning of a meeting or extended during the conference by someone (e.g., the moderator of a conference) who has the right to invite participants.
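The dial-out rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the system's implementation: the class `DialOutList`, the role names, and the example user names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical role set; the paper names only the moderator as an example
# of someone who has the right to invite participants.
ROLES_ALLOWED_TO_INVITE = {"moderator"}


@dataclass
class DialOutList:
    """Users the conference will call, kept in insertion order."""
    registered_users: set                       # users registered with the server
    entries: list = field(default_factory=list)

    def add(self, user: str, requester_role: str) -> bool:
        """Add a user if the requester may invite and the user is registered."""
        if requester_role not in ROLES_ALLOWED_TO_INVITE:
            return False
        if user not in self.registered_users or user in self.entries:
            return False
        self.entries.append(user)
        return True


dial_out = DialOutList(registered_users={"alice", "bob"})
ok_alice = dial_out.add("alice", requester_role="moderator")    # accepted
ok_bob = dial_out.add("bob", requester_role="participant")      # no right to invite
ok_carol = dial_out.add("carol", requester_role="moderator")    # not registered
```

The same `add` call serves both cases in the text: building the list before the meeting and extending it while the meeting is running.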







We can reserve a conference for any time in the future. When the conference begins, it invites the users in the dial-out list.

The conference server can selectively send full or partial conference state information to authorized participants who have subscribed to notifications with different demands.

This multimedia conferencing system supports more than the audio and video offered by traditional products. It can properly be called a data conferencing system, as it also enables document and image exchange among multiple participants, desktop sharing (which lets users remotely view and control one another’s desktops), whiteboard, text messaging, and chat.

B. Framework
The architecture of the centralized multimedia conferencing system is shown in Fig. 3. We can divide the whole system into two parts: server and participants. A conference participant can be a personal computer or a phone. A telephone, including a mobile phone, can join a conference as long as it is SIP-compliant and correctly supports SIP dialogs; we call such a device a SIP phone. People using a SIP phone can enter a conference to talk with other participants, and can also send their own video or receive video of the speakers if the SIP phone has a camera and a screen. Someone who wants to use a computer as the conference client can download a software package called SIPHELLO [14] from the homepage of our laboratory’s website; we call it the soft terminal. Our group extended SIPHELLO with a CCP client and a BFCP client, and made it support chat, application sharing, and data collaboration.

We use an open source software package called Asterisk [15] as the basic server, because it provides a SIP stack and many useful applications. In fact, an Asterisk component called MeetMe implements a simple ad-hoc conference function [4]. Although our focus has a different structure, we keep the name MeetMe for our enhanced version, which replaces the original. We call the resulting software the Center Conference Server. It has several logical components: focus, CCP server, BFCP server, MCP client, and XCAP (XML Configuration Access Protocol) [16] client. We detail the architecture of the center conference server in Section 5.

Because the heavy load of media processing becomes the bottleneck of the whole system in centralized conferencing, we move the media processing function out of the original server and use a set of media servers to share the load. When a participant enters a conference, an RTP stream is established between the participant and one media server in the set. The focus controls the media server through a protocol called MCP, which we defined ourselves. We use Media Mixer [17] to process (e.g., receive, send, mix, distribute) media streams such as audio and video. Accordingly, an MCP client resides on the center conference server, and an MCP server resides on the media server.

The MSRP server developed by our group conforms to the IETF standards and is based on an MSRP library [18]. Participants in a meeting can chat, exchange data, and share desktops and applications through MSRP sessions. The server behaves as a switch. It communicates with the center conference server using MCP, for which we also made an MSRP version. To a certain extent, the media server and the MSRP server play the same role in the system.

The center conference server sends PUBLISH requests [19] to the presence server when the conference state changes. (PUBLISH is an extension method of SIP.) The presence server transmits the conference state in NOTIFY requests [11], also a SIP extension, to each participant according to their subscription. Normal SIP phones cannot process PUBLISH and NOTIFY; only special clients that support these methods can use these functions in our conference.

The database server stores information about conferences and users; the e-mail server can send mail to tell people how to access conferences and to invite them to dial in to the meeting; and the manager terminal lets administrators manage the system.

Fig. 4 shows part of the protocol layer architecture of this system. At the transport layer, we use both TCP and UDP. At the application layer, SIP, used by both the focus and the presence server, and RTP, used by the media server, run over UDP. BFCP, CCP, MSRP, and MCP run over TCP.
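The protocol-to-transport assignment just described can be written down as a small lookup table. The sketch below only restates the mapping from the text; the names `TRANSPORT` and `protocols_over` are hypothetical and introduced for illustration.

```python
# Transport used by each application-layer protocol, as listed in the text.
TRANSPORT = {
    "SIP": "UDP",    # focus and presence server
    "RTP": "UDP",    # media server
    "BFCP": "TCP",
    "CCP": "TCP",
    "MSRP": "TCP",
    "MCP": "TCP",
}


def protocols_over(transport: str) -> list:
    """Return the protocols that run over the given transport, sorted by name."""
    return sorted(name for name, t in TRANSPORT.items() if t == transport)


udp_protocols = protocols_over("UDP")   # ['RTP', 'SIP']
tcp_protocols = protocols_over("TCP")   # ['BFCP', 'CCP', 'MCP', 'MSRP']
```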

In this section, we describe the flows of the conference running mechanism. They mainly comprise conference creation, running, and destruction; participants joining and leaving; and the various operations participants can perform. Participants have different privileges depending on their roles in the meeting. A special role called moderator has the highest priority.

A. Conference Creation
There are two kinds of conference data model in this system: the registered conference and the active conference. People who want to hold a meeting, whether it starts immediately or is reserved for a future time, should first send CCP requests over a direct connection between the client and the CCP server to ask the server to create a registered conference. Users can query the blueprints the system provides. A conference blueprint is a static conference object that describes a typical conference setting supported by the system. A user can choose one blueprint and ask the system to build a registered conference from it, adding the start time and duration of the meeting, or a list of people to invite via e-mail or to dial out to directly at the beginning of the meeting. A user who is familiar with the system can also make a template and request the server to build a meeting designed to his own liking.

An active conference is actually another state of a registered conference. When the first participant enters the conference room, an active conference object is created, and the system initializes it from the registered conference. The active conference ends when no participant remains in the meeting room or when the conference exceeds its duration, which is defined when the meeting is built and can be modified during the conference.
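The relationship between the two data models can be illustrated with a minimal sketch. The class and field names below are hypothetical, and measuring the duration from the registered start time is an assumption; the paper does not give the actual data layout.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredConference:
    uri: str
    start: float        # reserved start time, in seconds
    duration: float     # may be modified while the conference runs
    dial_out: list = field(default_factory=list)


@dataclass
class ActiveConference:
    registered: RegisteredConference
    participants: set = field(default_factory=set)

    def should_end(self, now: float) -> bool:
        """End when the room is empty or the duration has been exceeded."""
        # Assumption: the duration is counted from the registered start time.
        expired = now > self.registered.start + self.registered.duration
        return not self.participants or expired


def join(active_by_uri: dict, reg: RegisteredConference, user: str) -> ActiveConference:
    """The first participant creates the active conference; later ones reuse it."""
    active = active_by_uri.get(reg.uri)
    if active is None:
        active = ActiveConference(registered=reg)
        active_by_uri[reg.uri] = active
    active.participants.add(user)
    return active


active_conferences = {}
reg = RegisteredConference(uri="sip:conf1@example.com", start=0.0, duration=3600.0)
conf = join(active_conferences, reg, "alice")   # first entry activates the conference
```

Because the active conference keeps a reference to its registered conference, a change to `reg.duration` during the meeting takes effect the next time `should_end` is evaluated.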

B. Flows of Participants Entering a Conference
Fig. 5 illustrates the dial-in flow of this system. In this scenario, a user knows the conference URI and dials in to join the conference. The first INVITE request carries information about the client device in its SDP [20] body. The client should support audio at least. If the meeting is private, the focus authorizes the participant by prompting the user, through its IVR (Interactive Voice Response) function, to key in an access password. If the user is the first participant of the conference, this INVITE activates the focus and starts a new active conference; the conference URI, however, must have been reserved before its use. If the conference is already up and running, the focus joins the dialing-in participant to the conference. After that, the focus sends a re-INVITE request to the participant with SDP containing the media, MSRP, and BFCP information that the focus obtains from the media server, MSRP server, and BFCP server, respectively. The new participant is truly present in the conference once the SDP has been negotiated in this second transaction. The focus then sends a PUBLISH request to the presence server to announce that a new participant has entered the meeting, and the presence server notifies the participants of this change. If the newcomer subscribes to the conference notification, the presence server also sends a NOTIFY message to him. It is important to note that the new participant’s SUBSCRIBE and the NOTIFY messages to other participants do not depend on each other; they occur asynchronously and independently.

Another way of adding participants to a conference, called the dial-out method, is initiated by the conference server. There are two scenarios for this method. In the first, at the reserved start time of a conference, the focus checks the list of people who should be invited and dials out to those on the list who are not yet present at the meeting. In the second, the moderator of a conference wants to add someone to the running meeting and asks the focus to add the participant by the dial-out method. The dial-out flow is the same as the dial-in flow except for the first INVITE transaction. At the beginning, the focus sends the participant an INVITE containing a Contact header field with the conference URI. The SDP of this message contains the audio information of the center conference server, which is used by the IVR module. The remaining steps are the same as in the dial-in flow. If the conference is just beginning and no one is in the conference room, the conference does not become active until the first person answers the call.
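The two joining flows differ only in their first transaction, which the following sketch makes explicit. It is a simplified illustration: the function `join_flow` and the step labels are hypothetical, SIP responses and ACKs are omitted, and the two NOTIFY steps are listed in a fixed order although the text states that they occur asynchronously.

```python
def join_flow(dial_out: bool, private: bool, subscribed: bool) -> list:
    """Return the main signaling steps for one participant joining a conference."""
    if dial_out:
        # Dial-out: the focus calls the participant first.
        steps = ["focus -> participant: INVITE (Contact: conference URI, SDP: IVR audio)"]
    else:
        # Dial-in: the participant calls the conference URI.
        steps = ["participant -> focus: INVITE (SDP: client device, audio at least)"]
    if private:
        steps.append("focus <-> participant: IVR password prompt")
    steps += [
        "focus -> participant: re-INVITE (SDP: media, MSRP, BFCP)",
        "focus -> presence server: PUBLISH (participant entered)",
        "presence server -> existing subscribers: NOTIFY",
    ]
    if subscribed:
        # Independent of the NOTIFY messages sent to the other participants.
        steps.append("presence server -> new participant: NOTIFY")
    return steps


dial_in_steps = join_flow(dial_out=False, private=True, subscribed=False)
dial_out_steps = join_flow(dial_out=True, private=False, subscribed=True)
```

Comparing `dial_in_steps[1:]` with the tail of `dial_out_steps` shows that everything after the first INVITE follows the same pattern in both cases.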

C. End a Session
There are three ways to terminate a session: the participant hangs up the call, the moderator removes the participant from the conference, or the conference ends at the reserved time. The three ways differ in who sends the BYE request. After terminating a session, the focus asks the media server, MSRP server, and presence server to stop communicating with the participant, and then notifies the other participants in the conference.

In a running conference, the moderator can manipulate the conference via CCP, and participants can request various rights via BFCP. We do not detail these system functions in this article.

The focus is the most important part of the center conference server. It maintains a registered conference list and an active conference list, and logs the conference state. Its monitor module checks the start and end times of conferences to decide whether to begin a new conference or stop a running one. This module also starts a new thread for each dial-out transaction. Every participant has a dedicated thread in the focus. The SIP stack processes SIP signaling, and the IVR module communicates with users for authorization. The functions of the CCP and BFCP servers are described in the IETF standards mentioned in Section 2. We extend some commands for conference control, with reference to the XCON Scheduler protocol in CONFIANCE. Our group built the BFCP server by modifying the prototype libBFCP [21]. The MCP client communicates with the MCP servers located on the media server and the MSRP server via MCP, which we defined ourselves. This module encapsulates the interface of the media and MSRP control commands. The XCAP client publishes the conference state to the XCAP server on the presence server.
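One pass of such a monitor module might look like the sketch below. It is an assumption-laden illustration rather than the actual Asterisk-based code: the function names, the tuple layout of a registered conference, and the `already_dialed` bookkeeping are all hypothetical.

```python
import threading


def monitor_tick(now, registered, active, already_dialed):
    """Decide what the monitor should do at time `now`.

    registered:     uri -> (start, end, invitees)
    active:         uri -> set of users currently present
    already_dialed: set of (uri, user) pairs dialed on an earlier pass
    Returns (calls_to_place, conferences_to_stop).
    """
    calls, stops = [], []
    for uri, (start, end, invitees) in registered.items():
        if now >= end:
            if uri in active:
                stops.append(uri)           # a running conference has reached its end
        elif now >= start:
            present = active.get(uri, set())
            for user in invitees:
                if user not in present and (uri, user) not in already_dialed:
                    calls.append((uri, user))
    return calls, stops


def place_calls(calls, dial_out, already_dialed):
    """Start one thread per dial-out transaction and wait for all of them."""
    threads = []
    for uri, user in calls:
        already_dialed.add((uri, user))
        t = threading.Thread(target=dial_out, args=(uri, user))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()


registered = {"sip:conf1@example.com": (100, 200, ["alice", "bob"])}
active = {"sip:conf1@example.com": {"alice"}}
dialed, placed = set(), []

calls, stops = monitor_tick(150, registered, active, dialed)
place_calls(calls, lambda uri, user: placed.append(user), dialed)
late_calls, late_stops = monitor_tick(250, registered, active, dialed)
```

Separating the decision (`monitor_tick`) from the side effects (`place_calls`) keeps the timing logic easy to check without real SIP traffic; a real focus would pass a function that sends the dial-out INVITE instead of the lambda used here.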





The center conference server is the key part of the centralized conferencing system. As Fig. 6 shows, this server is made up of the focus, CCP server, BFCP server, MCP client, and XCAP client.

Figure 6. Structure of center conference server

In this paper, we have presented a design for a SIP-based centralized multimedia conferencing system. The system offers many new functions that meet advanced requirements, such as the dial-out method and conference reservation. Users can choose a suitable soft terminal to use the full range of functions, or use a normal SIP phone as an ordinary participant. The scale of a conference can be enlarged by increasing the number of media servers. In the future, we plan to design a distributed conferencing architecture that supports a much larger scale in a global enterprise environment consisting of a large number of heterogeneous networks and devices. In addition, the sidebar conference mentioned in the XCON framework standards is another target.

REFERENCES
[1] ITU-T Rec. H.323, “Packet based Multimedia Communications System,” Telecommunication Standardization Sector of ITU, July 2003.
[2] J. Rosenberg, H. Schulzrinne et al., “SIP: Session Initiation Protocol,” RFC 3261, IETF, June 2002.
[3] K. Katrinis, S. Zurich, G. Parissidis, and B. Plattner, “A Comparison of Frameworks for Multimedia Conferencing: SIP and H.323,” 8th IASTED International Conference on Internet and Multimedia Systems and Applications, 2004.
[4] M. Barnes, C. Boulton, and O. Levin, “A Framework for Centralized Conferencing,” RFC 5239, IETF, June 2008.
[5] G. Camarillo, J. Ott, and K. Drage, “The Binary Floor Control Protocol (BFCP),” RFC 4582, IETF, Nov. 2006.
[6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications,” RFC 3550, IETF, July 2003.
[7] B. Campbell, R. Mahy, and C. Jennings, “The Message Session Relay Protocol (MSRP),” RFC 4975, IETF, Sep. 2007.
[8] CONFIANCE.
[9] O. Levin and R. Even, “High-Level Requirements for Tightly Coupled SIP Conferencing,” RFC 4245, IETF, Nov. 2005.
[10] A. Roach, “Session Initiation Protocol (SIP)-Specific Event Notification,” RFC 3265, IETF, June 2002.
[11] R. Even and N. Ismail, “Conferencing Scenarios,” RFC 4597, IETF, August 2006.