
Java-based Multimedia Collaboration and Application Sharing Environment

H. Abdel-Wahab, O. Kim, P. Kabore, and J.P. Favreau


Multimedia & Digital Video Technologies Group
Information Technology Laboratory
National Institute of Standards and Technology
Gaithersburg, MD 20899
{wahab, kabore, okim, favreau}@snad.ncsl.nist.gov

H. Abdel-Wahab is NIST faculty and Professor of Computer Science at Old Dominion University, Norfolk, Virginia.

Abstract
Multimedia desktop conferencing systems that include audio, video and application sharing are gaining momentum and increasingly becoming a reality. This paper describes one such system, called the Java Collaborative Environment (JCE). The application sharing component of JCE allows conference participants to share any application written in Java using its Abstract Windows Toolkit (AWT). The audio and video components of JCE are based on the new Java Media Framework (JMF). In this paper, we describe the problems encountered when using JMF in a conference with a large number of participants. The problems are caused by using the Real-time Transport Protocol (RTP) within JMF for sending and receiving many audio and video streams. Our solutions to these problems are based on a video multiplexer and an audio mixer: the video multiplexer displays the videos of many participants in a few video windows, and the audio mixer combines all audio streams into one.

KEY WORDS: Multimedia Desktop Conferencing, Application Sharing, Multicasting, Java Media Framework, RTP and RTCP

1 Introduction
As the Internet has gained popularity over the past decade, the need for collaborative multimedia conferencing and application sharing systems has risen significantly. Application sharing allows participants to view and interact with the same application (e.g., a spreadsheet) during their conference. These systems are beginning to play large roles in research and education as well as business.

JCE (Java Collaborative Environment), developed by NIST in collaboration with Old Dominion University, uses Java-based collaboration mechanisms that overcome the platform-dependency problems of collaborative computing in heterogeneous systems. Java's main feature, which is making it an increasingly popular programming language, is that the bytecodes it produces can run on any platform that has a Java Virtual Machine. This enables application developers to write an application once and have it run anywhere. In JCE, mechanisms were developed to intercept, distribute and recreate user events so that Java applications can be shared transparently. All the Graphical User Interface (GUI) components defined in Java's standard Abstract Windows Toolkit (AWT) package (java.awt) are extended in JCE to implement these mechanisms. JCE also includes libraries and utilities that deal with low-level issues such as network communications (e.g., multicasting data between the conference participants), conference management (e.g., joining and leaving a session), and floor control management. Session participants can interact with each other in real time and remotely work together as a team. A comprehensive description of the JCE application sharing mechanisms is given in [1].

The focus of this paper is on extending JCE with the Java Media Framework (JMF) to offer both audio and video. This gives us an integrated, platform-independent desktop conferencing system. The paper is organized as follows. Section 2 describes the JCE user interface and functions. Section 3 describes the problems encountered when using JMF in a conference with a large number of participants. Section 4 describes a video multiplexer that deals with the video problem, while Section 5 describes an audio mixer that deals with the audio problem. Section 6 discusses a distributed implementation of both components, and Section 7 gives our conclusions.
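To make the intercept-distribute-recreate idea concrete, the following is a minimal sketch of how an AWT component can be extended for sharing. The class and the EventDistributor interface are hypothetical illustrations, not the actual JCE classes:

```java
import java.awt.AWTEvent;
import java.awt.Button;
import java.awt.event.ActionEvent;

// Hypothetical transport for forwarding events to the other participants.
interface EventDistributor {
    void send(String componentName, String command);
}

public class SharedButton extends Button {
    private final EventDistributor distributor;

    public SharedButton(String label, EventDistributor distributor) {
        super(label);
        this.distributor = distributor;
        // Deliver ACTION events to processActionEvent even without listeners.
        enableEvents(AWTEvent.ACTION_EVENT_MASK);
    }

    // Intercept the local event and distribute it before acting on it locally.
    protected void processActionEvent(ActionEvent e) {
        distributor.send(getName(), e.getActionCommand());
        super.processActionEvent(e);
    }

    // Recreate an event that arrived from a remote participant.
    public void replayRemoteEvent(String command) {
        super.processActionEvent(
            new ActionEvent(this, ActionEvent.ACTION_PERFORMED, command));
    }
}
```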

2 JCE Interface and Functions


Figure 1 shows the main interface of JCE. Through this interface, a person can join any of the ongoing conferences or start a new conference. We limit the number of conferences that a person may join to one since, practically speaking, audio and video devices can only be used in the context of one conference. The JCE interface has the following basic components:

[Figure 1: Screen dump of the JCE main interface]

1. Menu Bar: contains the traditional File menu and the Tools Available menu.

2. Participants List: contains the names of the current participants. Clicking on a name, e.g., wahab, in this list will display an image of wahab and other known information about wahab (full name, email, phone). The image can be snapped live by wahab's camera and displayed as a still image, or it can be retrieved from a well-known archive directory if the picture cannot be obtained live (i.e., wahab's camera is turned off).

3. Tool List: contains the names of the tools being shared.

4. Floor Control: shared tools can be used by all participants according to several "Floor Choices". Among these choices is "Request and Get", where a participant may click on the Request button to obtain the floor of the shared tools (thereby preempting the current floor holder). A user may release the floor at any time by clicking on the Release button.

5. Audio Control: allows the user to increase, decrease, or mute both the speaker and the microphone. Muting the speaker stops playing any audio, while muting the microphone stops recording any audio.

6. Video Control: turns the camera on/off. Turning the camera off prevents any video from being sent out of the local host.

7. Video Windows: show N participants. Practically, N should not be more than 4 windows. If the number of participants is greater than N, we use the "Floor Control" mechanism to decide which participants should occupy the N video windows. Figure 2 shows the JCE video windows when N = 2.

[Figure 2: Screen dump of the JCE video interface]

3 Adapting JMF for Large Groups


The Java Media Framework (JMF), developed by Sun Microsystems, Silicon Graphics Inc. and Intel Corp., provides a platform-neutral framework for capturing and displaying time-based media such as audio and video [3, 6, 11, 12]. JMF is designed to support most standard media content types and can be used to synchronize and present time-based media from diverse sources. This is in contrast to most existing media players for desktop computers, which are heavily dependent on native code. JMF provides an abstraction that hides device implementation details from application developers and scales across different protocols, delivery mechanisms, and media types. In addition, JMF makes it easy to incorporate any type of media in client applications and applets, while maintaining the flexibility needed for more sophisticated applications and platform customization. Application programmers can use JMF to create and control any standard media type using a few simple method calls. Technology providers can extend JMF to support additional media formats or perform custom operations by creating and integrating new types of media controllers, media players, and media data sources.

JMF has two major parts: media playing and media capture. While the JMF media player has been released, JMF media capture is still under development. Most current applications of JMF play recorded media stored in files. Since media capture capabilities are essential for multimedia conferencing applications, we decided to develop our own capture programs for a limited number of media formats and devices on UNIX and Windows platforms. While these programs are written in Java, they use native methods for device handling.

JMF uses the Real-time Transport Protocol (RTP) [13] for handling audio and video streams. The problem with using RTP in JMF is that, for each individual media stream, JMF creates a separate media player. Thus, if there are M participants in a collaborative session, JMF creates M video and M audio players. This is acceptable only if M is a small number (e.g., 2-5 participants). However, for many applications, such as the Interactive Remote Instruction (IRI) system [8] for distance learning, the number of participants can be as large as 32 (a typical IRI session has between 12 and 32 students in addition to the teacher). In this kind of application, it is neither possible nor manageable to create such a large number of audio and video players at each participant's workstation. In the following, we discuss our solution to make the number of JMF video players (N) a small fixed number independent of the number of participants (M). Typically, the relationship between N and M can be expressed as follows:

M >= N and 1 < N < 5.


In addition, we will discuss how to effectively manage and control the N video windows among the M participants. At any time, each participant can show his/her captured video in any one of the N shared windows. Moreover, we show how to reduce the number of JMF audio players from M to just one using an audio mixing technique. Our scheme allows each participant to speak whenever he/she wants.
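As a rough illustration of the "few simple method calls" involved in playing a stream, the sketch below creates and starts a JMF player. It assumes the JMF RTP player accepts an rtp:// MediaLocator of this form; the address shown is the example multicast group used later in this paper:

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;

public class RtpPlayerExample {
    public static void main(String[] args) throws Exception {
        // Locator form and address are illustrative only.
        MediaLocator loc = new MediaLocator("rtp://224.144.251.104:49152/video");
        Player player = Manager.createPlayer(loc); // JMF selects a suitable player
        player.start();                            // acquire resources and begin rendering
    }
}
```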

4 The JMF Video Multiplexer


Figure 3 illustrates our solution to the problem of reducing the number of video players for a collaborative session with a large number of participants. In this figure, we have three participants (user1, user2 and user3). At each participant's workstation, there are two JMF video players that display the video streams in two windows (W1 and W2). Under this configuration, only two participants at any given time are allowed to capture and send their video streams for display in the two JMF windows. The question now is: how do we restrict the set of active senders to only two participants at any given time?

[Figure 3: The JMF video multiplexer]

4.1 Floor Control Component


Our answer to the above question is based on a "Floor Control" mechanism [4]. Accordingly, we associate a "token" with each video window, and a participant must obtain the token associated with a window before he/she is allowed to send his/her video stream for display in that JMF window. In the example shown in Figure 3, since we have two windows, W1 and W2, the floor control has only two tokens to manage. In the state shown in Figure 3, User1 has the token for W1 and User3 has the token for W2. User2, in this state, is not allowed to send his/her video; he/she must first negotiate, through the Floor Control component, to replace one of the other two users before his/her video can appear in one of the two windows.

There are many possible policies and GUI interfaces for floor control [4]. One such policy/interface is "Click and Grab", where a participant may click on any desired video window Wi (in our example, i is either 1 or 2), or push a button associated with the desired window, to grab the token of that window. This preempts the current token holder from sending his/her video to window Wi and allows the new participant to replace him by capturing and sending his video to Wi. It is the responsibility of the Floor Control component to ensure that at most one participant is allowed to capture and send video to any given window at any given time.
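The following is a minimal sketch, with our own illustrative naming rather than the actual JCE code, of the invariant the Floor Control component maintains: one holder per window token, with "Click and Grab" preemption simply replacing the holder:

```java
public class FloorControl {
    // holder[i] is the participant holding the token for window i+1, or null.
    private final String[] holder;

    public FloorControl(int numWindows) {
        holder = new String[numWindows];
    }

    // Grab the token for window w (1-based); returns the preempted holder,
    // if any, so that participant can be told to stop sending.
    public synchronized String grab(int w, String participant) {
        String preempted = holder[w - 1];
        holder[w - 1] = participant;
        return preempted;
    }

    // Release the token, but only if the caller actually holds it.
    public synchronized void release(int w, String participant) {
        if (participant.equals(holder[w - 1])) {
            holder[w - 1] = null;
        }
    }
}
```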

4.2 RTP protocol header


JMF requires the senders of audio and video streams to use the RTP protocol [13]. Although it is not a requirement of the RTP protocol, each RTP packet is usually sent as a UDP packet which, in turn, is sent as an IP packet. Each RTP packet contains an RTP header with the fields shown in Figure 4.

[Figure 4: The RTP header]

Here is a brief description of each field contained in an RTP header:

Payload Type: indicates what type of data is contained in the packet, e.g., for video frames, what type of encoding and compression is used.

Sequence Number: used by the receiver to estimate how many packets are being lost.

Time Stamp: used by the receiver to reconstruct the timing produced by the sender, and used for synchronization of the audio and video streams.

SSRC (sender source): an identifier field used to distinguish each stream. The RTP protocol [13] specifies the rules needed to ensure that the SSRC of each stream is globally unique within the session. JMF creates a new player of the appropriate type for each SSRC identifier.
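As an illustration, the following sketch extracts these four fields from a received packet, assuming the standard RTP fixed-header layout [13] (network byte order, sequence number at bytes 2-3, timestamp at bytes 4-7, SSRC at bytes 8-11):

```java
import java.nio.ByteBuffer;

public final class RtpHeader {
    public final int payloadType;    // 7 bits: media encoding of the payload
    public final int sequenceNumber; // 16 bits: detects loss and reordering
    public final long timestamp;     // 32 bits: media timing for playout/sync
    public final long ssrc;          // 32 bits: identifies the sending source

    public RtpHeader(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet); // big-endian by default
        int b0 = buf.get() & 0xFF;  // version/padding/extension/CSRC count (unused here)
        int b1 = buf.get() & 0xFF;  // marker bit plus payload type
        payloadType = b1 & 0x7F;
        sequenceNumber = buf.getShort() & 0xFFFF;
        timestamp = buf.getInt() & 0xFFFFFFFFL;
        ssrc = buf.getInt() & 0xFFFFFFFFL;
    }
}
```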

4.3 Video Sender Component


Once a video sender component has the token for window Wi, it captures the user's video, encapsulates each fragment of a video frame with an RTP header, and sends it as a UDP packet to the Video Multiplexer at UDP port Ri. As soon as the token is seized by some other sender (via the mediation of the Floor Control, as described earlier), the sender stops its capture-and-send operation. It restarts only when it gets a token for a window. Note that the sender might change the destination port number of the UDP packets during its operation, depending on which window token it holds: for window Wi it sends the UDP packets to port Ri, where 1 <= i <= N and N is the number of available video windows.
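A minimal sketch of this send path follows. The host name, the base-port scheme for Ri, and the token callbacks are illustrative assumptions, and the capture/packetization step is omitted:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class VideoSender {
    private final DatagramSocket socket;
    private final InetAddress muxHost;
    private final int basePort;       // assumed scheme: port R_i = basePort + i
    private volatile int window = -1; // -1 means no token held

    public VideoSender(String muxHostName, int basePort) throws Exception {
        this.socket = new DatagramSocket();
        this.muxHost = InetAddress.getByName(muxHostName);
        this.basePort = basePort;
    }

    public void tokenGranted(int i) { window = i; }   // notified by Floor Control
    public void tokenSeized()       { window = -1; }  // preempted: stop sending

    // Send one RTP-wrapped video fragment to the multiplexer port for our window.
    public void sendFragment(byte[] rtpPacket) throws Exception {
        int w = window;
        if (w < 0) return;            // no token: capture is suspended
        socket.send(new DatagramPacket(rtpPacket, rtpPacket.length,
                                       muxHost, basePort + w));
    }
}
```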

4.4 Video Multiplexer Component


As shown in Figure 3, the Video Multiplexer component has N UDP receiving sockets, R1, R2, ..., RN, and one sending socket S. For each video stream i, i = 1, 2, ..., N, the multiplexer keeps a constant SSRC_i and a counter SeqNum_i. The RTP header of each datagram received from socket Ri is modified as follows:

SeqNum_i = SeqNum_i + 1
Header.SeqNum = SeqNum_i
Header.SSRC = SSRC_i

Note that the two remaining fields of the RTP header, Payload Type and Time Stamp, are not modified by the multiplexer. After the RTP header of a packet is modified, the packet is sent out of socket S to the multicast address VidGrp. Each JMF video player creates a UDP socket VG and joins it to the multicast group VidGrp. The VidGrp is specified by the user as a pair of an IP multicast address and a port number, e.g., 224.144.251.104:49152.
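The sketch below shows the per-window rewrite loop under these rules. It is an illustration rather than our production code, and it assumes the standard RTP header offsets (sequence number at bytes 2-3, SSRC at bytes 8-11):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class WindowForwarder implements Runnable {
    private final DatagramSocket recv;  // socket R_i
    private final DatagramSocket send;  // socket S
    private final InetAddress vidGrp;   // multicast group address VidGrp
    private final int vidPort;
    private final int ssrc;             // constant SSRC_i for this window
    private int seqNum = 0;             // counter SeqNum_i

    public WindowForwarder(int recvPort, InetAddress vidGrp, int vidPort, int ssrc)
            throws Exception {
        this.recv = new DatagramSocket(recvPort);
        this.send = new DatagramSocket();
        this.vidGrp = vidGrp;
        this.vidPort = vidPort;
        this.ssrc = ssrc;
    }

    public void run() {
        byte[] buf = new byte[2048];
        try {
            while (true) {
                DatagramPacket p = new DatagramPacket(buf, buf.length);
                recv.receive(p);
                seqNum = (seqNum + 1) & 0xFFFF; // SeqNum_i = SeqNum_i + 1
                putShort(buf, 2, seqNum);       // Header.SeqNum = SeqNum_i
                putInt(buf, 8, ssrc);           // Header.SSRC  = SSRC_i
                // Payload type (byte 1) and timestamp (bytes 4-7) pass through.
                p.setAddress(vidGrp);
                p.setPort(vidPort);
                send.send(p);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void putShort(byte[] b, int off, int v) {
        b[off] = (byte) (v >>> 8);
        b[off + 1] = (byte) v;
    }

    private static void putInt(byte[] b, int off, int v) {
        b[off] = (byte) (v >>> 24);
        b[off + 1] = (byte) (v >>> 16);
        b[off + 2] = (byte) (v >>> 8);
        b[off + 3] = (byte) v;
    }
}
```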

4.5 The JMF video player


We built a Java-based GUI interface to integrate all the video windows created by JMF into one frame, instead of having the windows scattered all over the desktop. Figure 2 shows the video interface for a conference with two participants. JMF will never create more than N video windows, since the Video Multiplexer restricts the number of SSRCs to N distinct values. Each participant's JMF video player creates a UDP socket VG and joins it to the video multicast group VidGrp, as explained earlier.

5 The JMF Audio Mixer


Figure 5 illustrates our solution to the problem of reducing the number of audio players from M to just one. In addition, there is no restriction on how many persons can talk simultaneously: any person at any time can speak without needing permission from a central authority. This is in contrast to the way video is handled, where a token must be obtained before a person can send out his video, as described earlier. The main difference between audio and video is that we allow audio signals from multiple sources to be mixed and presented to one audio player. In general, video mixing does not make sense, except for very special cases under tight human control and intervention.

In Figure 5, the Audio Sender process at each user workstation uses a silence detection algorithm to recognize when the user is talking. On detecting speech, the audio samples are captured and sent out as unicast UDP packets to the Audio Mixer. The Audio Mixer has two UDP sockets, a receive socket R and a send socket S. Packets received from socket R are demultiplexed based on the source host IP address carried in every packet. For each new audio source, we create a new queue, and packets belonging to the same source are inserted into the same queue. To limit the number of queues, we exploit the fact that the maximum number of queues needed is equal to the maximum number of simultaneously talking participants, and we use the least-frequently-used rule for page replacement common in operating systems (see, for example, [9]) to manage the incoming audio queues.

Periodically (the period is derived from the sampling rate of the audio devices), the front samples of all the queues are mixed together to form one audio packet, which is inserted into a Mix queue (see [2] for the details of the mixing algorithm). An RTP header is generated and added to each packet in the Mix queue, and the packet is sent out of socket S to the multicast address AudGrp. Each JMF audio player creates a UDP socket AG and joins it to the multicast group AudGrp. The AudGrp is specified by the user as a pair of an IP multicast address and a port number, e.g., 224.144.251.104:49154. Note that the choice of these numbers is arbitrary, but the user has to select different port numbers for the audio and the video (e.g., in this case, the video port in Section 4.4 is 49152 while the audio port here is 49154). Since we only have one audio stream, originating from the Audio Mixer, i.e., there is only one SSRC identifier, JMF will create only one audio player at each user workstation.
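As a sketch of the periodic mixing step (the full algorithm, including timing and silence handling, is given in [2]), the illustrative method below sums the front 16-bit PCM sample blocks of all active queues and clips the result to the sample range:

```java
import java.util.List;
import java.util.Queue;

public final class AudioMixStep {
    // Mix one block: sum the front sample block of every active source queue,
    // then clip the accumulated values to the 16-bit PCM range.
    public static short[] mixFront(List<Queue<short[]>> queues, int blockSize) {
        int[] acc = new int[blockSize];
        for (Queue<short[]> q : queues) {
            short[] block = q.poll();   // front samples; null if source is silent
            if (block == null) continue;
            for (int i = 0; i < blockSize && i < block.length; i++) {
                acc[i] += block[i];
            }
        }
        short[] mixed = new short[blockSize];
        for (int i = 0; i < blockSize; i++) {
            mixed[i] = (short) Math.max(Short.MIN_VALUE,
                                        Math.min(Short.MAX_VALUE, acc[i]));
        }
        return mixed;
    }
}
```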

[Figure 5: The JMF audio mixer]

6 Distributed Implementation
Our current implementations of the Video Multiplexer (Figure 3) and the Audio Mixer (Figure 5) are essentially centralized. Though reliability is, in general, an important issue in collaborative systems, we chose not to concentrate on this aspect in the initial phase of the project. In this section we discuss one approach to making our system more reliable: a distributed implementation of our basic centralized design. The proposed distributed implementation shares or reuses most of the components of the current centralized implementation. Figure 6 and Figure 7 show the distributed versions of the Video Multiplexer and the Audio Mixer, respectively. The details of each are described next.

6.1 Video Multiplexer


In this distributed implementation, we have N multicast groups, G1, G2, ..., GN, one for each video window. A participant who has the floor to use window i will send the video to Gi. Now, we create a Video Multiplexer process for each participant (in contrast to only one process for the whole session, as was shown in Figure 3). Socket Ri of each process joins the multicast group Gi. The internal operation of the process is essentially the same as the one described in Section 4.4; the only difference is that the RTP packets are sent out of socket S to its own JMF player using unicast datagrams. Thus the JMF player socket VG will be a unicast UDP socket. Since the Video Multiplexer and the JMF player both run on the same host, we could explore a more efficient interprocess communication mechanism than UDP datagram sockets.

[Figure 6: Distributed version of the JMF video multiplexer]
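A minimal sketch of the socket setup for this distributed version follows; the group addresses and port scheme are illustrative only:

```java
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.MulticastSocket;

public class DistributedMuxSockets {
    // R_i joins the per-window multicast group G_i (addresses/ports assumed).
    public static MulticastSocket openReceiver(int i) throws Exception {
        MulticastSocket r = new MulticastSocket(49160 + i);
        r.joinGroup(InetAddress.getByName("224.144.251." + (110 + i)));
        return r;
    }

    // S is now plain unicast: packets go only to the local JMF player's VG port.
    public static DatagramSocket openSender() throws Exception {
        return new DatagramSocket();
    }
}
```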

6.2 Audio Mixer


Figure 7 shows that we have replicated the Audio Mixer process for each participant. Each Sender i has an assigned multicast group Mi for sending its captured audio. The R socket of each Audio Mixer process joins all the audio multicast groups. The internal workings of this process are the same as described in Section 5. The only difference is that the mixed audio packets are sent in unicast datagrams to only its own JMF audio player. Thus the AG socket of each audio player will not join any multicast group, as was the case in the centralized implementation. Similarly, we could find a more efficient interprocess communication mechanism between the mixer and the player than a pair of UDP sockets.

[Figure 7: Distributed version of the JMF audio mixer]

7 Conclusions
The Java Collaborative Environment (JCE) presented in this paper allows conference participants to use audio and video and to share Java applications among diverse systems such as UNIX workstations and Windows-based PCs. Through JCE, any traditional or collaborative single-user Java application can be shared, and by using the simple JCE user interface, participants may launch new shared applications and circulate the floor among themselves to control and manipulate these shared applications. Besides shared applications, audio followed by video, in this order, are important to support full and effective collaboration among participants. Almost all PCs and workstations now have audio devices (microphone and speakers), though they are often not compatible with each other and may use different audio formats. Thus, the use of JMF ensures that all participants can talk and hear each other without being concerned about compatibility between their respective devices. While the JMF audio and video players have been released for almost one year now, the capture part is still under development and there is no definitive date for when it will be available. Therefore, we implemented a limited Java-based capture program to help us experiment with the JMF player. The problems encountered during our experiments with JMF, and our solutions to those problems, were described in this paper.

References
[1] H. Abdel-Wahab, B. Kvande, O. Kim and J.P. Favreau, "An Internet Collaborative Environment for Sharing Java Applications", Proceedings of the 5th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS'97), Tunis, Tunisia, pp. 112-117, October 1997.

[2] A. Gonzalez and H. Abdel-Wahab, "Audio Mixing for Interactive Multimedia Communications", Proceedings of the Fourth International Conference on Computer Science and Informatics, pp. 217-220, Durham, NC, October 1998.

[3] B. Day, Java Media Players, O'Reilly, 1998.

[4] H.-P. Dommel and J.J. Garcia-Luna-Aceves, "Floor Control for Multimedia Conferencing and Collaboration", ACM Journal on Multimedia Systems, Vol. 5, No. 1, January 1997.

[5] B. Eckel, Thinking in Java, Prentice-Hall, 1998.

[6] R. Gordon and S. Talley, JMF: Java Media Framework, Prentice-Hall, 1998.

[7] O. Kim, P. Kabore, J.P. Favreau and H. Abdel-Wahab, "Platform-Independence Support for Multimedia Desktop Conferencing and Application Sharing", Proceedings of the Seventh IFIP Conference on High Performance Networking (HPN'97), White Plains, New York, pp. 115-139, May 1997.

[8] K. Maly, H. Abdel-Wahab, C.M. Overstreet, C. Wild, A. Gupta, A. Youssef, E. Stoica and E. Al-Shaer, "Distance Learning and Training over Intranets", IEEE Internet Computing, Vol. 1, No. 1, pp. 60-71, 1997.

[9] A. Silberschatz and P.B. Galvin, Operating System Concepts, Fifth Edition, Addison-Wesley, 1994.

[10] R. Steinmetz and K. Nahrstedt, Multimedia: Computing, Communications & Applications, Prentice-Hall, 1995.

[11] S. Sullivan, L. Winzeler, J. Deagen and D. Brown, Programming with the Java Media Framework, John Wiley, 1998.

[12] http://www.javasoft.com/products/java-media/jmf

[13] http://www.cs.columbia.edu/~hgs/rtp/

