Wireless Transmission of Video using WLAN

by Alexander Conrad Stevens School of Information Technology and Electrical Engineering, University of Queensland.

Submitted for the degree of Bachelor of Engineering
in the division of Mechatronic Engineering

November 2011.

November 6, 2011

The Head of School
School of Information Technology and Electrical Engineering
University of Queensland
St Lucia, Q 4072

Dear Professor Paul Strooper,

In accordance with the requirements of the degree of Bachelor of Engineering in the School of Information Technology and Electrical Engineering, I present the following thesis entitled “Wireless Transmission of Video using WLAN”. This work was performed under the supervision of Dr Konstanty Bialkowski. I declare that the work submitted in this thesis is my own, except as acknowledged in the text and footnotes, and has not been previously submitted for a degree at the University of Queensland or any other institution.

Yours sincerely,

Alexander Conrad Stevens

To my parents and grandparents; for supporting my ambitions through life. . .

Acknowledgments

I would like to express my earnest appreciation for the guidance of my supervisor, Dr Konstanty Bialkowski. He was patient and informative, and guided me in the right direction when I strayed from the path of this work. It would be a pleasure to continue to liaise with him in the future. I would also like to thank Mr Ross Finlayson of Live Networks, Inc. for the Live555 streaming libraries, and Jason Garrett-Glaser and his team of x264 developers for the x264 H.264/AVC encoding library. Their contributions to the open source community, which provided the means for me to complete this thesis, will be forever acknowledged.


Abstract

The need for efficient use of wireless networks for video streaming has become apparent in today’s age of on-demand content. This is especially evident in Unmanned Aerial Vehicle (UAV) research and development, where video must be streamed at reasonable quality, in real time and at a high framerate over 802.11b/g/n wireless LAN for use by the research application. Further compounding the task, most research and consumer UAVs are far too small to carry a typical consumer desktop computer. This thesis presents a complete open-source software solution that runs on a low-power, light-weight computing platform and provides the aforementioned features. The software includes a custom rate-control algorithm that utilises the Live555 Real-time Transport Protocol (RTP) and Real-time Transport Control Protocol (RTCP) library, the x264 H.264/AVC encoding library and the Video4Linux2 application programming interface. The software is built and run within the Ubuntu Linux distribution, on an ARM development platform called the Pandaboard. The solution has been tested and optimised for a Pandaboard mounted atop the popular Parrot AR.Drone consumer UAV. The results show that streaming high framerate H.264 video from a UAV platform is achievable through rate control techniques such as on-the-fly resolution adjustment and adjustment of H.264 quality parameters. However, without an H.264 encoder optimised for the Pandaboard’s Digital Signal Processor, the Pandaboard cannot encode video of high enough quality to saturate the wireless network, except when the wireless LAN is at the limit of its range or is carrying heavy traffic.


Contents

Acknowledgments
Abstract
List of Figures
List of Tables
1 Thesis Overview
1.1 Introduction
1.2 Aim of Thesis
1.3 Thesis Outline
2 Theory
2.1 The H.264 Codec
2.2 IEEE 802.11 Wireless and Protocols
2.3 The RTP and RTCP Protocols
2.4 Rate Control for Streams
3 Literature Review
3.1 Military Implementations
3.2 The AR.Drone by Parrot SA
3.3 UAV Traffic Surveillance
3.4 Real-time Encoding and Transmission of H.264
3.5 Adaptive Rate Control for RTP Streams
4 Design of Platform
4.1 Choice in Hardware
4.2 The Operating System
4.3 The UAV Platform and Camera
5 Design of Software
5.1 Video4Linux2
5.2 x264 Open Broadcast Encoder
5.3 RTP and RTCP with Live555
5.4 The spyPanda Control Algorithm
6 Results of spyPanda
6.1 Pandaboard Encoded Framerate Results
6.2 Pandaboard Output Bitrate Results
6.3 Jitter and Latency Results
6.4 Discussion on Picture Quality
7 Conclusions
7.1 Summary and conclusions
7.2 Possible future work
A Program listings
B Companion disk
B.1 Main C File
B.2 Video4Linux2 Implementation
B.3 x264 Implementation
B.4 Live555 Implementation
B.5 Miscellaneous C Implementations
B.6 Sample Results
B.7 Report LaTeX Source and Items
B.8 This Report

List of Figures
2.1 Example of I, P, and B-Frame Prediction. Picture courtesy of Petteri Aimonen
2.2 The RTP Header as defined in RFC-1889 [25]
2.3 Receiver Report for the RTP Control Protocol [25]
3.1 The MQ-1 Predator in full flight [3]
3.2 The RQ11 Raven being hand launched [15]
3.3 The Parrot AR.Drone with forward facing camera [21]
3.4 The proposed UAV traffic surveillance system [27]
3.5 The Texas Instruments results of the real-time encoding algorithm. [26, Page 4]
3.6 Example of TFRC use in an RTP/RTCP environment [29]
3.7 Results of the TFRC method vs. no method [29]
3.8 PID method employed by the Tos and Ayav’s Method [31]
4.1 The BeagleBoard-xM and its Peripherals [5]
4.2 The Pandaboard and its Peripherals [9]
4.3 The AR.Drone, Pandaboard and Logitech C910 Camera - with and without protective Styrofoam
5.1 Overview block diagram of the spyPanda software
5.2 Illustration displaying stride [11]
6.1 Framerate attained over progressions of dropped resolutions
6.2 Bitrate attained over progressions of dropped resolutions
6.3 Jitter experienced in the stream
6.4 Stable flight of AR.Drone at 320x240 pixels
6.5 Unstable flight of AR.Drone at 320x240 pixels
6.6 High motion flight of AR.Drone at 320x240 pixels


List of Tables
5.1 Sample of spyPanda’s ordered linked list of resolutions and framerates
6.1 Times for Resolution changes in stream for Figures 6.1, 6.2 and 6.3


Chapter 1 Thesis Overview
1.1 Introduction

Generally, video streaming is performed over wired networks, which give the end-user almost unfettered access to multimedia content. However, there are clear benefits to a device that uses existing, cheap wireless networks to deliver video content, whether in real time or on demand. Closed-circuit TV (CCTV) security systems, where the video is streamed through physical wires to a base station, could likewise be improved by a modular wireless system. In such applications, the choice between wired and wireless networks is there for the consumer to decide.

However, in an application like an Unmanned Aerial Vehicle (UAV), it is almost impractical to tether the UAV to a CAT5e/6 LAN cable to provide the end-user with the UAV’s on-board video. Proprietary digital and analogue solutions have therefore been developed to stream video over existing wireless technologies. These solutions generally have static configurations that cannot be changed unless licensing agreements are met, and these agreements can be costly to the party requiring them. Third and fourth generation (3G/4G) mobile networks are used by video call applications on mobile phones to provide real-time conferencing with loved ones and colleagues; however, these networks are costly, even when rented from a carrier. A portable and cheap streaming solution therefore requires an existing and widespread network.

Today, almost every home in the developed world can purchase and set up its own 802.11 wireless network in as little as one hour with little to no configuration. This IEEE standardised technology requires no licensing fees, can be extended by directional antennae to reported record ranges of up to 382 kilometres [23], and provides sufficient network bandwidth to stream high quality video to an end user.

Streaming real-time video over a digital wireless technology like 802.11 requires a well established video codec. A video codec uses motion prediction and estimation, along with various other techniques, to compress video. Like many forms of intellectual property, some codecs require royalty fees if the video streaming product is to be sold. For the most part, a research organisation can use such codecs without restriction, and there exist open-source implementations of quality codecs like WebM/VP8, Theora and H.264/AVC, all free for modification and use. Many of these codecs are already used in home appliances; for example, H.264/AVC is used in Blu-ray Disc™ technology, and both MPEG-2 and H.264/AVC are used for digital television in many countries because of their high compression.

Once a codec is chosen, the video can be compressed and sent through the wireless network to the client. Some companies implement their own protocols to encapsulate and packet the video ready for streaming, which locks clients into their products. Most hobbyists, enthusiasts and research institutions choose an open protocol, the Real-time Transport Protocol (RTP), and its sister protocol, the Real-time Transport Control Protocol (RTCP). Both are standardised by the Internet Engineering Task Force (IETF), and together they provide a means of encapsulating the video and of exchanging network and video statistics between servers and clients.

Aggregating all of these technologies makes it possible to create a complete system for real-time streaming of video over wireless networks from a UAV. There is already a consumer product, the AR.Drone developed by Parrot SA, which provides a full UAV platform with a video camera, a custom ARM based platform running Linux, and Parrot’s own proprietary implementation of real-time video streaming. However, the software is not open-source, and customisation is limited to the provided application programming interface (API).


1.2 Aim of Thesis

Like the video streaming software on the AR.Drone by Parrot SA, and as the introduction implies, the objective of this thesis is to develop an open-source software solution for real-time, adaptive video streaming at reasonable quality and high framerates. It will use libraries to interface to a camera, compress the captured video frames using an encoder (MPEG-2, H.264, VP8, etc.), and stream using the standardised RTP/RTCP protocols. On top of these libraries, it will implement a control algorithm that adjusts various aspects of quality to maintain a real-time, high framerate video stream. The software will be optimised, built and run on a low-power, high performance platform small enough to be mounted atop either a custom or pre-built UAV for testing purposes. These points can be summarised as follows:

• Real time encoding of high quality captured video on board the portable platform
• Adaptive control of bit-rate via an algorithm and wireless connection feedback
• Monitor the quality of wireless transmission for dropped Intra-frames
• Maintain an acceptable latency between capture of video and display on client (under 200ms)
• Ensure end-user can display video stream (standardised streams, or a custom built client)
• Ensure that the hardware on the portable platform can perform all associated tasks while keeping a minimal footprint on power usage and size

The main reason for developing this thesis is that open-source implementations that provide streaming all the way from the camera to the client are hard to come by. There are implementations that stream video from a camera over RTP, like the media player VLC, but they generally have no rate control. Also, most of these implementations are only tested on full desktop computers with x86 architectures and are not optimised for use on small development boards.


1.3 Thesis Outline

The following chapters cover:

• Chapter 2, Theory
A summary of the H.264 codec, the IEEE 802.11 Wireless Standard, and the Real-time Transport Protocol and Real-time Transport Control Protocol.

• Chapter 3, Literature Review
Examples of prior work in the field of real-time streaming of video, real-time encoding, and other works that relate to this thesis.

• Chapter 4, Design of Platform
An explanation of the decisions in hardware and operating system used to test and develop the final program.

• Chapter 5, Design of Software
A walkthrough of the design and steps taken to develop the “spyPanda” software used to stream the adaptive real-time video.

• Chapter 6, Results and Discussion
Results from the designed software, and what could have been improved or implemented.

• Chapter 7, Conclusions and Future Work
A summary of how well the designed software fared for the objective, and a short list of possible future work to improve on the platform.

Chapter 2 Theory
2.1 The H.264 Codec

The H.264 standard was developed by the ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union), and was designed to complement existing and future networking technologies [34]. Built upon the prior H.263 standard, it aimed to reduce the bit-rate by 50% [20] compared to previous standards; the inevitable cost was increased computational complexity. While the reduction in bit-rate is ideal for real-time wireless transmission of streamed video, the added complexity is a problem for real-time encoding. Several options exist to alleviate the strain on the processor:

1. Optimise the encoder for the specified processor architecture, as V. Iverson, J. McVeigh and B. Reese have explained for Intel architectures [20]
2. Choose hardware that allows for hardware based H.264 encoding [8]
3. Choose hardware that has optimisations written within the x264 encoder [18]
4. Use a rate control algorithm to dictate limits on the encoded picture and hardware, as previously investigated by N. Srinivasamurthy, S. Nagori, G. Murthy and S. Kumar of TI [26]

A rate control algorithm (option 4) can be used in conjunction with any of the other options to great effect. However, optimising the encoder for the architecture (option 1) is mostly mutually exclusive with choosing hardware that already has optimisations in x264 (option 3). From this quick analysis, the best results for real-time playback should come from choosing hardware that has optimisations in the x264 encoder, that has a hardware based H.264 accelerator, and that is paired with a rate control algorithm. Options 2 and 3 are covered by the chosen hardware: the PandaBoard is built around an ARM Cortex-A9 processor, for which x264 has NEON optimisations [18], and a digital signal processor which provides hardware based encoding of H.264 video [8]. This leaves option 4, a form of rate control that dictates the encoder’s limits so that it can deliver 30 frames per second in real time. Option 4 could potentially be made redundant by options 2 and 3 if the manufacturer specifications are upheld, but keeping it as a secondary system linked with the adaptive control of the stream is still a possibility.

As a preface to real time encoding of H.264, it is essential to understand the output of an H.264 encoder. As a simplified description, the encoder splits the captured picture into a series of blocks (as small as 4x4 pixels per block). Each block is then passed through a discrete cosine transform (DCT) to convert it to its frequency components; this is the first step of compression. Depending on the type of frame being split, these blocks can also be divided into secondary layer luma and chroma types to compress the picture further. Combined, these blocks make up a complete frame when decoded. [34] [16]

At regular intervals in a video sequence, an Intra-frame is taken: a frame which contains essentially a full representation of a picture in the video. Even so, it uses Intra-frame prediction, which analyses neighbouring pixels, approximates the region and hence compresses the frame further. Between successive Intra-frames there are two types of Inter-frames: Predicted-frames and Bi-directional Predicted-frames. These use the previous Intra-frame and each other to approximate and predict movement (P-Frame and B-Frame), and also use the following frames and Intra-frame to predict future movement (B-Frame); this prediction is what gives the codec its very high compression. [34] [16]

The three frame types can be summarised as follows and are illustrated in Figure 2.1:

• Intra-frame (I-frame): Relies on only itself for decompression, minimal compression
• Predicted-frame (P-frame): Relies on previous frames to decompress, medium compression
• Bi-directional Predicted-frame (B-Frame): Relies on previous and succeeding frames, maximum compression

Figure 2.1: Example of I, P, and B-Frame Prediction. Picture courtesy of Petteri Aimonen

After each frame is encoded, the encoder splits it, with descriptive headers, into network packet sized portions. This is the Network Abstraction Layer (NAL) of the H.264 standard, which is designed to pass streamed video effectively over a network technology. [34] [16] A simple definition of real time encoding, then, is not that the encoder outputs a frame instantaneously; rather, each frame must be captured, encoded and packeted within one frame period, which is under 33.3 milliseconds at a frame rate of 30 frames per second.
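The real-time condition above is simple arithmetic, and a minimal sketch may make it concrete. The function names and the per-stage timings below are hypothetical and chosen only for illustration; they are not measurements from this thesis. The sketch also assumes the stages run one after another, whereas a pipelined implementation could overlap them.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to capture, encode and packet one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def is_real_time(stage_times_ms, fps: float) -> bool:
    """True when the summed per-frame stage times fit within one frame period."""
    return sum(stage_times_ms) <= frame_budget_ms(fps)


if __name__ == "__main__":
    # Hypothetical timings for capture, encode and packeting (illustrative only).
    stages = [4.0, 22.0, 3.0]
    print(f"budget at 30 fps: {frame_budget_ms(30):.1f} ms")
    print("meets real-time budget:", is_real_time(stages, 30))
```

If the summed stage time exceeds the budget, the encoder falls behind the camera, which is the situation a rate control scheme is meant to prevent.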


2.2 IEEE 802.11 Wireless and Protocols

Matthew Gast’s book, 802.11 Wireless Networks: The Definitive Guide [19], covers the underlying principles of the 802.11 standard. It explains the Media Access Control (MAC) layer and how it attempts to hide the complexity of the wireless system underneath. Signal strength, hidden nodes, obstacles and the client’s distance from the base station all affect the quality of transmission significantly, and many fall-backs have been designed into the MAC layer so that data can be sent effectively over the air.

In high-density wireless environments, such as a university campus, the hidden node problem is clearly evident. It creates dead spots between two base stations which can only be mitigated by RTS/CTS clearing, using Request to Send and Clear to Send frames that are sent between wireless base stations. In effect, a base station in a busy environment sends out a request that silences the other base stations in proximity, and their peers, to stop signals colliding. Once the medium is silent, the other base stations return a CTS frame to allow transmission of the main frame; after the main frame is sent, the original base station should expect to receive an ACK (acknowledgement) frame back. If any of these frames fails, the process of sending the frame restarts, and transmission latency increases. [19, Pages 34–36]

Whenever a frame is dropped, the sender attempts to resend it, and a retry counter is kept for that frame. Each time the counter increases, the time allocated to retrying the send (called the contention window) increases, and this added time takes up more bandwidth and increases the latency of current and pending transmissions. To complicate matters further, each frame carries a Duration/ID field, which notifies the recipient of how long the medium is expected to be busy for the transmission in progress. Fragmentation also affects latency: packets sent out by the home base station may be fragmented, and if any fragment is lost the whole packet has to be resent. All of these potential problems increase latency, so it is important to control the size of each frame, and hence each packet, to maintain real time wireless transmission. [19, Pages 50 & 57]

Another contributor to latency is the Transmission Control Protocol (TCP). It sits above the MAC layer to ensure highly reliable transport of data to other wireless recipients. However, when transmitting real time video over a mobile network, TCP proves to be a setback, as can be seen in [32]. PR-SCTP and UDP (Partially-Reliable Stream Control Transmission Protocol and User Datagram Protocol) generally transmit frames of MPEG-4 video in under 25 milliseconds on average. TCP performs similarly most of the time, but its latency can spike to in excess of 2.5 seconds per frame for as many as 25 frames [32, Figure 7]. This leads to the conclusion that a UDP based protocol is needed to stream video effectively.
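The growth of the contention window with each retry can be sketched as follows. This is an illustrative model of binary exponential backoff rather than anything taken from the thesis software: the default values (a minimum window of 15 slots, a maximum of 1023 slots and a 9 microsecond slot time) are assumptions typical of OFDM-based 802.11 PHYs, and the function names are hypothetical.

```python
import random


def contention_window(retry: int, cw_min: int = 15, cw_max: int = 1023) -> int:
    """Contention window, in slots, after `retry` failed attempts.

    The window roughly doubles on every retry until it reaches cw_max.
    cw_min and cw_max are assumed typical values, not measured ones.
    """
    if retry < 0:
        raise ValueError("retry must be non-negative")
    return min((cw_min + 1) * (2 ** retry) - 1, cw_max)


def mean_backoff_us(retry: int, slot_us: float = 9.0) -> float:
    """Average backoff delay in microseconds for a uniform draw from [0, CW]."""
    return contention_window(retry) / 2.0 * slot_us


def draw_backoff_us(retry: int, slot_us: float = 9.0, rng=random) -> float:
    """One random backoff delay in microseconds for the given retry count."""
    return rng.randint(0, contention_window(retry)) * slot_us


if __name__ == "__main__":
    for retry in range(8):
        print(retry, contention_window(retry), f"{mean_backoff_us(retry):.1f} us")
```

Under these assumptions the average wait grows from tens of microseconds on the first attempt to several milliseconds after six retries, which is a noticeable share of a 33.3 millisecond frame period.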


2.3 The RTP and RTCP Protocols

As explained in the previous section, a real-time stream within an 802.11 wireless network needs a UDP based protocol or similar. This leads to the Real-time Transport Protocol and the Real-time Transport Control Protocol. The Real-time Transport Protocol (RFC-1889 [25]) was designed for media and data with real time characteristics. It sits on top of existing transport protocols (TCP or, more commonly, UDP), and packets the media with a header that contains a sequence number, a time stamp, a payload type description and various other RTP specific identifiers. This header can also be extended to facilitate custom implementations of the RTP scheme. The header layout is shown in Figure 2.2.
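As a minimal sketch, the fixed 12-byte part of the RTP header described above can be packed and unpacked with Python's `struct` module. The helper names are hypothetical, the sketch ignores the optional CSRC list and header extension, and the payload type of 96 used in the example is only an assumed value from the dynamic range; none of this is taken from the thesis software or from Live555.

```python
import struct

# Fixed RTP header: two flag bytes, 16-bit sequence number,
# 32-bit timestamp and 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a 12-byte RTP header (version 2, no padding, no extension, no CSRCs)."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Decode the fixed RTP header fields from the start of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    header = pack_rtp_header(96, sequence=1, timestamp=3000, ssrc=0x12345678, marker=True)
    print(len(header), unpack_rtp_header(header))
```

The sequence number lets a receiver detect loss and reordering, while the timestamp lets it schedule playback; both are used by the RTCP statistics discussed next.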

Figure 2.2: The RTP Header as defined in RFC-1889 [25]

There is one problem with the default RTP header: it does not include statistics about the stream. This was an intentional design decision, because if every RTP header packeted with the payload included statistics about network latency, quality, etc., it would add too much overhead for a real-time implementation. This is where RTCP comes in. RTCP was designed to address this question of Quality of Service (QoS). It comprises five different packet types [25]:

• Sender Report (SR): Includes transmission and reception statistics from participants that are active senders, and is sent to the receivers
• Receiver Report (RR): Includes reception statistics from participants that are not active senders, and is sent to the senders
• Source Description (SDES): Includes optional details about the source
• Disconnection (BYE): Indicates end of participation
• Application Functions (APP): Indicates custom functions as specified for the target application

For the purpose of this thesis though, only the receiver report needs to be understood in depth, since it is the server that controls the encoding parameters. The receiver report (as can be seen in Figure 2.3) comprises the details of the source, the fraction of packets lost since the previous report, the total number of packets lost, the highest sequence number received, the interarrival jitter, the time of the last sender report, and the delay between receiving the last sender report and sending this receiver report.
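The fields listed above map onto a 24-byte report block inside the receiver report. The sketch below decodes one such block; it is an illustration with a hypothetical function name, not the parsing code used by Live555 or spyPanda. It relies on two details of the RTCP format: the fraction lost is an 8-bit fixed-point value (divide by 256), and the delay since the last sender report is expressed in units of 1/65536 of a second.

```python
import struct


def parse_report_block(block: bytes) -> dict:
    """Decode one 24-byte RTCP reception report block."""
    if len(block) < 24:
        raise ValueError("a report block is 24 bytes long")
    ssrc, lost_word, highest_seq, jitter, lsr, dlsr = struct.unpack_from("!IIIIII", block)
    cumulative_lost = lost_word & 0xFFFFFF
    if cumulative_lost & 0x800000:
        # The cumulative loss count is a signed 24-bit number.
        cumulative_lost -= 1 << 24
    return {
        "source_ssrc": ssrc,
        "fraction_lost": (lost_word >> 24) / 256.0,
        "cumulative_lost": cumulative_lost,
        "highest_sequence": highest_seq,
        "interarrival_jitter": jitter,        # in RTP timestamp units
        "last_sr_timestamp": lsr,             # middle 32 bits of the SR's NTP time
        "delay_since_last_sr_s": dlsr / 65536.0,
    }


if __name__ == "__main__":
    # Hand-built example block: 25% loss, 10 packets lost in total, 0.5 s delay.
    example = struct.pack("!IIIIII", 0xAABBCCDD, (64 << 24) | 10, 5000, 120, 0, 32768)
    print(parse_report_block(example))
```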

Figure 2.3: Receiver Report for the RTP Control Protocol [25]

For streaming of video, the important aspects are the interarrival jitter, the fraction of packets lost and the delay since the last sender report. Interarrival jitter is an estimate of how much the spacing between arriving packets varies from the spacing implied by their RTP time stamps. If the data arrives out of order, earlier or later than the time stamps predict, etc., the jitter associated with the stream increases. The delay since last sender report is the time between the client receiving the last sender report and sending this receiver report; combined with the time of the last sender report, it allows the server to estimate the round trip time. These statistics could theoretically be used to drop the quality of the picture if a client’s computer is not powerful enough to play the stream. The fraction of packets lost can also be used in conjunction with the total number of packets lost to calculate the number of packets expected in the current reporting interval. This is demonstrated in the following equation:

\[
CurrentPacketsExpected = \frac{Lost_i - Lost_{i-1}}{FractionLost}
\]

where $Lost_i$ is the total number of packets lost as given in receiver report $i$, with reports indexed from $i = 0$.
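The two statistics above can be sketched in a few lines. The jitter update follows the running estimate defined for RTP, in which each new packet moves the estimate one sixteenth of the way towards the latest timing difference; the second function applies the packets-expected relation. Both function names are hypothetical, and the example numbers are invented for illustration.

```python
def update_jitter(jitter, prev_send, prev_recv, send, recv):
    """Update the running interarrival jitter estimate for one new packet.

    send/recv are the RTP timestamp and arrival time of the new packet,
    prev_send/prev_recv those of the previous packet, all in the same units.
    """
    d = (recv - prev_recv) - (send - prev_send)
    return jitter + (abs(d) - jitter) / 16.0


def packets_expected(lost_now, lost_prev, fraction_lost):
    """Packets expected in the current interval from two successive reports.

    fraction_lost is the decoded fraction (0.0 to 1.0). When it is zero the
    relation cannot be inverted, so None is returned instead.
    """
    if fraction_lost <= 0:
        return None
    return (lost_now - lost_prev) / fraction_lost


if __name__ == "__main__":
    # A packet that arrives 160 timestamp units late raises the estimate to 10.
    print(update_jitter(0.0, prev_send=0, prev_recv=100, send=3000, recv=3260))
    # Five packets lost in an interval reported as 25% loss: 20 were expected.
    print(packets_expected(15, 10, 0.25))
```

The one-sixteenth gain smooths out single late packets, so a sustained rise in the reported jitter is a more meaningful signal for rate control than a single spike.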



Even though RTP has Quality of Service statistics through RTCP, it does not itself provide any guarantee of delivery and requires the underlying network infrastructure (like TCP) to handle this. As mentioned previously though, RTP is generally built upon UDP, which works on a “send and forget” principle. Sunhun Lee and Kwangsue Chung therefore investigate the Real-time Transport Protocol implemented with a TCP-Friendly Rate Control scheme [22]. The aim is to avoid the congestion collapse caused by UDP packets, which do not adhere to TCP’s congestion control. However, with their TCP-Friendly RTP, they still encountered round trip times of about 160 milliseconds [22, Figure 3]. In the case of this thesis, it can be assumed that there will be no interfering TCP traffic coming from either the UAV or the base station receiving the video. Plain RTP can therefore be used to obtain UDP-like transmission times with slightly more reliability. A further benefit of using RTP is that the H.264 encoder packets its data in a form suitable for UDP, TCP and RTP/RTCP through its NAL standard.

Going one step further, the sister protocol of RTP mentioned before is the Real-time Transport Control Protocol. Its primary purpose is to complement RTP by returning information from the clients about the quality of the data distribution [30]. It was designed for multiple-user environments, but it also makes acquiring network information simple in a unicast system. Another benefit is that RTP/RTCP streams are supported, through the Live555 project [17], within the VideoLAN media player VLC and also within the MPlayer series of video players.


2.4 Rate Control for Streams

The basic objective of rate control for video streams is to limit the throughput enough that the medium can sustain a consistent flow. This is analogous to the flow of traffic on a busy stretch of road. If too many cars are sent down the road, they pile up and cause congestion. However, if fewer cars are sent, an even flow of cars can pass unhindered. The traffic can be increased until it nears the point of congestion, but not enough to cause it. This is, in its simplest form, bit-rate rate control.

Bit-rate rate control can be used by streams utilising an encoder, to drop the output bit-rate from the encoder enough that the network can handle the stream. A stream can also demand too much of the encoder, where the encoder cannot produce high quality video at the maximum bit-rate the network allows. In this case, the program has to perform bit-rate rate control on the encoder itself, to allow the maximum throughput that the encoder can manage. Depending on the implementation of the encoder, these methods can be implemented by a simple PID algorithm [33] (Equation 2.2).

\[
\begin{aligned}
Err &= BitrateAim - BitrateCurr \\
Derivative &= \frac{Err - ErrOld}{TimeCurr - TimeOld} \\
Integral &= IntegralOld + Err \cdot (TimeCurr - TimeOld) \\
OutputBitrate &= K_p \cdot Err + K_i \cdot Integral + K_d \cdot Derivative
\end{aligned}
\tag{2.2}
\]

This is a simple yet effective way to obtain a desired bitrate for the stream. However, BitrateAim needs to be initialised, and then changed according to the conditions of the network or the limits of the encoder. Bitrate could also be exchanged for a different variable, such as packet delay, jitter, network bit-rate, encoder bit-rate, encoder quality, resolution, etc.
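Equation 2.2 translates almost line for line into code. The sketch below is a minimal illustration using the same variable names; the class name, the gains and the bit-rates in the example are hypothetical and untuned, and how the output is applied to the encoder is left open, as it depends on the encoder's interface. On the first sample there is no previous time, so the sketch assumes the derivative and the integral contribution are zero.

```python
class BitratePID:
    """PID controller following Equation 2.2 (illustrative, untuned)."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0   # IntegralOld
        self.err_old = 0.0    # ErrOld
        self.time_old = None  # TimeOld; None until the first sample

    def update(self, bitrate_aim: float, bitrate_curr: float, time_curr: float) -> float:
        err = bitrate_aim - bitrate_curr
        if self.time_old is None:
            derivative = 0.0  # assumption: no history on the first sample
        else:
            dt = time_curr - self.time_old
            if dt <= 0:
                raise ValueError("time must increase between samples")
            derivative = (err - self.err_old) / dt
            self.integral += err * dt
        self.err_old = err
        self.time_old = time_curr
        return self.kp * err + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    pid = BitratePID(kp=0.5, ki=0.1, kd=0.0)
    # Hypothetical samples: target 1000 kbit/s, measured 800 then 900 kbit/s.
    print(pid.update(1000.0, 800.0, time_curr=0.0))
    print(pid.update(1000.0, 900.0, time_curr=1.0))
```

Swapping the bit-rate for jitter or packet delay, as suggested above, only changes what is passed in as the target and the measurement.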

Chapter 3 Literature Review
The following literature review investigates prior and similar attempts at the problem of streaming H.264 video in real time over wireless, along with its inherent difficulties. These difficulties are: finding a suitable portable platform to encode video in real time; encoding H.264 video in real time; and streaming the encoded video over 802.11 wireless in real time. Other implementations using custom or existing technologies are also reviewed, most of which relate to the field of UAV surveillance.


3.1 Military Implementations

A few military applications of surveillance using UAVs have been implemented, like the General Atomics MQ-1 Predator and the cheaper AeroVironment RQ-11 Raven. At roughly a quarter of a million dollars (one Raven craft and control system [2]) and 40 million dollars (four Predator craft and control system [3]), these systems are inaccessible to the civilian populace.

Figure 3.1: The MQ-1 Predator in full flight [3]



The MQ-1 Predator carries a colour nose camera, a day variable-aperture TV camera, a variable-aperture infrared camera, and synthetic aperture radar. The cameras provide full real-time video (the radar does not), using a direct line of sight proprietary wireless link and a satellite link for beyond horizon flight. The Raven, by contrast, provides three different attachable camera units that connect to the nose of the UAV. The first has both a forward facing and a side facing camera, the second is a nose mounted infrared camera and the third is a side mounted infrared camera. These camera feeds are provided in real time through a line of sight proprietary wireless link. The Raven looks remarkably similar to a hobby remote control plane (as can be seen in Figure 3.2); however, instead of the 20 minute run time of a hobby plane, it provides 50-60 minutes of flight. Notably though, these systems have had many years of development and military backing to make them sophisticated and extremely reliable. There are, however, other, cheaper alternatives for surveillance UAVs.

Figure 3.2: The RQ-11 Raven being hand launched [15]




The AR.Drone by Parrot SA

It is not only militaries around the world that are using UAVs; civilian companies are developing commercial UAVs for educational use and entertainment. One such company that is making an impact in the civilian sector is Parrot SA with their quad-rotor (or quadrocopter) AR.Drone - named so for the Augmented Reality games it provides.

Figure 3.3: The Parrot AR.Drone with forward facing camera [21]

It utilises a 640x480 pixel forward facing camera providing video at 15 frames per second, and a downward facing camera providing 176x144 pixels at 60 frames per second. The custom implemented P.264 video is streamed in real-time (about 100ms latency) via an 802.11 ad-hoc connection to a computer or smart-phone and allows the user to control the UAV. For a civilian application, it is rather cheap at US$300 for a relatively customisable system. This platform will be mentioned in later chapters as it will be used in testing and comparison.


UAV Traffic Surveillance

Suman Srinivasan, Haniph Latchman, John Shea, Tan Wong and Janice McNair of the University of Florida have explored the use of surveillance UAVs [27]. The problem was that the Department of Transportation had to upgrade its surveillance of highway traffic from magnetic loop detectors to something far less primitive, so they investigated the potential of UAVs to quantify traffic conditions in real time.



Unlike fixed cameras, these UAVs could be dispatched over specific areas, or over vast areas, in a short amount of time. Such a system would need no wired infrastructure, so the savings would have been considerable.

Figure 3.4: The proposed UAV traffic surveillance system [27]

However, they proposed a system in which the UAV would transmit the video data directly to an external computer to encode, instead of compressing it in real time on board. This approach has one major drawback: limited bandwidth for higher quality video. The video and added data would then be transmitted from the UAV to a microwave based tower, building on existing TV station infrastructure. This, however, adds cost, because FCC regulations require licensing for the use of certain bands. This problem could have been mitigated by using a direct line of sight, cheap, unlicensed 802.11 wireless technology (like the Venezuelans [23]) to stream the video back to base.


Real-time Encoding and Transmission of H.264

As explained in the theory section on the H.264 codec (Section 2.1), video is real time when each frame is captured, encoded and packeted within the time allocated by the framerate; for example, under 33.3 milliseconds for 30 frames per second video. With this in mind, the Texas Instruments team (Naveen Srinivasamurthy, Soyeb Nagori, Girish Murthy and Satish Kumar) proposed in [26] that a rate control algorithm for real time encoding should limit the total packet count per frame, and hence the maximum size of each NAL unit. This is due to the latency involved in creating more headers for each NAL unit, which when minimised will help minimise the amount of data sent through a transmission stream. In addition to minimising the NAL unit count, they proposed that each slave processor (like the DSP on the OMAP4430) have an offloaded pipeline task that computes the motion estimation, intra-frame prediction, transformation and quantisation of the residual picture, etc. [26, Page 2], while the main processor computes the main macro-block loop. However, the TI team's results from encoding 1080p video show that, in reducing the picture size and hence the NAL units, quality loss can be expected when a scene is complex or contains vast amounts of motion. The overall quality of a non-complex video is nevertheless increased, since the encoder is given more time to process a frame. The efficient use of encode times in their findings can be seen in the following graph (Figure 3.5).

Figure 3.5: The Texas Instruments results of the real-time encoding algorithm. [26, Page 4]
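The real-time criterion used in this section (capture, encode and packet each frame within one frame period) can be sketched in a few lines of Python. This is only an illustration of the definition; the function names and the example timings are assumptions and do not come from [26] or from the thesis software.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to capture, encode and packet one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def is_real_time(capture_ms: float, encode_ms: float, packet_ms: float, fps: float) -> bool:
    """True when the whole per-frame pipeline fits inside one frame period."""
    return capture_ms + encode_ms + packet_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    # Hypothetical timings: 5 ms capture, 20 ms encode, 2 ms packeting at 30 fps.
    print(round(frame_budget_ms(30), 1))
    print(is_real_time(5.0, 20.0, 2.0, 30))
```

At 30 frames per second the budget is one thirtieth of a second, which is where the 33.3 millisecond figure comes from.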




Adaptive Rate Control for RTP Streams

There are generally two different camps when it comes to streaming multimedia with rate control over RTP. One uses RTP on top of a plain UDP framework, and the other is TCP friendly. TCP-Friendly Rate Control (TFRC) essentially utilises the UDP layer with a control method that competes fairly with TCP packets in the network. This is the topic of the work by Ktawut Tappayuthpijarn and his team on adaptive video streaming over a mobile network [29]. The objective of their scheme was to optimally control a video stream over the mobile network, using the extended H.264 Scalable Video Coding standard and an RTP based TFRC method to send the data. It utilises the feedback received from RTP and RTCP, in the form shown in Figure 3.6.

Figure 3.6: Example of TFRC use in an RTP/RTCP environment [29]

From this feedback, the server can then calculate the expected rate to adhere to while being fair to TCP packets within the network - instead of flooding the connection. This method essentially doubles the receiving rate to acquire the sending rate (X) until it saturates the network, in which it uses the method shown in Equation 3.1 [29]:

\[ X = \min\left(x_{tcp},\ 2 \cdot \mathit{ReceivingRate}\right), \qquad x_{tcp} = \frac{s}{R\sqrt{\frac{2p}{3}} + t_{RTO}\left(3\sqrt{\frac{3p}{8}}\right) p \left(1 + 32p^{2}\right)} \qquad (3.1) \]

Where s is the packet size, t_RTO is the packet timeout, p is the loss event rate, and R is the Round Trip Time. This is an excellent method with which to calculate an expected bit-rate to send over a network. However, their implementation is tailored to resend dropped packets in the application up to 4 times and, as discussed in Chapter 2.2, this extends the latency of sent packets. This means that the TFRC method, while useful for streaming, is not well suited to real-time multimedia, and this is confirmed by the results shown in Figure 3.7.
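As a worked illustration of Equation 3.1, the following minimal Python sketch evaluates the TCP-friendly rate and the resulting sending rate. It assumes s is given in bytes and R and t_RTO in seconds, so the result is in bytes per second; the function names and the example values are illustrative assumptions and are not taken from [29].

```python
import math


def tfrc_tcp_rate(s: float, rtt: float, p: float, t_rto: float) -> float:
    """x_tcp from Equation 3.1 (bytes/s when s is in bytes and times are in seconds)."""
    if p <= 0:
        # With no loss events the equation places no bound on the rate.
        return math.inf
    denom = (rtt * math.sqrt(2.0 * p / 3.0)
             + t_rto * (3.0 * math.sqrt(3.0 * p / 8.0)) * p * (1.0 + 32.0 * p ** 2))
    return s / denom


def tfrc_send_rate(s: float, rtt: float, p: float, t_rto: float, receiving_rate: float) -> float:
    """X = min(x_tcp, 2 * ReceivingRate)."""
    return min(tfrc_tcp_rate(s, rtt, p, t_rto), 2.0 * receiving_rate)


if __name__ == "__main__":
    # Hypothetical link: 1000 byte packets, 100 ms RTT, 1% loss events, 400 ms timeout.
    print(tfrc_tcp_rate(1000, 0.1, 0.01, 0.4))
    print(tfrc_send_rate(1000, 0.1, 0.01, 0.4, receiving_rate=10_000))
```

While the receiving rate is small, the doubling term is the limit; once twice the receiving rate exceeds x_tcp, the equation caps the sender at the TCP-friendly rate.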



Figure 3.7: Results of the TFRC method vs. no method [29]

This is not the end of the line for rate control over RTP though. In the application of this thesis, the platform only has to compete with SSH (Secure Shell) traffic over the wireless connection, which inherently uses little network capacity. So a plain UDP based RTP/RTCP connection with a simple PID control algorithm, like the one used by Uras Tos and Tolga Ayav in their Adaptive RTP Rate Control Method [31], can suffice.

Figure 3.8: PID method employed by Tos and Ayav [31]

The method essentially uses the same PID algorithm as Equation 2.2, except that it takes the packet loss fraction as input and outputs an expected bit-rate. The output is then passed through a limiting function, L(u(t)), which clips it between the lowest and highest bit-rates. This method can provide low latency, low loss results in single stream applications.
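The general shape of such a controller can be sketched as below. This is a textbook discrete PID loop followed by a clipping function, written under assumptions: the gains, the target loss fraction, the base bit-rate and all names are hypothetical, and it should not be read as the exact formulation in [31] or in Equation 2.2.

```python
def limit(u: float, lowest: float, highest: float) -> float:
    """L(u(t)): clip the controller output between the lowest and highest bit-rates."""
    return max(lowest, min(highest, u))


class LossPidController:
    """Maps the reported packet loss fraction to a target bit-rate (bits/s)."""

    def __init__(self, kp, ki, kd, target_loss, base_rate, min_rate, max_rate):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target_loss = target_loss
        self.base_rate = base_rate
        self.min_rate, self.max_rate = min_rate, max_rate
        self._integral = 0.0
        self._prev_error = 0.0

    def update(self, loss_fraction: float, dt: float) -> float:
        # Positive error means less loss than targeted, so the rate may rise.
        error = self.target_loss - loss_fraction
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error
        u = (self.base_rate + self.kp * error
             + self.ki * self._integral + self.kd * derivative)
        return limit(u, self.min_rate, self.max_rate)


if __name__ == "__main__":
    pid = LossPidController(kp=1e6, ki=0.0, kd=0.0, target_loss=0.01,
                            base_rate=1_000_000, min_rate=100_000, max_rate=2_000_000)
    for loss in (0.01, 0.5, 1.0):
        print(pid.update(loss, dt=1.0))
```

The limiter matters most under heavy loss: without it a large error would drive the requested bit-rate below anything the encoder can produce.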

Chapter 4 Design of Platform
4.1 Choice in Hardware

To meet the requirements set in place for the thesis, the base hardware has to receive, encode and transmit the captured video via an 802.11 based network. This has to be done in a portable scenario, so the hardware must also be power efficient.

Capturing video and transmitting it are fairly trivial requirements to satisfy. Most development boards contain some kind of 802.11 wireless technology, and a video camera can be connected via USB (for example, a USB web-cam). This narrows the search down to candidates that can handle the processing needed to encode video.

Processors from Intel have graced the offices of the business sector and also the homes of the consumer market. Intel though, has mainly designed high capability processors with immense power requirements, such as the i7 line of processors, with a measured system draw of 80 Watts at idle and 128 Watts under load [12]. While these processors are more than capable of encoding H.264 video in real time [12], they are in no real position to be placed on a mobile platform and powered by battery. Since a generalised CPU of this kind is inefficient for the task of encoding a video stream, alternatives had to be found.

Two contenders in the development board market, the newly revised BeagleBoard-xM [4] and the PandaBoard [8], are well suited to a development environment, since they support expansion headers, UART and JTAG debugging, and are extremely power efficient (in the sub 5 Watt range). Both boards contain ARM based processors with digital signal processors, and the ARM NEON optimisations in the x264 encoder are supported on both. However, since these boards are of ARM based RISC architecture, the only options for operating systems are Linux, Android, QNX, Symbian OS and Windows Mobile CE.

Figure 4.1: The BeagleBoard-xM and its Peripherals [5]

The two boards are broadly equivalent in hardware, except for their processor packages and that the BeagleBoard-xM has no built in 802.11 functionality and only 512MByte of LPDDR RAM. Overlooking the absence of WLAN in the BeagleBoard, which can be added via USB anyway, the Texas Instruments DM3730 [6] media core in the BeagleBoard is essentially an OMAP3 based media core with a 1GHz ARM Cortex-A8 processor. The DM3730 has built in digital signal processing based on the C64x+ line of DSPs produced by Texas Instruments (TI), which TI claims enables 720p decoding and encoding, but unfortunately without stating any specific codec. The BeagleBoard measures a meagre 3.35 inches by 3.45 inches and weighs 37 grams, as can be seen in Figure 4.1.

The PandaBoard, on the other hand, carries a Texas Instruments OMAP4430 [7] media core. This has a dual core 1GHz ARM Cortex-A9 MPCore, joined by an IVA3 based hardware accelerator with a similar C64x+ DSP, which TI claims provides 1080p H.264 encoding at up to 30 frames per second (with C64x+ optimisations enabled). The PandaBoard also provides 1GB of DDR2 RAM along with built in 802.11b/g/n wireless, and is slightly larger at 4.5 inches by 4.0 inches, weighing just 74 grams, as can be seen in Figure 4.2.



Figure 4.2: The Pandaboard and its Peripherals [9]

From this brief overview of both boards, it seems clear that the newer generation PandaBoard is more than capable of producing results with minimal customisation of hardware.


The Operating System

As mentioned in the previous section (4.1), ARM based processors are not supported by many of the popular desktop operating systems. The short list of operating systems that do run is Linux, Android, QNX, Symbian OS and Windows Mobile CE. However, the Pandaboard has so far only received active support from Linux based distributions and Android, which rules out the rest.

Android is a Linux based mobile operating system with a custom environment tailored for the small screened smart phones and tablets of today. It is primarily designed as a user based system, with APIs and toolkits provided to develop applications, and as such does not always have the libraries required for development. In this case, RTP/RTCP and H.264 encoding libraries tailored for the Android environment would be hard to come by. This is no burden however, since the Linux kernel underneath Android is also used in servers, desktop computers, laptops and any device with a processor capable of running it.

There are quite literally hundreds of distributions of Linux tailored for certain applications, devices and environments. The ARM based Pandaboard supports Angstrom Linux, Gentoo Linux and Ubuntu Linux. Each has its own perks, and some are simply easier to develop on than others. Due to interest in and prior knowledge of the Ubuntu distribution, it was decided that this easy to use system would be used for development. This choice brought an added benefit: a team of ARM developers (called Linaro [14]) are contributing to the upstream Linux kernel for improved ARM support, and have developed pre-made images of the Ubuntu distribution on top of these improved kernel images for the Pandaboard. This version of the distribution is designated a “headless” or server version, meaning that it does not include a desktop environment (or user interface) and can be connected to via a UART serial or SSH connection. This provides a minimal footprint in memory and streamlines the system.

Ubuntu, being derived from the Debian GNU/Linux distribution, also uses the easy to use dpkg/apt package management system. This provides easy access to a wealth of libraries for development, including RTP/RTSP and H.264 encoding libraries.


The UAV Platform and Camera

For the thesis, a UAV platform was chosen that would satisfy the following criteria:

• Have lift capability to lift a camera and the Pandaboard
• Be simple enough to modify and mount camera and Pandaboard
• Provide existing capability to control platform with no need for modification (but potentially could be extended)
• Provide a stable platform to receive conclusive results
• Be cheap enough for a small thesis project such as this

Do-it-yourself hobby remote control planes and quadrocopters exist that meet these criteria. However, designing such a system falls outside the aim and topic of this project. So a pre-built system that can be bought from local retailers was decided on: the previously mentioned AR.Drone by Parrot SA (Chapter 3.2). It is reported by experiments in forums on the internet [28] that the AR.Drone can reach its maximum altitude (20 feet) with 253 grams of weight located centrally over the battery. Since the Pandaboard weighs 74 grams, the AR.Drone has enough power to take off with the Pandaboard mounted.

This leaves the selection of the camera to be determined by weight and capability. The camera would have to be light enough to be mounted to the AR.Drone, and be able to provide resolutions from as small as 160x120 pixels up to 1920x1080 pixels (for testing purposes). It would also have to provide at least the YUV420 format (for direct encoding with x264, as will be discussed in Chapter 5.2), support either the Pandaboard's on-board camera connector or USB, and fully support the Video4Linux2 drivers. This narrowed the search down to USB Video Class (UVC) compliant camera devices, like many Logitech web cameras.

The Logitech C910 is a full HD capable webcam (30 fps) [13] with Carl Zeiss optics, autofocus and stereo microphones, and supports the UVC standard, which Linux also implements. However, the mounting bracket it comes with weighs in at around 300 grams and would have to be removed. This proved beneficial, as the bracket utilised a mounting system that could be used to mount the camera to the AR.Drone.

The final configuration of the test platform can be viewed in Figure 4.3. For testing purposes, a Styrofoam protective shield was mounted on top of the Pandaboard to mitigate potential damage in an unforeseen circumstance. The Pandaboard is powered by a simple 5 Volt/3 Amp voltage regulator circuit, connected to the AR.Drone’s 11.1V, 1000mAh, 10C Lithium Polymer battery. This circuit is light enough to be carried by the AR.Drone and provides a continuous source of voltage for the Pandaboard - until the Lithium Polymer battery reaches a voltage below 5.5 Volts.



Figure 4.3: The AR.Drone, Pandaboard and Logitech C910 Camera - with and without protective Styrofoam

Chapter 5 Design of Software
This chapter will guide the reader through the various aspects of the software developed within this thesis, beginning with a brief overview of its architecture. The software, spyPanda (named after the development board and its inherent potential use), uses the Video4Linux2 driver stack to receive YUV420 format video from the Logitech C910 connected to the Pandaboard’s USB port. This video is buffered in memory and retrieved by the open source x264 H.264/MPEG-4 AVC library, ready for encoding and compression. Once a frame is compressed, the encoder notifies the Live555 RTP/RTCP streaming library that a new frame is ready to be packeted and sent over the network using the Real-time Transport Protocol. The video is streamed to and played by the client, which calculates statistics and sends back a Real-time Transport Control Protocol Receiver Report packet every interval (as defined by the server's bandwidth, generally 5 seconds) with the statistics of the packets received since the last sender report. This data is then used by the spyPanda control algorithm to increase or decrease the encoder's framerate, resolution and quality, to ensure a real-time stream of video over the wireless connection. This overview is summarised in Figure 5.1.

Figure 5.1: Overview block diagram of the spyPanda software





The Video4Linux2 API [24] is a set of calls that can be used to directly interface with a piece of video or radio hardware from the Linux host. Initially, a program will open the device with read and/or write permissions and then issue ioctl calls which tell the kernel device driver what to do. In the spyPanda implementation these ioctl calls are encapsulated using the v4l2_ioctl function located in the libv4l2 library. For error checking, this has been wrapped in the xioctl function within the v4l2_camera.c source file (Appendices B.2). After opening the device (open_device and v4l2_open), spyPanda will probe the device for resolutions in the same aspect ratio as the resolution that was passed to it on the command line at startup, or in a default ratio of 4:3. This device probing function (known as save_device_resolutions) utilises a linked list (Table 5.1) to order resolutions from least demanding for the encoder to most demanding, calculated by the simple formula shown in Equation 5.1:

\[ \mathit{ProcessorDemand} = \mathit{width} \times \mathit{height} \times \mathit{framerate} \qquad (5.1) \]

Linked List    WxH@FPS       Demand
-1             160x120@30    576000
HEAD           320x240@30    2304000
+1             320x240@60    4608000

Table 5.1: Sample of spyPanda’s ordered linked list of resolutions and framerates

Once spyPanda has verified that the selected (or default) resolution exists in the list, it can finally execute the init_device function. This function will use xioctl to set the device with a format (V4L2_PIX_FMT_YUV420 for efficiency when passing to the encoder) and a resolution (e.g. 640x480 pixels). It will then request the driver to initiate a BUF_COUNT number of memory mapped buffers for the device, then map that amount of buffers and finally exchange those buffers with the driver using VIDIOC_QBUF. Since the device is finally ready to be functional, the framerate can then be set by set_device_framerate and the stream turned on using VIDIOC_STREAMON.
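The ordering by Equation 5.1 that produces Table 5.1 can be illustrated with a short Python sketch. It uses a sorted list rather than the linked list of the thesis software, and the Mode type and function names are assumptions made for this example only.

```python
from typing import List, NamedTuple


class Mode(NamedTuple):
    width: int
    height: int
    fps: int

    @property
    def demand(self) -> int:
        """ProcessorDemand = width * height * framerate (Equation 5.1)."""
        return self.width * self.height * self.fps


def order_modes(modes: List[Mode]) -> List[Mode]:
    """Order capture modes from least to most demanding for the encoder."""
    return sorted(modes, key=lambda m: m.demand)


if __name__ == "__main__":
    probed = [Mode(320, 240, 60), Mode(160, 120, 30), Mode(320, 240, 30)]
    for m in order_modes(probed):
        print(f"{m.width}x{m.height}@{m.fps} {m.demand}")
```

With the three modes of Table 5.1 as input, the sketch reproduces the demand column of the table in the same order.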



Now that the video is ready to be captured, spyPanda can use the start_read_frame and stop_read_frame functions to indicate to the driver that the frame can be copied to the memory mapped location in memory, ready to be encoded. When the program needs to reinitialise or just clean the device from memory, it will call the uninit_device (and if closing, close_device) function to turn the stream off, free and unmap the memory mapped buffers, and then finally request that the rest of the buffers be destroyed within the driver.


x264 Open Broadcast Encoder

The x264 encoding library is an implementation of an H.264/MPEG-4 AVC encoder. It supports many CPU architectures such as x86, x86_64, PowerPC, Sparc and ARM, and is optimised to run on these. In particular, it has NEON optimisations for the ARM architecture [18], which can be used on the Pandaboard to further accelerate its capability to encode raw video. However, the x264 library does not implement C64x+ DSP accelerations, since the framework to offload the processing using the Direct Memory Access (DMA) method has not been written. This would have been a massive task for the current thesis, and one which would require far more knowledge and experience in DSPs and kernel drivers. So the NEON optimisations would have to suffice.

Encoding within the x264 library is a relatively simple task, except for the sheer number of variables to set up initially. There are many presets on which to base the encoder though, which make the encoder a breeze to set up. The initial parameters used to initialise the encoder are the preset “veryfast” and the tuning “zerolatency” (using x264_param_default_preset(x264param, "veryfast", "zerolatency")). These parameters have been used in addition to the “Constrained Baseline Profile” (CBP) that is specified in the H.264 standard and implemented in the x264 encoder (using x264_param_apply_profile(encoder, "baseline")).

Each preset defines slightly different parameters that the encoder needs to abide by; the x264 encoding library simplifies these into a standard set of 10 presets. The presets generally select different analysis algorithms, inter/intra partitions, transforms, refinements, etc., all of which can be viewed within the x264 source code under the function x264_param_apply_preset located in common/common.c. The tunings, however, generally specify the number of threads and the number of frames the encoder can buffer. In the case of “zerolatency”, it specifies that the encoder cannot look ahead in any way, has no B-Frames (since this would require frames to be buffered), uses the frames per second for the encoder's own rate control, and enables slice-based threading.

There is though, a modified x264 library named the Open Broadcast Encoder, which aims to provide real-time encoding. This library is simple to use, as it uses a single parameter - x264param.sc.f_speed (the ratio from real-time, e.g. 1.0 is equivalent to real-time) - to perform rate control on the encoder. By using this parameter, the quality of video can be controlled depending on the feedback from the client and also on whether the Pandaboard is stressing under the load - this will be explained in Chapter 5.4.

Now that the profile (CBP), preset (“veryfast”) and tuning (“zerolatency”) have been chosen, they can be implemented in an initialising function within spyPanda named init_encoder_param. This sets the encoder to use these presets, with modified parameters that have been optimised for the Pandaboard: in particular, a default resolution of 320x240 at 30 frames per second, no B-Frames, a real-time ratio of 1.0, and a constant rate factor of 25 with a limit of 40 (the higher the factor, the more the quality drops and the further the video is compressed in scenes of motion). The control algorithm will however modify some of these default parameters as the program runs, so they are subject to change.

After the default spyPanda encoding parameters have been set and the encoder opened, the encoder can finally start encoding frames stored in the buffer by the Video4Linux2 driver. The encode_frame function essentially calls start_read_frame and points the 3 planes - located within the enc.pic_in.img plane array - to locations within the buffer that adhere to the YUV420 specification. From here, the 3 planes are passed to the library function x264_encoder_encode to encode the frame, and a pointer to the memory holding the NAL unit is saved in the spyPanda variable enc.nal. The frame size is then saved in the NAL unit along with the payload, ready to be passed to the Live555 library and triggered using triggerFrameReceived. The average encode time and frame size are also tracked by using the gettimeofday function to take the difference in time between the start and end of each encoded frame. These statistics are then used to calculate the expected bitrate and framerate for the control algorithms. The code for this implementation can be found within the x264_control.c source file (Appendices B.3).
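To illustrate what pointing the 3 planes at locations within the capture buffer involves, the sketch below computes the plane offsets for a planar YUV420 frame. It assumes a tightly packed buffer (no row padding), even dimensions, and Y, U, V plane order; the names are hypothetical and the thesis code in x264_control.c may differ.

```python
from typing import List, NamedTuple


class Plane(NamedTuple):
    offset: int  # byte offset of the plane inside the capture buffer
    stride: int  # bytes per row of the plane
    size: int    # total bytes in the plane


def yuv420_planes(width: int, height: int) -> List[Plane]:
    """Return the Y, U and V planes of a tightly packed planar YUV420 frame."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even dimensions")
    y_size = width * height
    c_stride = width // 2                 # chroma is subsampled 2x horizontally
    c_size = c_stride * (height // 2)     # and 2x vertically
    return [
        Plane(0, width, y_size),
        Plane(y_size, c_stride, c_size),
        Plane(y_size + c_size, c_stride, c_size),
    ]


def frame_bytes(width: int, height: int) -> int:
    """Total buffer size: 1.5 bytes per pixel for 8-bit YUV420."""
    return sum(p.size for p in yuv420_planes(width, height))


if __name__ == "__main__":
    for plane in yuv420_planes(320, 240):
        print(plane)
    print(frame_bytes(320, 240))
```

Because the camera already delivers this layout, the encoder can read the planes in place by offset instead of converting or copying the frame first.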




RTP and RTCP with Live555

The Live555 [17] set of streaming libraries is a standards compliant RTP/RTCP/RTSP library that has the ability to stream many codecs over a network connection. It is cross platform, supporting Windows, Linux and Mac, and is integrated as a client or server in many open source media applications. This means that VLC media player and MPlayer - which both use Live555 as a client - can be used, along with many other RTSP compliant media players, to play the streamed video.

The Live555 library however, does not offer pre-written support for sourcing video frames or byte streams from encoders. So a subclass of the library's FramedSource - a means of standardising capture from a file, encoder, another RTP source, etc. - has to be written to encapsulate the encoder, ready for reading. This is a modified copy of the DeviceSource example that the library provides, and can be seen in x264EncoderSource.cpp under Appendices B.4. This code is essentially scheduled in the library’s event loop, and calls deliverReadyFrame to retrieve the NAL payload produced by the encoder. The event loop learns that a new frame is ready, and is told the new address of the NAL payload, when the triggerFrameReceived function is called (located in live555.cpp of Appendices B.4).

Now that the Live555 library can finally read from the encoder, a dedicated Live555 thread is created using the pthread library. This is due to the design of the library: it uses an event loop to schedule specified tasks, which would be detrimental to the concurrency that the x264 encoder requires to process frames efficiently. However, the Live555 library is in no way considered to be thread safe, so any updates to the library have to be passed through using triggers (like triggerFrameReceived) or global variables. Once the thread has started, spyPanda has to wait for Live555 to complete its setup - this is implemented using pthread_cond_wait and mutex locks, both in main.c (Appendices B.1).

The Live555 runtime can then be set up within this separate thread, by specifying a succession of configurations for the library to read and stream the encoded video. For spyPanda, an RTSP server has been set up since it is the most widely supported protocol, and uses the RTP and RTCP application layers with the UDP transport layer underneath. The implementation used to set up this RTSP server is a modified clone of the testH264VideoStreamer.cpp RTSP server that is located within the Live555 “testProgs” source directory. This modified code can be seen in live555.cpp within Appendices B.4.
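The wait-for-setup handshake between the main thread and the streaming thread can be sketched with Python's threading.Condition, which plays the role that pthread_cond_wait and a mutex play in main.c. This is an analogy under assumptions: the class and function names are hypothetical, and the streaming thread here does no real Live555 work.

```python
import threading


class ReadyGate:
    """A flag guarded by a condition variable, signalled once setup completes."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False

    def signal_ready(self) -> None:
        with self._cond:
            self._ready = True
            self._cond.notify_all()

    def wait_ready(self, timeout=None) -> bool:
        with self._cond:
            # wait_for re-checks the flag, so spurious wake-ups are harmless.
            return self._cond.wait_for(lambda: self._ready, timeout)


def run_demo():
    gate = ReadyGate()
    log = []

    def streaming_thread():
        log.append("streamer: setup complete")  # stands in for the RTSP server setup
        gate.signal_ready()

    worker = threading.Thread(target=streaming_thread)
    worker.start()
    ok = gate.wait_ready(timeout=5.0)  # the capture/encode thread blocks here
    log.append("main: proceeding")
    worker.join()
    return ok, log


if __name__ == "__main__":
    print(run_demo())
```

Checking a flag under the lock, rather than relying on the notification alone, avoids a missed wake-up if the streaming thread finishes its setup before the main thread starts waiting.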



Firstly, port numbers for the RTP and RTCP servers are specified as 18888 and 18889 respectively. Both of these ports are then “groupsocked” (a Groupsock being a Live555 socket-like object) and bound, ready to be used. The RTP Groupsock can then be linked to a H264VideoRTPSink, which will handle the passing of H.264 packets, and an RTCPInstance can be created to provide QoS statistics. The RTCP instance is initiated with an estimate of the session bandwidth, acquired from the wl1271 (Pandaboard wireless card) wireless driver using ioctl calls, as demonstrated in iw_stats.c in Appendices B.5. After these modules are loaded, the RTSP server can finally be created, and a ServerMediaSession with an RTCP subsession created and added.

However, the video source - in this case, the encoder - still has not been added; this is where the x264EncoderSource sub-class is finally used. In the setup of the thread, a single video frame was captured and encoded, and the memory address of the first NAL unit from this frame was passed through. The x264EncoderSource sub-class is initialised with this NAL unit address, and is then instantiated as a FramedSource. However, the frames from the encoder source still need to be split, ready to be repackaged correctly for use in the RTP stream. This is the job of the H264VideoStreamFramer class, a filter that breaks the H.264 elementary stream into an RTP workable state. Once this is initialised, the stream is finally ready to be played and the Live555 thread broadcasts that it is completed, ready for the x264/Video4Linux2 thread to proceed.

Once playing though, the spyPanda control algorithm requires stream statistics from Live555. These are obtained through the H264VideoRTPSink class, which holds a transmission statistics database (TransmissionStatsDB) built from every connected client and their respective Receiver Reports. In the case of spyPanda, an iterator passes through the database and selects the very last (or most recently connected) client's transmission statistics. These statistics contain standard feedback from Receiver Reports and are requested by the custom functions specified in the “Quality of Service Section” of live555.cpp in Appendices B.4.
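One of the standard Receiver Report fields is the interarrival jitter, which the client computes as a running estimate defined in RFC 3550. The sketch below shows that estimator in Python as background for the control algorithm; it is not Live555 code, the class name is an assumption, and arrival times and RTP timestamps are assumed to be expressed in the same clock units.

```python
class JitterEstimator:
    """Running interarrival jitter estimate as defined in RFC 3550."""

    def __init__(self):
        self.jitter = 0.0
        self._prev_transit = None

    def update(self, arrival: float, rtp_timestamp: float) -> float:
        # Relative transit time: arrival time minus the sender's RTP timestamp.
        transit = arrival - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            # Move one sixteenth of the way towards the newest deviation.
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter


if __name__ == "__main__":
    est = JitterEstimator()
    # Hypothetical packets at a 90 kHz clock: the third one arrives 160 ticks late.
    for arrival, ts in ((0, 0), (3000, 3000), (6160, 6000)):
        print(est.update(arrival, ts))
```

The 1/16 gain smooths the estimate, so a single late packet raises the reported jitter only slightly while sustained congestion raises it steadily.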


The spyPanda Control Algorithm

The spyPanda control algorithm is simply a jitter based system, where an optimal jitter is discovered and the control algorithm tries to attain that optimal jitter. It also monitors the framerate that is produced by the encoder, and acts on the quality/size of the picture if the output encoder framerate drops below 66% of the specified encoder framerate (enc.x264param.i_fps_num).

The control algorithm is simple: keep dropping the quality of the video until the framerate and jitter levels are back within acceptable limits. If the quality cannot be dropped any further, take more aggressive action and drop the resolution to the next lowest demanding resolution (from the linked list as specified in Chapter 5.1). This system essentially attempts to drop the quality and compression in order to reduce the load that the encoder places on the Pandaboard hardware and hence increase the bit-rate over the network. It is implemented using the real-time ratio supplied by the x264 Open Broadcast Encoder (which drops the quality of video if it is below a ratio of 1.0, and increases it if above), together with the measured average encode time (enc.ave_enc) and the estimated jitter acquired from the TransmissionStatsDB in Live555.

Every interval in which a new RTCP packet is received, spyPanda will read the data within this packet and decide whether to drop the real-time ratio or the resolution. When the jitter or framerate is out of bounds, it will drop the real-time ratio in proportion to the last ratio (similar to the PID algorithm from Equation 2.2), effectively dropping the quality of the video. If this ratio is dropped too far, the resolution is finally dropped; in this case, the ratio is reset to 1.0 and the process starts again. However, if the new resolution has processing leeway, the ratio will increase (increasing quality) until the network/encoder can no longer handle the quality or the ratio exceeds a top limit. In the latter case, it will attempt to increase the resolution, provided the new resolution's previous jitter reading is within acceptable bounds or that resolution has not been used before. On any resolution swap, the jitter for the last resolution is saved in the linked list, ready to be checked if required.

Implementing this control algorithm within spyPanda proved more difficult in practice than anticipated. Modifying the resolution within the encoder would throw errors about stride, which in practice is the length of a picture row in memory including any padding after the picture width, as demonstrated in Figure 5.2. In the case of the x264 implementation though, the stride was equal to the width of the image and so a picture could not dynamically increase. This applied to the Video4Linux2 library as well, as you could not just pass in the same sized picture without the stride error biting once again.
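The decision logic described above can be summarised in a short Python sketch. It is a simplified reading of the prose rather than the spyPanda source: the multiplicative step of 0.9, the ratio limits of 0.5 and 1.5, the use of a list index in place of the linked list, and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RateController:
    n_modes: int                 # resolutions, ordered least to most demanding
    index: int                   # currently selected resolution
    jitter_limit: float          # acceptable jitter bound
    ratio: float = 1.0           # real-time ratio passed to the encoder
    min_ratio: float = 0.5       # hypothetical lower limit
    max_ratio: float = 1.5       # hypothetical top limit
    step: float = 0.9            # hypothetical proportional step
    saved_jitter: List[Optional[float]] = field(default_factory=list)

    def __post_init__(self):
        if not self.saved_jitter:
            self.saved_jitter = [None] * self.n_modes

    def _switch(self, new_index: int, jitter: float) -> None:
        self.saved_jitter[self.index] = jitter  # remember jitter of the mode being left
        self.index = new_index
        self.ratio = 1.0                        # start again at real-time

    def on_receiver_report(self, jitter: float, encoded_fps: float, target_fps: float):
        overloaded = jitter > self.jitter_limit or encoded_fps < 0.66 * target_fps
        if overloaded:
            self.ratio *= self.step             # drop quality first
            if self.ratio < self.min_ratio:
                if self.index > 0:
                    self._switch(self.index - 1, jitter)  # then drop resolution
                else:
                    self.ratio = self.min_ratio
        else:
            self.ratio /= self.step             # leeway: raise quality
            if self.ratio > self.max_ratio:
                up = self.index + 1
                untried_or_ok = up < self.n_modes and (
                    self.saved_jitter[up] is None
                    or self.saved_jitter[up] <= self.jitter_limit)
                if untried_or_ok:
                    self._switch(up, jitter)
                else:
                    self.ratio = self.max_ratio
        return self.index, self.ratio
```

A caller would invoke on_receiver_report once per RTCP interval and, whenever the returned index changes, reinitialise the camera and encoder at the new resolution.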



Figure 5.2: Illustration displaying stride [11]

It was considered to use the libswscale (or Software Scale) implementation from the FFmpeg libraries [10] to scale the larger camera images into smaller ones for x264. However, it was deemed too processor intensive, and so the encoder is instead completely reinitialised with the new configuration. This process also requires that the camera dynamically change its resolution. Initially, the same idea of restarting was applied to the camera; however, a complete v4l2_close and v4l2_open proved to take a substantial amount of time. So a simple uninit_device and init_device (from v4l2_camera.c, Appendices B.2) was implemented to force the camera to change resolution. This method proved successful in dropping the resolution, but took about 0.5 seconds to restart and increased latency for the first couple of seconds of the stream. This was due to the catch-up needed for the client to play the first few seconds of the stream and flush its buffer - which also happens at the start of every spyPanda RTSP session.
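The role of stride illustrated in Figure 5.2 can be made concrete with a small Python sketch: a pixel is addressed by row times stride plus column, and a padded buffer only matches a tightly packed one when the stride equals the width. The function names are hypothetical and the example is unrelated to the x264 or Video4Linux2 internals.

```python
def pixel_offset(x: int, y: int, stride: int) -> int:
    """Byte offset of pixel (x, y) in an 8-bit plane whose rows are stride bytes long."""
    return y * stride + x


def pack_plane(src: bytes, width: int, height: int, src_stride: int) -> bytes:
    """Copy the visible pixels of a padded plane into a tightly packed plane."""
    if src_stride < width:
        raise ValueError("stride cannot be smaller than the picture width")
    out = bytearray(width * height)
    for row in range(height):
        start = row * src_stride
        out[row * width:(row + 1) * width] = src[start:start + width]
    return bytes(out)


if __name__ == "__main__":
    # A 3x2 picture stored with a stride of 4: one padding byte ends each row.
    padded = bytes([1, 2, 3, 0,
                    4, 5, 6, 0])
    print(pixel_offset(2, 1, stride=4))
    print(list(pack_plane(padded, width=3, height=2, src_stride=4)))
```

This is why a buffer allocated for one width cannot simply be reused for another: every row offset depends on the stride the buffer was created with.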

Chapter 6 Results of spyPanda
The results in this chapter are based on a configuration of spyPanda that uses Instantaneous Decoding Refresh (IDR) frames and starts at a resolution of 640x480 pixels at 30 frames per second (FPS). This configuration can be run with the command:

    ./spyPanda -i -r 640x480@30

Since the Pandaboard’s processor is limited in capability, the results are presented in separate sections for its encoding performance and its stream performance.


Pandaboard Encoded Framerate Results

As can be seen from Figure 6.1, the Pandaboard initially struggles to encode at a rate (FPS) above the encoder’s specified framerate (Encoder FPS). At ∼11 seconds, spyPanda drops the specified Encoder FPS to 20 frames per second. At this point the Pandaboard is on the verge of delivering the video at the expected framerate. However, the jitter (as shown in Figure 6.3) dictates that the resolution needs to be dropped further to produce even results (and hence low latency). So at ∼22 seconds, the configuration is dropped from 640x480 pixels at 20 frames per second to the next least demanding resolution of 320x240 pixels at 60 frames per second. The framerate is then repeatedly dropped in the camera and encoder until the jitter finally stabilises at a point that exhibits the least latency. It should be noted, though, that the resulting FPS is the potential framerate that could be encoded at that resolution. The encoder does not stream a framerate higher than what is specified, only equal to or lower.



Figure 6.1: Framerate attained over progressions of dropped resolutions




Pandaboard Output Bitrate Results

The bitrate also exhibits a similar progression as the real-time ratio and resolutions are changed. As can be seen in Figure 6.2, the bitrate attempts to stay underneath the Wireless LAN bitrate capacity estimated from the jitter exhibited by the stream. The encoder output bitrate slowly increases as the compression is reduced to alleviate stress on the Cortex-A9 processor that is encoding the video. It can also be seen that, using this control scheme, the processing capability of the Pandaboard will never allow it to stream over 2000 kbit/s. This means that the current bitrate from the encoder will only become a problem at far ranges, when the link quality is low. However, the constraints on jitter should alleviate this concern by dropping the resolution further. The times at which these drops in resolution occur are given in Table 6.1.

Time (ms)    Resolution
0            640x480@30
11’012       640x480@20
22’528       320x240@60
25’842       320x240@30
29’542       320x240@24

Table 6.1: Times for Resolution changes in stream for Figures 6.1, 6.2 and 6.3
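As a rough check on the ordering of the modes in Table 6.1, the sketch below computes the number of pixels per second each mode asks the encoder to process. Treating pixel throughput as a proxy for encoder load is an assumption made here for illustration; the thesis only states that the linked list is ordered by how demanding each resolution is.

```python
# Modes in the order they were visited (Table 6.1).
MODES = ["640x480@30", "640x480@20", "320x240@60", "320x240@30", "320x240@24"]


def pixel_rate(mode):
    """Pixels per second for a mode string of the form WIDTHxHEIGHT@FPS."""
    size, fps = mode.split("@")
    width, height = size.split("x")
    return int(width) * int(height) * int(fps)


rates = [pixel_rate(m) for m in MODES]
for mode, rate in zip(MODES, rates):
    print(f"{mode:>11}  {rate / 1e6:6.3f} Mpixel/s")
```

Under this assumption each step in the table lowers the load: 320x240 at 60 frames per second is half the pixel rate of 640x480 at 30 frames per second, despite the higher framerate.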



Figure 6.2: Bitrate attained over progressions of dropped resolutions




Jitter and Latency Results

In the case of jitter, it can be seen that at each encoder start/restart (or resolution/framerate change) the jitter spikes and then settles after some time. On the client side, this spike is observed as an increase of more than half a second in the latency between capturing and viewing the image, which persists until the jitter settles. It can also be seen that the time taken to change resolution is ∼0.5 seconds, which means that the client is expecting frames during this down time. The client will then attempt to increase the buffer size (and hence buffer latency) until a time out is reached and the stream is cut. However, when the video does return, this buffer latency is still kept high, and the new video is played with added latency on top of the encoding latency and stream latency.

This also shows that jitter can be used as a rough estimate of how much latency a stream has on the client side, which is expected, since jitter is a measure of the variation in the timing of the stream. The next reasonable step was then to slowly stress the Pandaboard until the stream (at the client side) seemed to lag or have noticeable latency. At this point, the jitter was recorded and used to set the maximum jitter a stream may have and still be deemed real-time. The jitter limit found to ensure a real-time stream was ∼3000.

As can be seen in Figure 6.3, for every resolution change up to the change to 320x240 pixels at 24 frames per second, the jitter is far too high to maintain a low latency stream. So the real-time ratio is dropped in an attempt to reduce the time taken to encode each frame (at the cost of quality). However, even with these measures, the time taken is still too high to reduce the latency. So aggressive action is taken by dropping the resolution and/or framerate whenever the ratio drops below 0.6. However, when an acceptable jitter is finally reached, the ratio has room to increase, and with it the quality. This can be observed at ∼29 seconds, after which the ratio increases until it caps at ∼2.0.
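For reference, RFC 3550 [25] defines the interarrival jitter reported in RTCP as a running estimate, J = J + (|D| - J)/16, where D is the difference between the spacing of two packets at the receiver and their spacing in RTP timestamps. The sketch below implements that estimator; the function names are hypothetical, and it is assumed (not confirmed by the thesis) that the value spyPanda reads from Live555 follows this definition. The conversion helper further assumes the 90 kHz RTP clock normally used for H.264 video, under which a jitter of 3000 would correspond to roughly 33 ms.

```python
def update_jitter(jitter, prev_arrival, prev_ts, arrival, ts):
    """One step of the RFC 3550 interarrival jitter estimator.

    Arrival times and RTP timestamps must be in the same units.
    """
    d = (arrival - prev_arrival) - (ts - prev_ts)
    return jitter + (abs(d) - jitter) / 16.0


def jitter_over(packets):
    """Run the estimator over (arrival, rtp_timestamp) pairs."""
    jitter = 0.0
    prev = None
    for arrival, ts in packets:
        if prev is not None:
            jitter = update_jitter(jitter, prev[0], prev[1], arrival, ts)
        prev = (arrival, ts)
    return jitter


def ticks_to_ms(ticks, clock_hz=90000):
    """Convert RTP timestamp units to milliseconds (90 kHz clock assumed)."""
    return ticks * 1000.0 / clock_hz
```

Because each new sample only contributes one sixteenth of its deviation, a single late frame produces a modest rise, whereas a sustained stall such as an encoder restart drives the estimate up over several reports.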



Figure 6.3: Jitter experienced in the stream




Discussion on Picture Quality

For the sake of a high framerate and low latency, the picture quality does suffer. On average, the Pandaboard drops the resolution down to 320x240 pixels just to keep a low latency. This can be seen in Figure 6.4:

Figure 6.4: Stable flight of AR.Drone at 320x240 pixels

However, when high motion is encountered, the Constant Rate Factor (CRF) algorithm of the x264 encoder kicks in and spends fewer bits on the fast-moving content, which removes a large amount of detail, as can be seen in Figure 6.5:

Figure 6.5: Unstable flight of AR.Drone at 320x240 pixels

In times of unpredictable motion (simultaneous yaw and elevation), the quality finally gives in and the picture is pixelated beyond recognition, as can be seen in Figure 6.6:



Figure 6.6: High motion flight of AR.Drone at 320x240 pixels

Simple motion did not pose a serious issue in the design of the spyPanda solution, since the CRF algorithm is designed to reduce the quality of the video in motion only to the extent that the human eye cannot tell the difference. However, when pixelation occurs, it would have been beneficial to include a motion stabilisation pre-processing algorithm, so that the encoder does not have to spend extra processing power on motion prediction and compression.

The trials of the spyPanda program supported the view that a high framerate and low latency are far more beneficial than picture quality for a user controlling and observing the UAV. In a security context, though, it could be more beneficial to include an on-board algorithm that turns up the quality when suspicious activity is detected. In this case, the use of DSP accelerated encoding could have vastly improved the picture quality and increased the number of potential resolutions that spyPanda could have used.

Chapter 7 Conclusions
7.1 Summary and conclusions

The final product of the thesis, spyPanda, has demonstrated the ability to provide a low latency, high framerate H.264 video stream from a UAV to a client. Unfortunately, to meet these requirements, the control algorithm had to reduce the quality of the video to allow the Pandaboard’s Cortex-A9 processor to encode it. However, user tests indicated that a high framerate, low latency video stream was far more beneficial to the observation of the UAV than a high quality but noticeably “laggy” (high latency), low framerate solution. It was suggested that the use of a DSP accelerated encoder would have improved the quality of the feed while retaining the high framerate and low latency. This also demonstrated that the limiting factor of the platform was the software based encoding on the Pandaboard’s processor.

The spyPanda algorithm also successfully changed the resolution of the feed in a short enough time that the client did not deem the RTP stream to have timed out. This method made use of the fact that the interarrival jitter, as calculated from the RTP stream, could be successfully used as feedback to optimise a stream for low latency video.

To summarise, the open-source spyPanda platform succeeded in all of the initial goals of the project by providing a reliable, low latency, high framerate, adaptive video stream over a standards compliant application layer (RTP/RTCP/RTSP).





7.2 Possible future work

As has been discussed, the main area of improvement would be to include a DSP optimised encoder. This could be implemented in one of the following ways:

• Purchase a license from Texas Instruments to use their proprietary encoders
• Modify the existing x264 source code to include custom optimisations for the C64x+ line of DSPs

The second option would most likely be chosen, in order to keep the open source nature of the spyPanda program.

If the DSP optimisations prove ineffective at reducing the pixelation (due to unpredictable motion) in the video stream, a pre-processing motion stabilisation algorithm could be implemented. This would essentially sit between the Video4Linux2 and x264 encoding layers, and act as a “spring and damper” for the motion of the video in the hope of reducing motion and blur. Alternatively, a floating lens could be used on the platform’s camera to passively reduce the effects of extreme motion [1].
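One possible reading of the “spring and damper” idea is a second-order low-pass filter applied to the estimated camera position of each frame, so that the stabilised view follows the real motion smoothly instead of jumping with it. The sketch below is only an illustration of that reading: the function name, the critically damped tuning and the value of `omega` are assumptions, and estimating the per-frame camera motion is outside its scope.

```python
def spring_damper_smooth(positions, dt, omega=5.0):
    """Smooth a 1-D camera trajectory with a critically damped spring.

    positions: measured camera position for each frame
    dt:        time between frames in seconds
    omega:     natural frequency in rad/s (assumed tuning value)
    """
    if not positions:
        return []
    k = omega * omega      # spring constant
    c = 2.0 * omega        # damping chosen for critical damping
    x = positions[0]       # smoothed position
    v = 0.0                # smoothed velocity
    smoothed = []
    for target in positions:
        accel = k * (target - x) - c * v
        v += accel * dt    # semi-implicit Euler integration
        x += v * dt
        smoothed.append(x)
    return smoothed
```

The difference between the measured and smoothed position at each frame would then be the offset by which the frame is shifted before it is passed to the encoder.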



Appendix A Program listings
The spyPanda adaptive, real-time H.264 video streaming program provides an open source alternative to many real-time streaming products. It utilises the well established x264 encoder (Open Broadcast Encoder variant) and the live555 RTSP libraries, with a customisable and extendable control algorithm. Currently, spyPanda uses the Bazaar revision control system and stores its code on Launchpad.

Install Bazaar using:

    sudo apt-get install bzr

And get the source code using:

    bzr clone lp:~alex-stevens/+junk/spyPanda

The code can also be viewed online at this address: http://bazaar.launchpad.net/~alex-stevens/+junk/spyPanda/files

The revision that is referenced in this version of the document is revision 72.


Appendix B Companion disc
Due to its size, the source code for the program is not listed here; it can be found online or on the companion disc.


Main C File

This can be located on the companion disc in: spyPanda/main.c


Video4Linux2 Implementation

This can be located on the companion disc in: spyPanda/v4l2_camera.c and spyPanda/v4l2_camera.h


x264 Implementation

This can be located on the companion disc in: spyPanda/x264_control.c and spyPanda/x264_control.h


Live555 Implementation

This can be located on the companion disc in: spyPanda/x264EncoderSource.cpp and spyPanda/x264EncoderSource.hh





Miscellaneous C Implementations

Linked Lists can be located on the companion disc in: spyPanda/linked_list.c and spyPanda/linked_list.h

Wireless LAN statistics can be located on the companion disc in: spyPanda/iw_stats.c and spyPanda/iw_stats.h


Sample Results

The results used in this document can be located under Results/stats.csv. These results were used in the creation of the Framerate, Bitrate and Jitter graphs (Figures 6.1, 6.2 and 6.3 respectively).


Report LaTeX Source and Items

The source for this document is located within the Report-latex directory under the companion disc.


This Report

This report can be located within the root directory of this companion disc, and is named 41719882_stevens.pdf



[1] What is optical shift image stabilizer? http://www.canon.com/bctv/faq/optis.html.
[2] Rq-11 raven. htm, 2005.
[3] Mq-1 predator unmanned aerial vehicle. http://www.162fw.ang.af.mil/resources/factsheets/factsheet.asp?id=11932, February 2008.
[4] Beagleboard-xm product reference. http://beagle.s3.amazonaws.com/design/xM-A/BB_xM_SRM_A2_01.pdf, 2010.
[5] Beagleboard.org - hardware-xm. http://beagleboard.org/hardware-xM, 2010.
[6] Davinci™ dm37x video processors. http://focus.tij.co.jp/jp/lit/ml/sprt571/sprt571.pdf, 2010.
[7] Omap 4 mobile applications platform. http://focus.ti.com/lit/ml/swpt034a/swpt034a.pdf, 2010.
[8] Pandaboard platform specifications. http://www.pandaboard.org/content/platform, 2010.
[9] Pandaboard references - pandaboard. http://pandaboard.org/content/resources/references, 2010.
[10] Ffmpeg. http://ffmpeg.org/, 2011.
[11] Image stride. http://msdn.microsoft.com/en-us/library/aa473780%28v=vs.85%29.aspx, 8 September 2011.
[12] Intel core i7 2600k cpu benchmark. http://www.anandtech.com/bench/Product/287, 2011.
[13] Logitech hd pro webcam c910. http://www.logitech.com/en-au/webcam-communications/webcams/devices/6816, 2011.
[14] Open source software for arm socs. http://www.linaro.org/, 2011.
[15] Sgt. 1st Class Michael Guillory. Up, up and away. http://usarmy.vo.llnwd.net/e2/-images/2006/11/22/1024/army.mil-2006-11-22-114612.jpg, November 2006.
[16] Panasonic Corporation. Mpeg-4 avc/h.264 codec technology explanation. http://pro-av.panasonic.net/en/technology/technology.pdf.
[17] Ross Finlayson. Live555 streaming media. http://www.live555.com/liveMedia/.
[18] Jason Garrett-Glaser. Announcing arm support for x264. http://x264dev.multimedia.cx/archives/142, 24 August 2009.
[19] Matthew Gast. O’Reilly Media, Inc., 2nd edition, 25 April 2005.
[20] V. Iverson, J. McVeigh, and B. Reese. Real-time h.264-avc codec on intel architectures. In ICIP International Conference on Image Processing, volume 2, pages 757–760, 24-27 October 2004.
[21] Ben Kuchera. Parrot ar.drone to attack this september, for $300. http://arstechnica.com/gaming/news/2010/06/parrot-ardrone-to-attack-this-september-for-300.ars.
[22] Sunhun Lee and Kwangsue Chung. Tcp-friendly rate control scheme based on rtp. In Information Networking. Advances in Data Communications and Wireless Networks, Lecture Notes in Computer Science, volume 3961, pages 660–669, 2006.
[23] Nilay Patel. Venezuelans set new wifi distance record: 237 miles. http://www.engadget.com/2007/06/19/venezuelans-set-new-wifi-distance-record-237-miles/, June 2007.
[24] Michael H Schimek, Bill Dirk, Hans Verkuil, and Martin Rubli. Video for Linux Two API Specification, volume 0. Bytesex.org.
[25] Henning Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: a transport protocol for Real-Time applications. RFC 3550, Internet Engineering Task Force, 2003.
[26] Naveen Srinivasamurthy, Soyeb Nagori, Girish Murthy, and Satish Kumar. Subpicture based rate control algorithm for achieving real time encoding and improved video quality for h.264 hd encoder on embedded video socs. In 2010 IEEE 4th International Conference on Internet Multimedia Services Architecture and Application, pages 1–6, 15-17 December 2010.
[27] Suman Srinivasan, Haniph Latchman, John Shea, Tan Wong, and Janice McNair. Airborne traffic surveillance systems: video surveillance of highway traffic. In Proceedings of the ACM 2nd international workshop on Video surveillance & sensor networks, 10-16 October 2004.
[28] symon. Payload of the a.r. drone. http://www.ardrone-flyers.com/forum/viewtopic.php?f=7&t=38, 5 September 2010.
[29] Ktawut Tappayuthpijarn, Guenther Liebl, Thomas Stockhammer, and Eckehard Steinbach. Adaptive video streaming over a mobile network with tcp-friendly rate control. June 2009.
[30] Javvin Technologies. chapter RTCP: RTP Control Protocol, page 145. 2nd edition.
[31] Uras Tos and Tolga Ayav. Adaptive rtp rate control method. In 2011 35th IEEE Annual Computer Software and Applications Conference Workshops, 2011.
[32] Hongtao Wang, Yuehui Jin, Wendong Wang, Jian Ma, and Dongmei Zhang. The performance comparison of prsctp, tcp and udp for mpeg-4 multimedia traffic in mobile network. In International Conference on Communication Technology Proceedings, volume 1, pages 403–406, 9-11 April 2003.
[33] Tim Wescott. Pid without a phd. http://igor.chudov.com/manuals/Servo-Tuning/PID-without-a-PhD.pdf, October 2000.
[34] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the h.264/avc video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(6):560–576, July 2003.
